  • Difference in differences with fixed effects

    Dear Statalist members,
    I am working on a DID analysis for my thesis, but I am still not familiar with DID and have some technical questions. (They came up while reading papers, so I am sorry that I cannot give a data example.)

    Here is the setting:
    A cross-sectional dataset at the individual level (the outcome variable is at the individual level)
    A shock that occurred at time t at the county level
    We want to look at the long-term effects of this shock (say, whether they go to high school or not: high_school = 1 if the individual goes to high school and 0 otherwise) on individuals who were in utero or aged < 1 at the time of the shock.
    Treatment group: those born in t, t+1 and t+2 (birth_year >= t)
    Control group: those born in t-1, t-2 and t-3 (birth_year < t)
    Define variable post = 1 for the treatment group and 0 for the control group.
    Define variable treat = 1 if born in a county with the shock and 0 if born in a county without the shock.

    My questions:
    • Is it possible to tsset the data at the county level and then do the estimation at the individual level (meaning that the outcome variable and some of the covariates are at the individual level)?
    Code:
    tsset county birth_year
    xtlogit high_school  i.post#c.treat covariates i.stateXyear, fe
    I saw in this post https://www.statalist.org/forums/for...ferences-model that it is possible to use a single # when using fixed effects, so I did it this way. I also included state-by-birth-year fixed effects.
    • My second question: when testing for parallel trends, is it better to use ## instead of # even with fixed effects, so that Stata drops one more fixed effect instead of dropping one year from the interaction terms?
    Code:
    tsset county birth_year
    
    xtlogit high_school ib(t-1).birth_year##c.treat covariates i.stateXyear, fe
    I assume here that the comparison is made with respect to the last year of the pre-period.
    • My last question is related to the second one. When checking parallel trends, we are only interested in the sign and the significance, right? So should we run the following code:
    Code:
    tsset county birth_year
    xtlogit high_school ib(t-1).birth_year##c.treat covariates i.stateXyear, fe
    est sto m4
    coefplot m4, other options
    or the following one:
    Code:
    tsset county birth_year
    xtlogit high_school ib(t-1).birth_year##c.treat covariates i.stateXyear, fe
    margins birth_year, dydx(treat) noestimcheck post
    marginsplot, yline(0) other options
    I really appreciate any remarks and I thank you in advance.
    Please let me know if I should give more details.

    Best,
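The `ib(t-1).birth_year` term in the code above is shorthand: Stata needs an actual value as the base level. The following is a minimal sketch on simulated data, assuming a shock in 1980 so that the base year t-1 is 1979. The number of counties, the birth years, the effect size, and the use of a plain -logit- with county dummies instead of -xtlogit, fe- are all illustrative assumptions, not the poster's setup.

```stata
* Hypothetical individual-level data: 20 counties, birth years 1977-1982,
* shock assumed in 1980 (so t-1 = 1979); all names and numbers are illustrative
clear
set seed 2020
set obs 20
generate county = _n
generate treat = county <= 10              // counties 1-10 exposed (assumption)
expand 6
bysort county: generate birth_year = 1976 + _n
expand 100                                 // 100 individuals per county-year
generate post = birth_year >= 1980
generate high_school = runiform() < invlogit(-0.5 + 0.4*post*treat)

* Event-study style logit: 1979 (t-1) is the omitted base year.
* treat is constant within county, so Stata omits one collinear term.
logit high_school ib1979.birth_year##c.treat i.county

* Joint test that the pre-period interactions (1977, 1978) are zero
test 1977.birth_year#c.treat 1978.birth_year#c.treat
```

With 1979 as the base, each `birth_year#c.treat` coefficient is read relative to the last pre-shock cohort.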


  • #2
    Apparently I did not ask my questions in the right way.
    I am sorry about that.

    • Is it possible to tsset the data at the county level and then do the estimation at the individual level (meaning that the outcome variable and some of the covariates are at the individual level)?
    Code:
    tsset county birth_year
    xtlogit high_school i.post#c.treat covariates i.stateXyear, fe
    But I have found an answer to my first question.
    It is not possible to tsset the data at the county level and then do the estimation at the individual level, because this gives the error message "repeated time values within panel".
    Second, we can still use xtreg, fe without tsset-ting the data if we use
    Code:
    xi: xtreg y other covariates, fe i(county)



    • #3
      Marry:
      please note that the best approach is to -xtset- your dataset before running any -xt- command:
      Code:
      xtset county birth_year

      Besides, if Stata returns the error message "repeated time values within panel", you can -xtset- your dataset with the -panelid- only. However, this fix comes at the cost of making time-series operators (such as lags and leads) unavailable.
      Finally, the -xi:- prefix is often redundant if you're using a recent version of Stata.
      Kind regards,
      Carlo
      (Stata 18.0 SE)
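The -xtset- behaviour discussed in #2 and #3 can be sketched on a toy pooled cross-section. The dataset below is hypothetical (2 counties, 2 birth years, 3 individuals per county-year) and only illustrates why declaring a time variable fails when several individuals share a county-year, while declaring the panel identifier alone works.

```stata
* Hypothetical pooled cross-section: 2 counties, 2 birth years, 3 people per cell
clear
set obs 12
generate county = 1 + mod(_n - 1, 2)
generate birth_year = 1980 + mod(floor((_n - 1)/2), 2)

* Several individuals share each county-year, so declaring a time variable fails
capture noisily xtset county birth_year
display "return code: " _rc

* Declaring only the panel identifier works
xtset county
```

After `xtset county`, -xt- estimators that need only the group identifier can run, but time-series operators such as `L.` and `F.` are not available.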



      • #4
        Thank you Carlo Lazzaro.
        Last edited by Marry Lee; 25 Nov 2020, 08:06.



        • #5
          Hello. I posted this earlier elsewhere, but that thread seems outdated now, so I am posting it here.
          I am facing some issues with difference-in-differences estimation of consumption expenditure. I am using four rounds of a cross-sectional household survey (five years apart; the cross-sections are independent, so the same individuals are not repeated). I am using the first round as the pre-intervention period and the other three as post-intervention periods, assuming time-varying effects of the policy (the policy is continuous, with different intensities). Two regions were surveyed in a state: Region 1 (treatment group), with 5 districts, and Region 2 (control group), with 4 districts. So the treatment here is geographical. The model specification is as follows:
          Yijt = a0 + a1*TREAT + a2*POST + a3*TREAT_POST + covariates + e
          i = individual, j = district and t = time
          TREAT = 1 if the individual belongs to one of the 4 districts of treatment Region 1, and 0 otherwise.
          POST = 0, 1, 2, 3 for the four survey years, respectively.
          Should I run the regression with district fixed effects, using fe i(district) after the regression command? TREAT will then be dropped by Stata.
          My questions are:
          1. Is this the correct specification with fixed effects, and do the years have to be given the values 0, 1, 2, 3?
          2. For the multi-period case (with only Region 1 remaining affected by the policy intervention over time), is code similar to your code in #2 applicable? For example, this command:
          Code:
          xtreg Y i.TREAT##i.POST covariates i.time, fe i(district)
          3. Do I have to forgo clustering the standard errors, as only 7 districts are available?



          • #6
            Carlo Lazzaro, sir, could you suggest something here?



            • #7
              Saeed:
              you don't seem to have a panel dataset, but rather a survey in which individuals change across data waves.
              An excerpt/example of your data (via -dataex-, please) may make things clearer. Thanks.
              Kind regards,
              Carlo
              (Stata 18.0 SE)



              • #8
                Yes, it is a household survey. Here is an example dataset with selected observations and covariates:

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input float(post treat post_treat year) double mpce float mpce_log double HHID int age_head float(age_squared female_headed) byte(head_educ marriage hh_size) long(children elderly) float round_id byte district int region
                0 1 0 1 256.83 5.548414    17002101 40 1600 0 3 2  9 4 2 1 21 2
                0 1 0 1 197.21 5.284269    17010101 55 3025 0 2 2 12 6 0 1 26 2
                0 0 0 1 130.94 4.874739    17012101 54 2916 0 1 2  8 2 0 1 31 1
                0 1 0 1    292 5.676754    17013101 32 1024 0 3 1  1 0 0 1 25 2
                0 1 0 1 114.78 4.743017    17021101 35 1225 0 1 2 12 6 0 1 24 2
                0 0 0 1 125.72 4.834057    17041202 23  529 0 3 2  5 1 0 1 32 1
                0 0 0 1 251.39 5.527006    17090205 80 6400 0 1 2  7 1 3 1 30 1
                1 1 1 2    793 6.675823 22735110101 48 2304 0 1 2  8 1 1 2 24 2
                1 1 1 2    837 6.729824 22738110101 90 8100 0 1 2 11 3 2 2 21 2
                1 1 1 2    689 6.535241 22742110101 34 1156 0 3 2  4 2 0 2 25 2
                1 1 1 2    888 6.788972 82653110201 42 1764 0 2 2  5 1 0 2 26 2
                1 0 0 2   1495 7.309882 82649110219 29  841 0 2 1  1 0 0 2 32 1
                1 0 0 2   1077 6.981935 82644110101 42 1764 0 4 2  3 1 0 2 31 1
                1 0 0 2   1691 7.433075 82673110101 42 1764 0 3 2  2 0 0 2 30 1
                2 1 2 3  563.8   6.3347   342502302 48 2304 0 1 2  5 0 0 3 21 2
                2 1 2 3  997.5 6.905252   342042202 40 1600 0 1 2  4 2 0 3 24 2
                2 1 2 3 637.56 6.457648   343012101 62 3844 0 1 2  9 3 2 3 25 2
                2 1 2 3 406.27 6.007018   343271202 55 3025 0 3 2 11 2 0 3 26 2
                2 0 0 3    949 6.855409   244621202 37 1369 0 3 2  3 0 0 3 30 1
                2 0 0 3 879.88 6.779786   344752301 62 3844 0 3 2  8 2 2 3 31 1
                2 0 0 3 2028.5 7.615052   344891102 50 2500 1 3 2  2 0 0 3 32 1
                end

                post = 0, 1, 2 for years 1 (baseline), 2 (post) and 3 (post), respectively
                treat = 1 if the household belongs to one of the 4 treatment districts (region 2)
                post_treat is the interaction term
                I am trying to estimate the impact of the policy on consumption expenditure (the mpce or mpce_log variable) in these treated districts using 4 survey years (I have kept only 3 years in the example dataset to save space).

                I am using this command:
                Code:
                xtreg mpce_log i.post##i.treat covariates, fe i(district)
                Please, I need some suggestions for this multi-period (four-period) DID analysis, in which only one group remains under treatment throughout the years.

                Carlo Lazzaro
                Last edited by Saeed Owais Mushtaq; 29 Nov 2020, 09:27.



                • #9
                  Saeed:
                  since you do not have a panel dataset, I would go:
                  Code:
                  . reg mpce_log i.post##i.treat i.district
                  note: 32.district omitted because of collinearity
                  
                        Source |       SS           df       MS      Number of obs   =        21
                  -------------+----------------------------------   F(10, 10)       =      9.70
                         Model |  13.8830708        10  1.38830708   Prob > F        =    0.0007
                      Residual |   1.4317697        10   .14317697   R-squared       =    0.9065
                  -------------+----------------------------------   Adj R-squared   =    0.8130
                         Total |  15.3148405        20  .765742027   Root MSE        =    .37839
                  
                  ------------------------------------------------------------------------------
                      mpce_log |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                          post |
                            1  |    2.16303    .308952     7.00   0.000     1.474642    2.851418
                            2  |   2.004815    .308952     6.49   0.000     1.316427    2.693203
                               |
                       1.treat |   .1797999    .388756     0.46   0.654    -.6864025    1.046002
                               |
                    post#treat |
                          1 1  |  -.7936785    .408705    -1.94   0.081     -1.70433    .1169731
                          2 1  |  -.8917742    .408705    -2.18   0.054    -1.802426    .0188774
                               |
                      district |
                           24  |   -.096282    .308952    -0.31   0.762    -.7846699    .5921059
                           25  |   .0189015    .308952     0.06   0.952    -.6694864    .7072894
                           26  |  -.1775599    .308952    -0.57   0.578    -.8659477     .510828
                           30  |    .018833    .308952     0.06   0.953    -.6695549    .7072209
                           31  |   -.374177    .308952    -1.21   0.254    -1.062565    .3142109
                           32  |          0  (omitted)
                               |
                         _cons |   5.197049   .2820333    18.43   0.000     4.568639    5.825458
                  ------------------------------------------------------------------------------
                  Clustering the standard errors on district is not appropriate with only 7 districts.
                  Kind regards,
                  Carlo
                  (Stata 18.0 SE)
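The pooled regression suggested in #9 can be tried on simulated data where the true effects are known. This is a minimal sketch under stated assumptions: the district numbering, the sample size per cell, the effect sizes (0.3 and 0.5), and the use of `vce(robust)` are all hypothetical choices made for illustration, not part of the advice given in the thread.

```stata
* Toy data (hypothetical): 7 districts, 3 survey rounds, 50 households per cell
clear
set seed 12345
set obs 7
generate district = _n
generate treat = district <= 4          // districts 1-4 treated (assumption)
expand 3
bysort district: generate post = _n - 1 // 0 = baseline, 1 and 2 = post rounds
expand 50
* Assumed true effects: 0.3 in round 1 and 0.5 in round 2 for treated districts
generate y = 5 + 0.1*district + 0.2*post + 0.3*(post==1)*treat ///
    + 0.5*(post==2)*treat + rnormal(0, 0.5)

* Pooled OLS with district dummies; one term is omitted because treat is
* constant within district
regress y i.post##i.treat i.district, vce(robust)

* Joint test of the two DID interaction terms
testparm i.post#i.treat
```

The coefficients on `1.post#1.treat` and `2.post#1.treat` are the per-round DID estimates, and they can be compared with the values used to generate `y`.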



                  • #10
                    Yes, I understand. Thanks a lot, Carlo Lazzaro.
                    I just have a few points of confusion.
                    1. I know clustering is not possible. Does that mean I also have to discard fixed effects (fe), since the two are a bit different? In other words, is only the reg command correct, and not xtreg?
                    2. Can I recode the post variable as 0 for the baseline and 1 for all post-intervention years (instead of 1, 2 and 3 as in the example dataset) to get an overall average treatment effect (ATE) rather than per-year average treatment effects?
                    3. I also tried to add district-level control variables to this household-level dataset, setting each district variable to the same value for all households within a district. For instance, with an unemployment rate of 20% in district 30, I enter 20 (or 0.20) for all households in district 30, and likewise for the other districts with their respective unemployment rates. Is this the correct way?
                    Last edited by Saeed Owais Mushtaq; 29 Nov 2020, 12:06.
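The data steps described in points 2 and 3 of #10 can be sketched as follows. This shows only the mechanics of building a binary post indicator and attaching a district-level value to every household with -merge m:1-; whether these are the right modelling choices is a separate question. The variable names `post_any` and `unemp_rate` and all values are hypothetical.

```stata
* Hypothetical household-level data: 3 districts, 4 households each
clear
set obs 12
generate byte district = 30 + mod(_n - 1, 3)   // districts 30, 31, 32
generate byte post = mod(_n, 3)                // 0, 1, 2 (illustrative only)
generate byte post_any = post > 0              // 1 for every post-intervention round
tempfile households
save `households'

* Hypothetical district-level file: one row per district
clear
input byte district float unemp_rate
30 0.20
31 0.15
32 0.10
end
tempfile districts
save `districts'

* Attach the district value to every household in that district
use `households', clear
merge m:1 district using `districts', assert(match) nogenerate
```

The `assert(match)` option makes the merge stop with an error if any household has no district record, or the reverse.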
