  • #16
    I'm afraid I can't help you. I can't reproduce your problem with my Stata. The code I gave earlier runs fine with the new example data, with no error messages, and it produces results at the end that look believable:

    Code:
    . //    SAVE THE DATA FOR LATER USE
    . tempfile holding
    
    . save `holding'
    file C:\Users\CLYDES~1\AppData\Local\Temp\ST_0h000001.tmp saved
    
    . 
    . //    SPLIT OUT THE CONTROL GROUP IDs INTO A SEPARATE
    . //    FILE OF CONTROL IDS SORTED IN RANDOM ORDER
    . keep if survey == 0
    (333 observations deleted)
    
    . keep ID
    
    . duplicates drop
    
    Duplicates in terms of all variables
    
    (655 observations deleted)
    
    . set seed 1234
    
    . gen double shuffle = runiform()
    
    . sort shuffle
    
    . rename ID ID0
    
    . tempfile control_ids
    
    . save `control_ids'
    file C:\Users\CLYDES~1\AppData\Local\Temp\ST_0h000002.tmp saved
    
    . 
    . //    NOW CREATE A FILE OF CASE IDs
    . use `holding', clear
    
    . keep if survey == 1
    (667 observations deleted)
    
    . keep ID
    
    . rename ID ID1
    
    . duplicates drop
    
    Duplicates in terms of all variables
    
    (327 observations deleted)
    
    . 
    . //    CREATE MATCHED PAIRS
    . merge 1:1 _n using `control_ids', keep(match) nogenerate
    
        Result                           # of obs.
        -----------------------------------------
        not matched                             0
        matched                                 6  
        -----------------------------------------
    
    . isid ID0
    
    . isid ID1, sort
    (data now sorted by ID1)
    
    . gen long pair = _n
    
    . 
    . //    GO TO LONG LAYOUT AND BRING BACK THE OTHER VARIABLES
    . reshape long ID, i(pair) j(_j)
    (note: j = 0 1)
    
    Data                               wide   ->   long
    -----------------------------------------------------------------------------
    Number of obs.                        6   ->      12
    Number of variables                   4   ->       4
    j variable (2 values)                     ->   _j
    xij variables:
                                    ID0 ID1   ->   ID
    -----------------------------------------------------------------------------
    
    . merge 1:m ID using `holding', keep(match) nogenerate
    
        Result                           # of obs.
        -----------------------------------------
        not matched                             0
        matched                               669  
        -----------------------------------------
    
    . assert survey == _j
    
    . drop _j
    
    . 
    . //    PREPARE PRE-POST VARIABLE
    . //    BEGIN BY APPLYING CASE SURVEY DATE TO MATCHED CONTROL
    . by pair (survey_date), sort: assert survey_date == survey_date[1] if !missing(survey_date)
    
    . by pair (survey_date): replace survey_date = survey_date[1]
    (336 real changes made)
    
    . gen byte pre_post = (date >= survey_date)
    
    . 
    . //    DID ANALYSIS
    . xtset ID
           panel variable:  ID (unbalanced)
    
    . xtreg consumption i.survey##i.pre_post, fe vce(cluster pair)
    note: 1.survey omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs      =       669
    Group variable: ID                              Number of groups   =        12
    
    R-sq:  within  = 0.0184                         Obs per group: min =        52
           between = 0.2156                                        avg =      55.8
           overall = 0.0130                                        max =        57
    
                                                    F(2,5)             =      2.57
    corr(u_i, Xb)  = -0.2708                        Prob > F           =    0.1706
    
                                          (Std. Err. adjusted for 6 clusters in pair)
    ---------------------------------------------------------------------------------
                    |               Robust
        consumption |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
           1.survey |          0  (omitted)
         1.pre_post |   30.08198   52.27459     0.58   0.590    -104.2941    164.4581
                    |
    survey#pre_post |
               1 1  |  -255.2466   112.8719    -2.26   0.073    -545.3931    34.89981
                    |
              _cons |   1109.229   17.18425    64.55   0.000     1065.056    1153.403
    ----------------+----------------------------------------------------------------
            sigma_u |   650.7295
            sigma_e |  499.10863
                rho |  .62960918   (fraction of variance due to u_i)
    ---------------------------------------------------------------------------------
    I can only assume that you have somehow changed the code from what was originally written. I suggest you scrap what you have and do a fresh copy/paste from #10 into a do-file and try again.


    • #17
      Dear Clyde,

      Wow, it worked. Thank you very, very much. I managed to create the "pre_post" variable and was able to estimate the ATE of program participation at various times.
      Three questions:

      1) Out of 200,000 observations from the control group, only about 40,000 were left, and the rest were dropped. I inserted the picture below. How could I explain this significant drop in control group. No drops in survey - treatment group!!!
      2) Using the final regression below, which is with "fe", the regression drops some of the time-invariant variables.
      xtreg consumption i.survey##i.pre_post, fe vce(cluster pair)
      When we fit the fe DiD model, "survey" is dropped because it is time invariant, along with some other (time-invariant) variables. However, when I run the regression without "fe", I get more reasonable results: xtreg consumption i.survey##i.pre_post, vce(cluster pair). I attached both below. Any suggestions on using one versus the other?

      [Attachments: Screenshot 2017-03-29 16.50.24.png, Screenshot 2017-03-29 16.49.15.png, Screenshot 2017-03-29 17.39.50.png]

      3) Do you also recommend testing the marginal effects with -margins-?

      Thank you very much


      • #18
        Your screen shots are too small for me to read on my computer. The best way to post Stata output is to copy it from the Results window or your log file and paste it in between code delimiters (see FAQ #12 about code delimiters). That way it comes out as nicely aligned, full-size text (just like the blocks of code that I post.)

        How could I explain this significant drop in control group. No drops in survey - treatment group!!!
        This is the result of the matching process. Your control group is much larger than your case group. In the matching algorithm, each case was assigned only one control. So you have an excess of unmatched controls. In those observations, there is no survey date value, and hence no pre_post value. Since pre_post has missing values for those observations, they are excluded from the estimation sample. This is exactly as it should be.
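
        One way to see this in your own data is to count distinct IDs in each group before matching. This is only a sketch: it assumes the full pre-matching data set is in memory, with the variables ID and survey as in the code above. The variable name first is made up for illustration.

        Code:
        * flag one observation per ID so each person is counted once
        egen byte first = tag(ID)
        * distinct IDs by group: with one control per case, the number of
        * controls that end up in pairs cannot exceed the number of cases
        tabulate survey if first
        drop first

        If the count for survey == 0 is much larger than the count for survey == 1, the difference is the pool of controls that never received a match.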

        when we fit fe DiD model then the "survey" is dropped - its time invariant, along with some other variables (time invariant ones).
        Yes, anything that is time invariant within an ID is omitted: its effects are inestimable. As for the variable survey, it is of no importance anyway. Even if you were not doing fixed effects and it were retained in the model, its coefficient contains information that is really just a nuisance parameter and not usually of any interest. Remember that in the interaction model the variable survey does not have information about the effect of being in the survey. It simply estimates the mean outcome in the survey group before they took the survey. Usually one doesn't care much, if at all, about that. It's just the baseline from which you measure the change following the survey.

        However, when i run the regression without "fe" i get more reasonable results. xtreg consumption i.survey##i.pre_post, vce(cluster pair)
        Well, that doesn't matter at all. You have, at this point, two reasons why you must use fixed effects regression: you have repeated observations within ID and you have matched pairs. So your OLS results (i.e., without fe), regardless of whether they look more reasonable, are simply wrong. You can't use them. OLS estimation requires independent observations and you don't even come close to having that.

        3) do you also recommend to test the marginal effects? - margins -
        I definitely recommend running -margins- both to get the predicted values and the marginal effects. I'm not sure what you mean by "testing" the marginal effects. I hardly ever recommend "testing" anything! I recommend looking at them and understanding them by focusing on the expected values and the confidence intervals and comparing to effect sizes that are of practical interest.


        • #19
          Dear Clyde,

          Below are the results from #17, posted as text so that they are readable.

          Code:
          . xtreg consumption i.survey##i.pre_post, fe vce(cluster pair)
          note: 1.survey omitted because of collinearity
          
          Fixed-effects (within) regression               Number of obs      =     85998
          Group variable: ID                              Number of groups   =      1536
          
          R-sq:  within  = 0.0002                         Obs per group: min =        36
                 between = 0.0001                                        avg =      56.0
                 overall = 0.0000                                        max =        64
          
                                                          F(2,767)           =      2.58
          corr(u_i, Xb)  = -0.0111                        Prob > F           =    0.0762
          
                                              (Std. Err. adjusted for 768 clusters in pair)
          ---------------------------------------------------------------------------------
                          |               Robust
              consumption |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ----------------+----------------------------------------------------------------
                 1.survey |          0  (omitted)
               1.pre_post |   17.98432   10.01396     1.80   0.073    -1.673704    37.64234
                          |
          survey#pre_post |
                     1 1  |  -30.97424   13.85047    -2.24   0.026    -58.16357   -3.784917
                          |
                    _cons |   1209.913   1.877807   644.32   0.000     1206.227    1213.599
          ----------------+----------------------------------------------------------------
                  sigma_u |  656.22017
                  sigma_e |  468.22482
                      rho |  .66264321   (fraction of variance due to u_i)
          ---------------------------------------------------------------------------------
          
          . xtreg consumption i.survey##i.pre_post, vce(cluster pair)
          
          Random-effects GLS regression                   Number of obs      =     85998
          Group variable: ID                              Number of groups   =      1536
          
          R-sq:  within  = 0.0002                         Obs per group: min =        36
                 between = 0.0002                                        avg =      56.0
                 overall = 0.0000                                        max =        64
          
                                                          Wald chi2(3)       =      5.21
          corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.1568
          
                                              (Std. Err. adjusted for 768 clusters in pair)
          ---------------------------------------------------------------------------------
                          |               Robust
              consumption |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          ----------------+----------------------------------------------------------------
                 1.survey |   1.430068   34.65126     0.04   0.967    -66.48515    69.34528
               1.pre_post |   17.89325   9.988801     1.79   0.073    -1.684436    37.47094
                          |
          survey#pre_post |
                     1 1  |  -30.54743   13.79323    -2.21   0.027    -57.58167   -3.513203
                          |
                    _cons |   1206.752   24.29805    49.66   0.000     1159.129    1254.375
          ----------------+----------------------------------------------------------------
                  sigma_u |  653.27713
                  sigma_e |  468.22482
                      rho |  .66063062   (fraction of variance due to u_i)
          ---------------------------------------------------------------------------------
          
          . tab survey
          
               0 or 1 |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |     42,944       49.94       49.94
                    1 |     43,054       50.06      100.00
          ------------+-----------------------------------
                Total |     85,998      100.00


          • #20
            OK. Thanks for posting them in code blocks. These are quite readable.

            So, why do you think the results in the second model are more reasonable than those in the first? They look almost the same to me. The only material difference is the omission of the survey variable in the -fe- model, but that is normal and expected in fixed effects, and involves only a nuisance parameter anyway. Everything else is almost the same. I don't see an issue here.


            • #21
              A million thanks, Clyde,

              I truly appreciate it.

              1. To your question: I have a variable called value; it is the value of the houses that people are living in (I use it as a proxy for income). Anyway, when I use -fe- this variable is dropped. I was surprised. If I don't use -fe-, I get meaningful results. That is why I was wondering about the role of -fe- in diff-in-diff.

              2. I ran -margins- and got the output below.

              Code:
              . . xtreg consumption i.survey##i.pre_post, fe  vce(cluster pair)
              note: 1.survey omitted because of collinearity
              
              Fixed-effects (within) regression               Number of obs      =     85998
              Group variable: ID                              Number of groups   =      1536
              
              R-sq:  within  = 0.0002                         Obs per group: min =        36
                     between = 0.0001                                        avg =      56.0
                     overall = 0.0000                                        max =        64
              
                                                              F(2,767)           =      2.58
              corr(u_i, Xb)  = -0.0111                        Prob > F           =    0.0762
              
                                                  (Std. Err. adjusted for 768 clusters in pair)
              ---------------------------------------------------------------------------------
                              |               Robust
                  consumption |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              ----------------+----------------------------------------------------------------
                     1.survey |          0  (omitted)
                   1.pre_post |   17.98432   10.01396     1.80   0.073    -1.673704    37.64234
                              |
              survey#pre_post |
                         1 1  |  -30.97424   13.85047    -2.24   0.026    -58.16357   -3.784917
                              |
                        _cons |   1209.913   1.877807   644.32   0.000     1206.227    1213.599
              ----------------+----------------------------------------------------------------
                      sigma_u |  656.22017
                      sigma_e |  468.22482
                          rho |  .66264321   (fraction of variance due to u_i)
              ---------------------------------------------------------------------------------
              
              . margins survey#pre_post, noestimcheck
              
              Adjusted predictions                              Number of obs   =      85998
              Model VCE    : Robust
              
              Expression   : Linear prediction, predict()
              
              ---------------------------------------------------------------------------------
                              |            Delta-method
                              |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
              ----------------+----------------------------------------------------------------
              survey#pre_post |
                         0 0  |   1209.913   1.877807   644.32   0.000     1206.232    1213.593
                         0 1  |   1227.897   8.758179   140.20   0.000     1210.731    1245.063
                         1 0  |   1209.913   1.877807   644.32   0.000     1206.232    1213.593
                         1 1  |   1196.923    9.02493   132.62   0.000     1179.234    1214.611
              ---------------------------------------------------------------------------------
              
              . . margins survey, dydx(pre_post) noestimcheck
              
              Conditional marginal effects                      Number of obs   =      85998
              Model VCE    : Robust
              
              Expression   : Linear prediction, predict()
              dy/dx w.r.t. : 1.pre_post
              
              ------------------------------------------------------------------------------
                           |            Delta-method
                           |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
              1.pre_post   |
                    survey |
                        0  |   17.98432   10.01396     1.80   0.073    -1.642684    37.61132
                        1  |  -12.98993   10.33578    -1.26   0.209    -33.24769    7.267829
              ------------------------------------------------------------------------------
              Note: dy/dx for factor levels is the discrete change from the base level.
              
              . . margins survey, dydx(consumption) noestimcheck
              'consumption' not found in list of covariates
              r(322);
              
              . . margins consumption, dydx(survey) noestimcheck
              r(103);
              
              . margins, dydx(*) atmeans
              
              Conditional marginal effects                      Number of obs   =      85998
              Model VCE    : Robust
              
              Expression   : Linear prediction, predict()
              dy/dx w.r.t. : 1.survey 1.pre_post
              at           : 0.survey        =    .4993605 (mean)
                             1.survey        =    .5006395 (mean)
                             0.pre_post      =    .7481802 (mean)
                             1.pre_post      =    .2518198 (mean)
              
              ------------------------------------------------------------------------------
                           |            Delta-method
                           |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                  1.survey |          .  (not estimable)
                1.pre_post |          .  (not estimable)
              ------------------------------------------------------------------------------
              Note: dy/dx for factor levels is the discrete change from the base level.
              - How can I understand the insignificant value when survey is 1?
              - Also, with the last -margins- commands, when I try to look at the marginal effects of other variables, I get error messages. Why would this be the case?

              Thank you very very much,
              Regards.


              • #22
                Your variable value is dropped because, in your data, the value of the home a person lives in does not change during the period of observation for any person. So it is collinear with the fixed effects, and its effect cannot be estimated in the fixed-effects model.
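
                If you want to confirm that, here is a quick check. It is a sketch that assumes your data are still -xtset ID- and that the variable is named value, as you describe it:

                Code:
                * passes silently if value never changes within an ID;
                * stops with "assertion is false" if it varies for anyone
                by ID (value), sort: assert value[1] == value[_N]
                
                * alternative: the "within" standard deviation should be 0
                xtsum value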

                How can I understand the insignificant value when survey is 1?
                I assume you are referring to this table:
                Code:
                . . margins survey, dydx(pre_post) noestimcheck
                
                Conditional marginal effects                      Number of obs   =      85998
                Model VCE    : Robust
                
                Expression   : Linear prediction, predict()
                dy/dx w.r.t. : 1.pre_post
                
                ------------------------------------------------------------------------------
                             |            Delta-method
                             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                1.pre_post   |
                      survey |
                          0  |   17.98432   10.01396     1.80   0.073    -1.642684    37.61132
                          1  |  -12.98993   10.33578    -1.26   0.209    -33.24769    7.267829
                ------------------------------------------------------------------------------
                Note: dy/dx for factor levels is the discrete change from the base level.
                So when survey is 1, these results tell you that the expected value of your outcome variable is about 12.99 units lower in the post-survey era than in the pre-survey era. That may seem large (I don't know, really--I'm not sure what the outcome variable means and what its units are) but if we look at the confidence interval, we see that it is very wide, ranging from -33.25 to 7.27. So the magnitude of the effect itself (almost 13) is dwarfed by the uncertainty with which it was estimated (-33.25 to 7.27). While the bulk of that interval lies in negative territory, a substantial piece of it lies in positive territory. So the data and study design simply have not produced a precise enough estimate of the change to tell us with confidence whether it is positive or negative.
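
                To see where that figure comes from: in the interaction model, the pre-to-post change in the survey group is the sum of the 1.pre_post coefficient and the interaction coefficient. A quick check, using the coefficients from your -xtreg, fe- output (the -lincom- line assumes those estimates are still in memory):

                Code:
                * change in survey == 0 group: the 1.pre_post coefficient alone
                * change in survey == 1 group: 1.pre_post + the interaction
                display 17.98432 + (-30.97424)
                
                * same quantity, with a standard error and confidence interval
                lincom 1.pre_post + 1.survey#1.pre_post

                The -display- line gives -12.98992, which matches the -margins- figure up to rounding of the displayed coefficients.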

                As for the error messages you are getting in
                Code:
                . . margins survey, dydx(consumption) noestimcheck
                'consumption' not found in list of covariates
                r(322);
                
                . . margins consumption, dydx(survey) noestimcheck
                r(103);
                it is because you are trying to do something that is impossible and makes no sense: consumption is the outcome variable in your regression, so you cannot compute its marginal effect, nor can you compute the marginal effect of anything else over its values. Those things only make sense for predictor variables in your model.


                As for the final -margins- command, you forgot the -noestimcheck- option, which is crucial when you are looking at a fixed-effects model with an interaction term.
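
                That is, something along these lines. This is a sketch I have not run on your data; it simply adds the option to the command you typed, and assumes the -xtreg, fe- results are still the active estimates:

                Code:
                * same command as before, with -noestimcheck- added
                margins, dydx(*) atmeans noestimcheck

                Read the row for 1.survey with caution: its main effect is absorbed by the fixed effects, which is also why the 0 0 and 1 0 margins in your earlier table are identical.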
