  • Question About Double Observations in Stata DiD Output

    Hello,

    I am currently working on a panel data analysis in Stata, and my output seems to contain twice as many observations as I expected. I have a dataset with 4893 unique individuals observed in two time periods (long-form data, i.e., 9786 rows), so I expected to see 4893 observations in my results. However, when I run my Difference-in-Differences (DiD) regression, the output shows 9786 observations.

    Why do I have double the number of observations I expected? Is this normal behavior in Stata, or could something in my data or code be causing it?

    I am using Stata 15, and I'd appreciate any insights or guidance on how to resolve this issue.

    Thank you in advance for your help!

    Takudzwa
    Last edited by Takudzwa Mutize; 26 Sep 2023, 04:58.

  • #2
    An observation in Stata is not ambiguous. In a panel dataset, it is one row: the values of all variables for a particular unit of analysis in a particular time period. It appears that you are expecting the number of observations to equal the number of units in the dataset. That is only true for cross-sectional data. If you use xtreg or other panel estimators, the number of units is displayed as "Number of groups" in the output.
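
    A minimal sketch of the difference, on toy data with the same shape as described in #1 (this is not your data, and the names id, time and one_per_id are made up for the illustration):

    ```stata
    * Toy data: 4893 persons, two rows each (hypothetical variable names)
    clear
    set obs 4893
    generate long id = _n
    expand 2
    bysort id: generate byte time = _n - 1

    * Observations are rows
    count
    local n_obs = r(N)

    * Units are distinct ids; tag() flags exactly one row per id
    egen byte one_per_id = tag(id)
    count if one_per_id
    local n_units = r(N)

    display "observations = `n_obs', units = `n_units'"
    ```

    The first count is what any estimation command reports as "Number of obs"; the second is what a panel estimator would call "Number of groups".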


    • #3
      Thanks for the response. I am using the simple -reg- command with interaction terms, i.e., reg y PostTreatment##Time covariates, and my data are in long form for two time periods, that is, two rows per personal identifier. I hope that makes sense.


      • #4
        regress does not take into account the panel structure of your data. It is not a panel data command in the Stata sense (those are prefixed with -xt-), so it will just report the number of observations, that is, rows used in estimation. You will have to determine the number of units using other means.
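
        One possible way, sketched on made-up data (id, time, treated and y are hypothetical names, and the coefficients are invented): if you cluster the standard errors on the person identifier, regress also reports and stores the number of clusters, which here equals the number of persons.

        ```stata
        * Made-up two-period panel; all names and effect sizes are hypothetical
        clear
        set seed 12345
        set obs 4893
        generate long id = _n
        generate byte treated = mod(id, 2)
        expand 2
        bysort id: generate byte time = _n - 1
        generate y = 1 + 0.5*treated + 0.3*time + 0.8*treated*time + rnormal()

        * e(N) counts rows; clustering on id adds the unit count in e(N_clust)
        regress y i.treated##i.time, vce(cluster id)
        display "rows used: " e(N) "   persons (clusters): " e(N_clust)
        ```

        Whether clustering on the person is the right choice for your standard errors is a separate question; here it only serves to show where the unit count appears.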


        • #5
          Hi Andrew. When I run the regression using -xtreg- with -fe-, the number of groups is 4,893 and the number of obs is 9,775. Gender and race are omitted, of course, because they do not vary within person. Is there a way to do the DiD with long-form data without FE? Or do I have to reshape the data to wide form first and then use the simple -reg- command? I hope I am clear.


          • #6
            If you have Stata 17+, use didregress, which will display the number of groups.

            Code:
            help didregress
            Code:
            webuse hospdd, clear
            didregress (satis)(procedure), group(hospital) time(month)
            Res.:

            Code:
            . didregress (satis)(procedure), group(hospital) time(month)
            
            Number of groups and treatment time
            
            Time variable: month
            Control:       procedure = 0
            Treatment:     procedure = 1
            -----------------------------------
                         |   Control  Treatment
            -------------+---------------------
            Group        |
                hospital |        28         18
            -------------+---------------------
            Time         |
                 Minimum |         1          4
                 Maximum |         1          4
            -----------------------------------
            
            Difference-in-differences regression                     Number of obs = 7,368
            Data type: Repeated cross-sectional
            
                                           (Std. err. adjusted for 46 clusters in hospital)
            -------------------------------------------------------------------------------
                          |               Robust
                    satis | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            --------------+----------------------------------------------------------------
            ATET          |
                procedure |
            (New vs Old)  |   .8479879   .0321121    26.41   0.000     .7833108     .912665
            -------------------------------------------------------------------------------
            Note: ATET estimate adjusted for group effects and time effects.
            
            .

            As you are interested in the treatment effect in DID, why does it matter that the coefficients on gender and race are omitted? The effects of such time-invariant variables are captured by the fixed effects, so you don't need to control for them separately.
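
            A small simulated illustration of that point, which also runs in Stata 15 (the data, the variable names id, post, treated, female and y, and the true effect of 1 are all invented for this sketch): the time-invariant terms are dropped under fixed effects, while the interaction is still estimated.

            ```stata
            * Simulated two-period panel; names and effect sizes are made up
            clear
            set seed 2023
            set obs 2000
            generate long id = _n
            generate byte treated = runiform() < 0.5
            generate byte female = runiform() < 0.5
            expand 2
            bysort id: generate byte post = _n - 1
            generate y = 2 + 0.4*female + 0.5*treated + 0.3*post + 1*treated*post + rnormal()

            * Person fixed effects absorb treated and female (both constant within id);
            * the treated-by-post interaction is still identified
            xtset id post
            xtreg y i.treated##i.post i.female, fe vce(cluster id)
            display "groups: " e(N_g) "   DID estimate: " _b[1.treated#1.post]
            ```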
            Last edited by Andrew Musau; 26 Sep 2023, 12:55.


            • #7
              Hi Andrew. I am using Stata 15. I should also mention that I have two treatment groups. I am wondering whether I should use -xtreg- with -fe- or just -reg-. A paper I referred to reported results for gender and race (time-invariant) in the DiD: https://www.tandfonline.com/doi/full...8.2016.1171844
              My code right now is like this for the long-form data: xtreg log_real_income i.PostTreatment##cohortphase1 w_best_age_yrs agesquared2 i.w_a_gen i.new_maristat i.pop_group i.new_employment w_hhsizer i.workerskill i.cohortphase1#workerskill, fe


              • #8
                It will have to be simple DID as opposed to generalized DID. With Stata 15, use regress and include these time-invariant variables.
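
                A rough sketch of what I mean, on simulated data with a single treatment group (id, post, treated, female and y are made-up names, and the true effect of 1 is invented; adapt to your own variables and your two treatment groups):

                ```stata
                * Simulated 2x2 example; all names and effect sizes are hypothetical
                clear
                set seed 2023
                set obs 2000
                generate long id = _n
                generate byte treated = runiform() < 0.5
                generate byte female = runiform() < 0.5
                expand 2
                bysort id: generate byte post = _n - 1
                generate y = 2 + 0.4*female + 0.5*treated + 0.3*post + 1*treated*post + rnormal()

                * Pooled OLS on the long-form data: the coefficient on
                * 1.treated#1.post is the DID estimate, and female stays in
                * the model because no person fixed effects are included
                regress y i.treated##i.post i.female, vce(cluster id)
                display "DID estimate: " _b[1.treated#1.post]
                ```

                The data stay in long form; no reshaping to wide is needed for this.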


                • #9
                  Hi Andrew. Isn't the regress command only appropriate for analysis at a single time point, since it won't take the panel nature of the data into account?


                  • #10
                    You need a pre-period and a post-period in DID (so a minimum of two time periods). See my illustration in #12 of https://www.statalist.org/forums/for...ummy-variables.
