Question About Double Observations in Stata DiD Output

Takudzwa Mutize

Join Date: Sep 2023

Posts: 5
#1

26 Sep 2023, 04:54

Hello,

I am currently working on a panel data analysis in Stata, and I've noticed that my output seems to contain double the observations I expected. I have a dataset with 4893 unique individuals observed in two time periods (long-form data, i.e., 9786 rows), so I expected to see 4893 observations in my results. However, when I run my Difference-in-Differences (DiD) regression, the output shows 9786 observations.

I am wondering why I have double the number of observations I expected. Is this normal behavior in Stata, or could something in my data or code be causing this?

I am using Stata 15, and I'd appreciate any insights or guidance on how to resolve this issue.

Thank you in advance for your help!

Takudzwa

Last edited by Takudzwa Mutize; 26 Sep 2023, 04:58.
Tags: difference-in-differences, panel data, regression
Andrew Musau

Join Date: Oct 2014

Posts: 10274
#2

26 Sep 2023, 06:36

An observation in Stata is not ambiguous. In a panel dataset, it represents a specific combination of values for all variables associated with a particular unit of analysis and time period. This corresponds to one row in the dataset. It appears that you might be expecting an observation to refer to the number of units in the dataset. This is only accurate when dealing with cross-sectional data. If you are using commands like xtreg or other panel estimators, the total number of units is displayed as "Number of groups" in the output.
Takudzwa Mutize

Join Date: Sep 2023

Posts: 5
#3

26 Sep 2023, 07:02

Thanks for the response. I am using the simple -reg- command with interaction terms, i.e., reg y PostTreatment##Time covariates, and my data are in long form for two time periods, that is, two rows per personal identifier. I hope that makes sense.
Andrew Musau

Join Date: Oct 2014

Posts: 10274
#4

26 Sep 2023, 07:10

regress does not take into account the panel structure of your data. It is not a panel data command in the Stata sense (these are prefixed with -xt-). So it will just report the number of observations. You will have to determine the number of units using other means.
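
As a minimal sketch of such "other means" (the data are simulated, and id, treated, post, y, and firstrow are hypothetical names, not the poster's variables): you can tag one row per unit and count the tagged rows, or cluster the standard errors on the unit identifier, in which case regress stores the number of clusters in e(N_clust).

```stata
* Simulated two-period panel; all variable names here are hypothetical.
clear
set seed 2023
set obs 200
generate id = _n
generate byte treated = id > 100
expand 2
bysort id: generate byte post = _n - 1
generate y = 1 + 0.5*treated + 0.3*post + 0.8*treated*post + rnormal()

* Count rows, then count units by tagging one row per id.
count
egen byte firstrow = tag(id)
count if firstrow

* Cluster on the unit identifier; regress then stores the number of units.
regress y i.treated##i.post, vce(cluster id)
display "Rows used:  " e(N)
display "Units used: " e(N_clust)
```

Clustering on the unit identifier is a common choice when each unit contributes more than one row, but whether it suits a particular design is a separate question.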
Takudzwa Mutize

Join Date: Sep 2023

Posts: 5
#5

26 Sep 2023, 11:27

Hi Andrew. When I run the regression using -xtreg- with -fe-, the number of groups is now 4,893 and the number of obs is now 9,775. Gender and race are omitted, of course, due to the FE assumptions. Is there a way to do the DiD with long-form data without FE? Or do I have to transform the data to short form first and then use the simple -reg- command? I hope I am clear.

Andrew Musau

Join Date: Oct 2014
Posts: 10274
#6

26 Sep 2023, 12:47

If you have Stata 17+, use didregress, which displays the number of groups.

Code:

help didregress

Code:

webuse hospdd, clear
didregress (satis)(procedure), group(hospital) time(month)

Res.:

Code:

. didregress (satis)(procedure), group(hospital) time(month)

Number of groups and treatment time

Time variable: month
Control:       procedure = 0
Treatment:     procedure = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
    hospital |        28         18
-------------+---------------------
Time         |
     Minimum |         1          4
     Maximum |         1          4
-----------------------------------

Difference-in-differences regression                     Number of obs = 7,368
Data type: Repeated cross-sectional

                               (Std. err. adjusted for 46 clusters in hospital)
-------------------------------------------------------------------------------
              |               Robust
        satis | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
ATET          |
    procedure |
(New vs Old)  |   .8479879   .0321121    26.41   0.000     .7833108     .912665
-------------------------------------------------------------------------------
Note: ATET estimate adjusted for group effects and time effects.

.

As you are interested in the treatment effect in DID, why does it matter that the coefficients on gender and race are omitted? Effects of such time invariant variables are captured by the fixed effects, so you don't need to separately control for them.
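
To illustrate the point about time-invariant variables, here is a small sketch on simulated data (id, treated, female, post, and y are hypothetical names). With the fe option, xtreg should flag treated and female as omitted because they do not vary within id, while the interaction term is still estimated.

```stata
* Simulated two-period panel; all variable names here are hypothetical.
clear
set seed 42
set obs 300
generate id = _n
generate byte treated = id > 150
generate byte female = mod(id, 2)
expand 2
bysort id: generate byte post = _n - 1
generate y = 1 + 0.5*treated + 0.4*female + 0.3*post + 0.8*treated*post + rnormal()

* Declare the panel, then fit the fixed-effects DiD model.
xtset id post
xtreg y i.treated##i.post female, fe vce(cluster id)

* The DiD estimate is the coefficient on the interaction.
display "DiD estimate:     " _b[1.treated#1.post]
display "Number of groups: " e(N_g)
```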

Last edited by Andrew Musau; 26 Sep 2023, 12:55.


Takudzwa Mutize

Join Date: Sep 2023

Posts: 5
#7

26 Sep 2023, 14:27

Hi Andrew. I am using Stata 15. I should also mention that I have two treatment groups. I am wondering whether I should use -xtreg- with fe or just -reg-. A paper I referred to reported results for gender and race (time-invariant) in the DiD: https://www.tandfonline.com/doi/full...8.2016.1171844
My code right now, for the long-form data, is: xtreg log_real_income i.PostTreatment##cohortphase1 w_best_age_yrs agesquared2 i.w_a_gen i.new_maristat i.pop_group i.new_employment w_hhsizer i.workerskill i.cohortphase1#workerskill, fe
Andrew Musau

Join Date: Oct 2014

Posts: 10274
#8

27 Sep 2023, 04:37

It will have to be simple DID as opposed to generalized DID. With Stata 15, use regress and include these time invariant variables.
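
A rough sketch of what that could look like, on simulated data with hypothetical names (id, treated, female, post, y), not the poster's variables: the time-invariant covariate stays in the model, and the DiD estimate is the coefficient on the interaction.

```stata
* Simulated two-period panel in long form; all variable names are hypothetical.
clear
set seed 15
set obs 300
generate id = _n
generate byte treated = id > 150
generate byte female = mod(id, 2)
expand 2
bysort id: generate byte post = _n - 1
generate y = 1 + 0.5*treated + 0.4*female + 0.3*post + 0.8*treated*post + rnormal()

* Simple DiD by pooled OLS; female is kept and gets its own coefficient.
regress y i.treated##i.post female, vce(cluster id)

display "DiD estimate:       " _b[1.treated#1.post]
display "Female coefficient: " _b[female]
```

The data stay in long form here; no reshaping is needed for regress to use both periods.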
Takudzwa Mutize

Join Date: Sep 2023

Posts: 5
#9

27 Sep 2023, 04:47

Hi Andrew. Isn't the regress command only appropriate for analysis at a single time point, since it won't take the panel structure into account?
Andrew Musau

Join Date: Oct 2014

Posts: 10274
#10

27 Sep 2023, 04:55

You need a pre and post period in DID (so a minimum of two time periods). See my illustration in #12 of https://www.statalist.org/forums/for...ummy-variables.
