Problem with csdid (staggered did), getting only 0 coefficients

Lara Lebed

Join Date: Dec 2023

Posts: 4
#1


20 Dec 2023, 08:22

Hi everyone,
I am trying to get csdid working and I keep getting 0 estimates.
My dataset is at the individual level and contains birth records, including month and year of birth. It runs from 1997 to 2001. My outcome is below_avg_bw, a dummy capturing whether birth weight is below 3500 or not. I want to estimate a staggered did with csdid.
I created a time variable, timevar, which goes from 1 to 60: it is 1 for the first month in my dataset (January 1997), 2 for the second month (February 1997), etc. My data is a repeated cross-section, and I want to see the impact of a policy that potentially affected the birth weight of children born from June to October 1999. According to my definition of timevar, these are months 30 to 34 in my dataset. So I set treat_month to 30 for every June, 31 for every July, etc., up to 34 for every October. All the other months in treat_month are set to 0.
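For readers following along, here is a minimal sketch of the coding described above, run on a toy calendar of 60 months rather than the real birth records. The variable names birth_year and birth_month are hypothetical (the thread does not say what the raw date variables are called), and the sketch only reproduces the setup in this post; it is not a recommended gvar for csdid.

```stata
* Sketch only: rebuilds the index described in post #1 on a toy calendar.
* birth_year and birth_month are hypothetical names, not from the thread.
clear
set obs 60
generate int  birth_year  = 1997 + floor((_n - 1)/12)
generate byte birth_month = mod(_n - 1, 12) + 1

* Month counter: 1 = January 1997, ..., 60 = December 2001
generate byte timevar = (birth_year - 1997)*12 + birth_month

* gvar as described in the post: 30-34 for every June-October, 0 otherwise
generate byte treat_month = cond(inrange(birth_month, 6, 10), 24 + birth_month, 0)

* June 1999 should be period 30
assert timevar == 30 if birth_year == 1999 & birth_month == 6
tabulate timevar treat_month
```

The final cross-tab makes the pattern visible: each nonzero treat_month value shows up once per calendar year, not only in 1999.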
When I run

csdid below_avg_bw, time(timevar) gvar(treat_month)

I get all 0 coefficients, see below just an extract, same for g32-34.

Outcome model : regression adjustment
Treatment model: none
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
g30          |
       t_1_2 |          0  (omitted)
       t_2_3 |          0  (omitted)
       t_3_4 |          0  (omitted)
       t_4_5 |          0  (omitted)
       t_5_6 |          0  (omitted)
       t_6_7 |          0  (omitted)
       t_7_8 |          0  (omitted)
       t_8_9 |          0  (omitted)
      t_9_10 |          0  (omitted)
     t_10_11 |          0  (omitted)
     t_11_12 |          0  (omitted)
     t_12_13 |          0  (omitted)
     t_13_14 |          0  (omitted)
     t_14_15 |          0  (omitted)
     t_15_16 |          0  (omitted)
     t_16_17 |          0  (omitted)
     t_17_18 |          0  (omitted)
     t_18_19 |          0  (omitted)
     t_19_20 |          0  (omitted)
     t_20_21 |          0  (omitted)

This is what my data looks like:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte below_avg_bw float(timevar treat_month)
1 51  0
1 35  0
0 46 34
1 56 32
1 18 30
0 45 33
1  8 32
0 15  0
1 23  0
0 55 31
1 57 33
1 59  0
1 29  0
1  9 33
0  7 31
0 50  0
1 55 31
0 43 31
1 44 32
0 17  0
1  4  0
1  9 33
1 24  0
1 38  0
0 38  0
0  9 33
1 34 34
1 23  0
1 39  0
1 42 30
1 26  0
1 49  0
0 42 30
1 39  0
1 54 30
1 20 32
0 60  0
1 33 33
0 19 31
1 30 30
1 19 31
1 19 31
1 27  0
1 36  0
0 13  0
1  6 30
0 18 30
0 40  0
1 20 32
0 21 33
1 40  0
1 51  0
1  8 32
1 23  0
1 55 31
1  1  0
1 14  0
0 11  0
1 49  0
0 49  0
1 46 34
1 47  0
1 48  0
0  6 30
0 49  0
0 59  0
0 32 32
1 33 33
1 31 31
1 55 31
1 59  0
1 18 30
0  2  0
1 32 32
1  3  0
1 53  0
1 13  0
0 58 34
0  9 33
1 32 32
0 46 34
1 49  0
1 14  0
1 21 33
1 35  0
1  2  0
1 38  0
1  6 30
1 52  0
1 10 34
1 27  0
1 23  0
1 55 31
1 50  0
0 15  0
1 56 32
1 25  0
1 25  0
0 42 30
1  9 33
end

Can someone please help me understand what I am doing wrong? Thank you so much!
Lara
FernandoRios

Join Date: Apr 2014

Posts: 2476
#2

20 Dec 2023, 08:39

Hi Lara
The most likely scenario is that your gvar is not correctly defined.
If you cross-tabulate your time variable and gvar (in your case, timevar and treat_month), it will be easy to see whether you are having that kind of problem here.
F
Lara Lebed

Join Date: Dec 2023

Posts: 4
#3

20 Dec 2023, 12:58

Hi Fernando, thanks for your quick response.

This is the tab of my gvar:
. tab treat_month

treat_month |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     97,295       58.29       58.29
         30 |     13,811        8.27       66.56
         31 |     13,957        8.36       74.92
         32 |     13,889        8.32       83.25
         33 |     14,011        8.39       91.64
         34 |     13,955        8.36      100.00
------------+-----------------------------------
      Total |    166,918      100.00

It seems correct to me, but obviously there is something wrong. Do you see anything?

Also, I noticed that when I run csdid:
csdid below_avg_bw, time(timevar) gvar(treat_month)
I somehow lose all observations. This is a part of the output that I did not copy into my previous post:

Difference-in-difference with Multiple Time Periods

Number of obs = 0
Outcome model : regression adjustment
Treatment model: none
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
g30          |
       t_1_2 |          0  (omitted)
       t_2_3 |          0  (omitted)
       t_3_4 |          0  (omitted)
...

But when I regress:
reg below_avg_bw timevar treat_month
everything is ok, I have all 166,918 observations in the regression. Maybe this is the hint, but I still can't figure out what it is.

Thanks again.
Lara

Last edited by Lara Lebed; 20 Dec 2023, 13:28.
FernandoRios

Join Date: Apr 2014

Posts: 2476
#4

20 Dec 2023, 18:06

Hi
Please do the cross-tab of time and gvar.
I need to see both:

tab timevar treat_month
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2183
#5

20 Dec 2023, 19:37

I see no reason to use Callaway-Sant'Anna here. There are no controls, and the staggering is incidental in the sense that the outcome is measured only once. Unlike something like a job training program, where it makes sense to follow individuals across time (even if you can't), a below-average-weight birth can happen only once. With the treatment happening in 5 adjacent months, I don't even see a concern about time-varying treatment effects (TE).

One thing does puzzle me about the data structure. I get that the first two units are controls with below-average birth weight, and you observe their births in months 51 and 35, respectively. That makes sense. But then the next two are, evidently, part of the treatment group, in months 34 and 32, respectively. These months don't match timevar, which is 46 and 56. I don't see how this can be. Why isn't timevar 34 and 32, respectively? Is timevar when the information was obtained as opposed to the birth month? The difference between 56 (timevar) and 32 (treat_month) is 24 months, which means that 32 can't be the "treatment" month and 56 the actual birth month.

It seems to me that you should simply have a treat variable, set zero for the control group, one for the treated group. I can't see that time plays any particular role outside of determining the treatment group. Then just do a simple regression of below_avg_bw on treat (including a constant), with vce(robust).
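A minimal sketch of the regression suggested in the paragraph above. It assumes the -dataex- example from post #1 (or the full dataset) is in memory, and that the treated group is births with timevar 30 to 34 (June to October 1999), with every other month used as the control group. The control-group choice is an assumption here; the thread does not settle it.

```stata
* Assumes the -dataex- example from post #1 (or the full data) is in memory.
* Assumption: treated = born June-October 1999, i.e. timevar 30-34;
* all other months serve as controls.
generate byte treat = inrange(timevar, 30, 34)

* Difference in means with heteroskedasticity-robust standard errors
regress below_avg_bw treat, vce(robust)
```

The coefficient on treat is the difference in the share of below-average-weight births between the two groups.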

If you get the time index sorted out, you could add i.timevar, but I suspect that, when properly defined, it is perfectly collinear with treat, unless there were mothers in the treatment period who were not treated. I can't tell from the data.

Could you confirm the data structure?
Lara Lebed

Join Date: Dec 2023

Posts: 4
#6

21 Dec 2023, 02:59

Fernando, thanks for your help. Please see the tab output at the end of this post.

@Jeff, thank you for your response and suggestions. I agree with you that Callaway-Sant'Anna might not be the first option, or even necessary. We do have controls, though (female parents_married years_educ_mother employed_mother age_mother years_educ_father employed_father age_father); I wanted to get the command working first and add them later. We have something similar to a diff-in-diff as the main estimation, but reviewers asked us to look at the lags, leads and heterogeneity using Callaway-Sant'Anna.

Just to give you some quick context: we are looking at the effect of the bombing of Serbia on infant weight, and we consider children born in June to October 1999 to be treated. In our main regression we have something similar to a diff-in-diff (but with no spatial variation): we compare children born in June to October 1999 (treated) to children born in January to March 1999, and to the same two periods in the year before (June to October 1998 and January to March 1998).
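A rough sketch of a comparison of this kind, assuming timevar (1 = January 1997, ..., 60 = December 2001) is in memory. The names yr, mo, summer and y1999 are made up for illustration, and the exact specification of the main regression is not given in the thread, so this is only one way such a model could be written.

```stata
* Assumes timevar (1 = January 1997, ..., 60 = December 2001) is in memory.
* yr, mo, summer and y1999 are illustrative names, not from the thread.
generate int  yr = 1997 + floor((timevar - 1)/12)
generate byte mo = mod(timevar - 1, 12) + 1

* June-October births vs January-March births, 1999 vs 1998
generate byte summer = inrange(mo, 6, 10)
generate byte y1999  = (yr == 1999)

* The coefficient on 1.summer#1.y1999 is the diff-in-diff estimate
regress below_avg_bw i.summer##i.y1999 ///
    if inlist(yr, 1998, 1999) & (inrange(mo, 1, 3) | summer), vce(robust)
```

The `///` line continuation works in a do-file; when typing interactively, put the command on one line.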

About the data structure: this is how I understood I should set it up, but there has to be something wrong with it. We use monthly data for 60 months (1997 to 2001), and each entry is a birth, so timevar goes from 1 to 60. We consider children born in June to October 1999 to be treated; these are months 30 to 34 in our data. I set the gvar to 0 for all months except June to October, in all years. Because we consider children born in June to October of years other than 1999 to be the control group, I set treat_month to 30 for June, 31 for July, ..., 34 for October in all years. The observations where timevar equals 30 and treat_month equals 30 are the actually treated ones. This is how I thought I could set up something like a repeated cross-section, with the months June to October as treated. In the case you mention from the data, timevar=56 and treat_month=32: timevar 56 is August 2001, and I set all Augusts to 32, but 2001 is not the year (1999) in which children born in August are treated.

I hope you can help me now with the information I have provided.

. tab timevar treat_month

treat_month
timevar 0 30 31 32 33 34 Total

1 2,704 0 0 0 0 0 2,704
2 2,867 0 0 0 0 0 2,867
3 2,843 0 0 0 0 0 2,843
4 2,793 0 0 0 0 0 2,793
5 2,796 0 0 0 0 0 2,796
6 0 2,777 0 0 0 0 2,777
7 0 0 2,751 0 0 0 2,751
8 0 0 0 2,714 0 0 2,714
9 0 0 0 0 2,800 0 2,800
10 0 0 0 0 0 2,717 2,717
11 2,755 0 0 0 0 0 2,755
12 2,710 0 0 0 0 0 2,710
13 2,768 0 0 0 0 0 2,768
14 2,735 0 0 0 0 0 2,735
15 2,826 0 0 0 0 0 2,826
16 2,740 0 0 0 0 0 2,740
17 2,761 0 0 0 0 0 2,761
18 0 2,731 0 0 0 0 2,731
19 0 0 2,778 0 0 0 2,778
20 0 0 0 2,765 0 0 2,765
21 0 0 0 0 2,822 0 2,822
22 0 0 0 0 0 2,842 2,842
23 2,777 0 0 0 0 0 2,777
24 2,769 0 0 0 0 0 2,769
25 2,732 0 0 0 0 0 2,732
26 2,815 0 0 0 0 0 2,815
27 2,844 0 0 0 0 0 2,844
28 2,738 0 0 0 0 0 2,738
29 2,762 0 0 0 0 0 2,762
30 0 2,721 0 0 0 0 2,721
31 0 0 2,875 0 0 0 2,875
32 0 0 0 2,796 0 0 2,796
33 0 0 0 0 2,789 0 2,789
34 0 0 0 0 0 2,794 2,794
35 2,831 0 0 0 0 0 2,831
36 2,771 0 0 0 0 0 2,771
37 2,783 0 0 0 0 0 2,783
38 2,815 0 0 0 0 0 2,815
39 2,904 0 0 0 0 0 2,904
40 2,802 0 0 0 0 0 2,802
41 2,738 0 0 0 0 0 2,738
42 0 2,784 0 0 0 0 2,784
43 0 0 2,733 0 0 0 2,733
44 0 0 0 2,821 0 0 2,821
45 0 0 0 0 2,772 0 2,772
46 0 0 0 0 0 2,798 2,798
47 2,805 0 0 0 0 0 2,805
48 2,766 0 0 0 0 0 2,766
49 2,758 0 0 0 0 0 2,758
50 2,741 0 0 0 0 0 2,741
51 2,813 0 0 0 0 0 2,813
52 2,710 0 0 0 0 0 2,710
53 2,821 0 0 0 0 0 2,821
54 0 2,798 0 0 0 0 2,798
55 0 0 2,820 0 0 0 2,820
56 0 0 0 2,793 0 0 2,793
57 0 0 0 0 2,828 0 2,828
58 0 0 0 0 0 2,804 2,804
59 2,721 0 0 0 0 0 2,721
60 2,781 0 0 0 0 0 2,781

Total 97,295 13,811 13,957 13,889 14,011 13,955 166,918
FernandoRios

Join Date: Apr 2014

Posts: 2476
#7

21 Dec 2023, 05:24

Hi Lara

Thanks for the additional information. Regarding the csdid application, the problem is related to how gvar is created. Specifically, the way you have it set up, you cannot see a "treated unit" before treatment happens.
If you open the help file and the example dataset, you will see how the data should look.
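A small diagnostic sketch that illustrates the point above, assuming the data from post #1 are in memory. It only uses tabulate and count; the reading of the result is a hedged interpretation, not a statement about csdid's internals.

```stata
* Assumes the data from post #1 are in memory.
* Periods in which the "g30" cohort is actually observed
tabulate timevar if treat_month == 30

* Observations from that cohort in the period just before its "treatment"
count if treat_month == 30 & timevar == 29
```

Judging by the cross-tab in post #6, the cohort coded 30 appears only at timevar 6, 18, 30, 42 and 54, and the count is 0. No cohort is ever observed in two adjacent periods (the pairs that labels such as t_1_2 refer to) or in the period right before its treatment date, which is consistent with the "Number of obs = 0" output.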

Now regarding your data itself. I have a few comments and questions
1. Is the data panel or repeated cross-section?
2. Because "treatment" was applied to everyone, I don't think you have a setup for DID. I think that, at best, you can make a comparison across both groups (born before (not treated) and born after (treated)), looking at "weight" for different age groups.
3. Perhaps another alternative would be to use age as the time variable, and the age they would have been when the bombing happened as the gvar (time of treatment). This, however, will not allow you to estimate the impact on weight among those born after the bombing, only among those born before it.

Perhaps Jeff Wooldridge had other insights on your specific case.
Best wishes
Fernando
Lara Lebed

Join Date: Dec 2023

Posts: 4
#8

21 Dec 2023, 08:08

Thanks so much for your response, Fernando. Much appreciated. I understand that my treated units are observed only once. I would like to construct a pseudo panel. Maybe you can give me a hint how to do this.

My data is cross-sectional: I observe each individual at birth, and I observe his/her weight. My treated individuals are born between June and October 1999. What I would like to do is use individuals born between June and October in 1998 (and in earlier or later years, but the same months) as the treated group before treatment, and consider their outcomes as the counterfactual for my treated group. The other months (January to May, November and December) would be never treated. In a way, I would need to prepare the data as a repeated cross-section, but I am not sure how to do this. I thought that by setting gvar to 30, 31, 32, 33, 34 (the actual treated months) for all births from June to October in all years, I would achieve this, but this is not the case.

As to your questions.
1. The data is cross-sectional, but I would like to use it as a repeated cross-section.
2. Yes, treatment is universal for the months June to October. I don't have age groups; I only have weight at birth, so at one point in time.
3. I can't use age as the time variable, because there is no age in my dataset.

If you have any idea how to structure my dataset so that I can use it as a repeated cross-section, please let me know. I am right now looking at the repeated cross-section example and trying to figure out if I can restructure the data in a similar way.

Thanks again and best wishes,
Lara

Last edited by Lara Lebed; 21 Dec 2023, 08:18.
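One possible way to restructure the data along the lines asked about in post #8, offered as a sketch only. Nobody in this thread confirms it as a solution, and the objections raised in posts #5 and #7 about whether this is a DID design at all still apply. It assumes timevar is in memory, uses calendar year as the time variable, and treats all June to October births as a single cohort first treated in 1999. The names byear, bmonth and first_treat are hypothetical.

```stata
* Sketch only, not a solution confirmed in this thread.
* Requires the community-contributed packages:
*   ssc install drdid
*   ssc install csdid
generate int  byear  = 1997 + floor((timevar - 1)/12)
generate byte bmonth = mod(timevar - 1, 12) + 1

* Hypothetical gvar: June-October births form one cohort first treated
* in 1999; all other calendar months are coded 0 (never treated).
generate int first_treat = cond(inrange(bmonth, 6, 10), 1999, 0)

* Each cohort should now appear in every year, before and after 1999
tabulate byear first_treat

* No ivar(): csdid then uses its repeated cross-section estimators
csdid below_avg_bw, time(byear) gvar(first_treat)
```

Note that this coding treats June to October births in 2000 and 2001 as post-treatment observations of the treated cohort, which may not match the intended design, and it replaces the five monthly cohorts with one, so it cannot speak to heterogeneity across birth months.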