Problem with csdid

chris maizano

Join Date: Feb 2026

Posts: 3
#1

18 Feb 2026, 02:24

Hi, I'm trying to do a staggered DID whose treatment is at a group level (CCC_No). For context, some CCC_No have other subgroups but most do not (I'm mentioning it here because I don't know if this is one of the reasons). My unit of observation is the household, and the data come from a repeated cross-sectional survey. I built the unbalanced panel dataset by grouping respondents by city. A CCC_No usually covers just one city but sometimes more than one.

I'm trying to run this command for the staggered DID but I am getting this error:

. csdid reliability ///
> educ8 H05_AGE HSIZE Min_Charge, ///
> ivar(CCC_No) ///
> time(WaveYear) ///
> gvar(cohort_strict) ///
> method(dripw) ///
> vce(cluster CCC_No)
repeated time values within panel
r(451);

end of do-file

r(451);

I am also including a tabulation of the number of observations per CCC_No in each WaveYear. As you can see, some CCC_No have no respondents in some of the survey waves, and I am not sure whether this could be the culprit:

WaveYear
CCC_No 2019 2020 2022 2024 Total

1 406 404 394 399 1,603
2 380 365 390 406 1,541
4 46 37 14 15 112
5 402 425 412 402 1,641
6 413 408 459 422 1,702
7 372 344 389 386 1,491
11 898 1,036 1,208 1,216 4,358
12 29 53 61 23 166
13 0 35 52 0 87
17 214 250 260 273 997
18 289 332 395 372 1,388
19 324 398 371 391 1,484
21 0 24 35 0 59
23 23 87 35 73 218
24 0 0 27 40 67
25 67 75 39 82 263
26 335 359 383 386 1,463
27 21 4 0 24 49
29 28 0 65 84 177
30 222 228 250 308 1,008
32 58 55 121 50 284
33 42 35 64 77 218
35 91 80 120 101 392
36 0 26 41 0 67
39 62 0 43 50 155
41 48 35 38 27 148
47 53 34 45 31 163
49 33 37 0 14 84
50 68 53 26 53 200
51 33 24 32 50 139
53 0 43 0 40 83
61 54 14 80 88 236
65 44 61 71 57 233
69 28 28 62 27 145
70 16 11 19 33 79
71 344 372 419 411 1,546
83 29 63 93 48 233
90 46 19 0 11 76
92 62 0 23 22 107
102 35 0 18 24 77
105 13 18 22 14 67
106 0 24 46 13 83
107 85 31 88 79 283
116 45 56 82 23 206
117 0 15 0 22 37
123 65 67 93 121 346
124 39 63 44 26 172
129 28 46 0 26 100
141 50 10 81 34 175
147 100 87 144 78 409
149 50 23 33 33 139
156 62 122 43 86 313
158 53 20 40 14 127
163 10 15 32 29 86
165 0 16 15 18 49
173 29 13 75 97 214
175 134 99 162 83 478
179 28 37 31 46 142
183 7 88 0 14 109
184 62 10 49 18 139
221 23 50 29 33 135
243 33 54 39 59 185
247 58 44 48 51 201
250 21 106 13 39 179
252 13 20 0 28 61
284 74 104 86 114 378
288 13 0 35 0 48
291 102 58 95 40 295
297 15 0 44 19 78
317 0 19 11 18 48
322 317 386 396 419 1,518
324 107 101 114 137 459
328 19 27 18 0 64
330 137 139 161 150 587
333 0 29 26 0 55
343 17 23 0 0 40
370 343 381 379 375 1,478
386 14 15 15 22 66
407 0 22 0 32 54
530 0 22 23 16 61
533 15 32 16 0 63
564 22 16 14 14 66
571 13 61 0 84 158
574 9 51 0 8 68
577 8 0 4 20 32
596 15 20 64 15 114
694 23 42 47 57 169

Total 7,886 8,536 9,311 9,140 34,873

Your inputs are very much appreciated. Thank you.

Last edited by chris maizano; 18 Feb 2026, 02:54.
Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#2

18 Feb 2026, 09:13

repeated time values within panel
r(451);

While sometimes error messages in Stata are obscure or misleading (and, perhaps, more so in user-written commands), usually the best way to begin investigating is to take the message seriously. The message you are getting is telling you that there is a problem with your data whereby some panels (presumably CCC_No's) have multiple observations with the same value of WaveYear.

You can verify that this is the source of the problem by running:

Code:

isid CCC_No WaveYear

If, as I suspect, this is your problem, then this command, too, will generate an error message. (If this is not the case, then this command produces no output at all, and then the situation will be more complicated.)

Assuming you get an error message from that command, then you have to figure out why your data are unsuitable. Start with

Code:

duplicates tag CCC_No WaveYear, gen(d_flag)
browse if d_flag

so you can see the offending CCC_No's. It may be that you have surplus copies of some of the observations in your data. In that case, you have to trace back over the data management that created the data and find out where those surplus copies crept in. While you are fixing that, there is a fair chance you will find other errors, so take the opportunity to fix those as well.
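
One way to tell surplus copies apart from legitimately distinct records is to compare duplicates on the two key variables with duplicates on every variable. The following is a minimal sketch, assuming the dataset from #1 is in memory; dup_all is a hypothetical variable name introduced only for illustration.

```stata
* How many observations share each CCC_No-WaveYear pair?
duplicates report CCC_No WaveYear

* With no varlist, -duplicates- compares all variables, so only exact
* copies of an entire observation are counted here.
duplicates report

* Flag the exact copies for inspection (dup_all is a hypothetical name).
duplicates tag, generate(dup_all)
browse if dup_all > 0
```

If the second report shows no surplus copies, the repeated CCC_No-WaveYear pairs come from the structure of the data rather than from accidental duplication.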

Or it may be that you are improperly specifying the panel structure of your data. For example, you mentioned that some CCC_Nos cover multiple cities, but most only cover 1. If you have an observation for each city in each WaveYear, then there is no CCC_No WaveYear panel structure: it is city WaveYear panel data. In that case you either have to reduce the multi-city CCC_Nos to single observations, calculating suitable CCC_No level aggregate values for all the variables from their city-level values, or you need to reconceptualize your DID approach, and change the -csdid- command's -ivar()- option accordingly, with the city, rather than the CCC_No as the unit of analysis.
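
The first option, reducing the data to one observation per CCC_No and WaveYear, might be sketched as follows. This is only an illustration: taking means of every variable is an assumption, not a recommendation (a mean may not be a sensible aggregate for a categorical variable such as educ8), and the sketch also assumes cohort_strict is constant within each CCC_No.

```stata
* Sketch only: aggregate to one row per CCC_No and WaveYear.
* -preserve- keeps the original data so it can be restored afterwards.
preserve

* Means are an assumed choice of aggregate; cohort_strict is assumed
* constant within CCC_No, so its first value is kept.
collapse (mean) reliability educ8 H05_AGE HSIZE Min_Charge ///
    (first) cohort_strict, by(CCC_No WaveYear)

* Each CCC_No-WaveYear pair should now identify exactly one row;
* -isid- exits with an error if it does not.
isid CCC_No WaveYear

csdid reliability educ8 H05_AGE HSIZE Min_Charge, ///
    ivar(CCC_No) time(WaveYear) gvar(cohort_strict) method(dripw)

restore
```

Note that the collapsed panel is still unbalanced, because some CCC_No have no observations in some waves.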
chris maizano

Join Date: Feb 2026

Posts: 3
#3

19 Feb 2026, 02:17

Thank you very much for your response.

CCC_No would appear multiple times for each WaveYear since my unit of observation is the household. The survey dataset I'm using is a household survey. I built my cross-sectional panel from this by identifying which CCC_No handles each city or cities. Would it still be possible to do a staggered DID via csdid given this setup (i.e., treatment is at the CCC_No level but the unit of observation is the household)?

Thank you very much in advance for any inputs!
Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#4

19 Feb 2026, 08:57

Would it still be possible to do a staggered DID via csdid given this setup (i.e., treatment is at the CCC_No level but the unit of observation is the household)?

I think the solution here is to change the code slightly:

Code:

csdid reliability ///
educ8 H05_AGE HSIZE Min_Charge, ///
ivar(household_id) ///
time(WaveYear) ///
gvar(cohort_strict) ///
method(dripw) ///
vce(cluster CCC_No)

replacing household_id by the actual name of the variable which uniquely identifies households.
chris maizano

Join Date: Feb 2026

Posts: 3
#5

19 Feb 2026, 23:56

Originally posted by Clyde Schechter

I think the solution here is to change the code slightly:

Code:

csdid reliability ///
educ8 H05_AGE HSIZE Min_Charge, ///
ivar(household_id) ///
time(WaveYear) ///
gvar(cohort_strict) ///
method(dripw) ///
vce(cluster CCC_No)

replacing household_id by the actual name of the variable which uniquely identifies households.

Thank you very much for this. I have a question though: wouldn't this mean that the panel unit here is the household? My dataset is a repeated cross-section, so the households surveyed are not the same across the survey waves. Even so, because the city of each respondent is available, I was able to tag which CCC_No serves a particular household based on the city they live in, which in my head would be a kind of panel of CCC_No. While some cities do not have respondents in all four waves of the dataset, I was hoping that I could still do a staggered DID with households as my unit of observation. I could collapse the data by CCC_No and do the staggered DID from there, but I really wanted to preserve the original identification strategy.

I am hoping this makes sense and would appreciate further inputs from everyone. Thank you so much.
Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#6

20 Feb 2026, 10:33

...wouldn't this mean that the panel unit here is households? My dataset is a repeated cross-section...

You are correct that the code offered in #4 was predicated on the (mis)understanding that your data is panel data at the household level.

Since it is now clear to me that you do not have panel data at all, the solution is simply to omit the -ivar()- option. As the -csdid- help file makes clear, -ivar()- specifies the panel identifier when you have panel data, and analysis of repeated cross-section data is requested by omitting -ivar()-.
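
As a minimal sketch, the command from #1 with -ivar()- dropped would look like the following. The clustering option is copied unchanged from #1; it is worth checking the -csdid- help file for the exact spelling it expects. The -estat- lines are optional post-estimation summaries and are not part of the original post.

```stata
* Sketch only: the specification from #1 without ivar(), so that -csdid-
* treats the data as repeated cross-sections.
* The clustering option below is kept exactly as written in #1.
csdid reliability educ8 H05_AGE HSIZE Min_Charge, ///
    time(WaveYear) ///
    gvar(cohort_strict) ///
    method(dripw) ///
    vce(cluster CCC_No)

* Optional: aggregate the group-time effects after estimation.
estat simple
estat event
```

Households remain the unit of observation here, while treatment timing enters only through cohort_strict, so no collapsing to the CCC_No level is needed.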