  • Problem with csdid

    Hi, I'm trying to run a staggered DID in which treatment is assigned at a group level (CCC_No). For context, some CCC_No have subgroups, but most do not (I mention this because I don't know whether it is one of the causes). My unit of observation is the household, and the data come from a repeated cross-sectional survey. I built the unbalanced panel dataset by grouping respondents by city. A CCC_No usually covers just one city but sometimes covers more than one.

    I'm trying to run this command for the staggered DID, but I am getting this error:

    . csdid reliability ///
    > educ8 H05_AGE HSIZE Min_Charge, ///
    > ivar(CCC_No) ///
    > time(WaveYear) ///
    > gvar(cohort_strict) ///
    > method(dripw) ///
    > vce(cluster CCC_No)
    repeated time values within panel
    r(451);

    end of do-file

    r(451);

    I am also including a tabulation of the number of observations per CCC_No and WaveYear. As you can see, some CCC_No have no respondents in some of the survey waves, and I am not sure whether this could also be the culprit:

           |          WaveYear           |
    CCC_No |   2019   2020   2022   2024 |  Total
    -------+-----------------------------+-------
         1 |    406    404    394    399 |  1,603
         2 |    380    365    390    406 |  1,541
         4 |     46     37     14     15 |    112
         5 |    402    425    412    402 |  1,641
         6 |    413    408    459    422 |  1,702
         7 |    372    344    389    386 |  1,491
        11 |    898  1,036  1,208  1,216 |  4,358
        12 |     29     53     61     23 |    166
        13 |      0     35     52      0 |     87
        17 |    214    250    260    273 |    997
        18 |    289    332    395    372 |  1,388
        19 |    324    398    371    391 |  1,484
        21 |      0     24     35      0 |     59
        23 |     23     87     35     73 |    218
        24 |      0      0     27     40 |     67
        25 |     67     75     39     82 |    263
        26 |    335    359    383    386 |  1,463
        27 |     21      4      0     24 |     49
        29 |     28      0     65     84 |    177
        30 |    222    228    250    308 |  1,008
        32 |     58     55    121     50 |    284
        33 |     42     35     64     77 |    218
        35 |     91     80    120    101 |    392
        36 |      0     26     41      0 |     67
        39 |     62      0     43     50 |    155
        41 |     48     35     38     27 |    148
        47 |     53     34     45     31 |    163
        49 |     33     37      0     14 |     84
        50 |     68     53     26     53 |    200
        51 |     33     24     32     50 |    139
        53 |      0     43      0     40 |     83
        61 |     54     14     80     88 |    236
        65 |     44     61     71     57 |    233
        69 |     28     28     62     27 |    145
        70 |     16     11     19     33 |     79
        71 |    344    372    419    411 |  1,546
        83 |     29     63     93     48 |    233
        90 |     46     19      0     11 |     76
        92 |     62      0     23     22 |    107
       102 |     35      0     18     24 |     77
       105 |     13     18     22     14 |     67
       106 |      0     24     46     13 |     83
       107 |     85     31     88     79 |    283
       116 |     45     56     82     23 |    206
       117 |      0     15      0     22 |     37
       123 |     65     67     93    121 |    346
       124 |     39     63     44     26 |    172
       129 |     28     46      0     26 |    100
       141 |     50     10     81     34 |    175
       147 |    100     87    144     78 |    409
       149 |     50     23     33     33 |    139
       156 |     62    122     43     86 |    313
       158 |     53     20     40     14 |    127
       163 |     10     15     32     29 |     86
       165 |      0     16     15     18 |     49
       173 |     29     13     75     97 |    214
       175 |    134     99    162     83 |    478
       179 |     28     37     31     46 |    142
       183 |      7     88      0     14 |    109
       184 |     62     10     49     18 |    139
       221 |     23     50     29     33 |    135
       243 |     33     54     39     59 |    185
       247 |     58     44     48     51 |    201
       250 |     21    106     13     39 |    179
       252 |     13     20      0     28 |     61
       284 |     74    104     86    114 |    378
       288 |     13      0     35      0 |     48
       291 |    102     58     95     40 |    295
       297 |     15      0     44     19 |     78
       317 |      0     19     11     18 |     48
       322 |    317    386    396    419 |  1,518
       324 |    107    101    114    137 |    459
       328 |     19     27     18      0 |     64
       330 |    137    139    161    150 |    587
       333 |      0     29     26      0 |     55
       343 |     17     23      0      0 |     40
       370 |    343    381    379    375 |  1,478
       386 |     14     15     15     22 |     66
       407 |      0     22      0     32 |     54
       530 |      0     22     23     16 |     61
       533 |     15     32     16      0 |     63
       564 |     22     16     14     14 |     66
       571 |     13     61      0     84 |    158
       574 |      9     51      0      8 |     68
       577 |      8      0      4     20 |     32
       596 |     15     20     64     15 |    114
       694 |     23     42     47     57 |    169
    -------+-----------------------------+-------
     Total |  7,886  8,536  9,311  9,140 | 34,873

    Your inputs are very much appreciated. Thank you.
    Last edited by chris maizano; 18 Feb 2026, 02:54.

  • #2
    repeated time values within panel
    r(451);
    While sometimes error messages in Stata are obscure or misleading (and, perhaps, more so in user-written commands), usually the best way to begin investigating is to take the message seriously. The message you are getting is telling you that there is a problem with your data whereby some panels (presumably CCC_No's) have multiple observations with the same value of WaveYear.

    You can verify that this is the source of the problem by running:
    Code:
    isid CCC_No WaveYear
    If, as I suspect, this is your problem, then this command, too, will generate an error message. (If this is not the case, then this command produces no output at all, and then the situation will be more complicated.)

    Assuming you get an error message from that command, then you have to figure out why your data are unsuitable. Start with
    Code:
    duplicates tag CCC_No WaveYear, gen(d_flag)
    browse if d_flag
    so you can see the offending CCC_No's. It may be that you have surplus copies of some of the observations in your data. In that case, you have to trace back over the data management that created the data and find out where those surplus copies crept in. While you are fixing that, there is a fair chance you will find other errors, so take the opportunity to fix those as well.
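
    As a rough sketch of how you might tell exact surplus copies apart from distinct observations that merely share a CCC_No and WaveYear (it uses only the variable names from your post; n_in_cell is a hypothetical name introduced here for illustration):
    Code:
    * Report observations that are identical on every variable (true surplus copies)
    duplicates report
    * Count how many observations fall in each CCC_No-WaveYear cell
    bysort CCC_No WaveYear: gen long n_in_cell = _N
    summarize n_in_cell
    If -duplicates report- shows no surplus copies but n_in_cell is well above 1, the repeated time values come from distinct observations sharing a cell rather than from accidental copies.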

    Or it may be that you are improperly specifying the panel structure of your data. For example, you mentioned that some CCC_Nos cover multiple cities, but most only cover 1. If you have an observation for each city in each WaveYear, then there is no CCC_No WaveYear panel structure: it is city WaveYear panel data. In that case you either have to reduce the multi-city CCC_Nos to single observations, calculating suitable CCC_No level aggregate values for all the variables from their city-level values, or you need to reconceptualize your DID approach, and change the -csdid- command's -ivar()- option accordingly, with the city, rather than the CCC_No as the unit of analysis.
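
    If you go the aggregation route, a minimal sketch might look like the following. It assumes that cohort_strict is constant within each CCC_No and that simple unweighted means are acceptable CCC_No-level aggregates; both are assumptions you would need to check against your own design (survey weights, for example, would change this). Run it on a copy of the data, because -collapse- replaces the dataset in memory.
    Code:
    * Confirm that the treatment cohort does not vary within a CCC_No
    bysort CCC_No (cohort_strict): assert cohort_strict[1] == cohort_strict[_N]
    * Reduce the data to one observation per CCC_No per WaveYear
    collapse (mean) reliability educ8 H05_AGE HSIZE Min_Charge ///
        (first) cohort_strict, by(CCC_No WaveYear)
    * This should now run silently
    isid CCC_No WaveYear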


    • #3
      Thank you very much for your response.

      CCC_No appears multiple times for each WaveYear because my unit of observation is the household. The dataset I'm using is a household survey. I built my cross-sectional panel from it by identifying which CCC_No serves each city or cities. Would it still be possible to do staggered DID via csdid given this setup (i.e. treatment is at the CCC_No level but the unit of observation is the household)?

      Thank you very much in advance for any inputs!


      • #4
        Would it still be possible to do staggered DID via csdid given this setup (i.e. treatment is CCC_No level but unit of observation is household)?
        I think the solution here is to change the code slightly:
        Code:
        csdid reliability ///
             educ8 H05_AGE HSIZE Min_Charge, ///
             ivar(household_id) ///
             time(WaveYear) ///
             gvar(cohort_strict) ///
             method(dripw) ///
             vce(cluster CCC_No)
        replacing household_id by the actual name of the variable which uniquely identifies households.


        • #5
          Originally posted by Clyde Schechter
          I think the solution here is to change the code slightly:
          Code:
          csdid reliability ///
          educ8 H05_AGE HSIZE Min_Charge, ///
          ivar(household_id) ///
          time(WaveYear) ///
          gvar(cohort_strict) ///
          method(dripw) ///
          vce(cluster CCC_No)
          replacing household_id by the actual name of the variable which uniquely identifies households.
          Thank you very much for this. I have a question, though: wouldn't this mean that the panel unit here is the household? My dataset is a repeated cross-section, so the households surveyed are not the same across the survey waves. Because the city of each respondent is available, I was able to tag which CCC_No serves a particular household based on the city it lives in, which in my head would be a kind of panel of CCC_No. Although some cities do not have respondents in all 4 waves of my dataset, I was hoping I could still do staggered DID with households as my unit of observation. I could collapse the data by CCC_No and do staggered DID from there, but I would really like to preserve the original identification strategy.

          I am hoping this makes sense and would appreciate further inputs from everyone. Thank you so much.


          • #6
            ...wouldn't this mean that the panel unit here is households? My dataset is a repeated cross-section...
            You are correct that the code offered in #4 was predicated on the (mis)understanding that your data is panel data at the household level.

            Since it is now clear to me that you do not have panel data, the solution is simply to omit the -ivar()- option. As the -csdid- help file makes clear, -ivar()- specifies the panel identifier when you have panel data, and analysis of repeated cross-section data is requested by omitting -ivar()-.
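
            A sketch of what that might look like with the variables from #1, simply dropping -ivar()- and leaving every other option as you wrote it. I have not run this on your data, and it assumes cohort_strict takes the same value for every household served by a given CCC_No. The clustering option is kept exactly as in #1; check the help file for the form your installed version expects.
            Code:
            * Repeated cross-section: no ivar(), treatment cohort defined by gvar()
            csdid reliability ///
                 educ8 H05_AGE HSIZE Min_Charge, ///
                 time(WaveYear) ///
                 gvar(cohort_strict) ///
                 method(dripw) ///
                 vce(cluster CCC_No)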
