  • Problem with csdid (staggered DiD): getting only 0 coefficients

    Hi everyone,
    I am trying to get csdid working and I keep getting 0 estimates.
    My dataset is at the individual level and contains birth records, including month and year of birth, from 1997 to 2001. My outcome is below_avg_bw, a dummy indicating whether birth weight is below 3,500. I want to estimate a staggered DiD with csdid.
    I created a time variable, timevar, that runs from 1 to 60: it is 1 for the first month in my dataset (January 1997), 2 for the second month (February 1997), and so on. My data is a repeated cross-section, and I want to see the impact of a policy that potentially affected the birth weight of children born from June to October 1999. Under my definition of timevar, these are months 30 to 34 in my dataset. So I set treat_month to 30 for every June, 31 for every July, and so on up to 34 for every October. For all other months, treat_month is set to 0.
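The coding described above can be reproduced on a 60-row calendar to check the mapping. This is a minimal Stata sketch, assuming timevar = 1 is January 1997; birth_year and birth_month are hypothetical helper names introduced here, and treat_month follows the coding described in the post (30 to 34 for June to October of every year, 0 otherwise). It starts from an empty dataset, so it should be run separately from the birth data.

```stata
* Sketch: one row per month, 1997-2001 (assumes timevar 1 = January 1997)
* Starts from an empty dataset: run separately from the birth data
clear
set obs 60
generate int  timevar     = _n
* Hypothetical helper variables: calendar year and month
generate int  birth_year  = 1997 + floor((timevar - 1) / 12)
generate byte birth_month = mod(timevar - 1, 12) + 1
* treat_month as described in the post: 30-34 for June-October of every year, else 0
generate byte treat_month = cond(inrange(birth_month, 6, 10), birth_month + 24, 0)
* Months 30-34 are indeed June-October 1999
assert birth_year == 1999 & inrange(birth_month, 6, 10) if inrange(timevar, 30, 34)
* But treat_month == 30 is also assigned to June of 1997, 1998, 2000 and 2001
list timevar birth_year birth_month treat_month if treat_month == 30, noobs
```

With this coding, the group labelled as first treated in month 30 is observed only in months 6, 18, 30, 42 and 54, that is, once a year rather than in every period.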
    When I run

    csdid below_avg_bw, time(timevar) gvar(treat_month)

    I get only 0 coefficients. Below is just an extract; the same happens for g32 to g34.

    Outcome model  : regression adjustment
    Treatment model: none
    ------------------------------------------------------------------------------
                 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    g30          |
           t_1_2 |          0  (omitted)
           t_2_3 |          0  (omitted)
           t_3_4 |          0  (omitted)
           t_4_5 |          0  (omitted)
           t_5_6 |          0  (omitted)
           t_6_7 |          0  (omitted)
           t_7_8 |          0  (omitted)
           t_8_9 |          0  (omitted)
          t_9_10 |          0  (omitted)
         t_10_11 |          0  (omitted)
         t_11_12 |          0  (omitted)
         t_12_13 |          0  (omitted)
         t_13_14 |          0  (omitted)
         t_14_15 |          0  (omitted)
         t_15_16 |          0  (omitted)
         t_16_17 |          0  (omitted)
         t_17_18 |          0  (omitted)
         t_18_19 |          0  (omitted)
         t_19_20 |          0  (omitted)
         t_20_21 |          0  (omitted)

    This is what my data looks like:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte below_avg_bw float(timevar treat_month)
    1 51  0
    1 35  0
    0 46 34
    1 56 32
    1 18 30
    0 45 33
    1  8 32
    0 15  0
    1 23  0
    0 55 31
    1 57 33
    1 59  0
    1 29  0
    1  9 33
    0  7 31
    0 50  0
    1 55 31
    0 43 31
    1 44 32
    0 17  0
    1  4  0
    1  9 33
    1 24  0
    1 38  0
    0 38  0
    0  9 33
    1 34 34
    1 23  0
    1 39  0
    1 42 30
    1 26  0
    1 49  0
    0 42 30
    1 39  0
    1 54 30
    1 20 32
    0 60  0
    1 33 33
    0 19 31
    1 30 30
    1 19 31
    1 19 31
    1 27  0
    1 36  0
    0 13  0
    1  6 30
    0 18 30
    0 40  0
    1 20 32
    0 21 33
    1 40  0
    1 51  0
    1  8 32
    1 23  0
    1 55 31
    1  1  0
    1 14  0
    0 11  0
    1 49  0
    0 49  0
    1 46 34
    1 47  0
    1 48  0
    0  6 30
    0 49  0
    0 59  0
    0 32 32
    1 33 33
    1 31 31
    1 55 31
    1 59  0
    1 18 30
    0  2  0
    1 32 32
    1  3  0
    1 53  0
    1 13  0
    0 58 34
    0  9 33
    1 32 32
    0 46 34
    1 49  0
    1 14  0
    1 21 33
    1 35  0
    1  2  0
    1 38  0
    1  6 30
    1 52  0
    1 10 34
    1 27  0
    1 23  0
    1 55 31
    1 50  0
    0 15  0
    1 56 32
    1 25  0
    1 25  0
    0 42 30
    1  9 33
    end
    Can someone please help me understand what I am doing wrong? Thank you so much!
    Lara

  • #2
    Hi Lara
    The most likely scenario is that your gvar is not correctly defined.
    If you cross-tabulate year and gvar (or month and treat_month), it will be easy to see whether you are having that kind of problem here.
    F



    • #3
      Hi Fernando, thanks for your quick response.

      This is the tab of my gvar:
      . tab treat_month

      treat_month |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |     97,295       58.29       58.29
               30 |     13,811        8.27       66.56
               31 |     13,957        8.36       74.92
               32 |     13,889        8.32       83.25
               33 |     14,011        8.39       91.64
               34 |     13,955        8.36      100.00
      ------------+-----------------------------------
            Total |    166,918      100.00

      It seems correct to me, but obviously there is something wrong. Do you see anything?

      Also, I noticed that when I run csdid
      csdid below_avg_bw, time(timevar) gvar(treat_month)
      I somehow lose all observations. This is part of the output that I did not copy in my previous post:

      Difference-in-difference with Multiple Time Periods

      Number of obs  = 0
      Outcome model  : regression adjustment
      Treatment model: none
      ------------------------------------------------------------------------------
                   | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      g30          |
             t_1_2 |          0  (omitted)
             t_2_3 |          0  (omitted)
             t_3_4 |          0  (omitted)
      ...


      But when I run
      reg below_avg_bw timevar treat_month
      everything is fine: all 166,918 observations are used in the regression. Maybe this is a hint, but I still can't figure out what it is.

      Thanks again.
      Lara
      Last edited by Lara Lebed; 20 Dec 2023, 13:28.



      • #4
        Hi
        Please do the cross-tab of time and gvar; I need to see both:

        tab timevar treat_month



        • #5
          I see no reason to use Callaway-Sant'Anna here. There are no controls, and the staggering is incidental in the sense that the outcome is measured only once. Unlike something like a job training program, where it makes sense to follow individuals across time (even if you can't), low birth weight can happen only once. With the treatment happening in five adjacent months, I don't even see a concern about time-varying treatment effects.

          One thing does puzzle me about the data structure. I get that the first two units are controls with below-average birth weight (below_avg_bw = 1), and you observe their births in months 51 and 35, respectively. That makes sense. But the next two are, evidently, part of the treatment, in months 34 and 32, respectively. These months don't match up with timevar, which is 46 and 56. I don't see how this can be. Why aren't these 34 and 32, respectively? Is timevar the month when the information was obtained, as opposed to the birth month? When I see a difference between 56 (timevar) and 32 (treat_month), I see 24 months, which means that 32 can't be the "treatment" month and 56 the actual birth month.

          It seems to me that you should simply have a treat variable, set to zero for the control group and one for the treated group. I can't see that time plays any particular role beyond determining the treatment group. Then just run a simple regression of below_avg_bw on treat (including a constant), with vce(robust).

          If you get the time index sorted out, you could add i.timevar, but I suspect that, when properly defined, this is perfectly collinear with treat, unless there were mothers in the treatment period who were not treated. I can't tell from the data.
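The simple comparison suggested above can be sketched in Stata as follows. This assumes the birth-level data (as in the dataex extract) is in memory and takes the treated window to be months 30 to 34 (June to October 1999), as described in the first post; treat is a hypothetical variable name.

```stata
* Sketch: treat = 1 for births in June-October 1999 (timevar 30-34), 0 otherwise
generate byte treat = inrange(timevar, 30, 34)
* Simple comparison with a constant and heteroskedasticity-robust standard errors
regress below_avg_bw treat, vce(robust)
* Adding month-of-sample dummies: treat equals the sum of the dummies for
* months 30-34, so Stata will flag one of these terms as omitted (collinear)
regress below_avg_bw treat i.timevar, vce(robust)
```

The second regression illustrates the collinearity point: once every month has its own dummy, treat carries no extra information unless some births in months 30 to 34 were untreated.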

          Could you confirm the data structure?




          • #6
            Fernando, thanks for your help. Please see the tab output at the end of this post.

            @Jeff, thank you for your response and suggestions. I agree with you that Callaway-Sant'Anna might not be the first choice or even necessary. We do have controls, though (female parents_married years_educ_mother employed_mother age_mother years_educ_father employed_father age_father); I wanted to get the command working first and add them later. Our main estimation is similar to a diff-in-diff, but reviewers asked us to look at lags, leads and heterogeneity using Callaway-Sant'Anna.

            To give you some quick context: we are looking at the effect of the bombing of Serbia on infant birth weight, and we consider children born from June to October 1999 to be treated. Our main regression is similar to a diff-in-diff (but with no spatial variation): we compare children born in June to October 1999 (treated) with children born in January to March 1999, and with the same two periods in the previous year, 1998 (June to October 1998 and January to March 1998).
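The main 2x2 comparison described above could look roughly like this in Stata. It is a sketch under assumptions, not the authors' specification: timevar = 1 is January 1997; birth_year, birth_month, summer, y1999 and in_sample are hypothetical names assumed not to be in use already; and no controls are included yet.

```stata
* Sketch: recover calendar year and month (assumes timevar 1 = January 1997)
generate int  birth_year  = 1997 + floor((timevar - 1) / 12)
generate byte birth_month = mod(timevar - 1, 12) + 1
* Groups: June-October (exposed months) vs January-March; 1999 vs 1998
generate byte summer    = inrange(birth_month, 6, 10)
generate byte y1999     = birth_year == 1999
generate byte in_sample = inlist(birth_year, 1998, 1999) & ///
    (inrange(birth_month, 1, 3) | inrange(birth_month, 6, 10))
* The coefficient on 1.summer#1.y1999 is the difference-in-differences estimate
regress below_avg_bw i.summer##i.y1999 if in_sample, vce(robust)
```

Controls such as age_mother or years_educ_mother could be appended to the regress varlist once the basic comparison runs.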

            About the data structure: this is how I understood I should set it up, but there must be something wrong with it. We use monthly data for 60 months (1997 to 2001), and each entry is a birth, so timevar goes from 1 to 60. We consider children born from June to October 1999 to be treated; these are months 30 to 34 in our data. I set gvar to 0 for all months except June to October of every year. Since we consider children born from June to October in years other than 1999 to be the control group, I set treat_month to 30 for June, 31 for July, ..., and 34 for October in all years. When timevar equals 30 and treat_month equals 30, these are the actually treated births. This is how I thought I could set up something like a repeated cross-section, treating the months June to October as treated. In the case you mention from the data (timevar = 56 and treat_month = 32), timevar 56 is August 2001; I set all Augusts to 32, but 2001 is not 1999, the year in which children born in August are treated.

            I hope you can help me with the information I have provided.

            . tab timevar treat_month

                       |                          treat_month
               timevar |          0         30         31         32         33         34 |      Total
            -----------+-------------------------------------------------------------------+-----------
                     1 |      2,704          0          0          0          0          0 |      2,704
                     2 |      2,867          0          0          0          0          0 |      2,867
                     3 |      2,843          0          0          0          0          0 |      2,843
                     4 |      2,793          0          0          0          0          0 |      2,793
                     5 |      2,796          0          0          0          0          0 |      2,796
                     6 |          0      2,777          0          0          0          0 |      2,777
                     7 |          0          0      2,751          0          0          0 |      2,751
                     8 |          0          0          0      2,714          0          0 |      2,714
                     9 |          0          0          0          0      2,800          0 |      2,800
                    10 |          0          0          0          0          0      2,717 |      2,717
                    11 |      2,755          0          0          0          0          0 |      2,755
                    12 |      2,710          0          0          0          0          0 |      2,710
                    13 |      2,768          0          0          0          0          0 |      2,768
                    14 |      2,735          0          0          0          0          0 |      2,735
                    15 |      2,826          0          0          0          0          0 |      2,826
                    16 |      2,740          0          0          0          0          0 |      2,740
                    17 |      2,761          0          0          0          0          0 |      2,761
                    18 |          0      2,731          0          0          0          0 |      2,731
                    19 |          0          0      2,778          0          0          0 |      2,778
                    20 |          0          0          0      2,765          0          0 |      2,765
                    21 |          0          0          0          0      2,822          0 |      2,822
                    22 |          0          0          0          0          0      2,842 |      2,842
                    23 |      2,777          0          0          0          0          0 |      2,777
                    24 |      2,769          0          0          0          0          0 |      2,769
                    25 |      2,732          0          0          0          0          0 |      2,732
                    26 |      2,815          0          0          0          0          0 |      2,815
                    27 |      2,844          0          0          0          0          0 |      2,844
                    28 |      2,738          0          0          0          0          0 |      2,738
                    29 |      2,762          0          0          0          0          0 |      2,762
                    30 |          0      2,721          0          0          0          0 |      2,721
                    31 |          0          0      2,875          0          0          0 |      2,875
                    32 |          0          0          0      2,796          0          0 |      2,796
                    33 |          0          0          0          0      2,789          0 |      2,789
                    34 |          0          0          0          0          0      2,794 |      2,794
                    35 |      2,831          0          0          0          0          0 |      2,831
                    36 |      2,771          0          0          0          0          0 |      2,771
                    37 |      2,783          0          0          0          0          0 |      2,783
                    38 |      2,815          0          0          0          0          0 |      2,815
                    39 |      2,904          0          0          0          0          0 |      2,904
                    40 |      2,802          0          0          0          0          0 |      2,802
                    41 |      2,738          0          0          0          0          0 |      2,738
                    42 |          0      2,784          0          0          0          0 |      2,784
                    43 |          0          0      2,733          0          0          0 |      2,733
                    44 |          0          0          0      2,821          0          0 |      2,821
                    45 |          0          0          0          0      2,772          0 |      2,772
                    46 |          0          0          0          0          0      2,798 |      2,798
                    47 |      2,805          0          0          0          0          0 |      2,805
                    48 |      2,766          0          0          0          0          0 |      2,766
                    49 |      2,758          0          0          0          0          0 |      2,758
                    50 |      2,741          0          0          0          0          0 |      2,741
                    51 |      2,813          0          0          0          0          0 |      2,813
                    52 |      2,710          0          0          0          0          0 |      2,710
                    53 |      2,821          0          0          0          0          0 |      2,821
                    54 |          0      2,798          0          0          0          0 |      2,798
                    55 |          0          0      2,820          0          0          0 |      2,820
                    56 |          0          0          0      2,793          0          0 |      2,793
                    57 |          0          0          0          0      2,828          0 |      2,828
                    58 |          0          0          0          0          0      2,804 |      2,804
                    59 |      2,721          0          0          0          0          0 |      2,721
                    60 |      2,781          0          0          0          0          0 |      2,781
            -----------+-------------------------------------------------------------------+-----------
                 Total |     97,295     13,811     13,957     13,889     14,011     13,955 |    166,918



            • #7
              Hi Lara

              Thanks for the additional information. Regarding the csdid application, the problem is related to how gvar is created. Specifically, the way you have it set up, you cannot see a "treated unit" before treatment happens.
              If you open the help file and the example dataset, you will see how the data should look.
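One way to see the problem described above, assuming the data behind the cross-tab is in memory: each coefficient in the csdid output compares two periods (t_1_2 compares periods 1 and 2), and, as far as I understand csdid's defaults, post-treatment comparisons for cohort g use period g - 1 as the base while pre-treatment comparisons use consecutive periods. With the gvar coding above, cohort 30 appears only in months 6, 18, 30, 42 and 54, never in period 29 and never in two consecutive months, so every comparison has an empty treated group. This is a diagnostic sketch, not a fix.

```stata
* Sketch: in which periods are cohort-30 births actually observed?
tabulate timevar if treat_month == 30
* The base period for cohort 30's post-treatment comparisons would be 29;
* according to the cross-tab above, no cohort-30 births fall in that month
count if treat_month == 30 & timevar == 29
```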

              Now regarding your data itself. I have a few comments and questions
              1. Is the data a panel or a repeated cross-section?
              2. Because the "treatment" was applied to everyone, I don't think you have a setup for DiD. At best, I think you can compare the two groups, those born before (not treated) and those born after (treated), looking at "weight" for different age groups.
              3. Perhaps another alternative would be to use age as the time variable, and the age they would have been when the bombing happened as the gvar (time of treatment). This, however, will not allow you to estimate the impact on weight among those born after the bombing, only among those born before it.

              Perhaps Jeff Wooldridge has other insights on your specific case.
              Best wishes
              Fernando



              • #8
                Thanks so much for your response, Fernando. Much appreciated. I understand that my treated units are observed only once. I would like to construct a pseudo panel. Maybe you can give me a hint on how to do this.

                My data is cross-sectional: I observe each individual at birth and I observe his or her weight. My treated individuals are born between June and October 1999. What I would like to do is use individuals born between June and October 1998 (and in earlier or later years, but the same months) as treated individuals before treatment, and consider their outcomes as the counterfactual for my treated group. The other months (January to May and November to December) would be never treated. In a way, I would need to prepare the data as a repeated cross-section, but I am not sure how to do this. I thought that by setting gvar to 30, 31, 32, 33, 34 (the actually treated months) for all births from June to October in all years, I would achieve this, but this is not the case.
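The idea of using June to October of other years as a benchmark can be sketched, outside csdid, as a year-by-year comparison of June to October births with births in other months. This is a hedged sketch under assumptions: timevar = 1 is January 1997; cal_year, cal_month and jun_oct are hypothetical names (chosen so they do not clash with other variables); 1998 is used as the reference year; and all non-June-to-October months serve as the comparison group.

```stata
* Sketch: June-October vs other months, separately by calendar year
generate int  cal_year  = 1997 + floor((timevar - 1) / 12)
generate byte cal_month = mod(timevar - 1, 12) + 1
generate byte jun_oct   = inrange(cal_month, 6, 10)
* ib1998. sets 1998 as the base year: 1.jun_oct#1999.cal_year is the 1999
* June-October gap relative to the 1998 gap; the 1997, 2000 and 2001
* interactions can be read as placebo leads and lags
regress below_avg_bw i.jun_oct##ib1998.cal_year, vce(robust)
```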

                As to your questions.
                1. The data is cross-sectional, but I would like to use it as a repeated cross-section.
                2. Yes, treatment is universal for the months June to October. I don't have age groups; I only have weight at birth, so a single point in time.
                3. I can't use age as the time variable because there is no age variable in my dataset.

                If you have any idea how to structure my dataset so that I can use it as a repeated cross-section, please let me know. I am currently looking at the repeated cross-section example and trying to figure out whether I can restructure the data in a similar way.

                Thanks again and best wishes,
                Lara
                Last edited by Lara Lebed; 21 Dec 2023, 08:18.

