  • Hi Fernando

    Thanks for the advice! I'll try with a more consistent group / run some placebo tests and see how it looks.

    Thanks again!

    David

    Comment


    • I have seen the same error in some other posts, but none of the solutions worked for me. I am running the following code with the long2 option:

      Code:
      csdid per_des,  ivar(id_municipio) gvar(lag_treat) time(ano) long2 method(reg)
      estat event, window(-5 5) estore(cs)
      csdid_plot, legen(off) xlabel(#10,labsize(large)) xtitle("Years relative to treatment",size(large)) ylabel(#5,labsize(large)) ytitle("ATT",size(large))
      estat simple, estore(arrecadacao_total)
      graph export "figures/desmatamento_total.pdf", replace
      Nevertheless, I keep getting this error message:

      Code:
      . estat event, window(-5 5) estore(cs)
      ATT by Periods Before and After treatment
      Event Study:Dynamic effects
                             *:  3200  conformability error
                 csdid_event():     -  function returned error
                       <istmt>:     -  function returned error
      Last edited by Mateus Maciel; 13 Feb 2026, 17:48.

      Comment


      • Mateus: Is it true the error disappears if you drop window(-5 5)? You might try jwdid, as that accomplishes the same thing via running a single TWFE regression.

        Code:
        jwdid per_des, ivar(id_municipio) tvar(ano) gvar(lag_treat) never
        estat event, window(-5 5) estore(cs)

        Comment


        • Dear @FernandoRios,

          Thank you for developing the incredibly helpful csdid package. I am currently using it for my Master's thesis to estimate the dynamic effects of a fertility shock on gig economy income, using balanced panel data from 2011 to 2023. My treatment cohorts (first childbirth) occur between 2017 and 2023.

          I am running the following command:

          Code:
          csdid gigjob_income c.age##c.age, ivar(mom_id) time(year) gvar(first_birth_year) long2
          
          estat event, window(-4 4)
          However, the output table displays event-time coefficients starting from Tm5 instead of Tm4, as shown below:

          Code:
          ------------------------------------------------------------------------------
                      | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
          -------------+----------------------------------------------------------------
              Pre_avg |  -44.71372   29.70189    -1.51   0.132    -102.9283    13.50091
             Post_avg |   516.9128   70.77064     7.30   0.000     378.2049    655.6207
                  Tm5 |  -44.67074   32.52688    -1.37   0.170    -108.4223    19.08079
                  Tm4 |  -42.92353   32.55161    -1.32   0.187    -106.7235    20.87647
                  Tm3 |  -43.42168   31.83051    -1.36   0.173    -105.8083    18.96496
                  Tm2 |  -47.83893   27.54437    -1.74   0.082    -101.8249    6.147045
                  Tp0 |   7.317326   25.82761     0.28   0.777    -43.30385     57.9385
                  Tp1 |     177.32   44.67611     3.97   0.000     89.75648    264.8836
                  Tp2 |    526.611   69.73675     7.55   0.000     389.9295    663.2926
                  Tp3 |   791.2632   106.6887     7.42   0.000     582.1571    1000.369
                  Tp4 |   1082.053   157.5165     6.87   0.000      773.326    1390.779
          ------------------------------------------------------------------------------
          Based on my manual calculation, the Pre_avg (-44.71372) is the exact simple average of the displayed coefficients from Tm2 to Tm5.
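          That manual calculation can be reproduced with a one-line check. This is only a sketch that uses the four pre-treatment coefficients exactly as displayed in the table above; nothing here comes from csdid itself.

          Code:
          * simple average of the displayed pre-treatment coefficients Tm5, Tm4, Tm3, Tm2
          display (-44.67074 - 42.92353 - 43.42168 - 47.83893) / 4

          The result agrees with the reported Pre_avg of -44.71372, which suggests Pre_avg averages exactly the four displayed pre-treatment terms.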

          Could you kindly clarify the following points regarding the underlying mechanism of this output?
          1. Why is Tm5 displayed when the window is explicitly set to (-4 4)?
          2. Does Tm5 represent "endpoint binning"? Since my panel data traces back to 2011 (meaning that for the 2017 cohort, pre-treatment periods go up to e = -6), does the Tm5 coefficient represent an aggregated/binned average of all early periods (e <= -5)? Or does it strictly represent the isolated relative time e = -5?
          I want to ensure I interpret and report the pre-trend test and the event window correctly in my thesis. Any clarification would be greatly appreciated.

          Thank you very much for your time and your contribution to the Stata community.
          Last edited by YITING HUANG; 24 Mar 2026, 00:20.

          Comment


          • I don't remember completely, but I think that is because -4 meant (for me) to use 4 periods before treatment (-5 to -2), so you can use -3 if you want to go back only to -4.
            And no, the window() option of estat event does NOT do any binning. It ignores treatment effects above or below the shown thresholds.
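            Under that reading, the call below should make Tm4 the earliest displayed pre-treatment coefficient. It is a sketch that rests on the recollection above about how window() counts periods with long2, so it is worth confirming against the actual output.

            Code:
            * same model as in the question, estimated with long2
            csdid gigjob_income c.age##c.age, ivar(mom_id) time(year) gvar(first_birth_year) long2
            * -3 instead of -4: assumed to display Tm4, Tm3, Tm2 and Tp0 to Tp4
            estat event, window(-3 4)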
            F

            Comment


            • Dear Statalist,

              I am using csdid2 (Callaway & Sant'Anna 2021) with a balanced panel of N = 753,050 individuals observed over 13 years (2011–2023).
              All individuals are eventually treated (first childbirth between 2017–2023); I use the not-yet-treated as the control group (notyet option). The base period is g-1 (universal base, default).

              After estimation, the reported "Number of obs" is 9,036,600, which is exactly 753,050 × 12 — one fewer year per individual than the full balanced panel (9,789,650 = 753,050 × 13).

              My understanding is that the base period (t = -1, i.e., g-1 for each individual) is used as the reference for first-differencing and therefore does not count as an independent estimation period, reducing the reported N by one observation per individual.
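              The two counts can be checked directly. This sketch uses only the figures quoted above (753,050 individuals, 13 panel years, one base period dropped per individual).

              Code:
              * full balanced panel: individuals x years
              display %12.0fc 753050 * 13
              * one period per individual removed: individuals x (years - 1)
              display %12.0fc 753050 * 12

              The first line gives 9,789,650 and the second gives 9,036,600, the reported "Number of obs".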

              Is this the correct explanation for the discrepancy? Is there any documentation or reference that explicitly describes this behavior?

              I note that a similar pattern appears in published work using csdid with the same base period convention, where t = -1 is omitted from the event-study table entirely, consistent with this interpretation.

              Thank you in advance.

              Comment


              • You are on point. The other approach you can use is to report the number of panel units (individuals) rather than individual x time observations.
                Or that would be my suggestion.

                Comment


                • Dear FernandoRios

                  Thank you so much for your prompt and helpful reply. I will follow your suggestion and report the number of individuals.

                  Comment


                  • Dear Statalist,

                    I am using csdid with not-yet-treated units as the control group in a large administrative panel, with approximately nine million person-year observations in the gender-specific estimation sample.

                    In the event-study plot, the pre-treatment coefficients look visually close to zero and economically small, but the aggregated pre-treatment average is statistically significant.

                    For example, for fathers' delivery participation:
                    Outcome: Has Delivery Income
                    Pre_avg = -0.0004, p < 0.05
                    Post_avg = 0.0029, p < 0.01
                    The pre-treatment estimate is about -0.04 percentage points, while the post-treatment effect is about +0.29 percentage points. The post-treatment effects are much larger and increase after childbirth.

                    In this setting, is it reasonable to discuss the parallel trends assumption mainly in terms of visual trajectory and economic magnitude, while explicitly acknowledging that some pre-treatment estimates are statistically significant due to the very large sample size? Are there recommended ways to present this issue when using csdid with large administrative data?

                    Thank you in advance.

                    [Attached figure: CSDID_dad_has_del.png]

                    Comment


                    • I wonder if your treatment variable is incorrectly identified.
                      I am not sure what you mean by "has delivery income". But what if the impact of the treatment starts BEFORE they actively start to receive that income (which means changing the base period)? If you set t-2 as the untreated base period instead of t-1, the pre-treatment estimates will likely be non-significant.
                      The question, in that case, is when the treatment starts to have an effective impact on the outcome. Right away? Or a few periods before?
                      For example, take a topic I worked on, parenthood. Does the treatment start when the baby is born? Or when it is conceived? Or when the parents start considering having a baby?
                      F

                      Comment


                      • Dear FernandoRios,

                        Thank you very much for your reply. This is very helpful.

                        Just to clarify, “Has Delivery Income” is an outcome variable, defined as an indicator for whether the individual received income from identified food-delivery platforms in a given calendar year. The treatment variable is the year of first childbirth.

                        I think your point about the timing of the effective treatment is very relevant for my setting. Since the outcomes are measured annually, some labor-supply adjustments may begin before the actual birth year, for example during pregnancy or when parents start preparing for childbirth. In that case, the coefficient at t = -1 may partly capture anticipatory behavior rather than a pure violation of parallel trends.

                        Would it be reasonable to interpret and discuss some statistically significant pre-treatment estimates in this way—that is, as potentially reflecting anticipatory labor-supply responses before childbirth, while still presenting the results cautiously?

                        Thank you again for your helpful suggestion.

                        Best regards,
                        YITING

                        Comment


                        • I would re-frame the treatment.
                          The goal of using a pre-treatment period is to identify a period where outcomes are parallel because units are not yet treated.
                          Normally we do this using t-1, but there is nothing to say we cannot use an earlier period (t-2).
                          I think that would be a better approach.
                          (Or do an ad hoc adjustment, since re-estimating everything may take a bit of time.)

                          Comment


                          • Yiting: Here is what I would do. Because both csdid and jwdid with the never option estimate separate effects for all possible combinations, you can simply shift the treatment year to be the year before the birth of the first child. This will force the reference period to be two years before the birth of the first child. The year just before the birth becomes a "treated" period, so you can use it as a placebo test and see whether there are practically and statistically significant differences in that year. With a balanced panel, the estimated treatment effects will be identical to what you get if you just drop the data on the year before the first birth (because then the estimator will use two years before the first birth as the reference period). To me, it makes sense to keep all of the data, shift back the treatment year, and then see if it makes a difference.
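                            The shift described above could be sketched as follows. The outcome name has_delivery_income and the new variable gvar_shift are hypothetical; mom_id, year and first_birth_year are the names used earlier in the thread, and the code assumes csdid's convention that never-treated units have gvar equal to 0.

                            Code:
                            * move the treatment date back one year; leave never-treated units (gvar == 0) unchanged
                            generate gvar_shift = cond(first_birth_year > 0, first_birth_year - 1, 0)
                            * not-yet-treated controls, as in the original specification
                            csdid has_delivery_income, ivar(mom_id) time(year) gvar(gvar_shift) notyet
                            * Tp0 now refers to the year before the birth and serves as the placebo period
                            estat event

                            With this coding, Tp1 corresponds to the birth year itself, so the event-study labels need to be read one period later than usual.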

                            One suggestion: jwdid allows for binary outcomes by using a logit model. This means the (conditional) parallel trends assumption is different, and so it's a useful robustness check. You can estimate the effects on the probability that "has delivery income" = 1 and compare those with the linear model. You can do this by setting the first "treatment" period to be the year before the birth, or the year of the birth.
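                            A minimal sketch of that comparison, using the year of the birth as the first treatment period. The outcome name has_delivery_income is hypothetical, and method(logit) is written here on the assumption that it is how the installed version of jwdid selects the logit model, so check help jwdid first.

                            Code:
                            * linear model, same structure as the jwdid call earlier in the thread
                            jwdid has_delivery_income, ivar(mom_id) tvar(year) gvar(first_birth_year) never
                            estat event
                            * logit version (assumed option name); effects are on the probability scale
                            jwdid has_delivery_income, ivar(mom_id) tvar(year) gvar(first_birth_year) never method(logit)
                            estat event

                            To use the year before the birth instead, replace first_birth_year with a copy shifted back by one year.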

                            Comment
