  • Dear @FernandoRios,

    Please excuse my limited knowledge of csdid and its post-estimation commands; I'm still learning.
    Please help with the following:

    I am using the csdid Stata command with firm-level panel data and not-yet-treated units as the control group. My data cover 2009-2020, and treatment adoption is staggered over time.
    - I need to understand, ideally through example commands, how to distinguish between the "conditional" and "unconditional" PTA, and how to apply each with my command, e.g. csdid loglabprod logcapital loglabour lograwmaterials, ivar(firm_id) time(year) gvar(first_treat) notyet. Where would the conditional PTA fit in this command? How do I show which PTA the model uses?

    - Also, how do I deal with PTA violations when the control group is not-yet-treated units? I'm reading Rambachan & Roth (2023) and Ryan, Kontopantelis, Linden, and Burgess (2019) to try to understand this. However, the control group used in their simulations is never treated, whereas I use not-yet-treated.
    - I also read about "csdid2". Is there documentation of its commands for Stata, and how does it differ from the original csdid? Would it help me in any way?


    Again, my apologies for these loaded questions. I'm hoping to get clarity or guidance on a solution.

    Thanks and much appreciated.
    Last edited by Kagiso Matswalela; 25 Mar 2024, 13:18.


    • Hi Kagiso
      1. If you add any controls to the model (here logcapital, loglabour and lograwmaterials), you are already using the conditional PTA. If you add no controls, then it's the unconditional PTA.
      2. If the PTA fails, it fails. However, there are other methods that allow for some violations of the PTA and adjust for them before estimating treatment effects. csdid does not have any built-in feature for that.
      3. csdid2 works just like csdid. To show results you need to type estat event, estat attgt, estat group, etc.
      HTH
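
      To make point 1 concrete, here is a minimal sketch on simulated data. The variable names follow the original post, but the data-generating process, cohort years, and effect size are invented for illustration, and csdid and drdid are assumed to be installed (both are on SSC). The only difference between the two calls is the covariate list.

      ```stata
      * Simulated firm panel, 2009-2020 (hypothetical data, not the poster's)
      clear
      set seed 12345
      set obs 300
      generate firm_id = _n
      * Two treated cohorts plus never-treated firms (first_treat == 0)
      generate first_treat = 0
      replace first_treat = 2013 if firm_id <= 100
      replace first_treat = 2016 if inrange(firm_id, 101, 200)
      expand 12
      bysort firm_id: generate year = 2008 + _n
      generate logcapital = rnormal()
      generate treated = first_treat > 0 & year >= first_treat
      generate loglabprod = 0.5*logcapital + 0.2*treated + rnormal()

      * Unconditional PTA: no covariates in the varlist
      csdid loglabprod, ivar(firm_id) time(year) gvar(first_treat) notyet
      estat pretrend

      * Conditional PTA: parallel trends assumed to hold given logcapital
      csdid loglabprod logcapital, ivar(firm_id) time(year) gvar(first_treat) notyet
      estat pretrend
      estat event
      ```

      Reporting which PTA is used then amounts to stating whether covariates were included in the csdid call, and which ones.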


      • Hi Fernando

        Thank you.
        So would the other methods you mention be DID with propensity score matching and single/multiple-group interrupted time-series analysis, as described in Ryan et al. (2019), "Coping with non-parallel trends in difference-in-differences analysis", together with the alternative approaches described in Roth (2020), "Pre-test with Caution: Event-study Estimates After Testing for Parallel Trends"?

        I'm just trying to figure out how I can explore some alternative options without having to redo or change my research approach.

        Thanks for everything.


        • Hi Fernando

          I work for the National Health Service (NHS) in England and we are using csdid to assess the impact of a specific intervention at NHS acute hospitals on patient outcomes. The rollout of the intervention was staggered over time making csdid a useful approach for us. We have a very small set of control providers depending on the time period analysed – the preferred time period of analysis would leave only a single provider in the control set – and so we are using the not yet treated option.

          The models run and we're able to produce interesting results, but the natural experiment which gave us our dataset has some peculiarities, and it would be very helpful to make sure we are using the model correctly given our unusual dataset:

          1) The intervention at the first treated provider occurred much earlier than at the next treated provider. This means that a large part of the 60 post-implementation months in the csdid model is driven by a single early-implementing provider. To avoid our results being largely driven by a single provider, we've used the censored event aggregation (cevent) and truncated at 20 months post-implementation, as per the code below:
          Code:
          csdid LOS, ivar(Group_ID) time(Calendar_Month) gvar(Group_Var) notyet agg(event) saverif(mod) wboot replace rseed(123456)
          use mod, clear
          estat pretrend
          csdid_stats cevent, wboot window(-80 20) rseed(123456)
          Unfortunately we can't share the underlying data or details, but is there anything obviously problematic with the above? Is it appropriate to say that the ATTC is the average treatment effect on the treated for the first 20 periods post-implementation?

          2) We also get a peculiar result when graphing our results with csdid_plot: the confidence intervals become impossibly large. When we drop the first and the last treatment groups – which coincidentally have only a single treated provider in each – then the confidence intervals on the plot become normal again. Is this a known issue with a solution?

          Many thanks

          Andrew
          Last edited by Andrew Sylvester; 02 Apr 2024, 04:39.


          • Hi FernandoRios

            An update on question 2 in my post above: we're able to get meaningful graphs when we apply the Rademacher option for the wild bootstrap type to the model. Is the Rademacher option appropriate when sample sizes per group are small?

            Many thanks

            Andrew


            • Hello, I am attempting to use the csdid command for my paper and have run into some issues.

              I am trying to estimate the effects of the introduction of carbon taxation in 119 countries from 1989 to 2019. Since I am dealing with panel data in which multiple countries receive treatment at different times, I thought csdid would be appropriate.

              When I try to run csdid in Stata, the output reports 0 observations.

              The code I have run is as follows:
              csdid lco2 co2price, time(year) gvar(gvar) ivar(id)

              lco2 = log of co2 emission
              co2price = Average price on emissions covered by a carbon tax
              gvar = a given country's first year of treatment, or 0 if never treated.
              id = numeric country id
              year = year (1989-2019)

              control : never treated


              first lines of stata output:
              ---------------------------------
              xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
              xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
              Difference-in-difference with Multiple Time Periods

              Number of obs = 0
              Outcome model : least squares
              Treatment model: inverse probability
              ------------------------------------------------------------------------------
                           | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
              g1990        |
               t_1989_1990 |          0  (omitted)
               t_1989_1991 |          0  (omitted)

              etc..
              ----------------------------------


              Any help would be greatly appreciated!

              Kind Regards,
              Sebastian



              • Basic question: do you have drdid installed?
                Also, can you post the output of tab year gvar?


                • I do have drdid installed.

                  Here is the output of tab year gvar; I hope it is somewhat legible:

                        Year |     0  1990  1991  1992  2008  2010  2011  2012  2013  2014  2015  2017  2018  2019 |  Total
                  -----------+-------------------------------------------------------------------------------------+-------
                        1989 |   118     0     0     0     0     0     0     0     0     0     0     0     0     0 |    118
                        1990 |   117     1     0     0     0     0     0     0     0     0     0     0     0     0 |    118
                        1991 |   115     1     2     0     0     0     0     0     0     0     0     0     0     0 |    118
                        1992 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        1993 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        1994 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        1995 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        1996 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        1997 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        1998 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        1999 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        2000 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        2001 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        2002 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        2003 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        2004 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        2005 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        2006 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        2007 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
                        2008 |   113     1     2     1     1     0     0     0     0     0     0     0     0     0 |    118
                        2009 |   113     1     2     1     1     0     0     0     0     0     0     0     0     0 |    118
                        2010 |   112     1     2     1     1     1     0     0     0     0     0     0     0     0 |    118
                        2011 |   111     1     2     1     1     1     1     0     0     0     0     0     0     0 |    118
                        2012 |   109     1     2     1     1     1     1     2     0     0     0     0     0     0 |    118
                        2013 |   108     1     2     1     1     1     1     2     1     0     0     0     0     0 |    118
                        2014 |   106     1     2     1     1     1     1     2     1     2     0     0     0     0 |    118
                        2015 |   105     1     2     1     1     1     1     2     1     2     1     0     0     0 |    118
                        2016 |   105     1     2     1     1     1     1     2     1     2     1     0     0     0 |    118
                        2017 |   103     1     2     1     1     1     1     2     1     2     1     2     0     0 |    118
                        2018 |   102     1     2     1     1     1     1     2     1     2     1     2     1     0 |    118
                        2019 |   100     1     2     1     1     1     1     2     1     2     1     2     1     2 |    118
                  -----------+-------------------------------------------------------------------------------------+-------
                       Total | 3,461    30    58    28    12    10     9    16     7    12     5     6     2     2 |  3,658


                  • That is our answer. gvar is incorrectly defined.
                    gvar has to be constant across all years for a given country: it cannot switch from 0 to the treatment year once treatment starts.
                    Do something like
                    bysort id: egen mgvar = max(gvar)
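
                    As a small illustration of that fix, the sketch below builds a toy two-country panel (invented values, using the poster's variable names id, year and gvar) in which gvar switches from 0 to the treatment year, then makes it constant within country and checks the result.

                    ```stata
                    * Toy panel: country 1 is treated from 1990, country 2 is never treated
                    clear
                    input id year gvar
                    1 1989 0
                    1 1990 1990
                    1 1991 1990
                    2 1989 0
                    2 1990 0
                    2 1991 0
                    end

                    * Spread the first-treatment year over every row of the country
                    bysort id: egen mgvar = max(gvar)

                    * Check that mgvar no longer varies within country
                    bysort id (year): assert mgvar[1] == mgvar[_N]

                    * Each column of this table should now have the same count in every year
                    tabulate year mgvar
                    ```

                    The new variable would then be passed as gvar(mgvar) in the csdid call.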


                    • That seems to have fixed it. Thank you kindly!


                      • Also, no controls! You don't have enough data to add any controls.


                        • Dear FernandoRios

                          I hope this message finds you well. I am reaching out with a question on csdid application, to expand on an existing question asked by my colleague Andrew Sylvester in this forum. We are conducting our analysis in collaboration with Dr. Giuseppe Moscelli, and are facing two main challenges and would greatly appreciate your insights.

                          Context and Challenges:

                          1. Bootstrap Analysis:
                          - When bootstrapping our model across the entire analytical period using the `notyet` csdid approach, we encounter very wide and symmetrical confidence intervals.
                          - Conversely, when we specify `rademacher` as our wild bootstrap type, the confidence intervals narrow significantly and become non-symmetrical. This difference might stem from having an early adopter at the start and a late adopter at the end of our period. Removing these two providers results in more reasonable, non-symmetrical intervals using the default bootstrap method. Dropping only one does not resolve the issue.

                          2. Desired Analysis Range:
                          - We aim to focus our analysis on a specific timeframe (month -79 to month 20) and exclude data outside this range from our csdid plots. Despite trying `addplot`, adjusting csdid settings, and using the graph editor, we only managed to achieve this by excluding event time values outside our desired range, which led to an unbalanced panel because different calendar months remain in our dataset for each provider. The results, although slightly different from the cevent results on the model that retains all data, seem usable and allow for the default bootstrapping approach.

                          Questions:

                          1. Could you share your thoughts on why the `rademacher` type might yield more realistic intervals in our full-provider model and whether it’s advisable to use this method for our main results?
                          2. Given the unbalanced panel when we restrict our analysis to the desired months, do you think it's acceptable to use these results, or should we consider an alternative approach? This is the only approach we have found to date that allows us to generate our desired graphical output, but I have some concerns around the implications of the panel now being unbalanced.

                          Thank you very much for your time and expertise. I apologise for the inability to share code due to data confidentiality.

                          Kind regards,

                          Toby


                          • Hi Toby
                            A couple of thoughts.
                            1. I'm not sure what is happening with the Rademacher wild bootstrap. I usually just apply the default WB options. I have not tried other noise multipliers in this setting; in similar applications, however, I have tried other options, and they all produced rather similar results.
                            In other words, I doubt that the problem is caused by the choice of multiplier.

                            It may be that the problem is due to the "small" groups used for the estimation of the CIs. In that case, it is advisable to drop the events that are far in the past or far in the future.

                            2. For the event-study constraints, you have two options.

                            a) As you say, restrict the data to cover only specific periods.
                            b) Estimate using all data, but restrict only the event aggregation.
                            The latter could be done using
                            estat event, window(#1 #2)

                            3. Since you are using so many periods, csdid may be slow. I would suggest installing csdid2, which has the same syntax, except that you need to explicitly run "estat event" or another aggregation to see the results. To install it, type:

                            net install csdid2, from("https://raw.githubusercontent.com/friosavila/stpackages/main")
                            HTH
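
                            For option 2(b), a minimal sketch on simulated data might look like the following. The variable names are borrowed from Andrew's earlier post, while the panel itself (60 providers, 12 months, two cohorts, the effect size, and the window bounds) is invented for illustration. It assumes csdid and drdid are installed and that estat event accepts window() as described above.

                            ```stata
                            * Simulated provider panel over 12 calendar months (hypothetical data)
                            clear
                            set seed 2024
                            set obs 60
                            generate Group_ID = _n
                            * Cohorts first treated in month 6 and month 9; the rest are never treated
                            generate Group_Var = cond(_n <= 20, 6, cond(_n <= 40, 9, 0))
                            expand 12
                            bysort Group_ID: generate Calendar_Month = _n
                            generate LOS = 0.3*(Group_Var > 0 & Calendar_Month >= Group_Var) + rnormal()

                            * Estimate on ALL the data, so the panel stays balanced
                            csdid LOS, ivar(Group_ID) time(Calendar_Month) gvar(Group_Var) notyet

                            * Restrict only the event-study aggregation to event times -3..3
                            estat event, window(-3 3)
                            ```

                            With the real data the window would be (-79 20), and no observations need to be dropped beforehand.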


                            • Thanks FernandoRios, this is very helpful.

                              We will proceed by reporting the outputs of our unbalanced panel, as this also allows us to utilise the default WB type.

                              Would you mind answering one follow up question we have?

                              As our analysis uses the not-yet-treated csdid model, we want to limit our pre- and post-periods so that we retain a minimum of 5 providers at each Event_Time in our analysis. We run the lines below to ensure this.

                              drop if Event_Time > 20
                              drop if Event_Time <-79

                              Below is the code we then run on our remaining unbalanced panel

                              csdid `depvar', ivar(Group_ID) time(Calendar_Month) gvar(Group_Var) notyet agg(event) saverif(mod) wboot replace rseed(123456)
                              use mod, clear
                              estat pretrend
                              csdid_stats event, wboot rseed(123456)

                              We then plan to report the outputs shown.

                              Is there a more optimal way for us to specify our model to ensure that we retain and utilise as many providers as possible in our remaining dataset? We have not previously applied and reported results from an unbalanced panel.

                              Thanks again for your help.

                              Kind regards,

                              Toby
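
                              One way to check the "minimum number of providers at each Event_Time" rule before dropping anything is sketched below on a toy panel. The data are invented, Event_Time is assumed to be Calendar_Month minus Group_Var for treated providers, and the threshold is set to 3 only because the toy panel is tiny (the post uses 5).

                              ```stata
                              * Toy panel: 6 providers, 8 months, cohorts treated in months 3 and 5
                              clear
                              set obs 6
                              generate Group_ID = _n
                              generate Group_Var = cond(_n <= 2, 3, cond(_n <= 5, 5, 0))
                              expand 8
                              bysort Group_ID: generate Calendar_Month = _n

                              * Event time is left missing for the never-treated provider (assumption)
                              generate Event_Time = Calendar_Month - Group_Var if Group_Var > 0

                              * Count distinct providers observed at each event time
                              egen byte one_per_provider = tag(Group_ID Event_Time)
                              egen n_providers = total(one_per_provider), by(Event_Time)

                              * List the event times that fall below the threshold
                              local min_providers 3
                              tabulate Event_Time if one_per_provider & n_providers < `min_providers'
                              ```

                              Note that Stata treats missing values as larger than any number, so a command such as drop if Event_Time > 20 also drops rows where Event_Time is missing; adding "& !missing(Event_Time)" avoids that if never-treated rows should be kept.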


                              • Dear @FernandoRios,

                                I have a question regarding the overlap assumption in csdid.

                                In my setup, I have a panel that includes all U.S. counties for the period 2003-2019. The outcome variable is employment. I am also adding some covariates (total population and other characteristics). Around 200 of these counties receive treatment in various years throughout this period, in a way that there are treated counties in each year.

                                However, I would like to know if it is possible to add state fixed effects in this setting. If there are states that, in a given year, do not have any treated counties, would this violate the overlap assumption?

                                Also, there is no need to add time fixed effects in this setting, is there?

                                Thank you in advance!
