  • Hi Fernando,

    Thank you for your reply!
    You can access the data here. (The `var' variable should be renamed to cell_counter in the code I posted earlier.)
    I did check that I have the latest versions of csdid and drdid.

    Thank you,
    Gabor



    • Dear FernandoRios ,

      Thanks a lot for your answer, no worries, I thought so!

      I would be very grateful if you have any advice on the following issue. I am studying the effect of a change in a firm's legal form on the adoption of a new technology.

      My main code is this:

      Code:
      csdid technology, ivar(firm_id) time(year) gvar(legal_form) method(drimp) notyet
      estat all
      I am using the latest versions of csdid and drdid (1.57 and 1.67).

      The tab between year and the gvar (legal_form) is the following (and similarly for the years after 1880 in the top row; I didn't want to include a second screenshot but I am happy to do so if necessary):
      [Image: Screenshot 2022-07-07 114406.png]



      If I do this, I get estimates for g1865 but not for g1866 and onwards. They are all omitted. It looks like this:

      [Image: Screenshot 2022-07-07 114812.png]



      However, I then installed an earlier version of drdid. Specifically, I ran the following command, which I found on your website:
      net install csdid, from ("https://raw.githubusercontent.com/friosavila/csdid_drdid/main/code/") replace
      This installs version 1.63 of drdid. Now the g's look very different, and I also get pre-treatment averages (and effects by year).
      [Image: Screenshot 2022-07-07 115219.png]



      Additionally, when I am using the latest version of drdid and drop the first year, the program then estimates the g's for the new first year (so g1866) but not for the following years.

      So I am wondering whether you have any idea what's going on here.

      Thank you very much for your help!
      Last edited by Leon Schmidt; 07 Jul 2022, 03:58.



      • Dear Fernando,

        Thank you so much for your help so far. I have yet another question regarding the post-estimation output, and I apologize because I am sure my question is trivial, but: I don't understand what estat cevent, window(t1 t2) is exactly. The help file says that this command estimates censored event averages. Then, it further explains that it estimates the average across all ATTGT's that correspond to periods between t1 and t2, inclusive. But I cannot figure out what ATTGT's are being averaged.

        As an example, with the data you kindly provide in the help file, https://friosavila.github.io/playing...rdid/mpdta.dta,

        estat cevent, window(-3 -1)

        has as output
        ATT for events between -3 -1
        Event Study:Aggregate effects
        ------------------------------------------------------------------------------
                     | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                ATTC |  -.0023082   .0075001    -0.31   0.758    -.0170082    .0123917
        ------------------------------------------------------------------------------


        which I can obtain neither from the output of estat event, which is

        ATT by Periods Before and After treatment
        Event Study:Dynamic effects
        ------------------------------------------------------------------------------
                     | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
             Pre_avg |  -.0000442   .0075204    -0.01   0.995     -.014784    .0146955
            Post_avg |  -.0803539   .0189576    -4.24   0.000    -.1175101   -.0431978
                 Tm3 |   .0267278   .0140657     1.90   0.057    -.0008404     .054296
                 Tm2 |  -.0036165   .0129283    -0.28   0.780    -.0289555    .0217226
                 Tm1 |   -.023244   .0144851    -1.60   0.109    -.0516343    .0051463
                 Tp0 |  -.0210604   .0114942    -1.83   0.067    -.0435886    .0014679
                 Tp1 |  -.0530032   .0163465    -3.24   0.001    -.0850417   -.0209647
                 Tp2 |  -.1404483   .0353782    -3.97   0.000    -.2097882   -.0711084
                 Tp3 |  -.1069039   .0328865    -3.25   0.001    -.1713602   -.0424476
        ------------------------------------------------------------------------------


        nor from the raw DiD estimates, below,

        Difference-in-difference with Multiple Time Periods

        Number of obs = 2,500
        Outcome model : least squares
        Treatment model: inverse probability
        ------------------------------------------------------------------------------
                     | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
        g2004        |
         t_2003_2004 |  -.0145297   .0221292    -0.66   0.511     -.057902    .0288427
         t_2003_2005 |  -.0764219   .0286713    -2.67   0.008    -.1326166   -.0202271
         t_2003_2006 |  -.1404483   .0353782    -3.97   0.000    -.2097882   -.0711084
         t_2003_2007 |  -.1069039   .0328865    -3.25   0.001    -.1713602   -.0424476
        -------------+----------------------------------------------------------------
        g2006        |
         t_2003_2004 |  -.0004721   .0222234    -0.02   0.983    -.0440293     .043085
         t_2004_2005 |  -.0062025   .0184957    -0.34   0.737    -.0424534    .0300484
         t_2005_2006 |   .0009606   .0194002     0.05   0.961    -.0370631    .0389843
         t_2005_2007 |  -.0412939   .0197211    -2.09   0.036    -.0799466   -.0026411
        -------------+----------------------------------------------------------------
        g2007        |
         t_2003_2004 |   .0267278   .0140657     1.90   0.057    -.0008404     .054296
         t_2004_2005 |  -.0045766   .0157178    -0.29   0.771    -.0353828    .0262297
         t_2005_2006 |  -.0284475   .0181809    -1.56   0.118    -.0640814    .0071864
         t_2006_2007 |  -.0287814    .016239    -1.77   0.076    -.0606091    .0030464
        ------------------------------------------------------------------------------
        Control: Never Treated


        I hope you can help me and I am sorry for bothering you again!



        • Originally posted by Gabor Mugge
          To add: the following error message also appears after calling 'estat, event' with the 'long2' option specified beforehand:
          Code:
           csdid_event(): 3301 subscript invalid
          <istmt>: - function returned error

          Hi Gabor,
          Sorry it took this long to answer.
          I think the problem may come from the different way I am implementing drimp now. There are two options:
          1) Because you have no covariates, you would do better using method(reg).
          2) If you add covariates, you could use method(dripw), although you do not have enough observations to add more than 1 or 2 controls to your model.

          Hope this helps
          F



          • Hi Leon
            I think this has to do with the different way I am now estimating drimp, which is very sensitive. In addition, in earlier versions I wasn't using the correct control groups when estimating pre-treatment effects and when there were no never-treated observations.

            Now, if you are using no controls, you would do better with method(reg); if you do use controls, use method(dripw).
            HTH



            • Hi Georgina
              When you use "cevent, window(#1 #2)" you are simply getting the weighted average of all ATTGTs between #1 and #2.
              For example, window(-3 -1) gives the average of all pre-treatment ATTGTs, weighted by how many observations were used in each ATTGT.
              You can see the weights I use if you type "ereturn display".

              Now, the difference with Pre_avg and Post_avg is that those are simple averages: Pre_avg, for example, gives the same weight to Tm3, Tm2, and Tm1.

              Hope this helps
              Fernando
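
              The two kinds of averaging can be checked by hand against the mpdta output quoted earlier in the thread. The sketch below assumes that the weights are proportional to cohort size and that mpdta.dta has 40 counties first treated in 2006 and 131 first treated in 2007. Neither number appears in the thread, so treat both as assumptions. The ATTGT values are copied from the quoted csdid table, and the scalar names are made up for illustration.

              ```stata
              * ASSUMPTION: cohort sizes (number of counties), not given in the thread
              scalar n2006 = 40
              scalar n2007 = 131

              * Pre-treatment ATTGTs copied from the quoted csdid output
              * event time -3: g2007 t_2003_2004
              scalar a07_m3 =  .0267278
              * event time -2: g2006 t_2003_2004 and g2007 t_2004_2005
              scalar a06_m2 = -.0004721
              scalar a07_m2 = -.0045766
              * event time -1: g2006 t_2004_2005 and g2007 t_2005_2006
              scalar a06_m1 = -.0062025
              scalar a07_m1 = -.0284475

              * estat event: size-weighted average within each event time
              scalar Tm3 = a07_m3
              scalar Tm2 = (n2006*a06_m2 + n2007*a07_m2) / (n2006 + n2007)
              scalar Tm1 = (n2006*a06_m1 + n2007*a07_m1) / (n2006 + n2007)

              * Pre_avg: equal weight on Tm3, Tm2, and Tm1
              scalar pre_avg = (Tm3 + Tm2 + Tm1) / 3

              * estat cevent, window(-3 -1): one size-weighted average over all five ATTGTs
              scalar attc = (n2007*(a07_m3 + a07_m2 + a07_m1) + n2006*(a06_m2 + a06_m1)) / (3*n2007 + 2*n2006)

              display "Tm2     = " %10.7f scalar(Tm2)
              display "Tm1     = " %10.7f scalar(Tm1)
              display "Pre_avg = " %10.7f scalar(pre_avg)
              display "ATTC    = " %10.7f scalar(attc)
              ```

              Under these assumptions the displayed values should agree, up to rounding, with the reported Tm2 (-.0036165), Tm1 (-.023244), Pre_avg (-.0000442), and ATTC (-.0023082). That is consistent with the explanation above, but it does not by itself show how csdid builds its weights.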



              • Thanks a lot, Fernando for this explanation!

                So just to better understand, the csdid command estimates and averages ATTGTs but the user can choose different methodologies to do so (and traditional OLS is one of them for estimating the ATTGTs)? Is there any reference that formally discusses what you wrote above (use OLS when you have no controls)?

                Thanks again!



                • Hi Leon
                  Yes, one can choose different methodologies, but OLS as it is usually implemented (as a TWFE regression) is NOT one of them. What it does is, roughly, estimate an OLS regression for each cell (treated/untreated x before/after) and then get the DID as usual, using the predicted values.

                  For formal references, see Sant'Anna and Zhao (2020), where they explain all the estimators; you can easily derive that all of them collapse into the standard 2x2 DID when there are no controls.
                  F
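
                  As a rough illustration of the no-controls case, the sketch below simulates a single 2x2 setting (all variable names and parameter values are hypothetical) and computes the DID twice: from the four cell means, which is what an intercept-only OLS per cell would predict, and from a saturated regression. It illustrates the standard 2x2 DID only and is not csdid's internal code.

                  ```stata
                  * Simulated data; every name and number here is hypothetical
                  clear
                  set seed 12345
                  set obs 400
                  generate byte treat = _n > 200
                  generate byte post  = mod(_n, 2)
                  generate y = 1 + 0.5*treat + 0.3*post - 0.2*treat*post + rnormal()

                  * With no covariates, "OLS per cell" is just the cell mean
                  summarize y if treat == 1 & post == 1, meanonly
                  scalar m11 = r(mean)
                  summarize y if treat == 1 & post == 0, meanonly
                  scalar m10 = r(mean)
                  summarize y if treat == 0 & post == 1, meanonly
                  scalar m01 = r(mean)
                  summarize y if treat == 0 & post == 0, meanonly
                  scalar m00 = r(mean)

                  * DID from the predicted (mean) outcomes
                  scalar did_means = (m11 - m10) - (m01 - m00)

                  * The interaction coefficient of a saturated regression is the same quantity
                  regress y i.treat##i.post
                  display "cell means: " scalar(did_means) "   regression: " _b[1.treat#1.post]
                  ```

                  The two displayed numbers should coincide up to floating-point precision, because the saturated regression reproduces the four cell means exactly.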



                  • Hi FernandoRios
                    I am reaching out to you regarding your csdid Stata command for implementing staggered treatment. My problem is that when I run the code, I get back omitted results (see attached pdf).
                    I wanted to improve on my analysis using the new developments in DiD, beyond my preliminary analysis, which used the following code:
                    reghdfe edyrtotal interactionspost_f5 [pweight= perweight] if byear>1983 & compprimSample==1, abs(ethnicityug religion distrikt birthyear) vce(cl clustervar)

                    Instead, I run the following code:
                    csdid edyrtotal interactionspost_f5 if byear>1983 & compprimSample==1, time(birthyear) gvar(first_treat) method(dripw) reps(20) cluster(clustervar)

                    Note: the data (link) is a single survey wave treated as a repeated cross-section (rc), with treated individuals born 1990-1997 (see the first_treat variable) and controls born 1984-1989.

                    I do not know what I am getting wrong in my setup, and any guidance you might offer is highly appreciated.
                    Attached Files



                    • Hi Doug
                      I think the problem is that you are using a setup that is not compatible with csdid.
                      Basically, with reghdfe you include the treatment-post interaction because its coefficient is what gives you the ATT.
                      With csdid, the equivalent is to use NO controls. Adding the interaction as a covariate creates multicollinearity problems, which is what causes it to crash, just as you report.
                      HTH



                      • Hi FernandoRios
                        Thanks for the feedback.
                        How do you reckon I should proceed in setting up the specification to take advantage of CSDID?
                        Note: my treatment/control indicator var is Post while treatment is non-binary (using an intensity measure - pIntenf5 and is by geographic area).
                        Treatment itself can be staggered based on cohorts or not (all born after 1989 are eligible for treatment)



                        • Hi Fernando,

                          I was wondering how the reported ‘number of obs’ is calculated for the csdid command? I need to run additional analysis on the exact sample used in the csdid regression, so after the csdid command I ran ‘keep if e(sample)==1’, which keeps exactly as many observations as ‘number of obs’ indicated.

                          But I noticed that many observations were missing from the resulting sample: in particular, all the in-sample observations were from pre-treatment cohort-years only, though I can see from the output that post-treatment observations were also used. (To confirm this, I re-ran the csdid command only on the observations that were in-sample, and got very different results.)

                          Would really appreciate any help on this, thanks!
                          Last edited by Zara Contractor; 21 Jul 2022, 18:03.



                          • Hi FernandoRios,

                            Following up on Zara's question, the attached toy data (in which I use the names of mpdta.dta although it is NOT mpdta.dta) emphasizes the problem she pointed out.

                            In particular, the code

                            use "toy_mpdta.dta", clear

                            csdid lemp lpop lpop_sq, ivar(countyreal) time(year) gvar(first_treat) method(dripw) notyet
                            * Number of obs = 409 observations

                            keep if e(sample)==1
                            * but if we keep the 409 observations,
                            csdid lemp lpop lpop_sq, ivar(countyreal) time(year) gvar(first_treat) method(dripw) notyet
                            * Number of obs = 363 and the results change


                            leads to different estimates. Unlike Zara, I do not see that the in-sample observations were coming ONLY from pre-treatment cohort-years. However, I also need to run additional analysis on the exact sample used in the csdid regression and I don't know how to proceed.

                            Thank you so much for your thorough responses to all previous queries and any guidance with this would be extremely helpful.
                            Attached Files
                            Last edited by noriko amanop; 25 Jul 2022, 10:36.



                            • Hi Noriko
                              Thank you for the replicable example. It took some time, but I found the reason for this. Unfortunately, I don't see how to fix it within the code, other than by doing additional cross-checks on the data (which would make the code slower).
                              So here is the problem.
                              1) You have an extremely unbalanced dataset. That is normally not a problem, because csdid uses locally balanced data.
                              2) The problem: your data is badly balanced.
                              What I mean by this is that for many (possibly all??) of your units, you do not observe them "WHEN" they were treated.

                              Code:
                              year    unit_id    countyreal    first_treat        lpop     lpop_sq        lemp
                              2006       6865           266           1998    5.379897    28.94329    5.990784
                              2010     550632           266           1998    4.394449    19.31118    18.51852
                              Consider the case above with countyreal=266. Based on your data this unit was treated in 1998, but you only observe it in 2006 and 2010. Technically, this data is unusable for DID.
                              You will also see that your data is not panel either. If you try to do "xtset countyreal year" it will give you a warning. CSDID isn't catching that.

                              So, 1) make your data a panel, or treat it as repeated cross-section;
                              2) make sure that, if it is a panel, your units are observed in the year of treatment and in the year before treatment. Otherwise, the results will not make sense.

                              HTH
                              Fernando
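
                              One way to run the two checks described above before calling csdid is sketched below. The first two rows are unit 266 from the listing above; units 301 and 302 and the variable names seen_g, seen_gm1, and usable are hypothetical additions for illustration. The sketch assumes csdid's convention that the gvar equals 0 for never-treated units.

                              ```stata
                              * Toy data: unit 266 is from the thread, units 301 and 302 are made up
                              clear
                              input year countyreal first_treat
                              2006 266 1998
                              2010 266 1998
                              2003 301 2004
                              2004 301 2004
                              2005 301 2004
                              2003 302 0
                              2004 302 0
                              end

                              * Was each unit observed in its treatment year and in the year before?
                              bysort countyreal: egen byte seen_g   = max(year == first_treat)
                              bysort countyreal: egen byte seen_gm1 = max(year == first_treat - 1)

                              * first_treat == 0 marks never-treated units, so they pass the check
                              generate byte usable = (seen_g & seen_gm1) | first_treat == 0
                              list countyreal year first_treat usable, sepby(countyreal)

                              * A panel needs unique unit-year pairs; surplus copies are reported here
                              duplicates report countyreal year
                              ```

                              In this toy example unit 266 is flagged with usable = 0, while the two hypothetical units get usable = 1. Whether flagged units should be dropped or the data treated as repeated cross-section is a judgment call that depends on the application.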




                              • Thank you so much, Fernando. This makes a lot of sense.

                                In my actual data, I do observe all units the year they were treated... However, I may not be able to observe them all the year before...

                                I will try to think of ways to address this, but I really appreciate your help. Thank you!!
