
  • Differences between results from 'csdid' command and 'did' package in R

    Dear @FernandoRios,

Sorry to bother! I'm hoping you might be able to help me with an issue I'm having with the 'csdid' command in Stata.

I'm trying to implement the Callaway & Sant'Anna estimator for a staggered difference-in-differences design. I have used the R package 'did' with the function 'att_gt()', as well as the Stata command 'csdid'. I have about 12 different outcome variables. Essentially I can't make the results match: for some outcome variables, results are very similar, but for others they are quite different.

    I've attached a dataset for one variable. I'm using the Stata command:

    Code:
    csdid tempO ln_GNI_pc ln_wdi_pop, ivar(ccode) time(year) gvar(firstZyear) method(dripw)
    estat group, post
    and the R function:

    Code:
attgt <- att_gt(yname  = "tempO",
                tname  = "year",
                idname = "ccode",
                gname  = "firstZyear",
                data   = raw,
                xformla = ~ ln_GNI_pc + ln_wdi_pop + 1)
aggte(attgt, type = "group", na.rm = TRUE)
The results are a bit different and I just can't work out why. I've also experimented with the 'notyet' and 'asinr' options, which change the Stata results a bit but still don't match R. I've also tried all of the different 'method()' options, but again the results don't line up.
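For reference, here is the option mapping I've been assuming between the two commands, written as an R sketch (I'm not certain these pairings are exact):

Code:
# Sketch of the pairings I've been assuming (not guaranteed exact):
#   csdid default controls  <->  att_gt(control_group = "nevertreated")
#   csdid 'notyet'          <->  att_gt(control_group = "notyettreated")
#   csdid method(dripw)     <->  att_gt(est_method = "dr")
library(did)
attgt_ny <- att_gt(yname  = "tempO",
                   tname  = "year",
                   idname = "ccode",
                   gname  = "firstZyear",
                   data   = raw,
                   xformla = ~ ln_GNI_pc + ln_wdi_pop,
                   control_group = "notyettreated",  # mirrors 'notyet'
                   est_method    = "dr")             # mirrors method(dripw)?
aggte(attgt_ny, type = "group", na.rm = TRUE)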

    Do you have any suggestions?

    Thanks a lot!
    Rory
    Attached Files

  • #2
Hard to say. Is the data balanced?
What happens if you use reg as the method (outcome regression)?
Are the problems there for all pre and post ATTs?
Could you replicate this using the example dataset?
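For that last check, a minimal sketch (I believe mpdta, shipped with the did package, is the same example data used in the csdid helpfile):

Code:
# Cross-check both implementations on the did package's example data.
library(did)
data(mpdta)
out <- att_gt(yname  = "lemp",
              tname  = "year",
              idname = "countyreal",
              gname  = "first.treat",
              data   = mpdta,
              est_method = "reg")   # outcome regression, like method(reg)
summary(out)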

    Comment


    • #3
      Hi Fernando,

      Thanks for replying and I'm sorry for the delay in getting back to you!

      Yes, the data is balanced.

      So when I use reg for both R and Stata, the point estimates are the same actually! (although the standard errors are a little different).

When both are set to 'ipw' or doubly robust, the point estimates are different, including the group averages and dynamic averages (post ATTs, I mean; the output from att_gt() in R doesn't seem to show pre ATTs?)

      I tried to replicate the problem with the example dataset, but in that case the R/Stata output does align.

Maybe the screenshots below will help though. When both are set to doubly robust ('dripw' in Stata) and I use 'estat group' (which is my aggregation of interest), the group estimates for three cohorts are identical between R and Stata. Only for the 2001 cohort does Stata produce an estimate while R shows all 'NA's in the output. R then gives an overall average of 0.37 while Stata gives 'omitted'.
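In case it helps, this is how I'm pulling the group-time estimates out of the R object to compare against the Stata table (a sketch; 'attgt' is the att_gt() result from my first post):

Code:
# Tabulate every ATT(g,t) from the R side; NAs show which cells R drops.
gt_table <- data.frame(group = attgt$group,
                       time  = attgt$t,
                       att   = attgt$att,
                       se    = attgt$se)
print(gt_table)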

      Thanks again for your help!

[Attachment: R output.png]

[Attachment: Stata output.png]

      Comment


      • #4
OK, that gives the clue.
You see how Stata produces a 2001 result, but R does not? I think there are other in-code decisions about how to use or not use the data that may explain the differences.
So, unfortunately, there is nothing to be done about it other than an in-depth exploration of each 2x2 case to see where the differences arise.
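For one 2x2, something like this (a sketch using the variable names from your R call, and assuming never-treated units have firstZyear == 0):

Code:
# Re-estimate a single 2x2 by hand: the 2001 cohort vs the never-treated,
# using only the base period (2000) and the post period (2001).
library(did)
sub <- subset(raw, firstZyear %in% c(0, 2001) & year %in% c(2000, 2001))
one2x2 <- att_gt(yname  = "tempO",
                 tname  = "year",
                 idname = "ccode",
                 gname  = "firstZyear",
                 xformla = ~ ln_GNI_pc + ln_wdi_pop,
                 data   = sub,
                 est_method = "dr")
summary(one2x2)   # compare with the matching cell of the csdid table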

        Comment


        • #5
Thanks for this. Beyond comparing each 2x2 by hand like that, would I have to go into the code of each command?

          Comment


          • #6
I've been running the same model in R and Stata using did and csdid (or csdid2), and I'm seeing different results as well, especially in the event-study plots. This is worrying from a replicability standpoint, since it suggests that the choice of software can affect conclusions even with the same specification and data. Is one of these sets of results right and the other wrong?
            Last edited by Livia Almeida; 21 May 2025, 09:04.

            Comment


            • #7
              Can you share more of the setup?
One of the things that could make a difference is that R's version handled small cohorts differently than Stata's csdid2.
If you are getting very different results, it may be due to specifics of your data
(time-varying covariates, small cohorts, overspecification, etc.).
Without more info it's hard to say which is causing this.
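For the small-cohort question, a quick check (sketch, borrowing the variable names from #1):

Code:
# Count units per treatment cohort; very small cohorts are a common
# source of divergence between the two implementations.
one_row <- raw[!duplicated(raw$ccode), ]   # keep one row per unit
table(one_row$firstZyear)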

              Comment


              • #8
                Dear Fernando,

1. Could you please explain why there is a 2001 result in Stata but not in R? How can I make the ATT for the treatment year NULL, as in R?

Originally posted by FernandoRios:
OK, that gives the clue.
You see how Stata produces a 2001 result, but R does not? I think there are other in-code decisions about how to use or not use the data that may explain the differences.
So, unfortunately, there is nothing to be done about it other than an in-depth exploration of each 2x2 case to see where the differences arise.
2. I would also appreciate it if you could tell me whether the two commands below are running exactly the same analysis on repeated cross-sectional data. I see very tiny differences in the confidence intervals, like a 0.001 difference. Since we do not explicitly state that the data are a repeated cross-section, I want to be sure. Thanks in advance!

                Code:
csdid depression gender migration, ///
    cluster(education) time(year) ///
    gvar(first_treat) method(dripw) long
                Code:
                m1 <- att_gt(
                  yname = "depression",
                  gname = "first_treat",
                  idname = "id_num",
                  tname = "year",
                  xformla = ~ gender + migration,
                  data = sample_18,
                  panel = FALSE,
                  est_method = "dr",
                  base_period = "universal",
                  control_group = c("nevertreated"),
                  clustervars = c("education")
                )

                Comment


                • #9
                  1. Almost. Universal should be equivalent to long2
                  2. SE in Stata are the asymptotic. not Bootstrap, you need wboot to call for those.
                  in R, Wboot are the default.
                  3. Most likely 2001 is very small.
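On point 2, a sketch of how to put the SEs on the same footing (as far as I recall, did requires the bootstrap whenever you cluster on more than the unit id, so I drop clustervars here):

Code:
# Switch off the multiplier bootstrap on the R side to get analytic SEs,
# comparable to csdid's default; alternatively, add wboot in Stata.
m1_analytic <- att_gt(yname  = "depression",
                      gname  = "first_treat",
                      idname = "id_num",
                      tname  = "year",
                      xformla = ~ gender + migration,
                      data   = sample_18,
                      panel  = FALSE,
                      est_method  = "dr",
                      base_period = "universal",
                      control_group = "nevertreated",
                      bstrap = FALSE)   # analytic instead of bootstrap SEs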

                  Comment


                  • #10
                    Dear Fernando,

Thank you for your reply. I have a follow-up question on clustering. I'm encountering a puzzling issue with the csdid command, where clustering produces standard errors that are much smaller than the non-clustered standard errors:

                    Code:
                    // CSDID without clustering
                    csdid depression [controls], time(academic_year) gvar(first_treat) method(dripw) long2
                    // SE ≈ 0.000064
                    
                    // CSDID with clustering  
                    csdid depression [controls], time(academic_year) gvar(first_treat) method(dripw) cluster(higher_edu_year) long2
                    // SE ≈ 0.000007 (10x smaller)
                    What I've verified:
                    1. Clustering variable is correct: 10 balanced clusters with ~55k observations each
                    2. Standard OLS behaves normally:
                    Code:
                    reg depression treatment, robust // SE = 0.0002519 
                    reg depression treatment, cluster(higher_edu_year) // SE = 0.0005737 (appropriately larger)
                    Questions:
                    1. Is this a known issue with csdid clustering implementation with repeated cross sectional data?
                    2. Could this be related to the "few clusters" problem (only 10 clusters)?
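As a cross-check, I'm planning to run the equivalent model on the R side (a sketch; 'df' stands in for my data, and the controls go where csdid has [controls]):

Code:
# Same model in R's did, clustering on higher_edu_year; did computes
# clustered SEs via the multiplier bootstrap, which may behave
# differently with only 10 clusters.
library(did)
chk <- att_gt(yname  = "depression",
              tname  = "academic_year",
              gname  = "first_treat",
              xformla = NULL,              # put the same controls as in csdid
              data   = df,
              panel  = FALSE,
              est_method  = "dr",
              base_period = "universal",   # matches long2, per #9
              clustervars = "higher_edu_year",
              bstrap = TRUE)
summary(chk)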
                    Any insights would be greatly appreciated.

                    Best regards,
                    Nursena

                    Comment
