  • fuzzydid troubleshoot

    I have data on a binary policy that is aggregated to a neighborhood level, meaning that for each neighborhood I have a variable for the percentage of people who are treated within the group, p_treat. The policy is implemented at the same time for all individuals, and thus all neighborhoods, and I have two pre-periods and two post-periods in my dataset. As all neighborhoods have at least one individual who is treated, I want to implement the fuzzydid command proposed by de Chaisemartin and D'Haultfoeuille.

    Since I don't have only two groups and two periods, I know I need to create G_T and G_T1 groups to run this. I implement the code in their help file,

    Code:
    sort c_id Year
    by c_id Year: egen mean_treat = mean(p_treat)
    by c_id: gen lag_mean_treat = mean_treat[_n-1] if c_id == c_id[_n-1] & Year - 2 == Year[_n-1]
    gen g_t = sign(mean_treat - lag_mean_treat)
    gen g_t1 = g_t[_n+1] if c_id==c_id[_n+1] & Year +2 == Year[_n+1]
    where c_id is the unique id for each neighborhood, and I use "Year - 2" and "Year + 2" because my data is biennial (2011, 2013, 2015, 2017). If I understand the command correctly, I want g_t = 1 for each neighborhood in the third period (the first treated period) and g_t = 0 in all other periods, and g_t1 = 1 for each neighborhood in the second period (the last period before treatment) and g_t1 = 0 in all other periods. This is what happens when I run their code, so far so good. To make that work, I altered my p_treat variable so that it equals 0 in the first two periods. The issue comes in when I run the fuzzydid command,

    Code:
    fuzzydid Y g_t g_t1 Year p_treat, did tc cic cluster(c_id)
    and get the following output,

    Estimator(s) of the local average treatment effect with bootstrapped standard errors. Cluster variable: c_id. Number of observations: 8708 .

                 |      LATE    Std_Err          t    p_value   lower_ic   upper_ic
    -------------+------------------------------------------------------------------
           W_DID |         .          .          .          .          .          .
            W_TC |         .          .          .          .          .          .
           W_CIC |         .          .          .          .          .          .
    Apologies for not knowing how to format that better, but essentially I get a blank table. The program does appear to be doing something, as it takes a few moments to run the default 50 bootstrap replications, but isn't working as I would hope.

    As I wasn't sure if making p_treat = 0 in the pre-periods was causing the problem, I've re-run the regression but this time I simply created g_t and g_t1 to be as described above. When I run it like this, I get the following error,

    Given the data structure, impossible to estimate tc, cic or lqte. Often, this error arises because the treatment takes too many values. It can then be solved using newcateg.
    When I run the regression only for the WALD_did estimator, I get the same blank table as before. When I try and utilize newcateg it gives me the same error, even when I divide p_treat into 3 groups. Any help or tips would be appreciated.
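
    For reference, the direct construction of g_t and g_t1 described above can be sketched on a toy panel. This is only a minimal sketch: the three-neighborhood dataset is made up, and it assumes treatment starts in 2015 for every neighborhood, as in the setup above. It reproduces the indicator pattern described and nothing more; it does not by itself explain the empty table.

    ```stata
    * Toy panel (hypothetical): 3 neighborhoods observed in 2011, 2013, 2015, 2017
    clear
    set obs 3
    gen c_id = _n
    expand 4
    bysort c_id: gen Year = 2009 + 2*_n

    * Assumption: treatment starts in 2015 for every neighborhood
    gen byte g_t  = (Year == 2015)   // 1 in the first treated period, 0 otherwise
    gen byte g_t1 = (Year == 2013)   // 1 in the last pre-treatment period, 0 otherwise

    list c_id Year g_t g_t1, sepby(c_id)
    ```

    Listing the result by neighborhood makes it easy to confirm that g_t1 in one period matches g_t in the next period before passing both variables to fuzzydid.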

  • #2
    Hi Noah,
    I'm having the same issue with Stata 16.1 SE on Linux (Ubuntu).

    Have you installed moremata? fuzzydid depends on it.
    Code:
    ssc install moremata
    The command wasn't working on my data, so I tried to replicate the example provided in this document, at the end of page 16: http://crest.fr/ckfinder/userfiles/f...ydid_Stata.pdf

    The data can be found on https://ideas.repec.org/c/boc/bocode/s458549.html by clicking 'download' and then selecting 'turnout_dailies_1868-1928.dta' at the bottom of the page.

    Then, I run the following script:
    Code:
    use "replication_fuzzydid/turnout_dailies_1868-1928.dta", clear
    
    sum pres_turnout numdailies // check that you have the same output as on the paper
    
    gen G1872=(fd_numdailies>0) if (year==1872) & fd_numdailies!=. & fd_numdailies>= 0 & sample==1
    sort cnty90 year
    replace G1872=G1872[_n+1] if cnty90==cnty90[_n+1] & year==1868
    
    fuzzydid pres_turnout G1872 year numdailies, did tc cic newcateg(0 1 2 45) breps(200) cluster(cnty90)
    
    gen numdailies_bin = (numdailies >= 1)
    
    fuzzydid pres_turnout G1872 year numdailies_bin, lqte breps(200) cluster(cnty90)
    but I get the following result:

    Code:
    (running estim_wrapper on estimation sample)
    
    Bootstrap replications (200)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100
    ..................................................   150
    ..................................................   200
    
    Estimator(s) of the local average treatment effect with bootstrapped standard errors.  Cluster variable:
    cnty90.  Number of observations:  1424 .
    
                 |      LATE    Std_Err          t    p_value   lower_ic   upper_ic
    -------------+------------------------------------------------------------------
           W_DID |         .          .          .          .          .          .
            W_TC |         .          .          .          .          .          .
           W_CIC |         .          .          .          .          .          .
    I also get empty tables for the other examples provided in the paper.

    Could you please give us more info on your Stata version and OS, and try to replicate the same example as I did?

    • #3
      This works on my end


      Code:
      use "http://fmwww.bc.edu/repec/bocode/t/turnout_dailies_1868-1928.dta", clear
      
      
      sum pres_turnout numdailies // check that you have the same output as on the paper
      
      gen G1872=(fd_numdailies>0) if (year==1872) & fd_numdailies!=. & fd_numdailies>= 0 & sample==1
      sort cnty90 year
      replace G1872=G1872[_n+1] if cnty90==cnty90[_n+1] & year==1868
      
      fuzzydid pres_turnout G1872 year numdailies, did tc cic newcateg(0 1 2 45) breps(200) cluster(cnty90)
      
      gen numdailies_bin = (numdailies >= 1)
      
      fuzzydid pres_turnout G1872 year numdailies_bin, lqte breps(200) cluster(cnty90)
      
      
      Estimator(s) of the local average treatment effect with bootstrapped standard errors.
      Cluster variable: cnty90.  Number of observations:  1424 .
      
                   |      LATE    Std_Err          t    p_value   lower_ic   upper_ic
      -------------+------------------------------------------------------------------
             W_DID |  .0047699   .0160903   .2964428    .766892  -.0230387   .0377381
              W_TC |  .0266618   .0164816   1.617671   .1057335  -.0021458   .0586236
             W_CIC |  .0133223   .0132744   1.003613   .3155653  -.0116416   .0348834
      Code:
      . about
      
      Stata/SE 16.1 for Mac (Intel 64-bit)
      Revision 06 Apr 2021
      Copyright 1985-2019 StataCorp LLC
      Also see

      https://www.statalist.org/forums/for...mmand-fuzzydid

      • #4
        Hi Justin,
        Unfortunately, your example does not work for me. I also updated Stata to the latest version, but it didn't change anything: the tables are still empty. Could it be a problem on Linux only?

        Code:
        . about
        
        Stata/SE 16.1 for Unix (Linux 64-bit x86-64)
        Revision 06 Apr 2021
        Copyright 1985-2019 StataCorp LLC
        
        Total usable memory: 31.03 GB

        • #5
          Possibly. I'm not sure what else it could be. I was able to successfully run the code in #3 on my other machine.

          Code:
           about
          
          Stata/SE 16.1 for Windows (64-bit x86-64)
          Revision 06 Apr 2021
          Copyright 1985-2019 StataCorp LLC

          • #6
            I have a problem that I think the statisticians here can help me with. I am analyzing cash transfer data where we have treatment and control groups, and I have used difference-in-differences in my work. However, at some point the control group also received some cash transfers, although limited, which means that it is not a pure control group. My question is: how do I measure the intensity of the program? Is there a better way than fuzzydid, or can you help me better understand fuzzydid in my scenario? Thank you.

            • #7
              Hello,

              I study the results of an RCT that tests a behavioral nudge in the area of energy efficiency of residential buildings. I have single-family homes randomly allocated between a treatment group and a control group. The homes in the treatment group received treatment messaging with their energy bills, and I want to test how the messaging affected their gas consumption.

              The treatment began by including the messaging on the February 2018 billing cycle (the intervention was also repeated on the March, April and November billing cycles of 2018). The households in the treatment group were on different billing cycles, so not all treated households received the treatment messaging on the same day. I mean, all the households got their first treatment during the February 2018 billing cycle, but the exact dates they got the treatment were different for the households: for example, the sub-group of the treated households on the billing cycle 1 received the treatment on February 8th, another sub-group on the billing cycle 2 got the treatment on February 17th, etc.

              The Stata command fuzzydid could be helpful in my case with multiple treatment start dates. My problem is that I do not fully understand how to apply the command to my setup. Specifically, I am confused about the T and G variables.

              fuzzydid Y G T D
              Y, outcome: this is the hourly gas consumption of a single-family home.
              T, time: a calendar day (I have about a year of pre-treatment hourly energy consumption data, and roughly a year of post-treatment data).
              D, treatment: a dummy variable that is equal to 1 if a treated household received the treatment, and 0 otherwise (and it is always 0 for the houses in the control group).
              G, group: this is problematic.
              I tried using a household ID here, but I got “When only one G variable is specified by the user, that variable must either be equal to 0, 1, or be missing”.
              Then I tried to group the treated households into around 25 sub-groups depending on when they received their first treatment. However, I got “With more than two time periods, the forward and backward group identifiers Gb and Gf must be used”. I tried to create those two variables as well (G_T and G_Tplus) based on the algorithms described in the Stata Journal paper, but they did not really make sense in my case because my D variable is either 0 or 1 (unless I am just missing something, which is very likely).

              I’d appreciate any help in terms of how I could use the command in my setup.

              • #8
                Originally posted by Peter Ndirangu (post #6)
                Are your units randomly assigned to treatment? Is there a clear before and after period? And finally, is there a clear eligibility index and a threshold above or under which a unit is eligible for treatment?

                • #9
                  Originally posted by Katherine Adams (post #7)
                  I also had similar problems with this command. fuzzydid really requires you to run the code on pages 13 and 14 of the following paper: https://faculty.crest.fr/xdhaultfoeu...ydid_stata.pdf and adapt it to your data if necessary. If you already have, then unfortunately I do not know what went wrong...

                  On a side note, one thing that puzzles me is that you want to run a DiD although assignment to treatment is randomised. How come? You presumably have a reason, I'm simply curious.


                  • #10
                    Hello Maxence,

                    Thank you for your reply.

                    Yes, my identification strategy is an RCT. To estimate the results of the RCT, I initially used the difference-in-differences specification: Y = a0 + a1*Tr*Post-tr + time FE + location FE. But then I came across the de Chaisemartin et al. study saying that the a1 coefficient will not give me an accurate estimate of the treatment effect if the treatment starts at different times for the treated units. So I started digging into the problem and found out about the fuzzydid Stata command that de Chaisemartin et al. developed.
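
                    To make the specification above concrete, here is a minimal sketch on simulated data. Everything in it is hypothetical: the variable names (hh, treated, post, did, loc, period), the sample sizes, and the assumed effect of 0.5 are made up for illustration and are not taken from the actual study. It also uses a single common start date, so it shows only the plain two-way fixed effects regression, not the staggered-timing issue.

                    ```stata
                    * Simulated panel (hypothetical): 200 households, 4 periods, 5 locations
                    clear
                    set seed 12345
                    set obs 200
                    gen hh = _n
                    gen byte treated = (hh <= 100)      // Tr: household in the treatment group
                    gen loc = mod(hh, 5)                // location identifier
                    expand 4
                    bysort hh: gen period = _n
                    gen byte post = (period >= 3)       // Post-tr: periods after treatment starts
                    gen byte did = treated * post       // Tr*Post-tr interaction

                    * Outcome with an assumed treatment effect of 0.5
                    gen y = 1 + 0.5*did + 0.1*period + 0.2*loc + rnormal(0, 0.5)

                    * a1 is the coefficient on did; time and location FE enter as factor variables
                    regress y did i.period i.loc, vce(cluster hh)
                    ```

                    Clustering at the household level is a choice made for this sketch; the appropriate level depends on how treatment was assigned.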

                    • #11
                      Originally posted by Leonard Martin (post #4)
                      It is indeed a problem on Linux only. Did you manage to solve it somehow?
