fuzzydid troubleshoot

Noah Braun

Join Date: Apr 2021

Posts: 6
#1

12 Apr 2021, 13:44

I have data on a binary policy that is aggregated to a neighborhood level, meaning that for each neighborhood I have a variable for the percentage of people who are treated within the group, p_treat. The policy is implemented at the same time for all individuals, and thus all neighborhoods, and I have two pre-periods and two post-periods in my dataset. As all neighborhoods have at least one individual who is treated, I want to implement the fuzzydid command proposed by de Chaisemartin and D'Haultfoeuille.

Since I don't have only two groups and two periods, I know I need to create G_T and G_T1 groups to run this. I implement the code in their help file,

Code:

sort c_id Year
by c_id Year: egen mean_treat = mean(p_treat)
by c_id: gen lag_mean_treat = mean_treat[_n-1] if c_id == c_id[_n-1] & Year - 2 == Year[_n-1]
gen g_t = sign(mean_treat - lag_mean_treat)
gen g_t1 = g_t[_n+1] if c_id==c_id[_n+1] & Year +2 == Year[_n+1]

where c_id is the unique id for each neighborhood, and I use "Year - 2" and "Year + 2" because my data are biennial (2011, 2013, 2015, 2017). If I understand the command correctly, I want g_t = 1 for each neighborhood in the third period (the first treated period) and g_t = 0 in all other periods, and I want g_t1 = 1 for each neighborhood in the second period (the last period before treatment) and g_t1 = 0 in all other periods. This is what happens when I run their code, so far so good. To make that work, I altered my p_treat variable so that it equals 0 in the first two periods. The issue arises when I run the fuzzydid command,

Code:

fuzzydid Y g_t g_t1 Year p_treat, did tc cic cluster(c_id)

and get the following output,

Estimator(s) of the local average treatment effect with bootstrapped standard errors.
Cluster variable: c_id.  Number of observations:  8708 .

             |      LATE    Std_Err          t    p_value   lower_ic   upper_ic
-------------+------------------------------------------------------------------
       W_DID |         .          .          .          .          .          .
        W_TC |         .          .          .          .          .          .
       W_CIC |         .          .          .          .          .          .

Apologies for not knowing how to format that better, but essentially I get a blank table. The program does appear to be doing something, as it takes a few moments to run the default 50 bootstrap replications, but isn't working as I would hope.
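
One check that may help narrow this down is to tabulate the two group identifiers against Year before calling fuzzydid. The sketch below rebuilds g_t and g_t1 on a small made-up panel, using the same steps as the code above. The two neighborhoods and the p_treat values are assumptions for illustration only. In this toy data g_t comes out missing rather than 0 in 2011, and g_t1 comes out missing in 2017, because there is no earlier or later period to compare with. It may therefore be worth confirming how many observations actually carry non-missing identifiers in the real data.

```stata
clear
set obs 8
gen c_id = ceil(_n/4)                      // two hypothetical neighborhoods
bysort c_id: gen Year = 2009 + 2*_n        // 2011, 2013, 2015, 2017
gen p_treat = cond(Year >= 2015, 0.1*c_id + 0.2, 0)   // made-up shares, 0 before 2015

* Same construction as in the help-file code quoted above
sort c_id Year
by c_id Year: egen mean_treat = mean(p_treat)
by c_id: gen lag_mean_treat = mean_treat[_n-1] if c_id == c_id[_n-1] & Year - 2 == Year[_n-1]
gen g_t  = sign(mean_treat - lag_mean_treat)
gen g_t1 = g_t[_n+1] if c_id == c_id[_n+1] & Year + 2 == Year[_n+1]

* Show where each identifier is 1, 0, or missing
tabulate Year g_t,  missing
tabulate Year g_t1, missing
```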

As I wasn't sure if making p_treat = 0 in the pre-periods was causing the problem, I've re-run the regression but this time I simply created g_t and g_t1 to be as described above. When I run it like this, I get the following error,

Given the data structure, impossible to estimate tc, cic or lqte. Often, this error arises because the treatment takes too many values. It can then be solved using newcateg.

When I run the command for the Wald-DID estimator only, I get the same blank table as before. When I try to use newcateg, it gives me the same error, even when I divide p_treat into 3 groups. Any help or tips would be appreciated.
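
Since the error message points to the number of values the treatment takes, it can help to count the distinct values before and after coarsening. The following is a rough sketch on made-up data. The variable p_cat and the cutoffs 1/3 and 2/3 are hypothetical. The manual recode only illustrates what a three-category treatment looks like and is not a statement about how newcateg works internally.

```stata
clear
set obs 100
gen p_treat = _n/100                       // made-up shares 0.01, ..., 1.00

* Number of distinct treatment values before coarsening
quietly tabulate p_treat
display "distinct values of p_treat: " r(r)

* irecode() returns 0 if p_treat <= 1/3, 1 if 1/3 < p_treat <= 2/3, 2 otherwise
gen p_cat = irecode(p_treat, 1/3, 2/3)

* Number of distinct treatment values after coarsening
quietly tabulate p_cat
display "distinct values of p_cat:   " r(r)
tabulate p_cat
```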

Leonard Martin

Join Date: May 2021
Posts: 2
#2

27 May 2021, 08:47

Hi Noah,
I'm having the same issue with Stata 16.1 SE on Linux (Ubuntu).

Have you installed moremata? fuzzydid depends on it.

Code:

ssc install moremata
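
A slightly more defensive version of the same step, as a sketch: it installs moremata only when its Mata library is not found, then refreshes the Mata library index and reports which fuzzydid.ado Stata will run. It assumes an internet connection for the install, and it is not known whether any of this resolves the blank table.

```stata
* Install moremata only if its Mata library is not already on the adopath
capture findfile lmoremata.mlib
if _rc {
    ssc install moremata, replace
}

* Rebuild the index of Mata libraries so newly installed functions are found
mata: mata mlib index

* Report which fuzzydid.ado Stata will actually run (prints an error if absent)
capture noisily which fuzzydid
```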

The command wasn't working on my data, so I tried to replicate the example provided in this document, at the end of page 16: http://crest.fr/ckfinder/userfiles/f...ydid_Stata.pdf

The data can be found on https://ideas.repec.org/c/boc/bocode/s458549.html by clicking 'download' and then selecting 'turnout_dailies_1868-1928.dta' at the bottom of the page.

Then, I run the following script:

Code:

use "replication_fuzzydid/turnout_dailies_1868-1928.dta", clear

sum pres_turnout numdailies // check that you have the same output as in the paper

gen G1872=(fd_numdailies>0) if (year==1872) & fd_numdailies!=. & fd_numdailies>= 0 & sample==1
sort cnty90 year
replace G1872=G1872[_n+1] if cnty90==cnty90[_n+1] & year==1868

fuzzydid pres_turnout G1872 year numdailies, did tc cic newcateg(0 1 2 45) breps(200) cluster(cnty90)

gen numdailies_bin = (numdailies >= 1)

fuzzydid pres_turnout G1872 year numdailies_bin, lqte breps(200) cluster(cnty90)

but I get the following result:

Code:

(running estim_wrapper on estimation sample)

Bootstrap replications (200)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
..................................................   150
..................................................   200

Estimator(s) of the local average treatment effect with bootstrapped standard errors.  Cluster variable:
cnty90.  Number of observations:  1424 .

             |      LATE    Std_Err          t    p_value   lower_ic   upper_ic
-------------+------------------------------------------------------------------
       W_DID |         .          .          .          .          .          .
        W_TC |         .          .          .          .          .          .
       W_CIC |         .          .          .          .          .          .

I also get empty tables for the other examples provided in the paper.

Could you please give us more information on your Stata version and OS, and try to replicate the same example as I did?
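
For anyone comparing setups, a short sketch like the following collects the relevant details in one place. It uses only built-in commands and c-class values; the findfile line assumes moremata ships its functions in lmoremata.mlib.

```stata
* Stata edition, revision date, and licence details
about

* Operating system and Stata version as seen by Stata itself
display "OS:            `c(os)' (`c(machine_type)')"
display "Stata version: `c(stata_version)', executable dated `c(born_date)'"

* Location and version header of the installed fuzzydid, if any
capture noisily which fuzzydid

* Location of the moremata Mata library, if any
capture noisily findfile lmoremata.mlib
```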

Justin Niakamal

Join Date: Aug 2017
Posts: 760
#3

27 May 2021, 09:23

This works on my end

Code:

use "http://fmwww.bc.edu/repec/bocode/t/turnout_dailies_1868-1928.dta", clear


sum pres_turnout numdailies // check that you have the same output as in the paper

gen G1872=(fd_numdailies>0) if (year==1872) & fd_numdailies!=. & fd_numdailies>= 0 & sample==1
sort cnty90 year
replace G1872=G1872[_n+1] if cnty90==cnty90[_n+1] & year==1868

fuzzydid pres_turnout G1872 year numdailies, did tc cic newcateg(0 1 2 45) breps(200) cluster(cnty90)

gen numdailies_bin = (numdailies >= 1)

fuzzydid pres_turnout G1872 year numdailies_bin, lqte breps(200) cluster(cnty90)


Estimator(s) of the local average treatment effect with bootstrapped standard errors.
Cluster variable: cnty90.  Number of observations:  1424 .

             |      LATE    Std_Err          t    p_value   lower_ic   upper_ic
-------------+------------------------------------------------------------------
       W_DID |  .0047699   .0160903   .2964428    .766892  -.0230387   .0377381
        W_TC |  .0266618   .0164816   1.617671   .1057335  -.0021458   .0586236
       W_CIC |  .0133223   .0132744   1.003613   .3155653  -.0116416   .0348834

Code:

. about

Stata/SE 16.1 for Mac (Intel 64-bit)
Revision 06 Apr 2021
Copyright 1985-2019 StataCorp LLC

Also see

https://www.statalist.org/forums/for...mmand-fuzzydid

Leonard Martin

Join Date: May 2021

Posts: 2
#4

28 May 2021, 06:21

Hi Justin,
Unfortunately, your example does not work for me... I also updated my Stata version to the latest, but it didn't change anything: the tables are still empty. Could it be a problem on Linux only?

Code:

. about

Stata/SE 16.1 for Unix (Linux 64-bit x86-64)
Revision 06 Apr 2021
Copyright 1985-2019 StataCorp LLC

Total usable memory: 31.03 GB

Justin Niakamal

Join Date: Aug 2017

Posts: 760
#5

28 May 2021, 06:50

Possibly. I'm not sure what else it could be. I was able to successfully run the code in #3 on my other machine.

Code:

. about

Stata/SE 16.1 for Windows (64-bit x86-64)
Revision 06 Apr 2021
Copyright 1985-2019 StataCorp LLC

Peter Ndirangu

Join Date: Jul 2021

Posts: 1
#6

01 Jul 2021, 07:27

I have a problem that I think the statisticians here can help me with. I am analyzing cash transfer data with a treatment group and a control group, and I have used difference-in-differences in my work. However, at some point the control group also received some cash transfers, although limited ones, which means that it is not a pure control group. My question is: how do I measure the intensity of the program? Is there a better way than fuzzydid, or can you help me better understand fuzzydid in my scenario? Thank you.

Katherine Adams

Join Date: Jan 2019

Posts: 52
#7

19 Feb 2022, 12:05

Hello,

I study the results of an RCT that tests a behavioral nudge in the area of energy efficiency of residential buildings. I have single-family homes randomly allocated between a treatment group and a control group. The homes in the treatment group received treatment messaging with their energy bills, and I want to test how the messaging affected their gas consumption.

The treatment began by including the messaging on the February 2018 billing cycle (the intervention was also repeated on the March, April and November billing cycles of 2018). The households in the treatment group were on different billing cycles, so not all treated households received the treatment messaging on the same day. I mean, all the households got their first treatment during the February 2018 billing cycle, but the exact dates they got the treatment were different for the households: for example, the sub-group of the treated households on the billing cycle 1 received the treatment on February 8th, another sub-group on the billing cycle 2 got the treatment on February 17th, etc.

The Stata command fuzzydid could be helpful in my case with multiple treatment start dates. My problem is that I do not fully understand how to apply the command to my setup. Specifically, I am confused about the T and G variables.

fuzzydid Y G T D
Y, outcome: this is the hourly gas consumption of a single-family home.
T, time: a calendar day (I have about a year of pre-treatment hourly energy consumption data, and roughly a year of post-treatment data).
D, treatment: a dummy variable that is equal to 1 if a treated household received the treatment, and 0 otherwise (and it is always 0 for the houses in the control group).
G, group: this is problematic.
I tried using a household ID here, but I got “When only one G variable is specified by the user, that variable must either be equal to 0, 1, or be missing”.
Then I tried to group the treated households into around 25 sub-groups depending on when they received their first treatment. However, I got “With more than two time periods, the forward and backward group identifiers Gb and Gf must be used”. I tried to create those two variables as well (G_T and G_Tplus), based on the algorithms described in the Stata Journal paper, but they did not really make sense in my case because my D variable is either 0 or 1 (unless I am just missing something, which is very likely).

I’d appreciate any help in terms of how I could use the command in my setup.

Maxence Morlet

Join Date: Mar 2021

Posts: 653
#8

20 Feb 2022, 06:35

Originally posted by Peter Ndirangu in #6:

Are your units randomly assigned to treatment? Is there a clear before and after period? And finally, is there a clear eligibility index and a threshold above or under which a unit is eligible for treatment?

Maxence Morlet

Join Date: Mar 2021

Posts: 653
#9

20 Feb 2022, 06:43

Originally posted by Katherine Adams in #7:

I also had similar problems with this command. fuzzydid really requires you to run the code on pages 13 and 14 of the following paper: https://faculty.crest.fr/xdhaultfoeu...ydid_stata.pdf and adapt it to your data if necessary. If you already have, then unfortunately I do not know what went wrong...
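
As an illustration of what adapting that construction might look like with a binary treatment and staggered start dates, here is a rough sketch on made-up data. It reuses the group-identifier steps from #1. The variables group, day, and start are hypothetical, and the data have exactly one row per group and day. With several household rows per group and day, the [_n-1] lag would compare rows within the same day, so the group-day means would need to be lagged on collapsed data first.

```stata
clear
set obs 15
gen group = ceil(_n/5)                     // 1 and 2: treated sub-groups; 3: control
bysort group: gen day = _n                 // days 1..5
gen start = cond(group == 1, 3, cond(group == 2, 4, .))   // hypothetical first-treatment day
gen D = (day >= start) & !missing(start)   // binary treatment, always 0 for control

* Group-by-day treatment rate and its change from the previous day
sort group day
by group day: egen mean_D = mean(D)
by group: gen lag_mean_D = mean_D[_n-1] if day - 1 == day[_n-1]
gen G_T     = sign(mean_D - lag_mean_D)
gen G_Tplus = G_T[_n+1] if group == group[_n+1] & day + 1 == day[_n+1]

list group day D G_T G_Tplus, sepby(group)

* With an outcome Y in the data, the call would mirror the syntax used in #1:
* fuzzydid Y G_T G_Tplus day D, did
```

In this toy data G_T equals 1 only on the day a sub-group switches from untreated to treated, and it stays 0 for the control group, so the identifiers are well defined even though D is 0 or 1.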

On a side note, one thing that puzzles me is that you want to run a DiD although assignment to treatment is randomised. How come? You presumably have a reason, I'm simply curious.

Katherine Adams

Join Date: Jan 2019

Posts: 52
#10

21 Feb 2022, 08:18

Hello Maxence,

Thank you for your reply.

Yes, my identification strategy is an RCT. To estimate the results of the RCT, I initially used the difference-in-differences specification: Y = a0 + a1*Tr*Post-tr + time FE + location FE. But then I came across the de Chaisemartin et al. study saying that the a1 coefficient will not give me an accurate estimate of the treatment effect if the treatment starts at different times for the treated units. So I started digging into the problem and found out about the fuzzydid Stata command that de Chaisemartin et al. developed.
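
For reference, the two-way fixed effects specification described above can be sketched as follows on simulated data. Everything here is an assumption for illustration: the variable names, the common treatment date, and the true effect of -2. Using areg with i.t is just one of several ways to absorb the household and time effects.

```stata
clear
set seed 12345
set obs 40
gen hh = ceil(_n/4)                        // 10 hypothetical households
bysort hh: gen t = _n                      // 4 periods per household
gen Tr     = hh <= 5                       // first 5 households are treated
gen Post   = t >= 3                        // treatment starts in period 3 for everyone
gen TrPost = Tr*Post

* Simulated outcome: household effect + time effect + treatment effect of -2 + noise
gen y = 10 + 0.5*hh + 0.3*t - 2*TrPost + rnormal(0, 0.1)

* Household FE absorbed, time FE as dummies, SEs clustered by household
areg y TrPost i.t, absorb(hh) vce(cluster hh)
display "estimated a1: " _b[TrPost]
```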

Kevin Michael Frick

Join Date: Nov 2023

Posts: 5
#11

09 Nov 2023, 10:52

Originally posted by Leonard Martin in #4:

It is indeed a problem on Linux only. Did you manage to solve it somehow?