Problems with first-stage estimates in Fuzzy RD with rdrobust

Freddy Hernandez

Join Date: Aug 2019
Posts: 1

31 Aug 2019, 16:35

Hi everyone,

I'm using rdrobust to obtain fuzzy regression discontinuity estimates of the causal effect of a cash transfer program, using a single year of a household survey. Since rdrobust with the fuzzy option reports estimates for both stages, I have two main questions about the following syntax:

Code:

 rdrobust depvar runvar [if] [in] [, c(#) fuzzy(fuzzyvar)]

My data come from a household survey. My outcome variable is CS1914; my running (forcing) variable is score14, an index that assigns treatment according to the rules of the cash transfer program using a cutoff at 28.20351 (households below the cutoff are considered eligible for the program, and those above are not); and my treatment variable is hog_benef.

Problem 1:

I've been running the following code to get the RD estimates but, as shown below, I get non-significant coefficients for the first-stage regression.

Code:

. rdrobust CS1914 score14, fuzzy(hog_benef) c(28.20351) all 

Fuzzy RD estimates using local polynomial regression.

 Cutoff c = 28.20351 | Left of c  Right of c          Number of obs =      28676
-------------------+----------------------            BW type       =      mserd
     Number of obs |      5711       22965            Kernel        = Triangular
Eff. Number of obs |      3046        3982            VCE method    =         NN
    Order est. (p) |         1           1
    Order bias (q) |         2           2
       BW est. (h) |     6.854       6.854
       BW bias (b) |    12.537      12.537
         rho (h/b) |     0.547       0.547

First-stage estimates. Outcome: hog_benef. Running variable: score14.
--------------------------------------------------------------------------------
            Method |   Coef.    Std. Err.    z     P>|z|    [95% Conf. Interval]
-------------------+------------------------------------------------------------
      Conventional |  .01233     .02517   0.4900   0.624   -.037001      .061666
    Bias-corrected |  .02111     .02517   0.8387   0.402   -.028223      .070444
            Robust |  .02111      .0287   0.7355   0.462   -.035144      .077364
--------------------------------------------------------------------------------

Treatment effect estimates. Outcome: CS1914. Running variable: score14. Treatment Status: hog_benef.
--------------------------------------------------------------------------------
            Method |   Coef.    Std. Err.    z     P>|z|    [95% Conf. Interval]
-------------------+------------------------------------------------------------
      Conventional | -8.3531     18.752   -0.4454  0.656   -45.1071       28.401
    Bias-corrected | -3.0851     18.752   -0.1645  0.869   -39.8392      33.6689
            Robust | -3.0851     21.393   -0.1442  0.885   -45.0142      38.8439
--------------------------------------------------------------------------------

I wonder if anyone could help me interpret or correct this. I've also been trying to figure out why the running variable isn't a good instrument for treatment status. As shown below, I ran OLS regressions and correlations to check whether score14 really is a poor predictor of hog_benef:

Code:

. reg hog_benef score14

      Source |       SS           df       MS      Number of obs   =    28,676
-------------+----------------------------------   F(1, 28674)     =  11209.88
       Model |  1838.31955         1  1838.31955   Prob > F        =    0.0000
    Residual |  4702.27952    28,674  .163991055   R-squared       =    0.2811
-------------+----------------------------------   Adj R-squared   =    0.2810
       Total |  6540.59907    28,675  .228094126   Root MSE        =    .40496

------------------------------------------------------------------------------
   hog_benef |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     score14 |  -.0159688   .0001508  -105.88   0.000    -.0162645   -.0156732
       _cons |   1.035558   .0068851   150.41   0.000     1.022063    1.049054
------------------------------------------------------------------------------

. corr hog_benef score14
(obs=28,676)

             | hog_be~f  score14
-------------+------------------
   hog_benef |   1.0000
     score14 |  -0.5302   1.0000

So far score14 seems to be a good predictor of hog_benef, but I also tried restricting the sample to the bandwidth used in the RD estimate:

Code:

. reg hog_benef score14 if (score14>=21.34951 & score14<=35.05751)

      Source |       SS           df       MS      Number of obs   =     7,028
-------------+----------------------------------   F(1, 7026)      =    175.37
       Model |  39.7971642         1  39.7971642   Prob > F        =    0.0000
    Residual |  1594.40247     7,026  .226928902   R-squared       =    0.0244
-------------+----------------------------------   Adj R-squared   =    0.0242
       Total |  1634.19963     7,027  .232560073   Root MSE        =    .47637

------------------------------------------------------------------------------
   hog_benef |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     score14 |  -.0193368   .0014602   -13.24   0.000    -.0221991   -.0164744
       _cons |   1.188693   .0424057    28.03   0.000     1.105565    1.271821
------------------------------------------------------------------------------

. corr hog_benef score14 if (score14>=21.34951 & score14<=35.05751)
(obs=7,028)

             | hog_be~f  score14
-------------+------------------
   hog_benef |   1.0000
     score14 |  -0.1561   1.0000

But again I get the same evidence: the running variable appears to be a good predictor of treatment status. So why isn't this the case in the first stage of the RD estimation?

Problem 2:

Can anyone suggest how I can recover the first-stage estimates and model statistics, such as the F-value, so that I can report them with commands such as esttab or outreg2?

According to the rdrobust help file, there is no option that asks Stata to report model statistics, and I also could not find an e() matrix or scalar from which to recover the first-stage coefficients and statistics.

Thank you very much for any of your answers or ideas,

Freddy.

Quang Thien

Join Date: Oct 2019

Posts: 3
#2

06 Nov 2019, 03:23

Hi Freddy Hernandez. Have you solved your problem? I am facing the same problem.

Katrin Schneider

Join Date: Oct 2019

Posts: 2
#3

24 Nov 2019, 03:42

Hi, I don't know if this is still relevant to you. rdrobust uses triangular kernel weights by default. So, to get comparable regressions, you need to either run rdrobust with the option kernel(uniform) or add weights to your reg command. According to https://stats.stackexchange.com/ques...uzzy-rdd-issue, you can use

Code:

gen w=max(0,1-abs(cutoff_Y))

as a triangular weight for your reg estimation. Also, I suggest you use cmogram to do a visual inspection of your first stage to help find out what's going on in your data.
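
A minimal sketch of such a weighted regression, assuming the variable names, cutoff, and bandwidth shown in the rdrobust output earlier in this thread. The variables xc, above, above_xc, and w are hypothetical helpers, not part of the original data. Note that the first stage in an RD design is the jump in hog_benef at the cutoff (the coefficient on above), not the slope on score14 that a plain regression of hog_benef on score14 picks up. With a triangular kernel and the same bandwidth, the point estimate should be close to rdrobust's conventional first-stage coefficient; the standard errors will generally differ, because rdrobust uses a nearest-neighbor variance estimator by default.

```stata
* Sketch only: cutoff and bandwidth copied from the rdrobust output above
local c = 28.20351
local h = 6.854

* Hypothetical helper variables (not in the original data)
generate double xc = score14 - `c'                          // running variable centered at the cutoff
generate byte above = (xc >= 0) if !missing(xc)             // 1 at or above the cutoff, 0 below
generate double above_xc = above * xc                       // lets the slope differ on each side
generate double w = max(0, 1 - abs(xc)/`h') if !missing(xc) // triangular kernel weight

* Local linear first stage: the coefficient on above is the jump at the cutoff
regress hog_benef above xc above_xc [aweight = w] if w > 0 & !missing(w), vce(robust)

* F statistic for the jump, i.e. for the excluded instrument
test above
```

Since households below the cutoff are the eligible ones, a working first stage would show a clearly negative coefficient on above. A coefficient near zero would point to little or no change in take-up at the threshold, whatever the overall correlation between hog_benef and score14.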

I am also looking for a way to report the first stage in a table. Have you found any solution to that problem?

Maria Iocco

Join Date: Dec 2019

Posts: 8
#4

05 Mar 2020, 05:14

Hello Freddy,

I have a similar problem. Did you find a solution?

Best regards,

Celia Zhu

Join Date: Aug 2020

Posts: 1
#5

18 Aug 2020, 13:08

Hello Freddy,
I found it useful to set the triangular weight as

Code:

gen w = max(0, `bandwidth' - abs(running_var))

where `bandwidth' can be retrieved from -rdbwselect- and running_var is your running variable (centered at the cutoff, so that the weight peaks at the threshold).

The results I got are closer to the first-stage estimates (both coefficient and s.e.) reported by -rdrobust-, though I'm not sure whether this is a coincidence or how it works under the hood.
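
A short derivation suggests this is not a coincidence, assuming running_var is centered at the cutoff. The symbols $x_i$ (running variable), $c$ (cutoff), $h$ (bandwidth), and $K$ (triangular kernel) are notation introduced here for illustration:

```latex
w_i = \max\bigl(0,\; h - |x_i - c|\bigr)
    = h \cdot \max\Bigl(0,\; 1 - \frac{|x_i - c|}{h}\Bigr)
    = h \, K\!\Bigl(\frac{x_i - c}{h}\Bigr),
\qquad K(u) = \max\bigl(0,\; 1 - |u|\bigr).
```

Weighted least squares coefficients do not change when every weight is multiplied by the same positive constant, so this weight gives the same point estimates as the triangular kernel with bandwidth $h$.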

It might also be helpful to look at https://www.stata.com/statalist/arch.../msg01198.html. In that post, -rd- is another user-contributed command for RDD. The first-stage coefficient (to be clear, the coefficient on the dummy indicating that the running variable is above the cutoff) is called "denom" if you use -rd-.
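
A possible way to get that coefficient into a table, sketched under several assumptions: -rd- and -estout- are installed from SSC, the z0(), bwidth(), and mbw() options behave as described in the -rd- help file, and the bandwidth is copied from the rdrobust output earlier in this thread. -rd- also uses a triangular kernel by default, so the estimates should be comparable to rdrobust's conventional ones, though not necessarily identical.

```stata
* Sketch only: check -help rd- for the exact syntax in your installed version
* ssc install rd
* ssc install estout

* Syntax: rd outcomevar treatmentvar assignmentvar
rd CS1914 hog_benef score14, z0(28.20351) bwidth(6.854) mbw(100)

* numer = jump in the outcome, denom = jump in treatment (first stage),
* lwald = their ratio (the fuzzy RD estimate)
* "." is assumed here to refer to the active estimation results
esttab ., keep(denom) se
```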

And thanks Katrin for the insightful hint!

Last edited by Celia Zhu; 18 Aug 2020, 13:14.