
  • Sharp RDD with time as running variable

    Dear members,

    I am fairly new to Stata and would appreciate your help.
    I am writing my thesis and want to measure the effect of the financial crisis on intergenerational income mobility.
    Specifically, I am trying to estimate the effect of the crisis on the rank-rank slope (as defined in Chetty et al. 2014, "Where is the Land of Opportunity? The Geography of Intergenerational Mobility in the United States", https://academic.oup.com/qje/article/129/4/1553/1853754), and I'm using a Sharp Regression Discontinuity Design for that.

    My data looks like this (it runs from 2004 to 2014):
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float year double idpers long idhous byte sex int(birthy age) byte relarp float(totygCPI Rank) double idfath__ byte ownkid double CPI float(rtotygPARENT Rankfath t ptile ptileFather)
    2004  1428104  14281 1 1978 26 4 14220.602 1  1428101 3 107.5  74604.65  7 0  33  28
    2004  2147104  21471 1 1979 25 4  68732.91 7  2147102 6   111  194594.6 10 0  89  91
    2004  2147105  21471 2 1980 24 4  30628.99 2  2147102 6   111  194594.6 10 0  50  91
    2004  2338103  23381 1 1979 25 4 2917.0464 1  2338101 2 108.9 122320.48 10 0   9  64
    2004  2693103  26931 1 1976 28 4  41595.26 3  2693101 2 106.4 122180.45 10 0  61  64
    2004  3099104  30991 1 1979 25 4   45606.2 3  3099101 3 108.9  51576.68  4 0  66  14
    2004  3596104  35961 1 1979 25 4  6381.039 1  3596101 4 108.2 258777.27 10 0  17  97
    2004  3596106  35961 1 1979 25 4  6563.354 1  3596101 4 108.2 258777.27 10 0  17  96
    2004  3697101  36971 2 1978 26 4  54694.62 5  3697103 4 108.9  62072.54  6 0  76  20
    2004 20198103 201981 1 1980 24 4  36809.48 3 20198102 3 109.7  273473.1 10 0  56  98
    2004 20989104 209891 2 1976 28 4  66271.65 6 20989101 2 109.7  49225.16  4 0  87  12
    2004 21395103 213951 2 1978 26 4  56882.41 5 21395101 3   113  57522.13  5 0  79  17
    2004 21458103 214581 2 1979 25 4 9115.7705 1 21458101 2 115.2  52256.95  4 0  24  15
    2004 21612103 216121 1 1980 24 4  57429.35 5 21612101 3 109.7  48395.63  4 0  80  11
    2004 22047103 220471 2 1977 27 4  59671.83 5 22047101 1 109.7  76572.47  8 0  82  29
    2004 22850103 228501 2 1976 28 4 33181.402 2 22850101 1 109.7 164083.86 10 0  53  84
    2004 23155103 231551 1 1975 29 4  60774.84 6 23155101 2 115.8 12435.233  1 0  84   3
    2004 23155104 231551 1 1978 26 4     47402 4 23155101 2 115.8 12435.233  1 0  68   3
    2004 23375103 233751 1 1978 26 4  52670.92 4 23375101 2 109.7  70191.43  7 0  74  25
    2004 23572103 235721 1 1979 25 4  3646.308 1 23572101 1 115.8 1727.1157  1 0   9   1
    2004 23867101 238671 1 1978 26 4 3746.5815 1 23867104 5   113  91769.91  9 0  10  40
    2004 23972103 239721 2 1978 26 4 35113.945 3 23972102 2   113 126548.67 10 0  55  67
    2004 24022103 240221 1 1980 24 4 18960.803 2 24022101 1 109.7  76572.47  8 0  38  29
    end
    label values idpers IDPERS
    label values idhous IDHOUS04
    label values sex SEX04
    label def SEX04 1 "man", modify
    label def SEX04 2 "woman", modify
    label values birthy BIRTHY
    label values age AGE04
    label values relarp RELARP04
    label def RELARP04 4 "Son/daughter of Reference Person or spouse/cohabite", modify
    label values idfath__ IDFATH_
    label values ownkid OWNKID04

    My specification is Rcit = α0 + α1Rpit + α2 × post + α3(Rpit × post) + εit, where Rcit is the child income rank, Rpit is the father income rank, and post is the time dummy for my cutoff (0 before, 1 after).

    First I wanted to check the rank-rank slope, so I ran reg ptile ptileFather in Stata 16.1 (ptile and ptileFather are the percentile ranks in my sample for children and fathers, respectively) and got:

          Source |       SS           df       MS      Number of obs   =       857
    -------------+----------------------------------   F(1, 855)       =      4.66
           Model |  3870.98541         1  3870.98541   Prob > F        =    0.0311
        Residual |  710046.405       855  830.463632   R-squared       =    0.0054
    -------------+----------------------------------   Adj R-squared   =    0.0043
           Total |  713917.391       856  834.015644   Root MSE        =    28.818

    ------------------------------------------------------------------------------
           ptile |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     ptileFather |  -.0736354   .0341064    -2.16   0.031    -.1405776   -.0066933
           _cons |   54.15658   1.982127    27.32   0.000     50.26617    58.04698
    ------------------------------------------------------------------------------

    This is pretty clear.


    However, I have now run the command reg c.ptile c.ptileFather##t, r to get the effect of the crisis and got the following (unfortunately not significant), which I cannot interpret:


    Linear regression                               Number of obs     =        857
                                                    F(3, 853)         =       2.69
                                                    Prob > F          =     0.0451
                                                    R-squared         =     0.0102
                                                    Root MSE          =     2.3513

    ------------------------------------------------------------------------------
                 |               Robust
            Rank |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        Rankfath |   .0357066   .0499495     0.71   0.475    -.0623316    .1337449
             1.t |   .1928553   .5266254     0.37   0.714    -.8407781    1.226489
                 |
    t#c.Rankfath |
              1  |  -.0795672   .0640849    -1.24   0.215    -.2053498    .0462153
                 |
           _cons |   3.099359   .4057073     7.64   0.000     2.303057    3.895661
    ------------------------------------------------------------------------------

    My first question would be: Am I right to think that the second line gives me α2 from my specification? This would be the effect of the crisis.

    My second question is: This is a time-based RDD, so it's difficult to compute a graph using -rdplot- to showcase the evolution of the rank rank slope (RRS) and ideally a jump when I don't have daily or monthly data, right? Would there be any other way to showcase the evolution over the years? For now, the only graph that I can compute is the RRS (Child rank as Y and father rank as X).

    Thanks a lot for your time!

    Best,
    Melanie



  • #2
    Please do watch this, and then reask your question. Also, please, do read the FAQ.

    Then re-ask the question (in this thread) with your output properly formatted and your questions clarified. For example, you wrote:
    "This is a time-based RDD, so it's difficult to compute a graph using -rdplot-"
    I don't think you understand RDD very well. RD is intended to be a panel data estimator, RDD is specifically designed for time based designs.



    • #3
      Thanks @Jared. Here's my second try, I hope it is clearer.

      Dear statalisters,

      I am writing my thesis and want to measure the effect of the financial crisis on intergenerational income mobility.
      I am trying to find the effect of the crisis on the rank-rank slope. The rank-rank slope is the correlation between child and parent (I use fathers) ranks, as defined in Chetty et al. 2014, "Where is the Land of Opportunity? The Geography of Intergenerational Mobility in the United States", https://academic.oup.com/qje/article/129/4/1553/1853754.
      I'm using a Sharp Regression Discontinuity Design to measure the effect, with Stata 16.1 on Mac.

      My data looks like this (it runs from 2004 to 2014):

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float year double idpers long idhous byte sex int(birthy age) float(totygCPI Rank ptile) double idfath__ float(rtotygPARENT Rankfath ptileFather) double CPI float t
      2009 23572103 235721 1 1979 30 72447.914 7 92 23572101 1727.1157 1  1 115.8 1
      2006 23572103 235721 1 1979 27  4456.328 1 11 23572101 1727.1157 1  1 115.8 0
      2004 23572103 235721 1 1979 25  3646.308 1  9 23572101 1727.1157 1  1 115.8 0
      2005 23572103 235721 1 1979 26 2432.4324 1  7 23572101 1727.1157 1  1 115.8 0
      2008 23572103 235721 1 1979 29  76070.81 8 94 23572101 1727.1157 1  1 115.8 0
      2007 23572103 235721 1 1979 28  5769.912 1 15 23572101 1727.1157 1  1 115.8 0
      2010 23572103 235721 1 1979 31  70560.34 7 91 23572101 1727.1157 1  1 115.8 1
      2007  6912104  69121 1 1982 25 13079.646 1 31  6912101 2212.3894 1  1   113 0
      2008  6912104  69121 1 1982 26  28411.05 2 48  6912101 2212.3894 1  1   113 0
      2008  5797103  57971 1 1982 26  76226.25 8 95  5797101  5555.556 1  2 115.2 0
      2012 22239103 222391 1 1986 26  10389.61 1 27 22239101      6750 1  2   116 1
      2014 22239103 222391 1 1986 28  42534.72 3 62 22239101      6750 1  2   116 1
      2013 14140103 141401 1 1988 25      6250 1 16 14140101  7927.461 1  2 115.8 1
      2012 14140103 141401 1 1988 24  11255.41 1 29 14140101  7927.461 1  2 115.8 1
      2007 11676103 116761 2 1978 29  79539.82 8 96 11676101  8812.672 1  2 108.9 0
      2004  6697103  66971 2 1972 32   25606.2 2 46  6697101  8812.672 1  2 108.9 0
      2005  6697103  66971 2 1972 33  52216.21 4 74  6697101  8812.672 1  2 108.9 0
      2010 23231104 232311 1 1981 29  11241.38 1 29 23231102     11250 1  2   116 1
      2012 23231104 232311 1 1981 31 27272.727 2 48 23231102     11250 1  3   116 1
      2011 23231104 232311 1 1981 30  23731.73 2 43 23231102     11250 1  3   116 1
      2014 23231104 232311 1 1981 33 22916.666 2 42 23231102     11250 1  3   116 1
      2013 23231104 232311 1 1981 32 23958.334 2 44 23231102     11250 1  3   116 1
      end
      label values idpers IDPERS
      label values idhous IDHOUS04
      label values sex SEX04
      label def SEX04 1 "man", modify
      label def SEX04 2 "woman", modify
      label values birthy BIRTHY
      label values age AGE04
      label values idfath__ IDFATH_
      My specification is Rcit = α0 + α1Rpit + α2 × post + α3(Rpit × post) + εit, where Rcit is the child income rank, Rpit is the father income rank, and post is the time dummy for my cutoff (0 before, 1 after).

      To get the effect of the crisis, aka the difference in correlation of child and father ranks at the cutoff point (year 2009, my data is from Switzerland where the crisis hit later), I ran the following command:

      Code:
      reg c.ptile c.ptileFather##t, r
      from which I got

      Code:
      Linear regression                               Number of obs     =        857
                                                      F(3, 853)         =       2.69
                                                      Prob > F          =     0.0451
                                                      R-squared         =     0.0102
                                                      Root MSE          =     2.3513

      ------------------------------------------------------------------------------
                   |               Robust
              Rank |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          Rankfath |   .0357066   .0499495     0.71   0.475    -.0623316    .1337449
               1.t |   .1928553   .5266254     0.37   0.714    -.8407781    1.226489
                   |
      t#c.Rankfath |
                1  |  -.0795672   .0640849    -1.24   0.215    -.2053498    .0462153
                   |
             _cons |   3.099359   .4057073     7.64   0.000     2.303057    3.895661
      ------------------------------------------------------------------------------
      1) My first question is the following: Is the effect of the crisis, i.e. α2 in my specification, the estimate found in the second line of the output? (.1928553)
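
      For reference, here is a minimal sketch of how the terms in the specification above line up with the rows of such an output. It assumes the variables from the -dataex- listing (ptile, ptileFather, t) are in memory. In this parameterisation the 1.t row is α2 (the shift in the intercept after the cutoff, i.e. at a father rank of zero), while the interaction row is α3 (the change in the rank-rank slope). The -lincom- calls are an illustration and not part of the original analysis.

      ```stata
      * Sketch only: assumes ptile, ptileFather and t (0 = pre, 1 = post) are in memory
      regress ptile c.ptileFather##i.t, vce(robust)

      * alpha2: shift in the intercept after the cutoff (child rank at father rank 0)
      lincom 1.t

      * alpha3: change in the rank-rank slope after the cutoff
      lincom 1.t#c.ptileFather

      * alpha1 + alpha3: rank-rank slope in the post period
      lincom ptileFather + 1.t#c.ptileFather
      ```

      Because the same person appears in several years, clustering the standard errors on idpers, e.g. vce(cluster idpers), may be worth considering instead of plain robust errors.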


      Furthermore, I would have liked to showcase the evolution of my data using -rdplot-, ideally like the plot in the attached .png:

      [Attached image: Capture d’écran 2022-07-12 à 21.44.50.png]

      However, for now I only have the correlation of Child and Father ranks (meaning the Rank-Rank Slope), which is one number and is not changing over time as I am using my whole panel for it. The only solution I see would be for me to compute the RRS separately for each year, and then showcase this evolution with RRS as outcome variable (Y axis) and time as running variable (X axis). In this case I would only get 11 data points, which can hardly be used as in the graph above.
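
      The year-by-year approach described above could be sketched as follows. This is only an illustration under the assumption that ptile, ptileFather and year are in memory as in the -dataex- listing; the names rrs, se, lb and ub are hypothetical, and the reference line at 2008.5 simply marks the boundary between t = 0 (up to 2008) and t = 1 (2009 onwards).

      ```stata
      * Sketch only: one rank-rank slope per year, plotted with approximate 95% intervals
      preserve
      statsby rrs=_b[ptileFather] se=_se[ptileFather], by(year) clear: ///
          regress ptile ptileFather
      generate lb = rrs - 1.96*se
      generate ub = rrs + 1.96*se
      twoway (rcap lb ub year) (connected rrs year), ///
          xline(2008.5) xtitle("Year") ytitle("Rank-rank slope") legend(off)
      restore
      ```

      With 11 yearly estimates this is a descriptive plot rather than an RD plot, but it does show whether the slope moves around the cutoff.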

      2) My second question is hence: Is there a way for me to have Child rank as outcome variable (Y axis), and Father rank AND time as running variable (X axis)?
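
      One hedged way to keep child rank on the Y axis and father rank on the X axis while still bringing time into the picture is to overlay the fitted rank-rank lines for the pre and post periods. The sketch below assumes ptile, ptileFather and t (0 = pre, 1 = post) are in memory; it uses only built-in -twoway- plot types.

      ```stata
      * Sketch only: fitted child-rank vs father-rank lines, before and after the cutoff
      twoway (lfit ptile ptileFather if t == 0) ///
             (lfit ptile ptileFather if t == 1), ///
             legend(order(1 "Before 2009" 2 "2009 onwards")) ///
             xtitle("Father income rank") ytitle("Child income rank")
      ```

      A visible difference in the steepness of the two lines corresponds to the interaction term in the regression, and a vertical gap at father rank zero corresponds to the coefficient on the post dummy.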



      • #4
        I'm not kidding when I tell you this was SO much better than the first time. I don't even format my questions quite this well, with all the colors and stuff.


        To your first question: Yeah you're right this is the treatment effect, but my question for you now is what's the cutoff for the running variable? Your cutoff isn't simply 0/1. Let me give an example. If I say that every kid in 1000 classrooms who scores over an 85 on a test gets ice cream, we have kids, the grades they earn, time, and an outcome (say happiness points). Here 85 is our cutoff. Make an 86, ice cream for you. Make an 84.6, so sorry, no ice cream.

        What is your cutoff here? I'm not understanding the research question. What even is "the rank-rank slope"? How are we defining treatment assignment here? How do we know RD is even the appropriate design here?

        I think once you explain these important contextual details, other people who are more knowledgeable about this than me may be able to give better comments. My only other question is why use reg? Why not just use rdrobust?



        • #5
          Thank you @Jared.

          "what's the cutoff for the running variable?"
          My cutoff is the year 2009. The treatment is the global financial crisis (again, Swiss data). Children before the cutoff have not experienced the crisis, and children after have.

          The goal is to measure whether the financial crisis has an impact on intergenerational income mobility. So I try to answer the questions: does the crisis change whether children's income ranks are above (upward mobility) or below (downward mobility) their fathers' income ranks? Does it lower intergenerational income mobility (that would mean the ranks of children are more dependent on the ranks of fathers)?
          One could assume (and some papers show it, but not with RDD, which is why I'm stuck) that children from lower-income households (lower father income rank in the dataset) would experience a stronger negative effect from the crisis than children from upper-income households.


          So in this methodology (as far as I understood):

          Step 1: calculate Rank-Rank-Slope. As I said above, the RRS, as defined by Chetty et al. (2014) and other authors, is the correlation between child income rank and parent (I use father) income ranks. It is a slope, but at the same time it's just one correlation number as ranks from both generations follow a uniform distribution.

          Code:
          reg ptile ptileFather
          and can look like this (from Chetty et al. 2014)

          [Attached image: Capture d’écran 2022-07-13 à 18.15.41.png]


          And step 2 would theoretically be to show, with -rdplot-, how rank child and rank parent move together over time and through the cutoff.

          I've tried something new for that
          Code:
          qui reg c.ptile c.ptileFather 
          predict RRS
          rdplot RRS year, c(2009) deriv(0) nbins(50 50)
          ... and get the disappointing averaged RRS through the cutoff, as shown below.

          [Attached image: Capture d’écran 2022-07-13 à 21.23.50.png]

          My question: does my code make sense for what I want to achieve? And is there a way for me to have more data points, or is my data just not suited to an RDD? (Monthly incomes might have been more promising.)
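
          One point that may help with reading the snippet above: with no options, -predict- after -regress- returns the linear prediction, so the variable called RRS holds a fitted child rank for each observation rather than a slope. A minimal sketch of the distinction, assuming ptile and ptileFather are in memory (ptile_hat is a hypothetical name):

          ```stata
          * Sketch only: fitted values versus the slope coefficient
          quietly regress ptile ptileFather
          predict double ptile_hat, xb                  // fitted child rank per observation
          display "Rank-rank slope: " _b[ptileFather]   // the slope is this single coefficient
          summarize ptile_hat
          ```

          Plotting ptile_hat against year therefore shows how the average fitted child rank varies with the mix of father ranks in each year, not how the slope itself changes.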

          Thank you for your time.
          Melanie

