  • Difference in Differences with repeated cross-sectional data

    Hello everyone. I want to study the impact of a reform that occurred in 2014 on a series of binary outcomes using a DD strategy.
    For simplicity, I'll show an example of my dataset without controls so that I can ask whether the model I coded is correct.

    "d1" is my group variable: d1 = 1 if the unit is treated, 0 otherwise.

    "time" is a dummy such that:
    Code:
    time = 1 if year >= 2014
    time = 0 if year < 2014
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str16 id int year byte outcome float(d1 time pwght) int psu
    "0020090000030101" 2009 1 0 0  779.5   2
    "0020180555630101" 2018 0 1 1  138.7 114
    "0020180555600101" 2018 0 1 1  790.2  23
    "0020180555570101" 2018 1 0 1  714.6   6
    "0020150535800101" 2015 0 1 1  239.1   2
    "0020120050600103" 2012 0 0 0  142.5   1
    "0020170049010101" 2017 0 1 1  366.5  61
    "0020120121400101" 2012 0 0 0  637.9   1
    "0020130172240101" 2013 0 0 0  371.7   2
    "0020120118210101" 2012 0 0 0  410.7   1
    "0020180555340101" 2018 0 1 1  246.9  74
    "0020140816720101" 2014 0 0 1  367.9   2
    "0020100252210101" 2010 0 0 0  313.4   2
    "0020180469970102" 2018 0 0 1    432  29
    "0020110394270101" 2011 0 0 0  211.2   2
    "0020120260720101" 2012 0 1 0  334.4   2
    "0020160246150101" 2016 0 0 1   88.1   5
    "0020110576340101" 2011 0 0 0  687.8   2
    "0020180555140102" 2018 0 0 1  562.9  31
    "0020140477650101" 2014 0 0 1   77.8   2
    "0020130677130101" 2013 0 1 0  495.6   1
    "0020120806010101" 2012 0 0 0    201   2
    "0020160675090101" 2016 0 0 1  700.1  61
    "0020100075250102" 2010 0 0 0  152.4   1
    "0020110061240102" 2011 0 0 0  257.2   1
    "0020180367070102" 2018 0 0 1  473.1   3
    "0020150057500102" 2015 0 0 1   75.1   1
    "0020180554930101" 2018 0 0 1  693.8  89
    "0020180072240101" 2018 0 0 1  408.5   2
    "0020170036690101" 2017 0 0 1  291.5  42
    "0020140031390101" 2014 0 0 1  222.4   2
    "0020110654250101" 2011 0 0 0  240.3   2
    "0020110481240101" 2011 0 0 0  207.4   1
    "0020120711030101" 2012 0 0 0  153.5   2
    "0020180303640101" 2018 0 0 1    352 102
    "0020120217660102" 2012 0 0 0  364.4   2
    "0020160165890103" 2016 0 0 1  240.4   8
    "0020100216650102" 2010 0 0 0  321.3   1
    "0020180554740102" 2018 0 0 1  934.7   6
    "0020180554700101" 2018 0 0 1  694.8  26
    "0020170213230101" 2017 0 0 1  360.9   3
    "0020160174310101" 2016 0 0 1   83.6  17
    "0020090149770101" 2009 0 0 0  254.6   1
    "0020130668530102" 2013 0 0 0   74.5   1
    "0020120218340101" 2012 0 0 0  228.1   1
    "0020110645430102" 2011 0 0 0  516.7   1
    "0020090030940201" 2009 0 0 0  797.6   1
    "0020120698250102" 2012 0 1 0  221.4   2
    "0020100469160101" 2010 0 0 0  486.9   2
    "0020150272400102" 2015 0 1 1  260.5   2
    "0020150157210101" 2015 0 0 1  162.4   2
    "0020130043100102" 2013 0 0 0  628.8   2
    "0020140785230102" 2014 0 1 1   13.2   1
    "0020090421790102" 2009 0 0 0  249.6   2
    "0020120150250101" 2012 1 1 0  845.6   1
    "0020120506470101" 2012 0 0 0  280.5   2
    "0020130111800101" 2013 0 1 0  101.4   2
    "0020120187490101" 2012 0 0 0  469.8   1
    "0020150372290101" 2015 0 0 1  223.7   2
    "0020100590830101" 2010 0 0 0  267.8   1
    "0020110072060101" 2011 0 0 0  489.3   1
    "0020140792010102" 2014 0 0 1  247.3   2
    "0020140721050101" 2014 0 0 1  196.3   1
    "0020090023250102" 2009 0 1 0 1065.3   2
    "0020090514690102" 2009 1 0 0  157.4   2
    "0020180357200101" 2018 0 0 1  317.2  23
    "0020180554090103" 2018 0 0 1  694.1   4
    "0020120151530101" 2012 0 0 0  421.4   1
    "0020100317120103" 2010 1 0 0  462.5   2
    "0020140096890201" 2014 0 0 1   58.8   2
    "0020110410240101" 2011 1 0 0  548.3   2
    "0020110222130101" 2011 0 0 0  283.9   2
    "0020090292240101" 2009 1 1 0  549.5   2
    "0020090143760101" 2009 0 0 0  459.4   1
    "0020110053070101" 2011 0 0 0  251.3   2
    "0020090121830101" 2009 0 0 0  180.6   2
    "0020180553970101" 2018 0 1 1  542.8   7
    "0020180313260102" 2018 0 0 1  472.7  21
    "0020110007460101" 2011 0 0 0  361.9   1
    "0020180553910102" 2018 0 0 1 1016.1   2
    "0020120336790102" 2012 0 0 0  159.2   1
    "0020150256350102" 2015 0 0 1   32.4   2
    "0020160336000101" 2016 0 0 1  162.5   2
    "0020170120080101" 2017 0 0 1  388.2  19
    "0020150054010103" 2015 0 0 1  200.4   1
    "0020170422730101" 2017 0 0 1  538.4  34
    "0020100598680101" 2010 0 0 0  159.5   2
    "0020100514410102" 2010 0 0 0  215.5   2
    "0020110307350101" 2011 0 0 0  200.5   1
    "0020150393090101" 2015 0 0 1  308.4   2
    "0020100343370101" 2010 0 1 0  150.6   1
    "0020180525720101" 2018 0 0 1  657.1   3
    "0020130247650101" 2013 1 1 0  247.8   1
    "0020090056300101" 2009 0 0 0  349.2   1
    "0020110577230101" 2011 0 0 0    527   1
    "0020160395150101" 2016 0 1 1  376.6  33
    "0020180553540101" 2018 0 0 1  693.3  22
    "0020180405630101" 2018 0 0 1  325.2   8
    "0020110004910101" 2011 0 0 0  302.2   1
    "0020150743510101" 2015 0 1 1  275.1   1
    end

    So far I coded:
    Code:
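    * post-reform indicator: 1 from 2014 onward, 0 before; a missing year stays missing
    gen byte time = (year >= 2014) if !missing(year)
    * DD interaction: 1 only for treated units observed after the reform
    gen treatment = d1 * time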
    
    global y outcome
    global treatment treatment
    global time time
    global group d1



    First of all, I want to run a plain DD without clustering or fixed effects, hence I coded:
    Code:
    reg $y i.treatment [pweight=pwght]

    Then I want to add fixed effects and clustered standard errors:
    Code:
    reg $y i.treatment i.d1 i.time [pweight=pwght], vce(cluster psu)

    1) Should my time variable be "year" or "time"?
    2) Should I control for year fixed effects or time (defined exactly as I defined the variable "time" in my dataset)? I guess this depends on my time variable, right?
    3) Should I use the code I typed earlier to perform a DD or is this better:
    Code:
    reg $y i.time#i.d1 [pweight=pwght]
    reg $y i.d1##i.time [pweight=pwght], vce(cluster psu)

    Thanks a lot!
    Last edited by Mike McDonald; 18 Jan 2024, 08:59. Reason: difference in differences

  • #2
    1. It all depends on what the time variable in your dataset captures.
    2. Again, it depends on what time means. What is "time" in your dataset? Months?

    3. I would cluster at the unit level (perhaps that is psu) and include unit and time fixed effects, whatever time is in your dataset.
    Note that reghdfe is a community-contributed command available from SSC:

    Code:
    reghdfe $y 1.post#1.treated, abs(unit time) cl(unit)
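
    With the variable names from post #1, a minimal sketch of this specification might look as follows. It assumes the data are repeated cross-sections, so no unit is observed twice and unit fixed effects cannot be estimated; the group indicator d1 and year take their place here. Clustering on psu is likewise an assumption, not something the example data settle.

    Code:
    * reghdfe (and its dependency ftools) must be installed once:
    *   ssc install ftools
    *   ssc install reghdfe
    * d1 absorbs the treated/control level difference, year absorbs common shocks;
    * 1.d1#1.time is the DD term (time = 1 if year >= 2014)
    reghdfe outcome 1.d1#1.time [pweight=pwght], absorb(d1 year) vce(cluster psu)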

    • #3
      2. again, it depends on what time means. What is "time" in your dataset? Months?
      What I called "time" is just an indicator defined as I coded in the question. In my dataset I have "year".

      Maybe my doubt is just about the theory. In a setting with a pre/post period and a treated/control group, when I want to control for time fixed effects, should I add a dummy for pre/post (i.e. i.time in my case) or i.year?
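
      For reference, the two alternatives can be written side by side. This is only a sketch using the variables from post #1, with clustering on psu assumed. Because time is a function of year, its main effect is collinear with the year dummies, so in the second version only the interaction is kept.

      Code:
      * (a) 2x2 DD: the pre/post dummy is the only time control
      regress outcome i.d1##i.time [pweight=pwght], vce(cluster psu)
      * (b) year fixed effects: i.year replaces the main effect of time,
      *     and 1.d1#1.time remains as the DD term
      regress outcome i.d1 i.year 1.d1#1.time [pweight=pwght], vce(cluster psu)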
      Last edited by Mike McDonald; 18 Jan 2024, 10:36.

      • #4
        Is it valid to do DID with repeated cross-sections instead of panel data?

        • #5
          Sure. What’s the exact structure of your data?
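
          The 2x2 case shows why: the DD estimate is built from four group-by-period means, and computing those does not require following the same units over time. A rough sketch with the variable names from post #1, assuming time is the pre/post indicator and that the composition of the treated and control groups is stable across survey years:

          Code:
          * the four weighted cell means (d1 x time)
          mean outcome [pweight=pwght], over(d1 time)
          * the same contrast as a regression; the coefficient on 1.d1#1.time is
          * (treated post - treated pre) - (control post - control pre)
          regress outcome i.d1##i.time [pweight=pwght], vce(cluster psu)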
