  • How to construct treatment and control group for a difference-in-differences design?

    Dear Statalist,

    I am looking at the effects of the UK minimum wage on some outcome variable.
    I am trying to define the treatment group (treat=1) as those individuals whose hourly wage (rhw) was below the minimum wage before the policy took effect (April 1999).
    I am trying to define the control group (treat=0) as those individuals who earned more than the minimum wage but less than 6.50 before the policy took effect.
    Furthermore, I define a time variable post = 1 if year>=1999 & year<=2022, and post=0 if year>=1993 & year<=1998.
    When I do:
    Code:
    tab treat post
    I must have observations in each cell.
    However, I am not sure how to code the pre-policy condition for the treatment and control groups. If I code the following, I obviously do not end up with observations in each cell of the cross-tabulation, because I condition on year<1999:
    Code:
    gen treat=1 if rhw<minwage & year<1999
    replace treat = 0 if rhw>minwage & rhw<=6.50 & year<1999
    Here is a panel data example:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id float(year rhw minwage)
     2727 1992  1.889728    1
     2727 1993 2.2677436    1
     2727 2000  2.826062  3.6
     4091 1992  7.117305    1
     4091 1993  8.435369    1
     4091 1994  6.237461    1
     5451 1992  3.212383    1
     5451 1993 3.3510616    1
     6807 1992  2.472252    1
     6807 1993 2.0533466    1
     6882 1998  8.406841    1
     8167 1992  9.420202    1
     8167 1993  7.369303    1
     8167 1995 14.222715    1
     8167 1996  18.16304    1
     9527 1992   4.03008    1
     9527 1993   3.80602    1
     9527 1998  5.400845    1
     9527 1999  6.150339  3.6
     9527 2000  6.204168  3.6
     9527 2002  6.217543  4.1
    12935 1998  2.953587    1
    15645 1994 11.111948    1
    15645 1995  6.148856    1
    15645 1996  6.054346    1
    15645 1997  4.781212    1
    15645 1999  5.868946  3.6
    15645 2000   6.34688  3.7
    15645 2001  10.16255  4.1
    15645 2002  7.870307  4.1
    15645 2003  7.243289  4.5
    15645 2004  6.877482  4.5
    15645 2005 10.736366 4.85
    15645 2006  9.333656 5.05
    16339 1994 3.4724836    1
    16339 1995  4.013677    1
    16339 1997  5.145015    1
    16339 1998  4.853835    1
    17015 2003  3.603311  4.2
    17015 2004  6.533608  4.5
    17015 2005  7.482922 4.85
    21087 1992    5.6321    1
    21087 1993  6.570708    1
    21087 1994  5.676175    1
    21087 1995  6.200527    1
    21087 1996  5.297553    1
    21087 1997  5.502968    1
    21087 1998  6.891378    1
    21087 2000  5.933015  3.6
    21767 1993  9.818517    1
    21767 1996   9.08152    1
    21767 1998  7.706389    1
    21767 2000 10.178714  3.7
    21767 2002 16.396473  4.1
    22445 2006  5.675872 5.05
    22445 2007  4.837647 5.52
    22445 2013  4.018023 6.19
    22445 2014  6.398956 6.31
    22445 2015  6.291113  6.5
    22445 2017  5.650069  7.5
    22445 2018  4.920743 7.83
    22445 2021  5.730185 8.91
    22445 2022 4.6051707  9.5
    22451 1992  7.148911    1
    22451 1994 8.6687975    1
    23807 1996  7.926869    1
    25847 1992 4.3915005    1
    27211 1993  4.004482    1
    27284 2001  5.001255  4.1
    27284 2002  5.246871  4.2
    27284 2003  5.857276  4.5
    27284 2004  6.686457 4.85
    27284 2005  6.014242 5.05
    28575 1996 2.5836325    1
    28575 1998  4.143921    1
    28575 1999  6.914976  3.6
    28575 2001  6.059072  3.7
    28575 2002  8.057695  4.2
    28575 2003  5.367658  4.2
    28575 2004 4.6423006 4.85
    29259 1996  5.442605    1
    29259 1997 4.0254436    1
    29259 2000  6.736046  3.7
    29925 1999 1.6302627  3.6
    29925 2016  5.486196  7.2
    29925 2017  4.943817  7.5
    29925 2018   5.94055 7.83
    29925 2019  5.140785 8.21
    29925 2020  4.877749 8.72
    29925 2021  6.419764 8.91
    29925 2022  6.143013  9.5
    30615 1999  3.283815  3.6
    30615 2005 4.4138393 4.85
    34007 1992  7.914739    1
    34007 1993 10.016324    1
    34122 1997  5.791275    1
    34122 1998  7.982667    1
    34691 1992  4.135785    1
    34691 1993  5.288999    1
    34691 1995  6.195757    1
    end
    Note that the 1s can be replaced with 0s.
    I would appreciate your help.

    Best,
    Nico

  • #2
    It is easy enough to define the pre and post period variable:

    Code:
    gen byte post = (year >= 1999 & !missing(year))
    However, it is not at all clear how to define your treatment variable. It is possible, in principle, that in the years before 1999, the same person may have rhw < minwage in some years, but not in others. Is such a person in the treatment group or not? I don't see any instances of this problem in your example data, but there is nothing you have said that prevents it from happening. If your solution to this is that if they ever have rhw < minwage in any year before 1999 they are in the treated group, then it would be
    Code:
    by id (year), sort: egen treat = max(rhw < minwage & !post)
    Alternatively, if your solution is that to be in the treatment group they must have rhw < minwage in every year before 1999, it would be:
    Code:
    by id (year), sort: egen treat = min(cond(!post, rhw < minwage, 1))
    If your definition of the treatment group is different from both of these, then post back with a clear explanation.

    Added: The above definitions of treat will not work correctly if there are observations where rhw or minwage is missing. It is possible to code around this, but it gets complicated. So I suggest that before using this code you either verify that there are no such observations in your data, or, if there are, you drop them.
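    A minimal sketch of that check, using hypothetical toy data rather than the data from #1. Stata treats a missing value as larger than any number, so rhw < minwage is false when rhw is missing and true when minwage is missing, which is why such observations can distort treat.

    ```stata
    * Hypothetical toy data: id 2 has a missing rhw in 1998
    clear
    input long id float(year rhw minwage)
    1 1997 2.9 3.0
    1 1998 2.8 3.0
    2 1997 3.5 3.0
    2 1998 .   3.0
    end
    * How many observations have a missing wage or minimum wage?
    count if missing(rhw, minwage)
    * Drop them so that rhw < minwage is never evaluated on a missing value
    drop if missing(rhw, minwage)
    * Confirm that none remain before creating treat
    assert !missing(rhw, minwage)
    ```

    Whether dropping is appropriate depends on why the values are missing; this sketch only shows the mechanics.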
    Last edited by Clyde Schechter; 27 Apr 2025, 08:27.



    • #3
      Hi Clyde,
      Thanks for your prompt and detailed reply.
      After having read your response, I think I want the treatment for those with rhw<minwage only in 1998.
      Thanks for your kind help.
      Best,
      Nico



      • #4
        In that case, it's
        Code:
        by id (year), sort: egen treat = max(year == 1998 & rhw <minwage)
        Note: This code assumes that everybody has an observation for year 1998. If you have any id that has no year 1998 observation, that id's status for treat is undefined. So I would precede this code with a check to see if everybody has a 1998 observation, and eliminate any id that doesn't:
        Code:
        by id (year), sort: egen has1998 = max(year == 1998)
        keep if has1998



        • #5
          Thanks Clyde, I will do that.
          Have a great day,
          Nico



          • #6
            Dear Clyde,

            I would like to ask you a brief follow-up question.
            If I write the following two commands:
            Code:
             
             by id (year), sort: egen has1998 = max(year == 1998)
             keep if has1998
             by id (year), sort: egen treat = max(year == 1998 & rhw <minwage)
            How would I modify the second command so that I have treat=1 if year==1998 & rhw<minwage for the treatment group and for the control group treat=0 if year==1998 & rhw<=1.6*minwage & rhw>minwage?
            Essentially, I would like to add a condition for the control group in your second command.

            I would appreciate your help once again.
            Have a good day,
            Best,
            Nico



            • #7
              What do you want to do for observations where rhw > 1.6*minwage in 1998?



              • #8
                Hi Clyde,
                Congratulations and well I would not want to consider them.
                They can be dropped I would think.
                I want to have control and treatment groups as similar as possible.
                Thanks for your help.
                Best,
                Nico



                • #9
                  I believe that this will do it:
                  Code:
                  by id (year), sort: egen keeper = max(year == 1998 & rhw <= 1.6*minwage)
                  keep if keeper  
                  by id (year), sort: egen treat = max(year == 1998 & rhw < minwage)
                  The first line of code identifies those id's that have a 1998 observation with an rhw that does not exceed 1.6*minwage. The second keeps only those. The final one sets treat to 1 for those id's whose 1998 rhw is below minwage, and to 0 for the rest.

                  I could not test this on your example data from #1 because there all 1998 observations have an rhw/minwage ratio exceeding 1.6, so there are no observations to classify as treat or control.
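                  For anyone who wants to try the logic, here is a small sketch on made-up data. The ids, wages, and the 1998 minwage value of 3.6 are all hypothetical and are not taken from #1. The post variable is the one defined in #2.

                  ```stata
                  * Hypothetical toy data: four ids with different 1998 situations
                  clear
                  input long id float(year rhw minwage)
                  1 1998 3.2 3.6
                  1 1999 3.7 3.6
                  2 1998 4.5 3.6
                  2 1999 4.8 3.6
                  3 1998 9.0 3.6
                  3 1999 9.5 3.6
                  4 1997 3.0 3.6
                  4 1999 3.8 3.6
                  end
                  * Keep ids whose 1998 wage is at most 1.6 times the minimum wage
                  by id (year), sort: egen keeper = max(year == 1998 & rhw <= 1.6*minwage)
                  keep if keeper
                  * Treated ids are those below the minimum wage in 1998
                  by id (year), sort: egen treat = max(year == 1998 & rhw < minwage)
                  gen byte post = (year >= 1999 & !missing(year))
                  * Every cell of the cross-tabulation should be populated
                  tab treat post
                  ```

                  In this toy data id 1 is treated, id 2 is a control, id 3 is dropped for exceeding 1.6*minwage, and id 4 is dropped because it has no 1998 observation, so keeper also does the job of has1998 from #4. One edge case to be aware of: an id whose 1998 rhw equals minwage ends up with treat = 0, which differs slightly from the rhw > minwage rule in #6.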



                  • #10
                    Thanks Clyde, much appreciated.
                    Nico



                    • #11
                      Hi Clyde,

                      I hope you are fine.
                      I would like to ask you yet another follow-up question.
                      Let's say I do the following changes to your previous suggestion #9, i.e., I take 2015 as pre-treatment period and use hourly wage (hw):
                      Code:
                      by id (year), sort: egen keeper = max(year == 2015 & hw <= 1.6*minwage)
                      keep if keeper 
                      by id (year), sort: egen treat = max(year == 2015 & hw < minwage)
                      I would like to have as my treatment group in the year 2015: those hw<minwage AND in the year 2016 hw==minwage, so I want to have compliers to the minimum wage in my treatment group.
                      Lastly, control group remains as before: hw<=1.6*minwage & hw>minwage in the year 2015.
                      I hope I am making sense.
                      I would appreciate your help.

                      Best,
                      Nico

                      Code:
                      * Example generated by -dataex-. For more info, type help dataex
                      clear
                      input float(year hw minwage)
                      2015  7.168789  6.5
                      2016  7.839338  6.5
                      2015  7.106058  6.5
                      2016  7.648902  7.2
                      2017  7.665787  7.2
                      2015  7.698229  6.5
                      2016 8.2904005  7.2
                      2018   7.80482 7.83
                      2019  9.327708 7.83
                      2020 8.2904005 8.21
                      2021 9.1786585 8.72
                      2022  9.456979 8.91
                      2015 10.331834  6.5
                      2016  9.083911  6.5
                      2017  9.382217  7.2
                      2018  11.18649  7.5
                      2019  11.18649 7.83
                      2020 12.629908 8.21
                      2021 12.733288 8.72
                      2022 13.107796 8.91
                      2015  8.578027  6.5
                      2016  9.237875  6.5
                      2018  9.237875 7.83
                      2015  4.503464  6.5
                      2016  6.510682  7.2
                      2015  10.28822  6.5
                      2016 10.778753  7.2
                      2017  10.01655  7.5
                      2018   8.18753 7.83
                      2019  8.667292 8.21
                      2015  6.505052  6.5
                      2018  9.244096  7.5
                      2019  9.367147 7.83
                      2020  8.954215 8.72
                      2015   5.67985  6.5
                      2016  5.569212  6.5
                      2017  7.088761  7.2
                      2018  6.505004  7.5
                      2019   6.69746 7.83
                      2020  7.528868 8.21
                      2021  9.382217 8.72
                      2022  9.214549 8.91
                      2015  7.918179  6.5
                      2016  8.660508  6.5
                      2017  8.775982  7.2
                      2018  9.006928  7.5
                      2015  8.812447  6.5
                      2017  8.981268  7.2
                      2018  9.083398  7.5
                      2020  7.505774 8.21
                      2021   6.73601 8.72
                      2022   10.3958 8.91
                      2015  7.240864  6.5
                      2016  6.072816  6.5
                      2017  7.888466  7.2
                      2019  7.283522 7.83
                      2020   7.94729 8.21
                      2021  9.405923 8.72
                      2022   8.14142 8.91
                      2015 2.2502518  6.5
                      2016  7.698229  6.5
                      2017  9.545804  7.2
                      2018  8.083141  7.5
                      2020  8.852963 8.21
                      2022 10.742347 8.91
                      2015  8.471363  6.5
                      2016 11.815704  6.5
                      2018 15.424172  7.5
                      2019 16.249962 7.83
                      2020   16.7398 8.21
                      2022  19.80323 8.91
                      2015  6.928406  6.5
                      2016  6.735951  6.5
                      2015  6.928406  6.5
                      2017  8.929946  7.2
                      2019 13.897337 7.83
                      2020 13.028753 8.21
                      2015  9.451733  6.5
                      2016  8.506524  6.5
                      2017  9.900693  7.2
                      2018 10.103926  7.5
                      2019  8.506524 7.83
                      2020   8.32099 8.21
                      2021  9.237875 8.72
                      2015  5.337183 5.13
                      2015  6.466513  6.5
                      2016  6.928406  6.5
                      2017  8.018957  7.2
                      2015  8.032935  6.5
                      2016  8.133347  6.5
                      2017  8.622017  7.2
                      2018  8.878625  7.5
                      2019  8.775982 7.83
                      2020  9.622814 8.21
                      2021  9.897723 8.72
                      2022  9.897723 8.91
                      2015  5.278786 5.13
                      2017 3.0985374 6.95
                      2018  6.004619 7.05
                      2015  9.145496  6.5
                      end



                      • #12
                        I would like to have as my treatment group in the year 2015: those hw<minwage AND in the year 2016 hw==minwage, so I want to have compliers to the minimum wage in my treatment group.
                        This will not fly. There are two problems with it. First, an employer might, for various reasons, comply with the minimum wage by raising the wage to a value that is slightly above minwage, so conceptually this is flawed. Second, and much more important, it is mathematically treacherous to code for exact equality between floating point numbers. The problem is that a number like 9.145496 (to choose an arbitrary example from your -dataex- output) has no exact finite binary representation, just as 1/3 has no exact finite decimal representation. So the actual number Stata uses is the binary number of size float that is as close as possible to 9.145496. That's if you just give 9.145496 as an input. But if 9.145496 is (the decimal representation to 6 places of) the result of some calculation, there may be other rounding and truncation errors along the way, so that the result may not be the same. Even the simplest situations can produce paradoxical results:
                        Code:
                        . clear
                        
                        . set obs 1
                        Number of observations (_N) was 0, now 1.
                        
                        . gen x = 9.145496
                        
                        . assert x == 9.145496
                        assertion is false
                        r(9);
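                        As a side illustration of what is going on here: x is stored as a float, while the literal on the right-hand side is evaluated in double precision. Wrapping the literal in float() rounds it the same way, so the comparison succeeds. This only helps with typed constants; it does nothing for values that come out of a calculation.

                        ```stata
                        clear
                        set obs 1
                        gen x = 9.145496
                        * float() rounds the double-precision literal to float precision,
                        * matching how x is stored
                        assert x == float(9.145496)
                        ```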
                        So you need to come up with a workable definition. Probably it should be something like hw < minwage in 2015, and in 2016 hw between 99.5% of minwage and 102% of minwage. (I just picked a 2% increment and 0.5% round-below error out of my head--you might prefer something different.) Or you might prefer to bound the difference between hw and minwage instead of the ratio. Anyway, here's how it would work with my definition: you should be able to modify it to reflect your own definition.

                        Code:
                        by id (year), sort: egen keeper = max(year == 2015 & hw <= 1.6*minwage)
                        keep if keeper
                        by id (year), sort: egen criterion2015 = max(year == 2015 & hw < minwage)
                        by id (year): egen criterion2016 = max(year == 2016 & inrange(hw, 0.995*minwage, 1.02*minwage))
                        gen treat = criterion2015 & criterion2016
                        Note: Your example data does not include an id variable. But you previously had one, and without it none of this makes any sense. I've written the code on the assumption that in your real data, you still have an id variable.
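                        If bounding the difference rather than the ratio is preferred, a sketch might look like the following. The toy data, the id values, and the 0.05 tolerance are all hypothetical; pick a tolerance that makes sense for your wage data. The last line is an optional assumption about your design, not something stated in #11.

                        ```stata
                        * Hypothetical toy data with an id variable
                        clear
                        input long id float(year hw minwage)
                        1 2015 6.2  6.5
                        1 2016 7.21 7.2
                        2 2015 6.0  6.5
                        2 2016 8.0  7.2
                        3 2015 7.0  6.5
                        3 2016 7.5  7.2
                        end
                        * Hypothetical tolerance, in the same units as hw
                        local tol 0.05
                        by id (year), sort: egen keeper = max(year == 2015 & hw <= 1.6*minwage)
                        keep if keeper
                        by id (year): egen criterion2015 = max(year == 2015 & hw < minwage)
                        * Complier: 2016 wage within tol of the 2016 minimum wage
                        by id (year): egen criterion2016 = max(year == 2016 & abs(hw - minwage) <= `tol')
                        gen byte treat = criterion2015 & criterion2016
                        * Optional: exclude non-compliers instead of counting them as controls
                        replace treat = . if criterion2015 & !criterion2016
                        ```

                        Without the final line, an id that was below minwage in 2015 but falls outside the 2016 band gets treat = 0 and is counted alongside the controls. In the toy data that is id 2; with the final line its treat is missing instead.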
                        Last edited by Clyde Schechter; 28 May 2025, 08:46.



                        • #13
                          Hi Clyde,

                          Thank you very much once again, your assumption is correct, I failed to include the id by accident.
                          Otherwise, I can follow your code, it makes sense to me.
                          Have a great day,
                          All the best,
                          Nico



