  • How to construct treatment and control group for a difference-in-differences design?

    Dear Statalist,

    I am looking at the effects of the UK minimum wage on some outcome variable.
    I am trying to define the treatment group (treat=1) as those individuals whose hourly wage (rhw) was below the minimum wage before the policy took effect (April 1999).
    The control group (treat=0) should be those individuals who earned more than the minimum wage but no more than 6.50 before the policy.
    Furthermore, I define a time variable post=1 if year>=1999 & year<=2022, and post=0 if year>=1993 & year<=1998.
    When I do:
    Code:
    tab treat post
    I need observations in every cell.
    However, I am not sure how to code the "prior to policy" dimension for the treatment and control groups. If I code the following, treat is defined only for years before 1999, so the cross-tabulation obviously has no observations in the post=1 cells:
    Code:
    gen treat=1 if rhw<minwage & year<1999
    replace treat = 0 if rhw>minwage & rhw<=6.50 & year<1999
    Here is a panel data example:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id float(year rhw minwage)
     2727 1992  1.889728    1
     2727 1993 2.2677436    1
     2727 2000  2.826062  3.6
     4091 1992  7.117305    1
     4091 1993  8.435369    1
     4091 1994  6.237461    1
     5451 1992  3.212383    1
     5451 1993 3.3510616    1
     6807 1992  2.472252    1
     6807 1993 2.0533466    1
     6882 1998  8.406841    1
     8167 1992  9.420202    1
     8167 1993  7.369303    1
     8167 1995 14.222715    1
     8167 1996  18.16304    1
     9527 1992   4.03008    1
     9527 1993   3.80602    1
     9527 1998  5.400845    1
     9527 1999  6.150339  3.6
     9527 2000  6.204168  3.6
     9527 2002  6.217543  4.1
    12935 1998  2.953587    1
    15645 1994 11.111948    1
    15645 1995  6.148856    1
    15645 1996  6.054346    1
    15645 1997  4.781212    1
    15645 1999  5.868946  3.6
    15645 2000   6.34688  3.7
    15645 2001  10.16255  4.1
    15645 2002  7.870307  4.1
    15645 2003  7.243289  4.5
    15645 2004  6.877482  4.5
    15645 2005 10.736366 4.85
    15645 2006  9.333656 5.05
    16339 1994 3.4724836    1
    16339 1995  4.013677    1
    16339 1997  5.145015    1
    16339 1998  4.853835    1
    17015 2003  3.603311  4.2
    17015 2004  6.533608  4.5
    17015 2005  7.482922 4.85
    21087 1992    5.6321    1
    21087 1993  6.570708    1
    21087 1994  5.676175    1
    21087 1995  6.200527    1
    21087 1996  5.297553    1
    21087 1997  5.502968    1
    21087 1998  6.891378    1
    21087 2000  5.933015  3.6
    21767 1993  9.818517    1
    21767 1996   9.08152    1
    21767 1998  7.706389    1
    21767 2000 10.178714  3.7
    21767 2002 16.396473  4.1
    22445 2006  5.675872 5.05
    22445 2007  4.837647 5.52
    22445 2013  4.018023 6.19
    22445 2014  6.398956 6.31
    22445 2015  6.291113  6.5
    22445 2017  5.650069  7.5
    22445 2018  4.920743 7.83
    22445 2021  5.730185 8.91
    22445 2022 4.6051707  9.5
    22451 1992  7.148911    1
    22451 1994 8.6687975    1
    23807 1996  7.926869    1
    25847 1992 4.3915005    1
    27211 1993  4.004482    1
    27284 2001  5.001255  4.1
    27284 2002  5.246871  4.2
    27284 2003  5.857276  4.5
    27284 2004  6.686457 4.85
    27284 2005  6.014242 5.05
    28575 1996 2.5836325    1
    28575 1998  4.143921    1
    28575 1999  6.914976  3.6
    28575 2001  6.059072  3.7
    28575 2002  8.057695  4.2
    28575 2003  5.367658  4.2
    28575 2004 4.6423006 4.85
    29259 1996  5.442605    1
    29259 1997 4.0254436    1
    29259 2000  6.736046  3.7
    29925 1999 1.6302627  3.6
    29925 2016  5.486196  7.2
    29925 2017  4.943817  7.5
    29925 2018   5.94055 7.83
    29925 2019  5.140785 8.21
    29925 2020  4.877749 8.72
    29925 2021  6.419764 8.91
    29925 2022  6.143013  9.5
    30615 1999  3.283815  3.6
    30615 2005 4.4138393 4.85
    34007 1992  7.914739    1
    34007 1993 10.016324    1
    34122 1997  5.791275    1
    34122 1998  7.982667    1
    34691 1992  4.135785    1
    34691 1993  5.288999    1
    34691 1995  6.195757    1
    end
    Note that the 1s in minwage (the years before the minimum wage was introduced) are placeholders and can be replaced with 0s.
    I would appreciate your help.

    Best,
    Nico

  • #2
    It is easy enough to define the pre and post period variable:

    Code:
    gen byte post = (year >= 1999 & !missing(year))
    However, it is not at all clear how to define your treatment variable. In principle, the same person may have rhw < minwage in some years before 1999 but not in others. Is such a person in the treatment group or not? I don't see any instances of this problem in your example data, but nothing you have said prevents it from happening. If your solution is that anyone who ever has rhw < minwage in any year before 1999 is in the treated group, then the code would be:
    Code:
    by id (year), sort: egen treat = max(rhw < minwage & !post)
    Alternatively, if your solution is that to be in the treatment group they must have rhw < minwage in every year before 1999, it would be:
    Code:
    by id (year), sort: egen treat = min(cond(!post, rhw < minwage, 1))
    If your definition of the treatment group is different from both of these, then post back with a clear explanation.
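
A minimal sketch of how one might check whether the choice between the "ever" and "every year" definitions matters in a given dataset. The toy data are hypothetical (they are not taken from the example above, and minwage is assumed to hold the threshold of interest in the pre-policy years); the names below, ever_below, always_below, and first are illustrative only.

```stata
* Hypothetical toy data: minwage is assumed to hold the threshold to compare against
clear
input long id float(year rhw minwage)
1 1996 3.0 3.6
1 1997 3.9 3.6
1 1999 4.0 3.6
2 1997 3.2 3.6
2 1998 3.4 3.6
2 2000 3.8 3.7
3 1997 5.0 3.6
3 1998 5.5 3.6
3 1999 6.0 3.6
end

gen byte post = (year >= 1999 & !missing(year))

* below is 0/1 in pre-policy years and missing afterwards
gen byte below = (rhw < minwage) if !post

* egen's max() and min() ignore missing values within each id
by id (year), sort: egen byte ever_below = max(below)
by id (year): egen byte always_below = min(below)

* Tag one row per id, then list the ids whose status changes before 1999
egen byte first = tag(id)
list id ever_below always_below if first & ever_below != always_below
```

If the list comes back empty, the two definitions agree for every id and the choice is moot. In this toy data only id 1 is listed, because it is below the threshold in 1996 but not in 1997.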

    Added: The above definitions of treat will not work correctly if there are observations where rhw or minwage is missing. It is possible to code around this, but it gets complicated. So I suggest that before using this code you either verify that there are no such observations in your data, or, if there are, you drop them.
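
A minimal sketch of the check suggested here, on hypothetical toy data. Stata treats a missing value as larger than any nonmissing number, so rhw < minwage evaluates to true when minwage is missing and to false when rhw is missing. The variable naive is illustrative only.

```stata
* Hypothetical toy data with one missing minwage and one missing rhw
clear
input long id float(year rhw minwage)
1 1998 3.0 .
2 1998 .   3.6
3 1998 3.0 3.6
end

* Missing sorts above every number: id 1 gets naive = 1, id 2 gets naive = 0
gen byte naive = (rhw < minwage)
list

* Count the affected observations, then drop them before building treat
count if missing(rhw, minwage)
drop if missing(rhw, minwage)
```

Running count first shows how many observations would be lost, which helps in deciding whether dropping them is acceptable.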
    Last edited by Clyde Schechter; 27 Apr 2025, 08:27.

    • #3
      Hi Clyde,
      Thanks for your prompt and detailed reply.
      After having read your response, I think I want the treatment for those with rhw<minwage only in 1998.
      Thanks for your kind help.
      Best,
      Nico

      • #4
        In that case, it's
        Code:
        by id (year), sort: egen treat = max(year == 1998 & rhw < minwage)
        Note: This code assumes that everybody has an observation for year 1998. If you have any id that has no year 1998 observation, that id's status for treat is undefined. So I would precede this code with a check to see if everybody has a 1998 observation, and eliminate any id that doesn't:
        Code:
        by id (year), sort: egen has1998 = max(year == 1998)
        keep if has1998
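
A minimal sketch that puts the pieces of this thread together and also applies the 6.50 upper bound for the control group from post #1. It rests on several assumptions that are not confirmed in the thread: the toy data are hypothetical; each id has at most one observation per year; and, because minwage in the example data is only a placeholder before 1999, the 1998 wage is compared with the rate shown for 1999 (3.6), stored in the illustrative local macro mw1999. The names low1998 and mid1998 are likewise illustrative.

```stata
* Hypothetical toy data, one observation per id and year
clear
input long id float(year rhw)
1 1997 3.1
1 1998 3.2
1 1999 3.7
2 1998 5.0
2 2000 5.4
3 1998 8.0
3 1999 8.3
4 1997 3.0
4 1999 3.8
end

* Assumption: compare 1998 wages with the 1999 rate from the example data
local mw1999 = 3.6

gen byte post = (year >= 1999 & !missing(year))

* Keep only ids that are observed in 1998
by id (year), sort: egen byte has1998 = max(year == 1998)
keep if has1998

* Classify each id by its 1998 wage and spread the result to all its years
by id (year): egen byte low1998 = max(year == 1998 & rhw < `mw1999')
by id (year): egen byte mid1998 = max(year == 1998 & rhw > `mw1999' & rhw <= 6.50)

* treat stays missing for ids above 6.50 (or exactly at the threshold) in 1998
gen byte treat = 1 if low1998
replace treat = 0 if mid1998

tab treat post, missing
```

Because treat is now constant within id, both rows of the cross-tabulation can have observations in the pre and post columns. Ids left with missing treat belong to neither group and would drop out of a regression that uses treat.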

        • #5
          Thanks Clyde, I will do that.
          Have a great day,
          Nico
