
  • Matching followed by diff-in-diff for policy evaluation

    Hello there, I am currently completing a project which evaluates the impact of a policy that came into effect in 2006. I have panel data covering 2000-2016, with an average of 15,000 people per 'wave'.

    The policy: changed the eligibility for welfare parental payments (pre-2006 a child could be up to 16 years old; post-2006 the child cannot be over 8 years old)
    My question: did this change to the eligibility affect the workforce participation of those who were receiving the payments pre-change?

    I've done a trial of the diff-in-diff using the below code

    Code:
    *wave==6 means 2006, which is when the change came into effect
    gen time=0
    replace time=1 if wave==6
    
    *receivingpayments is equal to 1 if they are, and 0 if not. 'childundereight' is equal to 1 if true and equal to 0 if not (referring to youngest child)
    gen treated=0
    replace treated=1 if receivingpayments==1&childundereight==0
    
    *lfp refers to labour force participation. It is 1 if in the labour force (unemployed or employed) and 0 if not.
    reg lfp time##treated, robust
    I'm not too sure about the above yet, as I think I need to add more variables to the regression first, so if you have any points on that it would be good. More importantly, I would like to know if I've set up the regression correctly?

    Also, prior to the diff-in-diff estimation it's been suggested I use
    Code:
    psmatch2
    to match up households to make the diff-in-diff more 'precise'. I wanted to get your opinion on this?

    Thank you very much in advance for any comments!

    Rebecca

  • #2
    Unless this policy came into effect in 2006 and then was repealed as of 2007, your specification of the time variable is incorrect. You need to have time = 0 in all years before 2006, and time = 1 in all years from 2006 on, not just in 2006.
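
    For example (an untested sketch, assuming wave runs 1-16 and wave 6 corresponds to 2006):
    Code:
    * time = 1 from 2006 (wave 6) onward, 0 in all earlier waves
    gen time = (wave >= 6)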

    I'm also skeptical about doing a linear regression of the dichotomous outcome lfp. It may be appropriate, but if the predicted probabilities of lfp are close to 0 or 1, this kind of linear model typically performs poorly. A logistic regression would overcome this limitation.

    If you decide to do a matched pairs analysis, then you must replace the -regress- or -logit- command by a command that will work appropriately with matched pair data, such as -xtreg, fe- or -xtlogit, fe-.
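
    For example, an untested sketch, where pairid is a hypothetical variable identifying each matched pair:
    Code:
    * declare the matched-pair grouping, then condition on it (pairid is hypothetical)
    xtset pairid
    xtlogit lfp i.time##i.treated, fe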

    As for whether you should do a matched pair analysis, it is true that using matched pairs can reduce extraneous variance in your analysis giving you better statistical power. But in practice, one sometimes finds that there are participants in your data who have no suitable match. Then you are faced with either excluding those unmatchables from the analysis, or matching them with somebody who is a very bad match. Either of those responses will undo, at least partially, the benefit you would derive from matching. So I think the answer is that it depends on how easy or hard it is to get a good match for your full data set.

    Comment


    • #3
      Many thanks Clyde for your responses.

      Just a note that wave refers to year, and that the policy came into effect in 2006. Eventually I am trying to do the diff-in-diff analysis based on a subsequent policy change that occurred in 2012. Basically, those who qualified for welfare payments prior to the 2006 changes were 'grandfathered' off them, but in 2012 that grandfathering provision was removed, meaning that everyone receiving welfare payments was subject to the same income testing. I'm not too sure if this is just a simple change to the 'time' variable below or not (to make it reference wave 12 instead of 6), but I suppose that can be my next question.

      The sample: all lone parents receiving this specific welfare in 2006.

      I have changed the code to be the below:

      Code:
       *create child between 8 and 15 in wave 6
      gen childagevar=0
      replace childagevar=1 if childage>=8&childage<=15&wave==6
      
      *create variables for diff-in-diff estimation
      gen time=0
      replace time=1 if wave<=6
      pause
      
      gen treated=0
      replace treated=1 if childagevar==1
      pause
      
      *diff-in-diff regression
      probit lfp time##treated, robust
      I get the below response:
      Code:
      note: 0.time#1.treated identifies no observations in the sample
      note: 1.time#1.treated omitted because of collinearity
      I think this might be because I require the children's age to be between 8 and 15 in wave 6? Do you know if there is a way around this?

      Again, thank you very very much for your help with this. I really appreciate it!

      Also I have decided that matching may not be feasible based on your response, as I only have around 300 people per wave after creating my sample.



      • #4
        I am now really confused about what you are doing here, and quite worried that your data are not at all suitable for this research.

        But first let's deal with your code, because that part is simpler.

        Your time variable is workable. It is conventional to make the time variable 1 in the post-intervention period and 0 before; you have it the other way. But that is OK so long as you remember that's what you did. Actually, to make sure it's clear to anybody seeing your work that that is what you did, I would create a value label and apply it to time:
        Code:
        label define time 0 "Post-policy" 1 "Pre-policy"
        label values time time
        That way, all of your outputs will show that variable as Pre-policy or Post-policy rather than 1 or 0, so it should be easier to keep things straight.

        The warning messages Stata is giving you are self-explanatory. The first one says that there is no observation in your data that has both time = 0 and treated = 1. You defined your treated variable so that it is identical to the childagevar variable. So I'm inferring that you have no data on children between 8 and 15 years of age in wave 6 in any time period after wave 6. The subsequent omission of 1.time#1.treated is a consequence of that: because it follows that 1.time#1.treated is just the same thing as 1.treated. So your model has collapsed from lack of supporting data.

        There is no way to work around this. It is a fatal flaw in your data. You cannot do a difference in differences analysis without all four combinations of treatment and time. That is, you must have distinguishable treatment and control groups, and in each group you must have data from both before and after the intervention took place. Without all four combinations, you cannot do a DID analysis. If you believe that you do actually have data for all four of these combinations, then something has gone wrong with your data management up to this point, as either some of the data has gone missing, or some of the variables have gotten messed up so that the identification of your treatment groups or time periods is incorrect. There are any number of ways that might have happened, so you will have to revisit the entire data management path up to this point to find out where it went wrong.
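
        A quick way to check this, as an untested sketch, is to cross-tabulate the two variables and confirm that all four cells are populated:
        Code:
        tab time treated, missing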

        Until you get that ironed out, I don't think there is any point in discussing how to fold the second 2012 intervention into things. But, looking forward to the possibility, you need to think carefully whether for present purposes the 2012 policy change should be thought of as the implementation of the same policy to a new group of people, or whether it represents a different, second intervention. Your description of it leads me to think you would consider it the former, but it's not 100% clear.

        Now, there is another problem I want to point out here. The combination of
        Basically, those who qualified for welfare payments prior to the 2006 changes were 'grandfathered' off them,
        and
        The sample: all lone parents receiving this specific welfare in 2006.
        has alarm bells ringing in my head. If I understand this correctly (and, again, ignoring the developments in 2012), your pool of subjects is all welfare recipients in 2006, your "treatment" group is new welfare recipients as of 2006, and the "control" group is people already receiving welfare before 2006. If I have that right, this may also be a serious problem for your study.

        I don't know how familiar you are with epidemiologic terminology. But, in that jargon, you are trying to compare an incident cohort to a prevalent cohort. The term incident refers to new cases, and prevalent means all cases, regardless of age, that are available at the time. Such cohorts are usually not suitable for comparisons because they often differ considerably on many important attributes. (Maybe this is why you were originally inclined to do matching.) In particular, prevalent cohorts exhibit "length bias" compared to incident cohorts. In the particular domain of welfare recipients, this would mean that the prevalent cohort will be a mix of long-term chronic welfare clients and some others who go on and off transiently, with a heavier weighting towards the former. By contrast, in the incident cohort, the mix will be more heavily weighted towards those who go on and off transiently, with fewer long-term chronic welfare clients. Long-term clients and transient clients represent very different mixes of behavioral and sociological attributes, and may also differ greatly on things like age, number of children, health, etc.

        At the very least, you cannot start out by presuming that a prevalent and incident cohort are comparable. You must, at a minimum, compare them on all observable attributes to see if they are appreciably different. And if they are, at a minimum, you have to do something to adjust for that difference. Matching might be one approach, if it is feasible. Even then, you may fail to adequately adjust for differences based on unobserved attributes.

        This is a messy situation and, in epidemiology, we go out of our way to avoid comparisons between incident and prevalent cohorts: it usually ends in tears. So, if I have properly understood your study design, I think you should do some very serious rethinking of your approach.
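
        As an untested sketch of that minimal comparison (cohort, age, and nchildren are hypothetical placeholders for your own variable names, with cohort = 1 for the incident group and 0 for the prevalent group):
        Code:
        * compare the two cohorts on each observable attribute
        foreach v of varlist age nchildren garincome {
            tabstat `v', by(cohort) statistics(mean sd n)
        }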



        • #5
          Firstly, thank you again very much for your thoughtful response. I really appreciate all your thoughts! I've added that label too, to make it clearer.

          For the first problem, I'm wondering if the variable 'childagevar' that I've generated will only show a '1' for those that are in wave 6 and have a child aged between 8-15, and that it will not put a '1' against all those with the same ID across all waves? Is there a way to make the criteria that I've set (i.e. age between 8 and 15 in wave 6) apply to all those with the ID to which it applies?

          You raise a lot of important questions about the policy. Maybe if I explain it better (apologies) it will be clearer.

          The policy impacted the Parenting Payment eligibility. It meant that instead of being able to receive payments up until your child was 16, you could only receive it up until they were 7. This was the eligibility for new applicants only, as those who were already receiving payments were 'grandfathered' off the policy. However in 2012, this 'grandfathering' was abolished (ideally this is what I am trying to measure the impact of).

          Control: those who were not immediately impacted by the change (child seven or below in 2006)
          Treatment: those who were immediately impacted by the change (child aged between eight and fifteen in 2006)

          So pre-2006 these groups would have plausibly been similar (subject to same income tests for eligibility of welfare, for example).



          • #6
            For the first problem, I'm wondering if the variable 'childagevar' that I've generated will only show a '1' for those that are in wave 6 and have a child aged between 8-15, and that it will not put a '1' against all those with the same ID across all waves? Is there a way to make the criteria that I've set (i.e. age between 8 and 15 in wave 6) apply to all those with the ID to which it applies?
            The code
            Code:
            gen childagevar=0
            replace childagevar=1 if childage>=8&childage<=15&wave==6
            will mark the 1 only in the wave 6 observation. That's precisely what the command says to do. You need, instead, to mark a 1, in all of their observations, for every family that has a child in that age range in wave 6, as you note. I assume your data set has a variable that identifies the families, and I will assume here that it's called familyid. This code will do what you need here:

            Code:
            by familyid, sort: egen treated = max(childage>=8&childage<=15&wave==6)
            Now your variable treated will be 1 in every observation of such a family. By the way, you might want to abbreviate the -childage >= 8&childage<=15- part with -inrange(childage, 8, 15)-. It will work in exactly the same way but is probably easier to read and understand.

            I'm not sure I fully grasp the way your data is organized, but I think that at least you will get your DID analysis to run with this new calculation of the treated variable. Anyway, why don't you verify that you've got the kinks out of that part, and then maybe we can see about extending the analysis to include the 2012 extension of the policy.



            • #7
              Hi Clyde, that worked great - thank you again!
              Here is my code so far-
              Code:
              program drop _all
              clear
              set more off
              capture log close
              pause on
              
              *Directories from which to read data
              local readdatadir "\\EMP\Home$\RC3132\Desktop\WtWProjects\mydata"
              local writedatadir "\\EMP\Home$\RC3132\Desktop\WtWProjects\mydata"
              local logdatadir "\\EMP\Home$\RC3132\Desktop\WtWProjects"
              
              log using `logdatadir'/Analyze_WtW.log, replace
              
              use "`readdatadir'/allWtW.dta", clear
              
              ***********************************************
              /* Create sample                             */
              ***********************************************
              
              *getting rid of those in the population who were not receiving payment pre-2006. Here bncpari=amount of payments received
              bysort xwaveid (wave): gen tokeep = sum(wave<6&bncpari>1)
              by xwaveid : keep if tokeep[_N]
              pause
              tab hgsex wave
              pause
              
              *getting rid of those in the population who are not eligible for PPS in wave 6. hhtype=household type
              gen loneparent_hh=0
              replace loneparent_hh=1 if hhtype > 12 & hhtype < 22
              label var loneparent_hh "1 if lone parent household"
              
              bysort xwaveid (wave): gen tokeep1 = sum(wave==6&loneparent_hh==1)
              by xwaveid : keep if tokeep1[_N]
              pause
              tab hgsex wave
              pause
              
              drop if tcyng<0
              
              ***********************************************
              /* Create variables                           */
              ***********************************************
              
              *create labour force participation variable (inc employed and unemployed)
              gen lfp=0
              replace lfp=1 if esbrd==1 | esbrd==2
              label var lfp "Labour Force Participation"
              pause
              *create aboriginal or torres strait islander variable
              gen AOTSI=0
              replace AOTSI=1 if anatsi==2|anatsi==3|anatsi==4
              label var AOTSI "1 if Aboriginal or Torres Strait Islander"
              *create variable for married
              gen Married=0
              replace Married=1 if mrcurr==1
              label var Married "1 if married"
              *create variable for born overseas
              gen bornoverseas=0
              replace bornoverseas=1 if ancob>=1102|ancob<1000
              label var bornoverseas "1 if born overseas"
              *create gross annual regular income variable
              *note: tifefp is equal to financial year gross regular income (positive values) and tifefn is equal to financial year gross regular income (neg values)
              gen garincome=tifefp-tifefn
              label var garincome "Gross regular income"
              pause
               
              
              ***********************************************
              /* Descriptive statistics table              */
              ***********************************************
              summarize AOTSI
              summarize Married
              summarize bornoverseas
              summarize hgsex
              summarize tcyng
              summarize lfp
              
              *insert here how to export to ShareLatex (when find)*
              
              
              ***********************************************
              /* Diff-in-diff estimation                   */
              ***********************************************
              * create treated group
              by xwaveid, sort: egen treated=max(tcyng>=8&tcyng<=15&wave==6)
              *create variables for diff-in-diff estimation
              gen time=0
              replace time=1 if wave<=6
              pause
              
              *diff-in-diff regression for labour force outcome
              probit lfp time##treated, robust
              *diff-in-diff regression for income
              reg garincome time##treated, robust
              The results from the diff-in-diff were a coefficient of 0.0864552 on time#treated, with a p-value of 0.251 (I'm assuming from the small sample size, as there are only around 350 responding people now).

              And the identifier as you'll see above is 'xwaveid' which is just the person id.

              The var 'childage' is equal to 'tcyng' in my data (I had just used childage for ease of reading on here). When I defined treated like this:
              Code:
              by xwaveid, sort: egen treated=max((tcyng, 8, 15)&wave==6)
              It didn't work, but I'm okay to have it how it was before if they mean the same! I will remember your tip for future though - much appreciated again.

              Hopefully this makes things a bit clearer so that we might be able to change the time we're looking at to 2012? I really really appreciate your time Clyde.



              • #8
                Code:
                by xwaveid, sort: egen treated=max((tcyng, 8, 15)&wave==6)
                didn't work because you left out the -inrange- function:
                Code:
                by xwaveid, sort: egen treated=max(inrange(tcyng, 8, 15)&wave==6)
                Glad things are working out now.



                • #9
                  Here's how I think I would approach adding 2012 into this, based on my possibly incorrect understanding of what happened. It seems to me that with the 2006 and 2012 policy changes we can define three groups. The full controls are those who were not affected by the new policy at either time. Then there is a group that was affected in 2012 but not 2006, and finally there is the group that was affected already in 2006. (I assume that those affected by the 2006 change remained affected after 2012--correct?). We would also have three distinct time periods, pre 2006, 2006 through 2011, and 2012 forward.

                  Would I be correct in assuming that the criterion for being affected in 2012 is that a) you are not already cut off under the 2006 policy, and b) you had a child over the age of 7 in 2012?

                  Code:
                  label define era 0 "Pre 2006" 1 "2006-2011" 2 "2012 onward"
                  gen byte era:era = 0 if year < 2006
                  replace era = 1 if inrange(year, 2006, 2011)
                  replace era = 2 if year >= 2012 & !missing(year)
                  
                  label define group 0 "Control" 1 "Affected starting 2012" 2 "Affected starting 2006"
                  gen byte group:group = 0
                  by xwaveid, sort: egen susceptible2006 = max(inrange(tcyng, 8, 15) & wave == 6)
                  by xwaveid: egen susceptible2012 = max(inrange(tcyng, 8, 15) & wave == 12)
                  replace group = 1 if !susceptible2006 & susceptible2012
                  replace group = 2 if susceptible2006
                  Now you can do your regressions. And you should follow them up with margins to make interpretation easier:

                  Code:
                  regress income i.era##i.group, robust
                  margins era#group // EXPECTED INCOME EACH GROUP IN EACH ERA
                  margins group, dydx(era) // CHANGE IN INCOME AT TIMES 2006 & 2012 IN EACH GROUP
                  
                  
                  probit lfp i.era##i.group, robust
                  margins era#group // PREDICTED PR(LFP) IN EACH GROUP IN EACH ERA
                  margins group, dydx(era) // CHANGE IN PR(LFP) AT TIMES 2006 & 2012 IN EACH GROUP
                  Reminder that none of the above code is tested; it may contain typos or substantive errors.

                  Now, one important question. You have panel data, right? Each family is observed in multiple years. So why are you using -regress- and -probit- instead of -xtreg- and -xtprobit-? Did you try the -xt- commands and get results suggesting that the panel structure is ignorable?
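
                  For example, an untested sketch (the random-effects specification is an assumption; adjust as needed):
                  Code:
                  * declare the panel structure once, then use the panel estimators
                  xtset xwaveid wave
                  xtreg garincome i.era##i.group, re vce(robust)
                  xtprobit lfp i.era##i.group, re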



                  • #10
                    Hi Clyde,

                    I've attached the different results for the first regression with xtreg vs. reg and xtprobit vs. probit. There was no reason for me not to use 'xt' (as I do have panel data); I've changed the code now to use 'xtprobit' when the dependent variable can only equal 0 or 1, and 'xtreg' when the dependent variable is not binary. Thank you for bringing that to my attention! I will be adding the control variables to this model today as well, which I was thinking could be age, number of children, housing tenure and rural (compared to regional). Hopefully I can share that with you tomorrow!

                    In relation to the code you've written in that last post - thank you very very much.

                    The regressions of the code you wrote have regress and probit towards the end - when I changed the probit one to xtprobit it said that 'robust option is not allowed'. Would you recommend taking off the 'xt' or dropping the 'robust' instead?

                    I've run it; I thought it might just be easier to share the output here. It looks like making era = 1 or era = 2 made no change for some reason?

                    Code:
                    . label define era 0 "Pre 2006" 1 "2006-2011" 2 "2012 onward"
                    
                    . gen byte era:era = 0 if wave < 2006
                    
                    . replace era = 1 if inrange(wave, 2006, 2011)
                    (0 real changes made)
                    
                    . replace era = 2 if wave >= 2012 & !missing(wave)
                    (0 real changes made)
                    
                    .
                    . label define group 0 "Control" 1 "Affected starting 2012" 2 "Affected starting 2006"
                    
                    . gen byte group: group = 0
                    
                    . by xwaveid, sort: egen susceptible2006 = max(inrange(tcyng, 8, 15) & wave == 6)
                    
                    . by xwaveid: egen susceptible2012 = max(inrange(tcyng, 8, 15) & wave == 12)
                    
                    . replace group = 1 if !susceptible2006 & susceptible2012
                    (1,176 real changes made)
                    
                    . replace group = 2 if susceptible2006
                    (2,205 real changes made)
                    
                    .
                    . reg garincome i.era##i.group, robust
                    note: 0.era omitted because of collinearity
                    note: 0.era#1.group omitted because of collinearity
                    note: 0.era#2.group omitted because of collinearity
                    
                    Linear regression                               Number of obs     =      4,931
                                                                    F(2, 4928)        =      46.39
                                                                    Prob > F          =     0.0000
                                                                    R-squared         =     0.0231
                                                                    Root MSE          =      21700
                    
                    --------------------------------------------------------------------------------------------------
                                                     |               Robust
                                           garincome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    ---------------------------------+----------------------------------------------------------------
                                                 era |
                                           Pre 2006  |          0  (omitted)
                                                     |
                                               group |
                             Affected starting 2012  |    8789.75   919.2045     9.56   0.000       6987.7     10591.8
                             Affected starting 2006  |     2147.9   673.5837     3.19   0.001     827.3759    3468.424
                                                     |
                                           era#group |
                    Pre 2006#Affected starting 2012  |          0  (omitted)
                    Pre 2006#Affected starting 2006  |          0  (omitted)
                                                     |
                                               _cons |   33957.66   523.3577    64.88   0.000     32931.65    34983.68
                    --------------------------------------------------------------------------------------------------
                    
                    . *Expected income each group in each era
                    . margins era#group
                    
                    Adjusted predictions                            Number of obs     =      4,931
                    Model VCE    : Robust
                    
                    Expression   : Linear prediction, predict()
                    
                    --------------------------------------------------------------------------------------------------
                                                     |            Delta-method
                                                     |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    ---------------------------------+----------------------------------------------------------------
                                           era#group |
                                   Pre 2006#Control  |          .  (not estimable)
                    Pre 2006#Affected starting 2012  |          .  (not estimable)
                    Pre 2006#Affected starting 2006  |          .  (not estimable)
                    --------------------------------------------------------------------------------------------------
                    
                    . *Change in income at times 2006 & 2012 in each group
                    . margins group, dydx(era)
                    (note: continuous option implied because a factor with only one level was specified in the dydx() option)
                    
                    Conditional marginal effects                    Number of obs     =      4,931
                    Model VCE    : Robust
                    
                    Expression   : Linear prediction, predict()
                    dy/dx w.r.t. : 0.era
                    
                    -----------------------------------------------------------------------------------------
                                            |            Delta-method
                                            |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    ------------------------+----------------------------------------------------------------
                    0.era                   |
                                      group |
                                   Control  |          .  (not estimable)
                    Affected starting 2012  |          .  (not estimable)
                    Affected starting 2006  |          .  (not estimable)
                    -----------------------------------------------------------------------------------------
                    
                    .
                    .
                    . probit lfp i.era##i.group, robust
                    
                    note: 0.era omitted because of collinearity
                    note: 0.era#1.group omitted because of collinearity
                    note: 0.era#2.group omitted because of collinearity
                    Iteration 0:   log pseudolikelihood = -3245.8488
                    Iteration 1:   log pseudolikelihood = -3222.2859
                    Iteration 2:   log pseudolikelihood =  -3222.284
                    Iteration 3:   log pseudolikelihood =  -3222.284
                    
                    Probit regression                               Number of obs     =      4,931
                                                                    Wald chi2(2)      =      47.14
                                                                    Prob > chi2       =     0.0000
                    Log pseudolikelihood =  -3222.284               Pseudo R2         =     0.0073
                    
                    --------------------------------------------------------------------------------------------------
                                                     |               Robust
                                                 lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    ---------------------------------+----------------------------------------------------------------
                                                 era |
                                           Pre 2006  |          0  (omitted)
                                                     |
                                               group |
                             Affected starting 2012  |   .2115157    .049271     4.29   0.000     .1149463    .3080851
                             Affected starting 2006  |   .2865909   .0422984     6.78   0.000     .2036875    .3694943
                                                     |
                                           era#group |
                    Pre 2006#Affected starting 2012  |          0  (omitted)
                    Pre 2006#Affected starting 2006  |          0  (omitted)
                                                     |
                                               _cons |   .1591529   .0319844     4.98   0.000     .0964647    .2218412
                    --------------------------------------------------------------------------------------------------
                    
                    . margins era#group // EXPECTED INCOME EACH GROUP IN EACH ERA
                    
                    Adjusted predictions                            Number of obs     =      4,931
                    Model VCE    : Robust
                    
                    Expression   : Pr(lfp), predict()
                    
                    --------------------------------------------------------------------------------------------------
                                                     |            Delta-method
                                                     |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    ---------------------------------+----------------------------------------------------------------
                                           era#group |
                                   Pre 2006#Control  |   .5632258   .0125993    44.70   0.000     .5385315    .5879201
                    Pre 2006#Affected starting 2012  |   .6445578    .013959    46.17   0.000     .6171986    .6719171
                    Pre 2006#Affected starting 2006  |   .6721088   .0099983    67.22   0.000     .6525126    .6917051
                    --------------------------------------------------------------------------------------------------
                    
                    . margins group, dydx(era) // CHANGE IN INCOME AT TIMES 2006 & 2012 IN EACH GROUP
                    (note: continuous option implied because a factor with only one level was specified in the dydx() option)
                    
                    Conditional marginal effects                    Number of obs     =      4,931
                    Model VCE    : Robust
                    
                    Expression   : Pr(lfp), predict()
                    dy/dx w.r.t. : 0.era
                    
                    -----------------------------------------------------------------------------------------
                                            |            Delta-method
                                            |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    ------------------------+----------------------------------------------------------------
                    0.era                   |
                                      group |
                                   Control  |          0  (omitted)
                    Affected starting 2012  |          0  (omitted)
                    Affected starting 2006  |          0  (omitted)
                    -----------------------------------------------------------------------------------------
                    
                    .
                    end of do-file
                    
                    .
                     Thank you again, Clyde. I'm learning a lot and really appreciate your time!


                    • #11
                      OK. As between omitting xt and omitting robust, I would be inclined to retain the xt. I would go back to ordinary -probit- only if the output from -xtprobit- indicated the absence of variance at the panel level.
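
                      A minimal sketch of that check, using the variable names from the output above (the panel identifier personid is hypothetical; substitute whatever identifies individuals in your data):
                      Code:
                      * fit the random-effects model and inspect the panel-level variance
                      xtset personid wave        // personid is a placeholder panel identifier
                      xtprobit lfp i.era##i.group
                      * -xtprobit- reports rho (the share of total variance at the panel
                      * level) and a likelihood-ratio test of rho = 0 at the foot of its
                      * output; if that test does not reject, little is lost by falling
                      * back to ordinary -probit-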

                      That said, I think I made a mistake on the definition of era that you then copied. I did not expect to see the variable era omitted from the regressions, and all the interaction terms along with it. I see now where I went wrong. I had forgotten that wave is not the same as the year, so my definition of era confused those. I was thinking of year 2006 and year 2012, but the corresponding values of wave in your data are 6 and 12. So it should be
                      Code:
                      gen byte era:era = 0 if wave < 6
                      replace era = 1 if inrange(wave, 6, 11)
                      replace era = 2 if wave >= 12 & !missing(wave)
                      By using wave < 2006 as the criterion for era, we ended up with era = 0 in all observations, because wave never even gets close to 2006. And it was all downhill from there.

                      So you'll need to redo the regressions. I'm sorry. But at least the outputs when you redo them should be interpretable!
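
                      A sketch of the redo, assuming the same model specification as in the output posted above (swap in -xtprobit- if you retain the panel-level random effect):
                      Code:
                      * sanity check first: with era defined from wave, all three levels
                      * should now actually appear in the data
                      tab era group, missing
                      probit lfp i.era##i.group, vce(robust)
                      margins era#group
                      margins group, dydx(era)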
