Creating treatment and control groups for regression model

Rene Natasha

Join Date: Apr 2019

Posts: 52
#1

18 Feb 2020, 18:11

This is a continuation of my previous post Grouping values within a variable that Clyde Schechter has been helping me with.

My question is really about how to create my control and treatment groups now that I have grouped values within a variable.

I grouped the states that expanded Medicaid into a 1/0 variable -statemedicaid-

Now that is created, I want to make sure that my treatment and control groups are accurately created.

The treatment condition is the presence of an FQHC (the variable is 1/0)
The unit of analysis is the county

This is how I am thinking about my control and treatment groups

medicaid states:
1 control group that did not have an fqhc in the county
1 treatment group that had an FQHC

non-medicaid state:
1 control group that did not have an fqhc
1 treatment group that did have an FQHC

As I previously mentioned, in post #20 of Creating a year identifier you helped me create the -ever_had_FQHC- and -wanted- variables. These variables were created with the thought that they would help with the pre-post period.
Tags: data, panel data, regression, treatment
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

18 Feb 2020, 19:26

I'm still not completely clear what you want. When you say "did not have an FQHC" or "had an FQHC" are you referring to the already existing variable FQHC, from which the variable ever_had_FQHC was originally derived? That is, does this mean "did (or did not, as the case may be) have an FQHC at the time recorded in this particular observation" or does this mean "did have an FQHC at some point in time, though it may or may not have had one at the time recorded in this particular observation"? If the former, it's the variable FQHC itself, and if the latter, it's the variable ever_had_FQHC. So I think either way you don't need any new variables to move forward.

You have four groups defined by the combinations of FQHC (or ever_had_FQHC, if that is what you mean by treatment) and statemedicaid. Both of these are 0/1 variables, so it is possible to combine them into a single variable:

Code:

label define combined_group 0 "Control - No Medicaid" ///
    1 "Control - Medicaid" ///
    2 "Treatment - No Medicaid" ///
    3 "Treatment - Medicaid"
gen byte combined_group:combined_group = 2*FQHC + statemedicaid

if you want. (If by treatment you actually mean the variable ever_had_FQHC, just replace FQHC by ever_had_FQHC in that -gen- command.)

But that may or may not actually be useful for you, depending on what you want to do going forward. You might find that it is better to continue to work with the two variables separately, and also bring their interaction FQHC#statemedicaid into whatever modeling you will be doing with these. These are two different ways of using the information--ultimately, anything you can do with one can also be done with the other and get the same results, but depending on what you plan to do, one of these will be easier to work with than the other. Not knowing where you are headed with this, I can't say which.
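To illustrate the equivalence of the two approaches, here is a minimal sketch using a plain -regress-. It is not the panel model discussed in this thread; outcome_var is a placeholder name, and it assumes combined_group has already been created with the -gen- command above.

```stata
* Sketch only: outcome_var is a placeholder for whatever outcome is analyzed.

* Parameterization 1: one four-level group variable
regress outcome_var i.combined_group

* Parameterization 2: the two 0/1 variables plus their interaction
regress outcome_var i.FQHC##i.statemedicaid

* After parameterization 2, the contrast of "Treatment - Medicaid" against
* "Control - No Medicaid" is the sum of the three coefficients, which matches
* the coefficient of 3.combined_group from parameterization 1.
lincom 1.FQHC + 1.statemedicaid + 1.FQHC#1.statemedicaid
```

Both regressions fit the same four group means, so the fitted values and R-squared agree; only the way the coefficients are labeled differs.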

If I am not correct in my understanding of what you mean by treatment, then please post back and give a more detailed explanation and I'll try to work with it.
Rene Natasha

Join Date: Apr 2019

Posts: 52
#3

19 Feb 2020, 22:09

Thanks Clyde for being so patient and helpful in thinking through this. I revisited a previous post on creating the year identifier and we delved a bit into this so I may have already answered my question. I outlined a few thoughts about this plus previous feedback you have given me. I may ramble so please forgive me.

Goal: I want to use this dataset for studying policy effect (the presence of an FQHC) on community outcomes (change in ER visits and number of ambulatory facilities) using a generalized DiD.

In a previous topic (post #9) you recommended:

In brief, you will not have a prepost variable for this analysis. Nor will you have a treat vs no treat variable. Instead you need a treatment_in_effect variable that is 1 in exactly those observations where the entity in question is receiving the treatment, and 0 in all other observations. This variable is, in fact, equivalent to the treat#prepost interaction in a classical DID analysis. When you run the regression, you regress on this special variable and you also include indicator variables for the entities themselves (which I guess in your case are states or counties or something like that) and indicator variables for the years. The coefficient of this treatment_in_effect variable is the generalized DID estimator of the intervention's effect on the outcome.

This was recommended because for the DiD there is no one time period where an FQHC was created at the same time across all counties (unit of observation).

I am trying to create the variables for the model conditions below:

Outcome = county FE + year FE + β(treatment*year)+ β(treatment*statemedicaid) + control var + error

in other words,

Outcomes var = i.n_county FE + i.year FE + β(FQHC*year) + L.FQHC + FQHC#statemedicaid + control vars + error

I have thought about the variables I have created thus far to get to this point:

Code:

xtreg outcome_var L.fqhc i.year, i.n_county fqhc#year fqhc#statemedicaid control vars fe

where,

L.FQHC = the lag time associated with when an FQHC opened in the county by one year
i.year = year fixed effect
i.n_county = numeric variable of counties fixed effects
fqhc#statemedicaid = the interaction term for fqhc and whether a state expanded medicaid or not
fqhc#year = the interaction variable in county with FQHC during time period when it is open (which is the treatment -in-effect term)

As you mentioned in your post: The statemedicaid 1/0 variable and FQHC 1/0 variable already provides the treatment vs no treatment groupings I was talking about. And the treatment-in-effect term/variable would be FQHC*

However, it is unclear to me how the time at which an FQHC opened in a county is taken into consideration in my model. I am having a hard time conceptualizing this.

I have thought about methods I could use beside Diff-in-Diff, maybe a Lagged dependent variable (LDV) regression approach. A concern I have is that my control/treatment groups are not stable or I would have a hard time determining the control/treatment results.
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#4

19 Feb 2020, 23:31

Let's put aside L.FQHC for a moment and return to it at the end.

You have the ingredients you need. The basic analysis would be

Code:

xtset n_county
xtreg outcome_var i.year i treatment_in_effect, fe

Since you also want to distinguish the possibility of different effects in Medicaid and non Medicaid expanding states, you extend the model to:

Code:

xtreg outcome_var i.year i.treatment_in_effect##i.statemedicaid, fe
margins statemedicaid, dydx(treatment_in_effect)

You can add other covariates ("control variables") as you see fit to this model.

The variable treatment_in_effect was originally defined to be 1 for a county that gets a FQHC in those years where the FQHC is open, and 0 otherwise. It is your generalized DID effect variable. The interaction with statemedicaid will enable your model to estimate separate treatment effects for the Medicaid and non-Medicaid expanding counties. The -margins- command will show you the treatment effect in those two groups of counties. Time is represented by i.year, and county is represented by -fe-.
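As a cross-check on what -margins- reports, the two group-specific effects can also be read off the coefficients with -lincom-. This is only a sketch; it assumes treatment_in_effect and statemedicaid already exist as 0/1 variables and that the interaction model above is the one being fit.

```stata
* Sketch only: variable names follow the thread; adapt to your data.
xtreg outcome_var i.year i.treatment_in_effect##i.statemedicaid, fe

* Treatment effect where statemedicaid == 0
lincom 1.treatment_in_effect

* Treatment effect where statemedicaid == 1:
* main effect of treatment plus the interaction coefficient
lincom 1.treatment_in_effect + 1.treatment_in_effect#1.statemedicaid
```

Because the model is linear, these two quantities should agree with the two rows of the -margins, dydx()- output.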

Let's return to L.fqhc.

L.FQHC = the lag time associated with when an FQHC opened in the county by one year

I'm not sure what that is supposed to mean, but I have difficulty matching it up with what L.fqhc really is. L.fqhc is nothing more or less than the value of fqhc in the same county in the preceding year. The inclusion of lagged variables like this is sometimes helpful and sometimes not, depending on circumstances that depend on the scientific substance--it's not a statistical question, it's a health services question whether to include that. And while I have some experience in health services research, it's not related to the kind of question you are posing in this research and I can't advise you.
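To see concretely what L.fqhc contains, one can materialize it as a variable and look at it next to fqhc. This is a sketch that assumes one observation per county-year and a numeric year variable; fqhc_lag1 is a name made up for illustration.

```stata
* Sketch only: fqhc_lag1 is a hypothetical name.
* Time-series operators such as L. need a time variable, so -xtset- must
* name both the panel variable and the time variable.
xtset n_county year

* Value of fqhc in the same county in the preceding year
gen byte fqhc_lag1 = L.fqhc

* Inspect a few rows, with a separator line between counties
list n_county year fqhc fqhc_lag1 in 1/10, sepby(n_county)
```

The lag is missing in each county's first year (and after any gap in years), so those observations drop out of any regression that includes L.fqhc.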
Rene Natasha

Join Date: Apr 2019

Posts: 52
#5

20 Feb 2020, 17:56

Thanks Clyde!

Did you mean:

Code:

xtset n_county
xtreg outcome_var i.year i.treatment_in_effect, fe

NOT,

Code:

xtset n_county
xtreg outcome_var i.year i treatment_in_effect, fe

where treatment_in_effect is my FQHC#year term. Should I just create a variable called treateffect?

Code:

replace fqhc = 0 if missing(fqhc)
by n_county (year), sort: gen treateffect= (fqhc== 1 & year ==1)

so that I can just replace fqhc#year with -treateffect- var

Also based on this, I no longer need to include vars -ever_had_fqhc- nor -wanted- for the model unless I want to do another type of analysis.

Also, just to clarify the use of the lag was to determine the true effect of the FQHC in the county given the assumption that the year they open in the county does not necessarily mean that effect happens immediately and there could be a year before the effect/impact of the FQHC presence can be measured. That is what I was thinking.
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#6

20 Feb 2020, 18:09

Yes to all your questions. Sorry about the typo in the code.
Rene Natasha

Join Date: Apr 2019

Posts: 52
#7

20 Feb 2020, 19:26

And since county is the unit of analysis: would I have to use the c. prefix for the control variables and not the i. prefix, but keep the i. prefix for the variables mentioned above? I had to create proportion variables for characteristics such as race and education, and since they are not categorical, it was recommended to use the c. prefix in the regression model.

Hope that is clear.

Last edited by Rene Natasha; 20 Feb 2020, 19:30.
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#8

20 Feb 2020, 19:52

Well, let me not make a generalization about the control variables, as I am not sure I know what all of your control variables are.

But it is safe to say that any variable that is continuous gets the c. prefix. The i. prefix is used for categorical variables. So a variable indicating whether a person is female would get the i. prefix, but a variable showing the proportion of all people in the county who are female would get the c. prefix.
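As a sketch of how the prefixes look in a command, here is the basic model with one continuous control added. prop_female is a hypothetical variable name standing in for a county-level proportion; it does not come from the actual data set.

```stata
* Sketch only: prop_female is a hypothetical county-level proportion.
* i. marks categorical variables (year, the 0/1 treatment indicator);
* c. marks continuous ones (the proportion).
xtreg outcome_var i.year i.treatment_in_effect c.prop_female, fe
```

For a variable entered on its own, Stata already treats it as continuous when no prefix is given, so c. matters mainly inside interactions written with # or ##.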
Rene Natasha

Join Date: Apr 2019

Posts: 52
#9

20 Feb 2020, 20:58

In thinking through the treatment_in_effect variable -treateffect-, I think my code is incorrect. The var -year- is not a 1/0 variable, so the condition I set on year cannot be right.

Code:

replace fqhc = 0 if missing(fqhc)
by n_county (year), sort: gen treateffect= (fqhc== 1 & year ==1) // THE CONDITION year ==1 IS NOT CORRECT

Also, if the treatment_in_effect variable is 1 for a county that gets an FQHC in those years where the FQHC is open, and 0 otherwise, should the treatment_in_effect variable then be one of the variables previously created, -ever_had_fqhc-, where the code was:

Code:

// FQHC VAR IS CATEGORIZED AS 0 FOR WHEN MISSING AND ZERO.
replace fqhc = 0 if missing(fqhc)
// -ever_had_fqhc-: 1 = for all obs of any n_county that has had 1 or more FQHC's at any point in time
// and 0 for all obs of any n_county that never has any FQHC's
by n_county (year), sort: egen ever_had_fqhc = max(fqhc > 0)

So the ever_had_fqhc##year would serve as the i.treatment_in_effect variable in the model.

[Stata 12 on MAC OS]
Rene Natasha

Join Date: Apr 2019

Posts: 52
#10

21 Feb 2020, 12:53

Hi Clyde Schechter I hope my follow up made sense? Thanks
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#11

21 Feb 2020, 13:23

Yes. This looks better. It's been a while since the original thread and I didn't correctly remember the different variables in play. But what you say in #9 is, I believe, correct.
