
  • Restricting the Sample (Econometrics Question)

    Hi everyone,

    I am researching how ethnic and racial disparities in health insurance coverage and health status changed as a result of the Affordable Care Act. My outcomes are two binary variables: insurance status (uninsured/insured) and health (positive health/not positive health). I am using a difference-in-differences framework with linear probability models, using data from 2009 and 2014 and interactions between a 2014 indicator and the two minority groups. I have a combined sample of about 200,000 observations from the CDC's National Health Interview Survey.

    havecoverage = postACAHispanic + postACAblack + Hispanic + black + postACA + female + agegroup* + USregion*
    goodhealth = postACAHispanic + postACAblack + Hispanic + black + postACA + female + agegroup* + USregion*

    where the * represents k-1 levels of that variable (i.e. all U.S. regions except for one and all age groups except for one).

    My goal is to compare Hispanics with non-Hispanic whites, and non-Hispanic blacks with non-Hispanic whites. My question is whether I should include both dummies (i.e. the one for black and the one for Hispanic) in the same equation, and also whether I should restrict the sample to just these three categories (i.e. drop respondents of any other ethnicity or race).

    I will also be including covariates like marital status and employment status in both equations. Should I then restrict the sample accordingly (in this case, to respondents aged 18+ only)?

    Thanks in advance for your responses.

  • #2
    I have several suggestions.

    1. While it probably won't take much effort on your part to use the good health dichotomous outcome, it is probably a waste of time. It is already known that having access to health insurance produces only subtle improvements in health over a period of 5 years. Your situation is even more borderline because you are actually just looking at single points in time, and have no longitudinal data. And classifying health as good or not good is an extraordinarily noisy way of doing it even with an optimal data design. It is hard to imagine that you can show anything with this variable, and if you do get results from it, they would really not be credible at all, at least not in the health policy sphere.

    2. It is unclear whether your goal is to separately assess the changes in insurance coverage (and health) following adoption of the ACA within each ethnic group, or whether you also want to compare those changes to each other. If it is the former only, a separate analysis of each group's data, without variables representing ethnicity, would be a reasonable (though not necessarily the best) approach. If, however, you want to do comparisons of the effects in different ethnic groups, then you should do a single analysis that includes an ethnicity variable, a pre-ACA vs post-ACA variable and the interaction(s) between them. That is something similar to what you show in #1.

    3. Use factor variable notation. Do not calculate your own interaction terms as products. Rely on Stata to do it for you, and you will be rewarded with the ability to use the -margins- command afterward. This, by the way, also entails not creating separate indicator variables for each ethnicity, but rather having a single variable, ethnicity, with multiple levels. Again, Stata will create "virtual" indicator variables for you. So the general scheme would be:

    Code:
    regression_command outcome i.ethnicity##i.pre_post covariates
    margins ethnicity#pre_post // MEAN ADJUSTED OUTCOME IN EACH GROUP BEFORE & AFTER ACA
    margins ethnicity, dydx(pre_post) // MEAN CHANGE AFTER - BEFORE IN EACH GROUP
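
    To make the scheme concrete, here is a rough sketch of how it might look for the insurance outcome in #1. The variable names (nhwhite, hispanic, black, year, usregion, and the new variables ethnicity and post_aca) are my assumptions, not names from your data set, and the robust standard errors are simply a common choice for linear probability models, so adapt as needed:

    Code:
    * ASSUMED 0/1 INDICATORS: nhwhite, hispanic, black (hypothetical names)
    generate byte ethnicity = .
    replace ethnicity = 0 if nhwhite == 1                  // REFERENCE GROUP
    replace ethnicity = 1 if hispanic == 1
    replace ethnicity = 2 if black == 1 & hispanic == 0
    label define eth 0 "NH white" 1 "Hispanic" 2 "NH black"
    label values ethnicity eth

    * 0 = 2009, 1 = 2014; MISSING FOR ANY OTHER YEAR
    generate byte post_aca = (year == 2014) if inlist(year, 2009, 2014)

    * OBSERVATIONS WITH MISSING ethnicity OR post_aca ARE OMITTED FROM ESTIMATION
    regress havecoverage i.ethnicity##i.post_aca i.female i.agegroup i.usregion, vce(robust)

    margins ethnicity#post_aca         // ADJUSTED COVERAGE RATE IN EACH GROUP, 2009 & 2014
    margins ethnicity, dydx(post_aca)  // CHANGE 2014 - 2009 IN EACH GROUP

    * IN THIS LINEAR MODEL THE INTERACTION COEFFICIENTS ARE THE DIFFERENCE-IN-DIFFERENCES
    lincom 1.ethnicity#1.post_aca      // HISPANIC CHANGE MINUS NH WHITE CHANGE
    lincom 2.ethnicity#1.post_aca      // NH BLACK CHANGE MINUS NH WHITE CHANGE

    Because ethnicity is coded only for the three groups of interest, both comparisons against non-Hispanic whites come out of one regression, with non-Hispanic whites as the base level.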
