
  • Difference-in-differences with state and year fixed effects

    I have data on volunteer hours from 2010-2015, and I am looking at how the Medicaid expansion that took effect in 2014 has affected volunteer hours. I would like to know what code to use in Stata to run a difference-in-differences regression comparing the states that expanded in 2014 with those that did not, with population as a control variable. My teacher told me to use a state and year fixed-effects regression, which I don't know how to do and can't figure out at the moment. This is just a sample of what my data look like; I have it for all of the states. Any guidance on how to proceed would be helpful.
    State     Year  VolunteerHrs  Population  Treat1
    Alabama   2010  106.88593     4785579     0
    Alabama   2011  100.810044    4798649     0
    Alabama   2012  102.936671    4813946     0
    Alabama   2013  116.417864    4827660     0
    Alabama   2014  125.985802    4840037     0
    Alabama   2015  98.70755      4850858     0
    Florida   2010  434.844492    18846461    0
    Florida   2011  458.679654    19097369    0
    Florida   2012  475.948588    19341327    0
    Florida   2013  438.743281    19584927    0
    Florida   2014  495.717974    19897747    0
    Florida   2015  444.646822    20268567    0
    Arkansas  2010  53.0621463    2921737     0
    Arkansas  2011  59.2638898    2938640     0
    Arkansas  2012  45.6552647    2949208     0
    Arkansas  2013  55.5923406    2956780     0
    Arkansas  2014  59.0693401    2964800     1
    Arkansas  2015  48.5332943    2975626     1
    Colorado  2010  163.956951    5048029     0
    Colorado  2011  144.91652     5116411     0
    Colorado  2012  160.310765    5186330     0
    Colorado  2013  122.830502    5262556     0
    Colorado  2014  159.511402    5342311     1
    Colorado  2015  148.322003    5440445     1

  • #2
    If I have my facts straight, not all states that undertook Medicaid expansion did so in the same year. If that is correct, then your situation does not lend itself to a classical difference-in-differences analysis. You can, instead, use a generalized difference-in-differences design.
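    A minimal sketch of how the treatment indicator behaves when adoption is staggered: it switches on in each state's own expansion year. This assumes a hypothetical variable exp_year holding each state's expansion year (missing for states that never expanded in the sample period); nothing in the thread defines such a variable.

```stata
* Sketch: building Treat1 when states expanded in different years.
* exp_year is HYPOTHETICAL: each state's expansion year, missing (.) for
* states that had not expanded during the sample period.
clear
input str10 State int Year int exp_year
"Alabama"  2013    .
"Alabama"  2014    .
"Arkansas" 2013 2014
"Arkansas" 2014 2014
"Arkansas" 2015 2014
end

* Treat1 = 1 in an expansion state from its expansion year onward, else 0
gen byte Treat1 = !missing(exp_year) & Year >= exp_year
list, sepby(State)
```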

    You already have a variable, Treat1, that equals 1 in an expansion state from the year its expansion took effect onward (and 0 otherwise). So this is your interaction term. Because you will be using both state and year fixed effects, you do not need the usual "main effects" of treatment group and pre/post period: they would be collinear with the state and year fixed effects anyway.
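    In symbols, the model being described is roughly the standard two-way fixed-effects form below. The notation is introduced here only for illustration: D_st is Treat1, β plays the role of β1 in #8, and G_s (treatment group) and P_t (post period, in the classical single-date version) are the omitted main effects. Because G_s depends only on the state and P_t only on the year, each is an exact linear combination of the corresponding fixed-effect dummies, which is why they are collinear and can be left out.

```latex
Y_{st} = \alpha_s + \gamma_t + \beta\, D_{st} + \varepsilon_{st},
\qquad D_{st} = \mathbf{1}\{\text{state } s \text{ has expanded by year } t\}

G_s = \sum_{k} G_k \,\mathbf{1}\{s = k\},
\qquad P_t = \sum_{j} P_j \,\mathbf{1}\{t = j\}
```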

    I don't know how you want to define your outcome variable. Perhaps you will calculate something like volunteer hours per 100,000 population. Anyway, you will have to decide on that before you can proceed.

    That's the conceptual part. As for implementing it in Stata, you will need to create a numeric encoded variable for the state, as you cannot use a string variable for a fixed effect. The rest is straightforward. So, it will look more or less like this:

    Code:
    * Outcome: replace the right-hand side with your chosen definition,
    * e.g., volunteer hours per 100,000 population
    gen outcome = (VolunteerHrs/Population)*100000
    
    encode State, gen(n_State)
    
    xtset n_State Year
    
    xtreg outcome i.Treat1 i.Year, fe
    But you really need to do some reading in your textbooks, or perhaps get a tutor, to help you understand what all of this means. You won't learn much just from marking up my code. Fixed effects regression is a basic procedure in the analysis of economic and financial data, and has many applications outside those disciplines as well. You'll need to learn it.

    Also, in the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
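    As an illustration, here is the example from #1 typed as a Stata -input- command, followed by the steps sketched in #2. This is hand-written, not actual -dataex- output; it assumes the "Treat=1" column is a variable named Treat1 and uses the per-100,000 outcome described in #3.

```stata
* Hand-typed version of the example data in #1 (NOT actual -dataex- output).
* Assumption: the "Treat=1" column is a variable named Treat1.
clear
input str10 State int Year double VolunteerHrs long Population byte Treat1
"Alabama"  2010 106.88593   4785579 0
"Alabama"  2011 100.810044  4798649 0
"Alabama"  2012 102.936671  4813946 0
"Alabama"  2013 116.417864  4827660 0
"Alabama"  2014 125.985802  4840037 0
"Alabama"  2015 98.70755    4850858 0
"Florida"  2010 434.844492 18846461 0
"Florida"  2011 458.679654 19097369 0
"Florida"  2012 475.948588 19341327 0
"Florida"  2013 438.743281 19584927 0
"Florida"  2014 495.717974 19897747 0
"Florida"  2015 444.646822 20268567 0
"Arkansas" 2010 53.0621463  2921737 0
"Arkansas" 2011 59.2638898  2938640 0
"Arkansas" 2012 45.6552647  2949208 0
"Arkansas" 2013 55.5923406  2956780 0
"Arkansas" 2014 59.0693401  2964800 1
"Arkansas" 2015 48.5332943  2975626 1
"Colorado" 2010 163.956951  5048029 0
"Colorado" 2011 144.91652   5116411 0
"Colorado" 2012 160.310765  5186330 0
"Colorado" 2013 122.830502  5262556 0
"Colorado" 2014 159.511402  5342311 1
"Colorado" 2015 148.322003  5440445 1
end

* Outcome: volunteer hours per 100,000 population (definition from #3)
gen double outcome = (VolunteerHrs/Population)*100000

* Numeric state identifier, panel declaration, and the fixed-effects model
encode State, gen(n_State)
xtset n_State Year
xtreg outcome i.Treat1 i.Year, fe
```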



    • #3
      Thank you for the timely response. I tried installing the dataex command, but it appears my university won't allow me to install it:
      ssc install dataex
      checking dataex consistency and verifying not already installed...
      cannot write in directory \\cla-utility.ad.umn.edu\Profiles\All Users\ado\plus\d
      r(603);
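      One possible workaround, offered as an assumption rather than a tested fix for this particular network: point Stata's PLUS directory at a folder you can write to before running -ssc install-. The path below is hypothetical, the folder must already exist, and -sysdir set- lasts only for the current session. Since Stata 15.1 and fully updated 14.2 already include -dataex- (as #2 notes), the sketch checks for it first.

```stata
* Install -dataex- only if it is not already available
capture which dataex
if _rc {
    * HYPOTHETICAL path: any existing folder you have write access to
    sysdir set PLUS "H:/stata_ado/plus"
    ssc install dataex
}
```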

      I calculated the outcome variable as,
      -gen outcome = (VolunteerHrs/Population)*100000- for each state, to get volunteer hours per 100,000 population.
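      As a worked check of that formula on two rows from #1: Alabama 2010 gives 106.88593 / 4,785,579 × 100,000 ≈ 2.23, and Florida 2010 gives 434.844492 / 18,846,461 × 100,000 ≈ 2.31, in whatever units VolunteerHrs is measured (#6 assumes millions of hours). A minimal sketch:

```stata
* Worked check of the per-100,000 outcome on two rows from #1
clear
input str10 State int Year double VolunteerHrs long Population
"Alabama" 2010 106.88593   4785579
"Florida" 2010 434.844492 18846461
end

gen double outcome = (VolunteerHrs/Population)*100000
list State Year outcome
```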

      I am confused about the Numeric Encoded Variable. Using the code -encode State, gen(n_State)- gave me another variable that was n_State which are the state names and then using -xtset n_State Year- it gave me an error that says -repeated time values within panel- and I am wondering how to get around that.

      Also just to be clear the code that my teacher gave me was
      -xtreg VolunteerHrs DID i.state i.year, fe-
      and you're saying that I wouldn't need the DID variable because it would be collinear with the state and year fixed effects?



      • #4
        Quoting #3: I am confused about the Numeric Encoded Variable. Using the code -encode State, gen(n_State)- gave me another variable that was n_State which are the state names and then using -xtset n_State Year- it gave me an error that says -repeated time values within panel- and I am wondering how to get around that.
        No, the new variable n_State is not another variable which contains the state names. It contains numbers for the states, but they are labeled with the names so that when you -list- or -browse- they look like they are the state names. But they are actually numbers. And that is what -xtset- requires.
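        A small sketch using only the State and Year columns of four rows from #1, showing the difference between the labeled display and the underlying numbers. By default, -encode- assigns codes in sorted (here alphabetical) order of the strings and attaches a value label with the same name as the new variable.

```stata
* What -encode- produces: a numeric variable carrying a value label
clear
input str10 State int Year
"Florida"  2010
"Alabama"  2010
"Colorado" 2010
"Arkansas" 2010
end

encode State, gen(n_State)
describe State n_State            // State is a string; n_State is numeric, labeled n_State
list State n_State                // labels displayed: looks like the names
list State n_State, nolabel       // the underlying numeric codes
```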


        The message is self-explanatory: there is some state (or perhaps more than one) for which you have more than one observation in the same year. This is probably an error in your data, and you need to fix it. To find the offending observations, run:

        Code:
        duplicates tag n_State Year, gen(flag)
        browse if flag
        If the observations are duplicates in all respects on all variables, then you can just drop all but one from each set of duplicates. But if they differ in other respects, then you will have to figure out how to reconcile the differences and settle on a single observation for each. (At least I believe that is the case: you should have only one observation on each state in each year, right?) So fix that.
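        A minimal sketch on a tiny hypothetical data set in which one state-year row was accidentally entered twice. -duplicates drop- with no variable list removes only observations that are identical on every variable, so it is appropriate only for the first case described above; -isid- then confirms that each state-year pair is unique.

```stata
* Hypothetical data: the Alabama 2010 row was entered twice
clear
input str10 State int Year double VolunteerHrs
"Alabama"  2010 106.88593
"Alabama"  2010 106.88593
"Alabama"  2011 100.810044
"Arkansas" 2010 53.0621463
end
encode State, gen(n_State)

duplicates tag n_State Year, gen(flag)   // flag > 0 marks repeated state-year pairs
list if flag

* Only safe when the repeats are identical on ALL variables:
duplicates drop                          // keeps the first of each identical set
isid n_State Year                        // errors if any state-year is still repeated
xtset n_State Year
```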

        Quoting #3: Also just to be clear the code that my teacher gave me was
        -xtreg VolunteerHrs DID i.state i.year, fe-
        and you're saying that I wouldn't need the DID variable because it would be collinear with the state and year fixed effects?
        No, that's not what I'm saying. The variable you are calling DID here is identical to the variable you called Treat1 earlier, and, as you can see, it is included in the code I suggested. You don't need i.state, however, because the state fixed effects are automatically taken care of by -fe-. As for the variables that I said you don't need back in #2: since you don't seem to fully understand the modeling here, just don't worry about them for now. They do not exist in your data set as you showed it, and you won't need to create them (whereas you would have needed to create them in a more classical difference-in-differences model).
        Last edited by Clyde Schechter; 04 May 2018, 11:17.



        • #5
          Thank you again. I am clearly not very good at using Stata, but I got it to work. I am unsure about the interpretation of this regression. Does this mean that in the treatment group volunteer hours went up by 0.14 million hours per 100,000 people in the population, and then for each individual year they went down by the coefficient next to that year? If you have any other inferences about this regression, that would be helpful too.
          [Attached image: Stata regression output, fullsizeoutput_1e2.jpeg]



          • #6
            I'll assume, though you don't say as much, that the unit of measurement in the volunteer-hours variable is millions of hours. I'll then assume that you calculated the outcome according to what you wrote in #3.

            The interpretation would be that your difference-in-differences estimate of the effect of Medicaid expansion is an increase of 0.14 million volunteer hours per 100,000 population (95% CI, decrease of 0.07 to increase of 0.35).

            As for the Year coefficients, each of those represents the expected difference in your outcome between the year shown in the output table and the base year of your analysis, which is 2010. So, irrespective of Medicaid expansion status, your model predicts an average decrease of 0.20 million volunteer hours per 100,000 population in 2011 compared to 2010. In 2012, there was an across-the-board decrease of 0.17 million volunteer hours per 100,000 population compared to 2010, etc.
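            These interpretations can be checked with -lincom- after the regression. A sketch on the sample data from #1, so its numbers will not match the attached output, which presumably used the full data set: the first -lincom- reproduces the DiD estimate with its confidence interval, the second a single year coefficient, and the third combines them into the predicted 2014-versus-2010 change for a state that had expanded by 2014 (the state fixed effect cancels in that comparison). Variable names are assumed to match those used above.

```stata
* Sample data from #1 (hand-typed; Treat1 assumed to be the "Treat=1" column)
clear
input str10 State int Year double VolunteerHrs long Population byte Treat1
"Alabama"  2010 106.88593   4785579 0
"Alabama"  2011 100.810044  4798649 0
"Alabama"  2012 102.936671  4813946 0
"Alabama"  2013 116.417864  4827660 0
"Alabama"  2014 125.985802  4840037 0
"Alabama"  2015 98.70755    4850858 0
"Florida"  2010 434.844492 18846461 0
"Florida"  2011 458.679654 19097369 0
"Florida"  2012 475.948588 19341327 0
"Florida"  2013 438.743281 19584927 0
"Florida"  2014 495.717974 19897747 0
"Florida"  2015 444.646822 20268567 0
"Arkansas" 2010 53.0621463  2921737 0
"Arkansas" 2011 59.2638898  2938640 0
"Arkansas" 2012 45.6552647  2949208 0
"Arkansas" 2013 55.5923406  2956780 0
"Arkansas" 2014 59.0693401  2964800 1
"Arkansas" 2015 48.5332943  2975626 1
"Colorado" 2010 163.956951  5048029 0
"Colorado" 2011 144.91652   5116411 0
"Colorado" 2012 160.310765  5186330 0
"Colorado" 2013 122.830502  5262556 0
"Colorado" 2014 159.511402  5342311 1
"Colorado" 2015 148.322003  5440445 1
end

gen double outcome = (VolunteerHrs/Population)*100000
encode State, gen(n_State)
xtset n_State Year
xtreg outcome i.Treat1 i.Year, fe

lincom 1.Treat1                  // DiD estimate with its 95% CI
lincom 2011.Year                 // 2011 vs. 2010, regardless of expansion status
lincom 1.Treat1 + 2014.Year      // expanded state: 2014 vs. the same state in 2010
```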




            • #7
              Thank you for all of these responses; they have been extremely helpful. I just have a couple of final questions. Could you explain why the state fixed effects can be dropped from the regression? Would this simply be called a "time fixed effects regression" rather than a "state and time fixed effects regression", since the form of the regression would be Y = β0 + β1*[DID] + β2*[Time fe] + ε, where Y is the outcome variable and DID is the Treat1 variable?



              • #8
                On Stata's Help menu, select PDF Documentation and then open the [XT] volume. Read the chapters on -xtset- and -xtreg-. I can answer your specific questions, but you need to understand what's going on.

                What you will see is that the -xtreg- command assumes that you have previously -xtset- your data, and it automatically incorporates fixed effects for whatever variable was declared as the panel identifier in your -xtset- command. So, with -xtset n_State- followed by -xtreg whatever, fe-, you automatically have state fixed effects in the model. You will not get output for those effects, but they are meaningless and unnecessary in any case. The key thing is that they have been included and adjusted for.

                Time fixed effects, however, are not automatically included by -xtreg-, which is why your command must include i.Year if you want year fixed effects as well. So the code you want (and used) is:

                Code:
                xtset n_State
                xtreg outcome i.Treat1 i.Year, fe
                and the underlying model for that is Y = β0 + β1*[DID] + β2*[Time fe] + β3*[State fe] + ε.
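                To see concretely why the state dummies can be left out of the command, one can fit the same model three ways on the sample data from #1: -xtreg, fe-, -regress- with explicit state dummies (least-squares dummy variables), and -areg- with the states absorbed. The Treat1 and Year coefficients should agree across all three; only how the state effects are handled and reported (and the meaning of the constant) differs. A sketch, assuming the same variable names as above:

```stata
* Sample data from #1 (hand-typed; Treat1 assumed to be the "Treat=1" column)
clear
input str10 State int Year double VolunteerHrs long Population byte Treat1
"Alabama"  2010 106.88593   4785579 0
"Alabama"  2011 100.810044  4798649 0
"Alabama"  2012 102.936671  4813946 0
"Alabama"  2013 116.417864  4827660 0
"Alabama"  2014 125.985802  4840037 0
"Alabama"  2015 98.70755    4850858 0
"Florida"  2010 434.844492 18846461 0
"Florida"  2011 458.679654 19097369 0
"Florida"  2012 475.948588 19341327 0
"Florida"  2013 438.743281 19584927 0
"Florida"  2014 495.717974 19897747 0
"Florida"  2015 444.646822 20268567 0
"Arkansas" 2010 53.0621463  2921737 0
"Arkansas" 2011 59.2638898  2938640 0
"Arkansas" 2012 45.6552647  2949208 0
"Arkansas" 2013 55.5923406  2956780 0
"Arkansas" 2014 59.0693401  2964800 1
"Arkansas" 2015 48.5332943  2975626 1
"Colorado" 2010 163.956951  5048029 0
"Colorado" 2011 144.91652   5116411 0
"Colorado" 2012 160.310765  5186330 0
"Colorado" 2013 122.830502  5262556 0
"Colorado" 2014 159.511402  5342311 1
"Colorado" 2015 148.322003  5440445 1
end

gen double outcome = (VolunteerHrs/Population)*100000
encode State, gen(n_State)
xtset n_State

* 1) Within (fixed-effects) estimator: state effects handled by -fe-
xtreg outcome i.Treat1 i.Year, fe
scalar b_within = _b[1.Treat1]

* 2) Least-squares dummy variables: state fixed effects entered explicitly
regress outcome i.Treat1 i.Year i.n_State
scalar b_lsdv = _b[1.Treat1]

* 3) -areg- absorbs the state dummies without reporting them
areg outcome i.Treat1 i.Year, absorb(n_State)
scalar b_areg = _b[1.Treat1]

display b_within "   " b_lsdv "   " b_areg
```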

