
  • mixed or xtreg for model with both fixed effects and random effects

    I have a Difference-in-Differences (DID) model that includes both year fixed effects (fe) and country random effects (re). My question is: Should I be using the "mixed" command or the "xtreg" command to include both fixed effects and random effects in the same DID equation? I have attempted both and not been successful. Below is a summary of my study design and attempts.
    Study Design: 10 years of pooled cross-sectional survey data from 16 countries. Binary outcome variable. DID study design. Treatment group consists of 4 countries participating in an intervention. The remaining 12 countries never have the intervention. The interaction term pools together the 4 countries (labeled A, B, C, D here for simplicity) participating in the intervention, i.e., treat=1 if (country==A | country==B | country==C | country==D). Post=1 if year>2010. The intervention is implemented at the national level. All subject-level data are collapsed to the country-region level. Following survey instructions, I included denormalized country-year-subject probability weights, which allowed me to analyze multiple country-years together. I am using Stata v13.1 on a PC.
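    For readers following along, the two indicators described above could be built roughly as follows. This is only a sketch: it assumes country is a numeric variable and that the hypothetical codes 1 to 4 stand for countries A to D, neither of which is stated in the post.

    ```stata
    * Assumption: country is numeric and codes 1-4 are the treated countries A-D
    generate byte treat = inlist(country, 1, 2, 3, 4)

    * post = 1 for years after 2010; left missing where year is missing
    generate byte post = (year > 2010) if !missing(year)

    * Quick check of how observations fall across the four DID cells
    tabulate treat post
    ```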

    Test model with year fe and country fe:
    reg y treat*post treat post controlsvar countryFE* yearFE* [pweight=weight], robust cluster(countryregion)
    • First, I use the reg command to confirm that if I include both country fixed effects and year fixed effects that the regression runs without errors.
    • Next, I describe 3 attempts to use year fixed effects and country random effects. The syntax of the "reg" command does not allow for both fe and re in the same equation, so I try to use mixed and xtreg.
    Attempt 1: mixed with year fe and country re:
    mixed y treat*post treat post controlsvar yearFE* [pweight=weight], robust cluster(countryregion) || country: R.country
    • error: Highest level groups are not nested within countryregion. I understand this to mean, "highest level group (country) is not nested within the same cluster (countryregion) every year." This occurs because some surveys did not collect data on every country-region every year.
    Attempt 2: xtreg with year fe and country re:
    xtreg y treat*post treat post controlsvar yearFE* [pweight=weight], re i(country) vce(cluster countryregion)
    • error: pweight not allowed with between-effects and random-effects models
    • error (run the same code without pweight): panels are not nested within clusters. I think this is a similar error to what I received when using mixed. Not the solution because I need to include weights.
    Attempt 3: xtreg with year fe and country re:
    xtreg y treat*post treat post controlsvar yearFE*, re i(country) vce(robust)
    • limitation: As a test, if I remove the pweight and the cluster variable, then the code will execute. Not the solution because I need to include weights and can no longer cluster my standard errors.


  • #2
    So, several suggestions, but in the end a question.

    1. Don't create your own interaction term. Use factor variable notation (regardless of which model you end up running). This will enable you to calculate useful results postestimation with the -margins- command. Interaction models are difficult for people to understand, and the -margins- command enables you to easily and without error calculate readily understandable model predictions such as expected predicted outcomes in each group in each era, or each year even, and marginal effects. So instead of treat*post treat post, use i.treat##i.post. For the year fixed effects write i.year (assuming you have a variable called year). For a country fixed effects specification as in the -reg- command, write i.country. Factor-variable notation is enormously helpful in making model output easier to understand and facilitating later calculations as well. See -help fvvarlist- and the corresponding manual section.

    2. If you are going to use -mixed-, I see no advantage to using R.country to specify the random effect at the country level instead of just country. And the estimation will go faster without the R.

    3. In your description of your data and design, you don't tell us anything about this countryregion variable that you are trying to cluster on. When you use vce(cluster whatever) with a multilevel model it is absolutely required that the variable that defines the highest level of the model (country, in your case) is strictly nested within that clustering variable. That is, every instance of country must occur in association with only one value of the clustering variable. So country A cannot appear once with region 1 and another time with region 6, etc. In other words, for this to work, these have to be regions that include countries. If they are regions within countries, then you cannot do what you are trying to do. My thought would be, rather, if these are regions within countries, then you should probably include region as an additional level of random effect in the model, nested within country. If your region variable really is intended to describe regions that include countries, then any given country should never appear in different regions in different observations--and if it does, you probably have some kind of error in your data that needs to be fixed.
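    The nesting requirement in point 3 can be checked directly in the data. The sketch below assumes only the variables country and countryregion from the original post; multi_region and multi_country are hypothetical helper names introduced for illustration.

    ```stata
    * Does any country appear with more than one value of countryregion?
    * If so, country is NOT nested within countryregion, and
    * vce(cluster countryregion) cannot be used with country as the top level.
    bysort country (countryregion): generate byte multi_region = countryregion[1] != countryregion[_N]
    tabulate country if multi_region

    * Conversely, does any countryregion appear under more than one country?
    * If none do, the regions are nested within countries.
    bysort countryregion (country): generate byte multi_country = country[1] != country[_N]
    tabulate countryregion if multi_country
    ```

    If the first tabulation lists countries and the second reports no observations, the regions sit within countries, which is the case where region belongs in the model as an additional nested level.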

    My bottom line(s). If your regions are within countries, I would probably model this as:

    Code:
    mixed y i.treat##i.post i.year other_covariates [pweight=weight] || country: || countryregion:, vce(cluster country)
    If your regions are regions that include multiple countries, then clean up your data so each country is always in its appropriate region and try:

    Code:
    mixed y i.treat##i.post i.year other_covariates [pweight=weight] || country:, vce(cluster countryregion)
    Note: With only 16 countries, the appropriateness of using vce(cluster country) is questionable. If your regions actually include countries, then there are even fewer of those and the use of vce(cluster countryregion) is truly dubious. I left them in the code so you can see how it would be done, but I recommend against it. When the number of clusters is small, the use of vce(cluster) can actually produce worse results, and it certainly doesn't help. Experts disagree about how many clusters are enough to make vce(cluster) useful, but most would consider 16 marginal, though some would find it acceptable.
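    To illustrate suggestion 1, here is a minimal sketch of factor-variable notation followed by -margins-. It uses a deliberately simplified -regress- model with the variable names from the original post; the year and country terms are left out only to keep the example short, so it is not a recommended final specification.

    ```stata
    * Simplified DID regression written with factor-variable notation
    regress y i.treat##i.post controlsvar [pweight=weight], vce(cluster countryregion)

    * Expected outcome in each treat-by-post cell
    margins treat#post

    * Treated-vs-untreated difference, separately before and after 2010
    margins post, dydx(treat)

    * The DID estimate itself is the interaction coefficient
    lincom 1.treat#1.post
    ```

    The same -margins- and -lincom- calls should carry over after -mixed-, where the predictions are based on the fixed portion of the model by default.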



    • #3
      Hi Clyde,

      Thank you so much for your advice! Regarding your three main comments:

      (1) factor-variable notation - very informative. thank you for the suggestion.
      (2) mixed command without R.country - understood.
      (3) The countryregion variable consists of regions within countries. For example, country A has regions 101, 102, 103, etc., and country B has regions 201, 202, 203, etc. Therefore, the first of the two code options you provided is the appropriate one for my model. I will follow up in this thread after testing the code with and without vce(cluster country).

      New question: How did you get the lines of code in your response to display so clearly? I executed ssc install dataex in Stata, but I am not sure how to get the code to display in my response to look like your nicely formatted code.

      Thanks again!
      Jacqueline Fiore
