  • mixed multilevel regression

    Dear Statalist,

    I want to see the region-specific effects in my model. The fixed-effects model I am using to explain total out-of-pocket health care expenditures would look something like this:

    xtreg OOP_total hhsize north_nw central volga north_caucasian ural west_sibera east_siberia perc_65thhm perc_5ythhm year1998 year2009 year2015 secondquintile thirdquintile fourthquintile fifthquintile, fe

    or with all covariates

    xtreg OOP_total age Gender employed married i.educ_level smokes alcohol hhsize north_nw central volga north_caucasian ural west_sibera east_siberia perc_65thhm perc_5ythhm year1998 year2009 year2015 secondquintile thirdquintile fourthquintile fifthquintile, fe

    I know that some of these variables should not be in a fixed effects model. I saw this post by Mr Schechter:


    Originally posted by Clyde Schechter
    In most situations, yes.

    As for specifying regional fixed effects, if your GLM is one of those covered by an -xt- command, one approach is to -xtset region- and then use the corresponding -xtreg, fe- or whatever other -xt- regression you want to apply. However, without knowing more about the structure of your data, I can't assure you that this approach is correct. If there is no nesting of observations at any layer lower than region, then this is fine. But if there is also, say, household, or village nesting, then -xtset region- falsely tells Stata that observations are independent within region and your standard errors, and everything based on them, will be incorrect. If your data are, say, nested within households which are, in turn, nested within regions, then you need to -xtset household-. You will then not need to incorporate region fixed effects because region is constant within household and any time-invariant region effects will be taken care of by the household fixed effect. (You will also be unable to incorporate region fixed effects in this context.) If, however, estimating region-specific effects is a goal of the research, then this approach would evidently be unsatisfactory. In that case, I think you need to abandon fixed-effects and go to multi-level modeling, or perhaps look at -xtset region- with -xtreg, be-.
    A bit of background on my data set: I have household-level data. The household head is interviewed, and most socio-demographics are obtained from an individual questionnaire. I merged these data sets, so I have household-level data (with socio-demographics for just the one interviewed household member) and want to know whether I can use a multilevel model. I am not sure whether my data are nested at the household level, since there is only one household member. Also, I'm new to mixed-level regressions and don't quite understand how to set one up (the household id is id_h and the region variable is region).

    Many thanks!

  • #2
    I don't see how your question can be answered based on the information you have provided.

    Each observation is provided by only one household member. But does the same household appear more than once in the data? (Perhaps the data sets you merged correspond to different time periods, and the same households recur in each?) If so, you have observations (time periods) nested in households. But if each household appears only once in the entire data set, then there is no nesting of anything in households. Your households might be nested in something else, but you don't provide any information to suggest whether they are or not. In short, I think you need to provide a better description of how your data set is put together.



    • #3
      Thanks for your response, Mr Schechter, and sorry for the lack of description.

      The data come from a household-based survey designed to measure the effects of reforms in Russia on economic well-being. It is a longitudinal study of populations of dwelling units, with a repeated cross-section design between 1994 and 2015. The design supports cross-sectional and aggregate longitudinal analysis, although it is not a true panel design: some individual household members who moved away were not followed. Each household head is interviewed with questions about the household. This means a household may be interviewed, and so appear, more than once in the data set, but it will have the same household id each time. For example, a household may appear in 1995 and then again in 2001. The attrition rate is somewhat high, although the survey is "designed for nonresponse".

      RLMS has a two-stage design, first selecting geographic regions and then selecting households. Originally, they tried to interview all individuals from the household (not always with success). These individual interviews are in a different data set. I did not merge all individuals into my household data set; I only merged in the household head.

      (Just to make it more confusing: there are two kinds of household data sets. One is a slightly more raw data set with separate household data files for each wave. The other is the longitudinal data set, where all households and years are combined. I could have, for example, household id 1000 occurring in wave 1 and in wave 2. I am using the latter, longitudinal data set.)

      I hope this helps



      • #4
        Thanks, this is much clearer. In one sense, you have a three-level model: repeated observations (waves) within households within regions. But, more crucial here is that there was a two-stage sampling design. So what is most critical for your analysis is to capture that using -svyset- and using the -svy:- prefix with your analytic commands. While you can use both multi-level modeling and -svy:- based design correction in your analysis, unless you have reason to explicitly estimate region level or household level variance components, I think that will just complicate things.
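
If region-level or household-level variance components are in fact of interest, the three-level structure described above could be sketched roughly as follows. This is only an illustration, not a recommended final model: id_h and region are the identifiers named in post #1, the covariate list is abbreviated from the models given there, and the sketch ignores the survey design entirely. With only seven or eight regions, a region-level variance component may also be poorly estimated.

```stata
* Sketch only: waves nested in households (id_h) nested in regions (region).
* Covariates are an abbreviated subset of those listed in post #1.
* The /// line continuations work in a do-file, not at the command line.
mixed OOP_total hhsize perc_65thhm perc_5ythhm      ///
    year1998 year2009 year2015                      ///
    secondquintile thirdquintile fourthquintile fifthquintile ///
    || region: || id_h:
```

The order of the random-effects equations matters: the highest level (region) comes first, and id_h is then treated as nested within region.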

        Now, you will have to go back to the survey documentation to find the appropriate way to -svyset- your data. I'm also a little concerned about the fact that you included in your data only the household heads. With survey designs, a subpopulation cannot be correctly analyzed just by restricting the analysis to that subpopulation: rather you need to include the entire population and then use the -subpop()- option in your -svy:- prefix. So you may have to go back and re-generate your data set so that it includes the entire surveyed population, unless the survey documentation provides a way to -svyset- correctly for just the household head subpopulation.
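
As a rough illustration of the -svyset- and -subpop()- steps described above, a sketch might look like this. All of the design variable names here (psu_var, strata_var, wt_var, is_head) are hypothetical placeholders, not names from the survey; the correct design variables have to come from the survey documentation. The sketch also assumes that region is a numeric variable and that the data set contains the full surveyed population, with is_head equal to 1 for household heads and 0 otherwise.

```stata
* Sketch only: psu_var, strata_var, wt_var and is_head are placeholder names.
* Declare the sampling design: primary sampling units, weights, strata.
svyset psu_var [pweight = wt_var], strata(strata_var)

* Analyze household heads as a subpopulation rather than dropping the
* other observations, so that the variance estimation uses the full design.
svy, subpop(is_head): regress OOP_total hhsize i.region
```

The point of -subpop()- here is that observations outside the subpopulation still contribute to the design-based variance calculation, which would be lost if they were simply deleted from the data.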

        Survey design based analysis is a rapidly moving field with which, frankly, I have not kept up because it plays little role in my regular work. So if your situation is at all complicated, it is unlikely that I will be able to answer questions that arise in connection with it. I hope that others who work with this kind of data regularly are following this thread and will be able to step in if you need additional help in this area.



        • #5
          Thanks again.

          Well, previous research has demonstrated significant differences in out-of-pocket health care expenditures between regions. Part of what I want to find out is which variables explain household out-of-pocket health care expenditures, so I think I will have to include regions.

          I only included the household head because there is quite a lot of data missing in the individual data set, and also because I'm more interested in the household level than the individual level. But the household data set does not have all the socio-demographic variables. This is why I used the individual data set: to get the socio-demographics of the household head.

          So, just for my understanding, why can't I simply use a fixed-effects model with the 7 or 8 regions? The Hausman test points towards fixed effects, and when I run the fixed-effects model the F-statistic is significant. Also, what are the arguments against using xtreg?



          • #6
            The argument is that you do not have simple random sampling here. You said you have a complex 2-stage survey design. If you do not account for it in your analysis, at the very least your standard errors will be wrong, perhaps by a very large factor. If there are also different sampling probabilities of households within primary sampling units, then your coefficients will also be biased.
