  • Multi-level model with mediator and index variable from survey-level data


    I am attempting to run a regression on a dataset that combines micro data (individual-level survey data from the European Social Survey, i.e. categorical responses on the human values scale, ranging from 1 to 6) with macroeconomic data (country-level data, e.g. GDP). The ESS has 9 waves so far and I plan to use all of them, which makes this a panel dataset.

    I would like to ask for help on the following issues:
    1.) The dependent variable is measured at the country level. My main independent variable is also country-level, but it in turn depends on a variable measured at the individual level. The country-level independent variable therefore serves as a "mediator".
    2.) The individual-level variable mentioned above is meant to be something like an index that combines several individual-level variables.
    3.) The hypothesis is therefore something like: individuals determine a macro variable, which in turn affects another macro variable.

    I would like to know whether this can be done in Stata and, if so, how. I have read a little about "multi-level" models, but I am not really familiar with them and am not sure whether what I have in mind is even similar. Please correct me if I'm wrong. I would really appreciate your help. Apologies for not posting code or specific variables yet; I have not reached that point and am still exploring whether such a method is feasible.

    Thanks in advance.

  • #2
    In what sense does the country-level variable depend on the individual level variable? Is the country level variable something like the mean (or median, or some other simple function) of the responses from individuals within that country in that round of the survey? If so, it might make more sense to aggregate all the data up to the country-level and omit the individual-level data. If the dependence of the country level variable is more complicated than that, and not simply deterministic, then you would probably want to keep the individual level and use a multi-level model.
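As a rough illustration of the aggregation step described above, here is a minimal Python sketch that collapses individual responses to one mean per country and wave. The data, the `responses` layout, and the `collapse_mean` helper are all hypothetical; nothing here comes from the ESS itself.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual-level rows: (country, wave, score on the 1-6 scale).
responses = [
    ("DE", 1, 2), ("DE", 1, 4), ("DE", 2, 3),
    ("FR", 1, 5), ("FR", 1, 6), ("FR", 2, 1), ("FR", 2, 3),
]

def collapse_mean(rows):
    """Return {(country, wave): mean score}, one entry per country-wave cell."""
    cells = defaultdict(list)
    for country, wave, score in rows:
        cells[(country, wave)].append(score)
    return {key: mean(scores) for key, scores in cells.items()}

country_wave_means = collapse_mean(responses)
for key in sorted(country_wave_means):
    print(key, country_wave_means[key])
```

In Stata itself, the analogous step would be something like -collapse (mean) score, by(country wave)-, with `score`, `country`, and `wave` standing in for whatever the real variable names are.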

    It's also not clear whether you need a full-blown multi-level model for this or if the -xt- commands (which are a bit simpler to work with) will suffice. If the 9 waves of the ESS consist of the same people responding each time, then you actually have three levels: country, person, and wave, and a full multi-level model is needed. But if each wave is made up of different people (with perhaps a small number of people appearing more than once just by chance) then you really only have country and wave levels, and you can probably use the -xt- commands.
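To give a feel for what a country-and-wave panel model does, the sketch below computes a fixed-effects ("within") slope by demeaning each variable inside each country. This is only one of the estimators that the -xt- family offers, it handles a single regressor, and the panel values and the `within_slope` helper are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical country-wave panel rows: (country, wave, x, y).
panel = [
    ("DE", 1, 1.0, 12.0), ("DE", 2, 2.0, 14.0), ("DE", 3, 3.0, 16.0),
    ("FR", 1, 1.0, 22.0), ("FR", 2, 2.0, 24.0), ("FR", 3, 3.0, 26.0),
]

def within_slope(rows):
    """Slope of y on x after removing each country's own mean from x and y."""
    by_country = defaultdict(list)
    for country, _wave, x, y in rows:
        by_country[country].append((x, y))
    num = 0.0
    den = 0.0
    for obs in by_country.values():
        x_bar = mean([x for x, _ in obs])
        y_bar = mean([y for _, y in obs])
        for x, y in obs:
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den

print(within_slope(panel))
```

Because the two hypothetical countries differ only in their level of y, demeaning removes that difference and the slope reflects change within countries over waves.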

    All of this is fairly vague, and to make your final decision you will need to take into account specific facts about the particular variables in question, what they mean, and how they are related to each other. There is a good chance, in fact, that it will be primarily substantive rather than statistical considerations that will drive the answer. I'm sorry I can't be more specific, but as the question is fairly general, I don't think there is a less general answer that can be given.


    • #3
      Thank you for your response, Clyde. I have come up with (I hope) a more specific description of my concern.

      "In what sense does the country-level variable depend on the individual level variable? Is the country level variable something like the mean (or median, or some other simple function) of the responses from individuals within that country in that round of the survey?"
      I was thinking specifically about voting behavior: individuals' human values, as measured on the ESS scale, may determine what type of government the citizens have, or the ideology of that government, given that some kind of voting is involved. The country-level variable I have in mind is "populism", measured on a scale from 1 to 100. In that sense the individual level affects the country level. Also, there are several measures of human values; would it be more appropriate to create a single index from all of them?

      The main dependent variable is something like government expenditure on a particular program as a percentage of the total budget.
      Finally, the path should be: human values --> populism scale --> percentage of expenditure.
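One common (and debated) way to put numbers on a path like this is the product-of-coefficients approach: estimate the slope from the first variable to the mediator (a), the slope from the mediator to the outcome holding the first variable fixed (b), and take a × b as the indirect effect. The Python sketch below does this at the country level with ordinary least squares on made-up data; the variable names, the numbers, and the helper functions are all hypothetical, and it ignores the panel structure entirely.

```python
from statistics import mean

def slope_intercept(x, y):
    """Ordinary least squares slope and intercept of y on a single x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return b, y_bar - b * x_bar

def residuals(x, y):
    """Part of y that is not linearly explained by x."""
    b, a = slope_intercept(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical country-level data, constructed so the true paths are known.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # aggregated human values
noise = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]      # variation unrelated to values
populism = [2.0 * v + e for v, e in zip(values, noise)]
spending = [3.0 * m + 1.0 * v for m, v in zip(populism, values)]

# a path: values -> populism.
a_path, _ = slope_intercept(values, populism)
# b path: populism -> spending, holding values fixed (partialling out values).
b_path, _ = slope_intercept(residuals(values, populism),
                            residuals(values, spending))
# Direct effect: values -> spending, holding populism fixed.
direct, _ = slope_intercept(residuals(populism, values),
                            residuals(populism, spending))
# Total effect: values -> spending with no mediator in the model.
total, _ = slope_intercept(values, spending)

indirect = a_path * b_path
print(a_path, b_path, indirect, direct, total)
```

With linear least squares, the total effect equals the indirect effect plus the direct effect, which gives a quick sanity check on the arithmetic.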

      The ESS, I believe, has different respondents for each wave.

      To sum up, here are my concerns:
      1. Is it more appropriate to create a single index from the human values variables (6 variables)?
      2. If it is a multi-level model, can the individual-level variable affect the macro variable through a mediator that is itself another macro variable?
      3. If the above are indeed possible, what commands would you advise?



      • #4
        1. It depends. If the 6 values are different aspects of some latent common construct, then yes, making an index out of them to better home in on that common construct would make sense. The simplest way to do that is to average the scores on them (assuming they are measured on a common scale); a more sophisticated way is to use factor analysis or principal component analysis. If they are 6 values that have largely separate content and are not different reflections of some latent theme, then creating an index would just throw away information.

        2. Yes.

        3. It is premature to be suggesting specific commands at this level of development. Most likely your method will involve using -xtreg- or -mixed-, but you must first work out the details of how you will specify the different variables. And you must also take a side on the contentious issue of the best way to analyze mediation, which could also mean bringing in other analytic tools such as -sem-.
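For point 1, the simplest index mentioned above (averaging the items) can be sketched as follows, together with Cronbach's alpha as one rough check of whether the items move together. Alpha is an addition here, not something the reply prescribes, and the respondent data and function names are hypothetical.

```python
from statistics import mean, variance

# Hypothetical respondents, each with 6 value items scored on the 1-6 scale.
respondents = [
    [1, 2, 1, 2, 1, 2],
    [3, 3, 4, 3, 3, 4],
    [5, 6, 5, 5, 6, 5],
    [2, 2, 3, 2, 3, 2],
    [6, 5, 6, 6, 5, 6],
]

def row_mean_index(rows):
    """Index = plain average of each respondent's item scores."""
    return [mean(r) for r in rows]

def cronbach_alpha(rows):
    """Cronbach's alpha from item variances and the variance of the sum score."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

index = row_mean_index(respondents)
alpha = cronbach_alpha(respondents)
print(index)
print(round(alpha, 3))
```

A high alpha is consistent with the items reflecting a common construct, but it does not prove it; a low alpha would suggest that averaging discards information, as described in point 1.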


        • #5
          Thanks a lot for your help, Clyde. I've decided to do a panel regression using -xtreg- for now.
          If I run PCA on the 6 variables, would it make sense to collapse the resulting index (taking the mean or median) from the individual-level dataset (across countries and waves) into a country-level measure?
          Last edited by Crisanta Garcia; 20 Jan 2020, 22:01.


          • #6
            It isn't really possible to answer this question from a general statistical perspective. Again, this is more of a substantive issue that depends on what these variables are and how they are expected to relate to each other in the real world. It may well be that the individual values of the mediator affect the outcome through their collective mean. In that case collapsing to a mean value and eliminating the individual level from the model would make sense and produce a stronger model, eliminating a lot of individual level noise. But some variables work that way and others don't. It is also possible that the effect on the country-level outcome tracks some other statistic of the mediator such as the median (if different from the mean) or maybe the 87th percentile or something wacky like that. Or, perhaps even more likely, the effect may not be describable through simple statistics. In either of these last two situations, reducing to a country-level mean and eliminating the individual level would be detrimental. So you have to think about what the underlying mechanisms for these relationships are and try to figure out how that would be reflected statistically. If you can't figure it out because not enough is known, then you may have to try it both ways and see which model produces a better fit to the data.