  • Crossed and Nested clusters in one HLM model

    I am trying to write Stata HLM regression code with three levels: Country, Year, and Industry. In particular, I want to cross-classify country and industry and, at the same time, nest year within both industry and country. Is the code below appropriate? If not, what would you suggest? I would appreciate your input.

    xtmixed Y X ||_all:R.Country || Industry: || Year:

  • #2
    Well, it is technically correct. There are some ways I would take issue with it, however:

    1. If you are using Stata version 13 or later, -xtmixed- has been renamed -mixed-. This is a minor issue, since Stata will, at present, simply translate -xtmixed- into -mixed- when it encounters your code. But because -xtmixed- is now undocumented, there is no promise that Stata will continue to recognize it (unless you put a version control command in), so this code might break if re-used in the future. I recommend changing it to -mixed- (unless your Stata is earlier than version 13.)

    2. More important, it is most unusual to model year as a random effect. While it is not technically wrong to do so, it seems to go against the spirit of the model. Normally we think of year effects as being idiosyncratic shocks to Y, and, in particular, we generally do not think of them as being draws from a normal distribution, nor, for that matter, as random draws from any particular distribution. They are normally thought of simply as fixed constants. So it would be more typical to model year by including i.Year in the bottom level of the model, rather than creating a Year level.

    3. Also, an important question is whether you have more than one observation per country#industry combination in a given year. If Country, Industry, and Year uniquely identify observations, then you simply cannot have a separate model level for Year. This is because, with only one observation per country#industry at the Year level, there is no way to separately identify the year-level effect and the residual. The estimation would probably fail to converge, and even if it did converge, its results would not be correct.
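
    The three points above can be sketched roughly as follows. This is only an illustration under assumptions: Y, X, Country, Industry, and Year are the placeholder variable names from #1, and n_obs is a hypothetical helper variable created just for the check.

    ```stata
    * Sketch only: Y, X, Country, Industry, Year are the placeholder names from #1.

    * Point 3: count observations in each Country-Industry-Year cell.
    * n_obs is a hypothetical helper variable, dropped again below.
    bysort Country Industry Year: generate long n_obs = _N
    summarize n_obs
    * If the largest cell size is 1, the three variables uniquely identify
    * observations and a separate Year level cannot be told apart from the residual.
    display "Largest cell size: " r(max)
    drop n_obs

    * Point 1: the original specification written with -mixed- (Stata 13 or later).
    mixed Y X || _all: R.Country || Industry: || Year:

    * Point 1, alternative: keep -xtmixed- but run it under version control.
    * version 12.1: xtmixed Y X || _all: R.Country || Industry: || Year:

    * Point 2: year entered as fixed indicators instead of as a model level.
    mixed Y X i.Year || _all: R.Country || Industry:

    * Joint Wald test of the year indicators.
    testparm i.Year
    ```

    Counting cell sizes with _N, rather than testing uniqueness outright, also shows how many observations each cell contributes when the answer is more than one.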


    • #3
      Thanks a lot, Clyde. For sure, I can have more than one observation in a given year for a given industry in a given country. So I cannot use year as a random effect? The results of the LR test and AIC/BIC tell me that the model (mixed Y X || _all:R.Industry || Country: || Year:) is the best fit for my data. Given the issues you discussed, is there a potentially better model you would suggest? I would appreciate your comment.


      • #4
        "For sure, I can have more than one observation in a given year for a given industry in a given country. So I cannot use year as a random effect?"
        You misunderstood me; my meaning was the exact opposite. Unless you can and do have multiple observations per year in a given industry and country, you would not be able to use year as a random effect. If you do have multiple such observations, then a random effect at the year level is possible. I still think it's probably not a good idea, but it is possible.

        I still find representing year as a random effect here problematic, or at least odd. I think you would be better off with -mixed Y X i.Year || _all:R.Industry || Country:-, which is what I suggested in numbered paragraph 2. in #2 of this thread.
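
        A rough sketch of how the two specifications discussed here could be put side by side on AIC and BIC. It assumes the placeholder names from this thread; year_random and year_fixed are hypothetical names for the stored estimates. Because the two models differ in their fixed part, the comparison relies on maximum likelihood fits, which is the default for -mixed-, rather than REML.

        ```stata
        * Sketch only: variable names are the placeholders used in this thread.

        * Year as a random-effects level (the specification from #3).
        mixed Y X || _all: R.Industry || Country: || Year:
        estimates store year_random

        * Year as fixed indicators (the specification suggested in #4).
        mixed Y X i.Year || _all: R.Industry || Country:
        estimates store year_fixed

        * AIC and BIC for both fits in one table.
        estimates stats year_random year_fixed
        ```

        Information criteria only rank the fits; they do not settle whether the assumptions behind a Year level are plausible for the data.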


        • #5
          Thank you so much, Clyde. You were so insightful. Actually, I had used i.year in my models at first. I then switched to a random effect on the recommendation of one of my teachers, whose reasoning was that since the number of years is not small (about 17 to 20 years), I had better use year as a random effect. Do you have anything to add?


          • #6
            Well, there are two generic arguments against using a year variable as a random effect. One of them may still apply to your situation and the other does not.

            The generic argument that does not apply is that, in some applications, the number of different years' worth of data is very small. Since the effective sample size for estimation of the variance at the year level is the number of years, if only a handful of years are available, you are not adequately sampling year-space. But in your situation, with 17-20 years, this argument is much weakened. While I wouldn't consider N = 20 a thorough sampling of year-space, it's not outlandishly small. So this argument goes away (or is muted).

            The other generic argument is model specification. When you use year as a random effect, you are asserting a few things about the data generating process:

            1. There are year-specific shocks to the outcome that apply to all industries and countries equally.
            2. These shocks are independently and identically sampled from a normal distribution with mean zero (and variance to be estimated from the data).

            There are some other assumptions implicit in using a random effect as well, but these are the ones that are an issue. The first is a content issue, and it could be raised equally as an objection to using i.year as a fixed effect in the model. I defer to your judgment on the content issue. But the second is a very strong assumption and it may well be wrong. With a time-span of around 20 years, one might be concerned that there are linear trends in the effect of time. A random effects specification not only fails to capture that, but explicitly contradicts it. Do you really conceptualize these year-specific effects as draws from a normal distribution? What about "black swans"?

            The advantage of using an i.year fixed-effects specification instead is that all of these issues go away: whatever the effects are, regardless of their distribution and regardless of the presence or absence of any trends or other cross-year dependencies, they will be properly represented and adjusted for in the model. The only thing I can think of that might argue strongly for ("had better") modeling time as a random effect is if the number of observations in your data set is small enough that including 16-19 indicator variables for years would soak up too many degrees of freedom for your analysis. But then I would also say that you are skating on very thin ice in the first place and seriously need to think about getting more data. Fitting a deliberately mis-specified model to a skimpy data set is not a promising approach to research.

            So my position on this would rephrase what that teacher advised. I would not say that you "had better" use a random effect. I would say, instead, that you might consider using a random effect if you think the assumptions it requires are reasonable in your context. I cannot advise you on whether those assumptions are reasonable because you have only minimally explained your context and, in any case, it is out of my domain of expertise.

            That's my summary of the issues. The ultimate decision is based on content issues that I am not able to advise you on, and is up to you and your colleagues.
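
            One informal way to look at the concern about trends in the year effects is to fit the fixed-effects version and plot the adjusted predictions by year. This is a sketch only, assuming the placeholder variable names from this thread; a plot like this is a visual check, not a formal test.

            ```stata
            * Sketch only: variable names are the placeholders used in this thread.

            * Year as fixed indicators, with crossed Industry and Country intercepts.
            mixed Y X i.Year || _all: R.Industry || Country:

            * Adjusted predictions (fixed portion) for each year.
            margins Year

            * Plot them; a steady drift up or down would sit poorly with treating
            * year effects as independent draws from one normal distribution.
            marginsplot
            ```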


            • #7
              Thank you so much. These are really insightful.
              I have one last question that I would appreciate your answering. For my model (let's say: mixed Y X || country: ), my X and Y are at the firm level. Can I still allow the slope of X to vary across countries? That is, is this model correct: mixed Y X || country: X ? I ask because my colleague told me that the random slope and random intercept must be at the same level, and since X is at the firm level, it cannot be added to the random effect (country) because they are at different levels. What do you think?
              Last edited by Ali Alipour; 05 Mar 2018, 09:27.


              • #8
                "my X and Y are at the firm level. Can I still allow the slope of X to vary across countries? That is, is this model correct: mixed Y X || country: X ?"
                Yes, this is perfectly fine.

                "my colleague told me that the random slope and random intercept must be at the same level, and since X is at the firm level, it cannot be added to the random effect (country) because they are at different levels."
                Well, I think there is some breakdown in communication between you and your colleague. The syntax you show has both the random intercept and random slope at the country level, so the issue he or she is raising doesn't even apply. That said, a multi-level model can and should have intercepts at multiple levels, and random intercepts are not associated with any particular variable other than the one that defines their level in the model. Random slopes are associated with a model level and with the particular variable whose slope is being treated as a random variable.

                Finally, the level at which a variable is defined (e.g. X is defined at the firm level) has nothing to do with the level(s) at which it can have a random slope in a model. Any variable, regardless of the level at which it is defined in the data, can have a random slope at any level in the model. The choice of which level(s) to specify a random slope at is a content-based question: it boils down to what kind of data generating process you think is at work.
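
                A minimal sketch of the country-level random slope discussed above, assuming the placeholder names Y, X, and country from #7. The stored-estimate names ri and rs are hypothetical, and the unstructured covariance is an added choice, not something specified in the question.

                ```stata
                * Sketch only: Y, X, and country are the placeholder names from #7.

                * Random intercept at the country level only.
                mixed Y X || country:
                estimates store ri

                * Random intercept and random slope of X, both at the country level.
                * covariance(unstructured) lets the intercept and slope be correlated.
                mixed Y X || country: X, covariance(unstructured)
                estimates store rs

                * Likelihood-ratio test of the added slope variance and covariance.
                lrtest rs ri
                ```

                Because a variance is being tested at the boundary of its parameter space, the reported p-value from this test is generally regarded as conservative.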


                • #9
                  Thanks Clyde. So illuminating. I appreciate your concern really
