
  • meqrlogit: logistic multilevel model with TSCS data: does it look correct? + questions

    Dear Statalists,

    I am new to the forum.

    I have been working on a model for some time, and I am quite concerned about whether I have done it right. I also have some questions about interpretation. I hope someone has the time to help; it would mean a lot.

    My DV is support for democracy (SforD): 0 = not supportive, 1 = supportive.
    As a first step, I am interested in whether annual GDP growth has an effect on support for democracy.

    My data are survey data from 18 countries over 20 years, with about 1,200 observations per country-year, so approx. 400,000 observations in total.
    My GDP data are approx. 400 observations (one per country-year), merged with the 20 survey datasets.

    I have made the null-model/baseline-model:

    meqrlogit SforD time|| country1:time, cov(indep)

    and a random intercept model:

    meqrlogit SforD gdp_lag2 time|| country1:time, cov(indep)

    and a random slope model:

    meqrlogit SforD gdp_lag2 time|| country1:time gdp_lag2


    I only have 18 countries, but with the time dimension I have about 400 GDP growth observations. I am just not sure whether time is placed correctly in the model, and whether this is even the right model when I have a time dimension.

    My GDP growth variable is not significant when I include the random slope, but I have read that with only 18 level-2 units, I might not have sufficient statistical power to fit the model. (I had just assumed that the relevant number might be countries times years.) Would it be OK to run only the random-intercept model and leave out the random slope, given that I don't know whether this is actually a statistical power issue?

    My general question: when dealing with (1) a binary outcome, (2) a level-2 independent variable, and (3) TSCS data, have I arrived at the right model, or is there another obvious solution? And how does the syntax look to you?

    In addition, I am a bit concerned about how to interpret the odds ratios, as I have read that they might be biased because of the random-effects part. In that case, can I only report the significance level and direction, or is there a way to get unbiased odds ratios or otherwise report my model?

    Best regards,
    Tammie Schwartz

  • #2
    Some of your questions are statistical or Stata related, and I will give you my thoughts about those. But some of your questions are really about political science, a field in which I have no expertise. For those, I will just point out that you should consult somebody expert in your field (or wait for one of the political scientists who follow Statalist to see the thread and pick it up).

    I have made the null-model/baseline-model:

    meqrlogit SforD time|| country1:time, cov(indep)
    This is a syntactically correct model. I wouldn't call it a null model: it includes a time trend and a random slope on the time trend. A null model would have no predictor variables at any level. But it may be, in some sense, a minimalist model in that it includes nothing but time.

    The -cov(indep)- option is not necessary, as independent is the default for -cov()-. There is also no harm in including it, and perhaps it is good to have it explicit so that when you look at this code in the future you won't have to remember what the default covariance structure is.

    Under this model you are assuming that (the log odds of) support for democracy varies linearly with time, and that the slope of that line varies from one country to the next. Whether linearity is a reasonable model for this is a question for a political scientist. From a purely lay perspective, I have the impression that support for democracy fluctuates up and down over relatively short time scales, so a linear specification clashes with my intuitions. But my intuitions may be wrong, or those fluctuations may be, for your purposes, noise that you specifically don't want to model.

    There is one other modeling issue here. You do not say whether your 20 years' worth of survey data are a panel or serial cross-sections. If they are a panel, then you need the panel identifier as an additional level in the model.
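A minimal sketch of the distinction, assuming the thread's dataset is in memory with the variables SforD, time and country1 used above. The first command is a true null (empty) model, with nothing but a random intercept per country; the last two specify the same minimalist time model, since independent is the default covariance structure.

```stata
* Assumes the thread's data are in memory: SforD (0/1), time, country1

* True null (empty) model: no predictors, only a random intercept per country
meqrlogit SforD || country1:

* Minimalist time model: linear time trend with a random slope on time.
* These two commands specify the same model, because independent is the
* default covariance structure for -meqrlogit- random effects.
meqrlogit SforD time || country1: time, cov(indep)
meqrlogit SforD time || country1: time
```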

    and a random intercept model:

    meqrlogit SforD gdp_lag2 time|| country1:time, cov(indep)
    Again, this model includes a random slope for time, so I wouldn't call it a random intercept model. But it does correctly specify a model with no random slope for gdp_lag2. So this model assumes that the log-odds of support for democracy, adjusted for time trend, varies linearly with gdp_lag2, and that it does so at the same slope in every country. The previous remarks about panel vs serial cross-section and the optionality of the -cov(indep)- option apply equally here.
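For comparison, a sketch of what a pure random-intercept model would look like with the same variables (again assumed to be in memory): fixed slopes for both gdp_lag2 and time, and only a country-specific intercept. The stored-estimate names ri and rs_time are arbitrary labels chosen for this illustration.

```stata
* Assumes the thread's data are in memory: SforD, gdp_lag2, time, country1

* Pure random-intercept model: the same slopes for gdp_lag2 and time in
* every country; only the intercept varies across countries
meqrlogit SforD gdp_lag2 time || country1:
estimates store ri

* The thread's second model: random intercept plus a random slope on time
meqrlogit SforD gdp_lag2 time || country1: time, cov(indep)
estimates store rs_time

* Coefficients (fixed and random-effects parameters) of both models side by side
estimates table ri rs_time, b se
```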

    and a random slope model:

    meqrlogit SforD gdp_lag2 time|| country1:time gdp_lag2
    This is a full random-slopes model. It models the log odds of support for democracy as a linear function of gdp_lag2 after adjustment for the time trend, and it allows each country to have its own slope for the SforD:gdp_lag2 log-odds relationship (as well as its own time slope). Again, the remarks about panel vs. cross-section and the optionality of -cov(indep)- apply.

    Would it be OK to run only the random-intercept model and leave out the random slope?
    Even if you still want to believe in statistical significance, it doesn't help you answer this question. You have to look at the range of slopes implied by the country-level variance of the gdp_lag2 random slope. If that range carries you into territory that is, in a practical, meaningful sense, different from the mean value given in the "fixed effects" results, then you should leave the random slope in. If, however, that variance is small enough that there is no substantive difference between the smallest and largest slopes you get, then feel free to omit it.
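One way to look at that range, sketched under two assumptions: the random-slopes model above is the one being evaluated, and the country-specific slopes are roughly normal, so that about 95% of them fall within the mean slope plus or minus 1.96 standard deviations. -meqrlogit- stores the random-effects parameters as log standard deviations; the equation name lns1_1_2 (second random effect, gdp_lag2) is an assumption you should confirm in the -matrix list e(b)- output before trusting the numbers.

```stata
* Assumes the thread's data are in memory: SforD, gdp_lag2, time, country1
meqrlogit SforD gdp_lag2 time || country1: time gdp_lag2, cov(indep)

* Variances of the random effects
estat recovariance

* Check the equation names: the random effects are expected in the order
* time, gdp_lag2, _cons, i.e. lns1_1_1, lns1_1_2, lns1_1_3 (assumption)
matrix list e(b)

scalar b_gdp  = _b[gdp_lag2]               // mean (fixed-effect) slope
scalar sd_gdp = exp(_b[lns1_1_2:_cons])    // SD of country-specific slopes

* Approximate range covering ~95% of countries' slopes, assuming normality
display "log-odds slopes: " scalar(b_gdp) - 1.96*scalar(sd_gdp) " to " scalar(b_gdp) + 1.96*scalar(sd_gdp)
display "odds ratios:     " exp(scalar(b_gdp) - 1.96*scalar(sd_gdp)) " to " exp(scalar(b_gdp) + 1.96*scalar(sd_gdp))
```

If the two ends of that range would lead to substantively different conclusions, that argues for keeping the random slope.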



    My GDP growth variable is not significant when I include the random slope, but I have read that with only 18 level-2 units, I might not have sufficient statistical power to fit the model.
    If you mean that 18 countries is a rather small sample of country-space and may fail to detect small but still meaningful variation in the slope for gdp_lag2, yes, that is a distinct possibility. But you should not obsess about statistical significance. In fact, you should not even look at it. In 2019 The American Statistician, the journal of the American Statistical Association, published a special issue whose lead editorial recommended that the term "statistically significant" be abandoned. See https://www.tandfonline.com/doi/full...5.2019.1583913 for the "executive summary" and
    https://www.tandfonline.com/toc/utas20/73/sup1 for all 43 supporting articles, or https://www.nature.com/articles/d41586-019-00857-9 for the tl;dr. Focus your attention instead on effect sizes: the regression coefficients, or their exponentiated values (i.e., the odds ratios), and their confidence intervals. Are these large enough to be meaningful in a real-world sense? What if the true value were at one extreme or the other of the confidence interval: would that change your qualitative impressions of what is going on? Those are the kinds of things you should focus on when trying to understand your results.
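To see effect sizes and confidence intervals rather than p-values, a sketch assuming the same variables are in memory. Replaying the results with -or- shows the fixed effects as odds ratios with their confidence intervals; -lincom- gives the odds ratio and interval for a larger change in gdp_lag2. The 3-unit change is a hypothetical value chosen only for illustration; pick a difference that is meaningful in your data.

```stata
* Assumes the thread's data are in memory: SforD, gdp_lag2, time, country1
meqrlogit SforD gdp_lag2 time || country1: time gdp_lag2, cov(indep)

* Replay the fixed-effect results as odds ratios with 95% confidence intervals
meqrlogit, or

* Odds ratio (and CI) for a hypothetical 3-unit increase in gdp_lag2
lincom 3*gdp_lag2, or
```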

    In addition, I am a bit concerned about how to interpret the odds ratios, as I have read that they might be biased because of the random-effects part. In that case, can I only report the significance level and direction, or is there a way to get unbiased odds ratios or otherwise report my model?
    Yes, in observational studies there is the concern that random-effects models give biased results: they assume the country-level random effects are uncorrelated with the predictors, and if unmeasured country characteristics are related to gdp_lag2, the estimates are biased. The magnitude of that bias can be large enough that even the sign of the coefficient is wrong. (Reporting only significance is a bad idea in any case.) With a two-level model (which you can stick with if you have serial cross-sections, not panel data) you can do something different: fit a fixed-effects model using -xtlogit-. See -help xtset- and -help xtlogit- for syntactic details. You cannot get random slopes with that, although you can sort of emulate them by including i.country1#c.(time gdp_lag2) interaction terms. Fixed-effects models produce estimates of within-country effects that are not biased by stable country-level characteristics, measured or not. They do not estimate between-country effects at all. So if your research question is about differences among countries, fixed-effects models are not going to be useful, and you might instead look at -xthybrid- (available from SSC), which will give you separate estimates of within-country and between-country effects.
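A sketch of these alternatives, assuming the thread's variables are in memory. The fixed-effects logit needs only the country identifier declared with -xtset-. The last part builds the within-between ("hybrid") decomposition by hand, which is the idea behind -xthybrid-; the variable names gdp_between and gdp_within are made up here, and the exact -xthybrid- syntax should be taken from its help file rather than from this sketch.

```stata
* Assumes the thread's data are in memory: SforD, gdp_lag2, time, country1

* Fixed-effects (conditional) logit: within-country effects only
xtset country1
xtlogit SforD gdp_lag2 time, fe or

* Rough emulation of country-specific slopes with interaction terms
xtlogit SforD c.time c.gdp_lag2 i.country1#c.(time gdp_lag2), fe or

* Within-between (hybrid) decomposition by hand:
* the country mean of gdp_lag2 carries the between-country effect,
* the deviation from that mean carries the within-country effect.
* (The mean is taken over respondents; with about 1,200 per country-year
* it is close to the mean over years.)
bysort country1: egen gdp_between = mean(gdp_lag2)
generate gdp_within = gdp_lag2 - gdp_between
meqrlogit SforD gdp_within gdp_between time || country1:

* Packaged version from SSC (see -help xthybrid- for its syntax):
* ssc install xthybrid
```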

    My general question: when dealing with (1) a binary outcome, (2) a level-2 independent variable, and (3) TSCS data, have I arrived at the right model, or is there another obvious solution?
    This is a political science question. Syntactically all these models are fine. Whether they are "right" is not a statistical question. (Or, rather, from a purely statistical perspective, no model is right. The correct question to ask is whether the model is good enough to be useful, to shed light on the processes under study. That is a non-statistical, political science question.)




    • #3
      Dear Clyde, thank you very much for your time, clear answers and help.

      Sorry for not being clear about my data: they are serial cross-sections, not the same respondents in each country each year. My impression was therefore that I cannot use -xtset- and -xtlogit-, and they did not work when I tried. But you write:

      With a two level model (which you can stick with if you have serial cross-sections, not panel data) you can do something different. You can do a fixed-effects model using -xtlogit-. See -help xtset- and -help xtlogit- for syntactic details.
      I might be wrong about that, but from what I have found searching for answers, it seems that xtset only works on panel data? When I run xtset country time, I get the error "repeated time values within panel" (which makes sense, I guess), and if I run duplicates list country time, I get an endless list of observations.

      Apart from that, everything else is very clear to me - thank you!

      Tammie



      • #4
        it seems that xtset only works on panel data?
        No, that is not correct. -xtset- can also be used with serial cross-sections. However, what you can't do here is -xtset country time-, because, as Stata complains, you have repeated observations within a country for a given time (i.e., all the different people). But -xtset- does not require a time variable: it is optional, and it is only needed if you are going to do analyses with leads or lags (or other time-series operators) or an autoregressive correlation structure. In serial cross-sections, none of those concepts is even meaningful. So just -xtset country-, and you then have the -xt- commands available to you.
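A sketch of that sequence, assuming the country identifier is the numeric variable country1 used in the models earlier in the thread (the posts here also call it country).

```stata
* Assumes the thread's data are in memory with numeric country ID country1

* Confirm there are many rows (respondents) per country-year
duplicates report country1 time

* Declaring time as well fails: repeated time values within panel
capture noisily xtset country1 time

* Serial cross-sections: declare only the panel variable, no time variable
xtset country1

* The -xt- commands are now available, for example:
xtlogit SforD gdp_lag2 time, fe
```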

