
  • Ordinal Logistic Regression Analysis: A question regarding Output Complexity and the Correct Steps to Follow

    Hello Statalist community,

    I’m hoping for some help with the following question:

    Imagine you had to run an ordinal logistic regression analysis for six dependent variables, all with the same model. For example,

    Code:
    ologit depvar1 indvar1 indvar2 indvar3 indvar4 control1 control2 control3 control4
    ologit depvar2 indvar1 indvar2 indvar3 indvar4 control1 control2 control3 control4
    ologit depvar3 indvar1 indvar2 indvar3 indvar4 control1 control2 control3 control4
    ologit depvar4 indvar1 indvar2 indvar3 indvar4 control1 control2 control3 control4
    ologit depvar5 indvar1 indvar2 indvar3 indvar4 control1 control2 control3 control4
    ologit depvar6 indvar1 indvar2 indvar3 indvar4 control1 control2 control3 control4
    Following up, you’d like to do the usual steps: margins, marginsplot, and predicted probabilities. I’m wondering whether you have any ideas on how to handle the complexity of the output. Is there a good way to combine the outputs? I’m not so much interested in every single effect; I’m more interested in comparing the effects of the independent variables across the six models, to see whether all six dependent variables are affected by the same independent variables. Could the coefplot command be the right choice?
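    For what it’s worth, here is a minimal sketch of the coefplot approach I have in mind, assuming Ben Jann’s user-written coefplot command is installed (ssc install coefplot) and using the hypothetical variable names from above:

    Code:
    * sketch only: fit the six models, store the results, and overlay the
    * coefficients of the four independent variables across models
    forvalues i = 1/6 {
        quietly ologit depvar`i' indvar1 indvar2 indvar3 indvar4 ///
            control1 control2 control3 control4
        estimates store m`i'
    }
    coefplot m1 m2 m3 m4 m5 m6, keep(indvar1 indvar2 indvar3 indvar4) xline(0)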


    Regarding output complexity: I have another question concerning the right procedure to follow when running an ordinal logistic regression analysis. I have read all the papers I could find by Richard Williams about proportional odds models, partial proportional odds models, generalised ordered logit models and heterogeneous choice models. Based on my reading I came up with the following steps (a code sketch follows the list). Perhaps someone more experienced could tell me whether I'm on the right track. Any guidance would be highly appreciated!
    1. Run a standard proportional odds model using the ologit command, without yet worrying about possible violations.
    2. Check whether the proportional odds/parallel lines assumption is violated using a Brant test (or, alternatively, run gologit2 with the pl option and gologit2 with the npl option and then use an LR test to check for significant differences). Depending on the test statistics, I then know whether or not the proportional odds assumption holds.
    3. If the proportional odds assumption is violated, run both a generalised ordered logistic regression model (gologit2 ..., npl) and a partial proportional odds model (gologit2 ..., autofit) and check via an LR test whether the more parsimonious model is adequate. Most likely the partial proportional odds model will be the better choice.
    4. In addition, fit a heterogeneous choice model (oglm with the het() option) to see whether some of the variables are affected by heteroskedasticity. Here again, I guess, compare the best-fitting model from step 3 with the heterogeneous choice model from step 4 and see which of the two has the better fit.
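    To make these steps concrete, here is a rough sketch of what I mean for a single outcome. Note that brant comes from Long and Freese's spost package, and gologit2 and oglm are Richard Williams's user-written commands, so all three need to be installed first; the variable in het() below is only an illustrative guess.

    Code:
    * step 1: standard proportional odds model
    ologit depvar1 indvar1 indvar2 indvar3 indvar4 control1 control2 control3 control4
    * step 2: test the proportional odds / parallel lines assumption
    brant, detail
    * step 3: unconstrained generalised ordered logit vs. partial proportional odds
    gologit2 depvar1 indvar1 indvar2 indvar3 indvar4 control1 control2 control3 control4, npl
    estimates store npo
    gologit2 depvar1 indvar1 indvar2 indvar3 indvar4 control1 control2 control3 control4, autofit
    estimates store ppo
    lrtest ppo npo    // valid because the autofit model is nested in the npl model
    * step 4: heterogeneous choice model; not nested in the models above,
    * so compare via information criteria rather than an LR test
    oglm depvar1 indvar1 indvar2 indvar3 indvar4 control1 control2 control3 control4, het(indvar1)
    estimates store het
    estimates stats ppo het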
    A follow-up question in this regard: does the heterogeneous choice model take a violation of the proportional odds assumption into account? I believe not, because, like the standard ologit command, the oglm command does not estimate separate coefficients for the different thresholds. To summarise, it looks to me as if there is no perfect way to handle both (1) a violation of the proportional odds assumption and (2) heteroskedasticity. I can either use the partial proportional odds model or the generalised ordered logit model to handle problem 1, or handle problem 2 using a heterogeneous choice model. However, it seems there is no way to handle both problems in one go, right?

    As mentioned earlier, any help from your side is highly welcome!

    Thanks,
    Jonas

  • #2
    Originally posted by Jonas Jakobi
    I’m more interested in comparing the effects of the independent variables across the six models, to see whether all six dependent variables are affected by the same independent variables.
    You could fit all the models at once using gsem and then use official Stata's marginsplot. See the example below (begin at the "Begin here" comment). For brevity, I show the method for three outcome variables, but it is easily extended to six.
    Code:
    version 15.1
    
    clear *
    
    set seed `=strreverse("1468818")'
    
    quietly set obs 500
    
    * simulate three 5-category ordinal outcomes and three continuous predictors
    forvalues i = 1/3 {
        generate byte response`i' = runiformint(1, 5)
        generate double predictor`i' = runiform()
    }
    
    *
    * Begin here
    *
    * fit all three ordinal logit models at once, one equation per outcome
    gsem (response? <- c.predictor?, ologit), vce(robust) nodvheader nocnsreport nolog
    
    pause on
    * for each predictor in turn: linear predictions (eta) from all three
    * equations, with that predictor set to 0.5 and the others at their means
    forvalues predictor = 1/3 {
        margins , atmeans at(predictor`predictor' = 0.5)  ///
            predict(equation(response1) eta) ///
            predict(equation(response2) eta) ///
            predict(equation(response3) eta)
    
        marginsplot , ///
            xtitle(Response) xdimension(_equation, nolabels) ///
            ytitle(Linear Prediction) ylabel( , angle(horizontal) nogrid) ///
                plotopts(mcolor(black) lcolor(black)) ///
                ciopts(lcolor(black)) level(50) ///
                    title(Predictor `predictor' at midpoint, position(11) ring(0)) ///
                        legend(off)
        pause
    }
    
    exit
    You'd choose some interesting point for each of your explanatory variables and examine the linear predictions across the outcome variables.

    If I were interested in whether the regression coefficients differed, then I would plot the regression coefficients themselves, ± their standard errors, across the equations, rather than the predictions.
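    One way to do that, sketched below on the assumption that Ben Jann's user-written coefplot command is installed (ssc install coefplot), is to run coefplot on the gsem results and keep only the predictor coefficients. Note that coefplot draws confidence intervals by default rather than ±1 standard error spikes.

    Code:
    * sketch: after the gsem fit above, plot the predictor coefficients
    * from all three equations side by side
    coefplot, keep(*:predictor*) xline(0) ///
        xtitle(Coefficient (log odds)) legend(off)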

    Regarding assumptions, you're probably better off relying on content knowledge than testing.



    • #3
      Joseph Coveney,
      thanks a million for your input. I'll have a look at your code and try to figure out how to put it to use for my dataset. However, just a small question regarding the gsem command. I'm not an expert in generalised structural equation modelling, so my question might sound a bit odd to you: is it possible to test for a violation of the proportional odds assumption when using the gsem command? Using the standard ologit command, I found that my data violate the proportional odds assumption and that a partial proportional odds model would be the better choice. My question is: how should I proceed? Would you still run the gsem command and ignore the violation mentioned before, or is there a way to account for it?

      If I were interested in whether the regression coefficients differed, then I would plot the regression coefficients themselves, ± their standard errors, across the equations, rather than the predictions.
      That's probably a good idea. Can I use the coefplot command for this, or would you suggest another way?

      Thanks again for your help!
      Jonas
