  • Postestimation following melogit

    Hi all
    I am running a 2-level random intercepts logistic regression model (using melogit) and would like to report predicted probabilities after running the model. I can do that easily enough and store them in a variable called full_prediction using

    predict full_prediction, mu

    But what I really want to do is to dissect these predictions into a component due to the fixed part of the model and a component due to the random intercept. The xb option generates the linear predictor for the fixed portion of the model, so I thought that maybe

    predict fixed_prediction, xb

    would allow me to calculate the component due to the random intercept by subtraction.

    As there are no random slopes in this model (only the intercept varies randomly), I expected that the component due to the random intercept would be the same for every case in the same higher level unit. But they weren't.

    Not only that, but in some cases they were outside [0,1], as were some of the predicted probabilities from the fixed part of the model, i.e. those derived by

    predict fixed_prediction, xb

    Have I misunderstood something here?


    Many thanks
    John

  • #2
    You can't break up the predicted probability into a fixed part and a random part in the way you are trying to do here. That's because the fixed and random parts are defined in the log-odds metric, not in the probability metric, and the logit transformation that connects them is non-linear. So you have to work in the log-odds metric here:

    Code:
    predict eta, eta // == xb + random intercept
    predict xb, xb
    gen random_intercept = eta - xb
    If you try to transform these into probabilities by applying the -invlogit()- function, you will find that they do not add up to the overall predicted probability (mu). The random intercepts calculated this way, however, will be consistent within each higher-level unit (apart from, perhaps, a little variation due to rounding error).
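    To see concretely why the component probabilities do not add up, a quick check with arbitrary values on the log-odds scale (1 and 0.5 here are just made-up numbers, not output from any model) makes the point:

    Code:
    display invlogit(1.5)               // probability from the full linear predictor (xb + random intercept)
    display invlogit(1) + invlogit(0.5) // sum of the separately transformed "components", a different (and larger) number
    Because invlogit() is non-linear, invlogit(xb + u) is not equal to invlogit(xb) + invlogit(u), which is why the decomposition only makes sense in the log-odds metric. (If I remember correctly, -predict, reffects- after -melogit- will also give you the predicted random intercepts directly, as an alternative to the subtraction above.)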

    Last edited by Clyde Schechter; 15 Mar 2019, 16:20. Reason: Clearer and more accurate explanation of the problem.



    • #3
      Thanks Clyde, very helpful as always! I should have remembered that the model is defined in the log odds metric.
      Your help is much appreciated.
      John



      • #4
        Hi all
        Please can I ask a follow-up question to my last post, which related to the generation of random intercepts in a 2-level random intercept multiple logistic regression model.

        I have several models which appear to run satisfactorily, but the random intercepts (as generated using the method suggested by Clyde above) are implausibly large, and in 63 of my 65 higher-level units (local authorities, which are approximately, but not exactly, the same size as each other) they are positive. (In most models up to now, the random intercepts I have generated have had mean values close to zero in the log-odds metric.)

        A few potential issues I'm aware of: the total proportion of positive events is very low (about 0.3% of all cases); some of my upper-level units have very few events in them (15 is the lowest number); and some combinations of categories of predictor variables have no events at all. I have tried dropping all the upper-level units with fewer than 50 events (a sketch of that step is below), and also dropping all variables which lead to cells with zero events. This leaves me with a model with about 40 upper-level units, 5 variables, and about 1.5 million cases, including about 5,000 positive cases.
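        For reference, the dropping of sparse units can be done with something along these lines (n_events is just a working variable introduced here; Res_binary is assumed to be coded 0/1, and LACode is the grouping variable, as in the code further down):

        Code:
        * count positive events per local authority, then drop the sparse units
        bysort LACode: egen n_events = total(Res_binary)
        drop if n_events < 50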

        However, I still get random intercepts which almost all go in the same direction, and have a non-zero mean. Even if I drop all my variables, the resulting null model still has random intercepts which are in the same direction in virtually all higher-level units, but now they are all negative!

        Here is the code I have been using. The outcome variable is called Res_binary, and I am running two sets of models: one in which the age of the individuals (the level-1 units) is 16-17, and one in which it is not (as in the example below). The grouping variable is called LACode.

        melogit Res_binary [list of predictor variables; all of which are at level 1 of the model] if Age1617==0 || LACode:
        predict etaresyoung, eta
        predict xbresyoung, xb
        generate rand_int = etaresyoung - xbresyoung

        The variable rand_int is what I am deriving to model the random intercept at LA level. It is the values of this variable (which are the same for every individual within each local authority) that are puzzling me.
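        For concreteness, the per-LA values can be inspected with something like the following (la_tag is a working variable introduced here just to keep one row per local authority):

        Code:
        * keep one row per local authority and summarize the predicted random intercepts
        egen la_tag = tag(LACode) if !missing(rand_int)
        summarize rand_int if la_tag == 1, detail
        It is this per-LA summary (almost all positive, with a clearly non-zero mean) that I find hard to believe.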

        I'd be very grateful for any light anyone can shed on this issue.

        Many thanks
        John

