  • Predicted probabilities based on simulated success rates of categorical predictor for logit models

    Hi all,

    I ran a basic logit model, and the standardized coefficients showed that a dichotomous variable is the strongest predictor. Next, I ran descriptives and found that only 40% of people answered yes to that item. I want to see what would happen to my DV success rate if the success rate of the predictor were increased from 40% to 60%. I know how to predict success at different values of an IV when the variable is continuous, but I don't know how to do it when I'm simulating a different success rate for a categorical predictor. Any thoughts?

    Thanks again

    Dan

  • #2
    Hi Daniel,

    I am a little confused by your question. With a continuous predictor, you can make predictions for the outcome probability based on any plausible value of the predictor. However, for a binary variable, individuals can only be 0 or 1. There is no such thing as 40% or 60% of a binary predictor (centering variables aside). A predictor success rate of 40% or 60% is a sample metric, not an individual metric. So what you are asking, I think, is how many more events you might have if the sample had the predictor==1 for 60% of participants instead of 40%. I have tried to achieve this below. I hope this is a helpful starting point.

    I wrote a very crude simulation to show one way to play around with the proportion of successes for your binary predictor and to see its effect on the number of events in your outcome. Here is the code:
    Code:
    * Choose number of observations
    capture clear
    set obs 1000
    
    * Simulate your binary predictor
    gen x1 = rbinomial(1, 0.4)
    
    * Log-odds of the outcome based on chosen effect size
    gen logit = 0.5 + 2*x1
    
    * Apply inverse-logit (expit) to get probabilities
    gen prob = exp(logit) / (1 + exp(logit))
    
    * Use probabilities to simulate binary outcome
    gen y = rbinomial(1, prob)
    tab1 y
    logit y x1, nolog or
    You can then run this again and change the probability of success for x1 from 0.4 to 0.6 (or any other number between 0 and 1). You could also add other covariates and include non-linear effects or higher-order interactions, but this is a very simple demonstration. You may also consider setting the RNG seed (set seed) so that the results are reproducible. I chose the effect size to be 2.0 on the logit scale, which is a very large effect (OR approximately 7.4); you can replace this with the coefficient from your own data/model.
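
    To make the 0.4 versus 0.6 comparison in one pass, the simulation above could be wrapped in a loop. This is only a rough sketch under the same assumed coefficients (intercept 0.5, effect 2.0); the seed value and the larger sample size are arbitrary choices, and invlogit() is used in place of the hand-written inverse-logit.

    ```stata
    * Sketch: simulated outcome rate at two prevalences of the binary predictor
    * (seed 12345 and 100,000 observations are arbitrary, illustrative choices)
    set seed 12345
    foreach p in 0.4 0.6 {
        clear
        set obs 100000

        * Binary predictor with P(x1==1) = `p'
        gen x1 = rbinomial(1, `p')

        * Outcome probability from the assumed log-odds 0.5 + 2*x1
        gen prob = invlogit(0.5 + 2*x1)

        * Simulate the binary outcome and report its observed rate
        gen y = rbinomial(1, prob)
        quietly summarize y
        display "P(x1==1) = `p'  ->  simulated P(y==1) = " %5.3f r(mean)
    }
    ```

    With these coefficients the expected outcome rate is p*invlogit(2.5) + (1-p)*invlogit(0.5), which works out to roughly 0.74 at p = 0.4 and 0.80 at p = 0.6, so the simulated rates should land near those values.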


    • #3
      Thank you Matt, this is just what I was looking for!


      • #4
        My only issue is this: how do I make this distribution match the distribution of my predictor variable?


        • #5
          I tried following the fixed correlation thread, but that code didn't work.


          • #6
            Hi Daniel,

            You want the distribution of the simulated predictor to follow the distribution of your observed predictor, is that correct? In general, for Monte Carlo simulations we need to specify a probability distribution to sample from, so you can select one that closely follows your observed (or theoretical) distribution. There is room to play around with different distributions, and you may parameterize them as you like. Something like empirical Bayes approaches could get around this, but I don't know that you want to go there.
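
            For a binary predictor, one simple way to do this is to take the Bernoulli success probability from the observed proportion and the coefficients from your fitted model. The following is only a sketch: it assumes your dataset is in memory, the variable names y and x1 are hypothetical stand-ins for your own, and the seed and sample size are arbitrary.

            ```stata
            * Assumes your own data are in memory; y and x1 are hypothetical names.
            * Save your data first, because -clear- below drops the dataset.
            logit y x1, nolog

            * Store the estimated intercept and effect of x1 as scalars
            scalar b0 = _b[_cons]
            scalar b1 = _b[x1]

            * Observed proportion with x1==1 (the mean of a 0/1 variable)
            quietly summarize x1
            scalar p_obs = r(mean)

            * Simulate a new sample; scalars survive -clear-, the data do not
            clear
            set seed 12345
            set obs 1000
            gen x1 = rbinomial(1, p_obs)
            gen y = rbinomial(1, invlogit(b0 + b1*x1))
            tab1 x1 y
            ```

            Replacing p_obs in the rbinomial() call with 0.6 would then give the scenario from your original question while keeping the coefficients from your own model.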

            As for your second question, what fixed correlation thread are you referring to? I am unclear on what you mean.
