  • Logistic regression: post-estimation, non-linear proportions, and interaction effects

    Let's say that I have a very simple model. My dependent variable is whether or not (yes/no) patients in a mental clinic were involved in a violent incident during their last admission. My independent variable is age. To visualize this relationship, I created a line graph where Y is the proportion of Yes (inmates who were involved in a violent incident) and the X axis is age. I see that as age increases, the proportion of patients involved in a fight decreases, but only up to a certain age: after 40 there is no difference in the proportion of patients involved in fights. When I run -logistic fight age-, age is significant at the 0.05 level and the odds ratio is .9192005. After the regression I computed the predicted Y values with -predict pY-.

    Questions:
    1. The odds ratio of .919 means that for every one-year increase in age, the odds ratio of being involved in a fight decreases by .919, right? So does this mean the change is the same for every year, i.e., the change from 14 to 15 is the same as from 50 to 51? When I look at the predicted Ys, the changes from one age to the next are not the same. For example, the predicted Y is .14 for 15-year-olds, .13 for 16-year-olds, .13 for 17-year-olds, etc. As you can see, the changes are not the same. Am I confusing concepts, terms, or ideas?
    2. The predicted value for 15-year-olds is .14. Does this mean that the predicted probability of being involved in a fight is 14% for 15-year-olds? In other words, that 14 out of 100 15-year-olds are predicted to be involved in a violent incident?
    3. Based on my bivariate descriptive analysis, I know that beyond some age (let's say 40 years) age no longer influences involvement in violent incidents. Does the predicted Y account for this? Or do I have to do something to make the model fit better, as in ordinary linear regression?
    4. How can I include an interaction term with gender in my logistic regression? Based on a chart similar to the one described above, I see that age affects the proportion involved in a fight differently for men and women. Although the proportion decreases with age for both, for women the decline is more gradual, while for men it is steep from 14 to 21 and then almost flat. How can I include this interaction effect, and what would be a good way to visualize it?

    Thanks so much in advance! Any help is welcome!

    Marvin





  • #2
    1. no; it means a decrease of (1-.919) or about an 8% decrease in the odds (NOT in the odds ratio) for each year (assuming that age is in years)

    2. yes

    3. no, you have modeled age as linear so it will be linear for the entire range of ages in your data; if you want some other functional form (e.g., a quadratic or a piecewise linear) you need to model that explicitly

    4. same as in any other model: c.age##i.gender
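A minimal Python sketch of what c.age##i.gender adds to the model, using made-up coefficients (hypothetical, not estimates from Marvin's data) and assuming a 0/1 indicator called male. The interaction gives each gender its own age slope on the log-odds scale: B_AGE for women and B_AGE + B_INT for men, which is how a steeper decline for men would show up.

```python
import math

# Hypothetical coefficients for:
#   logit Pr(fight) = B0 + B_AGE*age + B_MALE*male + B_INT*age*male
# (illustrative values only; not estimates from any real data)
B0, B_AGE, B_MALE, B_INT = -1.0, -0.04, 1.5, -0.08


def inv_logit(xb: float) -> float:
    """Convert a linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))


def pr_fight(age: float, male: int) -> float:
    """Predicted probability for a given age and gender (male = 0 or 1)."""
    xb = B0 + B_AGE * age + B_MALE * male + B_INT * age * male
    return inv_logit(xb)


def age_slope_logodds(male: int) -> float:
    """Change in log-odds per extra year of age within one gender group."""
    return B_AGE + B_INT * male


if __name__ == "__main__":
    for male in (0, 1):
        label = "men" if male else "women"
        probs = [round(pr_fight(a, male), 3) for a in (15, 20, 30, 40)]
        print(label, "age slope (log-odds):", age_slope_logodds(male),
              "Pr at 15/20/30/40:", probs)
```

To visualize the fitted curves in Stata, one option is margins, at(age=(15(1)40) gender=(0 1)) followed by marginsplot, assuming gender is coded 0/1, in the same spirit as the margins examples later in this thread.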



    • #3
      Rich Goldstein Thank you!

      1. So the 8% decrease should be constant for every year, right? That is, from 15 to 16 years old there should be a decrease of 8%, and from 50 to 51 also a decrease of 8%? So my predicted values should also decrease by 8% from one year to the next, correct?

      3. I was doing some reading and discovered the lowess command for checking linearity in the mean of Y across values of X. I used lowess, and also simply charted the proportion of Yes (involved in a fight) across values of X. My dependent variable is involvement in fights, and the independent variables are age, number of previous admissions, and a patient violence score. Age seems to have an effect on fights only up to a certain age, and then the effect stops. I think this is a quadratic relationship; am I correct? If so, how can I include a term in my regression to account for this? What about the classification score? It behaves very similarly to age. Finally, the number of past admissions also seems to have a non-linear relationship: as the number of past admissions increases, the proportion of people involved in fights increases but then goes down. A possible explanation could be that at some point recurrent patients become more familiar with the system, or perhaps with the consequences, and stop getting involved in fights. What type of association is this? How can I include this in my logistic regression? Am I interpreting this right?

      Thank you so much!
      Marvin



      • #4
        Remember that the probability of observing an outcome 1 is not a linear function of x. I always find it easier to look at the changes in probabilities with the margins command to explore results (and marginsplot to visualize them). See the example below. Is this the relationship you are thinking of with respect to age?

        Code:
        sysuse auto
        logit foreign rep78 trunk turn
        margins, at(turn=(30(2)50))
        marginsplot
        Stata/MP 14.1 (64-bit x86-64)
        Revision 19 May 2016
        Win 8.1




          • #6
            re: 1 - the 8% is for a decrease in odds, not probabilities
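To make the odds-versus-probability distinction concrete, here is a small Python sketch that starts from the rounded predicted probability of .14 at age 15 quoted in the first post and applies the reported odds ratio of .9192005 once per year. It assumes the single-predictor model from that post; because .14 is rounded, the later values are approximations, not Marvin's exact predictions.

```python
OR_PER_YEAR = 0.9192005  # odds ratio for age from the first post
P_AT_15 = 0.14           # rounded predicted probability at age 15 from the first post


def odds(p: float) -> float:
    """Probability -> odds."""
    return p / (1.0 - p)


def prob(o: float) -> float:
    """Odds -> probability."""
    return o / (1.0 + o)


def predicted_prob(age: int) -> float:
    """Each extra year multiplies the odds (not the probability) by OR_PER_YEAR."""
    return prob(odds(P_AT_15) * OR_PER_YEAR ** (age - 15))


if __name__ == "__main__":
    for a, b in [(15, 16), (50, 51)]:
        pa, pb = predicted_prob(a), predicted_prob(b)
        print(f"{a}->{b}: odds ratio {odds(pb) / odds(pa):.4f}, "
              f"probability change {pb - pa:+.4f}")
```

The odds fall by about 8% (1 - .919) every year, but the probability falls by about one percentage point between 15 and 16 and by much less between 50 and 51, which matches the uneven steps in pY.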

            assuming that the name of your age variable is "age", just add c.age##c.age to your command (and drop the simple "age" term) to get a quadratic. I agree that lowess is a good tool for investigating possible non-linearities when ignoring other covariates; you don't show the commands you used for the lowess curves (which would have helped me, at least). For more general forms of non-linear modeling, you might want to investigate restricted cubic splines (see "h mkspline") and/or fractional polynomials (see "h fp" and "h mfp")
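A sketch of what the quadratic specification c.age##c.age implies, using hypothetical coefficients (not estimates from this thread): the log-odds become B0 + B1*age + B2*age^2, the per-year slope B1 + 2*B2*age changes with age, and the curve turns at age = -B1/(2*B2). One caveat worth checking against the lowess curves: a quadratic is symmetric around that turning point, so a curve that declines and then flattens may be better captured by the spline forms mentioned above.

```python
# Hypothetical coefficients for: log-odds = B0 + B1*age + B2*age**2
B0, B1, B2 = 1.0, -0.2, 0.0025


def log_odds(age: float) -> float:
    return B0 + B1 * age + B2 * age ** 2


def slope_per_year(age: float) -> float:
    """Derivative of the log-odds with respect to age at a given age."""
    return B1 + 2 * B2 * age


def turning_point() -> float:
    """Age where the log-odds curve stops falling and starts rising (B2 > 0)."""
    return -B1 / (2 * B2)


if __name__ == "__main__":
    print("turning point:", turning_point())
    for a in (20, 40, 60):
        print(a, round(slope_per_year(a), 4))
```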



            • #7
              Hi Carole,

              Yes, the relationship between my age variable and the proportion involved in a fight looks like your graph. In your example model, the coefficient for trunk is -.4686829. How should this be interpreted? Isn't it that for every one-unit increase in trunk, the coefficient of foreign increases by -.4686829? If so, shouldn't the decrease be the same from one unit to the next? Am I misunderstanding something? I see that the predicted values do not decrease by the same amount; why? Based on your example, should you consider including a different functional form, e.g., a quadratic?
              Is there any tutorial or resource you could recommend for understanding logistic regression and post-estimation techniques?

              Thank you!
              Marvin




              • #8
                No, based on my example, there is no need to change the functional form of x. The whole point is that logit is non-linear.
                (You could estimate logit foreign rep78 trunk c.turn##c.turn and graph the effects)
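One way to see why logit is non-linear in the probability: the coefficient is a constant slope on the log-odds scale, but the slope of Pr(y=1) with respect to x is b*p*(1-p), which depends on where p currently is. The Python sketch below uses the trunk coefficient (-.4686829) quoted in #7 purely as an example value and checks the formula against a numerical derivative.

```python
import math

B = -0.4686829  # example coefficient (trunk, quoted in #7)


def inv_logit(xb: float) -> float:
    return 1.0 / (1.0 + math.exp(-xb))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def dprob_dx(p: float, b: float = B) -> float:
    """Slope of Pr(y=1) in x at the point where the current probability is p."""
    return b * p * (1.0 - p)


def numeric_dprob_dx(p: float, b: float = B, h: float = 1e-6) -> float:
    """Central finite difference: nudge x by +/- h around the point where Pr = p."""
    xb = logit(p)
    return (inv_logit(xb + b * h) - inv_logit(xb - b * h)) / (2 * h)


if __name__ == "__main__":
    for p in (0.05, 0.25, 0.5, 0.75, 0.95):
        print(p, round(dprob_dx(p), 4))
```

The slope is steepest at p = 0.5 (where it equals b/4) and shrinks toward zero as p approaches 0 or 1, which is why a relationship can look almost linear where p stays in a narrow band and clearly curved elsewhere.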

                Ultimately, the effect may look more or less linear depending on the estimates. For example, if I look at trunk instead of turn as in #4, I get a pretty linear looking relationship between trunk space and the probability of 1:
                Code:
                sysuse auto
                logit foreign rep78 trunk turn
                margins, at(trunk=(5(2)25))
                marginsplot

                There are a variety of ways of interpreting coefficients and odds or odds ratios. I prefer using the predicted probabilities. In the case of the turn example in #4, I might say: The probability of observing an outcome 1 changes substantially across the range of the variable -turn-. If turn is at its minimum value, the probability is ..., whereas if turn is at the sample average, it is ..., and ...at its maximum value of ....
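For the coefficient-based reading asked about in #7, here is a short Python sketch converting the trunk coefficient (-.4686829) into an odds ratio and a percent change in the odds. It shows that what stays constant per unit of trunk is the change in log-odds (equivalently, the multiplicative change in odds), not the change in Pr(foreign = 1). Stata reports odds ratios directly with logit's or option or with logistic.

```python
import math

B_TRUNK = -0.4686829  # logit coefficient for trunk quoted in #7


def odds_ratio(b: float, units: float = 1.0) -> float:
    """Multiplicative change in the odds for a `units` increase in the predictor."""
    return math.exp(b * units)


def pct_change_in_odds(b: float, units: float = 1.0) -> float:
    """Percent change in the odds for a `units` increase in the predictor."""
    return (odds_ratio(b, units) - 1.0) * 100.0


if __name__ == "__main__":
    print("OR per unit of trunk:", round(odds_ratio(B_TRUNK), 4))
    print("% change in odds per unit:", round(pct_change_in_odds(B_TRUNK), 1))
    print("OR for 5 units:", round(odds_ratio(B_TRUNK, 5), 4))
```

Over k units the odds ratio compounds (it is the one-unit odds ratio raised to the power k), so the effect on the odds is constant per unit while the effect on the probability is not.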

                Remember, though, it is important to understand that the relationship between x & Pr(y) is non-linear AND that it depends on the values of all covariates. Consider the effect of turning radius on Pr(y=1) when trunk is at its minimum (5) and when trunk is at its maximum (23):

                Code:
                sysuse auto
                logit foreign rep78 trunk turn
                margins, at(turn=(30(2)50) trunk=(5)) at(turn=(30(2)50) trunk=(23))
                marginsplot
                Not a huge difference in this case, but some.
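The point about covariates can be sketched in a few lines of Python. The coefficients below are hypothetical (chosen for illustration, not the estimates from the auto example, and rep78 is left out): moving turn from 30 to 50 shifts the log-odds by the same amount whatever trunk is, but the resulting change in Pr(y=1) differs because trunk moves the starting point along the S-shaped curve.

```python
import math

# Hypothetical coefficients: log-odds = B0 + B_TURN*turn + B_TRUNK*trunk
B0, B_TURN, B_TRUNK = 2.0, -0.1, -0.1


def xb(turn: float, trunk: float) -> float:
    return B0 + B_TURN * turn + B_TRUNK * trunk


def pr(turn: float, trunk: float) -> float:
    return 1.0 / (1.0 + math.exp(-xb(turn, trunk)))


def turn_effect(trunk: float, lo: float = 30, hi: float = 50) -> tuple[float, float]:
    """(change in log-odds, change in probability) when turn moves from lo to hi."""
    return xb(hi, trunk) - xb(lo, trunk), pr(hi, trunk) - pr(lo, trunk)


if __name__ == "__main__":
    for t in (5, 23):
        d_logodds, d_prob = turn_effect(t)
        print(f"trunk={t}: log-odds change {d_logodds:+.3f}, "
              f"probability change {d_prob:+.4f}")
```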

                To understand all of this, I would start with the webpages of Richard Williams, since he uses a lot of Stata examples

                Here's his Logistic Regression 2 notes: https://www3.nd.edu/~rwilliam/xsoc73994/Logit02.pdf
                and go up or down as necessary (full course page: https://www3.nd.edu/~rwilliam/xsoc73994/index.html)


                • #9
                  Thanks, Carole! I need to do some reading to better understand logistic regression and post-estimation.


                  Final question: in which scenarios would you change the functional form of an association? For example, as my earlier posts show, age influences the proportion of patients involved in a fight, but only up to a certain age. Let's say that from age 30 up, the proportion of patients involved in fights is the same for every year. In this case, could you introduce a different functional form in the model?
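For the scenario described here (a decline up to about 30, then flat), a piecewise linear (linear spline) term is one explicit functional form, along the lines Rich suggested in #2 and #6. The Python sketch below builds two hypothetical spline variables with a knot at 30, roughly the kind of variables mkspline creates (see h mkspline for its exact definitions and its marginal option); the coefficients are made up. The log-odds slope is B1 before the knot and B2 after it, so "flat after 30" corresponds to B2 near zero.

```python
import math

KNOT = 30  # hypothetical knot, from the "30 and up" example in the post
# Hypothetical coefficients: log-odds = B0 + B1*age_lo + B2*age_hi
B0, B1, B2 = 0.5, -0.12, 0.0


def spline_basis(age: float, knot: float = KNOT) -> tuple[float, float]:
    """Linear spline pieces: age capped at the knot, and years beyond the knot."""
    return min(age, knot), max(age - knot, 0.0)


def log_odds(age: float) -> float:
    age_lo, age_hi = spline_basis(age)
    return B0 + B1 * age_lo + B2 * age_hi


def pr(age: float) -> float:
    return 1.0 / (1.0 + math.exp(-log_odds(age)))


if __name__ == "__main__":
    for a in (15, 20, 25, 30, 40, 50):
        print(a, spline_basis(a), round(pr(a), 3))
```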

                  Best,
                  Marvin



                  • #10
                    After you have a good understanding of the logit model and how to interpret the results, I would run a model with c.age and then another with c.age##c.age and compare. If the quadratic term has a significantly better fit, and results in substantively different results, then I would add it in. Otherwise, without compelling evidence that the model fit improves and there is a substantive payoff, I would opt for the more parsimonious specification.
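The comparison described here can be made concrete with a likelihood-ratio test. The Python sketch below uses made-up log-likelihoods (not results from this thread) for the c.age model and the c.age##c.age model, which differ by one parameter. In Stata, lrtest reports the same comparison after estimates store for each model, provided both are fit on the same observations.

```python
import math

# Hypothetical log-likelihoods from two nested logit fits on the same sample
LL_LINEAR = -520.4     # c.age
LL_QUADRATIC = -516.1  # c.age##c.age (one extra parameter)


def lr_test_1df(ll_restricted: float, ll_full: float) -> tuple[float, float]:
    """LR statistic and p-value for nested models differing by one parameter.

    For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    lr = 2.0 * (ll_full - ll_restricted)
    if lr < 0:
        raise ValueError("the fuller model should not have a lower log-likelihood")
    p = math.erfc(math.sqrt(lr / 2.0))
    return lr, p


if __name__ == "__main__":
    lr, p = lr_test_1df(LL_LINEAR, LL_QUADRATIC)
    print(f"LR chi2(1) = {lr:.2f}, p = {p:.4f}")
```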
