  • Would this be the right way to get the predicted probability from a logistic regression?

    Hi everyone

    We are investigating the relationship between women's education and contraceptive use in India. We are making splines for educational level and we have used the following code in stata:

    Code:
    mkspline edu1 5 edu2 8 edu3 12 edu4 = education
    logistic everused edu1-edu4 age age2 dontknow_caste middle_caste high_caste muslim christian other poorer middle richer richest, robust
    adjust age age2 dontknow_caste middle_caste high_caste muslim christian other middle poorer richer richest, gen(pr1)
    generate expr1=exp(pr1)
    generate prob1=1/(1+expr1)
    
    logistic currentmethod edu1-edu4 age age2 dontknow_caste middle_caste high_caste muslim christian other poorer middle richer richest, robust
    adjust age age2 dontknow_caste middle_caste high_caste muslim christian other middle poorer richer richest, gen(pr2)
    generate expr2=exp(pr2)
    generate prob2=1/(1+expr2)
    We are looking at both current use and ever use of contraceptive methods, and the graph we obtain is presented here. We are a bit surprised by the results: contraceptive use is higher than we expected.

    So our question is:

    Would this be the right way to get the predicted probability from a logistic regression?
    [Attached image: 45298680_356425811592318_1622213468436299776_n.png]
    [Attached image: 45364133_1278171622339465_7880775726960476160_n.png]


  • #2
    Since the help file for -adjust- starts by saying "adjust has been superseded by margins", I wonder what version of Stata you are using (see the FAQ); you might also want to look at:
    Code:
    help logistic postestimation##predict
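
    As a hedged sketch of what -predict- returns, the example below uses the stock lbw dataset (not the data from the original post; the covariates are placeholders) and compares the built-in predicted probability with a hand computation from the linear prediction. It assumes that the variable created by -adjust, gen()- holds the linear prediction (the log odds), which should be the case when the pr option is not specified. Under that assumption, 1/(1+exp(xb)) returns 1 - p rather than p; the probability itself is exp(xb)/(1+exp(xb)).

    ```stata
    * Illustration on a stock dataset; the covariates are placeholders
    webuse lbw, clear
    logistic low age lwt i.race smoke, vce(robust)

    predict double p_hat, pr          // predicted probability computed by Stata
    predict double xb_hat, xb         // linear prediction (log odds)

    * Hand computation of the probability; same as invlogit(xb_hat)
    generate double p_manual = exp(xb_hat) / (1 + exp(xb_hat))
    * The form used in the original post; this equals 1 - p, not p
    generate double p_flipped = 1 / (1 + exp(xb_hat))

    * Both checks should pass silently
    assert reldif(p_hat, p_manual) < 1e-8
    assert reldif(p_flipped, 1 - p_hat) < 1e-8
    ```

    If the assumption about -adjust- holds, replacing the last -generate- line of each block in the original code with expr1/(1+expr1), or simply using -predict-, would give the predicted probability.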



    • #3
      Signe:
      you can also make your code more efficient by using -fvvarlist- notation for categorical variables and interactions (as Rich implicitly reminds you).
      Please note that if you create -age- and -agesq- by hand and then run -margins-, Stata will not be able to interpret -agesq- as the squared term for -age- and will treat them as two unrelated predictors; this problem has an easy fix, which relies on -fvvarlist-:
      Code:
      c.age##c.age
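
      A minimal sketch of how this might look in a full model, using the stock lbw dataset rather than your data (the covariates are placeholders). Because age enters as c.age##c.age, -margins- moves age and its square together:

      ```stata
      webuse lbw, clear
      * c.age##c.age expands to age plus age squared
      logit low c.age##c.age i.race i.smoke, nolog
      * Predicted probabilities at selected ages, other covariates as observed
      margins, at(age=(15(5)40))
      marginsplot
      ```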
      Kind regards,
      Carlo
      (Stata 18.0 SE)



      • #4
        Okay, now we tried what you said and this is our new code:

        Code:
        logit everused education age c.age#c.age i.dontknow_caste i.middle_caste i.high_caste i.muslim i.christian i.other i.poorer i.middle i.richer i.richest, nolog
        Code:
        margins, at(education=(1 5 8 12 20)) atmeans
        Code:
        marginsplot, noci
        But the result we get doesn't make any sense:

        1. Is
        Code:
        at(education=(1 5 8 12 20))
        equivalent to a spline specification?
        2. It seems that probability of everuse decreases as years of education increase. This doesn't make any sense since the relationship should be positive.

        Do you have any idea of what we are doing wrong/ how to get the results we are expecting?
        [Attached image: 45320610_273782706676562_3757093092524556288_n.png]

        Last edited by Signe Kristine; 04 Nov 2018, 08:51.



        • #5
          Originally posted by Signe Kristine:
          Okay, now we tried what you said and this is our new code:

          Code:
          logit everused education age c.age#c.age i.dontknow_caste i.middle_caste i.high_caste i.muslim i.christian i.other i.poorer i.middle i.richer i.richest, nolog
          Code:
          margins, at(education=(1 5 8 12 20)) atmeans
          Code:
          marginsplot, noci
          But the result we get doesn't make any sense:

          1. Is
          Code:
          at(education=(1 5 8 12 20))
          equivalent to a spline specification?
          2. It seems that probability of everuse decreases as years of education increase. This doesn't make any sense since the relationship should be positive.
          No, the margins code is not a spline specification. You merely asked margins to present the predicted probabilities, holding:

          1) Education at 1, 5, 8, 12, and 20 years,

          2) And all other covariates at their means.

          Logistic models normally produce a curved line even without splines or a quadratic specification of the independent variables. In your case, you may not be seeing much curvature because the probability of using contraception isn't varying much over the entire range of education (look at your Y axis, compared to the other graph you showed).
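
          To illustrate the point about curvature, here is a small sketch with made-up log odds (the values -0.2 to 0.2 are hypothetical, chosen so that the probabilities stay within roughly .45 to .55). Over such a narrow range the inverse logit is almost a straight line:

          ```stata
          clear
          set obs 5
          * Hypothetical linear predictions from -0.2 to 0.2 in steps of 0.1
          generate double xb = -0.2 + 0.1 * (_n - 1)
          generate double p = invlogit(xb)
          * Change in probability per step (missing for the first observation)
          generate double step = p - p[_n-1]
          list xb p step
          ```

          The successive differences are all close to 0.025, which is why a plot restricted to this range looks straight.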

          If you think the effect of education is non-linear in the log odds, then you could include a quadratic term for education. This is probably not justified given the output from the model above, which is linear in education. Nonetheless, example code would be:

          Code:
          logit everused c.education##c.education c.age##c.age i.caste i.religion i.income_group, nolog
          margins, at(education=(1 5 8 12 20)) atmeans
          marginsplot
          Side note: You appear to still be manually generating dummies for caste, income, and religion. The syntax above relieves you of that burden. I am not sure if you accidentally omitted a base dummy group for income, and if you did, that would be erroneous. It's better to use the factor variable syntax, because it reduces the amount of coding you have to do, and it reduces the chance of a coding error. I think it shouldn't change the output from the regression barring the error above. If you have income as originally coded, a lot of readers would accept it if you included income or log income as continuous.
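
          One way the separate dummies could be collapsed into a single categorical variable for use with the i. prefix is sketched below on toy data. The dummy names follow the original post, but the sketch assumes the dummies are mutually exclusive and that the omitted group is coded 0 on all of them; the variable name religion, the label name, and the "base" label are placeholders.

          ```stata
          clear
          input byte(muslim christian other)
          0 0 0
          1 0 0
          0 1 0
          0 0 1
          end
          * Assumes mutually exclusive dummies; all zeros is the base group
          generate byte religion = 1*muslim + 2*christian + 3*other
          label define religion_lbl 0 "base" 1 "muslim" 2 "christian" 3 "other"
          label values religion religion_lbl
          tabulate religion
          ```

          With a variable built this way, i.religion can replace the three dummies in the model, and Stata picks the base category for you.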

          Side note 2: splines don't work as well with margins and marginsplot. If you absolutely, absolutely must introduce splines into the logistic regression, you should let us know, but I don't think you should need to.

          Last, if you have no coding errors, then your results are what they are. Given the predicted probabilities on the Y axes, this looks like a different sample. Things could have changed. Or who knows, maybe ever used was accidentally coded in reverse format (such that 1 is never used).
          Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

          When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



          • #6
            Kristine:
            as an aside to Weiwen's helpful advice, it would be useful if you shared the legend and related values appearing above the -margins- outcome table when you invoked the -atmeans- option.
            Kind regards,
            Carlo
            (Stata 18.0 SE)



            • #7
              Spline functions do complicate your life. Unfortunately, I don't know of an easy way to tell margins that spline variables are not independent of each other. One of my wish list items is for margins to be able to handle more complicated situations, like when you are dealing with functions of a variable (other than things like squaring and cubing).

              I go over some simple spline plotting procedures on pp. 13-17 of

              https://www3.nd.edu/~rwilliam/stats2/l61.pdf

              Maybe they could be adapted for your purposes.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              Stata Version: 17.0 MP (2 processor)

              WWW: https://www3.nd.edu/~rwilliam



              • #8
                Actually, in my last class, we had a demonstration of how to plot predicted probabilities (or whatever else) after a splined regression. Now, this takes away from the functionality of -margins-, but it does get you a plot. I recall that there have been other discussions of how to run margins properly after a splined regression, but I don't recall the proposed solution.

                For Signe's benefit, here's what happens if you take one of the stock datasets, fit a logistic model with dosage as a continuous variable, then fit it with splines. The dataset uses just two variables: dosage, and a binary outcome. The blue line shows the predicted probabilities after fitting a logistic model with dosage treated as continuous, no quadratic term. The red line is the predicted probabilities after we created linear splines at quintile breakpoints, per Stata's example syntax for -mkspline-.

                If you needed 95% CIs with a graph after splines, you should note that you can calculate the standard error of the prediction, then create two new variables at probability +/- 1.96 * SE. Then plot all 3 lines.

                Code:
                webuse mksp2, clear
                sum dosage, det
                                           dosage
                -------------------------------------------------------------
                      Percentiles      Smallest
                 1%            0              0
                 5%            3              0
                10%            8              1       Obs                 100
                25%         24.5              2       Sum of Wgt.         100
                
                50%         48.5                      Mean               48.3
                                        Largest       Std. Dev.      29.78729
                75%         73.5             96
                90%           91             99       Variance       887.2828
                95%         95.5             99       Skewness       .0825489
                99%         99.5            100       Kurtosis       1.814948
                
                logistic outcome dosage
                predict pr_logistic
                mkspline dose 5 = dosage, pctile
                logistic outcome dose1-dose5
                predict pr_spline
                twoway (line pr_logistic dosage, sort) (line pr_spline dosage, sort)
                [Attached image: demo.png]


                So, as I mentioned, logistic regression usually produces a set of predicted probabilities that have a curve over a large range (see the blue line). Signe's graph after her own logistic model doesn't look curved. My sense is that there probably isn't a lot of variation in her predicted probabilities, so her graph of probabilities looks rather more straight. If you took the mid-section of the blue line, it would also look straight. Also note the predicted probabilities on my Y-axis - they range from nearly 0 to nearly 1, whereas Signe's graph is constrained to .45-.55 or so.

                Back to Signe, this is what things could look like if you did a splined logistic regression. You have a set of piecewise logistic functions. They aren't very interpretable to me, but I don't typically model dose-response relationships in this much detail.
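
                For the confidence bands mentioned above, one possible sketch on the same stock dataset follows. Rather than adding +/- 1.96 * SE to the probability itself, it builds the interval on the log-odds scale using predict's stdp option and then transforms the limits with invlogit(), which keeps the band between 0 and 1. This is a swapped-in variant of the approach described earlier, not the only way to do it, and the new variable names are arbitrary.

                ```stata
                webuse mksp2, clear
                mkspline dose 5 = dosage, pctile
                logistic outcome dose1-dose5

                predict double xb_spline, xb       // linear prediction (log odds)
                predict double se_spline, stdp     // standard error of the linear prediction

                * Point estimate and approximate 95% limits on the probability scale
                generate double pr_spline = invlogit(xb_spline)
                generate double pr_lo = invlogit(xb_spline - 1.96 * se_spline)
                generate double pr_hi = invlogit(xb_spline + 1.96 * se_spline)

                * Shaded band plus the fitted line; run from a do-file because of ///
                twoway (rarea pr_lo pr_hi dosage, sort color(gs13)) ///
                       (line pr_spline dosage, sort)
                ```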
