  • Making a line graph from regression coefficients

    Hi, everyone!

    First of all - I am relatively new to Stata, but have learnt some of the basics.

    I want to do something that I presume is quite simple: create a line graph based on the results (regression coefficients) of a binomial logistic regression. There is a small hiccup that is causing me problems: I want two overlaid line graphs, one for women and one for men.

    To illustrate my approach based on the auto.dta dataset:
    First, I need a log transformation of a variable to be included in the set

    gen mpg_log = log(mpg)

    logit foreign mpg mpg_log if rep78==3

    Logistic regression Number of obs = 30
    LR chi2(2) = 8.21
    Prob > chi2 = 0.0165
    Log likelihood = -5.6493936 Pseudo R2 = 0.4207

    ------------------------------------------------------------------------------
    foreign | Coefficient Std. err. z P>|z| [95% conf. interval]
    -------------+----------------------------------------------------------------
    mpg | -9.760884 6.677538 -1.46 0.144 -22.84862 3.32685
    mpg_log | 235.3801 157.8325 1.49 0.136 -73.96592 544.7261
    _cons | -513.3446 340.8593 -1.51 0.132 -1181.417 154.7273
    ------------------------------------------------------------------------------

    logit foreign mpg mpg_log if rep78==4

    Logistic regression Number of obs = 18
    LR chi2(2) = 13.12
    Prob > chi2 = 0.0014
    Log likelihood = -5.914538 Pseudo R2 = 0.5260

    ------------------------------------------------------------------------------
    foreign | Coefficient Std. err. z P>|z| [95% conf. interval]
    -------------+----------------------------------------------------------------
    mpg | -5.140943 4.310988 -1.19 0.233 -13.59032 3.308438
    mpg_log | 132.5099 107.5598 1.23 0.218 -78.30356 343.3233
    _cons | -295.845 237.7719 -1.24 0.213 -761.8694 170.1793
    ------------------------------------------------------------------------------
    Note: 2 failures and 0 successes completely determined.

    Thus far, I have only found one solution:

    Simply input the coefficients in a twoway plot using function:
    twoway function y=-295.845-5.141*x+132.510*log(x), range(upperlimit1 lowerlimit1) || function y=-513.345-9.761*x+235.380*log(x), range(upperlimit2 lowerlimit2)

    I have tried to understand the coefplot command, but can't seem to get it right (I do not know Stata well enough to understand how margins really works). It would also be nice to have the option of plotting the estimates as points distinguished by colour (y-scale is log odds, x-scale is mpg).

    A slightly more advanced option would be to use _b[_cons], _b[mpg] and _b[mpg_log] after performing the regression instead of entering the numbers directly, but then I have the problem of entering two different functions in the same graph: _b will just use the latest coefficient estimates, not both sets separately.

    I am sorry if this is something that I should have already found on the forum, but I really can't seem to.

    Does anyone have a good solution?

  • #2
    y=-295.845-5.141*x+132.510*log(x)
    This is simply a linear prediction. You do not provide the corresponding values of x defining the range, so I will suppose that these are the minimum and maximum values of the variable mpg.

    Code:
    sysuse auto, clear
    gen mpg_log = log(mpg)
    logit foreign mpg mpg_log if rep78==4
    quietly sum mpg if rep78==4
    set scheme s1mono
    twoway function y=-295.845-5.141*x+132.510*log(x), range(`r(max)' `r(min)') saving(gr1, replace)
    *USING MARGINS
    margins, predict(xb) over(mpg)
    marginsplot, recast(line) noci saving(gr2, replace)
    gr combine gr1.gph gr2.gph
    [Image: Graph.png]

    To combine two marginsplots, specify the -post- option in margins and then save the estimates. Thereafter, you can use coefplot (from SSC). Alternatively, you can save the margins and use twoway, but the former requires less effort. Also, see https://www.statalist.org/forums/help#spelling.
    Last edited by Andrew Musau; 07 May 2022, 02:47.
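
    The difficulty raised in #1, that _b[] only holds the most recent estimates, can also be sidestepped by copying the coefficients into local macros after each regression. The following is a minimal sketch rather than a definitive solution: the macro names (b0_3, lo_3, and so on) are made up for illustration, the plotting range is assumed to be the observed range of mpg in each subsample, and the code is meant to be run from a do-file.

    ```stata
    sysuse auto, clear
    gen mpg_log = log(mpg)

    * Fit each subsample and copy its coefficients and mpg range into locals
    * (macro names are illustrative, not from the original posts)
    foreach r in 3 4 {
        quietly logit foreign mpg mpg_log if rep78==`r'
        local b0_`r' = _b[_cons]
        local b1_`r' = _b[mpg]
        local b2_`r' = _b[mpg_log]
        quietly summarize mpg if e(sample)
        local lo_`r' = r(min)
        local hi_`r' = r(max)
    }

    * Overlay the two linear predictions (log odds) in one graph
    twoway (function y = `b0_3' + `b1_3'*x + `b2_3'*log(x), range(`lo_3' `hi_3')) ///
           (function y = `b0_4' + `b1_4'*x + `b2_4'*log(x), range(`lo_4' `hi_4')), ///
           legend(order(1 "rep78==3" 2 "rep78==4")) ///
           xtitle("mpg") ytitle("log odds of foreign")
    ```

    Because the locals are filled in before -twoway- runs, each function uses the coefficients of its own regression, regardless of which model was estimated last.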

    • #3
      Thank you for your response!

      Originally posted by Andrew Musau:

      This is just simply a linear prediction. You do not provide the corresponding values of x defining the range, so I will suppose that these are the minimum and maximum values of the variable mpg.
      Thank you, that was what I wanted, I should have specified.

      *USING MARGINS
      margins, predict(xb) over(mpg)
      marginsplot, recast(line) noci saving(gr2, replace)
      This seems to create a new "problem" - I have non-integer values in my actual dataset, so the command will not run (I didn't know this was an issue until I tried this solution, sorry). Is there a simple solution here?

      Thanks in advance.

      • #4
        You have a continuous variable. You can calculate predicted values at defined values of the continuous variable (you specify those values using the -at()- option). But note that the -over()- and -at()- options represent different ways of calculating margins. With the -over()- option, only observations of the variable taking a specified value in the dataset are used to calculate the margin. On the other hand, when using the -at()- option, all observations in the dataset (whether or not they take the specified value) are used to compute the margin. See

        Code:
        help margins
        Code:
        sysuse auto, clear
        set seed 05082022
        replace mpg= mpg+ rnormal(0.01, 0.1)
        gen mpg_log = log(mpg)
        logit foreign c.mpg c.mpg_log if rep78==4
        margins, predict(xb) at(mpg=(14(2)30))
        set scheme s1mono
        marginsplot, recast(line) noci saving(gr2, replace)
        [Image: Graph.png]

        Last edited by Andrew Musau; 08 May 2022, 09:26.
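
        One caveat when mpg_log is created as a separate variable: -margins- has no way of knowing that it is a function of mpg, so at(mpg=...) on its own changes mpg but leaves mpg_log at its observed values. A possible workaround, sketched here under the assumption that the same 14(2)30 grid is wanted, is to give -margins- one at() per grid point that sets both variables together and then to plot the stored results. The names atlist, M, xb_at and mpg_grid are made up for illustration.

        ```stata
        sysuse auto, clear
        gen mpg_log = log(mpg)
        quietly logit foreign c.mpg c.mpg_log if rep78==4

        * Build one at() per grid point so that mpg_log always equals log(mpg)
        local atlist
        forvalues m = 14(2)30 {
            local lm = log(`m')
            local atlist `atlist' at(mpg=`m' mpg_log=`lm')
        }
        margins, predict(xb) `atlist'

        * Copy the nine predictions into a variable and plot them against the grid
        matrix M = r(b)'
        svmat double M, names(xb_at)
        gen mpg_grid = 14 + 2*(_n - 1) in 1/9
        twoway line xb_at1 mpg_grid, xtitle("mpg") ytitle("linear prediction (log odds)")
        ```

        With both variables fixed in every at(), each margin is the linear prediction evaluated at that value of mpg, so the line should follow the same curve as the -twoway function- graph in #2.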

        • #5
          I agree with Andrew Musau that Stata's -margins- and -marginsplot- and the user-contributed -coefplot- and -marginscontplot- are the best way to accomplish your objective. These powerful commands are the gateway to many ways of interpreting regression results and, especially, of visualizing the impact of a simulated change in an explanatory variable on the dependent variable of a regression. These tools enable the user to hold constant all the variables except the one of interest and to display confidence intervals, contrasts, etc. In my opinion, -margins- is one of the most compelling reasons to use Stata rather than a competing statistical package.

          However, since you are new to Stata, you may be interested in a more transparent, if more limited, way to get these same plots. As you point out, after a regression Stata temporarily stores the estimated coefficients so that they can be referenced as _b[_cons], _b[mpg], etc. Stata also temporarily stores the vector of coefficients in a matrix called -e(b)-. So in your example using -auto.dta-, you can generate the desired graphs by applying matrix multiplication to the data used to run the regression. The following approach demonstrates one of the cleverest features of Stata, built in from its earliest versions, which is the command -matrix score-. See -help matrix score-.

          Code:
          sysuse auto, clear
          keep if rep78==3 | rep78==4
          gen mpg_log = log(mpg)
          
          *    Regression for subsample with rep78==3
          logit foreign mpg mpg_log if rep78==3
          
          matrix define eb3 = e(b)
          matrix list eb3
          
          matrix score lgtp3 = eb3
              label variable lgtp3 "logit(predicted prob foreign) when rep78==3"
          
          *    Regression for subsample with rep78==4
          logit foreign mpg mpg_log if rep78==4
          
          matrix define eb4 = e(b)
          matrix list eb4
          
          matrix score lgtp4 = eb4
              label variable lgtp4 "logit(predicted prob foreign) when rep78==4"
          
          graph twoway line lgtp3 lgtp4 mpg, sort legend(col(1))
          The magic of the -matrix score- command is possible because Stata records the variable names from a regression as the K+1 column names of the stored vector of estimated coefficients, -e(b)-. (K right-hand-side variables plus an extra column for the constant.) The -matrix score- command looks for K variables in the data with names that match the K column names of the matrix appearing after the equal sign. The command then post-multiplies the [Nx(K+1)] matrix of data by that [(K+1) x 1] matrix of estimated coefficients. That is, the -matrix score- command computes predicted values using exactly the same matrix multiplication that is taught in statistics class. And it does so without requiring you to construct the [Nx(K+1)] matrix, which is impractical for big data.
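
          The description above can be checked directly. The following minimal sketch, with variable names (xb_score, xb_predict, xb_manual) made up for illustration, compares -matrix score- with -predict, xb- and with the linear combination written out by hand for the rep78==3 regression:

          ```stata
          sysuse auto, clear
          gen mpg_log = log(mpg)
          quietly logit foreign mpg mpg_log if rep78==3

          * e(b) carries the variable names as column names
          matrix b = e(b)
          matrix list b

          * Three routes to the same linear prediction
          matrix score double xb_score = b
          predict double xb_predict, xb
          gen double xb_manual = _b[_cons] + _b[mpg]*mpg + _b[mpg_log]*mpg_log

          summarize xb_score xb_predict xb_manual
          * Stops with an error if the scored and hand-computed values differ
          assert reldif(xb_score, xb_manual) < 1e-8
          ```

          All three variables are computed for every observation in memory, not only for the estimation sample, which is why the graphs in this post extend over the whole range of mpg.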

          In this case, the result is to calculate a new variable equal to the fitted values from each of the two regressions. Superimposing the graphs of these two variables on the same graph gives the following image, which shows the same curves as in Andrew Musau's post #2:

          [Image: predicted_logits.png]

          Since you have estimated a logit function for each of these two repair categories, the prediction of interest is probably the probability, not the logit of the probability. Using the function -invlogit()-, you can arrive at that comparative graph like this.


          Code:
          gen p3 = invlogit(lgtp3)
              label variable p3 "predicted probability of foreign when rep78==3"
          gen p4 = invlogit(lgtp4)
              label variable p4 "predicted probability of foreign when rep78==4"
          
          graph twoway line p3 p4 mpg, sort  legend(col(1))
          [Image: predicted_probs.png]

          I don't want to oversell the -matrix score- command as a substitute for -margins- . As I said at the beginning, a particular strength of -margins- is its ability to hold constant some variables, while predicting the values of other variables. Suppose your original regressions contained another control variable -length-. Then, in order to replicate the results of -margins- with -matrix score-, you would have to fool the -matrix score- command by replacing the true value of -length- with its mean, like this:


          Code:
          *    Regression for subsample with rep78==3
          logit foreign mpg mpg_log length if rep78==3
          
          matrix define eb3_withp = e(b)
          matrix list eb3_withp
          
          *   Replacing the actual values of -length- with its mean "fools" matrix score
          sum length
          replace length = r(mean)
          
          matrix score lgtp3_withp = eb3_withp
              label variable lgtp3_withp "logit(predicted prob foreign) holding -length- constant"
          This procedure is tedious and prone to error when the estimated equation is more complex. (The variable -length- has been corrupted!) Thus, the go-to solution to your question is really -margins-.
          Last edited by Mead Over; 08 May 2022, 12:01.

          • #6
            Thank you to users Andrew Musau and Mead Over, the graph now looks great!
