  • Predicting xb with (temporary) alterations to x

    I realize there must be an easy option that I'm missing here... I want to predict "yhat" after a regression under various alterations to the matrix of covariates. Of course I can do this by changing the X matrix, but if I want to do this over and over, basically predicting the outcomes under various X scenarios, it becomes a pain to change X back and forth from its original values. I thought either "predict" or "margins" would allow me to do this, but predict doesn't allow the "at" option, and margins doesn't seem to have an option for actually predicting a variable (i.e., a value for each observation). Is there an efficient way to do this that I'm missing? Thanks!

  • #2
    I'm not totally sure why you want to do this, but I think you can. Margins has an undocumented generate option, gen():

    https://www.statalist.org/forums/for...ted-gen-option

    Example:

    Code:
    webuse nhanes2f, clear
    logit diabetes weight i.female i.black
    margins, at(weight = (30(10)180)) gen(xweight)
    sum xw*
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Another possibility would be to make up a dataset containing observations with the combinations of x values of interest, and then append it to (or otherwise put it into) your dataset, along with a variable marking these cases ("predict_only = 1 vs. 0"). Then you could do -regress y x1 x2 if (predict_only == 0)-, and then do -predict yhat if (predict_only == 1), xb-. The important principle is that cases used by -predict- must have values for all the x variables, but need not have been in the estimation sample for the regression model. This approach might or might not be more useful or convenient than -margins- in your situation.
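
      A minimal sketch of this approach, offered as an illustration rather than code from the thread. It assumes Stata's shipped auto dataset; the covariates (weight, length), the two scenario rows, and their values are made up for the example. Only the predict_only marker and the if-qualified regress/predict pattern come from the description above.

      Code:
      sysuse auto, clear
      * Mark the original observations as estimation cases
      generate byte predict_only = 0
      * Add two hypothetical scenario rows at the end of the data
      local n0 = _N
      set obs `=`n0' + 2'
      replace predict_only = 1 if missing(predict_only)
      * Fill in the x values for each scenario (the outcome stays missing)
      replace weight = 2500 in `=`n0' + 1'
      replace length = 170  in `=`n0' + 1'
      replace weight = 4000 in L
      replace length = 210  in L
      * Estimate on the real data only, then predict for the scenario rows only
      regress price weight length if predict_only == 0
      predict yhat if predict_only == 1, xb
      list weight length yhat if predict_only == 1

      Because the scenario rows have a missing outcome, regress would drop them anyway; the explicit if qualifier just makes the intent visible.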



      • #4
        Hi, thanks both! Richard, I didn't know margins could do that, neat. I thought it was the perfect solution, until I realized that the "at" option under margins takes only 1 variable (multiple values, but 1 variable) at a time, whereas I want to change a few variables at once. I guess this just isn't what margins was set up to do.

        Mike, your solution is good, and I thought about doing that too. I can also use preserve and restore around (1) variable changes, (2) a prediction, which is then (3) exported into a tempfile, to be later merged back on. At the moment I'm just saving original variable values in a second variable called var_orig, so that I can move variable values back and forth from original values, simulating predictions under various scenarios. I guess it was silly to expect a canned way to do this. Thanks again!
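
        The preserve/restore route described above could look roughly like the following. This is a sketch under assumptions, not the poster's actual code: the auto dataset, the id key, the scenario changes, and the name yhat_s1 are all hypothetical. It should be run as a single do-file so that the local tempfile macro and the preserve stay in scope.

        Code:
        sysuse auto, clear
        regress price weight length
        generate long id = _n            // key for merging predictions back
        tempfile scenario
        preserve
            * (1) temporary changes to the covariates
            replace weight = weight + 500
            replace length = length + 10
            * (2) prediction under the altered X
            predict yhat_s1, xb
            * (3) keep only the key and the prediction, and store them
            keep id yhat_s1
            save `scenario'
        restore
        * Original data are back; attach the scenario predictions
        merge 1:1 id using `scenario', nogenerate
        summarize price yhat_s1

        The estimation results survive restore because preserve and restore act only on the data in memory, so the same block can be repeated with a new tempfile and variable name for each scenario.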



        • #5
          Quoting #4: "Richard, I didn't know margins could do that, neat. I thought it was the perfect solution, until I realized that the "at" option under margins takes only 1 variable (multiple values, but 1 variable) at a time, where I want to change a few variables at once. I guess this just isn't what margins was set up to do."

          Not true. If you can keep the parentheses straight (a non-trivial task) you can do all sorts of convoluted and complicated things.

          Code:
          webuse nhanes2f, clear
          logit diabetes weight height i.female i.black
          margins, at(weight = (30(50)180) height = (135(20)200)) gen(xweight)
          sum xw*
          That gives you predictions for 16 height/weight combinations.

          You can even have multiple at statements:

          Code:
          margins, at(weight = (30(50)180)) at(height = (135(20)200)) gen(zweight)
          sum zw*
          That gives you 8 predicted values, 4 for different values of height and 4 for different values of weight.

          I'm not sure how complicated you want to make this, but the mtable command (part of spost13_ado) can make for much more readable output.

          Code:
           webuse nhanes2f, clear
           logit diabetes i.female i.black c.age c.age#c.age, nolog
           * Nice looking output
           mtable, at (black = (0 1) age = 20 ) at (black = (0 1) age = 47 ) at (black = (0 1) age = 74 ) dec(4) statistics(all)
           * even nicer looking output
           quietly mtable, at (black = 0 age = 20 ) rown(20 year old white) dec(4) statistics(all)
           quietly mtable, at (black = 1 age = 20 ) rown(20 year old black) dec(4) statistics(all) below
           quietly mtable, at (black = 0 age = 47 ) rown(47 year old white) dec(4) statistics(all) below
           quietly mtable, at (black = 1 age = 47 ) rown(47 year old black) dec(4) statistics(all) below
           quietly mtable, at (black = 0 age = 74 ) rown(74 year old white) dec(4) statistics(all) below
           mtable, at (black = 1 age = 74 ) rown(74 year old black) dec(4) below statistics(all)

          Here is the output produced by the nice and even nicer mtable commands:

          Code:
          Expression: Pr(diabetes), predict()
          
                     |    black       age     Pr(y)        se         z         p        ll        ul
           ----------+-------------------------------------------------------------------------------
                   1 |        0        20    0.0033    0.0010    3.4674    0.0005    0.0015    0.0052
                   2 |        1        20    0.0069    0.0021    3.3149    0.0009    0.0028    0.0109
                   3 |        0        47    0.0325    0.0028   11.7688    0.0000    0.0271    0.0380
                   4 |        1        47    0.0647    0.0081    7.9553    0.0000    0.0487    0.0806
                   5 |        0        74    0.1078    0.0101   10.6929    0.0000    0.0881    0.1276
                   6 |        1        74    0.1990    0.0237    8.3827    0.0000    0.1524    0.2455
          Code:
          Expression: Pr(diabetes), predict()
          
                              |    Pr(y)        se         z         p        ll        ul
           -------------------+-----------------------------------------------------------
            20 year old white |   0.0033    0.0010    3.4674    0.0005    0.0015    0.0052
            20 year old black |   0.0069    0.0021    3.3149    0.0009    0.0028    0.0109
            47 year old white |   0.0325    0.0028   11.7688    0.0000    0.0271    0.0380
            47 year old black |   0.0647    0.0081    7.9553    0.0000    0.0487    0.0806
            74 year old white |   0.1078    0.0101   10.6929    0.0000    0.0881    0.1276
            74 year old black |   0.1990    0.0237    8.3827    0.0000    0.1524    0.2455
          
          Specified values of covariates
          
                     |    black       age
           ----------+-------------------
               Set 1 |        0        20
               Set 2 |        1        20
               Set 3 |        0        47
               Set 4 |        1        47
               Set 5 |        0        74
             Current |        1        74
          For more, see

          https://www3.nd.edu/~rwilliam/stats3/Margins04.pdf

          https://www3.nd.edu/~rwilliam/stats3/Margins05.pdf

          In short, pretty much anything can be done. Doing it correctly may be a challenge though. Mike's suggestion may be tedious but it may be less error-prone than trying to do everything with a single mega command.
