  • Margins at the observational level

    Dear Statalist users,
    Is it possible to save the margins that are calculated for each observation as an additional variable when using the margins command?

    Here is an example of what I'm looking for. Let's say we have estimated the following probit model
    webuse margex, clear
    probit outcome age distance

    If we use the command
    margins, eyex(age)
    calculations are made at the observational level and are then averaged.

    The number that is reported (4.325736) is the resulting average from the calculations done for each observation.
    Is it possible to access the result of the calculations done for each observation and save those as another variable?
    Thanks.


  • #2
    This is a weakness of margins, in my opinion. Slides 32 and 33 of

    http://www3.nd.edu/~rwilliam/stats/Margins01.pdf

    show how to do this for a categorical independent variable. The calculation is different for a continuous variable, but I think it can be adapted. I've just done this for dydx, not eyex.
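
    For readers who want the idea without opening the slides, here is a minimal sketch of the categorical case. It is an illustration only and not necessarily the code in the slides: the dataset, model, and the variable names female_orig, p0, p1 and me_female are assumptions chosen for the example.

    ```stata
    * Sketch only: observation-level discrete change for a binary regressor
    webuse nhanes2f, clear
    logit diabetes i.female age, nolog
    margins, dydx(female)

    * keep the observed values so they can be restored afterwards
    clonevar female_orig = female

    * predicted probability with everyone set to female = 0, then female = 1
    replace female = 0
    predict double p0 if e(sample)
    replace female = 1
    predict double p1 if e(sample)
    replace female = female_orig

    * per-observation effect; its mean should be close to the margins result
    generate double me_female = p1 - p0
    summarize me_female
    ```

    The variable me_female then holds one effect per observation in the estimation sample, which is the quantity margins averages but does not save.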
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Here is how to do dydx for a continuous variable. I assume you can tweak this to do eyex.

      Code:
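      webuse nhanes2f, clear
      * work on a copy of age so the original values stay available
      clonevar xage = age
      sum xage
      * small step for the numerical derivative
      gen xdelta = r(sd)/1000
      logit diabetes i.female xage
      margins, dydx(xage)
      * predicted probability at the observed values
      predict xage1
      * predicted probability after nudging xage up by xdelta
      replace xage = xage + xdelta
      predict xage2
      * observation-level marginal effect (forward difference)
      gen xme = (xage2 - xage1) / xdelta
      sum xme

      You'd have to tweak this if there are missing data or sample restrictions. Compare the results of the margins command and the last sum command and make sure they are virtually identical.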

      I have adapted this from section 10.6.10 of http://www.stata.com/bookstore/microeconometrics-stata/ . Their files can be downloaded, and the mus10 files show a more general solution if you have a bunch of continuous vars you want to do this for. For categorical variables use the approach I showed in my earlier link.
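
      As one possible way to handle the missing-data caveat above, the sketch below restricts every step to the estimation sample with e(sample) and stores intermediate results in double precision, which may reduce rounding error when differencing with a very small step. The names p_base and p_shift and the use of a scalar for the step are assumptions made for this illustration.

      ```stata
      * Sketch only: numerical dydx restricted to the estimation sample
      webuse nhanes2f, clear
      generate double xage = age
      logit diabetes i.female xage, nolog
      margins, dydx(xage)

      * step size computed from the estimation sample only
      summarize xage if e(sample)
      scalar xdelta = r(sd)/1000

      * predictions before and after the small step
      predict double p_base if e(sample)
      replace xage = xage + scalar(xdelta)
      predict double p_shift if e(sample)

      * restore the original values before doing anything else with xage
      replace xage = age

      generate double xme = (p_shift - p_base)/scalar(xdelta)
      summarize xme
      ```

      Observations that were dropped from the model get a missing xme, so the final summarize averages over the same cases that margins uses.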
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology


      • #4
        Thanks! So basically, these have to be calculated by hand by re-calculating the probabilities at new values (after some delta change) for the variable of interest. A weakness of margins, indeed!
        Last edited by Maria-Ana Vitorino; 07 Jul 2014, 18:22.



        • #5
          Well, Stata will do the calculations for you so it isn't quite by hand!

          I can't figure out how to do eyex in probit. The formula given in the Stata manual is eyex() = dydx * (x/y). I am guessing that means that

          gen xeyex = xme * (xage/xage1)

          i.e. you use the predicted value of P(Y = 1 | X), but I could be wrong. That seems to come pretty close, though. Maybe it should be xage2. Again, check the margins results versus the sum results. If anyone knows for sure, I would be curious to hear it.
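
          For reference, the identity behind that guess can be written out. This is a sketch that assumes y denotes the predicted probability evaluated at the observed x, which is consistent with the manual formula quoted above:

          ```latex
          \[
          \texttt{eyex}
          \;=\; \frac{\partial \ln y}{\partial \ln x}
          \;=\; \frac{\partial y}{\partial x}\cdot\frac{x}{y},
          \qquad y = \Pr(Y = 1 \mid X).
          \]
          ```

          Under that reading, both x and y are taken at the original values, so xage1 would be the natural denominator and xage should be the value before xdelta was added.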
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology


          • #6
            The manual says "As margins always does with response functions, calculations are made at the observational level and are then averaged." Given that it is doing the individual-level calculations, I don't know why it won't let you save them. The dydx's give you averages, but across individual cases there can be a lot of variability in the effect of x on y.
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology


            • #7
              Here is how it works in your example. I'm still not sure about the calculation of eyex but in this case it seems to come very close.
              Code:
              . webuse margex, clear
              (Artificial data for margins)
              
              . clonevar xage = age
              
              . sum xage
              
                  Variable |       Obs        Mean    Std. Dev.       Min        Max
              -------------+--------------------------------------------------------
                      xage |      3000      39.799    11.54174         20         60
              
              . gen xdelta = r(sd)/1000
              
              . probit outcome xage distance, nolog
              
              Probit regression                                 Number of obs   =       3000
                                                                LR chi2(2)      =     594.51
                                                                Prob > chi2     =     0.0000
              Log likelihood = -1068.8192                       Pseudo R2       =     0.2176
              
              ------------------------------------------------------------------------------
                   outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                      xage |   .0650308   .0032464    20.03   0.000     .0586679    .0713937
                  distance |  -.0038913   .0013313    -2.92   0.003    -.0065007    -.001282
                     _cons |  -3.702959   .1501843   -24.66   0.000    -3.997315   -3.408603
              ------------------------------------------------------------------------------
              
              . margins, dydx(xage)
              
              Average marginal effects                          Number of obs   =       3000
              Model VCE    : OIM
              
              Expression   : Pr(outcome), predict()
              dy/dx w.r.t. : xage
              
              ------------------------------------------------------------------------------
                           |            Delta-method
                           |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                      xage |   .0129155   .0005348    24.15   0.000     .0118673    .0139637
              ------------------------------------------------------------------------------
              
              . margins, eyex(xage)
              
              Average marginal effects                          Number of obs   =       3000
              Model VCE    : OIM
              
              Expression   : Pr(outcome), predict()
              ey/ex w.r.t. : xage
              
              ------------------------------------------------------------------------------
                           |            Delta-method
                           |      ey/ex   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                      xage |   4.325736   .3165899    13.66   0.000     3.705231    4.946241
              ------------------------------------------------------------------------------
              
              . predict xage1
              (option pr assumed; Pr(outcome))
              
              . replace xage = xage + xdelta
              (3000 real changes made)
              
              . predict xage2
              (option pr assumed; Pr(outcome))
              
              . gen xme = (xage2 - xage1) / xdelta
              
              . gen xeyex = xme * (xage/xage1)
              
              . sum xme xeyex
              
                  Variable |       Obs        Mean    Std. Dev.       Min        Max
              -------------+--------------------------------------------------------
                       xme |      3000    .0129204    .0090861   2.27e-08   .0259479
                     xeyex |      3000     4.33014    1.679459   2.655209   11.87233
              
              . 
              end of do-file
              
              .
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology


              • #8
                I found a formula for dydx for probit. This seems to work perfectly (for probit only):

                Code:
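                * Probit specific
                webuse margex, clear
                probit outcome age distance, nolog
                * averages reported by margins, for comparison
                margins, dydx(age)
                margins, eyex(age)
                margins, eydx(age)
                margins, dyex(age)
                * predicted probability Pr(outcome) for each observation
                predict agepred
                * probit: dydx = normal density at the linear prediction, times the coefficient
                gen xdydx = normalden(invnormal(agepred)) * _b[age]
                * elasticities and semi-elasticities derived from dydx
                gen xeyex = xdydx * (age/agepred)
                gen xeydx = xdydx * (1/agepred)
                gen xdyex = xdydx * age
                sum xdydx xeyex xeydx xdyex

                Likewise this works perfectly for logit: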

                Code:
                * Logit specific - Works perfectly
                webuse margex, clear
                logit outcome age distance, nolog
                * averages reported by margins, for comparison
                margins, dydx(age)
                margins, eyex(age)
                margins, eydx(age)
                margins, dyex(age)
                * predicted probability Pr(outcome) for each observation
                predict agepred
                * logit: dydx = p * (1 - p) * coefficient
                gen xdydx = agepred * (1 - agepred) * _b[age]
                * elasticities and semi-elasticities derived from dydx
                gen xeyex = xdydx * (age/agepred)
                gen xeydx = xdydx * (1/agepred)
                gen xdyex = xdydx * age
                sum xdydx xeyex xeydx xdyex

                At least, they work perfectly in these simple models. If you toss in interaction or squared terms it may get more complicated.
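
                To illustrate how a squared term changes the closed-form calculation, here is a hedged sketch for a probit with age entered as c.age##c.age. The specification is an assumption added for illustration and is not part of the models above. By the chain rule, the derivative of the linear prediction with respect to age becomes _b[age] + 2*_b[c.age#c.age]*age, and that replaces the single coefficient in the formula.

                ```stata
                * Sketch only: analytic dydx for a probit with a squared term
                webuse margex, clear
                probit outcome c.age##c.age distance, nolog
                margins, dydx(age)

                * linear prediction xb for each observation
                predict double xb, xb

                * chain rule: density at xb times d(xb)/d(age)
                generate double xdydx = normalden(xb) * (_b[age] + 2*_b[c.age#c.age]*age)
                summarize xdydx
                ```

                The mean of xdydx should be close to the margins result, because the factor-variable notation tells margins that the two age terms move together.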

                Here is the more general code. It is not quite as precise, but it will work when you don't have an exact formula for dydx. I think it would also be good if, say, you had interaction terms involved. I fixed a small error from earlier.

                Code:
                webuse margex, clear
                * work on a copy of age so the original can be restored
                clonevar xage = age
                sum xage
                * small step for the numerical derivative
                gen xdelta = r(sd)/1000
                probit outcome xage distance, nolog
                margins, dydx(xage)
                margins, eyex(xage)
                margins, eydx(xage)
                margins, dyex(xage)
                * predictions at the observed values and after the small step
                predict xagepred1
                replace xage = xage + xdelta
                predict xagepred2
                * restore xage before using it in the elasticity formulas
                replace xage = age
                gen xdydx = (xagepred2 - xagepred1) / xdelta
                gen xeyex = xdydx * (xage/xagepred1)
                gen xeydx = xdydx * (1/xagepred1)
                gen xdyex = xdydx * xage
                sum xdydx xeyex xeydx xdyex
                Last edited by Richard Williams; 07 Jul 2014, 23:06.
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology


                • #9
                  My nomination for "Thread of the month"! --Mark



                  • #10
                    Hi Richard, This is great, thanks! It's a pity that one cannot do this with margins.
                    Right now, margins calculates the derivatives at the observational level and then averages them. But there is literature (e.g. Hensher, Rose and Greene, 2007) that advocates that the elasticities should not simply be averaged across observations (which is called "naive pooling") but rather should be weighted by each observation's predicted probability ("probability weighted sample enumeration").
                    Given this and that margins does not have an option for generating such "weighted elasticities" automatically (which would be great!), it would be good to at least be able to save the elasticities at the observational level so that the users could then weigh them as they please.
                    For the logit model this is not so much of an issue given the closed form expression of many of the formulas. But for the probit it is a bit more cumbersome. Your code gets around this. To calculate the weighted elasticities then one only needs to add the following lines to your code:
                    * wtmean() is a user-written egen function (_gwtmean on SSC), not part of official Stata
                    egen xeyexW = wtmean(xeyex), weight(xagepred1)
                    sum xeyexW



                    • #11
                      Glad something useful came out of it! When I ran your code, I got the message "unknown egen function wtmean()". So, I did -findit wtmean-, and found '_GWTMEAN' at SSC, a 2001 program by David Kantor. Was that the right thing to do or should I have found wtmean elsewhere? Here is what I got with the Probit specific code above, does it match what you got?

                      Code:
                      . sum xeyexW
                      
                          Variable |       Obs        Mean    Std. Dev.       Min        Max
                      -------------+--------------------------------------------------------
                            xeyexW |      3000      3.5793           0     3.5793     3.5793
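
                      For anyone who prefers not to install a user-written egen function, a probability-weighted mean can also be obtained with built-in analytic weights. The sketch below reuses the probit-specific formulas from earlier in the thread. Treating aweights as equivalent to wtmean() here is an assumption based on both computing sum(w*x)/sum(w).

                      ```stata
                      * Sketch only: probability-weighted elasticity with built-in commands
                      webuse margex, clear
                      probit outcome age distance, nolog
                      predict double agepred
                      generate double xdydx = normalden(invnormal(agepred)) * _b[age]
                      generate double xeyex = xdydx * (age/agepred)

                      * unweighted mean ("naive pooling")
                      summarize xeyex

                      * mean weighted by each observation's predicted probability
                      summarize xeyex [aweight = agepred]
                      display "probability-weighted elasticity = " r(mean)
                      ```

                      If the same weights are used, the displayed value should agree with the egen result reported above.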
                      -------------------------------------------
                      Richard Williams, Notre Dame Dept of Sociology


                      • #12
                        I have a related question of my own. I never actually use most of this stuff; the main thing I use is dydx for categorical variables. The formulas I used are partly from p. 1168 of r.pdf. They include

                        eyex() = dy/dx * x/y
                        eydx() = dy/dx * 1/y

                        What happens when y is zero or, perhaps even worse, very very very close to 0? That may not have happened in the current example but it seems like it could happen in other examples or with other methods where y itself, rather than a function of y, is being used. I wonder if there is some other variation of these formulas that would work better.
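
                        One possible variation for the probit case, offered as a sketch rather than a definitive answer: since eydx equals the normal density divided by the normal CDF, times the coefficient, the ratio can be formed on the log scale from the linear prediction, which avoids dividing by a predicted probability that has been rounded toward zero. The variable names xb and ratio are assumptions made for the illustration.

                        ```stata
                        * Sketch only: probit eydx and eyex computed on the log scale
                        webuse margex, clear
                        probit outcome age distance, nolog
                        margins, eydx(age)
                        margins, eyex(age)

                        * linear prediction for each observation
                        predict double xb, xb

                        * phi(xb)/Phi(xb) computed as exp(ln phi - ln Phi)
                        generate double ratio = exp(lnnormalden(xb) - lnnormal(xb))

                        generate double xeydx = ratio * _b[age]
                        generate double xeyex = xeydx * age
                        summarize xeydx xeyex
                        ```

                        This only addresses the numerical side. When the predicted probability really is near zero, the elasticity itself is large, so the averages can still be dominated by a few observations.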

                        -------------------------------------------
                        Richard Williams, Notre Dame Dept of Sociology


                        • #13
                          Can this code be used for multinomial probit as well, or would modifications need to be made to the code to account for which outcome is being predicted?

