Margins at the observational level

Maria-Ana Vitorino

Join Date: Jun 2014

Posts: 20
#1

07 Jul 2014, 17:00

Dear Statalist users,
Is it possible to save the margins that are calculated for each observation as an additional variable when using the margins command?

Here is an example of what I'm looking for. Let's say we have estimated the following probit model:
webuse margex, clear
probit outcome age distance

If we use the command
margins, eyex(age)
calculations are made at the observational level and are then averaged.

The number that is reported (4.325736) is the resulting average from the calculations done for each observation.
Is it possible to access the result of the calculations done for each observation and save those as another variable?
Thanks.
Richard Williams

Join Date: Apr 2014

Posts: 5008
#2

07 Jul 2014, 17:18

This is a weakness of margins, in my opinion. Slides 32 and 33 of

http://www3.nd.edu/~rwilliam/stats/Margins01.pdf

show how to do this for a categorical independent variable. The calculation is different for a continuous variable but I think it can be adapted. I've just done this for dydx, not eyex.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Richard Williams

Join Date: Apr 2014

Posts: 5008
#3

07 Jul 2014, 17:48

Here is how to do dydx for a continuous variable. I assume you can tweak this to do eyex.

Code:

webuse nhanes2f, clear
clonevar xage = age
sum xage
gen xdelta = r(sd)/1000
logit diabetes i.female xage
margins, dydx(xage)
predict xage1
replace xage = xage + xdelta
predict xage2
gen xme = (xage2 - xage1) / xdelta
sum xme

You'd have to tweak if there is missing data or sample restrictions or whatever. Compare the results of the margins command and the last sum command and make sure they are virtually identical.
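
One possible version of that tweak, as a minimal sketch: restrict every step to the estimation sample using e(sample), and store the shifted variable as a double so that adding a small delta is not lost to rounding. The names touse, p0 and p1 are illustrative and not from the original post.

```stata
* Sketch: finite-difference dydx restricted to the estimation sample.
* touse, p0, p1 are illustrative names, not part of the original code.
webuse nhanes2f, clear
gen double xage = age                 // double keeps the small delta from being rounded away
logit diabetes i.female xage
gen byte touse = e(sample)            // 1 if the observation was used in estimation
sum xage if touse
scalar xdelta = r(sd)/1000
margins, dydx(xage)
predict double p0 if touse
replace xage = xage + scalar(xdelta)
predict double p1 if touse
replace xage = age                    // restore the original values
gen double xme = (p1 - p0)/scalar(xdelta) if touse
sum xme
```

As with the code above, the mean of xme should be compared against the margins output before relying on it.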

I have adapted this from section 10.6.10 of http://www.stata.com/bookstore/microeconometrics-stata/ . Their files can be downloaded, and the mus10 files show a more general solution if you have a bunch of continuous vars you want to do this for. For categorical variables use the approach I showed in my earlier link.

Maria-Ana Vitorino

Join Date: Jun 2014

Posts: 20
#4

07 Jul 2014, 18:18

Thanks! So basically, these have to be calculated by hand by re-calculating the probabilities at new values (after some delta change) for the variable of interest. A weakness of margins, indeed!

Last edited by Maria-Ana Vitorino; 07 Jul 2014, 18:22.
Richard Williams

Join Date: Apr 2014

Posts: 5008
#5

07 Jul 2014, 18:35

Well, Stata will do the calculations for you so it isn't quite by hand!

I can't figure out how to do eyex in probit. The formula given in the Stata manual is eyex() = dydx * (x/y). I am guessing that means that

gen xeyex = xme * (xage/xage1)

i.e. you use the predicted value of P(Y = 1 | X), but I could be wrong. That seems to come pretty close though. Maybe it should be xage2. Again, check the margins results versus the sum results. If anyone knows for sure I would be curious to hear it.
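
For reference, the manual's formula is the usual chain-rule expression for an elasticity. A short sketch, assuming x > 0 and y > 0 so the logs exist, and writing y for the predicted probability (notation chosen for this note):

$$
\frac{\partial \ln y}{\partial \ln x}
  = \frac{\partial \ln y}{\partial y}\cdot\frac{\partial y}{\partial x}\cdot\frac{\partial x}{\partial \ln x}
  = \frac{\partial y}{\partial x}\cdot\frac{x}{y},
\qquad y = \Pr(Y = 1 \mid x).
$$

Read this way, x and y would both be evaluated at the same point, which suggests pairing the original value of the variable with the prediction made at that original value rather than with the shifted prediction. That reading is an inference from the formula, so it is still worth checking against the margins output.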

Richard Williams

Join Date: Apr 2014

Posts: 5008
#6

07 Jul 2014, 18:38

The manual says "As margins always does with response functions, calculations are made at the observational level and are then averaged." Given that it is doing the individual-level calculations I don't know why it won't let you save them. The dydx's give you averages but across individual cases there can be a lot of variability in the effect of x on a change in y.


Richard Williams

Join Date: Apr 2014
Posts: 5008
#7

07 Jul 2014, 18:46

Here is how it works in your example. I'm still not sure about the calculation of eyex but in this case it seems to come very close.

Code:

. webuse margex, clear
(Artificial data for margins)

. clonevar xage = age

. sum xage

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        xage |      3000      39.799    11.54174         20         60

. gen xdelta = r(sd)/1000

. probit outcome xage distance, nolog

Probit regression                                 Number of obs   =       3000
                                                  LR chi2(2)      =     594.51
                                                  Prob > chi2     =     0.0000
Log likelihood = -1068.8192                       Pseudo R2       =     0.2176

------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        xage |   .0650308   .0032464    20.03   0.000     .0586679    .0713937
    distance |  -.0038913   .0013313    -2.92   0.003    -.0065007    -.001282
       _cons |  -3.702959   .1501843   -24.66   0.000    -3.997315   -3.408603
------------------------------------------------------------------------------

. margins, dydx(xage)

Average marginal effects                          Number of obs   =       3000
Model VCE    : OIM

Expression   : Pr(outcome), predict()
dy/dx w.r.t. : xage

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        xage |   .0129155   .0005348    24.15   0.000     .0118673    .0139637
------------------------------------------------------------------------------

. margins, eyex(xage)

Average marginal effects                          Number of obs   =       3000
Model VCE    : OIM

Expression   : Pr(outcome), predict()
ey/ex w.r.t. : xage

------------------------------------------------------------------------------
             |            Delta-method
             |      ey/ex   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        xage |   4.325736   .3165899    13.66   0.000     3.705231    4.946241
------------------------------------------------------------------------------

. predict xage1
(option pr assumed; Pr(outcome))

. replace xage = xage + xdelta
(3000 real changes made)

. predict xage2
(option pr assumed; Pr(outcome))

. gen xme = (xage2 - xage1) / xdelta

. gen xeyex = xme * (xage/xage1)

. sum xme xeyex

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         xme |      3000    .0129204    .0090861   2.27e-08   .0259479
       xeyex |      3000     4.33014    1.679459   2.655209   11.87233



Richard Williams

Join Date: Apr 2014
Posts: 5008
#8

07 Jul 2014, 22:35

I found a formula for dydx for probit. This seems to work perfectly (for probit only):

Code:

* Probit specific
webuse margex, clear
probit outcome age distance, nolog
margins, dydx(age)
margins, eyex(age)
margins, eydx(age)
margins, dyex(age)
predict agepred
gen xdydx = normalden(invnorm(agepred)) * _b[age]
gen xeyex = xdydx * (age/agepred)
gen xeydx = xdydx * (1/agepred)
gen xdyex = xdydx * age
sum xdydx xeyex xeydx xdyex
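
A small variation on the probit code, offered as a sketch: invnorm(agepred) simply recovers the linear index, because the predicted probability is normal(xb). Asking predict for the index directly avoids the round trip. The names xbhat and xdydx2 are illustrative.

```stata
* Sketch: probit dydx from the linear prediction instead of invnorm(Pr).
* xbhat and xdydx2 are illustrative names.
webuse margex, clear
probit outcome age distance, nolog
margins, dydx(age)
predict double xbhat, xb                    // linear prediction x'b
gen double xdydx2 = normalden(xbhat) * _b[age]
sum xdydx2
```

The mean of xdydx2 should agree with the margins, dydx(age) result, up to display precision.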

Likewise this works perfectly for logit:

Code:

* Logit specific - Works perfectly
webuse margex, clear
logit outcome age distance, nolog
margins, dydx(age)
margins, eyex(age)
margins, eydx(age)
margins, dyex(age)
predict agepred
gen xdydx = agepred * (1 - agepred) * _b[age]
gen xeyex = xdydx * (age/agepred)
gen xeydx = xdydx * (1/agepred)
gen xdyex = xdydx * age
sum xdydx xeyex xeydx xdyex

At least, they work perfectly in these simple models. If you toss in interaction or squared terms it may get more complicated.
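
Both closed forms are instances of one chain-rule result. A sketch, assuming the index is linear in x with no interaction or polynomial terms (as in the simple models above); F, f, \eta and \beta_x are notation introduced for this note:

$$
p = F(\eta),\quad \eta = \mathbf{x}'\boldsymbol\beta,
\qquad
\frac{\partial p}{\partial x} = f(\eta)\,\frac{\partial \eta}{\partial x} = f(\eta)\,\beta_x,
$$

$$
\text{probit: } f(\eta) = \phi(\eta) = \phi\!\left(\Phi^{-1}(p)\right),
\qquad
\text{logit: } f(\eta) = p\,(1-p).
$$

With an interaction or squared term, \partial\eta/\partial x is no longer the single coefficient \beta_x, which is why the one-line formulas would need adjusting in those models.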

Here is the more general code. It is not quite as precise, but it will work when you don't have an exact formula for dydx. I think it would also be good if, say, you had interaction terms involved. I fixed a small error from earlier.

Code:

webuse margex, clear
clonevar xage = age
sum xage
gen xdelta = r(sd)/1000
probit outcome xage distance, nolog
margins, dydx(xage)
margins, eyex(xage)
margins, eydx(xage)
margins, dyex(xage)
predict xagepred1
replace xage = xage + xdelta
predict xagepred2
replace xage = age
gen xdydx = (xagepred2 - xagepred1) / xdelta
gen xeyex = xdydx * (xage/xagepred1)
gen xeydx = xdydx * (1/xagepred1)
gen xdyex = xdydx * xage
sum xdydx xeyex xeydx xdyex
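
If more precision is wanted from the numerical approach, one option is a central difference, which shifts the variable both up and down. This is a different technique from the one-sided difference in the code above, shown only as a sketch; the names h, p_up, p_dn and xdydx_c are illustrative.

```stata
* Sketch: central-difference approximation to dydx (not the one-sided
* difference used above). h, p_up, p_dn, xdydx_c are illustrative names.
webuse margex, clear
gen double xage = age                 // double so the small step is not rounded away
sum xage
scalar h = r(sd)/1000
probit outcome xage distance, nolog
margins, dydx(xage)
replace xage = age + scalar(h)
predict double p_up                   // Pr(outcome) at age + h
replace xage = age - scalar(h)
predict double p_dn                   // Pr(outcome) at age - h
replace xage = age                    // restore the original values
gen double xdydx_c = (p_up - p_dn) / (2*scalar(h))
sum xdydx_c
```

The elasticity variants can then be built from xdydx_c and a prediction at the original values, exactly as in the code above.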

Last edited by Richard Williams; 07 Jul 2014, 23:06.


Mark Schaffer

Join Date: Mar 2014

Posts: 324
#9

08 Jul 2014, 04:20

My nomination for "Thread of the month"! --Mark
Maria-Ana Vitorino

Join Date: Jun 2014

Posts: 20
#10

09 Jul 2014, 00:41

Hi Richard, This is great, thanks! It's a pity that one cannot do this with margins.
Right now, margins calculates the derivatives at the observational level and then averages them. But there is literature (e.g. Hensher, Rose and Greene, 2007) advocating that the elasticities should not simply be averaged across observations (which is called "naive pooling") but rather should be weighted by each observation's predicted probability ("probability weighted sample enumeration").
Given this and that margins does not have an option for generating such "weighted elasticities" automatically (which would be great!), it would be good to at least be able to save the elasticities at the observational level so that the users could then weigh them as they please.
For the logit model this is not so much of an issue given the closed form expression of many of the formulas. But for the probit it is a bit more cumbersome. Your code gets around this. To calculate the weighted elasticities then one only needs to add the following lines to your code:
egen xeyexW=wtmean(xeyex), weight(xagepred1)
sum xeyexW
Richard Williams

Join Date: Apr 2014

Posts: 5008
#11

09 Jul 2014, 06:59

Glad something useful came out of it! When I ran your code, I got the message "unknown egen function wtmean()". So, I did -findit wtmean-, and found '_GWTMEAN' at SSC, a 2001 program by David Kantor. Was that the right thing to do or should I have found wtmean elsewhere? Here is what I got with the Probit specific code above, does it match what you got?

Code:

. sum xeyexW

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      xeyexW |      3000      3.5793           0     3.5793     3.5793
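
A probability-weighted mean can also be obtained without the user-written egen function, since summarize accepts analytic weights. The following is a sketch using the probit-specific formula from earlier in the thread; whether it reproduces the wtmean() figure exactly has not been verified here.

```stata
* Sketch: probability-weighted mean elasticity using built-in commands only.
webuse margex, clear
probit outcome age distance, nolog
predict double agepred
gen double xeyex = normalden(invnormal(agepred)) * _b[age] * (age/agepred)
sum xeyex [aweight = agepred]         // weighted mean = sum(w*x)/sum(w)
display r(mean)
```

Unlike egen, this leaves the result in r(mean) rather than in a new variable, which is usually enough when only the single weighted figure is needed.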

Richard Williams

Join Date: Apr 2014

Posts: 5008
#12

09 Jul 2014, 07:11

I have a related Q of my own. I never actually use most of this stuff -- the main thing I use is dydx for categorical variables. The formulas I used are partly from p. 1168 of r.pdf. They include

eyex() = dy/dx * x/y
eydx() = dy/dx * 1/y

What happens when y is zero or, perhaps even worse, very very very close to 0? That may not have happened in the current example but it seems like it could happen in other examples or with other methods where y itself, rather than a function of y, is being used. I wonder if there is some other variation of these formulas that would work better.
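
For the simple logit model at least, the division by y can be cancelled algebraically. A sketch, assuming a logit with no interaction or polynomial terms in x, with p the predicted probability and \beta_x the coefficient on x (notation for this note):

$$
\text{eydx} = \frac{p(1-p)\,\beta_x}{p} = (1-p)\,\beta_x,
\qquad
\text{eyex} = (1-p)\,\beta_x\,x .
$$

In that case neither quantity requires dividing by a probability that may be close to zero. For probit the corresponding ratio is \phi(\eta)/\Phi(\eta) times \beta_x, with \eta the linear index, which does not cancel in the same way.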

Rieza Soelaeman

Join Date: Dec 2015

Posts: 5
#13

14 Apr 2016, 10:07

Can this code be used for multinomial probit as well, or would modifications need to be made to the code to account for which outcome is being predicted?