Delta Method Standard Errors for average marginal effects using margins command

Andrew Tomarken

Join Date: Jul 2014

Posts: 2
#1

Delta Method Standard Errors for average marginal effects using margins command

24 Jul 2014, 19:17

My questions concerns how precisely standard errors for average marginal effects are computed by the stata margins command using the delta method. I have searched the documentation for specifics to no avail so I am turning to the list.

My data-analytic problem involves a logistic regression analysis (analyzed by the logit command) with two predictors: group (dichotomous factor variable) and covar (continuous covariate). My goal is to estimate the average marginal effects (AMEs) for each group (i.e., the average of the probabilities for the individual S's in each group) using the margins command. I am familiar with the delta method and have no trouble understanding or reproducing the estimates and standard errors in the stata output when attempting to estimate marginal effects for each group at the mean of the covariate (i.e. MEMs). One just uses the standard multivariate delta formula (transpose gradient times VC matrix times gradient) plugging in the mean for covar in the gradient component. However, I don't understand how the delta method standard error is generated when computing the average marginal effect across subjects. The computation of the AME estimate itself is not an issue -- it's the standard error computation that confuses me. My understanding is that the delta method would normally require entry of a specific value for covariate but I can't figure out what precisely that value would be when computing an AME (as opposed to a MEM). One might suspect that it's again the mean of covar and that results would not be identical to the MEM computation because of differing predicted probabilities in the MEM and AME cases. However, I've tried this and can't reproduce the standard error estimate for the AME. I've tried various approaches here to reproduce the stata output (e.g., computing the variances and standard errors for specific cases and averaging) but have not succeeded. Thanks for any help you might offer.

Andy Tomarken
Tags: None

Richard Williams

Join Date: Apr 2014
Posts: 5008

24 Jul 2014, 20:41

I don't know if this will help get you started or not. It clones how Stata produces the AMEs. How you go from there to the standard errors, I don't know.

Code:

version 11.1
set more off
webuse nhanes2f, clear
keep if !missing(diabetes, black, female, age, age2, agegrp)
label variable age2 "age squared"
* Compute the variables we will need
tab1 agegrp, gen(agegrp)
gen femage = female*age
label variable femage "female * age interaction"
label define black 0 "nonBlack" 1 "black"
label define female 0 "male" 1 "female"
label values female female
label values black black
logit diabetes i.black i.female age, nolog
* Average Adjusted Predictions (AAPs)
margins black female
* Average Marginal Effects (AMEs)
margins, dydx(black female)
* Replicate AME for black without using margins
clonevar xblack = black
quietly logit diabetes i.xblack i.female age, nolog
replace xblack = 0
predict adjpredwhite
replace xblack = 1
predict adjpredblack
gen meblack = adjpredblack - adjpredwhite
sum adjpredwhite adjpredblack meblack

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam

Comment

Andrew Tomarken

Join Date: Jul 2014

Posts: 2
#3

25 Jul 2014, 08:39

Dear Richard:
Thanks very much. Actually I believe I've seen that code in one of your handouts or papers that's on the web! I've found your work very helpful. The computation of the AME's is not my problem, unfortunately, I know what stata is doing there. It's the standard errors that have me confused when one of the predictors is a continuous covariate and you are interested in the AME for a dichotomous grouping variable. The key question for me is what specific value of the covariate is plugged into the derivative parts of the delta formula. When I've computed AME's in the past I've gone different routes for standard errors (either bootstrap, simulation from a multivariate normal model, or Bayesian which is my preferred approach). I'm just wondering how the delta method can be instantiated here.

Thanks very much again.

Andy Tomarken
Comment

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 701

25 Jul 2014, 15:52

Aside: This post is long. Please be aware that some code blocks are scrollable
as there is more content than is being shown.

Aside: In the following I use the nofvlabel option so the output lines
up with the expressions I employ. This is just a display option that is
common to margins and estimation commands. This option was
introduced in Stata 13 where we now show the value labels for factor
variables by default.

Here is an example similar to the one described by Andrew.

Code:

. webuse margex
(Artificial data for margins)

. logit outcome i.treatment distance, nofvlabel

Iteration 0:   log likelihood = -1366.0718  
Iteration 1:   log likelihood = -1257.5623  
Iteration 2:   log likelihood = -1244.2136  
Iteration 3:   log likelihood = -1242.8796  
Iteration 4:   log likelihood = -1242.8245  
Iteration 5:   log likelihood = -1242.8245  

Logistic regression                               Number of obs   =       3000
                                                  LR chi2(2)      =     246.49
                                                  Prob > chi2     =     0.0000
Log likelihood = -1242.8245                       Pseudo R2       =     0.0902

------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 1.treatment |    1.42776    .113082    12.63   0.000     1.206124    1.649397
    distance |  -.0047747   .0011051    -4.32   0.000    -.0069406   -.0026088
       _cons |  -2.337762   .0962406   -24.29   0.000     -2.52639   -2.149134
------------------------------------------------------------------------------

Andrew wants to know how margins computes standard errors of average
marginal effects (AMEs). The remaining discussion has two parts. The first
describes how to compute AMEs and their SE estimates for factor variables, the
second is for continuous variables.

Factor variables

Since the average marginal effect of a 2-level factor variable is just a
difference between the two predictive margins, I will start with computing the
SEs for predictive margins. Here we use margins to compute
the predictive margins for the 2 levels of treatment:

Code:

. margins treatment, nofvlabel

Predictive margins                                Number of obs   =       3000
Model VCE    : OIM

Expression   : Pr(outcome), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   treatment |
          0  |   .0791146   .0069456    11.39   0.000     .0655016    .0927277
          1  |   .2600204   .0111772    23.26   0.000     .2381135    .2819272
------------------------------------------------------------------------------

I will make copies of two matrices from the margins saved results to
compare with later. r(Jacobian) is the Jacobian matrix, we will see
what this is shortly. r(V) is the estimated variance matrix that
corresponds with the reported predictive margins. The square root of the
diagonal elements are reported in the above column labeled
"Delta-method Std. Err.".

Code:

. matrix rJ = r(Jacobian)

. matrix rV = r(V)

. di sqrt(rV[1,1])
.00694557

. di sqrt(rV[2,2])
.01117718

The delta method allows us to obtain the appropriate standard errors of any
smooth function of the fitted model parameters. It basically involves
applying a Jacobian matrix to the estimated variance matrix of the fitted
model parameters. The Jacobian is a matrix of partial derivatives of the
statistics of interest with respect to each of the fitted model parameters.

Before we start taking derivatives, I want to show how the predictive margins
are essentially computed. This will provide us with the formulas from which
we will work out the derivatives that go into the Jacobian matrix. In the
following, p0 is the variable used to recreate the predictive margin
for 0.treatment, and p1 corresponds to 1.treatment. The
means of these variables are the predictive margins.

Code:

. gen p0 = invlogit(_b[0b.treatment] + _b[distance]*distance + _b[_cons])

. gen p1 = invlogit(_b[1.treatment] + _b[distance]*distance + _b[_cons])

. sum p0 p1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          p0 |      3000    .0791146    .0212555   .0026954   .0879477
          p1 |      3000    .2600204    .0688961    .011143   .2867554

Now we will make the calculations necessary to reproduce the Jacobian matrix,
which we will then use to reproduce the estimated variance matrix. The rows
identify what we are taking the partial derivative of; the columns identify
what we are taking the partial derivative with respect to. Thus the rows
correspond with the predictive margins and the columns correspond with the
fitted model parameters in the logistic regression.

Code:

. gen dp0dxb = p0*(1-p0)

. gen dp1dxb = p1*(1-p1)

. gen zero = 0

. gen one = 1

. matrix vecaccum J0 = dp0dxb zero zero distance

. matrix vecaccum J1 = dp1dxb zero one  distance

. matrix Jac = J0/e(N) \ J1/e(N)

. matrix V = Jac*e(V)*Jac'

. matrix list Jac

Jac[2,4]
             zero       zero   distance      _cons
dp0dxb          0          0  .74390659  .07240388
dp1dxb          0  .18766468  2.1907626  .18766468

. matrix list rJ

rJ[2,4]
               outcome:   outcome:   outcome:   outcome:
                    0b.         1.                      
             treatment  treatment   distance      _cons
0.treatment          0          0  .74390659  .07240388
1.treatment          0  .18766468  2.1907626  .18766468

. matrix list V

symmetric V[2,2]
            dp0dxb      dp1dxb
dp0dxb   .00004824
dp1dxb  -1.181e-07   .00012493

. matrix list rV

symmetric rV[2,2]
                      0.          1.
              treatment   treatment
0.treatment   .00004824
1.treatment  -1.181e-07   .00012493

Now let's compute the marginal effect of treatment. margins
keeps a placeholder for the base outcome of treatment; the Jacobian
and variance elements are merely set to zero.

Code:

. margins, dydx(treatment) nofvlabel

Average marginal effects                          Number of obs   =       3000
Model VCE    : OIM

Expression   : Pr(outcome), predict()
dy/dx w.r.t. : 1.treatment

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 1.treatment |   .1809057   .0131684    13.74   0.000     .1550961    .2067153
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. matrix list r(Jacobian)

r(Jacobian)[2,4]
                outcome:   outcome:   outcome:   outcome:
                     0b.         1.                      
              treatment  treatment   distance      _cons
0b.treatment          0          0          0          0
 1.treatment          0  .18766468   1.446856  .11526081

. matrix list r(V)

symmetric r(V)[2,2]
                     0b.         1.
              treatment  treatment
0b.treatment          0
 1.treatment          0  .00017341

The marginal effect of treatment is just the difference between the
predictive margins. Here we take advantage of the fact that the difference of
the means is the mean of the differences.

Code:

. gen pdiff = p1 - p0

. sum pdiff

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       pdiff |      3000    .1809057    .0476499   .0084476   .1988077

The Jacobian is similarly composed from the previous calculations. We do so
ignoring the base level of treatment. Notice that the elements of
Jdiff match those of the second row of r(Jacobian) above.

Code:

. matrix Jdiff = (-1, 1)*Jac

. matrix list Jdiff

Jdiff[1,4]
         zero       zero   distance      _cons
r1          0  .18766468   1.446856  .11526081

. matrix Vdiff = Jdiff*e(V)*Jdiff'

. matrix list Vdiff

symmetric Vdiff[1,1]
           r1
r1  .00017341

Continuous variables

The marginal effect of a continuous variable is essentially computed the same
way, but there are more derivatives involved. Here we have margins
compute the average marginal effect of distance and save off copies of
r(Jacobian) and r(V).

Code:

. margins, dydx(distance)

Average marginal effects                          Number of obs   =       3000
Model VCE    : OIM

Expression   : Pr(outcome), predict()
dy/dx w.r.t. : distance

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    distance |  -.0006228   .0001444    -4.31   0.000    -.0009058   -.0003399
------------------------------------------------------------------------------

. matrix rJ = r(Jacobian)

. matrix rV = r(V)

. di sqrt(rV[1,1])
.00014436

The average marginal effect of distance is the average of the
derivative of the prediction with respect to distance.

Code:

. predict p, pr

. gen dpdxb = p*(1-p)

. gen dpdx = dpdxb*_b[distance]

. sum dpdx

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        dpdx |      3000   -.0006228    .0003231  -.0009766  -.0000128

For the Jacobian, we must compute the partials of dpdx with respect to
the model parameters. The partial with respect to the coefficient for distance
has an extra piece.

Code:

. gen d2pdxb2 = p*(1-p)*(1-p) - p*p*(1-p)

. matrix vecaccum Jac = d2pdxb2 0b.treatment 1.treatment distance

. matrix Jac = Jac*_b[distance]/e(N)

. sum dpdxb

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       dpdxb |      3000    .1304451    .0676672   .0026901   .2045267

. matrix Jac[1,3] = Jac[1,3] + r(mean)

. matrix V = Jac*e(V)*Jac'

. matrix list Jac

Jac[1,4]
                 0b.          1.                        
          treatment   treatment    distance       _cons
d2pdxb2           0  -.00020228    .1256217  -.00034567

. matrix list rJ

rJ[1,4]
             outcome:    outcome:    outcome:    outcome:
                  0b.          1.                        
           treatment   treatment    distance       _cons
distance           0  -.00020228    .1256217  -.00034567

. matrix list V

symmetric V[1,1]
           d2pdxb2
d2pdxb2  2.084e-08

. matrix list rV

symmetric rV[1,1]
           distance
distance  2.084e-08

In summary we have shown how to compute discrete and continuous marginal
effects along with their corresponding standard error estimates using the
delta method. This is essentially what margins does in all cases,
except that is uses numerical derivatives for all but the linear prediction.

Comment

Inna Petrunyk

Join Date: Oct 2014

Posts: 31
#5

11 Oct 2014, 11:19

Dear Jeff,

My question relates to the Average Partial Effect (hereafter APE) and the Predicted Probability Ratio (hereafter PPR). I would be happy if you could help me understand these concepts in connection with the use of corresponding commands in Stata.

Let us assume that I study state dependence of being unemployed and my covariates are collgrad (1=have tertiary education, 0=do not have tertiary education) and ttl_exp (continuous variable of number of experience years). I have panel data and use a dynamic non-linear random effects probit model adopting the Wooldridge (or Heckman) estimator. I have to find the APE & PPR.

1) Is the Average Marginal Effect (hereafter AME) the same as the APE?
2) What is the difference between the APE and the APE at the mean of the explanatory variables (which I found in some papers)?
3) Let us I assume I type:

xtprobit unemployed_it lag_unemployed_it collgrad_it ttl_exp_it, re

I am interested in the difference between the probability of being unemployed in t being unemployed in t-1 and the probability of being unemployed in t being employed in t-1. Is it called the APE I am interested in? I assume I should type the following:

margins lag_unemployed

Assume I get:

lag_unemployed Margin
0 .0907686
1 .1447145

The predicted probability of lag_unemployed=0 is 0.0907686, the predicted probability of lag_unemployed=1 is 0.1447145, correct?

So, is the APE in this case: 0.1447-0.0908=0.0539?
The PPR is 0.1447/0.0908=1.594?

I would be very grateful to you if you could help me.

Best wishes
Inna

PS: my question relates to this post, should I start a new topic and not post my question here?

Last edited by Inna Petrunyk; 11 Oct 2014, 11:34.
Comment
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#6

21 Oct 2014, 11:37

Please start a new thread. But, when doing so, please rephrase your questions with reference to equations in key papers on this topic, such as:
(1) "SIMPLE SOLUTIONS TO THE INITIAL CONDITIONS PROBLEM IN DYNAMIC, NONLINEAR PANEL DATA MODELS WITH UNOBSERVED HETEROGENEITY" by JEFFREY M. WOOLDRIDGE, JOURNAL OF APPLIED ECONOMETRICS 20: 39–54 (2005), especially sectiojn 5.1, and
(2) "THE INTERRELATED DYNAMICS OF UNEMPLOYMENT AND LOW-WAGE EMPLOYMENT" by MARK B. STEWART, JOURNAL OF APPLIED ECONOMETRICS, 22: 511–531 (2007), page 522
Inspection of these key formulae should be informative about whether margins can be used to calculate the expressions that you want.
[Inna sent me a private Message on this topic, identical to the one posted above. Sorry, but I am unwilling to respond privately to virtually such requests. If I am able to respond (and I have little time at present), I think it should be done publicly via the relevant Forum. Lack of response in the Forum can be taken as inability or unwillingness to respond!]
Comment
Adam C. Carle

Join Date: Oct 2014

Posts: 2
#7

02 Nov 2014, 04:01

Jeff, Thank you for that incredibly helpful explanation and set of code. Could you expand on that a bit and show how to get the marginal effect of a continuous variable at a specific value, for example: margins, at(distance=X)? I'd really appreciate it. Thanks in advance.
Comment
Roman Mostazir

Join Date: Apr 2014

Posts: 874
#8

02 Nov 2014, 08:50

A fantastic post and fantastic explanation (#4) by Jeff . Perhaps helpful for many others. Thanks Jeff.

Roman
Comment

Announcement