
  • Delta Method Standard Errors for average marginal effects using margins command

    My question concerns how precisely standard errors for average marginal effects are computed by Stata's margins command using the delta method. I have searched the documentation for specifics to no avail, so I am turning to the list.

    My data-analytic problem involves a logistic regression analysis (fit with the logit command) with two predictors: group (a dichotomous factor variable) and covar (a continuous covariate). My goal is to estimate the average marginal effects (AMEs) for each group (i.e., the average of the predicted probabilities for the individual subjects in each group) using the margins command.

    I am familiar with the delta method and have no trouble understanding or reproducing the estimates and standard errors in the Stata output when estimating marginal effects for each group at the mean of the covariate (i.e., MEMs). One just uses the standard multivariate delta formula (transpose of the gradient times the variance-covariance matrix times the gradient), plugging the mean of covar into the gradient. However, I don't understand how the delta-method standard error is generated when computing the average marginal effect across subjects. The computation of the AME estimate itself is not an issue -- it's the standard error computation that confuses me.

    My understanding is that the delta method would normally require a specific value of the covariate, but I can't figure out what that value would be when computing an AME (as opposed to a MEM). One might suspect that it's again the mean of covar, with results differing from the MEM computation because of the differing predicted probabilities in the two cases. However, I've tried this and can't reproduce the standard error estimate for the AME. I've tried various other approaches to reproduce the Stata output (e.g., computing the variances and standard errors for specific cases and averaging) but have not succeeded. Thanks for any help you might offer.
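    To make the MEM-style computation concrete: the delta formula described above (transpose of the gradient times the VC matrix times the gradient) takes only a few lines. The following is a minimal pure-Python sketch, not Stata; the coefficients, variance matrix, and covariate mean are all invented for illustration.

```python
import math

def invlogit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted logit: xb = b0 + b1*group + b2*covar
# (all numbers invented for illustration)
b = [-2.0, 1.5, -0.05]             # _cons, group, covar
V = [[0.010, -0.002, 0.0],         # hypothetical VCE of b
     [-0.002, 0.015, 0.0],
     [0.0,    0.0,   0.0001]]

covar_mean = 30.0                  # evaluate at the covariate mean (MEM)
group = 1
xb = b[0] + b[1]*group + b[2]*covar_mean
p = invlogit(xb)

# Gradient of p with respect to (b0, b1, b2): p*(1-p) times each regressor
g = [p*(1-p)*x for x in (1.0, group, covar_mean)]

# Delta-method variance g' V g, then the standard error
var = sum(g[i]*V[i][j]*g[j] for i in range(3) for j in range(3))
se = math.sqrt(var)
```

    Replacing covar_mean with each observation's covariate value and averaging the predictions gives the AME point estimate; the question is what the analogous standard error computation looks like.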

    Andy Tomarken

  • #2
    I don't know if this will help get you started or not. It clones how Stata produces the AMEs. How you go from there to the standard errors, I don't know.

    version 11.1
    set more off
    webuse nhanes2f, clear
    keep if !missing(diabetes, black, female, age, age2, agegrp)
    label variable age2 "age squared"
    * Compute the variables we will need
    tab1 agegrp, gen(agegrp)
    gen femage = female*age
    label variable femage "female * age interaction"
    label define black 0 "nonBlack" 1 "black"
    label define female 0 "male" 1 "female"
    label values female female
    label values black black
    logit diabetes i.black i.female age, nolog
    * Average Adjusted Predictions (AAPs)
    margins black female
    * Average Marginal Effects (AMEs)
    margins, dydx(black female)
    * Replicate AME for black without using margins
    clonevar xblack = black
    quietly logit diabetes i.xblack i.female age, nolog
    replace xblack = 0
    predict adjpredwhite
    replace xblack = 1
    predict adjpredblack
    gen meblack = adjpredblack - adjpredwhite
    sum adjpredwhite adjpredblack meblack
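    The same set-everyone-to-0 / set-everyone-to-1 averaging can be sketched outside Stata. This pure-Python illustration uses invented coefficients and a tiny made-up sample (not the nhanes2f data); it mirrors only the logic of the clonevar/predict steps above.

```python
import math

def invlogit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Invented coefficients for a logit of diabetes on black, female, age
b_cons, b_black, b_female, b_age = -6.0, 0.6, 0.15, 0.06

# Tiny made-up sample: (female, age) pairs
sample = [(0, 40), (1, 55), (1, 62), (0, 70), (1, 35)]

# Counterfactual predictions: everyone set to nonblack, then to black
p_white = [invlogit(b_cons + b_female*f + b_age*a) for f, a in sample]
p_black = [invlogit(b_cons + b_black + b_female*f + b_age*a) for f, a in sample]

# AME of black = mean of the individual-level differences
ame = sum(pb - pw for pb, pw in zip(p_black, p_white)) / len(sample)
```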
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    EMAIL: rwilliam@ND.Edu


    • #3
      Dear Richard:
      Thanks very much. Actually, I believe I've seen that code in one of your handouts or papers on the web! I've found your work very helpful. The computation of the AMEs is not my problem, unfortunately -- I know what Stata is doing there. It's the standard errors that have me confused when one of the predictors is a continuous covariate and you are interested in the AME for a dichotomous grouping variable. The key question for me is what specific value of the covariate is plugged into the derivative parts of the delta formula. When I've computed AMEs in the past I've gone different routes for the standard errors (bootstrap, simulation from a multivariate normal model, or Bayesian, which is my preferred approach). I'm just wondering how the delta method can be instantiated here.

      Thanks very much again.

      Andy Tomarken


      • #4
        Aside: This post is long. Please be aware that some code blocks are
        scrollable, as they contain more content than is shown.

        Aside: In the following I use the nofvlabel option so the output lines
        up with the expressions I employ. This is just a display option that is
        common to margins and estimation commands. It was introduced in
        Stata 13, which now shows the value labels for factor variables by
        default.

        Here is an example similar to the one described by Andrew.

        . webuse margex
        (Artificial data for margins)
        . logit outcome i.treatment distance, nofvlabel
        Iteration 0:   log likelihood = -1366.0718  
        Iteration 1:   log likelihood = -1257.5623  
        Iteration 2:   log likelihood = -1244.2136  
        Iteration 3:   log likelihood = -1242.8796  
        Iteration 4:   log likelihood = -1242.8245  
        Iteration 5:   log likelihood = -1242.8245  
        Logistic regression                               Number of obs   =       3000
                                                          LR chi2(2)      =     246.49
                                                          Prob > chi2     =     0.0000
        Log likelihood = -1242.8245                       Pseudo R2       =     0.0902
             outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         1.treatment |    1.42776    .113082    12.63   0.000     1.206124    1.649397
            distance |  -.0047747   .0011051    -4.32   0.000    -.0069406   -.0026088
               _cons |  -2.337762   .0962406   -24.29   0.000     -2.52639   -2.149134
        Andrew wants to know how margins computes standard errors of average
        marginal effects (AMEs). The remaining discussion has two parts: the first
        describes how to compute AMEs and their SE estimates for factor variables;
        the second does the same for continuous variables.

        Factor variables

        Since the average marginal effect of a 2-level factor variable is just a
        difference between the two predictive margins, I will start with computing the
        SEs for predictive margins. Here we use margins to compute
        the predictive margins for the 2 levels of treatment:

        . margins treatment, nofvlabel
        Predictive margins                                Number of obs   =       3000
        Model VCE    : OIM
        Expression   : Pr(outcome), predict()
                     |            Delta-method
                     |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
           treatment |
                  0  |   .0791146   .0069456    11.39   0.000     .0655016    .0927277
                  1  |   .2600204   .0111772    23.26   0.000     .2381135    .2819272
        I will make copies of two matrices from the margins saved results to
        compare with later. r(Jacobian) is the Jacobian matrix; we will see
        what this is shortly. r(V) is the estimated variance matrix that
        corresponds with the reported predictive margins. The square roots of its
        diagonal elements are reported in the column labeled
        "Delta-method Std. Err." above.

        . matrix rJ = r(Jacobian)
        . matrix rV = r(V)
        . di sqrt(rV[1,1])
        . di sqrt(rV[2,2])
        The delta method allows us to obtain the appropriate standard errors of any
        smooth function of the fitted model parameters. It basically involves
        applying a Jacobian matrix to the estimated variance matrix of the fitted
        model parameters. The Jacobian is a matrix of partial derivatives of the
        statistics of interest with respect to each of the fitted model parameters.
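        The sandwich V = J * V(b) * J' itself is a one-liner in any matrix language. As a minimal sketch (pure Python rather than Stata, with an invented 2x3 Jacobian and an invented parameter variance matrix):

```python
def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

# Invented 2x3 Jacobian (rows = statistics, cols = model parameters)
# and an invented 3x3 VCE of the model parameters
J  = [[0.0, 0.7, 0.1],
      [0.2, 2.0, 0.2]]
Vb = [[0.02, 0.0,   0.0],
      [0.0,  0.001, 0.0],
      [0.0,  0.0,   0.01]]

# Delta method: variance of the statistics is J * Vb * J'
V = matmul(matmul(J, Vb), transpose(J))
se = [V[0][0]**0.5, V[1][1]**0.5]   # reported standard errors
```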

        Before we start taking derivatives, I want to show how the predictive margins
        are essentially computed. This will provide us with the formulas from which
        we will work out the derivatives that go into the Jacobian matrix. In the
        following, p0 is the variable used to recreate the predictive margin
        for 0.treatment, and p1 corresponds to 1.treatment. The
        means of these variables are the predictive margins.

        . gen p0 = invlogit(_b[0b.treatment] + _b[distance]*distance + _b[_cons])
        . gen p1 = invlogit(_b[1.treatment] + _b[distance]*distance + _b[_cons])
        . sum p0 p1
            Variable |       Obs        Mean    Std. Dev.       Min        Max
                  p0 |      3000    .0791146    .0212555   .0026954   .0879477
                  p1 |      3000    .2600204    .0688961    .011143   .2867554
        Now we will make the calculations necessary to reproduce the Jacobian matrix,
        which we will then use to reproduce the estimated variance matrix. The rows
        identify what we are taking the partial derivative of; the columns identify
        what we are taking the partial derivative with respect to. Thus the rows
        correspond with the predictive margins and the columns correspond with the
        fitted model parameters in the logistic regression.

        . gen dp0dxb = p0*(1-p0)
        . gen dp1dxb = p1*(1-p1)
        . gen zero = 0
        . gen one = 1
        . matrix vecaccum J0 = dp0dxb zero zero distance
        . matrix vecaccum J1 = dp1dxb zero one  distance
        . matrix Jac = J0/e(N) \ J1/e(N)
        . matrix V = Jac*e(V)*Jac'
        . matrix list Jac
                     zero       zero   distance      _cons
        dp0dxb          0          0  .74390659  .07240388
        dp1dxb          0  .18766468  2.1907626  .18766468
        . matrix list rJ
                       outcome:   outcome:   outcome:   outcome:
                            0b.         1.                      
                     treatment  treatment   distance      _cons
        0.treatment          0          0  .74390659  .07240388
        1.treatment          0  .18766468  2.1907626  .18766468
        . matrix list V
        symmetric V[2,2]
                    dp0dxb      dp1dxb
        dp0dxb   .00004824
        dp1dxb  -1.181e-07   .00012493
        . matrix list rV
        symmetric rV[2,2]
                              0.          1.
                      treatment   treatment
        0.treatment   .00004824
        1.treatment  -1.181e-07   .00012493
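        In words, each entry of the vecaccum-then-divide-by-e(N) Jacobian is just the sample mean of p*(1-p) times the corresponding regressor. Here is a pure-Python sketch with invented coefficients and a tiny made-up distance variable; note the treatment and _cons entries coincide, exactly as .18766468 appears twice in the second row above, because both regressors equal 1 in that row.

```python
import math

def invlogit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Invented logit for the 1.treatment margin: xb = b_trt + b_dist*distance + b_cons
b_trt, b_dist, b_cons = 1.4, -0.005, -2.3
distance = [10.0, 25.0, 3.0, 40.0]   # tiny made-up sample
n = len(distance)

p1 = [invlogit(b_trt + b_dist*d + b_cons) for d in distance]

# Jacobian row for this margin: mean over observations of p*(1-p)*x_j,
# one entry per model parameter (1.treatment, distance, _cons)
row = [
    sum(p*(1-p)*1.0 for p in p1) / n,                   # w.r.t. b_trt (x = 1)
    sum(p*(1-p)*d for p, d in zip(p1, distance)) / n,   # w.r.t. b_dist
    sum(p*(1-p)*1.0 for p in p1) / n,                   # w.r.t. b_cons (x = 1)
]
```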
        Now let's compute the marginal effect of treatment. margins
        keeps a placeholder for the base outcome of treatment; the Jacobian
        and variance elements are merely set to zero.

        . margins, dydx(treatment) nofvlabel
        Average marginal effects                          Number of obs   =       3000
        Model VCE    : OIM
        Expression   : Pr(outcome), predict()
        dy/dx w.r.t. : 1.treatment
                     |            Delta-method
                     |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
         1.treatment |   .1809057   .0131684    13.74   0.000     .1550961    .2067153
        Note: dy/dx for factor levels is the discrete change from the base level.
        . matrix list r(Jacobian)
                        outcome:   outcome:   outcome:   outcome:
                             0b.         1.                      
                      treatment  treatment   distance      _cons
        0b.treatment          0          0          0          0
         1.treatment          0  .18766468   1.446856  .11526081
        . matrix list r(V)
        symmetric r(V)[2,2]
                             0b.         1.
                      treatment  treatment
        0b.treatment          0
         1.treatment          0  .00017341
        The marginal effect of treatment is just the difference between the
        predictive margins. Here we take advantage of the fact that the difference of
        the means is the mean of the differences.

        . gen pdiff = p1 - p0
        . sum pdiff
            Variable |       Obs        Mean    Std. Dev.       Min        Max
               pdiff |      3000    .1809057    .0476499   .0084476   .1988077
        The Jacobian is similarly composed from the previous calculations. We do so
        ignoring the base level of treatment. Notice that the elements of
        Jdiff match those of the second row of r(Jacobian) above.

        . matrix Jdiff = (-1, 1)*Jac
        . matrix list Jdiff
                 zero       zero   distance      _cons
        r1          0  .18766468   1.446856  .11526081
        . matrix Vdiff = Jdiff*e(V)*Jdiff'
        . matrix list Vdiff
        symmetric Vdiff[1,1]
        r1  .00017341
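        Both facts used in this factor-variable section -- the mean of differences equals the difference of means, and the AME Jacobian is the difference of the two margin Jacobian rows -- are plain linearity. A pure-Python sketch with invented per-observation predictions and illustrative Jacobian rows:

```python
# Invented per-observation predictions under treatment 0 and 1
p0 = [0.08, 0.05, 0.09, 0.07]
p1 = [0.26, 0.15, 0.28, 0.22]

def mean(xs):
    return sum(xs) / len(xs)

diff_of_means = mean(p1) - mean(p0)
mean_of_diffs = mean([b - a for a, b in zip(p0, p1)])

# Same linearity for the Jacobian: Jdiff = (-1, 1) applied to (J0 \ J1)
J0 = [0.0, 0.0, 0.744, 0.072]   # illustrative rows, one per margin
J1 = [0.0, 0.188, 2.191, 0.188]
Jdiff = [b - a for a, b in zip(J0, J1)]
```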
        Continuous variables

        The marginal effect of a continuous variable is essentially computed the same
        way, but there are more derivatives involved. Here we have margins
        compute the average marginal effect of distance and save off copies of
        r(Jacobian) and r(V).

        . margins, dydx(distance)
        Average marginal effects                          Number of obs   =       3000
        Model VCE    : OIM
        Expression   : Pr(outcome), predict()
        dy/dx w.r.t. : distance
                     |            Delta-method
                     |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
            distance |  -.0006228   .0001444    -4.31   0.000    -.0009058   -.0003399
        . matrix rJ = r(Jacobian)
        . matrix rV = r(V)
        . di sqrt(rV[1,1])
        The average marginal effect of distance is the average of the
        derivative of the prediction with respect to distance.

        . predict p, pr
        . gen dpdxb = p*(1-p)
        . gen dpdx = dpdxb*_b[distance]
        . sum dpdx
            Variable |       Obs        Mean    Std. Dev.       Min        Max
                dpdx |      3000   -.0006228    .0003231  -.0009766  -.0000128
        For the Jacobian, we must compute the partials of dpdx with respect to
        the model parameters. The partial with respect to the coefficient for distance
        has an extra piece.
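        In symbols (a sketch of the algebra, writing p_i for the predicted probability and beta_d for the distance coefficient, so dpdx_i = beta_d * p_i*(1-p_i)):

```latex
\frac{\partial}{\partial \beta_j}\bigl[\beta_d\, p_i(1-p_i)\bigr]
  = \beta_d\, p_i(1-p_i)(1-2p_i)\, x_{ij}
  + \begin{cases} p_i(1-p_i) & j = d \\ 0 & j \neq d \end{cases}
```

        The first term is d2pdxb2 times _b[distance], averaged by vecaccum and e(N); the second term is the extra piece, the mean of dpdxb added to the distance column.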

        . gen d2pdxb2 = p*(1-p)*(1-p) - p*p*(1-p)
        . matrix vecaccum Jac = d2pdxb2 0b.treatment 1.treatment distance
        . matrix Jac = Jac*_b[distance]/e(N)
        . sum dpdxb
            Variable |       Obs        Mean    Std. Dev.       Min        Max
               dpdxb |      3000    .1304451    .0676672   .0026901   .2045267
        . matrix Jac[1,3] = Jac[1,3] + r(mean)
        . matrix V = Jac*e(V)*Jac'
        . matrix list Jac
                         0b.          1.                        
                  treatment   treatment    distance       _cons
        d2pdxb2           0  -.00020228    .1256217  -.00034567
        . matrix list rJ
                     outcome:    outcome:    outcome:    outcome:
                          0b.          1.                        
                   treatment   treatment    distance       _cons
        distance           0  -.00020228    .1256217  -.00034567
        . matrix list V
        symmetric V[1,1]
        d2pdxb2  2.084e-08
        . matrix list rV
        symmetric rV[1,1]
        distance  2.084e-08
        In summary, we have shown how to compute discrete and continuous marginal
        effects along with their corresponding standard error estimates using the
        delta method. This is essentially what margins does in all cases,
        except that it uses numerical derivatives for all but the linear prediction.
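        Putting the continuous-variable recipe end to end: the following pure-Python sketch (an invented two-parameter logit, a hypothetical variance matrix, and made-up covariate values) computes the AME and its delta-method standard error along exactly the lines above.

```python
import math

def invlogit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Invented model: xb = b0 + b1*x, with a hypothetical 2x2 VCE of (b0, b1)
b0, b1 = -2.3, -0.005
Vb = [[0.0093,   -0.00008],
      [-0.00008, 0.0000012]]

x = [10.0, 25.0, 3.0, 40.0, 60.0]   # made-up covariate values
n = len(x)
p = [invlogit(b0 + b1*xi) for xi in x]

# AME: average over observations of dp/dx = b1*p*(1-p)
ame = sum(b1*pi*(1-pi) for pi in p) / n

# Jacobian row: d(AME)/d(b0) and d(AME)/d(b1); the latter has the extra piece
d2 = [pi*(1-pi)*(1-2*pi) for pi in p]
J = [
    sum(b1*d2i for d2i in d2) / n,
    sum(b1*d2i*xi for d2i, xi in zip(d2, x)) / n
        + sum(pi*(1-pi) for pi in p) / n,
]

# Delta-method variance and standard error: J * Vb * J'
var = sum(J[i]*Vb[i][j]*J[j] for i in range(2) for j in range(2))
se = math.sqrt(var)
```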


        • #5
          Dear Jeff,

          My question relates to the Average Partial Effect (hereafter APE) and the Predicted Probability Ratio (hereafter PPR). I would be happy if you could help me understand these concepts in connection with the use of corresponding commands in Stata.

          Let us assume that I study state dependence of being unemployed and my covariates are collgrad (1=have tertiary education, 0=do not have tertiary education) and ttl_exp (continuous variable of number of experience years). I have panel data and use a dynamic non-linear random effects probit model adopting the Wooldridge (or Heckman) estimator. I have to find the APE & PPR.

          1) Is the Average Marginal Effect (hereafter AME) the same as the APE?
          2) What is the difference between the APE and the APE at the mean of the explanatory variables (which I found in some papers)?
          3) Let us assume I type:

          xtprobit unemployed_it lag_unemployed_it collgrad_it ttl_exp_it, re

          I am interested in the difference between the probability of being unemployed in t given being unemployed in t-1 and the probability of being unemployed in t given being employed in t-1. Is this the APE I am interested in? I assume I should type the following:

          margins lag_unemployed

          Assume I get:

          lag_unemployed       Margin
                       0     .0907686
                       1     .1447145

          The predicted probability of lag_unemployed=0 is 0.0907686, the predicted probability of lag_unemployed=1 is 0.1447145, correct?

          So, is the APE in this case: 0.1447-0.0908=0.0539?
          The PPR is 0.1447/0.0908=1.594?

          I would be very grateful to you if you could help me.

          Best wishes

          PS: my question relates to this post, should I start a new topic and not post my question here?
          Last edited by Inna Petrunyk; 11 Oct 2014, 12:34.


          • #6
            Please start a new thread. But when doing so, please rephrase your questions with reference to the equations in key papers on this topic.
            Inspection of those key formulae should be informative about whether margins can be used to calculate the expressions that you want.
            [Inna sent me a private message on this topic, identical to the one posted above. Sorry, but I am unwilling to respond privately to such requests. If I am able to respond (and I have little time at present), I think it should be done publicly via the relevant Forum. Lack of a response in the Forum can be taken as inability or unwillingness to respond!]


            • #7
              Jeff, Thank you for that incredibly helpful explanation and set of code. Could you expand on that a bit and show how to get the marginal effect of a continuous variable at a specific value, for example: margins, at(distance=X)? I'd really appreciate it. Thanks in advance.


              • #8
                A fantastic post and a fantastic explanation (#4) by Jeff. Perhaps helpful for many others as well. Thanks, Jeff.