Issue with predictnl

Dario Maimone Ansaldo Patti

Join Date: Aug 2014
Posts: 505

29 Jan 2019, 09:43

Dear All,

I am going through the paper by Karaca-Mandic, Norton and Dowd (2012), "Interaction Terms in Nonlinear Models". To calculate the marginal effect of the interaction term they suggest three approaches, the last one using predictnl. My point is not to discuss the alternative procedures but to raise a calculation issue in Stata. I use Stata 15.1 MP.

I run the following:

Code:

webuse margex
* female indicator and its interaction with age
gen female=(sex==1)
gen agefem=age*female
logit outcome age female agefem

I obtain the following:

Code:

. logit outcome age female agefem

Iteration 0:   log likelihood = -1366.0718 
Iteration 1:   log likelihood = -1130.6519 
Iteration 2:   log likelihood = -1086.7145 
Iteration 3:   log likelihood =   -1084.73 
Iteration 4:   log likelihood = -1084.7241 
Iteration 5:   log likelihood = -1084.7241 

Logistic regression                             Number of obs     =      3,000
                                                LR chi2(3)        =     562.70
                                                Prob > chi2       =     0.0000
Log likelihood = -1084.7241                     Pseudo R2         =     0.2060

------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .110599    .010689    10.35   0.000      .089649     .131549
      female |     1.3517    .622081     2.17   0.030     .1324438    2.570957
      agefem |  -.0104589   .0130144    -0.80   0.422    -.0359667    .0150489
       _cons |  -7.030922   .5024759   -13.99   0.000    -8.015757   -6.046088
------------------------------------------------------------------------------

This perfectly matches the results in the paper. The next step is to calculate the following using the formula in the paper:

Code:

predictnl phat=(_b[age]+_b[agefem])* ///
(1/(1+exp(-(_b[_cons]+_b[age]*age+_b[female]+_b[agefem]*age))))* ///
(1-(1/(1+exp(-(_b[_cons]+_b[age]*age+_b[female]+_b[agefem]*age))))) ///
-_b[age]*(1/(1+exp(-(_b[_cons]+_b[age]*age))))* ///
(1-(1/(1+exp(-(_b[_cons]+_b[age]*age))))), se(phat_se)

The summary statistics of phat and phat_se are reported below:

Code:


. su phat phat_se

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        phat |      3,000    .0040705    .0022295  -.0022299   .0069318
     phat_se |      3,000    .0016307    .0012906   .0003856   .0039043

Again, the results match those in the paper. Here is the issue: since the formula above is long, I want to break it into parts, so I do the following:

Code:

gen a= (1/(1+exp(-(_b[_cons]+_b[age]*age+_b[female]+_b[agefem]*age))))
gen b=(1-(1/(1+exp(-(_b[_cons]+_b[age]*age+_b[female]+_b[agefem]*age)))))
gen c=(1/(1+exp(-(_b[_cons]+_b[age]*age))))
gen d=(1-(1/(1+exp(-(_b[_cons]+_b[age]*age)))))
predictnl phat2=(_b[age]+_b[agefem])*a*b-_b[age]*c*d, se(phat2_se)

The summary statistics of phat2 and phat2_se are below:

Code:


. su phat2 phat2_se

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       phat2 |      3,000    .0040705    .0022295  -.0022299   .0069318
    phat2_se |      3,000    .0014025    .0009636   .0001979   .0031421

While the means of phat and phat2 are identical, the means of the standard errors are not. Why does this happen? I am going crazy trying to figure out the reason. Any help would be highly appreciated. Thanks in advance.

Tags: predictnl

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

29 Jan 2019, 10:25

My intuition here is that the first calculation is the correct one, because Stata knows which quantities are estimates and correctly takes their sampling error into account. In the second calculation, where you pre-calculate a, b, c and d, the standard errors are wrong: once these are stored as variables, you trick Stata into treating quantities that are estimated (parameter estimates with sampling variance) as if they were known with certainty.

I am actually surprised that the two things come so close together.
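
A minimal sketch of one way to split the formula without freezing the estimates, assuming the logit above is still the active estimation result. The local macro names xb1 and xb0 and the variable names phat3 and phat3_se are made up for this illustration. Because local macros are substituted as text, predictnl still sees every _b[] inside the expression and can differentiate with respect to all of them:

Code:

* linear predictors stored as text, not as variables
local xb1 (_b[_cons]+_b[age]*age+_b[female]+_b[agefem]*age)
local xb0 (_b[_cons]+_b[age]*age)
* invlogit(x) equals 1/(1+exp(-x))
predictnl phat3=(_b[age]+_b[agefem])*invlogit(`xb1')*(1-invlogit(`xb1')) ///
-_b[age]*invlogit(`xb0')*(1-invlogit(`xb0')), se(phat3_se)
* compare with the one-line version
su phat phat3 phat_se phat3_se

If this reasoning is right, phat3_se should agree with phat_se rather than phat2_se, up to numerical precision. The local definitions and the predictnl call need to be run together (for example in one do-file run), since local macros vanish when the do-file that defines them finishes.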
Dario Maimone Ansaldo Patti

Join Date: Aug 2014

Posts: 505
#3

29 Jan 2019, 13:39

Joro Kolev, thanks for your help.