  • Issue with predictnl

    Dear All,

    I am going through the paper by Karaca-Mandic, Norton, and Dowd (2012),
    "Interaction Terms in Nonlinear Models" (downloadable from here). To calculate the marginal effect of the interaction term, they suggest three approaches, the last of which uses predictnl. My question is not about the alternative procedures but about a calculation issue in Stata. I use Stata 15.1 MP.

    I run the following:

    Code:
    webuse margex
    gen female=(sex==1)
    gen agefem=age*female
    logit outcome age female agefem
    I obtain the following:

    Code:
    . logit outcome age female agefem
    
    Iteration 0:   log likelihood = -1366.0718 
    Iteration 1:   log likelihood = -1130.6519 
    Iteration 2:   log likelihood = -1086.7145 
    Iteration 3:   log likelihood =   -1084.73 
    Iteration 4:   log likelihood = -1084.7241 
    Iteration 5:   log likelihood = -1084.7241 
    
    Logistic regression                             Number of obs     =      3,000
                                                    LR chi2(3)        =     562.70
                                                    Prob > chi2       =     0.0000
    Log likelihood = -1084.7241                     Pseudo R2         =     0.2060
    
    ------------------------------------------------------------------------------
         outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |    .110599    .010689    10.35   0.000      .089649     .131549
          female |     1.3517    .622081     2.17   0.030     .1324438    2.570957
          agefem |  -.0104589   .0130144    -0.80   0.422    -.0359667    .0150489
           _cons |  -7.030922   .5024759   -13.99   0.000    -8.015757   -6.046088
    ------------------------------------------------------------------------------
    This perfectly matches the results in the paper. The next step is to calculate the following using the formula in the paper:

    Code:
    predictnl phat=(_b[age]+_b[agefem])* ///
    (1/(1+exp(-(_b[_cons]+_b[age]*age+_b[female]+_b[agefem]*age))))* ///
    (1-(1/(1+exp(-(_b[_cons]+_b[age]*age+_b[female]+_b[agefem]*age))))) ///
    -_b[age]*(1/(1+exp(-(_b[_cons]+_b[age]*age))))* ///
    (1-(1/(1+exp(-(_b[_cons]+_b[age]*age))))), se(phat_se)
    The summary statistics of phat and phat_se are reported below:

    Code:
    
    . su phat phat_se
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
            phat |      3,000    .0040705    .0022295  -.0022299   .0069318
         phat_se |      3,000    .0016307    .0012906   .0003856   .0039043
    Again, the results match those in the paper. Here is the problem: since the formula above is long, I want to break it into parts, so I do the following:

    Code:
    gen a= (1/(1+exp(-(_b[_cons]+_b[age]*age+_b[female]+_b[agefem]*age))))
    gen b=(1-(1/(1+exp(-(_b[_cons]+_b[age]*age+_b[female]+_b[agefem]*age)))))
    gen c=(1/(1+exp(-(_b[_cons]+_b[age]*age))))
    gen d=(1-(1/(1+exp(-(_b[_cons]+_b[age]*age)))))
    predictnl phat2=(_b[age]+_b[agefem])*a*b-_b[age]*c*d, se(phat2_se)
    The summary statistics of phat2 and phat2_se are below:

    Code:
    
    . su phat2 phat2_se
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           phat2 |      3,000    .0040705    .0022295  -.0022299   .0069318
        phat2_se |      3,000    .0014025    .0009636   .0001979   .0031421
    While the means of phat and phat2 are identical, the means of the standard errors are not! Why does this happen? I am going crazy trying to figure out the reason. Any help would be highly appreciated. Thanks in advance.


  • #2
    My intuition here is that the first calculation is the correct one, because Stata knows which quantities are estimates and correctly takes their sampling error into account. In the second calculation, where you pre-calculate a, b, c, and d, the standard errors are wrong, because you trick Stata into treating quantities that are estimated (parameter estimates with sampling variance) as if they were known with certainty.
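
    To make the intuition concrete, here is a sketch of the delta-method calculation that predictnl performs for each observation, under both setups. The notation (Lambda for the logistic function, theta for the coefficient vector, V-hat for its estimated covariance matrix, G for the gradient) is my own shorthand and does not come from the paper or from this thread.

    ```latex
    % Assumed notation: \Lambda(z) = 1/(1+e^{-z}); \theta ordered as (age, female, agefem, _cons).
    \begin{align*}
    \eta_{1i} &= \beta_{cons} + (\beta_{age} + \beta_{agefem})\,age_i + \beta_{female},
    \qquad \eta_{0i} = \beta_{cons} + \beta_{age}\,age_i \\[4pt]
    % Full expression: every \beta inside \Lambda(\cdot) is visible to predictnl.
    g_i(\theta) &= (\beta_{age} + \beta_{agefem})\,\Lambda(\eta_{1i})\{1 - \Lambda(\eta_{1i})\}
                 - \beta_{age}\,\Lambda(\eta_{0i})\{1 - \Lambda(\eta_{0i})\} \\[4pt]
    \widehat{se}\{g_i(\hat\theta)\} &= \sqrt{G_i\,\hat V\,G_i'},
    \qquad G_i = \left.\frac{\partial g_i(\theta)}{\partial \theta'}\right|_{\theta = \hat\theta} \\[4pt]
    % Pre-computed pieces: a_i, b_i, c_i, d_i are ordinary variables, hence constants in \theta.
    \tilde g_i(\theta) &= (\beta_{age} + \beta_{agefem})\,a_i b_i - \beta_{age}\,c_i d_i \\[4pt]
    \tilde G_i &= \bigl(a_i b_i - c_i d_i,\; 0,\; a_i b_i,\; 0\bigr)
    \end{align*}
    ```

    At the estimated coefficients the two expressions give the same value, which is why phat and phat2 agree. The gradients differ, though: the second one drops every derivative that runs through the logistic terms, including all dependence on _b[female] and _b[_cons], so the two standard errors need not agree.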

    I am actually surprised that the two things come so close together.
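
    If the goal is only to make the long formula easier to read, one option is to store the pieces as local macros rather than as generated variables. This is a minimal sketch, assuming the logit model from post #1 is still the active estimation result and that the lines are run together from a do-file; the macro names xb1 and xb0 and the variable names phat3 and phat3_se are made up for illustration. Because macros are expanded as text before predictnl runs, the full expression, coefficients included, should still reach predictnl.

    ```stata
    * Assumption: run right after -logit outcome age female agefem- in one do-file.
    * xb1 and xb0 are hypothetical macro names holding the two linear predictors as text.
    local xb1 (_b[_cons] + _b[age]*age + _b[female] + _b[agefem]*age)
    local xb0 (_b[_cons] + _b[age]*age)

    * invlogit(x) is Stata's built-in for exp(x)/(1+exp(x)), i.e. 1/(1+exp(-x)).
    predictnl phat3 = (_b[age] + _b[agefem])*invlogit(`xb1')*(1 - invlogit(`xb1')) ///
        - _b[age]*invlogit(`xb0')*(1 - invlogit(`xb0')), se(phat3_se)

    * Compare with the original one-piece calculation.
    summarize phat phat3 phat_se phat3_se
    ```

    If this works as intended, phat3 and phat3_se should line up with phat and phat_se from the one-piece formula, up to numerical precision, rather than with phat2_se.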



    • #3
      Joro Kolev, thanks for your help.
