Finding the difference between predicted probabilities of the subpopulation of the outcome of a logit model.

Kehinde Atoloye

Join Date: Nov 2021
Posts: 119

15 Jun 2023, 13:23

I am trying to fit a difference-in-differences model for a binary outcome. I came across the paper "Difference-in-differences with an ordinal dependent variable: assessing the impact of the London bombings on the safety perceptions of Muslims" (whiterose.ac.uk), which implements the nonlinear approach of "Interaction terms in logit and probit models" (ScienceDirect). The former applies the latter's approach to an ordinal outcome using an ordered probit model.

Below is the code from the former (I do not have the data):

Code:

oprobit Y D T DT x1 x2, vce(robust)
margins, dydx(DT) vce(unconditional) subpop(DT) post
nlcom [DT]1._predict - [DT]2._predict

I understand that I may not be able to use this directly for a binary outcome, since a binary outcome is not ordered, but I tried to apply the same approach with -logit-. The problem is that I cannot get the predicted probabilities for the subpopulations of the binary outcome.

Below is the code I tried, which fails with the error "[0._predict] not found". Your assistance will be much appreciated. Thanks. Sample data are included below.

Code:

logit savemoney kd_did i.age i.period i.progexp [pw=kdweight] if state==2, vce(robust) 
margin , dydx(kd_did) vce(unconditional) over(kd_did) post
nlcom [kd_did]1._predict - [kd_did]2._predict

With the following output:

Code:

. logit savemoney kd_did i.age i.period i.progexp [pw=kdweight] if state==2, vce(robust) 

Iteration 0:   log pseudolikelihood = -701.54708  
Iteration 1:   log pseudolikelihood =  -605.8859  
Iteration 2:   log pseudolikelihood = -587.40382  
Iteration 3:   log pseudolikelihood = -586.56099  
Iteration 4:   log pseudolikelihood = -586.55551  
Iteration 5:   log pseudolikelihood = -586.55551  

Logistic regression                                     Number of obs =  1,168
                                                        Wald chi2(8)  =  41.97
                                                        Prob > chi2   = 0.0000
Log pseudolikelihood = -586.55551                       Pseudo R2     = 0.1639

-------------------------------------------------------------------------------------
                    |               Robust
          savemoney | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
--------------------+----------------------------------------------------------------
             kd_did |   3.167547   .6370738     4.97   0.000     1.918906    4.416189
                    |
                age |
                16  |   .6278054    .947813     0.66   0.508    -1.229874    2.485485
                17  |    2.00357   .9929915     2.02   0.044     .0573424    3.949798
                18  |    .867744   .8106324     1.07   0.284    -.7210664    2.456554
                19  |   .7230419   .7988291     0.91   0.365    -.8426343    2.288718
                20  |   1.230873   .9438616     1.30   0.192    -.6190615    3.080808
                    |
             period |
           Endline  |  -.4932012   .2819289    -1.75   0.080    -1.045772    .0593693
                    |
            progexp |
Intervention Group  |   .1317869   .3128276     0.42   0.674    -.4813439    .7449178
              _cons |   .4604072   .8164822     0.56   0.573    -1.139869    2.060683
-------------------------------------------------------------------------------------

. margin , dydx(kd_did) vce(unconditional) over(kd_did) post // subpop(kd_did) post

Average marginal effects                                 Number of obs = 1,168

Expression: Pr(savemoney), predict()
dy/dx wrt:  kd_did
Over:       kd_did

------------------------------------------------------------------------------
             |            Unconditional
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
kd_did       |
      kd_did |
          0  |   .6109751   .1309218     4.67   0.000     .3543731    .8675772
          1  |   .0520365   .0187694     2.77   0.006     .0152492    .0888239
------------------------------------------------------------------------------

. nlcom [kd_did]0._predict - [kd_did]1._predict

[0._predict] not found

Below is my data:

Code:

1 1 19 1 1     1.1519125 2
1 1 18 1 1     1.2058938 2
. 0 19 0 1             1 2
1 1 19 1 1     1.1913565 2
1 0 19 0 1             1 2
1 1 19 1 1      .9901584 2
0 0 19 0 1             1 2
. 0 19 0 1             1 2
1 1 20 1 1 1.2315145e-08 2
1 0 19 0 1             1 2
1 1 19 1 1       .747017 2
1 0 19 0 1             1 2
. 0 19 0 1             1 2
1 0 19 0 1             1 2
1 0 19 0 1             1 2
1 0 19 0 1             1 2
1 1 19 1 1      1.002548 2
1 1 19 1 1      .6983826 2
1 0 19 0 1             1 2
1 1 19 1 1      .9901584 2
. 0 19 0 1             1 2
. 0 19 0 1             1 2
1 1 19 1 1     1.1913565 2
1 1 19 1 1       .747017 2
. 0 19 0 1             1 2
. 0 19 0 1             1 2
. 0 17 0 1             1 2
1 1 15 1 1     1.1977272 2
. 0 19 0 1             1 2
1 1 18 1 1      .9097768 2
1 1 19 1 1     .58043873 2
1 0 19 0 1             1 2
1 1 19 1 1      .6983826 2
1 0 18 0 1             1 2
1 1 19 1 1      .9901584 2
1 0 19 0 1             1 2
. 0 19 0 1             1 2
. 0 19 0 1             1 2
1 1 19 1 1     1.1913565 2
1 1 19 1 1      .6983826 2
1 0 19 0 1             1 2
1 1 19 1 1     1.1913565 2
. 0 19 0 1             1 2
. 0 19 0 1             1 2
1 1 19 1 1      .6983826 2
. 0 19 0 1             1 2
1 1 19 1 1     1.1913565 2
. 0 18 0 1             1 2
1 1 19 1 1     .58043873 2
. 0 18 0 1             1 2
1 1 18 1 1      .9097768 2
. 0 17 0 1             1 2
. 0 19 0 1             1 2
. 0 18 0 1             1 2
1 1 19 1 1      .6983826 2
. 0 19 0 1             1 2
. 0 19 0 1             1 2
1 1 19 1 1     1.7102258 2
. 0 18 0 1             1 2
. 0 19 0 1             1 2
1 1 19 1 1      .7460096 2
1 0 19 0 1             1 2
1 1 20 1 1  7.219235e-09 2
. 0 17 0 1             1 2
1 1 17 1 1      .8503983 2
1 1 19 1 1      .9901584 2
. 0 16 0 1             1 2
1 1 19 1 1       .747017 2
. 0 19 0 1             1 2
1 1 19 1 1     1.1913565 2
. 0 19 0 1             1 2
1 1 19 1 1     .58043873 2
. 0 19 0 1             1 2
1 0 16 0 1             1 2
. 0 19 0 1             1 2
1 1 19 1 1      .9901584 2
. 0 19 0 1             1 2
. 0 19 0 1             1 2
1 1 19 1 1      .8950461 2
. 0 17 0 1             1 2
1 1 17 1 1 1.5221377e-07 2
. 0 19 0 1             1 2
1 1 19 1 1     1.1913565 2
1 1 19 1 1      .6983826 2
. 0 19 0 1             1 2
1 1 19 1 1      .6983826 2
. 0 19 0 1             1 2
1 0 19 0 1             1 2
. 0 19 0 1             1 2
1 1 19 1 1     1.1913565 2
1 0 19 0 1             1 2
1 1 19 1 1     1.1913565 2
. 0 19 0 1             1 2
. 0 19 0 1             1 2
1 1 19 1 1      .8988092 2
1 0 19 0 1             1 2
1 1 20 1 1  9.316057e-09 2
1 0 19 0 1             1 2
1 1 20 1 1   6.00004e-09 2
1 0 18 0 1             1 2
end
label values savemoney yesno2
label def yesno2 0 "No", modify
label def yesno2 1 "Yes", modify
label values period period
label def period 0 "Baseline", modify
label def period 1 "Endline", modify
label values progexp exposure
label def exposure 1 "Intervention Group", modify
label values state state
label def state 2 "Kaduna", modify

Andrew Musau

Join Date: Oct 2014

Posts: 10284
#2

15 Jun 2023, 19:14

If the question is how to refer to coefficients from margins, then note that the option -coeflegend- is allowed.

Code:

margins, dydx(kd_did) vce(unconditional) over(kd_did) post coeflegend

Kehinde Atoloye

Join Date: Nov 2021
Posts: 119
#3

15 Jun 2023, 22:21

Originally posted by Andrew Musau

If the question is how to refer to coefficients from margins, then note that the option -coeflegend- is allowed.

Code:

margins, dydx(kd_did) vce(unconditional) over(kd_did) post coeflegend

Thank you. The coeflegend worked.

Code:

logit savemoney kd_did i.age i.period i.progexp [pw=kdweight] if state==2, vce(robust) 
margins, dydx(kd_did) vce(unconditional) over(kd_did) post 
nlcom _b[kd_did:0bn.kd_did] - _b[kd_did:1.kd_did]

. logit savemoney kd_did i.age i.period i.progexp [pw=kdweight] if state==2, vce(robust) 

Iteration 0:   log pseudolikelihood = -701.54708  
Iteration 1:   log pseudolikelihood =  -605.8859  
Iteration 2:   log pseudolikelihood = -587.40382  
Iteration 3:   log pseudolikelihood = -586.56099  
Iteration 4:   log pseudolikelihood = -586.55551  
Iteration 5:   log pseudolikelihood = -586.55551  

Logistic regression                                     Number of obs =  1,168
                                                        Wald chi2(8)  =  41.97
                                                        Prob > chi2   = 0.0000
Log pseudolikelihood = -586.55551                       Pseudo R2     = 0.1639

-------------------------------------------------------------------------------------
                    |               Robust
          savemoney | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
--------------------+----------------------------------------------------------------
             kd_did |   3.167547   .6370738     4.97   0.000     1.918906    4.416189
                    |
                age |
                16  |   .6278054    .947813     0.66   0.508    -1.229874    2.485485
                17  |    2.00357   .9929915     2.02   0.044     .0573424    3.949798
                18  |    .867744   .8106324     1.07   0.284    -.7210664    2.456554
                19  |   .7230419   .7988291     0.91   0.365    -.8426343    2.288718
                20  |   1.230873   .9438616     1.30   0.192    -.6190615    3.080808
                    |
             period |
           Endline  |  -.4932012   .2819289    -1.75   0.080    -1.045772    .0593693
                    |
            progexp |
Intervention Group  |   .1317869   .3128276     0.42   0.674    -.4813439    .7449178
              _cons |   .4604072   .8164822     0.56   0.573    -1.139869    2.060683
-------------------------------------------------------------------------------------

. margins, dydx(kd_did) vce(unconditional) over(kd_did) post 

Average marginal effects                                 Number of obs = 1,168

Expression: Pr(savemoney), predict()
dy/dx wrt:  kd_did
Over:       kd_did

------------------------------------------------------------------------------
             |            Unconditional
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
kd_did       |
      kd_did |
          0  |   .6109751   .1309218     4.67   0.000     .3543731    .8675772
          1  |   .0520365   .0187694     2.77   0.006     .0152492    .0888239
------------------------------------------------------------------------------

. nlcom _b[kd_did:0bn.kd_did] - _b[kd_did:1.kd_did]

       _nl_1: _b[kd_did:0bn.kd_did] - _b[kd_did:1.kd_did]

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _nl_1 |   .5589386   .1421614     3.93   0.000     .2803073    .8375699
------------------------------------------------------------------------------

But I am now wondering whether the -margins- and -nlcom- lines did the same job as they do when I use -ologit- below. What I actually wanted are the predicted values at savemoney=0 and at savemoney=1, as shown below with -ologit-, but that does not seem to be what the -logit- results give. Is it possible to get these with -logit-? Thanks.

Code:

ologit savemoney kd_did i.age i.highedulevel i.ethnic i.childnum i.period i.progexp [pw=kdweight] if state==2, vce(robust)
margins, dydx(kd_did) vce(unconditional) subpop(kd_did) post
nlcom [kd_did]1._predict - [kd_did]2._predict

. ologit savemoney kd_did i.age i.highedulevel i.ethnic i.childnum i.period i.progexp [pw=kdweight] if state==2, vce(robust)

Iteration 0:   log pseudolikelihood = -701.54708  
Iteration 1:   log pseudolikelihood =  -596.1297  
Iteration 2:   log pseudolikelihood = -575.52064  
Iteration 3:   log pseudolikelihood = -574.51627  
Iteration 4:   log pseudolikelihood = -574.50081  
Iteration 5:   log pseudolikelihood = -574.49762  
Iteration 6:   log pseudolikelihood =  -574.4969  
Iteration 7:   log pseudolikelihood = -574.49672  
Iteration 8:   log pseudolikelihood = -574.49668  
Iteration 9:   log pseudolikelihood = -574.49668  

Ordered logistic regression                             Number of obs =  1,168
                                                        Wald chi2(20) = 395.04
                                                        Prob > chi2   = 0.0000
Log pseudolikelihood = -574.49668                       Pseudo R2     = 0.1811

------------------------------------------------------------------------------------------
                         |               Robust
               savemoney | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------------------+----------------------------------------------------------------
                  kd_did |   3.212439   .6423242     5.00   0.000     1.953507    4.471371
                         |
                     age |
                     16  |    .547495   .9666026     0.57   0.571    -1.347011    2.442001
                     17  |   1.742254   1.041358     1.67   0.094    -.2987712    3.783279
                     18  |   .5793496   .8698062     0.67   0.505    -1.125439    2.284138
                     19  |    .473355   .8706683     0.54   0.587    -1.233124    2.179834
                     20  |   .9644165   1.006985     0.96   0.338    -1.009237     2.93807
                         |
            highedulevel |
             Islamiyyah  |  -.1427443   .6020627    -0.24   0.813    -1.322765    1.037277
                Primary  |   .2401711   .6220503     0.39   0.699     -.979025    1.459367
Junior Secondary School  |   .3426247   .5975378     0.57   0.566    -.8285279    1.513777
Senior Secondary School  |   .8186172   .5759525     1.42   0.155    -.3102289    1.947463
        Above secondary  |     .31397   .6309992     0.50   0.619    -.9227657    1.550706
                         |
                  ethnic |
                  Hausa  |  -11.57713    .974074   -11.89   0.000    -13.48628   -9.667984
                 Fulani  |  -11.99326   1.048266   -11.44   0.000    -14.04783   -9.938699
                   Igbo  |   1.492786   1.497149     1.00   0.319    -1.441572    4.427144
                 Others  |  -12.00018     1.0283   -11.67   0.000    -14.01561   -9.984751
                         |
                childnum |
              One Child  |   .4441028   .3680415     1.21   0.228    -.2772453    1.165451
           Two Children  |    .159522   .3825836     0.42   0.677    -.5903281    .9093721
 Three or more Children  |   .0907047   .4039654     0.22   0.822     -.701053    .8824624
                         |
                  period |
                Endline  |  -.4846891   .2863782    -1.69   0.091     -1.04598    .0766019
                         |
                 progexp |
     Intervention Group  |   .1447468   .3206002     0.45   0.652     -.483618    .7731117
-------------------------+----------------------------------------------------------------
                   /cut1 |  -11.46735   1.390522                     -14.19272   -8.741976
------------------------------------------------------------------------------------------
Note: 2 observations completely determined. Standard errors questionable.

. 
. margins, dydx(kd_did) vce(unconditional) subpop(kd_did) post

Average marginal effects                               Number of obs   = 1,168
                                                       Subpop. no. obs =   474

dy/dx wrt: kd_did

1._predict: Pr(savemoney==0), predict(pr outcome(0))
2._predict: Pr(savemoney==1), predict(pr outcome(1))

------------------------------------------------------------------------------
             |            Unconditional
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
kd_did       |
    _predict |
          1  |  -.0526479   .0191132    -2.75   0.006    -.0901091   -.0151867
          2  |   .0526479   .0191132     2.75   0.006     .0151867    .0901091
------------------------------------------------------------------------------

. 
. nlcom [kd_did]1._predict - [kd_did]2._predict

       _nl_1: [kd_did]1._predict - [kd_did]2._predict

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _nl_1 |  -.1052958   .0382265    -2.75   0.006    -.1802183   -.0303733
------------------------------------------------------------------------------


Clyde Schechter

Join Date: Apr 2014

Posts: 30168
#4

15 Jun 2023, 22:41

What I actually wanted are the predicted values at savemoney=0 and at savemoney=1, as shown below with -ologit-, but that does not seem to be what the -logit- results give.

In both the -logit- and -ologit- cases what -margins- is giving you is the marginal effect of variable kd_did on the probability of savemoney, not the expected values of savemoney.
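A quick arithmetic check on the -ologit- output above illustrates this. With a two-category outcome, Pr(savemoney==0) = 1 - Pr(savemoney==1), so the two marginal effects are equal in magnitude and opposite in sign, and the -nlcom- difference is simply twice the effect on Pr(savemoney==0). The numbers below are copied from that output; nothing is re-estimated.

Code:

* [kd_did]1._predict - [kd_did]2._predict, using the reported dy/dx values
display -.0526479 - .0526479
* the same quantity written as twice the effect on Pr(savemoney==0)
display 2 * (-.0526479)

Both lines display -.1052958, the _nl_1 value reported by -nlcom-, so that result is a doubled marginal effect rather than a difference between predicted probabilities.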

To get the expected values, the -margins- command should be:

Code:

margins, vce(unconditional) subpop(kd_did) post

By the way, the use of -subpop(kd_did)- means that your -margins- calculations are restricted to the observations for which variable kd_did is non-missing and also non-zero. You will be excluding any observations in which kd_did == 0 or missing value from the calculation. That's perfectly fine if that's what you intend, but it isn't something commonly done. So just wanted to make sure it wasn't a misunderstanding.
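To make the restriction concrete, here is a minimal sketch that spells out what -subpop(kd_did)- selects. It assumes the -logit- model from the original post has just been fit and that -margins- has not yet been run with -post-; the explicit -if- condition is written out only for illustration.

Code:

logit savemoney kd_did i.age i.period i.progexp [pw=kdweight] if state==2, vce(robust)

* Observations that subpop(kd_did) keeps: in the estimation sample,
* with kd_did nonzero and nonmissing
count if e(sample) & kd_did != 0 & !missing(kd_did)

* The same restriction written as an explicit condition
margins, vce(unconditional) subpop(if kd_did != 0 & !missing(kd_did))

The count can be compared with the "Subpop. no. obs" line that -margins- reports.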

Last edited by Clyde Schechter; 15 Jun 2023, 22:44.
Kehinde Atoloye

Join Date: Nov 2021

Posts: 119
#5

16 Jun 2023, 04:08

Originally posted by Clyde Schechter

By the way, the use of -subpop(kd_did)- means that your -margins- calculations are restricted to the observations for which variable kd_did is non-missing and also non-zero. You will be excluding any observations in which kd_did == 0 or missing value from the calculation. That's perfectly fine if that's what you intend, but it isn't something commonly done. So just wanted to make sure it wasn't a misunderstanding.

Yes, I am more interested in -subpop(kd_did)-, but below is what I get, rather than something similar to the -ologit- results.

Code:

margins, vce(unconditional) subpop(kd_did) post coeflegend

Predictive margins                                     Number of obs   = 1,168
                                                       Subpop. no. obs =   474

Expression: Pr(savemoney), predict()

------------------------------------------------------------------------------
             |     Margin   Legend
-------------+----------------------------------------------------------------
       _cons |    .983281  _b[_cons]
------------------------------------------------------------------------------
Clyde Schechter

Join Date: Apr 2014

Posts: 30168
#6

16 Jun 2023, 08:25

But, remember, as I pointed out in #4, what you got from -ologit- is not what you say you want. Had you done the -ologit- analysis correctly, it would have looked like this.
Kehinde Atoloye

Join Date: Nov 2021

Posts: 119
#7

18 Jun 2023, 06:14

Originally posted by Clyde Schechter

But, remember, as I pointed out in #4, what you got from -ologit- is not what you say you want. Had you done the -ologit- analysis correctly, it would have looked like this.

Perhaps I am not expressing myself clearly; pardon me. What I want is something similar to what I got from -ologit-. I am trying to replicate with -logit- what was done using -ologit-. Thank you.

My main goal is to achieve what was stated in https://www.sciencedirect.com/scienc...65176503000326, i.e. "The interaction effect, which is often the variable of interest in applied econometrics, cannot be evaluated simply by looking at the sign, magnitude, or statistical significance of the coefficient on the interaction term when the model is nonlinear. Instead, the interaction effect requires computing the cross derivative or cross difference. Like the marginal effect of a single variable, the magnitude of the interaction effect depends on all the covariates in the model. In addition, it can have ..."
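One common way to obtain that cross difference on the probability scale in Stata is sketched below. It assumes, and this is not stated in the thread, that kd_did is the product of period and progexp, so the model is refit with a factor-variable interaction in place of kd_did. The remaining covariates are those of the -logit- model in the original post.

Code:

* Assumption: kd_did == period * progexp, replaced here by i.period##i.progexp
logit savemoney i.period##i.progexp i.age [pw=kdweight] if state==2, vce(robust)

* Predicted probability of savemoney==1 in each period-by-group cell
margins period#progexp, vce(unconditional)

* Change from baseline to endline within each group; the difference between
* the two rows is the cross difference (the interaction effect in probabilities)
margins progexp, dydx(period) vce(unconditional)

Because period and progexp enter as factor variables, -dydx(period)- reports a discrete change in the predicted probability rather than a derivative.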

PS: Can a binary outcome of yes (1) and no (0) be considered ordinal in the context of an intervention study?

Last edited by Kehinde Atoloye; 18 Jun 2023, 07:01.
Clyde Schechter

Join Date: Apr 2014

Posts: 30168
#8

18 Jun 2023, 10:30

OK, the difference between the -margins- results for -ologit- and -logit- arises because with -ologit- you specified -subpop(kd_did)-, but with -logit- you specified -over(kd_did)-.

I believe what you actually want for both is the -over(kd_did)- specification. (Actually, I think you would be better off doing -margins kd_did, dydx(kd_did) vce(unconditional) post- for both, but I won't press that point.)
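A minimal sketch of both suggestions follows, reusing the variable names from the original post. One assumption to flag: putting kd_did in the marginlist requires that it be entered in the model as a factor variable, so the second model is refit with i.kd_did, which is a change from the model as originally posted.

Code:

* First suggestion: the same -over(kd_did)- specification (shown for -logit-;
* the identical -margins- line can follow the -ologit- fit)
logit savemoney kd_did i.age i.period i.progexp [pw=kdweight] if state==2, vce(robust)
margins, dydx(kd_did) vce(unconditional) over(kd_did) post coeflegend

* Parenthetical suggestion, assuming kd_did is refit as a factor variable
logit savemoney i.kd_did i.age i.period i.progexp [pw=kdweight] if state==2, vce(robust)
margins kd_did, dydx(kd_did) vce(unconditional) post coeflegend

The -coeflegend- option lists the names to use in any later -nlcom- call, which avoids guessing them.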