Issue with IOW mediation analysis method

Darina Peycheva

Join Date: Jul 2022

Posts: 23
#1

Issue with IOW mediation analysis method

12 Aug 2022, 03:52

Dear Statalist users,

I'm trying to estimate the mediating effect of several early life variables (e.g., birthweight standardized for gestation=bweightz, breastfeeding=bfed, etc.) through which smoking in pregnancy=smokingm may be affecting the menopause timing=early_menopause, controlling for social class at birth=bses. I'm using the IOW method described by Nguyen et al 2015 (10.1093/aje/kwu278), adopting their stata code. The method is also applied here: 10.1136/bmjopen-2018-026258 and stata code has been made available, so I've referred to this too.

My code is as follows (where the outcome is early menopause coded 0/1, the exposure is smoking in pregnancy coded 0/1, mediators - birthweight z-scores and breastfeeding coded 0/1, and social class at birth coded as 0/1 controlled for as a confounder) and n=2,450:

*IOW:
*Define a user-written program to estimate mediation parameters
capture program drop IOW
program IOW, rclass
capture drop logodds predprob inverseodds wt_iow

*Run exposure model
logit smokingm bweightz i.bfed i.bses,or

*Create inverse odds weights
predict logodds,xb
gen predprob=exp(logodds)/(1+exp(logodds))
gen inverseodds = ((1-predprob)/predprob)

gen wt_iow = 1 if smokingm==0
replace wt_iow = inverseodds if smokingm==1

*Estimate TE
glm early_menopause i.smokingm i.bses, fam(binomial) link(log) vce(robust) eform nolog base

matrix bb_total = e(b)
scalar b_total = bb_total[1,1]
return scalar b_total = bb_total[1,1]

*Estimate NDE
glm early_menopause i.smokingm i.bses [pweight=wt_iow], fam(binomial) link(log) vce(robust) eform nolog base

matrix bb_direct = e(b)
scalar b_direct = bb_direct[1,1]
return scalar b_direct = bb_direct[1,1]

*Estimate NIE
return scalar b_indirect = b_total-b_direct

end

* Request bootstrapped estimates of indirect, direct and total effects.
bootstrap r(b_indirect) r(b_direct) r(b_total), seed(32222) reps(1000) nodrop: IOW
estat bootstrap, all

This returns coefficients of 0 and omitted standard errors i.e. no bootstrap results, and I don't know why that is. I would really appreciate your help if you've had this issue too.

I'm trying to make this work using logistic regression, but will subsequently perform on imputed data and in a time to event setting (as in 10.1136/bmjopen-2018-026258). I have estimated these effects using ldecomp (Buis, M. 2010; 10.1177/1536867X1001000104), but noting the imputation and the survival setting, the method above will be more appropriate for my analysis, in my opinion.

Thanks for your help.

Kind regards,
Darina

This is the output from dataex if this could help:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input float(bweightz early_menopause) byte(bses bfed smokingm _randomtag) -.210943 0 0 0 0 0 -1.0437096 0 1 0 1 0 -1.048419 0 0 1 0 0 -.9111431 0 1 0 1 0 -2.2724223 0 1 0 1 0 -.14652179 0 1 0 1 0 -.710833 1 1 1 1 0 .3924279 0 1 1 0 0 .454743 0 0 0 1 0 1.4280024 0 0 0 1 0 .0467419 0 1 1 1 0 1.5069565 0 0 1 0 0 -.2627374 0 1 0 1 0 .7768492 0 1 0 0 0 .3503493 0 0 1 0 0 -.3349975 0 1 0 0 0 .45871115 0 0 0 1 0 -1.9937698 0 1 0 1 0 -.56668586 0 1 0 1 0 -.4471542 0 1 1 0 0 .7261847 0 1 0 1 0 .45871115 0 0 0 1 0 -.14652179 0 1 0 0 0 .5691833 0 0 0 0 0 .06101154 0 1 0 0 0 -1.1128402 0 0 0 1 0 -.18289313 0 1 0 1 0 .34242785 0 0 0 1 0 -1.0054715 0 1 0 1 0 -.382733 0 1 0 0 0 -.210943 1 1 0 1 0 -.2627374 0 1 0 0 0 -.5576323 0 0 0 1 0 .08310597 0 1 1 0 0 -.017679332 0 0 1 0 0 -.7771568 0 1 0 1 0 -.6013076 0 1 0 1 0 -1.1275387 0 1 0 1 0 .04121107 0 1 1 0 0 .4944934 0 1 0 0 0 -.5339692 0 0 1 1 0 .21853185 0 0 0 0 0 -.56668586 0 0 1 0 0 -1.0937463 1 1 0 1 0 .39311635 1 0 0 1 0 2.167214 0 1 0 1 0 -.5115754 0 1 1 0 0 -.7051345 0 0 0 0 0 1.356198 0 1 1 1 0 .43200195 0 1 0 1 0 -.210943 0 0 1 0 0 .0467419 1 1 0 0 0 .6056868 0 0 1 1 0 -1.6181647 0 1 0 1 0 -1.1444348 0 1 0 0 0 .05519302 0 1 1 1 0 .10588152 1 0 0 1 0 .5691833 0 0 0 0 0 -.5115754 0 1 1 0 0 -.4676453 0 1 0 1 0 .08310597 0 1 1 0 0 .43200195 0 1 1 1 0 -.7343877 0 1 1 0 0 .34242785 0 1 0 0 0 .3261446 0 1 0 0 0 -.8766291 0 1 0 0 0 -1.6181647 0 1 0 1 0 -1.0437096 0 0 1 0 0 1.0774815 0 1 0 0 0 -.5576323 0 1 0 1 0 -.210943 0 1 0 1 0 .20725852 0 1 0 0 0 .6354665 0 1 1 1 0 -.8434807 0 0 0 1 0 -1.1762762 0 1 0 0 0 -1.5509355 0 1 0 1 0 -.1813533 0 1 1 0 0 -.7048391 1 1 0 1 0 .10716172 0 0 1 0 0 -.57599664 0 1 0 1 0 .51916426 0 1 1 0 0 -.15813383 0 0 0 1 0 2.712342 0 1 0 0 0 -.57599664 0 1 0 0 0 .454743 0 0 0 0 0 -.3349975 0 1 0 0 0 -.9774264 0 1 0 1 0 -2.839825 0 0 0 0 0 -1.370525 0 0 0 0 0 .21770154 0 1 1 0 0 .3061334 0 1 0 1 0 .2156725 0 0 0 0 0 -.7389268 0 1 0 0 0 -.4013214 0 1 0 0 0 .5691833 0 1 0 1 0 .39311635 0 1 0 0 0 -.27536425 0 1 0 1 0 -.3587825 0 1 0 1 0 -1.1529922 0 1 1 0 0 .454743 0 0 1 0 0 end label values bses manual_nof label def manual_nof 0 "non-manual", modify label def manual_nof 1 "manual or no father figure", modify label values bfed bfed label def bfed 0 "never_or_<1m", modify label def bfed 1 "1m+", modify
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#2

12 Aug 2022, 09:15

The problem is probably with your data. In the example data, we see that your output variable, early_menopause, is uncommon, occurring in just 7 out of the 100 observations shown. On top of that, it is very strongly associated in the data with both bses (manual) and smokingm (1). In other words, in your example data, there is only one observation with both smokingm = 0 and early_menopause = 1, and only two with bses = non-manual and early_menopause = 1. This means that in many bootstrap samples, there will be no such observations at all, just by the luck of the draw. Even when that 1 observation is included, it is inescapable that the smokingm variable then perfectly predicts the outcome, which entails that the smokingm variable must be omitted from the model. In the affected samples, the regressions you are trying to estimate cannot be carried out because of these gaps in the data, and the IOW program returns missing values because the coefficients from which it is supposed to calculate the effects you want do not exist. Then when -bootstrap- tries to put it all together, there are not enough valid samples for it use (with the example data there aren't any at all), so it gives you the disappointing results you see.

The clue that something like this was happening is that if you look at the output you will see:

Code:

Bootstrap replications (100) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .xx.x.x.xx...x.....xxxxxxx...x...xxx..x.x....x..x. 50 xx.x...x.....xxxx..xx....xx...x.x.xxx......x..xxxx 100 [remaining iterations redacted]

Each x (in the actual Stata output they are in red) indicates that IOW was unable to return proper results in that replication. You can see that the majority of the replications are failing. You can see the details of how they all fail by adding the -noisily- option to your -bootstrap- command.

In your full sample of 2,450 perhaps the proportion of failed replications is not so large, but apparently still large enough to be a problem.

In any case, the root of your problem is the rarity of the outcome variable compounded by its strong association with two of your explanatory variables. It is possible that with a data set orders of magnitude larger, were you able to obtain such a data set, you might be able to use this methodology and get results. But perhaps not even then. I think that you either have to find another methodology, or go out and get a different data set that oversamples women with early menopause, especially women with early menopause who are non-manual ses and did not smoke during pregnancy. You would then also have to adjust the analysis to reflect the over-sampling.

Last edited by Clyde Schechter; 12 Aug 2022, 09:18.
Comment
Darina Peycheva

Join Date: Jul 2022

Posts: 23
#3

12 Aug 2022, 10:10

Thanks, that's very helpful! The outcome is rare (up to 10% in the population) and all variables are known risk factors for early menopause. In previous analysis I pooled this survey sample with another survey to increase the number of women with early menopause. I intend to do the same for this analysis, but wanted to first make it work on each sample separately. Hopefully, this will work when attempted on the pooled sample (around 6,000 women, around 8% with early menopause).
Comment
Darina Peycheva

Join Date: Jul 2022

Posts: 23
#4

13 Aug 2022, 12:15

Hello again, I tried this on a sample of 7,000 women and the outcome was the same. I then tried it with different set of variables (different exposure, outcome, mediator, not as strongly correlated) even with a different dataset (17,000 obs) and the outcome was the same. Also, the ldecomp command also uses bootstrap to compute standard errors and this runs smoothly. Any other thoughts will be greatly appreciated.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#5

13 Aug 2022, 12:37

Add the -dots- option to your -bootstrap- prefix. What do the "dots" from bootstrap show? Remember that every x is a failed iteration. Add the -noisily- option of the -bootstrap- prefix to get the regressions to print out. (Do this with a small number of iterations, not 1000, so you don't drown in output.) That will show you what is going wrong. I still think the problem is that for many of the bootstrap samples, you are getting perfect prediction or 0 observations of a level of an explanatory variable. But, no need to speculate about that: the -noisily- output will show you what's going on. Remember, also, that you have three logistic regressions in program IOW. If any one of them fails, the whole thing fails.
Comment
Darina Peycheva

Join Date: Jul 2022

Posts: 23
#6

14 Aug 2022, 10:18

Thanks very much for your help! With the -dots- option of the -bootstrap- there's no indication that the replications are failing (no 'x'); with the -noisily- option I see that missing values are generated (only) when the first model is performed (regressing the mediator=birthweight z-scores on the exposure=smoking in pregnancy). This is performed on a dataset with no missing observations (i.e. missing observations were dropped).

I went ahead and imputed the missing values (in the covariates but not the outcome) and set up the code in a time to event setting following the examples in: Sheikh et al 2017 (http://dx.doi.org/10.1136/jech-2016-208671 / Supplementary material) and Hossin et al 2019 (https://bmjopen.bmj.com/content/11/4/e026258corr1 / Supplementary material) as slightly different setting and coding were used for weighted Poisson regression models with IOW.

Following Sheikh's approach I had the following error on the bootstrap estimations (i.e. no estimation starts):
file miest.ster already exists
an error occurred when bootstrap executed IOWMI
r(602);

There are posts on this error so I'll go through these as see if I can fix this issue myself.

Following Hossin's approach bootstrap estimations start, missing values are generated when the first model is performed (regressing the mediator=birthweight z-scores on the exposure=smoking in pregnancy), followed by the same error;
file miest01.ster already exists
an error occurred when bootstrap executed IOW
r(602);

I.e. the same type of error and missing observations dropped in the first regression model.

Could the problem be with 'return results' commands preceding the bootstrap estimations?

Many thanks,
Darina

----
The code I used is not much different from the code shared above, but accounts for MI; the imputation method was MICE:

Sheikh et al 2017 (adapted):
clear all
use "\\ad.ucl.ac.uk\homep\sejjdnp\Documents\Darina-Data\Menopause and Smoking\Data & Code\ncds-imputed.dta",clear
*Define a user-written program to estimate mediation parameters
capture program drop IOWMI
program IOWMI, rclass
capture drop logodds predprob inverseodds wt_iow

*Run exposure model
mi estimate, saving(miest): logit smokingm stdbweight

*Create inverse odds weights
mi predict logodds using miest,xb
mi passive: gen predprob=exp(logodds)/(1+exp(logodds))
mi passive: gen inverseodds = ((1-predprob)/predprob)

mi passive: gen wt_iow = 1 if smokingm==0
mi passive: replace wt_iow = inverseodds if smokingm==1

*Estimate TE
mi estimate, saving(miest1) eform post: glm nmeno i.smokingm, fam(poisson) link(log) vce(robust)

matrix bb_total = e(b_mi)
scalar b_total = bb_total[1,1]
return scalar b_total = bb_total[1,1]

*Estimate NDE
mi estimate, saving(miest2) eform post: glm nmeno i.smokingm [pweight=wt_iow], fam(poisson) link(log) vce(robust)

matrix bb_direct = e(b_mi)
scalar b_direct = bb_direct[1,1]
return scalar b_direct = bb_direct[1,1]

*Estimate NIE
return scalar b_indirect = b_total-b_direct

end

* Request bootstrapped estimates of indirect, direct and total effects.
bootstrap exp(r(b_indirect)) exp(r(b_direct)) exp(r(b_total)), seed(12345) reps(5) noisily: IOWMI
estat bootstrap, all

*Hossin et al 2019 (adapted code):
mi stset timeout, failure(meno) scale(12) id(ID)

drop _mi_id
*Define a user-written program to estimate mediation parameters
capture program drop IOW
program IOW, rclass
capture drop logodds predprob inverseodds wt_iow

*Run exposure model
logit smokingm stdbweight

*Create inverse odds weights
predict logodds,xb
gen predprob=exp(logodds)/(1+exp(logodds))
gen inverseodds = ((1-predprob)/predprob)

gen wt_iow = 1 if smokingm==0
replace wt_iow = inverseodds if smokingm==1

*Estimate TE
mi estimate, saving(miest01): glm _d i.smokingm, fam(poisson) link(log) vce(robust) eform nolog base

matrix bb_total = e(b)
scalar b_total = bb_total[1,1]
return scalar b_total = bb_total[1,1]

*Estimate NDE
mi estimate, saving(miest02): glm _d i.smokingm [pweight=wt_iow], fam(poisson) link(log) vce(robust) eform nolog base

matrix bb_direct = e(b)
scalar b_direct = bb_direct[1,1]
return scalar b_direct = bb_direct[1,1]

*Estimate NIE
return scalar b_indirect = b_total-b_direct

end

* Request bootstrapped estimates of indirect, direct and total effects.
bootstrap r(b_indirect) r(b_direct) r(b_total), seed(12345) reps(5) noisily: IOW
estat bootstrap, all
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#7

14 Aug 2022, 10:29

I have not reviewed all of the new code, but the line

Code:

mi estimate, saving(miest): logit smokingm stdbweight

is clearly wrong and is leading to the errors about miest.ster already existing. This line of code will work when -bootstrap- runs it the first time (which is for the original, overall sample). But, then having created miest.ster, the next time through, Stata will see that the miest.ster dataset already exists and will refuse to overwrite it. To overcome this you have to give the -replace- suboption to tell Stata that it is OK to overwrite the existing file.

Code:

mi estimate, saving(miest, replace): logit smokingm stdbweight

I think you will run into the same problem with the -mi estimate- commands used in the *Estimate TE and *Estimate NDE sections of the code.

General principle: Whenever you run a command that iterates a program (e.g. -bootstrap-, -simulate-) if the program saves a data set, you need to either have a -replace- option/suboption to allow that data set to be overwritten at each iteration, or if you actually need to keep all those data sets, then the program needs some mechanism to assign a different name to the data set each time.
Comment
Darina Peycheva

Join Date: Jul 2022

Posts: 23
#8

14 Aug 2022, 10:53

Thanks so much! The 'replace' option fixed the 'miest' issue indeed. And, yes, the bootstrap continues to remove observations when performing the first regression. However, the coefficients and CIs changed from 0 to 1. This was performed on imputed sample of one of the data sets I have. I will now impute the other and pool together again and see if anything changes (before I give up).

The second time to event approach (Hossin et al 2019) still results in an error but this time as follows:
'r(b_indirect)' evaluated to missing in full sample
r(322);

This is why I though there may be something with the return results commands?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#9

14 Aug 2022, 11:26

Well, since b_indirect = b_total - b_direct, then b_total, or b_direct (or both) must also be missing in the full sample. I suggest you just run program IOW on your full data set (without bootstrapping) to see what is going on there. Pay careful attention to any error messages, and also to any warning messages that appear in above the regression output table. List out the matrices bb_total and bb_direct to see what is in them.
Comment
Darina Peycheva

Join Date: Jul 2022

Posts: 23
#10

15 Aug 2022, 10:54

Thanks. All regression models run smoothly without warnings or error messages. The matrices bb_total and bb_direct display correctly the model coefficients. I'm looking into scalar and return scalar as scalar list shows b_direct = 0 and b_total = 0.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#11

15 Aug 2022, 13:34

Wait! I see the problem. I'm really embarrassed that I didn't see this in the first place, and I apologize for delaying your work and wasting your time. The problem is that there is no e(b) after -mi estimate-. You need e(b_mi).
Comment
Darina Peycheva

Join Date: Jul 2022

Posts: 23
#12

15 Aug 2022, 14:18

Thanks so much, I really appreciate the help and I'm sorry for wasting your time with this! Yes, '_mi' was missing and adding it let me run the bootstrap without an error message, but the estimated coefficients remain '0s'. I went through the first part of the code today, ignoring the bootstrap completely as well as the imputation (i.e. focused on IOW rather than), and all seems to run smoothly but the stored/returned total and direct effects which come back as '0s'. The only line of this code that I cannot run independently is: ' return scalar b_direct = bb_direct[1,1]' as results in: 'non r-class program may not set r()'. But this should be because it needs the 'program IOW, rclass' and 'end' lines. Is this right? The output of the remaining lines of code seems sensible to me and with no error or warning messages.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30355

#13

15 Aug 2022, 14:30

The only line of this code that I cannot run independently is: ' return scalar b_direct = bb_direct[1,1]' as results in: 'non r-class program may not set r()'. But this should be because it needs the 'program IOW, rclass' and 'end' lines. Is this right?

Yes, that's right.

So, putting the -bootstrap- aside and get IOW working correctly on its own first is a good strategy. I would test it by running this sequence:

Code:

IOW
return list
mat list bb_total bb_direct
scalar list b_total b_direct b_indirect

These should all match up correctly with each other and with the regression outputs you see from IOW itself. Wherever you find the first discrepancy will help you localize the source of the problem.

I wish I could see what is going wrong here, but I don't. A simplified version mimicking your code runs fine on my setup:

Code:

. clear*

.
. sysuse auto
(1978 automobile data)

.
. capture program drop my_program

. program define my_program, rclass
  1.     regress price mpg headroom
  2.     matrix bb_total = e(b)
  3.     scalar b_total = bb_total[1,1]
  4.     return scalar b_total = bb_total[1,1]
  5.     exit
  6. end

.
. my_program

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     10.44
       Model |   144280501         2  72140250.4   Prob > F        =    0.0001
    Residual |   490784895        71  6912463.32   R-squared       =    0.2272
-------------+----------------------------------   Adj R-squared   =    0.2054
       Total |   635065396        73  8699525.97   Root MSE        =    2629.2

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mpg |  -259.1057   58.42485    -4.43   0.000    -375.6015   -142.6098
    headroom |  -334.0215   399.5499    -0.84   0.406    -1130.701    462.6585
       _cons |   12683.31   2074.497     6.11   0.000     8546.885    16819.74
------------------------------------------------------------------------------

. return list

scalars:
            r(b_total) =  -259.1056595643832

.
. display r(b_total)
-259.10566

. matrix list bb_total

bb_total[1,3]
           mpg    headroom       _cons
y1  -259.10566  -334.02147   12683.315

. scalar list b_total
   b_total = -259.10566

.
. bootstrap r(b_total), reps(25): my_program
(running my_program on estimation sample)

Bootstrap replications (25)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.........................

Bootstrap results                                           Number of obs = 74
                                                            Replications  = 25

      Command: my_program
        _bs_1: r(b_total)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -259.1057   59.71616    -4.34   0.000    -376.1472   -142.0641
------------------------------------------------------------------------------

. estat bootstrap

Bootstrap results                               Number of obs     =         74
                                                Replications      =         25

      Command: my_program
        _bs_1: r(b_total)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             | coefficient       Bias    std. err.  [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -259.10566   1.410509   59.716161   -340.3522   -118.136  (BC)
------------------------------------------------------------------------------
Key: BC: Bias-corrected

.

Comment

Darina Peycheva

Join Date: Jul 2022
Posts: 23

#14

16 Aug 2022, 04:58

Thanks. If you run this on the 100 cases example dataset, it seems that there's a problem with the scalar function. (Please ignore for a moment the sample size, generated missing values, 'x's in bootstrap estimation etc as these perform ok on a full dataset).

Code:

capture program drop IOW
program IOW, rclass
capture drop logodds predprob inverseodds wt_iow

*Run exposure model
logit smokingm i.bfed,or

*Create inverse odds weights
predict logodds,xb
gen predprob=exp(logodds)/(1+exp(logodds))
gen inverseodds = ((1-predprob)/predprob)

gen wt_iow = 1 if smokingm==0 
replace wt_iow = inverseodds if smokingm==1

*Estimate TE
glm early_menopause i.smokingm, fam(binomial) link(log) vce(robust)  nolog base

matrix bb_total = e(b) 
*matrix list e(b)

scalar b_total = bb_total[1,1]
return scalar b_total = bb_total[1,1]

*matrix list r(table)

*display b_total
*scalar list

*Estimate NDE
glm early_menopause i.smokingm [pw=wt_iow], fam(binomial) link(log) vce(robust)  nolog base

matrix bb_direct=e(b)
*matrix list e(b) 

scalar b_direct = bb_direct[1,1]
return scalar b_direct = bb_direct[1,1] 
*return list

*matrix list r(table)

*display b_direct
*scalar list  

*Estimate NIE
return scalar b_indirect = b_total-b_direct 

exit
end


IOW
return list
mat list bb_total bb_direct
scalar list b_total b_direct b_indirect

If you then run the following:

Code:

capture program drop my_program
program define my_program, rclass  
*Estimate TE
glm early_menopause i.smokingm, fam(binomial) link(log) vce(robust)  nolog base
matrix bb_total = e(b) 
scalar b_total = bb_total[1,1]
return scalar b_total = bb_total[1,1]
exit
end

my_program
return list
display r(b_total)
matrix list bb_total
scalar list b_total
bootstrap r(b_total), reps(25): my_program

You see the same, scalar values not being stored/returned. Matrix list works ok.

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30355

#15

16 Aug 2022, 11:33

You see the same, scalar values not being stored/returned. Matrix list works ok.

I cannot replicate your finding. When I do this, the output is:

Code:

. my_program

Generalized linear models                         Number of obs   =        100
Optimization     : ML                             Residual df     =         98
                                                  Scale parameter =          1
Deviance         =  46.27797963                   (1/df) Deviance =   .4722243
Pearson          =  99.99988662                   (1/df) Pearson  =   1.020407

Variance function: V(u) = u*(1-u)                 [Bernoulli]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   .5027798
Log pseudolikelihood = -23.13898981               BIC             =  -405.0287

------------------------------------------------------------------------------
             |               Robust
early_meno~e | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    smokingm |
          0  |          0  (base)
          1  |   1.831763   1.066783     1.72   0.086    -.2590941    3.922619
             |
       _cons |  -3.931823   .9951334    -3.95   0.000    -5.882249   -1.981398
------------------------------------------------------------------------------

. return list

scalars:
            r(b_total) =  0

. display r(b_total)
0

. matrix list bb_total

bb_total[1,3]
     early_men~e:  early_men~e:  early_men~e:
              0b.            1.              
        smokingm      smokingm         _cons
y1             0     1.8317625    -3.9318234

. scalar list b_total
   b_total =          0

. bootstrap r(b_total), reps(25): my_program
(running my_program on estimation sample)

Bootstrap replications (25)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
....x.....xx.x..xxxx....x

Bootstrap results                                          Number of obs = 100
                                                           Replications  =  16

      Command: my_program
        _bs_1: r(b_total)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |          0  (omitted)
------------------------------------------------------------------------------
Note: One or more parameters could not be estimated in 9 bootstrap replicates;
      standard-error estimates include only complete replications.

You can see that r(b_total), bb_total[1,1], and scalar b_total are all 0. So the code is correctly maintaining the value in all of those places.

I think the problem is that bb_total[1,1] is not the parameter you want: bb_total[1,1] is the coefficient of 0.smokingm, and since 0 is the base category for that variable, its coefficient is, by definition, 0. I think you want the coefficient of 1.smokingm, which will be found in bb_total[1, 2]. If you make that change, you get:

Code:

. my_program

Generalized linear models                         Number of obs   =        100
Optimization     : ML                             Residual df     =         98
                                                  Scale parameter =          1
Deviance         =  46.27797963                   (1/df) Deviance =   .4722243
Pearson          =  99.99988662                   (1/df) Pearson  =   1.020407

Variance function: V(u) = u*(1-u)                 [Bernoulli]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   .5027798
Log pseudolikelihood = -23.13898981               BIC             =  -405.0287

------------------------------------------------------------------------------
             |               Robust
early_meno~e | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    smokingm |
          0  |          0  (base)
          1  |   1.831763   1.066783     1.72   0.086    -.2590941    3.922619
             |
       _cons |  -3.931823   .9951334    -3.95   0.000    -5.882249   -1.981398
------------------------------------------------------------------------------

. return list

scalars:
            r(b_total) =  1.831762535421762

. display r(b_total)
1.8317625

. matrix list bb_total

bb_total[1,3]
     early_men~e:  early_men~e:  early_men~e:
              0b.            1.              
        smokingm      smokingm         _cons
y1             0     1.8317625    -3.9318234

. scalar list b_total
   b_total =  1.8317625

And following that up with the -bootstrap-:

Code:

. my_program

Generalized linear models                         Number of obs   =        100
Optimization     : ML                             Residual df     =         98
                                                  Scale parameter =          1
Deviance         =  46.27797963                   (1/df) Deviance =   .4722243
Pearson          =  99.99988662                   (1/df) Pearson  =   1.020407

Variance function: V(u) = u*(1-u)                 [Bernoulli]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   .5027798
Log pseudolikelihood = -23.13898981               BIC             =  -405.0287

------------------------------------------------------------------------------
             |               Robust
early_meno~e | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    smokingm |
          0  |          0  (base)
          1  |   1.831763   1.066783     1.72   0.086    -.2590941    3.922619
             |
       _cons |  -3.931823   .9951334    -3.95   0.000    -5.882249   -1.981398
------------------------------------------------------------------------------

. return list

scalars:
            r(b_total) =  1.831762535421762

. display r(b_total)
1.8317625

. matrix list bb_total

bb_total[1,3]
     early_men~e:  early_men~e:  early_men~e:
              0b.            1.              
        smokingm      smokingm         _cons
y1             0     1.8317625    -3.9318234

. scalar list b_total
   b_total =  1.8317625

. bootstrap r(b_total), reps(25): my_program
(running my_program on estimation sample)

Bootstrap replications (25)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.xx.xxx......xx.xx..x...x

Bootstrap results                                          Number of obs = 100
                                                           Replications  =  14

      Command: my_program
        _bs_1: r(b_total)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |   1.831763   .7508921     2.44   0.015      .360041    3.303484
------------------------------------------------------------------------------
Note: One or more parameters could not be estimated in 11 bootstrap replicates;
      standard-error estimates include only complete replications.

By the way, one of the pitfalls of accessing coefficients through e(b) is that you need to know which column contains the information on the variable you are interested in. It is easy to get that wrong. A more robust way to code this is -scalar b_total = _b[1.smokingm]-, because the pseudo-matrix _b allows you to refer to the coefficients by name, rather than location.

Added: Concerning my last paragraph above, you can also do something like this working with matrix bb_total, but it requires you to know the full equation name of the variable: -scalar b_total = _b["y1", "early_menopause:1.smokingm"]-. (_b["y1", "1.smokingm"] will give you an error message.) Aside from this being extra typing, it can fail if you don't know exactly how your particular regression command names the columns in e(b), and it sometimes differs in different commands. Working with _b[ ] is simpler because it does not require this level of detail.

Last edited by Clyde Schechter; 16 Aug 2022, 11:39.

Announcement