Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Issue with IOW mediation analysis method

    Dear Statalist users,

    I'm trying to estimate the mediating effect of several early life variables (e.g., birthweight standardized for gestation=bweightz, breastfeeding=bfed, etc.) through which smoking in pregnancy=smokingm may be affecting the menopause timing=early_menopause, controlling for social class at birth=bses. I'm using the IOW method described by Nguyen et al 2015 (10.1093/aje/kwu278), adopting their stata code. The method is also applied here: 10.1136/bmjopen-2018-026258 and stata code has been made available, so I've referred to this too.

    My code is as follows (where the outcome is early menopause coded 0/1, the exposure is smoking in pregnancy coded 0/1, mediators - birthweight z-scores and breastfeeding coded 0/1, and social class at birth coded as 0/1 controlled for as a confounder) and n=2,450:

    *IOW:
    *Define a user-written program to estimate mediation parameters
    capture program drop IOW
    program IOW, rclass
    capture drop logodds predprob inverseodds wt_iow

    *Run exposure model
    logit smokingm bweightz i.bfed i.bses,or

    *Create inverse odds weights
    predict logodds,xb
    gen predprob=exp(logodds)/(1+exp(logodds))
    gen inverseodds = ((1-predprob)/predprob)

    gen wt_iow = 1 if smokingm==0
    replace wt_iow = inverseodds if smokingm==1

    *Estimate TE
    glm early_menopause i.smokingm i.bses, fam(binomial) link(log) vce(robust) eform nolog base

    matrix bb_total = e(b)
    scalar b_total = bb_total[1,1]
    return scalar b_total = bb_total[1,1]

    *Estimate NDE
    glm early_menopause i.smokingm i.bses [pweight=wt_iow], fam(binomial) link(log) vce(robust) eform nolog base

    matrix bb_direct = e(b)
    scalar b_direct = bb_direct[1,1]
    return scalar b_direct = bb_direct[1,1]

    *Estimate NIE
    return scalar b_indirect = b_total-b_direct

    end

    * Request bootstrapped estimates of indirect, direct and total effects.
    bootstrap r(b_indirect) r(b_direct) r(b_total), seed(32222) reps(1000) nodrop: IOW
    estat bootstrap, all

    This returns coefficients of 0 and omitted standard errors i.e. no bootstrap results, and I don't know why that is. I would really appreciate your help if you've had this issue too.

    I'm trying to make this work using logistic regression, but will subsequently perform on imputed data and in a time to event setting (as in 10.1136/bmjopen-2018-026258). I have estimated these effects using ldecomp (Buis, M. 2010; 10.1177/1536867X1001000104), but noting the imputation and the survival setting, the method above will be more appropriate for my analysis, in my opinion.

    Thanks for your help.

    Kind regards,
    Darina

    This is the output from dataex if this could help:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(bweightz early_menopause) byte(bses bfed smokingm _randomtag)
       -.210943 0 0 0 0 0
     -1.0437096 0 1 0 1 0
      -1.048419 0 0 1 0 0
      -.9111431 0 1 0 1 0
     -2.2724223 0 1 0 1 0
     -.14652179 0 1 0 1 0
       -.710833 1 1 1 1 0
       .3924279 0 1 1 0 0
        .454743 0 0 0 1 0
      1.4280024 0 0 0 1 0
       .0467419 0 1 1 1 0
      1.5069565 0 0 1 0 0
      -.2627374 0 1 0 1 0
       .7768492 0 1 0 0 0
       .3503493 0 0 1 0 0
      -.3349975 0 1 0 0 0
      .45871115 0 0 0 1 0
     -1.9937698 0 1 0 1 0
     -.56668586 0 1 0 1 0
      -.4471542 0 1 1 0 0
       .7261847 0 1 0 1 0
      .45871115 0 0 0 1 0
     -.14652179 0 1 0 0 0
       .5691833 0 0 0 0 0
      .06101154 0 1 0 0 0
     -1.1128402 0 0 0 1 0
     -.18289313 0 1 0 1 0
      .34242785 0 0 0 1 0
     -1.0054715 0 1 0 1 0
       -.382733 0 1 0 0 0
       -.210943 1 1 0 1 0
      -.2627374 0 1 0 0 0
      -.5576323 0 0 0 1 0
      .08310597 0 1 1 0 0
    -.017679332 0 0 1 0 0
      -.7771568 0 1 0 1 0
      -.6013076 0 1 0 1 0
     -1.1275387 0 1 0 1 0
      .04121107 0 1 1 0 0
       .4944934 0 1 0 0 0
      -.5339692 0 0 1 1 0
      .21853185 0 0 0 0 0
     -.56668586 0 0 1 0 0
     -1.0937463 1 1 0 1 0
      .39311635 1 0 0 1 0
       2.167214 0 1 0 1 0
      -.5115754 0 1 1 0 0
      -.7051345 0 0 0 0 0
       1.356198 0 1 1 1 0
      .43200195 0 1 0 1 0
       -.210943 0 0 1 0 0
       .0467419 1 1 0 0 0
       .6056868 0 0 1 1 0
     -1.6181647 0 1 0 1 0
     -1.1444348 0 1 0 0 0
      .05519302 0 1 1 1 0
      .10588152 1 0 0 1 0
       .5691833 0 0 0 0 0
      -.5115754 0 1 1 0 0
      -.4676453 0 1 0 1 0
      .08310597 0 1 1 0 0
      .43200195 0 1 1 1 0
      -.7343877 0 1 1 0 0
      .34242785 0 1 0 0 0
       .3261446 0 1 0 0 0
      -.8766291 0 1 0 0 0
     -1.6181647 0 1 0 1 0
     -1.0437096 0 0 1 0 0
      1.0774815 0 1 0 0 0
      -.5576323 0 1 0 1 0
       -.210943 0 1 0 1 0
      .20725852 0 1 0 0 0
       .6354665 0 1 1 1 0
      -.8434807 0 0 0 1 0
     -1.1762762 0 1 0 0 0
     -1.5509355 0 1 0 1 0
      -.1813533 0 1 1 0 0
      -.7048391 1 1 0 1 0
      .10716172 0 0 1 0 0
     -.57599664 0 1 0 1 0
      .51916426 0 1 1 0 0
     -.15813383 0 0 0 1 0
       2.712342 0 1 0 0 0
     -.57599664 0 1 0 0 0
        .454743 0 0 0 0 0
      -.3349975 0 1 0 0 0
      -.9774264 0 1 0 1 0
      -2.839825 0 0 0 0 0
      -1.370525 0 0 0 0 0
      .21770154 0 1 1 0 0
       .3061334 0 1 0 1 0
       .2156725 0 0 0 0 0
      -.7389268 0 1 0 0 0
      -.4013214 0 1 0 0 0
       .5691833 0 1 0 1 0
      .39311635 0 1 0 0 0
     -.27536425 0 1 0 1 0
      -.3587825 0 1 0 1 0
     -1.1529922 0 1 1 0 0
        .454743 0 0 1 0 0
    end
    label values bses manual_nof
    label def manual_nof 0 "non-manual", modify
    label def manual_nof 1 "manual or no father figure", modify
    label values bfed bfed
    label def bfed 0 "never_or_<1m", modify
    label def bfed 1 "1m+", modify

  • #2
    The problem is probably with your data. In the example data, we see that your output variable, early_menopause, is uncommon, occurring in just 7 out of the 100 observations shown. On top of that, it is very strongly associated in the data with both bses (manual) and smokingm (1). In other words, in your example data, there is only one observation with both smokingm = 0 and early_menopause = 1, and only two with bses = non-manual and early_menopause = 1. This means that in many bootstrap samples, there will be no such observations at all, just by the luck of the draw. Even when that 1 observation is included, it is inescapable that the smokingm variable then perfectly predicts the outcome, which entails that the smokingm variable must be omitted from the model. In the affected samples, the regressions you are trying to estimate cannot be carried out because of these gaps in the data, and the IOW program returns missing values because the coefficients from which it is supposed to calculate the effects you want do not exist. Then when -bootstrap- tries to put it all together, there are not enough valid samples for it use (with the example data there aren't any at all), so it gives you the disappointing results you see.

    The clue that something like this was happening is that if you look at the output you will see:
    Code:
    Bootstrap replications (100)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    .xx.x.x.xx...x.....xxxxxxx...x...xxx..x.x....x..x.    50
    xx.x...x.....xxxx..xx....xx...x.x.xxx......x..xxxx   100
    [remaining iterations redacted]
    Each x (in the actual Stata output they are in red) indicates that IOW was unable to return proper results in that replication. You can see that the majority of the replications are failing. You can see the details of how they all fail by adding the -noisily- option to your -bootstrap- command.

    In your full sample of 2,450 perhaps the proportion of failed replications is not so large, but apparently still large enough to be a problem.

    In any case, the root of your problem is the rarity of the outcome variable compounded by its strong association with two of your explanatory variables. It is possible that with a data set orders of magnitude larger, were you able to obtain such a data set, you might be able to use this methodology and get results. But perhaps not even then. I think that you either have to find another methodology, or go out and get a different data set that oversamples women with early menopause, especially women with early menopause who are non-manual ses and did not smoke during pregnancy. You would then also have to adjust the analysis to reflect the over-sampling.
    Last edited by Clyde Schechter; 12 Aug 2022, 09:18.

    Comment


    • #3
      Thanks, that's very helpful! The outcome is rare (up to 10% in the population) and all variables are known risk factors for early menopause. In previous analysis I pooled this survey sample with another survey to increase the number of women with early menopause. I intend to do the same for this analysis, but wanted to first make it work on each sample separately. Hopefully, this will work when attempted on the pooled sample (around 6,000 women, around 8% with early menopause).

      Comment


      • #4
        Hello again, I tried this on a sample of 7,000 women and the outcome was the same. I then tried it with different set of variables (different exposure, outcome, mediator, not as strongly correlated) even with a different dataset (17,000 obs) and the outcome was the same. Also, the ldecomp command also uses bootstrap to compute standard errors and this runs smoothly. Any other thoughts will be greatly appreciated.

        Comment


        • #5
          Add the -dots- option to your -bootstrap- prefix. What do the "dots" from bootstrap show? Remember that every x is a failed iteration. Add the -noisily- option of the -bootstrap- prefix to get the regressions to print out. (Do this with a small number of iterations, not 1000, so you don't drown in output.) That will show you what is going wrong. I still think the problem is that for many of the bootstrap samples, you are getting perfect prediction or 0 observations of a level of an explanatory variable. But, no need to speculate about that: the -noisily- output will show you what's going on. Remember, also, that you have three logistic regressions in program IOW. If any one of them fails, the whole thing fails.

          Comment


          • #6
            Thanks very much for your help! With the -dots- option of the -bootstrap- there's no indication that the replications are failing (no 'x'); with the -noisily- option I see that missing values are generated (only) when the first model is performed (regressing the mediator=birthweight z-scores on the exposure=smoking in pregnancy). This is performed on a dataset with no missing observations (i.e. missing observations were dropped).

            I went ahead and imputed the missing values (in the covariates but not the outcome) and set up the code in a time to event setting following the examples in: Sheikh et al 2017 (http://dx.doi.org/10.1136/jech-2016-208671 / Supplementary material) and Hossin et al 2019 (https://bmjopen.bmj.com/content/11/4/e026258corr1 / Supplementary material) as slightly different setting and coding were used for weighted Poisson regression models with IOW.

            Following Sheikh's approach I had the following error on the bootstrap estimations (i.e. no estimation starts):
            file miest.ster already exists
            an error occurred when bootstrap executed IOWMI
            r(602);

            There are posts on this error so I'll go through these as see if I can fix this issue myself.

            Following Hossin's approach bootstrap estimations start, missing values are generated when the first model is performed (regressing the mediator=birthweight z-scores on the exposure=smoking in pregnancy), followed by the same error;
            file miest01.ster already exists
            an error occurred when bootstrap executed IOW
            r(602);

            I.e. the same type of error and missing observations dropped in the first regression model.

            Could the problem be with 'return results' commands preceding the bootstrap estimations?

            Many thanks,
            Darina

            ----
            The code I used is not much different from the code shared above, but accounts for MI; the imputation method was MICE:

            Sheikh et al 2017 (adapted):
            clear all
            use "\\ad.ucl.ac.uk\homep\sejjdnp\Documents\Darina-Data\Menopause and Smoking\Data & Code\ncds-imputed.dta",clear
            *Define a user-written program to estimate mediation parameters
            capture program drop IOWMI
            program IOWMI, rclass
            capture drop logodds predprob inverseodds wt_iow

            *Run exposure model
            mi estimate, saving(miest): logit smokingm stdbweight

            *Create inverse odds weights
            mi predict logodds using miest,xb
            mi passive: gen predprob=exp(logodds)/(1+exp(logodds))
            mi passive: gen inverseodds = ((1-predprob)/predprob)

            mi passive: gen wt_iow = 1 if smokingm==0
            mi passive: replace wt_iow = inverseodds if smokingm==1

            *Estimate TE
            mi estimate, saving(miest1) eform post: glm nmeno i.smokingm, fam(poisson) link(log) vce(robust)

            matrix bb_total = e(b_mi)
            scalar b_total = bb_total[1,1]
            return scalar b_total = bb_total[1,1]

            *Estimate NDE
            mi estimate, saving(miest2) eform post: glm nmeno i.smokingm [pweight=wt_iow], fam(poisson) link(log) vce(robust)

            matrix bb_direct = e(b_mi)
            scalar b_direct = bb_direct[1,1]
            return scalar b_direct = bb_direct[1,1]

            *Estimate NIE
            return scalar b_indirect = b_total-b_direct

            end

            * Request bootstrapped estimates of indirect, direct and total effects.
            bootstrap exp(r(b_indirect)) exp(r(b_direct)) exp(r(b_total)), seed(12345) reps(5) noisily: IOWMI
            estat bootstrap, all


            *Hossin et al 2019 (adapted code):
            mi stset timeout, failure(meno) scale(12) id(ID)

            drop _mi_id
            *Define a user-written program to estimate mediation parameters
            capture program drop IOW
            program IOW, rclass
            capture drop logodds predprob inverseodds wt_iow

            *Run exposure model
            logit smokingm stdbweight

            *Create inverse odds weights
            predict logodds,xb
            gen predprob=exp(logodds)/(1+exp(logodds))
            gen inverseodds = ((1-predprob)/predprob)

            gen wt_iow = 1 if smokingm==0
            replace wt_iow = inverseodds if smokingm==1

            *Estimate TE
            mi estimate, saving(miest01): glm _d i.smokingm, fam(poisson) link(log) vce(robust) eform nolog base

            matrix bb_total = e(b)
            scalar b_total = bb_total[1,1]
            return scalar b_total = bb_total[1,1]

            *Estimate NDE
            mi estimate, saving(miest02): glm _d i.smokingm [pweight=wt_iow], fam(poisson) link(log) vce(robust) eform nolog base

            matrix bb_direct = e(b)
            scalar b_direct = bb_direct[1,1]
            return scalar b_direct = bb_direct[1,1]

            *Estimate NIE
            return scalar b_indirect = b_total-b_direct

            end

            * Request bootstrapped estimates of indirect, direct and total effects.
            bootstrap r(b_indirect) r(b_direct) r(b_total), seed(12345) reps(5) noisily: IOW
            estat bootstrap, all

            Comment


            • #7
              I have not reviewed all of the new code, but the line
              Code:
              mi estimate, saving(miest): logit smokingm stdbweight
              is clearly wrong and is leading to the errors about miest.ster already existing. This line of code will work when -bootstrap- runs it the first time (which is for the original, overall sample). But, then having created miest.ster, the next time through, Stata will see that the miest.ster dataset already exists and will refuse to overwrite it. To overcome this you have to give the -replace- suboption to tell Stata that it is OK to overwrite the existing file.
              Code:
              mi estimate, saving(miest, replace): logit smokingm stdbweight
              I think you will run into the same problem with the -mi estimate- commands used in the *Estimate TE and *Estimate NDE sections of the code.

              General principle: Whenever you run a command that iterates a program (e.g. -bootstrap-, -simulate-) if the program saves a data set, you need to either have a -replace- option/suboption to allow that data set to be overwritten at each iteration, or if you actually need to keep all those data sets, then the program needs some mechanism to assign a different name to the data set each time.

              Comment


              • #8
                Thanks so much! The 'replace' option fixed the 'miest' issue indeed. And, yes, the bootstrap continues to remove observations when performing the first regression. However, the coefficients and CIs changed from 0 to 1. This was performed on imputed sample of one of the data sets I have. I will now impute the other and pool together again and see if anything changes (before I give up).

                The second time to event approach (Hossin et al 2019) still results in an error but this time as follows:
                'r(b_indirect)' evaluated to missing in full sample
                r(322);

                This is why I though there may be something with the return results commands?

                Comment


                • #9
                  Well, since b_indirect = b_total - b_direct, then b_total, or b_direct (or both) must also be missing in the full sample. I suggest you just run program IOW on your full data set (without bootstrapping) to see what is going on there. Pay careful attention to any error messages, and also to any warning messages that appear in above the regression output table. List out the matrices bb_total and bb_direct to see what is in them.

                  Comment


                  • #10
                    Thanks. All regression models run smoothly without warnings or error messages. The matrices bb_total and bb_direct display correctly the model coefficients. I'm looking into scalar and return scalar as scalar list shows b_direct = 0 and b_total = 0.

                    Comment


                    • #11
                      Wait! I see the problem. I'm really embarrassed that I didn't see this in the first place, and I apologize for delaying your work and wasting your time. The problem is that there is no e(b) after -mi estimate-. You need e(b_mi).

                      Comment


                      • #12
                        Thanks so much, I really appreciate the help and I'm sorry for wasting your time with this! Yes, '_mi' was missing and adding it let me run the bootstrap without an error message, but the estimated coefficients remain '0s'. I went through the first part of the code today, ignoring the bootstrap completely as well as the imputation (i.e. focused on IOW rather than), and all seems to run smoothly but the stored/returned total and direct effects which come back as '0s'. The only line of this code that I cannot run independently is: ' return scalar b_direct = bb_direct[1,1]' as results in: 'non r-class program may not set r()'. But this should be because it needs the 'program IOW, rclass' and 'end' lines. Is this right? The output of the remaining lines of code seems sensible to me and with no error or warning messages.

                        Comment


                        • #13
                          The only line of this code that I cannot run independently is: ' return scalar b_direct = bb_direct[1,1]' as results in: 'non r-class program may not set r()'. But this should be because it needs the 'program IOW, rclass' and 'end' lines. Is this right?
                          Yes, that's right.

                          So, putting the -bootstrap- aside and get IOW working correctly on its own first is a good strategy. I would test it by running this sequence:
                          Code:
                          IOW
                          return list
                          mat list bb_total bb_direct
                          scalar list b_total b_direct b_indirect
                          These should all match up correctly with each other and with the regression outputs you see from IOW itself. Wherever you find the first discrepancy will help you localize the source of the problem.

                          I wish I could see what is going wrong here, but I don't. A simplified version mimicking your code runs fine on my setup:
                          Code:
                          . clear*
                          
                          .
                          . sysuse auto
                          (1978 automobile data)
                          
                          .
                          . capture program drop my_program
                          
                          . program define my_program, rclass
                            1.     regress price mpg headroom
                            2.     matrix bb_total = e(b)
                            3.     scalar b_total = bb_total[1,1]
                            4.     return scalar b_total = bb_total[1,1]
                            5.     exit
                            6. end
                          
                          .
                          . my_program
                          
                                Source |       SS           df       MS      Number of obs   =        74
                          -------------+----------------------------------   F(2, 71)        =     10.44
                                 Model |   144280501         2  72140250.4   Prob > F        =    0.0001
                              Residual |   490784895        71  6912463.32   R-squared       =    0.2272
                          -------------+----------------------------------   Adj R-squared   =    0.2054
                                 Total |   635065396        73  8699525.97   Root MSE        =    2629.2
                          
                          ------------------------------------------------------------------------------
                                 price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                          -------------+----------------------------------------------------------------
                                   mpg |  -259.1057   58.42485    -4.43   0.000    -375.6015   -142.6098
                              headroom |  -334.0215   399.5499    -0.84   0.406    -1130.701    462.6585
                                 _cons |   12683.31   2074.497     6.11   0.000     8546.885    16819.74
                          ------------------------------------------------------------------------------
                          
                          . return list
                          
                          scalars:
                                      r(b_total) =  -259.1056595643832
                          
                          .
                          . display r(b_total)
                          -259.10566
                          
                          . matrix list bb_total
                          
                          bb_total[1,3]
                                     mpg    headroom       _cons
                          y1  -259.10566  -334.02147   12683.315
                          
                          . scalar list b_total
                             b_total = -259.10566
                          
                          .
                          . bootstrap r(b_total), reps(25): my_program
                          (running my_program on estimation sample)
                          
                          Bootstrap replications (25)
                          ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
                          .........................
                          
                          Bootstrap results                                           Number of obs = 74
                                                                                      Replications  = 25
                          
                                Command: my_program
                                  _bs_1: r(b_total)
                          
                          ------------------------------------------------------------------------------
                                       |   Observed   Bootstrap                         Normal-based
                                       | coefficient  std. err.      z    P>|z|     [95% conf. interval]
                          -------------+----------------------------------------------------------------
                                 _bs_1 |  -259.1057   59.71616    -4.34   0.000    -376.1472   -142.0641
                          ------------------------------------------------------------------------------
                          
                          . estat bootstrap
                          
                          Bootstrap results                               Number of obs     =         74
                                                                          Replications      =         25
                          
                                Command: my_program
                                  _bs_1: r(b_total)
                          
                          ------------------------------------------------------------------------------
                                       |    Observed               Bootstrap
                                       | coefficient       Bias    std. err.  [95% conf. interval]
                          -------------+----------------------------------------------------------------
                                 _bs_1 |  -259.10566   1.410509   59.716161   -340.3522   -118.136  (BC)
                          ------------------------------------------------------------------------------
                          Key: BC: Bias-corrected
                          
                          .

                          Comment


                          • #14
                            Thanks. If you run this on the 100 cases example dataset, it seems that there's a problem with the scalar function. (Please ignore for a moment the sample size, generated missing values, 'x's in bootstrap estimation etc as these perform ok on a full dataset).

                            Code:
                            capture program drop IOW
                            program IOW, rclass
                            capture drop logodds predprob inverseodds wt_iow
                            
                            *Run exposure model
                            logit smokingm i.bfed,or
                            
                            *Create inverse odds weights
                            predict logodds,xb
                            gen predprob=exp(logodds)/(1+exp(logodds))
                            gen inverseodds = ((1-predprob)/predprob)
                            
                            gen wt_iow = 1 if smokingm==0 
                            replace wt_iow = inverseodds if smokingm==1
                            
                            *Estimate TE
                            glm early_menopause i.smokingm, fam(binomial) link(log) vce(robust)  nolog base
                            
                            matrix bb_total = e(b) 
                            *matrix list e(b)
                            
                            scalar b_total = bb_total[1,1]
                            return scalar b_total = bb_total[1,1]
                            
                            *matrix list r(table)
                            
                            *display b_total
                            *scalar list
                            
                            *Estimate NDE
                            glm early_menopause i.smokingm [pw=wt_iow], fam(binomial) link(log) vce(robust)  nolog base
                            
                            matrix bb_direct=e(b)
                            *matrix list e(b) 
                            
                            scalar b_direct = bb_direct[1,1]
                            return scalar b_direct = bb_direct[1,1] 
                            *return list
                            
                            *matrix list r(table)
                            
                            *display b_direct
                            *scalar list  
                            
                            *Estimate NIE
                            return scalar b_indirect = b_total-b_direct 
                            
                            exit
                            end
                            
                            
                            IOW
                            return list
                            mat list bb_total bb_direct
                            scalar list b_total b_direct b_indirect
                            If you then run the following:

                            Code:
                            capture program drop my_program
                            program define my_program, rclass  
                            *Estimate TE
                            glm early_menopause i.smokingm, fam(binomial) link(log) vce(robust)  nolog base
                            matrix bb_total = e(b) 
                            scalar b_total = bb_total[1,1]
                            return scalar b_total = bb_total[1,1]
                            exit
                            end
                            
                            my_program
                            return list
                            display r(b_total)
                            matrix list bb_total
                            scalar list b_total
                            bootstrap r(b_total), reps(25): my_program
                            You see the same, scalar values not being stored/returned. Matrix list works ok.

                            Comment


                            • #15
                              You see the same, scalar values not being stored/returned. Matrix list works ok.
                              I cannot replicate your finding. When I do this, the output is:
                              Code:
                              . my_program
                              
                              Generalized linear models                         Number of obs   =        100
                              Optimization     : ML                             Residual df     =         98
                                                                                Scale parameter =          1
                              Deviance         =  46.27797963                   (1/df) Deviance =   .4722243
                              Pearson          =  99.99988662                   (1/df) Pearson  =   1.020407
                              
                              Variance function: V(u) = u*(1-u)                 [Bernoulli]
                              Link function    : g(u) = ln(u)                   [Log]
                              
                                                                                AIC             =   .5027798
                              Log pseudolikelihood = -23.13898981               BIC             =  -405.0287
                              
                              ------------------------------------------------------------------------------
                                           |               Robust
                              early_meno~e | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
                              -------------+----------------------------------------------------------------
                                  smokingm |
                                        0  |          0  (base)
                                        1  |   1.831763   1.066783     1.72   0.086    -.2590941    3.922619
                                           |
                                     _cons |  -3.931823   .9951334    -3.95   0.000    -5.882249   -1.981398
                              ------------------------------------------------------------------------------
                              
                              . return list
                              
                              scalars:
                                          r(b_total) =  0
                              
                              . display r(b_total)
                              0
                              
                              . matrix list bb_total
                              
                              bb_total[1,3]
                                   early_men~e:  early_men~e:  early_men~e:
                                            0b.            1.              
                                      smokingm      smokingm         _cons
                              y1             0     1.8317625    -3.9318234
                              
                              . scalar list b_total
                                 b_total =          0
                              
                              . bootstrap r(b_total), reps(25): my_program
                              (running my_program on estimation sample)
                              
                              Bootstrap replications (25)
                              ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
                              ....x.....xx.x..xxxx....x
                              
                              Bootstrap results                                          Number of obs = 100
                                                                                         Replications  =  16
                              
                                    Command: my_program
                                      _bs_1: r(b_total)
                              
                              ------------------------------------------------------------------------------
                                           |   Observed   Bootstrap                         Normal-based
                                           | coefficient  std. err.      z    P>|z|     [95% conf. interval]
                              -------------+----------------------------------------------------------------
                                     _bs_1 |          0  (omitted)
                              ------------------------------------------------------------------------------
                              Note: One or more parameters could not be estimated in 9 bootstrap replicates;
                                    standard-error estimates include only complete replications.
                              You can see that r(b_total), bb_total[1,1], and scalar b_total are all 0. So the code is correctly maintaining the value in all of those places.

                              I think the problem is that bb_total[1,1] is not the parameter you want: bb_total[1,1] is the coefficient of 0.smokingm, and since 0 is the base category for that variable, its coefficient is, by definition, 0. I think you want the coefficient of 1.smokingm, which will be found in bb_total[1, 2]. If you make that change, you get:
                              Code:
                              . my_program
                              
                              Generalized linear models                         Number of obs   =        100
                              Optimization     : ML                             Residual df     =         98
                                                                                Scale parameter =          1
                              Deviance         =  46.27797963                   (1/df) Deviance =   .4722243
                              Pearson          =  99.99988662                   (1/df) Pearson  =   1.020407
                              
                              Variance function: V(u) = u*(1-u)                 [Bernoulli]
                              Link function    : g(u) = ln(u)                   [Log]
                              
                                                                                AIC             =   .5027798
                              Log pseudolikelihood = -23.13898981               BIC             =  -405.0287
                              
                              ------------------------------------------------------------------------------
                                           |               Robust
                              early_meno~e | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
                              -------------+----------------------------------------------------------------
                                  smokingm |
                                        0  |          0  (base)
                                        1  |   1.831763   1.066783     1.72   0.086    -.2590941    3.922619
                                           |
                                     _cons |  -3.931823   .9951334    -3.95   0.000    -5.882249   -1.981398
                              ------------------------------------------------------------------------------
                              
                              . return list
                              
                              scalars:
                                          r(b_total) =  1.831762535421762
                              
                              . display r(b_total)
                              1.8317625
                              
                              . matrix list bb_total
                              
                              bb_total[1,3]
                                   early_men~e:  early_men~e:  early_men~e:
                                            0b.            1.              
                                      smokingm      smokingm         _cons
                              y1             0     1.8317625    -3.9318234
                              
                              . scalar list b_total
                                 b_total =  1.8317625
                              And following that up with the -bootstrap-:
                              Code:
                              . my_program
                              
                              Generalized linear models                         Number of obs   =        100
                              Optimization     : ML                             Residual df     =         98
                                                                                Scale parameter =          1
                              Deviance         =  46.27797963                   (1/df) Deviance =   .4722243
                              Pearson          =  99.99988662                   (1/df) Pearson  =   1.020407
                              
                              Variance function: V(u) = u*(1-u)                 [Bernoulli]
                              Link function    : g(u) = ln(u)                   [Log]
                              
                                                                                AIC             =   .5027798
                              Log pseudolikelihood = -23.13898981               BIC             =  -405.0287
                              
                              ------------------------------------------------------------------------------
                                           |               Robust
                              early_meno~e | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
                              -------------+----------------------------------------------------------------
                                  smokingm |
                                        0  |          0  (base)
                                        1  |   1.831763   1.066783     1.72   0.086    -.2590941    3.922619
                                           |
                                     _cons |  -3.931823   .9951334    -3.95   0.000    -5.882249   -1.981398
                              ------------------------------------------------------------------------------
                              
                              . return list
                              
                              scalars:
                                          r(b_total) =  1.831762535421762
                              
                              . display r(b_total)
                              1.8317625
                              
                              . matrix list bb_total
                              
                              bb_total[1,3]
                                   early_men~e:  early_men~e:  early_men~e:
                                            0b.            1.              
                                      smokingm      smokingm         _cons
                              y1             0     1.8317625    -3.9318234
                              
                              . scalar list b_total
                                 b_total =  1.8317625
                              
                              . bootstrap r(b_total), reps(25): my_program
                              (running my_program on estimation sample)
                              
                              Bootstrap replications (25)
                              ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
                              .xx.xxx......xx.xx..x...x
                              
                              Bootstrap results                                          Number of obs = 100
                                                                                         Replications  =  14
                              
                                    Command: my_program
                                      _bs_1: r(b_total)
                              
                              ------------------------------------------------------------------------------
                                           |   Observed   Bootstrap                         Normal-based
                                           | coefficient  std. err.      z    P>|z|     [95% conf. interval]
                              -------------+----------------------------------------------------------------
                                     _bs_1 |   1.831763   .7508921     2.44   0.015      .360041    3.303484
                              ------------------------------------------------------------------------------
                              Note: One or more parameters could not be estimated in 11 bootstrap replicates;
                                    standard-error estimates include only complete replications.
                              By the way, one of the pitfalls of accessing coefficients through e(b) is that you need to know which column contains the information on the variable you are interested in. It is easy to get that wrong. A more robust way to code this is -scalar b_total = _b[1.smokingm]-, because the pseudo-matrix _b allows you to refer to the coefficients by name, rather than location.

                              Added: Concerning my last paragraph above, you can also do something like this working with matrix bb_total, but it requires you to know the full equation name of the variable: -scalar b_total = _b["y1", "early_menopause:1.smokingm"]-. (_b["y1", "1.smokingm"] will give you an error message.) Aside from this being extra typing, it can fail if you don't know exactly how your particular regression command names the columns in e(b), and it sometimes differs in different commands. Working with _b[ ] is simpler because it does not require this level of detail.
                              Last edited by Clyde Schechter; 16 Aug 2022, 11:39.

                              Comment

                              Working...
                              X