  • Interpreting logistic regression results

    Hi all,

    I am trying to investigate the effect of breastfeeding on the double burden of malnutrition among mother-child pairs. The breastfeeding variable m4 has 3 categories: never breastfed, ever breastfed but no longer breastfeeding, and still breastfeeding. The outcome variable is binary, where 1 = presence of the double burden and 0 = no presence. I ran a multivariable model with various sociodemographic, health and dietary variables as follows:

    Code:
    svy: logit DBM i.b4 i.age_child i.sizeatbirth i.diarrhea i.v460 i.v013 i.v106 c.v218 i.short_structure i.v394 c.v453 i.v463a i.m4 i.m3b i.v401 i.v714 i.first_birth i.v467b i.v467d i.v130 i.v190 ib2.v025 c.HHhead_age i.v151 i.improved_water i.improved_sanitation i.s653a i.s653b i.s653c i.s653d i.s653e i.s653f i.s653g i.s653h i.s653i i.s653j i.s653k i.s653l i.s653m i.s653n i.s653o i.s653p i.s653q i.s653r i.s653s i.s653t i.s653u i.s653v, or
    The results of the regression are as follows (only top part included to avoid it being too long):

    Code:
                            
    --------------------------------------------------------------------------------------
                         |             Linearized
                     DBM | Odds ratio   std. err.      t    P>|t|     [95% conf. interval]
    ---------------------+----------------------------------------------------------------
                      b4 |
                 female  |   .7977014   .0811424    -2.22   0.026     .6533933    .9738813
                         |
               age_child |
                    12-  |   1.443243   .3351756     1.58   0.114      .915109    2.276178
                    19-  |   1.623259   .4312487     1.82   0.068     .9639142    2.733616
                    29-  |   1.394491   .3956674     1.17   0.241     .7992317    2.433095
                    39-  |   1.008225   .2936575     0.03   0.978       .56938    1.785306
                    49-  |   1.059838   .3035261     0.20   0.839     .6042769    1.858845
                         |
             sizeatbirth |
                      2  |   1.512282   .3164086     1.98   0.048     1.003165    2.279782
                      3  |    1.69537   .3276294     2.73   0.006     1.160421    2.476926
                      4  |   1.835523   .4701192     2.37   0.018     1.110563    3.033727
                      5  |   3.131504   1.149433     3.11   0.002     1.524131    6.434036
                         |
              1.diarrhea |   1.027892   .1721993     0.16   0.870     .7399708    1.427842
                         |
                    v460 |
           all children  |   .8815488   .1543398    -0.72   0.472     .6252876    1.242833
          some children  |   .6968555   .1815756    -1.39   0.166      .417967    1.161832
    no net in household  |   1.015444   .1905204     0.08   0.935     .7027507    1.467272
                         |
                    v013 |
                  20-24  |   2.393979   1.181106     1.77   0.077     .9094378    6.301843
                  25-29  |   3.087438   1.450843     2.40   0.017     1.228083    7.761909
                  30-34  |   2.945537   1.494631     2.13   0.033      1.08853    7.970553
                  35-39  |   4.306002   2.204598     2.85   0.004     1.577121    11.75665
                  40-44  |   3.202782   1.834246     2.03   0.042     1.041318    9.850794
                  45-49  |   4.583337   3.074994     2.27   0.023     1.229047    17.09209
                         |
                    v106 |
                primary  |   1.626226   .2969723     2.66   0.008     1.136563    2.326851
              secondary  |   1.191351   .2511549     0.83   0.406     .7878169    1.801584
                 higher  |   1.036256   .3389724     0.11   0.913     .5454653    1.968644
                         |
                    v218 |    1.09111   .0470229     2.02   0.043     1.002653    1.187372
       1.short_structure |    1.84761   .2787333     4.07   0.000     1.374283    2.483959
                         |
                    v394 |
                    yes  |    .774395   .0932031    -2.12   0.034      .611532    .9806316
                    v453 |   1.000462   .0005271     0.88   0.381     .9994283    1.001497
                         |
                   v463a |
                    yes  |   3.031143   1.800266     1.87   0.062     .9453304    9.719169
                         |
                      m4 |
        never breastfed  |   1.255726   .4795985     0.60   0.551     .5935959    2.656432
    still breastfeeding  |   .7993677   .1818369    -0.98   0.325     .5116077    1.248982
    --------------------------------------------------------------------------------------

    In this result, neither category of breastfeeding is statistically significant. However, if I replace the categorical child age variable with a continuous equivalent b19 (age in months), still breastfeeding becomes statistically significant for reducing the odds of having the double burden:

    Code:
    svy: logit DBM i.b4 c.b19 i.sizeatbirth i.diarrhea i.v460 i.v013 i.v106 c.v218 i.short_structure i.v394 c.v453 i.v463a i.m4 i.early_initiation i.m3b i.v401 i.v714 i.first_birth i.v467b i.v467d i.v130 i.v190 ib2.v025 c.HHhead_age i.v151 i.improved_water i.improved_sanitation i.s653a i.s653b i.s653c i.s653d i.s653e i.s653f i.s653g i.s653h i.s653i i.s653j i.s653k i.s653l i.s653m i.s653n i.s653o i.s653p i.s653q i.s653r i.s653s i.s653t i.s653u i.s653v, or
    Code:
                            
    --------------------------------------------------------------------------------------
                         |             Linearized
                     DBM | Odds ratio   std. err.      t    P>|t|     [95% conf. interval]
    ---------------------+----------------------------------------------------------------
                      b4 |
                 female  |   .8011144   .0826832    -2.15   0.032     .6542735    .9809114
                     b19 |   .9887508   .0044224    -2.53   0.012     .9801128    .9974649
                         |
             sizeatbirth |
                      2  |   1.538217   .3256564     2.03   0.042     1.015407    2.330209
                      3  |   1.725471   .3389479     2.78   0.006      1.17366    2.536722
                      4  |   1.914457   .4995159     2.49   0.013     1.147476    3.194095
                      5  |   3.179426    1.18215     3.11   0.002     1.533078    6.593763
                         |
              1.diarrhea |   .9882171   .1658365    -0.07   0.944     .7110086    1.373504
                         |
                    v460 |
           all children  |   .8891045   .1566164    -0.67   0.505      .629321    1.256127
          some children  |   .6958354   .1822213    -1.38   0.166     .4162838    1.163117
    no net in household  |   1.031197   .1930018     0.16   0.870     .7142965    1.488691
                         |
                    v013 |
                  20-24  |   3.433124   2.083032     2.03   0.042       1.0441    11.28852
                  25-29  |   4.557931   2.623815     2.64   0.009     1.473349    14.10035
                  30-34  |   4.349447   2.679922     2.39   0.017     1.298589    14.56788
                  35-39  |    6.43073   3.944805     3.03   0.002     1.930266    21.42414
                  40-44  |   4.804882   3.210934     2.35   0.019     1.295151    17.82564
                  45-49  |   7.200005   5.517323     2.58   0.010     1.601214    32.37548
                         |
                    v106 |
                primary  |   1.657486    .303282     2.76   0.006     1.157586    2.373267
              secondary  |   1.173758   .2525981     0.74   0.457     .7695272     1.79033
                 higher  |   1.031402   .3411283     0.09   0.926     .5390579    1.973426
                         |
                    v218 |   1.091747   .0470638     2.04   0.042     1.003214    1.188094
       1.short_structure |    1.82728   .2799426     3.93   0.000     1.352935    2.467932
                         |
                    v394 |
                    yes  |   .7714005   .0965369    -2.07   0.038     .6034708    .9860605
                    v453 |   1.000472   .0005165     0.91   0.361     .9994591    1.001486
                         |
                   v463a |
                    yes  |   2.988678   1.795403     1.82   0.069     .9197134    9.711936
                         |
                      m4 |
        never breastfed  |   1.149961   .6107756     0.26   0.793     .4056597    3.259898
    still breastfeeding  |   .6109488    .116082    -2.59   0.010     .4208451     .886926
    --------------------------------------------------------------------------------------
    The same occurs if I widen the child age variable into two categories of 5-23 months and 24-59 months. With any narrower categories, breastfeeding becomes non-significant. I am not quite sure how to interpret this. I was thinking that, with the narrow age categories, some of the variation associated with breastfeeding is absorbed by the age categories, with younger age groups more likely to experience the benefits of breastfeeding. With the continuous age variable, age acts as a control, which is needed as older children are less likely to still be breastfed. Would this interpretation be valid or am I reaching?

    I wanted to include age as categorical so I could show which age groups are at greatest risk (and to account for the fact that the effect of age is non-linear). I was then thinking of also including this second model, with age continuous, to show that breastfeeding is a significant factor associated with the double burden. Any comments or advice on how I should handle this would be greatly appreciated.
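
    A minimal sketch of two checks that bear on the question above; neither command is from the original analysis. It assumes the data are already svyset and uses a deliberately trimmed covariate list (the full list from the first command would go in its place). The tabulation shows how breastfeeding status is distributed across the narrow age bands, and the Wald test asks whether the age categories are jointly significant.

    ```stata
    * Assumption: svyset has already been run for this survey design.
    * How concentrated is "still breastfeeding" within the narrow age bands?
    svy: tabulate age_child m4, row percent

    * Trimmed version of the first model (other covariates omitted for brevity)
    svy: logit DBM i.b4 i.age_child i.m4, or

    * Joint adjusted Wald test of all age_child categories
    testparm i.age_child
    ```

    If nearly all children in the youngest bands are still breastfeeding, the age dummies and m4 carry much of the same information, which would be consistent with the wider confidence interval for still breastfeeding in the first model.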

  • #2
    Aadil:
    you might be interested in reading: https://pubmed.ncbi.nlm.nih.gov/16217841/
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Hi Carlo, thanks a lot for the link, it was very helpful. So from what I understand, dichotomizing the age variables makes the model lose statistical power, especially when used to adjust for the effects of a confounder (breastfeeding). It therefore clearly seems more suitable to keep age as a continuous variable. I plotted the effect of age on the outcome DBM, and there is a positive (but insignificant) linear relationship between the two so I include a single linear age term. However, in the multivariate model the relationship is negative and significant, and still breastfeeding significantly lowers the odds of DBM. When I remove breastfeeding from the model, age becomes insignificant again. I have tested every other variable in the model and only breastfeeding has this effect on age. If I interact the age and breastfeeding terms, I get the following output (b19 is child age and m4 is breastfeeding - base category once breastfed but no longer breastfeeding):

      Code:
      --------------------------------------------------------------------------------------
                           |             Linearized
                       DBM | Odds ratio   std. err.      t    P>|t|     [95% conf. interval]
      ---------------------+----------------------------------------------------------------
                        b4 |
                   female  |   .7898089   .0796394    -2.34   0.019     .6480549    .9625698
                       b19 |   .9850384   .0042554    -3.49   0.001     .9767255    .9934221
                           |
               sizeatbirth |
                        2  |   1.491948   .3045659     1.96   0.050     .9996016    2.226797
                        3  |   1.636046   .3090005     2.61   0.009     1.129479    2.369806
                        4  |   1.749131   .4434057     2.21   0.028     1.063751    2.876104
                        5  |   2.678137   .9364573     2.82   0.005     1.348704     5.31801
                           |
                1.diarrhea |     1.0185   .1666868     0.11   0.911     .7387945    1.404101
                           |
                mother age |
                    20-24  |   2.262518   1.059738     1.74   0.082     .9026627    5.670988
                    25-29  |   2.931134   1.293977     2.44   0.015     1.232857    6.968813
                    30-34  |   3.009925   1.422046     2.33   0.020     1.191315    7.604744
                    35-39  |   4.613895   2.260387     3.12   0.002      1.76468    12.06339
                    40-44  |   3.582218   1.842382     2.48   0.013     1.306043    9.825315
                    45-49  |   4.713869   2.720564     2.69   0.007     1.519339    14.62516
                           |
          mother education |
                  primary  |   1.656941   .2638861     3.17   0.002     1.212319     2.26463
                secondary  |   1.307112   .2009673     1.74   0.082     .9667602    1.767286
                   higher  |   1.297249   .3525476     0.96   0.338      .761166    2.210891
                           |
                      v218 |   1.072881   .0454016     1.66   0.097     .9874095    1.165752
         1.short_structure |   1.800499    .264921     4.00   0.000     1.349062    2.403001
                           |
                      v394 |
                      yes  |   .7631513   .0874067    -2.36   0.018     .6095761    .9554179
                      v453 |   1.000552   .0005059     1.09   0.275     .9995602    1.001545
                           |
                     v463a |
                      yes  |   3.741827   2.373133     2.08   0.038     1.078278    12.98484
                           |
                        m4 |
          never breastfed  |   .8708385   .8477348    -0.14   0.887     .1289859    5.879398
      still breastfeeding  |   .3015025   .0806458    -4.48   0.000     .1784015    .5095461
                           |
                  m4#c.b19 |
          never breastfed  |    1.01042   .0248725     0.42   0.674     .9627841    1.060412
      still breastfeeding  |   1.047782   .0136611     3.58   0.000     1.021322    1.074928
                           |
                     _cons |   .0199635   .0105252    -7.42   0.000     .0070965    .0561604
      If I plot the predicted probabilities, I get the following graph:

      breastfeeding_age.png

      I think I can make sense of this graph: breastfeeding reduces the probability of DBM until around 30 months, after which it increases the probability compared to those who don't breastfeed (possibly due to breastfeeding not being accompanied by complementary feeding practices after this age). However, I am still unclear why adding breastfeeding to the model makes age significant. The literature suggests that the relationship between age and DBM should be positive, but it is negative in this model. Why would breastfeeding cause child age to be significantly negatively related to DBM, even with the inclusion of the interaction?
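
      A worked check on the interaction output above, as a minimal sketch rather than part of the original analysis. With m4#c.b19 in the model, the odds ratio reported for b19 is the per-month age effect in the base category (no longer breastfeeding) only; the age effect among children still breastfeeding is that odds ratio multiplied by the interaction odds ratio. The scalar names are hypothetical; the numbers are copied from the table.

      ```stata
      * Odds ratios copied from the interaction model output above
      * b19: per-month OR in the base category (no longer breastfeeding)
      scalar or_b19 = .9850384
      * still breastfeeding vs base category, evaluated at b19 = 0
      scalar or_still = .3015025
      * still breastfeeding # b19 interaction
      scalar or_still_x_b19 = 1.047782

      * Per-month OR of age among children who are still breastfeeding
      display scalar(or_b19) * scalar(or_still_x_b19)

      * Age in months at which the still-breastfeeding OR (vs base) crosses 1
      display -ln(scalar(or_still)) / ln(scalar(or_still_x_b19))

      * Still-breastfeeding OR (vs base) at 12 and at 36 months
      display scalar(or_still) * scalar(or_still_x_b19)^12
      display scalar(or_still) * scalar(or_still_x_b19)^36
      ```

      The first product is about 1.03, so the estimated age trend among still-breastfed children is positive even though the b19 row itself is below 1. The second line gives the age, roughly 26 months, at which the still breastfeeding odds ratio relative to the base category crosses 1.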


      • #4
        Aadil:
        have you already investigated a non-linear relationship between mother's age (plugged in as a continuous variable with both its linear and squared terms)?
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Aadil:
          please forget the part about the quadratic relationship in my previous reply (whereas the linear term of mother's age as a continuous variable deserves a shot) and take a look at https://journals.sagepub.com/doi/pdf...867X1001000211
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Hi Carlo, thank you very much for your advice. It seems to me that when looking at interactions the important part is the coefficients for the interactions themselves and the "main" effect of child age is not so important. So would it be fair to say that as breastfeeding is what causes child age to be significant, the interaction between age and breastfeeding is what I should be looking at, and the individual effect for child age can be disregarded?
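
            A minimal sketch of one way to look at the age effect within each breastfeeding category instead of reading the b19 row on its own. It assumes the data are svyset and uses a trimmed covariate list for illustration; it is not the original model.

            ```stata
            * Assumption: data are svyset; covariate list trimmed for illustration.
            svy: logit DBM i.b4 i.m4##c.b19, or

            * Age slope (log-odds per month) within each breastfeeding category
            margins m4, dydx(b19) predict(xb)

            * Breastfeeding contrasts vs the base category at selected ages (log-odds scale)
            margins, dydx(m4) at(b19 = (6 12 24 36 48)) predict(xb)
            ```

            On the log-odds scale the slope reported for the base category equals the b19 coefficient, and the slopes for the other categories add the matching interaction coefficient, so the b19 row describes only the children who are no longer breastfeeding.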

            Regardless, I'm still slightly confused about my results, so I'm trying to re-fit my model. I'm trying to use fractional polynomials to figure out the best way to model child age.

            I've used the following command to test this:

            Code:
            fp <b19>, replace : logistic DBM <b19> b4 sizeatbirth diarrhea v013 v106 v218 short_structure v394 v453 v463a m4 birth_assistance v401 v714 first_birth v467b v467d v130 v190 v025 HHhead_age v151 improved_water improved_sanitation
            where b19 is the child age variable and the rest are the other independent variables. I get the following output, suggesting that the m = 2 model is best. I'm a bit unsure what this output means, as the powers are .5 and .5. Surely this doesn't mean including two square root terms, so how would I model child age given these results?


            Code:
            -------------------------------------------------------------------
                         | Test              Deviance
                     b19 |   df   Deviance       diff.       P   Powers
            -------------+-----------------------------------------------------
                 omitted |    4   4083.692     14.714    0.005               
                  linear |    3   4074.690      5.711    0.127   1           
                   m = 1 |    2   4072.647      3.669    0.160   2           
                   m = 2 |    0   4068.978      0.000       --   .5 .5       
            -------------------------------------------------------------------
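
            For reference, in fractional polynomial notation a repeated power such as .5 .5 stands for the pair b19^.5 and b19^.5 × ln(b19), not two identical square-root terms. A minimal sketch of building those two terms by hand, assuming b19 is strictly positive and ignoring any scaling or centering; the new variable names are hypothetical and the covariate list is trimmed:

            ```stata
            * Assumption: b19 > 0 for every child, so ln(b19) is defined.
            * First term: b19 raised to the power .5
            generate double b19_p1 = sqrt(b19)
            * Second term for the repeated power: b19^.5 multiplied by ln(b19)
            generate double b19_p2 = sqrt(b19) * ln(b19)

            * Trimmed model using the two fractional polynomial terms
            logistic DBM b19_p1 b19_p2 i.b4 i.m4
            ```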


            • #7
              Aadil:
              you may find https://www.stata.com/bookstore/flex...nalysis-stata/ useful.
              Kind regards,
              Carlo
              (Stata 19.0)


              • #8
                Hi Carlo. Thanks for your suggestion; unfortunately I do not have online access to this resource through my university. I did manage to figure it out though, so thank you for your help. I am still having trouble modelling my continuous variables, however. I have plotted lowess graphs for the variables and the outcome DBM to look at the relationship.

                [Attached lowess plots: childage1.png, firstbirth1.png, hhead1.png, hmlevel1.png]

                From what I can interpret from these, child age does not have a linear relationship - it looks a bit like a log or a cubic function. The other three seem to be fairly linear. However, for age of household head and hemoglobin level these have strong variation at the start, and then are linear after this. In this case, how would I model these variables?

                If I fit a multivariable fractional polynomial model with only the continuous variables, I get the following results, suggesting that all variables should be modelled as linear except child age, which should be modelled with powers (.5 .5), i.e. the square root of age and the square root of age multiplied by ln(age).

                Code:
                mfp : logistic DBM b19 v212 v218 v453 HHhead_age
                Code:
                Final multivariable fractional polynomial model for DBM
                --------------------------------------------------------------------
                    Variable |    -----Initial-----          -----Final-----
                             |   df     Select   Alpha    Status    df    Powers
                -------------+------------------------------------------------------
                         b19 |    4     1.0000   0.0500     in      4     .5 .5
                        v212 |    4     1.0000   0.0500     in      1     1
                        v218 |    4     1.0000   0.0500     in      1     1
                        v453 |    4     1.0000   0.0500     in      1     1
                  HHhead_age |    4     1.0000   0.0500     in      1     1
                --------------------------------------------------------------------

                This is different for my second outcome TBM, which suggests age should be modelled as linear as well.

                Code:
                Final multivariable fractional polynomial model for TBM
                --------------------------------------------------------------------
                    Variable |    -----Initial-----          -----Final-----
                             |   df     Select   Alpha    Status    df    Powers
                -------------+------------------------------------------------------
                         b19 |    4     1.0000   0.0500     in      1     1
                        v212 |    4     1.0000   0.0500     in      1     1
                        v218 |    4     1.0000   0.0500     in      1     1
                        v453 |    4     1.0000   0.0500     in      1     1
                  HHhead_age |    4     1.0000   0.0500     in      1     1
                --------------------------------------------------------------------
                If I fit a multivariable fractional polynomial model with all my independent variables, it shows that a linear model is best for all the continuous variables, which is the same for DBM and TBM (v130 and v190 are categorical so I have ignored these powers):

                Code:
                mfp : logistic DBM b19 b4 sizeatbirth diarrhea v013 v106 v218 short_structure v394 v453 v463a m4 birth_assistance v401 v714 v212 v467b v467d v130 v190 v025 HHhead_age v151 improved_water improved_sanitation
                Code:
                Final multivariable fractional polynomial model for DBM
                --------------------------------------------------------------------
                    Variable |    -----Initial-----          -----Final-----
                             |   df     Select   Alpha    Status    df    Powers
                -------------+------------------------------------------------------
                         b19 |    4     1.0000   0.0500     in      1     1
                          b4 |    1     1.0000   0.0500     in      1     1
                 sizeatbirth |    2     1.0000   0.0500     in      1     1
                    diarrhea |    1     1.0000   0.0500     in      1     1
                        v013 |    4     1.0000   0.0500     in      1     1
                        v106 |    2     1.0000   0.0500     in      1     1
                        v218 |    4     1.0000   0.0500     in      1     1
                short_str... |    1     1.0000   0.0500     in      1     1
                        v394 |    1     1.0000   0.0500     in      1     1
                        v453 |    4     1.0000   0.0500     in      1     1
                       v463a |    1     1.0000   0.0500     in      1     1
                          m4 |    1     1.0000   0.0500     in      1     1
                birth_ass... |    1     1.0000   0.0500     in      1     1
                        v401 |    1     1.0000   0.0500     in      1     1
                        v714 |    1     1.0000   0.0500     in      1     1
                        v212 |    4     1.0000   0.0500     in      1     1
                       v467b |    1     1.0000   0.0500     in      1     1
                       v467d |    1     1.0000   0.0500     in      1     1
                        v130 |    2     1.0000   0.0500     in      2     -2
                        v190 |    2     1.0000   0.0500     in      2     -1
                        v025 |    1     1.0000   0.0500     in      1     1
                  HHhead_age |    4     1.0000   0.0500     in      1     1
                        v151 |    1     1.0000   0.0500     in      1     1
                improved_... |    1     1.0000   0.0500     in      1     1
                improved_... |    1     1.0000   0.0500     in      1     1
                --------------------------------------------------------------------

                I feel like I am making this more complicated than it needs to be. It seems that keeping most of these variables linear would suffice. Although child age does not seem linear, modelling it with the two (.5 .5) fractional polynomial terms seems like it would add unnecessary complexity to the model, especially as I am more interested in child age as a control for confounding of breastfeeding than in investigating the actual impact of age. Would you have any advice on how to tackle this? Apologies for all the questions, but this dissertation is causing lots of trouble and unfortunately I do not have a supervisor to help me!
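
                The fp comparison table in #6 can be read as a set of deviance tests against the m = 2 model. A minimal sketch that recomputes its P column, assuming each deviance difference is referred to a chi-squared distribution with the listed test df (the numbers are copied from that table):

                ```stata
                * Deviance differences and test df copied from the fp table in #6
                * linear vs m = 2: 3 df
                display chi2tail(3, 5.711)
                * m = 1 vs m = 2: 2 df
                display chi2tail(2, 3.669)
                * age omitted vs m = 2: 4 df
                display chi2tail(4, 14.714)
                ```

                The linear row (P = 0.127) indicates that the two-term model does not fit significantly better than a single linear term at the 5% level, whereas dropping child age altogether fits significantly worse (P = 0.005).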


                • #9
                  Aadil:
                  keep it linear, child_age included.
                  Kind regards,
                  Carlo
                  (Stata 19.0)


                  • #10
                    Thank you so much for your help Carlo, I really appreciate it!
