  • Main effects significance and interaction terms using logistic regression in Stata 16

    Dear all users,
    I want to examine whether the effect of emotional IPV on ANC visits differs by women's educational status.

    Outcome variable: (adequate ANC service (1) or inadequate ANC service (0))
    Main exposures: spousal emotional IPV (Yes/No or 1/0)
    Moderators: (Education status) - Lower education (1) & Higher education (2).
    Hypothesis 1:
    The effect of emotional IPV on adequate ANC services will be moderated by education and wealth.

    I have fitted the model using the following commands:
    xtmelogit anc_adequacy i.emotional_viol1##1.educ_mom i.age_catgorey i.husband_educ i.wealth_hh i.mediae_expo i.dma i.birth_order i.V102 i.contextual_regions || psu :,or nolog
    Output:
    Mixed-effects logistic regression               Number of obs     =      2,863
    Group variable: psu                             Number of groups  =        618

                                                    Obs per group:
                                                                  min =          1
                                                                  avg =        4.6
                                                                  max =         12

    Integration points =   7                        Wald chi2(17)     =     263.01
    Log likelihood = -1571.0285                     Prob > chi2       =     0.0000
    ------------------------------------------------------------------------------------------------
                      anc_adequacy | Odds Ratio  Std. Err.        z   P>|z|     [95% Conf. Interval]
    -------------------------------+----------------------------------------------------------------
                   emotional_viol1 |
                               Yes |   1.154874   .4641096     0.36   0.720      .525366    2.538676
                                   |
                          educ_mom |
           Primary or no education |   .6540172     .13108    -2.12   0.034     .4415592    .9687003
                                   |
          emotional_viol1#educ_mom |
       Yes#Primary or no education |   .6461428   .2718937    -1.04   0.299     .2832351    1.474042

    Note: results of all other covariates excluded

    So, after adjusting for all covariates, the emotional violence term is not statistically significant (p = 0.720), and neither is the interaction between emotional IPV and low education (p = 0.299). But a model with the same covariates and no interaction term, restricted to women with low education, shows a statistically significant effect of emotional IPV on ANC visits (p = 0.021). Here are the command and the results from Stata 16:
    . xtmelogit anc_adequacy emotional_viol1 i.age_catgorey i.husband_educ i.wealth_hh i.mediae_expo i.birth_order i.dma i.V102 i.contextual_regions if educ_mom==1 || psu :,or nolog
    Mixed-effects logistic regression               Number of obs     =      2,548
    Group variable: psu                             Number of groups  =        580

                                                    Obs per group:
                                                                  min =          1
                                                                  avg =        4.4
                                                                  max =         12

    Integration points =   7                        Wald chi2(15)     =     160.53
    Log likelihood = -1384.4623                     Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------------------------
                      anc_adequacy | Odds Ratio  Std. Err.        z   P>|z|     [95% Conf. Interval]
    -------------------------------+----------------------------------------------------------------
                   emotional_viol1 |   .7387078   .0969917    -2.31   0.021      .571098    .9555089
                                   |
                      age_catgorey |
                           25 – 34 |   1.433873   .2368658     2.18   0.029     1.037285    1.982089
                           35 – 49 |   1.446263   .3013516     1.77   0.077     .9613604    2.175748

    My question is: why do the findings of these two models differ? Have I made a mistake in the commands used to fit the models? If not, and both the main effect and the interaction term are non-significant in the interaction model, why does the stratified analysis give a significant effect of emotional IPV (p = 0.021)? And why does the number of participants differ between the two models (2,863 vs. 2,548)?
    Thank you very much in advance for your responses.

  • #2
    The main problem is that you don't understand how interaction models work, and you don't understand the meaning of statistical significance either. I also have some questions about the appropriateness of the modeling here, but I'll defer that for later. Let's clear up the misunderstandings of what you've done first.

    In an interaction model, by definition, there is no such thing as the effect of IPV on ANC adequacy. Rather, there are two different effects of IPV on ANC: one when educ_mom = 1 (lower education) and another when educ_mom = 2 (higher education).

    In the regression output the coefficient (odds ratio) of emotional_viol1 (which I'm guessing is the variable name corresponding to IPV) represents the effect of emotional_viol1 on ANC at the reference level of educ_mom (higher education, in the model you show). The coefficient of emotional_viol1#educ_mom represents the difference (actually, ratio of odds ratios) between the effect of emotional_viol1 on ANC when educ_mom = 1 and the effect at the reference level. The effect (odds ratio) of emotional_viol1 on ANC when educ_mom = 1, itself, does not appear in the output and must be calculated separately. Rather than doing that calculation, it is easier to follow up the interaction model with the -margins- command:

    Code:
    margins educ_mom#emotional_viol1
    margins educ_mom, dydx(emotional_viol1)
    gives you results that are easier to understand. The output from the first of those -margins- commands will be the predicted probability of ANC in each of the four combinations of values of educ_mom and emotional_viol1. The second one gives you the separate marginal effects (as a difference in probabilities, not in the odds ratio metric) of emotional_viol1 on ANC for each of the values of educ_mom. This will not only be easier for you to understand, it will also be much easier to explain to anyone who needs to see your results.
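
    For completeness, the omitted odds ratio can also be obtained directly. The following is a minimal sketch, assuming it is run immediately after the interaction model shown in post #1, so that level 1 of educ_mom is the non-reference level and the coefficient names are as written below. -lincom- adds the two log-odds coefficients, and its or option reports the sum as an odds ratio. The last command repeats the same calculation by hand from the odds ratios printed in that output.

    ```stata
    * Assumes the model from post #1 (i.emotional_viol1##1.educ_mom) is the
    * active estimation result.
    * Odds ratio for emotional violence among women with educ_mom == 1:
    lincom 1.emotional_viol1 + 1.emotional_viol1#1.educ_mom, or

    * Hand check: odds ratio at the reference level times the ratio of odds
    * ratios. The product is about 0.746.
    display 1.154874 * .6461428
    ```

    The result is an odds ratio below 1 for the lower-education group, even though neither of the two displayed terms is individually "significant".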

    In your non-interaction model, the coefficient (odds ratio) of emotional_viol1 represents an overall effect of emotional_viol1 on ANC that does not take educ_mom into account. You might think of it as an average effect, in a loose sort of way. It is important to note that this average effect draws on the entire estimation sample, whereas the separate effects calculated from the interaction model are each relying primarily on a subset of the sample (of which at least one is half the sample size or less). So it is not appropriate to ponder the "significance" of one in comparison to the "significance" of the other. It is appropriate to compare the odds ratios themselves directly if you like, without regard to p-values.
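
    One detail of the commands in post #1 is worth checking at this point: the second command carries the qualifier if educ_mom==1, so it is fitted only to women in that education category rather than to the whole sample, which would also account for its smaller number of observations. A minimal sketch of how to confirm this, assuming the full interaction model has just been fitted so that e(sample) marks its estimation sample:

    ```stata
    * Run after the full interaction model; e(sample) flags its estimation sample.
    * Observations used by the full model:
    count if e(sample)
    * Of those, women with educ_mom == 1:
    count if e(sample) & educ_mom == 1
    * Sizes of both education groups within the estimation sample:
    tabulate educ_mom if e(sample)
    ```

    If the second count matches the number of observations reported for the restricted model, the difference in sample size comes from the if qualifier and not from missing data.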

    Now, as for statistical significance, a non-statistically-significant result does not mean that there is no effect. It means only that the combination of the actual effect, the sample size, and the noisiness of the data is such that you cannot estimate the effect precisely enough to determine whether it is positive, negative or zero to the desired degree of confidence. Even when properly interpreted, however, the concept of statistical significance is a slippery one, and it does not obey the usual laws of logic, leading people to waste time puzzling over paradoxical changes in the "significance" of results when the modeling is slightly changed. Few people remember that the difference between statistically significant and not statistically significant is not, itself, statistically significant. These are among the reasons that the American Statistical Association recommends that the concept of statistical significance be abandoned. See https://www.tandfonline.com/doi/full...5.2019.1583913 for the "executive summary" and https://www.tandfonline.com/toc/utas20/73/sup1 for all 43 supporting articles. Or https://www.nature.com/articles/d41586-019-00857-9 for the tl;dr.

    Now let's turn to some questions I have about your modeling. You do not explain the variable psu: which you use as a second level in your model. But that name is suggestive of a primary sampling unit in a survey design. In fact, I cannot recall ever seeing that name used in any other context, though perhaps this is the first time. Now, it may be appropriate to treat the primary sampling unit of a survey as a level in the model--but often it is not. That depends on how the psu's were arrived at. If the psu's are, say, households, and the outcome variable is strongly SES dependent, then, yes, we would expect intra-household correlation, and use of psu as a level would make sense. But if the psu's are telephone exchanges (as in a random digit dialing survey), then for most outcomes, there is little reason to think people with the same phone exchange will be more similar in their outcomes than those from other phone exchanges. So the use of the psu as a second level would not make sense. So you need to think about this.
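
    One empirical check on whether clustering within psu matters for this outcome is the intraclass correlation from an intercept-only model. This is only a sketch using the variable names from post #1, and it does not replace reading the survey's sampling documentation:

    ```stata
    * Intercept-only two-level model with a random intercept for psu
    melogit anc_adequacy || psu:
    * Residual intraclass correlation: share of latent-scale variance at the psu level
    estat icc
    ```

    An intraclass correlation near zero would suggest that the psu level adds little; a clearly nonzero one would support keeping it.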

    The other thing I will say is that if, as I suspect, you have a complex survey design here, then you need to use that design properly. That means using the -svyset- command to set out sampling weights, primary (and possibly higher order) sampling units, and stratification in order to get proper results. And then you have to use the -svy:- prefix with your -melogit- command. (-xtmelogit- is an old name; use -melogit- unless you are working on a pretty old version of Stata, like 13 or earlier.)



    • #3
      Dear Clyde Schechter,
      Thank you so much for your help. It is highly useful and informative, particularly on my confusion about how interaction models work and on the p-value issue, including the links about statistical significance.
      I have fitted the model again based on your feedback:
      Command:
      xtmelogit anc_adequacy i.emotional_viol1##educ_mom i.age_catgorey i.husband_educ i.wealth_hh i.mediae_expo i.birth_order i.dma i.V102 i.contextual_regions || psu :,or nolog

      Output:
      Mixed-effects logistic regression               Number of obs     =      2,863
      Group variable: psu                             Number of groups  =        618

                                                      Obs per group:
                                                                    min =          1
                                                                    avg =        4.6
                                                                    max =         12

      Integration points =   7                        Wald chi2(17)     =     263.01
      Log likelihood = -1571.0285                     Prob > chi2       =     0.0000
      ------------------------------------------------------------------------------------------------
                        anc_adequacy | Odds Ratio  Std. Err.        z   P>|z|     [95% Conf. Interval]
      -------------------------------+----------------------------------------------------------------
                     emotional_viol1 |
                                 Yes |   .7462136   .0959586    -2.28   0.023     .5799676    .9601134
                                     |
                            educ_mom |
                 Secondary and above |   1.529012    .306449     2.12   0.034     1.032311    2.264702
                                     |
            emotional_viol1#educ_mom |
             Yes#Secondary and above |   1.547645   .6512417     1.04   0.299     .6784059    3.530636

      Here, the results for the other covariates are omitted for simplicity. Running the -margins- commands after the interaction model gave the following results:

      Command: margins educ_mom#emotional_viol1

      Output:
      Predictive margins                              Number of obs     =      2,863

      Expression   : Linear prediction, fixed portion, predict(xb)
      ----------------------------------------------------------------------------------------------
                                   |            Delta-method
                                   |     Margin  Std. Err.        z   P>|z|     [95% Conf. Interval]
      -----------------------------+----------------------------------------------------------------
          educ_mom#emotional_viol1 |
        Primary or no education#No |  -.8091921    .074371   -10.88   0.000    -.9549565   -.6634277
       Primary or no education#Yes |  -1.101936   .1238117    -8.90   0.000    -1.344602    -.859269
            Secondary and above#No |  -.3845704   .1915795    -2.01   0.045    -.7600593   -.0090815
           Secondary and above#Yes |  -.2405796   .3801416    -0.63   0.527    -.9856433    .5044842
      ----------------------------------------------------------------------------------------------

      Command: margins educ_mom, dydx(emotional_viol1)

      Output:
      Average marginal effects                        Number of obs     =      2,863

      Expression   : Linear prediction, fixed portion, predict(xb)
      dy/dx w.r.t. : 1.emotional_viol1
      ------------------------------------------------------------------------------------------
                               |            Delta-method
                               |      dy/dx  Std. Err.        z   P>|z|     [95% Conf. Interval]
      -------------------------+----------------------------------------------------------------
             0.emotional_viol1 |  (base outcome)
      -------------------------+----------------------------------------------------------------
             1.emotional_viol1 |
                      educ_mom |
       Primary or no education |  -.2927434    .128594    -2.28   0.023     -.544783   -.0407039
           Secondary and above |   .1439908   .4018706     0.36   0.720     -.643661    .9316426
      ------------------------------------------------------------------------------------------
      Note: dy/dx for factor levels is the discrete change from the base level.

      Based on the above results, particularly the separate marginal effects, can I say that the effect of emotional IPV on ANC is statistically significant when the mother's education is primary or none, but non-significant for secondary education or above? Or, put simply, can I say that the effect of emotional IPV on ANC differs by mothers' education status even though the interaction term is non-significant (emotional_viol1#educ_mom, p = 0.299)?

      About the modelling (the use of the primary sampling unit, PSU): I am using the 2016 Ethiopian Demographic and Health Survey (EDHS) dataset. Most DHS surveys use a fixed take of households per cluster, which determines the number of clusters to be selected. In the first stage of selection, the primary sampling units (PSUs) are selected with probability proportional to size (PPS) within each stratum.

      Regarding the survey design: yes, I have a complex survey design that needs to be handled properly. As you suggested, I used the -svyset- command and then the -svy:- prefix with -melogit-:
      Command:
      svy: melogit anc_adequacy i.emotional_viol1##educ_mom i.age_catgorey i.husband_educ i.wealth_hh i.mediae_expo i.birth_order i.dma i.V102 i.contextual_regions || psu :,or nolog

      but it gave me the following error message:

      survey final weights not allowed with multilevel models;
      a final weight variable was svyset using the [pw=exp] syntax, but multilevel models require that each stage-level weight variable
      is svyset using the stage's corresponding weight() option
      an error occurred when svy executed melogit.

      So how can I fix this problem in order to get proper results? I am using Stata 16.
      Thank you so much in advance for your always supportive responses.



      • #4
        So let's deal with the svy: issue first. You need to consult the documentation that came with your survey data to find out what the sampling weights are at the level of the variable psu, so you can specify them in your -svyset- command as well. This assumes that psu is really an appropriate level in the model--which I am skeptical of because that is rarely the case. It is more likely that you should be using -svy: logit- rather than -xtmelogit … || psu:-. But check the survey documentation (or contact the curators of the survey).
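
        As a sketch of what the two setups might look like: the weight and strata variable names below (wt_psu, wt_woman, strata_id and wt_final) are hypothetical placeholders, not variables known to exist in this dataset, and the stage-level weights would have to come from the survey documentation.

        ```stata
        * Option 1: multilevel model; each stage gets its own weight() (hypothetical names)
        svyset psu, weight(wt_psu) strata(strata_id) || _n, weight(wt_woman)
        svy: melogit anc_adequacy i.emotional_viol1##i.educ_mom || psu:, or

        * Option 2: single-level model; one final sampling weight (hypothetical name)
        svyset psu [pweight = wt_final], strata(strata_id)
        svy: logit anc_adequacy i.emotional_viol1##i.educ_mom, or
        ```

        The other covariates from the original command are left out here only to keep the lines short.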

        I am puzzled by the outputs you are getting from -margins-, not for the reasons you are, but because Stata is giving you the results in the log-odds metric (see where it says it's predicting xb). The default output of -margins- should be the predicted probability. I think the reason you are getting that is because you are still using the incorrect name of the command. -xtmelogit- is no longer a current command name in Stata. If you read the -help xtmelogit- file you will see that that name is no longer in use and that command is now called -meqrlogit-. Now, even -meqrlogit- still has the predicted probability as the default output from -margins-. But if we go back a few versions, -xtmelogit- may have had -xb- as the default -margins- output. So I suspect your use of the obsolete command name is confusing Stata. Re-run it all using either -melogit- or -meqrlogit- as the command name. (They are two different algorithms for estimating the same model. There is no particular reason to prefer one over the other, but either may produce results when the other fails to converge.) Anyway, your results will be much more understandable in the probability metric and then we can discuss what conclusions to draw. But before doing even this, resolve the question of whether a multi-level model is even appropriate at all, rather than -svy: logit-. I don't think either of us should invest a lot of time and effort into interpreting results of a model that may well be inappropriate in the first place. So let's get that question settled first.



        • #5
          Dear Clyde Schechter,
          Thank you so much again for endless help,
          Regarding whether a multilevel model is appropriate for my data: the survey data are clustered. Individuals are nested within families, which are in turn nested within communities, which are in turn influenced by policies and a host of other factors. Methodologically, it is important to take this nested structure into account. This calls for multilevel modelling, which would calculate the standard errors more accurately and reduce the chance of misestimating the significance of variables, because some of the assumptions inherent in traditional regression methods are not valid for nested data.
          Also, I re-ran the model using both -melogit- and -meqrlogit- and got the same output:
          Command:
          meqrlogit anc_adequacy i.emotional_viol1##i.educ_mom i.age_catgorey i.husband_educ i.wealth_hh i.mediae_expo i.birth_order i.dma i.V102 i.contextual_regions || psu :,or nolog
          output:
          Mixed-effects logistic regression               Number of obs     =      2,863
          Group variable: psu                             Number of groups  =        618

                                                          Obs per group:
                                                                        min =          1
                                                                        avg =        4.6
                                                                        max =         12

          Integration points =   7                        Wald chi2(17)     =     263.01
          Log likelihood = -1571.0285                     Prob > chi2       =     0.0000
          ------------------------------------------------------------------------------------------------
                            anc_adequacy | Odds Ratio  Std. Err.        z   P>|z|     [95% Conf. Interval]
          -------------------------------+----------------------------------------------------------------
                         emotional_viol1 |
                                     Yes |   .7462136   .0959586    -2.28   0.023     .5799676    .9601134
                                         |
                                educ_mom |
                     Secondary and above |   1.529012    .306449     2.12   0.034     1.032311    2.264702
                                         |
                emotional_viol1#educ_mom |
                 Yes#Secondary and above |   1.547645   .6512417     1.04   0.299     .6784059    3.530636
                                         |
                            age_catgorey |

          Command:
          . margins educ_mom#emotional_viol1
          Predictive margins                              Number of obs     =      2,863

          Expression   : Linear prediction, fixed portion, predict(xb)
          ----------------------------------------------------------------------------------------------
                                       |            Delta-method
                                       |     Margin  Std. Err.        z   P>|z|     [95% Conf. Interval]
          -----------------------------+----------------------------------------------------------------
              educ_mom#emotional_viol1 |
            Primary or no education#No |  -.8091921    .074371   -10.88   0.000    -.9549565   -.6634277
           Primary or no education#Yes |  -1.101936   .1238117    -8.90   0.000    -1.344602    -.859269
                Secondary and above#No |  -.3845704   .1915795    -2.01   0.045    -.7600593   -.0090815
               Secondary and above#Yes |  -.2405796   .3801416    -0.63   0.527    -.9856433    .5044842

          Command:
          margins educ_mom, dydx(emotional_viol1)
          Average marginal effects                        Number of obs     =      2,863

          Expression   : Linear prediction, fixed portion, predict(xb)
          dy/dx w.r.t. : 1.emotional_viol1
          ------------------------------------------------------------------------------------------
                                   |            Delta-method
                                   |      dy/dx  Std. Err.        z   P>|z|     [95% Conf. Interval]
          -------------------------+----------------------------------------------------------------
                 0.emotional_viol1 |  (base outcome)
          -------------------------+----------------------------------------------------------------
                 1.emotional_viol1 |
                          educ_mom |
           Primary or no education |  -.2927434    .128594    -2.28   0.023     -.544783   -.0407039
               Secondary and above |   .1439908   .4018706     0.36   0.720     -.643661    .9316426
          ------------------------------------------------------------------------------------------
          Note: dy/dx for factor levels is the discrete change from the base level.

          Again, the -margins- output is in the xb metric rather than the default predicted probability, even after changing the command name. So how can we proceed to address my specific research questions and arrive at the right conclusion?

          Thank you so much in advance for your time and supportive responses.




          • #6
            This is very strange, and I don't know what to advise you here. On my setup (Windows 10, Stata 16.1 MP) when I run -margins categorical_var, dydx(continuous_var)- after -melogit-, I get predictions in the probability metric. If I use -meqrlogit- instead, then I get them in the -xb- metric as you do. (Also if I try to force the issue after -meqrlogit- by specifying -predict(mu)- in the -margins- command, Stata gives me an error message and refuses.) Either way, of course, I get the same regression results.

            Code:
            . webuse bangladesh, clear
            (Bangladesh Fertility Survey, 1989)
            
            . meqrlogit c_use i.urban##c.age || district:
            
            Refining starting values:
            
            Iteration 0:   log likelihood = -1262.9099 
            Iteration 1:   log likelihood = -1251.6486 
            Iteration 2:   log likelihood =  -1249.928 
            
            Performing gradient-based optimization:
            
            Iteration 0:   log likelihood =  -1249.928 
            Iteration 1:   log likelihood =  -1249.846 
            Iteration 2:   log likelihood = -1249.8458 
            Iteration 3:   log likelihood = -1249.8458 
            
            Mixed-effects logistic regression               Number of obs     =      1,934
            Group variable: district                        Number of groups  =         60
            
                                                            Obs per group:
                                                                          min =          2
                                                                          avg =       32.2
                                                                          max =        118
            
            Integration points =   7                        Wald chi2(3)      =      34.65
            Log likelihood = -1249.8458                     Prob > chi2       =     0.0000
            
            ------------------------------------------------------------------------------
                   c_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   urban |
                  urban  |   .6532636    .115637     5.65   0.000     .4266193    .8799079
                     age |    .011255   .0063792     1.76   0.078    -.0012481     .023758
                         |
             urban#c.age |
                  urban  |  -.0078829   .0119736    -0.66   0.510    -.0313508    .0155849
                         |
                   _cons |  -.7038013   .0855519    -8.23   0.000    -.8714799   -.5361227
            ------------------------------------------------------------------------------
            
            ------------------------------------------------------------------------------
              Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
            -----------------------------+------------------------------------------------
            district: Identity           |
                              var(_cons) |   .1939727   .0677661      .0978061     .384694
            ------------------------------------------------------------------------------
            LR test vs. logistic model: chibar2(01) = 38.65       Prob >= chibar2 = 0.0000
            
            . margins urban, dydx(age)
            
            Average marginal effects                        Number of obs     =      1,934
            
            Expression   : Linear prediction, fixed portion, predict(xb)
            dy/dx w.r.t. : age
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
            age          |
                   urban |
                  rural  |    .011255   .0063792     1.76   0.078    -.0012481     .023758
                  urban  |    .003372    .010144     0.33   0.740    -.0165099     .023254
            ------------------------------------------------------------------------------
            
            . melogit c_use i.urban##c.age || district:
            
            Fitting fixed-effects model:
            
            Iteration 0:   log likelihood = -1271.1294 
            Iteration 1:   log likelihood = -1269.1728 
            Iteration 2:   log likelihood = -1269.1719 
            Iteration 3:   log likelihood = -1269.1719 
            
            Refining starting values:
            
            Grid node 0:   log likelihood = -1262.9098
            
            Fitting full model:
            
            Iteration 0:   log likelihood = -1262.9098  (not concave)
            Iteration 1:   log likelihood = -1250.2798 
            Iteration 2:   log likelihood = -1249.8506 
            Iteration 3:   log likelihood = -1249.8458 
            Iteration 4:   log likelihood = -1249.8458 
            
            Mixed-effects logistic regression               Number of obs     =      1,934
            Group variable:        district                 Number of groups  =         60
            
                                                            Obs per group:
                                                                          min =          2
                                                                          avg =       32.2
                                                                          max =        118
            
            Integration method: mvaghermite                 Integration pts.  =          7
            
                                                            Wald chi2(3)      =      34.65
            Log likelihood = -1249.8458                     Prob > chi2       =     0.0000
            ------------------------------------------------------------------------------
                   c_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   urban |
                  urban  |   .6532637   .1156369     5.65   0.000     .4266194    .8799079
                     age |    .011255   .0063792     1.76   0.078    -.0012481     .023758
                         |
             urban#c.age |
                  urban  |  -.0078829   .0119736    -0.66   0.510    -.0313508    .0155849
                         |
                   _cons |  -.7038014   .0855518    -8.23   0.000    -.8714799   -.5361229
            -------------+----------------------------------------------------------------
            district     |
               var(_cons)|   .1939725   .0677651                      .0978069      .38469
            ------------------------------------------------------------------------------
            LR test vs. logistic model: chibar2(01) = 38.65       Prob >= chibar2 = 0.0000
            
            . margins urban, dydx(age)
            
            Average marginal effects                        Number of obs     =      1,934
            Model VCE    : OIM
            
            Expression   : Marginal predicted mean, predict()
            dy/dx w.r.t. : age
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
            age          |
                   urban |
                  rural  |   .0024118   .0013615     1.77   0.076    -.0002568    .0050804
                  urban  |    .000805   .0024204     0.33   0.739    -.0039389    .0055489
            ------------------------------------------------------------------------------
            It may be that -meqrlogit- does not support estimating the random effects, so that its postestimation commands can't work in the probability metric (mu). But -melogit- does, and as you can see, on my setup, that metric is the default after -melogit-.

            Be that as it may, working in the log odds ratio metric, you are seeing an effect in the negative direction among those with primary or low education. In those with secondary education and above, the direction appears to be positive, but the best estimate is only about half as large, and the standard error is three times as large. So basically you cannot draw any firm conclusions about those with secondary education and above. Moreover, back in the regression output, the interaction "odds ratio" (it's actually a ratio of odds ratios) is very large at 1.5, which would make me reluctant to abandon the interaction model in most circumstances. But the standard error is also enormous, so that it is really hard to even say whether the interaction model is needed or not. The combination of these says either that your outcome is very badly measured in the secondary and higher education group or that the number of people in this category is just too small. Either way your data just don't have enough information for you to say anything meaningful about the higher education group.
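
            Because these -margins- results are on the log-odds (xb) scale, the two marginal effects are simply the logs of the corresponding odds ratios. A small sketch of the conversion, using the numbers from the -margins, dydx()- output in post #5:

            ```stata
            * Log-odds differences from -margins, dydx()- converted to odds ratios
            * Primary or no education: about 0.746
            display exp(-.2927434)
            * Secondary and above: about 1.155
            display exp(.1439908)
            ```

            These agree with the odds ratios in the regression tables earlier in the thread. Exponentiating gives odds ratios, not differences in probability, so it does not substitute for -margins- output in the probability metric.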



            • #7
              Dear Clyde Schechter,
              Thank you so much again. You are right: when I re-run -margins- after -melogit-, I get predictions in what I think is the probability metric (the default after -melogit-). Here are the regression results:


              [Two output screenshots were attached here; the images are not available.]
              So, based on this output, can we say anything new about the need for the interaction model and the interpretation of the findings in general, in order to draw a conclusion about the secondary education and above subgroup?
              Also, regarding your final comment that 'the outcome is very badly measured in the secondary and above education subgroup or that the number of people in this category is just too small': I ran a chi-square (χ²) test of the relationship between the outcome variable and mother's education level and obtained the following output:
              [Output screenshot not available.]
              In general, my research interest is to assess whether IPV is associated with maternal health outcomes, and whether education level exacerbates any observed associations. Methodologically, the models will then be stratified by level of education to determine whether the observed associations are stronger for low education than for high education. The interaction model is needed only to test for an interaction between IPV and education. So, based on the information we have so far, I would greatly value your concluding advice on how to carry this data analysis through to a meaningful scientific result.
              Thank you so much in advance for your tireless effort in sharing your expertise.
              Best regards,

              Comment


              • #8
                Dear Clyde Schechter,
                Thank you so much again. You are right: when I re-run -margins- after -melogit-, I get predictions in the probability metric, I think (the default metric after -melogit-). Here are the regression results:
                use "C:\Users\SKO\Documents\ALL.doc\DHS\EDHS Data Pool\Working dataset\MasterFile\Master File\Working file births5Yrs\Extraction Process\Paper 3\IPV+MH\Work.dataFile\Extract 5\26.07.20.dta"
                Code:
                . melogit anc_adequacy i.emotional_viol1##i.educ_mom i.age_catgorey i.husband_educ i.wealth_hh i.mediae_expo i.birth_order i.dma i.V102 i.contextual_regions || psu:, or nolog

                Mixed-effects logistic regression               Number of obs     =      2,863
                Group variable: psu                             Number of groups  =        618

                                                                Obs per group:
                                                                              min =          1
                                                                              avg =        4.6
                                                                              max =         12

                Integration method: mvaghermite                 Integration pts.  =          7

                                                                Wald chi2(17)     =     263.01
                Log likelihood = -1571.0275                     Prob > chi2       =     0.0000
                -------------------------------------------------------------------------------------------------
                                  anc_adequacy | Odds Ratio   Std. Err.        z   P>|z|     [95% Conf. Interval]
                -------------------------------+-----------------------------------------------------------------
                               emotional_viol1 |
                                          Yes  |   .7462035    .0959571    -2.28   0.023       .57996       .9601
                                               |
                                      educ_mom |
                          Secondary and above  |   1.529016    .3064539     2.12   0.034     1.032309    2.264721
                                               |
                      emotional_viol1#educ_mom |
                      Yes#Secondary and above  |   1.547658    .6512486     1.04   0.299     .6784109    3.530672
                                               |
                Code:
                . margins educ_mom#emotional_viol1

                Predictive margins                              Number of obs     =      2,863
                Model VCE    : OIM

                Expression   : Marginal predicted mean, predict()

                -----------------------------------------------------------------------------------------------
                                             |            Delta-method
                                             |     Margin   Std. Err.        z   P>|z|     [95% Conf. Interval]
                -----------------------------+-----------------------------------------------------------------
                    educ_mom#emotional_viol1 |
                 Primary or no education#No  |   .3496119    .0124207    28.15   0.000     .3252678    .3739561
                Primary or no education#Yes  |    .301941    .0191592    15.76   0.000     .2643896    .3394923
                     Secondary and above#No  |    .423862     .034634    12.24   0.000     .3559806    .4917434
                    Secondary and above#Yes  |   .4500238    .0696351     6.46   0.000     .3135414    .5865062
                -----------------------------------------------------------------------------------------------
                Code:
                . margins educ_mom, dydx(emotional_viol1)

                Average marginal effects                        Number of obs     =      2,863
                Model VCE    : OIM

                Expression   : Marginal predicted mean, predict()
                dy/dx w.r.t. : 1.emotional_viol1

                -------------------------------------------------------------------------------------------
                                         |            Delta-method
                                         |      dy/dx   Std. Err.        z   P>|z|     [95% Conf. Interval]
                -------------------------+-----------------------------------------------------------------
                       0.emotional_viol1 |  (base outcome)
                -------------------------+-----------------------------------------------------------------
                       1.emotional_viol1 |
                                educ_mom |
                Primary or no education  |   -.047671    .0204596    -2.33   0.020    -.0877711   -.0075708
                    Secondary and above  |   .0261618    .0733369     0.36   0.721     -.117576    .1698996
                -------------------------------------------------------------------------------------------
                Note: dy/dx for factor levels is the discrete change from the base level.

                So, based on this output, can we say anything new about the need for the interaction model and the interpretation of the findings in general, so as to draw a conclusion about the secondary education and above subgroup?
                Also, regarding the final feedback that ‘the outcome is very badly measured in the secondary and above education subgroup or that the number of people in this category is just too small’, I ran the chi-square test to examine the relationship between the outcome variable and the mother's education level, and found the following output:

                Code:
                . svy: tab anc_adequacy educ_mom, count format(%4.0f)
                (running tabulate on estimation sample)

                Number of strata   =        25                 Number of obs     =       3,061
                Number of PSUs     =       626                 Population size   =  2,836.1442
                                                               Design df         =         601

                ----------------------------------------
                Adequate  |
                ANC       |
                Utilizati | Education level of the women
                on        |  Primary  Secondar     Total
                ----------+-----------------------------
                 Inadequa |     1877       103      1980
                 Adequate |      700       157       856
                          |
                    Total |     2577       259      2836
                ----------------------------------------
                  Key:  weighted count

                  Pearson:
                    Uncorrected   chi2(1)         =  133.7477
                    Design-based  F(1, 601)       =   57.1672     P = 0.0000

                In general, my research interest is to assess whether IPV is associated with maternal health outcomes, and whether education level exacerbates any observed associations. Methodologically, the models will then be stratified by level of education to determine whether the observed associations are stronger for low education than for high education. The interaction model is needed only to test for an interaction between IPV and education. So, based on the information we have so far, I would greatly appreciate your concluding advice on how to turn this analysis into a meaningful scientific result.
                Thank you so much in advance for your tireless effort to share your expertise.
                Best regards,

                Comment


                • #9
                  This is not my field of science, so the judgment of what is meaningful rests with you, or if you do not feel confident making such judgments, you need to consult literature or colleagues in this area.

                  From a statistical perspective (which is of some value in deciding what is meaningful, but is not enough by itself) we can say the following:

                  In the secondary or higher education group, the IPV variable is associated with an ANC probability that is approximately 3 percentage points higher; in the primary education group, it is associated with one that is about 5 percentage points lower. The 95% CI for the primary education group clearly excludes the result for the secondary or higher group, but not the other way around. Again, the confidence interval on the marginal effect of IPV in the secondary education group is very, very wide: the data simply don't tell us much about it.

                  So, it turns out we are left in almost the same position as before. But not exactly. In the primary education group, the effect of IPV is modest, but measured to fairly good precision. So we can say that this effect is modest, and negative. Whether roughly 5 percentage points is meaningful is a substantive, not a statistical, judgment that I leave to you.

                  In the secondary education group, your tabulation shows that you have only 259 observations, and that is certainly a good explanation of why the results for this subgroup are so vague. The width of that confidence interval for the marginal effect, from a 12 percentage point decrease to a 17 percentage point increase, is breathtaking! For this secondary education group, we really can say nothing more than that the data are inconclusive about the effect of IPV.
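
                  For anyone who wants to check where that interval comes from, the following is a rough sketch that rebuilds it from the estimate and standard error shown in the -margins, dydx()- output. It assumes the usual normal-based interval (estimate plus or minus about 1.96 standard errors), and the scalar names are just illustrative.

                  ```stata
                  * Marginal effect of IPV and its standard error, secondary and above group
                  scalar me_secondary = .0261618
                  scalar se_secondary = .0733369
                  * Normal-based 95% limits
                  display me_secondary - invnormal(.975)*se_secondary
                  display me_secondary + invnormal(.975)*se_secondary
                  * Full width of the interval, in probability units
                  display 2*invnormal(.975)*se_secondary
                  ```

                  The limits should agree with the reported -.117576 and .1698996 up to rounding, a span of nearly 29 percentage points.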

                  Given the pretty large discrepancy in marginal effect of IPV between the two education groups, a bit more than 7 percentage points in an outcome that averages around 35-45% overall, I would say that the interaction model is probably needed. That is also given some support by the fact that the marginal effect in the secondary education group lies far outside the confidence interval for the primary group. Again, though, you need to decide what is large enough to be meaningful in a substantive way: if a 7 percentage point difference in marginal effect is of no practical importance, then the statistics don't matter--it's just not meaningful.
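
                  The arithmetic behind that "bit more than 7 percentage points" can be sketched as follows, using only numbers from the two -margins- outputs above. It is a point estimate only and says nothing about the uncertainty of the difference.

                  ```stata
                  * Difference between the two marginal effects of IPV (secondary minus primary)
                  display .0261618 - (-.047671)
                  * The same quantity from the four predictive margins:
                  * (Yes minus No in the secondary group) minus (Yes minus No in the primary group)
                  display (.4500238 - .423862) - (.301941 - .3496119)
                  ```

                  Both lines should give roughly .074, which also shows that each marginal effect is just the Yes-minus-No difference of the predictive margins within an education group.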

                  Another approach you might consider, if the inconclusive nature of the secondary education results leaves you uncomfortable, is to simply drop the data on secondary or higher education and redo the analysis with only the primary education group. You can then report the results of that and add a statement that your attempt to gather data on people with secondary education or higher yielded too few people for a meaningful analysis. You would also have to then acknowledge that any conclusions you draw apply only to those with primary education or less.
                  Last edited by Clyde Schechter; 24 Aug 2020, 12:25.

                  Comment


                  • #10
                    Dear Clyde Schechter,
                    I want to thank you for taking the time to answer my questions. I am sure that you are busy, and so I greatly appreciate your personal response. Thank you so much again.
                    Best,

                    Comment


                    • #11
                      Dear Clyde Schechter,
                      Sorry, I still don't understand what is meant by: "The 95% CI for the primary education group clearly excludes the result for the secondary or higher group, but not the other way around." Could you please explain what it means based on the outputs? Thank you so much in advance for your patience.
                      Best,

                      Comment


                      • #12
                        So, the 95% CI for the primary education group is -.088 to -.008 (rounded to three decimal places), and the result for the secondary group is +.026. .026 is clearly not between -.088 and -.008.

                        But the 95% CI for the secondary education group is -.118 to +.170, and the result for the primary group is -.048. And -.048 does fall between -.118 and +.170.
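
                        If it helps, the same two checks can be done in Stata with the unrounded numbers from the -margins, dydx()- output. This is only an illustration of the comparison, not a formal test.

                        ```stata
                        * inrange(z, a, b) returns 1 if a <= z <= b, and 0 otherwise
                        * Is the secondary group's estimate inside the primary group's 95% CI?
                        display inrange(.0261618, -.0877711, -.0075708)
                        * Is the primary group's estimate inside the secondary group's 95% CI?
                        display inrange(-.047671, -.117576, .1698996)
                        ```

                        The first line displays 0 (not inside) and the second displays 1 (inside).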

                        Comment


                        • #13
                          Dear Clyde Schechter,
                          I greatly appreciate your patience in responding.

                          Comment
