Main effects significance and interaction terms using logistic regression in Stata 16

SEMAN OSMAN

Join Date: Jul 2016

Posts: 29
#1

19 Aug 2020, 07:26

Dear all users,
I am interested in examining whether the effect of emotional intimate partner violence (IPV) on antenatal care (ANC) visits differs by women's educational status.

Outcome variable: adequate ANC service (1) or inadequate ANC service (0)
Main exposure: spousal emotional IPV (Yes/No, coded 1/0)
Moderator: education status, coded lower education (1) or higher education (2)
Hypothesis 1:
The effect of emotional IPV on adequate ANC services will be moderated by education and wealth.

I fitted the model using the following command:
xtmelogit anc_adequacy i.emotional_viol1##1.educ_mom i.age_catgorey i.husband_educ i.wealth_hh i.mediae_expo i.dma i.birth_order i.V102 i.contextual_regions || psu:, or nolog
Output:
Mixed-effects logistic regression               Number of obs     =      2,863
Group variable: psu                             Number of groups  =        618

                                                Obs per group:
                                                              min =          1
                                                              avg =        4.6
                                                              max =         12

Integration points =   7                        Wald chi2(17)     =     263.01
Log likelihood = -1571.0285                     Prob > chi2       =     0.0000
------------------------------------------------------------------------------------------------
                  anc_adequacy | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------------------+----------------------------------------------------------------
               emotional_viol1 |
                          Yes  |   1.154874   .4641096     0.36   0.720      .525366    2.538676
                               |
                      educ_mom |
      Primary or no education  |   .6540172     .13108    -2.12   0.034     .4415592    .9687003
                               |
      emotional_viol1#educ_mom |
  Yes#Primary or no education  |   .6461428   .2718937    -1.04   0.299     .2832351    1.474042

Note: results of all other covariates excluded

So, after adjusting for all covariates, this model shows no statistically significant main effect of emotional violence on ANC (p = 0.720) and no statistically significant interaction between emotional IPV and low education (p = 0.299). But a model with the same variables and no interaction term, fitted only to women with low education (educ_mom==1), found a statistically significant effect of emotional IPV on ANC visits (p = 0.021). Here are the command and the results from Stata 16:
. xtmelogit anc_adequacy emotional_viol1 i.age_catgorey i.husband_educ i.wealth_hh i.mediae_expo i.birth_order i.dma i.V102 i.contextual_regions if educ_mom==1 || psu:, or nolog
Mixed-effects logistic regression               Number of obs     =      2,548
Group variable: psu                             Number of groups  =        580

                                                Obs per group:
                                                              min =          1
                                                              avg =        4.4
                                                              max =         12

Integration points =   7                        Wald chi2(15)     =     160.53
Log likelihood = -1384.4623                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------------------------
                  anc_adequacy | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------------------+----------------------------------------------------------------
               emotional_viol1 |   .7387078   .0969917    -2.31   0.021      .571098    .9555089
                               |
                  age_catgorey |
                      25 – 34  |   1.433873   .2368658     2.18   0.029     1.037285    1.982089
                      35 – 49  |   1.446263   .3013516     1.77   0.077     .9613604    2.175748

My question is: why do the results of these two models differ? Have I made any mistakes in the commands used to fit the models? If not, and both the main effect and the interaction term are non-significant, why does the stratified analysis give a significant effect of emotional IPV (p = 0.021)? And why does the number of participants differ between the two models (2,863 vs. 2,548)?
Thank you so much in advance for your quick responses.
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#2

19 Aug 2020, 10:46

The main problem is that you don't understand how interaction models work, and you don't understand the meaning of statistical significance either. I also have some questions about the appropriateness of the modeling here, but I'll defer that for later. Let's clear up the misunderstandings of what you've done first.

In an interaction model, by definition, there is no such thing as the effect of IPV on ANC adequacy. Rather, there are two different effects of IPV on ANC: one when educ_mom = 1 and another when educ_mom = 0.

In the regression output the coefficient (odds ratio) of emotional_viol1 (which I'm guessing is the variable name corresponding to IPV) represents the effect of emotional_viol1 on ANC when educ_mom = 0. The coefficient of emotional_viol1#educ_mom represents the difference (actually, ratio of odds ratios) between the effect of emotional_viol1 on ANC when educ_mom = 1 compared to the effect when educ_mom = 0. The effect (odds ratio) of emotional_viol1 on ANC when educ_mom = 1, itself, does not appear in the output and must be calculated separately. Rather than doing that calculation, it is easier to follow up the interaction model with the -margins- command:

Code:

margins educ_mom#emotional_viol1
margins educ_mom, dydx(emotional_viol1)

gives you results that are easier to understand. The output from the first of those -margins- commands will be the predicted probability of ANC in each of the four combinations of values of educ_mom and emotional_viol1. The second one gives you the separate marginal effects (as a difference in probabilities, not in the odds ratio metric) of emotional_viol1 on ANC for each of the values of educ_mom. This will not only be easier for you to understand, it will also be much easier to explain to anyone who needs to see your results.
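
As a rough check on the "calculated separately" step described above: the coefficients add on the log-odds scale, so the odds ratio of emotional_viol1 in the group where the education indicator equals 1 is the product of the main-effect odds ratio and the interaction term. The sketch below uses Python purely as a calculator on the two odds ratios posted in #1; the variable names are made up for illustration.

```python
# Odds ratios copied from the interaction model output posted in #1.
or_ipv_ref = 1.154874        # emotional_viol1 = Yes, education indicator = 0
ror_interaction = 0.6461428  # Yes # Primary or no education (ratio of odds ratios)

# Coefficients add on the log-odds scale, so the odds ratios multiply.
or_ipv_low_educ = or_ipv_ref * ror_interaction

print(f"OR of emotional IPV, primary or no education: {or_ipv_low_educ:.4f}")
```

The product, about 0.746, agrees with the odds ratio reported for emotional_viol1 in #3, where the education reference category is switched.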

In your non-interaction model, the coefficient (odds ratio) of emotional_viol1 represents an overall effect of emotional_viol1 on ANC that does not take educ_mom into account. You might think of it as an average effect, in a loose sort of way. It is important to note that this average effect draws on the entire estimation sample, whereas the separate effects calculated from the interaction model are each relying primarily on a subset of the sample (of which at least one is half the sample size or less). So it is not appropriate to ponder the "significance" of one in comparison to the "significance" of the other. It is appropriate to compare the odds ratios themselves directly if you like, without regard to p-values.
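
For a sense of how uneven the subsets are in this case, the two "Number of obs" values posted in #1 can be compared. This is only arithmetic on the posted output (Python as a calculator), and it assumes that the `if educ_mom==1` qualifier on the second command in #1 is the only reason the two estimation samples differ.

```python
n_full = 2863      # Number of obs, interaction model in #1
n_low_educ = 2548  # Number of obs, model run with `if educ_mom==1` in #1

# Observations in the full estimation sample that are not in the educ_mom==1 subset
n_other = n_full - n_low_educ
share_other = n_other / n_full

print(n_other, f"{share_other:.1%}")
```

Under that assumption only about 11% of the estimation sample lies outside the educ_mom==1 group, so any effect estimated for that smaller group rests on far fewer observations.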

Now, as for statistical significance, a non-statistically-significant result does not mean that there is no effect. It means only that the combination of the actual effect, the sample size, and the noisiness of the data are such that you cannot estimate the effect precisely enough to determine whether it is positive, negative or zero to the desired degree of confidence. Even when properly interpreted, however, the concept of statistical significance is a slippery one, and it does not obey the usual laws of logic, leading people to waste time puzzling over paradoxical changes in the "significance" of results when the modeling is slightly changed. Few people remember that the difference between statistically significant and not statistically significant is not, itself, statistically significant. These are among the reasons that the American Statistical Association recommends that the concept of statistical significance be abandoned. See https://www.tandfonline.com/doi/full...5.2019.1583913 for the "executive summary" and https://www.tandfonline.com/toc/utas20/73/sup1 for all 43 supporting articles. Or https://www.nature.com/articles/d41586-019-00857-9 for the tl;dr.
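
The point that a difference between "significant" and "not significant" need not itself be significant can be illustrated with the numbers posted in #1. The sketch below (Python, with a hypothetical helper `wald_from_or`) recomputes the Wald z and two-sided p-value from each odds ratio and its standard error, assuming the standard errors are on the odds ratio scale via the delta method, so that SE(log OR) = SE(OR) / OR.

```python
import math

def wald_from_or(odds_ratio, se_or):
    """Wald z and two-sided p for H0: OR = 1, from an odds ratio and its standard error."""
    se_log = se_or / odds_ratio            # delta method: SE(log OR) = SE(OR) / OR
    z = math.log(odds_ratio) / se_log
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area
    return z, p

# (odds ratio, std. err.) rows copied from the output posted in #1
rows = {
    "IPV, low education (stratified model)": (0.7387078, 0.0969917),
    "IPV, education indicator = 0": (1.154874, 0.4641096),
    "interaction (ratio of odds ratios)": (0.6461428, 0.2718937),
}
results = {name: wald_from_or(*vals) for name, vals in rows.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.3f}")
```

The two subgroup p-values fall on opposite sides of 0.05, yet the test of the difference between the subgroups (the interaction term) gives a p-value of about 0.30.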

Now let's turn to some questions I have about your modeling. You do not explain the variable psu: which you use as a second level in your model. But that name is suggestive of a primary sampling unit in a survey design. In fact, I cannot recall ever seeing that name used in any other context, though perhaps this is the first time. Now, it may be appropriate to treat the primary sampling unit of a survey as a level in the model--but often it is not. That depends on how the psu's were arrived at. If the psu's are, say, households, and the outcome variable is strongly SES dependent, then, yes, we would expect intra-household correlation, and use of psu as a level would make sense. But if the psu's are telephone exchanges (as in a random digit dialing survey), then for most outcomes, there is little reason to think people with the same phone exchange will be more similar in their outcomes than those from other phone exchanges. So the use of the psu as a second level would not make sense. So you need to think about this.

The other thing I will say is that if, as I suspect, you have a complex survey design here, then you need to use that design properly. That means using the -svyset- command to set out sampling weights, primary (and possibly higher order) sampling units, and stratification in order to get proper results. And then you have to use the -svy:- prefix with your -melogit- command. (-xtmelogit- is an old name; use -melogit- unless you are working on a pretty old version of Stata, like 13 or earlier.)
SEMAN OSMAN

#3

20 Aug 2020, 05:31

Dear Clyde Schechter,
Thank you so much for your help. It is very useful and informative, particularly in clearing up my confusion about how interaction models work and about p-values, and the links on statistical significance are helpful too.
I have fitted the model again based on your feedback:
Command:
xtmelogit anc_adequacy i.emotional_viol1##educ_mom i.age_catgorey i.husband_educ i.wealth_hh i.mediae_expo i.birth_order i.dma i.V102 i.contextual_regions || psu:, or nolog

Output:
Mixed-effects logistic regression               Number of obs     =      2,863
Group variable: psu                             Number of groups  =        618

                                                Obs per group:
                                                              min =          1
                                                              avg =        4.6
                                                              max =         12

Integration points =   7                        Wald chi2(17)     =     263.01
Log likelihood = -1571.0285                     Prob > chi2       =     0.0000
------------------------------------------------------------------------------------------------
                  anc_adequacy | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------------------+----------------------------------------------------------------
               emotional_viol1 |
                          Yes  |   .7462136   .0959586    -2.28   0.023     .5799676    .9601134
                               |
                      educ_mom |
          Secondary and above  |   1.529012    .306449     2.12   0.034     1.032311    2.264702
                               |
      emotional_viol1#educ_mom |
      Yes#Secondary and above  |   1.547645   .6512417     1.04   0.299     .6784059    3.530636

Here, the results for the other covariates are omitted for simplicity. I also ran the -margins- commands after the interaction model, with the following results:

Command: margins educ_mom#emotional_viol1

Output:
Predictive margins                              Number of obs     =      2,863

Expression   : Linear prediction, fixed portion, predict(xb)
----------------------------------------------------------------------------------------------
                             |            Delta-method
                             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
    educ_mom#emotional_viol1 |
 Primary or no education#No  |  -.8091921    .074371   -10.88   0.000    -.9549565   -.6634277
Primary or no education#Yes  |  -1.101936   .1238117    -8.90   0.000    -1.344602    -.859269
     Secondary and above#No  |  -.3845704   .1915795    -2.01   0.045    -.7600593   -.0090815
    Secondary and above#Yes  |  -.2405796   .3801416    -0.63   0.527    -.9856433    .5044842
----------------------------------------------------------------------------------------------

Command: margins educ_mom, dydx(emotional_viol1)

Output:
Average marginal effects                        Number of obs     =      2,863

Expression   : Linear prediction, fixed portion, predict(xb)
dy/dx w.r.t. : 1.emotional_viol1
------------------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
0.emotional_viol1        |  (base outcome)
-------------------------+----------------------------------------------------------------
1.emotional_viol1        |
                educ_mom |
Primary or no education  |  -.2927434    .128594    -2.28   0.023     -.544783   -.0407039
    Secondary and above  |   .1439908   .4018706     0.36   0.720     -.643661    .9316426
------------------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

Based on the above results, particularly the separate marginal effects, can I say that the effect of emotional IPV on ANC is statistically significant when the mother's education is primary or none, but non-significant for secondary or above? Or, put simply, can I say that the effect of emotional IPV on ANC differs by mothers' education status even though the interaction term is not significant (emotional_viol1#educ_mom, p = 0.299)?

About the modelling (the use of the primary sampling unit, PSU): I am using the 2016 Ethiopian Demographic and Health Survey (EDHS) dataset. Most DHS surveys use a fixed take of households per cluster, which determines the number of clusters to be selected. In the first stage of selection, the primary sampling units (PSUs) are selected with probability proportional to size (PPS) within each stratum.

Regarding the survey design: yes, I have a complex survey design and need to account for it properly. As you suggested, I used -svyset- and then tried the -svy:- prefix with the -melogit- command:
Command:
svy: melogit anc_adequacy i.emotional_viol1##educ_mom i.age_catgorey i.husband_educ i.wealth_hh i.mediae_expo i.birth_order i.dma i.V102 i.contextual_regions || psu:, or nolog

but it gave me the following error message:

survey final weights not allowed with multilevel models;
a final weight variable was svyset using the [pw=exp] syntax, but multilevel models require that each stage-level weight variable
is svyset using the stage's corresponding weight() option
an error occurred when svy executed melogit.

So, how can I fix this problem in order to get proper results? I am using Stata 16.
Thank you so much in advance for your supportive responses.
Clyde Schechter

#4

20 Aug 2020, 12:03

So let's deal with the svy: issue first. You need to consult the documentation that came with your survey data to find out what the sampling weights are at the level of the variable psu, so you can specify them in your -svyset- command as well. This assumes that psu is really an appropriate level in the model--which I am skeptical of because that is rarely the case. It is more likely that you should be using -svy: logit- rather than -xtmelogit … || psu:-. But check the survey documentation (or contact the curators of the survey).

I am puzzled by the outputs you are getting from -margins-, not for the reasons you are, but because Stata is giving you the results in the log-odds metric (see where it says it's predicting xb). The default output of -margins- should be the predicted probability. I think the reason you are getting that is because you are still using the incorrect name of the command. -xtmelogit- is no longer a current command name in Stata. If you read the -help xtmelogit- file you will see that that name is no longer in use and that command is now called -meqrlogit-. Now, even -meqrlogit- still has the predicted probability as the default output from -margins-. But if we go back a few versions, -xtmelogit- may have had -xb- as the default -margins- output. So I suspect your use of the obsolete command name is confusing Stata. Re-run it all using either -melogit- or -meqrlogit- as the command name. (They are two different algorithms for estimating the same model. There is no particular reason to prefer one over the other, but either may produce results when the other fails to converge.) Anyway, your results will be much more understandable in the probability metric and then we can discuss what conclusions to draw. But before doing even this, resolve the question of whether a multi-level model is even appropriate at all, rather than -svy: logit-. I don't think either of us should invest a lot of time and effort into interpreting results of a model that may well be inappropriate in the first place. So let's get that question settled first.
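
One way to read the xb-metric output posted in #3: differences between linear predictions are log odds ratios, so exponentiating them should reproduce the odds ratios from the regression table. The sketch below uses Python only as a calculator; the dictionary keys are illustrative labels, not Stata names.

```python
import math

# Linear predictions (log odds) from `margins educ_mom#emotional_viol1` in #3
xb = {
    ("primary_or_none", "no"): -0.8091921,
    ("primary_or_none", "yes"): -1.101936,
    ("secondary_plus", "no"): -0.3845704,
    ("secondary_plus", "yes"): -0.2405796,
}

log_or = {}
odds_ratio = {}
for educ in ("primary_or_none", "secondary_plus"):
    # Yes minus No within an education level is the log odds ratio for IPV
    log_or[educ] = xb[(educ, "yes")] - xb[(educ, "no")]
    odds_ratio[educ] = math.exp(log_or[educ])
    print(educ, round(log_or[educ], 4), round(odds_ratio[educ], 4))
```

The two differences match the dy/dx rows posted in #3, and the exponentiated values (about 0.746 and 1.155) match the emotional_viol1 odds ratios posted in #3 and #1 respectively, which suggests that the xb-metric margins add little beyond the regression table itself.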
SEMAN OSMAN

#5

21 Aug 2020, 03:07

Dear Clyde Schechter,
Thank you so much again for your endless help.
Regarding whether a multilevel model is appropriate for my data: the data are clustered at the survey level. Individuals are nested within families, which are in turn nested within communities, which are in turn influenced by policies and a host of other factors. Methodologically, it is important to take this nested structure into account. This calls for multilevel modelling, which calculates the standard errors more accurately and reduces the chance of misestimating the significance of variables, because some of the assumptions inherent in traditional regression methods are not valid for nested data.
Also, I re-ran the model using both the -melogit- and -meqrlogit- commands and found the same output:
Command:
meqrlogit anc_adequacy i.emotional_viol1##i.educ_mom i.age_catgorey i.husband_educ i.wealth_hh i.mediae_expo i.birth_order i.dma i.V102 i.contextual_regions || psu:, or nolog
Output:
Mixed-effects logistic regression               Number of obs     =      2,863
Group variable: psu                             Number of groups  =        618

                                                Obs per group:
                                                              min =          1
                                                              avg =        4.6
                                                              max =         12

Integration points =   7                        Wald chi2(17)     =     263.01
Log likelihood = -1571.0285                     Prob > chi2       =     0.0000
------------------------------------------------------------------------------------------------
                  anc_adequacy | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------------------+----------------------------------------------------------------
               emotional_viol1 |
                          Yes  |   .7462136   .0959586    -2.28   0.023     .5799676    .9601134
                               |
                      educ_mom |
          Secondary and above  |   1.529012    .306449     2.12   0.034     1.032311    2.264702
                               |
      emotional_viol1#educ_mom |
      Yes#Secondary and above  |   1.547645   .6512417     1.04   0.299     .6784059    3.530636
                               |
                  age_catgorey |

Command:
. margins educ_mom#emotional_viol1

Predictive margins                              Number of obs     =      2,863

Expression   : Linear prediction, fixed portion, predict(xb)
----------------------------------------------------------------------------------------------
                             |            Delta-method
                             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
    educ_mom#emotional_viol1 |
 Primary or no education#No  |  -.8091921    .074371   -10.88   0.000    -.9549565   -.6634277
Primary or no education#Yes  |  -1.101936   .1238117    -8.90   0.000    -1.344602    -.859269
     Secondary and above#No  |  -.3845704   .1915795    -2.01   0.045    -.7600593   -.0090815
    Secondary and above#Yes  |  -.2405796   .3801416    -0.63   0.527    -.9856433    .5044842

Command:
margins educ_mom, dydx(emotional_viol1)

Average marginal effects                        Number of obs     =      2,863

Expression   : Linear prediction, fixed portion, predict(xb)
dy/dx w.r.t. : 1.emotional_viol1
------------------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
0.emotional_viol1        |  (base outcome)
-------------------------+----------------------------------------------------------------
1.emotional_viol1        |
                educ_mom |
Primary or no education  |  -.2927434    .128594    -2.28   0.023     -.544783   -.0407039
    Secondary and above  |   .1439908   .4018706     0.36   0.720     -.643661    .9316426
------------------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

Again, we can see that -margins- reports the linear prediction (xb) instead of the default predicted probability, even after changing the command name. So, how can we proceed to address my specific research questions and arrive at the right conclusion?

Thank you so much in advance for your time and supportive responses.

Clyde Schechter

#6

21 Aug 2020, 12:19

This is very strange, and I don't know what to advise you here. On my setup (Windows 10, Stata 16.1 MP) when I run -margins categorical_var, dydx(continuous_var)- after -melogit-, I get predictions in the probability metric. If I use -meqrlogit- instead, then I get them in the -xb- metric as you do. (Also if I try to force the issue after -meqrlogit- by specifying -predict(mu)- in the -margins- command, Stata gives me an error message and refuses.) Either way, of course, I get the same regression results.

Code:

. webuse bangladesh, clear
(Bangladesh Fertility Survey, 1989)

. meqrlogit c_use i.urban##c.age || district:

Refining starting values:

Iteration 0:   log likelihood = -1262.9099 
Iteration 1:   log likelihood = -1251.6486 
Iteration 2:   log likelihood =  -1249.928 

Performing gradient-based optimization:

Iteration 0:   log likelihood =  -1249.928 
Iteration 1:   log likelihood =  -1249.846 
Iteration 2:   log likelihood = -1249.8458 
Iteration 3:   log likelihood = -1249.8458 

Mixed-effects logistic regression               Number of obs     =      1,934
Group variable: district                        Number of groups  =         60

                                                Obs per group:
                                                              min =          2
                                                              avg =       32.2
                                                              max =        118

Integration points =   7                        Wald chi2(3)      =      34.65
Log likelihood = -1249.8458                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
       c_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       urban |
      urban  |   .6532636    .115637     5.65   0.000     .4266193    .8799079
         age |    .011255   .0063792     1.76   0.078    -.0012481     .023758
             |
 urban#c.age |
      urban  |  -.0078829   .0119736    -0.66   0.510    -.0313508    .0155849
             |
       _cons |  -.7038013   .0855519    -8.23   0.000    -.8714799   -.5361227
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
district: Identity           |
                  var(_cons) |   .1939727   .0677661      .0978061     .384694
------------------------------------------------------------------------------
LR test vs. logistic model: chibar2(01) = 38.65       Prob >= chibar2 = 0.0000

. margins urban, dydx(age)

Average marginal effects                        Number of obs     =      1,934

Expression   : Linear prediction, fixed portion, predict(xb)
dy/dx w.r.t. : age

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
age          |
       urban |
      rural  |    .011255   .0063792     1.76   0.078    -.0012481     .023758
      urban  |    .003372    .010144     0.33   0.740    -.0165099     .023254
------------------------------------------------------------------------------

. melogit c_use i.urban##c.age || district:

Fitting fixed-effects model:

Iteration 0:   log likelihood = -1271.1294 
Iteration 1:   log likelihood = -1269.1728 
Iteration 2:   log likelihood = -1269.1719 
Iteration 3:   log likelihood = -1269.1719 

Refining starting values:

Grid node 0:   log likelihood = -1262.9098

Fitting full model:

Iteration 0:   log likelihood = -1262.9098  (not concave)
Iteration 1:   log likelihood = -1250.2798 
Iteration 2:   log likelihood = -1249.8506 
Iteration 3:   log likelihood = -1249.8458 
Iteration 4:   log likelihood = -1249.8458 

Mixed-effects logistic regression               Number of obs     =      1,934
Group variable:        district                 Number of groups  =         60

                                                Obs per group:
                                                              min =          2
                                                              avg =       32.2
                                                              max =        118

Integration method: mvaghermite                 Integration pts.  =          7

                                                Wald chi2(3)      =      34.65
Log likelihood = -1249.8458                     Prob > chi2       =     0.0000
------------------------------------------------------------------------------
       c_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       urban |
      urban  |   .6532637   .1156369     5.65   0.000     .4266194    .8799079
         age |    .011255   .0063792     1.76   0.078    -.0012481     .023758
             |
 urban#c.age |
      urban  |  -.0078829   .0119736    -0.66   0.510    -.0313508    .0155849
             |
       _cons |  -.7038014   .0855518    -8.23   0.000    -.8714799   -.5361229
-------------+----------------------------------------------------------------
district     |
   var(_cons)|   .1939725   .0677651                      .0978069      .38469
------------------------------------------------------------------------------
LR test vs. logistic model: chibar2(01) = 38.65       Prob >= chibar2 = 0.0000

. margins urban, dydx(age)

Average marginal effects                        Number of obs     =      1,934
Model VCE    : OIM

Expression   : Marginal predicted mean, predict()
dy/dx w.r.t. : age

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
age          |
       urban |
      rural  |   .0024118   .0013615     1.77   0.076    -.0002568    .0050804
      urban  |    .000805   .0024204     0.33   0.739    -.0039389    .0055489
------------------------------------------------------------------------------

It may be that -meqrlogit- does not support estimating the random effects, so that its postestimation commands can't work in the probability metric (mu). But -melogit- does, and as you can see, on my setup, that metric is the default after -melogit-.

Be that as it may, working in the log odds ratio metric, you are seeing an effect in the negative direction among those with primary or no education. In those with secondary education and above, the direction appears to be positive, but the best estimate is only about half as large, and the standard error is three times as large. So basically you cannot draw any firm conclusions about those with secondary education and above. Moreover, back in the regression output, the interaction "odds ratio" (it's actually a ratio of odds ratios) is very large at 1.5, which would make me reluctant to abandon the interaction model in most circumstances. But the standard error is also enormous, so that it is really hard to even say whether the interaction model is needed or not. The combination of these says either that your outcome is very badly measured in the secondary and higher education group or that the number of people in this category is just too small. Either way your data just don't have enough information for you to say anything meaningful about the higher education group.
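
The "half as large" and "three times as large" comparison can be checked directly from the dy/dx rows posted in #5. The sketch below is plain arithmetic in Python; the 1.959964 multiplier is the usual normal critical value for a 95% interval, and the variable names are illustrative.

```python
import math

# dy/dx rows (log odds ratios) from the margins output in #5: (estimate, std. err.)
low = (-0.2927434, 0.128594)    # primary or no education
high = (0.1439908, 0.4018706)   # secondary and above

size_ratio = abs(high[0]) / abs(low[0])   # relative magnitude of the estimates
se_ratio = high[1] / low[1]               # relative size of the standard errors

# 95% confidence interval for the secondary-and-above effect, on the odds ratio scale
ci = tuple(math.exp(high[0] + sign * 1.959964 * high[1]) for sign in (-1, 1))

print(round(size_ratio, 2), round(se_ratio, 2), [round(x, 3) for x in ci])
```

On the odds ratio scale the interval for the secondary-and-above group runs from roughly 0.53 to 2.54, which is consistent with the point that no firm conclusion can be drawn for that group.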

SEMAN OSMAN

#8

24 Aug 2020, 02:14

Dear Clyde Schechter,
Thank you so much again. You are right: when I re-run -margins- after -melogit-, I get predictions in what I think is the probability metric (the default after -melogit-). Here are the regression results:
use "C:\Users\SKO\Documents\ALL.doc\DHS\EDHS Data Pool\Working dataset\MasterFile\Master File\Working file births5Yrs\Extraction Process\Paper 3\IPV+MH\Work.dataFile\Extract 5\26.07.20.dta"
Code:
. melogit anc_adequacy i.emotional_viol1##i.educ_mom i.age_catgorey i.husband_educ i.wealth_hh i.mediae_expo i.birth_order i.dma i.V102 i.contextual_regions || psu:, or nolog

Mixed-effects logistic regression               Number of obs     =      2,863
Group variable:             psu                 Number of groups  =        618

                                                Obs per group:
                                                              min =          1
                                                              avg =        4.6
                                                              max =         12

Integration method: mvaghermite                 Integration pts.  =          7

                                                Wald chi2(17)     =     263.01
Log likelihood = -1571.0275                     Prob > chi2       =     0.0000
------------------------------------------------------------------------------------------------
                  anc_adequacy | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------------------+----------------------------------------------------------------
               emotional_viol1 |
                          Yes  |   .7462035   .0959571    -2.28   0.023       .57996       .9601
                               |
                      educ_mom |
          Secondary and above  |   1.529016   .3064539     2.12   0.034     1.032309    2.264721
                               |
      emotional_viol1#educ_mom |
      Yes#Secondary and above  |   1.547658   .6512486     1.04   0.299     .6784109    3.530672
                               |
Code:
. margins educ_mom#emotional_viol1

Predictive margins                              Number of obs     =      2,863
Model VCE    : OIM

Expression   : Marginal predicted mean, predict()
----------------------------------------------------------------------------------------------
                             |            Delta-method
                             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
    educ_mom#emotional_viol1 |
 Primary or no education#No  |   .3496119   .0124207    28.15   0.000     .3252678    .3739561
Primary or no education#Yes  |    .301941   .0191592    15.76   0.000     .2643896    .3394923
     Secondary and above#No  |    .423862    .034634    12.24   0.000     .3559806    .4917434
    Secondary and above#Yes  |   .4500238   .0696351     6.46   0.000     .3135414    .5865062
----------------------------------------------------------------------------------------------
Code:
. margins educ_mom, dydx(emotional_viol1)

Average marginal effects                        Number of obs     =      2,863
Model VCE    : OIM

Expression   : Marginal predicted mean, predict()
dy/dx w.r.t. : 1.emotional_viol1
------------------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
0.emotional_viol1        |  (base outcome)
-------------------------+----------------------------------------------------------------
1.emotional_viol1        |
                educ_mom |
Primary or no education  |   -.047671   .0204596    -2.33   0.020    -.0877711   -.0075708
    Secondary and above  |   .0261618   .0733369     0.36   0.721     -.117576    .1698996
------------------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

So, based on this output, can we say anything new about whether the interaction model is needed, and about how to interpret the findings in general and draw a conclusion for the secondary education and above subgroup?
Also, regarding the final feedback that ‘the outcome is very badly measured in the secondary and above education subgroup or that the number of people in this category is just too small’: I ran a chi-square (χ²) test of the relationship between the outcome variable and the mother's education level and obtained the following output:

Code:
svy: tab anc_adequacy educ_mom , count format (%4.0f)
(running tabulate on estimation sample)
Number of strata = 25 Number of obs = 3,061
Number of PSUs = 626 Population size = 2,836.1442
Design df = 601
----------------------------------------
Adequate |
ANC |
Utilizati | Education level of the women
on | Primary Secondar Total
----------+-----------------------------
Inadequa | 1877 103 1980
Adequate | 700 157 856
|
Total | 2577 259 2836
----------------------------------------
Key: weighted count
Pearson:
Uncorrected chi2(1) = 133.7477
Design-based F(1, 601) = 57.1672 P = 0.0000

In general, my research interest is to assess whether IPV is associated with maternal health outcomes, and whether education level exacerbates any observed associations. Methodologically, the models will then be stratified by level of education to determine whether the observed associations are stronger for low education than for high education. The interaction model is needed only to test for an interaction between IPV and education. So, based on the information we have so far, I would greatly value your concluding advice on how to carry this analysis through to a meaningful scientific result.
Thank you so much in advance for your tireless effort to share your expertise.
Best regards,
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#9

24 Aug 2020, 12:18

This is not my field of science, so the judgment of what is meaningful rests with you, or if you do not feel confident making such judgments, you need to consult literature or colleagues in this area.

From a statistical perspective (which is of some value in deciding what is meaningful, but is not enough by itself) we can say the following:

IPV is associated with an increase of approximately 3 percentage points in the probability of adequate ANC in the secondary or higher education group, and a decrease of about 5 percentage points in the primary education group. The 95% CI for the primary education group clearly excludes the result for the secondary or higher group, but not the other way around. Again, the confidence interval on the marginal effect of IPV in the secondary education group is very, very wide: the data simply don't tell us much about it.

So, it turns out we are left in almost the same position as before. But not exactly. In the primary education group, the effect of IPV is modest, but measured to fairly good precision. So we can say that this effect is modest, and negative. Whether roughly 5 percentage points is meaningful is a substantive, not a statistical, judgment that I leave to you.

In the secondary education group, your tabulation shows that you have only 259 observations (weighted count), and that is certainly a good explanation why the results for this subgroup are so vague. The width of that confidence interval for the marginal effect, from a 12 percentage point decrease to a 17 percentage point increase, is breathtaking! For this secondary education group, we really can say nothing more than that the data are inconclusive about the effect of IPV.

Given the pretty large discrepancy in marginal effect of IPV between the two education groups, a bit more than 7 percentage points in an outcome that averages around 35-45% overall, I would say that the interaction model is probably needed. That is also given some support by the fact that the marginal effect in the secondary education group lies far outside the confidence interval for the primary group. Again, though, you need to decide what is large enough to be meaningful in a substantive way: if a 7 percentage point difference in marginal effect is of no practical importance, then the statistics don't matter--it's just not meaningful.
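
One way to put a standard error and confidence interval on that discrepancy directly is to ask margins for the contrast of the two marginal effects. The following is a minimal sketch, assuming the interaction model shown earlier in this thread is still the active estimation result. The r. contrast operator and the contrast(effects) option are standard margins syntax, but this command was not run in this thread, so no output is shown for it:

```stata
* Assumes the melogit model with i.emotional_viol1##i.educ_mom was just fit.

* Marginal effect of emotional IPV within each education group (as reported above)
margins educ_mom, dydx(emotional_viol1)

* Difference between the two marginal effects (secondary and above vs. primary),
* reported with a standard error, z test, and confidence interval
margins r.educ_mom, dydx(emotional_viol1) contrast(effects)

* Point estimate of that difference, using the values printed above (about .074)
display .0261618 - (-.047671)
```

Note that this contrast is on the probability scale, so its test need not agree exactly with the test of the interaction odds ratio (p = 0.299), which is on the log-odds scale.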

Another approach you might consider, if the inconclusive nature of the secondary education results leaves you uncomfortable, is to simply drop the data on secondary or higher education and redo the analysis with only the primary education group. You can then report the results of that and add a statement that your attempt to gather data on people with secondary education or higher yielded too few people for a meaningful analysis. You would also have to then acknowledge that any conclusions you draw apply only to those with primary education or less.
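
A sketch of that restricted analysis, reusing the covariate list from the original command. The coding educ_mom == 1 for primary or no education is an assumption taken from the first post in this thread and should be checked against the value labels before running. The i.educ_mom term and the interaction are dropped because education no longer varies in the restricted sample:

```stata
* Run from a do-file (the /// line continuations are not valid when typed interactively).

* Check the numeric coding behind the value labels first
tabulate educ_mom, nolabel

* Refit without the interaction, restricted to the primary-or-no-education group
* (educ_mom == 1 is an assumption; adjust it to match the tabulation above)
melogit anc_adequacy i.emotional_viol1 i.age_catgorey i.husband_educ ///
    i.wealth_hh i.mediae_expo i.birth_order i.dma i.V102 i.contextual_regions ///
    if educ_mom == 1 || psu:, or nolog

* Average marginal effect of emotional IPV on the probability of adequate ANC
margins, dydx(emotional_viol1)
```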

Last edited by Clyde Schechter; 24 Aug 2020, 12:25.
SEMAN OSMAN

Join Date: Jul 2016

Posts: 29
#10

25 Aug 2020, 02:06

Dear Clyde Schechter,
I want to thank you for taking the time to answer my questions. I am sure that you are busy, and so I greatly appreciate your personal response. Thank you so much again.
Best,
SEMAN OSMAN

Join Date: Jul 2016

Posts: 29
#11

26 Aug 2020, 01:52

Dear Clyde Schechter,
Sorry, I still don't understand what is meant by: "The 95% CI for the primary education group clearly excludes the result for the secondary or higher group, but not the other way around." Could you please explain what it means based on the outputs? Thank you so much in advance for your patience.
Best,
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#12

26 Aug 2020, 12:46

So, the 95% CI for the primary education group is -.088 to -.008 (rounded to three decimal places), and the result for the secondary group is +.026. .026 is clearly not between -.088 and -.008.

But the 95% CI for the secondary education group is -.118 to +.170, and the result for the primary group is -.048. And -.048 does fall between -.118 and +.170.
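
The same two comparisons can be checked mechanically. This is only an illustrative sketch using the unrounded values copied from the margins, dydx() table above; inrange(z, a, b) is a built-in Stata function that returns 1 when a <= z <= b and 0 otherwise:

```stata
* Is the secondary-group estimate inside the primary-group 95% CI?
display inrange(.0261618, -.0877711, -.0075708)    // 0 = no

* Is the primary-group estimate inside the secondary-group 95% CI?
display inrange(-.047671, -.117576, .1698996)      // 1 = yes
```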
SEMAN OSMAN

Join Date: Jul 2016

Posts: 29
#13

27 Aug 2020, 04:12

Dear Clyde Schechter,
I greatly appreciate your patience and your responses.
