Hello everyone,
I have a question about the "correct" specification of the covariance matrix for my two-level multinomial logistic regression model. Searching online and in the literature has brought me closer to an answer, but I am still unsure which suggestions apply to my situation. Most of the resources I found deal with multilevel linear models, and few discuss the implementation in Stata. I am therefore asking here, hoping to find someone with experience with multilevel logistic regression models (and/or a stronger statistical/methodological background than mine) who could share some insight or their approach to model specification.
My problem is the following: I am working with the first round of data from a household survey, so the data are cross-sectional, with individuals nested within households. My outcome is individuals' health-seeking behavior, with 4 categories (no care / informal care / formal care public / formal care private). My sample has 5,172 individuals in 4,315 clusters, with 1 to 5 individuals per cluster (1.2 on average). I am using Stata 17.0 on Mac. So far I have tried:
- covariance(independent): did not converge (message: "convergence not achieved") after running for 3 days.
- covariance(exchangeable): converged, with multiple constraints applied to a number of independent variables.
- covariance(unstructured): converged; results below.
- covariance(shared): converged, although with higher AIC and BIC than unstructured.
To my knowledge, specifying the covariance matrix is partly a theoretical assumption about the underlying data structure with regard to the research question. Some papers argue that the choice should be guided by information criteria (AIC, BIC), while other people I have asked simply responded: "go with unstructured if your dataset is large enough". I am therefore somewhat confused about how to make the "correct" choice (if a single correct choice exists in this context) and how to defend the choice I make.
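A minimal sketch of how the AIC/BIC comparison mentioned above could be set up in Stata, assuming it is run from a do-file. The covariate list is copied from the command in this post; the local macro name rhs and the stored-estimate names m_shared and m_unstructured are only illustrative. Because the models use vce(cluster hh_id), the criteria are based on a log pseudolikelihood, so I would read them with some caution.

```stata
* Covariate list copied verbatim from the command in this post
local rhs age i.b1.hh_sizecat2 i.quintile i.state_groups ///
    i.b2.relation_hhhead2 i.hhhead_education i.hhhead_sex i.sex2 ///
    i.maritalstatus5 i.education4 i.b1.shi2 i.n_chronic2 ///
    i.b2.chronicname_new3 i.b3.d_chronic_limit2

* Fit each covariance structure and store the results
* (m_shared and m_unstructured are illustrative names)
foreach s in shared unstructured {
    xtmlogit chronic_facwho2 `rhs', baseoutcome(1) re ///
        covariance(`s') vce(cluster hh_id)
    estimates store m_`s'
}

* Table of log (pseudo)likelihood, df, AIC and BIC for the stored models
estimates stats m_shared m_unstructured
```

Other structures could be added to the foreach list, provided they converge.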
The code I use is:
Code:
xtmlogit chronic_facwho2 age i.b1.hh_sizecat2 i.quintile i.state_groups ///
    i.b2.relation_hhhead2 i.hhhead_education i.hhhead_sex i.sex2 ///
    i.maritalstatus5 i.education4 i.b1.shi2 i.n_chronic2 ///
    i.b2.chronicname_new3 i.b3.d_chronic_limit2, ///
    baseoutcome(1) re covariance(unstructured) vce(cluster hh_id) rrr
Model output:
Code:
Random-effects multinomial logistic regression       Number of obs    =  5,172
Group variable: hh_id                                Number of groups =  4,315

Random effects u_i ~ Gaussian                        Obs per group:
                                                                  min =      1
                                                                  avg =    1.2
                                                                  max =      5

Integration method: mvaghermite                      Integration pts. =      7

                                                     Wald chi2(60)    = 240.45
Log pseudolikelihood = -5789.7248                    Prob > chi2      = 0.0000

(Std. err. adjusted for 4,315 clusters in hh_id)
-----------------------------------------------------------------------------------------------------------------------
                                                      |               Robust
                                      chronic_facwho2 |        RRR   std. err.      z    P>|z|     [95% conf. interval]
------------------------------------------------------+----------------------------------------------------------------
No_care                                               |  (base outcome)
------------------------------------------------------+----------------------------------------------------------------
Informal_care                                         |
                                                  age |   1.004728   .0060094     0.79   0.430     .9930183    1.016575
                                                      |
                                          hh_sizecat2 |
                                                  1-5 |   .7572747   .1414395    -1.49   0.137     .5251348    1.092034
                                                      |
                                             quintile |
                                                   Q2 |   2.369423   .6800568     3.01   0.003     1.350008    4.158619
                                                   Q3 |   2.949095   .8708711     3.66   0.000      1.65321    5.260773
                                                   Q4 |   2.271779   .6983997     2.67   0.008     1.243616     4.14998
                                                   Q5 |   3.910824   1.226452     4.35   0.000     2.115085    7.231175
                                                      |
                                         state_groups |
       Non-EAG-states, HAQ-Index above indian average |
                                 & medium to high ETL |   .8402903   .2401396    -0.61   0.543      .479923    1.471252
                                                      |
                                     relation_hhhead2 |
                                                  Yes |   1.610165   .3921482     1.96   0.050     .9989973    2.595235
                                                      |
                                     hhhead_education |
                                       Primary School |   .8386942   .2175119    -0.68   0.498     .5044847     1.39431
                           Secondary School and above |    1.36586   .3961937     1.07   0.282     .7735692    2.411646
                                                      |
                                           hhhead_sex |
                                               Female |   .7450297   .2368493    -0.93   0.355     .3995505    1.389234
                                                      |
                                                 sex2 |
                                               Female |   1.012221   .2532301     0.05   0.961     .6199086    1.652809
                                                      |
                                       maritalstatus5 |
                                          Not married |   .8183443   .1777682    -0.92   0.356     .5346016    1.252685
                                                      |
                                           education4 |
                                 Up to primary school |   1.129097   .2915267     0.47   0.638     .6806997    1.872866
                     Up to secondary school and above |   .6908156   .2172718    -1.18   0.240     .3729466     1.27961
                                                      |
                                                 shi2 |
                                                   No |   .8300199   .1664389    -0.93   0.353     .5602762    1.229631
                                                      |
                                           n_chronic2 |
                                          Two or more |   .8149306   .3176192    -0.53   0.600     .3796338    1.749349
                                                      |
                                     chronicname_new3 |
      CNCDs targeted by prevention&control programmes |
 (CVD, Chronic Respiratory Disease, Cancer, Diabetes) |   2.585228   .4847103     5.07   0.000     1.790215    3.733297
                                                      |
                                     d_chronic_limit2 |
                                Permanent limitations |   .9361159   .2531434    -0.24   0.807     .5509963    1.590415
                                Temporary limitations |   1.740558   .3253865     2.96   0.003     1.206596    2.510819
                                                      |
                                                _cons |   .0629838     .03323    -5.24   0.000     .0223944    .1771409
------------------------------------------------------+----------------------------------------------------------------
Formal_care_public                                    |
                                                  age |   1.006161   .0058682     1.05   0.292     .9947251    1.017728
                                                      |
                                          hh_sizecat2 |
                                                  1-5 |   1.229405   .2187377     1.16   0.246      .867457    1.742377
                                                      |
                                             quintile |
                                                   Q2 |   1.080035   .2573618     0.32   0.747     .6770253    1.722943
                                                   Q3 |   .8279183   .2131016    -0.73   0.463     .4999112    1.371141
                                                   Q4 |   .7043305   .1948473    -1.27   0.205     .4095421    1.211308
                                                   Q5 |   .6232261   .1815569    -1.62   0.105     .3521076    1.103102
                                                      |
                                         state_groups |
       Non-EAG-states, HAQ-Index above indian average |
                                 & medium to high ETL |   21.40783   7.969517     8.23   0.000     10.32036     44.4069
                                                      |
                                     relation_hhhead2 |
                                                  Yes |   1.183487   .2610266     0.76   0.445     .7681123    1.823485
                                                      |
                                     hhhead_education |
                                       Primary School |   1.198744    .277689     0.78   0.434     .7612818    1.887588
                           Secondary School and above |   .7559868   .2075549    -1.02   0.308     .4413857    1.294822
                                                      |
                                           hhhead_sex |
                                               Female |   .9298609   .2611343    -0.26   0.796     .5362579    1.612361
                                                      |
                                                 sex2 |
                                               Female |   1.240415   .2741327     0.97   0.330     .8043608     1.91286
                                                      |
                                       maritalstatus5 |
                                          Not married |   .8978069   .1807932    -0.54   0.592      .605027    1.332266
                                                      |
                                           education4 |
                                 Up to primary school |   1.375035   .3298175     1.33   0.184     .8592976     2.20031
                     Up to secondary school and above |   1.789872   .5121959     2.03   0.042     1.021502    3.136207
                                                      |
                                                 shi2 |
                                                   No |   .6434247   .1289017    -2.20   0.028     .4344808    .9528507
                                                      |
                                           n_chronic2 |
                                          Two or more |   1.319175   .4732615     0.77   0.440     .6530239    2.664868
                                                      |
                                     chronicname_new3 |
      CNCDs targeted by prevention&control programmes |
 (CVD, Chronic Respiratory Disease, Cancer, Diabetes) |   3.308351   .5952606     6.65   0.000     2.325186     4.70723
                                                      |
                                     d_chronic_limit2 |
                                Permanent limitations |     3.3521   .8139596     4.98   0.000     2.082704    5.395186
                                Temporary limitations |   2.365113   .4390259     4.64   0.000     1.643793     3.40296
                                                      |
                                                _cons |   .0427349   .0276625    -4.87   0.000      .012017    .1519739
------------------------------------------------------+----------------------------------------------------------------
Formal_care_private                                   |
                                                  age |   .9906028   .0042418    -2.20   0.027     .9823239    .9989515
                                                      |
                                          hh_sizecat2 |
                                                  1-5 |    .714373   .0954265    -2.52   0.012     .5498205    .9281735
                                                      |
                                             quintile |
                                                   Q2 |   1.468499   .2800816     2.01   0.044     1.010478    2.134127
                                                   Q3 |    1.64562   .3247505     2.52   0.012     1.117768    2.422745
                                                   Q4 |   2.114456   .4347208     3.64   0.000     1.413176    3.163742
                                                   Q5 |   2.360421   .4967251     4.08   0.000     1.562655     3.56546
                                                      |
                                         state_groups |
       Non-EAG-states, HAQ-Index above indian average |
                                 & medium to high ETL |   1.327884   .2546436     1.48   0.139     .9118601    1.933713
                                                      |
                                     relation_hhhead2 |
                                                  Yes |   1.203321    .202027     1.10   0.270     .8659086    1.672209
                                                      |
                                     hhhead_education |
                                       Primary School |   .9520505   .1691962    -0.28   0.782     .6720265    1.348757
                           Secondary School and above |   1.335874   .2635809     1.47   0.142     .9074346    1.966599
                                                      |
                                           hhhead_sex |
                                               Female |   .8135932   .1772067    -0.95   0.344     .5308958    1.246825
                                                      |
                                                 sex2 |
                                               Female |   1.344488   .2258281     1.76   0.078     .9673508    1.868657
                                                      |
                                       maritalstatus5 |
                                          Not married |    .778312   .1150856    -1.69   0.090     .5824925    1.039961
                                                      |
                                           education4 |
                                 Up to primary school |   .8856513    .161034    -0.67   0.504     .6201441    1.264832
                     Up to secondary school and above |    .975415    .205756    -0.12   0.906     .6451123    1.474835
                                                      |
                                                 shi2 |
                                                   No |   .6761378   .0953955    -2.77   0.006     .5127908    .8915183
                                                      |
                                           n_chronic2 |
                                          Two or more |   1.245042   .3451812     0.79   0.429     .7230906    2.143754
                                                      |
                                     chronicname_new3 |
      CNCDs targeted by prevention&control programmes |
 (CVD, Chronic Respiratory Disease, Cancer, Diabetes) |   3.009566    .418389     7.93   0.000     2.291765    3.952189
                                                      |
                                     d_chronic_limit2 |
                                Permanent limitations |   2.449425   .4507498     4.87   0.000     1.707748    3.513214
                                Temporary limitations |   1.847875   .2537873     4.47   0.000     1.411785    2.418671
                                                      |
                                                _cons |   1.831152   .5484203     2.02   0.043     1.018108    3.293478
------------------------------------------------------+----------------------------------------------------------------
                                               var(u2)|   8.218266   1.883477                      5.244455    12.87835
                                               var(u3)|   7.117239   2.028477                      4.071097    12.44262
                                               var(u4)|   4.955002   .9093869                      3.457989    7.100093
------------------------------------------------------+----------------------------------------------------------------
                                            cov(u2,u3)|   3.509047   1.364615     2.57   0.010     .8344505    6.183644
                                            cov(u2,u4)|   4.118961   1.010872     4.07   0.000     2.137688    6.100235
                                            cov(u3,u4)|   2.589384   .9309654     2.78   0.005     .7647252    4.414042
-----------------------------------------------------------------------------------------------------------------------
Note: Estimates are transformed only in the first 4 equations to relative-risk ratios.
Note: _cons estimates baseline relative risk (conditional on zero random effects).
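As a rough reading of the random-effects part of the output above, the correlations implied by the estimated variances and covariances can be worked out by hand. This is only arithmetic on the point estimates shown, rounded to two decimals, and not an additional model result; I mention it because covariance(independent) would fix these covariances at zero.

```latex
\[
\widehat{\operatorname{corr}}(u_j,u_k)
  = \frac{\widehat{\operatorname{cov}}(u_j,u_k)}
         {\sqrt{\widehat{\operatorname{var}}(u_j)\,\widehat{\operatorname{var}}(u_k)}}
\]
\begin{align*}
\widehat{\operatorname{corr}}(u_2,u_3) &= \frac{3.509047}{\sqrt{8.218266 \times 7.117239}} \approx 0.46 \\
\widehat{\operatorname{corr}}(u_2,u_4) &= \frac{4.118961}{\sqrt{8.218266 \times 4.955002}} \approx 0.65 \\
\widehat{\operatorname{corr}}(u_3,u_4) &= \frac{2.589384}{\sqrt{7.117239 \times 4.955002}} \approx 0.44
\end{align*}
```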