Dear all,

I am fitting a count model (negative binomial) and trying to choose the best specification using the AIC or BIC information criteria.

My concern is that I have read some articles saying that these criteria should not be used with clustered or weighted data, which are common in survey work. Because the way I constructed my data differs somewhat from that situation, I would appreciate any help in judging whether AIC and BIC are appropriate here.

The model draws on two data sources. The dependent variable was obtained from a surveillance database, so neither weights nor sampling matter for it. However, the independent variables were obtained from Demographic and Health Survey (DHS) data, for which sample weights are required. The goal of the analysis is to identify statistically significant independent variables that explain variation in the dependent variable (as usual).

Before running the regression, I prepared the independent variables by collapsing the DHS data by region, using the sample weights provided with the DHS datasets. Thus, I do not use the "[iweight=weight]" option when running the regression (the independent variables were already weighted when collapsing, and no weight is required for the dependent variable). The regression and information-criteria output for one of the models is shown below.
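To make the preparation step concrete, here is a minimal sketch of that kind of weighted collapse. The file names (`dhs_individual.dta`, `surveillance.dta`) and covariate names (`x1`, `x2`) are hypothetical placeholders, and the choice of `pweight` is an assumption about the weight type; only `weight` and `region` correspond to terms used above.

```stata
* Hypothetical file and variable names; adapt to the actual DHS extract.
use dhs_individual.dta, clear

* Weighted regional means of the covariates (one row per region afterwards).
collapse (mean) x1 x2 [pweight=weight], by(region)

* Attach the regional covariates to the surveillance outcome data,
* which may hold several rows per region.
merge 1:m region using surveillance.dta
keep if _merge == 3
drop _merge
```

After a step like this, the weights enter only through the regional means, so the regression itself runs unweighted on the merged rows.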

I was wondering whether it is appropriate to use AIC or BIC for model comparison in this context.
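For concreteness, this is a minimal sketch of the kind of comparison in question, using the variable names from the model shown further down. The second specification (without the interaction) is only an illustrative alternative, and the stored names `full` and `main` are arbitrary.

```stata
* Specification with the age-group by inc_type interaction
xi: glm inc1000 i.q3RF1 i.age_grp*inc_type, family(nbinomial)
estimates store full

* Illustrative alternative: main effects only (hypothetical candidate)
xi: glm inc1000 i.q3RF1 i.age_grp inc_type, family(nbinomial)
estimates store main

* One table with log likelihood, df, AIC and BIC for both models
estimates stats full main
```

The comparison is only meaningful if both models are fitted to the same observations.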

Thank you.

Jungseok Lee

. xi: glm inc1000 i.q3RF1 i.age_grp*inc_type, fam(nb)
i.q3RF1           _Iq3RF1_1-3         (naturally coded; _Iq3RF1_1 omitted)
i.age_grp         _Iage_grp_1-5       (naturally coded; _Iage_grp_5 omitted)
i.age_~p*inc~pe   _IageXinc_t_#       (coded as above)
note: _IageXinc_t_1 omitted because of collinearity

Iteration 0:   log likelihood = -228.99003
Iteration 1:   log likelihood = -225.47151
Iteration 2:   log likelihood = -225.43393
Iteration 3:   log likelihood = -225.43391

Generalized linear models                          No. of obs      =        84
Optimization     : ML                              Residual df     =        73
                                                   Scale parameter =         1
Deviance         =  80.20500562                    (1/df) Deviance =  1.098699
Pearson          =  71.00797014                    (1/df) Pearson  =  .9727119

Variance function: V(u) = u+(1)u^2                 [Neg. Binomial]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  5.629379
Log likelihood   = -225.4339137                    BIC             = -243.2446

------------------------------------------------------------------------------
             |                 OIM
     inc1000 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Iq3RF1_2 |  -.5920396   .3099836    -1.91   0.056    -1.199596    .0155171
   _Iq3RF1_3 |   .3788479   .3493202     1.08   0.278     -.305807    1.063503
 _Iage_grp_1 |   .9523037   .4453947     2.14   0.033      .079346    1.825261
 _Iage_grp_2 |  -4.379099   1.337118    -3.28   0.001    -6.999803   -1.758395
 _Iage_grp_3 |  -1.639636   .6600511    -2.48   0.013    -2.933312   -.3459597
 _Iage_grp_4 |  -3.685952   1.512576    -2.44   0.015    -6.650546    -.721358
    inc_type |  -3.278245   .6011499    -5.45   0.000    -4.456478   -2.100013
_IageXinc_~1 |  (omitted)
_IageXinc_~2 |   5.701549   1.390779     4.10   0.000     2.975672    8.427426
_IageXinc_~3 |   2.357005   .7785509     3.03   0.002      .831073    3.882936
_IageXinc_~4 |   3.255374   1.596704     2.04   0.041     .1258912    6.384857
       _cons |   4.277992   .5850281     7.31   0.000     3.131358    5.424626
------------------------------------------------------------------------------

. estat ic

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |     84           .   -225.4339     11     472.8678    499.6068
-----------------------------------------------------------------------------
Note: N=Obs used in calculating BIC; see [R] BIC note
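
The two sets of AIC and BIC values in the output differ only because `glm` and `estat ic` use different definitions. Per the Stata documentation, `glm` reports AIC divided by the number of observations, (-2 lnL + 2k)/N, and a deviance-based BIC, D - (N - k) ln(N), while `estat ic` reports the likelihood-based -2 lnL + 2k and -2 lnL + k ln(N). A small sketch that recomputes all four from the numbers shown (the scalar names are arbitrary):

```stata
* Values copied from the glm output
scalar ic_ll  = -225.4339137   // log likelihood
scalar ic_k   = 11             // estimated parameters
scalar ic_n   = 84             // observations
scalar ic_dev = 80.20500562    // deviance
scalar ic_rdf = 73             // residual df = ic_n - ic_k

* Likelihood-based criteria, as reported by estat ic
display "AIC (estat ic) = " -2*ic_ll + 2*ic_k
display "BIC (estat ic) = " -2*ic_ll + ic_k*ln(ic_n)

* Versions printed in the glm header
display "AIC (glm)      = " (-2*ic_ll + 2*ic_k)/ic_n
display "BIC (glm)      = " ic_dev - ic_rdf*ln(ic_n)
```

The results should agree with the reported values up to rounding. Whichever pair is used, candidate models need to be compared on the same scale and on the same 84 observations.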

## Comment