two-level linear regression model - random coefficient parameter estimates too large

Samira Choudhury

Join Date: Sep 2016

Posts: 32
#1

16 Oct 2018, 12:02

Hi,
I'm trying to fit a two-level linear regression model with households at level 1 and states (statecode) at level 2 using the National Sample Survey data (2011-2012). My dependent variable is household fruit and vegetable intake (g/capita/day), and I would like to estimate the contribution of household- and state-level factors (road density, rdennew; market density, mden) to household fruit and vegetable intake.

mixed pcfruitvegg logmpce hh_size educfsp femhh rural hindu agri c0_5 i.tercilerainfall rdennew mden ||statecode:

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0: log likelihood = -83030.922
Iteration 1: log likelihood = -83030.922

Computing standard errors:

Mixed-effects ML regression Number of obs = 13,402
Group variable: statecode Number of groups = 5

Obs per group:
min = 569
avg = 2,680.4
max = 6,324

Wald chi2(12) = 6156.97
Log likelihood = -83030.922 Prob > chi2 = 0.0000

---------------------------------------------------------------------------------
pcfruitvegg | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
logmpce | 129.306 2.135464 60.55 0.000 125.1205 133.4914
hh_size | -7.762976 .4705432 -16.50 0.000 -8.685224 -6.840728
educfsp | .4859159 .3262967 1.49 0.136 -.153614 1.125446
femhh | 10.4659 3.649228 2.87 0.004 3.313547 17.61826
rural | -.5977197 2.273317 -0.26 0.793 -5.053338 3.857899
hindu | -.9536007 2.651058 -0.36 0.719 -6.14958 4.242378
agrihh | .0361239 2.119468 0.02 0.986 -4.117956 4.190204
c0_5 | 2.725623 1.30707 2.09 0.037 .1638132 5.287433
|
tercilerainfall |
2 | 8.955871 2.618222 3.42 0.001 3.824249 14.08749
3 | 37.90899 6.791959 5.58 0.000 24.597 51.22099
|
rdennew | 3.182636 5.868054 0.54 0.588 -8.318538 14.68381
mden | 1.109875 .9562233 1.16 0.246 -.7642887 2.984038
_cons | -711.6345 20.30829 -35.04 0.000 -751.438 -671.831
---------------------------------------------------------------------------------

------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
statecode: Identity |
var(_cons) | 453.4765 298.4873 124.8179 1647.527
-----------------------------+------------------------------------------------
var(Residual) | 14064.46 171.8451 13731.65 14405.34
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 160.80 Prob >= chibar2 = 0.0000

estat icc

Residual intraclass correlation

------------------------------------------------------------------------------
Level | ICC Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
statecode | .0312356 .0199215 .0087945 .1048813
------------------------------------------------------------------------------

My question is why are the random effects parameters so large? How would you interpret them? Have I estimated this correctly?

Thank you very much for your help.

Samira.

Weiwen Ng

Join Date: Jun 2015
Posts: 1241

16 Oct 2018, 12:15

Samira, note that it's more readable if you present both code and results in code delimiters (see my signature), e.g.

Code:

mixed pcfruitvegg logmpce hh_size educfsp femhh rural hindu agri c0_5 i.tercilerainfall rdennew mden ||statecode:

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -83030.922
Iteration 1:   log likelihood = -83030.922

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =     13,402
Group variable: statecode                       Number of groups  =          5

                                                Obs per group:
                                                              min =        569
                                                              avg =    2,680.4
                                                              max =      6,324

                                                Wald chi2(12)     =    6156.97
Log likelihood = -83030.922                     Prob > chi2       =     0.0000

---------------------------------------------------------------------------------
    pcfruitvegg |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
        logmpce |    129.306   2.135464    60.55   0.000     125.1205    133.4914
        hh_size |  -7.762976   .4705432   -16.50   0.000    -8.685224   -6.840728
        educfsp |   .4859159   .3262967     1.49   0.136     -.153614    1.125446
          femhh |    10.4659   3.649228     2.87   0.004     3.313547    17.61826
          rural |  -.5977197   2.273317    -0.26   0.793    -5.053338    3.857899
          hindu |  -.9536007   2.651058    -0.36   0.719     -6.14958    4.242378
         agrihh |   .0361239   2.119468     0.02   0.986    -4.117956    4.190204
           c0_5 |   2.725623    1.30707     2.09   0.037     .1638132    5.287433
                |
tercilerainfall |
              2 |   8.955871   2.618222     3.42   0.001     3.824249    14.08749
              3 |   37.90899   6.791959     5.58   0.000       24.597    51.22099
                |
        rdennew |   3.182636   5.868054     0.54   0.588    -8.318538    14.68381
           mden |   1.109875   .9562233     1.16   0.246    -.7642887    2.984038
          _cons |  -711.6345   20.30829   -35.04   0.000     -751.438    -671.831
---------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
statecode: Identity          |
                  var(_cons) |   453.4765   298.4873      124.8179    1647.527
-----------------------------+------------------------------------------------
               var(Residual) |   14064.46   171.8451      13731.65    14405.34
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 160.80        Prob >= chibar2 = 0.0000

estat icc

Residual intraclass correlation

------------------------------------------------------------------------------
                       Level |        ICC   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
                   statecode |   .0312356   .0199215      .0087945    .1048813
------------------------------------------------------------------------------

That said, your random effects parameters probably look large because the dependent variable is measured on a large scale: you noted that it's grams per capita per day. If someone had half a pound of fruit and veg a day, that's already 227 grams. Your intra-class correlation seems reasonable. The variance of the random intercept is about 453 in squared grams (so, I believe the SD is about 21 g).
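To check the arithmetic behind those figures, here is a minimal sketch that uses only the two variance estimates from the output above. The scalar names v_state and v_resid are illustrative, and the inline comments assume the lines are run from a do-file.

```stata
* Variance components copied from the -mixed- output above
scalar v_state = 453.4765     // var(_cons) for statecode, in squared grams
scalar v_resid = 14064.46     // var(Residual), in squared grams

* Standard deviations are back on the scale of the outcome (g/capita/day)
display "SD of state intercepts: " sqrt(v_state)    // roughly 21.3
display "SD of residuals:        " sqrt(v_resid)    // roughly 118.6

* Intraclass correlation: share of the residual variance that lies between states
display "ICC: " v_state / (v_state + v_resid)       // about .031, in line with -estat icc-
```

Variances are in squared units of the outcome, so they will always look large next to the coefficients; the standard deviations are the easier quantities to interpret.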

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.


Samira Choudhury

Join Date: Sep 2016

Posts: 32
#3

16 Oct 2018, 13:46

Thank you so much for your help. Next time I'll definitely use the code delimiters to show code and results. I have just one more question.

In the following paper, https://pdfs.semanticscholar.org/3bd...b6c34f7465.pdf
in table 2, the authors listed the "random part" level 2 (neighbourhoods) and level 1 (individuals). How did they calculate the variations in fruit intakes at the individual and neighborhood level and the significance levels? Is there a different command I need to run? With my data, I would like to present similar results. According to my results, -2 log likelihood = 83030.922. Is this correct?

Thanks a lot for your help!
Samira.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#4

16 Oct 2018, 18:17

As far as I can tell, the authors didn't calculate the variations in fruit intakes - those would have been estimated by the model. Their DV is on a completely different scale than yours; see pg 624, under the paragraph outcome variables.

Fruit and vegetable intakes were assessed separately by asking ‘How many servings of [fruit/vegetables] do you usually eat each day?’ (a serving of fruit was defined as 1 medium piece or 2 small pieces of fruit, or 1 cup of diced pieces; a serving of vegetables was defined as 1/2 cup of cooked vegetables or 1 cup of salad vegetables). Response options were ‘none’, ‘1 serving’, ‘2 servings’, ‘3–4 servings’ (coded 3.5 for analyses) or ‘5 servings or more’ (coded 5).

Your variable, assuming the data collection process was accurate, is more like a continuous variable.
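On the -2 log likelihood question: that quantity is minus two times the log likelihood, so it can be worked out directly from the value printed in the output header. A minimal sketch; the second line assumes it is run immediately after the -mixed- command, which stores the log likelihood in e(ll).

```stata
* -2 log likelihood from the value printed in the output (-83030.922)
display %12.3f (-2 * -83030.922)    // 166061.844

* Equivalent, if run right after -mixed-
display %12.3f (-2 * e(ll))
```

So 83030.922 is the log likelihood with its sign dropped, not -2 times the log likelihood.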

Samira Choudhury

Join Date: Sep 2016

Posts: 32
#5

17 Oct 2018, 03:49

Thank you for getting back to me. Yes, my DV is a continuous variable (fruit and vegetable intake in g/capita/day using a 7-day recall period). So in that case, according to my table, statecode (variance) is 14064.46? But the calculation does not show the household (variance). Is there a way to calculate this via Stata? Thank you for your help.
Samira Choudhury

Join Date: Sep 2016

Posts: 32
#6

17 Oct 2018, 03:54

Also, according to my results, the intraclass correlation coefficient is 0.03 at the state level. This implies that differences between states account for about 3 percent of the residual variation in household fruit and vegetable intake.
Samira Choudhury

Join Date: Sep 2016

Posts: 32
#7

17 Oct 2018, 04:24

Lastly, what command do I use for Stata to report the significance of ICC? Thank you for your help.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#8

17 Oct 2018, 10:41

Originally posted by Samira Choudhury

I'm trying to fit a two-level linear regression model with households at level 1 and states (state code) at level 2 using the National Sample Survey data (2011-2012). ... But the calculation does not show the household (variance). Is there a way to calculate this via stata? ... Lastly, what command do I use for Stata to report the significance of ICC? Thank you for your help.

Samira,

Re-reading your posts from the start, it looks like you a) didn't fit a mixed model with any clustering at the household level, and b) you also didn't account for the complex survey design. I am not familiar with doing the latter in -mixed-; normally, you would -svyset- the data and use the -svy- prefix with any estimation command, but I think that in -mixed-, you can invoke the sampling weight at each appropriate level. The documentation for -mixed- should explain more.
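As a rough sketch of what level-specific weights look like in -mixed-: the household-level sampling weight goes in the usual [pweight=] position, and the state-level weight goes in the pweight() option of the state equation. The weight variable names hh_wt and state_wt are hypothetical, and how they should be constructed and scaled for the National Sample Survey design is something to check against the -mixed- documentation.

```stata
* hh_wt    : hypothetical household-level sampling weight (within state)
* state_wt : hypothetical state-level sampling weight
mixed pcfruitvegg logmpce hh_size educfsp femhh rural hindu agri c0_5 ///
    i.tercilerainfall rdennew mden [pweight = hh_wt]                  ///
    || statecode:, pweight(state_wt)
```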

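If each household appears only once in your data, the residual variance (var(Residual), 14064.46 in your output) is already the household-level variance, and var(_cons) under statecode (453.48) is the state-level variance. If you have more than one observation per household, you need to include some sort of household identifier as a further level in the random effects specification, e.g.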

Code:

mixed pcfruitvegg logmpce hh_size educfsp femhh rural hindu agri c0_5 i.tercilerainfall rdennew mden || statecode: || household_id:

Stata doesn't report a t- or z-test for the variances of random intercepts or slopes as it does for the fixed-effect coefficients. Here, the 95% confidence interval for the estimated random-intercept variance is well above zero, although -mixed- builds that interval on the log scale, so it cannot include zero by construction. Think of it this way: if the estimated variance of the random intercepts were essentially zero, there would be no point in fitting that random intercept. Additionally, my impression is that there is some unresolved theoretical debate on how to calculate p-values for any parameter in any mixed model in the first place. If you read the footnotes of table 2 in the article you cited, chances are very good that the software these guys used also didn't provide p-values for the random effects: they say that

* P < 0.05 (based on ratio of estimate to its SE).

I would probably leave the p-values out of the random effects parameters, but report the standard errors. Again, because the confidence interval for your state random intercept is comfortably away from zero, that random effect is probably quite robust.
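For reference, the ratio-of-estimate-to-SE criterion from that footnote can be worked out from the numbers already in your output, alongside the likelihood-ratio test that -mixed- prints beneath the random-effects table. A minimal sketch; halving the chi-squared tail probability follows the chibar2(01) label, which denotes a 50:50 mixture of a point mass at zero and a chi-squared distribution with 1 degree of freedom.

```stata
* Ratio of the state-intercept variance to its standard error
display 453.4765 / 298.4873           // about 1.52

* p-value for the LR test vs. linear model, chibar2(01) = 160.80
display 0.5 * chi2tail(1, 160.80)     // far below 0.05; shown as 0.0000 in the output
```

The ratio falls short of 1.96 even though the LR test rejects decisively, which is one reason ratio-based p-values for variance components tend to be treated with caution, especially with only 5 groups.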

Samira Choudhury

Join Date: Sep 2016

Posts: 32
#9

31 Oct 2018, 10:13

Hi! Thanks for your reply. My impression was that, with a two-level linear regression model, you would only cluster on states, since these indicators are already measured at the household level.