Dear Statalisters,
I am trying to run an OLS regression with log per capita calories as the dependent variable and age of the household head, years of education of the household head, and log per capita expenditure as the independent variables (other controls will be added eventually). When I include only age and education, both coefficients are positive and significant. However, as soon as I add log per capita expenditure, the education coefficient becomes negative and significant. I am puzzled by this result. I understand that education of the household head might pick up a "wealth" effect, but the correlation between education and log per capita expenditure is not that large (0.33). I have posted my regression results below, along with summary statistics and pairwise correlations. Could someone help me understand what is going on here? I realize that this sort of problem might (or might not) be overcome with techniques other than OLS, but I have just started learning OLS and would like to understand how to deal with this within OLS, or at least why OLS cannot deal with it.
Thanks,
Monzur
. regress log_pccal age_hhhead eduy_hhhead [pw=hhweight], r

Linear regression                                      Number of obs =    3355
                                                       F(  2,  3352) =  105.40
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0692
                                                       Root MSE      =  .25583

------------------------------------------------------------------------------
             |               Robust
   log_pccal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  age_hhhead |   .0049182   .0003602    13.65   0.000      .004212    .0056244
 eduy_hhhead |   .0075136   .0011997     6.26   0.000     .0051613    .0098659
       _cons |   7.537586   .0171067   440.62   0.000     7.504045    7.571126
------------------------------------------------------------------------------

. regress log_pccal age_hhhead eduy_hhhead log_pcexp [pw=hhweight], r

Linear regression                                      Number of obs =    3355
                                                       F(  3,  3351) =  601.38
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4123
                                                       Root MSE      =  .20332

------------------------------------------------------------------------------
             |               Robust
   log_pccal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  age_hhhead |    .001919   .0002945     6.52   0.000     .0013415    .0024964
 eduy_hhhead |  -.0082508    .001044    -7.90   0.000    -.0102977   -.0062039
   log_pcexp |   .3777407   .0100402    37.62   0.000     .3580552    .3974262
       _cons |   4.795607   .0730719    65.63   0.000     4.652337    4.938877
------------------------------------------------------------------------------

. estat vif

    Variable |       VIF       1/VIF
-------------+----------------------
   log_pcexp |      1.20    0.832228
 eduy_hhhead |      1.16    0.863121
  age_hhhead |      1.07    0.930743
-------------+----------------------
    Mean VIF |      1.14

. su log_pccal eduy_hhhead log_pcexp, d

                          log_pccal
-------------------------------------------------------------
      Percentiles      Smallest
 1%     7.123889       6.311302
 5%     7.337663        6.67333
10%     7.436243       6.834251       Obs                3698
25%     7.607244       6.855416       Sum of Wgt.        3698

50%     7.779021                      Mean           7.783589
                        Largest       Std. Dev.       .276406
75%      7.96576       8.723692
90%     8.135495       8.726619       Variance       .0764003
95%     8.232234       8.736762       Skewness       .0350145
99%     8.477096        8.86989       Kurtosis       3.511389

            years of education of household head
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                3698
25%            0              0       Sum of Wgt.        3698

50%            0                      Mean           2.984857
                        Largest       Std. Dev.      3.776812
75%            5             16
90%            9             16       Variance       14.26431
95%           10             16       Skewness       .9461994
99%           12             16       Kurtosis       2.751041

              log of hh per capita expenditure
-------------------------------------------------------------
      Percentiles      Smallest
 1%     6.799201       6.302472
 5%     7.063458       6.434649
10%     7.202945       6.450388       Obs                3698
25%     7.432215       6.458682       Sum of Wgt.        3698

50%       7.7299                      Mean           7.762185
                        Largest       Std. Dev.      .4636838
75%     8.045497       9.502833
90%     8.368738       9.544683       Variance       .2150027
95%     8.571793        9.76697       Skewness       .4395734
99%     9.038363       9.858101       Kurtosis       3.433132

. pwcorr log_pccal age_hhhead eduy_hhhead log_pcexp, sig

             | log~ccal age_hh~d eduy_h~d log_pc~p
-------------+------------------------------------
   log_pccal |   1.0000
             |
             |
  age_hhhead |   0.2282   1.0000
             |   0.0000
             |
 eduy_hhhead |   0.0855  -0.1133   1.0000
             |   0.0000   0.0000
             |
   log_pcexp |   0.6401   0.1796   0.3254   1.0000
             |   0.0000   0.0000   0.0000
             |
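
One check that might help locate where the sign change comes from is the auxiliary regression of log_pcexp on the other two regressors. This is only a sketch, and it assumes the usual omitted-variable decomposition for OLS applies: on the same sample and with the same weights, the education coefficient in the short regression should equal the education coefficient in the long regression plus the log_pcexp coefficient times the education coefficient from the auxiliary regression. The scalar names b_long, g_pcexp and d_educ are made up for illustration. From the two regressions posted above, the implied auxiliary coefficient would be (.0075136 + .0082508)/.3777407, roughly .0417.

```stata
* Long regression, as posted above; store the two coefficients of interest
regress log_pccal age_hhhead eduy_hhhead log_pcexp [pw=hhweight], r
scalar b_long  = _b[eduy_hhhead]
scalar g_pcexp = _b[log_pcexp]

* Auxiliary regression on the same estimation sample and with the same weights
regress log_pcexp age_hhhead eduy_hhhead if e(sample) [pw=hhweight], r
scalar d_educ = _b[eduy_hhhead]

* Should reproduce the education coefficient from the short regression
display "long + g_pcexp*d_educ = " b_long + g_pcexp*d_educ
```

If the displayed value matches the short-regression coefficient, the positive education coefficient there would be the sum of a negative partial association with calories and a larger positive component that runs through expenditure.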