GEE model problematic variable

Janina Apostolou

Join Date: Jan 2017

Posts: 10
#1


31 Jan 2017, 12:28

Dear Stata members,
I am running a population-averaged logistic regression where the dependent variable is 1 if a city applies participatory budgeting and 0 otherwise.

My main explanatory variable is the debt per capita. Another explanatory variable is population size.
I expected the effect of debt to be stronger. However, when I run the regression in Stata, population has a much higher odds ratio and is significant at the 1 per cent level, whereas the odds ratio for debt is significant only at the 10 per cent level.
If I exclude the variable population, debt is significant at the one per cent level.

The variable population shows much higher variability, ranging from cities with 4,000 to cities with 3,000,000 inhabitants.
Naturally, the debt per capita does not show such a wide range.

Could that “bias” the results? Do I have to transform the variable?

I also tried centring the population variable. The odds ratio of the debt variable then becomes significant at the 1 per cent level, but Stata reports “convergence not achieved”. That problem occurs only when these two variables are in the same model.

According to a multicollinearity test in Stata, multicollinearity between these two variables should not be a problem.

Does anybody have an idea of what could be the problem here?
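
On the question of transforming population: with population in raw units, its odds ratio refers to a change of a single inhabitant, which is hard to compare with the odds ratio for debt per capita. One common option is to put population on a log scale and centre it. The following is only a sketch under assumptions: the variable names pb, debt_pc, population, city and year are placeholders, and the exchangeable working correlation is chosen purely for illustration. It is not guaranteed to resolve the convergence message.

```stata
* Placeholder variable names (assumptions): pb, debt_pc, population, city, year
* Log-transform population to compress the 4,000 to 3,000,000 range
generate double ln_pop = ln(population)

* Centre the logged variable at its sample mean
summarize ln_pop, meanonly
generate double c_ln_pop = ln_pop - r(mean)

* Population-averaged logit; corr(exchangeable) is an illustrative choice
xtgee pb c.debt_pc c.c_ln_pop, i(city) t(year) ///
    family(binomial) link(logit) corr(exchangeable) vce(robust) eform nolog
```

The odds ratio for c_ln_pop then refers to a one-unit change in log population, that is, roughly a 2.7-fold difference in city size.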
Joseph Coveney

Join Date: Apr 2014

Posts: 4446
#2

31 Jan 2017, 21:45

I'm not familiar with any of this, but is it pretty common to have per capita something-or-other and population size both as predictors in a regression model?

I would have thought, perhaps naïvely, that the fact that population size is also the denominator of per capita debt would give rise to unexpected behavior (see below—compare the first model's coefficients and p-values for the two predictors debt and population size to those for the second where debt has been converted to per capita debt).

And I wouldn't have expected a test(?) for collinearity to offer any kind of protection (see VIF values below).

. version 14.2

.
. clear *

. set more off

. set seed 1372310

.
. quietly set obs 200

. generate int city = _n

. generate double u = rnormal()

. quietly expand 3

. bysort city: generate byte time = _n

.
. tempname Corr

. matrix define `Corr' = J(3, 3, 0.5) + 0.5 * I(3)

. quietly drawnorm debt popu, double corr(`Corr')

.
. generate double pr = normal(debt / 10 + popu / 10 + time / 30 + u)

. quietly generate byte parbud = rbinomial(1, pr)

.
. xtgee parbud c.debt c.popu i.tim, i(city) t(time) ///
>         family(binomial) link(logit) corr(unstructured) eform nolog

GEE population-averaged model                   Number of obs     =        600
Group and time vars:             city time      Number of groups  =        200
Link:                                logit      Obs per group:
Family:                           binomial                    min =          3
Correlation:                  unstructured                    avg =        3.0
                                                              max =          3
                                                Wald chi2(4)      =       9.61
Scale parameter:                         1      Prob > chi2       =     0.0474

------------------------------------------------------------------------------
      parbud | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        debt |   1.417301   .2026199     2.44   0.015     1.070959    1.875648
        popu |   .8090169   .1410313    -1.22   0.224     .5748738    1.138525
             |
        time |
          2  |   .9996482   .1464693    -0.00   0.998      .750116    1.332189
          3  |   1.109584   .1720545     0.67   0.502     .8187869    1.503659
             |
       _cons |    1.06858   .1522834     0.47   0.642     .8081691    1.412901
------------------------------------------------------------------------------

.
. generate double pcde = debt / popu

. xtgee parbud c.pcde c.popu i.tim, i(city) t(time) ///
>         family(binomial) link(logit) corr(unstructured) eform nolog

GEE population-averaged model                   Number of obs     =        600
Group and time vars:             city time      Number of groups  =        200
Link:                                logit      Obs per group:
Family:                           binomial                    min =          3
Correlation:                  unstructured                    avg =        3.0
                                                              max =          3
                                                Wald chi2(4)      =       4.28
Scale parameter:                         1      Prob > chi2       =     0.3696

------------------------------------------------------------------------------
      parbud | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcde |   1.003929   .0055236     0.71   0.476      .993161    1.014814
        popu |   1.172776   .1009017     1.85   0.064     .9907879    1.388193
             |
        time |
          2  |    1.01531   .1481031     0.10   0.917     .7628406    1.351336
          3  |   1.113147   .1744949     0.68   0.494     .8186902     1.51351
             |
       _cons |    1.06309   .1511315     0.43   0.667     .8045651    1.404685
------------------------------------------------------------------------------

.
. quietly regress parbud c.pcde c.popu i.tim i.city

. quietly estat vif

. forvalues i = 1/2 {
  2.         display in smcl as text "VIF `r(name_`i')' = " ///
>                 as result _column(10) %4.2f r(vif_`i')
  3. }
VIF pcde = 1.51
VIF popu = 1.55

.
. exit

end of do-file

.
Janina Apostolou

Join Date: Jan 2017

Posts: 10
#3

02 Feb 2017, 08:55

Dear Joseph,

thank you very much for your answer!

It is probably not wise to have debt per capita and population size in one regression.
However, I need debt per capita, because the absolute debt level makes comparison between cities of different sizes impossible. Larger cities in my sample will have higher debt levels than smaller cities, but that does not mean that they are more indebted. This “indebtedness” is my main explanatory variable. If I replaced it with the absolute debt level, my whole argument would be a different one. With absolute debt, the odds ratio is even smaller than one, while the one for debt per capita is larger than one.

But I will try to find a proxy for population size, maybe that will solve my problem.

Thanks again!
Clyde Schechter

Join Date: Apr 2014

Posts: 30155
#4

02 Feb 2017, 12:21

This is really down to a substantive question rather than a statistical one, and as a non-economist I'm out of my depth here. But based just on my naive intuition, might it not be helpful to have a ratio of debt to municipal product instead of per-capita debt here?
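
A ratio of this kind could be built roughly as follows. This is a minimal sketch, not a recommendation for the final specification: debt (total municipal debt), gmp (municipal product, in the same currency units), pb, city and year are all placeholder names that do not come from the thread's data, and the working correlation is an illustrative choice.

```stata
* Placeholder variable names (assumptions): debt, gmp, pb, city, year
* Debt relative to municipal product instead of debt per capita
generate double debt_gmp = debt / gmp

* Population-averaged logit with the ratio as the main predictor
xtgee pb c.debt_gmp, i(city) t(year) ///
    family(binomial) link(logit) corr(exchangeable) vce(robust) eform nolog
```

Because population is no longer the denominator of the main predictor, population size could then be added as a separate covariate without being built into the debt measure.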
Janina Apostolou

Join Date: Jan 2017

Posts: 10
#5

11 Feb 2017, 06:04

Dear Joseph,

I am still adjusting my model, replacing the population variable with another proxy. However, I think the variable debt might have to be included as a polynomial. I found this code of yours in a different post:

summarize turn, meanonly
generate double c_turn = turn - r(mean)
logit foreign c.c_turn##c.c_turn, nolog
estimates store Full
quietly logit foreign c.c_turn, nolog
lrtest Full

and ran it with my data and variables, like this:

summarize debt, meanonly
generate double c_debt = debt - r(mean)
logit pb c.c_debt##c.c_debt, nolog
estimates store Full
quietly logit pb c.c_debt, nolog
lrtest Full

The likelihood-ratio test turned out like this:

. lrtest Full

Likelihood-ratio test                                 LR chi2(1)  =    157.92
(Assumption: . nested in Full)                        Prob > chi2 =    0.0000

Does that mean that the model with the polynomial term is a better fit?

Thanks a lot for your help!
Janina
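
One caveat on the test above: the likelihood-ratio test comes from logit, which treats all observations as independent, whereas the model in the opening post is a population-averaged GEE. GEE is not likelihood-based, so as far as I know lrtest cannot be used after xtgee. A Wald test of the squared term is one alternative. The sketch below assumes that pb and c_debt exist as in the post above. The names city and year for the panel and time identifiers, and the exchangeable working correlation, are assumptions made for illustration.

```stata
* Assumes pb and c_debt as created in the post above; city and year are
* placeholder names for the panel and time identifiers
xtgee pb c.c_debt##c.c_debt, i(city) t(year) ///
    family(binomial) link(logit) corr(exchangeable) vce(robust) nolog

* Wald test of the quadratic term (H0: its coefficient is zero)
test c.c_debt#c.c_debt
```

A small p-value here would suggest that the squared term contributes to the population-averaged model, which is the model of interest rather than the pooled logit.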