
  • GEE model problematic variable

    Dear Stata members,
    I am running a population-averaged logistic regression in which the dependent variable equals 1 if a city applies Participatory Budgeting and 0 if it does not.

    My main explanatory variable is debt per capita. Another explanatory variable is population size.
    I expected the effect of debt to be stronger. However, when I run the regression in Stata, population has much larger odds ratios and is significant at the 1 per cent level, whereas the odds ratio for debt is significant only at the 10 per cent level.
    If I exclude population, debt is significant at the 1 per cent level.

    Population varies far more, ranging from cities with 4,000 to 3,000,000 inhabitants.
    Debt per capita naturally has a much narrower range.

    Could that “bias” the results? Do I have to transform the variable?

    I also tried centring the population variable. The odds ratio for debt then becomes significant at the 1 per cent level, but Stata reports “convergence not achieved”. This problem occurs only when both variables are in the same model.
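Linear rescaling (for example, counting population in units of 100,000) and centring change only the units of the population coefficient and, for centring, the intercept; the fitted probabilities and the debt coefficient should be unaffected up to numerical tolerance. So if centring population changes the significance of debt, that suggests a numerical or convergence problem rather than a substantive one. A log transform, by contrast, is a genuinely different model, in which each doubling of population shifts the log odds by the same amount. A minimal sketch of the three options on simulated data; every variable name is a hypothetical stand-in, and corr(exchangeable) is an assumption chosen because it fits easily, not because it matches the poster's model.

```stata
* Sketch on simulated data. city, year, debt, popu and pb are hypothetical
* stand-ins for the poster's variables, not the real data.
clear
set seed 20170101
set obs 100
generate int city = _n
* Populations between 4,000 and 3,000,000 (log-uniform), fixed within city
generate double popu = exp(runiform(ln(4000), ln(3000000)))
quietly expand 3
bysort city: generate byte year = _n
* Debt per capita in hypothetical units (thousands per inhabitant)
generate double debt = rnormal(2, 0.5)
generate byte pb = runiform() < invlogit(-1 + 0.5*debt)
xtset city year

* (1) Linear rescaling: the population odds ratio now refers to
*     100,000 additional inhabitants instead of one
generate double popu100k = popu / 100000
xtgee pb c.debt c.popu100k i.year, family(binomial) link(logit) ///
    corr(exchangeable) eform nolog
scalar b_raw = _b[debt]

* (2) Centring: only the intercept should change, not the debt coefficient
summarize popu100k, meanonly
generate double c_popu100k = popu100k - r(mean)
xtgee pb c.debt c.c_popu100k i.year, family(binomial) link(logit) ///
    corr(exchangeable) eform nolog
scalar b_cen = _b[debt]
display "debt coefficient: rescaled " scalar(b_raw) ", centred " scalar(b_cen)

* (3) Log transform: a different model, in which the odds ratio refers to
*     a one-unit change in ln(population), i.e. a city about 2.7 times larger
generate double lnpopu = ln(popu)
xtgee pb c.debt c.lnpopu i.year, family(binomial) link(logit) ///
    corr(exchangeable) eform nolog
```

The display line should show practically identical debt coefficients for fits (1) and (2); only fit (3) changes what the population odds ratio means.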

    According to a multicollinearity check in Stata, multicollinearity between these two variables should not be a problem.

    Does anybody have an idea of what could be the problem here?


  • #2
    I'm not familiar with any of this, but is it pretty common to have per capita something-or-other and population size both as predictors in a regression model?

    I would have thought, perhaps naïvely, that the fact that population size is also the denominator of per capita debt would give rise to unexpected behavior (see below: compare the first model's coefficients and p-values for the two predictors, debt and population size, with those of the second model, where debt has been converted to per capita debt).

    And I wouldn't have expected a test(?) for collinearity to offer any kind of protection (see VIF values below).
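The shared-denominator point can be made exact if, as an assumption not made in the thread, both predictors are entered in logs. Writing D for total debt and P for population, log per-capita debt is ln D - ln P, so a model with log per-capita debt and log population is only a reparameterization of a model with log total debt and log population:

```latex
\operatorname{logit}\Pr(y = 1)
  = \beta_0 + \beta_1 \ln\!\left(\frac{D}{P}\right) + \beta_2 \ln P
  = \beta_0 + \beta_1 \ln D + (\beta_2 - \beta_1) \ln P
```

Both versions give identical fitted probabilities and the same debt coefficient; only the population coefficient differs, by the debt coefficient itself. Part of what the population odds ratio reports therefore depends on how debt is scaled. With raw, unlogged ratios, as in the simulation here, no such exact identity holds, which may be one reason the estimates shift less predictably.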

    . version 14.2

    .
    . clear *

    . set more off

    . set seed 1372310

    .
    . quietly set obs 200

    . generate int city = _n

    . generate double u = rnormal()

    . quietly expand 3

    . bysort city: generate byte time = _n

    .
    . tempname Corr

    . matrix define `Corr' = J(3, 3, 0.5) + 0.5 * I(3)

    . quietly drawnorm debt popu, double corr(`Corr')

    .
    . generate double pr = normal(debt / 10 + popu / 10 + time / 30 + u)

    . quietly generate byte parbud = rbinomial(1, pr)

    .
    . xtgee parbud c.debt c.popu i.tim, i(city) t(time) ///
    >         family(binomial) link(logit) corr(unstructured) eform nolog

    GEE population-averaged model                   Number of obs     =        600
    Group and time vars:             city time      Number of groups  =        200
    Link:                                logit      Obs per group:
    Family:                           binomial                    min =          3
    Correlation:                  unstructured                    avg =        3.0
                                                                  max =          3
                                                    Wald chi2(4)      =       9.61
    Scale parameter:                         1      Prob > chi2       =     0.0474

    ------------------------------------------------------------------------------
          parbud | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            debt |   1.417301   .2026199     2.44   0.015     1.070959    1.875648
            popu |   .8090169   .1410313    -1.22   0.224     .5748738    1.138525
                 |
            time |
              2  |   .9996482   .1464693    -0.00   0.998      .750116    1.332189
              3  |   1.109584   .1720545     0.67   0.502     .8187869    1.503659
                 |
           _cons |    1.06858   .1522834     0.47   0.642     .8081691    1.412901
    ------------------------------------------------------------------------------

    .
    . generate double pcde = debt / popu

    . xtgee parbud c.pcde c.popu i.tim, i(city) t(time) ///
    >         family(binomial) link(logit) corr(unstructured) eform nolog

    GEE population-averaged model                   Number of obs     =        600
    Group and time vars:             city time      Number of groups  =        200
    Link:                                logit      Obs per group:
    Family:                           binomial                    min =          3
    Correlation:                  unstructured                    avg =        3.0
                                                                  max =          3
                                                    Wald chi2(4)      =       4.28
    Scale parameter:                         1      Prob > chi2       =     0.3696

    ------------------------------------------------------------------------------
          parbud | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            pcde |   1.003929   .0055236     0.71   0.476      .993161    1.014814
            popu |   1.172776   .1009017     1.85   0.064     .9907879    1.388193
                 |
            time |
              2  |    1.01531   .1481031     0.10   0.917     .7628406    1.351336
              3  |   1.113147   .1744949     0.68   0.494     .8186902     1.51351
                 |
           _cons |    1.06309   .1511315     0.43   0.667     .8045651    1.404685
    ------------------------------------------------------------------------------

    .
    . quietly regress parbud c.pcde c.popu i.tim i.city

    . quietly estat vif

    . forvalues i = 1/2 {
      2.         display in smcl as text "VIF `r(name_`i')' = " ///
    >                 as result _column(10) %4.2f r(vif_`i')
      3. }
    VIF pcde = 1.51
    VIF popu = 1.55

    .
    . exit

    endÿofÿdo-file


    .



    • #3
      Dear Joseph,

      Thank you very much for your answer!

      Probably it is not wise to have debt per capita and population size in one regression.
      However, I need the variable debt per capita, as the absolute debt level makes comparison between cities of different size impossible. Larger cities in my sample will have higher debt levels than smaller cities, but that does not mean that they are more indebted, and this “indebtedness” is my main explanatory variable. If I replaced it with the absolute debt level, my whole argument would be a different one. Moreover, with absolute debt the odds ratio is even smaller than one, while the odds ratio for debt per capita is larger than one.

      But I will try to find a proxy for population size, maybe that will solve my problem.

      Thanks again!



      • #4
        This is really down to a substantive question rather than a statistical one, and as a non-economist I'm out of my depth here. But based just on my naive intuition, might it not be helpful to have a ratio of debt to municipal product instead of per-capita debt here?
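One way to follow up this suggestion is to divide debt by municipal product rather than by population, so that population no longer sits in the denominator of the debt measure. A minimal sketch on simulated data; gdp (municipal product) and the other variable names are hypothetical, since the thread names no such variable, and the exchangeable working correlation is likewise an assumption.

```stata
* Sketch on simulated data. gdp (municipal product), debt, popu, city, year
* and pb are all hypothetical names; the thread names no such variables.
clear
set seed 4242
set obs 150
generate int city = _n
generate double popu = exp(runiform(ln(4000), ln(3000000)))
quietly expand 3
bysort city: generate byte year = _n
* Municipal product and total debt both grow with city size
generate double gdp  = popu * exp(rnormal(ln(30), 0.3))
generate double debt = gdp * runiform(0.05, 0.6)
* Debt-to-product ratio in per cent: a size-free indebtedness measure
* that does not use population as its denominator
generate double debt_gdp = 100 * debt / gdp
generate byte pb = runiform() < invlogit(-2 + 0.05*debt_gdp)
xtset city year
* Odds ratio per one-percentage-point increase in the debt-to-product ratio
xtgee pb c.debt_gdp i.year, family(binomial) link(logit) ///
    corr(exchangeable) eform nolog
```

Population could still be added as a separate control; it would then no longer be mechanically tied to the debt measure.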



        • #5

          Dear Joseph,

          I am still adjusting my model, replacing the population variable with another proxy. However, I think the debt variable might have to be included as a polynomial. I found this code of yours in a different post:

              summarize turn, meanonly
              generate double c_turn = turn - r(mean)
              logit foreign c.c_turn##c.c_turn, nolog
              estimates store Full
              quietly logit foreign c.c_turn, nolog
              lrtest Full

          and ran it with my data and variables, like this:

              summarize debt, meanonly
              generate double c_debt = debt - r(mean)
              logit pb c.c_debt##c.c_debt, nolog
              estimates store Full
              quietly logit pb c.c_debt, nolog
              lrtest Full

          The likelihood-ratio test came out like this:

              . lrtest Full

              Likelihood-ratio test                                 LR chi2(1)  =    157.92
              (Assumption: . nested in Full)                        Prob > chi2 =    0.0000

          Does that mean that the model with the polynomial term fits better?

          Thanks a lot for your help!
          Janina
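The test above compares two pooled logit fits, which treat every city-year as an independent observation. Within the pooled logit, an LR statistic of 157.92 on 1 degree of freedom does favour the model with the squared term. Because each city appears several times, though, that p-value may be too optimistic, and because GEE estimates come without a likelihood, lrtest cannot be used after xtgee. A minimal sketch, assuming the squared term is instead checked inside the population-averaged model with a Wald test (testparm) and robust standard errors; the data are simulated, and pb, debt, city and year are hypothetical names.

```stata
* Sketch on simulated data. pb, debt, city and year are hypothetical names.
clear
set seed 777
set obs 200
generate int city = _n
generate double u = rnormal()          // city-level heterogeneity
quietly expand 3
bysort city: generate byte year = _n
generate double debt = rnormal(2, 0.5)
* Simulated truth: the log odds are quadratic in debt
generate byte pb = runiform() < invlogit(u + 0.8*(debt - 2) - 1.5*(debt - 2)^2)

* Centre debt before squaring, as in the code above
summarize debt, meanonly
generate double c_debt = debt - r(mean)

* Population-averaged logit with linear and squared terms
xtset city year
xtgee pb c.c_debt##c.c_debt i.year, family(binomial) link(logit) ///
    corr(exchangeable) vce(robust) eform nolog

* No likelihood after xtgee, so test the squared term with a Wald test
testparm c.c_debt#c.c_debt
```

A small p-value from testparm would support keeping the squared term in the GEE model itself, which is the model the poster actually reports.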

