
  • Understanding the coefficients when the first category is dropped for multicollinearity but they are NOT dummy variables.

    I am running the following fixed-effects regression:

    xtreg recycling loginc logpopden age1120 age2130 age3140 age4150 age5160 age6170 age7180 age81plus md11 md12 md13 md14 md15 md16 md17 md18 md19 md20 md21 md22 md23 md24 md25 md26 md27 md28 md29 md291 wasteavg dryavg quarter2 quarter3 quarter4 year2 year3 year4 year5, fe vce(cluster acode)

    Here age1120 ... age81plus are the percentages of people in each local authority who fall into each age category (age 0-10 is dropped to avoid multicollinearity, as the percentages add up to 100%). I am trying to understand whether, for example, having more young people increases the recycling rate (which is also a percentage). How can I interpret the coefficients on the age categories?

    Is it correct that a 1 percentage point increase in the share of people aged 11-20 in a local authority is associated with a 0.296 percentage point decrease in the recycling rate, relative to the share of people in the age 0-10 category? I am not sure how to interpret this. Thanks!

    My results look like this:

    Fixed-effects (within) regression               Number of obs     =      5,862
    Group variable: acode                           Number of groups  =        311

    R-sq:                                           Obs per group:
         within  = 0.3716                                         min =          4
         between = 0.1048                                         avg =       18.8
         overall = 0.1549                                         max =         20

                                                    F(39,310)         =      43.00
    corr(u_i, Xb)  = -0.5252                        Prob > F          =     0.0000

                                    (Std. Err. adjusted for 311 clusters in acode)
    ------------------------------------------------------------------------------
                 |               Robust
       recycling |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          loginc |   12.03771   6.237323     1.93   0.055    -.2351345    24.31055
       logpopden |  -1.332437   2.427655    -0.55   0.583    -6.109203    3.444329
         age1120 |  -.2954794   .6908926    -0.43   0.669    -1.654911    1.063953
         age2130 |   .1885658   .6400376     0.29   0.768    -1.070802    1.447933
         age3140 |   .7329204   .9751467     0.75   0.453    -1.185823    2.651664
         age4150 |  -.3526127   1.066931    -0.33   0.741    -2.451955     1.74673
         age5160 |   1.275646   .9134214     1.40   0.164    -.5216435    3.072936
         age6170 |    .024184   .9108853     0.03   0.979    -1.768116    1.816484
         age7180 |   1.218289   .8260266     1.47   0.141    -.4070386    2.843617
       age81plus |  -.3917339   1.448403    -0.27   0.787    -3.241678     2.45821
            md11 |   .4255393   .4895818     0.87   0.385    -.5377843    1.388863
            md12 |  -.5535315   .4137792    -1.34   0.182    -1.367703    .2606396
            md13 |  -.9062759   .5197156    -1.74   0.082    -1.928892    .1163405
            md14 |  -.1184019   .4622045    -0.26   0.798    -1.027857    .7910528
            md15 |   .5687996     .58112     0.98   0.328    -.5746388    1.712238
            md16 |   .0920765   .7555284     0.12   0.903    -1.394536    1.578689
            md17 |  -.2591195   .3896524    -0.67   0.507    -1.025818    .5075785
            md18 |   -.051231   .5992336    -0.09   0.932    -1.230311    1.127848
            md19 |  -.0115919   .4511141    -0.03   0.980    -.8992246    .8760409
            md20 |  -1.560113   .4897158    -3.19   0.002      -2.5237   -.5965258
            md21 |  -1.002337   .5052073    -1.98   0.048    -1.996406   -.0082675
            md22 |  -1.213785   .5130708    -2.37   0.019    -2.223327   -.2042435
            md23 |   .1537965   .5411894     0.28   0.776    -.9110726    1.218666
            md24 |    .392942   .4239478     0.93   0.355    -.4412372    1.227121
            md25 |   .1252963    .871987     0.14   0.886    -1.590465    1.841058
            md26 |   .5656828   .3854658     1.47   0.143    -.1927775    1.324143
            md27 |   1.367237    .367305     3.72   0.000     .6445112    2.089964
            md28 |   .3488904   .3075395     1.13   0.257    -.2562384    .9540191
            md29 |  -.0347408    .680321    -0.05   0.959    -1.373372     1.30389
           md291 |   .3742537   .3683196     1.02   0.310    -.3504689    1.098976
        wasteavg |   1.467284   .5581088     2.63   0.009     .3691235    2.565444
          dryavg |  -.6221646   .6502192    -0.96   0.339    -1.901566    .6572366
        quarter2 |  -4.480375   .1217009   -36.81   0.000     -4.71984   -4.240911
        quarter3 |  -4.181014   .1181899   -35.38   0.000     -4.41357   -3.948458
        quarter4 |  -2.514117   .1012822   -24.82   0.000    -2.713405    -2.31483
           year2 |  -.3664256   .2981572    -1.23   0.220    -.9530934    .2202422
           year3 |  -1.513213   .5369766    -2.82   0.005    -2.569793   -.4566336
           year4 |  -3.069163   .9697009    -3.17   0.002    -4.977191   -1.161135
           year5 |   -4.18252   1.169431    -3.58   0.000    -6.483547   -1.881493
           _cons |  -114.9097   80.65038    -1.42   0.155    -273.6011    43.78166
    -------------+----------------------------------------------------------------
         sigma_u |  4.7016969
         sigma_e |  2.5497095
             rho |  .77274705   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------

  • #2
    If I understand you correctly, the age* variables represent mutually exclusive and exhaustive categories of age, and they take values from 0 to 100, reflecting the percentage of all persons in the unit of analysis (which, based on your other posts, I take to be the authority) who fall in that age group. In that case, it is always the case that the sum of these variables = 100. That is the source of the collinearity. Whenever you have a group of variables that always add up to the same amount, those variables and the constant term are necessarily collinear. In particular, 1*age010 + 1*age1120 + 1*age2130 + ... + 1*age81plus - 100*1 = 0 is a linear combination of those variables and the constant, with non-zero coefficients, that is identically 0.

    Interpreting these is a bit tricky. Your proposed interpretation

    "Is it correct that a 1 percentage point increase in the share of people aged 11-20 in a local authority is associated with a 0.296 percentage point decrease in the recycling rate, relative to the share of people in the age 0-10 category?"

    is, strictly speaking, correct, but not clearly worded. In particular, the only way that the value of age1120 can change while all the other non-omitted age* variables remain the same is to, in effect, "transfer" some population from the omitted age010 category to the age1120 category. And, were that done, the expected difference in recycling is, indeed, a 0.296 percentage point decrease in the recycling rate. But is that conclusion useful or meaningful? Population shifts of that nature simply don't happen in the real world. You have correctly answered that question, but I wonder whether it is a question that you, or anybody else, would want to ask.
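
    To see why the comparison is always with the omitted group, it may help to write the substitution out. This is only a sketch: the symbols \alpha and \gamma_0, ..., \gamma_8 are notation introduced here for a hypothetical model containing all nine age shares (they do not appear in the output above), age_1, ..., age_8 stand for age1120 through age81plus, and the trailing dots stand for the other regressors and the fixed effects.

    ```latex
    \begin{align*}
    \text{recycling}
      &= \alpha + \gamma_0\,\text{age010} + \sum_{k=1}^{8} \gamma_k\,\text{age}_k + \dots \\
      &= \alpha + \gamma_0\Bigl(100 - \sum_{k=1}^{8} \text{age}_k\Bigr)
         + \sum_{k=1}^{8} \gamma_k\,\text{age}_k + \dots \\
      &= (\alpha + 100\,\gamma_0) + \sum_{k=1}^{8} (\gamma_k - \gamma_0)\,\text{age}_k + \dots
    \end{align*}
    ```

    Under that notation, the reported coefficient on each age variable estimates \gamma_k - \gamma_0, the difference from the omitted age010 group. The individual \gamma's cannot be estimated separately, which is why Stata has to drop one category.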

    I think, for this reason, that in models like this one, where you represent the distribution of a variable like age in a population by a series of category percentages (or probabilities summing to 1), the coefficient of any one individual variable (or any other representation of the marginal effect of a single variable) is usually not very important.

    What might be useful is to imagine hypothetical population age distributions that seem realistic, perhaps a demographic projection of what the age distribution will be in, say, 10 years, and then use the model to see what the impact on recycling might be. I'll leave the conjuring of such scenarios to you, as I have no practical experience in this area.
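
    One way such a scenario could be evaluated is with lincom, run directly after the xtreg command from post #1. The sketch below is only an illustration: the shifts in the age shares are made-up numbers, not a real demographic projection, and they are chosen to sum to zero so that the omitted age010 share stays unchanged.

    ```stata
    * Run immediately after the xtreg command in post #1, while its
    * estimates are still in memory.
    * Hypothetical scenario (illustrative numbers only):
    *   age6170 +2, age7180 +1, age2130 -2, age3140 -1 percentage points.
    * These shifts sum to zero, so the omitted age010 share is unchanged.
    lincom 2*age6170 + 1*age7180 - 2*age2130 - 1*age3140
    ```

    lincom reports the implied change in the recycling rate, in percentage points, together with a standard error and confidence interval based on the clustered variance estimates. If the listed shifts do not sum to zero, the remainder is implicitly absorbed by the omitted age010 category.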
