  • Problems Diagnosing Collinearity

    Hello. We have data in which, on visual inspection, the outcome variable (summarized over a group of counties) diverges over time between the two levels of a dichotomous variable (PTELL) when another dichotomous variable (Growthno) equals 0, but does not diverge over time between the levels of PTELL when Growthno = 1. The graphs below show the difference.

    To test whether PTELL (1 vs. 0) has a significant effect on the outcome over time, and whether that effect interacts with Growthno (1 vs. 0), we ran a fixed-effects model on the panel data:


    . * Set Outcome variable
    . global Outcome RatioofSPDSchooltaxestoall
    . * Set data as panel data
    . sort CountyNo YearNum
    . xtset CountyNo YearNum
    Panel variable: CountyNo (strongly balanced)
    Time variable: YearNum, 1988 to 2020
    Delta: 1 unit
    . xtdescribe

    CountyNo:  1, 2, ..., 102                                  n =         81
     YearNum:  1988, 1989, ..., 2020                           T =         33
               Delta(YearNum) = 1 unit
               Span(YearNum)  = 33 periods
               (CountyNo*YearNum uniquely identifies each observation)

    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                            33      33      33        33        33      33      33

        Freq.  Percent    Cum. |  Pattern
    ---------------------------+-----------------------------------
          81    100.00  100.00 |  111111111111111111111111111111111
    ---------------------------+-----------------------------------
          81    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    . xtsum CountyNo YearNum PTELL

    Variable         |       Mean  Std. dev.        Min        Max |    Observations
    -----------------+--------------------------------------------+----------------
    CountyNo overall |   52.19753   29.12861          1        102 |     N =    2673
             between |              29.30462          1        102 |     n =      81
             within  |                     0   52.19753   52.19753 |     T =      33
                     |                                            |
    YearNum  overall |       2004   9.523686       1988       2020 |     N =    2673
             between |                     0       2004       2004 |     n =      81
             within  |              9.523686       1988       2020 |     T =      33
                     |                                            |
    PTELL    overall |   .3333333   .4714927          0          1 |     N =    2673
             between |              .4743416          0          1 |     n =      81
             within  |                     0   .3333333   .3333333 |     T =      33


    . xtreg $Outcome PTELL Growthno, fe

    note: PTELL omitted because of collinearity.
    note: Growthno omitted because of collinearity.

    Fixed-effects (within) regression               Number of obs     =      2,673
    Group variable: CountyNo                        Number of groups  =         81

    R-squared:                                      Obs per group:
         Within  = .                                              min =         33
         Between = .                                              avg =       33.0
         Overall = .                                              max =         33

                                                    F(0,2592)         =       0.00
    corr(u_i, Xb) = .                               Prob > F          =          .

    ------------------------------------------------------------------------------
    RatioofSPD~l | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           PTELL |          0  (omitted)
        Growthno |          0  (omitted)
           _cons |   .6630764   .0005525  1200.21   0.000     .6619931    .6641598
    -------------+----------------------------------------------------------------
         sigma_u |  .05000966
         sigma_e |  .02856317
             rho |  .75402503   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(80, 2592) = 101.16                  Prob > F = 0.0000

    Both Growthno and PTELL are omitted because of collinearity. However, all tests for association between the variables show no association:

    . correlate $Outcome CountyNo PTELL Growthno
    (obs=2,673)

                 | Ratioo~l CountyNo    PTELL Growthno
    -------------+------------------------------------
    RatioofSPD~l |   1.0000
        CountyNo |   0.1827   1.0000
           PTELL |   0.1888   0.1310   1.0000
        Growthno |  -0.0190  -0.0608   0.0000   1.0000


    .
    . mean $Outcome, over(PTELL)

    Mean estimation                          Number of obs = 2,673

    ------------------------------------------------------------------------------------
                                       |       Mean   Std. err.     [95% conf. interval]
    -----------------------------------+------------------------------------------------
    c.RatioofSPDSchooltaxestoall@PTELL |
                                     0 |   .6554512    .0014099     .6526867    .6582157
                                     1 |   .6783269    .0016265     .6751375    .6815164
    ------------------------------------------------------------------------------------

    .
    . mean $Outcome, over(Growthno)

    Mean estimation                          Number of obs = 2,673

    ---------------------------------------------------------------------------------------
                                          |       Mean   Std. err.     [95% conf. interval]
    --------------------------------------+------------------------------------------------
    c.RatioofSPDSchooltaxestoall@Growthno |
                                        0 |   .6639776    .0012925     .6614433    .6665119
                                        1 |   .6617657    .0019544     .6579335    .6655979
    ---------------------------------------------------------------------------------------

    .
    . tab PTELL Growthno

               |       Growthno
         PTELL |         0          1 |     Total
    -----------+----------------------+----------
             0 |     1,056        726 |     1,782
             1 |       528        363 |       891
    -----------+----------------------+----------
         Total |     1,584      1,089 |     2,673

    Also, the VIFs are 1.

    . quietly regress $Outcome PTELL Growthno

    .
    . vif

        Variable |       VIF       1/VIF
    -------------+----------------------
        Growthno |      1.00    1.000000
           PTELL |      1.00    1.000000
    -------------+----------------------
        Mean VIF |      1.00


    I cannot figure out why there is collinearity. Or perhaps I am setting up the model wrong? Any insight would be helpful.


    [Attached image: Growthno1.JPG]

    [Attached image: Growthno0.JPG]

  • #2
    The variation of the outcome variable will have no impact on the collinearity issue: that arises exclusively from relationships among the independent variables (including the county fixed effect). Unfortunately, the various analyses of these variables that you show do not shed much light on that, as they do not look at what is going on within counties over time.

    Turning to general principles, if PTELL or Growthno is time-invariant within county, then it will be collinear with the county fixed effect and get omitted. In fact, since you refer in your graph to "PTELL counties" and "non-PTELL counties," it sounds like this is precisely what is happening, at least for PTELL. Nothing you say about Growthno provides any information along these lines either way.

    If that is not what is happening, then there may be some more complicated relationship going on that you could discern with:
    Code:
    regress PTELL i.Growthno i.CountyNo
    regress Growthno i.PTELL i.CountyNo
    I expect you will see R2 = 1, meaning perfect linear dependence of the DV on the IVs in these regressions, and you will even see the coefficients that define the linear relationship(s).
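
    The time-invariance condition described above can also be checked directly. Note that the xtsum output in #1 already reports a within standard deviation of 0 for PTELL. The following is only a minimal sketch, assuming the data are xtset on CountyNo and YearNum as in #1, that the variable is spelled Growthno as in that post, and that neither predictor has missing values; the helper names varies and tag are made up for illustration:

    ```stata
    * A within standard deviation of 0 means the variable never changes inside a county
    xtsum PTELL Growthno

    * Count the counties in which each variable takes more than one value
    * (varies and tag are hypothetical helper names)
    foreach v of varlist PTELL Growthno {
        bysort CountyNo (`v'): generate byte varies = `v'[1] != `v'[_N]
        egen byte tag = tag(CountyNo)
        display "`v': number of counties with within-county variation"
        count if tag & varies
        drop varies tag
    }

    * Restore the panel sort order
    sort CountyNo YearNum
    ```

    A count of 0 for a variable means it is constant within every county, so the county fixed effects absorb it completely.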



    • #3
      Ahh. You are correct on all counts. PTELL and Growthno are time-invariant within counties, and the R-squared from both regressions is 1:


      . regress PTELL i.Growthno i.CountyNo
      note: 96.CountyNo omitted because of collinearity.

            Source |       SS           df       MS      Number of obs   =     2,673
      -------------+----------------------------------   F(80, 2592)     =         .
             Model |         594        80       7.425   Prob > F        =         .
          Residual |           0     2,592           0   R-squared       =    1.0000
      -------------+----------------------------------   Adj R-squared   =    1.0000
             Total |         594     2,672  .222305389   Root MSE        =         0

      ...

      . regress Growthno i.PTELL i.CountyNo
      note: 100.CountyNo omitted because of collinearity.

            Source |       SS           df       MS      Number of obs   =     2,673
      -------------+----------------------------------   F(80, 2592)     =         .
             Model |  645.333333        80  8.06666667   Prob > F        =         .
          Residual |           0     2,592           0   R-squared       =    1.0000
      -------------+----------------------------------   Adj R-squared   =    1.0000
             Total |  645.333333     2,672  .241516966   Root MSE        =         0

      Mixed models and fixed-effects models were new when I left graduate school some time ago. I suppose I will go back to the drawing board. I believed the appropriate model for a repeated-measures variable, comparing subjects (counties) over time, in two groups, with another dichotomous variable, involved the use of panel data and a fixed-effects model with two fixed effects. I will read more. Thank you.



      • #4
        When your goal is to estimate the effects of variables that do vary over time within panels (counties), then, yes a fixed effects model is often the preferred approach. But that approach is inherently incapable of estimating effects of variables that are constant over time within panels. For that you should look at other alternatives: occasionally pooled OLS will be appropriate if there is very little between-panel variation in the outcomes. More often, a random-effects model (xtreg, re), or a hybrid model (ssc install hybrid) will work out. A between-panels regression model (-xtreg, be-) is another possibility.
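
        A minimal sketch of two of those alternatives, assuming the variable names and the $Outcome global from #1. The PTELL-by-Growthno interaction and the cluster-robust standard errors are illustrative choices, not something prescribed in this thread:

        ```stata
        * Random-effects model: time-invariant predictors stay in the model
        xtreg $Outcome i.PTELL##i.Growthno, re vce(cluster CountyNo)

        * Between-effects model: a regression on the county means
        xtreg $Outcome i.PTELL##i.Growthno, be
        ```

        Neither command drops PTELL or Growthno, because neither includes a separate intercept for every county.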



        • #5
          Originally posted by Clyde Schechter
          More often, a random-effects model (xtreg, re), or a hybrid model (ssc install hybrid) will work out. A between-panels regression model (-xtreg, be-) is another possibility.
          Thank you very much. I realized, after some reflection, that since I had dichotomized my two predictor variables, and since the data points were all equally spaced and balanced, I could run a traditional repeated-measures ANOVA.

          But I think the better analysis here is to revert to the time-varying values of each predictor and run a mixed model, because one predictor would be a fixed effect (i.e., whether a statute had been passed in the county) and the other would be a random effect (i.e., total EAV property value in the county). Question: is it appropriate in a mixed multilevel model to code an interaction term and run that as a predictor, as one would do in a traditional OLS regression? I see very little about this.



          • #6
            I do not understand what kind of mixed model you are proposing, so I cannot comment on it. In particular, total EAV property value in the county does not sound like a variable that would typically be treated as a random effect.

            But in general terms, you can use interactions in mixed models just as you would use them in any other regression model.
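
            As a minimal sketch of such an interaction, assuming hypothetical time-varying variables named ptell_tv (1 if the statute is in force in that county-year) and eav_tv (total EAV), neither of which appears in this thread, and assuming a random intercept for county as the only random effect:

            ```stata
            * ptell_tv and eav_tv are hypothetical variable names
            * Fixed part: both main effects, their interaction, and a linear time trend
            * Random part: a random intercept for each county
            mixed $Outcome i.ptell_tv##c.eav_tv c.YearNum || CountyNo:

            * Joint test of the interaction term(s)
            testparm i.ptell_tv#c.eav_tv
            ```

            The ## operator enters both main effects along with the interaction, just as it would after regress.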
