  • Collinearity dependent on the way the fixed effects are added

    Hi,

    I have 12 months of data for a particular year. I have a variable "d" that takes the value 0 if the observation belongs to the first two quarters of the year and 1 if it belongs to the last two quarters. When I add month fixed effects to the model, some fixed effects are omitted because of collinearity. On checking further, I concluded that the collinearity stems from the variable d: when I added the month fixed effects before "d", it was d that got omitted. I guess the collinearity arises because "d" does not change within months. However, I noticed something interesting when I added quarter fixed effects instead of month fixed effects: the collinearity problem disappeared. I am wondering how that happened, since "d" does not vary within quarters either. Any insights on this would be helpful.

  • #2
    On checking further, I concluded that the collinearity stems from the variable d: when I added the month fixed effects before "d", it was d that got omitted. I guess the collinearity arises because "d" does not change within months.
    Precisely so.

    However, I noticed something interesting when I added quarter fixed effects instead of month fixed effects: the collinearity problem disappeared. I am wondering how that happened, since "d" does not vary within quarters either. Any insights on this would be helpful.
    My insight is that I don't think this actually happened. Either the quarterly fixed effects and variable "d" were not correctly specified, or there was indeed collinearity and you just didn't notice it. I'd be delighted if you can prove me wrong by posting example data, commands, and output that reproduce what you are saying here. And if you do, I will be happy to try to figure out why.
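
To spell out why the description in #1 cannot be the whole story: if d really were 0 throughout quarters 1 and 2 and 1 throughout quarters 3 and 4, it would be an exact sum of indicator variables in both specifications. The symbols Q_k (indicator for quarter k) and M_k (indicator for month k) are notation introduced here for illustration, not something used in the thread.

```latex
d \;=\; Q_3 + Q_4 \;=\; \sum_{k=7}^{12} M_k
```

Either identity is an exact linear dependence among the regressors, so under that definition one term would have to be omitted with quarter indicators just as with month indicators.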


    • #3
      Himani:
      the following Stata thread can be useful: https://www.statalist.org/forums/for...for-panel-data.
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        Originally posted by Clyde Schechter:
        Precisely so.


        My insight is that I don't think this actually happened. Either the quarterly fixed effects and variable "d" were not correctly specified, or there was indeed collinearity and you just didn't notice it. I'd be delighted if you can prove me wrong by posting example data, commands, and output that reproduce what you are saying here. And if you do, I will be happy to try to figure out why.
        Hi Clyde,

        Thank you for your response!

        Below are my data example, my code, and the output I got. The variable "d" takes the value 0 for months before September and the value 1 from September onward. Please let me know if I need to provide more information on any of this. Thank you.

        My data example:
        [CODE]
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float(p d month quarter)
        .5925926 0 1 1
        .56666666 0 2 1
        .45333335 0 3 1
        .3966942 0 4 2
        .3835616 0 5 2
        .630137 0 6 2
        .6153846 0 8 3
        .6617647 1 9 3
        .7647059 1 11 4
        . 0 1 1
        . 0 2 1
        .3625378 0 1 1
        .3243243 0 3 1
        .3846154 0 4 2
        .3764706 0 5 2
        .3680982 0 6 2
        .3322034 0 7 3
        .3818182 0 8 3
        .3050847 1 9 3
        .3917526 1 11 4
        .1953125 1 12 4
        .2955665 0 2 1
        .3193277 0 3 1
        .3534884 0 4 2
        .3239437 0 5 2
        .6005509 0 8 3
        .2384106 1 10 4
        . 0 1 1
        . 0 2 1
        . 0 3 1
        . 0 4 2
        . 0 5 2
        . 0 6 2
        . 0 7 3
        . 0 8 3
        . 1 9 3
        . 1 10 4
        . 1 11 4
        . 1 12 4
        .25619835 0 1 1
        .302521 0 2 1
        .3550725 0 1 1
        .3206107 0 2 1
        .3111111 0 4 2
        .3586207 0 5 2
        .3909774 0 6 2
        .3768116 0 7 3
        .36764705 0 8 3
        .3333333 1 9 3
        .3740458 1 10 4
        .44615385 1 11 4
        .5483871 1 12 4
        .5421687 0 1 1
        .36206895 0 2 1
        .4038461 0 4 2
        .6582279 1 11 4
        . 0 1 1
        . 0 2 1
        . 0 3 1
        . 0 4 2
        .5185185 0 6 2
        .3653846 0 7 3
        0 1 12 4
        .3498024 0 1 1
        .344294 0 5 2
        .3589744 0 6 2
        .3711538 0 8 3
        .25 1 10 4
        .3779904 0 1 1
        .3125 0 3 1
        .3576159 0 3 1
        .4 0 2 1
        . 0 1 1
        . 0 2 1
        . 0 3 1
        . 0 4 2
        . 0 5 2
        . 0 6 2
        . 0 7 3
        . 0 8 3
        . 1 9 3
        . 1 10 4
        . 1 11 4
        . 1 12 4
        . 0 1 1
        . 0 2 1
        . 0 3 1
        . 0 4 2
        . 0 5 2
        . 0 6 2
        . 0 7 3
        . 0 8 3
        . 1 9 3
        . 1 10 4
        . 1 11 4
        . 1 12 4
        . 0 1 1
        . 0 2 1
        . 0 3 1
        . 0 4 2
        end
        label values month month
        label def month 1 "Jan", modify
        label def month 2 "Feb", modify
        label def month 3 "Mar", modify
        label def month 4 "Apr", modify
        label def month 5 "May", modify
        label def month 6 "Jun", modify
        label def month 7 "Jul", modify
        label def month 8 "Aug", modify
        label def month 9 "Sep", modify
        label def month 10 "Oct", modify
        label def month 11 "Nov", modify
        label def month 12 "Dec", modify
        [/CODE]


        *******
        My code:
        ** Adding month fixed effects:

        reg p d i.month
        note: 12.month omitted because of collinearity.

              Source |       SS           df       MS      Number of obs   =     5,611
        -------------+----------------------------------   F(11, 5599)     =    178.49
               Model |  30.2539752        11  2.75036138   Prob > F        =    0.0000
            Residual |  86.2757387     5,599  .015409134   R-squared       =    0.2596
        -------------+----------------------------------   Adj R-squared   =    0.2582
               Total |  116.529714     5,610  .020771785   Root MSE        =    .12413

        ------------------------------------------------------------------------------
                   p | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                   d |   .0892099   .0078568    11.35   0.000     .0738075    .1046124
                     |
               month |
                Feb  |   -.043105   .0077826    -5.54   0.000     -.058362    -.027848
                Mar  |  -.0631239   .0077385    -8.16   0.000    -.0782943   -.0479535
                Apr  |  -.0532589   .0078521    -6.78   0.000     -.068652   -.0378657
                May  |   .0104058   .0079306     1.31   0.190    -.0051413     .025953
                Jun  |   .0130406   .0079256     1.65   0.100    -.0024966    .0285778
                Jul  |  -.0195614   .0080197    -2.44   0.015    -.0352831   -.0038397
                Aug  |   .0281213   .0078054     3.60   0.000     .0128198    .0434229
                Sep  |   .0528366   .0083627     6.32   0.000     .0364425    .0692307
                Oct  |  -.1565617   .0081775   -19.15   0.000    -.1725927   -.1405307
                Nov  |   .0640802   .0079089     8.10   0.000     .0485756    .0795848
                Dec  |          0  (omitted)
                     |
               _cons |   .3582016   .0052644    68.04   0.000     .3478812    .3685219
        ------------------------------------------------------------------------------

        *******

        ** when I add quarter fixed effects
        reg p d i.quarter

              Source |       SS           df       MS      Number of obs   =     5,611
        -------------+----------------------------------   F(4, 5606)      =    202.59
               Model |  14.7174521         4  3.67936303   Prob > F        =    0.0000
            Residual |  101.812262     5,606  .018161302   R-squared       =    0.1263
        -------------+----------------------------------   Adj R-squared   =    0.1257
               Total |  116.529714     5,610  .020771785   Root MSE        =    .13476

        ------------------------------------------------------------------------------
                   p | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                   d |   .1366082   .0079281    17.23   0.000      .121066    .1521504
                     |
             quarter |
                  2  |   .0231046   .0050715     4.56   0.000     .0131625    .0330467
                  3  |    .038984   .0057093     6.83   0.000     .0277914    .0501765
                  4  |  -.0404081   .0093463    -4.32   0.000    -.0587304   -.0220858
                     |
               _cons |   .3246559    .003475    93.43   0.000     .3178437    .3314682
        ------------------------------------------------------------------------------
        Last edited by Himani Srihan; 18 Feb 2023, 14:54.


        • #5
          It is as I suspected. Your variable quarter codes September as being third quarter. Consequently the variable d is not constant within your version of quarter 3: it is 0 in July and August but 1 in September.
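
One way to check this kind of thing directly is to cross-tabulate d against each grouping variable and flag the groups in which d takes more than one value. The sketch below is a do-file fragment that builds a 12-row toy dataset (one row per month) rather than using the poster's data. The names `month`, `quarter`, and `d` mirror the example in #4, while `d_varies_q` and `d_varies_m` are hypothetical helper variables introduced only for this illustration.

```stata
* Toy data: one row per month, coded like the example in #4
clear
set obs 12
generate byte month   = _n
* Calendar quarters, so September falls in quarter 3
generate byte quarter = ceil(month/3)
* d is 1 from September onward, as in the posted data
generate byte d       = month >= 9

* Each month shows a single value of d; quarter 3 shows both 0 and 1
tabulate month d
tabulate quarter d

* Flag groups in which d is not constant (hypothetical helper variables)
bysort quarter (d): generate byte d_varies_q = d[1] != d[_N]
bysort month   (d): generate byte d_varies_m = d[1] != d[_N]

list month quarter d d_varies_q d_varies_m, sepby(quarter)
```

With this coding, only quarter 3 is flagged, which is why d cannot be written as a sum of quarter indicators even though it can be written as a sum of month indicators.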
          Last edited by Clyde Schechter; 18 Feb 2023, 15:05.


          • #6
            Originally posted by Clyde Schechter:
            It is as I suspected. Your variable quarter codes September as being third quarter. Consequently the variable d is not constant within your version of quarter 3: it is 0 in July and August but 1 in September.
            Thank you for your response, Clyde. However, d still does not vary within quarter 1, quarter 2, or quarter 4. Does that imply that as long as there is variation within at least one quarter, collinearity is not a problem? And is collinearity a problem when I add month fixed effects because there is no variation within any individual month? If, for instance, d did not change within any of the first 11 months but did vary within December, would collinearity not be a problem?

            Thank you for your time.


            • #7
              Thank you for your response, Clyde. However, d still does not vary within quarter 1, quarter 2, or quarter 4. Does that imply that as long as there is variation within at least one quarter, collinearity is not a problem? And is collinearity a problem when I add month fixed effects because there is no variation within any individual month? If, for instance, d did not change within any of the first 11 months but did vary within December, would collinearity not be a problem?
              That is correct.

              Now, if d did not change within the first 11 months but had variation only for December, you would have very near collinearity. That is, if you ran -regress d i.month- in that kind of data, the R-squared, though not exactly 1, would be rather close to 1. In that case, if estimating the effect of d is important, you would need a very, very large sample size in order to get the confidence interval around the coefficient of d to be narrow enough to make the results useful. But that is a different problem. Exact collinearity would not exist, the model would be identifiable, and the estimate of the effect of d would be unbiased (even though it would likely have very low precision).
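
The near-collinearity case can be illustrated with simulated data. The sketch below is a do-file fragment under stated assumptions, not a rerun of the poster's regressions: the sample size, the seed, the outcome `y`, the 0.1 effect of d, the 90% share of ones in December, and the scalar names `r2_aux`, `se_fe`, and `se_pooled` are all hypothetical choices made for illustration. Here d is 0 in every observation for January through August, 1 in every observation for September through November, and varies only within December.

```stata
* Simulated data (not the poster's): d varies only within December
clear
set seed 12345
set obs 12000
generate byte month = mod(_n - 1, 12) + 1
generate byte d = inrange(month, 9, 11) | (month == 12 & runiform() < 0.9)
* Hypothetical outcome with a true effect of d equal to 0.1
generate double y = 0.3 + 0.1*d + rnormal(0, 0.1)

* Auxiliary regression: an R-squared below 1 means no exact collinearity
regress d i.month
scalar r2_aux = e(r2)

* d is estimable alongside month effects, but only December rows identify it
regress y d i.month
scalar se_fe = _se[d]

* Same outcome without month effects, to compare precision
regress y d
scalar se_pooled = _se[d]

display "aux R-squared = " r2_aux "   se(d) with month FE = " se_fe "   without = " se_pooled
```

In a setup like this no term is omitted from -regress y d i.month-, but the standard error of d is larger than in the model without month indicators, because the month indicators absorb most of the variation in d.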


              • #8
                Originally posted by Clyde Schechter:
                That is correct.

                Now, if d did not change within the first 11 months but had variation only for December, you would have very near collinearity. That is, if you ran -regress d i.month- in that kind of data, the R-squared, though not exactly 1, would be rather close to 1. In that case, if estimating the effect of d is important, you would need a very, very large sample size in order to get the confidence interval around the coefficient of d to be narrow enough to make the results useful. But that is a different problem. Exact collinearity would not exist, the model would be identifiable, and the estimate of the effect of d would be unbiased (even though it would likely have very low precision).
                Thank you for your clear explanation. It makes sense! Since adding quarter fixed effects leads to an R-squared of 0.1263, it looks like the sample size has enough predictive power.
                Last edited by Himani Srihan; 18 Feb 2023, 18:34.
