
  • Variables omitted because of collinearity: no explicit collinearity or relationship between the variables entered

    Good morning everyone,
    I am running regressions on enterprise data (microdata on 15,000 workers linked to 7 enterprises). When I enter some dichotomous variables for the presence or absence of enterprise policies, Stata omits them all in bulk (with the note "var... omitted because of collinearity"). The variables are not related to each other, and Stata does not drop only the first one (as usually happens with complementary dummies, to leave a base category).
    I have checked, and the variables do vary (SD > 0) in the dataset, so I cannot understand it.
    I wonder whether the problem is that the variables do not vary across workers in the same firm... Could somebody kindly help me?

    Many thanks in advance for your time, wishing you all a great Tuesday ahead.


  • #2
    Chiara:
    if your situation is similar to the following toy example, omission is unavoidable:
    Code:
    . use "https://www.stata-press.com/data/r17/nlswork.dta"
    (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
    
    . xtreg ln_wage i.race, fe
    note: 2.race omitted because of collinearity.
    note: 3.race omitted because of collinearity.
    
    Fixed-effects (within) regression               Number of obs     =     28,534
    Group variable: idcode                          Number of groups  =      4,711
    
    R-squared:                                      Obs per group:
         Within  = 0.0000                                         min =          1
         Between = 0.0050                                         avg =        6.1
         Overall =      .                                         max =         15
    
                                                    F(0,23823)        =       0.00
    corr(u_i, Xb) =      .                          Prob > F          =          .
    
    ------------------------------------------------------------------------------
         ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            race |
          Black  |          0  (omitted)
          Other  |          0  (omitted)
                 |
           _cons |   1.674907   .0018961   883.35   0.000     1.671191    1.678624
    -------------+----------------------------------------------------------------
         sigma_u |  .42456905
         sigma_e |  .32028665
             rho |  .63731204   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(4710, 23823) = 8.44                 Prob > F = 0.0000
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)
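    For readers puzzled by the bottom of the output: rho is sigma_u^2 / (sigma_u^2 + sigma_e^2), the share of the error variance due to the panel-level effect u_i. Plugging in the sigma_u and sigma_e printed above reproduces the reported value (plain Python used only as a calculator):

```python
def rho(sigma_u: float, sigma_e: float) -> float:
    """Fraction of variance due to u_i: sigma_u^2 / (sigma_u^2 + sigma_e^2)."""
    return sigma_u**2 / (sigma_u**2 + sigma_e**2)


# sigma_u and sigma_e as printed by -xtreg ln_wage i.race, fe- above
print(round(rho(0.42456905, 0.32028665), 6))  # 0.637312
```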



    • #3
      Dear Carlo,
      many thanks for your reply. Honestly, I think my situation is different: the dummy variables are not complementary or overlapping. Moreover, the firm fixed effects that are continuous rather than dichotomous are displayed. I really cannot understand it...



      • #4
        Chiara:
        1) you ran a -fe- panel data regression: as expected, time-invariant variables are crunched by the -fe- estimator, whereas time-varying ones survive -fe-'s hunger;
        2) you're experiencing an omission of categorical variables that are time-invariant for workers working in the same firm (say: the firm has a gym available for workers Yes/No).
        If 1) and 2) are correct and you -xtset- your dataset with workers as -panelid-, the omission makes sense.
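        Why -fe- crunches time-invariant variables can be seen from the within transformation it is based on: each variable is demeaned within its panel, and a variable that is constant within every panel turns into a column of zeros, which carries no information for estimation. Below is a minimal plain-Python sketch of just the demeaning step, on a hypothetical toy panel (`worker` as panel id, a firm-level `gym` dummy as in 2) above, and a time-varying `wage`). Stata's -xtreg, fe- also adds back the overall means, which this sketch leaves out:

```python
from collections import defaultdict


def within_transform(ids, values):
    """Subtract each panel's mean from its observations (the -fe- demeaning step)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        sums[i] += v
        counts[i] += 1
    return [v - sums[i] / counts[i] for i, v in zip(ids, values)]


# Hypothetical toy panel: 3 workers observed in several years
worker = [1, 1, 1, 2, 2, 3, 3, 3]
gym = [1, 1, 1, 0, 0, 1, 1, 1]  # firm has a gym: constant within each worker
wage = [2.0, 2.2, 2.4, 1.5, 1.7, 1.9, 2.0, 2.3]  # varies over time

print(within_transform(worker, gym))   # all zeros: nothing left to estimate
print(within_transform(worker, wage))  # non-zero deviations survive
```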
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Many thanks, Carlo, for your suggestions; I completely get the point about time-invariant variables in panel data.

          The fact is that I don't have panel data. I simply have a cross-section: by "firm FE" I meant firm-related variables, which are therefore the same for all workers in the same firm. I surely did not express myself correctly.

          What I notice is that as long as I enter only a few dichotomous variables, they work. But sometimes entering just one more is enough for Stata to omit 3 of the block, and if I put them all in, it omits them all. My guess is that, as I enter variables, Stata creates certain groupings that it perceives as complementary even though in fact they are not; this is perhaps possible because I have few firms, so the dummies vary less than the microdata do. I don't know; it is still a great mystery to me...
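          One mechanism consistent with this description (an assumption, since the actual data are not shown): any variable that is the same for all workers in a firm is a combination of the 7 firm indicators, so the constant plus all firm-level variables, dummy or continuous, can contribute at most 7 linearly independent columns, however many workers there are. Each dummy can have SD > 0 and no two need be complementary, yet once the constant and the firm-level regressors together exceed 7, some must be omitted. The plain-Python sketch below computes the exact rank of such a design matrix as dummies are added one at a time; the policy values are invented, and Stata's rule for choosing which variable to omit is not reproduced:

```python
from fractions import Fraction


def rank(rows):
    """Exact rank of a matrix (list of rows) via Gaussian elimination on fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        # find a row at or below r with a non-zero entry in column c
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


# Hypothetical values of 8 policy dummies (1 = policy present) for 7 firms
policies = {
    1: [1, 0, 0, 0, 0, 0, 1, 0],
    2: [0, 1, 0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 0, 0, 0, 0, 1],
    4: [0, 0, 0, 1, 0, 0, 1, 0],
    5: [0, 0, 0, 0, 1, 0, 0, 1],
    6: [0, 0, 0, 0, 0, 1, 1, 1],
    7: [0, 0, 0, 0, 0, 0, 0, 0],
}
workers_per_firm = 3  # each worker inherits the dummies of their firm

for k in range(1, 9):
    # design matrix: constant + first k dummies, one row per worker
    X = [[1] + policies[f][:k] for f in policies for _ in range(workers_per_firm)]
    rk, ncols = rank(X), k + 1
    print(f"constant + {k} dummies: rank {rk} of {ncols} columns, {ncols - rk} omitted")
```

          With these made-up values, the constant plus the first six dummies are all identified; the seventh and the eighth dummy each add one column that has to be dropped, although every dummy varies across workers.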



          • #6
            Did you do something like this?
            1. run the model that gives you the error (say y x1 x2 x3).
            2. correl y x1 x2 x3 if e(sample)



            • #7
              I tried pwcorr y x1 x2 x3. What is the difference from correl, and what does "if e(sample)" stand for? I have now also tried your suggestions, but there is still no evidence of collinearity...



              • #8
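                pwcorr and correl are similar, the former having more options. One difference that can matter here: correlate uses only observations with no missing values in any of the listed variables, whereas pwcorr computes each correlation from all observations available for that pair.

                if e(sample) restricts the data to the estimation sample of the regression, so you know you're dealing with the same data that gives you the error.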

                Try

                mdesc y x1 x2 x3

                to see whether missing data are causing the problem (-mdesc- is community-contributed; install it with -ssc install mdesc-).
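                Why the estimation sample matters: a regression uses only the observations with no missing value in any model variable (listwise deletion), so a dummy that varies in the full dataset can be constant in e(sample) and would then be omitted. A plain-Python sketch with hypothetical variables `y`, `x1` and `policy` (None stands for Stata's missing value "."):

```python
from statistics import pstdev

# Hypothetical data; None stands for a missing value (Stata's ".")
data = {
    "y": [2.1, 1.8, 2.5, 2.0, None, 1.9],
    "x1": [30, 41, 35, None, 28, 50],
    "policy": [1, 1, 1, 0, 0, 1],
}


def estimation_sample(data):
    """Indices of observations with no missing value in any variable (listwise deletion)."""
    n = len(next(iter(data.values())))
    return [i for i in range(n) if all(col[i] is not None for col in data.values())]


rows = estimation_sample(data)
policy_in_sample = [data["policy"][i] for i in rows]
print("estimation sample:", rows)  # [0, 1, 2, 5]
print("SD of policy, full data:", pstdev(data["policy"]))
print("SD of policy, estimation sample:", pstdev(policy_in_sample))  # 0.0
```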

                If the variable is a linear combination of other variables in the model, you'll get the same error, and it may not show up in the correlations. Is that a possibility?

                Regress the problem variable on the other Xs and see what happens: an R-squared of 1 means it is an exact linear combination of them.
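                A small made-up example of a combination that correlations hide: three dummies x1, x2, x3 and a problem dummy z = x1 + x2 + x3 - 1. Every pairwise correlation among the four variables is 1/3 or -1/3, yet regressing z on the others (in Stata, simply -regress z x1 x2 x3-) gives an R-squared of exactly 1. The plain-Python sketch below computes both, using exact fractions for the auxiliary regression and assuming the regressors have full column rank:

```python
from fractions import Fraction
from math import sqrt


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def ols_r2(y, xs):
    """R-squared from regressing y on a constant and the columns in xs (exact)."""
    X = [[Fraction(1)] + [Fraction(col[i]) for col in xs] for i in range(len(y))]
    k = len(X[0])
    # augmented normal equations [X'X | X'y]
    A = [[sum(r[p] * r[q] for r in X) for q in range(k)]
         + [sum(r[p] * Fraction(yi) for r, yi in zip(X, y))] for p in range(k)]
    for c in range(k):  # Gauss-Jordan elimination
        piv = next(i for i in range(c, k) if A[i][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        pv = A[c][c]
        A[c] = [v / pv for v in A[c]]
        for i in range(k):
            if i != c:
                f = A[i][c]
                A[i] = [v - f * w for v, w in zip(A[i], A[c])]
    b = [A[i][k] for i in range(k)]
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    ybar = sum(Fraction(v) for v in y) / len(y)
    sse = sum((Fraction(yi) - f) ** 2 for yi, f in zip(y, fitted))
    sst = sum((Fraction(yi) - ybar) ** 2 for yi in y)
    return 1 - sse / sst


x1 = [1, 0, 0, 1, 0, 1]
x2 = [0, 1, 0, 1, 1, 0]
x3 = [0, 0, 1, 0, 1, 1]
z = [0, 0, 0, 1, 1, 1]  # equals x1 + x2 + x3 - 1 in every observation

print([round(corr(z, x), 3) for x in (x1, x2, x3)])  # [0.333, 0.333, 0.333]
print(ols_r2(z, [x1, x2, x3]))  # 1
```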





                • #9
                  Thank you for all your useful suggestions. I don't have missing data (I already checked the number of observations reported by summarize). I will check for linear combinations. Many thanks for your time and suggestions!



                  • #10
                    Chiara:
                    could you please provide an excerpt/example of your dataset (changing the names of variables if they are confidential) so that interested listers can challenge themselves with data instead of relying on guesswork? Thanks.
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Having never used -xtreg-, I was a bit puzzled by Carlo Lazzaro's example in #2. This UCLA page helped me. It also prompted me to try the following:

                      Code:
                      clear
                      use "https://www.stata-press.com/data/r17/nlswork.dta"
                      xtreg ln_wage i.race, fe
                      xtreg ln_wage i.race, be
                      xtreg ln_wage i.race, re
                      xtreg ln_wage i.race
                      mixed ln_wage i.race || idcode:
                      estat icc
                      I'm sure there is nothing new here for -xtreg- veterans, but maybe other -xtreg- newbies will find it helpful.
                      --
                      Bruce Weaver
                      Version: Stata/MP 18.5 (Windows)

