
  • Collinearity when partitioning the whole sample into subgroups

    Hi guys!
    My basic model contains a quartic polynomial in centred age, from (age-40) up to (age-40)^4, together with the corresponding interactions with my main variable, so that the coefficient of the main variable reflects its effect when the individual is 40 years old. This works and the results are within expectations. However, when I partition the whole sample into 7 smaller cohort groups according to the observations' ages and run the same code, Stata reports that the age term is omitted automatically because of collinearity, and the coefficient of my main variable becomes abnormal. If I drop the interaction terms, there will be some unknown degree of bias. Is there any way to fix this?
    Thanks a lot!

  • #2
    Kelsi:
    provided that categorizing continuous predictors is, in general, something to avoid (see the outstanding http://citeseerx.ist.psu.edu/viewdoc...=rep1&type=pdf), without an example/excerpt of your data (that you can easily provide with -dataex-) it's really difficult (for me, at least) to reply helpfully.
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Originally posted by Carlo Lazzaro
      Kelsi:
      provided that categorizing continuous predictors is, in general, something to avoid (see the outstanding http://citeseerx.ist.psu.edu/viewdoc...=rep1&type=pdf), without an example/excerpt of your data (that you can easily provide with -dataex-) it's really difficult (for me, at least) to reply helpfully.
      Hi, Carlo:
      Thanks for answering!
      Sorry if this is not very clear; it is my first time using dataex. Let me explain briefly: the dependent variable is l1 and the main independent variable is l2; a-a4 are the age terms ((age-40), the square of (age-40), ...), while i-i4 are the interactions of the age terms with l2. d1-d7 are binary variables denoting birth cohorts: for example, d1=1 means the individual belongs to birth cohort 1 (born between 1951-55), d2=1 means born between 1956-60, ...
      With
      reg l1 l2 a-a4 i-i4
      everything is OK, but with
      reg l1 l2 a-a4 i-i4 if d1==1
      there is collinearity: a is omitted and the coefficient of l2 is abnormal.

      ----------------------- copy starting from the next line -----------------------
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(l1 l2 a a2 a3 a4 i i2 i3 i4) byte(d1 d2 d3 d4 d5 d6 d7)
              . 10.153058  -9  81  -729  6561  -91.37753  822.3977  -7401.579  66614.21 0 0 0 0 0 1 0
              . 10.195456  -4  16   -64   256  -40.78182  163.1273  -652.5092 2610.0366 0 0 0 0 1 0 0
              . 10.198337   8  64   512  4096   81.58669  652.6935   5221.548  41772.39 0 0 1 0 0 0 0
       9.903476 10.215785  14 196  2744 38416    143.021  2002.294  28032.113  392449.6 0 1 0 0 0 0 0
              . 10.215785  16 256  4096 65536  163.45256  2615.241   41843.86  669501.7 1 0 0 0 0 0 0
              . 10.195456 -10 100 -1000 10000 -101.95456 1019.5455 -10195.455 101954.55 0 0 0 0 0 1 0
      10.819766 10.215785  17 289  4913 83521  173.66835  2952.362   50190.15  853232.6 1 0 0 0 0 0 0
       10.12662 10.215785   9  81   729  6561   91.94206  827.4786   7447.307 67025.766 0 0 1 0 0 0 0
        10.4143 10.215785  -4  16   -64   256  -40.86314 163.45256  -653.8102  2615.241 0 0 0 0 1 0 0
      10.637444 10.153058  -4  16   -64   256  -40.61223 162.44893  -649.7957  2599.183 0 0 0 0 1 0 0
      end
      ------------------ copy up to and including the previous line ------------------

      • #4
        Kelsi:
        some comments about your query:
        1) you have an apparent missing-values issue with your dependent variable, and Stata automatically omits observations with missing values in any of the variables;
        2) you've created categorical variables and interactions by hand; use -fvvarlist- notation instead;
        3) please share via CODE delimiters (see the FAQ) what you typed and what Stata gave you back when you ran your regression with and without the condition, as your data excerpt contains too few observations to replicate your problem.
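
A minimal sketch of what point 2 could look like for the model described in #3. The data here are simulated and purely hypothetical; only the names l1, l2 and a (age minus 40) are taken from the thread, and the coefficients used to generate l1 are arbitrary. With factor-variable operators the powers of a and their interactions with l2 are built by Stata rather than stored as separate variables:

```stata
* Simulated data (hypothetical); only the names l1, l2 and a come from the thread
clear
set seed 12345
set obs 500
generate a  = floor(41*runiform()) - 20      // age minus 40, integers from -20 to 20
generate l2 = rnormal()
generate l1 = 1 + 0.5*l2 + 0.01*a + 0.002*l2*a + rnormal()

* Quartic in a, with each power interacted with l2
regress l1 c.l2 c.a c.a#c.a c.a#c.a#c.a c.a#c.a#c.a#c.a ///
    c.l2#c.a c.l2#c.a#c.a c.l2#c.a#c.a#c.a c.l2#c.a#c.a#c.a#c.a

* Marginal effect of l2 at a = 0, that is, at age 40
margins, dydx(l2) at(a = 0)
```

Because the interactions are declared in the model, -margins- can evaluate the effect of l2 at any other value of a as well, which is not possible when the interaction terms are hand-made variables.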
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Originally posted by Carlo Lazzaro
          Kelsi:
          some comments about your query:
          1) you have an apparent missing-values issue with your dependent variable, and Stata automatically omits observations with missing values in any of the variables;
          2) you've created categorical variables and interactions by hand; use -fvvarlist- notation instead;
          3) please share via CODE delimiters (see the FAQ) what you typed and what Stata gave you back when you ran your regression with and without the condition, as your data excerpt contains too few observations to replicate your problem.
          But I want to run the regressions separately to see the coefficients for every group. If I use the fvvarlist notation (i.(varlist), right?), there will be just one equation. So I still don't quite understand.
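
For what it's worth, factor-variable notation does not force a single equation: it can be combined with the -by- prefix or an -if- qualifier. The sketch below uses simulated, hypothetical data in which a single variable cohort (values 1 to 7) stands in for d1-d7, and each five-year cohort is assumed to be observed in one survey year. Under that assumption, which the thread does not confirm, each cohort contains at most five distinct values of a, so it may be worth tabulating a within cohorts before fitting a quartic: a polynomial of degree 4 plus a constant needs at least five distinct values of a to be identified.

```stata
* Simulated data (hypothetical): 7 five-year cohorts, each with at most
* 5 distinct values of a
clear
set seed 2024
set obs 700
generate byte cohort = 1 + floor(7*runiform())      // 1..7, stands in for d1-d7
generate a  = 20 - 5*cohort + floor(5*runiform())   // cohort 1: 15..19, cohort 7: -15..-11
generate l2 = rnormal()
generate l1 = 1 + 0.5*l2 + 0.01*a + rnormal()

* How many distinct values of a does each cohort contain?
tabulate a cohort

* Separate regressions, one per cohort, here with a lower-order age polynomial
bysort cohort: regress l1 c.l2 c.a c.a#c.a c.l2#c.a c.l2#c.a#c.a
```

The quadratic is only an illustration of a specification that the within-cohort age range can support in this simulated setup; which order is appropriate for the real data is a separate question.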

          • #6
            Kelsi:
            I guess you're experiencing something like what is reported in the following toy example:
            Code:
            sysuse auto
            
            . regress price mpg i.rep78
            
                  Source |       SS           df       MS      Number of obs   =        69
            -------------+----------------------------------   F(5, 63)        =      4.39
                   Model |   149020603         5  29804120.7   Prob > F        =    0.0017
                Residual |   427776355        63  6790100.88   R-squared       =    0.2584
            -------------+----------------------------------   Adj R-squared   =    0.1995
                   Total |   576796959        68  8482308.22   Root MSE        =    2605.8
            
            ------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     mpg |  -280.2615   61.57666    -4.55   0.000    -403.3126   -157.2103
                         |
                   rep78 |
                      2  |   877.6347   2063.285     0.43   0.672     -3245.51     5000.78
                      3  |   1425.657   1905.438     0.75   0.457    -2382.057    5233.371
                      4  |   1693.841   1942.669     0.87   0.387    -2188.274    5575.956
                      5  |   3131.982   2041.049     1.53   0.130    -946.7282    7210.693
                         |
                   _cons |   10449.99   2251.041     4.64   0.000     5951.646    14948.34
            ------------------------------------------------------------------------------
            
            
            . bysort rep78: regress price mpg i.rep78
            
            ---------------------------------------------------------------------------------------------------------------------
            -> rep78 = 1
            note: 1.rep78 omitted because of collinearity
            
                  Source |       SS           df       MS      Number of obs   =         2
            -------------+----------------------------------   F(1, 0)         =         .
                   Model |    273060.5         1    273060.5   Prob > F        =         .
                Residual |           0         0           .   R-squared       =    1.0000
            -------------+----------------------------------   Adj R-squared   =         .
                   Total |    273060.5         1    273060.5   Root MSE        =         0
            
            ------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     mpg |  -123.1667          .        .       .            .           .
                 1.rep78 |          0  (omitted)
                   _cons |       7151          .        .       .            .           .
            ------------------------------------------------------------------------------
            
            ---------------------------------------------------------------------------------------------------------------------
            -> rep78 = 2
            note: 2.rep78 omitted because of collinearity
            
                  Source |       SS           df       MS      Number of obs   =         8
            -------------+----------------------------------   F(1, 6)         =      4.69
                   Model |    39346087         1    39346087   Prob > F        =    0.0735
                Residual |  50336476.9         6  8389412.82   R-squared       =    0.4387
            -------------+----------------------------------   Adj R-squared   =    0.3452
                   Total |  89682563.9         7  12811794.8   Root MSE        =    2896.4
            
            ------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     mpg |   -630.823    291.288    -2.17   0.074    -1343.579    81.93295
                 2.rep78 |          0  (omitted)
                   _cons |   18032.12   5664.222     3.18   0.019     4172.264    31891.97
            ------------------------------------------------------------------------------
            
            ---------------------------------------------------------------------------------------------------------------------
            -> rep78 = 3
            note: 3.rep78 omitted because of collinearity
            
                  Source |       SS           df       MS      Number of obs   =        30
            -------------+----------------------------------   F(1, 28)        =     14.65
                   Model |   123788982         1   123788982   Prob > F        =    0.0007
                Residual |   236582733        28  8449383.33   R-squared       =    0.3435
            -------------+----------------------------------   Adj R-squared   =    0.3201
                   Total |   360371715        29  12426610.9   Root MSE        =    2906.8
            
            ------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     mpg |  -498.8875   130.3389    -3.83   0.001    -765.8747   -231.9003
                 3.rep78 |          0  (omitted)
                   _cons |   16124.28    2587.92     6.23   0.000     10823.17    21425.39
            ------------------------------------------------------------------------------
            
            ---------------------------------------------------------------------------------------------------------------------
            -> rep78 = 4
            note: 4.rep78 omitted because of collinearity
            
                  Source |       SS           df       MS      Number of obs   =        18
            -------------+----------------------------------   F(1, 16)        =      0.90
                   Model |  2643203.86         1  2643203.86   Prob > F        =    0.3572
                Residual |  47043724.6        16  2940232.79   R-squared       =    0.0532
            -------------+----------------------------------   Adj R-squared   =   -0.0060
                   Total |  49686928.5        17   2922760.5   Root MSE        =    1714.7
            
            ------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     mpg |  -79.90338   84.27343    -0.95   0.357    -258.5551    98.74832
                 4.rep78 |          0  (omitted)
                   _cons |    7802.74   1870.119     4.17   0.001     3838.264    11767.22
            ------------------------------------------------------------------------------
            
            ---------------------------------------------------------------------------------------------------------------------
            -> rep78 = 5
            note: 5.rep78 omitted because of collinearity
            
                  Source |       SS           df       MS      Number of obs   =        11
            -------------+----------------------------------   F(1, 9)         =      7.88
                   Model |  31950588.4         1  31950588.4   Prob > F        =    0.0204
                Residual |  36471559.6         9  4052395.52   R-squared       =    0.4670
            -------------+----------------------------------   Adj R-squared   =    0.4077
                   Total |    68422148        10   6842214.8   Root MSE        =    2013.1
            
            ------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     mpg |  -204.6947   72.89925    -2.81   0.020    -369.6042   -39.78513
                 5.rep78 |          0  (omitted)
                   _cons |   11514.19   2085.085     5.52   0.000       6797.4    16230.98
            ------------------------------------------------------------------------------
            
            ---------------------------------------------------------------------------------------------------------------------
            -> rep78 = .
            no observations
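
In the toy example above, i.rep78 is constant within each by-group, so it is collinear with the constant and gets omitted. A short sketch of two ways to obtain group-specific coefficients without that note, using the same auto dataset: either leave the grouping variable out of the by-group regressions, or fit one pooled equation with a full interaction.

```stata
sysuse auto, clear

* Separate regressions: leave the grouping variable out of the covariate list
bysort rep78: regress price mpg

* One pooled equation with group-specific intercepts and slopes
regress price i.rep78##c.mpg

* Slope of mpg within each rep78 category, recovered from the pooled fit
margins rep78, dydx(mpg)
```

The pooled fit reproduces the group-specific point estimates of the separate regressions; the standard errors differ because the pooled model assumes a common residual variance across groups.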
            Kind regards,
            Carlo
            (Stata 19.0)
