Issue on yearly dummies (panel data)

Kodi Hannon

Join Date: Feb 2015

Posts: 81
#1

09 Apr 2015, 11:06

Hello everybody!

I would like to ask about an issue I have with dummy variables in panel data.

Specifically, what is the difference between “yr*” and “i.year”?

To be more specific: I have panel data with N=500 and T=20 (from 1994 to 2013).

I want to analyze the impact of yearly dummies and income on consumption.

I therefore start with a pooled OLS regression:

“ reg consumption income yr* ” (which is the same as “ reg consumption income yr1 yr2 yr3 … yr20 ”).

However, when I carry out:

“ reg consumption income i.year ”

I obtain different results; moreover, in the results table for this case, the “year” variable starts from 2000 instead of 1994.

The same thing happens if I run an FE regression:

“ xtreg consumption income yr* ”
and
“ xtreg consumption income i.year ”

In this case too, the two commands give different results.

What, then, is the difference between yr* and i.year? Why aren't they the same, given that both should capture the effect of (yearly) time on consumption? Why do I obtain different results?

Could someone please help me understand this? I would be really grateful.

Thank you very much!

K
Sergio Correia

Join Date: Apr 2014

Posts: 420
#2

09 Apr 2015, 11:18

Kodi,

First, remember that -xtreg- without the -fe- option defaults to a random effects regression.

About the factors, maybe it's because i.year omits the base year? If you use ibn.year instead, you would get all years, although one of them will be dropped due to collinearity.

Best,
S
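
To illustrate the point about base levels, here is a minimal sketch on made-up data. The toy panel and the variable names id, year, x and y are assumptions for illustration, not Kodi's data. It shows that i.year leaves the base year out of the table, while ibn.year declares no base level, so all years are estimated only when the constant is dropped.

```stata
* Hypothetical toy panel: 20 units observed over 2001-2005 (made-up numbers)
clear
set seed 12345
set obs 100
generate id   = ceil(_n/5)
generate year = 2000 + mod(_n - 1, 5) + 1
generate x    = rnormal()
generate y    = 1 + 0.5*x + 0.3*(year - 2000) + rnormal()

* i.year: the lowest year (2001) is the base and is left out of the table
regress y x i.year

* ibn.year: no base level; with a constant, one level is dropped for collinearity
regress y x ibn.year

* ibn.year without a constant: all five year effects are estimated
regress y x ibn.year, noconstant
```

In the first two regressions the coefficient on x should be the same; only the way the year effects are reported changes.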
Kodi Hannon

Join Date: Feb 2015

Posts: 81
#3

09 Apr 2015, 12:46

Dear Sergio,

thank you for your reply.

Yes, I forgot to write the fe option; I meant to write "xtreg consumption income yr*, fe".

However, I did not get an answer to my question, i.e., what is the difference between yr* and i.year?

Moreover, even when using ibn.year, I only get one more year in the table, while the other 5 years are still missing.
FernandoRios

Join Date: Apr 2014

Posts: 2469
#4

10 Apr 2015, 07:39

Kodi,
The fact that i.year does not give you all the year dummies, while yr* does, might indicate that your yr* variables and the dummies created by i.year are not the same. Have you explored your data to see how they relate?
Perhaps something like:
tabstat yr*, by(year)
will give you some idea.
HTH
F
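
Beyond eyeballing the tabstat output, the check can also be made mechanical. The following is only a sketch on made-up data: it assumes dummies named yr1-yr20 where yrJ is meant to flag the year 1993 + J, which is a guess at how Kodi's variables were built. On real data, skip the data-generation lines and run only the checks.

```stata
* Hypothetical data: 3 units, years 1994-2013, dummies yr1-yr20
clear
set obs 60
generate id   = ceil(_n/20)
generate year = 1994 + mod(_n - 1, 20)
tabulate year, generate(yr)

* Each yrJ should equal 1 exactly when year == 1993 + J;
* assert stops with an error at the first dummy that does not match
forvalues j = 1/20 {
    assert yr`j' == (year == 1993 + `j')
}

* Each observation should have exactly one dummy switched on
egen byte n_on = rowtotal(yr1-yr20)
assert n_on == 1
```

If an assert fails on the real data, the yr* variables do not line up with year, and that would explain why the two specifications give different tables.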

Andrew Musau

Join Date: Oct 2014
Posts: 10190

10 Apr 2015, 08:50

Kodi: Sergio already answered your question (and Fernando as well); perhaps you just did not understand the responses. Conveniently, I have some panel data that I generated previously for illustration, which can help you understand the difference.

Code:

clear 
input y x1 x2 x3 x4 id year
78 19  45 44  15   1 1
23 19  17 47  72   1 2
10 19  32 62  65   1 3
34 19  11 21  20   1 4
77 19  42 23  100  1 5
91 55  12 13  14   2 1
62 55  27 37  47   2 2
33 55  13 14  15   2 3
16 55  58 68  78   2 4
99 55  80 90  70   2 5
20 51  18 62  82   3 1
38 39  39 11  63   3 2
40 87  46 93  90   3 3
56 03  64 80  28   3 4
73 200 88 103  36  3 5
115 70  85 18  85  4 1
49 51  67 22 76    4 2
57 28  49 26  96   4 3
74 32  31 41  77   4 4
110 16  12 60  80  4 5
24 112  26 20  26  5 1
111 123 81 82  37  5 2
64 45  59 39  49   5 3
39 72  31 29  92   5 4
79 80  16 77  107  5 5
37 47  19 89  12   6 1
23 61  38 45  22   6 2
32 83  82 83  66   6 3
120 115 91 116 108 6 4
7 150  54 93  72   6 5
92 28  30 41  90   7 1
100 28 40 96 102   7 2
108 28  50 29  59  7 3
116 28  60 42  76  7 4
128 28  70 80  94  7 5
39 7  55 103  106  8 1
51 50  27 98  62   8 2
73 61  19 81  74   8 3
94 86  112 99  53  8 4
103 99  67 102  10 8 5
89 80  105 54  69  9 1
62 90  97 108  62  9 2
13 100  102 92  39 9 3
100 110 81 66  85  9 4
115 120 92 50  67  9 5
40 37  75 19  14  10 1
92 65  87 5  34   10 2
56 72  92 15  40  10 3
119 128  23 21 63 10 4
84 80  82 67  29  10 5
end

We can now generate year dummies like the ones you have:

Code:

tab year, gen(year)

So we have year1 - year5 in this case:

Code:

 
   year |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         10       20.00       20.00
          2 |         10       20.00       40.00
          3 |         10       20.00       60.00
          4 |         10       20.00       80.00
          5 |         10       20.00      100.00
------------+-----------------------------------
      Total |         50      100.00
.

Now, what is the difference between i.year and year*? As you recall, with dummy variables you always have to drop one dummy to avoid the dummy variable trap. With i.year, Stata runs the regression with the base year omitted, which by default is the first (lowest) year:

Code:

xtset id year
xtreg y x* i.year,fe

Output

Code:

. 

Fixed-effects (within) regression               Number of obs      =        50
Group variable: id                              Number of groups   =        10

R-sq:  within  = 0.2926                         Obs per group: min =         5
       between = 0.0136                                        avg =       5.0
       overall = 0.1888                                        max =         5

                                                F(8,32)            =      1.65
corr(u_i, Xb)  = -0.1009                        Prob > F           =    0.1485

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .1443385   .1560705     0.92   0.362    -.1735668    .4622437
          x2 |   .2559484   .1982288     1.29   0.206    -.1478303    .6597272
          x3 |  -.0618487    .214448    -0.29   0.775     -.498665    .3749677
          x4 |   .0251512   .1788148     0.14   0.889    -.3390827    .3893851
             |
        year |
          2  |   -3.37898   13.61836    -0.25   0.806    -31.11866     24.3607
          3  |  -16.59534   13.63284    -1.22   0.232    -44.36454    11.17386
          4  |   10.21783   14.08045     0.73   0.473    -18.46312    38.89877
          5  |   18.03578   15.46857     1.17   0.252    -13.47266    49.54422
             |
       _cons |   44.74023   17.97087     2.49   0.018     8.134777    81.34569
-------------+----------------------------------------------------------------
     sigma_u |  20.884301
     sigma_e |  30.034388
         rho |   .3259214   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(9, 32) =     1.83               Prob > F = 0.1017

If you proceed manually using year*, you are telling Stata to use all of the year variables, i.e. year1, year2, ..., yearN. However, Stata will automatically omit one year dummy to avoid the dummy variable trap. The dummy it omits may not be the one omitted under i.year, in which case you will get different coefficients for the dummies and the intercept (but not for your other regressors). So make sure that the variable omitted under year* is the same as the base year under i.year if you want identical coefficients for the dummies.

Code:

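* rename the original variable so that year* matches only the dummies year1-year5
rename year yr
* y is the dependent variable, as in the output shown
xtreg y x* year*, fe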

Output

Code:

note: year1 omitted because of collinearity

Fixed-effects (within) regression               Number of obs      =        50
Group variable: id                              Number of groups   =        10

R-sq:  within  = 0.2926                         Obs per group: min =         5
       between = 0.0136                                        avg =       5.0
       overall = 0.1888                                        max =         5

                                                F(8,32)            =      1.65
corr(u_i, Xb)  = -0.1009                        Prob > F           =    0.1485

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .1443385   .1560705     0.92   0.362    -.1735668    .4622437
          x2 |   .2559484   .1982288     1.29   0.206    -.1478303    .6597272
          x3 |  -.0618487    .214448    -0.29   0.775     -.498665    .3749677
          x4 |   .0251512   .1788148     0.14   0.889    -.3390827    .3893851
       year1 |  (omitted)
       year2 |   -3.37898   13.61836    -0.25   0.806    -31.11866     24.3607
       year3 |  -16.59534   13.63284    -1.22   0.232    -44.36454    11.17386
       year4 |   10.21783   14.08045     0.73   0.473    -18.46312    38.89877
       year5 |   18.03578   15.46857     1.17   0.252    -13.47266    49.54422
       _cons |   44.74023   17.97087     2.49   0.018     8.134777    81.34569
-------------+----------------------------------------------------------------
     sigma_u |  20.884301
     sigma_e |  30.034388
         rho |   .3259214   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(9, 32) =     1.83               Prob > F = 0.1017

In this case, year1 is automatically omitted (as under i.year), so the estimates are identical. Let us omit year2 instead:

Code:

xtreg y x* year1 year3 year4 year5, fe

Code:

Fixed-effects (within) regression               Number of obs      =        50
Group variable: id                              Number of groups   =        10

R-sq:  within  = 0.2926                         Obs per group: min =         5
       between = 0.0136                                        avg =       5.0
       overall = 0.1888                                        max =         5

                                                F(8,32)            =      1.65
corr(u_i, Xb)  = -0.1009                        Prob > F           =    0.1485

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .1443385   .1560705     0.92   0.362    -.1735668    .4622437
          x2 |   .2559484   .1982288     1.29   0.206    -.1478303    .6597272
          x3 |  -.0618487    .214448    -0.29   0.775     -.498665    .3749677
          x4 |   .0251512   .1788148     0.14   0.889    -.3390827    .3893851
       year1 |    3.37898   13.61836     0.25   0.806     -24.3607    31.11866
       year3 |  -13.21636   13.45437    -0.98   0.333    -40.62202     14.1893
       year4 |   13.59681   13.61522     1.00   0.325    -14.13649    41.33011
       year5 |   21.41476   14.52492     1.47   0.150    -8.171545    51.00106
       _cons |   41.36125   19.82699     2.09   0.045     .9749953    81.74751
-------------+----------------------------------------------------------------
     sigma_u |  20.884301
     sigma_e |  30.034388
         rho |   .3259214   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(9, 32) =     1.83               Prob > F = 0.1017

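The coefficients for the dummies and the intercept are different, but the coefficient estimates of the regressors stay the same. The dummy coefficients differ only because each is measured relative to the omitted year: for example, -16.59534 - (-3.37898) = -13.21636, which is the year3 coefficient once year2 is the base, and the intercept shifts by the same amount (44.74023 - 3.37898 = 41.36125). Now the question is, why would you be interested in the intercept and dummy coefficients in the first place? They are rarely of direct interest in fixed effects regressions, since their values depend on the choice of base year.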
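
The same comparison can be made without typing out the dummies by hand. The sketch below uses made-up data (the toy panel and its numbers are assumptions, not the dataset above): the factor-variable prefix ib2. sets year 2 as the base, and lincom recovers the same contrast from the default-base fit.

```stata
* Hypothetical toy panel (made-up numbers): 10 units, years 1-5
clear
set seed 2015
set obs 50
generate id   = ceil(_n/5)
generate year = mod(_n - 1, 5) + 1
generate x1   = rnormal()
generate y    = 2*x1 + year + rnormal()
xtset id year

* Default base (year 1)
xtreg y x1 i.year, fe
scalar b2_base1 = _b[2.year]
scalar b3_base1 = _b[3.year]

* Year 3 relative to year 2, recovered from the base-1 fit
lincom 3.year - 2.year

* Same model with year 2 as the base
xtreg y x1 ib2.year, fe

* Difference between the two routes: zero up to rounding
display _b[3.year] - (b3_base1 - b2_base1)
```

The coefficient on x1 should be the same in both fits; only the reference point for the year effects moves.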

Kodi Hannon

Join Date: Feb 2015

Posts: 81
#6

13 Apr 2015, 00:30

Thank you Andrew!

You are a king!
Kodi Hannon

Join Date: Feb 2015

Posts: 81
#7

13 Apr 2015, 00:31

Thank you also Fernando and Sergio!