Fixed effect regression questions

Nick Bertel

Join Date: Mar 2023

Posts: 27
#1


18 May 2023, 15:38

Hello,

I'm fairly new to Stata and empirical research, so sorry if this question is very basic.

1: I have observations for different companies over several years. I have a dependent variable and several independent variables. I would like to include both the industry and the year in which each observation was recorded as fixed effects.
As I understand it, one does that by typing:

xtset industry year
xtreg DV IV1 IV2...., fe

However, I have repeated time values within panels (several companies share the same industry and year), so that doesn't work with xtset. Do I just combine industry and year into one variable and then run the regression, or am I missing something substantial?

egen industry_year = group(industry year)
xtset industry_year
xtreg DV IV1 IV2...., fe
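-xtset panelvar timevar- needs every (panel, time) combination to occur at most once; with several firms in the same industry and year that check fails, which is where the repeated-time-values error comes from. Below is a hypothetical sketch in Python (made-up firms, not Stata code) of that uniqueness check and of how a group() variable numbers the industry-year combinations. The numbering follows sorted order, which mirrors how egen group() orders groups; Stata's handling of missing values is not reproduced here.

```python
from collections import Counter

# Hypothetical firm-level data: (firm, industry, year).
# Two firms share the "retail" industry in both years.
obs = [
    ("f1", "retail", 2020), ("f2", "retail", 2020),
    ("f1", "retail", 2021), ("f2", "retail", 2021),
    ("f3", "energy", 2020), ("f3", "energy", 2021),
]

# xtset industry year would need each (industry, year) pair to appear at most once.
pair_counts = Counter((ind, yr) for _, ind, yr in obs)
repeated = sorted(p for p, n in pair_counts.items() if n > 1)
print("repeated (industry, year) pairs:", repeated)

# Rough analogue of egen group(industry year): number the distinct
# combinations 1, 2, ... in sorted order (missing values not handled).
codes = {pair: i + 1 for i, pair in enumerate(sorted(pair_counts))}
industry_year = [codes[(ind, yr)] for _, ind, yr in obs]
print("industry_year:", industry_year)
```

Each distinct industry-year cell gets its own code, so a fixed-effects regression on this ID absorbs one intercept per cell rather than separate industry and year effects.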

2: When reporting R-squared after xtreg, is the overall value the one to use?

3: In my mind, simply including the different years and industries as dummy variables in the regression should yield the same result as in 1.

So just doing:

regress DV IV1 IV2..... year1 year2 year3.....industry1 industry2.....

should be the same as:

xtset industry_year
xtreg DV IV1 IV2...., fe

But doing so, I get a slightly different result. Why is that, or is my approach in 1 flawed?
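One likely source of the gap in 3): group(industry year) creates one code per industry-year cell, so -xtreg, fe- on that ID absorbs a separate intercept for every cell (an industry-by-year interaction), whereas year dummies plus industry dummies impose additive effects. These are different models, and their slopes generally differ. The Python sketch below (hypothetical toy data, exact OLS through the normal equations; not Stata code) fits both specifications to the same eight observations.

```python
from fractions import Fraction


def ols(X, y):
    """Exact OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y] with exact rational entries.
    A = [[sum(Fraction(r[i]) * r[j] for r in X) for j in range(k)]
         + [sum(Fraction(r[i]) * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        piv = next((r for r in range(c, k) if A[r][c] != 0), None)
        if piv is None:
            raise ValueError("design matrix is collinear")
        A[c], A[piv] = A[piv], A[c]
        p = A[c][c]
        A[c] = [v / p for v in A[c]]
        # Eliminate column c from every other row.
        for r in range(k):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


# Hypothetical data: (industry, year, x, y), two observations per industry-year cell.
rows = [
    ("A", 1, -1, -1), ("A", 1, 1, 1),
    ("A", 2, -1, -1), ("A", 2, 1, 1),
    ("B", 1, -1, -1), ("B", 1, 1, 1),
    ("B", 2, 1, -1), ("B", 2, 3, 1),
]
y = [r[3] for r in rows]

# Additive effects, analogous to: regress y x i.industry i.year
# Columns: x, constant, industry-B dummy, year-2 dummy (base categories dropped).
X_add = [[x, 1, int(ind == "B"), int(yr == 2)] for ind, yr, x, _ in rows]

# One intercept per industry-year cell, analogous to xtreg y x, fe on group(industry year).
cells = sorted({(ind, yr) for ind, yr, _, _ in rows})
X_cell = [[x] + [int((ind, yr) == c) for c in cells] for ind, yr, x, _ in rows]

b_add = ols(X_add, y)[0]
b_cell = ols(X_cell, y)[0]
print("slope, additive industry + year dummies:", b_add)
print("slope, industry-by-year cell effects:", b_cell)
```

Here the additive specification gives a slope of 4/5 and the cell specification a slope of 1. If additive industry and year effects are what is intended, -regress DV IV1 IV2 i.industry i.year- (or -xtset industry- followed by -xtreg DV IV1 IV2 i.year, fe-) is the matching specification; this is a suggestion that has not been run on the original data.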

Thank you in advance for your answers, and sorry if these questions are basic.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17739

19 May 2023, 00:30

Nick:
1) the -fe- estimator wipes out time-invariant predictors;
2) the error that Stata threw can be easily worked around by -xtset-ting your dataset with -panelid- only. However, this fix comes at the cost of making time-series related operators, such as lags and leads, unavailable (you can still include among your predictors -i.timevar-, though);
3) not quite; you should look at the within R-squared;
4) if I got you right, the output should be the same for the shared coefficients (for more helpful replies, as per FAQ please share what you typed and what Stata gave you back. Thanks), as you can see in the following toy-example:

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtreg ln_wage c.age##c.age if idcode<=3, fe

Fixed-effects (within) regression               Number of obs     =         39
Group variable: idcode                          Number of groups  =          3

R-squared:                                      Obs per group:
     Within  = 0.6382                                         min =         12
     Between = 0.8744                                         avg =       13.0
     Overall = 0.2765                                         max =         15

                                                F(2,34)           =      29.99
corr(u_i, Xb) = -0.2473                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .2512762   .0450106     5.58   0.000     .1598037    .3427487
             |
 c.age#c.age |  -.0037603   .0007625    -4.93   0.000    -.0053098   -.0022107
             |
       _cons |  -2.189815   .6402959    -3.42   0.002    -3.491053   -.8885773
-------------+----------------------------------------------------------------
     sigma_u |  .31366066
     sigma_e |  .19867104
         rho |  .71367959   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(2, 34) = 29.72                      Prob > F = 0.0000

. reg ln_wage c.age##c.age i.idcode if idcode<=3

      Source |       SS           df       MS      Number of obs   =        39
-------------+----------------------------------   F(4, 34)        =     24.28
       Model |  3.83375281         4  .958438203   Prob > F        =    0.0000
    Residual |  1.34198615        34  .039470181   R-squared       =    0.7407
-------------+----------------------------------   Adj R-squared   =    0.7102
       Total |  5.17573896        38  .136203657   Root MSE        =    .19867

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .2512762   .0450106     5.58   0.000     .1598037    .3427487
             |
 c.age#c.age |  -.0037603   .0007625    -4.93   0.000    -.0053098   -.0022107
             |
      idcode |
          2  |  -.4231615   .0816747    -5.18   0.000    -.5891444   -.2571786
          3  |  -.6126416   .0809386    -7.57   0.000    -.7771285   -.4481546
             |
       _cons |   -1.82398   .6366167    -2.87   0.007    -3.117741   -.5302195
------------------------------------------------------------------------------

.

5) using -fvvarlist- notation for categorical variables and interactions is highly recommended.
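Points 1) and 4) both come from the within transformation: the -fe- estimator works with deviations from each panel's own mean. A minimal Python sketch (hypothetical data; z stands in for any time-invariant predictor) shows that such a predictor becomes identically zero after demeaning, so there is nothing left to estimate, while a time-varying x still yields a slope.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical panel: (panel id, x, y, z); z never changes within a panel.
data = [
    (1, 1, 2, 12), (1, 2, 4, 12), (1, 3, 6, 12),
    (2, 1, 1, 16), (2, 2, 2, 16), (2, 3, 3, 16),
]


def demean(values, ids):
    """Subtract each panel's own mean (the within transformation)."""
    groups = defaultdict(list)
    for g, v in zip(ids, values):
        groups[g].append(Fraction(v))
    means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    return [Fraction(v) - means[g] for g, v in zip(ids, values)]


ids = [d[0] for d in data]
x_w = demean([d[1] for d in data], ids)
y_w = demean([d[2] for d in data], ids)
z_w = demean([d[3] for d in data], ids)

# The time-invariant regressor is all zeros after demeaning, so -fe- cannot estimate it.
print("demeaned z is all zero:", all(v == 0 for v in z_w))

# Slope from the demeaned data (no constant needed: demeaned variables have mean zero).
slope = sum(a * b for a, b in zip(x_w, y_w)) / sum(a * a for a in x_w)
print("within slope:", slope)
```

By the Frisch-Waugh-Lovell theorem, this within slope (3/2 here) equals the slope from -regress- with a full set of panel dummies, which is why the age coefficients in the two outputs above coincide.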

Last edited by Carlo Lazzaro; 19 May 2023, 00:37.

Kind regards,
Carlo
(Stata 19.0)


Nick Bertel

Join Date: Mar 2023

Posts: 27
#3

19 May 2023, 16:29

Thanks for your answer Carlo.

1)2) Using the ID is a more elegant version, but it is essentially the same as my approach, right?

I would just use:

egen ID = group(year industry), label
xtset ID
xtreg DV IV1 IV2...., fe

That should work, right?

3) So you are saying I should use the within R-squared?

Thanks again!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17739
#4

20 May 2023, 00:27

Nick:
1) 2) yes, but -xtreg- also reports evidence on whether there is a panel-wise effect (the F test that all u_i=0). In addition, with a large number of panels, -regress- takes hours to get what -xtreg- does in a handful of seconds;
3) yes, because the -fe- estimator focuses on the within-panel variation.
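In Carlo's earlier example both fits leave the same residuals (sigma_e and Root MSE are both .19867), yet -regress- reports R-squared = 0.7407 while -xtreg- reports Within = 0.6382. The difference lies only in the denominator: -regress- compares the residual sum of squares with the total variation around the grand mean, which includes the between-panel variation the dummies soak up, whereas the within R-squared compares it with the variation left after removing panel means. A small Python sketch with hypothetical data makes that concrete.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical panel: (panel id, x, y).
data = [(1, 1, 2), (1, 2, 4), (1, 3, 6), (2, 1, 1), (2, 2, 2), (2, 3, 3)]
ids = [g for g, _, _ in data]
x = [Fraction(v) for _, v, _ in data]
y = [Fraction(v) for _, _, v in data]


def demean(values):
    """Deviations from each panel's mean."""
    groups = defaultdict(list)
    for g, v in zip(ids, values):
        groups[g].append(v)
    means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    return [v - means[g] for g, v in zip(ids, values)]


x_w, y_w = demean(x), demean(y)
beta = sum(a * b for a, b in zip(x_w, y_w)) / sum(a * a for a in x_w)

# The residuals are the same whether panel effects are absorbed (-xtreg, fe-)
# or estimated as dummies (-regress ... i.panel-), so SSR is shared.
ssr = sum((yw - beta * xw) ** 2 for xw, yw in zip(x_w, y_w))

y_bar = sum(y) / len(y)
sst_total = sum((v - y_bar) ** 2 for v in y)  # denominator of regress's R-squared
sst_within = sum(v * v for v in y_w)          # variation left after removing panel means

r2_lsdv = 1 - ssr / sst_total
r2_within = 1 - ssr / sst_within
print("R-squared with panel dummies:", r2_lsdv)
print("within R-squared:", r2_within)
```

With these numbers SSR = 1 in both cases, giving 15/16 with panel dummies and 9/10 within; the within value is the one that reflects how much of the within-panel variation the predictors explain.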

Kind regards,
Carlo