
  • Problems with panel regression model (specifically xtoverid, predictor collinearity)

    Dear Stata forum,

    I am a first-time Stata user and am running into a few problems. I have already searched this forum for a few hours but could not find answers to my problems (which I assume are quite easy to solve for someone with experience with the software).

    1) I am running a panel data analysis on the effects of a fine on R&D expenditure. I have 193 firm observations, and each firm got fined at a different point in time. I always have R&D expenditure 5 years before the fine and 5 years after the fine.
    Firstly, I created a dummy variable, POST_FINE, to differentiate between the periods (0 = before the fine, 1 = after the fine). I then regress R&D expenditure on this dummy to see if there is a significant difference between the two periods.
    2) I assume I need a fixed-effects model to account for the firm differences (which are obviously present). I considered running the Hausman test, but this test is not possible when including robust standard errors. So I used the command xtoverid. However, when I use it, the answer is that xtoverid is not allowed:

    xtreg RD POST_FINE_DUMMY, re xtoverid
    option xtoverid not allowed
    r(198);

    When I just run xtreg RD POST_FINE_DUMMY, re and then issue the command xtoverid afterwards, it gives me a result... is that the same?

    The Hausman test without robust SEs suggested using an RE model, even though FE is clearly the favoured model here...

    3) I also want to see if the level of the fine has an effect, so I made 3 dummy variables: small, medium and large fine (calculated as FINE/average revenue of the time period).
    Whenever I regress these against R&D expense they get omitted. The same occurs when I just regress FINE against R&D... for both I used fe... when I just use xtreg RD FINE it works...
    Is it because fe already accounts for the differences between the firms, and, given that the fine is always the same in my data for every year for the same firm, there is nothing left to compare?
    How would I do that in a regression, if I want to see if the level of the fine has an effect on R&D?
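
    [One common way to keep a time-invariant fine-size measure identified under -fe- is to interact it with the post-fine dummy: the main effect is absorbed by the firm fixed effects, but the interaction varies within firm and stays identified. A minimal sketch, assuming a hypothetical three-level variable fine_cat (1 = small, 2 = medium, 3 = large) that is not defined in the thread:]
    Code:
     * Sketch only: fine_cat is an assumed three-level fine-size category.
     * i.fine_cat alone is collinear with the firm fixed effects and drops out;
     * the POST_FINE_DUMMY#fine_cat interaction varies within firm and is identified.
     xtreg RD i.POST_FINE_DUMMY##i.fine_cat i.year, fe vce(robust)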

    Thank you in advance,
    Best


  • #2
    Maria:
    welcome to this forum.
    - the user-written command -xtoverid- (which should be run after -xtreg-, as it is not an option of -xtreg- itself) does not support factor-variable notation: hence, you should use the -xi:- prefix, as you can see from the following toy example:
    Code:
    . use "http://www.stata-press.com/data/r14/nlswork.dta", clear
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    . xtreg ln_wage i.race, re
    
    Random-effects GLS regression                   Number of obs     =     28,534
    Group variable: idcode                          Number of groups  =      4,711
    
    R-sq:                                           Obs per group:
         within  = 0.0000                                         min =          1
         between = 0.0198                                         avg =        6.1
         overall = 0.0186                                         max =         15
    
                                                    Wald chi2(2)      =      99.02
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            race |
          black  |  -.1300382    .013486    -9.64   0.000    -.1564702   -.1036062
          other  |   .1011474   .0562889     1.80   0.072    -.0091768    .2114716
                 |
           _cons |   1.691756   .0071865   235.41   0.000     1.677671    1.705841
    -------------+----------------------------------------------------------------
         sigma_u |  .38195681
         sigma_e |  .32028665
             rho |  .58714668   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . xtoverid
    1b:  operator invalid
    r(198);
    
    . xi: xtreg ln_wage i.race, re
    i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
    
    Random-effects GLS regression                   Number of obs     =     28,534
    Group variable: idcode                          Number of groups  =      4,711
    
    R-sq:                                           Obs per group:
         within  = 0.0000                                         min =          1
         between = 0.0198                                         avg =        6.1
         overall = 0.0186                                         max =         15
    
                                                    Wald chi2(2)      =      99.02
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        _Irace_2 |  -.1300382    .013486    -9.64   0.000    -.1564702   -.1036062
        _Irace_3 |   .1011474   .0562889     1.80   0.072    -.0091768    .2114716
           _cons |   1.691756   .0071865   235.41   0.000     1.677671    1.705841
    -------------+----------------------------------------------------------------
         sigma_u |  .38195681
         sigma_e |  .32028665
             rho |  .58714668   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . xtoverid
    
    Test of overidentifying restrictions: fixed vs random effects
    Cross-section time-series model: xtreg re  
    Sargan-Hansen statistic  99.022  Chi-sq(2)    P-value = 0.0000
    
    .
    - why create categorical variables yourself when -fvvarlist- notation can do it for you?
    That said, omission due to collinearity with the -fe- fixed effects is the most likely explanation for what is going on in your last regression model.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Carlo,
      thank you for welcoming me to the forum and for your reply.
      As I am completely new to this, I am having a bit of a hard time with your answer and with reading the output.

      Applied to my example, would this be correct?

      xi: xtreg RD POST_FINE_DUMMY, re

      Random-effects GLS regression Number of obs = 2,279
      Group variable: ID Number of groups = 193

      R-sq: Obs per group:
      within = 0.0136 min = 4
      between = 0.0000 avg = 11.8
      overall = 0.0004 max = 20

      Wald chi2(1) = 28.72
      corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000

      ---------------------------------------------------------------------------------
      RD | Coef. Std. Err. z P>|z| [95% Conf. Interval]
      ----------------+----------------------------------------------------------------
      POST_FINE_DUMMY | 1.92e+08 3.58e+07 5.36 0.000 1.22e+08 2.62e+08
      _cons | 9.13e+08 2.46e+08 3.71 0.000 4.30e+08 1.40e+09
      ----------------+----------------------------------------------------------------
      sigma_u | 3.402e+09
      sigma_e | 7.618e+08
      rho | .95224482 (fraction of variance due to u_i)
      ---------------------------------------------------------------------------------

      . xtoverid

      Test of overidentifying restrictions: fixed vs random effects
      Cross-section time-series model: xtreg re
      Sargan-Hansen statistic 0.072 Chi-sq(1) P-value = 0.7890

      .
      If yes, this would mean I need to use an RE model? My supervisor made it quite clear that FE is the model to go with...

      And secondly, about your fvvarlist recommendation: I did not know about this option so far. Regardless, is it also okay to use the dummy I created?
      g POST_FINE_DUMMY = year > DATE_FINE_FULLYEAR , so 0 if the R&D expense was before the fine and 1 if after
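
      [If the sample is meant to contain exactly the five years before and after each firm's fine, window and dummy can be built together. A hedged sketch, re-stating the thread's dummy for completeness and assuming year and DATE_FINE_FULLYEAR are both full calendar years:]
      Code:
       * Sketch only: keep a symmetric 5-year window around each firm's fine year,
       * then flag the post-fine years.
       keep if inrange(year - DATE_FINE_FULLYEAR, -5, 5)
       gen byte POST_FINE_DUMMY = year > DATE_FINE_FULLYEAR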

      Obviously there are different fines for different firms. Additionally, I want to test whether the level of the fine also has an impact. How would I do that? As I mentioned, I created the three dummies, but they always get omitted (since I use fe, and the fines are obviously always the same in each year for the respective firm).

      I also want to control for some additional factors, such as industry. Here I wanted to include the SIC_CODE, but again, a fixed-effects model can only omit the variable, since it is constant for every year for the respective firms. How would someone go about that?

      Thank you very much,
      best



      • #4
        Maria:
        -xtoverid- output tells you that the -re- model is the way to go.
        Perhaps you can consider with your supervisor https://blog.stata.com/2015/10/29/fi...dlak-approach/
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Dear Carlo,

          thank you very much. I highly appreciate it. The Mundlak approach is interesting. It seems there is a lot to learn about and with Stata. Exciting.
          Without wanting to push the envelope too much, do you mind giving me your opinion on the other aspects I am wondering about?

          best regards



          • #6
            Maria:
            I did not commented on other aspects in my previous reply because you already did the right diagnosis: they're related to -fe- specification.
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Dear Carlo,

              okay, thank you again for the kind responses.

              best



              • #8
                Maria:
                my pleasure.
                Sorry for my previous copy-and-paste mistake: I meant <I did not comment...>
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Dear Carlo,

                  If you do not mind, I would like to ask you another question.

                  As mentioned, I am looking at the relationship between fines and R&D spending. Specifically, I am looking at cartel members that were fined by the European Commission, and whether the fine (and possibly the level of the fine) had any effect on subsequent investments in R&D.
                  As mentioned, I created a dummy variable with 1 = the 5 years post-fine and 0 = the 5 years pre-fine to look at any significant differences between the two periods regarding R&D spending.
                  I ran xtoverid and chose an RE model.
                  Now, I also want to check whether industry, size of the company and geographic origin play a role. I already mentioned that an FE model would account for all of these factors. What about an RE model? Is every difference already accounted for? I mean, the model does not fix the differences between the companies, right?

                  What would a potential regression look like?
                  So far, my regression is xtreg RD POST_FINE_DUMMY i.year, re which is a little basic... do you have any suggestions for including any of the other factors I want to look at? Also, I wanted to use i.year to check for influences such as a potential financial crisis etc. ... or should I include an extra dummy for that?

                  Additionally, I wanted to check if the level of the fine makes a difference. For that I created three dummy variables, namely fine_small, fine_medium and fine_large; however, in the FE model they always get omitted. Is there any way/sense to include them in the RE model?

                  Best regards, Maria



                  • #10
                    Maria:
                    as an add-on to my previous reply, please note:
                    - the main difference between the -fe- and -re- specifications is that the first one focuses on variation along time within the same panel, whereas the other one does the same but also across panels;
                    - controlling for
                    ...different industry, size of the company and geographic origin...
                    is possible with categorical variables as predictors. They are easy to create with -fvvarlist- notation if your data are in -long- format, which is the almost mandatory layout for dealing with panel data efficiently;
                    - the omission you complain about is probably due to collinearity with the fixed effect. If this is the case, there's nothing you can do but change your regression model specification.
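
                    [Under an -re- specification, time-invariant controls are identified, so categorical controls of the kind listed above can enter directly via factor-variable notation. A minimal sketch, assuming hypothetical categorical variables industry_cat, size_cat and region_cat that are not defined in the thread:]
                    Code:
                     * Sketch only: industry_cat, size_cat and region_cat are assumed
                     * categorical variables; i. expands each into indicator columns.
                     xtreg RD i.POST_FINE_DUMMY i.industry_cat i.size_cat i.region_cat i.year, re vce(robust)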
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Dear Carlo, thank you for your response. Yes, my data is in long format. When I use the RE model (which the Sargan-Hansen test/xtoverid suggested), none of the variables (SIC_CODE...) gets omitted (of course, since changes over time are not looked at as in the FE model).
                      So in my understanding, an FE model controls for all time-invariant differences between the individuals (which would be differences between companies, right?). I mean, controlling for time-invariant things (FE) means that you take out the effects on the DV that are due to differences that will not change with time, right? Isn't that differences in companies, industries and geographic origin? While in the RE model the variation across entities is assumed to be random and uncorrelated with the predictors, which wouldn't be the case, I guess?
                      So shouldn't I use the FE model regardless of what xtoverid suggests to me?
                      When I do my regression as an RE model, this comes out:

                      xi: xtreg RD POST_FINE_DUMMY, re

                      Random-effects GLS regression Number of obs = 2,279
                      Group variable: ID Number of groups = 193

                      R-sq: Obs per group:
                      within = 0.0136 min = 4
                      between = 0.0000 avg = 11.8
                      overall = 0.0004 max = 20

                      Wald chi2(1) = 28.72
                      corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000

                      ---------------------------------------------------------------------------------
                      RD | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                      ----------------+----------------------------------------------------------------
                      POST_FINE_DUMMY | 1.92e+08 3.58e+07 5.36 0.000 1.22e+08 2.62e+08
                      _cons | 9.13e+08 2.46e+08 3.71 0.000 4.30e+08 1.40e+09
                      ----------------+----------------------------------------------------------------
                      sigma_u | 3.402e+09
                      sigma_e | 7.618e+08
                      rho | .95224482 (fraction of variance due to u_i)
                      ---------------------------------------------------------------------------------

                      . xtoverid

                      Test of overidentifying restrictions: fixed vs random effects
                      Cross-section time-series model: xtreg re
                      Sargan-Hansen statistic 0.072 Chi-sq(1) P-value = 0.7890

                      . xtreg RD POST_FINE_DUMMY i.year, re

                      Random-effects GLS regression Number of obs = 2,279
                      Group variable: ID Number of groups = 193

                      R-sq: Obs per group:
                      within = 0.0426 min = 4
                      between = 0.0010 avg = 11.8
                      overall = 0.0013 max = 20

                      Wald chi2(20) = 90.66
                      corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000

                      ---------------------------------------------------------------------------------
                      RD | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                      ----------------+----------------------------------------------------------------
                      POST_FINE_DUMMY | -8.00e+07 5.44e+07 -1.47 0.142 -1.87e+08 2.67e+07
                      |
                      year |
                      1997 | 4.79e+07 3.42e+08 0.14 0.889 -6.23e+08 7.19e+08
                      1998 | 1.25e+08 3.03e+08 0.41 0.681 -4.70e+08 7.20e+08
                      1999 | 1.20e+08 2.81e+08 0.43 0.671 -4.32e+08 6.71e+08
                      2000 | 6.37e+07 2.67e+08 0.24 0.812 -4.61e+08 5.88e+08
                      2001 | 1.01e+08 2.67e+08 0.38 0.705 -4.22e+08 6.24e+08
                      2002 | 1.29e+08 2.65e+08 0.49 0.626 -3.90e+08 6.48e+08
                      2003 | 1.18e+08 2.60e+08 0.45 0.650 -3.91e+08 6.27e+08
                      2004 | 1.28e+08 2.60e+08 0.49 0.622 -3.81e+08 6.37e+08
                      2005 | 1.42e+08 2.59e+08 0.55 0.582 -3.65e+08 6.50e+08
                      2006 | 2.47e+08 2.58e+08 0.96 0.339 -2.59e+08 7.52e+08
                      2007 | 2.21e+08 2.58e+08 0.86 0.392 -2.85e+08 7.26e+08
                      2008 | 2.86e+08 2.60e+08 1.10 0.271 -2.23e+08 7.96e+08
                      2009 | 3.00e+08 2.61e+08 1.15 0.249 -2.11e+08 8.11e+08
                      2010 | 3.72e+08 2.61e+08 1.42 0.154 -1.40e+08 8.84e+08
                      2011 | 4.33e+08 2.63e+08 1.65 0.099 -8.17e+07 9.48e+08
                      2012 | 5.50e+08 2.64e+08 2.09 0.037 3.31e+07 1.07e+09
                      2013 | 5.26e+08 2.65e+08 1.99 0.047 7189119 1.04e+09
                      2014 | 5.65e+08 2.65e+08 2.13 0.033 4.58e+07 1.08e+09
                      2015 | 7.20e+08 2.67e+08 2.70 0.007 1.97e+08 1.24e+09
                      |
                      _cons | 7.01e+08 3.41e+08 2.05 0.040 3.20e+07 1.37e+09
                      ----------------+----------------------------------------------------------------
                      sigma_u | 3.234e+09
                      sigma_e | 7.540e+08
                      rho | .94844527 (fraction of variance due to u_i)
                      ---------------------------------------------------------------------------------

                      . xtreg RD FINE_MEDIUM FINE_LARGE, re

                      Random-effects GLS regression Number of obs = 2,279
                      Group variable: ID Number of groups = 193

                      R-sq: Obs per group:
                      within = 0.0000 min = 4
                      between = 0.0276 avg = 11.8
                      overall = 0.0218 max = 20

                      Wald chi2(2) = 5.39
                      corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0674

                      ------------------------------------------------------------------------------
                      RD | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                      FINE_MEDIUM | -9.40e+08 5.33e+08 -1.76 0.078 -1.98e+09 1.04e+08
                      FINE_LARGE | -1.46e+09 7.53e+08 -1.94 0.052 -2.94e+09 1.44e+07
                      _cons | 1.53e+09 3.34e+08 4.58 0.000 8.75e+08 2.18e+09
                      -------------+----------------------------------------------------------------
                      sigma_u | 3.363e+09
                      sigma_e | 7.669e+08
                      rho | .95057513 (fraction of variance due to u_i)
                      ------------------------------------------------------------------------------

                      . xtreg RD SIC_CODE, re

                      Random-effects GLS regression Number of obs = 2,279
                      Group variable: ID Number of groups = 193

                      R-sq: Obs per group:
                      within = 0.0000 min = 4
                      between = 0.0018 avg = 11.8
                      overall = 0.0026 max = 20

                      Wald chi2(1) = 0.34
                      corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.5591

                      ------------------------------------------------------------------------------
                      RD | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                      SIC_CODE | 194901.6 333667.3 0.58 0.559 -459074.3 848877.6
                      _cons | 3.70e+08 1.14e+09 0.33 0.745 -1.86e+09 2.60e+09
                      -------------+----------------------------------------------------------------
                      sigma_u | 3.399e+09
                      sigma_e | 7.669e+08
                      rho | .95155638 (fraction of variance due to u_i)
                      ------------------------------------------------------------------------------

                      . xtreg RD SIZE1995, re

                      Random-effects GLS regression Number of obs = 1,500
                      Group variable: ID Number of groups = 128

                      R-sq: Obs per group:
                      within = 0.0000 min = 4
                      between = 0.9384 avg = 11.7
                      overall = 0.9118 max = 20

                      Wald chi2(1) = 1954.99
                      corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000

                      ------------------------------------------------------------------------------
                      RD | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                      SIZE1995 | 2645.852 59.84019 44.22 0.000 2528.567 2763.136
                      _cons | 6.84e+08 9.14e+07 7.49 0.000 5.05e+08 8.63e+08
                      -------------+----------------------------------------------------------------
                      sigma_u | 9.848e+08
                      sigma_e | 8.862e+08
                      rho | .55254789 (fraction of variance due to u_i)
                      ------------------------------------------------------------------------------

                      . xtreg RD FINAL_FINE, re

                      Random-effects GLS regression Number of obs = 2,279
                      Group variable: ID Number of groups = 193

                      R-sq: Obs per group:
                      within = 0.0000 min = 4
                      between = 0.0004 avg = 11.8
                      overall = 0.0015 max = 20

                      Wald chi2(1) = 0.08
                      corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.7759

                      ------------------------------------------------------------------------------
                      RD | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                      FINAL_FINE | .6432327 2.259961 0.28 0.776 -3.786209 5.072675
                      _cons | 9.79e+08 2.81e+08 3.48 0.000 4.29e+08 1.53e+09
                      -------------+----------------------------------------------------------------
                      sigma_u | 3.401e+09
                      sigma_e | 7.669e+08
                      rho | .95161925 (fraction of variance due to u_i)
                      ------------------------------------------------------------------------------

                      . xtreg RD POST_FINE_DUMMY* SIZE1995; RE
                      ; invalid name
                      r(198);

                      . xtreg RD POST_FINE_DUMMY* SIZE1995, re

                      Random-effects GLS regression Number of obs = 1,500
                      Group variable: ID Number of groups = 128

                      R-sq: Obs per group:
                      within = 0.0106 min = 4
                      between = 0.9377 avg = 11.7
                      overall = 0.9118 max = 20

                      Wald chi2(2) = 1979.48
                      corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000

                      ---------------------------------------------------------------------------------
                      RD | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                      ----------------+----------------------------------------------------------------
                      POST_FINE_DUMMY | 1.83e+08 5.05e+07 3.62 0.000 8.39e+07 2.82e+08
                      SIZE1995 | 2645.386 59.66703 44.34 0.000 2528.44 2762.331
                      _cons | 5.84e+08 9.52e+07 6.13 0.000 3.97e+08 7.70e+08
                      ----------------+----------------------------------------------------------------
                      sigma_u | 9.812e+08
                      sigma_e | 8.818e+08
                      rho | .553193 (fraction of variance due to u_i)
                      ---------------------------------------------------------------------------------

                      Could that be correct?
                      best regards



                      • #12
                        Maria:
                        - the -fe- specification controls for time-invariant observed and unobserved heterogeneity (e.g. firm location, if no delocalization occurs as time goes by);
                        - I do not follow your concerns about the -re- specification;
                        - you run too many regression models: just focus on the one that includes the predictors which are consistent with the theory in your research field;
                        - I do not follow the way you create firm size dummies: you should create (via -fvvarlist-) a three-level categorical variable (small, medium, large size) instead;
                        - as a closing-out comment, please post what you typed and what Stata gave you back via CODE delimiters (see the FAQ on how to do it). Thanks.
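
                        [A three-level size category of the kind suggested above can be built from a continuous size variable and then used with factor-variable notation. A minimal sketch using the SIZE1995 variable from the thread; cutting into equal-frequency terciles is an assumption, not a prescription from this reply:]
                        Code:
                         * Sketch only: split SIZE1995 into three equal-frequency groups (0/1/2)
                         * and let i. create the indicators, with the lowest group as base level.
                         egen size_cat = cut(SIZE1995), group(3) label
                         xtreg RD i.POST_FINE_DUMMY i.size_cat i.year, re vce(robust)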
                        Kind regards,
                        Carlo
                        (Stata 19.0)



                        • #13
                          Thank you, Carlo.
                          I need to get back to the drawing board. In my understanding it does not make too much sense to use an RE model, even though Hausman (without robust SEs) and Sargan-Hansen suggested so.
                          When I do use an FE model, everything gets omitted and I basically end up with the simple regression xtreg RD POST_FINE_DUMMY i.year, fe which appears to be the most basic regression one could ever make.
                          If I use an RE model, nothing gets omitted, but as I mentioned, theoretically, from my understanding RE does not make any sense.

                          The theoretical framework for my study is: DV = R&D; IV = POST_FINE dummy (comparing the five years prior and post to the fine). As moderators I wanted to include the level of the fine, firm size, and industry. I also wanted to include year fixed effects. I have 2,498 observations belonging to 193 firms that have been fined for cartel activity.

                          I gathered all the relevant information and have my data in long format.


                          What I did:
                          Code:
                           xtreg RD POST_FINE i.year, fe
                           est store fe
                          Code:
                           xtreg RD POST_FINE i.year, re
                           est store re
                          Code:
                           hausman fe re
                          (can't be done with robust SEs), so I did xi: xtreg RD POST_FINE, re and then xtoverid; RE was suggested.
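
                          [The robust alternative to -hausman- used in the thread can be written as one short sequence; -xtoverid- is user-written, so installing it once first (ssc install xtoverid) is assumed here. Adding vce(robust) to the -re- fit is also an assumption, since xtoverid is designed to remain valid with robust/cluster SEs:]
                          Code:
                           * Sketch only: fit the -re- model with robust SEs, then run the
                           * Sargan-Hansen test; the -xi:- prefix is needed because xtoverid
                           * rejects factor-variable notation.
                           xi: xtreg RD POST_FINE i.year, re vce(robust)
                           xtoverid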

                          then I ran the regression
                          Code:
                           xtreg RD POST_FINE i.year, re
                          now I want to include the other variables


                          Code:
                           xtreg RD POST_FINE FINAL_FINE SIC_CODE i.year, re vce(robust)
                          
                          Random-effects GLS regression                   Number of obs     =      2,279
                          Group variable: ID                              Number of groups  =        193
                          
                          R-sq:                                           Obs per group:
                               within  = 0.0426                                         min =          4
                               between = 0.0002                                         avg =       11.8
                               overall = 0.0037                                         max =         20
                          
                                                                          Wald chi2(22)     =      70.80
                          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                          
                                                                (Std. Err. adjusted for 193 clusters in ID)
                          ---------------------------------------------------------------------------------
                                          |               Robust
                                       RD |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          ----------------+----------------------------------------------------------------
                          POST_FINE_DUMMY |  -7.93e+07   6.20e+07    -1.28   0.201    -2.01e+08    4.23e+07
                               FINAL_FINE |   .5017941   1.124928     0.45   0.656    -1.703023    2.706612
                                 SIC_CODE |   157799.9   138785.2     1.14   0.256      -114214    429813.9
                                          |
                                     year |
                                    1997  |   4.78e+07   5.91e+07     0.81   0.418    -6.80e+07    1.64e+08
                                    1998  |   1.25e+08   6.66e+07     1.87   0.061     -5717276    2.55e+08
                                    1999  |   1.19e+08   1.00e+08     1.19   0.233    -7.70e+07    3.16e+08
                                    2000  |   6.35e+07   1.33e+08     0.48   0.633    -1.97e+08    3.24e+08
                                    2001  |   1.01e+08   1.31e+08     0.77   0.440    -1.55e+08    3.57e+08
                                    2002  |   1.28e+08   1.25e+08     1.03   0.304    -1.17e+08    3.74e+08
                                    2003  |   1.17e+08   1.37e+08     0.86   0.392    -1.51e+08    3.86e+08
                                    2004  |   1.27e+08   1.43e+08     0.89   0.374    -1.53e+08    4.08e+08
                                    2005  |   1.41e+08   1.57e+08     0.90   0.369    -1.67e+08    4.50e+08
                                    2006  |   2.46e+08   1.46e+08     1.69   0.092    -4.00e+07    5.32e+08
                                    2007  |   2.20e+08   1.58e+08     1.39   0.163    -8.93e+07    5.29e+08
                                    2008  |   2.85e+08   1.73e+08     1.65   0.099    -5.37e+07    6.24e+08
                                    2009  |   2.99e+08   1.77e+08     1.69   0.091    -4.81e+07    6.46e+08
                                    2010  |   3.71e+08   1.83e+08     2.02   0.043     1.18e+07    7.30e+08
                                    2011  |   4.32e+08   1.89e+08     2.29   0.022     6.23e+07    8.02e+08
                                    2012  |   5.49e+08   2.00e+08     2.75   0.006     1.57e+08    9.40e+08
                                    2013  |   5.24e+08   2.24e+08     2.34   0.019     8.57e+07    9.63e+08
                                    2014  |   5.64e+08   2.20e+08     2.56   0.010     1.33e+08    9.94e+08
                                    2015  |   7.19e+08   2.57e+08     2.80   0.005     2.15e+08    1.22e+09
                                          |
                                    _cons |   1.46e+08   4.61e+08     0.32   0.751    -7.57e+08    1.05e+09
                          ----------------+----------------------------------------------------------------
                                  sigma_u |  3.249e+09
                                  sigma_e |  7.540e+08
                                      rho |  .94889183   (fraction of variance due to u_i)
                          ---------------------------------------------------------------------------------
                          This is what I've got so far.

                          edit: I accidentally replied in another thread. I still hope to receive some advice here, since I feel I'm close.
                          Last edited by Maria Kohnen; 19 Dec 2017, 07:27.



                          • #14
                            Maria:
                            as per your last regression results, you should test whether -i.year- makes any sense via:
                            Code:
                            testparm i.year
                            If you're still doubtful about the "right" specification, take a look at (and act on) what others did in your research field when presented with the same research topic.
                            As an aside, I'm not clear on the reason that supports clustering your SEs.
                            Kind regards,
                            Carlo
                            (Stata 19.0)



                            • #15
                              thank you. In all honesty, I do not know. I thought that would guard against heteroskedasticity. Should I not? How can I know?
                              I am still trying to figure out how to use fvvarlist... I have been looking for it for quite some time now, since I want to create categories from a continuous variable. I thought fvvarlist transforms indicator variables into categorical variables...
                              I mean, is there something wrong with making three dummies (size small, medium, large) and including two, as one serves as the baseline?
                              It seems your way is more elegant... any hint?

                              EDIT (the CODE button does not seem usable when editing?): I tried it this way (internet example):
                              generate byte agecat=21 if age<=21
                              (176 missing values generated)
                              . replace agecat=38 if age>21 & age<=38
                              (148 real changes made)
                              . replace agecat=64 if age>38 & age<=64
                              (24 real changes made)
                              . replace agecat=75 if age>64 & age<.
                              (4 real changes made)

                              Got it... nice tool.
                              Last edited by Maria Kohnen; 19 Dec 2017, 10:50.

