Problems with panel regression model (specifically xtoverid, predictor collinearity)

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#16

19 Dec 2017, 11:40

Maria:
- in a large N, small T panel dataset, heteroskedasticity might be an issue, whereas autocorrelation seldom bites;
- you can visually inspect the distribution of the idiosyncratic error term and see whether a heteroskedastic pattern emerges;
- there's nothing wrong with making categorical variables by hand: it's simply inefficient;
- as per your code, you can then use -label- (after a minor tweak):

Code:

replace agecat=999 if age==.
label define agecat 21 "<=21" 38 "21-38" 64 "39-64" 75 ">64" 999 "Missing"
label val agecat agecat
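A minimal sketch of the two practical points above, on a simulated toy panel. The variable names `id`, `year`, `x`, `y`, `age` and the data-generating process are assumptions for illustration only. The error variance is deliberately made to grow with |x| so that a pattern is visible. `predict ..., e` after `xtreg, fe` stores the idiosyncratic residuals, which are then plotted against the linear prediction. A single `recode` call with `generate()` builds the age classes in one step instead of one `replace` per class. The rules assume age is recorded in whole years.

Code:

```stata
* Hypothetical toy panel: 50 units observed over 10 years
clear
set seed 2017
set obs 50
generate id = _n
generate age = floor(18 + 60*runiform())        // integer ages 18-77, constant within unit
replace age = . in 1                            // one missing age, to exercise the 999 code
expand 10
bysort id: generate year = 2000 + _n
generate x = rnormal()
generate y = 1 + 0.5*x + rnormal()*(1 + abs(x))  // error spread grows with |x|
xtset id year

* (1) Idiosyncratic residuals e_it after -xtreg, fe-, against the linear prediction
xtreg y x, fe
predict e_it, e
predict xb_it, xb
scatter e_it xb_it, msize(tiny)

* (2) Age classes in one -recode- call; missing ages go to 999
recode age (min/21 = 21) (22/38 = 38) (39/64 = 64) (65/max = 75) (missing = 999), generate(agecat)
label define agecat 21 "<=21" 38 "22-38" 64 "39-64" 75 ">64" 999 "Missing", replace
label values agecat agecat
tabulate agecat, missing
```

With non-integer ages the ranges would need adjusting, for example with `egen, cut()`.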

Kind regards,
Carlo
(Stata 19.0)
Maria Kohnen

Join Date: Dec 2017

Posts: 45
#17

19 Dec 2017, 11:43

Hey Carlo,
I did generate the multilevel categorical variables.
This one is for company size:

Code:

g categorysize=recode( SIZE_AVERAGE_REVENUE,10000000000,50000000000,100000000000)

However, since the panel data have the same value (average revenue) for each year (in my case 10 years per firm), the frequency count for each of the three groups is obviously very high. Can I use the variable like that in my regression? For example, if I want to see whether size has a moderating effect on the relationship between the fine and R&D expense, could I create an interaction term between this new variable and the PRE_POST_FINE dummy that compares the periods before and after the fine?

And regarding i.year: including it or not determines whether my main explanatory variable is significant (without i.year) or insignificant (with i.year)... I guess it has to be in, then?

best regards

EDIT: I know size and SIC code are omitted when using the FE model, but I thought that using them as moderator variables would not omit them?
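A minimal sketch of the moderation question on simulated data. The firm identifier `id`, the year variable `year`, `fineyear`, the revenue levels and the outcome are all hypothetical. Two points are worth flagging. First, factor-variable notation accepts only nonnegative integer codes, so the revenue cut-offs returned by `recode()` are mapped to 1, 2, 3 with `egen, group()`. Second, under `xtreg, fe` only the main effect of the time-invariant size class is absorbed. Its interaction with POST_FINE_DUMMY remains estimable as long as the dummy varies within firms.

Code:

```stata
* Hypothetical toy panel: 60 firms x 10 years, fines imposed in different years
clear
set seed 12345
set obs 60
generate id = _n
* time-invariant average revenue, three hypothetical levels
generate double SIZE_AVERAGE_REVENUE = cond(mod(id, 3) == 0, 5e9, cond(mod(id, 3) == 1, 3e10, 8e10))
generate fineyear = 2003 + floor(4*runiform())
expand 10
bysort id: generate year = 2000 + _n
generate POST_FINE_DUMMY = year >= fineyear
generate RD = rnormal()
xtset id year

* Size classes as in the post, then mapped to codes 1, 2, 3 for factor-variable use
generate double categorysize = recode(SIZE_AVERAGE_REVENUE, 10000000000, 50000000000, 100000000000)
egen sizegroup = group(categorysize)

* ## adds both main effects and their interaction; the sizegroup main effects
* are omitted under -fe- because they do not vary within firm
xtreg RD i.POST_FINE_DUMMY##i.sizegroup, fe vce(cluster id)
```

The coefficients on `1.POST_FINE_DUMMY#2.sizegroup` and `1.POST_FINE_DUMMY#3.sizegroup` measure how the pre/post difference in the smaller size class differs for the larger classes.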

Last edited by Maria Kohnen; 19 Dec 2017, 11:50.
Maria Kohnen

#18

19 Dec 2017, 13:22

thank you Carlo,

About my other question: is it possible to include moderator variables for origin, size or industry in an FE model, to see whether there is an effect? It seems they will be omitted as well.
My problem is that FE excludes everything. In the end, I am left with a kindergarten regression like

Code:

xtreg RD POST_FINE_DUMMY, fe

I mean, I thought that once I created the multilevel categorical variables, I could include them in the regression to see whether industry or size has an effect.
Also, as I mentioned earlier, when I include year fixed effects (i.year), POST_FINE_DUMMY becomes insignificant (since you asked whether I should include it), so I guess it has to be in there?
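One way to judge whether year effects belong in the model, without looking at whether POST_FINE_DUMMY turns out significant, is a joint Wald test of the year dummies with `testparm`. Below is a minimal sketch on simulated data. The names `id`, `year`, `fineyear` and the data-generating process, including the built-in common trend, are hypothetical. Because the fines fall in different calendar years across firms, POST_FINE_DUMMY is not collinear with the year dummies, so both can enter the same -fe- model.

Code:

```stata
* Hypothetical toy panel: 60 firms x 10 years, staggered fine dates
clear
set seed 2017
set obs 60
generate id = _n
generate fineyear = 2003 + floor(4*runiform())
expand 10
bysort id: generate year = 2000 + _n
generate POST_FINE_DUMMY = year >= fineyear
generate RDlog = 0.05*(year - 2000) + 0.2*POST_FINE_DUMMY + rnormal()   // common trend built in
xtset id year

* Year fixed effects alongside the firm fixed effects, SEs clustered by firm
xtreg RDlog POST_FINE_DUMMY i.year, fe vce(cluster id)

* Joint test that all year-dummy coefficients are zero
testparm i.year
```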

If I fit an RE model, I should be able to include them, right? But would that make any sense?

best regards
Carlo Lazzaro

#19

20 Dec 2017, 00:11

Maria:
- regression predictors do not know whether they're plugged in as controls, moderators or anything else: they are all subject to the same maths rules. Under -fe-, if they are time-invariant, they will be cancelled out, as the difference between a constant and its mean is zero. Hence, having, as you say, an elementary regression (by the way: I find the phrasing

...kindergarten regression...

pretty funny!) is actually possible;
- preferring one regression model over another on the grounds of the statistical significance of its predictors is hardly the way to go, as statistical inference should give a fair and true view of the data-generating process underlying your sample;
- I would take a look at what others in your research field did in the past when presented with the same research topic.
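The cancellation described in the first point can be written out for a stylized model. The notation is assumed here for illustration: firm effect $\alpha_i$, time-varying regressor $x_{it}$, time-invariant regressor $z_i$, and $T$ periods per firm. The -fe- estimator works on deviations from firm means, and both $\alpha_i$ and $z_i$ are equal to their own means:

```latex
\begin{aligned}
y_{it} &= \alpha_i + \beta x_{it} + \gamma z_i + \varepsilon_{it}, \qquad t = 1,\dots,T \\
\bar{y}_i &= \alpha_i + \beta \bar{x}_i + \gamma z_i + \bar{\varepsilon}_i,
  \qquad \bar{w}_i = \tfrac{1}{T}\textstyle\sum_{t=1}^{T} w_{it} \\
y_{it} - \bar{y}_i &= \beta\,(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)
\end{aligned}
```

Because $\alpha_i$ and $\gamma z_i$ drop out of the demeaned equation, $\gamma$ cannot be estimated by -fe-, whatever role $z_i$ is meant to play in the model.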

Kind regards,
Carlo
(Stata 19.0)
Maria Kohnen

#20

20 Dec 2017, 05:04

thank you Carlo.

So, from your perspective and given the information I provided, would you opt for FE or RE:
a) taking into account my Sargan-Hansen test (the -xtoverid- command, which suggested RE);
b) from a purely theoretical standpoint?

It would be very important to me to hear your opinion (and others' too).

thank you, and best
Carlo Lazzaro

#21

20 Dec 2017, 06:16

Maria:
that is the real issue.
I would go -xtreg, re-.

Kind regards,
Carlo
(Stata 19.0)
Maria Kohnen

#22

20 Dec 2017, 07:27

thank you carlo,

But two things just don't sit right with me:
a) random effects assumes that Cov(predictor, alpha term) = 0, which is likely not the case with the data I have;
b) the alpha term follows a random distribution, which is likely not true for my data either.

I fear that my model becomes useless with RE. The question is: why did the Sargan-Hansen test point to RE?
It is perfectly normal panel data: 2249 observations of 193 groups over 10 years.

I am really having a hard time with this...

best regards
Carlo Lazzaro

#23

20 Dec 2017, 10:08

Maria:

Perhaps you may want to discuss this with your supervisor: https://blog.stata.com/2015/10/29/fi...dlak-approach/

Kind regards,
Carlo
(Stata 19.0)
Maria Kohnen

#24

20 Dec 2017, 10:21

Thank you, Carlo, I have read about the approach. My supervisor told me it is up to me; I just have to justify it.
Maybe I am doing something wrong when setting up my regression to test FE vs RE?

Code:

xi: xtreg RD POST_FINE_DUMMY FINAL_FINE, re
xtoverid

Since RD is heavily skewed, I used logRD; the same holds for the fine.
POST_FINE_DUMMY is 0 for the five years before the company's fine and 1 for the five years after it. Each company received its fine at a different point in time, so the time periods of the 193 firms are mostly different. Any problem here?

Each firm was part of a cartel group, so sometimes there were 5 firms in a cartel, sometimes more. Shall I use clustering here? If yes, should I cluster on the cartel case number or on single firms?

thank you,
best
Carlo Lazzaro

#25

20 Dec 2017, 10:36

Maria:
different issues here:
- if you log some of your variables, -xtoverid- should be run considering logging; that is, you should not run -xtoverid- on the variables in their original metrics and then log;
- logging brings about different interpretations of the relationship between coefficients and dependent variable, which you should be aware of;
- I do not see any problem with -post_fine_dummy-;
- if firms are nested within cartel (by the way: is cartel the -panelid-?), you may want to consider -mixed- (which is similar to -xtreg, mle-).
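A minimal sketch of the nested random-intercept model mentioned in the last point, on simulated data. Firms (`id`) are nested within cartel cases (`cartel_id`); both names and all numbers are hypothetical, since the thread does not show the actual cartel identifier. The sketch assumes each firm belongs to exactly one cartel case. If a firm appeared in several cases, this strict nesting would not hold.

Code:

```stata
* Hypothetical toy data: 20 cartel cases x 5 firms x 10 years
clear
set seed 2017
set obs 20
generate cartel_id = _n
generate u_cartel = rnormal(0, 0.5)          // cartel-level random intercept
expand 5
generate id = _n                             // unique firm identifier
generate u_firm = rnormal(0, 0.5)            // firm-level random intercept
expand 10
bysort id: generate year = 2000 + _n
generate POST_FINE_DUMMY = year > 2005
generate RDlog = 1 + 0.2*POST_FINE_DUMMY + u_cartel + u_firm + rnormal()

* Random intercepts for cartel cases and for firms within cartel cases
mixed RDlog POST_FINE_DUMMY || cartel_id: || id:
```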

Kind regards,
Carlo
(Stata 19.0)
Maria Kohnen

#26

20 Dec 2017, 10:47

Hey Carlo, thank you.
- cartel is not the panel id; the firm itself is. For example, Toshiba has firm ID 1, Samsung ID 2, and so on. The cartel case numbers are merely included in the dataset, but my supervisor vaguely mentioned I might want to cluster on cartel cases (however, they are not my panelid, so it cannot even be done, right?)

- yes, I ran -xtoverid- with unlogged variables when using unlogged variables in -xtreg-, and I also did it with logs, logging the variables before running both -xtreg- and -xtoverid-, so it should be fine;

- since the DV, R&D, is heavily skewed, I assume I should transform it with a square root or a log? And one of the IVs, final fine, is also heavily skewed; shouldn't I transform that one as well, then?

- how is it possible that the Sargan-Hansen test suggests RE, if it is such a straightforward regression, where alpha is obviously highly likely to be correlated with the explanatory variables? Or am I not seeing something right? For example, take the main predictor, the PRE_POST_FINE dummy: I regress RD on it, so it compares the means of the two periods with each other, right? I mean, alpha must certainly contain unobserved heterogeneity related to that, I guess? The R&D intensity of the industry, the country, capital market structures in the country... I mean, there are endless possibilities. I wonder how anyone can ever use an RE model with the assumption of zero covariance between alpha and the predictors?

EDIT: I tried to perform the Mundlak test you referred me to. In my case, however, I don't have any time-varying predictors, right? I mean, I only have POST_FINE_DUMMY, which is a dummy; FINAL_FINE, which is time-invariant; and potentially origin and industry, which are also both time-invariant. So I cannot really do it, right? Or does POST_FINE_DUMMY count as a time-varying variable?

best regards

Last edited by Maria Kohnen; 20 Dec 2017, 11:17.
Carlo Lazzaro

#27

20 Dec 2017, 11:57

Maria:
it seems you're getting overstressed by the same topics:
- -cluster()- should be made on -panelid-;
- if the variables to be transformed are right-skewed, logging will fix it; conversely, logging will worsen the skewness of a left-skewed variable (see -help ladder- for more info);
- the assumption of no correlation between the vector of regressors and both ui and ei under -xtreg, re- is not always tenable in the real world (but, on the other hand, this is well known);
- if you have so many time-invariant variables and you want to estimate their coefficients, -re- is the way to go (or, at least, a slightly better approach than -fe-).
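A minimal sketch of the transformation and clustering points, on simulated data. The names `id` and `year` and the log-normal outcome are assumptions for illustration. `ladder` reports how close each power transformation of a variable comes to normality. The log is then used for the dependent variable, and the standard errors are clustered on the panel identifier. Note that `ln()` returns missing for zero or negative values, so any firm-years with zero R&D would silently drop out of the estimation sample.

Code:

```stata
* Hypothetical toy panel: 50 firms x 10 years with a right-skewed outcome
clear
set seed 2017
set obs 50
generate id = _n
expand 10
bysort id: generate year = 2000 + _n
generate POST_FINE_DUMMY = year > 2005
generate RD = exp(1 + 0.2*POST_FINE_DUMMY + rnormal())   // log-normal, strictly positive
xtset id year

* Ladder of powers: which transformation brings RD closest to normality
ladder RD

* Log transform, then -re- with standard errors clustered on the panel id
generate RDlog = ln(RD)
xtreg RDlog POST_FINE_DUMMY, re vce(cluster id)
```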

Kind regards,
Carlo
(Stata 19.0)
Maria Kohnen

Join Date: Dec 2017

Posts: 45
#28

20 Dec 2017, 12:20

Dear Carlo, thank you.
You are absolutely right: I am way too fixated on the FE/RE thing. I just don't want the foundation of my regression to be wrong. However, I will go with RE, as you already suggested and as the tests suggest as well.
Just one thing about the Mundlak test paper you sent me.
I followed the instructions and did this:

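Code:

* firm-level mean of the time-varying regressor
bysort id: egen mean_POST_FINE_DUMMY = mean(POST_FINE_DUMMY)
* -re- (the -xtreg- default) with the Mundlak term and cluster-robust SEs
quietly xtreg RDlog POST_FINE_DUMMY FINAL_FINElog mean_POST_FINE_DUMMY, vce(robust)
estimates store mundlak
test mean_POST_FINE_DUMMY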

and received:

Code:

 ( 1)  mean_POST_FINE_DUMMY = 0

           chi2(  1) =    0.02
         Prob > chi2 =    0.9000

Which also points to the RE model, right? My question is: POST_FINE_DUMMY is the only time-varying predictor. Can you do the Mundlak test with just one variable, like I did?
Carlo Lazzaro

#29

20 Dec 2017, 22:54

Maria:
yes, you can.
However, the main implication of the Mundlak test here is that you can also choose to go -xtreg, re- without Mundlak's correction.
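For reference, here is a self-contained sketch of the Mundlak regression with one time-varying regressor, on simulated data. The dependent variable is on the left, and the firm mean of POST_FINE_DUMMY is added to the -re- model. The names `id`, `year`, `fineyear` and the data-generating process are hypothetical. No mean is added for the time-invariant FINAL_FINElog: within each firm it equals the regressor itself, so it would be perfectly collinear with it.

Code:

```stata
* Hypothetical toy panel: 100 firms x 10 years, fines imposed in different years
clear
set seed 2017
set obs 100
generate id = _n
generate FINAL_FINElog = rnormal(16, 1)        // time-invariant regressor
generate fineyear = 2003 + floor(4*runiform())
expand 10
bysort id: generate year = 2000 + _n
generate POST_FINE_DUMMY = year >= fineyear
generate RDlog = 2 + 0.1*POST_FINE_DUMMY + 0.05*FINAL_FINElog + rnormal()
xtset id year

* Firm mean of the only time-varying regressor
bysort id: egen mean_POST_FINE_DUMMY = mean(POST_FINE_DUMMY)

* -re- with the Mundlak term; vce(robust) after -xtreg- clusters on the panel id
xtreg RDlog POST_FINE_DUMMY FINAL_FINElog mean_POST_FINE_DUMMY, re vce(robust)

* Wald test of H0: coefficient on the mean = 0; not rejecting H0 is consistent with -re-
test mean_POST_FINE_DUMMY
```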

Kind regards,
Carlo
(Stata 19.0)
Maria Kohnen

#30

20 Dec 2017, 23:53

dear Carlo, thank you.
now it gets interesting:

when I test

Code:

xi: xtreg RDlog POST_FINE_DUMMY, re
xtoverid

re is suggested

when I test

Code:

xi: xtreg RDlog POST_FINE_DUMMY FINAL_FINElog SIZElog, re
xtoverid

I get fe suggested.

So if I only log my DV, re is suggested;
if I also log my IVs (since they were heavily right-skewed), fe is suggested.

if i just use

Code:

xi: xtreg RD POST_FINE_DUMMY, re
xtoverid

re is suggested

however, if i use

Code:

xi: xtreg RD POST_FINE_DUMMY FINAL_FINE SIZE, re
xtoverid

fe is suggested.

What should I do?