Panel Data Dummies

Stan Breetman

Join Date: Jun 2017

Posts: 31
#1

16 Jul 2017, 09:28

Hi everybody

Since I am a beginner with Stata, I would be happy if you could help me.

I have a dataset that already contains dummy variables with the values 0 or 1.

My question is: do I have to declare the dummies somehow in Stata, or is that unnecessary because Stata will recognize them?

Thank you very much.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

16 Jul 2017, 09:44

Stan:
-help fvvarlist- covers that issue comprehensively.

Kind regards,
Carlo
(Stata 19.0)
Stan Breetman

Join Date: Jun 2017

Posts: 31
#3

16 Jul 2017, 10:48

Thank you very much Carlo.

I read it and think that the right way is the following. I typed in this code:

xtreg win_pct gini_12 srs tot_ln_salary_all_disc av_age superstars_dummy coa_cha playoff_previous_season i.year_id, fe robust

So my questions are:

1. Is this code right if I want to control for team and year fixed effects? I have already declared the panel with

. xtset team_id year_id
panel variable: team_id (unbalanced)
time variable: year_id, 2001 to 2014
delta: 1 unit

2. And is it correct that I don't have to define a dummy variable, because Stata recognizes it when it already has the values 0 or 1? I just want the year effects, which I get here. That is also why I don't understand why I have to put the i. before year_id. Is it just to show me the year effects?
The dummies here would be superstars_dummy, coa_cha and playoff_previous_season.

I would be happy to get a confirmation soon.
Stan

Last edited by Stan Breetman; 16 Jul 2017, 11:22.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

16 Jul 2017, 11:57

Stan:
q1) your code is correct to investigate team and year fixed effects.
You can check whether the years are jointly statistically significant via:

Code:

testparm i.year_id

q2) yes, you're correct. However, I would recommend getting familiar with -fvvarlist- notation for categorical variables, so that when you deal with, say, a three-level factor variable you remember to prefix it with -i.-;
q2bis) -i.year- gives you a separate coefficient for each year (relative to the base year). You can check for yourself how the regression output changes if you simply plug in -year- among your predictors.
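
A minimal sketch of that comparison, assuming the variable names from #3 and that the panel has already been declared with -xtset team_id year_id-. The local macro name is only illustrative, and the lines are meant to be run together from a do-file:

Code:

* regressors from #3, collected in a local macro (name is illustrative)
local x gini_12 srs tot_ln_salary_all_disc av_age superstars_dummy coa_cha playoff_previous_season

* year as a factor variable: one coefficient per year, base year omitted
xtreg win_pct `x' i.year_id, fe robust
testparm i.year_id

* year as a continuous variable: a single linear trend coefficient
xtreg win_pct `x' year_id, fe robust

A 0/1 dummy such as superstars_dummy yields the same coefficient with or without the -i.- prefix, which is why it can be entered as it is.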

Kind regards,
Carlo
(Stata 19.0)
Stan Breetman

Join Date: Jun 2017

Posts: 31
#5

16 Jul 2017, 13:07

Thank you very much Carlo.

I have done that, and as my results show, the years are not jointly significant. What else could I do? Should I drop the year fixed effects? What do you recommend?

testparm i.year_id

( 1) 2002.year_id = 0
( 2) 2003.year_id = 0
( 3) 2004.year_id = 0
( 4) 2005.year_id = 0
( 5) 2006.year_id = 0
( 6) 2007.year_id = 0
( 7) 2008.year_id = 0
( 8) 2009.year_id = 0
( 9) 2010.year_id = 0
(10) 2011.year_id = 0
(11) 2012.year_id = 0
(12) 2013.year_id = 0
(13) 2014.year_id = 0

F( 13, 29) = 0.38
Prob > F = 0.9664
Habib Hinn

Join Date: Jul 2017

Posts: 1
#6

16 Jul 2017, 14:39

You don't have to define a new dummy variable unless you are not satisfied with how the year dummy is defined in the original data set.

What else you can do depends on the purpose of your regression.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#7

16 Jul 2017, 22:50

Stan:
in your regression model, -year- has a negligible effect within the same panel (since you adopted an -fe- specification).
As Habib said, it's up to you to keep that independent variable among the predictors or get rid of it.
As an aside: why did you decide to robustify your standard errors? Are you concerned about heteroskedasticity and/or autocorrelation issues?
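
For reference, a sketch of what the option does in this setting, using the variable names from #3 (the local macro name is illustrative). According to the -xtreg- documentation, with -fe- the robust variance estimator is treated as clustering on the panel variable, so the two calls below should report the same standard errors. That would also be consistent with the 29 denominator degrees of freedom in the -testparm- output in #5 if the panel holds 30 teams (the number of teams is an assumption; it is not stated in the thread).

Code:

* regressors from #3 (macro name is illustrative)
local x gini_12 srs tot_ln_salary_all_disc av_age superstars_dummy coa_cha playoff_previous_season

* robust standard errors as requested in #3
xtreg win_pct `x' i.year_id, fe vce(robust)

* explicit clustering on the panel variable
xtreg win_pct `x' i.year_id, fe vce(cluster team_id)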

Kind regards,
Carlo
(Stata 19.0)
Stan Breetman

Join Date: Jun 2017

Posts: 31
#8

17 Jul 2017, 08:03

Thank you all.

Yes, I think the same way.

Carlo, I use it because of a friend's advice and the description in Stata, but I am still not sure. My variables are more significant if I don't use it, which I like very much.
1. I would be happy if you could explain why I should not use it, because I don't understand exactly what happens if I don't.

(=> -help robust- says: "_robust is a programmer's command that computes a robust variance estimator based on a varlist of equation-level scores and a covariance matrix. It produces estimators for ordinary data (each observation independent), clustered data (data not independent within groups, but independent across groups), and complex survey data from one stage of stratified cluster sampling.")

2. And yes, heteroskedasticity could be a problem. I got the following correlation matrix, but I still have to check whether the correlations between the variables are acceptable.
(the first three variables are the possible dependent variables)

3. Below are the fixed-effects regressions with and without the robust option. It would be good to know why I should not use robust, because without it my most important variable, gini_12, is significant.

4. I am not sure whether I can leave out the fixed effects. I am observing NBA basketball teams over 14 years, so there should be some fixed effects because of the quality of the teams, but it seems there are not... I don't really know whether I should drop the year fixed effects but keep the team fixed effects.

Last edited by Stan Breetman; 17 Jul 2017, 08:41.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#9

17 Jul 2017, 11:34

Stan:
q2) I'm not clear about your "possible dependent variables". You should decide what you want to investigate before looking at the results concerning your sample. Then check whether heteroskedasticity is or is not a problem; the fact that you get more statistically significant results with one approach or the other may be overlooked as data mining (this last remark relates to your q1);
q4) keep -i.year-. However, I would also wonder whether an -re- specification would not be more appropriate, given that you want to compare, say, the Philadelphia 76ers with the Atlanta Hawks (and -fe- is not good at that).
Sorry if the team names are not up to date: my last shot on a basketball playground dates back to the era before the three-point line!
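
One common way to compare the two specifications, though not something proposed in this thread, is a Hausman test. A minimal sketch, assuming the variable names from #3; the macro and stored-estimate names are illustrative, and both models are fitted without -robust- because -hausman- does not accept robust or clustered standard errors:

Code:

* regressors from #3 (macro name is illustrative)
local x gini_12 srs tot_ln_salary_all_disc av_age superstars_dummy coa_cha playoff_previous_season

* fixed-effects fit, stored under an illustrative name
xtreg win_pct `x' i.year_id, fe
estimates store fixed

* random-effects fit with the same regressors
xtreg win_pct `x' i.year_id, re
estimates store random

* compare the two sets of coefficients
hausman fixed random

A small p-value from -hausman- would speak against the -re- specification; a large one does not by itself prove that -re- is appropriate.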

Kind regards,
Carlo
(Stata 19.0)
Stan Breetman

Join Date: Jun 2017

Posts: 31
#10

17 Jul 2017, 15:34

First of all, thank you.

Haha.. I would give it a try. For me it's good to play basketball as a break, so I don't just "play" with statistics (ehmm, on the other hand basketball is pure statistics too ... so no break from statistics :P)

1. to q2) I am not sure about what I want to investigate. The reason: I have the whole regression, for example (first picture), where my main variable (gini_12) is not significant, and when I remove a variable like tot_disc_salary_12, the main variable becomes significant (picture 2). Then there is also the problem with the option -robust-: when I don't use it in the regression of picture 1, the main variable is significant. I don't know how to handle this. Should I accept that maybe my variable is insignificant?

(Also, when adding control variables, gini_12 becomes insignificant.)

2. Also a note: is it possible that, going from the regression in picture 1 to the one in picture 2, the variable gini_12 becomes insignificant while the R2 drops by only 0.047?

I also have other examples and variables where the p-value goes to 0.31 etc.

Sorry, I know it's a lot. But I would be so happy if you could help me, because I have to finish this.

Last edited by Stan Breetman; 17 Jul 2017, 15:41.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#11

17 Jul 2017, 22:49

Stan:
most of my previous reply still applies.
The changes in R2 are due to the fact that you ran different regression models: so, why do you expect them to give you similar results?
Finally, I would skim through the literature on sports statistics and see whether it offers any suggestions as far as your regression specification is concerned.

Kind regards,
Carlo
(Stata 19.0)
Stan Breetman

Join Date: Jun 2017

Posts: 31
#12

18 Jul 2017, 01:43

Carlo, maybe I didn't understand your answer.

Your answer was: "q2) I'm not clear about your "possible dependent variables". You should decide what you want to investigate before looking at the results concerning your sample. Then check whether heteroskedasticity is or is not a problem; the fact that you get more statistically significant results with one approach or the other may be overlooked as data mining (this last remark relates to your q1)."

1. "You should decide what you want to investigate before looking at the results concerning your sample."
I have decided.

2. "Then check whether heteroskedasticity is or is not a problem."
But how can I check for heteroskedasticity? I have read about some commands (I don't remember the names anymore), but they come with restrictions, for example that the panel must be balanced, which mine is not.
Is it possible to check for heteroskedasticity by looking at the correlations? If these are low between the variables there should not be multicollinearity, right?

3. "The fact that you get more statistically significant results with one approach or the other may be overlooked as data mining."
I didn't understand this exactly. If it can be overlooked as data mining, what is the implication? Should I use robust or not, or rather, is it appropriate for my regression?

Thank you.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#13

18 Jul 2017, 02:02

Stan:
q2) looking at correlations is not the right way to check for heteroskedasticity (which concerns the distribution of the residuals). See also the following Stata thread: http://www.stata.com/statalist/archi.../msg01258.html ;
q3) I mistakenly wrote -overlooked- instead of -mistaken- as data mining (i.e., hunting for the model with the highest number of statistically significant coefficients is not a scientific approach). Sorry for the mishap.

Kind regards,
Carlo
(Stata 19.0)
Stan Breetman

Join Date: Jun 2017

Posts: 31
#14

18 Jul 2017, 03:25

q2) The reasoning in the linked thread stops at the question and the answer that follows. I got that far, but I don't know how to take the next step.
For plotting the residuals I only know the following command: rvfplot, yline(0), which again doesn't work for panel data. Is there another possibility?
The answer in that thread reads:
> You should take a step back and ask yourself how heteroskedasticity might manifest itself in your panel. Since there are various sources of potential heteroskedasticity, you may need to adopt different model specifications to test different ones.
>
> The classic form is panel-level heteroskedasticity but with 6 years for each of 104 companies you have not got enough observations to test this properly. There is an FAQ at:
As stated, my problem is shown in the pictures: when I leave out robust, I get more significant variables. I just don't know which approach I should take, i.e. whether the right one is with or without the option -robust-. As for heteroskedasticity, I looked at the FAQ, but I don't know which code is the right one for my test. Maybe you can tell me? - xtgls depvar indepvars, igls panels(heteroskedastic) - or - local df = e(N_g) - 1?
q3) I don't know what you mean by hunting for the best model. I used variables that are commonly used in sports statistics. I just want to know how to test my regression for heteroskedasticity and whether I should use robust or not. Looking at the coefficients together with the significant results, they seem logical and plausible. But there was no hunting for "high significance" at all, maybe just luck.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#15

18 Jul 2017, 03:57

Stan:
q2) you can use -scatter- to plot residuals vs predictor(s) (you can retrieve the residuals via -predict-, see -xtreg postestimation-), or you can plot the distribution of the residuals via -kdensity- or -histogram-;
q3) if your approach considers variables used in similar research projects, you're on the right path. My previous remark referred to the fact that you cannot choose -robust- conditional on the statistical significance of -gini_12-. That's all.
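
A minimal sketch of those residual plots, assuming the -xtreg, fe- model from #3 has just been fitted; the new variable names e_hat and xb_hat are illustrative:

Code:

* overall error component e_it and linear prediction after -xtreg, fe-
predict double e_hat, e
predict double xb_hat, xb

* residuals against the linear prediction and against one predictor
scatter e_hat xb_hat, yline(0)
scatter e_hat gini_12, yline(0)

* distribution of the residuals, with a normal density overlaid
kdensity e_hat, normal
histogram e_hat, normal

A spread of the residuals that widens or narrows along the horizontal axis would hint at heteroskedasticity; the plots are a visual check, not a formal test.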

Kind regards,
Carlo
(Stata 19.0)