Please help me with my master's thesis - Panel Data and Multilevel Regression

Benno Klocker

Join Date: Nov 2023

Posts: 3
#1


23 Nov 2023, 01:36

Hello Stata community,

I'm relatively new to Stata and currently working on my master's thesis, which involves measuring the performance of golf players. I have gathered extensive data from various players across different tournaments, intending to measure performance per hole (Netto_hole) and overall tournament performance (Netto_tournament).

My goal is to assess player performance based on individual abilities (e.g., scores from one and two holes before the observed hole, handicap, age, gender) and flight composition (e.g., male and female players in a flight, average handicap per flight, potential differences in handicap and performance between male and female players).

I'm facing two main challenges. Firstly, my dataset is structured as panel data, with holes 1-18 acting as the time variable, and it has multiple levels: player level, flight level, and tournament level.

The first regression works smoothly:
xtreg Netto_hole l1.Netto_hole l2.Netto_hole HCPI years_in_club, fe vce(cluster TurnierID)
However, in the second regression, a significant portion of the variance is explained by effects between different groups:
xtreg Netto_hole avg_hcpi_flight female_better_on_hole avg_Netto_flight_hole, fe vce(cluster TurnierID)
For the third regression, I'm unsure whether the arrangement of different levels is correct and meaningful. I've tested various regression options with estat ic, and this form seemed to be the most specific:
mixed Netto_hole l1.Netto_hole l2.Netto_hole female Age HCPI member good_start1 everyone_better || TurnierID: || flightcluster: mixed_flight avg_hcpi_flight female_betterHCPI_than_male female_better_total || ID:
However, when I attempt to examine the output for the overall tournament (Netto_tournament), I encounter an issue. Regardless of which xtreg estimator I use, every independent variable is reported as "omitted," and I'm struggling to understand why:
xtreg Netto_tournament female HCPI years_in_club member special_tournament , fe vce(cluster TurnierID)
I would greatly appreciate any insights, suggestions, or assistance you could provide. Thank you in advance for your help!

Best regards, Benno
Daniel Feenberg

Join Date: Oct 2014

Posts: 329
#2

23 Nov 2023, 07:04

The results are telling you that there is no variation of the named independent variables within each panel, that is, within each value of the group variable set by xtset (the vce(cluster TurnierID) option only affects the standard errors). I can guess that no participant changes sex during the tournament, or changes years_in_club. So those won't be identified. I don't have a guess about the others. Is this just a wild guess on my part?
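
One way to check for missing within-panel variation directly is to measure how much each regressor moves inside a panel. The sketch below runs on simulated data, so every value in it is made up; the variable names ID, hole, female, HCPI and Netto_hole are borrowed from the original post purely for illustration, and the sd_within_* helper variables are hypothetical.

```stata
* Simulated panel: 50 players (ID) x 18 holes. All values are invented.
clear
set seed 12345
set obs 50
generate ID = _n
generate female = runiform() < 0.4            // constant within player
generate HCPI   = round(rnormal(20, 8), 0.1)  // constant within player
expand 18
bysort ID: generate hole = _n
generate Netto_hole = rnormal(2, 1)           // changes from hole to hole
xtset ID hole

* xtsum splits each variable's variation into between and within parts.
xtsum female HCPI Netto_hole

* Explicit check: within-panel standard deviation of each variable.
foreach v of varlist female HCPI Netto_hole {
    bysort ID: egen double sd_within_`v' = sd(`v')
    quietly summarize sd_within_`v'
    display "`v': largest within-ID sd = " r(max)
}
```

A variable whose largest within-ID standard deviation is zero has nothing left after the within transformation, so -xtreg, fe- has to omit it.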

Last edited by Daniel Feenberg; 23 Nov 2023, 07:08. Reason: Spelling correction
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17729
#3

23 Nov 2023, 08:02

Benno:
welcome to this forum.
As per FAQ, please share not only what you typed, but also what Stata gave you back.
That said, did you try the community-contributed module -xtoverid- to compare -fe- vs -re- specification with non-default standard errors?
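
For readers who do not have -xtoverid- installed, a regression-based check in a similar spirit can be sketched with built-in commands only. Note that this is a Mundlak-type test, not -xtoverid- itself: the panel means of the time-varying regressors are added to the -re- model and tested jointly, and a rejection points towards -fe-. The data are simulated, and the names y, x1, x2, id, t, u, mean_x1 and mean_x2 are hypothetical.

```stata
* Simulated panel in which x1 is correlated with the panel effect u.
clear
set seed 2023
set obs 200
generate id = _n
generate u = rnormal()                   // unobserved panel effect
expand 10
bysort id: generate t = _n
generate x1 = rnormal() + 0.8*u          // correlated with u
generate x2 = rnormal()                  // unrelated to u
generate y  = 1 + 0.5*x1 - 0.3*x2 + u + rnormal()
xtset id t

* Panel means of the time-varying regressors.
bysort id: egen double mean_x1 = mean(x1)
bysort id: egen double mean_x2 = mean(x2)

* -re- model augmented with the panel means, cluster-robust standard errors.
xtreg y x1 x2 mean_x1 mean_x2, re vce(cluster id)

* Joint test of the panel means; a small p-value argues against plain -re-.
test mean_x1 mean_x2
```

Time-invariant regressors cannot enter this comparison, because their panel mean is the variable itself.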

Kind regards,
Carlo
(Stata 19.0)
Benno Klocker

Join Date: Nov 2023

Posts: 3
#4

24 Nov 2023, 01:18

Thank you for your replies, Carlo and Daniel.
For the last regression I mentioned above, the output is:

xtreg Netto_tournament female HCPI years_in_club member special_tournament , fe vce(cluster TurnierID)
note: female omitted because of collinearity
note: HCPI omitted because of collinearity
note: years_in_club omitted because of collinearity
note: member omitted because of collinearity
note: special_tournament omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =     19,997
Group variable: ID                              Number of groups  =      1,111

R-squared:                                      Obs per group:
     Within  =      .                                         min =         17
     Between =      .                                         avg =       18.0
     Overall =      .                                         max =         18

                                                F(0, 20)          =          .
corr(u_i, Xb) =      .                          Prob > F          =          .

                                   (Std. err. adjusted for 21 clusters in TurnierID)
------------------------------------------------------------------------------------
                   |               Robust
  Netto_tournament | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------------+----------------------------------------------------------------
            female |          0  (omitted)
              HCPI |          0  (omitted)
     years_in_club |          0  (omitted)
            member |          0  (omitted)
special_tournament |          0  (omitted)
             _cons |    32.6752          .        .       .            .           .
-------------------+----------------------------------------------------------------
           sigma_u |  6.1630509
           sigma_e |          0
               rho |          1   (fraction of variance due to u_i)
------------------------------------------------------------------------------------

It makes sense that all the independent variables are omitted, since none of them changes during the tournament. Do the fixed effects in that regression automatically absorb these variables, or how can I analyse their effects?
Thanks in advance!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17729
#5

24 Nov 2023, 02:15

Benno:
if all your predictors are time invariant, -fe- will, as expected, not give back any coefficient.
Therefore, provided that your data support the evidence of a group-wise effect, you can only go -re-.
In addition, 21 clusters are not enough to use non-default standard errors.
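
One detail in the output in #4 may matter here: sigma_e is 0 and rho is 1, which suggests that Netto_tournament itself is repeated unchanged on every hole row of a given ID. If that reading is right, the 18 rows per ID carry only one observation's worth of information, and a model fitted on one row per ID with a random intercept per tournament is a possible alternative. The following is only a sketch under that assumption: the data are simulated, the coefficients are invented, the variable names are borrowed from the thread, and one_row and tour_effect are hypothetical helpers.

```stata
* Simulated data: 21 tournaments x 50 players, outcome constant across holes.
clear
set seed 42
set obs 21
generate TurnierID = _n
generate tour_effect = rnormal(0, 2)      // tournament-level shock
expand 50
generate ID = _n                          // one ID per player-tournament
generate female = runiform() < 0.4
generate HCPI = rnormal(20, 8)
generate Netto_tournament = 36 - 0.2*HCPI + 1.5*female + tour_effect + rnormal(0, 5)
expand 18                                 // 18 identical hole rows per ID
bysort ID: generate hole = _n

* Flag exactly one row per ID so that each player counts once.
egen byte one_row = tag(ID)

* Player-level model with a random intercept for each tournament.
mixed Netto_tournament female HCPI if one_row || TurnierID:
```

Fitting the model on the flagged rows keeps the sample size at the number of players rather than the number of hole rows, which would otherwise overstate the precision of the estimates.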

Kind regards,
Carlo
(Stata 19.0)
Benno Klocker

Join Date: Nov 2023

Posts: 3
#6

24 Nov 2023, 03:07

Thank you very much, Carlo.
I'm sorry for asking relatively easy questions; I'm still trying to understand Stata.

I'm also looking at the players' performance per hole.

For that I use: xtreg Netto_hole avg_hcpi_flight female_better_on_hole avg_Netto_flight_hole, fe vce(cluster TurnierID)

The output is shown below.

Aren't these predictors also time invariant?
Why am I getting output here?

And if 21 clusters aren't enough, would you suggest clustering on something else (e.g. flightcluster), or just running these regressions without clustering?

Thank you so much in advance!

xtreg Netto_hole avg_hcpi_flight female_better_on_hole avg_Netto_flight_hole, fe vce(cluster TurnierID)

Fixed-effects (within) regression               Number of obs     =     21,959
Group variable: ID                              Number of groups  =      1,220

R-squared:                                      Obs per group:
     Within  = 0.3763                                         min =         17
     Between = 0.0108                                         avg =       18.0
     Overall = 0.0064                                         max =         18

                                                F(2, 20)          =          .
corr(u_i, Xb) = -0.9958                         Prob > F          =          .

                                      (Std. err. adjusted for 21 clusters in TurnierID)
---------------------------------------------------------------------------------------
                      |               Robust
           Netto_hole | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
----------------------+----------------------------------------------------------------
      avg_hcpi_flight |   1.504362   .0006589  2282.98   0.000     1.502987    1.505736
female_better_on_hole |   1.384788   .0254388    54.44   0.000     1.331724    1.437853
avg_Netto_flight_hole |    .989307   .0040657   243.33   0.000      .980826    .9977879
                _cons |  -30.60801     .02069 -1479.36   0.000    -30.65117   -30.56485
----------------------+----------------------------------------------------------------
              sigma_u |  7.5400589
              sigma_e |  .79844154
                  rho |  .98891096   (fraction of variance due to u_i)
---------------------------------------------------------------------------------------
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17729
#7

24 Nov 2023, 07:06

Benno:
it depends on how you -xtset- your data.
If you have 1220 panels, you should -cluster- at their level.
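
Clustering at the panel level could be sketched as follows. The data are simulated and the regressor x is hypothetical; ID, hole and Netto_hole are borrowed from the thread only to make the example easier to map onto the real dataset.

```stata
* Simulated panel: 100 players (ID) x 18 holes. All values are invented.
clear
set seed 7
set obs 100
generate ID = _n
generate u = rnormal()                    // player effect
expand 18
bysort ID: generate hole = _n
generate x = rnormal() + 0.5*u            // varies from hole to hole
generate Netto_hole = 2 + 0.4*x + u + rnormal()
xtset ID hole

* Fixed effects with standard errors clustered on the panel identifier.
xtreg Netto_hole x, fe vce(cluster ID)
display "panels: " e(N_g) "   clusters: " e(N_clust)
```

With -xtreg, fe-, vce(robust) is documented as equivalent to vce(cluster panelvar), so either spelling yields one cluster per panel.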

Kind regards,
Carlo
(Stata 19.0)