Right method for unbalanced panel data

Maximilian Bartels

Join Date: May 2017

Posts: 11
#1

Right method for unbalanced panel data

31 May 2017, 08:08

Dear Stata users,

I am looking for the right statistical method for my panel data. As I am new to Stata, it is not easy for me to find the right solutions to my problems.

My panel contains data on soccer players, and my dependent variable is a dummy indicating whether the player scored in a specific match. However, I only want to examine matches in which the player could have scored (a dummy indicates whether he played). My panel is therefore unbalanced, because a player does not have the opportunity to score every week. There are also some missing values for weeks in which there was no game.

To examine the likelihood of scoring a goal: which model could I use, and how?

I thought about a logit model with an if-condition. Does this work? Or would you suggest a tobit model or a Heckman correction? If so, I would appreciate help with using a tobit model or Heckman correction with a dummy variable as the censoring indicator.

I would be grateful for your help, since I haven't found a solution yet.

Best Regards,

Maximilian
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#2

31 May 2017, 08:44

I don't think you need to worry about the games in which the player didn't play, so you don't need tobit/heckman. It might depend on your eventual research goal, but I would indeed suggest a logit or probit model conditional on the player actually playing. You can do this with the if-condition, or by setting the outcome (whether the player scored) to missing if he didn't play. Personally, I prefer the latter approach because it is easier to check (you can browse the data to see if it makes sense) and it also makes sense conceptually (a player didn't fail to score in a match he didn't play; he simply wasn't part of it).
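
A minimal sketch of the two approaches, run on simulated data so that it is self-contained. All variable names (player_id, week, played, scored, x1) are hypothetical placeholders rather than names from your dataset, and clustering the standard errors on the player is my own assumption about how to handle repeated observations per player:

```stata
* Simulated panel: 100 hypothetical players observed for 20 weeks each
clear
set seed 12345
set obs 2000
generate player_id = ceil(_n/20)
bysort player_id: generate week = _n
generate x1 = rnormal()
generate byte played = runiform() < 0.7
generate byte scored = runiform() < invlogit(-1 + 0.5*x1)

* Approach 1: restrict the estimation sample with an if-condition
logit scored x1 if played == 1, vce(cluster player_id)

* Approach 2: set the outcome to missing when the player did not play;
* logit then drops those observations on its own
replace scored = . if played == 0
logit scored x1, vce(cluster player_id)
```

Both calls use the same estimation sample, so they should give the same estimates; the second simply encodes the restriction in the data rather than in the command.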
Maximilian Bartels

Join Date: May 2017

Posts: 11
#3

01 Jun 2017, 01:51

Thanks for your reply, Jesse.

We also have interesting data for the weeks in which the player didn't play (such as injuries), and we suspect that these have an influence on match performance. We are afraid that a logit with an if-condition would ignore this information. Is that right?

Is it possible anyway to censor a tobit via a dummy variable? I couldn't find anything about it.
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#4

01 Jun 2017, 03:21

Eh, that quickly gets complicated I would say. You will probably need a two-equation model, but how to structure it really depends on the research and is somewhat out of the scope of this forum, or at least outside my area of expertise...
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#5

01 Jun 2017, 12:02

There is a heckprobit routine that allows for sample selection in a probit model. Alternatively, you can probably do this with cmp or gsem.
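
For illustration, a sketch of the heckprobit syntax on simulated data. The variable names and the data-generating process are assumptions made up for the example; in particular, z1 stands for a hypothetical variable that affects whether the player plays but not whether he scores, which helps identify the model:

```stata
* Simulated data with correlated errors in the selection and outcome equations
clear
set seed 12345
set obs 2000
generate player_id = ceil(_n/20)
generate x1 = rnormal()
generate z1 = rnormal()
generate u = rnormal()
generate e = 0.5*u + sqrt(1 - 0.5^2)*rnormal()

* Selection: whether the player played the match
generate byte played = (0.5 + 0.5*x1 + z1 + u > 0)

* Outcome: whether he scored, observed only if he played
generate byte scored = (-0.5 + 0.5*x1 + e > 0) if played == 1

* Probit with sample selection; select() holds the selection equation
heckprobit scored x1, select(played = x1 z1) vce(cluster player_id)
```

The test of rho = 0 reported at the bottom of the output indicates whether the two equations are correlated, that is, whether the selection correction matters at all.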
Maximilian Bartels

Join Date: May 2017

Posts: 11
#6

02 Jun 2017, 02:43

Thanks, Phil! It seems that heckprobit could be the right model for the restriction that a player actually played the game.

But regarding the sample selection: I would use the dummy variable game played (0/1) as the selection variable, but I don't need to estimate the effects of independent variables on it, because I simply observe whether a player played in a game. Does this affect heckprobit in any way? Should I simply not put any independent variables in the selection equation (e.g. select(game_played))?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#7

02 Jun 2017, 02:58

Maximilian:
you might also want to consider a hurdle model, where the hurdle is whether or not the player plays a given match.

Kind regards,
Carlo
(Stata 19.0)
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#8

02 Jun 2017, 03:26

One thing to keep in mind is whether there is actually a selection problem. That is, why is it an issue that a player didn't take part in a game? I wasn't a candidate in the US elections, but that doesn't mean a researcher trying to predict the US election needs to explain why I didn't put myself forward as a candidate. It only becomes an issue once there is some unobserved factor that causes certain people to take part and others not, and that also affects who eventually wins the election.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#9

02 Jun 2017, 03:42

Good point, Jesse!

Kind regards,
Carlo
(Stata 19.0)
Maximilian Bartels

Join Date: May 2017

Posts: 11
#10

04 Jun 2017, 04:59

Thanks for all of your replies! I think heckprobit could be a great method.

But regarding the selection model, I have a problem with an explanatory variable. The dummy variable injured (0/1) predicts failure of game played (0/1, the selection variable) perfectly. So Stata omitted the variable when I tried a logistic regression, and when I use injured (0/1) in heckprobit, the iterations of the selection model don't end (the log likelihood stays the same, "not concave").

Do you know how I can still use injured as an explanatory variable? It would really raise the R2 of the selection model.
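
To make the problem concrete, here is a small sketch on simulated data that reproduces the situation. The names game_played, injured and x1 follow the description above, but the data are made up purely for illustration:

```stata
* Simulated data in which an injured player never plays
clear
set seed 12345
set obs 1000
generate x1 = rnormal()
generate byte injured = runiform() < 0.15
generate byte game_played = (0.5 + 0.5*x1 + rnormal() > 0) & (injured == 0)

* The cross-tabulation shows an empty cell for injured == 1, game_played == 1
tabulate injured game_played

* probit notes that injured predicts failure perfectly and omits it
probit game_played injured x1
```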
Maximilian Bartels

Join Date: May 2017

Posts: 11
#11

08 Jun 2017, 05:46

Does anyone have a solution for me? I would be delighted if you could help me.