Cross classified multilevel model predictions

Santiago Calvo

Join Date: Jul 2021

Posts: 11
#1


27 Jul 2021, 08:27

Hello everybody,

I am fitting a cross-classified multilevel model with the mixed command for a motor racing competition (Formula 1), to evaluate how the points obtained in a given race vary by driver and by team. I think the following specification works for the simple version, without controlling for other factors (if it is wrong, I would appreciate help improving it):

mixed zPoints || Driver: || Team: || TeamYear:, mle variance

My question is how I can then get the residuals for each driver and each team individually, since my goal is to determine who is the best driver ever and which is the best team. My understanding is that the driver and team with the highest residuals have the largest positive effect on the dependent variable.

Thank you very much for your help.
Tags: cross classified, mixed, multilevel
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

27 Jul 2021, 11:55

Well, the syntax you show is not the syntax for a cross-classified model. It is for a nested model, and an odd one at that, because it has Teams nested in Drivers, which seems backwards. So the first thing you need to be clear about is whether you want a cross-classified model (the same driver can participate in multiple teams) or a nested one (a given driver participates in only one team, but each team contains many drivers), and then modify the syntax accordingly.
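A minimal sketch of the two specifications side by side, on simulated data in which drivers switch teams freely. The variable names zPoints, Driver, and Team follow the thread; the numbers of drivers and teams, the effect sizes, and the seed are arbitrary assumptions, and runiformint() requires Stata 14 or later.

```stata
clear
set seed 2021
set obs 3000
* Hypothetical simulated data: 40 drivers, 8 teams, drivers switch teams freely
generate int Driver = runiformint(1, 40)
generate int Team   = runiformint(1, 8)
* True driver and team effects (arbitrary SDs), constant within each group
bysort Driver: generate double u_d = rnormal(0, 0.5) if _n == 1
by Driver: replace u_d = u_d[1]
bysort Team: generate double u_t = rnormal(0, 0.8) if _n == 1
by Team: replace u_t = u_t[1]
generate double zPoints = u_d + u_t + rnormal()

* Nested: Driver nested within Team, so each Team-Driver pair is a separate group
mixed zPoints || Team: || Driver:
scalar conv_nested = e(converged)

* Crossed: Team and Driver at the same level, neither nested in the other
mixed zPoints || _all: R.Team || Driver:
scalar conv_crossed = e(converged)
```

Only the crossed specification matches data in which the same driver appears with several teams; the nested one treats each Team-Driver pairing as its own group.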

Another thing you should consider is whether it really makes sense to have year as a separate level in the model. If you have data extending over several decades, it would. But if you have only, say, 10 years of data, you will not be getting much of a sample of year-space, and any estimate you make of a random effect at that level is going to be very imprecise. You might be better off just including i.year among the fixed effects instead.
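A sketch of entering year as fixed-effect indicators when only a handful of seasons are available, again on hypothetical simulated data. The variable name Year, the 10-season window, and the small linear year shift are assumptions made for illustration.

```stata
clear
set seed 2021
set obs 3000
* Hypothetical simulated data: 40 drivers, 8 teams, only 10 seasons
generate int Driver = runiformint(1, 40)
generate int Team   = runiformint(1, 8)
generate int Year   = runiformint(2011, 2020)
bysort Driver: generate double u_d = rnormal(0, 0.5) if _n == 1
by Driver: replace u_d = u_d[1]
bysort Team: generate double u_t = rnormal(0, 0.8) if _n == 1
by Team: replace u_t = u_t[1]
* Assumed small season-to-season shift in the outcome
generate double zPoints = u_d + u_t + 0.05*(Year - 2015) + rnormal()

* Year enters as fixed-effect indicators instead of as a random-effects level
mixed zPoints i.Year || _all: R.Team || Driver:
```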

Following -mixed-, you can get the random effects at each level using the -predict, reffects- command. See -help mixed postestimation- and click on the blue -predict- link for details of how to use this command.
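A sketch of -predict, reffects- after a two-level nested model, using simulated data in which each hypothetical driver belongs to exactly one team. Without relevel(), predict needs one new variable per random-effects term, ordered from the highest level down; reses() is optional and returns the standard errors of those predictions. The variable names re_team, re_driver, se_team, and se_driver are illustrative.

```stata
clear
set seed 2021
set obs 3000
* Hypothetical simulated nested data: 8 teams with 5 drivers each
generate int Driver = runiformint(1, 40)
generate int Team   = ceil(Driver/5)    // each driver belongs to exactly one team
bysort Team: generate double u_t = rnormal(0, 0.8) if _n == 1
by Team: replace u_t = u_t[1]
bysort Driver: generate double u_d = rnormal(0, 0.5) if _n == 1
by Driver: replace u_d = u_d[1]
generate double zPoints = u_t + u_d + rnormal()

mixed zPoints || Team: || Driver:

* One new variable per level, highest level first (Team, then Driver within Team);
* reses() adds the standard errors of the predicted random effects
predict double re_team re_driver, reffects reses(se_team se_driver)

* Show one row per driver, grouped by team
egen byte first = tag(Team Driver)
sort Team Driver
list Team Driver re_team re_driver if first, noobs sepby(Team)
```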

Last edited by Clyde Schechter; 27 Jul 2021, 11:58.
Santiago Calvo

Join Date: Jul 2021

Posts: 11
#3

27 Jul 2021, 13:10

Thank you very much for your reply.

So how should I specify the model? In this case, a driver belongs to several teams over the sample period (for example, Lewis Hamilton has driven for both McLaren and Mercedes), and a team contains several drivers at the same time (in 2020, Lewis Hamilton and Valtteri Bottas are both Mercedes drivers).

I also account for a year effect for each team because, over the 71 years contained in the sample, the same constructor can vary a lot in its performance (e.g., the 1988 McLaren is much better than the 2015 McLaren).

Thanks again for your help! I've never worked with this kind of model and I'm a bit lost.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

27 Jul 2021, 13:31

OK. So it really is a crossed-effects design, and you do have enough years to warrant having a year-level in the model.

Code:

mixed zPoints || _all: R.Team || Driver: || TeamYear:

1. I am assuming there are fewer teams than drivers. The code will work either way, but it runs faster when the factor with fewer levels (here, Team) goes in the _all: R. part and the factor with more levels (Driver) is the ordinary level.
2. I've removed the -mle- and -variance- options because unless you are using a rather old version of Stata, those are now the defaults with -mixed-.
3. You don't need to create a TeamYear variable for this. TeamYear is going to be nested in the intersection of Team and Driver, so if you just use Year: (assuming you have a variable by that name) as the final random effect, you will get the same results, as it will be understood that Year is nested.
Santiago Calvo

Join Date: Jul 2021

Posts: 11
#5

27 Jul 2021, 13:54

Perfect. Thank you very much for your help.

Regarding the prediction, I understand that if I run the command <predict varname, reffects relevel(Driver)>, I get the predicted performance of each driver according to the dependent variable, the rest of the random part of the model being set to 0. Then, I can generate a ranking and use the list command to display the predicted data for each driver by adding a tag so that each driver is only shown once. No?
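The workflow described above can be sketched as follows, on simulated data with hypothetical drivers and teams. The names re_driver, driver_tag, and driver_rank are illustrative, and the simulated effect sizes are arbitrary assumptions.

```stata
clear
set seed 2021
set obs 3000
* Hypothetical simulated data: 40 drivers, 8 teams, drivers switch teams freely
generate int Driver = runiformint(1, 40)
generate int Team   = runiformint(1, 8)
bysort Driver: generate double u_d = rnormal(0, 0.5) if _n == 1
by Driver: replace u_d = u_d[1]
bysort Team: generate double u_t = rnormal(0, 0.8) if _n == 1
by Team: replace u_t = u_t[1]
generate double zPoints = u_d + u_t + rnormal()

mixed zPoints || _all: R.Team || Driver:

* Driver-level predicted random effects; the team effect is not added in
predict double re_driver, reffects relevel(Driver)

* Tag one observation per driver and rank drivers (1 = largest effect)
egen byte driver_tag = tag(Driver)
egen driver_rank = rank(-re_driver) if driver_tag, unique
sort driver_rank
list Driver re_driver driver_rank if driver_rank <= 10, noobs
```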

Finally, is there a quick option to get the model results for each year individually? I would like to see if the team and driver variation has evolved over the time sample (my intuition leads me to think that the influence of the drivers has reduced in recent years).
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

27 Jul 2021, 14:42

Regarding the prediction, I understand that if I run the command <predict varname, reffects relevel(Driver)>, I get the predicted performance of each driver according to the dependent variable, the rest of the random part of the model being set to 0. Then, I can generate a ranking and use the list command to display the predicted data for each driver by adding a tag so that each driver is only shown once. No?

Correct.

Finally, is there a quick option to get the model results for each year individually? I would like to see if the team and driver variation has evolved over the time sample (my intuition leads me to think that the influence of the drivers has reduced in recent years).

Not that I'm aware of.
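There is no built-in option, but one possible manual workaround (not proposed in the thread) is to refit the crossed model on successive windows of seasons, here decades, and compare the estimated variance components across windows. The sketch uses simulated data spanning 1950-2019; the window width and the variable names Year and decade are assumptions. It reads the variances from e(b), where -mixed- stores log standard deviations; the equation names lns1_1_1 (the _all: R.Team level) and lns2_1_1 (the Driver level) are assumed for this specification, so confirm them with -matrix list e(b)- before relying on them. With real data, short windows will give noisy estimates, especially for drivers with short careers.

```stata
clear
set seed 2021
set obs 7000
* Hypothetical simulated data: 40 drivers, 8 teams, seasons 1950-2019
generate int Driver = runiformint(1, 40)
generate int Team   = runiformint(1, 8)
generate int Year   = runiformint(1950, 2019)
bysort Driver: generate double u_d = rnormal(0, 0.5) if _n == 1
by Driver: replace u_d = u_d[1]
bysort Team: generate double u_t = rnormal(0, 0.8) if _n == 1
by Team: replace u_t = u_t[1]
generate double zPoints = u_d + u_t + rnormal()

* Decade of each race (1950, 1960, ..., 2010)
generate int decade = 10*floor(Year/10)

* Refit decade by decade; variance = exp(2 * log SD) for each component
matrix V = J(7, 3, .)
matrix colnames V = decade var_team var_driver
local i = 0
forvalues d = 1950(10)2010 {
    local ++i
    quietly mixed zPoints if decade == `d' || _all: R.Team || Driver:
    matrix V[`i', 1] = `d'
    matrix V[`i', 2] = exp(2*_b[lns1_1_1:_cons])
    matrix V[`i', 3] = exp(2*_b[lns2_1_1:_cons])
}
matrix list V
```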
Santiago Calvo

Join Date: Jul 2021

Posts: 11
#7

28 Jul 2021, 03:55

Hello,

I still have doubts about the specification you suggested for running a cross-classified multilevel model. According to you, it is necessary to put the team (_all: R.Team) at a higher level, but both drivers and TeamYear also nest the other variables in a multilevel way. I have read that in such cases the ideal approach would be Markov chain Monte Carlo. How could I do this in Stata?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#8

28 Jul 2021, 10:35

The notation -|| _all: R.Team || Driver: || TeamYear:- is the way Stata handles crossed random effects. For a full explanation, I suggest you read the -mixed- chapter in the PDF manuals that come with your Stata installation: at the beginning of the chapter, click on the link for Remarks and Examples, and then on that page click on the link for crossed-effects models. This notation does not put Team at a higher level; Team and Driver are crossed with each other and sit at the same level.
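To obtain team-level effects directly, one option is to swap the roles of the two factors: put drivers in the _all: R. part and make Team an ordinary level, so that -predict, reffects relevel(Team)- works just as relevel(Driver) does. Both versions describe the same crossed model, so their log likelihoods should agree up to convergence tolerance; the swapped version is slower because the factor with more levels sits in the R. part. The data are simulated and hypothetical, and names such as re_team and team_tag are illustrative.

```stata
clear
set seed 2021
set obs 3000
* Hypothetical simulated data: 40 drivers, 8 teams, drivers switch teams freely
generate int Driver = runiformint(1, 40)
generate int Team   = runiformint(1, 8)
bysort Driver: generate double u_d = rnormal(0, 0.5) if _n == 1
by Driver: replace u_d = u_d[1]
bysort Team: generate double u_t = rnormal(0, 0.8) if _n == 1
by Team: replace u_t = u_t[1]
generate double zPoints = u_d + u_t + rnormal()

* Teams in the _all: R. part, drivers as the ordinary level
mixed zPoints || _all: R.Team || Driver:
scalar ll_team_R = e(ll)

* Same crossed model with the roles swapped, so Team becomes an ordinary level
mixed zPoints || _all: R.Driver || Team:
scalar ll_driver_R = e(ll)

* Team-level predicted random effects, one row per team, largest first
predict double re_team, reffects relevel(Team)
egen byte team_tag = tag(Team)
gsort -team_tag -re_team
list Team re_team in 1/8, noobs
```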

I do not understand what you mean when you say that "both drivers and TeamYear also nest the other variables in a multilevel way." You haven't even mentioned any other variables in this model (except for the outcome variable). If you are talking about the relationships among Team, Driver, and TeamYear, suffice it to say that TeamYear is necessarily nested inside Team and cannot nest Team or Driver.

Markov chain Monte Carlo is not a model: it is a method for estimating model parameters and is usable with a broad array of models. I am not aware of it being any more or less useful than maximum likelihood estimation for crossed-effects multilevel models. In any case, if you want to use it, you are moving into Bayesian statistics. I have done very little of this kind of work and don't feel qualified to advise you on the details. Suffice it to say that if you are working with a large data set, which it sounds like you are, the likelihood is going to overwhelm the prior, and your results will differ little from what you would get with maximum likelihood estimation. So unless there is some special reason for going Bayesian (for example, the model has unidentifiable parameters and you can use the prior to identify them), I don't see what you would gain in return for the huge increase in computational intensity you would be superimposing on an already computationally intense calculation.
