Stata question 1: this is too vague to answer. What does "over the years" mean? Do you mean that they stay on the same team from the beginning to the end of the data on them? That they stay on some team for more than one year? On all teams they are ever on for more than one year? Or for some period other than just more than one year? If you can make this more precise, I'm sure there's a way to code it.
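Purely as an illustration of how one of those readings could be coded, here is a sketch for the first one (a player is on a single team throughout the data). The variable names player and team are my assumptions, not taken from your data, and a different reading would need different code.

```stata
* Assumed (hypothetical) variables: player = player identifier, team = team identifier
* Sort team within player; if the first and last sorted values match, the team never changes
bysort player (team): gen byte one_team = team[1] == team[_N]

* Tag one observation per player so that players, not observations, are counted
egen byte tag = tag(player)
tabulate one_team if tag
```

Missing values of team would need separate handling before relying on this.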
Stata question 2: Perhaps somebody else who knows this command can help. In any case, though, you should show the commands leading up to it: -estat- is a post-estimation command, and sometimes how it works depends a lot on the original estimation command. The problem may be there.
Stata question 3: First there is the substantive question: what is a meaningful region in your context? Perhaps that is pre-defined by the leagues of the sport you are studying. Or perhaps there is some other economic basis for deciding which teams belong to which region. Anyway, that's not a Stata issue. But let me assume you've resolved that issue. The easiest way to do this is to create a new file with just two variables, team name and region, with one observation for each team. Then, with that file in memory, you can -merge 1:m- it with your original file on team name, and that will assign the region to each observation for that team. When creating the region variable, you can create it as a string and then use the -encode- command to make a numeric variable out of it. If you are not familiar with -merge- and -encode-, do read the corresponding manual sections. -encode- is quite simple to learn and use. (The hard part is learning when it is appropriate to use!) -merge- is a bit more complicated, but the manual section is quite clear and has good examples.
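A minimal sketch of those steps follows. Everything named here is a placeholder I made up for illustration: the file names team_regions and original_data, the variable names team, region_str and region, and the team and region values themselves.

```stata
* Step 1: build the lookup file, one observation per team (placeholder values)
clear
input str20 team str10 region_str
"Team A" "East"
"Team B" "West"
"Team C" "East"
end

* Turn the string region into a labeled numeric variable
encode region_str, generate(region)
save team_regions, replace

* Step 2: with the lookup in memory, merge it onto the original data by team
use team_regions, clear
merge 1:m team using original_data

* Check that every observation found a region before dropping _merge
tabulate _merge
```

Equivalently, you could start from the original data and run -merge m:1 team using team_regions-; the result differs only in which file's sort order and variable order you end up with.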
Let me make a general comment here regarding your econ questions, though not specifically answering any of them. It seems you have inherently multi-level data. You have repeated observations nested within players who are nested within teams which are nested in regions which are nested in countries! I recognize that mixed-effects multi-level models are viewed somewhat skeptically in economics and there is a strong preference for fixed-effects estimators. And I understand that fixed-effects estimators offer unsurpassed advantages in controlling for omitted variable bias regarding any time-invariant attributes. And I understand that they provide consistent estimates, where random-effects may or may not. But omitted variable bias isn't the only problem one faces in data analysis, and often one can come very close to dealing with it in other ways that are not as constricting as fixed-effects models. And consistent estimates from a mis-specified model are not necessarily better than inconsistent estimates from a properly-specified model that also accounts for more sources of variation. In a data set with this many levels, I think it is likely that a mixed-effects multi-level model is the way to go. Then you don't have the dilemma of "which level should I cluster at" (a question that is asked because it is recognized that all of the possible answers are in some way wrong). You "cluster" at all relevant levels. Omitted variable bias can be partly, and often nearly completely, dealt with by including relevant covariates. With only two countries, I would not include a country level in the model: I would just add an indicator variable for Canada vs US. But I would include all of the other levels of nesting in the model. Think about it.
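To make that concrete, a model of this kind might be sketched as below. All variable names are hypothetical: y is the outcome, x1 and x2 stand in for whatever covariates are relevant, canada is a 0/1 indicator, and region, team and player are numeric identifiers. This shows only random intercepts at each level; it is a starting point, not a recommended final specification.

```stata
* Hypothetical variables: y, x1, x2, canada (0/1), region, team, player
* Levels are listed from highest to lowest; country enters only as a fixed indicator
mixed y x1 x2 i.canada || region: || team: || player:
```

Note that this syntax treats players as strictly nested within teams. If players move between teams in your data, that assumption does not hold as written and the specification would need adjusting.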