Declaring panel dataset for fixed effects or mixed models regression: addressing repeated time values within panel properly

Suryadipta Roy

Join Date: Nov 2014

Posts: 17
#1


05 Feb 2016, 10:30

Dear Statalisters,
I am working with survey data on a cross-section of firms belonging to different industries in different countries over a number of years. Thus while the firms are unique, the countries and the industries get repeated over time. I have included a small sample of the dataset (properly labelled) with this post. I found declaring the panel nature of the data challenging. For example,

Code:

xtset industryid year

or

Code:

xtset countryid year

was understandably flagged as having repeated time values within panel. Following some previous queries on the list related to my question, I therefore created a unique identifier and can now declare the time series nature of the data, e.g.

Code:

xtset newvar1 year

works fine. I was wondering if any member could point out the consequences of this declaration for the purposes of (country & industry) fixed effects regression. Essentially the above declaration is the same as declaring

Code:

xtset firmid year

since

Code:

xtsum

shows no within variation for either the firm-level (c205a) or the country-level variable (realgdp).

Any observations or comments & suggestions to help me to understand the data better for proper econometric analysis will be greatly appreciated.

Thank you,
Suryadipta.
Attached Files

sample.dta (13.3 KB, 1 view)
Clyde Schechter

Join Date: Apr 2014

Posts: 29964
#2

05 Feb 2016, 11:23

Well, it seems to me that the natural panel structure of your data is firmid year, if your sample is representative of your entire data set. countryid and industryid are both constant for all observations of any given firmid. So in a fixed-effects model, those two variables will be collinear with the firm effect and will be dropped anyway. You cannot estimate their effects in any fixed effects model that also includes firmid.
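A minimal sketch of how the "constant within firmid" claim could be checked on the full dataset, using the variable names from the thread (firmid, countryid, industryid). Each command stops with an error if some firm has more than one value of the variable.

```stata
* Sketch only: within each firm, sort by the variable and compare the
* first and last values; assert fails if any firm changes country or industry.
bysort firmid (countryid): assert countryid[1] == countryid[_N]
bysort firmid (industryid): assert industryid[1] == industryid[_N]
```

If each firm appears only once, both checks pass trivially, which is itself informative about the design.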

Also, I see no obstacle to -xtset firmid year- in your data. There are no duplicate observations of year within firmid. So I would proceed with -xtset firmid year- and run your analyses accordingly. You will not be able to estimate country and industry effects. If those are your primary interest, then you need to either omit firm effects (which seems a dubious approach unless the firm effects are very small) or go to a random-effects model with firms nested in industry and country (the latter two being crossed with each other, though sparsely populated).
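A hedged sketch of the two routes described above. The names depvar and x1 are hypothetical placeholders for the firm-level outcome and a regressor; they do not come from the posted data. The random-effects specification shown uses crossed random intercepts for country and industry and is only one possible way to set it up.

```stata
* isid exits with an error if firmid and year do not uniquely
* identify the observations, i.e. if there are repeated time values.
isid firmid year
xtset firmid year

* Hypothetical names: depvar (firm-level outcome), x1 (a regressor).
* Crossed random intercepts for country and industry, with year dummies.
mixed depvar x1 i.year || _all: R.countryid || _all: R.industryid
```

No firm-level random intercept is included here: if every firm is observed only once, it could not be separated from the residual. With genuine repeated observations per firm, a firm level would need to be added.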

Finally, I do see one potential problem in your data. The variable realgdp is constant for all observations of a given country, even over different years. That doesn't seem plausible to me.
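One way to check this on the full dataset, as a sketch: the variables gdp_varies and c_tag are hypothetical helper names introduced only for this illustration.

```stata
* Flag countries whose realgdp takes more than one value.
* Missing values sort last, so a country with some missing realgdp
* is also flagged as varying.
bysort countryid (realgdp): generate byte gdp_varies = realgdp[1] != realgdp[_N]

* Tag one observation per country so that each country is counted once.
egen byte c_tag = tag(countryid)
tabulate gdp_varies if c_tag
```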
Suryadipta Roy

Join Date: Nov 2014

Posts: 17
#3

05 Feb 2016, 11:46

Dear Dr. Schechter,
Thank you very much for your comments! The firmid-s are indeed unique observations and these are different firms surveyed over different years in different countries. So, I will go ahead with

Code:

xtset firmid year

as you have advised me to do. Actually, I do intend to investigate the effect of an industry-level variable and some country-level variables on my dependent variable, and the latter is a firm-level variable. Thus as you have suggested, I will not be able to use country- or industry-fixed effects methods. I will probably turn towards correlated random effects or other mixed effects techniques.

As regards the variable realgdp, I sorted the sample that I posted in the list (the first 100 observations from my dataset) with

Code:

sort countryid year

and found that it does not have data for any country for more than one year. I believe that is why the realgdp measure appears to be constant for all observations for a given country. Thank you very much once again!
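For anyone wanting to repeat this check without inspecting the sorted data by eye, a small sketch follows. The helper variables cy_tag, nyears and c_tag are hypothetical names introduced for the illustration.

```stata
* Tag one observation per country-year combination.
egen byte cy_tag = tag(countryid year)

* Number of distinct years in which each country appears.
egen nyears = total(cy_tag), by(countryid)

* Tabulate over countries rather than over firms.
egen byte c_tag = tag(countryid)
tabulate nyears if c_tag
```

If every country shows nyears equal to 1, the sample contains no within-country variation over time.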

Warm regards,
Suryadipta.
Dick Campbell

Join Date: Apr 2014

Posts: 279
#4

05 Feb 2016, 13:23

I don't see your design in quite the same way that Clyde does. If I understand your design, you have a repeated cross-sectional sample of firms within year. Presumably, a given firm appears in one and only one year. However, your sample design is such that you sample those firms within a fixed classification of countries and industries which does not vary from year to year. I would guess that both countries and industries are not exhaustive, that is, they are samples of the population of countries and industries defined in some way, perhaps arbitrarily. Thus, if I understand your description of your design, the countries and industries do not vary from year to year, but the cases within each cell of the cross-classification do, hence the term "repeated cross section." If I am correct, there is no reason why you can't model an outcome as a function of year, country and industry. As Clyde notes, given the design, there is no way to estimate a "firm effect." You can, in various ways, adjust your analysis for clustering on country and/or firm.
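A minimal sketch of what such a repeated cross-section model might look like, assuming hypothetical names depvar (the firm-level outcome) and x1 (a regressor); neither is taken from the posted data. Clustering on country is shown as one of the possible adjustments, not as a recommendation.

```stata
* Hypothetical names: depvar (firm-level outcome), x1 (a regressor).
* Pooled regression with year, country and industry indicators,
* and standard errors clustered on country.
regress depvar x1 i.year i.countryid i.industryid, vce(cluster countryid)
```

Note that any country-level variable that does not change over the sample years is collinear with the country indicators and will be dropped from a model like this.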

Richard T. Campbell
Emeritus Professor of Biostatistics and Sociology
University of Illinois at Chicago
Clyde Schechter

Join Date: Apr 2014

Posts: 29964
#5

05 Feb 2016, 14:54

The explanation and data sample provided by Suryadipta are indeed consistent with Dick's interpretation of his design as well. I think Suryadipta himself needs to understand which it is and then proceed accordingly.
Suryadipta Roy

Join Date: Nov 2014

Posts: 17
#6

05 Feb 2016, 17:33

Dear Dr. Campbell and Dr. Schechter,
Thank you very much for your insightful comments on my post! I believe that the sample from my dataset that I had posted before was not representative. Thus I have attached a random sample of 100 observations (i.e. firms) with this post. The firms are entirely unique, i.e. no single firm seems to have been surveyed twice based on the firmid variable. However, the countries and the industries do recur across years: countryid = 13 appears in two different years (2002 and 2003), and industryid = 2 appears in three different years (2001, 2002, and 2004). Thus the countries and industries do vary from year to year. The data cover 5 years, i.e. 2001 - 2005. Based on this design, I was wondering if you could shed any light on the appropriate mode of analysis.
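The pattern described above can be seen directly with two cross-tabulations, sketched here with the variable names from the thread:

```stata
* Number of firms surveyed in each country-year and industry-year cell.
* Empty cells show the years in which a country or industry was not sampled.
tabulate countryid year
tabulate industryid year
```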

Best regards,
Suryadipta.
Suryadipta Roy

Join Date: Nov 2014

Posts: 17
#7

05 Feb 2016, 17:34

I forgot to attach the new data sample in my previous post. Sorry!

Suryadipta.
Attached Files

sample.dta (11.5 KB, 2 views)