Panel data: Multiple observations per year

Johanne Laegsgaard

Join Date: Mar 2020
Posts: 2

23 Mar 2020, 02:04

Hi,

I'm new to Stata and its commands, so please excuse my inexperience. My dataset contains multiple observations per ID and year. In simplified form, it looks like this (with made-up numbers):

ID	Time	ROE	Turnover	Age	Board_gender
1	2012	25,4	1234	43	M
1	2012			53	M
1	2012			34	F
1	2013	24,1	3402	45	M
1	2013			54	F
1	2013			43	F
1	2013			44	F
1	2013			34	M
1	2014	33,1	3500	63	M
1	2014			52	F
1	2015	32,2	3478	41	M
1	2015			38	M
1	2015			57	M
1	2015			42	F
2	2012	24,5	4350	36	F
2	2012			61	M
2	2013	33,4	4590	43	M
2	2013			45	M
2	2013			51	M

...And so on. I have more than 5,000 IDs.

I really want to run pooled OLS, FE, IV, etc., but my data are highly unbalanced and I get "repeated time values within panel".
My outcomes of interest are ROE and Turnover, and I want to know the effect of Board_gender and Age on them. Do I have to replace the Board_gender values with a single number per year, so that I only have one observation per year (and the same for Age)? Or can I apply Stata tricks without deleting rows?

Thank you so much in advance!

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17702
#2

23 Mar 2020, 02:10

Johanne:
welcome to this forum.
If you have repeated time values within panel, you can simply -xtset- your data with -panelid- only:

Code:

xtset panelid

This trick comes at the cost of Stata not supporting time-series related commands, such as lags and leads, that you might be interested in.
Besides:
- Stata can handle both balanced and unbalanced panel datasets without any problem;
- please be advised that OLS, FE, and IV are pretty different beasts and hardly interchangeable.
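
For the layout in the first post, the diagnosis and the panel-only declaration might look like the sketch below. It assumes the variable names ID and Time from that post and that ID is numeric; nothing here has been run on the actual data.

Code:

* Sketch only: ID and Time are the variable names shown in the first post.
duplicates report ID Time        // tabulates how many observations share an ID-Time pair
capture noisily xtset ID Time    // expected to fail: repeated time values within panel
xtset ID                         // declares the panel identifier only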

Kind regards,
Carlo
(Stata 19.0)
Johanne Laegsgaard

Join Date: Mar 2020

Posts: 2
#3

23 Mar 2020, 02:57

Thank you! And thank you for your quick response!

Won't I miss the whole point of panel data if I don't include my time variable, or can Stata handle this? :-)

(I do not need time-series commands, but I would like to explore the evolution over the years and compare the results. If I drop the time variable, won't my data get mixed up, even though they are still sorted by ID?)

Kind regards,
Johanne

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17702
#4

23 Mar 2020, 03:51

Johanne:
not quite.
See the foreword of the Remarks and examples section, -xtset- entry, Stata .pdf manual (pages 504-505).
That said, a toy example might be helpful:

Code:

. use "https://www.stata-press.com/data/r16/nlswork.dta"
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtset idcode
       panel variable:  idcode (unbalanced)

. xtreg ln_wage age, fe

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.0877                                         avg =        6.1
     overall = 0.0774                                         max =         15

                                                F(1,23799)        =    2720.20
corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0181349   .0003477    52.16   0.000     .0174534    .0188164
       _cons |   1.148214   .0102579   111.93   0.000     1.128107     1.16832
-------------+----------------------------------------------------------------
     sigma_u |  .40635023
     sigma_e |  .30349389
         rho |  .64192015   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4709, 23799) = 8.81                 Prob > F = 0.0000

. xtset idcode year
       panel variable:  idcode (unbalanced)
        time variable:  year, 68 to 88, but with gaps
                delta:  1 unit

. xtreg ln_wage age, fe

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.0877                                         avg =        6.1
     overall = 0.0774                                         max =         15

                                                F(1,23799)        =    2720.20
corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0181349   .0003477    52.16   0.000     .0174534    .0188164
       _cons |   1.148214   .0102579   111.93   0.000     1.128107     1.16832
-------------+----------------------------------------------------------------
     sigma_u |  .40635023
     sigma_e |  .30349389
         rho |  .64192015   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4709, 23799) = 8.81                 Prob > F = 0.0000

. help xtset

.
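
On the wish to follow the evolution over the years: leaving year out of -xtset- does not remove it from the dataset, so it can still enter the model as a regressor. A minimal sketch on the same nlswork data, using -tenure- as the regressor purely for illustration:

Code:

* Sketch: year stays available as a variable even when -xtset- names only the panel id.
use "https://www.stata-press.com/data/r16/nlswork.dta", clear
xtset idcode
xtreg ln_wage tenure i.year, fe     // year dummies enter as ordinary regressors
testparm i.year                     // joint test of the year dummies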

Kind regards,
Carlo
(Stata 19.0)

Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#5

24 Mar 2020, 12:35

Since I work with this kind of data, let me add a little bit to Carlo's helpful comments.

You have multiple observations in a given year because multiple executives are listed for a given company in that year.

However, your dependent variable is almost certainly at the firm level. It would make most sense to collapse your data to the firm level. I don't think you can neatly collapse string data, but that's not much of a problem.

If you need to count the number of males and females for each firm-year, you can do this with -bysort firm year:- and the appropriate -egen- function.
bysort firm year: egen numfem = total(Board_gender == "F")
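
Collapsing to one row per firm-year, as suggested above, could be sketched as follows for a do-file. It assumes the variable names from the first post (ID, Time, ROE, Turnover, Age, Board_gender), that ID, ROE and Turnover are numeric, and that Board_gender is a string coded "M"/"F"; the new variable names are made up for illustration.

Code:

* Sketch under the assumptions stated above; not run on the actual data.
generate byte female = (Board_gender == "F") if !missing(Board_gender)
collapse (firstnm) ROE Turnover ///
         (mean) mean_age = Age share_female = female ///
         (sum) num_female = female ///
         (count) board_size = Age, by(ID Time)
xtset ID Time                        // now one observation per ID-Time pair
xtreg ROE share_female mean_age i.Time, fe vce(cluster ID)

Taking the first nonmissing ROE and Turnover per firm-year matches the layout in the first post, where those values appear only on the first row of each year.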