Hi all,
I use unbalanced panel data and used the following link to identify 3 consecutive observations over years for each firm: http://www.stata.com/support/faqs/da...-observations/
The commands I use:
gen run=.
by id: replace run = cond(L.run == ., 1, L.run + 1)
by id: egen maxrun = max(run)
I also give a simple example in the table below to describe the problem.
"run" counts consecutive positive values of Var 1 in the table, and maxrun takes the highest value of "run" for each firm. For instance, for firm D1, "run" counts from 2000 to 2002 and, after the missing value in 2003, starts counting again from 2004 to 2005. maxrun then takes the highest value of "run", which is 3 for firm D1. The problem is that maxrun holds the same constant, 3, across firm D1's observations from 2000 to 2005. So when I condition on maxrun in my regression, it uses not just the 3 consecutive observations from 2000-2002 (0.04, 0.03, 0.06) but also the 2 consecutive observations from 2004-2005 (0.07 and 0.02), since maxrun is also 3 for those observations. I need only the runs of 3 consecutive observations, not the run of 2.
I would really appreciate it if anyone could help me identify the firms with just 3 consecutive observations.
Thanks in advance
YEAR | Firm ID | Var 1 | run | maxrun |
2000 | D1 | 0.04 | 1 | 3 |
2001 | D1 | 0.03 | 2 | 3 |
2002 | D1 | 0.06 | 3 | 3 |
2003 | D1 | . | . | . |
2004 | D1 | 0.07 | 1 | 3 |
2005 | D1 | 0.02 | 2 | 3 |
2000 | E1 | | | |
2001 | E1 | | | |
2002 | E1 | | | |
2003 | E1 | | | |
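One possible way around the problem described above is to measure the length of each run separately, rather than taking one maximum per firm. The following is only a sketch under assumptions: the variable names id (a numeric firm identifier), year and var1, and the new variables spell, spell_len and in_run3, are hypothetical and not taken from my actual data. The "if !missing(var1)" qualifier is also an assumption about how "run" ends up missing in 2003.

```stata
* Assumed names: id (numeric firm identifier), year, var1.
xtset id year

* Position within the current run; stays missing where var1 is missing.
gen run = .
bysort id (year): replace run = cond(L.run == ., 1, L.run + 1) if !missing(var1)

* Number the runs within each firm: a new run starts wherever run == 1.
bysort id (year): gen spell = sum(run == 1) if !missing(run)

* Length of the run that each observation belongs to.
bysort id spell: egen spell_len = max(run)

* Flag observations that belong to a run of exactly 3.
* (Use spell_len >= 3 & !missing(spell_len) for "at least 3".)
gen byte in_run3 = spell_len == 3
```

For firm D1 in the table, this should give spell_len = 3 for 2000-2002 and spell_len = 2 for 2004-2005, so in_run3 is 1 only for 2000-2002. The flag can then be used in an if qualifier on the regression instead of maxrun.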