Creating a panel data identifier

Ellinor Hjelvik

Join Date: Sep 2014

Posts: 10
#1

Creating a panel data identifier

08 Dec 2015, 08:54

I have pooled data from a survey that is done twice.

Some of the observations come from the same firm, but the firm was given different ids over the two years. However, I have an additional variable that signalizes a unique firm. I would use the group command, but I have a lot of missing variables and some numbers can reoccur in the second variable. The latter can be separated by using observations of country and year.
Obs Firm Panel Country Year

1 258 2 USA 1

2 658 3 USA 1

3 599 3 USA 2

4 245 . USA 1

5 689 3 Canada 1

6 789 3 Canada 2

7 359 58 Canada 1

8 385 . Canada 2

As you can see, by having the same number in the “panel” variable, observation 2 and 3 (+5 and 6) represent the same firm.

Any suggestions as to how I can convert this into (unbalanced) panel data?

Last edited by Ellinor Hjelvik; 08 Dec 2015, 09:14.
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30061
#2

08 Dec 2015, 09:25

Are you saying that observations 2, 3, 5, and 6 are all one firm (which operates in both the USA and Canada), or are you saying that observations 2 & 3 are one firm, and 5& 6 are another, both being designated by panel == 3, but distinguished by the country variable?
Comment
Ellinor Hjelvik

Join Date: Sep 2014

Posts: 10
#3

08 Dec 2015, 09:36

The latter! Observations 2 and 3 are one firm, and 5 and 6 are another. The country variable distinguishes them.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30061

08 Dec 2015, 09:55

OK. I've made some assumptions: Country is never missing, and if there were more than one observation showing the same value of Firm, Country and Year, they would all show the same value of Panel. (If that isn't true, then it doesn't seem like Panel is helpful in identifying firms.) I also assume that if Panel is missing, then we have to presume that Firm & Country combined distinctly identify each firm, as there doesn't seem to be anything else to go on. So try this:

Code:

clear*
input Obs    int Firm    int Panel    str6 Country    int Year
1    258    2    USA    1
2    658    3    USA    1
3    599    3    USA    2
4    245    .    USA    1
5    689    3    Canada    1
6    789    3    Canada    2
7    359    58    Canada    1
8    385    .    Canada    2
end

list, noobs clean

//    VERIFY COUNTRY IS NEVER MISSING
assert !missing(Country)

//    VERIFY Panel IS CONSISTENT WITHIN FIRM COUNTRY YEAR
//    (IF MORE THAN ONE SUCH OBSERVATION--THIS DATA DOESN'T HAVE ANY SUCH)
by Firm Country Year (Panel), sort: assert Panel[1] == Panel[_N]

//    GENERATE AN ID BASED ON PANEL, COUNTRY
egen id = group(Panel Country)
//    ELIMINATE ID BASED ON MISSING VALUE OF PANEL
replace id = . if missing(Panel)

//    FOR REMAINING FIRMS, GENERATE NEW ID BASED ONLY ON FIRM
summ id, meanonly // DESPITE NAME, ALSO CALCULATES MIN & MAX
local id_start = r(max)

by id Firm Country, sort: gen id2 = (_n == 1) if missing(id)
replace id2 = sum(id2)
replace id = id2 + `id_start' if missing(id)

list, noobs clean

Comment

Ellinor Hjelvik

Join Date: Sep 2014

Posts: 10
#5

08 Dec 2015, 10:30

Thank you Sir! It worked ... and I got the logic! I was just missing (a lot of) commands.

None of the observations show the same value of Firm, Country and Year so that would not have been a problem.

Thank you again for your help, and have a nice day!
Comment

Announcement