Merging two panel data sets with different time period

Maria Sung

Join Date: Aug 2024
Posts: 3

Merging two panel data sets with different time period

13 Aug 2024, 21:48

Hello!
I have two panel data sets which i have to merge, but they have different time period (first one is from 2002-2016, the second one is from 2017-2022) . I could append them, but list of companies in first data set and the second one are different. For example in first one i have companies: A, B, C, D, F. But in second one: A, B, D. So i need to merge them and keep only those companies which are present in both data sets.

I have common variables in both data sets: company number, registretion number and kis. I cannot use Company name variable because company could change its name during 20 years. So when i try to merge data sets using those vars stata return an error:

merge 1:1 company_number registretion_number kis
variables company_number, registretion_number, kis do not uniquely identify observations in the master data r(459)

I also tried to merge 1:m and m:1, but neither of them worked.

I would really appreciate your help.

Below are the examples of data sets:
Firt data set:

company_number	registretion_number	kis	company_name	Year
1248138892	1.24311e+12	000701	A	2007
1248138892	1.24311e+12	000701	A	2010
1248138892	1.24311e+12	000701	A	2012
2118134787	1.10114e+12	001872	B	2002
2118134787	1.10114e+12	001872	B	2003
2118134787	1.10114e+12	001872	B	2004
2118134787	1.10114e+12	001872	B	2005
2118139521	1.10111e+12	002267	C	2003
2118139521	1.10111e+12	002267	C	2005

Second data set:

company_number	registretion_number	kis	company_name	Year
1248138892	1.24311e+12	000701	A	2017
1248138892	1.24311e+12	000701	A_i	2018
1248138892	1.24311e+12	000701	A_i	2019
1248138892	1.24311e+12	000701	A_i	2020
1248138892	1.24311e+12	000701	A_i	2022
2118139521	1.10111e+12	002267	C	2019
2118139521	1.10111e+12	002267	C	2020
2118139521	1.10111e+12	002267	C	2021
6088106068	1.90111e+12	003456	D	2020

Tags: None

Andreas Backhaus

Join Date: Mar 2023

Posts: 21
#2

14 Aug 2024, 02:15

Why don't you append them and then calculate the number of non-missing observations by company_number to identify the companies that are present in both datasets?
Comment
Maria Sung

Join Date: Aug 2024

Posts: 3
#3

14 Aug 2024, 02:30

I don't exactly undestand what you mean by "calculate the number of non-missing observations by company_number". But if i undestood it right, there are 5000 observations in one data set and 4800 in another one. And non of them has missing observations in company_number
Comment
Andreas Backhaus

Join Date: Mar 2023

Posts: 21
#4

14 Aug 2024, 03:33

A company that is in both datasets should have 21 observations of the variable Year after appending. So in the appended dataset you could run:

Code:

egen noyears=count(Year), by(company_number)

and then drop the companies which have noyears<21.
Comment
Maria Sung

Join Date: Aug 2024

Posts: 3
#5

14 Aug 2024, 22:46

It could work, but it is not necessary that company has to have all years ( from 2002 to 2022). For example, company A has observations for 2007, 2010 and 2012 and in another data set from 2017 - 2020 and 2022, so in total 8 years. Even if there less than 21 years that company is present in both datasets so i have to keep it.
Comment
Andreas Backhaus

Join Date: Mar 2023

Posts: 21
#6

15 Aug 2024, 01:00

You can make use of the gen() option of the append command: append using ..., gen(newwave)
Then run: egen newobs=total(newwave), by(company_number)
If newobs is larger than zero for a given company_number, then new observations have been added to this company.
Comment

Announcement

Merging two panel data sets with different time period

Comment

Comment

Comment

Comment

Comment