Merging two Datasets while one dataset has many individuals in many firms and the using dataset has firmdata

Jens Huang

Join Date: Apr 2017
Posts: 5

Merging two Datasets while one dataset has many individuals in many firms and the using dataset has firmdata

23 Jul 2018, 10:04

Dear stata community,

I have two datasets I want to merge: master data contains many individuals clustered in firms on the year level and the other one has simple month yearly firmlevel data. The problem is, the master data does contain one month for each director ID, so it is not a one to one match, we want to "fill
-in" the other months for every director with the using dataset. Therefore my Datasets look as follows:

Individual Dataset (master):

CompnayID	year	month	DirectorID	other variables
1	2000	3	1
1	2000	3	2
1	2000	4	3
1	2000	6	4
2	2000	3	5
2	2000	3	6
2	2000	3	7
...	...	...	...	...

Firmlevel dataset (using):

Compnay ID	year	month	other variables
1	2000	1
1	2000	2
1	2000	3
1	2000	4
1	2000	5
1	2000	6
1	2000	7
...	...	...	...

In the End I want to achive a dataset which looks something like this:

CompanyID	DirectorID	year	month	other variables from both datasets
1	1	2000	1
1	1	2000	2
1	1	2000	3
1	1	2000	4
1	1	2000	5
1	1	2000	6
...	...	...	...	...
1	2	2000	1
1	2	2000	2
1	2	2000	3
..	..	..	...	...

Tags: None

Sarah Edgington

Join Date: Apr 2014

Posts: 284
#2

23 Jul 2018, 11:16

I think you probably want to start by using the expand command on your master data and filling in the months. Then you can do a m:1 merge linking on company year and month.

This might look something like this:

Code:

expand 12 bysort companyid year directorid: replace month=_n merge m:1 companyid year month using ....

This requires that in your firm level data companyid year and month combined are a unique identifier. If you ever have multiple observations per month for a company it won't work.

Likewise if you ever have more than one observation per year for a given director it's going to be more complicated and what I've suggested won't work as written.
Comment

Christoph Halbmeier

Join Date: Jul 2018
Posts: 4

23 Jul 2018, 11:27

Hello Jens,

I am assuming in my answer that DirectorID is unique across firms. The solution also works if there is more than one month per year and DiretorID in the master.dta.

You could first create a "skeleton" dataset and merge the individual dataset (master) and the firm-level data.

Code:

use master.dta, clear

* Keep one random observation per DirectorID and year
keep DirectorID year
bysort DirectorID year: keep if  _n==_N

* Generate 12 observations per DirectorID and year
expand 12
bysort DirectorID year: gen month = _n

merge 1:1 year month DirectorID using master.dta, keep(master match)    // merge individual data
merge m:1 year month CompanyID using using.dta, keep(master match)   // merge firm-level data

If, on the other hand, the DirectorID is not unique across firms, you can change the code as follows:

Code:

use master.dta, clear

* Keep one random observation per CompanyID, DirectorID, and year
keep CompanyID DirectorID year
bysort CompanyID DirectorID year: keep if  _n==_N 

* Generate 12 observations per CompanyID, DirectorID, and year
expand 12
bysort CompanyID DirectorID year: gen month = _n

merge 1:1 year month CompanyID DirectorID using master.dta, keep(master match)    // merge individual data
merge m:1 year month CompanyID using using.dta, keep(master match)   // merge firm-level data

I hope that helps!

Best, Christoph

Comment

Jens Huang

Join Date: Apr 2017

Posts: 5
#4

24 Jul 2018, 07:25

thanks so much for your remarks!
I will try to implement them later today.
Comment
Jens Huang

Join Date: Apr 2017

Posts: 5
#5

26 Jul 2018, 04:41

Thanks again, I implemented Christophs second approache, since I'm not completeley sure whether Directors are unique across firms. And it just worked perfectly!
Comment

Announcement

Merging two Datasets while one dataset has many individuals in many firms and the using dataset has firmdata

Comment

Comment

Comment

Comment