Merging Data with Missing Identifiers in some observations

hussein bataineh

Join Date: Apr 2023

Posts: 159
#1

Merging Data with Missing Identifiers in some observations

16 Mar 2025, 05:59

In my study, I am using two identifiers because some databases require a specific identifier. However, one of the identifiers, CompanyID, has missing values in some observations.

When I attempt to merge datasets using CompanyID, the merge does not work due to these missing values. Could you advise on how to resolve this issue without deleting the observations that have missing CompanyID?

I appreciate your help and look forward to your guidance.

HTML Code:

. merge 1:1 CompanyID year using "C:\Users\lenovo\Desktop\Data\source\DATA\work\third analysis\Governance\final_governance_varaibles.dta" variables CompanyID year do not uniquely identify observations in the master data r(459); . order id year ticker CompanyID
Tags: None
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#2

16 Mar 2025, 06:27

What are the implications of a missing identifier?

If you have cross-sectional data, simply assign unique values to these missing identifiers. However, with panel data, missing identifiers can be problematic. This is because such observations may belong to a unit with a non-missing ID in the dataset (e.g., a person, firm, or country) or another unit with a missing ID, making it impossible to properly associate them.

In any case, merging requires unique identifiers, so you need to assign unique values to the missing IDs before merging. After merging, you can convert them back to missing values. Using negative values is a convenient way to generate unique identifiers since numeric identifiers are typically positive integers.

Code:

assert CompanyID>0 replace CompanyID = -_n if missing(CompanyID) merge 1:1 CompanyID year using ... replace CompanyID=. if CompanyID<0

Last edited by Andrew Musau; 16 Mar 2025, 06:48.
1 like
Comment
hussein bataineh

Join Date: Apr 2023

Posts: 159
#3

16 Mar 2025, 08:20

Thanks Andrew for help
Comment

Announcement

Merging Data with Missing Identifiers in some observations

Comment

Comment