Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging Data with Missing Identifiers in some observations

    In my study, I am using two identifiers because some databases require a specific identifier. However, one of the identifiers, CompanyID, has missing values in some observations.

    When I attempt to merge datasets using CompanyID, the merge does not work due to these missing values. Could you advise on how to resolve this issue without deleting the observations that have missing CompanyID?

    I appreciate your help and look forward to your guidance.



    HTML Code:
    . merge 1:1 CompanyID  year using "C:\Users\lenovo\Desktop\Data\source\DATA\work\third analysis\Governance\final_governance_varaibles.dta"
    variables CompanyID year do not uniquely identify observations in the master data
    r(459);
    
    . order id year ticker CompanyID

  • #2
    What are the implications of a missing identifier?

    If you have cross-sectional data, simply assign unique values to these missing identifiers. However, with panel data, missing identifiers can be problematic. This is because such observations may belong to a unit with a non-missing ID in the dataset (e.g., a person, firm, or country) or another unit with a missing ID, making it impossible to properly associate them.

    In any case, merging requires unique identifiers, so you need to assign unique values to the missing IDs before merging. After merging, you can convert them back to missing values. Using negative values is a convenient way to generate unique identifiers since numeric identifiers are typically positive integers.

    Code:
    assert CompanyID>0
    replace CompanyID = -_n if missing(CompanyID)
    merge 1:1 CompanyID year using ...
    replace CompanyID=. if CompanyID<0
    Last edited by Andrew Musau; 16 Mar 2025, 06:48.

    Comment


    • #3
      Thanks Andrew for help

      Comment

      Working...
      X