Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging two panel data sets with different time period

    Hello!
    I have two panel data sets which i have to merge, but they have different time period (first one is from 2002-2016, the second one is from 2017-2022) . I could append them, but list of companies in first data set and the second one are different. For example in first one i have companies: A, B, C, D, F. But in second one: A, B, D. So i need to merge them and keep only those companies which are present in both data sets.

    I have common variables in both data sets: company number, registretion number and kis. I cannot use Company name variable because company could change its name during 20 years. So when i try to merge data sets using those vars stata return an error:

    merge 1:1 company_number registretion_number kis
    variables company_number, registretion_number, kis do not uniquely identify observations in the master data r(459)

    I also tried to merge 1:m and m:1, but neither of them worked.

    I would really appreciate your help.

    Below are the examples of data sets:
    Firt data set:
    company_number registretion_number kis company_name Year
    1248138892 1.24311e+12 000701 A 2007
    1248138892 1.24311e+12 000701 A 2010
    1248138892 1.24311e+12 000701 A 2012
    2118134787 1.10114e+12 001872 B 2002
    2118134787 1.10114e+12 001872 B 2003
    2118134787 1.10114e+12 001872 B 2004
    2118134787 1.10114e+12 001872 B 2005
    2118139521 1.10111e+12 002267 C 2003
    2118139521 1.10111e+12 002267 C 2005
    Second data set:
    company_number registretion_number kis company_name Year
    1248138892 1.24311e+12 000701 A 2017
    1248138892 1.24311e+12 000701 A_i 2018
    1248138892 1.24311e+12 000701 A_i 2019
    1248138892 1.24311e+12 000701 A_i 2020
    1248138892 1.24311e+12 000701 A_i 2022
    2118139521 1.10111e+12 002267 C 2019
    2118139521 1.10111e+12 002267 C 2020
    2118139521 1.10111e+12 002267 C 2021
    6088106068 1.90111e+12 003456 D 2020

  • #2
    Why don't you append them and then calculate the number of non-missing observations by company_number to identify the companies that are present in both datasets?

    Comment


    • #3
      I don't exactly undestand what you mean by "calculate the number of non-missing observations by company_number". But if i undestood it right, there are 5000 observations in one data set and 4800 in another one. And non of them has missing observations in company_number

      Comment


      • #4
        A company that is in both datasets should have 21 observations of the variable Year after appending. So in the appended dataset you could run:

        Code:
        egen noyears=count(Year), by(company_number)
        and then drop the companies which have noyears<21.

        Comment


        • #5
          It could work, but it is not necessary that company has to have all years ( from 2002 to 2022). For example, company A has observations for 2007, 2010 and 2012 and in another data set from 2017 - 2020 and 2022, so in total 8 years. Even if there less than 21 years that company is present in both datasets so i have to keep it.

          Comment


          • #6
            You can make use of the gen() option of the append command: append using ..., gen(newwave)
            Then run: egen newobs=total(newwave), by(company_number)
            If newobs is larger than zero for a given company_number, then new observations have been added to this company.

            Comment

            Working...
            X