Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging two Datasets while one dataset has many individuals in many firms and the using dataset has firmdata

    Dear stata community,

    I have two datasets I want to merge: master data contains many individuals clustered in firms on the year level and the other one has simple month yearly firmlevel data. The problem is, the master data does contain one month for each director ID, so it is not a one to one match, we want to "fill
    -in" the other months for every director with the using dataset. Therefore my Datasets look as follows:

    Individual Dataset (master):
    CompnayID year month DirectorID other variables
    1 2000 3 1
    1 2000 3 2
    1 2000 4 3
    1 2000 6 4
    2 2000 3 5
    2 2000 3 6
    2 2000 3 7
    ... ... ... ... ...
    Firmlevel dataset (using):
    Compnay ID year month other variables
    1 2000 1
    1 2000 2
    1 2000 3
    1 2000 4
    1 2000 5
    1 2000 6
    1 2000 7
    ... ... ... ...
    In the End I want to achive a dataset which looks something like this:
    CompanyID DirectorID year month other variables from both datasets
    1 1 2000 1
    1 1 2000 2
    1 1 2000 3
    1 1 2000 4
    1 1 2000 5
    1 1 2000 6
    ... ... ... ... ...
    1 2 2000 1
    1 2 2000 2
    1 2 2000 3
    .. .. .. ... ...


  • #2
    I think you probably want to start by using the expand command on your master data and filling in the months. Then you can do a m:1 merge linking on company year and month.

    This might look something like this:

    Code:
    expand 12
    bysort companyid year directorid: replace month=_n
    merge m:1 companyid year month using ....
    This requires that in your firm level data companyid year and month combined are a unique identifier. If you ever have multiple observations per month for a company it won't work.

    Likewise if you ever have more than one observation per year for a given director it's going to be more complicated and what I've suggested won't work as written.

    Comment


    • #3
      Hello Jens,

      I am assuming in my answer that DirectorID is unique across firms. The solution also works if there is more than one month per year and DiretorID in the master.dta.

      You could first create a "skeleton" dataset and merge the individual dataset (master) and the firm-level data.

      Code:
      use master.dta, clear
      
      * Keep one random observation per DirectorID and year
      keep DirectorID year
      bysort DirectorID year: keep if  _n==_N
      
      * Generate 12 observations per DirectorID and year
      expand 12
      bysort DirectorID year: gen month = _n
      
      merge 1:1 year month DirectorID using master.dta, keep(master match)    // merge individual data
      merge m:1 year month CompanyID using using.dta, keep(master match)   // merge firm-level data

      If, on the other hand, the DirectorID is not unique across firms, you can change the code as follows:

      Code:
      use master.dta, clear
      
      * Keep one random observation per CompanyID, DirectorID, and year
      keep CompanyID DirectorID year
      bysort CompanyID DirectorID year: keep if  _n==_N 
      
      * Generate 12 observations per CompanyID, DirectorID, and year
      expand 12
      bysort CompanyID DirectorID year: gen month = _n
      
      merge 1:1 year month CompanyID DirectorID using master.dta, keep(master match)    // merge individual data
      merge m:1 year month CompanyID using using.dta, keep(master match)   // merge firm-level data
      I hope that helps!

      Best, Christoph

      Comment


      • #4
        thanks so much for your remarks!
        I will try to implement them later today.

        Comment


        • #5
          Thanks again, I implemented Christophs second approache, since I'm not completeley sure whether Directors are unique across firms. And it just worked perfectly!

          Comment

          Working...
          X