Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging

    Hello List,

    I am quite new to stata and as you can see, this forum. I was wondering if one of you could help me and/or give me some guidance regarding a problem with merging two datasets.

    I have two datasets:

    (1) with deals over a certain period.

    Announcement Date | cusip | company name | Deal value | acc information

    For example:
    31-Jan-2003 | (6digit cusip ) | IBM | 120 mln | XXX

    (there could be multiple deals on one day for a single company)


    (2) CEO information per company (out of Execucomp)

    Company name | name ceo | starting date CEO position | end date CEO position | age | current ceo | cusip | gvkeys | other personal information

    For example:
    IBM | Ginni Rometty | 01-jan-2000 | 01-jan-2015 | 56 years | (6digit cusp) | yes | XX
    IBM | Samuel J. Palmisano | 03-march-1995 | 01-jan-2000 | 78 years | (6digit cusp) | no | XX

    I want to match these CEO - Company combination with the deals from SDC. So basically I want to merge these two datasets (2 on 1), but there are more than one observation per cusip. How do I solve this problem? I thought of a m:m merge, but than the observations are all mixed and this doesn't feel right.

    I think the the problem is that these deals lie within the timespans that these CEO's are in there position.
    I hope that someone could help me with this, and if you need more information let me know.

    Thank you for taking the time to respond/read this.

    Sander






    Last edited by Sander de Boer; 18 Jun 2015, 11:00.

  • #2
    I gather from what you say that cusip is the main identifier in the two files, correct?

    if yes, then is the goal to match only those where the announcement date is within the starting date CEO and the ending date CEO?

    if yes, then I would (after making sure my dates were Stata dates (maybe they already are - you don't say), use joinby to form all combinations and then
    Code:
    keep if inrange(announcement_date,starting_date, ending_date)
    note that what you show as variable names can't be correct but I have changed them only slightly

    if I am wrong about what you want, please clarify

    Comment


    • #3
      Mr Goldstein,

      The cusip is the main identifier indeed.

      And yes I want to have the following dataset in the end

      (1) Announcement Date | cusip | company name | Deal value | CEO | acc information

      So I want to identify the CEO that was active during the deal by the cusip identifier and date.

      As I am completely new to stata, I don't know what you mean by the difference in dates and or Stata dates? I imported excel files into stata, and I didn't adjust the announcement_dates any further.

      And with using the joinby command, should I only use: joinby cusip using "ceo.dta file" ?

      The variable names are changed indeed, as I am not at my working place right now but I want to start on it tmr morning.
      But you've been right about the things that I wanted. Thank you!




      Comment


      • #4
        I am not familiar with Excel; nor do you say how you imported the data from Excel to Stata

        I would start with
        Code:
        describe announcement_date start_date end_date
        or whatever the actual names are; if they show as string (e.g., %9s) then you will need to change them to numeric dates with date formats (for help on this, type "help datetime"); if they show as numeric (e.g., an "int") and they have a format that starts "%t", then you are ok

        your proposed joinby command looks fine to me (note that the double quote marks are only needed if there is a space in the file name; I am assuming here that " file" is a typo - don't include it

        Comment

        Working...
        X