Hi everyone,
I'll start by saying that this is my first post (I've read the FAQs and tips on posting), so I apologize in advance if the format isn't entirely correct. However, I'll try to be as complete as possible in describing the problems I've encountered.
As a part of my thesis, I need to merge deal data from LSEG with both Compustat and daily CRSP data using Stata MP 18. The challenge is the lack of a universal identifier across these datasets, which has forced me to try various workarounds but am still left with a disappointing number of observations matched to CRSP data.
Here’s a brief overview of what I’ve done:
I’ve also tried using other identifiers (tickers, company names) but haven’t managed to improve the match rate. At this point I’m not sure if I’ve missed something fundamental in the merging process or if there’s a better approach. As a result, I am now slightly worried and very open to any advice or tips from anyone more skilled at this than I am, as I am starting to feel like I'm running out of things to try. If anyone has suggestions or can spot flaws in my logic, I would be incredibly grateful. I’m happy to share sample code or additional details if needed.
I'll start by saying that this is my first post (I've read the FAQs and tips on posting), so I apologize in advance if the format isn't entirely correct. However, I'll try to be as complete as possible in describing the problems I've encountered.
As a part of my thesis, I need to merge deal data from LSEG with both Compustat and daily CRSP data using Stata MP 18. The challenge is the lack of a universal identifier across these datasets, which has forced me to try various workarounds but am still left with a disappointing number of observations matched to CRSP data.
Here’s a brief overview of what I’ve done:
- I began by linking SDC deal numbers to gvkeys using a mapping file, and then merged that with the CCM linking table to get both gvkeys and permnos for acquirers and targets.
- For deals without a direct gvkey/permno match, I used 6-digit CUSIPs to recover more links from the CCM table.
- I successfully merged Compustat financials for both targets and acquirers.
- Since the full CRSP daily dataset was too large (>5GB), I split it into ~7-year chunks.
- I then merged CRSP with the CCM linktable to attach permnos and gvkeys to each firm.
- Here’s where I think I may have gone wrong: I created a deal window file containing deal numbers and associated gvkeys/permnos, then merged this into the CRSP files several times to conserve observations, so for both target and bidder I used gvkey and permno. The idea was to later use a 1:m merge on deal number to bring CRSP data into the main dataset with deal and accounting data.
I’ve also tried using other identifiers (tickers, company names) but haven’t managed to improve the match rate. At this point I’m not sure if I’ve missed something fundamental in the merging process or if there’s a better approach. As a result, I am now slightly worried and very open to any advice or tips from anyone more skilled at this than I am, as I am starting to feel like I'm running out of things to try. If anyone has suggestions or can spot flaws in my logic, I would be incredibly grateful. I’m happy to share sample code or additional details if needed.