Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging datasets with multiple observations per company ID

    Hi,
    Newbie to Stata and trying two merge two datasets (one from Compustat and one from Execucomp) with unique ID numbers (these are called 'gvkey'). The problem I'm facing arises when I want to merge the two datasets based upon their 'gvkey', I keep getting the message that "gvkey does not uniquely identify observations" (both for master and using data). I get why Stata reports this message, as each 'gvkey' has a observation per year, but I do not know how to go further at this point. My goal is to merge the two datasets based on their 'gvkey' and 'fyear' (see below).

    Dataset 1:
    GVKEY FYEAR IBC OANCF
    001004 1994 $X $X
    001004 1995 $X $X
    001004 1996 $X $X

    Dataset 2:
    GVKEY FYEAR CEO EARLYYEARS
    001004 1994 1 1
    001004 1995 1 1
    001004 1996 1 1

    ----------

    What I whish to get is this:

    Dataset 1+2 (merged):
    GVKEY FYEAR IBC OANCF CEO EARLYYEARS
    001004 1994 $X $X 1 1
    etc. etc.

    ----------

    Please help me out!
    Thanks in advance!!

  • #2
    you can merge on more than 1 variable at a time - see the help file
    Code:
    help merge
    also, please read the FAQ which has advice on how to post so that people can respond better (e.g., see the FAQ re: -dataex-)

    Comment


    • #3
      Hey Rich, thanks for your response. Actually, the problem is not merging more than 1 variable at a time. The problem is that the ID (gvkey) has multiple observations (one per year) and therefore does not uniquely identify observations, for this reason I can not merge the two datasets! Do you know how to solve this without losing observations?

      Comment


      • #4
        Apparently you did not correctly understand the output of help merge which tells us that one example of the syntax is
        Code:
            One-to-one merge on specified key variables
        
                merge 1:1 varlist using filename [, options]
        Clicking on varlist explains that it is a list of one or more variables. So instead of gvkey specify gvkey fyear on your merge command.

        Comment


        • #5
          Hi William, thanks for your response. However, I keep getting the message "gvkey fyear do not uniquely identify observations". How do I resolve this?

          Comment


          • #6
            Hi Robin,

            you can try:

            merge 1:m GVKEY using my_data
            or
            merge m:1 GVKEY using my_data

            depends of which data is the master data
            Last edited by AY Russof; 15 Jan 2020, 10:46.

            Comment


            • #7
              In one of your datasets (Stata told you whether it was your "master" dataset or your "using" dataset, but you didn't tell us what Stata told you) you have multiple observations for the same combination of GVKEY and FYEAR. Is this what you expected? I have no idea what to expect from Compustat and Educomp.

              If you expected one observation per company per fiscal year in both datasets, you have a problem in your data that you have to resolve.

              If you expected multiple observations per company per fiscal year in one of your datasets, then you need
              Code:
              merge 1:m GVKEY FYEAR USING ...
              if it was in your "using" dataset, or
              Code:
              merge m:1 GVKEY FYEAR USING ...
              if it was in your "master" dataset - the one that is in memory when you run the merge command.

              If you expected multiple observations per company per fiscal year in both of your datasets, then you're in the same shape you were in when you thought you could merge on GVKEY alone. One or both of your datasets should have a combination of variables that uniquely identifies each observation. If that's not the case for your data (again, many Statalist members do not work with financial data and have no idea what to expect of your datasets) then you need to better describe the data you are trying to merge.

              We can better help you if we know what commands you have tried and what Stata told you to indicate that there was a problem. Please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. See especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], as described in section 12 of the FAQ.

              Comment

              Working...
              X