Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging Compustat annual and segment database

    I have the compustat segment data that looks as follows:
    gvkey stype sid datadate cogss dps
    1004 BUSSEG 11 20000531 0 4.994
    1004 BUSSEG 8 20000531 422.507 3.526
    1004 BUSSEG 12 20000531 326.641 3.239
    1004 BUSSEG 13 20000531 195.718 6.304
    1004 GEOSEG 1 19990531 727.714 17.063
    1004 BUSSEG 7 19990531 727.714 17.063
    1004 BUSSEG 9 19990531 312.523 5.146
    1004 BUSSEG 10 19990531 107.49 3.455
    1004 GEOSEG 1 19990531 727.714 17.063
    1004 BUSSEG 8 19990531 457.336 2.916
    And I have the compustat annual fundamentals that looks as follows:
    gvkey datadate fyear cshpri dlc
    1004 19990531 1998 27.549 0.42
    1004 20000531 1999 27.103 26.314
    1004 20010531 2000 26.913 13.652
    1004 20020531 2001 28.282 42.525
    1004 20030531 2002 31.852 92.256
    1004 20040531 2003 32.111 3.392
    1004 20050531 2004 32.297 3.745
    1004 20060531 2005 33.53 2.289
    1004 20070531 2006 36.389 74.245
    I want to merge them by datadate and gvkey. But the segment data also has sid. The way I want is that for a given datadate and gvkey in the first table, each entry regrardless of the segment id have the data from the second table corresponding to the same gvkey and datadate.

    I tried merge m:1 gvkey datadate sid using "FirmLevelData09122022.dta" but that is not going to work since the second dataset doesnt have segment id.


  • #2
    Either
    Code:
    use fundamentals
    merge 1:m gvkey datadate using segment
    or
    Code:
    use segment
    merge m:1 gvkey datadate using fundamentals
    should do it. The variables named in the merge command are the variables to match on - so you don't include sid in the list.

    Comment


    • #3
      All you need is -merge m:1 gvkey datadate using FirmLevelData09122022-. There is no need to mention sid. In fact, as you have noted, you can't mention sid because it does not exist in the firm-level data. The way -merge- works, it will automatically apply the data from any combinatin of gvkey datadate in the FirmLevelData09122022 data set to all of the observations of business segments in the segment-level data set that match that gvkey and datadate. You don't have to do anything explicit to make that happen.

      Added: Crossed with #2 which says the same thing.

      Comment


      • #4
        I have tried it both ways, keep getting the following error:

        variables gvkey datadate do not uniquely identify observations in the using data

        Comment


        • #5
          You need to run the following:
          Code:
          use fundamentals
          isid gvkey datadate
          to confirm that gvkey and datadate are distinct in your fundamentals.

          Are they supposed to be distinct? Your merge command in post #1 suggests you expect they are. If indeed you expect any combination of gvkey and datadate to appear only once in the fundamentals dataset, then you have a data problem that you need to investigate and correct.

          In that case, see the output of
          Code:
          help duplicates
          for an explanation of the tools that help you locate duplicates in your data. For example,
          [CODE]
          use fundamentals
          duplicates report gvkey datadate
          [/CODE
          is a place to start.

          Comment


          • #6
            Thank you William. That diagnostic test helped a lot.

            Comment

            Working...
            X