  • Unifying databases

    Dear users of the Statalist forum,


    My puzzle is as follows:


    Dataset 1
    variables: date cusip prc ret shrout
    Dataset 2
    variables: fyear cusip bkvlps csho ggroup gsector

    The time observations in Dataset 2 are on an annual basis, whilst those in Dataset 1 are on a monthly basis. Each cusip value corresponds to one unique company, so in Dataset 1 cusip values are repeated for all months in which the company's stock traded.

    What I need to do is attach the variables bkvlps, csho, ggroup, and gsector from Dataset 2, for which I have one observation per year (D2: fyear) per company (cusip), to every monthly observation (D1: date) in Dataset 1 that belongs to that particular year (D2: fyear) and to that same company (cusip). At the end I should have a single dataset with the variables date cusip prc ret shrout bkvlps csho ggroup gsector, in which the values of bkvlps, csho, ggroup, and gsector repeat for each cusip across every month of date that falls within the corresponding fyear.


    Dataset 1

    [Screenshot (289).png: sample observations from Dataset 1]



    Dataset 2

    [Screenshot (290).png: sample observations from Dataset 2]



    In the images you can see only one cusip, ggroup, and gsector. The variable cusip changes and is unique to each firm, while ggroup and gsector remain constant across the time series for each cusip but, unlike cusip, are not unique to that particular company.



    If you need any more information to help me with this puzzle, do not hesitate to ask.

    Thank you for your time and attention.

  • #2
    Welcome to Statalist.

    What you want is to merge your two datasets, as described in the output of help merge, which you should read before proceeding. You will want to match observations on cusip and fyear. You have two problems:
    • dataset1 does not have fyear
    • dataset2 has the 9-digit cusip, of which the first 8 digits will match the cusip in dataset1
    Here's (untested) example code that might start you in a useful direction.
    Code:
    // dataset2: keep the first 8 characters of the 9-digit cusip so it matches dataset1
    use dataset2, clear
    replace cusip = substr(cusip,1,8)
    save d2merge, replace
    
    // dataset1: derive the fiscal year from the monthly date (assumes date is a daily %td date)
    use dataset1, clear
    generate fyear = year(date)
    // many monthly observations in dataset1 match one annual observation in d2merge
    merge m:1 cusip fyear using d2merge
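    After the merge it is worth checking how many observations matched. A minimal sketch, assuming the merged result is still in memory:

    Code:
    // _merge == 3 marks monthly observations that found an annual match
    tabulate _merge
    // optionally keep only matched observations (whether to do so is your choice)
    keep if _merge == 3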



    • #3
      Thank you, William, for your quick reply.

      Your initial direction has been of much use to me. For the record, here is what I have done:



      In dataset2

      replace cusip = substr(cusip,1,8)

      Convert fyear to a string: tostring fyear, gen(syear)

      Concatenate syear and cusip: egen syear_cusip=concat(syear cusip)


      In dataset1

      generate fyear = year(date)

      Convert fyear to a string: tostring fyear, gen(syear)

      Concatenate syear and cusip: egen syear_cusip=concat(syear cusip)

      merge m:1 syear_cusip using "dataset2.dta"
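      Put together as a do-file, the steps above would look roughly like this. This is a minimal sketch, assuming the modified dataset2 is saved first (here under the hypothetical name d2key.dta) so that the merge uses the trimmed cusip and the new key:

      Code:
      // dataset2: trim cusip, build the year-cusip string key, and save
      use dataset2, clear
      replace cusip = substr(cusip,1,8)
      tostring fyear, gen(syear)
      egen syear_cusip = concat(syear cusip)
      save d2key, replace
      
      // dataset1: build the same key, then merge many monthly rows to one annual row
      use dataset1, clear
      generate fyear = year(date)
      tostring fyear, gen(syear)
      egen syear_cusip = concat(syear cusip)
      merge m:1 syear_cusip using d2key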



      Thank you for your time!



      • #4
        Roberto,

        What you have done will work for your present purposes. But for the future you might want to take note of the following:

        1. There is no need to create a separate variable syear. The -egen, concat()- function will do the conversion for you if you code it as -egen fyear_cusip = concat(fyear cusip)-.

        2. You don't need to combine year and cusip into a single variable to do the merge. -merge m:1 fyear cusip using dataset2- would have done the trick.

        There is no harm, other than perhaps wasted effort and memory, in creating syear and syear_cusip as you have done. But unless you need those variables for some other purpose, they are unnecessary. (And, to be honest, I can't think of any situation where a variable like syear would be useful.)
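        Taken together, these two points reduce the whole workflow to a direct merge on the two key variables. A minimal sketch, assuming dataset2 has already had its cusip trimmed to 8 characters and been saved:

        Code:
        // no string key needed: merge directly on fyear and cusip
        use dataset1, clear
        generate fyear = year(date)
        merge m:1 fyear cusip using dataset2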



        • #5
          Thank you for your response, Clyde. I will take it into account going forward.
