Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge one to many unspecified

    I am attempting to combine several datasets. One contains yearly ownership of various firms by three funds, and the other contains firm-specific financials (R&D, CAPEX, etc) for 2012-2021. ~86,000 observations for ~670 firms. I wish to merge the firm yearly data to the appropriate ownership data (sorted here by holdingDate); the 2012 R&D spending to all filing dates in 2012, and so on. I was hoping to use firm name and year as key variables, but as year is a variable in the firm financials dataset, this isn't working. Is there a way to do this?

    Attached Files

  • #2
    Having year as a variable would be essential to merging with year and firm as the key variables, so I'm confused by what you say. Having proper example data for both of your files, created using the -dataex- command, as described in the StataList FAQ, would be helpful. Per the FAQ, screenshots aren't generally as helpful as one might think.

    Comment


    • #3
      You will need to use reshape long to convert your firm dataset into a dataset with separate observations for each year for each firm, containing the three ownership values for the three funds for that firm in that year. Something like (given the picture of your data)
      Code:
      reshape long CoG ..., i(CompanyName) j(year)
      where along with CoG you include also the other two funds.

      The output of
      Code:
      help reshape
      will give you more details, but you would be well advised to follow the link at the top of that output to read the full PDF documentation.

      Comment


      • #4
        Hello Mike, and thanks for that. Please find below first the dataex for the ownership file (selected variables; these are the important ones), and for the company financials file. I understand what you're saying regarding the year variable, unfortunately the firm data downloaded didn't come with year as a variable. If there's a way to fix that, I'd love to know the workaround.

        Appreciate your time. Working on my dissertation, and my STATA knowledge from undergrad econometrics isn't as comprehensive as I'd hoped it was.


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str6 tickerSymbol int holdingDate double sharesHeld str50 company float year
        ""         .        . ""       .
        ""         .        . ""       .
        ""         .        . ""       .
        ""         .        . ""       .
        ""         .        . ""       .
        "3105" 19173    23909 "WIN" 2012
        "3105" 19023   671156 "WIN" 2012
        "3105" 19297  1356769 "WIN" 2012
        "3105" 21730 14661426 "WIN" 2019
        "3105" 21914 13402433 "WIN" 2019
        "3105" 21914   507349 "WIN" 2019
        "3105" 21639 25165401 "WIN" 2019
        "3105" 21639   519149 "WIN" 2019
        "3105" 22280 15059059 "WIN" 2020
        end
        format %tdnn/dd/CCYY holdingDate



        [CODE]
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str50 CompanyName str5 tickerSymbol str17(Cog2012 CoG2013 GoG2014 Cog2015 CoG2016)
        "3I GROUP PLC " "III" "0" "0" "0" "0" "0"
        "3M Company" "MMM" "15780" "15765" "16168" "16237" "15200"
        "A. O. Smith Corporation" "AOS" "1227" "1298.1" "1406" "1529.7" "1511.6"
        "ABB" "ABBN" "26944" "28605" "29736" "27925"



        Comment


        • #5
          I think that could be perfect William!

          Comment

          Working...
          X