Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging daily data and quarterly data

    Hi everyone,
    I have to merge two datasets. In the first, I have daily stock prices and in the second accounting quarterly data. Because of, the date does not match by using the merge command 1:m Data ISIN using accounting.dta, it creates many missing value. For instance, If I have in daily data 2024q3 and in the accounting data the most recent pubblication is 2024q2, it is not matched. How I can solve this issue?

  • #2
    Your situation raises two problems, neither of which is sufficiently spelled out in your post to support a specific solution.

    Problem 1: it is unclear whether your "daily data" uses actual daily dates or if you have already converted them to quarterly dates, and, similarly, whether your accounting data has quarterly dates or daily dates within certain quarters. So you need to use the -dataex- command to post example data (including the necessary metadata) so that these issues can be understood and dealt with appropriately.

    Problem 2. You need to give a general and exact description of how you want to associate accounting entries with stock prices. Riffing on the example you gave, if you have daily stock prices in both 2024q2 and 2024q3, do you want to associate that accounting event with the stock prices in both of those quarters? What if there is no further accounting data even in 2024q4. Do you want to also associate the 2024q4 stock prices with that 2024q2 acounting event? Is there any limit? What if there are multiple accounting events in 2024q2? Are they all associated with all of those daily stock prices, or does only the last one get associated with 2024q3 (and maybe beyond that). You need to state your requirements exactly and you must cover all the possibilities that can arise.

    Comment


    • #3
      I try to be clearer. I have data on stock returns on a daily basis.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int Data double(AR Raw_return) str12 ISIN
      23594 -1.3465626670017383 -.845327850125667 "AN8068571086"
      23595  -1.745474448212471  -1.2442396313364 "AN8068571086"
      23596   1.435302747286319  1.93653756416239 "AN8068571086"
      23597  -.8216742767066971 -.320439459830626 "AN8068571086"
      23600  .30243912112622884    .8036739380023 "AN8068571086"
      end
      format %tdnn/dd/CCYY Data
      I want to merge accounting information which are available on quarter basis.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int Data str12 Isin double(TotalAssets PricetoBookValueperShare TotalDebtPercentageofTotalE CashCashEquivalentstoTotal ReturnonAverageCommonEquity)
      23831 "AN8068571086" 4.9002e+10 2.91304125032027 67.48602 .05992  20.819358525254
      23741 "AN8068571086" 4.8935e+10 2.54181756283956 54.02237 .07242 21.5929717563349
      23649 "AN8068571086" 4.9775e+10 2.75363302496397 56.85938   .062 21.8989167909627
      23557 "AN8068571086" 4.9373e+10 3.17951687152959 59.19659 .05981 22.2586254693919
      23466 "AN8068571086" 4.7856e+10 3.77717447916667 55.51248 .05826  22.353365632409
      end
      format %tdnn/dd/CCYY Data
      I have generated the variable quarter in both dateset as follows:
      gen qdate_= qofd(Data)
      and then merge the two dataset:
      merge 1:m ISIN qdate using accounting.dta
      The problem is that I have many missing values for accounting data because they are not available every quarter. So, I want to replace the missing data with the most recent accounting data.

      Comment


      • #4
        Code:
        use `accounting', clear
        rename Isin ISIN
        tempfile holding
        save `holding'
        
        use `returns', clear
        isid ISIN Data, sort
        
        rangejoin Data . Data using `holding', by(ISIN)
        by ISIN Data (Data_U), sort: keep if _n == _N
        -rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

        Notes:

        Actually the use of quarterly dates is not only unnecessary in this application, it actually can lead you to getting wrong results. Your example accounting data contains five different values of Data. But the one that is in the quarter that matches the example returns data is dated 9/30/2024, which is after all of the return dates. So using the daily dates, we would link all of the shown returns observations with the 6/30/2024 accounting observation, which is the most recent available as of those returns dates. But if we did this based on quarterly dates, we would select 9/30, which would, I think, be wrong.

        In developing this code, I saved the two data sets as tempfiles, named `returns' and `accounting', but you should replace those references by the actual names of your existing data sets.

        Be aware that the -rangejoin- command as written joins each observation in the returns data set with every observation of the same ISIN in the accounting data set whose date equals or precedes the date of the returns observation. Working with the real data sets, this could turn out to be a huge data set, possibly exceeding your memory limits. If that problem does arise for you, there are several possible solutions, such as chunking the data sets to single ISINs, processing them separately, and then appending all the results together, or, better, doing the same thing in software by using the -runby- command (by Robert Picard and me, available from SSC). Another approach would be to impose some limit on how far back in time you're willing to go to find the most recent accounting data. After all, although I don't know precisely what you plan to do with this combined data set once you have it, it is hard for me to imagine that it would be useful to match up a returns observation with an accounting observation from 10 years earlier. I imagine having that one be just missing would be better, am I right? In that case, you would revise the -rangejoin- command shown to use a real lower limit instead of missing value (.) .
        Last edited by Clyde Schechter; 28 May 2025, 14:44.

        Comment


        • #5
          Thank you very much for your detailed and insightful explanation. The code works perfectly, Linking the returns from August 2024 to the 6/30/2024 accounting data proved to be exactly the right approach. I'm grateful for that clarification.

          Comment

          Working...
          X