Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge Mutual Fund Holdings quarterly data with monthly CRSP stocks data

    I have a difficult merging task to do, and I would like your help on this.

    So I have two data sets:

    1. Mutual Fund Holdings quarterly data; (master dataset)
    I need three variables to uniquely identify each observation in the data: wficn cusip yrqt;
    wficn: Wharton Financial Institution Center Number (WFICN), which is a unique and permanent fund portfolio identifier.
    cusip: The holdings are identified by CUSIP
    yrqt: such as 1998q1, 1998q2, etc..

    for each quarter, there are many funds, each of which has many stock holdings.

    2. CRSP monthly stocks data-unbalanced panel data (using dataset)

    cusip: stock identifier
    yrm: time/date variable, such as 1998m1, 1998m2...1998m12, etc....


    TASK: merge these two datasets

    What I really want (the ideal merge result) is:

    assume the equity holdings for fund i is the same between 1998q1, and 1998q2; that is assume the fund portfolio i holds the same set of equities in month 1998 April, May as in 1998 June. What can I do about the first dataset to achieve this ?


    What I actually compromise is to convert the 2nd dataset to quarterly level;
    then use m:1 merge using cusip yrqt as the key.













  • #2
    Generically speaking, converting monthly data to quarterly data will be done rather easily by something like

    Code:
    gen yrqt = qofd(dofm(yrm))
    collapse [whatever], by(cusip yrqt)
    in your second data set.

    The hard part is figuring out what [whatever] should be. You have multiple month observations in each quarter and you need to decide how to combine those into a single observation that represents the quarter. Depending on what the variables you are working with are, there are different solutions that make sense: totals, means, first, last, smallest, largest, some other summary measure, etc.. In any case, that is more likely a question about finance than a question about statistics.

    Comment


    • #3
      As I re-read your post, I think I may have misunderstood your intent. I pointed you in the direction of combining the monthly observations of dataset 2 into quarterly aggregates for merger with dataset 1. But perhaps what you want to do is expand dataset 1 into monthly observations to merge with the monthly data in dataset 2. And you specifically want to assume that the observed variables for any given quarter apply to each month in that quarter unchanged. That, too, is pretty straightforward:

      Code:
      // USING DATASET 2
      expand 3 // 3 MONTHS IN EACH QUARTER
      bysort wifcn cusip yrqt: gen yrm = mofd(dofq(yrqt)) + _n - 1
      That will give you three copies of each quarterly observation and will serially assign yrm to the 3 months in the quarter.

      Comment


      • #4
        Thank you so much, Clyde! I do want to do my further analysis in monthly frequency, since I do have lots of other variables are in monthly levels.

        Comment


        • #5
          Hi, Clyde, can you explain this line here? how to generate the year- month date variable?
          bysort wifcn cusip yrqt: gen yrm = mofd(dofq(yrqt)) + _n - 1

          Comment


          • #6
            At the point when this code gets executed, there are 3 observations for each wifcn cuspi yrqt combination. And they are identical on all variables. What this line does is to create a new variable, yrm, designed to match the monthly dates in the other data set. Within each by-group (i.e. within each set of 3 identical observations), we take the quarterly date, and find out the daily date of the first day of the quarter (dofq). Then we convert that to a monthly date (mofd). In the first of the three observations we then add 0. In the next of the three observations we add 1--taking us to the next month. And in the third observation we add 2, taking us to the third month of that quarter.

            I'm not sure which part of it you're really asking about. If you don't understand mofd() and dofq() read the help and manual on date functions. If you don't understand what bysort and _n are doing, read [U] sections 11.5, 13.7 and 27.2. What you will learn there will be indispensable for the effect use of Stata--the time spent will repay itself many fold and rapidly.

            Comment

            Working...
            X