Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Working with CRSP data, all returns or prices in one column, merging two datasets

    Dear forum-members,

    first of all hello and it nice job with the forum, but the search did not work well for me. I am in big trouble have to do my thesis with stata, never worked before with it. I have to work with CRSP data, a financial data provider, the problem for me is that everything is in one column. I am used to it that I have a column per firm or share, CAN I CHANGE THAT IN STATA, to get from one column, to several for each firm?

    Another problem is that I have to merge the CRSP data so all shares listed from 1925-2014, with the Fama French factors from prof. French's website. The problem is the factors are in a text file, where the date is written as 192509, and I am quite desperate can't get the date into a stata format. As I've read so far, I have to make an identifier for the CRSP and the same one for the FF factors, so I could merge them. But the date identifier for the CRSP data is a different one for different stocks but the same date, and that is kind of unlogical to me.

    SO in short CRSP data all returns in one column, all dates in one column. In excel I would have sorted it column by column, but it is not possible to work with this data in excel, since I got 3.3 mil rows, excel can just handle a bit over 1mil.
    Then I need to sort 10th decile portfolios, and afterwards regress all individual returns over the factors, and store the residuals for each time and stock, then I have to sort 10 decile portfolios, on these residual returns and form momentum portfolios based on these. Holding period 12 months excluding the last month. So I have to look which stock performed the best over this 12 month period and assign it to one of 10 portfolios based on its relative performance at time t (month t, since monthly data). Rebalancing happens every year, for the investment strategy.

    The CRSP data in STATA seems quite unlogical to me.

    Any help greatly appreciated,

    Regards,
    Thomas

  • #2
    You are assuming here that we know what the CRSP data are and where they are. Some members may well know, but more might be able to help if you told them where the data are so that they can look.

    Comment


    • #3
      Sorry for that,

      I don't know if or how many of you have access to the Wharton CRSP data, I can access it over the uni, don't know if posting it here is allowed legaly, but the Fama French factors and Portfolios can be found here http://mba.tuck.dartmouth.edu/pages/....html#Research.
      BUt the CRSP data is simply stock return, price data, and some more stuff of American companies. The problem is that I get one column with returns for every single stock, for different dates. So for e.g. permno is the company identifier I get something like:
      Date permno
      jan1925 10001
      feb1925 10001
      .
      .
      .
      jan1967 10001
      jan1947 10024
      feb1947 10024
      .
      .
      .
      jan1970 10024

      For me the CRSP data is a mess, different time frames, one after the other for different stocks, I don"t know further when everything is in one variable, how stata should\could differentiate between the different stocks in the regressions since a lot of them have a different timeframe. This is a screenshot of the CRSP data in stata
      Attached Files
      Last edited by Thomas Maurer; 09 May 2014, 07:18.

      Comment


      • #4
        Nick may I offer on the forum money for help or tutoring lessons? I am in a quite desperate position, I exactly know what has to be done, but reading simple things like counting the number of distinct companies in stata took me one hour and it is not working properly. I know for sure that I have only three exchanges included in the data but the command unique gives me 16, and that cannot be. I have also read your paper about it. Unfortunately I do not have the time to read papers like that, don't get me wrong I am far from lazy, but stata has a very peculiar syntax in my view, which I would first have to understand through a lot of samples and I simply do not have the time.

        I would like to reshape the data, but it seems that the data I posted (the picture) is already the right way how stat works with financial data.

        Kind Regards,
        Thomas

        Comment


        • #5
          You can ask generally for people who might be willing to help you in return for money to contact you. You should supply an email address or some other way to contact you privately. Any such arrangement would be entirely a matter for the individuals concerned.

          If you are asking me, in public, whether I am willing to do this personally the answer is No. If anyone else has a similar question the answer is also No.

          My own advice is that emphasising how urgent and how desperate you are, although intended candidly, is all too likely to be counter-productive. I know people in the Stata community who earn their living through consultancy and I know none who would be encouraged to take on work by being told how much needs to be done and that it must be done quickly. I should add that I am not going to name any such person, either publicly or privately.

          Comment


          • #6
            Thomas
            I can understand your problem exactly. Some few comments, as far as I know Fama French factors are monthly provided in French website. Those risk factors are returns on mimicking portfolios. No company identifies in the file you download from French webiste that include the three factors and the one with the momentum factor. So you will merge only on dates, as you want the risk factors for each year month. I am not sure how to transfer to Stata format but did you try http://www.stata.com/support/faqs/da...-format-files/

            I am currently planning to estimate Fama French three factor model with momentum factor. I think the way you explained the model will not be clear to most people here. Please read "Do Investors understand really dirty surplus? " by Landsman et al. as it has a good section that explains how they implement the Fama French 3 factor model.
            I will post clear explanation for the model and the problem and an initial code so people can contribute next week when I am done with that. In the meantime, I am sure there are lots of people who have done the merge and dealt with the reshaping problem and will be also happy to post their code for the model....Hopefully!

            Ahmed

            Comment


            • #7
              Thomas
              I can understand your problem exactly. Some few comments, as far as I know Fama French factors are monthly provided in French website. Those risk factors are returns on mimicking portfolios. No company identifies in the file you download from French webiste that include the three factors and the one with the momentum factor. So you will merge only on dates, as you want the risk factors for each year month. I am not sure how to transfer to Stata format but did you try http://www.stata.com/support/faqs/da...-format-files/

              I am currently planning to estimate Fama French three factor model with momentum factor. I think the way you explained the model will not be clear to most people here. Please read "Do Investors understand really dirty surplus? " by Landsman et al. as it has a good section that explains how they implement the Fama French 3 factor model.
              I will post clear explanation for the model and the problem and an initial code so people can contribute next week when I am done with that. In the meantime, I am sure there are lots of people who have done the merge and dealt with the reshaping problem and will be also happy to post their code for the model....Hopefully!

              Ahmed

              Comment


              • #8
                @ Nick, I understand, it's fine.
                @ Ahmed, thanks for the tip, I think I will try to reshape the data to get one date column and several company columns(return and price columns), that way it is more natural to me.

                Comment


                • #9
                  I am not giving up, lets try to clear things up... The formatting tricks me, the date for the Text file is the utmost in the example not the price of the stata file.
                  Stata data (see file) is in the format: text file (containing only 1 sequence from 1926-2013, not more for every variable);
                  permno(stock identifier);date(for every stock);Price; Return(Holding period) Date;factor1; factor2; factor3;(factor4)
                  10001 31jan1926 192507
                  10001 28feb1960 192508
                  10025 30mar1990
                  10025 31jan2013 201403

                  SO the question is how to merge the best way? I think for me it is easer to work with it if I could reshape the stata data set into:
                  date; Price1...Price2300; Return1...2300 so columns for each stock.
                  then I could simply index it 1 to T, the same for the text file 1 to T and merge it based on that index, assuming I start each file at the same date.
                  Last edited by Thomas Maurer; 10 May 2014, 05:36.

                  Comment

                  Working...
                  X