Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Monthly Standard deviation (Volatility) matching to yearly data

    So basically I'm not really "professional" sort of say in STATA but I have to work with it in my Thesis at the moment. I started by getting annual variables from compustat and then i stumbled upon standard deviation. I tried inputting monthly data on a different sheet and some how match or code through it, but im simply into skilled enough in STATA to do this. This could be done through a loop or something similar! To be more specific:
    Year ID Variables.....
    1 1
    2 1
    3 1
    1 2
    2 2
    3 2
    etc......

    This "kind off" represents my first data set.
    Month ID Returns
    1 1
    2 1
    3 1
    1 2
    2 2
    3 2
    etc......

    This represents my second data set....

    Note that of course, most firms will have monthly data, which means 12 data points for each year. Now I have two problems, first of all from compustat, some firms just simply "die" at month 5 for example and then if I do an if function, this will ruin my data. So i want a way to simply compute from month 1 till month 5 (in my example) for each year. My second concern is how after that I could match each data point of the standard deviation to the first annual set. Note Ofcourse the ID is the same but with different frequency (yearly and monthly). Also, note my Month, is actually not month in a way, it's 1 2 3 4 but more of how stata downloads it for example "31jan2010". I'm not sure if this is a problem but it's something i felt like noting!

    I'm really unsure how to achieve such a feet in STATA and I wouldn't mind sending my data if you'd like to view it. If you could give me the codes how to achieve this if it's not much work for you, it would help my research a lot!

    One final remark, is that in the second "monthly" data, I might have different firms (less or more) for example i could have 1 1 1 1 then 3 3 3 3 where it skips, firm 2 without having (ill add that as missing data or might just drop it) but yeah mainly I want to match both.

    Thank you for taking the time to read through my post!

    Regards,
    Last edited by Yousef Kaddoura; 26 Apr 2018, 19:47.

  • #2
    "Sending" data is discouraged here. It is helpful, however, to include examples of your data in your posts, using the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    In addition, I really did not understand at all what you want to do with these two data sets. So I think a clearer explanation is in order. I gather you want to do something about 5 month periods of time, but I couldn't figure out what. You also say you want to match the monthly data to the yearly data, but I can think of many different ways of doing that, all of which would look very different. So I think perhaps you need to show a hand-made example of what the end result you want would look like.

    Comment


    • #3
      Hello again ^_^,

      Thank you for the fast reply! Yeah I thought it's not a good idea to send it!

      I'll try to be more clear in the following.

      I'm trying to run a panel regression where volatility (standard deviation of monthly returns) is a variable in the regression. My data is collected from Compustat on an annual frequency in 1 sheet. Now when i collect monthly returns, i get those in another sheet. so what i basically want is just match each standard deviation collected in the 2nd sheet of monthly returns in it's right fit in the first sheets of my annual data. so it's more of a matching procedure of the data.

      The following is an example of how my data is structured in on an annual basis sheet 1input int datayearfiscal long firm
      1970 1000
      1971 1000
      1972 1000
      1973 1000
      1974 1000
      1975 1000
      1976 1000
      1977 1000
      1983 1001
      1984 1001
      1985 1001
      1983 1003
      1984 1003
      1985 1003
      1986 1003
      1987 1003
      1967 1004
      1968 1004
      1969 1004
      1970 1004
      1971 1004

      The following is Returns collected from compustat (monthly) sheet 2:

      input long(firm datadate) double prccm float(year ret vol)
      1000 30apr1970 11.75 1970 . .21347223
      1000 31may1970 12.5 1970 .06382979 .21347223
      1000 30jun1970 9 1970 -.28 .21347223
      1000 31jul1970 10 1970 .11111111 .21347223

      .
      .
      .

      1000 31jan1971 9 1971 -0.1 0.1918624

      .
      .
      .


      1000 31jan1978 6605 9.125 1978 -.0135135 .0203081
      1000 28feb1978 6633 8.875 1978 -.0273973 .0203081
      1000 31mar1978 6664 8.875 1978 0 .0203081
      1000 30apr1978 6694 9.125 1978 .028169 .0203081
      1000 31may1978 6725 9.25 1978 .0136986 .0203081
      1000 30jun1978 6755 9.375 1978 .0135135 .0203081
      1000 31jul1978 6786 1978 .0203081


      .
      .
      .
      1001 30sep1983 6.125 1983 . .1432084
      1001 31oct1983 6.75 1983 .10204082 .1432084
      1001 30nov1983 8 1983 .1851852 .1432084
      1001 31dec1983 7.25 1983 -.09375 .1432084
      1001 31jan1984 8 1984 .10344828 .13216163
      etc..



      as you can see sheet 1 is annual, the second is monthly. I just want you to note before I continue, I calculated volatility using the following equations: 1)
      Code:
      by firm: gen ret=prccm/prccm[_n-1]-1
      2)
      Code:
       bys firm year: egen vol=sd(ret)
      I researched a bit, and this seems the correct way to get standard deviation using Stata. and you could see i included this in my second sample under name "vol" so maybe that saves you sometime.


      so what am I looking for?

      maybe something like this by merging two samples:

      input int datayearfiscal long firm vol
      1970 1000 0.21347223
      1971 1000 0.1918624
      1972 1000 etc
      1973 1000 .
      1974 1000 .
      1975 1000 .
      1976 1000 .
      1977 1000 .
      1983 1001 0.1432084
      1984 1001 0.13216163


      I'm sorry I didn't add all the numbers, but I just wanted to show you how I Want it to be briefly. You can see that each year is getting matched to the correct firm and the correct year.

      I colored 1977 in red because I wanted you to notice that for the second sheet ( Monthly Returns) I collected, it has more years or more data I suppose. To summarize, I want the standard deviation calculated to be matched according to the exact year and firm (key input) from 1 sheet to another. I'm sorry for the long text. If there is any thing unclear please tell me

      Thank you very much ^_^

      Regards,
      Last edited by Yousef Kaddoura; 26 Apr 2018, 23:02.

      Comment


      • #4
        I actually figured it out! but thank you for trying to help

        What I did after finishing my initial code in both sheets is the following :
        Code:
         merge 1:1 id year using filename, keep (master match)
        This matched every thing perfectly!

        Thank you nevertheless!

        Regards,

        Comment


        • #5
          And thank you for posting your solution so that others can learn from your experience, too!

          Comment

          Working...
          X