Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • sorting groups

    I have panel data, the time-variable is id, and value-variable is Time: xtset time id, monthly

    I need to sort id into 5 groups/portfolios at each month t based on variable x(volatility) from low to high.

    after I get 5 groups including different id in each group, I want to calculate the return (variable y) for each group (sorted in the month t) in next month. "the return for each group" means the average of returns of each id in each group. The groups is sorted in month t, the return I want to get is in month t+1.

    I would be really grateful if some one could help me to figure out the command.


    Attached Files

  • #2
    Please read the Forum FAQ for excellent advice about how to present information so as to maximize your chances of getting a timely and helpful response.

    If you have not yet imported your data into Stata it is premature to be asking for help with code. Once you have done that, you should show your example data by using the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Attachments of spreadsheets are, in general, discouraged. Some of our most active members do not use Microsoft Office products, so you preclude getting a response from them. Others are reluctant to download attachments from strangers due to the risk of malware. And, in any case, an Excel file will not have all of the information about the data that is needed to solve typical problems.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      Dear Schechter,

      Many thanks for your excellent suggestion. Here is my data example.

      Code:
      Example generated by -dataex-. To install: ssc install dataex
      clear
      input str6 date int id float(volatility return Time)
      "Jan-91"   1 .0076515  .0080953 .
      "Jan-91"   2 .0082569  .0050203 .
      "Jan-91"   3 .0116118  -.110187 .
      "Jan-91"   4  .007547  .0189273 .
      "Jan-91"  12 .0085447  .0023504 .
      "Jan-91"  13 .0088329  .0281606 .
      "Jan-91"  17  .010967 -.0018181 .
      "Jan-91"  25 .0093389    .00644 .
      "Jan-91"  26 .0088709 -.0155422 .
      "Jan-91"  27  .011748  .0205174 .
      "Jan-91"  28 .0098117  .0048176 .
      "Jan-91"  29 .0108828  -.005227 .
      "Jan-91"  30 .0100979 -.0044575 .
      "Jan-91"  31 .0143006  .0110809 .
      "Jan-91"  40 .0101437 -.0458095 .
      "Feb-91" 106 .0142047  .1218685 .
      "Feb-91" 107 .0033309  .0998751 .
      "Feb-91" 109 .0070853  .1022816 .
      "Feb-91" 112 .0020066  .1269131 .
      "Feb-91" 116 .0081126  .0239336 .
      "Feb-91" 117 .0027221  .1054625 .
      "Feb-91" 118  .007132  .0922486 .
      "Feb-91" 119 .0037425   .111814 .
      "Feb-91" 123 .0041721  .1069812 .
      "Feb-91" 124  .021932  .1650661 .
      "Feb-91" 125  .005186  .1086161 .
      "Feb-91" 126 .0051869  .1417216 .
      "Feb-91" 209 .0088042  .0898837 .
      "Feb-91" 212  .008878  .0804817 .
      "Feb-91" 213 .0033121  .1345911 .
      "Feb-91" 214 .0130733  .1257204 .
      "Feb-91" 215 .0079939  .1178345 .
      "Feb-91" 216 .0040007  .0999595 .
      "Feb-91" 217 .0068929  .1253842 .
      end
      format %tmCCYY!mNN Time

      Comment


      • #4
        My work was stopped with the first step of generate monthly numerical Time.

        I know the daily numerical Time code:
        by id, sort: gen Time=daily(date, "DMY")

        I changed the code to monthly:
        by id, sort: gen Time=monthly(date, "MY")
        It doesn't work.

        Comment


        • #5
          Well, you were on the right track when you switched to monthly. The problem is that the "MY" pattern is not suitable for your data. "MY" can only be used with four-digit years. You have only two-digit years, so Stata doesn't know which century your years belong to. You have to tell it.

          Code:
          gen time = monthly(date, "M19Y")
          format time %tm
          
          drop Time date
          
          egen group = xtile(volatility), nq(5) by(time)
          
          egen mean_return = mean(return), by(group time)
          Note: Official Stata's -egen- does not include an -xtile()- function. To get it, run -ssc install egenmore-.

          Comment


          • #6
            Dear Clyde,

            What a brilliant reply! It helps me a lot. I am extremely grateful for your help!

            But, at the last step, it looks we didn't tell stata that the mean return at "t" should match the group sorted at "t-1", are we?

            for example:
            if the id(1,4,5) was sorted in group 1 at the first month (t), I want to calculate the mean return using id(1, 4, 5)'s return at the following month (t+1).

            Many thanks!

            Comment


            • #7
              Ah, yes, you did ask for the return in the following month. So instead of that -egen curent_mean...- command, run:

              Code:
              rangestat (mean) next_mean_return = return, by(group) interval(time 1 1)
              Note: -rangestat- is not part of official Stata. It is written by Robert Picard, Nick Cox, and Roberto Ferrer, and is available from SSC. Evidently, there will be no results of this calculation for the final time period.

              Comment


              • #8
                Dear Clyde:

                How do you know it!!! sooooo brilliant!!!

                It works very well!

                many many big THX!!!

                Comment


                • #9
                  Dear Clyde:

                  actually, I still have a small question

                  Code:
                   gen time = monthly(date, "M19Y")

                  for this code: what if the date goes beyond 19xx? the time period from 1990-2015, when the time after 1999, the timeline changed to 1900, 1901, ...

                  does the stata know these date is following the year of 1999, or the stata recognize these years earlier than 1999?

                  Many thanks~

                  Comment


                  • #10
                    Code:
                    date id idiovol mut time
                    1991-01-01 1 .007651 .008095 
                    1991-01-01 2 .008257 .00502 
                    1991-01-01 3 .011612 -.110187 
                    1991-01-01 4 .007547 .018927 
                    1991-01-01 12 .008545 .00235 
                    1991-01-01 13 .008833 .028161 
                    1991-01-01 17 .010967 -.001818 
                    1991-01-01 25 .009339 .00644 
                    1991-01-01 26 .008871 -.015542 
                    1991-01-01 27 .011748 .020517 
                    1991-01-01 28 .009812 .004818 
                    1991-01-01 29 .010883 -.005227 
                    1991-01-01 30 .010098 -.004457 
                    1991-01-01 31 .014301 .011081
                    I changed the date type in Excel
                    then, run the code: gen time=monthly(date, "YM") OR gen time=monthly(date, "MY")

                    both doesn't work, why?

                    THX

                    Comment


                    • #11
                      Re #9. There is a way of handling this situation. You can specify a third argument in the monthly() function that defines the numerically largest year. So, on the assumption that you have no dates beyond, say, year 2020 in your data, you would use
                      Code:
                      gen time = monthly(date, "MY", 2020)
                      The effect of this would be that Feb-91 would be interpreted as February 1991, and Mar-10 would be interpreted as March 2010. If you said Apr-20 that would be April of 2020, but May-21 would be May of 1921 (because 2021 > 2020). As long as the span of years in your data is less than 100 years, this works out OK.

                      Re #10. This doesn't work because now the dates you are importing are full daily dates, not just month and year. The monthly() function will only handle strings that contain a month and year and nothing else. So to handle strings like 1991-01-01 you need:

                      Code:
                      gen time = daily(date, "YMD")
                      Note that since the year already says the century, Y, without a preceding 19 or 20, works just fine. Note also that I'm assuming that 1991-01-01 represents 1991-Jan-01, not 1991-01-Jan: if it's the other way around, using "YDM" instead of "YMD." When illustrating dates it is best to give examples where the day is outside the 1-12 range so there is no ambiguity. Had you used 1991-15-01 or 1991-01-15, there would have been no ambiguity which order you meant.

                      If you are going to be working with Stata on financial data, you will need to learn how Stata handles date and time variables. Run -help datetime- and then click on the link near the top to the PDF documentation. This chapter is a long read and it's complicated. There are several different types of date and time representations and there are many functions translating between them, and also translating from string representations of dates to Stata internal dates, and display formats as well. You won't remember everything. Fortunately, you probably won't need to use everything either: there will be things that come up often, others rarely, and many not at all in your work. But by reading the whole thing you will learn the general principles underlying dates and times in Stata, and you will be aware of what kinds of tools Stata offers you to work with them. You'll then be able to make effective use of the help files, or refer back to the PDF documentation, to fill in gaps that you don't remember or to refresh your memory of the details.

                      Finally, you might also want to install Nick Cox's -numdate- command, from SSC. You need to have an understanding of how Stata dates and times work to make use of it, but it takes a lot of the guesswork and memory-straining out of creating Stata internal format date variables from the various forms in which you will find them in other data sets.

                      How do you know it!!!
                      I've been using Stata on a daily basis since 1994. And it also helps that I've been programming computers, in a variety of languages, since 1962.

                      Comment


                      • #12
                        WOW! What a clear explanation! Program Master! Yes, I will follow your suggestions.

                        I just switched from Eviews to Stata, due to loop. STATA attracted me by its great wisdom, really like it~

                        Comment

                        Working...
                        X