Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • quarter, month, week variables representing their sequence

    Hi all

    I have date variable of the format

    ​​​​​​----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long rdq
    19810216
    19810220
    19810225
    19810303
    19810304
    19810303
    19810312
    19810311
    19810318
    19810317
    19810317
    19810320
    19810317
    19810316
    19810317
    19810318
    19810318
    19810324
    19810326
    19810325
    19810325
    19810323
    19810401
    19810409
    19810410
    19810409
    19810416
    19810416
    19810417
    19810413
    19810415
    19810417
    19810416
    19810415
    19810417
    19810417
    19810416
    19810415
    19810414
    19810417
    19810417
    19810417
    19810414
    19810415
    19810417
    19810413
    19810416
    19810414
    19810416
    19810416
    19810416
    19810417
    19810414
    19810414
    19810416
    19810423
    19810423
    19810421
    19810423
    19810422
    19810421
    19810422
    19810423
    19810422
    19810424
    19810424
    19810422
    19810423
    19810424
    19810423
    19810424
    19810423
    19810424
    19810423
    19810422
    19810421
    19810423
    19810424
    19810422
    19810423
    19810422
    19810422
    19810425
    19810421
    19810421
    19810422
    19810422
    19810424
    19810420
    19810422
    19810420
    19810424
    19810427
    19810427
    19810428
    19810430
    19810428
    19810428
    19810429
    19810428
    end
    ------------------ copy up to and including the previous line ------------------

    I want to create three variables:
    year: this should be the year variable (e.g. 1980, 1981...)
    quarter: this should take values from 1 to 4 consistent with the quarter of the year
    month: this should take values 1 to 3 consistent with the months of the quarter
    week: this should take values 1 to 4 consistent with the weeks of the month

    I also want to clarify that I imported the data directly from excel and did not set the rdq variable as a date variable in Stata as I am not sure if this will be required before I create the quarter-week variables?

    Thanks



  • #2
    Code:
    tostring rdq, replace format(%8.0f)
    gen date = daily(rdq, "YMD")
    assert missing(date) == missing(rdq)
    format date %td
    gen year = yofd(date)
    gen qofy = quarter(date)
    gen month = month(date)
    gen mofq = mod(month, 3)
    replace mofq = 3 if mofq == 0
    The variables year, qofy (i.e. quarter of year), and mofq (i.e. month of quarter) are what you requested as quarter and month.

    I have not dealt with week because weeks are a real mess. First, you need to be clear how you even want to define it. Once you do that, it will probably be a huge mess to code it. Here are the pitfalls:

    1. Numbering weeks of the month 1 through 4 is a problem because 4 weeks only cover 28 days, so all months but February are longer than 4 weeks.
    2. In the US at least, the convention is that weeks begin on Sunday. But if the first day of the month falls on, say a Wednesday, does the first week consist only of days 1 through 4, or is the first week the first 7 days of the month? That might be rephrased as: by "week" do you mean a week on a calendar, or do you mean a period of 7 days starting at the beginning of the month. And then, of course, there is the question of what to do about dates that fall after the end of the fourth week of the month.

    also want to clarify that I imported the data directly from excel and did not set the rdq variable as a date variable in Stata as I am not sure if this will be required before I create the quarter-week variables?
    Well, probably there is some long, opaque, convoluted way to get these results without first going through a Stata date variable, but I wouldn't want to go there. In fact, you should take it as an axiom that there are very few useful things that you can do with dates in Stata without having the information as a Stata internal format date variable. Strings and long integers that look like dates to the human eye should almost always be converted to Stata internal format date variables if you plan to use them in any way at all. (I said "almost always" to try to avoid being dogmatic, but, honestly, I can't even think of a single exception.)

    Comment


    • #3
      Thank you so much Clyde.
      Regarding the week variable, I see the issue and thanks for highlighting this very clearly.

      Let us have this week variable as a stylized week variable such that week 1 represents the first 7 days of a given month (days 1 to 7 of a given month), week 2 the next 7 days (days 8 to 14 of a given month), and so on. In line with this, the days that fall after week 4 will then be coded as of week 4.

      Will appreciate if you can adjust your code to do that. Thanks a lot

      Comment


      • #4
        A search for mentions of "week" here yields hundreds of hits, many just incidental mentions. But for example

        https://www.statalist.org/forums/for...om-weekly-data

        includes posts with more detail, including references to discussions of handling weeks published in the Stata Journal.

        #3 however is a definition that it is not discussed anywhere that I can remember, but given daily dates it yields immediately to standard functions.

        Code:
        . clear
        
        . set obs 20
        number of observations (_N) was 0, now 20
        
        . gen ddate = mdy(12, 31, 2019) + runiformint(1, 366)
        
        . format ddate %td
        
        . gen kraftweek = min(4, ceil(day(ddate)/7))
        
        . sort ddate
        
        
        . gen month = month(ddate)
        
        
        . list ddate kraftweek, sepby(month)
        
             +----------------------+
             |     ddate   kraftw~k |
             |----------------------|
          1. | 15jan2020          3 |
          2. | 24jan2020          4 |
             |----------------------|
          3. | 04feb2020          1 |
          4. | 06feb2020          1 |
             |----------------------|
          5. | 02mar2020          1 |
          6. | 10mar2020          2 |
             |----------------------|
          7. | 14apr2020          2 |
             |----------------------|
          8. | 16may2020          3 |
             |----------------------|
          9. | 14jun2020          2 |
         10. | 22jun2020          4 |
         11. | 27jun2020          4 |
         12. | 28jun2020          4 |
             |----------------------|
         13. | 08jul2020          2 |
         14. | 19jul2020          3 |
             |----------------------|
         15. | 20aug2020          3 |
             |----------------------|
         16. | 11sep2020          2 |
             |----------------------|
         17. | 11oct2020          2 |
         18. | 26oct2020          4 |
             |----------------------|
         19. | 17dec2020          3 |
        See https://www.stata-journal.com/articl...article=dm0058 for one personal selection of the functions every seasoned user of Stata should know about.

        Getting your existing daily date variable into a Stata daily date variable has already been nicely covered by Clyde Schechter, and in any case you can go directly to

        Code:
        help datetime 
        Last edited by Nick Cox; 24 Sep 2020, 01:26.

        Comment


        • #5
          Thanks Clyde and Nick.
          I did that and it worked:

          Code:
          tostring rdq, replace format(%8.0f)
          gen date = daily(rdq, "YMD")
          assert missing(date) == missing(rdq)
          format date %td
          
          gen year = yofd(date)
          gen quarter = quarter(date)
          gen mt = month(date)
          gen month = mod(mt, 3)
          replace month = 3 if month == 0
          
          gen week = min(4, ceil(day(date)/7))
          I will also need to create a date variable that represents the new weekly date and tsset my date weekly (by id and this new weekly date). How can I create a new date that represents that?



          Comment


          • #6
            Manifestly your new weekly date variable cycles repeatedly, at least in principle, around 1 to 4 -- but the scope to use it in association with tsset or xtset appears to be zero:

            1. If you have more than one observation for a given week number -- even within the same year, quarter, month triple -- the set will fail.

            2. Conversely if you have at most one observation for a given week number -- within the same year, quarter, month triple -- then the only way for the set to work is that each distinct monthly date defines a single panel. So you could have many panels each up to 4 observations long. It's unlikely that that fits any idea about data generation process, but that's your side.

            Turning this around, there is a good reason why people haven't, in my recollection, hitherto asked for this kind of weekly variable in Stata. They really aren't any use for analysis.

            Comment


            • #7
              Thank you Nick.

              Perhaps I do not need to tsset the data to achieve my objective (explained below). I will clarify this objective now and also ask for your help in relation to the point that I can not actually do.

              I have panel data including firm, industry, and date. My aim is to create time-series data. I explain below.

              I have a variable (profit) and a variable (expense) that represent the profit and expense of each firm within each industry. I need to create different time series for each variable for each industry.
              I will focus on profit when I explain below as I also need time series fo expenses exactly the same way as for profit.
              The time series for the industry profit is defined as rolling weekly median profits for each industry within the quarter. For example, suppose that we have some firms in industry number 1 that reported profits in week 1 of the quarter. Then in weeks 2 of the quarter, more firms in the same industry reported profits and so on until the end of the quarter.

              To convert my panel data into profit time-series variables, I need to define different time-series profit variables for the different industries I have such that profit1 (which is the time-series profit for industry 1 for example) week 1's value is based on the median of firms' profits in this industry in week 1, week 2's value is based on the median firms' profit of this industry in weeks 1 and 2, then week 3's value is based on the median firms' profit of this industry in weeks 1, 2, and 3. and so on until the end of each quarter of the year.

              This will give me median profits at each week of the quarter for each given industry (i.e. the median profit is updated when more profits in the industry are reporting profits)

              After that, I will create different datasets (I call them series) that represent median profit as at each week of the quarter. The aim here is that each "series" represents the median profit of the industry as at each week of the quarter. My analysis will be later conditioned on these series.

              I show you how I proceeded and where I am stuck below:

              Code:
              gen real_month = 3*(quarter-1) + month
              gen quarter_date = qofd(mdy(real_month, 1, year))
              format quarter_date %tq
              gen series = 4*(month-1) + week
              
              
              ***Creating the time series data as explained in my post:
              
              // Here my data will be time series as I explained in my post
              
              // ??? stuck
              
              
              xtset series quarter_date, quarterly
              At this point, I will not have the firms but rather variables that represent the time series profit of each industry and variables that represent the time series expenses of each industry.
              After the data becomes a time series data I can then run time-series regressions such that

              Code:
              regress profit1 profit2 profit3 expense1 expense2 if series==1
              
              regress profit1 profit2 profit3 expense1 expense2 if series==12

              I also show you the data structure here which includes the firm, industry, profit, expense, etc.

              ​​​​​​
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input long firm byte industry long profit double expense float(date year quarter month week)
                5709 30   354240   385588 7717 1981 1 2 3
               10618 20    77657   237728 7721 1981 1 2 3
                2497 20    51141   131180 7726 1981 1 2 4
                8577 55    99506   218943 7732 1981 1 3 1
                7922 25   130866   296792 7733 1981 1 3 1
                6829 25   205798   301728 7732 1981 1 3 1
                1410 20    63145    67823 7741 1981 1 3 2
               11259 30   554184   592345 7740 1981 1 3 2
                4598 20   152512   487281 7747 1981 1 3 3
                4990 25   125319   120380 7746 1981 1 3 3
                5667 10    93135   142273 7746 1981 1 3 3
                7116 25   205362   550493 7749 1981 1 3 3
                7138 60    57686    69358 7746 1981 1 3 3
                9538 30    90739   150646 7745 1981 1 3 3
                6307 25  4441594  6102461 7746 1981 1 3 3
                9818 25  1274557  3834146 7747 1981 1 3 3
                3813 25  1428899  2155234 7747 1981 1 3 3
                5018 20   497461  1747335 7753 1981 1 3 4
                1913 15   152901   346410 7755 1981 1 3 4
                9906 55   120843  1027906 7754 1981 1 3 4
                8579 25    40898    72719 7754 1981 1 3 4
                9599 45     3545    14210 7752 1981 1 3 4
                8151 10   349232   759373 7761 1981 2 1 1
                2817 20  2190700  6408098 7769 1981 2 1 2
               10984 50   534012  4335660 7770 1981 2 1 2
               11264 30   489653   489018 7769 1981 2 1 2
               10507 55   555374  6733430 7776 1981 2 1 3
                1045 20   952544  3393143 7776 1981 2 1 3
                6178 55    87216  1071248 7777 1981 2 1 3
                1161 45    80318   224694 7773 1981 2 1 2
                3036 50    81209   753573 7775 1981 2 1 3
                5742 55   609402  4621656 7777 1981 2 1 3
                4029 55   554203  6611035 7776 1981 2 1 3
                3980 50   235066  1404628 7775 1981 2 1 3
                9846 55   908514  7973305 7777 1981 2 1 3
               14912 55   908514  7973305 7777 1981 2 1 3
                7366 55   588403  7490086 7776 1981 2 1 3
               10601 55   108232  1734804 7775 1981 2 1 3
                4798 55   604031  5647645 7774 1981 2 1 2
                5237 20    14835    39054 7777 1981 2 1 3
                5256 20   182286   399121 7777 1981 2 1 3
                7938 10   125411   874685 7777 1981 2 1 3
                4517 55   604031  5647645 7774 1981 2 1 2
                7985 20   435200  1216500 7775 1981 2 1 3
                7993 55   111131   331065 7777 1981 2 1 3
                8215 15   963700  3082901 7773 1981 2 1 2
                9828 55   176961  1820040 7776 1981 2 1 3
                8272 55   775281  2370867 7774 1981 2 1 2
                4988 50   304738  1203203 7776 1981 2 1 3
               10867 20  1511519  6194297 7776 1981 2 1 3
               11161 45     4716    12608 7776 1981 2 1 3
               11185 20    30813    43366 7777 1981 2 1 3
               11555 55   140134   708877 7774 1981 2 1 2
               65095 55   140134   708877 7774 1981 2 1 2
               66591 55   387850  2247094 7776 1981 2 1 3
               12612 55   238925  1755980 7783 1981 2 1 4
               10860 55   240450  3637488 7783 1981 2 1 4
                1440 55  1099957 11100703 7781 1981 2 1 3
                3814 55   238925  1755980 7783 1981 2 1 4
                7437 55    95374   926315 7782 1981 2 1 4
                1209 15   413030  1840308 7781 1981 2 1 3
                1743 20    90622   169110 7782 1981 2 1 4
                2137 50  1406957  9754297 7783 1981 2 1 4
                3170 30  1326361  2724287 7782 1981 2 1 4
                3439 55   822618  6230133 7784 1981 2 1 4
               13948 55   822618  6230133 7784 1981 2 1 4
                7241 35   469909   931652 7782 1981 2 1 4
               12564 55   229625  3160787 7783 1981 2 1 4
                3851 20   881574  2197448 7784 1981 2 1 4
                4091 20    43288   147276 7783 1981 2 1 4
                4093 55   498256  6258867 7784 1981 2 1 4
                2783 55   349428  4323305 7783 1981 2 1 4
                4331 55    22149   224197 7784 1981 2 1 4
                8099 55   308837  4132601 7783 1981 2 1 4
                5903 10  1512000  5488000 7782 1981 2 1 4
                7260 50   108026   300411 7781 1981 2 1 3
                6867 55    72538   588678 7783 1981 2 1 4
                7017 10  2281727  5149828 7784 1981 2 1 4
                7585 45   858897  2163413 7782 1981 2 1 4
                7875 25    37117    92834 7783 1981 2 1 4
                8810 55   982938  6795473 7782 1981 2 1 4
               12749 55   982938  6795473 7782 1981 2 1 4
                8762 30  2949001  6848078 7785 1981 2 1 4
                1075 55   191582  3022642 7781 1981 2 1 3
                1742 55   191582  3022642 7781 1981 2 1 3
                8455 55   314529  4523297 7782 1981 2 1 4
              145348 55   314529  4523297 7782 1981 2 1 4
               23465 55   149435  1075751 7784 1981 2 1 4
                9698 20    17389    48026 7780 1981 2 1 3
                9324 55     5764   119162 7782 1981 2 1 4
               11012 15   100253   185487 7780 1981 2 1 3
               31596 55   918607 11317000 7784 1981 2 1 4
                1104 20     9966    26454 7787 1981 2 1 4
               15448 55   185578  2993933 7787 1981 2 1 4
                3580 20   374885  1019757 7788 1981 2 1 4
                2991 10 11352000 22934004 7790 1981 2 1 4
                3897 55   497396  5917039 7788 1981 2 1 4
               65089 55   497396  5917035 7788 1981 2 1 4
                3413 55  1221300  7394852 7789 1981 2 1 4
                4241 55    54400   729573 7788 1981 2 1 4
              end
              format %td date

              I do appreciate your help so much.

              Comment


              • #8
                Thanks for this, but I think you'll get better advice now from people who use this kind of data. The advice you need seems to be more on what modelling makes sense. I don't see here any specific question on Stata programming.

                Comment


                • #9
                  Thanks a lot for your reply. It might be that I did not explain it well. I will summarize it here as it is likely be resolved using "asrol" in state?!

                  The variable (profit) is form specific and I need to create a profit variable for each industry. The industry-specific profit variable is measured as a rolling median from week 1 until the last week of the quarter. This should give me as many profit time-series variables as the number of industries.

                  How can I create these profit time-series variables? I hope I can get some help with this now. Thanks again.

                  Comment


                  • #10
                    Working from the variables you show in #7, the time variable you want to create is:

                    Code:
                    gen series = week + 4*(month-1) + 12*(quarter-1)
                    Now, you will not be able to -xtset industry series- because you have multiple firms with the same value of series within industry. But after you create the running medians by industry, you can just keep one observation per industry-series group, and then you will be good to go with -xtset industry series-.

                    Comment


                    • #11
                      Thanks Clyde
                      I have created the series as in #7 to give me the 12 series representing weeks within the quarter. This was actually based on advice you gave in an older post.
                      I also used rangesatat to proceed with my problem which seems to be working when the stats are calculated for all firms in the cross section and not within industry. Therefore, I will create a new post for rangestat to be more specific.

                      Thanks

                      Comment

                      Working...
                      X