Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Constructing weekly date when the weeks are determined

    Hi

    I have variables for the year, the quarter (taking values from 1 to 4), and the week (taking values from 1 to 12 within the quarter).

    I want to create a date variable with the format yearquarterweek (e.g., 2012q1w1) noting that q will be from 1 to 4 and w will be from 1 to 12 within the quarter.

    I also have a date variable daymonthyear (e.g., 12nov2015). I tried
    Code:
     gen weekly_date = wofd(date)
    format weekly_date %tw
    However, this will generate a weekly_date variable in which the weeks are taking values from 1 to 52 within the year (so not as I wanted) and also the weeks are state weeks which are different from the weeks I have.

    I look forward to getting some help

    Thanks

  • #2
    As often discussed here, weekly dates often arise that don't use Stata's idiosyncratic rules.

    When that is true Stata's own functions and formats for weeks should not be used because they will not work as desired.

    It seems that you have quite different and even more idiosyncratic rules if you have only 48 weeks in a year.

    What you want is, I guess, yielded by (and only yielded by) a value label. You don't give a data example, but here are some guesses

    Code:
    gen toshow = string(year) + "q" + string(quarter) + "w" + string(week) 
    egen date = group(year quarter week) 
    labmask date, values(toshow)
    where labmask is from the Stata Journal.

    Comment


    • #3
      Thanks Nick.

      Follow up questions:

      1- The "toshow" variable is identifical to "date" variable. The former is string (in red) whereas the latter seems to be capturing a label (in blue). Why do we need both if they are identical? I give an example of the dataset below after your code.


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(mydate year quarter week) str9 toshow float date
       4330 1971 4  6 "1971q4w6"   12
       4479 1972 2  1 "1972q2w1"   31
       4493 1972 2  3 "1972q2w3"   33
       4588 1972 3  4 "1972q3w4"   46
       4673 1972 4  3 "1972q4w3"   57
       4794 1973 1  7 "1973q1w7"   73
       4861 1973 2  4 "1973q2w4"   82
       4960 1973 3  4 "1973q3w4"   94
       5045 1973 4  4 "1973q4w4"  106
       5171 1974 1  8 "1974q1w8"  122
      end
      format %td mydate
      label values date date
      label def date 12 "1971q4w6", modify
      label def date 31 "1972q2w1", modify
      label def date 33 "1972q2w3", modify
      label def date 46 "1972q3w4", modify
      label def date 52 "1972q3w10", modify
      label def date 57 "1972q4w3", modify
      label def date 64 "1972q4w10", modify
      label def date 73 "1973q1w7", modify
      label def date 77 "1973q1w11", modify
      label def date 82 "1973q2w4", modify
      label def date 94 "1973q3w4", modify
      label def date 101 "1973q3w11", modify
      label def date 106 "1973q4w4", modify
      label def date 112 "1973q4w10", modify
      label def date 122 "1974q1w8", modify
      label def date 125 "1974q1w11", modify
      label def date 129 "1974q2w3", modify
      label def date 139 "1974q3w1", modify
      label def date 142 "1974q3w4", modify
      label def date 149 "1974q3w11", modify
      label def date 155 "1974q4w5", modify
      label def date 162 "1974q4w12", modify
      label def date 171 "1975q1w9", modify
      label def date 175 "1975q2w1", modify
      label def date 179 "1975q2w5", modify
      label def date 189 "1975q3w3", modify
      label def date 198 "1975q3w12", modify

      2- More importantly, after creating the weekly date, I would like to tsset my data weekly so that I can start running my weekly regressions.

      If I do the following:
      Code:
      duplicates drop gvkey date,force
      xtset gvkey date, weekly  
      sort gvkey date
      I get
      Code:
             panel variable:  gvkey (unbalanced)
              time variable:  date, 1960w2 to 2005w51, but with gaps
                      delta:  1 week
      The output suggests that there is something incorrect I am doing now as my year variable runs from 1971 to 2021 (which is not consistent with the output above) and I also do not have any weeks labeled beyond 12 in any quarter.

      Can you please let me know what would be the correct way to tsset my data in this case?

      Thanks

      Comment


      • #4
        toshow is string and date is numeric and they look the same, but are not the same, because you have applied value labels. That is what labmask does. The alternative is writing your own code to assign value labels, and you're welcome to do that if you prefer it. Otherwise put, the point of creating a string variable with values like "2001q1w1" is precisely to use its values as value labels. This is a work-around, although a practical one, for the limitation that users can't define display formats beyond Stata's syntax.

        xtset, weekly is just as much to be avoided as any Stata weekly function or format. Again, Stata knows only about its own kind of weekly dates. A glance at the help for xtset will show that with that option, Stata will make incorrect assumptions about your variable.

        The fact that gaps are perceived seems just to match your example data. If you have gaps in your dates you need a different recipe as with gaps the dates will be in the right order but will produce crazy graphs and analyses. Let's nod to Stata's conventions and declare week 1 in 1960 as 0.

        Then

        Code:
        gen mydate = 48 * (year - 1960) + 12 * (quarter - 1) + 4 * (week - 1)
        will produce up to 48 weeks in the year and your value labels can then be applied. Again, I am taking you at your word that in the scheme you are using there are 48 weeks in the year.

        A plain

        Code:
        xtset gvkey mydate 
        should be enough. The generic option might help or at least do no harm.

        Comment


        • #5
          The fact that gaps are perceived seems just to match your example data. If you have gaps in your dates you need a different recipe as with gaps the dates will be in the right order but will produce crazy graphs and analyses. Let's nod to Stata's conventions and declare week 1 in 1960 as 0.
          This may not be an issue in my setting. I clarify more. Although my data is a panel dataset, I will calculate an aggregate variable (based on the sum of firm-level data) within each week of the quarter, and then collapse the data to convert it into a time series dataset. After limiting the time series sample, it seems that I have no week gaps.

          Therefore, I think the approach I will use will be :

          Code:
          gen toshow = string(year) + "q" + string(quarter) + "w" + string(week)
          egen weekly_date = group(year quarter week)
          labmask weekly_date, values(toshow)
          xtset gvkey weekly_date
          sort gvkey weekly_date
          and then create the aggregates within each week

          Code:
          bys weekly_date : egen sum_assets= sum (atq)
          duplicates drop weekly_date sum_assets, force
          xtset weekly_date

          Do these make sense now?

          Comment


          • #6
            Having seen gaps in your data I now advise against group() here unless you know that every possible week is represented in the data.

            It is a point of style only but in #5 the first xtset is harmless but pointless, as combining within weeks doesn't depend on, or benefit from, it.

            Comment


            • #7
              I am following up on this now after proceeding with the analysis.

              It appears that although using the code in #5 resulted in a weekly_date variable that makes sense, I was not able to proceed further and create lags in variables based on this date variable.

              For example,
              Code:
               gen lag_profits=L.profits
              resulted in just missing observations. Is there a way to tell Stata that this weekly_date variable is my date variable?

              Thanks

              Comment


              • #8
                I don't understand what is different in #9. If you have just a date or time variable as identifier you must use tsset to declare such a variable to be able to use time series operators. If you have a panel identifier as well you may use either tsset or xtset. You have already used xtset so what is different? Perhaps you have not used tsset previously.

                Comment


                • #9
                  Thanks Nick.
                  I actually used
                  Code:
                  xtset gvkey weekly_date  
                  sort gvkey weekly_date
                  then

                  Code:
                   gen lag_profits=l.profits
                  (902,727 missing values generated)
                  Here is an example of the dataset
                  [CODE]
                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input float weekly_date long gvkey float(profits lag_profits)
                   12 1000   -.12 .
                   31 1000  -.681 .
                   33 1000    .34 .
                   46 1000   .498 .
                   57 1000   .418 .
                   73 1000   .298 .
                   82 1000   .374 .
                   94 1000   .586 .
                  106 1000   .187 .
                  122 1000   .716 .
                  129 1000   .331 .
                  142 1000   .677 .
                  end
                  label values weekly_date weekly_date
                  label def weekly_date 12 "1971q4w6", modify
                  label def weekly_date 31 "1972q2w1", modify
                  label def weekly_date 33 "1972q2w3", modify
                  label def weekly_date 46 "1972q3w4", modify
                  label def weekly_date 52 "1972q3w10", modify
                  label def weekly_date 57 "1972q4w3", modify
                  label def weekly_date 64 "1972q4w10", modify
                  label def weekly_date 73 "1973q1w7", modify
                  label def weekly_date 77 "1973q1w11", modify
                  label def weekly_date 82 "1973q2w4", modify
                  label def weekly_date 94 "1973q3w4", modify
                  label def weekly_date 101 "1973q3w11", modify
                  label def weekly_date 106 "1973q4w4", modify
                  label def weekly_date 112 "1973q4w10", modify
                  label def weekly_date 122 "1974q1w8", modify
                  label def weekly_date 125 "1974q1w11", modify
                  label def weekly_date 129 "1974q2w3", modify
                  label def weekly_date 139 "1974q3w1", modify
                  label def weekly_date 142 "1974q3w4", modify
                  label def weekly_date 149 "1974q3w11", modify
                  label def weekly_date 155 "1974q4w5", modify

                  As shown here, all lags have missing values as if Stata does not really recognize the date variable.

                  Can you please advise further? Thank you

                  Comment


                  • #10
                    Stata is following its own rules as advertised and acting consistently and in full knowledge of your declarations and data.

                    L.profits for say week 46 means the value of profits in week 45. It doesn't mean in the previous week in which there was an observation with a non-missing value unless that was week 45.

                    In your example data, no two observed values are in successive weeks, so all values for the lagged variable are returned as missing.

                    What did you expect to happen? If you want the previous value, that would be

                    Code:
                    bysort gvkey (weekly_date) : gen previous = profits[_n-1]
                    which will be missing whenever there is no non-missing value to use.

                    Comment


                    • #11
                      Thanks! I proceeded with the analysis and solved most of the related issues so far except for merging with the weekly_date variable.

                      When I tsset the final time-series datasets based on the weekly_date variable and then try to merge using:
                      Code:
                      use time_seriesA.dta,clear
                      merge 1:1 weekly_date using time_seriesB
                      ...the merge is incorrect. How can I deal with the new weekly_date variable to allow for this merge?

                      Thank you

                      Comment


                      • #12
                        I don't see that #11 contains any information that would allow an answer. Two datasets not exemplified; a key variable ditto; a report that results are "incorrect". Still, the diagnosis is don't do that then.

                        If you used the method based on egen, group() to produce separate weekly dates in two datasets, then you can't expect them to match. I did advise against that

                        Comment


                        • #13
                          Thanks!
                          Nick, I created the weekly_date as in #5 and checked that all weeks are represented in the data and that all weeks are correct. They are indeed correct.

                          an example of the first dataset is

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input float(aggregate_profit year quarter quarter_date weekly_date)
                             .01173329 1972 1 48   1
                            .006209938 1972 1 48   2
                           .0008214062 1972 1 48   3
                            .001736389 1972 1 48   4
                            .007947976 1972 1 48   5
                              .0078886 1972 1 48   6
                            .006354166 1972 1 48   7
                            .005530342 1972 1 48   8
                           .0041888705 1972 1 48   9
                           .0043823114 1972 1 48  10
                           .0033746874 1972 1 48  11
                            .003201738 1972 1 48  12
                            -.00959001 1972 2 49  13
                          -.0012494902 1972 2 49  14
                          -.0006478796 1972 2 49  15
                          -.0005963731 1972 2 49  16
                          -.0003174301 1972 2 49  17
                           -.000522325 1972 2 49  18
                           -.000665484 1972 2 49  19
                          -.0006802514 1972 2 49  20
                          -.0008191823 1972 2 49  21
                          -.0008013268 1972 2 49  22
                           -.000733789 1972 2 49  23
                          -.0007353995 1972 2 49  24
                            .004534091 1972 3 50  25
                            .005237845 1972 3 50  26
                           end
                          format %tq quarter_date
                          label values weekly_date weekly_date
                          label def weekly_date 1 "1972q1w1", modify
                          label def weekly_date 2 "1972q1w2", modify
                          label def weekly_date 3 "1972q1w3", modify
                          label def weekly_date 4 "1972q1w4", modify
                          label def weekly_date 5 "1972q1w5", modify
                          label def weekly_date 6 "1972q1w6", modify
                          label def weekly_date 7 "1972q1w7", modify
                          label def weekly_date 8 "1972q1w8", modify
                          label def weekly_date 9 "1972q1w9", modify
                          label def weekly_date 10 "1972q1w10", modify

                          The second dataset

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input float(aggregate_price year quarter quarter_date weekly_date)
                              .015798844 1972 1 48 577
                             .0016326577 1972 1 48 578
                             .0034234116 1972 1 48 579
                              .007362284 1972 1 48 580
                              .007783426 1972 1 48 581
                             .0009547513 1972 1 48 582
                              .007083584 1972 1 48 583
                              .015277357 1972 1 48 584
                               .02344308 1972 1 48 585
                             -.009369044 1972 1 48 586
                             -.011760525 1972 1 48 587
                              .006773893 1972 1 48 588
                               .02366462 1972 2 49 589
                               .00361514 1972 2 49 590
                             -.009439594 1972 2 49 591
                             -.012075196 1972 2 49 592
                             -.010611446 1972 2 49 593
                            -.0009462991 1972 2 49 594
                              .024126947 1972 2 49 595
                             .0043619615 1972 2 49 596
                             -.013910264 1972 2 49 597
                              .005875014 1972 2 49 598
                             .0021836597 1972 2 49 599
                             -.014861275 1972 2 49 600
                          end
                          format %tq quarter_date
                          label values weekly_date weekly_date
                          label def weekly_date 577 "1972q1w1", modify
                          label def weekly_date 578 "1972q1w2", modify
                          label def weekly_date 579 "1972q1w3", modify
                          label def weekly_date 580 "1972q1w4", modify
                          label def weekly_date 581 "1972q1w5", modify
                          label def weekly_date 582 "1972q1w6", modify
                          label def weekly_date 583 "1972q1w7", modify
                          label def weekly_date 584 "1972q1w8", modify
                          label def weekly_date 585 "1972q1w9", modify
                          label def weekly_date 586 "1972q1w10", modify
                          label def weekly_date 587 "1972q1w11", modify
                          label def weekly_date 588 "1972q1w12", modify
                          label def weekly_date 589 "1972q2w1", modify
                          label def weekly_date 590 "1972q2w2", modify
                          label def weekly_date 591 "1972q2w3", modify
                          Given that weekly_date did not work with -merge- and that I have both quarter_date and week, I worked around this issue now and did the following:

                          Code:
                          merge 1:1 quarter_date week using time_seriesB.dta
                          This led to a perfect match and also the weekly_data variable picks the correct dates in the final dataset.

                          Does this sound right to you now?

                          Thanks

                          Comment


                          • #14
                            You're using a scheme in dataset 1 whereby 1972q1w1 is weekly_date 1 and a scheme in dataset 2 whereby 1972q1w1 is weekly_date 577. So, different rules and so a merge on weekly_date will indeed be nonsensical.

                            The scheme I suggested in #4 was
                            48 * (year - 1960) + 12 * (quarter - 1) + 4 * (week - 1) which leads to 576 for 1972q1w1. If you solved your problem otherwise, that's good.

                            Comment

                            Working...
                            X