Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Undisplayed dates: weekends, public holidays, and other reasons

    Hi

    I have a dataset that includes daily observations and a daily date variable (DATE). Some dates are not displayed (e.g., weekends, some public holidays, days when there are no available daily observations etc.). The date variable is of the format "03jan1972".
    For example, In January 2009, I have 03jan1972, 04jan1972, 05jan1972, 06jan1972, 07jan1972, 10jan1972. There is no 08jan1972 or 08jan1972 in my dataset, as these were weekends. The same thing also happens on some holidays and when there are no available daily observations.

    I have two questions:

    1- How can I add the undisplayed days to my dataset as per the calendar? I understand that these dates will have missing observations but I need to include all dates. Does Stata know whether the year is a leap year and does it also know the calendar so that it adds the correct dates?

    2- If Stata knows the calendar, can I also add the undisplayed days to my dataset under all circumstances except for weekends?

    I hope I get some assistance with these questions. Thank you

    Lisa

  • #2
    So, first the variable date has to be a proper Stata internal format date variable. As you have chosen to describe your data, rather than show example data using -dataex-, I have no way to know if this is the case for your data or not. Suffice it to say that if it isn't a proper Stata internal format date variable, you will have to first convert it to one.

    Next it matters whether your data set is a single time series or panel data. If it is panel data, you need to decide whether you want the start and end dates for every panel to be the same (i.e. the earliest date in any panel to the latest date in any panel) or whether each panel just fills in its own internal gaps. For simplicity, I will assume your data is a single time series.

    Code:
    tsset DATE
    tsfill
    In the future, when asking for help with code, show example data. To show example data helpfully, use the -dataex- command. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      Thanks Clyde. The date is a Stata date, created using this code:
      Code:
      tostring rdq, replace format(%8.0f) // rdp is a reporting date
      gen DATE = daily(rdq, "YMD")
      assert missing(DATE) == missing(rdq)
      format DATE %td
      However, the data is a panel dataset of firms and dates. I want only one complete calendar date as per my original request for the entire panel dataset as the data will later be converted into a time series dataset (after aggregating the cross-sectional observations in a subsequent step)




      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float DATE long gvkey float(year quarter month)
       4479 1000 1972 2 1
       4493 1000 1972 2 1
       4588 1000 1972 3 1
       4673 1000 1972 4 1
       4794 1000 1973 1 2
       4861 1000 1973 2 1
       4960 1000 1973 3 1
       5045 1000 1973 4 1
       5171 1000 1974 1 2
       5220 1000 1974 2 1
       5319 1000 1974 3 1
       5421 1000 1974 4 2
       5540 1000 1975 1 3
       5600 1000 1975 2 2
       5675 1000 1975 3 1
       5760 1000 1975 4 1
       5907 1000 1976 1 3
       5954 1000 1976 2 1
       6039 1000 1976 3 1
       6135 1000 1976 4 1
       6269 1000 1977 1 3
       6332 1000 1977 2 2
       6411 1000 1977 3 1
       6521 1000 1977 4 2
       6653 1000 1978 1 3
       6713 1000 1978 2 2
       8826 1001 1984 1 3
       8874 1001 1984 2 1
       8970 1001 1984 3 1
       9043 1001 1984 4 1
       9194 1001 1985 1 3
       9251 1001 1985 2 1
       9330 1001 1985 3 1
       9406 1001 1985 4 1
       9580 1001 1986 1 3
       8994 1003 1984 3 2
       9086 1003 1984 4 2
       9224 1003 1985 2 1
       9665 1003 1986 2 3
       9756 1003 1986 3 3
       9846 1003 1986 4 3
       9974 1003 1987 2 1
      10028 1003 1987 2 3
      10120 1003 1987 3 3
      10211 1003 1987 4 3
      10350 1003 1988 2 2
      10393 1003 1988 2 3
      10484 1003 1988 3 3
      10576 1003 1988 4 3
       4640 1004 1972 3 3
       4731 1004 1972 4 3
       4827 1004 1973 1 3
       4952 1004 1973 3 1
       5010 1004 1973 3 3
       5093 1004 1973 4 3
       5192 1004 1974 1 3
       5300 1004 1974 3 1
       5374 1004 1974 3 3
       5473 1004 1974 4 3
       5570 1004 1975 2 1
       5677 1004 1975 3 1
       5744 1004 1975 3 3
       5835 1004 1975 4 3
       5927 1004 1976 1 3
       6040 1004 1976 3 1
       6124 1004 1976 4 1
       6221 1004 1977 1 1
       6306 1004 1977 2 1
       6410 1004 1977 3 1
       6494 1004 1977 4 1
       6590 1004 1978 1 1
       6669 1004 1978 2 1
       6754 1004 1978 2 3
       6844 1004 1978 3 3
       6928 1004 1978 4 3
       7026 1004 1979 1 3
       7136 1004 1979 3 1
       7209 1004 1979 3 3
       7293 1004 1979 4 3
       7388 1004 1980 1 3
       7500 1004 1980 3 1
       7574 1004 1980 3 3
       7661 1004 1980 4 3
       7753 1004 1981 1 3
       7871 1004 1981 3 1
       7942 1004 1981 3 3
       8034 1004 1981 4 3
       8123 1004 1982 1 3
       8238 1004 1982 3 1
       8307 1004 1982 3 3
       8392 1004 1982 4 3
       8484 1004 1983 1 3
       8609 1004 1983 3 1
       8671 1004 1983 3 3
       8755 1004 1983 4 3
       8853 1004 1984 1 3
       8973 1004 1984 3 1
       9035 1004 1984 3 3
       9117 1004 1984 4 3
       9216 1004 1985 1 3
      end
      format %td DATE

      I hope to get further help now based on this information. Thanks

      Comment


      • #4
        Hi again,

        I read more about --tsfill-- and the full option but I do not think it achieves my aim. I clarify more below.

        My dataset has daily dates (with gaps when the dates are not included) and firms. On each date, there are a few firms announcing profits. If no firms announce profits on a certain date, the date does not show up in the dataset. My aim is to show all calendar dates even if there are no observations (i.e., even if no firms announce profit). Put differently, I want to guarantee that all dates are represented in my data. If a firm announces profit on a certain date, there will be observations. If no firms announce profit (i.e, there was no date/observations in my original dataset), I want to have the calendar date with missing observations.

        I hope this makes it clear and that I get some assistance
        Last edited by Lisa Wilson; 03 Jun 2022, 03:11.

        Comment


        • #5
          Hi

          I found this great Stata blog https://blog.stata.com/2016/02/04/ha...ess-calendars/

          I actually want to do the same thing and create the calendar business days. The obstacles are;

          1- I have a panel dataset.
          2- When I try to use the part of the code to create the business calendar, I get:

          Code:
          . version 16.0
          
          . purpose "Converting daily financial data into business calendar dates"
          command purpose is unrecognized
          r(199);
          
          . dateformat dmy
          command dateformat is unrecognized
          r(199);
          
          . range 05feb1971 23mar2015
          invalid syntax
          r(198);
          
          . centerdate 05feb1971
          command centerdate is unrecognized
          r(199);
          
          . omit dayofweek (Sa Su)
          command omit is unrecognized
          r(199);
          I use Stata 16.

          Hope someone can help.

          Comment


          • #6
            The big idea in the blog entry is to create instructions in an .stbcal file and then tell bcal what they are.

            The instructions in that file are notregularStata commands, which is what you're guessing. So, you need to enter them in a text editor (Stata's doedit will work fine, or use your own favourite) and save them in such a file.

            I would first make sure that you can reproduce the blog example; and then convert to your own set-up.

            Comment


            • #7
              Re #4, I may be misunderstanding what you need, but it sounds to me like business calendars are not actually it: their whole purpose is to omit weekends, holidays, etc. You said you want an entry for every date, even if nothing happens. Given that it is panel data and you want a complete set of dates for every gvkey, I think what you are looking for is:

              Code:
              xtset gvkey DATE
              tsfill, full
              With the example data you show in #3 it produces a new data set with the same four gvkey's as the original, but with each gvkey having an observation for every date (all 6,098 of them) between 6 April 1972 and 15 Dec 1988.

              Comment


              • #8
                Originally posted by Clyde Schechter View Post
                Re #4, I may be misunderstanding what you need, but it sounds to me like business calendars are not actually it: their whole purpose is to omit weekends, holidays, etc. You said you want an entry for every date, even if nothing happens. Given that it is panel data and you want a complete set of dates for every gvkey, I think what you are looking for is:

                Code:
                xtset gvkey DATE
                tsfill, full
                With the example data you show in #3 it produces a new data set with the same four gvkey's as the original, but with each gvkey having an observation for every date (all 6,098 of them) between 6 April 1972 and 15 Dec 1988.
                I may need to clarify something here. Firms (i.e., gvkey) can come in and out of the dataset in different times. I must say that I do not want to have complete dates for each gvkey. What I want is a complete date for the entire sample and not for each gvkey.

                For example: for the month of January year 1972, I have the dates 03jan1972, 04jan1972, 05jan1972, 06jan1972, 07jan1972, 10jan1972. There is no 01jan1972, 02jan1972, 08jan1972 and 09jan1972 because they are weekends. I simply want to have them in my dataset with missing obs for gvkey. This means that no firms have announced profits in these days.

                Therefore, I want to have full daily calendar dates from 1 January 1971 to 31 December 2021 regardless of whether there are gvkeys or not. Can I simply create the dates in another dataset and then merge with my panel data? If so, how? of there is a more efficient way to do this?!



                Comment


                • #9
                  I wish I had understood that from the start. Yes, you can create a data set of dates and then -merge- in your data set, as follows:

                  Code:
                  clear
                  
                  local first_date = td(1jan1971)
                  local last_date = td(31dec2021)
                  
                  set obs `=`last_date' - `first_date' + 1'
                  
                  gen DATE = `first_date' + _n - 1
                  format DATE %td
                  
                  merge 1:m DATE using the_dataset
                  I don't know of any more efficient approach.

                  Comment

                  Working...
                  X