Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Difference between the mean of X during the last and first weeks of a month

    Greetings,

    I have a specific index for 365 pairs of countries between 1 January 2020 and 30 September 2022 (daily data). Follows an example of my data:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int date float day byte month int year str3(iso3_o iso3_d) float country_pair double index
    21915  1 1 2020 "AUS" "ARE" 1 12.979732513427734
    21916  2 1 2020 "AUS" "ARE" 1 13.065977096557617
    21917  3 1 2020 "AUS" "ARE" 1 11.556705474853516
    21918  4 1 2020 "AUS" "ARE" 1 11.211729049682617
    21919  5 1 2020 "AUS" "ARE" 1 12.462267875671387
    21920  6 1 2020 "AUS" "ARE" 1 12.850366592407227
    21921  7 1 2020 "AUS" "ARE" 1 14.833980560302734
    21922  8 1 2020 "AUS" "ARE" 1 15.825786590576172
    21923  9 1 2020 "AUS" "ARE" 1 15.955153465270996
    21924 10 1 2020 "AUS" "ARE" 1   13.4109525680542
    21925 11 1 2020 "AUS" "ARE" 1 16.041397094726563
    21926 12 1 2020 "AUS" "ARE" 1 16.429494857788086
    21927 13 1 2020 "AUS" "ARE" 1 15.135834693908691
    21928 14 1 2020 "AUS" "ARE" 1 15.437688827514648
    21929 15 1 2020 "AUS" "ARE" 1 14.489004135131836
    21930 16 1 2020 "AUS" "ARE" 1 13.324708938598633
    21931 17 1 2020 "AUS" "ARE" 1 12.332901954650879
    21932 18 1 2020 "AUS" "ARE" 1 12.462267875671387
    21933 19 1 2020 "AUS" "ARE" 1 13.540319442749023
    21934 20 1 2020 "AUS" "ARE" 1 12.807244300842285
    21935 21 1 2020 "AUS" "ARE" 1 11.427339553833008
    21936 22 1 2020 "AUS" "ARE" 1 10.737386703491211
    21937 23 1 2020 "AUS" "ARE" 1 10.651143074035645
    21938 24 1 2020 "AUS" "ARE" 1 10.176800727844238
    21939 25 1 2020 "AUS" "ARE" 1  9.702458381652832
    21940 26 1 2020 "AUS" "ARE" 1  10.69426441192627
    21941 27 1 2020 "AUS" "ARE" 1  10.69426441192627
    21942 28 1 2020 "AUS" "ARE" 1 10.478654861450195
    21943 29 1 2020 "AUS" "ARE" 1  9.745579719543457
    21944 30 1 2020 "AUS" "ARE" 1 10.004312515258789
    21945 31 1 2020 "AUS" "ARE" 1 10.090556144714355
    21946  1 2 2020 "AUS" "ARE" 1 10.263044357299805
    21947  2 2 2020 "AUS" "ARE" 1  9.874945640563965
    21948  3 2 2020 "AUS" "ARE" 1  11.12548542022705
    end
    format %tdnn/dd/CCYY date
    What I would like to do is to compute the difference between a) the average index during the last week of a month and b) the average index during the first week of a month.
    The following code gives me almost what I want:
    Code:
    bysort country_pair month year (date) : gen YYY = (index[1]+index[2]+index[3]+index[4]+index[5]+index[6]+index[7])/7
    bysort country_pair month year (date) : gen ZZZ = (index[_N]+index[_N-1]+index[_N-2]+index[_N-3]+index[_N-4]+index[_N-5]+index[_N-6])/7
    gen XXX= ZZZ-YYY
    The problem is that the word “almost” makes all the difference here, and with this code YYY and ZZZ are missing when there are observations missing in index. I would like to compute the average index, even if there are days with missing data in the first and last weeks. Therefore, for a certain month, there would be only a missing value in my final variable if data for the entire first/last week is missing.
    Could someone help me with this?
    Best regards.

  • #2
    Code:
    gen mdate = mofd(date)
    format mdate %tm
    by iso3_o iso3_d mdate, sort: egen first_week_mean = ///
        mean(cond(inrange(date-firstdayofmonth(date), 0, 6), index, .))
    by iso3_o iso3_d mdate: egen last_week_mean = ///
        mean(cond(inrange(lastdayofmonth(date)-date, 0, 6), index, .))
    gen difference = last_week_mean - first_week_mean
    Added: Actually, your code in #1 may even be farther from correct than you have described. It is, as you note, giving missing values if the value of the index is missing in some of the relevant days. But if, on top of that, there are dates that are completely absent from the data set (i.e. there is no observation corresponding to that date for that country pair), then the references to index[...] will incorrectly include other dates that do not even belong to the first or last week.

    The code presented here is robust to both problems.
    Last edited by Clyde Schechter; 04 Mar 2023, 11:45.

    Comment


    • #3
      Great, thank you very much.
      I did not know these two time functions. However, I get the error unknown function with both of them (I used exactly the code you suggested). Any suggestion?

      Regarding the other limitation of my previous code, I was aware of that. I did not highlight that because it is not the case with my data.
      Best regards.

      Comment


      • #4
        Which version of Stata are you using? It is possible that you need to upgrade or use other code. See the FAQ Advice to declare the version of Stata you are using if it is not the latest.

        Comment


        • #5
          These function were introduced in Stata 17. For some versions before that, here are two of various ways to get the first and last days of the current month.

          The first day is easy once you've pulled out the current month and the current year.

          The last day is trickier as we don't want to write code for whether there are 28 29 30 31 days in the current month. But the last day is one before the first day of the next month.


          Code:
          clear 
          set obs 14
          gen date = mdy(1, 24, 2020) + _n
          
          gen firstdayofmonth = mdy(month(date), 1, year(date))
          gen lastdayofmonth = dofm(mofd(date) + 1) - 1 
          
          format date first last %td 
          
          list, sep(7)
          
          
               +-----------------------------------+
               |      date   firstda~h   lastday~h |
               |-----------------------------------|
            1. | 25jan2020   01jan2020   31jan2020 |
            2. | 26jan2020   01jan2020   31jan2020 |
            3. | 27jan2020   01jan2020   31jan2020 |
            4. | 28jan2020   01jan2020   31jan2020 |
            5. | 29jan2020   01jan2020   31jan2020 |
            6. | 30jan2020   01jan2020   31jan2020 |
            7. | 31jan2020   01jan2020   31jan2020 |
               |-----------------------------------|
            8. | 01feb2020   01feb2020   29feb2020 |
            9. | 02feb2020   01feb2020   29feb2020 |
           10. | 03feb2020   01feb2020   29feb2020 |
           11. | 04feb2020   01feb2020   29feb2020 |
           12. | 05feb2020   01feb2020   29feb2020 |
           13. | 06feb2020   01feb2020   29feb2020 |
           14. | 07feb2020   01feb2020   29feb2020 |
               +-----------------------------------+
          More at

          SJ-19-3 dm0100 . . . . . . . . . . Speaking Stata: The last day of the month
          . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
          Q3/19 SJ 19(3):719--728 (no commands)
          discusses three related problems about getting the last day
          of the month in a new variable

          https://journals.sagepub.com/doi/pdf...36867X19874247

          Comment


          • #6
            Ok, so that was the problem, I have Stata 16.
            Thanks for the alternative solution. It worked perfectly.

            Comment

            Working...
            X