Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • keep last observation of the month

    Hi

    I would like to keep by company_num the last observation of each month. The data looks as follows

    Click image for larger version

Name:	example1.JPG
Views:	1
Size:	83.7 KB
ID:	1108352


    Data is in long format and the variable company_num goes from 1 to 5563


    The problem is that last observation of the month might not be the same for every company_num. This means that it could be that for

    company_num=1 the last observation in January 2003 is 31jan2003 but for
    company_num=2 the last observation in January 2003 is 30jan2003.

    Any ideas?

    Thanks








  • #2
    The trick is to create a new variable encoding the month, and then using the usual approach with -by-:

    Code:
    gen month = mofd(date)
    format month %tm // OPTIONAL IF YOU DON'T NEED THIS VARIABLE FOR OTHER PURPOSES
    by company_num month (date), sort: keep if _n == _N

    Comment


    • #3
      By the way, the above presumes that there is only one observation on any given date for each firm. If you have -xtset company_num date-, then Stata will have already verified that for you. If not, you should run -isid company_num date- before running my code. (With multiple observations for a given company on a given date, the sort order within company_num and date will be arbitrary and you will be arbitrarily, and irreproducibly, selecting one of several candidates.)

      Comment


      • #4
        Another trick is that the last day of the month is just before the first day of the next month. You have gaps here so I will not pursue the details, as no work-around for the gaps will, I believe, improve on Clyde's approach. I just want to mention that for anyone coming to this thread because its title seems (close to) what you want.

        Comment


        • #5
          Dear Clyde,

          Lets assume I am working with daily prices of stocks and I want to keep 1st and last daily observations to calculate returns, then I can apply the following code:

          bys ticker month (date): keep if _n==_N | _n==1

          where ticker is a stock, month is a month date variable (e.g. 1994m6), and date is a daily date variable (e.g. 01jun1994). The issue is that the stock may trade only on the 10th and 28th days of the month and if I apply this code I get missing variables sometimes in the 1st and last day of a month for a given stock and returns cannot be calculated. How can I select the first and last available prices for each stock at each month?

          Thank you in advance.

          Comment


          • #6
            Well, the example data you showed in #1 does not illustrate this problem. And I can't guess how your data is organized. The simplest case would be that there is simply no observation corresponding to a date on which the stock didn't trade. In that case the code you show in #5 would do exactly what you want: give the first available and last available observation in each month.

            But maybe you have a different organization of your data. Perhaps there is an observation for every date, but it has missing values of the financial variables on the days it didn't trade. Or perhaps there is a variable in your data set that indicates trading or non trading for each date. And there are other possibilities as well. It won't help for me to write imaginary code for imaginary data. So you need to show another data example that illustrates how this problem actually manifests itself. Please do NOT use a screenshot again. If you read the Forum FAQ you will learn the reasons why screenshots are the least useful way to show example data and are strongly discouraged. You will also learn that the most useful way to show example data is by using the -dataex- command.

            If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

            Comment


            • #7
              Rodrigo #5 The problem of missing values (not variables) is most easily solved by

              Code:
              drop if missing(price)

              Comment


              • #8
                Dear Clyde,

                Below is the code using dataex. It shows the daily prices for a stock ABEV3 in december 2012. You will notice that in the first trading day in December I have a price for it, but I do not have a price in December 31st. If I apply the code above I will have 2 values per month, one non-missing and one missing. When I calculate returns and keep the last price observation for that month using *bys ticker month: gen monthly_return = 100*((price-price[_n-1])/price[_n-1]) and *bys ticker month: keep if _n==_N, I will have a missing return and a missing price. How can I "borrow" the last non-missing observation (dec 28th in the example) in each month for each stock? (The same applies to the first observable daily price, which sometimes is missing).

                I ask that because I tried what Nick mentioned. If I drop missing prices I get what I want. But I need to know when stock prices are missing in December and June of every year (they need to be excluded from yearly portfolios I will form with my dataset) and if I do what nick said I will lose this information. So the question is how can I every month for every stock select the first and last non-missing stock price?

                Thank you !!!

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str11 ticker double price float(date month)
                "ABEV3" 13.258948529 19330 635
                "ABEV3" 12.966533155 19331 635
                "ABEV3" 13.033370955 19332 635
                "ABEV3" 13.030029065 19333 635
                "ABEV3" 12.799438655 19334 635
                "ABEV3" 14.244806075 19337 635
                "ABEV3" 14.468712705 19338 635
                "ABEV3"  14.50380255 19339 635
                "ABEV3" 14.553930899 19340 635
                "ABEV3"   14.3868364 19341 635
                "ABEV3" 14.368456005 19344 635
                "ABEV3"  14.47038365 19345 635
                "ABEV3" 14.570640349 19346 635
                "ABEV3" 14.142878431 19347 635
                "ABEV3" 14.199690561 19348 635
                "ABEV3"            . 19351 635
                "ABEV3"            . 19352 635
                "ABEV3" 14.010873776 19353 635
                "ABEV3" 14.174753525 19354 635
                "ABEV3" 14.140963886 19355 635
                "ABEV3"            . 19358 635
                end
                format %td date
                format %tm month

                Comment


                • #9
                  The first and last non-missing values can be obtained in various ways. There is a more general discussion at https://www.stata.com/support/faqs/d...t-occurrences/

                  Here are two ways to do it. See also https://www.stata-journal.com/sjpdf....iclenum=dm0055 Sections 9 and 10.

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str11 ticker double price float(date month)
                  "ABEV3" 13.258948529 19330 635
                  "ABEV3" 12.966533155 19331 635
                  "ABEV3" 13.033370955 19332 635
                  "ABEV3" 13.030029065 19333 635
                  "ABEV3" 12.799438655 19334 635
                  "ABEV3" 14.244806075 19337 635
                  "ABEV3" 14.468712705 19338 635
                  "ABEV3"  14.50380255 19339 635
                  "ABEV3" 14.553930899 19340 635
                  "ABEV3"   14.3868364 19341 635
                  "ABEV3" 14.368456005 19344 635
                  "ABEV3"  14.47038365 19345 635
                  "ABEV3" 14.570640349 19346 635
                  "ABEV3" 14.142878431 19347 635
                  "ABEV3" 14.199690561 19348 635
                  "ABEV3"            . 19351 635
                  "ABEV3"            . 19352 635
                  "ABEV3" 14.010873776 19353 635
                  "ABEV3" 14.174753525 19354 635
                  "ABEV3" 14.140963886 19355 635
                  "ABEV3"            . 19358 635
                  end
                  format %td date
                  format %tm month
                  
                  gen byte OK = !missing(price) 
                  bysort ticker month OK (date) : gen first1 = price[1] if OK 
                  by ticker month OK : gen last1 = price[_N] if OK 
                  bysort ticker month (first1) : replace first1 = first1[1] 
                  bysort ticker month (last1) : replace last1 = last1[1] 
                  drop OK 
                  
                  egen whenfirst = min(cond(!missing(price), date, .)), by(ticker month) 
                  egen whenlast = max(cond(!missing(price), date, .)), by(ticker month) 
                  egen first2 = total(cond(date == whenfirst, price, .)), by(ticker month) 
                  egen last2 = total(cond(date == whenlast, price, .)), by(ticker month) 
                  
                  tabdisp month, c(first? last?) 
                  
                  ----------------------------------------------------------
                      month |     first1      first2       last1       last2
                  ----------+-----------------------------------------------
                    2012m12 |   13.25895    13.25895    14.14096    14.14096
                  ----------------------------------------------------------

                  Comment


                  • #10
                    Hi,

                    I want to retain the latest date when the difference between two BOD meeting dates is 1 or 2 or
                    Meeting_date date_diff Co_code
                    24-01-2012 94 11
                    22-05-2012 119 11
                    23-05-2012 1 11
                    09-08-2012 78 11
                    29-10-2012 81 11
                    01-11-2012 3 11
                    In the above instance, for example when the date difference is 1, I want to drop 22/5/2012 and retain 23/05/2012.

                    Pls help me with the code for this

                    Thanks

                    Comment


                    • #11
                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input int date_diff byte co_code float meeting_date
                       94 11 19016
                      119 11 19135
                        1 11 19136
                       78 11 19214
                       81 11 19295
                        3 11 19298
                      end
                      format %td meeting_date
                      
                      sort meeting_date
                      drop if inlist(meeting_date[_n+1]-meeting_date, 1, 2)
                      In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

                      Comment


                      • #12


                        Hi, my sample is over 2000-2021, taking 2019 as an example, and the data date is as follows:


                        Click image for larger version

Name:	2022-05-13 10.46.40.png
Views:	1
Size:	13.9 KB
ID:	1664378


                        Now I hope that If the date is less than 10 days from the end of the month or the last month, the date should l be replaced by the end of the month or the last month. For example,
                        26JAN2019 less than 10 days from 31JAN2019 will be replaced with 31JAN2019, while 01Feb 2019 less than 10 days from the end of last month will be replaced with 31JAN2019.

                        How to write the code?









                        Comment


                        • #13
                          https://www.stata-journal.com/articl...article=dm0100 is a review -- which is now not up-to-date (see later)

                          Let's assume a daily date variable ddate The last day of the current month is the day before the first day of the next month, so

                          Code:
                          gen lastday = dofm(mofd(ddate) + 1)-1
                          You didn't provide a data example using
                          dataex. https://www.statalist.org/forums/help#stata

                          Here's how to do that.

                          Code:
                          * Example generated by -dataex-. For more info, type help dataex
                          clear
                          input float ddate
                          21575
                          21580
                          21602
                          21608
                          21610
                          end
                          format %td ddate
                          
                          gen lastday = dofm(mofd(ddate) + 1)-1 
                          format %td lastday 
                          gen difference = lastday - ddate 
                          list 
                          
                               +----------------------------------+
                               |     ddate     lastday   differ~e |
                               |----------------------------------|
                            1. | 26jan2019   31jan2019          5 |
                            2. | 31jan2019   31jan2019          0 |
                            3. | 22feb2019   28feb2019          6 |
                            4. | 28feb2019   28feb2019          0 |
                            5. | 02mar2019   31mar2019         29 |
                               +----------------------------------+
                          In fact in recent versions of Stata there is a function
                          lastdayofmonth() so you can just go

                          Code:
                          gen diff = lastdayofmonth(ddate) - ddate
                          Code:
                          
                          
                          Note that the FAQ Advice asks you to state the version of Stata you are using if it is not Stata 17.

                          Comment

                          Working...
                          X