Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Data Count & Delete

    I am working with WRDS data, dates in one coluumn, companies in another and returns in another. I want to drop a company if the return in the last month (December 2018 being the last) is missing and if the returns in 25 of the previous 60 months are also missing . Would appreciate if someone could help with the code for doing so. Thank You.

  • #2
    I doubt that more than a few people here know what WRDS data are. Further, in your previous post https://www.statalist.org/forums/for...data-filtering you gave an example with once a year data. Which is it?

    https://www.statalist.org/forums/help#stata underlines how we do need and how you should give concrete examples of data using dataex. Don't expect people to know, to guess or to imagine what your data look like: show them, please.

    Comment


    • #3
      Sorry will be more careful next time in including a sample. Here is an example of the data:

      31may2010 AAR CORP
      31may2011 AAR CORP
      31may2012 AAR CORP
      31may2013 AAR CORP
      31may2014 AAR CORP
      31may2015 AAR CORP
      31may2016 AAR CORP
      31may2017 AAR CORP
      31may2018 AAR CORP
      30nov2010 ASA GOLD AND PRECIOUS METALS
      30nov2011 ASA GOLD AND PRECIOUS METALS
      30nov2012 ASA GOLD AND PRECIOUS METALS
      30nov2013 ASA GOLD AND PRECIOUS METALS
      30nov2014 ASA GOLD AND PRECIOUS METALS
      30nov2015 ASA GOLD AND PRECIOUS METALS
      30nov2016 ASA GOLD AND PRECIOUS METALS
      30nov2017 ASA GOLD AND PRECIOUS METALS
      30nov2018 ASA GOLD AND PRECIOUS METALS

      And other variables like returns, prices etc. are in other columns. Suppose 30niv2018 is the last observation date. I would want to drop ASA company if the data for some variable e.g. return is missing on 30nov2018 and in (e.g.) 3 of the previous 5 time periods. Would appreciate help in this regards. Kindly share the code for doing so.

      Comment


      • #4
        That is not using dataex.We can't even see your variable names.But assuming a Stata numeric daily date variable date and an identifier id then the last year of record is

        Code:
        bysort id (date) : gen lastyear = year(date[_N])
        and you can count the number of nonmissing values in the previous 5 years with

        Code:
        by id: egen previous = count(cond(inrange(year(date), lastyear - 6, lastyear - 1), return, 0))
        and your joint condition is then

        Code:
        by id : gen bad = (previous < 3) & missing(return[_N])
        Inspect results with

        Code:
        browse if bad
        and if satisfied

        Code:
        drop if bad
        drop bad
        See also

        Code:
        help egen
        https://www.stata.com/support/faqs/d...rue-and-false/

        https://www.stata-journal.com/articl...article=dm0055

        and (once again) https://www.statalist.org/forums/help#stata
        Last edited by Nick Cox; 11 May 2019, 01:24.

        Comment


        • #5
          Thank you for the reply. All the steps work, except for the 'count the number of non missing values in the previous 5 years with'
          by id: egen previous = count(cond(inrange(year(date), lastyear - 6, lastyear - 1), return, 0)) doesn't return the proper count in "previous". Even for the case when I have two missing values in the last 5 years, 'previous' has a value of 12. I have been unsuccessful in figuring out the problem. Would appreciate some additional help.

          Comment


          • #6
            No data example still.

            Comment


            • #7
              I have attached a Stata data file as an example, with two companies (1 and 2), dates and returns [real names and data changed due to copyright]. In the example based on the criteria I specified earlier, firm 1 would be dropped because of missing observations. Kindly check and update. Sorry I couldn't upload the data via dataex.
              Attached Files

              Comment


              • #8
                Those are monthly data and thus not at all like the example in #3. So, you need to change the code to match whatever your criteria are for this situation.

                Comment


                • #9
                  Would appreciate if you could kindly suggest what changes to make if the last obs. is missing and in the preceding last 3 out of 5 are also missing. I have so far been unsuccessful.

                  Comment


                  • #10
                    OK. With your example data I found a mistake in my earlier code. Sorry about that, but sometimes it takes real examples to spot an error.

                    Please consider this.

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input double lpermno long datadate double trt1m
                    1  9527                  .5
                    1  9555 -26.470588235294112
                    1  9586   39.99999999999999
                    1  9616 -11.428571428571432
                    1  9647  -20.98066270967742
                    1  9677                   0
                    1  9708   -8.16460011493405
                    1  9739  -62.23330096680109
                    1  9769  -5.837981139196957
                    1  9800                 -25
                    1  9830   4.133328000000014
                    1  9861  -39.94878309596841
                    1  9892                   .
                    1  9920                   .
                    1  9951                   .
                    1  9981                   .
                    1 10012 -14.611873480536385
                    1 10042                   .
                    end
                    format %d datadate
                    
                    bysort lpermno (datadate) : gen lastmdate = mofd(datadate[_N])
                    
                    by lpermno: egen previous = count(cond(inrange(mofd(datadate), lastmdate - 6, lastmdate - 1), trt1m, .))
                    
                    by lpermno : gen isbad = previous < 3 & missing(trt1m[_N])
                    
                    list 
                    
                         +--------------------------------------------------------------+
                         | lpermno    datadate      trt1m   lastmd~e   previous   isbad |
                         |--------------------------------------------------------------|
                      1. |       1   31jan1986         .5        329          2       1 |
                      2. |       1   28feb1986   -26.4706        329          2       1 |
                      3. |       1   31mar1986         40        329          2       1 |
                      4. |       1   30apr1986   -11.4286        329          2       1 |
                      5. |       1   31may1986   -20.9807        329          2       1 |
                         |--------------------------------------------------------------|
                      6. |       1   30jun1986          0        329          2       1 |
                      7. |       1   31jul1986    -8.1646        329          2       1 |
                      8. |       1   31aug1986   -62.2333        329          2       1 |
                      9. |       1   30sep1986   -5.83798        329          2       1 |
                     10. |       1   31oct1986        -25        329          2       1 |
                         |--------------------------------------------------------------|
                     11. |       1   30nov1986    4.13333        329          2       1 |
                     12. |       1   31dec1986   -39.9488        329          2       1 |
                     13. |       1   31jan1987          .        329          2       1 |
                     14. |       1   28feb1987          .        329          2       1 |
                     15. |       1   31mar1987          .        329          2       1 |
                         |--------------------------------------------------------------|
                     16. |       1   30apr1987          .        329          2       1 |
                     17. |       1   31may1987   -14.6119        329          2       1 |
                     18. |       1   30jun1987          .        329          2       1 |
                         +--------------------------------------------------------------+
                    Note that most things from now on in Stata would be easier for you with this kind of monthly data if you work with


                    Code:
                    gen mdate = mofd(datadate) 
                    format mdate %tm

                    Comment


                    • #11
                      Thank You Nick for sharing the code. I figured a slight problem in the above 'list' table (and I get the same on replicating it as well). In the table "previous" takes the value of 2, however if you see the data for id 1, 4 observations are missing out of the last 5. The 'previous' should show a value of 4 as opposed to 2. Sorry for bothering you again, but is there a way to fix this problem.

                      Comment


                      • #12
                        Previous 5 and last 5 don’t mean the same. Decide on your criterion and the code follows.

                        Comment


                        • #13
                          Sorry for asking again, even after modifying, the code still does not give me the correct value of missing in the last 5 observations counting from the newest (time t; jun1987) value. There are total of 4 missing in the last 5 but no matter what I do I don't get the count equal to 4. Would appreciate if you give it another look.

                          Comment


                          • #14
                            Sorry; another silly error.

                            Code:
                             
                             by lpermno: egen previous = count(cond(inrange(mofd(datadate), lastmdate - 5, lastmdate - 1), trt1m, .))

                            Comment

                            Working...
                            X