Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Removing observations based on a sequence of observations within a group

    I have a panel data for stock returns, at the daily level, with the following variables: firm, year, day, return, and number of trades. I'm trying to remove observations based on on the number of trades variable.

    Criteria: I want to check firms that have at least 30 days of trading activity within an year, that is, at least 30 non-blank and non-zero number of trades data points for each firm-year combination.

    If a firm-year do pass the criteria, meaning it has at least 30 days with non-zero non-blank data points on the number of trades column, I want to keep all daily observations for that firm-year.

    If I firm-year does not pass the criteria, I want to remove all the daily observations for that firm-year. Just to clarify: if a firm has an year for which it does not pass the criteria, but other years that do pass the criteria, I want to remove only the daily observations for that particular year that did not pass the criteria.

    I have been trying to write a code for this, but to be honest I'm not very experienced and I could not accomplish this task. If anyone could help, it would be greatly appreciated!



    Extra context: I'm using Compustat (compd.funda) merged with CRSP (crspa.dsf) through a link table (crsp.ccmxpf_linktable) to attempt to recreate a measure from Chen, Goldstein and Jiang, 2007 called R^2. This measure is from a regression of daily firm returns on market returns and industry returns. The authors filter the data by removing firm-year observations with less than 30 days of trading activities in a year, and I'm struggling to replicate this filter.


    Thanks,


    Lucas Balaminut
    Last edited by Lucas Balaminut; 18 Oct 2019, 09:41.

  • #2
    See if this helps:

    Code:
    *** I created some toy data
    dataex firm_id firm_name date2 year percent_move  // data shared by -dataex-. To install: ssc install dataex
    clear
    input byte firm_id str6 firm_name float date2 int year float percent_move
    1 "Google" 16680 2005   .5125669
    1 "Google" 16681 2005   .3189885
    1 "Google" 16682 2005          0
    1 "Google" 16683 2005          .
    1 "Google" 16684 2005   .4332779
    1 "Google" 17075 2006   .5355702
    1 "Google" 17076 2006 .011056197
    1 "Google" 17077 2006   .4297709
    1 "Google" 17078 2006   .7228306
    1 "Google" 17079 2006   .5595283
    2 "Apple"  15645 2002   .3555461
    2 "Apple"  15646 2002          0
    2 "Apple"  15647 2002   .1801147
    3 "Ford"   16040 2003  .29663193
    3 "Ford"   16041 2003   .5024244
    3 "Ford"   16042 2003          .
    3 "Ford"   16043 2003  .14077796
    3 "Ford"   16044 2003  .27451247
    3 "Ford"   16045 2003   .4282386
    3 "Ford"   16046 2003   .7202394
    end
    format %td date2
    
    egen trad_days  = count( percent_move), by(firm_id year)  // egen, count()  will count all non-missing obs (so it will include the 0's)
    egen trad_days2 = count( percent_move!=0), by(firm_id year)
    gen good_trade  = (percent_move!=0 & percent_move!=.)  // creating a variable for the obs you want to include
    egen trad_days3 = total( good_trade), by(firm_id year)  // this is the one you want to use
    
    . list, sepby(firm_id year) abbrev(12)
    
         +----------------------------------------------------------------------------------------------------------+
         | firm_id   firm_name       date2   year   percent_move   trad_days   trad_days2   good_trade   trad_days3 |
         |----------------------------------------------------------------------------------------------------------|
      1. |       1      Google   01sep2005   2005        .512567           4            5            1            3 |
      2. |       1      Google   02sep2005   2005        .318989           4            5            1            3 |
      3. |       1      Google   03sep2005   2005              0           4            5            0            3 |
      4. |       1      Google   04sep2005   2005              .           4            5            0            3 |
      5. |       1      Google   05sep2005   2005        .433278           4            5            1            3 |
         |----------------------------------------------------------------------------------------------------------|
      6. |       1      Google   01oct2006   2006         .53557           5            5            1            5 |
      7. |       1      Google   02oct2006   2006        .011056           5            5            1            5 |
      8. |       1      Google   03oct2006   2006        .429771           5            5            1            5 |
      9. |       1      Google   04oct2006   2006        .722831           5            5            1            5 |
     10. |       1      Google   05oct2006   2006        .559528           5            5            1            5 |
         |----------------------------------------------------------------------------------------------------------|
     11. |       2       Apple   01nov2002   2002        .355546           3            3            1            2 |
     12. |       2       Apple   02nov2002   2002              0           3            3            0            2 |
     13. |       2       Apple   03nov2002   2002        .180115           3            3            1            2 |
         |----------------------------------------------------------------------------------------------------------|
     14. |       3        Ford   01dec2003   2003        .296632           6            7            1            6 |
     15. |       3        Ford   02dec2003   2003        .502424           6            7            1            6 |
     16. |       3        Ford   03dec2003   2003              .           6            7            0            6 |
     17. |       3        Ford   04dec2003   2003        .140778           6            7            1            6 |
     18. |       3        Ford   05dec2003   2003        .274512           6            7            1            6 |
     19. |       3        Ford   06dec2003   2003        .428239           6            7            1            6 |
     20. |       3        Ford   07dec2003   2003        .720239           6            7            1            6 |
         +----------------------------------------------------------------------------------------------------------+

    Comment


    • #3
      Also posted and answered on Stack Overflow. You should tell us about cross-posting.

      Comment

      Working...
      X