Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Difficult drop

    Hi Statlist,

    Hope everything is fine in the forum!

    I have a dataset made like this:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id_prodotto Year) double tot_sales float recalls
     1 2015  787.3446819822431 0
     2 2008 172317.61279375976 0
     2 2009 425880.54044550576 0
     2 2010  417859.1518602657 0
     2 2011 1910.6711194804911 1
     2 2012 20.761279655887872 0
     3 2008 172317.61279375976 0
     3 2009 425880.54044550576 0
     3 2010  417859.1518602657 0
     3 2011 1910.6711194804911 0
     3 2012 20.761279655887872 0
     4 2004  8489894.740631223 0
     4 2005 184204.23165769983 0
     5 2004 119146.82565863623 0
     5 2005  606.1984172597759 0
     6 2015  6504.150627917435 0
     7 2004  209313.5919697661 0
     7 2005 269757.08668214345 0
     7 2006  333650.2204887878 0
     7 2007  337538.7634404509 0
     7 2008 335462.88044321927 0
     7 2009  418041.1121787016 0
     7 2010 384416.72637269174 0
     7 2011  405117.6595642805 0
     7 2012 461432.45523637166 0
     7 2013 450658.28951069975 0
     7 2014 320228.16589092166 0
     7 2015 154498.05198367484 0
     8 2004 141.13202877613222 0
     8 2005 113.58645867843771 0
     8 2006  72.94346760010792 0
     8 2007 57.348417878665295 0
     9 2004 222.16376441745746 0
    10 2004 222.16376441745746 0
    11 2004 222.16376441745746 0
    12 2007  909813.6203051172 0
    12 2008  4126.363950859179 0
    13 2004  5546633.588079651 0
    13 2005  6243365.138153184 0
    13 2006  5648206.863902802 0
    13 2007  7412762.437901269 0
    13 2008   7946245.06274602 0
    13 2009  10159183.97206309 0
    13 2010   9770569.45874448 0
    13 2011  8447039.931233803 0
    13 2012 1142591.0214142571 0
    13 2013  7.142159888715126 0
    13 2014 476385.35764905135 0
    13 2015  591364.4498976789 0
    14 2004  5546633.588079652 0
    14 2005  6243365.138153185 0
    14 2006  5648206.863902803 1
    14 2007   7412762.43790127 0
    14 2008   7946245.06274602 0
    14 2009 10159183.972063089 0
    14 2010  9770569.458744476 0
    14 2011  8447039.931233803 0
    14 2012 1142591.0214142571 0
    14 2013  7.142159888715126 0
    14 2014 476385.35764905135 0
    14 2015  591364.4498976789 0
    15 2004  877847.1682515381 0
    15 2005 190254.57036946868 0
    15 2006  249866.5129101899 0
    15 2007  203604.1011390687 0
    15 2008  104180.2516135006 0
    15 2009  50206.27151761131 0
    15 2010 25853.440101304135 0
    15 2011 25074.980428922405 0
    15 2012  6847.073109778692 0
    16 2004  5546633.588079651 0
    16 2005  6243365.138153183 0
    16 2006  5648206.863902802 0
    16 2007   7412762.43790127 0
    16 2008  7946245.062746021 0
    16 2009  10159183.97206309 0
    16 2010  9770569.458744477 0
    16 2011  8447039.931233805 0
    16 2012  1142591.021414257 0
    16 2013  7.142159888715126 0
    16 2014 476385.35764905135 0
    16 2015   591364.449897679 0
    17 2004  5546633.588079651 0
    17 2005  6243365.138153183 0
    17 2006  5648206.863902803 1
    17 2007  7412762.437901269 0
    17 2008  7946245.062746019 0
    17 2009  10159183.97206309 0
    17 2010   9770569.45874448 0
    17 2011  8447039.931233803 0
    17 2012 1142591.0214142571 0
    17 2013  7.142159888715126 0
    17 2014  476385.3576490513 0
    17 2015  591364.4498976789 0
    18 2004  88143.53998092997 0
    18 2005  93267.35012439503 0
    18 2006  37744.60653123153 0
    19 2006  5521.900254781699 0
    19 2007 16701.712057355933 1
    19 2008 21785.933780307543 0
    end

    So basically I have an id for each product (id_prod) and each product is repeated over time. The panel is not balanced so each product is observed for a specific time window which varies from product to product. The general span is from 2004 to 2015: so it can be that a product is observed from 2008 to 2015, another one from 2009 to 2011, another one from 2004 to 2013 and so on.

    What I would like to do is to drop the observations with recalls = 1 from that Year on taking into account what said above (so that if a product has recalls = 1 in 2004. So for instance in the example above I would like to have this output:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id_prodotto Year) double tot_sales float recalls
     1 2015  787.3446819822431 0
     2 2008 172317.61279375976 0
     2 2009 425880.54044550576 0
     2 2010  417859.1518602657 0
     3 2008 172317.61279375976 0
     3 2009 425880.54044550576 0
     3 2010  417859.1518602657 0
     3 2011 1910.6711194804911 0
     3 2012 20.761279655887872 0
     4 2004  8489894.740631223 0
     4 2005 184204.23165769983 0
     5 2004 119146.82565863623 0
     5 2005  606.1984172597759 0
     6 2015  6504.150627917435 0
     7 2004  209313.5919697661 0
     7 2005 269757.08668214345 0
     7 2006  333650.2204887878 0
     7 2007  337538.7634404509 0
     7 2008 335462.88044321927 0
     7 2009  418041.1121787016 0
     7 2010 384416.72637269174 0
     7 2011  405117.6595642805 0
     7 2012 461432.45523637166 0
     7 2013 450658.28951069975 0
     7 2014 320228.16589092166 0
     7 2015 154498.05198367484 0
     8 2004 141.13202877613222 0
     8 2005 113.58645867843771 0
     8 2006  72.94346760010792 0
     8 2007 57.348417878665295 0
     9 2004 222.16376441745746 0
    10 2004 222.16376441745746 0
    11 2004 222.16376441745746 0
    12 2007  909813.6203051172 0
    12 2008  4126.363950859179 0
    13 2004  5546633.588079651 0
    13 2005  6243365.138153184 0
    13 2006  5648206.863902802 0
    13 2007  7412762.437901269 0
    13 2008   7946245.06274602 0
    13 2009  10159183.97206309 0
    13 2010   9770569.45874448 0
    13 2011  8447039.931233803 0
    13 2012 1142591.0214142571 0
    13 2013  7.142159888715126 0
    13 2014 476385.35764905135 0
    13 2015  591364.4498976789 0
    14 2004  5546633.588079652 0
    14 2005  6243365.138153185 0
    15 2004  877847.1682515381 0
    15 2005 190254.57036946868 0
    15 2006  249866.5129101899 0
    15 2007  203604.1011390687 0
    15 2008  104180.2516135006 0
    15 2009  50206.27151761131 0
    15 2010 25853.440101304135 0
    15 2011 25074.980428922405 0
    15 2012  6847.073109778692 0
    16 2004  5546633.588079651 0
    16 2005  6243365.138153183 0
    16 2006  5648206.863902802 0
    16 2007   7412762.43790127 0
    16 2008  7946245.062746021 0
    16 2009  10159183.97206309 0
    16 2010  9770569.458744477 0
    16 2011  8447039.931233805 0
    16 2012  1142591.021414257 0
    16 2013  7.142159888715126 0
    16 2014 476385.35764905135 0
    16 2015   591364.449897679 0
    17 2004  5546633.588079651 0
    17 2005  6243365.138153183 0
    18 2004  88143.53998092997 0
    18 2005  93267.35012439503 0
    18 2006  37744.60653123153 0
    19 2006  5521.900254781699 0
    
    end
    where basically I have dropped the observations with recalls = 1 from the Year in which the recall is equal to 1 until the last year that the product is observed in the database.

    Can someone please help me figuring out a solution to this dilemma?

    Thanks,

    Federico

  • #2
    The following code seems to do what you want.
    Code:
    . bysort id_prodotto (Year): generate numrecalls = sum(recalls)
    
    . list if numrecalls>0, sepby(id_prodotto) noobs
    
      +--------------------------------------------------+
      | id_pro~o   Year   tot_sales   recalls   numrec~s |
      |--------------------------------------------------|
      |        2   2011   1910.6711         1          1 |
      |        2   2012    20.76128         0          1 |
      |--------------------------------------------------|
      |       14   2006   5648206.9         1          1 |
      |       14   2007   7412762.4         0          1 |
      |       14   2008   7946245.1         0          1 |
      |       14   2009    10159184         0          1 |
      |       14   2010   9770569.5         0          1 |
      |       14   2011   8447039.9         0          1 |
      |       14   2012     1142591         0          1 |
      |       14   2013   7.1421599         0          1 |
      |       14   2014   476385.36         0          1 |
      |       14   2015   591364.45         0          1 |
      |--------------------------------------------------|
      |       17   2006   5648206.9         1          1 |
      |       17   2007   7412762.4         0          1 |
      |       17   2008   7946245.1         0          1 |
      |       17   2009    10159184         0          1 |
      |       17   2010   9770569.5         0          1 |
      |       17   2011   8447039.9         0          1 |
      |       17   2012     1142591         0          1 |
      |       17   2013   7.1421599         0          1 |
      |       17   2014   476385.36         0          1 |
      |       17   2015   591364.45         0          1 |
      |--------------------------------------------------|
      |       19   2007   16701.712         1          1 |
      |       19   2008   21785.934         0          1 |
      +--------------------------------------------------+
    
    . drop if numrecalls>0
    (24 observations deleted)
    
    . drop numrecalls

    Comment


    • #3
      Yes! I did not knew the possibility of putting parentheses on Year? What does it do? It looks like a
      Code:
      bys id_prodotto Year:
      But it seems that it actually allows for results to be repeated from the first occurrence of recall = 1 to the ending year...something like that?

      thank you very much for the help!

      Comment


      • #4
        The output of help by explains the parenthesized syntax:

        The varlist1 (varlist2) syntax is of special use to programmers. It verifies that the data are sorted by varlist1 varlist2 and then performs a by as if only varlist1 were specified.
        I would restate that for your example that it computes the running sum of the number of recalls separately for each value of id_proddotto, but the sort it does includes Year, to make sure the years are in the proper order. Since it's a running sum, the running sum will be zero until the year of the first recall, and 1 or more for all later years.

        Comment


        • #5
          That's cool! It seems like the max() or min() result! I will keep it mind. Thank you for the precious suggestions!

          Comment

          Working...
          X