Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to drop firms that have less observations within a certain group identifier?

    Dear Stata Users,

    I have a sample of the dataset that I use uploaded below. Can you, please help me with the following issue. Each year “rdID” firms have “firmID” counterparts. However, is some years “rdID” has multiple “firmID”. For example, in year 2004 “rdID”= 1074156 had two “firmID” => 32992 and 101953. What I want is to keep only those “firmID” that have majority within a given “rdID”. For example, “rdID”= 1074156 has 28 “firmID”= 32992 and only one “firmID”= 101953. So, for “rdID”= 1074156 I want to keep just “firmID”= 32992. Please, advise me this issue.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(firmID rdID) int year byte wave
    113442 1245415 1987 0
    113442 1245415 1988 0
    113442 1245415 1989 0
    113442 1245415 1990 0
    113442 1245415 1991 0
    113442 1245415 1992 0
    113442 1245415 1993 0
    113442 1245415 1994 0
    113442 1245415 1995 0
    113442 1245415 1996 0
    113442 1245415 1997 0
    113442 1245415 1998 0
    113442 1245415 1999 0
    113442 1245415 2000 0
    113442 1245415 2001 0
    113442 1245415 2002 0
    113442 1245415 2003 0
    113442 1245415 2004 0
    113442 1245415 2005 0
    113442 1245415 2006 0
    113442 1245415 2007 0
    113442 1245415 2008 0
    113442 1245415 2009 0
    113442 1245415 2010 0
    113442 1245415 2011 0
    113442 1245415 2012 0
    113442 1245415 2013 0
    113442 1245415 2014 0
    113442 1245415 2015 0
    113442 1245415 2016 0
     32992 1074156 1989 1
     32992 1074156 1990 1
     32992 1074156 1991 1
     32992 1074156 1992 1
     32992 1074156 1993 1
     32992 1074156 1994 1
     32992 1074156 1995 1
     32992 1074156 1996 1
     32992 1074156 1997 1
     32992 1074156 1998 1
     32992 1074156 1999 1
     32992 1074156 2000 1
     32992 1074156 2001 1
     32992 1074156 2002 1
     32992 1074156 2003 1
     32992 1074156 2004 1
    101953 1074156 2004 1
     32992 1074156 2005 1
     32992 1074156 2006 1
     32992 1074156 2007 1
     32992 1074156 2008 1
     32992 1074156 2009 1
     32992 1074156 2010 1
     32992 1074156 2011 1
     32992 1074156 2012 1
     32992 1074156 2013 1
     32992 1074156 2014 1
     32992 1074156 2015 1
     32992 1074156 2016 1
      6591 1119794 2001 1
      6256 1119794 2001 1
      6591 1119794 2002 1
      6256 1119794 2002 1
      6256 1119794 2003 1
      6591 1119794 2003 1
      6591 1119794 2004 1
      6256 1119794 2004 1
      6256 1119794 2005 1
      6591 1119794 2005 1
      6256 1119794 2006 1
      6591 1119794 2006 1
      6256 1119794 2007 1
      6591 1119794 2007 1
      6591 1119794 2008 1
      6256 1119794 2008 1
      6591 1119794 2009 1
      6591 1119794 2010 1
      6256 1119794 2010 1
      6256 1119794 2011 1
      6591 1119794 2011 1
      6256 1119794 2012 1
      6591 1119794 2012 1
      6256 1119794 2013 1
      6591 1119794 2013 1
      6256 1119794 2014 1
      6591 1119794 2014 1
    138872 1119794 2015 0
      6256 1119794 2015 1
      6591 1119794 2015 1
      6591 1119794 2016 1
      6256 1119794 2016 1
    end

  • #2
    Code:
    order rdID, first
    preserve
    tabulate rdID firmID
    
    
               |                              firmID
          rdID |      6256       6591      32992     101953     113442     138872 |     Total
    -----------+------------------------------------------------------------------+----------
       1074156 |         0          0         28          1          0          0 |        29 
       1119794 |        15         16          0          0          0          1 |        32 
       1245415 |         0          0          0          0         30          0 |        30 
    -----------+------------------------------------------------------------------+----------
         Total |        15         16         28          1         30          1 |        91 
    
    
    collapse (count) count=year (min) first_year=year (max) max_year=year, by( rdID firmID)
    sort rdID count
    bysort rdID (count): gen firmID_to_use = firmID[_N]
    
    . list, sepby( rdID) noobs abbrev(14)
    
      +------------------------------------------------------------------+
      |    rdID   firmID   count   first_year   max_year   firmID_to_use |
      |------------------------------------------------------------------|
      | 1074156   101953       1         2004       2004           32992 |
      | 1074156    32992      28         1989       2016           32992 |
      |------------------------------------------------------------------|
      | 1119794   138872       1         2015       2015            6591 |
      | 1119794     6256      15         2001       2016            6591 |
      | 1119794     6591      16         2001       2016            6591 |
      |------------------------------------------------------------------|
      | 1245415   113442      30         1987       2016          113442 |
      +------------------------------------------------------------------+
    Then you can merge this into your master data and then keep if firmID = firmID_to_use.
    Note that collapse deletes data, so save your master dataset, or rename it before collapsing.

    Comment


    • #3
      Thank you!

      Comment

      Working...
      X