Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Dropping observations if entity does not exist in a certain year

    Hi,

    I have panel data from 2002-2016. I have an indicator variable (POST) that is equal to 1 post-2010 and 0 pre-2010. I want to drop all entities (firms, indicated by gvkey) that do not exist in 2010.

    I.e. where such a comparison between pre and post is not possible.

    Is there a command that can do this? I have already dropped all missing values by:
    drop if missing(invt, at, ib, sale, prcc_f, prcc_c, csho, cfo, cogs, xad, xrd, xsga)

    So, I am not looking to drop observations if they are missing specific variables, but simply if the firm does not exist in 2010 (so if it only have data let's say from 2002 until 2009)

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double year long gvkey int sic str10 rating
    2002 1408 2085 "A"  
    2003 1408 2085 "A"  
    2004 1408 2085 "A"  
    2005 1408 2085 "A"  
    2006 1408 2085 "BBB"
    2007 1408 2085 "BBB"
    2008 1408 2085 "BBB"
    2009 1408 2085 "BBB"
    2010 1408 2085 "BBB"
    2011 1408 2085 "BBB"
    2012 1408 2085 "BBB"
    2004 1478 2834 "A"  
    2005 1478 2834 "A"  
    2006 1478 2834 "A"  
    2007 1478 2834 "A"  
    2008 1478 2834 "A"  
    2003 1794 2820 "BBB"
    2004 1794 2820 "BBB"
    2005 1794 2820 "BBB"
    2006 1794 2820 "BB" 
    2007 1794 2820 "BB" 
    2008 1794 2820 "BB" 
    2009 1794 2820 "BB" 
    2010 1794 2820 "BB" 
    2011 1794 2820 "BB" 
    2012 1794 2820 "BB" 
    2013 1794 2820 "BB" 
    2014 1794 2820 "BB" 
    2015 1794 2820 "BB" 
    2016 1794 2820 "BB" 
    2002 1913 2670 "A"  
    2003 1913 2670 "A"  
    2004 1913 2670 "A"  
    2005 1913 2670 "A"  
    2006 1913 2670 "A"  
    2007 1913 2670 "A"  
    2008 1913 2670 "BBB"
    2009 1913 2670 "BBB"
    2010 1913 2670 "BBB"
    2011 1913 2670 "BBB"
    2012 1913 2670 "BBB"
    2013 1913 2670 "BBB"
    2002 2044 3841 "BBB"
    2003 2044 3841 "BBB"
    2004 2044 3841 "BBB"
    2005 2044 3841 "A"  
    2006 2044 3841 "A"  
    2007 2044 3841 "A"  
    2008 2044 3841 "A"  
    2009 2044 3841 "A"  
    2010 2044 3841 "A"  
    2011 2044 3841 "A"  
    2012 2044 3841 "A"  
    2013 2044 3841 "A"  
    2014 2044 3841 "A"  
    2015 2044 3841 "A"  
    2016 2044 3841 "A"  
    2002 2085 2834 "BBB"
    2003 2085 2834 "BBB"
    2004 2085 2834 "BBB"
    2005 2085 2834 "BBB"
    2006 2085 2834 "BBB"
    2002 2255 3540 "BBB"
    2003 2255 3540 "BBB"
    2004 2255 3540 "BBB"
    2005 2255 3540 "BBB"
    2006 2255 3540 "BBB"
    2007 2255 3540 "BBB"
    2008 2255 3540 "BBB"
    2009 2255 3540 "BBB"
    2002 2393 3510 "BB" 
    2003 2393 3510 "BB" 
    2004 2393 3510 "BB" 
    2005 2393 3510 "BBB"
    2006 2393 3510 "BBB"
    2007 2393 3510 "BB" 
    2008 2393 3510 "BB" 
    2009 2393 3510 "BB" 
    2010 2393 3510 "BB" 
    2011 2393 3510 "BB" 
    2012 2393 3510 "BB" 
    2013 2393 3510 "BB" 
    2014 2393 3510 "BB" 
    2015 2393 3510 "BB" 
    2016 2393 3510 "BB" 
    2002 2403 2834 "AA" 
    2003 2403 2834 "AA" 
    2004 2403 2834 "A"  
    2005 2403 2834 "A"  
    2006 2403 2834 "A"  
    2007 2403 2834 "A"  
    2008 2403 2834 "A"  
    2009 2403 2834 "A"  
    2010 2403 2834 "A"  
    2011 2403 2834 "A"  
    2012 2403 2834 "A"  
    2013 2403 2834 "A"  
    2014 2403 2834 "A"  
    2015 2403 2834 "A"  
    2016 2403 2834 "A"  
    end

  • #2
    Code:
    egen is2010 = total(year == 2010), by(gvkey) 
    keep if is2010

    Comment


    • #3
      I find your explanation a bit confusing, but the following two commands will give you a 0 (not there in 2010) or a 1 (there in 2010) for each row (assuming, as I did, that gvkey is your id for firm):
      Code:
      . gen byte yr2010=year==2010
      . bys gvkey: egen byte has2010=max(yr2010)
      then you can just
      Code:
      drop if has2010==0

      Comment


      • #4
        Thank you both. They both do what I want and sorry for the confusion Rich

        Comment


        • #5
          I have already dropped all missing values by:
          drop if missing(invt, at, ib, sale, prcc_f, prcc_c, csho, cfo, cogs, xad, xrd, xsga)
          As far as I know, Stata's drop if missing() does a listwise deletion.
          Dropping cases with missing observations on one or a few of several variables assumes that the missing data occur completely at random. They rarely do.

          Although listwise deletion is common, it tends to introduce a bias. Alternatives are to used multiple imputation or some full information technique as available in SEM, or to use Bayes. (Substituting a missing value with the sample mean is not a good idea.)

          If I would use listwise deletion, I would at least ensure that only a few observations/cases are dropped, unless I know that the "missingness" in the data is truly random.
          Last edited by Christopher Bratt; 10 Jun 2021, 01:50.

          Comment

          Working...
          X