  • How to drop observations in an unbalanced panel.

    Hello Statalists,

    I have a question about dropping observations in Stata.

    My dataset is a panel of many firms, each observed over several years. The panel is unbalanced: many firms lack observations for some of the years 2003-2012. A missing firm-year is not recorded as "."; the row is simply absent from the data. An example of the dataset is shown below.

    id year VAR
    1 2009 6
    1 2010 7
    1 2011 8
    1 2012 9
    2 2003 0
    2 2004 1
    2 2005 2
    2 2006 3
    2 2007 4
    2 2008 5
    2 2009 6
    2 2010 7
    2 2011 8
    2 2012 9

    In the example above, id=2 has the full set of years (2003-2012). However, id=1 does not have years before 2009. I would like to drop all the id that don't have years before 2009. I tried the following code but it didn't work.

    bysort id (VAR): drop if missing(VAR<=5)

    May I have your advice on how to deal with this issue?

    I would also like to run regressions only on the firms that have the full set of years. How can I keep these firms and drop all the others?

    Thank you very much,
    Chenli

  • #2
    Hi!

    Originally posted by Chenli You
    […]I would like to drop all the id that don't have years before 2009. I tried the following code but it didn't work.[…]
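    Your code cannot achieve what you want: the expression VAR<=5 evaluates to 0 or 1, never to missing, so missing(VAR<=5) is always false and nothing is dropped. It also tests VAR rather than year. You can simply do exactly what you wrote in your text: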
    1. identify the first year per firm id in the data
    2. drop all firms whose first year is 2009 or later

    Code:
    * generate and show example data
    clear
    input id year VAR
    1 2009 6
    1 2010 7
    1 2011 8
    1 2012 9
    2 2003 0
    2 2004 1
    2 2005 2
    2 2006 3
    2 2007 4
    2 2008 5
    2 2009 6
    2 2010 7
    2 2011 8
    2 2012 9
    end
    list , sepby(id)
    
    * 1. calculate variable containing the minimum year per id
    bysort id (year) : generate firstyear=year[1]
    list , sepby(id)
    
    * 2. keep firms with a firstyear<2009 only
    keep if (firstyear<2009)
    list , sepby(id)
    Those two steps can even be shortened to a one-liner:
    Code:
    bysort id (year) : keep if year[1]<2009
    Note that it does not make a difference whether you "keep if year[1]<2009" or "drop if year[1]>=2009".
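
    The second question in the original post (keeping only the firms that have every year) can be approached along the same lines. The following is only a sketch, not a tested solution for the real dataset: it assumes that year is an integer, that the full window is 2003-2012 (ten years), and that each id-year pair appears at most once. The isid line is there to stop with an error if that last assumption fails.

    ```stata
    * example data from the original post
    clear
    input id year VAR
    1 2009 6
    1 2010 7
    1 2011 8
    1 2012 9
    2 2003 0
    2 2004 1
    2 2005 2
    2 2006 3
    2 2007 4
    2 2008 5
    2 2009 6
    2 2010 7
    2 2011 8
    2 2012 9
    end

    * assumption: one row per firm-year; isid exits with an error otherwise
    isid id year

    * a complete firm has 10 rows, starting in 2003 and ending in 2012
    bysort id (year) : keep if _N==10 & year[1]==2003 & year[_N]==2012
    list , sepby(id)
    ```

    With the example data, only id=2 should remain. Checking the first and last year in addition to the row count guards against a firm that happens to have ten rows from a different window.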

    Kind regards
    Bela


    • #3
      Daniel,

      Thank you very much. This is very helpful!

      May I ask a quick follow-up question? If I'd like to drop all the id that don't have years after 2009, should I just type the code below?

      bysort id (year): drop if year[N]<=2009

      In this code, "[N]" means the last year of each id, right? And would the second-to-last be [N-1]?

      Thanks a lot,
      Chenli


      • #4
        Chenli:
        I think that your code needs a small tweak:
        Code:
        bysort id (year): drop if year[_N]<=2009
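
        As a brief illustration of the subscripts asked about: under a by prefix, _N is the number of observations in the current group, so after sorting by year, year[_N] is the group's last year and year[_N-1] the second-to-last. A plain N without the underscore would instead be read as the name of a variable. The sketch below uses the example data from the first post; the variable names lastyear and secondlast are made up for the illustration.

        ```stata
        * example data from the original post
        clear
        input id year VAR
        1 2009 6
        1 2010 7
        1 2011 8
        1 2012 9
        2 2003 0
        2 2004 1
        2 2005 2
        2 2006 3
        2 2007 4
        2 2008 5
        2 2009 6
        2 2010 7
        2 2011 8
        2 2012 9
        end

        * last and second-to-last year within each id (hypothetical variable names)
        bysort id (year) : generate lastyear = year[_N]
        bysort id (year) : generate secondlast = year[_N-1]
        list , sepby(id)
        ```

        For a firm with a single observation, year[_N-1] points to observation 0, which does not exist, so secondlast would be missing for that firm.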
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Carlo,
          This works great. Thank you very much!
          Best regards,
          Chenli
