Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Keeping variables in panel

    Hi all,

    I have an unbalanced panel. The complete panel spans 11 periods from 2004 to 2015. I would like to keep just the "balanced" firms having observations from 2004 to 2015. The unbalanced panel looks like:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double idfirm float Year
    808 2004
    808 2004
    808 2004
    808 2004
    808 2005
    808 2005
    808 2005
    808 2005
    808 2006
    808 2006
    808 2006
    808 2006
    808 2007
    808 2007
    808 2007
    808 2007
    808 2008
    808 2008
    808 2008
    808 2008
    808 2009
    808 2009
    808 2009
    808 2009
    808 2010
    808 2010
    808 2010
    808 2010
    808 2011
    808 2011
    808 2011
    808 2011
    808 2012
    808 2012
    808 2012
    808 2012
    808 2013
    808 2013
    808 2013
    808 2013
    808 2014
    808 2014
    808 2014
    808 2014
    808 2015
    808 2015
    808 2015
    808 2015
    808 2004
    808 2004
    808 2004
    808 2004
    808 2005
    808 2005
    808 2005
    808 2005
    808 2006
    808 2006
    808 2006
    808 2006
    808 2007
    808 2007
    808 2007
    808 2007
    808 2008
    808 2008
    808 2008
    808 2008
    808 2009
    808 2009
    808 2009
    808 2009
    808 2010
    808 2010
    808 2010
    808 2010
    808 2011
    808 2011
    808 2011
    808 2011
    808 2012
    808 2012
    808 2012
    808 2012
    808 2013
    808 2013
    808 2013
    808 2013
    808 2014
    808 2014
    808 2014
    808 2014
    808 2015
    808 2015
    808 2015
    808 2015
    808 2004
    808 2004
    808 2004
    808 2004
    end
    I have employed the following code:[CODE
    ]tempvar q
    bysort idfirm Year: gen `q' = _n
    gen byte balanced = 0
    replace balanced = 1 if `q'==11

    drop if balanced==0[/CODE]
    do you think it is fine?

  • #2
    What you show is not panel data in the ordinary sense. You have multiple observations of the same firm in the same year. It is difficult to know what "balanced" even means in this context. Please clarify.

    Comment


    • #3
      Federico,

      It looks like the same firm-year combination is in your data many times (perhaps because each observation is actually a firm-year-product combination?). To get firms that are present in the data each year from 2004 to 2015, I would create a new dataset that collapses down to 1 observation per firm-year.

      So you could do:
      Code:
      duplicates drop idfirm Year
      Code:
      dataex firm year  // data shared via  -dataex-. To install: ssc install dataex
      clear
      input byte firm int year
      1 2004
      1 2005
      1 2006
      1 2007
      1 2008
      1 2009
      2 2005
      2 2006
      2 2007
      2 2008
      2 2009
      3 2004
      3 2005
      3 2006
      3 2007
      3 2008
      4 2004
      4 2005
      4 2006
      4 2007
      4 2008
      4 2009
      end
      
      * Above data is only for 2004-2009
      bysort firm (year): gen count = _N
      
      . list, sepby(firm) noobs
      
        +---------------------+
        | firm   year   count |
        |---------------------|
        |    1   2004       6 |
        |    1   2005       6 |
        |    1   2006       6 |
        |    1   2007       6 |
        |    1   2008       6 |
        |    1   2009       6 |
        |---------------------|
        |    2   2005       5 |
        |    2   2006       5 |
        |    2   2007       5 |
        |    2   2008       5 |
        |    2   2009       5 |
        |---------------------|
        |    3   2004       5 |
        |    3   2005       5 |
        |    3   2006       5 |
        |    3   2007       5 |
        |    3   2008       5 |
        |---------------------|
        |    4   2004       6 |
        |    4   2005       6 |
        |    4   2006       6 |
        |    4   2007       6 |
        |    4   2008       6 |
        |    4   2009       6 |
        +---------------------+
      
      * Then do keep if count==11 in your case
      gen to_keep = 1  // this will now be==1 for all firms you will be keeping
      Once you've created to_keep==1 for all the firms that are continuously in your data, you could then collapse it down to a single obs per firm, or just merge it into your data in #1 based on firm year. The variable to_keep will equal 1 for all firms that are continuously in your data from 2004-2015 and will be missing for all others. You can then do "keep if to_keep==1."

      As an aside, I don't think there is a need to delete "unbalanced" data, and in fact, I would be concerned about survival bias for including only the firms that are continuously in your data (do they differ in important ways from firms that entered and went bankrupt or otherwise exited the industry?) Also, Stata handles such "unbalanced" data just fine. If you are getting pressure from your advisor to do it that way that's one thing, but if the choice is up to you I would leave those observations in. (Run your regressions with and without them and see what changes).
      Last edited by David Benson; 22 Jan 2019, 12:11.

      Comment


      • #4
        You are both right. Indeed I posted the wrong data. The right ones are the following, I am again sorry for the confusion:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double(idfirm Year)
         1 2008
         1 2009
         1 2010
         1 2011
         1 2012
         1 2013
         1 2014
         1 2015
         2 2004
         2 2005
         2 2006
         2 2007
         2 2008
         2 2009
         2 2010
         2 2011
         2 2012
         2 2013
         2 2014
         2 2015
         3 2004
         3 2005
         3 2006
         3 2007
         3 2008
         3 2009
         3 2010
         3 2011
         3 2012
         3 2013
         3 2014
         3 2015
         4 2004
         4 2005
         4 2006
         4 2007
         4 2008
         4 2009
         4 2010
         4 2011
         4 2012
         4 2013
         4 2014
         4 2015
         5 2007
         6 2004
         6 2005
         6 2006
         6 2007
         6 2008
         6 2009
         7 2007
         7 2008
         7 2009
         7 2010
         7 2011
         7 2012
         7 2013
         7 2014
         7 2015
         8 2004
         8 2005
         8 2006
         8 2007
         8 2008
         8 2009
         8 2010
         8 2011
         8 2012
         8 2013
         8 2014
         8 2015
         9 2004
         9 2005
         9 2006
         9 2007
         9 2008
         9 2009
         9 2010
         9 2011
         9 2012
         9 2013
         9 2014
         9 2015
        10 2004
        11 2008
        11 2009
        11 2010
        11 2011
        11 2012
        11 2013
        11 2014
        12 2012
        12 2013
        12 2014
        12 2015
        13 2004
        13 2005
        13 2006
        13 2007
        end
        The complete panel represents a firm observed for 11 years from 2004 to 2015. However, being the panel unbalanced, some firms may be observed for less years (either they are observed from a Year later than 2004 or they "die" before 2015). My aim is to keep just those firms that have a complete panel, that is those firms that are observed from 2004 to 2015.

        Thank you for both replies and sorry for the mistake!
        Last edited by Federico Nutarelli; 23 Jan 2019, 01:11.

        Comment


        • #5
          Hello Federico,

          As I understand, you want to create a balance panel data. For that, I would recommend this code:
          Code:
          duplicates report idfirm // this command just for checking
          duplicates tag idfirm, gen(dup) // create a variable "dup' to count how many time that firm repeat
          drop if dup!=11 // drop all firm that not repeats 11 years
          drop dup
          Hope it help!

          Comment


          • #6
            Federico Nutarelli
            ​​​​​​​Sorry, my bad. just change command
            Code:
            drop if dup!=11
            by command
            Code:
            drop if dup!=10
            because previous command count how many duplicates, and duplicates==10 mean that firm repeats 11 years

            Comment


            • #7
              Let me just point out that if the complete data is supposed to run 2004 through 2015 (including both of those years), then the number of observations to target is not 11, it is 12. (corresponding to dup != 11).

              Comment


              • #8
                Perfect. Sorry again for my mistake. Thank you very much for the appreciated help!

                Comment

                Working...
                X