  • Making a balanced panel by dropping observations based on a certain variable

    I want to make a balanced panel based on my outcome variable, ln_wage. Most counties in my data have 22 observations, one for each of 22 years, but some counties do not, and I need to drop them to make the panel balanced. Could you kindly advise how to do this in code? If a county does not have 22 non-missing observations of ln_wage, I need to drop that county from the whole sample. How can I do that?

    Code:
     tab county if ln_wage !=.
    
         county |      Freq.     Percent        Cum.
    ------------+-----------------------------------
           1003 |         18        0.12        0.12
           1005 |          3        0.02        0.13
           1015 |         21        0.13        0.27
           1017 |          4        0.03        0.29
           1049 |          3        0.02        0.31
           1051 |          1        0.01        0.32
           1055 |         22        0.14        0.46
           1069 |         22        0.14        0.60
           1073 |         22        0.14        0.74
           1077 |          6        0.04        0.78
           1081 |         12        0.08        0.86
           1083 |          2        0.01        0.87
           1089 |         22        0.14        1.01
           1093 |          6        0.04        1.05
           1095 |         22        0.14        1.19
           1097 |         22        0.14        1.33
           1101 |         22        0.14        1.47
           1103 |         22        0.14        1.61

    A sample of my data is given below.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double county float ln_wage int year
    1001         . 2001
    1001         . 2002
    1001         . 2006
    1001         . 2007
    1001         . 2008
    1001         . 2009
    1001         . 2010
    1001         . 2011
    1001         . 2012
    1001         . 2013
    1001         . 2014
    1001         . 2015
    1001         . 2016
    1003         . 2001
    1003         . 2002
    1003         . 2003
    1003  8.852218 2004
    1003  9.299419 2005
    1003  9.502734 2006
    1003   9.36311 2007
    1003  9.166079 2008
    1003  9.166397 2009
    1003  9.187581 2010
    1003  9.142589 2011
    1003  9.294462 2012
    1003  9.279741 2013
    1003 9.3235655 2014
    1003  8.929891 2015
    1003  8.941599 2016
    1003  8.965699 2017
    1003  8.934246 2018
    1003  8.908757 2019
    1003  8.950823 2020
    1003  8.973106 2021
    1005         . 2003
    1005   8.66577 2004
    1005  8.836918 2005
    1005  8.686667 2006
    1005         . 2007
    1005         . 2008
    1005         . 2009
    1005         . 2010
    1005         . 2011
    1005         . 2012
    
    end
    Last edited by Tariq Abdullah; 25 Sep 2022, 15:54.

  • #2
    Code:
    drop if missing(ln_wage)
    isid county year, sort
    by county (year): keep if _N == 22
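    A minimal sketch of a variant, in case you want to keep the rows where ln_wage is missing (for example, because other variables are observed in those years). It would be run on the original data instead of the code above: egen's count() function gives the number of non-missing ln_wage values within each county, and only counties where that count is 22 are kept. It assumes, as above, that county and year uniquely identify observations and that 22 is the full number of years; n_wage is a hypothetical helper name. The final xtdescribe shows the pattern of years with non-missing ln_wage, so you can confirm that every remaining county covers the same years.

    ```stata
    * Variant sketch: keep every row of counties whose ln_wage is observed 22 times.
    * Assumes county and year uniquely identify observations (isid stops otherwise).
    isid county year, sort

    * count() returns the number of non-missing ln_wage values in each county
    * (n_wage is a hypothetical helper variable)
    by county: egen n_wage = count(ln_wage)
    keep if n_wage == 22
    drop n_wage

    * Inspect which years have non-missing ln_wage across the remaining counties
    xtset county year
    xtdescribe if !missing(ln_wage)
    ```

    Unlike the code above, this leaves rows with missing ln_wage in the data, so any estimation command will still drop them; the panel is balanced with respect to ln_wage, not necessarily with respect to every row.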



    • #3
      Mr. Schechter,

      My humble gratitude, as usual, for your thoughtful and elegant snippet of code. I had tried several approaches but couldn't get any of them to work; yours worked perfectly on my dataset. Highly obliged!



      • #4
        Tariq:
        you should also delve into the reason(s) for that missingness. Otherwise, by considering only the "cream" of your observations (that is, the panels that have -ln_wage- observed in every wave of data), you may end up with a sample that bears only a tenuous relationship to your original dataset.
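        As a first look at that missingness, a minimal sketch, assuming it is run on the full data before any county is dropped and that 22 is the full number of years: flag counties with a complete ln_wage series, count how many counties fall into each group, and see in which years ln_wage tends to be missing. The names n_wage, complete, and onecounty are hypothetical helper variables.

        ```stata
        * Run on the original data, before dropping anything.
        * Number of non-missing ln_wage values per county (hypothetical helper names)
        bysort county: egen n_wage = count(ln_wage)
        generate byte complete = (n_wage == 22)

        * Tag one observation per county so that each county is counted once
        egen byte onecounty = tag(county)
        tabulate complete if onecounty

        * Years in which ln_wage is missing, to spot systematic gaps
        tabulate year if missing(ln_wage)
        ```

        If incomplete counties cluster in particular years or differ systematically on other characteristics, keeping only the complete panels is likely to give a selected sample.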
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Thanks so much, Mr. Lazzaro, for pointing out the perspective I was missing. Honestly, though, every time I delve into the literature on missing wages, I get so absorbed in the many solutions for so many specific situations that I end up confused. Nonetheless, as you correctly pointed out, I need to look into this carefully and figure out which approach is appropriate for my dataset. I highly appreciate your insightful and kind feedback!
