Making balanced panel by dropping observation based on a certain variable

Tariq Abdullah

Join Date: Apr 2021
Posts: 366

25 Sep 2022, 15:51

I want to make a balanced panel based on my outcome variable, ln_wage. Most counties in my data have 22 observations, one for each of 22 years. However, some counties do not have 22 observations, and I need to drop them to make the panel balanced. Could you kindly advise how I can do that in code? If a county does not have 22 nonmissing observations of ln_wage, I need to drop that county from the whole sample. How can I do that?

Code:

 tab county if ln_wage !=.

     county |      Freq.     Percent        Cum.
------------+-----------------------------------
       1003 |         18        0.12        0.12
       1005 |          3        0.02        0.13
       1015 |         21        0.13        0.27
       1017 |          4        0.03        0.29
       1049 |          3        0.02        0.31
       1051 |          1        0.01        0.32
       1055 |         22        0.14        0.46
       1069 |         22        0.14        0.60
       1073 |         22        0.14        0.74
       1077 |          6        0.04        0.78
       1081 |         12        0.08        0.86
       1083 |          2        0.01        0.87
       1089 |         22        0.14        1.01
       1093 |          6        0.04        1.05
       1095 |         22        0.14        1.19
       1097 |         22        0.14        1.33
       1101 |         22        0.14        1.47
       1103 |         22        0.14        1.61
end

I've given a sample of my data in the following section

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double county float ln_wage int year
1001         . 2001
1001         . 2002
1001         . 2006
1001         . 2007
1001         . 2008
1001         . 2009
1001         . 2010
1001         . 2011
1001         . 2012
1001         . 2013
1001         . 2014
1001         . 2015
1001         . 2016
1003         . 2001
1003         . 2002
1003         . 2003
1003  8.852218 2004
1003  9.299419 2005
1003  9.502734 2006
1003   9.36311 2007
1003  9.166079 2008
1003  9.166397 2009
1003  9.187581 2010
1003  9.142589 2011
1003  9.294462 2012
1003  9.279741 2013
1003 9.3235655 2014
1003  8.929891 2015
1003  8.941599 2016
1003  8.965699 2017
1003  8.934246 2018
1003  8.908757 2019
1003  8.950823 2020
1003  8.973106 2021
1005         . 2003
1005   8.66577 2004
1005  8.836918 2005
1005  8.686667 2006
1005         . 2007
1005         . 2008
1005         . 2009
1005         . 2010
1005         . 2011
1005         . 2012

end

Last edited by Tariq Abdullah; 25 Sep 2022, 15:54.

Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#2

25 Sep 2022, 17:13

Code:

drop if missing(ln_wage)
isid county year, sort
by county (year): keep if _N == 22
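To see how many counties the filter removes before anything is dropped, a variant along the following lines may help. It is only a sketch: it assumes each county-year pair appears at most once (which `isid county year` would confirm), and the variable names `n_wage`, `balanced`, and `first` are illustrative and not from the thread.

```stata
* Sketch only: n_wage, balanced, and first are hypothetical helper variables.
* Count the nonmissing ln_wage values within each county.
bysort county: egen n_wage = count(ln_wage)

* Flag counties observed in all 22 years.
generate byte balanced = (n_wage == 22)

* Tag one row per county so the table counts counties, not observations.
egen byte first = tag(county)
tabulate balanced if first

* Keep only the fully observed counties, then remove the helpers.
keep if balanced
drop n_wage balanced first
```

Under the stated assumption, this should retain the same counties as the three commands above, while the `tabulate` step reports how many counties are kept and how many are lost.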
Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#3

25 Sep 2022, 17:35

Mr. Schechter,

My humble gratitude, as usual, for your thoughtful and elegant snippet of code. I had tried several ways myself but could not get any of them to work. Your code worked perfectly for my dataset. Much obliged!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17741
#4

26 Sep 2022, 01:43

Tariq:
you should also delve into the reason(s) for that missingness; otherwise, by considering only the "cream" of your observations (that is, the panels that have -ln_wage- observed in all the waves of data), you may end up with a sample that bears only a tenuous relationship to your original dataset.

Kind regards,
Carlo
(Stata 19.0)
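One possible first look at the missingness, run on the full data before any counties are dropped, is sketched below. It assumes `county` and `year` uniquely identify observations; it describes the pattern of missing values but does not by itself explain why they are missing.

```stata
* Sketch only: run on the original data, before dropping any counties.
* Declare the panel structure (assumes county-year pairs are unique).
xtset county year

* Show in which years ln_wage is observed, by participation pattern.
xtdescribe if !missing(ln_wage)

* Show how the missing ln_wage values are spread across years.
tabulate year if missing(ln_wage)
```

If the missing values cluster in particular years or in particular kinds of counties, that is a sign the balanced subsample may differ systematically from the full sample.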
Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#5

26 Sep 2022, 06:19

Thank you so much, Mr. Lazzaro, for showing me the perspective I was missing. Honestly, though, every time I delve into the literature on missing wages, I get absorbed in so many solutions for so many unique situations that I get confused. Nonetheless, as you correctly pointed out, I need to delve deeper into this and figure out which approach is right for my dataset. I highly appreciate your insightful and kind feedback!
