Dropping unneeded observations in panel data

Florian Berger

Join Date: Jan 2018

Posts: 15
#1


22 Jan 2018, 06:10

Hello everyone,

I hope this is the right place for this question. If not, please tell me where the right place is; it is my first post, after all.

So, I want to drop several observations from my dataset. Everyone in the set has a unique ID (panel variable), and the time variable is the year (1984 - 2016).

I want to drop everyone who never has the value "retired" in a variable called "labor force status". But I want to keep every observation from every year for the persons who do have the value "retired" at any time.
Basically, I want to keep all the information for pensioners, and all the information for people who are still working should be dropped.

It would be really helpful if somebody could help me with this, because I have rarely worked with panel data before.

Greetings
Florian

Nick Cox

Join Date: Mar 2014
Posts: 35724

22 Jan 2018, 06:23

Please see FAQ Advice 12 on posting data examples. https://www.statalist.org/forums/help#stata

Here are two guesses: that being retired is conveyed by a particular numeric code, and that it is conveyed by a particular string value. The principle is the same in either case: whether an identifier takes some value in any year (at least one year) is shown by the maximum of a true-or-false expression being 1 over the panel. (More generally, and more concisely, any <-> max and all <-> min over true-or-false expressions.)

For lengthier discussion, see the FAQ https://www.stata.com/support/faqs/d...ble-recording/

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id year numstatus) str8 strstatus
1 2015 4 "employed"
1 2016 7 "retired" 
2 2015 4 "employed"
2 2006 4 "employed"
end

egen wanted1 = max(numstatus == 7), by(id)
egen wanted2 = max(strstatus == "retired"), by(id)

list, sepby(id)

     +-----------------------------------------------------+
     | id   year   numsta~s   strsta~s   wanted1   wanted2 |
     |-----------------------------------------------------|
  1. |  1   2015          4   employed         1         1 |
  2. |  1   2016          7    retired         1         1 |
     |-----------------------------------------------------|
  3. |  2   2015          4   employed         0         0 |
  4. |  2   2006          4   employed         0         0 |
     +-----------------------------------------------------+

Once you have such a variable, do something like

Code:

keep if wanted1
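
The "any <-> max, all <-> min" principle above can be sketched side by side. This is only an illustration under assumptions: it reuses the variable names from the example, adds a hypothetical person 3 who is retired in every year, and the names anyretired and allretired are made up here.

```stata
* Sketch only: id 3 is a hypothetical extra person, not from the original data
clear
input float(id year) str8 strstatus
1 2015 "employed"
1 2016 "retired"
2 2015 "employed"
2 2006 "employed"
3 2015 "retired"
3 2016 "retired"
end

* any <-> max: 1 if the person is retired in at least one year
egen anyretired = max(strstatus == "retired"), by(id)

* all <-> min: 1 only if the person is retired in every observed year
egen allretired = min(strstatus == "retired"), by(id)

* keep every observation of anyone who was ever retired
keep if anyretired

list, sepby(id)
```

Persons 1 and 3 remain with all their years; only person 3 has allretired equal to 1.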


Florian Berger

Join Date: Jan 2018

Posts: 15
#3

22 Jan 2018, 08:39

This was really helpful. Thank you
Florian Berger

Join Date: Jan 2018

Posts: 15
#4

19 Feb 2018, 06:16

I have two questions regarding a similar topic.
Hopefully the code example is adequate.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long pid int(syear birth occupation Life_satisfaction)
201 1983 1926 12 5
201 1984 1926 12 6
201 1985 1926 13 7
201 1986 1926 13 8
201 1987 1926 13 7
end

So I want to generate a variable that gives me the person's life satisfaction in the year he/she first retires. In this case that year is 1985, so the variable should always have the value 7. How do I do that for the whole dataset?

Second question:

How can I drop people from the dataset who were unemployed (12) for only one year before retiring (13)? In this case person 201 wouldn't be dropped, because he was unemployed for two years before retiring.

Thanks a lot.

Nick Cox

Join Date: Mar 2014
Posts: 35724

19 Feb 2018, 06:39

See the concurrent thread https://www.statalist.org/forums/for...30520-anywatch

See also Sections 9 and 10 of Speaking Stata: Compared with ...

http://www.stata-journal.com/sjpdf.h...iclenum=dm0055

and mentions of dm0055 in the forum.

Code:

clear
input long pid int(syear birth occupation Life_satisfaction)
 201 1983 1926  12  5
 201 1984 1926  12  6
 201 1985 1926  13  7
 201 1986 1926  13  8
 201 1987 1926  13  7
 end 
 
bysort pid : egen retire = min(cond(occupation == 13, syear, .)) 
by pid : egen satis_retire = min(cond(syear == retire, Life, .)) 
 
list 

     +---------------------------------------------------------------+
     | pid   syear   birth   occupa~n   Life_s~n   retire   satis_~e |
     |---------------------------------------------------------------|
  1. | 201    1983    1926         12          5     1985          7 |
  2. | 201    1984    1926         12          6     1985          7 |
  3. | 201    1985    1926         13          7     1985          7 |
  4. | 201    1986    1926         13          8     1985          7 |
  5. | 201    1987    1926         13          7     1985          7 |
     +---------------------------------------------------------------+

Then the second problem is a twist or two on the first. You can get close to where you want via

Code:

bysort pid : egen wanted = total(cond(syear < retire, occup == 12, .))
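
One way to finish the second problem from that count is sketched below. It rests on assumptions: person 202 and occupation code 11 (taken here to mean employed) are hypothetical, n_unemp is a made-up name, and the count covers all unemployed years before first retirement, not only a spell immediately preceding it.

```stata
* Sketch only: pid 202 and code 11 are hypothetical additions
clear
input long pid int(syear occupation)
201 1983 12
201 1984 12
201 1985 13
202 1983 11
202 1984 12
202 1985 13
end

* year of first retirement, as in the code above
bysort pid : egen retire = min(cond(occupation == 13, syear, .))

* number of unemployed years before first retirement
bysort pid : egen n_unemp = total(cond(syear < retire, occupation == 12, .))

* drop people with exactly one such year; never-retired people are left alone
drop if n_unemp == 1 & !missing(retire)

list, sepby(pid)
```

Person 201 has two unemployed years before 1985 and stays; the hypothetical person 202 has one and is dropped. If only the years directly before retirement should count, the condition on syear would need tightening.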


Florian Berger

Join Date: Jan 2018

Posts: 15
#6

19 Feb 2018, 07:07

I really appreciate the help. And I'll try to adjust to the rules of this forum. Thanks a lot, again.
Florian Berger

Join Date: Jan 2018

Posts: 15
#7

14 Apr 2018, 03:55

Sadly, I have yet another question.

I want to run a regression to find out what has an effect on my dependent variable, happiness. Simply put, there are missing values in one of my independent variables, income. Should I ignore them and just run the regression anyway? Or should I limit my sample so that every value for happiness has a corresponding value for income? Is there another way to fix this problem?

Thank you
Nick Cox

Join Date: Mar 2014

Posts: 35724
#8

14 Apr 2018, 03:58

I'd post that as a new question. But, simply put, Stata omits observations with missing values by default anyway, so your choice is no choice.

Another way to approach the problem is multiple imputation.
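
Both points can be sketched with the auto dataset shipped with Stata, in which rep78 has missing values. This is only an illustration under assumptions: price, rep78 and mpg stand in for happiness, income and other covariates, the linear imputation model treats rep78 as continuous for simplicity, and 20 imputations and the seed are arbitrary choices.

```stata
* Sketch only: auto.dta variables stand in for happiness and income
sysuse auto, clear

* default behaviour: observations with missing rep78 are omitted
regress price rep78 mpg
scalar n_used = e(N)
display "observations used: " n_used " of " _N

* multiple imputation of rep78, then the same regression on the imputed data
mi set mlong
mi register imputed rep78
mi impute regress rep78 price mpg, add(20) rseed(12345)
mi estimate: regress price rep78 mpg
```

Comparing the number of observations used with _N before imputation shows how much of the sample the default analysis leaves out.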
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#9

14 Apr 2018, 04:12

Florian:
as an aside to Nick's helpful advice, you should first investigate whether the missingness that you detected in your dataset is ignorable or not.

Kind regards,
Carlo
(Stata 19.0)