Creating new observations for each value in id-variable

Saliha Amrani

Join Date: Sep 2018
Posts: 14

Creating new observations for each value in id-variable

24 Sep 2018, 04:36

Hi,
I need some help expanding my data set.

I hope this introduction is not too long.

I have a dataset with monthly observation of people who receive different benefits and their labor market status between Jan2011 to Feb2018. The issue is that I only have observations for the months the person actually received a benefit or has a registered status. The number of monthly observations per individual is therefore not "balanced". I want to "balance" the data set so that I have the same number of observations per individual, and for the months there is no data on the individual, it should be missing. I want the data set to look something like this:

For example for individual 1 I only have their status in jan-feb 2011 and april-july2012:

ID	Monthyear	benefit	workstatus
1	2011m1	x	y
1	2011m2	x	.
1	2011m3	.	.
1	2011m4	.	.
1	2011m5	.	.
1	and so on
1	2012m4	z	q
1	2012m5	z	q
1	2012m6	x	q
1	2012m7	z	y
1	2012m8	.	.
1	2012m9	.	.
1	and so on until
1	2018m1	.	.
1	2018m2	.	.
2	and so on

While now it looks like this:

id	monthyear	benefit	workstatus
1	2011m1	x	y
1	2011m2	x	.
1	2012m4	z	q
1	2012m5	z	q
1	2012m6	x	q
1	2012m7	z	y
2

How do I do this? (Its a large dataset with 240 000 individuals)

Saliha

Last edited by Saliha Amrani; 24 Sep 2018, 05:04.

Tags: None

Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#2

24 Sep 2018, 05:48

The short answer is:

Code:

tsfill, full

See also the guide for tsfill, including an example with monthly data: https://www.stata.com/manuals13/tstsfill.pdf

It's important that your dates are proper Stata dates, though. In the future, please use dataex to post example data, so tat readers here immediately understand what your data is like. See more on how and why of dataex in the FAQ, section12: https://www.statalist.org/forums/help#stata

Further: you might also think about what you would do with all those observations with missing data. Can you interpolate? Running a regression with an unbalanced panel (no observations for e.g., 2012m3) is the same as running the regression with that observation but with all missing values.
Comment

Daniel Bela

Join Date: Apr 2014
Posts: 246

24 Sep 2018, 05:52

I fully support Jorrit's advice. Posting data examples using -dataex- is crucial for getting quick an correct replies.

I just want to add: The short answer, in my opinion, is

Code:

tsset id monthyear
tsfill , full

And this is the long answer, including a minimal working example:

Code:

clear
input id monthyear str1(benefit) str1(workstatus)
1     624     "x"     "y"
1     625    "x"     ""
1     626     "z"     "q"
1     629     "z"     "q"
2     625    "x"    ""        
2     628    ""    "z"
end
format monthyear %tm
list , sepby(id)

tsset id monthyear
tsfill , full
list , sepby(id)

Regards
Bela

Comment

Saliha Amrani

Join Date: Sep 2018

Posts: 14
#4

24 Sep 2018, 06:57

thank you both! I used tsfill and it worked. There are two more issues I don't know the solution to. I also have som fixed characteristics like gender and municipality that become missing in the new observations. How do I "duplicate" or keep them when I use tsfill? And does it matter whether gender and municipality are string or numeric variables?

AND: Is there an easy way to check whether an individual has moved, i.e. if one individual has different values in "municipality"?

Thank you in advance!

Regards,
Saliha
Comment
Saliha Amrani

Join Date: Sep 2018

Posts: 14
#5

24 Sep 2018, 07:05

The missing values will be considered "breaks" in benefits or time away from labor force that i will calculate later.
Comment

Jorrit Gosens

Join Date: Jan 2015
Posts: 1019

24 Sep 2018, 07:53

Please do follow the advice on dataex, also for follow-up questions. With a data example, your questions is a whole lot easier to answer.

Some fake example data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id monthyear) str1(benefit workstatus) str9 municip str6 sex
1 624 "x" "y" "" ""
1 625 "x" ""  ""         ""    
1 626 "z" "q" "sometown"         "male"    
1 629 "z" "q" ""         ""    
2 625 "x" ""  ""         ""    
2 628 ""  "z" "othertown"         "female"   
3 628 "x" ""  ""         ""    
3 629 ""  "z" ""         "female"    
end

code to run:

Code:

format monthyear %tm
tsset id monthyear
tsfill , full
bys id (municip): replace municip=municip[_N]
bys id (sex): replace sex=sex[_N]

Yields:

Code:

. list , sepby(id)

     +---------------------------------------------------------+
     | id   monthy~r   benefit   workst~s     municip      sex |
     |---------------------------------------------------------|
  1. |  1     2012m2         x               sometown     male |
  2. |  1     2012m6         z          q    sometown     male |
  3. |  1     2012m4                         sometown     male |
  4. |  1     2012m1         x          y    sometown     male |
  5. |  1     2012m5                         sometown     male |
  6. |  1     2012m3         z          q    sometown     male |
     |---------------------------------------------------------|
  7. |  2     2012m3                        othertown   female |
  8. |  2     2012m4                        othertown   female |
  9. |  2     2012m2         x              othertown   female |
 10. |  2     2012m1                        othertown   female |
 11. |  2     2012m6                        othertown   female |
 12. |  2     2012m5                    z   othertown   female |
     |---------------------------------------------------------|
 13. |  3     2012m2                                    female |
 14. |  3     2012m4                                    female |
 15. |  3     2012m1                                    female |
 16. |  3     2012m5         x                          female |
 17. |  3     2012m3                                    female |
 18. |  3     2012m6                    z               female |
     +---------------------------------------------------------+

.

Comment

Saliha Amrani

Join Date: Sep 2018

Posts: 14
#7

25 Sep 2018, 07:00

Thank you for your response!
Comment

Announcement

Creating new observations for each value in id-variable

Comment

Comment

Comment

Comment

Comment

Comment