Collapsing observations in longitudinal data

Jaycob Applegate

05 Jan 2023, 14:03

Hi all,

I had a question about how to approach my data problem. I start with a file at the parent-child level. In Image 1, "hhidpn" uniquely identifies respondents, while "kidid" identifies children. The file is wide, such that "k5age" is the child's age at wave 5 and "k6age" is the child's age at wave 6. I start by converting this to a long-format file in which each parent-child pair has 8 observations (since I am only interested in waves 5-12). This is shown in Image 2.
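A minimal sketch of the wide-to-long step described above, assuming one k#age variable per wave (as with k5age and k6age). The toy ages, and the extra wave 4 column included only to show the wave filter, are made up for illustration:

```stata
* Toy wide file: one row per parent-child pair (made-up ages)
clear
input long (hhidpn kidid) k4age k5age k6age
10465010 104650151 28 30 32
10465010 104650152 26 28 30
end

* k@age: the @ marks where the wave number sits in the variable names,
* so k4age, k5age, k6age become one variable kage plus a wave variable
reshape long k@age, i(hhidpn kidid) j(wave)

* Keep only the waves of interest (5-12)
keep if inrange(wave, 5, 12)

list, sepby(kidid)
```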

For the final step, my goal is to turn this into a respondent-level file instead of a respondent-child file. Image 3 shows where I currently stand: I have parent-child rows, but I want these to be merged. In the image, I have circled the cells that I would like to match up. After merging, kidid would become irrelevant, and keduc1 would be the education of child 1, keduc2 the education of child 2, and so on.

I've tried using "collapse (firstnm)" by hhidpn and wave. This works, but the missing codes are important: when collapsing, ".p" and ".n" simply become ".". I could convert these to numbers, collapse, and then convert them back, but I have many different extended missing codes, and they mean different things for different variables. Alternatively, is there any way to make Stata not treat ".p", ".m", etc. as missing? In an ideal world, if only "." counted as missing, I would have no issue.
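One route the thread does not discuss, offered only as a hedged alternative: rather than creating keduc1, keduc2, ... and then collapsing, -reshape wide- on a per-parent child counter builds them directly. -reshape- moves values without summarizing them, so extended missing codes such as .p and .n should come through unchanged. This sketch assumes the long file still has a single keduc variable per parent-child-wave row; the counter name kidnum and the toy values are hypothetical.

```stata
* Toy long file: one row per parent-child-wave (made-up values)
clear
input long (hhidpn kidid) wave keduc
10465010 104650151 5 12
10465010 104650151 8 .p
10465010 104650152 5 12
10465010 104650152 8 .p
10465010 104650153 5 13
10465010 104650153 8 .n
end

* Number the children 1, 2, 3, ... within each parent (hypothetical name kidnum):
* flag the first row of each child, then take a running sum within parent
bysort hhidpn kidid: gen byte first = _n == 1
by hhidpn: gen kidnum = sum(first)
drop kidid first

* One row per parent-wave; keduc becomes keduc1, keduc2, keduc3
reshape wide keduc, i(hhidpn wave) j(kidnum)

list
```

Because kidnum follows kidid order, "child 1" here means the lowest kidid within each hhidpn; a different ordering (for example, birth order) would need a different sort before the running sum.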

Does anyone have any suggestions? I apologize if this is convoluted.

As a final note, the missing patterns are random throughout the data: some respondents have almost every wave, and some have none.

Image 1

Image 2

Image 3

Ken Chui

05 Jan 2023, 17:19

Alternatively, is there any way to make Stata not treat ".p", ".m", etc. as missing? In an ideal world, if only "." counted as missing, I would have no issue.

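Check out -mvencode- and -mvdecode-: recode the extended missing values to numbers that do not otherwise occur in the data, collapse, and then recode those numbers back to the original missing codes.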

Code:

* Create a sample dataset
clear
input long (hhidpn kidid) wave keduc1 keduc2 keduc3
10465010 0104650151  5 12 . .
10465010 0104650151  6 . . .
10465010 0104650151  7 . . .
10465010 0104650151  8 .p . .
10465010 0104650151  9 . . .
10465010 0104650151 10 . . .
10465010 0104650151 11 . . .
10465010 0104650151 12 . . .
10465010 0104650152  5 . 12 .
10465010 0104650152  6 . . .
10465010 0104650152  7 . . .
10465010 0104650152  8 . .p .
10465010 0104650152  9 . . .
10465010 0104650152 10 . . .
10465010 0104650152 11 . . .
10465010 0104650152 12 . . .
10465010 0104650153  5 . . 13
10465010 0104650153  6 . . .
10465010 0104650153  7 . . .
10465010 0104650153  8 . . .n
10465010 0104650153  9 . . .
10465010 0104650153 10 . . .
10465010 0104650153 11 . . .
10465010 0104650153 12 . . .
end

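* encode > collapse > decode:
* 1. Turn .p and .n into placeholder numbers so that -collapse- treats them as data
mvencode keduc*, mv(.p=-9\.n=-8)
* 2. Keep the first nonmissing value of each variable within hhidpn-wave
collapse (firstnm) keduc*, by(hhidpn wave)
* 3. Turn the placeholder numbers back into .p and .n
mvdecode keduc*, mv(-9=.p\-8=.n)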

Result:

Code:

     +--------------------------------------------+
     |   hhidpn   wave   keduc1   keduc2   keduc3 |
     |--------------------------------------------|
  1. | 10465010      5       12       12       13 |
  2. | 10465010      6        .        .        . |
  3. | 10465010      7        .        .        . |
  4. | 10465010      8       .p       .p       .n |
  5. | 10465010      9        .        .        . |
  6. | 10465010     10        .        .        . |
  7. | 10465010     11        .        .        . |
  8. | 10465010     12        .        .        . |
     +--------------------------------------------+
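The same encode > collapse > decode idea extends to many extended missing codes at once. A minimal sketch, assuming every code .a-.z may appear: each letter gets its own placeholder number, and decoding maps each number back to the same letter, so whatever a code means for a given variable is preserved. The range -101 to -126 is an assumption: it must not already occur in keduc* (-mvencode- refuses otherwise), and it was chosen to fit in a byte variable. The toy data are made up.

```stata
* Toy parent-child-wave data with several extended missing codes (made-up values)
clear
input long (hhidpn kidid) wave keduc1 keduc2
1 11 8 12 .
1 12 8 .  .d
1 11 9 .r .
1 12 9 .  .m
end

* Map .a -> -101, .b -> -102, ..., .z -> -126 (assumed unused range;
* it stays within the byte minimum of -127)
local letters a b c d e f g h i j k l m n o p q r s t u v w x y z
local k = 0
foreach L of local letters {
    local ++k
    local code = -100 - `k'
    mvencode keduc*, mv(.`L'=`code')
}

* Keep the first nonmissing value of each variable per parent-wave
collapse (firstnm) keduc*, by(hhidpn wave)

* Reverse the mapping: -101 -> .a, ..., -126 -> .z
local k = 0
foreach L of local letters {
    local ++k
    local code = -100 - `k'
    mvdecode keduc*, mv(`code'=.`L')
}

list
```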

Jaycob Applegate

06 Jan 2023, 10:15

Originally posted by Ken Chui

Check out mvencode and mvdecode.

Thanks for this. I think this will be my best option. Do you run the risk of assigning a missing code (say, .p = -9) to a value that already exists in the data and is meaningful?

Clyde Schechter

06 Jan 2023, 10:55

-mvencode- will not let you do that. If you mistakenly specify such a value, it will give you an error message and halt execution instead.

In this regard, -mvencode- is like most Stata data management commands: they prevent you from inadvertently mangling your data. (They also permit you to deliberately mangle your data by specifying an option that says you want to; for -mvencode-, that option is -override-.)
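A small sketch of the protection described above, on made-up data in which -9 is already a real value. Following the description in this reply, the first -mvencode- call should be refused with an error; -capture noisily- is used only so that the message is shown without stopping the do-file. The -override- option, not used here, would force the change.

```stata
* Toy variable in which -9 is already a real value (made-up data)
clear
input x
12
-9
.p
end

* Ask -mvencode- to map .p to -9, which is already used in x
capture noisily mvencode x, mv(.p=-9)
display "mvencode return code: " _rc

* The request was refused, so x is unchanged: .p is still .p
assert x == 12 in 1
assert x == -9 in 2
assert x == .p in 3

* A placeholder that does not occur in x is accepted
mvencode x, mv(.p=-99)
assert x == -99 in 3
```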

Last edited by Clyde Schechter; 06 Jan 2023, 10:57.