  • Collapsing observations in longitudinal data

    Hi all,

    I have a question about how to approach a data problem. I start with a file at the parent-child level. In Image 1, "hhidpn" uniquely identifies respondents, while "kidid" identifies children. The file is wide: "k5age" is the child's age at wave 5, "k6age" is the child's age at wave 6, and so on. I start by converting this to a long-format file in which each parent-child pair has 8 observations (I am only interested in waves 5-12). This is shown in Image 2.

    For the final step, my goal is a respondent-level file instead of a respondent-kid file. Image 3 shows where I currently stand. I have parent-child rows, but I want them merged into one row per respondent and wave. In the image, I have circled the cells that I would like to line up. In this case, kidid would become irrelevant, and keduc1 would be the education of child 1, keduc2 the education of child 2, etc. I've tried using "collapse (firstnm)" by "hhidpn" and "wave". This works, but the missing codes are important. For example, when collapsing, ".p" and ".n" simply become ".". I could convert these to numbers, collapse, and then reconvert, but I have many different extended missing codes, and they mean different things for different variables. Alternatively, is there any way to make Stata not read ".p", ".m", etc. as missing? In an ideal world, if only "." counted as missing, I would have no issue.

    Does anyone have any suggestions? I apologize if this is convoluted.


    As a final note, the missing patterns are random throughout the data. Some respondents have almost every wave, some have none.


    Image 1: Ex 2.PNG

    Image 2: Ex 3.PNG

    Image 3: Ex 1.PNG

  • #2
    Alternatively, is there any way to make Stata not read ".p", ".m", etc. as missing? In an ideal world, if only "." counted as missing, I would have no issue.
    Check out mvencode and mvdecode.

    Code:
    * Create a sample dataset
    clear
    input long (hhidpn kidid) wave keduc1 keduc2 keduc3
    10465010 0104650151  5 12 . .
    10465010 0104650151  6 . . .
    10465010 0104650151  7 . . .
    10465010 0104650151  8 .p . .
    10465010 0104650151  9 . . .
    10465010 0104650151 10 . . .
    10465010 0104650151 11 . . .
    10465010 0104650151 12 . . .
    10465010 0104650152  5 . 12 .
    10465010 0104650152  6 . . .
    10465010 0104650152  7 . . .
    10465010 0104650152  8 . .p .
    10465010 0104650152  9 . . .
    10465010 0104650152 10 . . .
    10465010 0104650152 11 . . .
    10465010 0104650152 12 . . .
    10465010 0104650153  5 . . 13
    10465010 0104650153  6 . . .
    10465010 0104650153  7 . . .
    10465010 0104650153  8 . . .n
    10465010 0104650153  9 . . .
    10465010 0104650153 10 . . .
    10465010 0104650153 11 . . .
    10465010 0104650153 12 . . .
    end
    
    * encode > collapse > decode:
    mvencode keduc*, mv(.p=-9\.n=-8)
    collapse (firstnm) keduc*, by(hhidpn wave)
    mvdecode keduc*, mv(-9=.p\-8=.n)
    Result:
    Code:
         +--------------------------------------------+
         |   hhidpn   wave   keduc1   keduc2   keduc3 |
         |--------------------------------------------|
      1. | 10465010      5       12       12       13 |
      2. | 10465010      6        .        .        . |
      3. | 10465010      7        .        .        . |
      4. | 10465010      8       .p       .p       .n |
      5. | 10465010      9        .        .        . |
      6. | 10465010     10        .        .        . |
      7. | 10465010     11        .        .        . |
      8. | 10465010     12        .        .        . |
         +--------------------------------------------+
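
    To see why the encode step matters, here is a minimal sketch on a single toy parent-wave cell (not the poster's real data). It assumes, as in the example above, that -9 is not otherwise a value of keduc1. The first collapse reproduces the problem described in #1; the second repeats the encode > collapse > decode sequence for one variable.

```stata
* Toy group: one parent-wave cell with a .p row and a plain . row
clear
input long hhidpn byte wave keduc1
10465010 8 .p
10465010 8 .
end

* Plain collapse: (firstnm) treats both rows as missing, so .p is lost
preserve
collapse (firstnm) keduc1, by(hhidpn wave)
list
restore

* Encode first: .p is temporarily the nonmissing value -9, so (firstnm)
* picks it up; plain . is not listed in mv() and stays missing
mvencode keduc1, mv(.p=-9)
collapse (firstnm) keduc1, by(hhidpn wave)
mvdecode keduc1, mv(-9=.p)
list
```

    Note that (firstnm) keeps only the first nonmissing value within each hhidpn-wave group, so after encoding, a group with two different codes in the same variable would keep whichever comes first in the sort order.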


    • #3
      Originally posted by Ken Chui

      Check out mvencode and mvdecode.

      Thanks for this. I think this will be my best option. Do you run the risk of assigning a missing code, say ".p = -9", to a value that already exists in the data and/or is meaningful?


      • #4
        -mvencode- will not let you do that. If you mistakenly specify such a value, it will give you an error message and halt execution instead.

        In this regard, -mvencode- is like most Stata data management commands: they prevent you from inadvertently mangling your data. (They also permit you to deliberately mangle your data by specifying an option that says you want to.)
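
        A small sketch of that behavior, on a toy variable x (hypothetical, not from the data above) in which -9 is already a legitimate value. The option that allows the deliberate overwrite is mvencode's override option; the exact wording of the error message may vary by Stata version, so the sketch only checks that the return code is nonzero.

```stata
* Toy data: -9 is already a legitimate value of x
clear
input x
12
-9
.p
end

* Without override, mvencode refuses and leaves the data untouched
capture noisily mvencode x, mv(.p=-9)
display "return code: " _rc
count if x == .p

* With override, the recode goes through, and the former .p can no
* longer be told apart from the genuine -9
mvencode x, mv(.p=-9) override
count if x == -9
```

        In practice, the safer route is to pick codes well outside the observed range of every variable being encoded, so that override is never needed.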
        Last edited by Clyde Schechter; 06 Jan 2023, 10:57.
