Filling missing rows by groups for string and numeric

Rachel Sore

Join Date: Apr 2016
Posts: 5

Filling missing rows by groups for string and numeric

22 Apr 2016, 00:25

I have a data file where there are several rows per person (ID) and both string and numeric variables with only one value per person. I would like to fill the missing rows by person ID.
There are several similar topics on this forum which address this situation however, as far as I am aware, none address the case when there is no entry for one ID.

A simplified version of the data are as follows:

Code:

clear
input ID str8 varstr varnum
1    "A"     .
1   "" 4.2
1   "" .
1   "" .
1   "" .
2   "" .
2   "G" .
2   "" .
2   "" .
2   "" .
3   "" 7.6
3   "" .
3   "" .
3   "" .
3   "" .
end

list, sepby(ID)

Code:

     +----------------------+
     | ID   varstr   varnum |
     |----------------------|
  1. |  1        A        . |
  2. |  1               4.2 |
  3. |  1                 . |
  4. |  1                 . |
  5. |  1                 . |
     |----------------------|
  6. |  2                 . |
  7. |  2        G        . |
  8. |  2                 . |
  9. |  2                 . |
 10. |  2                 . |
     |----------------------|
 11. |  3               7.6 |
 12. |  3                 . |
 13. |  3                 . |
 14. |  3                 . |
 15. |  3                 . |
     +----------------------+

As you can see, the single value for each ID is not always in the first row, ID 3 has no varstr value and ID 2 has no varnum value. When filling, these should be kept as missing.

I have tried

Code:

bysort ID (varstr) : replace varstr= varstr[_N]

however this only works when all IDs have exactly one non-missing value.

Advice much appreciated.

Tags: None

Nick Cox

Join Date: Mar 2014

Posts: 35724
#2

22 Apr 2016, 00:29

What rule do you want to follow if there are two or more non-missing values?
Comment
Rachel Sore

Join Date: Apr 2016

Posts: 5
#3

22 Apr 2016, 00:33

There is always a maximum of one non-missing value per ID (this has been checked).
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35724

22 Apr 2016, 00:43

I am not clear what is puzzling you then. This works for your example to the extent that seems possible.

Code:

clear
input ID str8 varstr varnum
1    "A"     .
1   "" 4.2
1   "" .
1   "" .
1   "" .
2   "" .
2   "G" .
2   "" .
2   "" .
2   "" .
3   "" 7.6
3   "" .
3   "" .
3   "" .
3   "" .
end
bysort ID (varnum) : replace varnum = varnum[1] if missing(varnum) 
bysort ID (varstr) : replace varstr = varstr[_N] if missing(varstr) 
list, sepby(ID) 

     +----------------------+
     | ID   varstr   varnum |
     |----------------------|
  1. |  1        A      4.2 |
  2. |  1        A      4.2 |
  3. |  1        A      4.2 |
  4. |  1        A      4.2 |
  5. |  1        A      4.2 |
     |----------------------|
  6. |  2        G        . |
  7. |  2        G        . |
  8. |  2        G        . |
  9. |  2        G        . |
 10. |  2        G        . |
     |----------------------|
 11. |  3               7.6 |
 12. |  3               7.6 |
 13. |  3               7.6 |
 14. |  3               7.6 |
 15. |  3               7.6 |
     +----------------------+

Comment

Rachel Sore

Join Date: Apr 2016

Posts: 5
#5

22 Apr 2016, 00:53

Yes you're right - it does work. Thank you!
I'm not sure what I was doing before, my apologies.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35724
#6

22 Apr 2016, 01:39

The number of distinct values in each group is for any variable is

Code:

egen tag = tag(whatevervar id) egen nvalues = total(tag), by(id)
Comment

Announcement

Filling missing rows by groups for string and numeric

Comment

Comment

Comment

Comment

Comment