Combine two variables in the same data set, where rowmax (for example) won't work

Rui Pedroso

Join Date: Dec 2022

Posts: 4
#1


02 Dec 2022, 02:50

Dear all, my situation is similar to the example below. The data are in long format, and I need to generate a new variable from var_a and var_b that looks like var_ab.
I obviously cannot do this with egen var_ab=rowmax(var_a var_b).
I would really appreciate some help with this.
Thank you very much in advance,

Rui

id   var_a   var_b   var_ab
 1       1       3        1
 1       2       4        2
 1       .       .        3
 1       .       .        4
 1       .       .        .
 1       .       .        .
 2       5       7        5
 2       6       8        6
 2       .       .        7
 2       .       .        8
 2       .       .        .
 2       .       .        .
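
For reference, a minimal sketch of why rowmax() does not help here, assuming the blank cells in the example are missing values. The egen function rowmax() takes the maximum across variables within each observation, so it never moves a value from one row to another. The variable name rmax is made up for illustration.

```stata
clear
input byte(id var_a var_b)
1 1 3
1 2 4
1 . .
1 . .
2 5 7
2 6 8
2 . .
2 . .
end

* rowmax() compares var_a and var_b within the same observation only
egen rmax = rowmax(var_a var_b)

* rmax holds the larger of the two values in each row and stays missing
* where both are missing, which is not the stacked var_ab that is wanted
list, sepby(id)
```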

Last edited by Rui Pedroso; 02 Dec 2022, 02:57.

Nick Cox

Join Date: Mar 2014
Posts: 35641
#2

02 Dec 2022, 04:12

This matches your example, but it is hard to know how literally to take it.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id var_a var_b var_ab)
1 1 3 1
1 2 4 2
1 . . 3
1 . . 4
1 . . .
1 . . .
2 5 7 5
2 6 8 6
2 . . 7
2 . . 8
2 . . .
2 . . .
end

gen wanted = var_a 
bysort id: replace wanted = var_b[_n-2] if missing(wanted)

list 

     +--------------------------------------+
     | id   var_a   var_b   var_ab   wanted |
     |--------------------------------------|
  1. |  1       1       3        1        1 |
  2. |  1       2       4        2        2 |
  3. |  1       .       .        3        3 |
  4. |  1       .       .        4        4 |
  5. |  1       .       .        .        . |
     |--------------------------------------|
  6. |  1       .       .        .        . |
  7. |  2       5       7        5        5 |
  8. |  2       6       8        6        6 |
  9. |  2       .       .        7        7 |
 10. |  2       .       .        8        8 |
     |--------------------------------------|
 11. |  2       .       .        .        . |
 12. |  2       .       .        .        . |
     +--------------------------------------+


Rui Pedroso

Join Date: Dec 2022

Posts: 4
#3

02 Dec 2022, 11:07

Dear Nick, yes indeed, I did not manage to fully explain the data set. The real data have no clear pattern between var_a and var_b. If I include one more observation in var_a, for example "2", wanted no longer matches my intended result in var_ab.

Thank you in advance for your help,
Rui

clear
input byte(id var_a var_b var_ab)
1 1 3 1
1 2 4 2
1 2 . 2
1 . . 3
1 . . 4
1 . . .
2 5 7 5
2 6 8 6
2 . . 7
2 . . 8
2 . . .
2 . . .
end

gen wanted = var_a

bysort id: replace wanted = var_b[_n-2] if missing(wanted)

list

     +--------------------------------------+
     | id   var_a   var_b   var_ab   wanted |
     |--------------------------------------|
  1. |  1       1       3        1        1 |
  2. |  1       2       4        2        2 |
  3. |  1       2       .        2        2 |
  4. |  1       .       .        3        4 |
  5. |  1       .       .        4        . |
     |--------------------------------------|
  6. |  1       .       .        .        . |
  7. |  2       5       7        5        5 |
  8. |  2       6       8        6        6 |
  9. |  2       .       .        7        7 |
 10. |  2       .       .        8        8 |
     |--------------------------------------|
 11. |  2       .       .        .        . |
 12. |  2       .       .        .        . |
     +--------------------------------------+

Last edited by Rui Pedroso; 02 Dec 2022, 11:31.
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1376
#4

02 Dec 2022, 11:20

Rui Pedroso, the answer is probably going to depend on several specifics, e.g.:
Are the data in var_a and var_b always in sorted order? If not, do they need to keep their original order in var_ab, or is it okay to re-sort them?

Are there always enough observations per id to accommodate the combined number of non-missing values of var_a and var_b in var_ab, or do we sometimes need to create new observations?

If you are making up a toy example, it might be better to provide an extract of the actual data instead, using the dataex command.
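
The second question can be checked directly. The following is a minimal sketch, using the example data from #3, that counts the non-missing values of each variable within id and compares the total with the number of observations available. The names n_a, n_b, enough_rows and first are made up for illustration.

```stata
clear
input byte(id var_a var_b)
1 1 3
1 2 4
1 2 .
1 . .
1 . .
1 . .
2 5 7
2 6 8
2 . .
2 . .
2 . .
2 . .
end

* Number of non-missing values of each variable within id
bysort id: egen n_a = count(var_a)
by id: egen n_b = count(var_b)

* 1 if the id has room for all values, 0 if new observations would be needed
by id: gen byte enough_rows = (n_a + n_b <= _N)

* Show one line per id
by id: gen byte first = (_n == 1)
list id n_a n_b enough_rows if first, noobs
```

In this example both ids have six observations and at most five values to place, so no new observations are needed.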
Rui Pedroso

Join Date: Dec 2022

Posts: 4
#5

02 Dec 2022, 11:37

Dear Hemanshu, thank you very much. Yes, it depends on certain things. I have improved the example in my reply to Nick. We don't need to maintain the original order in var_ab; it can be re-sorted.
You're quite right, I need to get acquainted with dataex.
Thank you and regards,
Rui

Hemanshu Kumar

Join Date: Mar 2015
Posts: 1376
#6

02 Dec 2022, 12:39

How about this?

Code:

sort id var_a var_b
by id: egen num_obs_a = count(var_a)
gen wanted = var_a
by id: replace wanted = var_b[_n-num_obs_a] if _n>num_obs_a
drop num_obs_a

which produces:

Code:

. li, sepby(id) noobs

  +--------------------------------------+
  | id   var_a   var_b   var_ab   wanted |
  |--------------------------------------|
  |  1       1       3        1        1 |
  |  1       2       4        2        2 |
  |  1       2       .        2        2 |
  |  1       .       .        3        3 |
  |  1       .       .        4        4 |
  |  1       .       .        .        . |
  |--------------------------------------|
  |  2       5       7        5        5 |
  |  2       6       8        6        6 |
  |  2       .       .        7        7 |
  |  2       .       .        8        8 |
  |  2       .       .        .        . |
  |  2       .       .        .        . |
  +--------------------------------------+
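
The code above picks up var_b by position, so it relies on the non-missing values of var_b sitting in the first rows of each id after the sort. If that may not hold in the real data, one alternative is to stack the non-missing values explicitly and merge them back. This is only a sketch under that assumption, not a replacement tested on the real data; the names slot, row, source and base are made up for illustration.

```stata
clear
input byte(id var_a var_b)
1 1 3
1 2 4
1 2 .
1 . .
1 . .
1 . .
2 5 7
2 6 8
2 . .
2 . .
2 . .
2 . .
end

* Number the rows within id; the stacked values will fill slots 1, 2, ...
bysort id (var_a var_b): gen long slot = _n
tempfile base
save `base'

* Stack var_a (source 1) on top of var_b (source 2), one value per row
drop slot
gen long row = _n
rename (var_a var_b) (v1 v2)
reshape long v, i(row) j(source)
drop if missing(v)

* All var_a values first, then all var_b values, each in ascending order
bysort id (source v): gen long slot = _n
keep id slot v
rename v wanted

* Attach the stacked values to the original rows by id and slot
merge 1:1 id slot using `base'
sort id slot
list id var_a var_b wanted, sepby(id) noobs
```

After the merge, observations with _merge == 1 are values that did not fit into the existing rows of their id, so this version creates new observations when needed instead of dropping values.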


Rui Pedroso

Join Date: Dec 2022

Posts: 4
#7

02 Dec 2022, 14:45

Dear Hemanshu, this is great! It works just fine!
I'm very, very grateful. Fantastic!

Regards,
Rui