Group by varlist AND position

Valentine Laurent

Join Date: Oct 2023

Posts: 7
#1


04 Jan 2024, 01:21

Dear forum,

I'm looking for a command similar to egen x = group() that also takes into account "spells" of consecutive observations. Suppose we have:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(id var1 var2)
1 3 4
2 3 4
3 5 6
4 3 4
end

Then egen x = group(var1 var2) gives:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(id var1 var2 x)
1 3 4 1
2 3 4 1
3 5 6 2
4 3 4 1
end

But I want to account for the fact that individual 4 does not directly follow individual 2 in the data, so that individual 4 gets its own group. What I want is:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(id var1 var2 y)
1 3 4 1
2 3 4 1
3 5 6 2
4 3 4 3
end

If an observation does not have the same var1 and var2 values as the previous one, it should start its own group, even if some other observation much further away in the dataset shares the same var1 and var2 values. Can somebody help me achieve this?

Last edited by Valentine Laurent; 04 Jan 2024, 01:31.

Andrew Musau

Join Date: Oct 2014
Posts: 10237

04 Jan 2024, 02:34

Originally posted by Valentine Laurent

If an observation does not have the same var1 and var2 values as the previous one, it should start its own group

This assumes the existence of an ordering variable, which is not apparent in your data example. Before running the code below, ensure you have such a variable and sort on it.

Code:

clear
input float(id var1 var2 y)
1 3 4 1
2 3 4 1
3 5 6 2
4 3 4 3
5 3 5 4
6 3 5 4
7 3 5 4
8 2 5 5
end

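* A new spell starts whenever var1 or var2 differs from the previous
* observation (in row 1, var1[0] is missing, so the test is true there);
* sum() is a running sum, so it numbers the spells 1, 2, 3, ...
gen wanted = sum(var1 != var1[_n-1] | var2 != var2[_n-1])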

Result:

Code:

. l, sepby(wanted)

     +-------------------------------+
     | id   var1   var2   y   wanted |
     |-------------------------------|
  1. |  1      3      4   1        1 |
  2. |  2      3      4   1        1 |
     |-------------------------------|
  3. |  3      5      6   2        2 |
     |-------------------------------|
  4. |  4      3      4   3        3 |
     |-------------------------------|
  5. |  5      3      5   4        4 |
  6. |  6      3      5   4        4 |
  7. |  7      3      5   4        4 |
     |-------------------------------|
  8. |  8      2      5   5        5 |
     +-------------------------------+
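The one-liner spells out the comparison for var1 and var2. With a longer grouping varlist, the same logic can be sketched with a loop over the variables. This is a hypothetical generalisation, not part of the reply above: it assumes id is the ordering variable, reuses the example data (where y is the desired result), and the names changed and spell are illustrative only.

```stata
* Hypothetical generalisation to an arbitrary varlist.
* Assumes -id- is the ordering variable; y holds the desired result.
clear
input float(id var1 var2 y)
1 3 4 1
2 3 4 1
3 5 6 2
4 3 4 3
5 3 5 4
6 3 5 4
7 3 5 4
8 2 5 5
end

sort id                                   // spells are only defined in a fixed order
gen byte changed = (_n == 1)              // the first observation always starts a spell
foreach v of varlist var1 var2 {
    replace changed = 1 if `v' != `v'[_n-1]   // flag a break if this variable differs from the previous row
}
gen spell = sum(changed)                  // running sum turns break flags into spell numbers
drop changed
list, sepby(spell)
assert spell == y                         // matches the wanted grouping
```

To use other grouping variables, only the varlist after foreach needs to change; the final assert simply checks that the loop reproduces y on this example.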

Last edited by Andrew Musau; 04 Jan 2024, 02:40.


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17719

04 Jan 2024, 02:42

Valentine:
the following tentative reply is far less elegant than Andrew's:

Code:

. g counter=100 if var1[_n-1]==var1[_n] & var2[_n-1]==var2[_n]

. replace counter=100 if var1[_n+1]==var1[_n] & var2[_n+1]==var2[_n]

. egen x=group(var1 var2) if counter==.

. replace x=counter if x==.

. drop counter

. list

     +-----------------------------+
     | id   var1   var2    y     x |
     |-----------------------------|
  1. |  1      3      4    7   100 |
  2. |  2      3      4    7   100 |
  3. |  3      5      6   11     2 |
  4. |  4      3      4    7     1 |
     +-----------------------------+

.

Kind regards,
Carlo
(Stata 19.0)
