
  • Group by varlist AND position

    Dear forum,

    I'm looking for a command similar to egen x = group(), but one that also takes into account "spells" of observations. If we have:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id var1 var2)
    1 3 4
    2 3 4
    3 5 6
    4 3 4
    end

    egen x = group(var1 var2) would give:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id var1 var2 x)
    1 3 4 1
    2 3 4 1
    3 5 6 2
    4 3 4 1
    end
    However, individual 4 does not directly follow individual 2 in the data, so it should belong to its own group. What I want is:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id var1 var2 y)
    1 3 4 1
    2 3 4 1
    3 5 6 2
    4 3 4 3
    end
    If an observation does not have the same var1 and var2 values as the previous one, it should start a new group, even if some other observation far away in the dataset shares the same var1 and var2 values. Can somebody help me achieve this?



    Last edited by Valentine Laurent; 04 Jan 2024, 01:31.

  • #2
    Originally posted by Valentine Laurent
    If an observation is not having the same var1 and var2 value than the previous one, it should have its own group
    This assumes the existence of an ordering variable, which is not apparent in your data example. Before running the code below, ensure you have such a variable and sort on it.

    Code:
    clear
    input float(id var1 var2 y)
    1 3 4 1
    2 3 4 1
    3 5 6 2
    4 3 4 3
    5 3 5 4
    6 3 5 4
    7 3 5 4
    8 2 5 5
    end
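    
    * sort on the ordering variable first (here -id- is assumed to give the order)
    sort id
    * a new group starts whenever var1 or var2 differs from the previous observation
    gen wanted= sum(var1!=var1[_n-1]|var2!=var2[_n-1])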
    Result:

    Code:
    . l, sepby(wanted)
    
         +-------------------------------+
         | id   var1   var2   y   wanted |
         |-------------------------------|
      1. |  1      3      4   1        1 |
      2. |  2      3      4   1        1 |
         |-------------------------------|
      3. |  3      5      6   2        2 |
         |-------------------------------|
      4. |  4      3      4   3        3 |
         |-------------------------------|
      5. |  5      3      5   4        4 |
      6. |  6      3      5   4        4 |
      7. |  7      3      5   4        4 |
         |-------------------------------|
      8. |  8      2      5   5        5 |
         +-------------------------------+
    Last edited by Andrew Musau; 04 Jan 2024, 02:40.
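    The running-sum idea above can be spelled out a little more explicitly. This is a sketch, assuming -id- is the variable that defines the order of observations; the flag variable name -newspell- is hypothetical. The explicit _n == 1 term makes the first observation start a spell even if var1 or var2 is missing there, and the final -assert- checks the result against the desired y.

```stata
clear
input float(id var1 var2 y)
1 3 4 1
2 3 4 1
3 5 6 2
4 3 4 3
5 3 5 4
6 3 5 4
7 3 5 4
8 2 5 5
end

* the ordering variable (assumed here to be -id-) must define the sequence
sort id

* flag = 1 when an observation starts a new spell: it is the first
* observation, or var1 or var2 differs from the previous observation
gen byte newspell = _n == 1 | var1 != var1[_n-1] | var2 != var2[_n-1]

* the running sum of the flags numbers the spells 1, 2, 3, ...
gen long wanted = sum(newspell)

* check against the desired grouping
assert wanted == y
```

    Because the group number only increases when a spell ends, two spells with identical var1 and var2 values (such as ids 1-2 and id 4) always receive different numbers.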



    • #3
      Valentine:
      the following tentative reply is far less elegant than Andrew's:
      Code:
      . g counter=100 if var1[_n-1]==var1[_n] & var2[_n-1]==var2[_n]
      
      . replace counter=100 if var1[_n+1]==var1[_n] & var2[_n+1]==var2[_n]
      
      . egen x=group( var1 var2) if counter==.
      
      . replace x=counter if x==.
      
      . drop counter
      
      . list
      
           +-----------------------------+
           | id   var1   var2    y     x |
           |-----------------------------|
        1. |  1      3      4    7   100 |
        2. |  2      3      4    7   100 |
        3. |  3      5      6   11     2 |
        4. |  4      3      4    7     1 |
           +-----------------------------+
      
      .
      Kind regards,
      Carlo
      (Stata 19.0)
