Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to Generate a Subgroup Identifier?

    I new to create a new variable that identifies subgroups within a larger group based on dates. I pasted a table below to better illustrate what I mean.
    I have tried multiple approaches, unsuccessfully. Any suggestions?
    Thank you very much!
    ID Date New group variable
    1 01/01/01 1
    1 01/01/01 1
    1 05/03/10 2
    1 05/03/10 2
    2 07/02/05 1
    2 07/02/05 1
    3 12/03/08 1
    3 12/03/08 1
    4 06/18/03 1
    4 06/18/03 1
    4 06/18/03 1
    4 01/07/12 2

  • #2
    Please use dataex in the future to present data examples. Your date variable is not appropriate for this task.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte id str8 date byte newgroupvariable
    1 "01/01/01" 1
    1 "01/01/01" 1
    1 "05/03/10" 2
    1 "05/03/10" 2
    2 "07/02/05" 1
    2 "07/02/05" 1
    3 "12/03/08" 1
    3 "12/03/08" 1
    4 "06/18/03" 1
    4 "06/18/03" 1
    4 "06/18/03" 1
    4 "01/07/12" 2
    end
    
    gen date2 = daily(date, "MDY", 2050)
    bys id date2: gen newvar2= _n==1
    bys id: replace newvar2= sum(newvar2)
    list, clean
    Result:

    Code:
    . list, clean
    
           id       date   newgro~e   date2   newvar2  
      1.    1   01/01/01          1   14976         1  
      2.    1   01/01/01          1   14976         1  
      3.    1   05/03/10          2   18385         2  
      4.    1   05/03/10          2   18385         2  
      5.    2   07/02/05          1   16619         1  
      6.    2   07/02/05          1   16619         1  
      7.    3   12/03/08          1   17869         1  
      8.    3   12/03/08          1   17869         1  
      9.    4   06/18/03          1   15874         1  
     10.    4   06/18/03          1   15874         1  
     11.    4   06/18/03          1   15874         1  
     12.    4   01/07/12          2   18999         2  
    
    .

    Comment


    • #3
      Will do, and thank you very much!

      Comment


      • #4
        Here's another way to do it, building on Andrew Musau's work:

        Code:
        . bysort id (date2) : gen wanted = sum(date2 != date2[_n-1])
        
        . list, sepby(id date2)
        
             +-------------------------------------------+
             | id       date   newgro~e   date2   wanted |
             |-------------------------------------------|
          1. |  1   01/01/01          1   14976        1 |
          2. |  1   01/01/01          1   14976        1 |
             |-------------------------------------------|
          3. |  1   05/03/10          2   18385        2 |
          4. |  1   05/03/10          2   18385        2 |
             |-------------------------------------------|
          5. |  2   07/02/05          1   16619        1 |
          6. |  2   07/02/05          1   16619        1 |
             |-------------------------------------------|
          7. |  3   12/03/08          1   17869        1 |
          8. |  3   12/03/08          1   17869        1 |
             |-------------------------------------------|
          9. |  4   06/18/03          1   15874        1 |
         10. |  4   06/18/03          1   15874        1 |
         11. |  4   06/18/03          1   15874        1 |
             |-------------------------------------------|
         12. |  4   01/07/12          2   18999        2 |
             +-------------------------------------------+

        Comment


        • #5
          Thank you, Nick. I took note of your suggestion as well.

          Comment


          • #6
            Hi Nick Cox and Andrew Musau , my question relates to Antonios. My data is posted above and i look to create an id_2 as shown below

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input int(year managerid) byte(clientid tenure_ep id_2)
            2009  459 1 1 1
            2010  459 1 2 1
            2011  459 1 3 1
            2009  778 2 1 1
            2010  778 2 2 1
            2011  778 2 3 1
            2012  778 2 4 1
            2013  778 2 5 1
            2015  628 2 1 2
            2016  628 2 2 2
            2017  628 2 3 2
            2012  710 3 1 1
            2013  801 3 1 2
            2014  801 3 2 2
            2015  801 3 3 2
            2016  801 3 4 2
            2017  801 3 5 2
            2018   79 3 1 3
            2009  127 4 1 1
            2010  888 4 1 2
            2011 1073 4 1 3
            2012  875 4 1 4
            2013  875 4 2 4
            2014  875 4 3 4
            2015  875 4 4 4
            2016  875 4 5 4
            2018 1156 4 1 5
            2009   89 5 1 1
            2010  810 5 1 2
            2011  810 5 2 2
            2012  810 5 3 2
            2013  581 5 1 3
            2014  581 5 2 3
            2015  618 5 1 4
            2016  618 5 2 4
            2017  618 5 3 4
            end

            However when I apply Nicks code, for example, it doesn't create the id_2 in a sequential order ie taking into consideration the year as I have demonstrated above. This is what I get when I apply Nicks code

            bysort clientid ( managerid ) : gen wanted = sum( managerid != managerid [_n-1])

            . list, sepby( clientid managerid )

            +-------------------------------------------------------+
            | year manage~d clientid tenure~p id_2 wanted |
            |-------------------------------------------------------|
            1. | 2010 459 1 2 1 1 |
            2. | 2011 459 1 3 1 1 |
            3. | 2009 459 1 1 1 1 |
            |-------------------------------------------------------|
            4. | 2017 628 2 3 2 1 |
            5. | 2015 628 2 1 2 1 |
            6. | 2016 628 2 2 2 1 |
            |-------------------------------------------------------|
            7. | 2011 778 2 3 1 2 |
            8. | 2009 778 2 1 1 2 |
            9. | 2013 778 2 5 1 2 |
            10. | 2012 778 2 4 1 2 |
            11. | 2010 778 2 2 1 2 |
            |-------------------------------------------------------|
            12. | 2018 79 3 1 3 1 |
            |-------------------------------------------------------|
            13. | 2012 710 3 1 1 2 |
            |-------------------------------------------------------|
            14. | 2013 801 3 1 2 3 |
            15. | 2015 801 3 3 2 3 |
            16. | 2016 801 3 4 2 3 |
            17. | 2014 801 3 2 2 3 |
            18. | 2017 801 3 5 2 3 |
            |-------------------------------------------------------|
            19. | 2009 127 4 1 1 1 |
            |-------------------------------------------------------|
            20. | 2012 875 4 1 4 2 |
            21. | 2013 875 4 2 4 2 |
            22. | 2016 875 4 5 4 2 |
            23. | 2015 875 4 4 4 2 |
            24. | 2014 875 4 3 4 2 |
            |-------------------------------------------------------|
            25. | 2010 888 4 1 2 3 |
            |-------------------------------------------------------|
            26. | 2011 1073 4 1 3 4 |
            |-------------------------------------------------------|
            27. | 2018 1156 4 1 5 5 |
            |-------------------------------------------------------|
            28. | 2009 89 5 1 1 1 |
            |-------------------------------------------------------|
            29. | 2013 581 5 1 3 2 |
            30. | 2014 581 5 2 3 2 |
            |-------------------------------------------------------|
            31. | 2016 618 5 2 4 3 |
            32. | 2015 618 5 1 4 3 |
            33. | 2017 618 5 3 4 3 |
            |-------------------------------------------------------|
            34. | 2010 810 5 1 2 4 |
            35. | 2011 810 5 2 2 4 |
            36. | 2012 810 5 3 2 4 |
            +-------------------------------------------------------+

            As you can see with the wanted and id_2, it differs. Same for Andrews code, can you kindly assist please. i must specify that I am using Stata 16.
            Last edited by Kwadwo Boateng; 16 Oct 2019, 11:54.

            Comment


            • #7
              You need to sort by year as well.

              Code:
              bysort clientid (year managerid) : gen wanted = sum(managerid != managerid [_n-1])
              assert wanted==id_2

              Comment


              • #8
                Ah yes. Thank you Andrew Musau . It worked perfectly.

                Comment

                Working...
                X