  • Duplicate each row as many times as is given in a variable

    I have a set of individuals with characteristics. Each individual belongs to one or more groups. I need to merge the individuals with group characteristics, first duplicating each row of the individual data set as many times as is given by `n_groups`.

    The data looks like

    id age n_groups
    1 50 2
    2 46 1
    3 51 3
    4 44 2


    I need to have

    id age n_groups group_index
    1 50 2 1
    1 50 2 2
    2 46 1 1
    3 51 3 1
    3 51 3 2
    3 51 3 3
    4 44 2 1
    4 44 2 2

    It seems like a very easy task, and I need some variation of `expand` with a variable number of duplicates. Is there a simple function for this?
    Thank you
    Last edited by Nadiia Lazhevska; 02 Oct 2016, 11:36.

  • #2
    As you say, the expand command (not function!) is suitable for this.

    Code:
    clear
    
    input id age n_groups
    1 50 2
    2 46 1
    3 51 3
    4 44 2
    end
    expand n_groups
    bysort id : gen group_index = _n
    
    list, sepby(id)
    
         +--------------------------------+
         | id   age   n_groups   group_~x |
         |--------------------------------|
      1. |  1    50          2          1 |
      2. |  1    50          2          2 |
         |--------------------------------|
      3. |  2    46          1          1 |
         |--------------------------------|
      4. |  3    51          3          1 |
      5. |  3    51          3          2 |
      6. |  3    51          3          3 |
         |--------------------------------|
      7. |  4    44          2          1 |
      8. |  4    44          2          2 |
         +--------------------------------+
    I assume you have a typo in the last data line.
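
    A sketch of the same approach with two sanity checks added, in case it helps before the merge step. The `assert` lines are additions for illustration, not part of the original request, and they assume that `n_groups` should be a positive integer in every row. As far as I recall from the help file, `expand` treats missing values and values below 1 as 1, so such rows would be kept once rather than dropped or flagged.

    ```stata
    clear
    input id age n_groups
    1 50 2
    2 46 1
    3 51 3
    4 44 2
    end

    * Assumption: every count is a nonmissing positive integer.
    * (Missing counts as larger than any number in Stata, hence the explicit check.)
    assert !missing(n_groups) & n_groups >= 1 & n_groups == floor(n_groups)

    expand n_groups
    bysort id : gen group_index = _n

    * Each id should now have exactly n_groups rows.
    bysort id : assert _N == n_groups
    ```

    If either `assert` fails, Stata stops with an error, which is usually preferable to finding a miscounted id after the merge.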



    • #3
      You can use expand; read all of the examples in the help file.

      Code:
      . list
      
           +---------------------+
           | id   age   n_groups |
           |---------------------|
        1. |  1    50          2 |
        2. |  2    46          1 |
        3. |  3    51          3 |
        4. |  4    44          2 |
           +---------------------+
      
      . expand n_groups
      (4 observations created)
      
      . sort id
      
      . list
      
           +---------------------+
           | id   age   n_groups |
           |---------------------|
        1. |  1    50          2 |
        2. |  1    50          2 |
        3. |  2    46          1 |
        4. |  3    51          3 |
        5. |  3    51          3 |
           |---------------------|
        6. |  3    51          3 |
        7. |  4    44          2 |
        8. |  4    44          2 |
           +---------------------+
      
      . bysort id: ge index = _n
      
      . list
      
           +-----------------------------+
           | id   age   n_groups   index |
           |-----------------------------|
        1. |  1    50          2       1 |
        2. |  1    50          2       2 |
        3. |  2    46          1       1 |
        4. |  3    51          3       1 |
        5. |  3    51          3       2 |
           |-----------------------------|
        6. |  3    51          3       3 |
        7. |  4    44          2       1 |
        8. |  4    44          2       2 |
           +-----------------------------+
      I suspect your "need to have" output has a typo for id == 4 (the group_index value should be 2, not 1?).

      NB Please heed forum FAQ recommendations and (a) post code-like input and output using CODE delimiters, and (b) think also about using dataex to produce the data extract. It's not only that use of CODE delimiters makes material more legible. Your "data" cannot be directly copy/pasted into Stata to "play" with. Make things easier for readers to help you.



      • #4
        Thank you so much, Nick!



        • #5
          Thank you, Stephen, for the solution and suggestion!



          • #6
            Cross-posted at http://stackoverflow.com/questions/3...-in-a-variable

            Please note our cross-posting policy, which is that you should tell us about it.

            http://www.statalist.org/forums/help#crossposting

            8. May I cross-post to other forums?

            People posting on Statalist may also post the same question on other listservers or in web forums. There is absolutely no rule against doing that.

            But if you do post elsewhere, we ask that you provide cross-references in URL form to searchable archives. That way, people interested in your question can quickly check what has been said elsewhere and avoid posting similar comments. Being open about cross-posting saves everyone time.

            If your question was answered well elsewhere, please post a cross-reference to that answer on Statalist.
            This was explicit in the FAQ Advice you were asked to read before posting, as indeed was all of Stephen's general advice.



            • #7
              How do you expand a string variable?
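
              The question is open to more than one reading. If it means that the data set contains string variables, `expand` copies them along with everything else and nothing special is needed. If it means that the count itself is stored as a string, one possible sketch is to convert it to numeric first. The variable names `name` and `n_str` and the data values below are hypothetical, made up for illustration.

              ```stata
              clear
              input id str10 name str1 n_str
              1 "Ann"  "2"
              2 "Bob"  "1"
              3 "Cara" "3"
              end

              * The count is held as a string, so create a numeric copy first.
              destring n_str, generate(n_groups)

              * String variables such as name are duplicated like any other variable.
              expand n_groups
              bysort id : gen group_index = _n

              list, sepby(id)
              ```

              By default `destring` refuses to convert a variable that contains nonnumeric characters, so stray text in the count shows up before `expand` is run.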

