
  • Combine two variables in the same data set, where rowmax (for example) won't work

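    Dear all, I have a situation similar to the example below. The data are in long format, and I need to generate a new variable from var_a and var_b that looks like var_ab: within each id, the non-missing values of var_a followed by those of var_b.
    I cannot do this with egen var_ab = rowmax(var_a var_b), since rowmax() combines values across variables within each observation rather than stacking them down the observations.
    I would really appreciate some help with this.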
    Thank you very much in advance,

    Rui

    id var_a var_b var_ab
    1 1 3 1
    1 2 4 2
    1 . . 3
    1 . . 4
    1 . . .
    1 . . .
    2 5 7 5
    2 6 8 6
    2 . . 7
    2 . . 8
    2 . . .
    2 . . .
    Last edited by Rui Pedroso; 02 Dec 2022, 02:57.

  • #2
    This matches your example, but it is hard to know how literally to take it.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(id var_a var_b var_ab)
    1 1 3 1
    1 2 4 2
    1 . . 3
    1 . . 4
    1 . . .
    1 . . .
    2 5 7 5
    2 6 8 6
    2 . . 7
    2 . . 8
    2 . . .
    2 . . .
    end
    
    gen wanted = var_a 
    bysort id: replace wanted = var_b[_n-2] if missing(wanted)
    
    list 
    
         +--------------------------------------+
         | id   var_a   var_b   var_ab   wanted |
         |--------------------------------------|
      1. |  1       1       3        1        1 |
      2. |  1       2       4        2        2 |
      3. |  1       .       .        3        3 |
      4. |  1       .       .        4        4 |
      5. |  1       .       .        .        . |
         |--------------------------------------|
      6. |  1       .       .        .        . |
      7. |  2       5       7        5        5 |
      8. |  2       6       8        6        6 |
      9. |  2       .       .        7        7 |
     10. |  2       .       .        8        8 |
         |--------------------------------------|
     11. |  2       .       .        .        . |
     12. |  2       .       .        .        . |
         +--------------------------------------+



    • #3
      Dear Nick, yes indeed, I did not manage to explain the data set fully. In the real data there is no fixed pattern between var_a and var_b. If I add one more observation to var_a, for example "2", wanted no longer matches my intended result in var_ab.

      Thank you in advance for your help,
      Rui

      clear
      input byte(id var_a var_b var_ab)
      1 1 3 1
      1 2 4 2
      1 2 . 2
      1 . . 3
      1 . . 4
      1 . . .
      2 5 7 5
      2 6 8 6
      2 . . 7
      2 . . 8
      2 . . .
      2 . . .
      end

      gen wanted = var_a

      bysort id: replace wanted = var_b[_n-2] if missing(wanted)

      list

           +--------------------------------------+
           | id   var_a   var_b   var_ab   wanted |
           |--------------------------------------|
        1. |  1       1       3        1        1 |
        2. |  1       2       4        2        2 |
        3. |  1       2       .        2        2 |
        4. |  1       .       .        3        4 |
        5. |  1       .       .        4        . |
           |--------------------------------------|
        6. |  1       .       .        .        . |
        7. |  2       5       7        5        5 |
        8. |  2       6       8        6        6 |
        9. |  2       .       .        7        7 |
       10. |  2       .       .        8        8 |
           |--------------------------------------|
       11. |  2       .       .        .        . |
       12. |  2       .       .        .        . |
           +--------------------------------------+
      Last edited by Rui Pedroso; 02 Dec 2022, 11:31.



      • #4
        Rui Pedroso, the answer will probably depend on several specifics, e.g.:
        • Are the values in var_a and var_b always in sorted order? If not, must var_ab preserve their original order, or may the values be re-sorted?
        • Are there always enough observations per id to hold all the non-missing values of var_a and var_b in var_ab, or do we sometimes need to create new observations?
        If you are making up a toy example, it might be better to provide an extract of the actual data instead, using the dataex command.
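        The second question can be checked directly before choosing a method. A minimal sketch, using the example data from #3: for each id, count the non-missing values of var_a and var_b and compare their sum with the number of observations in that id. The variable names n_a, n_b and too_short are illustrative and not from the thread.

```stata
clear
input byte(id var_a var_b var_ab)
1 1 3 1
1 2 4 2
1 2 . 2
1 . . 3
1 . . 4
1 . . .
2 5 7 5
2 6 8 6
2 . . 7
2 . . 8
2 . . .
2 . . .
end

* number of non-missing values of each variable within each id
bysort id: egen int n_a = count(var_a)
by id: egen int n_b = count(var_b)

* flag ids whose rows cannot hold all the stacked values
* (under by, _N is the number of observations in the current id)
by id: gen byte too_short = (n_a + n_b > _N)
tabulate id too_short
```

        If too_short is 1 for any id, a method that only fills existing rows will drop values for that id.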



        • #5
          Dear Hemanshu, thank you very much. Yes, it depends on certain things. I have improved the example in my reply to Nick. We don't need to maintain the original order in var_ab; it can be re-sorted.
          You're quite right; I need to get acquainted with dataex.
          Thank you and regards,
          Rui



          • #6
            How about this?
            Code:
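            * put the non-missing var_a values first within each id
            sort id var_a var_b
            * count the non-missing var_a values in each id
            by id: egen num_obs_a = count(var_a)
            gen wanted = var_a
            * fill the remaining rows with var_b, read from the top of each id
            by id: replace wanted = var_b[_n-num_obs_a] if _n>num_obs_a
            drop num_obs_a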
            which produces:

            Code:
            . li, sepby(id) noobs
            
              +--------------------------------------+
              | id   var_a   var_b   var_ab   wanted |
              |--------------------------------------|
              |  1       1       3        1        1 |
              |  1       2       4        2        2 |
              |  1       2       .        2        2 |
              |  1       .       .        3        3 |
              |  1       .       .        4        4 |
              |  1       .       .        .        . |
              |--------------------------------------|
              |  2       5       7        5        5 |
              |  2       6       8        6        6 |
              |  2       .       .        7        7 |
              |  2       .       .        8        8 |
              |  2       .       .        .        . |
              |  2       .       .        .        . |
              +--------------------------------------+
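            The code in #6 relies on the sort: after sort id var_a var_b, the non-missing var_a values come first in each id, and it appears to assume that the non-missing var_b values also occupy the first rows of each id and that each id has enough rows. A sketch of a more general variant, under the assumption that var_a values should precede var_b values and that each variable keeps its original row order, stacks the two variables into one column with reshape long, drops the missing values, numbers what remains within id, and merges the result back on id and position. Because merge adds unmatched observations from the using file, ids with too few rows gain new observations, which covers the second question in #4. The names obs, pos, source, value and stacked are hypothetical.

```stata
clear
input byte(id var_a var_b var_ab)
1 1 3 1
1 2 4 2
1 2 . 2
1 . . 3
1 . . 4
1 . . .
2 5 7 5
2 6 8 6
2 . . 7
2 . . 8
2 . . .
2 . . .
end

* remember the current order and number the rows within each id
gen long obs = _n
bysort id (obs): gen int pos = _n

preserve
* stack var_a (source 1) and var_b (source 2) into a single column
keep id obs var_a var_b
rename (var_a var_b) (value1 value2)
reshape long value, i(obs) j(source)
drop if missing(value)

* var_a values first, then var_b values, each in original row order
sort id source obs
by id: gen int pos = _n
keep id pos value
rename value wanted
tempfile stacked
save "`stacked'"
restore

* match on id and position; ids with too few rows gain new observations
merge 1:1 id pos using "`stacked'", nogenerate
sort id pos
list id var_a var_b var_ab wanted, sepby(id) noobs
```

            Run the block as a whole from a do-file, since the tempfile name is held in a local macro. If sorted values within each variable are preferred, as Rui allowed in #5, sort id source value can replace sort id source obs.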



            • #7
              Dear Hemanshu, this is great! It works just fine!
              I'm very, very grateful. Fantastic!

              Regards,
              Rui
