  • Concatenating double + long variables, resulting in string -- causing trouble

    I'm working with a large longitudinal dataset and trying to create a unique ID variable by concatenating two existing, broader ID variables. One is stored as double and the other as long. The result is a string, and the destring command doesn't seem to work on it.

    I'm then trying to use the resulting NEW_ID to create a PERSON_ID that starts from 1, with this code: egen double PERSON_ID = group(NEW_ID). But the results don't seem to distinguish between distinct NEW_ID values. I'm wondering whether the fact that NEW_ID is a string variable is the issue. The table below shows an example of my data and the PERSON_ID I'm aiming to generate. Thanks for any insights...


    ID_1 ID_2 NEW_ID PERSON_ID
    1234 2030405060 12342030405060 1
    1234 2030405060 12342030405060 1
    1234 2030405060 12342030405060 1
    1234 2030405060 12342030405060 1
    2345 2030405060 23452030405060 2
    2345 2030405060 23452030405060 2
    2345 2030405060 23452030405060 2
    2345 2030405060 23452030405060 2
    6789 2030405060 67892030405060 3
    6789 2030405060 67892030405060 3
    6789 2030405060 67892030405060 3
    6789 2030405060 67892030405060 3
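
    The post doesn't say how NEW_ID was built. If it was made with string() and no explicit display format, one likely culprit is that string() then falls back to the %9.0g format, which writes a 10-digit value such as ID_2 in shortened scientific notation, so different people can end up with the same NEW_ID. A minimal sketch, assuming that construction; the second ID_2 value (2030405061) is made up purely for illustration:

    ```stata
    clear
    input long ID_1 double ID_2
    1234 2030405060
    1234 2030405061
    end

    * No format: string() uses %9.0g, so the 10-digit ID_2 loses digits
    gen bad_id  = string(ID_1) + string(ID_2)
    * Explicit format: every digit is kept
    gen good_id = string(ID_1, "%12.0f") + string(ID_2, "%12.0f")

    egen grp_bad  = group(bad_id)    // both rows fall into one group
    egen grp_good = group(good_id)   // two groups, as intended
    list, noobs
    ```

    If this is what happened, giving string() an explicit format such as "%12.0f" should be enough to keep the IDs apart.
    
    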

  • #2
    As a follow-up: when I check for duplicates using the original, non-concatenated ID variables (ID_1 ID_2) together with the relevant time variables, I find none. But when I check using the concatenated string ID (NEW_ID) plus the same time variables, I find duplicates. How can I get around the string issue? It's driving me mad! Thanks very much.
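
    One way to see which NEW_ID values stand for more than one ID_1/ID_2 pair is to count the distinct pairs within each NEW_ID. In this sketch, -year- is a hypothetical stand-in for the time variables, and the two rows are made up to produce a clash:

    ```stata
    clear
    input long ID_1 double ID_2 int year
    12  345 2020
    123  45 2020
    end

    * Digits joined with no separator and no fixed width
    gen NEW_ID = string(ID_1, "%12.0f") + string(ID_2, "%12.0f")

    duplicates report ID_1 ID_2 year   // every row unique on the original IDs
    duplicates report NEW_ID year      // but not on the concatenated ID

    * Tag one row per distinct (NEW_ID, ID_1, ID_2) combination,
    * then count how many distinct pairs share each NEW_ID
    egen byte pair_tag = tag(NEW_ID ID_1 ID_2)
    bysort NEW_ID: egen n_pairs = total(pair_tag)
    list if n_pairs > 1, sepby(NEW_ID) noobs
    ```

    Any NEW_ID listed at the end is ambiguous, which would explain why the duplicate counts differ.
    
    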



    • #3
      You don't want to concatenate digits in this way: 12 and 345 become 12345, but 123 and 45 also become 12345. At the very least, use an underscore to separate the values, or use the -group()- function of egen.

      Code:
      g wanted1 = string(ID_1, "%12.0f") + "_" + string(ID_2, "%12.0f")
      or better

      Code:
      egen wanted2= group(ID_1 ID_2), label
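
      To see the collision and both fixes side by side, here is a small sketch that uses the 12/345 and 123/45 pairs from the example above as made-up data (variable names other than wanted1 and wanted2 are hypothetical):

      ```stata
      clear
      input long ID_1 double ID_2
      12  345
      123  45
      end

      * No separator: both rows become "12345"
      gen joined   = string(ID_1, "%12.0f") + string(ID_2, "%12.0f")
      * Underscore separator: "12_345" vs "123_45"
      gen wanted1  = string(ID_1, "%12.0f") + "_" + string(ID_2, "%12.0f")
      * Group on the numeric variables directly, no string needed
      egen wanted2 = group(ID_1 ID_2), label

      egen g_joined = group(joined)    // one group for two different pairs
      egen g_sep    = group(wanted1)   // two groups
      list, noobs
      ```

      The underscore keeps the string unambiguous, while -group()- skips the string step entirely.
      
      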
      Last edited by Andrew Musau; 28 Sep 2022, 11:11.



      • #4
        Code:
        clear
        input long ID_1    double ID_2
        1234    2030405060
        1234    2030405060
        1234    2030405060
        1234    2030405060
        2345    2030405060
        2345    2030405060
        2345    2030405060
        2345    2030405060
        6789    2030405060
        6789    2030405060
        6789    2030405060
        6789    2030405060
        end
        
        gen new_id = string(ID_1,"%04.0f") + string(ID_2,"%010.0f")
        egen person_id_1 = group(new_id)
        egen person_id_2 = group(ID_1 ID_2)
        
        format ID_2 %010.0f
        li, sepby(new_id) noobs
        
          +----------------------------------------------------------+
          | ID_1         ID_2           new_id   person~1   person~2 |
          |----------------------------------------------------------|
          | 1234   2030405060   12342030405060          1          1 |
          | 1234   2030405060   12342030405060          1          1 |
          | 1234   2030405060   12342030405060          1          1 |
          | 1234   2030405060   12342030405060          1          1 |
          |----------------------------------------------------------|
          | 2345   2030405060   23452030405060          2          2 |
          | 2345   2030405060   23452030405060          2          2 |
          | 2345   2030405060   23452030405060          2          2 |
          | 2345   2030405060   23452030405060          2          2 |
          |----------------------------------------------------------|
          | 6789   2030405060   67892030405060          3          3 |
          | 6789   2030405060   67892030405060          3          3 |
          | 6789   2030405060   67892030405060          3          3 |
          | 6789   2030405060   67892030405060          3          3 |
          +----------------------------------------------------------+
        As you can see, you don't really need new_id to get person_id. Use whichever method works best for you. The method via new_id depends crucially on knowing the maximum lengths of ID_1 and ID_2, which I have assumed here to be 4 and 10 digits, respectively.
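
        Because the new_id route relies on those widths, it may be worth asserting them before concatenating: if an ID were longer than its padded width, different pairs could again produce the same string. A sketch, reusing three rows of the example data and the assumed widths of 4 and 10 digits:

        ```stata
        clear
        input long ID_1 double ID_2
        1234 2030405060
        2345 2030405060
        6789 2030405060
        end

        * Fixed-width padding is only collision-free for non-negative
        * integers that fit the assumed widths (missing values fail too)
        assert ID_2 == int(ID_2)
        assert ID_1 >= 0 & ID_1 <= 9999
        assert ID_2 >= 0 & ID_2 <= 9999999999

        gen new_id = string(ID_1, "%04.0f") + string(ID_2, "%010.0f")
        egen person_id_1 = group(new_id)
        egen person_id_2 = group(ID_1 ID_2)

        * Zero-padded strings sort like the numbers, so the two routes agree
        assert person_id_1 == person_id_2
        ```
        
        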
        Last edited by Hemanshu Kumar; 28 Sep 2022, 11:12.



        • #5
          Andrew, thank you! You saved me! I used your second suggested command. It also saves me a few lines of code. Thank you so much. Fingers crossed, I have no other issues...

