  • Editing household IDs

    Dear Users,
    I have the following household IDs:
    ID
    0009f6b339824d57a39f5696c9c385a6
    0011e7ca5a064a59a100b64c205d9689
    0024ae5ebbdd417e95c31e18c90a1ab6
    002e912ee9974a1987e27b1bebf909e3
    0031d76399104961b5a58c71722f84f2
    0035f38396e14cb99cae6f2fa3c14e96
    0038ab99d05a433e946d077f166cfb89
    003f5cb4aaa04e5bad13ce6b8b20357e
    0043b1c8d9eb4bb78ab563f9fc7bf1b9
    004d58321bda4aa68fa1249f1089666c
    Any code on how to remove the numeric characters?

    Thank you,
    Dapel

  • #2
    I was able to get around this.

    First, identify the characters that appear in ID using -charlist- (community-contributed, from SSC):

    Code:
    charlist ID
    This gives:
    Code:
    0123456789abcdef
    Now it is clear that the non-numeric characters are a to f.

    Finally, run
    Code:
    destring ID, generate(ID2) ignore("abcdef")
    And boom

    Code:
    ID: characters a b c d e f removed; ID2 generated as double
    Last edited by Zuhumnan Dapel; 10 Sep 2019, 22:27.



    • #3
      I'm not sure that it was good to do that. They look like concatenated hexadecimal numbers. IDs are generally okay (even best) as strings. If you need a corresponding numeric variable, for example, for use in some hierarchical regression command, then you could use -encode- for that.
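
      A minimal sketch of that -encode- route, assuming the string variable is named ID as in #1. The new variable name ID_num is made up for illustration, and only three of the posted IDs are used:
      Code:
      clear
      input str32 ID
      "0009f6b339824d57a39f5696c9c385a6"
      "0011e7ca5a064a59a100b64c205d9689"
      "0024ae5ebbdd417e95c31e18c90a1ab6"
      end

      * ID_num is a hypothetical name; encode numbers the distinct strings 1, 2, 3, ...
      * in sorted order and attaches the original strings as value labels
      encode ID, generate(ID_num)

      * nolabel shows the underlying integers next to the untouched string IDs
      list ID ID_num, nolabel
      The original string variable is left unchanged, so nothing is lost by adding the numeric copy.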



      • #4
        Thanks. I'm not sure the IDs were originally so. I've been under the impression that the IDs were corrupted in the process of converting the file from SPSS to Stata.

        Dapel



        • #5
          Is it possible to merge files using a string variable in both files?



          • #6
            Yes, of course.
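
            A small sketch of such a merge, under assumed names: the key parentid1 is borrowed from later in this thread, while x, y, and the temporary file using_data are invented for illustration. The key must have the same name and be a string in both files:
            Code:
            * build a toy "using" dataset keyed on a string variable
            clear
            input str3 parentid1 byte x
            "001" 5
            "002" 7
            end
            tempfile using_data
            save `using_data'

            * build a toy "master" dataset with the same string key
            clear
            input str3 parentid1 byte y
            "001" 1
            "002" 2
            end

            * 1:1 requires parentid1 to be unique in both datasets
            merge 1:1 parentid1 using `using_data'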



            • #7
              Ok. Why then this error message:
              variable parentid1 does not uniquely identify observations in the using data
              r(459);
              ?



              • #8
                Because you have duplicate values for parentid1 in your using dataset. The error message has nothing to do with the datatype.
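
                One way to confirm this before merging is -isid-, which fails with the same return code when a variable does not uniquely identify the observations. The toy data below are invented; in practice you would load your own using dataset first:
                Code:
                clear
                input str3 parentid1 byte y
                "001" 1
                "002" 2
                "002" 3
                end

                * isid stops with r(459) if parentid1 is not unique;
                * capture noisily shows the message without halting a do-file
                capture noisily isid parentid1
                display _rc
                If the repeated values are legitimate (several records per parent, say), and parentid1 is unique in the master data, then -merge 1:m- rather than -merge 1:1- or -merge m:1- may be what you want. That depends on your data, which we have not seen.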



                • #9
                  Thanks for flagging this. Any code for identifying and dropping the duplicates?



                  • #10
                    You can identify duplicates using this example:
                    Code:
                    clear
                    input str3 id byte var1
                      001 17
                      002 12
                      002 14
                      002 03
                      003 10
                      004 16
                    end
                    
                    
                    sort id
                    
                    list if id == id[_n+1] | id == id[_n-1]
                    Then you will need to decide which observations should be dropped and can use -drop- as normal to drop unwanted observations.

                    Red Owl
                    Stata/IC 16.0 (Windows 10, 64-bit)



                    • #11
                      Also, there's this:
                      Code:
                      help duplicates
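
                      A short sketch of the main -duplicates- subcommands, using made-up data. The variable name dup is arbitrary:
                      Code:
                      clear
                      input str3 id byte var1
                      "001" 17
                      "002" 12
                      "002" 14
                      "003" 10
                      end

                      * count how many observations share each value of id
                      duplicates report id

                      * dup = number of other observations with the same id (0 if unique)
                      duplicates tag id, generate(dup)
                      list if dup > 0

                      * keeps the first observation per id and drops the rest;
                      * force is required when a varlist is given, and whether
                      * the first one is the right one to keep is your call
                      duplicates drop id, force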



                      • #12
                        Let’s back up here. You don’t like your identifiers for some reason. Say you want shorter, simpler identifiers. So you remove some characters. But now there are duplicates. That shows that the removal of characters messed up your identifiers: they no longer are distinct. The solution is not to remove duplicates but to use a different method.

                        For example, if you had Dapel1 and Dapel2 and then found yourself with two instances of Dapel, removing one won’t help.

                        That was the original question. It seems that you wanted to remove non-numeric characters, but the same point arises.

                        You’d do better to map your identifiers to distinct integers from 1 upward using egen. See the longstanding Stata FAQ about identifiers.
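
                        A minimal sketch of that mapping, assuming the string variable is called ID as in #1. The name hh_id is invented, and the first ID is repeated on purpose to show that equal strings get the same integer:
                        Code:
                        clear
                        input str32 ID
                        "0009f6b339824d57a39f5696c9c385a6"
                        "0009f6b339824d57a39f5696c9c385a6"
                        "0011e7ca5a064a59a100b64c205d9689"
                        end

                        * group() assigns 1, 2, ... to the distinct values of ID;
                        * long is requested in case there are many households
                        egen long hh_id = group(ID)
                        list
                        Distinct strings stay distinct, and the original ID is left as it was.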

                        I am guessing here, as there is no explanation of why you want to do this, but the thread is becoming an instance of the XY problem: you are asking about y, but what was the original problem x?
                        Last edited by Nick Cox; 12 Sep 2019, 00:44.
