Editing household IDs

Zuhumnan Dapel

Join Date: Sep 2014

Posts: 392
#1


10 Sep 2019, 21:48

Dear Users,
I have the following household IDs:

ID
0009f6b339824d57a39f5696c9c385a6
0011e7ca5a064a59a100b64c205d9689
0024ae5ebbdd417e95c31e18c90a1ab6
002e912ee9974a1987e27b1bebf909e3
0031d76399104961b5a58c71722f84f2
0035f38396e14cb99cae6f2fa3c14e96
0038ab99d05a433e946d077f166cfb89
003f5cb4aaa04e5bad13ce6b8b20357e
0043b1c8d9eb4bb78ab563f9fc7bf1b9
004d58321bda4aa68fa1249f1089666c

Any code on how to remove the numeric characters?

Thank you,
Dapel
Zuhumnan Dapel

Join Date: Sep 2014

Posts: 392
#2

10 Sep 2019, 22:17

I was able to get around this.

First, identify the characters that make up the ID using charlist (a community-contributed command, available from SSC):

Code:

charlist ID

This gives:

Code:

0123456789abcdef

Now it is clear that the non-numeric characters run from a to f.

Finally, run

Code:

destring ID, generate(ID2) ignore("abcdef")

And boom

Code:

ID: characters a b c d e f removed; ID2 generated as double

Last edited by Zuhumnan Dapel; 10 Sep 2019, 22:27.
Joseph Coveney

Join Date: Apr 2014

Posts: 4419
#3

10 Sep 2019, 23:17

I'm not sure that it was good to do that. They look like concatenated hexadecimal numbers. IDs are generally okay (even best) as strings. If you need a corresponding numeric variable, for example, for use in some hierarchical regression command, then you could use -encode- for that.
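
As a minimal sketch of the -encode- route mentioned above: the data are made up and ID_num is a hypothetical variable name. The string ID is left untouched and a labelled integer copy is created alongside it.

```stata
* Toy data: the variable name ID follows the thread, the values are invented
clear
input str4 ID
"00a1"
"00b2"
"00a1"
end

* encode maps each distinct string to an integer (1, 2, ...) and
* stores the original strings as value labels on the new variable
encode ID, generate(ID_num)

* nolabel shows the underlying integer codes rather than the labels
list ID ID_num, nolabel
```

Because -encode- relies on value labels, it is limited in how many distinct values it can handle; with many households it is worth checking -help encode- before relying on it.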
Zuhumnan Dapel

Join Date: Sep 2014

Posts: 392
#4

10 Sep 2019, 23:40

Thanks. I'm not sure the IDs were originally so. I've been under the impression that the IDs were corrupted in the process of converting the file from SPSS to Stata.

Dapel
Zuhumnan Dapel

Join Date: Sep 2014

Posts: 392
#5

10 Sep 2019, 23:43

Is it possible to merge files using a string variable in both files?
Joseph Coveney

Join Date: Apr 2014

Posts: 4419
#6

10 Sep 2019, 23:59

Yes, of course.
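
A minimal sketch of such a merge, assuming the key is called ID and is a string variable in both datasets. The variables hhsize and income and the tempfile name households are hypothetical, chosen only for illustration.

```stata
* First toy dataset, saved to a temporary file to act as the "using" data
clear
input str4 ID byte hhsize
"00a1" 4
"00b2" 6
end
tempfile households
save `households'

* Second toy dataset, held in memory as the "master" data
clear
input str4 ID float income
"00a1" 120
"00b2" 95
end

* The key must have the same name and be a string in both datasets
merge 1:1 ID using `households'
tabulate _merge
```

Run this from a do-file or the Do-file Editor in one go, since the local macro behind the tempfile only exists within that run.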
Zuhumnan Dapel

Join Date: Sep 2014

Posts: 392
#7

11 Sep 2019, 06:04

OK. Why, then, this error message?

variable parentid1 does not uniquely identify observations in the using data
r(459);
Joseph Coveney

Join Date: Apr 2014

Posts: 4419
#8

11 Sep 2019, 07:29

Because you have duplicate values for parentid1 in your using dataset. The error message has nothing to do with the datatype.
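
One way to confirm this before merging is sketched below on made-up data. The variable name parentid1 follows the thread; the values and the variable x are assumptions.

```stata
* Toy "using" dataset in which one key value occurs twice
clear
input str4 parentid1 byte x
"00a1" 1
"00b2" 2
"00b2" 3
end

* isid stops with an error when the key is not unique;
* capture noisily shows the message but lets the do-file continue
capture noisily isid parentid1
display _rc

* Summarize how many observations share each key value
duplicates report parentid1
```

If the repeated keys are legitimate (several records per parentid1, say), a 1:m or m:1 merge may be what is wanted rather than dropping anything; that depends on the data and cannot be decided from the error message alone.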
Zuhumnan Dapel

Join Date: Sep 2014

Posts: 392
#9

11 Sep 2019, 08:16

Thanks for flagging this. Any code for identifying and dropping the duplicates?
Red Owl

Join Date: Nov 2016

Posts: 127
#10

11 Sep 2019, 08:29

You can identify duplicates using this example:

Code:

clear
input str3 id byte var1
001 17
002 12
002 14
002 03
003 10
004 16
end
sort id
list if id == id[_n+1] | id == id[_n-1]

Then you will need to decide which observations should be dropped and can use -drop- as normal to drop unwanted observations.

Red Owl
Stata/IC 16.0 (Windows 10, 64-bit)
Joseph Coveney

Join Date: Apr 2014

Posts: 4419
#11

11 Sep 2019, 17:23

Also, there's this:

Code:

help duplicates
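
A short sketch of the main -duplicates- subcommands, reusing the toy data from #10. The variable name dup is hypothetical.

```stata
clear
input str3 id byte var1
"001" 17
"002" 12
"002" 14
"002" 3
"003" 10
"004" 16
end

* How many observations share an id, and which ones
duplicates report id
duplicates list id

* dup = number of other observations with the same id (0 if unique)
duplicates tag id, generate(dup)
list if dup > 0

* With no varlist, this drops only observations identical on ALL variables
duplicates drop
```

Dropping on the key alone requires `duplicates drop id, force`, which keeps the first observation for each id and discards the rest regardless of their other values, so inspect the tagged observations before going that far.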
Nick Cox

Join Date: Mar 2014

Posts: 35700
#12

12 Sep 2019, 00:41

Let’s back up here. You don’t like your identifiers for some reason. Say you want shorter, simpler identifiers. So you remove some characters. But now there are duplicates. That shows that the removal of characters messed up your identifiers: they no longer are distinct. The solution is not to remove duplicates but to use a different method.

For example, if you had Dapel1 and Dapel2 and then found yourself with two instances of Dapel, removing one won’t help.

That was the original question. It seems that you wanted to remove non-numeric characters, but the same point arises.
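
To make the point concrete, here is a small sketch with invented IDs: three strings that are distinct as they stand but identical once the letters a to f are stripped.

```stata
* Three distinct toy identifiers (made up for illustration)
clear
input str4 ID
"1a2b"
"12ab"
"a1b2"
end

* Same approach as in #2: ignore the letters and convert to numeric
destring ID, generate(ID2) ignore("abcdef")
list

* The string ID is unique, so this passes silently
isid ID

* The stripped version is not, so this reports an error
capture noisily isid ID2
```

With 32-character IDs there is a further risk: a double holds only about 15 to 16 significant decimal digits, so long digit strings that differ only in their trailing digits may also end up as the same number.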

You’d do better to map your identifiers to distinct integers from 1 up using egen. See the longstanding Stata FAQ about identifiers.
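
A minimal sketch of that mapping, assuming the string identifier is called ID. The new variable name hh and the data are hypothetical.

```stata
* Toy data with one repeated household identifier
clear
input str4 ID
"00b2"
"00a1"
"00b2"
end

* group() assigns 1, 2, ... to the distinct values of ID;
* observations with the same ID get the same integer
egen long hh = group(ID)

sort hh
list, sepby(hh)
```

The original string ID stays in the dataset, so it remains available as the merge key, while hh can serve wherever a numeric identifier is required.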

I am guessing here, as there is no explanation of why you want to do this, but the thread is becoming an instance of the x-y problem: you are asking about y, but what was the original problem x?

Last edited by Nick Cox; 12 Sep 2019, 00:44.
