Good morning,
I have survey data in which I am using IP address (variable name IPAddress) to identify duplicates. I have created a duplicate IP address variable (dup_IPAddress) using this code:
Code:
sort IPAddress
quietly by IPAddress: gen dup_IPAddress = cond(_N==1,0,_n)
However, I have found that this code does not produce reproducible results: for example, if observations 1 and 2 are duplicates, sometimes dup_IPAddress is 1 for observation 1 and 2 for observation 2, whereas other times when I run the same code I get the opposite. This affects the reproducibility of my downstream analysis. Is there a way to ensure reproducibility when generating this kind of duplicate variable?
Due to confidentiality, I do not want to provide a dataex example; if that would be necessary to answer my question, please let me know and I will try to find a workaround to create a non-identifiable dataset that could be used as an example.
Thanks,
Alyssa Beavers
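
The behaviour described above is what one would expect if sort breaks ties among identical IPAddress values in an arbitrary order. A minimal sketch of one possible way to make the numbering reproducible is shown below. It assumes the dataset is loaded in the same row order every time; the IP addresses are made up for illustration, and obs_id is a hypothetical variable name that is not part of the original data.

```stata
* Toy data with fictitious IP addresses (illustration only)
clear
input str15 IPAddress
"10.0.0.1"
"10.0.0.2"
"10.0.0.1"
"10.0.0.3"
end

* Record the original row order before any sorting
* (obs_id is a hypothetical name, not a variable from the real data)
gen long obs_id = _n

* Sort by IPAddress, breaking ties by obs_id, so that duplicates
* are numbered in the same order on every run
bysort IPAddress (obs_id): gen dup_IPAddress = cond(_N==1, 0, _n)

list, sepby(IPAddress)
```

If the dataset already contains a variable that uniquely identifies each response, such as a response ID or a timestamp, it could be used in place of obs_id. Another option that may suffice is sort IPAddress, stable, which keeps tied observations in their existing relative order.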