Hi everyone, I have been experiencing a very weird issue when I try to create a unique numerical ID based on two variables. Basically, after running the command
Code:
gen unique_identifier = _n
I do not get a different number in each row for the variable unique_identifier. I am not sure whether this is an issue with my machine or a bug, so below is code to reproduce the problem (the dataset in which I encountered the issue is available here).
I am running Stata 17.0 SE on a notebook with Linux Ubuntu 22.04 LTS.
Thanks!
Code:
sort var1 var2
by var1 var2: gen position = _n
tab position
* There are multiple rows sharing the same var1 and var2. I will keep only
* one of each
keep if position==1
* Indeed, I have kept only one row per combination of var1-var2
tab position
* I use the command below to generate a unique ID for each combination of var1
* and var2. unique_identifier should have been the number of the row.
gen unique_identifier = _n
* By inspecting the dataset, one can see something is off: the value for
* unique_identifier in the last row does not match the number of the last row.
* Hence, I create another variable (aux) to keep only cases in which
* unique_identifier is the same for more than one pair of var1-var2.
gen aux = 0
* If unique_identifier is the same in two lines in a row, aux=1 for the second
* line
replace aux=1 if unique_identifier[_n]==unique_identifier[_n-1]
* If unique_identifier is the same in two lines in a row, aux=1 for the first
* line
replace aux=1 if aux[_n+1]==1
keep if aux==1
* By inspecting the dataset, one can see many cases in which different pairs of
* var1-var2 have the same value for unique_identifier
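One possible explanation worth checking, stated as an assumption and not as a confirmed diagnosis for this dataset: when no storage type is given, gen stores the new variable as a float (unless the default was changed with set type), and a float cannot represent every integer above 16,777,216 exactly. If the dataset has more rows than that, neighbouring values of _n could be rounded to the same number. The sketch below uses synthetic data to compare the two storage types; the variable names id_float and id_long and the number of observations are illustrative and not taken from the dataset above.

```stata
* Illustrative sketch on synthetic data; names and sizes are assumptions
clear
set obs 16777220
* No storage type given, so the default type (float unless changed) is used
gen id_float = _n
* long stores integers of this size exactly
gen long id_long = _n
* Count the rows where the float copy differs from the exact row number
count if id_float != id_long
* Show the last rows with all digits to compare the two copies by eye
format id_float id_long %12.0f
list id_float id_long in -4/l
```

If the count is above zero, declaring the type explicitly (for example gen long unique_identifier = _n) would be the first thing to try on the real data.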