Hi, I have more than 3,000 dta files with same format and this is one of the file.dta below.
As you can see, this is a wide type data set and I want to get rid of same value by each column. For your understanding, Var1 only contains 5 digit number value and the rest of cell is empty. For Var2, it only has date/month/year in the column and the rest is empty as well.
Var5 and Var7 has the same value, which does not have to be seen more than one so, I need only one 360 ad 1 for them.
The rest of variables (Var3, Var4, Var6 ...) have binary (0 or 1) values but I need only one value for each one. So, if there "1" in the Var3 column, I do not need to see zero. Otherwise, there should be one "0" by each column.
I have been trying to use "duplicates", "carryforward" option, however, it is hard to eliminate them with holding all of the conditions that I mentioned.
Thank you in advance!
Current state of data
ID | Var1 | Var2 | Var3 | Var4 | Var5 | Var6 | Var7 | ... | Var99 |
1 | 0 | 0 | 360 | 0 | 1 | 0 | |||
2 | 0 | 0 | 360 | 1 | 1 | 0 | |||
3 | 31/01/1948 | 0 | 0 | 360 | 0 | 1 | 0 | ||
4 | 23789 | 1 | 0 | 360 | 0 | 1 | 1 | ||
12/12/2014 | 0 | 0 | 360 | 0 | 1 | 1 | |||
12345 | 0 | 0 | 360 | 0 | 1 | 1 | |||
24/07/1999 | 0 | 0 | 360 | 0 | 1 | 0 | |||
... | 98765 | 0 | 0 | 360 | 0 | 1 | 0 | ||
3000 | 0 | 0 | 360 | 1 | 1 | 0 |
ID | Var1 | Var2 | Var3 | Var4 | Var5 | Var6 | Var7 | ... | Var99 |
1 | 23789 | 31/01/1948 | 1 | 0 | 360 | 1 | 1 | 1 | |
2 | 12345 | 12/12/2014 | |||||||
3 | 98765 | 24/07/1999 |
Comment