Hi Statalist,
I have a dataset in which I am comparing the same numeric variables from three different sources. I want to check whether they are identical, ignoring any missing values, and currently do this with egen's rowmax() and rowmin() functions:
egen rowmin=rowmin(VarA VarB VarC)
egen rowmax=rowmax(VarA VarB VarC)
gen Difference = rowmax!=rowmin
However, I want to introduce a new rule: as long as any two of the three variables match, the row should not be marked as different (missing values should still be ignored). Note that any of the variables can have missing values and that I am checking them against each other, so none of them is a "correct" variable that the others are compared against. The result should look like this:
VarA | VarB | VarC | Difference |
10 | 10 | 10 | 0 |
30 | 30 | 40 | 0 |
50 | 50 | . | 0 |
. | . | 60 | 0 |
70 | . | 80 | 1 |
I figured there might be a way to do this with anymatch; however, I did not succeed in doing so. Any suggestions would be greatly appreciated!
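One possible approach, offered as a sketch rather than a definitive answer: as far as I know, egen's anymatch() compares variables against a fixed list of integer values rather than against each other, so it may not fit this problem. The rule described above can instead be written out pair by pair. The helper names anypair and nonmiss are made up for illustration, and the code assumes Difference does not already exist (drop it first if it does).

```stata
* Sketch only: anypair and nonmiss are hypothetical helper names.
* Flag rows in which at least one pair of nonmissing values agrees.
* (In Stata, . == . is true, so missing values are excluded explicitly.)
gen byte anypair = (VarA == VarB & !missing(VarA)) | ///
                   (VarA == VarC & !missing(VarA)) | ///
                   (VarB == VarC & !missing(VarB))

* Count the nonmissing values in each row.
egen nonmiss = rownonmiss(VarA VarB VarC)

* Different only if there are two or more nonmissing values and no pair matches.
gen byte Difference = nonmiss >= 2 & !anypair
```

Under this rule a row with a single nonmissing value is never marked as different, and a row such as 70, ., 80 is marked because its two nonmissing values disagree.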