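Run the two commands in the code block at the end of this post.
That will show you the observations that are causing the problem, namely those that share their values of tinh huyen xa diaban hoso with at least one other observation. It is then up to you to figure out how to fix it. Broadly speaking, there are two possibilities (though the details of how you handle each vary widely):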
1. These observations are all correct and belong in the data. You simply misunderstood the structure of your data when you thought that tinh huyen xa diaban hoso would uniquely identify observations. Then the question arises whether there is some other variable (or perhaps more than one) that, combined with tinh huyen xa diaban hoso, uniquely identifies the observations. If so, you can modify your code accordingly, adding that variable (or those variables) to the -sort- or -isid, sort- commands. Your calculations will then become deterministic.
If not, then you need to completely rethink your analysis, because it depends on the arbitrary ordering of the observations in the data. You need a different algorithm.
2. Some of these observations contain incorrect data, or they contain correct data but do not really belong in this data set. Then you have to go back to how this data set was created and fix the problems that led to the inclusion of these observations or the incorrect data. A subtle version of this is that the observations themselves are correct as far as they go, but they should have been combined in some way (perhaps taking averages of the variables other than tinh huyen xa diaban hoso, or something like that) into a single observation.
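To make the two possibilities concrete, here is a minimal sketch on made-up data. The variables member_id and income are hypothetical names invented for the illustration; substitute whatever actually plays those roles in your data, if anything does. Averaging is only one of many ways the observations might be combined.

```stata
* Made-up data for illustration only: the first two observations
* agree on tinh huyen xa diaban hoso.
clear
input tinh huyen xa diaban hoso member_id income
1 1 1 1 1 1 100
1 1 1 1 1 2 300
1 1 1 1 2 1 250
end

* Possibility 1: an additional variable (hypothetical: member_id) completes
* the identifier. -isid- stops with an error if the variables do not
* uniquely identify observations; otherwise the sort option sorts on them.
isid tinh huyen xa diaban hoso member_id, sort

* Possibility 2, subtle version: combine the surplus observations into one
* per group, here by averaging the hypothetical variable income.
collapse (mean) income, by(tinh huyen xa diaban hoso)

* After combining, the original five variables are a valid identifier.
isid tinh huyen xa diaban hoso
```

Note that -collapse- replaces the data in memory, so save your data first if you try this on the real file.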
Either way, in the end, you need to get a better understanding of your data or of the algorithm you are trying to apply to it. It is impossible to give more specific advice from a distance.
Code:
duplicates tag tinh huyen xa diaban hoso, gen(flag)
browse if flag
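If you would rather see the output in the Results window (for example, so that it ends up in a log file), -duplicates report- and -duplicates list- are a possible alternative to -browse-. The sketch below runs on a small made-up data set; the variable income and all the values are purely illustrative.

```stata
* Made-up data for illustration only: the first two observations
* agree on tinh huyen xa diaban hoso.
clear
input tinh huyen xa diaban hoso income
1 1 1 1 1 100
1 1 1 1 1 300
1 1 1 1 2 250
end

* How many observations occur once, twice, and so on?
duplicates report tinh huyen xa diaban hoso

* List only the observations that share an identifier with another one,
* with a separator line between groups.
duplicates list tinh huyen xa diaban hoso, sepby(tinh huyen xa diaban hoso)

* Same tagging step as in the code above; flag is 0 for unique observations.
duplicates tag tinh huyen xa diaban hoso, gen(flag)
count if flag
```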