Hi all,
I have been working with this data set and am trying to delete observations that were returned as perfect matches but have a non-zero value of the variable 'd', provided that the same group contains an observation with d equal to 0.
In the example data below, I tried to find matches for candidate_m. Stata produced many perfect matches with the same name, which are given in candidate_u. However, not all of these matches are correct, even though they have a similarity score of 1 from the 'matchit' command. What I would like to do now is check, within each value of the identifier variable 'id' (a combination of year_m, const_m and candidate_m), whether any observation has d equal to zero. If one does, the other observations within that id that have non-zero d should be deleted. If no observation within an id has d equal to 0, nothing within that id should be deleted.
For example, there are three ids in the example below: 17782, 18156 and 19101. The last two each contain an observation with d equal to zero, but id 17782 does not. Therefore, no observation within 17782 should be deleted, whereas for ids 18156 and 19101 all observations with non-zero d should be deleted. To do this, I wrote the following lines of code.
egen id = group( year_m const_m candidate_m )
xtset id
bysort id : gen cum_simil=sum(similscore) // to take care of duplicates
by id, sort: gen has_perfect_match = 1 if d==0 & cum_simil>1
drop if has_perfect_match & d!=0
But the above commands also delete observations in ids that have only non-zero values of 'd', i.e. id 17782 has been deleted altogether, which I do not want. There is a mistake somewhere in the code, and I have tried different combinations for a couple of hours already, but in vain. Any suggestions would be helpful.
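For reference, the rule described above could be sketched as follows. This is only a minimal, untested sketch: the variable name has_zero is hypothetical, and it assumes that id already identifies the groups and that d is never missing. It uses a group-level flag because has_perfect_match is missing on every row where d is not 0, and Stata treats a missing value as true in a logical expression, which may be why the rows of id 17782 are dropped as well.

```stata
* Flag every id that contains at least one observation with d == 0.
* (d == 0) evaluates to 1 or 0, so the group maximum is 1 if any row qualifies.
* has_zero is a hypothetical variable name used only for this sketch.
bysort id: egen byte has_zero = max(d == 0)

* Within flagged ids only, drop the observations whose d is non-zero.
* Caution: a missing d also satisfies d != 0 and would be dropped here.
drop if has_zero == 1 & d != 0
```

With the example data, this should leave the seven observations of id 17782 untouched and keep only the d == 0 observation in each of ids 18156 and 19101.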
Regards
***************************** Example dataset is as below
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year_m year_u) str26 const_m long ac_m str26 const_u float(order_m order_u) str55 candidate_m float id byte dup double d float(cum_simil has_perfect_match) double similscore str55 candidate_u
1991 1993 "Garautha" 139 "Bangarmau"  3 10 "CHANDRA PAL SINGH" 17782 6 107.89971231641277 7 . 1 "CHANDRA PAL SINGH"
1991 1993 "Garautha" 139 "Chhibramau" 3 22 "CHANDRA PAL SINGH" 17782 6  43.09548223868419 2 . 1 "CHANDRA PAL SINGH"
1991 1993 "Garautha" 139 "Kanth"      3  4 "CHANDRA PAL SINGH" 17782 6  64.94923194380358 4 . 1 "CHANDRA PAL SINGH"
1991 1993 "Garautha" 139 "Kasganj"    3 26 "CHANDRA PAL SINGH" 17782 6  51.19461464844054 6 . 1 "CHANDRA PAL SINGH"
1991 1993 "Garautha" 139 "Kashipur"   3 34 "CHANDRA PAL SINGH" 17782 6            49.9999 5 . 1 "CHANDRA PAL SINGH"
1991 1993 "Garautha" 139 "Shahabad"   3  3 "CHANDRA PAL SINGH" 17782 6            49.9999 1 . 1 "CHANDRA PAL SINGH"
1991 1993 "Garautha" 139 "Siana"      3 16 "CHANDRA PAL SINGH" 17782 6 113.41941735080039 3 . 1 "CHANDRA PAL SINGH"
1991 1993 "Kanth"    205 "Bangarmau"  3 10 "CHANDRA PAL SINGH" 18156 6 167.82627376584603 5 . 1 "CHANDRA PAL SINGH"
1991 1993 "Kanth"    205 "Chhibramau" 3 22 "CHANDRA PAL SINGH" 18156 6 104.90555977626428 4 . 1 "CHANDRA PAL SINGH"
1991 1993 "Kanth"    205 "Kanth"      3  4 "CHANDRA PAL SINGH" 18156 6                  0 2 1 1 "CHANDRA PAL SINGH"
1991 1993 "Kanth"    205 "Kasganj"    3 26 "CHANDRA PAL SINGH" 18156 6 13.355054613263283 6 . 1 "CHANDRA PAL SINGH"
1991 1993 "Kanth"    205 "Kashipur"   3 34 "CHANDRA PAL SINGH" 18156 6            49.9999 7 . 1 "CHANDRA PAL SINGH"
1991 1993 "Kanth"    205 "Shahabad"   3  3 "CHANDRA PAL SINGH" 18156 6            49.9999 3 . 1 "CHANDRA PAL SINGH"
1991 1993 "Kanth"    205 "Siana"      3 16 "CHANDRA PAL SINGH" 18156 6 47.062286729680835 1 . 1 "CHANDRA PAL SINGH"
1991 1993 "Shahabad" 371 "Bangarmau"  2 10 "CHANDRA PAL SINGH" 19101 6            49.9999 7 . 1 "CHANDRA PAL SINGH"
1991 1993 "Shahabad" 371 "Chhibramau" 2 22 "CHANDRA PAL SINGH" 19101 6            49.9999 3 . 1 "CHANDRA PAL SINGH"
1991 1993 "Shahabad" 371 "Kanth"      2  4 "CHANDRA PAL SINGH" 19101 6            49.9999 5 . 1 "CHANDRA PAL SINGH"
1991 1993 "Shahabad" 371 "Kasganj"    2 26 "CHANDRA PAL SINGH" 19101 6            49.9999 6 . 1 "CHANDRA PAL SINGH"
1991 1993 "Shahabad" 371 "Kashipur"   2 34 "CHANDRA PAL SINGH" 19101 6            49.9999 2 . 1 "CHANDRA PAL SINGH"
1991 1993 "Shahabad" 371 "Shahabad"   2  3 "CHANDRA PAL SINGH" 19101 6                  0 4 1 1 "CHANDRA PAL SINGH"
1991 1993 "Shahabad" 371 "Siana"      2 16 "CHANDRA PAL SINGH" 19101 6            49.9999 1 . 1 "CHANDRA PAL SINGH"
end