Hello all,
I have a large dataset with a considerable number of duplicates (impossible to assess by hand). I am working with mortgage data from the National Archive (HMDA data, 2000).
I have duplicate observations by respondent id, where one of the duplicates summarises the data shown in the others. Within each set of duplicates, I would like to keep the observation with the fewest missing values.
I cannot simply drop the observations with missing values because, as the blue row shows, some observations are unique but still have missing values.
The other problem is that I cannot tell in advance which column the missing value is in (someone else asked something similar before, but in that case the missing value was in only one column for all observations).
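
To make the rule concrete, here is a minimal sketch in plain Python of what I am after: count the missing fields in each row across all columns, then keep the row with the lowest count within each respondent id. The column names (`respondent_id`, `loan_amount`, `income`), the sample rows, and the set of strings treated as missing are all made up for illustration and are not taken from the HMDA file.

```python
def is_missing(value):
    """Treat None, blank strings, and a few placeholder strings as missing.

    The placeholder set is an assumption; adjust it to the actual data.
    """
    return value is None or str(value).strip() in {"", ".", "NA"}


def keep_most_complete(rows, key="respondent_id"):
    """Keep, for each key value, the row with the fewest missing fields.

    Rows whose key appears only once are always kept, even if they have
    missing values. Ties keep the first row encountered.
    """
    best = {}  # key value -> (missing count, row); preserves first-seen order
    for row in rows:
        n_missing = sum(is_missing(v) for v in row.values())
        k = row[key]
        if k not in best or n_missing < best[k][0]:
            best[k] = (n_missing, row)
    return [row for _, row in best.values()]


# Hypothetical sample: A1 is duplicated, B2 is unique but incomplete.
rows = [
    {"respondent_id": "A1", "loan_amount": "120", "income": ""},
    {"respondent_id": "A1", "loan_amount": "120", "income": "45"},
    {"respondent_id": "B2", "loan_amount": "", "income": "60"},
]
result = keep_most_complete(rows)
for row in result:
    print(row)
```

Because the missing count is taken over every column, it does not matter which column the gap is in, and the unique row for B2 survives even though it is incomplete.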

