Hi all,
I am looking to correct data inconsistencies in a panel data set covering 2015-2017 in Stata.
The table below lists survey respondents who have only two years of observations for age. Their age cannot be deduced because the reported birth years differ, and with only two reports there is no most frequent birth year to fall back on. I do not want to drop these observations, as this may bias the data, but then again there are 6,200 observations in total. Is there an appropriate strategy for correcting these inconsistencies?
Data Inconsistencies (age and birth year)
ID | Year | Age | Birth Year |
#1 | 2016 | 44 | 1971 |
#1 | 2017 | 42 | 1974 |
#2 | 2016 | 58 | 1958 |
#2 | 2017 | 61 | 1956 |
#3 | 2016 | 30 | 1986 |
#3 | 2017 | 40 | 1977 |
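For reference, this is roughly how cases like these can be flagged. It is only a minimal sketch: the variable names `id`, `year`, `age` and `birthyear` are placeholders rather than the actual names in my data, and the one-year tolerance on age is an arbitrary choice to allow for interviews held before or after a birthday.

```stata
* Assumed (hypothetical) variable names: id, year, age, birthyear

* Flag respondents whose reported birth year differs across waves.
* Note: missing birth years sort last, so an id with a missing
* birth year in one wave would also be flagged here.
bysort id (birthyear): gen byte by_conflict = birthyear[1] != birthyear[_N]

* Flag observations where the reported age disagrees with
* year - birthyear by more than one year
gen byte age_conflict = abs((year - birthyear) - age) > 1 if !missing(age, birthyear)

* Count affected respondents (one tagged row per id)
egen byte tag = tag(id)
count if tag & by_conflict
```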
Similarly, the table below lists survey respondents who have only two years of observations for gender. Again, with only two conflicting reports there is no most frequent response. I do not want to simply drop these observations in case that biases the data, so I would like to find a way of correcting them. Is there an appropriate strategy for correcting these inconsistencies?
Data Inconsistencies (gender)
ID | Year | Gender |
#4 | 2016 | Male |
#4 | 2017 | Female |
#5 | 2016 | Female |
#5 | 2017 | Male |
#6 | 2016 | Female |
#6 | 2017 | Male |
The only possible strategies I have come across are:
1. Use the first-reported value.
2. Select one of the reported values at random (I am not sure how to do this in an unbiased way!)
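To make the two options concrete, here is a minimal sketch of how I imagine each would look in Stata, again assuming placeholder variable names `id`, `year` and `birthyear`. The seed value is arbitrary and only there for reproducibility. Whether either approach is statistically defensible is exactly what I am unsure about.

```stata
* Assumed (hypothetical) variable names: id, year, birthyear

* Strategy 1: carry the first-reported value to every wave of a respondent
bysort id (year): gen birthyear_first = birthyear[1]

* Strategy 2: pick one of the reported values at random per respondent.
* Each observation gets a uniform random draw; within id, the value on
* the row with the smallest draw is used, so every reported row is
* equally likely to be chosen.
set seed 12345
gen double u = runiform()
bysort id (u): gen birthyear_rand = birthyear[1]
drop u
```

The same pattern should apply to gender by replacing `birthyear` with the gender variable.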
Many thanks in advance for any advice.