
  • Cleaning up height data

    I'm doing secondary data analysis of a survey, and the height data I need to calculate BMI is not very clean. The data is entered in 3 columns depending on how the survey participants chose to enter it (this part of the survey was self-completed). The columns are cm, feet, inches. The cm column is fine, and about half the people who entered their height in imperial units correctly put the feet and inches in the separate columns. The other participants entered their height in feet and inches in the feet column alone, i.e. 5.8, 6.1 etc. I'd obviously like to use their data, but I need some way of splitting the inches off the feet and getting them into the correct column. I've tried split var, p(.) but that didn't work because the variable is numeric. Suggestions gratefully received.

  • #2
    I think you're cooked! If var is a numeric variable, then it cannot distinguish between 5 foot 1 inch and 5 foot 10 inches: both will be 5.1 (or 5.0999999046 if you look under the hood of a float).
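
    A minimal sketch of why the two heights collapse, using a made-up two-row dataset (the variable names typed and height_num are illustrative, not from the survey). It assumes the respondent's keystrokes were "5.1" and "5.10" and shows that both become the same number once held as numeric:

```stata
clear
input str4 typed
"5.1"
"5.10"
end

* real() converts the typed text to a number, as a numeric column would store it
gen double height_num = real(typed)

* both rows now hold the same value: the trailing zero cannot be recovered
assert height_num[1] == height_num[2]
list typed height_num

* a float cannot hold 5.1 exactly; this displays the stored approximation
display %12.10f float(5.1)
```

    If the raw entries still exist as text somewhere, the distinction survives there; in a numeric column it does not.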

    For everything else, you can do this:

    Code:
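    * whole feet: the part before the decimal point
    gen correct_feet = floor(var)
    * digits after the decimal point, as a whole number (5.8 -> 80, 5.11 -> 11)
    gen sort_of_correct_inches = round(100*(var-correct_feet))
    * single typed digits come out ten times too large (80 -> 8)
    replace sort_of_correct_inches = sort_of_correct_inches/10 ///
        if sort_of_correct_inches > 11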
    That will get everything correct except for the uninterpretable 5.1, which this code reads as 10 inches. To tell 5'1" from 5'10" in those cases you will have to go back to the source data, if it can be found. (If the data were directly computer entered then I think this problem has no solution.)
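
    One way to act on this, as a sketch under assumptions: the three survey columns are taken to be named cm, feet and inches (the thread does not give the actual names), the toy rows are invented, and the helper names ft, inch_typed, inch_final, ambiguous and height_cm are illustrative. It flags the rows that need checking against the source rather than guessing, and builds a single height in centimetres for the rest:

```stata
clear
input cm feet inches
170 .    .
.   5    8
.   5.8  .
.   5.1  .
.   6.11 .
end

* split the feet column using the approach posted above
gen ft = floor(feet)
gen inch_typed = round(100*(feet - ft))
replace inch_typed = inch_typed/10 if inch_typed > 11 & !missing(inch_typed)

* x.1 could be 1 inch or 10 inches: flag it instead of guessing
gen byte ambiguous = (inch_typed == 10) & missing(inches)

* prefer the inches column when it was filled in, else use the recovered inches
gen inch_final = cond(!missing(inches), inches, inch_typed)

* keep cm where given, else convert (1 inch = 2.54 cm)
gen height_cm = cond(!missing(cm), cm, 2.54*(12*ft + inch_final))
replace height_cm = . if ambiguous

list
```

    Leaving height_cm missing for the flagged rows keeps them out of any BMI calculation until they have been resolved from the source.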
    Last edited by Clyde Schechter; 04 Jun 2018, 18:24.
