  • Replacing missing values across multiple variables from a questionnaire

    Hello,

    I am fairly new to Stata and have spent a while trying to solve this, but can't seem to figure it out. I am working on a questionnaire that includes multiple standardized measures. Some participants did not answer all the questions, leaving missing values. However, if a participant completed enough of a measure's items, I can calculate the mean of the items they answered and replace their missing values with that mean.

    In this example, the variables are ces1_1 through ces20_1, the mean is ces_mean_1, and the count of valid values is ces_count_1. I have tried two methods, but I don't know if either is even the appropriate approach:

    1) Using the recode command. I believe recode only lets me replace a missing value with a specific number, not with the value of another variable:

    recode ces1_1-ces20_1 (missing = ces_mean_1) if ces_count_1 >= 16

    This gives the error "unknown el ces_mean_1 in rule".


    2) Using the foreach command:

    foreach var of varlist ces1_1-ces20_1 {
    replace `.' = ces_mean_1 if ces_count_1 >=16
    }

    However, it gives me an error of "varlist required." I used trace to find out where the error was occurring and this is what it gave me:

    . foreach var of varlist ces1_1-ces20_1 {
    2. replace `.' = ces_mean_1 if ces_count_1 >=16
    3. }
    - foreach var of varlist ces1_1-ces20_1 {
    - replace `.' = ces_mean_1 if ces_count_1 >=16
    = replace = ces_mean_1 if ces_count_1 >=16
    varlist required
    }

    However, I don't see where a varlist would be needed in that line.

    Thank you for any help,
    Jordan

  • #2
    Mean substitution is still far too common, but it is a very "evil" way of dealing with missing data. You'll see it a lot in practice, but if you read anything about methods, it's frowned upon. Do you have anybody on your end who might be able to get you going with multiple imputation? It's a bit much to explain why, and to give all the details of how to implement it in your particular situation, but do it if you can.



    • #3
      Originally posted by ben earnhart:
      Mean substitution is still far too common, but it is a very "evil" way of dealing with missing data. You'll see it a lot in practice, but if you read anything about methods, it's frowned upon. Do you have anybody on your end who might be able to get you going with multiple imputation? It's a bit much to explain why, and to give all the details of how to implement it in your particular situation, but do it if you can.

      I can ask, but I am just working as an RA in my first semester and this is what I was asked to do. I'll look into multiple imputation though.



      • #4
        The syntax error here (see help replace) is that replace needs a single variable name immediately after it. Another sign that something is wrong is that the loop body never refers to the loop variable `var'.

        I am not sure that I fully understand your approach, but this code is more likely to be legal.

        Code:
        foreach var of varlist ces1_1-ces20_1 {
              replace `var' = ces_mean_1 if ces_count_1 >=16  & `var' == .
        }
        Mean imputation naturally is easy but moderately poor as an imputation method....
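
        As a minimal sketch of the whole workflow on toy data: the trace output in the question shows `.' expanding to nothing (Stata reads it as a reference to an undefined local macro), which is why replace complained about a missing varlist. The names item1-item4, item_mean, and item_count, and the threshold of 3, are hypothetical stand-ins for ces1_1-ces20_1, ces_mean_1, ces_count_1, and 16. This assumes the mean and count are the row-wise mean and number of nonmissing items, which the thread does not confirm.

```stata
* Toy data: 3 respondents, 4 items (hypothetical names, not from the thread)
clear
input item1 item2 item3 item4
1 2 . 3
. . . 2
0 1 2 3
end

* Row-wise mean of the answered items and count of nonmissing items
egen item_mean  = rowmean(item1-item4)
egen item_count = rownonmiss(item1-item4)

* Fill a missing item with the respondent's own mean, but only when
* at least 3 of the 4 items were answered (illustrative threshold)
foreach var of varlist item1-item4 {
    replace `var' = item_mean if item_count >= 3 & missing(`var')
}

list
```

        Here missing(`var') is used rather than `var' == . so that extended missing values (.a, .b, and so on) would be caught as well.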



        • #5
          Originally posted by Nick Cox:
          The syntax error here (see help replace) is that replace needs a single variable name immediately after it. Another sign that something is wrong is that the loop body never refers to the loop variable `var'.

          I am not sure that I fully understand your approach, but this code is more likely to be legal.

          Code:
          foreach var of varlist ces1_1-ces20_1 {
          replace `var' = ces_mean_1 if ces_count_1 >=16 & `var' == .
          }
          Mean imputation naturally is easy but moderately poor as an imputation method....


          Thank you! I looked back through my iterations of code and realized I had tried this but mistakenly only put a single "=". I will also look into other imputation methods as you and ben earnhart suggested.

          Thank you for your help.



          • #6
            Jordan:
            As Ben and Nick warned, replacing missing values with the mean of the existing observations is a methodologically weak way to go. As is easy to envisage, if you have a substantial number of missing values (as far as I know, nobody has set a quantitative cut-off), that approach would, at best, artificially shrink the variance.
            Other seemingly easy methods, such as last observation carried forward and next observation carried backward, are criticized as well.
            An interesting website on this topic is www.missingdata.org.uk, which is maintained by Jonathan Bartlet (London School of Hygiene & Tropical Medicine), whose posts appear on this forum from time to time.
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Dear All,
              I have to amend a typo in my previous reply, as Jonathan's surname is Bartlett (with double t at its end).
              I do apologize to Jonathan.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Related to this question, I want to replace all missing values with 0, but my variables share a suffix (rather than a prefix). I have 10 variables with the suffix "po3_204". How do I do the replacement?



                • #9
                  Code:
                  foreach var of varlist *po3_204 {
                      replace `var'=0 if missing(`var')
                  }
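
                  As a possible alternative to the loop, here is a hedged sketch using mvencode, which recodes missing values to a number across a varlist in one command. The toy variable names a_po3_204, b_po3_204, and other are hypothetical. One caveat to check in help mvencode: the command is designed to stop with an error if the target value already occurs in a variable, unless the override option is given, so with real data that already contains zeros the loop above may be the simpler route.

```stata
* Toy data: two variables sharing the suffix, plus one unrelated variable
clear
input a_po3_204 b_po3_204 other
1 . .
. 5 7
end

* Recode missing values to 0 in every variable whose name ends in po3_204
mvencode *po3_204, mv(0)

list
```

                  The wildcard *po3_204 matches only names ending in that suffix, so the variable other keeps its missing value.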

