  • How to replace -9s and -99s with missing values (dots) in the whole dataset?

    I have a large dataset with over 400 variables. Literally every variable uses either -9 or -99 to code missing values, and I have to replace them with dots. Because there are over 400 variables, I would like to replace all the -9s and -99s in all the variables with only a few commands. Is it possible to do this with one or two commands?

  • #2
    Code:
    mvdecode _all, mv(-9 -99)
    The above will do it. But it presumes that there is no variable for which, say, -99 encodes missing, but -9 also occurs and is a valid value (or vice versa). If that kind of situation can arise, then there needs to be some way to identify which variables use which missing value and treat them separately.
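A minimal sketch of one way to find variables in which both -9 and -99 occur, so they can be checked by hand before running mvdecode over everything. The three-variable toy dataset (x, y, s) is made up for illustration, and the loop simply skips string variables.

```stata
* Toy data (hypothetical): x uses both codes, y uses only -9, s is a string
clear
input x y str1 s
  1  -9 "a"
-99  -9 "b"
 -9   2 "c"
end

local both
foreach var of varlist _all {
    * Skip string variables; the codes only matter for numeric ones
    capture confirm numeric variable `var'
    if _rc continue
    quietly count if `var' == -9
    local n9 = r(N)
    quietly count if `var' == -99
    local n99 = r(N)
    * Collect variables in which both codes occur
    if `n9' > 0 & `n99' > 0 {
        local both : list both | var
    }
}
display "Variables containing both -9 and -99: `both'"
```

Variables listed here are the ones where one of the codes might be a valid value and deserve a closer look.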

    • #3
      I think you can just do

      recode * (-9 -99 =.)

      Be sure that is what you really want, and save the original data in case you do screw up.

      EDIT:

      You can also use mvdecode.
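A sketch, assuming the dataset also contains string variables: recode only accepts numeric variables, so `recode *` would likely stop with an error in that case. Collecting the numeric variables with ds first avoids this. The variable names x, y and s are hypothetical.

```stata
* Toy data (hypothetical) with a string variable mixed in
clear
input x y str2 s
  1  -9 "a"
-99   4 "b"
end

* Restrict recode to numeric variables
ds, has(type numeric)
recode `r(varlist)' (-9 -99 = .)
```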

      Last edited by Richard Williams; 18 Jun 2017, 19:16.
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam

      • #4
        Also, perhaps -9 and -99 indicate different reasons for being missing, in which case extended missing values such as .a and .b would be appropriate.
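A minimal sketch of that idea with mvdecode, whose mv() option accepts code=missing-value mappings separated by a backslash. Mapping -9 to .a and -99 to .b is an assumption for illustration, as is the toy dataset.

```stata
* Toy data (hypothetical)
clear
input x y
  1  -9
-99   3
 -9 -99
end

* Keep the two reasons apart: -9 becomes .a, -99 becomes .b
mvdecode _all, mv(-9=.a \ -99=.b)
list

* Extended missing values still count as missing
count if missing(x)
```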

        • #5
          Thanks for the suggestions. I will try them on my datasets.

          • #6
            Also, is it possible to go beyond .z (for example, .aa) to signify different types of missing values? Our data have many different types of missingness, so we need more than just .a through .z.

            Thanks in advance!

            • #7
              Wow, I have never seen that many types of missing values. If nobody has a better idea, you could clone the original variable, recode the clone to have one or a few missing values, and keep the original in case you need to retrieve the original missing-value codes.
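Stata offers only 27 numeric missing values (. and .a through .z), so something like .aa is not available. A sketch of the cloning approach described above, assuming a hypothetical variable q1 whose codes -7, -8 and -9 are made up for illustration:

```stata
* Toy data (hypothetical codes -7, -8, -9)
clear
input q1
  5
 -7
 -8
 -9
end

* Untouched copy that keeps every original code
clonevar q1_orig = q1

* Collapse the many codes into a few extended missing values
recode q1 (-7 -8 = .a) (-9 = .b)
```

Analyses use q1, while q1_orig remains available if the finer distinctions are needed later.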
              • #8
                Thanks for your response! Yes, it's because we are merging multiple datasets, each of which uses different non-numeric codes in otherwise numeric variables.

                • #9
                  If you have different codes for the same thing (e.g. "NA" and "Not Applicable") you may want to recode them all to use the same codes.
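A sketch of harmonizing different spellings before any further conversion, assuming a string variable v and the hypothetical spellings "Not Applicable" and "N/A":

```stata
* Toy data (hypothetical): three spellings of "not applicable"
clear
input str14 v
"1"
"NA"
"Not Applicable"
"N/A"
end

* Map every spelling to one code so later rules only need to handle "NA"
replace v = "NA" if inlist(v, "Not Applicable", "N/A")
count if v == "NA"
```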
                  • #10
                    Thank you very much, all! I have learned a lot from your posts.
                    Respectfully, Hassen

                    • #11
                      It is my understanding that all the Stata commands mentioned above (e.g. mvdecode and recode) work only with numeric values.

                      I am dealing with a large dataset (also >400 variables) that I imported from a CSV file. "NA" (i.e. a string value) was used to indicate "not applicable", and I would like to change every data point in the dataset that is "NA" to ".a". Does anybody know a helpful strategy for doing this?

                      • #12
                        You could do something like this, where you can specify options to ds to get the right variables.
                        Code:
                        ds
                        foreach var of varlist `r(varlist)' {
                            replace `var' = ".a" if `var' == "NA"
                        }
                        Although I would replace it with an empty string rather than ".a", because for a string variable the missing() function treats only the empty string as missing:
                        Code:
                        . di missing(".a")
                        0
                        
                        . di missing("")
                        1

                        • #13
                          My guess is that Johannes will follow up with destring anyway, because string variables are not very useful for analysis. So he might as well stick with ".a", which destring should recognize as an extended missing value (although I have not tested that recently).

                          Edit:

                          I have often seen

                          Code:
                          ds
                          foreach var of varlist `r(varlist)' {
                          
                          }
                          on Statalist. Note that the initial call to ds is not necessary, because the full set of variables is also directly available as

                          Code:
                          foreach var of varlist * {
                          
                          }
                          Last edited by daniel klein; 06 Jul 2020, 09:25.

                          • #14
                            Sure, but there is a real point in, say,

                            Code:
                            ds, has(type string)
                            because then the ensuing r(varlist) includes only string variables. I think this is what Wouter Wakker was implying by "where you can specify options to ds to get the right variables".

                            • #15
                              Dear all,

                              thank you for your helpful contributions!

                              The following has worked:

                              Replacing all "NA"s by ".a" in string variables:

                              Code:

                              ds, has(type string)
                              foreach var of varlist `r(varlist)' {
                              replace `var' = ".a" if `var' == "NA"
                              }

                              Applying destring (note: ".a" is correctly treated as missing) to convert the string variables to numeric variables:

                              Code:

                              ds, has(type string)
                              foreach var of varlist `r(varlist)' {
                              destring `var', replace
                              }


                              Best wishes,
                              Johannes
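The two loops above could also be folded into one, as in this sketch on a made-up two-variable dataset (a and b are hypothetical names). The `r(varlist)' macro is expanded once when the foreach line is read, so later commands inside the loop do not change the list being looped over.

```stata
* Toy data (hypothetical) imported as strings, with "NA" for not applicable
clear
input str3 a str3 b
"1"  "NA"
"NA" "2"
"3"  "4"
end

ds, has(type string)
foreach var of varlist `r(varlist)' {
    * Recode "NA" to the extended-missing string, then convert to numeric
    replace `var' = ".a" if `var' == "NA"
    destring `var', replace
}
```

If a variable still contains other non-numeric text, destring should leave it as a string and report that, so it is worth reading the output for each variable.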
