  • How to code missing values after using -encode- on a string variable?

    I have a number of variables that are in string format but lend themselves to being numerically coded. There are a number of missing values that were originally coded as -9, -10, 999.

    I used the command -encode- to generate my numeric variables. The missing-value codes were encoded as well, and I couldn't assign them as MVs using -mvdecode-.

    foreach v in Asthmapmed2freq_8 Asthmapmed1freq_8 Asthmapmed1freqex_8 Asthmapmed4freq_8 Asthmapmed3freq_8 {
    encode `v', ge(c`v')
    }

    This didn't work:

    mvdecode c*, mv(-9=.a \-10=.b \999=.c)

    I would be grateful for any advice. Many thanks!

  • #2
    Sara,
    My advice would be: generate the missing values before destring/encode.
    Code:
    foreach v in Asthmapmed2freq_8 Asthmapmed1freq_8 Asthmapmed1freqex_8 Asthmapmed4freq_8 Asthmapmed3freq_8 {
        replace `v' = "" if `v'=="-9" | `v'=="-10" | `v'=="999"
    }
    That said, a few remarks.
    On your post first:
    Please (as per the FAQ) do not report "this didn't work"; tell us instead what error message Stata displayed.
    Also use the code delimiters (via the advanced editor toggle).

    On your issue now:
    Your mvdecode doesn't work because you run it on your new (encoded) variables, which do not contain the values you're searching for (e.g. the "-9" strings have been encoded; check which value they received in the encoded variable).

    Edit: my first code does not let you specify different types of missing values; the end of the post suggests how to do that. Choose the option you prefer.
    Last edited by Charlie Joyez; 22 Nov 2016, 09:35.



    • #3
      What does "this didn't work" mean? Did you get an error message of some kind? (If so, what was it?) Did it run but produce unexpected results? (If so, show those results.) Did something else happen?

      Here's my best guess, without this relevant information. It is likely that when you used -encode-, the strings "-9", "-10", and "999" did not get encoded as -9, -10, and 999 respectively but as some other numbers. In fact, I am quite sure that is the case since -encode- never produces negative numbers in its output. So when you go to -mvdecode- there simply aren't any values of -9 or -10 (and there's an excellent chance there are no values of 999 either). If I'm right about this, the problem is that you should not have used -encode- for this: if you have a string variable whose contents look numeric, the way to create a numeric variable out of it is with -destring-, not -encode-.

      Note: Crossed with Charlie Joyez' response, which makes some of these same points and offers a different approach. But note that while Charlie's code will deal with the missing values, it will probably leave Sara with misrepresented numeric values in her variables, and she will be in for some very nasty surprises when she tries to calculate with them. -encode- is just the wrong approach to this situation.
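The difference between -encode- and -destring- described above can be checked on a toy dataset. This is only a sketch: the variable names mystring, cmystring and nmystring are made up for illustration, and the data are not Sara's.

```stata
clear
input str4 mystring
"-9"
"-10"
"999"
"11"
"21"
end

* encode numbers the distinct strings 1, 2, 3, ... in sorted order,
* so none of the resulting codes is -9, -10 or 999
encode mystring, gen(cmystring)

* destring keeps the numeric content of the strings
destring mystring, gen(nmystring)

* nolabel shows the underlying numbers rather than the value labels
list mystring cmystring nmystring, nolabel
```

With -list, nolabel- the underlying codes of the encoded variable are visible, which is a quick way to see why -mvdecode- finds nothing to recode.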



      • #4
        Many thanks Charlie for your response. Your comments have been noted. Stata didn't generate an error message when I coded the MVs. It was only when I tabulated my data that I noticed the missing values were being treated as normal observations.

        At this stage of my research, I need to maintain the existing MV codes rather than set them to blank as I need to understand why a question wasn't answered.



        • #5
          Thank you Clyde. Apologies for not providing enough information. Yes, -encode- assigned a code to the MV strings. I will go back and use -destring-.



          • #6
            Clyde's right: the destring option seems more logical here if the variables are numeric values stored as strings. I thought of mentioning it (this is why I wrote "before destring/encode"), but then I stuck with the encoding idea because, since we don't know the data, I thought the variables could contain non-numeric characters.

            However, we made the same point: the encode command doesn't return -9 for "-9", so the mvdecode code in #1 cannot work.

            So to sum up: if the variables are indeed numeric

            Code:
            foreach v in Asthmapmed2freq_8 Asthmapmed1freq_8 Asthmapmed1freqex_8 Asthmapmed4freq_8 Asthmapmed3freq_8 {
            destring `v', replace
            mvdecode `v', mv(-9=.a \-10=.b \999=.c)
            }
            Or, if the variables contain non-numeric characters (in which case the previous command won't work: Stata will warn you about the non-numeric characters, so no replace will be possible):

            Code:
            foreach v in Asthmapmed2freq_8 Asthmapmed1freq_8 Asthmapmed1freqex_8 Asthmapmed4freq_8 Asthmapmed3freq_8 {
            encode `v', gen(c`v')
            su c`v' if `v'=="-9"
            local a=r(mean)
            su c`v' if `v'=="-10"
            local b=r(mean)
            su c`v' if `v'=="-999"
            local c=r(mean)
            mvdecode c`v', mv(`a'=.a \`b'=.b \`c'=.c)
            }
            The idea here is to store into locals the "encoded" values of "-9", "-10" and "-999"
            This should work, but I haven't tested it.

            Best,
            Charlie



            • #7
              Perfect! Thank you so much Charlie. The latter option is what I need as there are non-numeric data.



              • #8
                Charlie gives good advice as usual, but there is no need for a loop here.

                Both destring and mvdecode take varlists.

                Also, for multiple encode see multencode from SSC.
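A minimal sketch of the loop-free version, assuming all the variables really are numeric strings. The names s1 and s2 are hypothetical stand-ins for the real variables.

```stata
clear
input str4 s1 str4 s2
"-9"  "999"
"-10" "11"
"999" "-9"
"11"  "-10"
"21"  "21"
end

* destring and mvdecode both accept a varlist, so one call each is enough
destring s1 s2, replace
mvdecode s1 s2, mv(-9=.a \ -10=.b \ 999=.c)
list
```

The same two lines work with a wildcard or a longer varlist in place of s1 s2.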



                • #9
                  Thank you so much Charlie. The latter option is what I need as there are non-numeric data. However, it is generating the following error message, and I can't see what to alter in the -mvdecode- command as it looks correct.

                  Code:
                  . foreach v in Asthmapmed2freq_8 Asthmapmed1freq_8 Asthmapmed1freqex_8 Asthmapmed4freq_8 Asthmapmed3freq_8 Asthmacmed1freq_8 {
                    2. encode `v', gen(c`v')
                    3. su c`v' if `v'=="-9"
                    4. local a=r(mean)
                    5. su c`v' if `v'=="-10"
                    6. local b=r(mean)
                    7. su c`v' if `v'=="999"
                    8. local c=r(mean)
                    9. mvdecode c`v', mv(`a'=.a \`b'=.b \`c'=.c)
                   10. }
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                  cAsthmapme~8 |        154           2           0          2          2
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                  cAsthmapme~8 |        918           1           0          1          1
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                  cAsthmapme~8 |          4           3           0          3          3
                  = found, where \ expected
                  r(198);




                  Here is the information on the variable post -encode-
                  Code:
                  . codebook cAsthmapmed2freq_8
                  
                  ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                  cAsthmapmed2freq_8                                                                                In the past 12 months has your child used any medicines pills puffers or other m
                  ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                  
                                    type:  numeric (long)
                                   label:  cAsthmapmed2freq_8
                  
                                   range:  [1,5]                        units:  1
                           unique values:  5                        missing .:  0/1,184
                  
                              tabulation:  Freq.   Numeric  Label
                                             918         1  -10
                                             154         2  -9
                                               4         3  999
                                              74         4  Irritated
                                              34         5  Regular


                  Any thoughts appreciated!



                  • #10
                    Thanks, Nick, for your comment.
                    As always, however, he manages to give better advice.

                    My assumption is that my code triggers an error on your third variable in the loop because at least one of "-9", "-10" and "-999" does not appear in the original variable. In that case at least one of the locals `a', `b' and `c' is empty (which, for a local, is the same as not existing), and mvdecode then crashes because one of the values to recode is not given.

                    Take a look at Nick's suggestion, to avoid the loop, and thus the use of locals.

                    Best,
                    Charlie



                    • #11
                      Among the take-home messages from the problem in #1, I'd mention converting string variables to numeric before applying the mvdecode command.

                      Just to exemplify:

                      Code:
                      . set obs 5
                      number of observations (_N) was 0, now 5
                      
                      . input str4 mystring
                      
                            mystring
                        1. -9
                        2. -10
                        3. 999
                        4. 11
                        5. 21
                      
                      . des
                      
                      Contains data
                        obs:             5                          
                       vars:             1                          
                       size:            20                          
                      -----------------------------------------------------------------------------------------------------------------------------
                                    storage   display    value
                      variable name   type    format     label      variable label
                      -----------------------------------------------------------------------------------------------------------------------------
                      mystring        str4    %9s                  
                      -----------------------------------------------------------------------------------------------------------------------------
                      Sorted by:
                           Note: Dataset has changed since last saved.
                      
                      . mvdecode mystring, mv(-9=.a \ -10=.b \ 999 = .c)
                          mystring: string variable ignored
                      
                      . destring mystring, replace
                      mystring: all characters numeric; replaced as int
                      
                      . mvdecode mystring, mv(-9=.a \ -10=.b \ 999 = .c)
                          mystring: 3 missing values generated
                      
                      . list
                      
                           +----------+
                           | mystring |
                           |----------|
                        1. |       .a |
                        2. |       .b |
                        3. |       .c |
                        4. |       11 |
                        5. |       21 |
                           +----------+
                      Also under a loop (2 variables):

                      Code:
                      . gen mystring2 = mystring
                      
                      . replace mystring2 = "999" in 5
                      (1 real change made)
                      
                      . list
                      
                           +---------------------+
                           | mystring   mystri~2 |
                           |---------------------|
                        1. |       -9         -9 |
                        2. |      -10        -10 |
                        3. |      999        999 |
                        4. |       11         11 |
                        5. |       21        999 |
                           +---------------------+
                      
                      . foreach v in mystring mystring2 {
                        2. destring `v', replace
                        3. mvdecode `v', mv(-9=.a \ -10=.b \ 999 = .c)
                        4. list
                        5. }
                      mystring: all characters numeric; replaced as int
                          mystring: 3 missing values generated
                      
                           +---------------------+
                           | mystring   mystri~2 |
                           |---------------------|
                        1. |       .a         -9 |
                        2. |       .b        -10 |
                        3. |       .c        999 |
                        4. |       11         11 |
                        5. |       21        999 |
                           +---------------------+
                      mystring2: all characters numeric; replaced as int
                         mystring2: 4 missing values generated
                      
                           +---------------------+
                           | mystring   mystri~2 |
                           |---------------------|
                        1. |       .a         .a |
                        2. |       .b         .b |
                        3. |       .c         .c |
                        4. |       11         11 |
                        5. |       21         .c |
                           +---------------------+
                      Last edited by Marcos Almeida; 23 Nov 2016, 06:13.
                      Best regards,

                      Marcos



                      • #12
                        Thanks Marcos, but how is that different from my first code in #6?
                        Also, it seems that the variables cannot be destringed because they contain non-numeric characters.

                        I've taken another look at my code in #6 and several things are wrong, in addition to the errors I pointed out in #10:
                        - I miscoded the .c missing values, making them correspond to "-999" instead of "999".

                        - The error message reported in #9 is due to mvdecode c`v', mv(`a'=.a \`b'=.b \`c'=.c), while the real code should be:
                        Code:
                        mvdecode c`v', mv("`a'"=.a \"`b'"=.b \"`c'"=.c)
                        However, the worry concerning the missing values (#10) is real. I've found a way to solve it, but it could certainly be improved (although it works).
                        Following the numerical example of Marcos, you could try:


                        Code:
                        clear
                        input str4 mystring
                           -9
                           -10
                           999
                           11
                           21
                        end

                        gen mystring2 = mystring
                        replace mystring2 = "999" in 5
                        replace mystring2 = "9" in 1
                        
                        foreach v in mystring mystring2 {
                            encode `v', gen(c`v')
                            * store the encoded value of each code; empty the local if the code is absent
                            quie su c`v' if `v'=="-9"
                            local a=r(mean)
                            if r(N)==0 {
                                local a
                            }
                            quie su c`v' if `v'=="-10"
                            local b=r(mean)
                            if r(N)==0 {
                                local b
                            }
                            quie su c`v' if `v'=="999"
                            local c=r(mean)
                            if r(N)==0 {
                                local c
                            }

                            * one mvdecode call per combination of codes that are present
                            if !missing("`a'") & !missing("`b'") & !missing("`c'") {
                                mvdecode c`v', mv("`a'"=.a \"`b'"=.b \"`c'"=.c)
                            }
                            if !missing("`a'") & !missing("`b'") & missing("`c'") {
                                mvdecode c`v', mv("`a'"=.a \"`b'"=.b)
                            }
                            if !missing("`a'") & missing("`b'") & missing("`c'") {
                                mvdecode c`v', mv("`a'"=.a)
                            }
                            if !missing("`a'") & missing("`b'") & !missing("`c'") {
                                mvdecode c`v', mv("`a'"=.a \"`c'"=.c)
                            }
                            if missing("`a'") & !missing("`b'") & !missing("`c'") {
                                mvdecode c`v', mv("`b'"=.b \"`c'"=.c)
                            }
                            if missing("`a'") & missing("`b'") & !missing("`c'") {
                                mvdecode c`v', mv("`c'"=.c)
                            }
                            if missing("`a'") & !missing("`b'") & missing("`c'") {
                                mvdecode c`v', mv("`b'"=.b)
                            }
                        }
                        In the second variable there are no "-9" values, so the initial code would return an error message (. is not a valid numlist). The new code leaves the local empty when a code is not observed, and then runs the appropriate mvdecode command.
                        I hope this helps
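One plausible reading of the "= found, where \ expected" error in #9 is that, in Stata, a backslash placed immediately before a left quote (`) suppresses macro expansion, so the locals after each backslash were never substituted. The sketch below is an untested-on-the-real-data alternative that keeps a space after each backslash and builds the mv() rules one at a time, so a code that does not occur is simply skipped. The toy data and the local names rules, letters, letter, code and value are assumptions for illustration.

```stata
clear
input str9 mystring
"-9"
"-10"
"999"
"Regular"
"Irritated"
end

encode mystring, gen(cmystring)

* Collect one "value=.letter" rule per string code that actually occurs.
local rules
local letters a b c
local i = 0
foreach code in -9 -10 999 {
    local ++i
    local letter : word `i' of `letters'
    quietly count if mystring == "`code'"
    if r(N) > 0 {
        quietly summarize cmystring if mystring == "`code'", meanonly
        local value = r(min)
        * Keep a space after the backslash so the macro that follows still expands.
        if "`rules'" == "" {
            local rules `value'=.`letter'
        }
        else {
            local rules `rules' \ `value'=.`letter'
        }
    }
}

* Run mvdecode only if at least one code was found.
if "`rules'" != "" {
    mvdecode cmystring, mv(`rules')
}
list mystring cmystring
```

Building the rule list incrementally avoids having to write one -mvdecode- call per combination of present and absent codes.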



                        • #13
                          Charlie Joyez Indeed, Charlie, you are absolutely right: the code in #11 doesn't differ much from the code you present in #6. It was basically a way to underline what I considered a "take home" message, at least to myself.

                          Due to lack of a data set in #1 (also, in the following messages), I thought it would be helpful to present the way I did when trying to give a reply to the original message: using as few commands as I could envisage, this time presenting a quite simple data set.

                          By the way, thanks for the code in #12 (it deals with missing values) and thanks for taking on the fictitious data I used for didactic purposes.

                          Kind regards,

                          Marcos.
                          Best regards,

                          Marcos
