  • Count don't knows and refuse to answer per observation

    Hi,

    My dataset contains some variables for which the response option is "don_t_know" and "refuse_to_answer".

    I would like to count the number of variables that offer one, the other, or both of these options and then, for each observation, count the number of don't know, refuse to answer, or both. I would use these counts to compute a rate of don't know and refuse to answer per observation, which I would later use to compute the number and rate of don't know and refuse to answer per enumerator.

    I am struggling to create a variable that counts the number of don't knows and refuse to answer per observation. I think that from there on, I can finish the work.

    In the questionnaire, don't know is coded as either don_t_know or -88 (for numeric variables). Refuse to answer is coded as either refuse_to_answer or -99.

    Thank you

    Best,
    Nicolas

  • #2
    Code:
    help egen
    and look at the any* functions.


    • #3
      For future reference, using extended missing values for DK, NR, N/A, etc. is handy when you go to analyze data. You can retain the identity of different types of missing (with labels) and still have those values automatically excluded from calculations (see help missing and help mvdecode).
      Stata/MP 14.1 (64-bit x86-64)
      Revision 19 May 2016
      Win 8.1

      Comment


      • #4
        Thanks for your reply.

        The egen any* functions (anycount, anymatch and anyvalue) work for the -88 and -99 options, although for some reason Stata tells me that my numlist is invalid:

        quietly ds *_text, not
        quietly ds `r(varlist)', has(type numeric)
        egen total_88=anycount(`r(varlist)'), values(integer -88 -99)

        But how would I do it for string variables? I also have Stata 11, so I may be missing some commands :/.

        Thank you


        • #5
          Delete the word integer which is not part of the syntax.
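
With the word integer removed, the command from #4 might look like the sketch below. The toy data and the variable names q1, q2, and q1_text are made up for illustration and are not from the original dataset.

```stata
* Toy data, made up for illustration only
clear
input q1 q2 str12 q1_text
  1 -88 "abc"
-99 -88 "def"
  3   4 "ghi"
end

* Select the numeric variables other than *_text
quietly ds *_text, not
quietly ds `r(varlist)', has(type numeric)

* Count, per observation, how many of them equal -88 or -99
egen total_88 = anycount(`r(varlist)'), values(-88 -99)
list
```

In the help file, integer only describes the kind of numlist that values() expects; it is not typed as part of the command.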

          You can't do much statistical analysis if your data are all recorded as string variables. You should probably convert them to numeric.


          • #6
            I agree that you'll probably want to recode those string variables as numeric, but this will allow you to count the number of "don_t_know" & "refuse_to_answer". In my experience, there may be some variation by variable in the capitalization, exact wording of those categories, and blank spaces before and after, so be careful to check that and clean as necessary.


            Code:
            ds, has(type string)
            foreach var of varlist `r(varlist)' {
                gen `var'_miss=1 if `var'=="don_t_know" | `var'=="refuse_to_answer"
            }
            egen total_str_miss=anycount(*_miss), values(1)
            Stata/MP 14.1 (64-bit x86-64)
            Revision 19 May 2016
            Win 8.1
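
The caution about capitalization and stray blanks can be folded into the count itself. The following is a minimal sketch, not part of the original reply: it normalizes each string value with lower() and trim() before comparing, accumulates the count in a single variable instead of one helper variable per question, and then computes a per-observation rate and a per-enumerator mean. The toy data and the names q1, q2, enumerator, n_dkref, rate_dkref, and enum_rate are assumptions for illustration.

```stata
* Toy data, made up for illustration only
clear
input str20 q1 str20 q2 enumerator
"don_t_know"        "yes"         1
" Refuse_to_answer" "DON_T_KNOW " 1
"no"                "yes"         2
end

* Collect the string variables and count how many there are
quietly ds, has(type string)
local strvars `r(varlist)'
local k : word count `strvars'

* Add 1 for each don't know / refusal, ignoring case and outer blanks
gen n_dkref = 0
foreach var of local strvars {
    replace n_dkref = n_dkref + inlist(lower(trim(`var')), "don_t_know", "refuse_to_answer")
}

* Share of string questions answered don't know / refused, per observation
gen rate_dkref = n_dkref / `k'

* Mean of that rate within each enumerator
bysort enumerator: egen enum_rate = mean(rate_dkref)
list
```

Here the denominator is simply the number of string variables. If only some questions offer these response options, the local strvars would need to be restricted to those variables.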


            • #7
              Thanks Nick.

              I am currently collecting data and I would like to monitor and ensure that enumerators are not using the don't know option too often. Unfortunately, it would take too long to convert each variable into numeric.

              For the sole purpose of monitoring, would there be a way to find out the number of instances the don't know response is repeated across an observation?


              • #8
                "Unfortunately, it would take too long to convert each variable into numeric." Really, that doesn't follow. If you have consistent coding, it could be as little as one command line.


                • #9
                  Hi Nick,

                  I tried the syntax and the last two lines worked, but the first line returned an error message in red, as below:

                  . quietly ds *_text, not
                  variable *_text not found
                  r(111);


                  . quietly ds `r(varlist)', has(type numeric)

                  . egen total_99=anycount(`r(varlist)'), values(88 99)

                  The last two lines counted the number of variables that have "99" don't know/missing for each observation.

                  How can I exclude them from the analysis?


                  • #10
                    In Stata, it is best not to use magic numbers such as 88 and 99 to represent missing values. Stata has system missing and extended missing values, and those are automatically excluded from nearly all analyses. So what you need to do is:

                    Code:
                    quietly ds, has(type numeric)
                    mvdecode `r(varlist)', mv(88 99 = .d)
                    Notes:
                    1. This assumes that you want to convert 88 and 99 to a missing value in all the numeric variables in your data set. If you only want to convert some variables, skip the -ds- command and just list out the variables you want to convert in the -mvdecode- command in place of `r(varlist)'.

                    2. It also assumes you want to treat both 88 and 99 as missing values. It isn't entirely clear from your post if you want to do that with 88. If not, just remove it from the -mv()- option of -mvdecode-.

                    3. I chose the .d missing value because, mnemonically, it could represent "don't know." But you can use any of Stata's missing values. And if you want, you can have separate missing values for 88 and 99 if they represent different types of missingness. Read -help mvdecode- for more information.
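
Note 3 mentions giving 88 and 99 separate missing values. A minimal sketch of that variant follows. The choice of .d for 88 and .r for 99, the toy data, and the names q1, q2, and n_miss are assumptions for illustration.

```stata
* Toy data, made up for illustration only
clear
input q1 q2
 5 88
99  3
88 99
end

* Map 88 to .d and 99 to .r in every numeric variable
quietly ds, has(type numeric)
mvdecode `r(varlist)', mv(88 = .d \ 99 = .r)

* The two kinds of missing stay distinguishable...
count if q1 == .r

* ...but both drop out of summary statistics
summarize q1 q2

* Number of missing answers (of either kind) per observation
egen n_miss = rowmiss(q1 q2)
list
```

Because both codes are now missing values, rowmiss() gives the per-observation count directly, without listing the codes again.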


                    • #11
                      Thank you so much, Professor Clyde!


                      • #12
                        #9 (compare #4)

                        Code:
                        . quietly ds *_text, not
                        variable *_text not found
                        r(111);
                        
                        . quietly ds `r(varlist)', has(type numeric)
                        The first command failed because you don't have variables named *_text (unlike Nicolas in #4).

                        But the second command works because r(varlist) is empty as a consequence (the first ds never finished). With no varlist supplied, the second ds just looks at all variables.
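
One possible way to make that fallback explicit, so that a failed first ds does not pass unnoticed, is sketched below. This is an illustration and not part of the original replies; the toy data, the local name candidates, and the variable names q1, q2, and note are assumptions.

```stata
* Toy data with no *_text variables, made up for illustration only
clear
input q1 q2 str5 note
 1 88 "a"
99  2 "b"
end

* Try the exclusion; if nothing matches *_text, fall back to all variables
capture quietly ds *_text, not
if _rc {
    display as text "no *_text variables found; using all variables"
    quietly ds
}
local candidates `r(varlist)'

* Restrict to numeric variables and count 88/99 per observation
quietly ds `candidates', has(type numeric)
egen total_99 = anycount(`r(varlist)'), values(88 99)
list
```

Storing the first result in a local also avoids relying on whatever happens to be left in r(varlist) after an error.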
