  • Dropping observations of categorical variables

    Hi, I have a set of categorical variables extracted from a survey dataset. Some of the responses are "don't know" or "dna", and I would like to drop those observations, but I am getting error r(111). For example, I want to drop observations where the variable 'qhealth' has the response "dna". I typed the following code:

    drop if qhealth==dna
    But I get error saying dna not found

    I also tried:
    drop if qhealth=="dna"
    But I get error saying type mismatch

    Can anyone help me figure out what I'm doing wrong?

    Thanks in advance!

  • #2
    So, if you run -des qhealth-, I'm pretty sure that you will find that it is not a string variable, but is a numerical variable with a value label attached to it. So you can -label list- that value label and find out what the numerical value corresponding to "dna" is, and then -drop if qhealth == that_numerical_value-. Or, if you want to be fancy, let's say the value label is called qhealth_label. Then you can do it as -drop if qhealth == "dna":qhealth_label-.

    Value labels are one of the basics of Stata data management. Do acquaint yourself with the corresponding sections of the user manual.
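
    The steps above can be sketched as follows. This is only an illustration, run on the auto dataset that ships with Stata rather than on the survey data from #1: there, foreign is a numeric variable with a value label named origin, and those two names stand in for qhealth and its value label, whose name is not given in this thread.

    ```stata
    * Illustration only: foreign/origin stand in for qhealth and its value label
    sysuse auto, clear

    * The "value label" column of the output shows which label is attached
    describe foreign

    * Show the mapping from numeric codes to label text
    label list origin

    * Check how many observations would be affected before dropping anything
    count if foreign == "Foreign":origin

    * Drop by label text; equivalent to -drop if foreign == 1- in this dataset
    drop if foreign == "Foreign":origin
    ```

    Counting before dropping is a cheap way to confirm that the condition selects what you expect.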


    • #3
      Thanks very much! That has been really helpful. Issue solved : )


      • #4
        Hi,

        I have a variable CountryName and I want to drop a few countries from it, but I get a type mismatch error. Can someone please help? Thank you.

        drop if CountryName == "Curacao" & "Cayman Islands" & "Gibraltar" & "St. Martin (French part)" & "Nauru" & "Sint Maarten (Dutch part)" & "Turks and Caicos Islands" & "British Virgin Islands"


        • #5
          You are writing Stata code as if it were ordinary English, so your use of & is incorrect. I presume what you mean is that you want to drop the observation if the CountryName is any of the ones listed in your command. If so:
          Code:
          drop if inlist(CountryName, "Curacao", "Cayman Islands", "Gibraltar", "St. Martin (French part)", "Nauru", ///
               "Sint Maarten (Dutch part)", "Turks and Caicos Islands", "British Virgin Islands")
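
          A small sketch of the same pattern on the auto dataset that ships with Stata, where make is a string variable (the dataset and the car names are stand-ins for CountryName and the country list above). One caveat worth checking against the documentation for your Stata version: inlist() is documented as accepting at most 10 arguments when they are strings, so longer lists need several inlist() calls joined with |.

          ```stata
          * Illustration only: make stands in for CountryName
          sysuse auto, clear

          * Count first to confirm the condition matches the intended observations
          count if inlist(make, "AMC Concord", "AMC Pacer", "AMC Spirit")

          * For lists beyond the string-argument limit, chain calls with |
          drop if inlist(make, "AMC Concord", "AMC Pacer", "AMC Spirit") | ///
                  inlist(make, "Buick Century", "Buick Electra")
          ```

          Each comparison is made against the whole string, so spelling, case and spaces must match the stored values exactly.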


          • #6
            Thank you!


            • #7
              Hi, I have a dataset of millions of observations. I have a categorical variable called "imaging" and want to drop the observations for hundreds of its values at a time. I have identified which label values these are, but it is taking a long time! Instead of typing: drop if inlist(imaging, "500","501",......"1000") is there another way to do this? When I tried "drop if imaging>500", it instead dropped those imaging label values with frequencies over 500. (I think that's what it is doing, anyway.)


              • #8
                If and only if imaging is a string variable or scalar will a command like


                Code:
                drop if inlist(imaging, "500", "1000")
                be legal and if that is true then

                Code:
                drop if imaging > 500
                will be illegal. In any case even if imaging were numeric the effect would not be to drop according to frequencies of values but according to values.

                Some of your text seems to imply that the variable is numeric with value labels, but if so then the first command is illegal.

                In short, #7 is utterly contradictory. Please read and act on https://www.statalist.org/forums/help#stata and give us a data example using dataex to help us help you to work out what is happening.


                • #9
                  Thanks Nick. Sorry, I am new to this forum, but I have read that now. Your advice helped me think through my problem and I have now solved it. Thank you!


                  • #10
                    That's good but telling people what the solution was could be interesting and even useful to others who come across this thread.


                    • #11
                      Of course, sorry. The dataset I am using has 50 variables and over a million observations. The variable imaging1 is a string variable, but I used the encode command to create a numeric variable, imaging. The imaging variable has 1717 different values. When I was constructing the dataex sample to reply to you, I realised what my mistake was...

                      I used the command: drop if imaging > 1716
                      This should only have dropped 50 observations, as the imaging variable goes up to 1717 and only 50 observations have an imaging numeric code of 1717. But instead it dropped over one million.

                      So I have now used the command: drop if imaging > 1716 & imaging < 1718
                      and this only dropped the 50 observations I was expecting.
                      In conclusion, the imaging variable has a lot of missing values, and Stata treats numeric missing values as larger than any nonmissing number, so my earlier commands were dropping those too!
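
                      For anyone who wants to see the effect for themselves, here is a minimal sketch on the auto dataset that ships with Stata, where rep78 is a numeric variable with some missing values (it stands in for imaging; the cutoffs are arbitrary). The first count includes the missing values and the others do not.

                      ```stata
                      * Illustration only: rep78 stands in for imaging
                      sysuse auto, clear

                      * Numeric missing (.) sorts above every nonmissing number,
                      * so this condition is also true where rep78 is missing
                      count if rep78 > 4

                      * Excluding missing values explicitly
                      count if rep78 > 4 & !missing(rep78)

                      * inrange() is false when its first argument is missing,
                      * so it is a compact way to select a closed interval of codes
                      count if inrange(rep78, 4, 5)
                      ```

                      For the range of codes asked about in #7, the analogous call would be something like inrange(imaging, 500, 1000), assuming those are the numeric codes that encode assigned rather than the original string values.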
