  • Problem with DROP command

    Hello,

    I have a dataset with 742 observations, 14 of which have duplicates. For some of these, I need to drop a row based on the values of two variables. But whenever I use the command
    drop if hsn==1027 & A43_bqacgprima== "1. EE (YES)"
    Stata reports a type mismatch. I am using this command for the first time, so I am pretty sure I am making a mistake. Can someone please help me with this?

    Also, I need to drop some rows where two observations are exact duplicates of each other. In that case, how can I remove one row while keeping the other intact?

    Fahmida

  • #2
    If you want to remove duplicate observations, then run

    Code:
    duplicates tag, gen(dup)
    tab dup
    duplicates drop
    drop dup

    Rerun the code to check whether any duplicates still exist.

    Code:
    help duplicates


    • #3
      Thanks for your answer. But I need to drop the particular row where hsn==1027 & A43_bqacgprima== "1. EE (YES)". There is another row with the same hsn, but its A43_bqacgprima is different. I need to keep that row and drop the one with A43_bqacgprima== "1. EE (YES)". How can I do that?

      Fahmida


      • #4
        Your variable A43_bqacgprima is not a string variable. It is a numeric variable with value labels. You need to compare it to the number that corresponds to "EE (YES)" which is probably 1.
        Code:
        generate todrop = hsn==1027 & A43_bqacgprima== 1
        list if todrop
        // if what's listed is what you wanted dropped
        drop if todrop
        drop todrop


        • #5
          A43_bqacgprima is a string variable. It is labelled as 1. EE (YES) in the dataset. There are no other numbers or answers for this variable.


          • #6
            Hi Fahmida,

            Originally posted by Fahmida Huq
            A43_bqacgprima is a string variable. It is labelled as 1. EE (YES) in the dataset. There are no other numbers or answers for this variable.
            Stata does not allow a string variable to have value labels. The A43 variable is either a string (without labels) or a numeric variable (that might or might not have labels). William is suggesting that your variable is numeric with value labels, and that your code, as written, asks Stata to compare it with a literal string value. You can test whether A43_bqacgprima is a string using the following code:

            Code:
            confirm string variable A43_bqacgprima
            You should get no output if A43_bqacgprima is indeed a string. If you get a red error message followed by r(7), then your variable is not a string.
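
            For use inside a do-file, the same check could be sketched with -capture-, which suppresses the error and leaves the return code in _rc. This is only an illustration; the messages are made up:

            Code:
            capture confirm string variable A43_bqacgprima
            if _rc == 0 {
                display "A43_bqacgprima is a string variable"
            }
            else if _rc == 7 {
                display "A43_bqacgprima is not a string variable"
            }
            else {
                display "unexpected return code: " _rc
            }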

            If your variable is numeric, then you have to write your -drop- statement using the value of the variable (not the label). One way to find out which value a label corresponds to is to look in the Variables Manager (Data -> Variables Manager). Another way is to tabulate the variable twice, once with the labels shown and once with them hidden:

            Code:
            tab A43_bqacgprima
            tab A43_bqacgprima, nolabel
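
            If the variable turns out to be numeric, here is a possible sketch for looking up the code behind the label and using it in -drop-. It assumes a value label is attached to A43_bqacgprima and that the label text is exactly "1. EE (YES)"; the local macro name lbl is made up for illustration. Run it from a do-file:

            Code:
            // name of the value label attached to the variable (empty if none)
            local lbl : value label A43_bqacgprima
            // show the number-to-text mapping
            label list `lbl'
            // refer to the value by its label text instead of hard-coding the number
            drop if hsn == 1027 & A43_bqacgprima == "1. EE (YES)":`lbl'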
            Best;


            • #7
              Thanks for the clarification. I got no output after running the code, so it is indeed a string variable. How can I solve the problem now? I am a new Stata user, and I am really having trouble solving this. Thanks in advance.

              Fahmida.


              • #8
                Perhaps hsn is actually a string variable, in which case 1027 should be enclosed in quotation marks.
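
                A minimal sketch of that case, assuming both hsn and A43_bqacgprima are stored as strings (-describe- reports the storage types, so it is worth running first):

                Code:
                describe hsn A43_bqacgprima
                list if hsn == "1027" & A43_bqacgprima == "1. EE (YES)"
                // if what's listed is what you want dropped
                drop if hsn == "1027" & A43_bqacgprima == "1. EE (YES)"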


                • #9
                  Yes, William is right, it could be that the other variable is a string.

                  Fahmida, can you please share a snippet of your dataset using -dataex-? This would have avoided the whole turmoil. If you have concerns about sharing your data, you can limit what dataex outputs by running the following code:

                  Code:
                  dataex hsn A43_bqacgprima in 1/5
                  This code creates a snippet of your dataset covering only the first 5 observations and only these 2 variables, which you can copy and paste here on Statalist. What matters is that dataex preserves variable types and lets the people trying to help you know exactly what they are dealing with.

                  Best;


                  • #10
                    Thanks for the advice. I'll share a snippet of my dataset next time. And William was right: my other variable had been converted to a string variable, which I hadn't noticed. Thanks for the suggestions; I learnt how to approach a dataset from different perspectives.

                    Fahmida
