
  • Drop all observations with negative values

    Hi, I have been trying to delete all negative values from my dataset at once (instead of doing it variable by variable) using the two approaches below, but I end up with 0 observations in each case.

    APPROACH 1

    foreach v of var * {
    drop if `v' < 0
    }

    APPROACH 2
    egen rmin=rowmin(_all)
    drop if rmin<0


    Thanks in advance!
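
    Both approaches drop an entire observation as soon as any one variable is negative in it, so ending up with 0 observations suggests that every observation holds a negative value somewhere. A minimal sketch for checking this before dropping anything, using made-up variables x, y and z (hypothetical, not from the poster's data):

    ```stata
    * Toy data: every row has a negative value in at least one variable
    clear
    input x y z
     1 -1  3
    -2  4  5
     6  7 -8
    end

    * Row minimum across the variables (rowmin() ignores missing values)
    egen rmin = rowmin(x y z)

    * Number of observations that -drop if rmin < 0- would remove
    quietly count if rmin < 0
    display "rows with any negative value: " r(N) " of " _N

    * Number of negative values in each variable
    foreach v of varlist x y z {
        quietly count if `v' < 0
        display "`v': " r(N) " negative value(s)"
    }
    ```

    If the first count equals the number of observations, either approach will empty the dataset.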


  • #2
    Or:
    the issue is that -drop- removes whole observations, so it can't do what you want without also deleting the non-negative values that those observations hold in other variables.
    It's probably better to flag the negative values in your dataset:
    Code:
    foreach var of varlist A-Z {
    g flag=1 if `var' <0 
    }
    and then rule those observations out from future analyses via -if-:

    Code:
    sum A if flag!=1
    Kind regards,
    Carlo
    (Stata 18.0 SE)



    • #3
      Carlo's code will fail the second time around the loop, as the variable flag already exists. He probably meant something more like this:

      Code:
      gen flag = 0 
      foreach var of varlist A-Z {
          replace flag = 1 if `var' <0 
      }
      Note that you can do

      Code:
      egen flag = rowmin(A-Z) 
      replace flag = flag < 0
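
      A small worked check of the two versions above, assuming three made-up variables a, b and c in place of A-Z (hypothetical data, including some missing values):

      ```stata
      clear
      input a b c
       1  2  3
      -1  5  6
       7  . -9
       4  .  2
      end

      * Loop version: start at 0, switch to 1 on any negative value
      gen flag = 0
      foreach var of varlist a-c {
          replace flag = 1 if `var' < 0
      }

      * egen version: the row minimum is negative exactly when some value is
      egen flag2 = rowmin(a-c)
      replace flag2 = flag2 < 0

      * Both flags should agree in every observation
      assert flag == flag2
      list
      ```

      Missing values do not trigger the flag in either version: Stata treats numeric missing as larger than any number, so `. < 0` is false, and rowmin() skips missing values.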



      • #4
        Often when people on this forum say they want to "drop" certain values of variables, it turns out that they mean they want to replace those values with the system missing value (dot). If that's what is meant here:

        Code:
        foreach v of varlist _all {
            replace `v' = . if `v' < 0
        }



        • #5
          Nick:
          good point.
          Note for myself: remember to test everything before posting.

          PS: crossed with Clyde's reply, who raised a substantive issue (-drop- meant as -replace-).
          Kind regards,
          Carlo
          (Stata 18.0 SE)



          • #6
            Even better than

            Code:
            drop

            or

            Code:
            replace
            is just

            ignore!

            i.e. a separate analysis ignoring observations with negative values; you might well change your mind.



            • #7
              Nick:
              paraphrasing a famous UK rock band's hit: "Learning to flag" (and then ignore the flagged observations for future analysis).
              Kind regards,
              Carlo
              (Stata 18.0 SE)



              • #8
                Thanks all for your alternative solutions.

                Carlo: given the issue with -drop- that you pointed out, I have no option but to do it for every variable if I go ahead with that strategy. Thanks!



                • #9
                  I have some categorical variables that contain negative responses (usually indicative of a non-response), and I would therefore like to remove these from my dataset. After reading this post, I am unsure whether the code in #3, #4 or #6 is most suited. Ignore may be a good option, but I'm not sure how to code that.
                  (Please advise if it is ok to continue with an old post of the same topic or if it's best to start a new post).



                  • #10
                    #9 is related to the thread title and to previous questions, which is fine: lapse of time is not an issue, as the FAQ Advice explains at 16.2.

                    Re-opening a thread by yourself or others is always allowed, and encouraged when any one has something relevant to add, say by reporting another solution, an update of a program, or a very similar question. Lapse of time is often not important: for example, it's fine to announce an update of a program in the same thread a few years after the original post. A new post always bumps a thread temporarily to the top of the list, so that additions can be noticed and read in context.
                    As in previous posts, looking for the row minimum across those variables is a good way to find observations with negative values. Thereafter a drop implies confidence that those observations will never be of interest or use; flagging implies not being so confident.
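
                    One way the "ignore" option asked about in #9 might be coded is to flag once and then exclude the flagged observations command by command. A minimal sketch, assuming two made-up variables health and income (hypothetical names and values):

                    ```stata
                    clear
                    input health income
                     1  50
                    -4  42
                     3 -10
                     2  38
                    end

                    * Flag observations with a negative value in any listed variable
                    egen rmin = rowmin(health income)
                    gen byte flag = rmin < 0
                    drop rmin

                    * "Ignore": leave the data intact and exclude flagged rows per command
                    summarize income if !flag
                    tabulate health if !flag
                    ```

                    Nothing is deleted or overwritten, so the choice can be reversed later simply by leaving off the -if- qualifier. To ignore negatives in one variable only, a condition such as `if income >= 0 & !missing(income)` does the same job without a flag.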



                    • #11
                      Thank you Nick Cox. I think #4 is what I'm after, thanks Clyde Schechter



                      • #12
                        I used the code in #4
                        Code:
                        foreach var of varlist _all {
                            replace `var' = . if `var' < 0
                        }
                        to remove the negative values in my dataset. However, I still see negative data in my data editor, and I also see negative values when graphing variables. This makes me think the code I've used is not working. For clarification, I have a panel dataset based on a survey over multiple years. Due to the nature of some questions, their responses have negative values: for example, (-10) non-responding person, (-4) refused/not stated, (-3) don't know. An example of positive responses for a health question is (1) excellent, (2) very good, (3) good, (4) fair, (5) poor. I want to remove these negative values from my dataset. I also note that I am not writing over the original dataset; I am making a new version of it, so I am happy to drop all negative values.

                        Thank you in advance.

                        Regards,
                        Chris



                        • #13
                          I don't see anything wrong with the code. Please post back with an example of the data that exhibits the problem you are encountering. Be sure to use -dataex- to do that.
                          If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

                          I do have one thought: this code is usable only with numeric variables, and you are looping over varlist _all. If the data set contains any string variable, as soon as Stata reaches the first of those, it will declare a type mismatch, throw an error message to that effect, and halt. The rest of the variables will not be processed. So perhaps the negative values you are seeing are in variables that were never reached because of this. But you should have seen an error message alerting you to the type mismatch problem if that is the case and you don't report that as happening.
                          Last edited by Clyde Schechter; 27 Dec 2019, 21:53.
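
                          A minimal sketch of one way to keep the loop from halting on string variables: restrict it to numeric variables with -ds-. The toy variables name, x and y are hypothetical.

                          ```stata
                          clear
                          input str5 name x y
                          "a"   1 -3
                          "b" -10  2
                          "c"   4  5
                          end

                          * ds leaves the names of the numeric variables in r(varlist)
                          ds, has(type numeric)

                          * Loop over numeric variables only, so no type mismatch can occur
                          foreach v of varlist `r(varlist)' {
                              replace `v' = . if `v' < 0
                          }

                          list
                          ```

                          An alternative with the same effect is to loop over _all and test each variable with -capture confirm numeric variable- before the -replace-.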



                          • #14
                            Dear Clyde. Thank you for your reply. Below is a small sample of one of the variables I'm using with negative values - the meaning of which is explained in #12. The following example data includes id and responses to a question (based on a Likert scale from 1-10) by the respondent and their partner. This relates to data in year 4 of an annual survey now into its 18th year. The main issue is that when I graph these data the graph includes negative values and I only want to show positive values. I want to use a graph that helps me see a relationship in this data but haven't been successful yet. I'm not sure if it's because of the nature of the data or due to me not using appropriate graphs. Anyway, one problem at a time.

                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input float(id imp p_imp)
                            100018  5  1
                            100019  1  5
                            100023 10 10
                            100024 10 10
                            100029  5  1
                            100030  1  5
                            100038  5  0
                            100039  0  5
                            100042  2  4
                            100043  4  2
                            100048 -8 -8
                            100049 -8 -8
                            100052  2  7
                            100053  7  2
                            100055 -8  3
                            100057  3  1
                            end



                            • #15
                              Chris:
                              just to pluck a tentative reply out of thin air:
                              Code:
                              twoway (scatter imp p_imp if p_imp>0, sort)
                              Kind regards,
                              Carlo
                              (Stata 18.0 SE)
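
                              In the example data from #14, one observation (id 100055) has imp = -8 alongside a positive p_imp, and another (id 100038) has p_imp = 0. A condition on p_imp > 0 alone would therefore keep that negative imp and exclude the zero. A sketch that conditions on both variables, assuming that 0 is a valid response (the post describes the scale as 1-10, so this is an assumption):

                              ```stata
                              * Example data copied from #14
                              clear
                              input float(id imp p_imp)
                              100018  5  1
                              100019  1  5
                              100023 10 10
                              100024 10 10
                              100029  5  1
                              100030  1  5
                              100038  5  0
                              100039  0  5
                              100042  2  4
                              100043  4  2
                              100048 -8 -8
                              100049 -8 -8
                              100052  2  7
                              100053  7  2
                              100055 -8  3
                              100057  3  1
                              end

                              * Observations where both responses are non-negative
                              * (13 of the 16 example observations)
                              count if imp >= 0 & p_imp >= 0

                              * Plot only those observations; the data themselves are unchanged
                              twoway scatter imp p_imp if imp >= 0 & p_imp >= 0
                              ```

                              If 0 is not a valid response, changing both conditions to `> 0` excludes it as well.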
