  • how to drop observation if var x == any value of var y

    Dear All,
    suppose
    Code:
    clear
    input id var1 var2
           1   12    
           2   c5    
           3   14   12
           4   12  
           5   oi
    I want the first and fourth observations to be dropped, because their var1 value appears somewhere in the var2 column. var1 and var2 are string variables. A somewhat unusual problem, I guess, but I hope someone can help. Of course the real data is much larger, so I cannot just use drop if var1 == "12" or something similar. However, any var2 value necessarily appears somewhere in the var1 column (and that is the observation that should be dropped).

    Thanks in advance!

    Best,


    Last edited by Jannic Cutura; 30 Jan 2017, 06:36.

  • #2
    (Please note that your example data suffers from various problems.)

    That being said: If the number of possible values of var2 is relatively small, less than or equal to 10, -inlist()- could accommodate the entire list of values from var2.
    Code:
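    levelsof var2, local(vlist)
    * levelsof wraps string values in compound quotes, so the macro function
    * is used here; subinstr("`vlist'", ...) would choke on the embedded quotes.
    * Assumes no var2 value contains a space.
    local vlist : subinstr local vlist " " ",", all
    gen byte to_drop = inlist(var1, `vlist')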
    If you have a larger number of values for var2, looping over all the possible values of var2 would work:
    Code:
    levelsof var2, local(vlist)
    gen byte to_drop = 0
    foreach v of local vlist {
       replace to_drop = (var1 == "`v'") if !to_drop
    }
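
    A minimal runnable sketch of the loop, on a cleaned-up version of the toy data from #1. The str2 storage types and the empty strings for missing var2 are assumptions, since the original input block does not specify them.

    ```stata
    clear
    * Assumed layout: var1 and var2 are short strings; "" stands for a missing var2
    input id str2 var1 str2 var2
    1 "12" ""
    2 "c5" ""
    3 "14" "12"
    4 "12" ""
    5 "oi" ""
    end

    levelsof var2, local(vlist)      // distinct non-missing values of var2
    gen byte to_drop = 0
    foreach v of local vlist {
        replace to_drop = 1 if var1 == "`v'"
    }
    drop if to_drop                  // removes ids 1 and 4
    list
    ```

    With many distinct var2 values this makes one pass over the data per value, so it can be slow on large datasets.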


    • #3
      Dear Mike,
      thanks!

      Indeed, there are far more than 10 different values for var2. Your procedure works just fine, though!

      Case solved


      • #4
        Mike Lacy One more question, actually: is it possible to perform your operation "by date"? Suppose there is a third variable, date, and var1 uniquely identifies an observation within a given date, but not across dates. Can I tell Stata to look for var1 values in the var2 column, but only among observations with the same date?
        Thanks again for your help!



        • #5
          The more complicated the problem gets, the more it looks like a problem for merge: merge the dataset with a copy of itself, matching on identical values and identical dates.
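
          A minimal sketch of that merge idea, not tested on real data. It assumes a numeric variable named date and string variables var1 and var2 as in #1; the toy data and the tempfile name hits are made up for illustration.

          ```stata
          clear
          * Hypothetical toy data: "12" appears in var2 on date 1 only
          input id date str2 var1 str2 var2
          1 1 "12" ""
          2 1 "c5" ""
          3 1 "14" "12"
          4 2 "12" ""
          5 2 "oi" ""
          end

          * Build a lookup of the distinct (date, var2) pairs, renamed to match var1
          preserve
          keep date var2
          drop if var2 == ""
          duplicates drop
          rename var2 var1
          tempfile hits
          save `hits'
          restore

          * An observation matches if its var1 occurs as a var2 value on the same date
          merge m:1 date var1 using `hits', keep(master match)
          gen byte to_drop = (_merge == 3)
          drop _merge
          sort id
          ```

          Here only id 1 is flagged: id 4 also has var1 equal to "12", but on a date where "12" never occurs in var2. This replaces the nested loops with a single merge.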


          • #6
            Mike Lacy One more question, actually: is it possible to perform your operation "by date"? Suppose there is a third variable, date, and var1 uniquely identifies an observation within a given date, but not across dates. Can I tell Stata to look for var1 values in the var2 column, but only among observations with the same date?
            Basically:


            Code:
            levelsof date, local(datelist)
            for each date of local datelist {
                if `date' == date {
                levelsof var2, local(vlist)
                gen byte to_drop = 0
                foreach v of local vlist {
                   replace to_drop = (var1 == "`v'") if !to_drop
                }
            }
            but this doesn't work:
            Code:
            macro length exceeded
            and it probably would not have worked even if the macro length had not been exceeded x)



            Thanks again for your help!


            • #7
              Nick Cox
              Yes, this sounds very interesting. Basically, I am trying to translate a SAS cleaning procedure into Stata. I am not experienced with SAS at all, but from what I can tell it uses an SQL script to create two copies of the data and then matches them in some smart way.

              Nick, are you aware of documentation that discusses how merge can be used in cases like the one I describe, similar to the material on deleting duplicates you pointed me to?

              Thanks in advance!

              Best,


              • #8
                Your code in #6 is not going to work for other reasons too.

                1. for each is illegal. Should be foreach.

                2. Your if statement will not behave differently each time round the loop. See also

                FAQ: if command versus if qualifier (J. Wernow, 4/05)
                I have an if or while command in my program that only seems to evaluate the first observation. What's going on?
                http://www.stata.com/support/faqs/programming/if-command-versus-if-qualifier/

                3. Second time around the loop, generate will fail as the variable already exists.
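
                Points 2 and 3 can be illustrated with a small sketch. The variable names x, flag_cmd and flag_qual are made up for the illustration.

                ```stata
                clear
                input x
                1
                2
                3
                end

                * if command: the condition is evaluated once, using the first observation
                if x == 1 {
                    gen byte flag_cmd = 1        // x[1] is 1, so this runs for ALL observations
                }

                * if qualifier: the condition is evaluated observation by observation
                gen byte flag_qual = 1 if x == 1

                * a second generate of an existing variable fails; capture keeps the do-file running
                capture gen byte flag_qual = 0
                local rc = _rc
                display "return code from second generate: `rc'"
                list
                ```

                flag_cmd is 1 in every row, flag_qual is 1 only where x is 1, and the second generate stops with return code 110 (variable already defined).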

                See also

                FAQ: Making foreach go through all values of a variable (N. J. Cox, 8/05)
                Is there a way to tell Stata to try all values of a particular variable in a foreach statement without specifying them?
                http://www.stata.com/support/faqs/data-management/try-all-values-with-foreach/

                http://www.stata.com/support/faqs/da...ach/index.html

                This is likely to be slow. Code not tested.

                Code:
                gen byte to_drop = 0
                egen gdate = group(date)      // map dates to 1, 2, ..., r(max)
                su gdate, meanonly

                forval d = 1/`r(max)' {
                    * distinct var2 values observed on this date only
                    levelsof var2 if gdate == `d', local(vlist)
                    foreach v of local vlist {
                        replace to_drop = (var1 == "`v'") if gdate == `d' & !to_drop
                    }
                }
                I'd see if I can find time (meetings coming up) to suggest code for a merge approach.


                • #9
                  Nick Cox Yes, this works, but you are right: it would run for years, I am afraid! I will try to implement the merge approach you mentioned.


                  • #10
                    Observations with the same id number are assigned ranks. I want to keep percentage1 com11 pet1 when rank is 1, and so forth, but after the first command Stata drops every rank that is not 1. Can someone please guide me? Thank you.
                    if rank == 1 keep percentage1 com11 pet1
                    if rank == 2 keep percentage2 com12 pet2
                    if rank == 3 keep percentage3 com13 pet3
                    [Attached image: var.png]



                    • #11
                      You have two problems here. The first is that you have the syntax of -if- qualifiers wrong. But, more important, what you are trying to do is not possible at all. When you -keep- or -drop- a list of variables, the variables are affected in their entirety, and there is no way to restrict that to only some observations. Another way of saying it is that Stata data sets are rectangular in shape: a variable (column) is either wholly present or wholly absent. There are no partial "columns" in a Stata data set.

                      What you can do is replace some of the values of a variable with missing values.

                      It sounds like what you want to do is:
                      Code:
                      forvalues i = 1/3 {
                          foreach x in percentage com1 pet {
                              replace `x'`i' = . if rank != `i'   // blank out values that belong to other ranks
                          }
                      }
                      Notice, by the way, that the -if- clause must come after the command to work in this way. (There is also a legal Stata construction where -if- precedes a command, but it does something entirely different: it does not select the observations to which the command will apply.)
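
                      An alternative sketch, not part of the suggestion above: if the goal is one column holding each observation's own-rank value, the same -if- qualifier pattern can collect the values into a single new variable. The toy data and the new variable name percentage are assumptions for illustration.

                      ```stata
                      clear
                      * Hypothetical data: one row per rank within id
                      input id rank percentage1 percentage2 percentage3
                      1 1 10 20 30
                      1 2 11 21 31
                      1 3 12 22 32
                      end

                      gen percentage = .
                      forvalues i = 1/3 {
                          * copy percentage`i' only into the rows whose rank is `i'
                          replace percentage = percentage`i' if rank == `i'
                      }
                      list id rank percentage
                      ```

                      The data set stays rectangular: every row has a value in percentage, taken from the column that matches its rank.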

                      In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

                      Last edited by Clyde Schechter; 17 Feb 2023, 22:21.
