  • Drop observations corresponding to other variable

    I have two variables, Names1 and Names2, both containing names. A simplified example is pasted below:
    Names1 Names2
    AA CC
    BB AA
    CC HH
    DD FF
    EE GG
    FF
    GG
    HH

    I would like to drop the names in Names1 that are not in Names2. In this example, I want to drop BB, DD and EE. I would appreciate it if someone could provide guidance on doing so.

  • #2
    I'm not sure exactly what you want to do here. It is not possible to simply "drop" a name from Names1. You can replace the name with a missing value, or you can drop the entire observation. The code below assumes you want the latter.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str2(names1 names2)
    "AA" "CC"
    "BB" "AA"
    "CC" "HH"
    "DD" "FF"
    "EE" "GG"
    "FF" ""  
    "GG" ""  
    "HH" ""  
    end
    
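    * Build a lookup file holding the distinct non-missing values of names2,
    * renamed to names1 so it can serve as the merge key
    preserve
    keep names2
    drop if missing(names2)
    rename names2 names1
    duplicates drop
    tempfile keepers
    save `keepers'
    
    * Back to the original data: keep only observations whose names1 is in the lookup
    restore
    merge m:1 names1 using `keepers', keep(match)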
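
    To see exactly which observations keep(match) discards, the same merge can be run without that option and the unmatched rows inspected. This is only a sketch on the example data above; it rebuilds the data and the `keepers' file so that it runs on its own.

    ```stata
    clear
    input str2(names1 names2)
    "AA" "CC"
    "BB" "AA"
    "CC" "HH"
    "DD" "FF"
    "EE" "GG"
    "FF" ""
    "GG" ""
    "HH" ""
    end

    * Build the lookup file of distinct non-missing names2 values, as above
    preserve
    keep names2
    drop if missing(names2)
    rename names2 names1
    duplicates drop
    tempfile keepers
    save `keepers'
    restore

    * Merge without keep(match) so that unmatched observations stay visible
    merge m:1 names1 using `keepers'
    tabulate _merge

    * These are the observations keep(match) would drop, names2 values included
    list names1 names2 if _merge != 3, noobs clean
    ```

    The listed rows are whole observations, so the names2 values stored in them go too.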


    • #3
      Thank you for the code. However, it does not quite perform the required task (it drops observations from the second column as well). Let me rephrase the problem. I have two groups of companies, for example:
      Group1 Group2
      Apple Microsoft
      DELL HP
      Microsoft Nike
      Nike Shell
      Addidas Google
      Kellogs Addidas
      HP
      Shell
      Google
      I only want to retain the companies in Group1 that are also in Group2, i.e. I want to remove Apple, DELL and Kellogs from Group1 only, since they are not in Group2 (I don't want to change any companies in Group2). I would appreciate help or code for this kind of filtering.


      • #4
        OK, we had a miscommunication because you are using the word -drop- to mean something different from what -drop- means in Stata. In Stata, when you -drop- something you eliminate the entire observation. It sounds like what you want is to simply replace those with missing values.
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str2(names1 names2)
        "AA" "CC"
        "BB" "AA"
        "CC" "HH"
        "DD" "FF"
        "EE" "GG"
        "FF" ""  
        "GG" ""  
        "HH" ""  
        end
        
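        * Build a lookup file holding the distinct non-missing values of names2,
        * renamed to names1 so it can serve as the merge key
        preserve
        keep names2
        drop if missing(names2)
        rename names2 names1
        duplicates drop
        tempfile keepers
        save `keepers'
        
        * Back to the original data: keep every original observation, and skip
        * using-only rows so no extra observations are added
        restore
        merge m:1 names1 using `keepers', keep(master match)
        * Blank out names1 where it has no match among the names2 values
        replace names1 = "" if _merge != 3
        list, noobs clean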
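
        For a short list of names, the same replacement can be sketched without a temporary file or a merge, which also leaves the observations in the order they were entered. This is an alternative illustration, not the method used above: it relies on -levelsof- to collect the distinct non-missing values of names2 into a local macro, and the helper variable in_names2 is a name made up for this example. With a very large number of distinct names, the merge approach above is likely the safer choice.

        ```stata
        clear
        input str2(names1 names2)
        "AA" "CC"
        "BB" "AA"
        "CC" "HH"
        "DD" "FF"
        "EE" "GG"
        "FF" ""
        "GG" ""
        "HH" ""
        end

        * Collect the distinct non-missing values of names2 in a local macro
        levelsof names2, local(keepers)

        * Flag every names1 value that appears among those values
        * (in_names2 is a hypothetical helper variable)
        generate byte in_names2 = 0
        foreach v of local keepers {
            replace in_names2 = 1 if names1 == `"`v'"'
        }

        * Blank out unmatched names1 values; names2 and the row order are untouched
        replace names1 = "" if !in_names2
        drop in_names2
        list, noobs clean
        ```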
