Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Drop observations if value of one variable is found in another variable of any observation in the dataset

    Dear Statalist,

    I have a dataset containing the information about the consolidated accounts of different firms. Each company is assigned a unique Company-ID in a string variable. In a second string variable, the owner of the respective firm is listed with his Company-ID. I would like to drop all observations, whose Owner-ID is found as the Company ID in another observation. I tried to find a way to do that with the help of "if", however, I didn't find a way to compare two variables across observations. Any help on this is highly appreciated.

    Example:
    Company-ID Revenue Owner-ID
    A1 10 D4
    B2 11 E5
    C3 12 A1
    In the example, the third observation should be dropped, because the owner-ID equals the Company-ID of the first observation.

    Many thanks in advance
    Steven
    Last edited by Steven Buck; 15 Oct 2018, 05:07.

  • #2
    Please use dataex in the future to present data examples. This is essentially the same problem as the one in a recent thread

    https://www.statalist.org/forums/for...in-the-dataset

    Here is how to solve the problem using Nick Cox's suggestion #2 from the thread. For other approaches, refer to the link.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str5(CompanyID Revenue OwnerID)
    "A1" "10" "D4"
    "B2" "11" "E5"
    "C3" "12" "A1"
    end
    
    levelsof CompanyID, local(cid)
    levelsof OwnerID, local(oid)
    local diff: list cid - oid
    local todrop: list cid- diff
    local todrop2 : subinstr local todrop " " ",,, all
    drop if inlist( OwnerID, `todrop2')
    Result
    Code:
    . l
    
         +------------------------------+
         | Compan~D   Revenue   OwnerID |
         |------------------------------|
      1. |       A1        10        D4 |
      2. |       B2        11        E5 |
         +------------------------------+
    Added in edit: Note that inlist() accepts up to eight arguments, so for a larger list, you will need the "|" separator.
    Last edited by Andrew Musau; 15 Oct 2018, 05:55.

    Comment


    • #3
      Thanks for your swift response, Andrew. Unfortunately, when using the levelsof command, I receive the error that "macro substitution results in line that is too long". Your suggested method in the referenced thread, however, worked for me. Thank you very much for your help!

      Comment


      • #4
        Glad to hear that you were able to solve your problem. There is a correction to the code in #2. The only reason that the following does not throw in an error is because there is only one element in the macro

        local todrop2 : subinstr local todrop " " ",,, all
        I will expand the data example to include more than 1 element

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str5(CompanyID Revenue OwnerID)
        "A1" "10" "D4"
        "B2" "11" "E5"
        "C3" "12" "A1"
        "A1" "10" "B2"
        end
        levelsof CompanyID, local(cid)
        levelsof OwnerID, local(oid)
        local diff: list cid - oid
        local todrop: list cid- diff
        di `"`todrop'"'
        Displaying contents of the local macro

        Code:
        . di `"`todrop'"'
        `"A1"' `"B2"'
        reveals that I need the following to separate elements by a comma

        Code:
        local todrop: subinstr local todrop `" "' `", "', all
        di `"`todrop'"'
        resulting in

        Code:
        . di `"`todrop'"'
        `"A1"', `"B2"'
        So finally, the following should work

        Code:
        drop if inlist(OwnerID, `todrop')

        Comment


        • #5
          Utilizing -levelsof- as in Andrew’s solution is very interesting, but going along with the -inlist-'s limitation, which is quite vulnerable to be touched for this kind of issue. Below is a concise way, which is safe from such limitation.
          Code:
          expand 2, gen(ex)
          replace OwnerID = CompanyID if ex
          bys OwnerID (ex): drop if ex[_N]
          drop ex
          Then, Steve, what is your solution? Could you please share it out here? (By the way, for the future post, please use - dataex- to give a small sample of your data. Read more on FAQ, section 12.2.)

          Comment

          Working...
          X