  • identify peers

    Dear All, I found this question here (http://bbs.pinggu.org/forum.php?mod=...=1#pid54489763). The data are:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int stkcd str10 reptdt str1 name str12 university str2 investor1 str12 graduate1
    2 "2008-06-25" "A" ""           "BB" "MIT" 
    2 "2008-06-25" "B" "Harvard"    "AA" "UCLA"
    2 "2008-06-25" "C" ""           "AA" "UCLA"
    2 "2008-06-25" "D" ""           "AA" "UCLA"
    2 "2008-06-25" "E" ""           "AA" "UCLA"
    2 "2008-06-25" "F" "MIT"        "AA" "UCLA"
    2 "2008-06-25" "G" ""           "AA" "UCLA"
    3 "2008-07-27" "A" ""           "BB" "MIT" 
    3 "2008-07-27" "B" "Boston"     "AA" "UCLA"
    3 "2008-07-27" "C" ""           "AA" "UCLA"
    3 "2008-07-27" "D" ""           "AA" "UCLA"
    3 "2008-07-27" "E" ""           "AA" "UCLA"
    3 "2008-07-27" "F" "Vanderbilt" "AA" "UCLA"
    3 "2008-07-27" "G" ""           "AA" "UCLA"
    end
    For each `stkcd', if any element in `university' also appears in `graduate1', then return 1 for this `stkcd', 0 otherwise. Thus, for stkcd=2, the returned value is 1, and for stkcd=3, the returned value is 0. Any suggestion is highly appreciated.
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
    I changed your data example slightly to include more than two groups.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int stkcd str10 reptdt str1 name str12 university str2 investor1 str12 graduate1
    2 "2008-06-25" "A" ""           "BB" "MIT"
    2 "2008-06-25" "B" "Harvard"    "AA" "UCLA"
    2 "2008-06-25" "C" ""           "AA" "UCLA"
    2 "2008-06-25" "D" ""           "AA" "UCLA"
    2 "2008-06-25" "E" ""           "AA" "UCLA"
    2 "2008-06-25" "F" "MIT"        "AA" "UCLA"
    2 "2008-06-25" "G" "UCLA"           "AA" "UCLA"
    3 "2008-07-27" "A" ""           "BB" "MIT"
    3 "2008-07-27" "B" "Boston"     "AA" "UCLA"
    3 "2008-07-27" "C" ""           "AA" "UCLA"
    3 "2008-07-27" "D" ""           "AA" "UCLA"
    3 "2008-07-27" "E" ""           "AA" "UCLA"
    3 "2008-07-27" "F" "Vanderbilt" "AA" "UCLA"
    3 "2008-07-27" "G" ""           "AA" "UCLA"
    4 "2008-07-27" "A" ""           "BB" "MIT"
    4 "2008-07-27" "B" "Harvard"    "AA" "UCLA"
    4 "2008-07-27" "C" ""           "AA" "UCLA"
    4 "2008-07-27" "D" ""           "AA" "UCLA"
    4 "2008-07-27" "E" ""           "AA" "UCLA"
    4 "2008-07-27" "F" "MIT"       "AA" "UCLA"
    4 "2008-07-27" "G" ""           "AA" "UCLA"
    4 ""           ""  ""           ""   ""    
    end
    
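    * Collect the distinct values of university and graduate1 within each stkcd
    levelsof stkcd, local(id)
    foreach i in `id' {
        levelsof university if stkcd==`i', local(uid`i')
        levelsof graduate1 if stkcd==`i', local(gid`i')
        * present`i' holds the values found in both lists
        local diff`i': list uid`i' - gid`i'
        local present`i': list uid`i' - diff`i'
        local p`i': subinstr local present`i' `" "' `", "', all
        local n`i': word count `p`i''
    }
    
    * Build a comma-prefixed list of the stkcd values with at least one match
    foreach i in `id' {
        if `n`i''>0 {
            local wanted "`wanted' `i'"
            local wanted: subinstr local wanted " " ",", all
        }
    }
    * 0.1 is a dummy first value (stkcd is an integer), followed by the list in `wanted'
    gen varwanted= inlist(stkcd, 0.1`wanted')
    list, clean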
    The resulting output:

    Code:
    . list, clean
    
           stkcd       reptdt   name   university   invest~1   gradua~1   varwan~d  
      1.       2   2008-06-25      A                      BB        MIT          1  
      2.       2   2008-06-25      B      Harvard         AA       UCLA          1  
      3.       2   2008-06-25      C                      AA       UCLA          1  
      4.       2   2008-06-25      D                      AA       UCLA          1  
      5.       2   2008-06-25      E                      AA       UCLA          1  
      6.       2   2008-06-25      F          MIT         AA       UCLA          1  
      7.       2   2008-06-25      G         UCLA         AA       UCLA          1  
      8.       3   2008-07-27      A                      BB        MIT          0  
      9.       3   2008-07-27      B       Boston         AA       UCLA          0  
     10.       3   2008-07-27      C                      AA       UCLA          0  
     11.       3   2008-07-27      D                      AA       UCLA          0  
     12.       3   2008-07-27      E                      AA       UCLA          0  
     13.       3   2008-07-27      F   Vanderbilt         AA       UCLA          0  
     14.       3   2008-07-27      G                      AA       UCLA          0  
     15.       4   2008-07-27      A                      BB        MIT          1  
     16.       4   2008-07-27      B      Harvard         AA       UCLA          1  
     17.       4   2008-07-27      C                      AA       UCLA          1  
     18.       4   2008-07-27      D                      AA       UCLA          1  
     19.       4   2008-07-27      E                      AA       UCLA          1  
     20.       4   2008-07-27      F          MIT         AA       UCLA          1  
     21.       4   2008-07-27      G                      AA       UCLA          1  
     22.       4                                                                 1
    Last edited by Andrew Musau; 03 Nov 2018, 07:31.
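
    The set logic in the loop above can also be written with the macro-list intersection operator, which avoids building an argument list for -inlist-. The following is a minimal sketch, not the code posted above: it runs on a six-row subset of the example data, and the names u, g, common, n, and varwanted2 are illustrative.

    ```stata
    * Six-row subset of the example data (rows A, B, F of stkcd 2 and 3)
    clear
    input int stkcd str1 name str12 university str12 graduate1
    2 "A" ""           "MIT"
    2 "B" "Harvard"    "UCLA"
    2 "F" "MIT"        "UCLA"
    3 "A" ""           "MIT"
    3 "B" "Boston"     "UCLA"
    3 "F" "Vanderbilt" "UCLA"
    end

    gen byte varwanted2 = 0
    quietly levelsof stkcd, local(ids)
    foreach i of local ids {
        * Distinct non-missing values of each variable within this stkcd
        quietly levelsof university if stkcd == `i', local(u)
        quietly levelsof graduate1  if stkcd == `i', local(g)
        * Values present in both lists
        local common : list u & g
        local n : list sizeof common
        if `n' > 0 {
            quietly replace varwanted2 = 1 if stkcd == `i'
        }
    }

    * Expected values as stated in #1
    assert varwanted2 == 1 if stkcd == 2
    assert varwanted2 == 0 if stkcd == 3
    ```

    Because the flag is set with -replace- inside the loop, the number of matching groups is not bounded by the number of arguments that -inlist- accepts.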

    • #3
      Dear Andrew, Thanks a lot for this helpful suggestion.
      Ho-Chuan (River) Huang
      Stata 19.0, MP(4)

      • #4
        A similar issue has been discussed in this thread. Again, I find that for this kind of problem -inlist-, while an interesting choice, is somewhat complicated and is constrained by its limit on the number of values. By contrast, -expand- offers an easier solution without that limitation.
        Code:
        expand 2, gen(ex)
        replace graduate1 = university if ex
        bys stkcd graduate1 (ex): gen tag = (ex[1] != ex[_N]) if !missing(graduate1)
        by stkcd: egen wanted = max(tag)
        drop if ex
        drop ex tag
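
        To see why comparing ex[1] with ex[_N] works: after -expand-, each copy (ex == 1) carries its university value in graduate1, so a stkcd-graduate1 group that contains both an original and a copy corresponds to a value present in both variables. Below is a minimal sketch on a six-row subset of the example in #1, with the expected values taken from that post; the subset is chosen only for illustration.

        ```stata
        * Six-row subset of the example data (rows A, B, F of stkcd 2 and 3)
        clear
        input int stkcd str1 name str12 university str12 graduate1
        2 "A" ""           "MIT"
        2 "B" "Harvard"    "UCLA"
        2 "F" "MIT"        "UCLA"
        3 "A" ""           "MIT"
        3 "B" "Boston"     "UCLA"
        3 "F" "Vanderbilt" "UCLA"
        end

        * Duplicate every row; ex == 1 marks the copies
        expand 2, gen(ex)
        * In the copies, graduate1 now holds the university value
        replace graduate1 = university if ex
        * Within stkcd-graduate1, sorted by ex: first and last differ only if
        * the group mixes originals (ex == 0) and copies (ex == 1)
        bysort stkcd graduate1 (ex): gen byte tag = (ex[1] != ex[_N]) if !missing(graduate1)
        * Spread the flag to every row of the stkcd
        by stkcd: egen wanted = max(tag)
        drop if ex
        drop ex tag

        * Expected values as stated in #1
        assert wanted == 1 if stkcd == 2
        assert wanted == 0 if stkcd == 3
        ```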

        • #5
          Dear Romalpa, Thank you so much for this concise solution.
          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)

          • #6
            Dear all,

            I have a very similar question.

            Based on the code suggested by Romalpa Akzo, I obtained the following output:

            Code:
            clear
            input int stkcd str10 reptdt str1 name str12 university str2 investor1 str12 graduate1 int wanted
            2 "2008-06-25" "A" ""           "BB" "MIT" 1
            2 "2008-06-25" "B" "Harvard"    "AA" "UCLA" 1
            2 "2008-06-25" "C" ""           "AA" "UCLA" 1
            2 "2008-06-25" "D" ""           "AA" "UCLA" 1
            2 "2008-06-25" "E" ""           "AA" "UCLA" 1
            2 "2008-06-25" "F" "MIT"        "AA" "UCLA" 1
            2 "2008-06-25" "G" ""           "AA" "UCLA" 1 
            3 "2008-07-27" "A" ""           "BB" "MIT" 0 
            3 "2008-07-27" "B" "Boston"     "AA" "UCLA" 0 
            3 "2008-07-27" "C" ""           "AA" "UCLA" 0 
            3 "2008-07-27" "D" ""           "AA" "UCLA" 0
            3 "2008-07-27" "E" ""           "AA" "UCLA" 0
            3 "2008-07-27" "F" "Vanderbilt" "AA" "UCLA" 0
            3 "2008-07-27" "G" ""           "AA" "UCLA" 0
            end
            I wonder whether, in this example, instead of returning 1 for all observations within the same stkcd whenever any element of graduate1 also appears in university, there is a way to return 1 only for the particular observations whose graduate1 value appears in university.

            River Huang: I'm sorry for taking over your thread.

            Thank you very much for your help.

            Vinh

            • #7
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input int stkcd str10 reptdt str1 name str12 university str2 investor1 str12 graduate1
              2 "2008-06-25" "A" ""           "BB" "MIT"
              2 "2008-06-25" "B" "Harvard"    "DD" "Stanford"
              2 "2008-06-25" "C" ""           "AA" "Boston"
              2 "2008-06-25" "D" ""           "AA" "UCLA"
              2 "2008-06-25" "E" ""           "CC" "UCSD"
              2 "2008-06-25" "F" "MIT"        "EE" "Yale"
              2 "2008-06-25" "G" "UCLA"       "AA" "UCLA"
              3 "2008-07-27" "A" ""           "BB" "MIT"
              3 "2008-07-27" "B" "Boston"     "AA" "UCLA"
              3 "2008-07-27" "C" ""           "AA" "UCLA"
              3 "2008-07-27" "D" ""           "AA" "UCLA"
              3 "2008-07-27" "E" ""           "AA" "UCLA"
              3 "2008-07-27" "F" "Vanderbilt" "AA" "UCLA"
              3 "2008-07-27" "G" ""           "AA" "UCLA"
              4 "2008-07-27" "A" ""           "BB" "MIT"
              4 "2008-07-27" "B" "Harvard"    "AA" "UCLA"
              4 "2008-07-27" "C" ""           "AA" "UCLA"
              4 "2008-07-27" "D" ""           "AA" "UCLA"
              4 "2008-07-27" "E" ""           "AA" "UCLA"
              4 "2008-07-27" "F" "MIT"        "AA" "UCLA"
              4 "2008-07-27" "G" ""           "AA" "UCLA"
              end
              
              expand 2, gen(ex)
              replace graduate1 = university if ex
              bys stkcd: gen tag= university==graduate1 & !missing(graduate1) & ex
              bys stkcd graduate1: egen wanted= max(tag)
              drop if ex
              drop ex tag
              Result:

              Code:
              . l, sepby(stkcd)
              
                   +-----------------------------------------------------------------------+
                   | stkcd       reptdt   name   university   invest~1   gradua~1   wanted |
                   |-----------------------------------------------------------------------|
                1. |     2   2008-06-25      C                      AA     Boston        0 |
                2. |     2   2008-06-25      A                      BB        MIT        1 |
                3. |     2   2008-06-25      B      Harvard         DD   Stanford        0 |
                4. |     2   2008-06-25      G         UCLA         AA       UCLA        1 |
                5. |     2   2008-06-25      D                      AA       UCLA        1 |
                6. |     2   2008-06-25      E                      CC       UCSD        0 |
                7. |     2   2008-06-25      F          MIT         EE       Yale        0 |
                   |-----------------------------------------------------------------------|
                8. |     3   2008-07-27      A                      BB        MIT        0 |
                9. |     3   2008-07-27      B       Boston         AA       UCLA        0 |
               10. |     3   2008-07-27      F   Vanderbilt         AA       UCLA        0 |
               11. |     3   2008-07-27      D                      AA       UCLA        0 |
               12. |     3   2008-07-27      G                      AA       UCLA        0 |
               13. |     3   2008-07-27      E                      AA       UCLA        0 |
               14. |     3   2008-07-27      C                      AA       UCLA        0 |
                   |-----------------------------------------------------------------------|
               15. |     4   2008-07-27      A                      BB        MIT        1 |
               16. |     4   2008-07-27      G                      AA       UCLA        0 |
               17. |     4   2008-07-27      B      Harvard         AA       UCLA        0 |
               18. |     4   2008-07-27      C                      AA       UCLA        0 |
               19. |     4   2008-07-27      D                      AA       UCLA        0 |
               20. |     4   2008-07-27      E                      AA       UCLA        0 |
               21. |     4   2008-07-27      F          MIT         AA       UCLA        0 |
                   +-----------------------------------------------------------------------+
              
              .

              • #8
                Andrew's code does what Vinh needs. One small addition: the line generating tag is not necessary. Instead, we could simply use
                Code:
                bys stkcd graduate1: egen wanted = max(ex) if !missing(graduate1)
                As it might be relevant, I would also note that the information Vinh needs is already available in the code suggested in #4: it is held in the variable tag of that code.
                Code:
                expand 2, gen(ex)
                replace graduate1 = university if ex
                
                bys stkcd graduate1 (ex): gen tag = (ex[1] != ex[_N]) if !missing(graduate1)
                * or use the -egen wanted = max(ex)- line shown above
                
                drop if ex
                drop ex
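
                As a quick check of the per-observation variant, the sketch below applies the one-line -egen- version to the five stkcd == 2 rows named A, B, D, F, and G from the example in #7 and compares the result with the listing shown there. It is an illustration on a reduced dataset, not a replacement for the code above.

                ```stata
                * Five rows of stkcd 2 taken from the example in #7
                clear
                input int stkcd str1 name str12 university str12 graduate1
                2 "A" ""        "MIT"
                2 "B" "Harvard" "Stanford"
                2 "D" ""        "UCLA"
                2 "F" "MIT"     "Yale"
                2 "G" "UCLA"    "UCLA"
                end

                expand 2, gen(ex)
                replace graduate1 = university if ex
                * A group gets 1 when it contains at least one copy, that is, when
                * its graduate1 value also occurs as a university value in the stkcd
                bysort stkcd graduate1: egen wanted = max(ex) if !missing(graduate1)
                drop if ex
                drop ex

                * Values as shown in the listing in #7
                assert wanted == 1 if inlist(name, "A", "D", "G")
                assert wanted == 0 if inlist(name, "B", "F")
                ```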

                • #9
                  Dear Vinh, That's fine with me. I hope your problem has been solved.

                  Ho-Chuan (River) Huang
                  Stata 19.0, MP(4)

                  • #10
                    Dear Andrew and Akzo,

                    Thank you very much for your help. That's exactly what I'd like to have.

                    Cheers,

                    Vinh
