Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • identify relations between different sectors within the same id

    Hello,

    here it is the example of my data (I use stata12), the real sample contains more than 60000 observations.

    * Example generated by -dataex-. To install: ssc install dataex

    clear

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str40 name long appln_id str14 sector
    "PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU" 423510260 "UNIVERSITY_FOR"
    "PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU" 423510260 "UNIVERSITY_FOR"
    "PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU" 423510260 "UNIVERSITY_FOR"
    "PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU" 423510260 "UNIVERSITY_FOR"
    "PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU" 423510260 "UNIVERSITY_FOR"
    "PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU" 423510260 "UNIVERSITY_FOR"
    "PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU" 423510260 "UNIVERSITY_FOR"
    "PRAYON" 423510368 "COMPANY"
    "PRAYON" 423510368 "COMPANY"
    "PRAYON" 423510368 "COMPANY"
    "VAN GENECHTEN PACKAGING" 423597672 "COMPANY"
    "VAN GENECHTEN PACKAGING" 423597672 "COMPANY"
    "VAN GENECHTEN PACKAGING" 423597674 "COMPANY"
    "VAN GENECHTEN PACKAGING" 423597674 "COMPANY"
    "BOSCH " 423598235 "COMPANY"
    "BOSCH " 423598235 "COMPANY"
    "EXXONMOBIL CHEMICAL" 423610441 "COMPANY"
    "EXXONMOBIL CHEMICAL" 423610441 "COMPANY"
    "GILEAD" 420926514 "UNIVERSITY"
    "KATHOLIEKE UNIVERSITEIT LEUVEN" 420926514 "COMPANY"
    end


    Many observations of my data have the same sector within their appln_id, but some of them have different sectors within one appln_id. I need to identify all appln_id (id) with different sectors within the same appln_id (e.g. appln_id = 420926514 has "UNIVERSITY" and "COMPANY" sectors).

  • #2
    Palina:
    you may want to try:
    Code:
    egen flag=group(name appln_id sector)
    bysort appln_id : list name appln_id sector flag if flag[_n]!=flag[_N]
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Dear Carlo,

      thank you very much for your answer.

      here it is the outcome, where its hard to identify which one appln_id is different. Or maybe i dont know how to see it?

      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str40 name long appln_id str14 sectoras float flag
      "GILEAD" 420926514 "UNIVERSITY" 3
      "KATHOLIEKE UNIVERSITEIT LEUVEN" 420926514 "COMPANY" 4
      "PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU" 423510260 "UNIVERSITY_FOR" 5
      "PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU" 423510260 "UNIVERSITY_FOR" 5
      "PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU" 423510260 "UNIVERSITY_FOR" 5
      "PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU" 423510260 "UNIVERSITY_FOR" 5
      "PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU" 423510260 "UNIVERSITY_FOR" 5
      "PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU" 423510260 "UNIVERSITY_FOR" 5
      "PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU" 423510260 "UNIVERSITY_FOR" 5
      "PRAYON" 423510368 "COMPANY" 6
      "PRAYON" 423510368 "COMPANY" 6
      "PRAYON" 423510368 "COMPANY" 6
      "VAN GENECHTEN PACKAGING" 423597672 "COMPANY" 7
      "VAN GENECHTEN PACKAGING" 423597672 "COMPANY" 7
      "VAN GENECHTEN PACKAGING" 423597674 "COMPANY" 8
      "VAN GENECHTEN PACKAGING" 423597674 "COMPANY" 8
      "BOSCH " 423598235 "COMPANY" 1
      "BOSCH " 423598235 "COMPANY" 1
      "EXXONMOBIL CHEMICAL" 423610441 "COMPANY" 2
      "EXXONMOBIL CHEMICAL" 423610441 "COMPANY" 2
      end

      Comment


      • #4
        Palina;
        admittedly, I forgot the last code line in my previous reply. Sorry for this:
        Code:
        bysort appln_id: list name appln_id sectoras flag if flag[1]!=flag[_N]
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          Dear Carlo,
          indeed, i see the outcome with combinations, look like this:

          -> appln_id = 4.2e+08
          name appln_id sectoras flag |
          |---------------------------------------------------------------|
          1. | GILEAD 4.2e+08 UNIVERSITY 3 |
          2. | KATHOLIEKE UNIVERSITEIT LEUVEN 4.2e+08 COMPANY 4 |
          +---------------------------------------------------------------+


          But, when i have in my data set more then 60000 obs, with your code probably i ll have many like this appln_id. Should i identify them one by one or maybe i can have a list of flag observations somehow in stata?

          Best,
          Palina

          Comment


          • #6
            Palina:
            you may want to try:
            Code:
            bysort appln_id: g check=1 if flag[1]!=flag[_N]
            replace check=0 if check==.
            list appln_id sectoras flag check if check==1
            Kind regards,
            Carlo
            (Stata 19.0)

            Comment


            • #7
              Dear Carlo,

              thank you VERY MUCH, its working perfectly

              Best,
              Palina

              Comment


              • #8
                I like Carlo's approach very much, but I think it can be slimmed down a bit:

                Code:
                 
                 bysort appln_id (flag): g check = flag[1] != flag[_N]  list appln_id sectoras if check
                The main principle here is that 1 or 0 variables can be created directly http://www.stata.com/support/faqs/da...rue-and-false/

                See also http://www.stata.com/support/faqs/da...ions-in-group/

                Comment

                Working...
                X