
  • ideas for: extracting all unique items, across all observations, from a string variable (itself being a list) - into a separate file?

    Dear Statalist,

    I am trying to find a way to create a data file in which every unique item in the comma-separated string variable "intersections", pooled across all observations, becomes its own observation, for later merging with another dataset.
    I do not need to keep any of the other variables.

    I am not sure what this operation is called, so I do not know what to search for to find examples, and I would greatly appreciate any suggestions. I am using Stata 17 on a PC.
    Thanks!

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str5 source str76 term_name str18 term_id double(adjusted_p_value negative_log10_of_adjusted_p_val) int(term_size query_size intersection_size effective_domain_size) strL intersections
    "GO:BP" "tube morphogenesis"                             "GO:0035239" .00848689240280593 2.07125130379293 764 47 11 17810 "HMOX1,APLN,APOLD1,ANGPTL4,F3,JUNB,GPX1,HK2,ID1,SDC4,AIMP1"
    "GO:BP" "regulation of endothelial cell proliferation"   "GO:0001936"  .0243630191781557 1.61326889292077 110 47  5 17810 "HMOX1,APLN,F3,ATP5IF1,AIMP1"                              
    "GO:BP" "regulation of endothelial cell differentiation" "GO:0045601"  .0423562955869072 1.37308202959653  33 85  4 17810 "APOLD1,S1PR3,ID1,VCL"                                     
    end

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str5 source str76 term_name str18 term_id double(adjusted_p_value negative_log10_of_adjusted_p_val) int(term_size query_size intersection_size effective_domain_size) strL intersections
    "GO:BP" "tube morphogenesis"                             "GO:0035239" .00848689240280593 2.07125130379293 764 47 11 17810 "HMOX1,APLN,APOLD1,ANGPTL4,F3,JUNB,GPX1,HK2,ID1,SDC4,AIMP1"
    "GO:BP" "regulation of endothelial cell proliferation"   "GO:0001936"  .0243630191781557 1.61326889292077 110 47  5 17810 "HMOX1,APLN,F3,ATP5IF1,AIMP1"                              
    "GO:BP" "regulation of endothelial cell differentiation" "GO:0045601"  .0423562955869072 1.37308202959653  33 85  4 17810 "APOLD1,S1PR3,ID1,VCL"                                     
    end
    
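    * Number of items per row = number of commas + 1
    gen toexpand= length(intersections) - length(subinstr(intersections, ",", "", .)) + 1
    * One copy of each row per item
    expand toexpand
    * Within each term, copy k takes the k-th item (commas turned into spaces so -word()- can split).
    * Note: this assumes term_id uniquely identifies the original rows.
    bys term_id: gen wanted= word(subinstr(intersections, ",", " ", .), _n)
    * Collapse to one observation per distinct item, with its frequency in _freq
    contract wanted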
    Result:

    Code:
    . l, sep(0)
    
         +-----------------+
         |  wanted   _freq |
         |-----------------|
      1. |   AIMP1       2 |
      2. | ANGPTL4       1 |
      3. |    APLN       2 |
      4. |  APOLD1       2 |
      5. | ATP5IF1       1 |
      6. |      F3       2 |
      7. |    GPX1       1 |
      8. |     HK2       1 |
      9. |   HMOX1       2 |
     10. |     ID1       2 |
     11. |    JUNB       1 |
     12. |   S1PR3       1 |
     13. |    SDC4       1 |
     14. |     VCL       1 |
         +-----------------+
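
    Since the original question asks for a separate file to merge with later, the contracted data could be saved along these lines. This is only a sketch that continues from the data in memory after -contract-; the file name unique_items and the commented-out merge against a dataset called other_data are hypothetical and would need to be adapted.

    ```stata
    * Continues from the contracted data above (variables: wanted, _freq).
    * "unique_items" is a hypothetical file name; choose your own.
    drop _freq                  // keep only the list of unique items
    compress                    // shrink storage types where possible
    save unique_items, replace

    * Hypothetical later merge. Assumes the other dataset (here "other_data")
    * holds the same identifiers in a variable that is also named wanted.
    * use other_data, clear
    * merge m:1 wanted using unique_items
    ```

    If the frequency of each item is useful later, skip the -drop _freq- line so it is saved alongside the identifiers.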



    • #3
      Andrew Musau THANK YOU!
      You just saved me several hours of work, thank you so much!!!
