Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • sort row with string variables?

    Dear All, Is it possible to sort "string" variable? Suppose that the data set is
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str54 temp
    "2089525.IB,2089526.IB,2089527.IB"                      
    "2089502.IB,2089503.IB,2089504.IB"                      
    "2089511.IB,2089512.IB,2089513.IB"                      
    "2089499.IB,2089500.IB,2089501.IB"                      
    "2089516.IB,2089517.IB,2089518.IB"                      
    "2089481.IB,2089482.IB,2089483.IB"                      
    "2089478.IB,2089479.IB,2089480.IB"                      
    "2089442.IB,2089443.IB,2089444.IB"                      
    "2089494.IB,2089495.IB,2089496.IB"                      
    "2089467.IB,2089468.IB,2089469.IB"                      
    "2089439.IB,2089440.IB,2089441.IB"                      
    "2089455.IB,2089456.IB,2089457.IB"                      
    "2089464.IB,2089465.IB,2089466.IB"                      
    "2089489.IB,2089490.IB,2089491.IB,2089492.IB,2089493.IB"
    "2089387.IB,2089388.IB,2089389.IB"                      
    "2089408.IB,2089409.IB,2089410.IB,2089411.IB"           
    "2089445.IB,2089446.IB,2089447.IB,2089448.IB"           
    "2089430.IB,2089431.IB,2089432.IB"                      
    "2089355.IB,2089356.IB,2089357.IB"                      
    "2089412.IB,2089413.IB,2089414.IB"                      
    end
    
    split temp, p(",")
    drop temp
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str54 temp
    "2089525.IB,2089526.IB,2089527.IB"                      
    "2089502.IB,2089503.IB,2089504.IB"                      
    "2089511.IB,2089512.IB,2089513.IB"                      
    "2089499.IB,2089500.IB,2089501.IB"                      
    "2089516.IB,2089517.IB,2089518.IB"                      
    "2089481.IB,2089482.IB,2089483.IB"                      
    "2089478.IB,2089479.IB,2089480.IB"                      
    "2089442.IB,2089443.IB,2089444.IB"                      
    "2089494.IB,2089495.IB,2089496.IB"                      
    "2089467.IB,2089468.IB,2089469.IB"                      
    "2089439.IB,2089440.IB,2089441.IB"                      
    "2089455.IB,2089456.IB,2089457.IB"                      
    "2089464.IB,2089465.IB,2089466.IB"                      
    "2089489.IB,2089490.IB,2089491.IB,2089492.IB,2089493.IB"
    "2089387.IB,2089388.IB,2089389.IB"                      
    "2089408.IB,2089409.IB,2089410.IB,2089411.IB"           
    "2089445.IB,2089446.IB,2089447.IB,2089448.IB"           
    "2089430.IB,2089431.IB,2089432.IB"                      
    "2089355.IB,2089356.IB,2089357.IB"                      
    "2089412.IB,2089413.IB,2089414.IB"                      
    end
    
    split temp, p(,) ignore(.IB) destring g(val)
    reshape long val, i(temp)
    drop if missing(val)
    sort val
    gen wanted= string(val)+".IB"
    keep wanted

    Res.:

    Code:
    . l, sep(0)
    
         +------------+
         |     wanted |
         |------------|
      1. | 2089355.IB |
      2. | 2089356.IB |
      3. | 2089357.IB |
      4. | 2089387.IB |
      5. | 2089388.IB |
      6. | 2089389.IB |
      7. | 2089408.IB |
      8. | 2089409.IB |
      9. | 2089410.IB |
     10. | 2089411.IB |
     11. | 2089412.IB |
     12. | 2089413.IB |
     13. | 2089414.IB |
     14. | 2089430.IB |
     15. | 2089431.IB |
     16. | 2089432.IB |
     17. | 2089439.IB |
     18. | 2089440.IB |
     19. | 2089441.IB |
     20. | 2089442.IB |
     21. | 2089443.IB |
     22. | 2089444.IB |
     23. | 2089445.IB |
     24. | 2089446.IB |
     25. | 2089447.IB |
     26. | 2089448.IB |
     27. | 2089455.IB |
     28. | 2089456.IB |
     29. | 2089457.IB |
     30. | 2089464.IB |
     31. | 2089465.IB |
     32. | 2089466.IB |
     33. | 2089467.IB |
     34. | 2089468.IB |
     35. | 2089469.IB |
     36. | 2089478.IB |
     37. | 2089479.IB |
     38. | 2089480.IB |
     39. | 2089481.IB |
     40. | 2089482.IB |
     41. | 2089483.IB |
     42. | 2089489.IB |
     43. | 2089490.IB |
     44. | 2089491.IB |
     45. | 2089492.IB |
     46. | 2089493.IB |
     47. | 2089494.IB |
     48. | 2089495.IB |
     49. | 2089496.IB |
     50. | 2089499.IB |
     51. | 2089500.IB |
     52. | 2089501.IB |
     53. | 2089502.IB |
     54. | 2089503.IB |
     55. | 2089504.IB |
     56. | 2089511.IB |
     57. | 2089512.IB |
     58. | 2089513.IB |
     59. | 2089516.IB |
     60. | 2089517.IB |
     61. | 2089518.IB |
     62. | 2089525.IB |
     63. | 2089526.IB |
     64. | 2089527.IB |
         +------------+
    
    .

    Comment


    • #3
      The data example in #1 is fine but it's no disrespect to point out that it's hard to spot quickly whether a solution gives the right answer. I worked on the basis that strings are comma-separated, do not contain spaces otherwise, and may vary in number of substrings.

      My guess jumps the other way from Andrew Musau, but just possibly this solution that keeps the wide layout is of interest too.

      Programmers paid according to the reciprocal of the number of lines they write can amuse themselves by telescoping the Mata code into fewer lines.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str54 temp
      "A, C, B"
      "D, C, A"
      "E, B, C, A, F"
      end
      
      
      mata 
      data = st_sdata(., "temp")
      
      for(i = 1; i <= rows(data); i++) { 
          work = data[i]
          work = subinstr(work, ",", " ", .)
          work = tokens(work)'
          work = sort(work, 1)'
          data[i] = invtokens(work)  
      }
      
      st_addvar("str54", "wanted")
      st_sstore(., "wanted", data)
      
      end 
      
      split wanted 
      l wanted? 
      
           +-------------------------------------------------+
           | wanted1   wanted2   wanted3   wanted4   wanted5 |
           |-------------------------------------------------|
        1. |       A         B         C                     |
        2. |       A         C         D                     |
        3. |       A         B         C         E         F |
           +-------------------------------------------------+

      Comment


      • #4
        In #2, I threw away some information as it seems that the elements in each observation in #1 are already sorted. If the wide layout is required, then one can also use the -j()- variable created by reshape.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str54 temp
        "2089525.IB,2089526.IB,2089527.IB"                      
        "2089502.IB,2089503.IB,2089504.IB"                      
        "2089511.IB,2089512.IB,2089513.IB"                      
        "2089499.IB,2089500.IB,2089501.IB"                      
        "2089516.IB,2089517.IB,2089518.IB"                      
        "2089481.IB,2089482.IB,2089483.IB"                      
        "2089478.IB,2089479.IB,2089480.IB"                      
        "2089442.IB,2089443.IB,2089444.IB"                      
        "2089494.IB,2089495.IB,2089496.IB"                      
        "2089467.IB,2089468.IB,2089469.IB"                      
        "2089439.IB,2089440.IB,2089441.IB"                      
        "2089455.IB,2089456.IB,2089457.IB"                      
        "2089464.IB,2089465.IB,2089466.IB"                      
        "2089489.IB,2089490.IB,2089491.IB,2089492.IB,2089493.IB"
        "2089387.IB,2089388.IB,2089389.IB"                      
        "2089408.IB,2089409.IB,2089410.IB,2089411.IB"          
        "2089445.IB,2089446.IB,2089447.IB,2089448.IB"          
        "2089430.IB,2089431.IB,2089432.IB"                      
        "2089355.IB,2089356.IB,2089357.IB"                      
        "2089412.IB,2089413.IB,2089414.IB"                      
        end
        gen id=_n
        split temp, p(,) ignore(.IB) destring g(val)
        reshape long val, i(id) j(which)
        bys id (val): replace which=_n
        drop if missing(val)
        gen strval= string(val)+".IB"
        bys id (which): gen wanted= strval if _n==1
        bys id (which): replace wanted=wanted[_n-1]+ "," + strval if _n>1
        bys id (which): keep if _n==_N
        keep temp wanted
        Res.:

        Code:
        . l
        
             +-----------------------------------------------------------------------------------------------------------------+
             |                                                   temp                                                   wanted |
             |-----------------------------------------------------------------------------------------------------------------|
          1. |                       2089525.IB,2089526.IB,2089527.IB                         2089525.IB,2089526.IB,2089527.IB |
          2. |                       2089502.IB,2089503.IB,2089504.IB                         2089502.IB,2089503.IB,2089504.IB |
          3. |                       2089511.IB,2089512.IB,2089513.IB                         2089511.IB,2089512.IB,2089513.IB |
          4. |                       2089499.IB,2089500.IB,2089501.IB                         2089499.IB,2089500.IB,2089501.IB |
          5. |                       2089516.IB,2089517.IB,2089518.IB                         2089516.IB,2089517.IB,2089518.IB |
             |-----------------------------------------------------------------------------------------------------------------|
          6. |                       2089481.IB,2089482.IB,2089483.IB                         2089481.IB,2089482.IB,2089483.IB |
          7. |                       2089478.IB,2089479.IB,2089480.IB                         2089478.IB,2089479.IB,2089480.IB |
          8. |                       2089442.IB,2089443.IB,2089444.IB                         2089442.IB,2089443.IB,2089444.IB |
          9. |                       2089494.IB,2089495.IB,2089496.IB                         2089494.IB,2089495.IB,2089496.IB |
         10. |                       2089467.IB,2089468.IB,2089469.IB                         2089467.IB,2089468.IB,2089469.IB |
             |-----------------------------------------------------------------------------------------------------------------|
         11. |                       2089439.IB,2089440.IB,2089441.IB                         2089439.IB,2089440.IB,2089441.IB |
         12. |                       2089455.IB,2089456.IB,2089457.IB                         2089455.IB,2089456.IB,2089457.IB |
         13. |                       2089464.IB,2089465.IB,2089466.IB                         2089464.IB,2089465.IB,2089466.IB |
         14. | 2089489.IB,2089490.IB,2089491.IB,2089492.IB,2089493.IB   2089489.IB,2089490.IB,2089491.IB,2089492.IB,2089493.IB |
         15. |                       2089387.IB,2089388.IB,2089389.IB                         2089387.IB,2089388.IB,2089389.IB |
             |-----------------------------------------------------------------------------------------------------------------|
         16. |            2089408.IB,2089409.IB,2089410.IB,2089411.IB              2089408.IB,2089409.IB,2089410.IB,2089411.IB |
         17. |            2089445.IB,2089446.IB,2089447.IB,2089448.IB              2089445.IB,2089446.IB,2089447.IB,2089448.IB |
         18. |                       2089430.IB,2089431.IB,2089432.IB                         2089430.IB,2089431.IB,2089432.IB |
         19. |                       2089355.IB,2089356.IB,2089357.IB                         2089355.IB,2089356.IB,2089357.IB |
         20. |                       2089412.IB,2089413.IB,2089414.IB                         2089412.IB,2089413.IB,2089414.IB |
             +-----------------------------------------------------------------------------------------------------------------+

        Comment


        • #5
          Dear @Andrew Musau and @Nick Cox, Thanks a lot for the suggestions.
          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)

          Comment

          Working...
          X