  • split items in a variable

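    Dear All, I have the following data (in Chinese). Each string in `A' lists the food items, then how long the store has been in business (e.g. 已營業5年, "in business for 5 years"; 大於/小於 mean "more than"/"less than"), and then whether tableware is provided (有提供餐具, "tableware provided"; 無提供餐具, "no tableware provided"):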
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte store str73 A str39 food str20 business str15 tableware
    1 "大腸麵線(1),已營業大於20年,有提供餐具"                  "大腸麵線(1)"                         "已營業大於20年" "有提供餐具"
    2 "大腸麵線(2),已營業16-17年"                                       "大腸麵線(2)"                         "已營業16-17年"    ""               
    3 "大腸麵線(3)、滷肉飯,已營業5年,無提供餐具"             "大腸麵線(3)、滷肉飯"             "已營業5年"        "無提供餐具"
    4 "日式料理,已營業小於7年,有提供餐具"                      "日式料理"                            "已營業小於7年"  "有提供餐具"
    5 "日式料理(2),韓式料理,早點,已營業7年,有提供餐具" "日式料理(2),韓式料理,早點" "已營業7年"        "有提供餐具"
    6 "蛋餅,飲料,早點"                                                  "蛋餅,飲料,早點"                ""                     ""               
    7 "漢堡,早點,已營業6-7年"                                         "漢堡,早點"                         "已營業6-7年"      ""               
    8 "蛋餅,早點,有提供餐具"                                         "蛋餅,早點"                         ""                     "有提供餐具"
    end
    The goal is to split the items in variable `A' (each observation has a different number of items, usually separated by "、" or ",") into three variables, food, business, and tableware, as shown in the example above.

    As far as I can tell, in general: (1) the last wanted variable, `tableware', is the last item of `A', contains "餐具" (tableware), and is separated from the preceding `business' item by ","; (2) the second-to-last wanted variable, `business', ends with "年" (years) and is separated from the preceding `food' item by ","; (3) the first wanted variable, `food', unlike `business' and `tableware', may consist of more than one item of `A'. Either `business' or `tableware' (or both) may also be absent, as in stores 2, 6, 7, and 8. Any suggestions are highly appreciated!
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
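    You can introduce your own delimiters based on how the items start and end (business items start with 已 and end with 年; tableware items start with 有 or 無 and end with 餐具), split on those delimiters, and then regroup the pieces.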

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte store str73 A
    1 "大腸麵線(1),已營業大於20年,有提供餐具"                
    2 "大腸麵線(2),已營業16-17年"                                      
    3 "大腸麵線(3)、滷肉飯,已營業5年,無提供餐具"            
    4 "日式料理,已營業小於7年,有提供餐具"
    5 "日式料理(2),韓式料理,早點,已營業7年,有提供餐具"
    6 "蛋餅,飲料,早點"                                                
    7 "漢堡,早點,已營業6-7年"                                        
    8 "蛋餅,早點,有提供餐具"                                        
    end
    
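    * Insert "." before the first character of business (已) and tableware (有/無) items
    foreach char in 已 有 無 {
        replace A = ustrregexra(A, "`char'", "\.`char'")
    }
    * Insert "." after the last character of business (年) and tableware (餐具) items
    foreach char in 年 餐具 {
        replace A = ustrregexra(A, "`char'", "`char'\.")
    }
    * Turn the original separators into spaces, then split on the inserted dots
    replace A = ustrregexra(A, ",|、", " ")
    split A, p(.) g(wanted)
    * One row per piece; classify each piece as food (1), business (2) or tableware (3)
    reshape long wanted, i(store) j(which)
    replace which = 1
    replace which = 2 if ustrregexm(wanted, "年")
    replace which = 3 if ustrregexm(wanted, "餐具")
    * Drop empty pieces and go back to one row per store
    drop if missing(trim(wanted))
    reshape wide wanted, i(store) j(which)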
    Results:

    Code:
    . l store wanted*, sep(0)
    
         +------------------------------------------------------------------+
         | store                      wanted1          wanted2      wanted3 |
         |------------------------------------------------------------------|
      1. |     1                 大腸麵線(1)    已營業大於20年   有提供餐具 |
      2. |     2                 大腸麵線(2)     已營業16-17年              |
      3. |     3          大腸麵線(3) 滷肉飯         已營業5年   無提供餐具 |
      4. |     4                    日式料理     已營業小於7年   有提供餐具 |
      5. |     5   日式料理(2) 韓式料理 早點         已營業7年   有提供餐具 |
      6. |     6               蛋餅 飲料 早點                               |
      7. |     7                   漢堡 早點       已營業6-7年              |
      8. |     8                   蛋餅 早點                     有提供餐具 |
         +------------------------------------------------------------------+
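A possible alternative, sketched here as an assumption rather than taken from the thread: instead of splitting, pull out the business and tableware pieces with regular expressions and delete them from `A', leaving the food items. Unlike the split approach, this keeps the original separators (、 and ,) inside `food', as in the desired output in the question. It assumes business items always look like 已營業...年 and that tableware, when present, is the last item and reads 有提供餐具 or 無提供餐具. The separator class also includes the fullwidth comma ， in case the raw data use it. On the eight example stores it should reproduce the food, business, and tableware columns from the question.

```stata
* Sketch (assumption): business = 已營業...年; tableware = 有/無提供餐具 as the last item
clear
input byte store str73 A
1 "大腸麵線(1),已營業大於20年,有提供餐具"
2 "大腸麵線(2),已營業16-17年"
3 "大腸麵線(3)、滷肉飯,已營業5年,無提供餐具"
4 "日式料理,已營業小於7年,有提供餐具"
5 "日式料理(2),韓式料理,早點,已營業7年,有提供餐具"
6 "蛋餅,飲料,早點"
7 "漢堡,早點,已營業6-7年"
8 "蛋餅,早點,有提供餐具"
end

* Tableware: the final item, 有提供餐具 or 無提供餐具 (empty string if absent)
gen tableware = ustrregexs(0) if ustrregexm(A, "[有無]提供餐具$")

* Business: 已營業 followed by anything up to 年, never crossing a separator
gen business = ustrregexs(0) if ustrregexm(A, "已營業[^,，、]*年")

* Food: what remains after deleting both pieces together with the separator before each
gen food = ustrregexra(A, "[,，、]?(已營業[^,，、]*年|[有無]提供餐具$)", "")

list store food business tableware, sep(0)
```

Because each piece is matched by its own pattern, a store with no business or tableware item simply gets an empty string for that variable, and no reshaping is needed.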



    • #3
      Dear Andrew, many thanks for the very helpful suggestion. In addition, I'd like to generate the following variable from `wanted2' (probably using a regular expression), keeping only the number of years:
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte store str8 wanted2a
      1 "20年"   
      2 "16-17年"
      3 "5年"    
      4 "7年"    
      5 "7年"    
      6 ""        
      7 "6-7年"  
      8 ""        
      end
      Any suggestions? Thanks.
      Ho-Chuan (River) Huang
      Stata 19.0, MP(4)



      • #4
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str36 wanted2
        "已營業大於20年"
        "已營業16-17年"   
        "已營業5年"       
        "已營業小於7年" 
        "已營業7年"       
        ""                    
        "已營業6-7年"     
        ""                    
        end
        
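        * group 1: the leading non-digit text (e.g. 已營業 or 已營業大於)
        * group 2: the digits/hyphens plus the character that follows them (年)
        g wanted= ustrregexs(2) if ustrregexm(wanted2, "([^\d]+)([\d-\d]+[^\d+])")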
        Results:

        Code:
        . l, sep(0)
        
             +--------------------------+
             |        wanted2    wanted |
             |--------------------------|
          1. | 已營業大於20年      20年 |
          2. |  已營業16-17年   16-17年 |
          3. |      已營業5年       5年 |
          4. |  已營業小於7年       7年 |
          5. |      已營業7年       7年 |
          6. |                          |
          7. |    已營業6-7年     6-7年 |
          8. |                          |
             +--------------------------+
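If only the number of years is needed, a shorter pattern may be enough. This sketch is an alternative to the pattern above, not taken from the thread: it matches a run of digits and hyphens followed by 年 and keeps the whole match with ustrregexs(0). It assumes the years are written with ASCII digits, as in the example. Like the solution above, it drops the 大於/小於 ("more than"/"less than") qualifier.

```stata
clear
input str36 wanted2
"已營業大於20年"
"已營業16-17年"
"已營業5年"
"已營業小於7年"
"已營業7年"
""
"已營業6-7年"
""
end

* [0-9-]+ : one or more ASCII digits or hyphens; 年 : the character for "years"
* Observations without a match (the empty strings) get an empty string
gen wanted = ustrregexs(0) if ustrregexm(wanted2, "[0-9-]+年")
list, sep(0)
```

The hyphen is placed last inside the brackets so that it is read as a literal character rather than as a range.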



        • #5
          Dear Andrew, problem solved. Thanks!
          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)

