  • Matching with the same (Chinese) words?

    Dear all, I found this question here (https://bbs.pinggu.org/thread-10628284-1-1.html). Suppose that a simplified dataset is:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str57 landholder str6 region
    "漳州乌山旅游开发有限公司"                      "长沙"
    "长沙市白果园街区建设开发有限责任公司"    "太原"
    "葫芦岛市龙湾中央商务区管理委员会"          "衡水"
    "潮安县民政局"                                        "长春"
    "开县镇东街道办事处教育办公室"                "河北"
    "东莞松山湖高新技术产业开发区管理委员会" ""      
    "贵州省瓮安草塘中学"                               ""      
    "长春市双阳区人民政府奢岭街道办事处"       ""      
    "冯原镇人民政府"                                     ""      
    end
    The question is:
    1. If any value of "region" appears within a given observation of "landholder", set d = 1; otherwise set d = 0.
    2. For example, the first observation of "landholder" contains none of the "region" values, so d = 0.
    3. The second observation of "landholder" contains "长沙" (Changsha), which is the first value of "region", so d = 1.
    4. Similarly, the eighth observation of "landholder" contains "长春" (Changchun), which is the fourth value of "region", so d = 1.
    Any suggestions are highly appreciated.
    Last edited by River Huang; 14 Jun 2021, 19:54.
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
    Code:
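    * Build a regex alternation "val1|val2|..." from the distinct non-empty values of region
    levelsof region, separate(|) clean local(regions)
    * d = 1 if landholder contains at least one of those values, 0 otherwise
    gen d = ustrregexm(landholder,"`regions'")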
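For readers new to the regex idiom, here is a minimal loop-based sketch of the same test. It assumes the example data from the question are in memory and that the code above has already created `d`; `d_loop` is a hypothetical name used only for the comparison. Instead of building one pattern, it checks each region value with ustrpos().

```stata
* Loop-based equivalent of the regex one-liner (sketch; d_loop is a hypothetical name)
* Without -clean-, levelsof wraps each value in compound quotes, which -foreach- handles safely
levelsof region, local(regs)
gen byte d_loop = 0
foreach r of local regs {
    * ustrpos() returns the character position of the region value in landholder, or 0 if absent
    replace d_loop = 1 if ustrpos(landholder, "`r'") > 0
}
* Should agree with the regex-based d created above
assert d_loop == d
```

Because ustrpos() does a literal search, the loop is also unaffected by regex metacharacters (such as `.` or `(`) that real region values might contain, whereas the one-liner would interpret them as pattern syntax.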

    • #3
      Dear Ali, many thanks for this wonderful suggestion.
      Ho-Chuan (River) Huang
      Stata 19.0, MP(4)

      • #4
        Dear Ali, is it possible to obtain the results below, i.e. the matched text itself, instead of the dummy `d'?
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str57 landholder str6(region wanted)
        "漳州乌山旅游开发有限公司"                      "长沙" ""      
        "长沙市白果园街区建设开发有限责任公司"    "太原" "长沙"
        "葫芦岛市龙湾中央商务区管理委员会"          "衡水" ""      
        "潮安县民政局"                                        "长春" ""      
        "开县镇东街道办事处教育办公室"                "河北" ""      
        "东莞松山湖高新技术产业开发区管理委员会" ""       ""      
        "贵州省瓮安草塘中学"                               ""       ""      
        "长春市双阳区人民政府奢岭街道办事处"       ""       "长春"
        "冯原镇人民政府"                                     ""       ""      
        end
        The `wanted' variable is what we want. Thanks.

        Ho-Chuan (River) Huang
        Stata 19.0, MP(4)

        • #5
          In your example, "wanted" always has 2 characters. Could it be different in the real data? Also, obs. 8 seems to have no region value but does have a "wanted" value?

          • #6
            Code:
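            * Same alternation as in #2
            levelsof region, separate(|) clean local(regions)
            * ustrregexs(0) returns the whole substring matched by ustrregexm(); d2 stays "" where nothing matches
            gen d2 = ustrregexs(0) if ustrregexm(landholder,"`regions'")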
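A quick check, assuming the #4 example data (which include `wanted') are in memory and the code above has just created `d2`:

```stata
* Compare the regex result with the hand-coded wanted variable from #4
assert d2 == wanted
* Show only the observations where a region value was found
list landholder d2 if d2 != ""
```

With these data, only the 2nd and 8th observations are listed, with d2 equal to "长沙" and "长春" respectively.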

            • #7
              Dear Ali, thanks again. It works quite well.

              Ho-Chuan (River) Huang
              Stata 19.0, MP(4)

              • #8
                Dear Bjart, in fact, I think the "wanted" values can be longer than 2 characters.
                Ho-Chuan (River) Huang
                Stata 19.0, MP(4)
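On the length question in #5 and #8: the pattern built by levelsof uses each region value as a whole alternative, so values longer than 2 characters are matched just as well. Two caveats apply to the #6 code, though. ustrregexs(0) returns only the leftmost match, and when several alternatives match at the same position, the one listed first wins (for example, 长春 would be chosen over a hypothetical region value 长春市, because levelsof sorts the shorter prefix first). If a landholder may contain more than one region value, a minimal sketch that collects all of them could look like this, assuming the example data are in memory (`all_matches` is a hypothetical name):

```stata
* Sketch: record every region value contained in landholder, space-separated
* (all_matches is a hypothetical name; works for values of any length)
levelsof region, local(regs)
gen all_matches = ""
foreach r of local regs {
    * replace promotes the string type automatically as all_matches grows
    replace all_matches = strtrim(all_matches + " " + "`r'") if ustrpos(landholder, "`r'") > 0
}
list landholder all_matches if all_matches != ""
```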
