  • Text in common between two variables

    Hello. I need to generate a variable that indicates the length of the text string that two other text variables have in common from position 1, including spaces. For example:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str18 var1 str19 var2 float THIS_IS
    "example number 1"   "example number 123"  16
    "esto es una prueba" "esto es otra prueba"  8
    end
    I do not know how to do it. Could someone help me please? Thanks

  • #2
    Esteban, here is a clumsy solution:

    Code:
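    * Upper bound for the loop: the smaller of the two maximum string lengths
    gen var1_len = strlen(var1)
    qui sum var1_len
    local len = r(max)
    gen var2_len = strlen(var2)
    * Summarize var2_len so that r(max) refers to var2, not var1
    qui sum var2_len
    local len = min(`len', r(max))
    drop var1_len var2_len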
    
    gen v1 = cond(substr(var1,1,1)==substr(var2,1,1)&!mi(substr(var1,1,1))&!mi(substr(var2,1,1)), 1, 0)
    gen wanted = v1
    
    forvalues i = 2/`len'  {
        gen v2 = cond(substr(var1,`i',1)==substr(var2,`i',1)&!mi(substr(var1,`i',1))&!mi(substr(var2,`i',1)), 1, 0)
        replace v1 = v1*v2
        replace wanted = wanted + v1
        drop v2
    }
    
    drop v1
    Code:
    . l
    
         +---------------------------------------------------+
         |               var1                  var2   wanted |
         |---------------------------------------------------|
      1. |   example number 1    example number 123       16 |
      2. | esto es una prueba   esto es otra prueba        8 |
         +---------------------------------------------------+
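
The same idea can also be sketched by comparing whole prefixes instead of single characters. If the first `i` characters agree, every shorter prefix agrees too, so the last `i` that passes the check is the answer. This is only a sketch under stated assumptions: the names `minlen`, `maxlen`, and `wanted_alt` are made up for illustration, and `strlen()` and `substr()` count bytes, so `ustrlen()` and `usubstr()` would be the safer choice if the text may contain non-ASCII characters.

```stata
* Sketch only: minlen, maxlen and wanted_alt are hypothetical names
clear
input str18 var1 str19 var2
"example number 1"   "example number 123"
"esto es una prueba" "esto es otra prueba"
end

* The common prefix can never be longer than the shorter string
gen minlen = min(strlen(var1), strlen(var2))
qui sum minlen
local maxlen = r(max)

* Start at 0 and overwrite with i whenever the first i characters agree;
* prefix equality is monotone, so the last successful i is the longest match
gen wanted_alt = 0
forvalues i = 1/`maxlen' {
    replace wanted_alt = `i' if `i' <= minlen & substr(var1, 1, `i') == substr(var2, 1, `i')
}

drop minlen
list
```

On the example data this should reproduce the values 16 and 8 shown above.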


    • #3
      It served me perfectly, and I don't think it was a clumsy solution. Thank you very much!


      • #4
        An alternative solution using regex:

        Code:
        gen newvar1 = "^" + ustrregexra(var1,"\w| ","$0?")
        gen wanted = strlen(ustrregexs(0)) if ustrregexm(var2,newvar1)
        drop newvar1


        • #5
          I tried this solution and it didn't work well... :-( But I think a regex approach has a lot of potential. Thank you!
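
One possible reason the regex in #4 can misbehave, offered as a guess rather than a diagnosis of the actual data: making every character optional lets the pattern skip over a mismatching character and keep matching afterwards, so it may report more than the common prefix. A minimal sketch with made-up strings (`"abc"` and `"ac"` are assumptions chosen for illustration, not data from this thread):

```stata
* Hypothetical counterexample; the two strings are invented for illustration
clear
input str3 var1 str3 var2
"abc" "ac"
end

* Same code as in #4: the pattern built from var1 is ^a?b?c?
gen newvar1 = "^" + ustrregexra(var1,"\w| ","$0?")
gen wanted = strlen(ustrregexs(0)) if ustrregexm(var2,newvar1)
drop newvar1

list
```

Here the pattern should match all of `"ac"` by skipping the optional `b`, giving 2, while the two strings share only the first character. The pattern also leaves punctuation untouched, so characters such as `.` or `(` in `var1` would be read as regex syntax rather than as literal text.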


          • #6
            Originally posted by Ali Atia
            An alternative solution using regex:

            Code:
            gen newvar1 = "^" + ustrregexra(var1,"\w| ","$0?")
            gen wanted = strlen(ustrregexs(0)) if ustrregexm(var2,newvar1)
            drop newvar1
            The answer is very good!
