  • How can I check if two string variables contain the same content?

    I have two string variables whose content I want to compare. I want to know whether the content is exactly the same or not.

    I wanted to include a snippet via dataex but it returned the error 'data width (1401 chars) exceeds max linesize. Try specifying fewer variables' and I don't know how to fix this. So I included a screenshot of my dataset.

    Thank you in advance for your answer.

    Best,
    Maarten


  • #2
    String variables in the first instance are just equal or not equal.


    Code:
    list strvar1 strvar2 if strvar1 != strvar2
    But watch out: the test for inequality is necessarily sensitive to differences you may consider substantively immaterial, such as leading, trailing or embedded spaces, punctuation quirks, or arbitrary differences between lower and upper case. Above all, no homunculus looks inside the strings to see if they have the same meaning.

    Standardising on use of spaces is easiest to achieve. Replacing a string using trim(itrim()) is often a good idea.
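
    A minimal sketch of that standardisation, assuming the two variables are named strvar1 and strvar2 as in the code above. The names clean1 and clean2 are hypothetical, and folding case with lower() is an extra step that only makes sense if case differences do not matter to you:

    ```stata
    * Work on cleaned copies so the original strings stay untouched.
    * trim() drops leading and trailing spaces;
    * itrim() collapses runs of internal spaces to a single space.
    generate clean1 = lower(trim(itrim(strvar1)))
    generate clean2 = lower(trim(itrim(strvar2)))

    * List only the observations that still differ after cleaning.
    list strvar1 strvar2 if clean1 != clean2
    ```

    Punctuation quirks are not handled here; those usually need case-by-case fixes.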






    • #3
      Thanks for your answer, but I'm afraid I haven't explained myself clearly enough. My apologies.

      In the image you can see 2 observations of kokid '117'. In observation 2 the kokid '117' has a value for strvar1, and in observation 3 kokid '117' has a value for strvar2. I want to know for kokid 117 if the values for strvar1 and strvar2 differ or not.

      I hope this is clearer. Sorry for the inconvenience.



      • #4
        OK; hard to answer with general code from so specific an example, but copy non-missing strings into observations with missing values, and then you can compare different variables in the same observation. http://www.stata.com/support/faqs/da...issing-values/
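
        A rough sketch of that idea, assuming the identifier is kokid, the strings are strvar1 and strvar2, and each kokid has at most one non-empty value of each variable. The names str1_filled, str2_filled and differ are hypothetical:

        ```stata
        * Within each kokid, sort on the string so that empty values ("") come first;
        * the last observation in the group then holds the non-empty value, if any.
        bysort kokid (strvar1): generate str1_filled = strvar1[_N]
        bysort kokid (strvar2): generate str2_filled = strvar2[_N]

        * Both copies now sit in every observation of the group and can be compared.
        generate byte differ = str1_filled != str2_filled
        list kokid str1_filled str2_filled if differ
        ```

        If a kokid has no value at all for one of the variables, its filled copy stays empty, so the group is flagged as differing whenever the other variable is non-empty.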



        • #5
          Ok thanks. I'll try your solution.



          • #6
            Perhaps Maarten could - encode - the string variables. Then, he could - tabulate - these variables and see whether they have the "same content".


            Code:
            encode stringvar1, gen(myvar1)
            encode stringvar2, gen(myvar2)
            tabulate myvar1
            tabulate myvar2
            Naturally, beware the contents may differ for many reasons, as Nick pointed out in #2.
            Best regards,

            Marcos
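
            One caveat when encoding the two variables separately: each encode builds its own value label, so the same text can receive a different integer code in myvar1 and in myvar2, and comparing the numbers directly could mislead. A possible sketch, to be run in place of the two encode commands above and assuming the same variable names, lets the second encode reuse the first value label through the label() option so that identical strings share a code:

            ```stata
            * The first encode creates a value label named myvar1 by default.
            encode stringvar1, gen(myvar1)

            * Reuse that value label, adding any strings it does not yet contain.
            encode stringvar2, gen(myvar2) label(myvar1)

            * Cross-tabulate the two encoded variables, keeping missing values visible.
            tabulate myvar1 myvar2, missing
            ```

            This is still an exact-match comparison, so the caveats about spaces and case from #2 apply unchanged.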
