
  • Merging two datasets using joinby, but with capital and lowercase differences

    Dear Statalisters,

    I need some help. I am trying to merge two datasets using
    Code:
    joinby firms
    but I have a problem: in one dataset the firm names are in capital letters, and in the other they are in lowercase (actually, only the first letter is capitalized).

    My question is: is there any way to do the merge without changing the name of each firm one by one? I have 3,800 different firms.

    I hope my question is clear.
    Best regards,

    Alejandro




  • #2
    I would use a command like strupper to convert both sets of firm names to all uppercase and then match or join them. (See also strlower and strproper.)

    For example, in both datasets:
    Code:
    gen firm_name = strupper(firms)
    Then you can -merge- or -joinby- using firm_name.
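A minimal, self-contained sketch of the approach in #2. It assumes both files contain the string variable firms, as in the original question; the toy data, the variables sales and year, and the tempfile name second are hypothetical stand-ins for the real files.

```stata
* Toy stand-in for the second dataset (illustrative data only):
* names with only the first letter capitalized
clear
input str20 firms sales
"Acme Corp" 10
"Beta Inc" 20
end
generate firm_name = strupper(firms)   // "Acme Corp" becomes "ACME CORP"
tempfile second
save `second'

* Toy stand-in for the first dataset: names already in capital letters
clear
input str20 firms year
"ACME CORP" 2010
"ACME CORP" 2011
"BETA INC" 2010
"GAMMA LTD" 2010
end
generate firm_name = strupper(firms)   // no change if already uppercase

* Pair every observation of a firm here with every observation of the
* same firm in the second file; by default joinby drops unmatched
* firms, so GAMMA LTD does not appear in the result
joinby firm_name using `second'
list firm_name year sales
```

If the names also differ by stray spaces, wrapping the conversion in strtrim() may help; for names with accented characters, ustrupper() is the Unicode-aware counterpart of strupper().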



    • #3
      Thank you so much, David, I did it!!!

      I really appreciate the time you took to answer.

      Best regards,
      Alejandro



      • #4
        Just a terminological point regarding #2: strupper(), strlower(), and strproper() are Stata functions, not commands.



        • #5
          Hello Clyde, thank you for your comment; it's always good to read your posts!!



          • #6
            I would like to ask just one more question, please. How long can a merge take? I am doing a second merge using joinby, and it started more than an hour ago.
            Thank you very much again.
            Alejandro



            • #7
              It depends on the size of the data sets and the extent of matching between the firms in the data sets. Each observation in each data set has to be potentially paired with every observation in the other, and then only those pairs that match get retained. (In practice it is somewhat faster than that, because the data sets can be sorted first, but sorting is also time consuming.) In addition, since the resulting data set will be much larger than the original ones, Stata will frequently be calling the operating system to request extra memory. There is also the possibility that the resulting data set will be too large to fit in active memory, and you may end up thrashing the disk using virtual memory. Anyway, an hour doesn't sound very long to me. With a large enough data set, this could easily run overnight or for a day.
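Because -joinby- keeps every pairing of a firm's observations across the two files, the size of the result can be estimated before running it: for each firm present in both files, multiply its observation counts and sum over firms. A minimal sketch with toy data; the variables n_a, n_b, pairs, quarter and the tempfile name counts_b are hypothetical.

```stata
* Toy second file: count observations per firm (illustrative data only)
clear
input str20 firm_name quarter
"ACME CORP" 1
"ACME CORP" 2
"ACME CORP" 3
"BETA INC" 1
end
contract firm_name, freq(n_b)   // one row per firm, n_b = its count
tempfile counts_b
save `counts_b'

* Toy first file: same per-firm count
clear
input str20 firm_name year
"ACME CORP" 2010
"ACME CORP" 2011
"BETA INC" 2010
"BETA INC" 2011
"GAMMA LTD" 2010
end
contract firm_name, freq(n_a)

* Keep only firms present in both files, as joinby does by default
merge 1:1 firm_name using `counts_b', keep(match) nogenerate

* Each firm contributes n_a * n_b observations to the joinby result
generate double pairs = n_a * n_b
quietly summarize pairs
display "joinby would create " r(sum) " observations"
```

Here ACME CORP contributes 2 x 3 = 6 pairs and BETA INC 2 x 1 = 2, so the joined file would have 8 observations; on real data a very large total is an early warning that the run may take hours and exhaust memory.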

              I guess the question is, are you sure you really need to use -joinby-? Do you really need to pair up every observation for a given firm in the first data set with every observation of the same firm in the second data set? That's what -joinby- does.
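One way to check: if either file has exactly one observation per firm, -joinby- produces the same matched rows as -merge m:1- with keep(match), and -merge- will stop with an error if that file unexpectedly contains duplicate firms. A minimal sketch with toy data; the variable sector and the tempfile name firm_info are hypothetical.

```stata
* Toy firm-level file (illustrative data only)
clear
input str20 firm_name str10 sector
"ACME CORP" "retail"
"BETA INC" "energy"
end
isid firm_name   // errors out if any firm appears more than once
tempfile firm_info
save `firm_info'

* Toy panel file: several observations per firm
clear
input str20 firm_name year
"ACME CORP" 2010
"ACME CORP" 2011
"BETA INC" 2010
"GAMMA LTD" 2010
end

* Each panel row picks up its firm's attributes; keep(match) drops
* unmatched firms, mirroring joinby's default behavior
merge m:1 firm_name using `firm_info', keep(match) nogenerate
list firm_name year sector
```

If neither file is unique on firm_name and every cross-file pairing really is wanted, -joinby- is the right tool.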



              • #8
                Thank you very much for the clear answer. Unfortunately, joinby is necessary for my research in this case; in part, it is what will add value. It is a large dataset, so I am preparing myself for an overnight merge.
                Thank you again, Clyde; as many times before, the time you spend here is really useful.
                Thank you
