Merging two data sets using joinby but with capital and lower case differences

Alejandro Torres

Join Date: Jan 2018

Posts: 152
#1

26 Dec 2018, 14:06

Dear Statalisters,

I need some help because I am trying to merge two databases using

Code:

joinby firms

, but I have a problem because in one database the firm names are in capital letters and in the second database the firm names are in lowercase letters (actually, only the first letter is capitalized).

My question is whether there is any way to do the merge without changing the name of each firm one by one, since I have 3800 different firms.

I hope my question is clear.
Best regards,

Alejandro
David Benson

Join Date: Oct 2018

Posts: 489
#2

26 Dec 2018, 14:21

I would use a command like strupper to convert both sets of firm names to all uppercase and then match or join them. (See also strlower and strproper).

For example, in both datasets:

Code:

gen firm_name = strupper(firms)

Then you can merge or joinby using firm_name.
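The full sequence might look something like the sketch below. The file names are hypothetical placeholders, and wrapping the name in strtrim() is an extra assumption on my part (it removes leading and trailing blanks, which are another common reason names fail to match). Note that strupper() only changes ASCII letters; if the firm names contain accented characters, ustrupper() (Stata 14 or later) may be the better choice.

```stata
* Hypothetical file names; "firms" is the name variable from the question.

* First dataset: build an all-uppercase key and save a copy
use "dataset_a.dta", clear
gen firm_name = strupper(strtrim(firms))
save "dataset_a_key.dta", replace

* Second dataset: build the same key
use "dataset_b.dta", clear
gen firm_name = strupper(strtrim(firms))
save "dataset_b_key.dta", replace

* Form all pairwise combinations of observations within each firm_name.
* By default only matched firms are kept; for variables found in both
* files (such as firms), the values from the data in memory are retained.
joinby firm_name using "dataset_a_key.dta"
```

Creating a new key variable rather than overwriting firms keeps the original spelling available for checking any names that still fail to match.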
Alejandro Torres

Join Date: Jan 2018

Posts: 152
#3

26 Dec 2018, 14:35

Thank you so much, David. I did it!

I really appreciate your time answering.

Best regards,
Alejandro
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

26 Dec 2018, 15:32

Just a terminological point regarding #2: strupper(), strlower(), and strproper() are Stata functions, not commands.
Alejandro Torres

Join Date: Jan 2018

Posts: 152
#5

26 Dec 2018, 18:04

Hello Clyde, thank you for your comment. It's always good to read your posts!
Alejandro Torres

Join Date: Jan 2018

Posts: 152
#6

26 Dec 2018, 18:06

I would like to ask you just one more question, please. How long can a merge take? I am doing a second merge using joinby, and it started more than an hour ago.
Thank you very much again.
Alejandro
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#7

26 Dec 2018, 18:31

It depends on the size of the data sets and the extent of matching between the firms in the data sets. Each observation in each data set has to be potentially paired with every other, and then only those that match get retained. (In fact it's a little faster than that because the data sets can be sorted first--but that is also time consuming.) In addition, since the resulting data set will be much larger than the original ones, Stata will frequently be calling the operating system requesting extra memory. Then there is also the possibility that the resulting data set will be too large to sit in active memory and you may end up thrashing the disk using virtual memory. Anyway, an hour doesn't sound very long to me. With a large enough data set this could easily run overnight or for a day.
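One way to get a feel for the size of the result before committing to a long run is to count observations per firm in each file and sum the products. This is only a rough sketch: the file names are hypothetical, and it assumes both files already contain the uppercase key firm_name.

```stata
* Hypothetical file names; both are assumed to contain firm_name already.

* Observations per firm in the first dataset
use "dataset_a_key.dta", clear
contract firm_name, freq(n_using)
tempfile counts
save `counts'

* Observations per firm in the second dataset, combined with the first
use "dataset_b_key.dta", clear
contract firm_name, freq(n_master)
merge 1:1 firm_name using `counts', keep(match) nogenerate

* Each matched firm contributes n_master * n_using observations
gen double n_pairs = n_master * n_using
summarize n_pairs, meanonly
display "Expected observations after joinby: " %15.0fc r(sum)
```

Because contract reduces each file to one observation per firm, this runs quickly even when the joinby itself would not, and the total gives an idea of whether the result will fit in memory.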

I guess the question is, are you sure you really need to use -joinby-? Do you really need to pair up every observation for a given firm in the first data set with every observation of the same firm in the second data set? That's what -joinby- does.
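A quick check along these lines can help answer that question. It is a sketch with a hypothetical file name, and it assumes the uppercase key firm_name has already been created. If the key turns out to identify observations uniquely in one of the two files, then merge m:1 with keep(match) should pair up the same observations as joinby.

```stata
* Hypothetical file name; firm_name is assumed to exist already.
use "dataset_a_key.dta", clear

* Show how many firm names occur once, twice, and so on
duplicates report firm_name

* isid exits with an error if firm_name is not a unique identifier
capture isid firm_name
if _rc == 0 {
    display "firm_name is unique here: merge m:1 would pair the same observations"
}
else {
    display "firm_name repeats here: joinby forms all within-firm pairs"
}
```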
Alejandro Torres

Join Date: Jan 2018

Posts: 152
#8

26 Dec 2018, 18:50

Thank you very much for the clear answer. Unfortunately, joinby is necessary for my research in this case; it is part of what is going to add value. It is a large dataset, so I am preparing myself for an overnight merge.
Thank you again, Clyde. As on several previous occasions, your time here is really useful.
Thank you.