One specific year cannot be merged even though merging variables have the same values in master and using data

Sebastian Trabert

Join Date: Mar 2019

Posts: 7
#1

18 Mar 2019, 11:39

Hi everyone,

I am working with two panel datasets that I am trying to combine with a 1:1 merge. Ultimately, I want to add the variable fiscal year to my master dataset. My merging variables are firm_id, date, and person_id. Hence, my code looks like this:

Code:

merge 1:1 firm_id date person_id using "...\file.dta", keep(master match) nogenerate

There are no duplicates for the combination of the three merging variables in either of the two datasets.
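A check like this can be sketched in Stata as follows. The toy data are made up for illustration, and the daily `%td` date is an assumption rather than something taken from the original datasets; the same two commands would be run once in the master and once in the using data.

```stata
* Toy data standing in for either dataset (made-up values, assumed daily date)
clear
input firm_id person_id str9 d
1 10 "31dec2008"
1 10 "31dec2009"
2 11 "31dec2009"
end
generate date = date(d, "DMY")
format date %td
drop d

* isid exits with an error if the variables do not uniquely identify observations
isid firm_id date person_id

* duplicates report tabulates surplus copies instead of stopping with an error
duplicates report firm_id date person_id
```

Note that uniqueness of the key says nothing about whether the key values agree across the two files; it only rules out one reason for a 1:1 merge to fail.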

The merging process itself works well. However, one out of eight years does not merge properly. Taking a look at the population of the variable in the master and the using dataset might help to explain the problem:

This is how fiscal year is populated in the using dataset

And this is how it is populated in the master dataset after the merge

Apparently, the merging works quite well except for the fiscal year 2009.

Of course, I have already compared the observations that did not match. However, these observations occur in both datasets, so 2009 should be as populated as all the other years in the data after the merge. Furthermore, each of the three merging variables has the same format in both datasets. I really do not understand why Stata does not properly merge this one year, whereas all the others work quite well.

Has anyone ever experienced anything similar? Do you have any ideas how to solve the problem?

I would really appreciate your answers.

Best regards,
Sebastian
Tags: merge
Sarah Edgington

Join Date: Apr 2014

Posts: 284
#2

18 Mar 2019, 12:16

Unfortunately you haven't provided enough detail to begin to diagnose this.
I'd start by leaving off the keep(master match) and nogenerate parts of your code and carefully inspecting your _merge variable.

You say that you compared the observations that didn't match and they occur in both datasets. How did you compare them? If you can show us some examples of observations that don't merge that you expect to merge that might be helpful.
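A minimal sketch of that inspection, using made-up toy data in place of the real files (the values, the tempfile name `using_data`, the helper variable `yr`, and the daily `%td` date are all assumptions for illustration):

```stata
* Toy using data (made-up values; date assumed to be a daily %td date)
clear
input firm_id person_id str9 d fiscal_year
1 10 "31dec2008" 2008
1 10 "31dec2009" 2009
end
generate date = date(d, "DMY")
format date %td
drop d
tempfile using_data
save `using_data'

* Toy master data with one key that has no partner in the using data
clear
input firm_id person_id str9 d
1 10 "31dec2008"
1 10 "31dec2009"
1 10 "31dec2010"
end
generate date = date(d, "DMY")
format date %td
drop d

* Keep all observations and the _merge indicator for inspection
merge 1:1 firm_id date person_id using `using_data'
tabulate _merge

* Break the result down by calendar year to see where matches fail
generate yr = year(date)
tabulate yr _merge

* List the keys of master-only observations to compare against the using data
list firm_id person_id date if _merge == 1
```

Run as a do-file so the tempfile macro stays defined. With the real data, an unmatched key would normally show up twice, once as master only and once as using only, which makes the differing variable easier to spot.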
Sebastian Trabert

Join Date: Mar 2019

Posts: 7
#3

19 Mar 2019, 03:21

Thank you Sarah! I will try to give you some more information.

Leaving out keep(master match) unfortunately does not change anything for the year 2009. I will show you one example drawn from the dataset.

The first screenshot shows an example from the master dataset after the merge. As can be seen, only the 2009 year did not merge for this specific firm_id.

However, when you take a look at the using dataset below, you can see that it contains the matching observations for the respective firm_id, person_id, and date.

Furthermore, the variable types and formats of the matching variables are the same in both datasets and I have already checked whether any of the variables contains leading blanks. So normally the 2009 observations should match just like all the others.

Do you have any other ideas what might be the reason for that?

Last edited by Sebastian Trabert; 19 Mar 2019, 04:12.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

19 Mar 2019, 04:35

Try to use the option - update - in the command.

Hopefully that helps!

Best regards,

Marcos
Sebastian Trabert

Join Date: Mar 2019

Posts: 7
#5

19 Mar 2019, 06:24

Thanks Marcos! I tried, but it does not change anything.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#6

19 Mar 2019, 06:28

My last tentative approach: try the option - update replace - for that matter.

Best regards,

Marcos
Sebastian Trabert

Join Date: Mar 2019

Posts: 7
#7

19 Mar 2019, 07:14

Unfortunately, it doesn't change anything either.

Somehow Stata is not able to "connect" the matching variables of the two datasets for this one year. I don't understand why though...
Sebastian Trabert

Join Date: Mar 2019

Posts: 7
#8

19 Mar 2019, 07:47

I just figured it out myself. The date variable in the using dataset contained a 'hidden' time. That's why the matching did not work properly.

After importing the data (from Excel) again as strings, the matching worked well.
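For readers who would rather repair the variable inside Stata than re-import it, the following is a rough sketch of an alternative, not the fix described above. It assumes the date arrived as a `%tc` datetime (milliseconds since 01jan1960) whose display format shows only the day; the toy values and the names `stamp` and `date_day` are made up.

```stata
* Toy using data: one row carries a time of day that the display format hides
clear
input firm_id person_id str18 stamp
1 10 "31dec2008 00:00:00"
1 10 "31dec2009 14:30:00"
end
generate double date = clock(stamp, "DMYhms")
format date %tcDD/NN/CCYY
drop stamp

* Both rows look like plain dates when listed
list

* Count datetimes that are not exactly midnight (86,400,000 ms per day)
count if mod(date, 86400000) != 0

* Convert the datetime to a daily date, which discards the time component
generate long date_day = dofc(date)
format date_day %td
drop date
rename date_day date
```

If the master data hold `date` as a daily `%td` value, the converted variable should then be comparable to it. If the time component instead came in as a fractional daily value, `floor()` would be the analogous step; which case applies depends on how the Excel column was imported.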

Thanks again for your comments anyway.

Best regards,
Sebastian