  • One specific year cannot be merged even though merging variables have the same values in master and using data

    Hi everyone,

    I am working with two panel datasets on which I am trying to perform a 1:1 merge. Ultimately, I want to add the variable fiscal year to my master dataset. My merging variables are firm_id, date, and person_id, so my code looks like this:

    Code:
    merge 1:1 firm_id date person_id using "...\file.dta", keep(master match) nogenerate
    There are no duplicates of the combination of the three merging variables in either of the two datasets.
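    The uniqueness claim can be checked directly before merging. A minimal sketch, run here on made-up toy data (the values and the helper variable year are hypothetical, not taken from the poster's files); in practice the same two commands would be run on each of the two datasets:

```stata
* Hypothetical toy data standing in for one of the two panel datasets
clear
input long firm_id long person_id int year
1 10 2008
1 10 2009
1 11 2009
end
* Build a daily date (31 December of each year) as the third key variable
generate double date = mdy(12, 31, year)
format date %td

* isid stops with an error if the three keys do not uniquely identify observations
isid firm_id date person_id

* duplicates report tabulates how many copies of each key combination exist
duplicates report firm_id date person_id
```

    If isid runs silently and duplicates report shows only one copy per combination, a 1:1 merge on these keys is valid.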

    The merge itself works well. However, one out of eight years does not merge properly. Looking at how the variable is populated in the master and using datasets may help explain the problem:

    This is how fiscal year is populated in the using dataset:

    [Screenshot: Using.PNG]


    And this is how it is populated in the master dataset after the merge:

    [Screenshot: Master.PNG]


    Apparently, the merge works quite well except for fiscal year 2009.

    I have already compared the observations that did not match. They occur in both datasets, so after the merge 2009 should be as fully populated as all the other years. Furthermore, each of the three merging variables has the same format in both datasets. I really do not understand why Stata does not merge this one year properly when all the others work fine.

    Has anyone ever experienced anything similar? Do you have any ideas how to solve the problem?

    I would really appreciate your answers.

    Best regards,
    Sebastian

  • #2
    Unfortunately, you haven't provided enough detail to begin diagnosing this.
    I'd start by leaving off the keep(master match) and nogenerate parts of your code and carefully inspecting your _merge variable.

    You say that you compared the observations that didn't match and that they occur in both datasets. How did you compare them? If you can show us some examples of observations that you expect to merge but that don't, that might be helpful.
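    To make the suggestion of inspecting _merge concrete, here is a minimal sketch on hypothetical toy data (all values, the helper variable year, and the tempfile name using_data are made up for illustration). The merge is run without keep() and nogenerate, so _merge survives and the unmatched observations can be listed:

```stata
* Hypothetical using data: fiscal_year for each key combination
clear
input long firm_id long person_id int year int fiscal_year
1 10 2008 2008
1 10 2009 2009
end
generate double date = mdy(12, 31, year)
format date %td
drop year
tempfile using_data
save "`using_data'"

* Hypothetical master data: the same keys plus one combination absent from the using file
clear
input long firm_id long person_id int year
1 10 2008
1 10 2009
1 11 2009
end
generate double date = mdy(12, 31, year)
format date %td
drop year

* No keep() and no nogenerate, so _merge is kept for inspection
merge 1:1 firm_id date person_id using "`using_data'"

* 1 = master only, 2 = using only, 3 = matched
tabulate _merge

* List the observations that did not match
list firm_id person_id date _merge if _merge != 3
```

    Comparing the listed master-only rows (_merge == 1) with the using-only rows (_merge == 2) is usually where the mismatch becomes visible.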



    • #3
      Thank you, Sarah! I will try to give you some more information.

      Unfortunately, leaving out keep(master match) does not change anything for 2009. Here is one example drawn from the dataset.

      The first screenshot shows an example from the master dataset after the merge. As can be seen, only the 2009 observation did not merge for this specific firm_id.
      [Screenshot: Master2.PNG]




      However, the using dataset below does contain the matching observations for the respective firm_id, person_id, and date.
      [Screenshot: Using2.PNG]



      Furthermore, the matching variables have the same types and formats in both datasets, and I have already checked whether any of them contains leading blanks. So the 2009 observations should match just like all the others.
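      Identical types and display formats do not guarantee identical stored values. A minimal sketch, assuming the date is a %td daily date that may carry a fractional (time-of-day) part; the data below are hypothetical:

```stata
* Hypothetical example: two daily dates, the second with a hidden half-day (12:00)
clear
set obs 2
generate double date = mdy(12, 31, 2009) + cond(_n == 2, 0.5, 0)
format date %td

* Under %td, both observations display as 31dec2009 ...
list date

* ... but the stored values differ, so a merge on date treats them as different keys
format date %10.1f
list date

* Count daily dates that are not whole numbers, i.e. that carry a time part
count if date != floor(date)
```

      Running the final count in each dataset (or assert date == floor(date)) is a quick way to rule this out.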

      Do you have any other ideas about what might cause this?
      Last edited by Sebastian Trabert; 19 Mar 2019, 04:12.



      • #4
        Try adding the option - update - to the command.

        Hopefully that helps!
        Best regards,

        Marcos



        • #5
          Thanks, Marcos! I tried, but it does not change anything.



          • #6
            One last tentative suggestion: try the options - update replace - instead.
            Best regards,

            Marcos



            • #7
              Unfortunately, it doesn't change anything either.

              Somehow Stata is not able to "connect" the matching variables of the two datasets for this one year. I don't understand why, though...



              • #8
                I just figured it out myself. The date variable in the using dataset contained a 'hidden' time. That's why the matching did not work properly.

                After re-importing the data from Excel as strings, the merge worked well.
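                An alternative to re-importing, sketched under the assumption that the hidden time is stored as a fractional part of a %td daily date, is to drop the time part with floor() before merging. The data, the helper variables year and hours, and the tempfile name using_data are hypothetical:

```stata
* Hypothetical using data whose daily dates carry a hidden time of day
clear
input long firm_id long person_id int year int fiscal_year double hours
1 10 2009 2009 12
1 10 2010 2010 0
end
generate double date = mdy(12, 31, year) + hours/24
format date %td
drop year hours

* floor() keeps only the day, discarding the fractional (time) part
replace date = floor(date)
assert date == floor(date)
* If the variable had instead been imported as a %tc datetime, dofc() would
* convert it to a daily date (datetime_var is a hypothetical name):
*   generate double date_d = dofc(datetime_var)

tempfile using_data
save "`using_data'"

* Hypothetical master data with plain daily dates
clear
input long firm_id long person_id int year
1 10 2009
1 10 2010
end
generate double date = mdy(12, 31, year)
format date %td
drop year

* With the time part removed, both observations match
merge 1:1 firm_id date person_id using "`using_data'"
tabulate _merge
```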

                Thanks again for your comments anyway.

                Best regards,
                Sebastian
