Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to merge annual and monthly data in Stata?

    I have two data-sets for my firms. My main data (A) is at month level. The other one (B), contains a variable I need, but is at year level. I have the company names as the shared identifier. How do I combine these two datasets so that the variable in B merges with my data A, and repeats itself for each of the 12 months in a given year? I tried the following but no luck:

    merge m:1 company year using B
    error: variables company year do not uniquely identify observations in the using data

    which makes sense, because I have the data in A at month level. How can I go around this problem?

    Thank you in advance for any help.

  • #2
    Welcome to Statalist.

    If I understand you correctly, dataset A has one observation for each combination of company, year, and month, while dataset B has one observation for each combination of company and year.

    Stata's error message is telling you that - completely independently of the characteristics of dataset A - you have observations in dataset B with the same value for company and year.

    You should use the duplicates command to identify combinations of company and year that are repeated in dataset B, and try to understand what that tells you about your data. For example duplicates example company year. See help duplicates for more information on the use of this command.

    Comment


    • #3
      Thank you for your response. My question is how to merge these data when they are not at the same time level. Dataset A and B has the same company list. The difference is A has them at monthly level, and B has them only at year level. Now, I want to keep the final dataset as monthly (my masterdata being A), so I want to merge the price variable in B into my dataset A. I understand that the price variable that I am bringing from B will have to repeat for each month in dataset A, which I am ok with. Does Stata have a command for this type of merge?
      Dataset A Dataset B
      company year month variables company year price
      ACURA 2003 1 ACURA 2003 38572
      ACURA 2003 2 ACURA 2004 41694
      ACURA 2003 3
      ACURA 2003 4
      ACURA 2003 5
      ACURA 2003 6
      ACURA 2003 7
      ACURA 2003 8
      ACURA 2003 9
      ACURA 2003 10
      ACURA 2003 11
      ACURA 2003 12
      ACURA 2004 1
      ACURA 2004 2
      ACURA 2004 3
      ACURA 2004 4
      ACURA 2004 5
      ACURA 2004 6
      ACURA 2004 7
      ACURA 2004 8
      ACURA 2004 9
      ACURA 2004 10
      ACURA 2004 11
      ACURA 2004 12

      Comment


      • #4
        Code:
        clear 
        * Dataset A        
        input str5 company    year    month        
        ACURA    2003    1            
        ACURA    2003    2            
        ACURA    2003    3                    
        ACURA    2003    4                    
        ACURA    2003    5                    
        ACURA    2003    6                    
        ACURA    2003    7                    
        ACURA    2003    8                    
        ACURA    2003    9                    
        ACURA    2003    10                    
        ACURA    2003    11                    
        ACURA    2003    12                    
        ACURA    2004    1                    
        ACURA    2004    2                    
        ACURA    2004    3                    
        ACURA    2004    4                    
        ACURA    2004    5                    
        ACURA    2004    6                    
        ACURA    2004    7                    
        ACURA    2004    8                    
        ACURA    2004    9                    
        ACURA    2004    10                    
        ACURA    2004    11                    
        ACURA    2004    12
        end 
        save datasetA
        
        clear 
        input str5 company    year    price
        ACURA    2003    38572
        ACURA    2004    41694
        end 
        
        merge 1:m company year using datasetA 
        
        list 
        
             +----------------------------------------------+
             | company   year   price   month        _merge |
             |----------------------------------------------|
          1. |   ACURA   2003   38572       1   matched (3) |
          2. |   ACURA   2004   41694       1   matched (3) |
          3. |   ACURA   2003   38572       2   matched (3) |
          4. |   ACURA   2003   38572       3   matched (3) |
          5. |   ACURA   2003   38572       4   matched (3) |
             |----------------------------------------------|
          6. |   ACURA   2003   38572       5   matched (3) |
          7. |   ACURA   2003   38572       6   matched (3) |
          8. |   ACURA   2003   38572       7   matched (3) |
          9. |   ACURA   2003   38572       8   matched (3) |
         10. |   ACURA   2003   38572       9   matched (3) |
             |----------------------------------------------|
         11. |   ACURA   2003   38572      10   matched (3) |
         12. |   ACURA   2003   38572      11   matched (3) |
         13. |   ACURA   2003   38572      12   matched (3) |
         14. |   ACURA   2004   41694       2   matched (3) |
         15. |   ACURA   2004   41694       3   matched (3) |
             |----------------------------------------------|
         16. |   ACURA   2004   41694       4   matched (3) |
         17. |   ACURA   2004   41694       5   matched (3) |
         18. |   ACURA   2004   41694       6   matched (3) |
         19. |   ACURA   2004   41694       7   matched (3) |
         20. |   ACURA   2004   41694       8   matched (3) |
             |----------------------------------------------|
         21. |   ACURA   2004   41694       9   matched (3) |
         22. |   ACURA   2004   41694      10   matched (3) |
         23. |   ACURA   2004   41694      11   matched (3) |
         24. |   ACURA   2004   41694      12   matched (3) |
             +----------------------------------------------+

        Comment


        • #5
          Yes, Stata has a command for this, and it is precisely the command you tried:

          Code:
          use DatasetA
          merge m:1 company year using DatasetB
          What William was trying to point out to you is that your dataset B is not what you say it is. Stata has discovered that for at least some companies and years, there is more than one observation per year. It is not simply once-a-year data on each company as you believe it to be. That is what the error message Stata gave is telling you. And I have never known Stata to be wrong about this. Data sets, even well-curated ones from respected data sources, often have errors; or sometimes you just have mistaken expectations about what is in a data set due to miscommunication or unclear codebooks. In any case, you do not have the data you think you have in data set B. So you need to either fix the data, or make a different analysis plan that does not rely on merging these.

          His suggestions in #2 are excellent and I suggest you follow them to find out what these extra observations are. Then you will have to figure out what to do about them. Until you fix the data so that DatasetB really does have just a single observation for each company in each year, you cannot proceed with the merge.

          Black Box Warning: Do not force the issue by using -merge m:m-. That does not solve the problem; it sweeps the problem under the rug and produces a "merged" data set that is almost guaranteed to be a meaningless jumble of data. Any analysis you undertake using it will end badly.

          Added: Crossed with #4. I get the impression that Nick was responding to #3 and did not consider the issues raised in #1 and #2.

          Comment


          • #6
            Clyde is right. I was showing that merge works fine for the example given recently. That in no sense undermines earlier comments. In fact, I've written on this topic.

            https://www.stata.com/support/faqs/d...d-time-values/

            although the spirit there is exactly that of #2.

            Comment


            • #7
              Adapting Lewis Carroll to our purposes,

              "... I have said it thrice:
              What I tell you three times§ is true.”
              __________
              § In posts #2, #5, and #6
              I regret not being more explicit in post #2, but I have been told I have a tendency to overexplain. Trying the commands I suggested would have clarified my response.

              Comment


              • #8
                Thank you all for your thoughtful answers. It was indeed a duplicate company name that caused the issue (long story, but it is an issue of some automakers changing hands frequently in a specific time period in the auto industry, so it causes issues in merging different databases). I identified & fixed it following William's suggestion of duplicates, and was able to use my original m:1 code as both Clyde and Nick suggested should work fine. Now I have a 100% matched data. As you know sometimes small issues cause a bump and I am grateful for all your time and to this forum that helped me overcome this issue quickly and effectively.

                Comment

                Working...
                X