Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging Data sets

    I have a dataset which contains monthly share transactions data for about 470 companies. Each company trades about 50 - 60 shares a month and a sample of dataset looks likes this.
    Company Name Share Name Date Qty
    ABC delta 01/01/2003 500
    ABC delta 01/02/2003 200
    ABC delta 01/03/2003 100
    ABC gamma 01/01/2003 20
    ABC gamma 01/02/2003 15
    AAA gamma 01/01/2004 1000
    AAA gamma 01/05/2004 120
    AAA zigma 01/01/2006 500
    AAA zigma 01/02/2006 65
    ​I want to merge this data set with another dataset which contains codes for each stock and it looks like this
    Share Name Code
    delta dt
    gamma gm
    zigma zg
    It doesn't allow me to merge using m:1 as it gave me an error message once. When I tried to merge them again, now it constantly says the code file can't be found even if it's in the same folder. Please help.
    Thank you
    Last edited by Yapa Bandara; 20 Aug 2015, 01:34.

  • #2
    The message I get is,
    variable share_name does not uniquely identify observations in the using data
    r(459);

    Comment


    • #3
      It would really help if we saw the command you issued, and understood which is the main data (in memory) and which is the using data.

      With that said, if your first dataset is the one in memory, and your second dataset the using dataset, then you need merge m:1 share_name using ... . If its the other way around, then you need merge 1:m share_name using ... . The m:1 or 1:m needs to be in the same order as the datasets with multiple or 1 observation per share_name.

      The error message you quote in your second message tells you that your using dataset has multiple observations with the same value of share_name. That is wrong if you are doing an m:1 merge. Either (a) your second dataset has at least one share_name that appears more than once or (b) your m:1 or 1:m should be reversed on your merge commend.

      I cannot at all comment on the problem about your files not being found. We can better help you if we know what commands you have tried and what Stata told you to indicate that there was a problem. Please review the Statalist FAQ linked to from the top of the page, especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata log window and paste them into your Statalist post using CODE delimiters, as described in section 12 of the FAQ.

      Comment


      • #4
        I'm sorry for being less informative. The one in the memory is the first table named "holdings" and the second table is the using data set and it is named "share_code". The common field as you can see is the "share name". However, in the main file as I explained before, one company (shown by company name) trades 50 - 60 shares and these shares has been sorted first by share name and then by dates. So share_name contains multiple observations.

        However, the using data set has only one observation for one share name with the relevant code. I need to assign codes for the main file and it should look like this after merging.
        Company Name share_name Date Qty Code
        ABC delta 01/01/2003 500 dt
        ABC delta 01/02/2003 200 dt
        ABC delta 01/03/2003 100 dt
        ABC gamma 01/02/2003 20 gm
        ABC gamma 01/01/2004 15 gm
        AAA gamma 01/01/2004 1000 gm
        AAA gamma 01/05/2004 120 gm
        AAA zigma 01/01/2006 500 zg
        AAA zigma 01/02/2006 65 zg
        I did
        merge m:1 share_name using share_code

        variable share_name does not uniquely identify observations in the using data
        r(459);

        Thank You

        Comment


        • #5
          The error message is telling you that despite your assertion to the contrary, there is at least one share_name that appears more than once in the share_code dataset.

          See help duplicates for instructions on using the duplicates command to identify the duplicated observations in your share_code dataset. Here is an example of its use on a modified version of your sample data.
          Code:
          . list, clean abbreviate(12)
          
                 share_name   code  
            1.        delta     dt  
            2.        gamma     gm  
            3.        zigma     zg  
            4.        gamma     gg  
            5.        delta     dt  
          
          . duplicates report share_name
          
          Duplicates in terms of share_name
          
          --------------------------------------
             copies | observations       surplus
          ----------+---------------------------
                  1 |            1             0
                  2 |            4             2
          --------------------------------------
          
          . duplicates list share_name, abbreviate(12)
          
          Duplicates in terms of share_name
          
            +----------------------------+
            | group:   obs:   share_name |
            |----------------------------|
            |      1      1        delta |
            |      1      5        delta |
            |      2      2        gamma |
            |      2      4        gamma |
            +----------------------------+
          
          .
          Last edited by William Lisowski; 20 Aug 2015, 18:10.

          Comment


          • #6
            I figured it out and merge was done. Thank you William. By the way, can you please let me know what is meant by abbreviate(12) in your command? I ran duplicate list without using it.

            Comment


            • #7
              The abbreviate option on the duplicates command has the same function as it does on the list command: it sets the length at which variable names are shown as abbreviated in the log output. Compare the output below to my previous sample.

              You should read the output of the help duplicates command for details, and to learn more about the duplicates command.
              Code:
              . duplicates list share_name
              
              Duplicates in terms of share_name
              
                +--------------------------+
                | group:   obs:   share_~e |
                |--------------------------|
                |      1      1      delta |
                |      1      5      delta |
                |      2      2      gamma |
                |      2      4      gamma |
                +--------------------------+
              
              .

              Comment

              Working...
              X