Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combining two data sets without common variable

    Apologies in advance if I do not ask this question correctly - this is my first post.

    I have two data sets. One is a survey of 5000 people with numerous variables about the individuals including their 1st, 2nd and 3rd choices of university. I have another data set which includes a number of characteristics about the universities including their ranking. The University numbers in the 1st, 2nd and 3rd choices of the first data set match with the numbers given in the second data set. What would be the simplest way to create variables which would for example give ranking of 1st choice, ranking of 2nd choice ranking of 3rd choice ?

    E.g. of data sets

    ID | 1ST CHOICE | 2ND CHOICE | 3RD CHOICE |
    1 | 4005 | 3904 | 5001 |
    2 | 6789 | 4678 | 9803 |

    and so on. The schools data set then looks like this:

    School | Ranking | % females |....
    4005 | 45 | 51 |
    3904 | 36 | 53 |
    5001 | 2 | 53 |
    6789 | 17 | 48 |

    and so on.

    I cannot think of any merge or append strategy that would work in this case.

    Thanks in advance for any help.
    Last edited by Lucy Kraftman; 04 Aug 2017, 04:31.

  • #2
    You can change the first dataset from wide to long, using the reshape command, and then use "School" to match the characteristics of each school.

    Code:
    Wide
    
    ID | 1ST CHOICE | 2ND CHOICE | 3RD CHOICE |
    1 | 4005 | 3904 | 5001 |
    2 | 6789 | 4678 | 9803 |
    
    Long
    
    ID | SCHOOL | CHOICE 
    1   |   4005      |    1
    1   |   3904      |    2
    1   |   5001      |    3
    2   |   6789      |    1
    and so on. Now we can clearly see how to match it with the second dataset.

    Comment


    • #3
      First, I am proposing to reshape your data.
      In order to Reshape, I propose to change the variables name (to a unique commun identification with number) in your first data:
      rename 1stCHOICE choice_1
      rename 2ndCHOICE choice_2
      rename 3rdCHOICE choice_3


      Then proceed to reshape command.

      reshape long choice_, i(id) j(date)


      Now you have your data reshaped.
      save it then merge with your main data.

      Comment


      • #4
        and without reshape:

        Code:
        ren 1st_choice school
        merge m:1 school using "university.dta"
        ren rank rank1st
        ren females females1st
        ren school 1st_choice
        
        ren 2nd_choice school
        merge m:1 school using "university.dta"
        ren rank rank2nd
        ren females females2nd
        ren school 2nd_choice
        
        ren 3rd_choice school
        merge m:1 school using "university.dta"
        ren rank rank3rd
        ren females females3rd
        ren school 3rd_choice
        you can shorten this with a loop

        Comment


        • #5
          Thanks for all the help, worked perfectly

          Comment


          • #6
            Hello,

            For a research I want to combine 3 datasets. First I want to combine the AuditAnalytics dataset with Compustat;North-America Daily;fundamentals annual (both from WRDS-Wharton). I took the years 2006 till 2009 and selected "CIK" for company codes. After selecting all the available variables, I wanted to merge these 2 datasets into 1 by using "CIK" as the main variable to base the merging on. However, when I try to merge, I get the message: "variable cik does not uniquely identify observations in the master data. How can I solve this?

            Thanks in advance for any help!

            Comment

            Working...
            X