  • Compare if the lists have similar contents

    Dear Statalisters,

    I have two datasets that contain industry codes, and I want to see how much they overlap. Below are the first few observations from each dataset.

    data one
    industryA
    1111A0
    1111B0
    111200
    111335
    1113A0
    111400
    111910
    111920
    1119A0
    1119B0
    112100

    data two
    industryB
    111200
    111335
    111400
    111910
    111920
    112100
    112300
    113300
    114100
    114200
    115000

    The number of observations is 470 for one and 450 for two.

    How should I join the two datasets?
    After that, should I use

    gen flag=0
    replace flag=1 if industryB~=industryA

    Thanks,
    Rochelle

  • #2
    I think what you want to do is:

    Code:
    use "data two", clear
    rename industryB industry
    duplicates drop
    tempfile holding
    save `holding'
    
    use "data one", clear
    rename industryA industry
    duplicates drop
    merge 1:1 industry using `holding'
    This will create a variable, _merge, that will tell you which observations came from data one alone, which from data two alone, and which are found in both.

    If you are not already familiar with the -merge- command, you should read up on it: it is one of the most useful commands for data management.
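
    As a minimal sketch of how the _merge variable described above can be read, the following builds two toy datasets from a few of the posted codes (the toy data and the shortened lists are assumptions for illustration, not the real files) and then tabulates the result. In _merge, 1 means master only, 2 means using only, and 3 means matched.

    ```stata
    * toy stand-in for "data two" (assumed subset of the posted codes)
    clear
    input str6 industry
    111200
    111335
    112300
    end
    tempfile holding
    save `holding'

    * toy stand-in for "data one" (assumed subset of the posted codes)
    clear
    input str6 industry
    1111A0
    111200
    111335
    1113A0
    end

    merge 1:1 industry using `holding'

    * overview of the three groups
    tabulate _merge

    * number of codes found in both lists
    count if _merge == 3
    display "codes in both lists: " r(N)

    * codes found in only one of the lists
    list industry if _merge == 1, noobs
    list industry if _merge == 2, noobs
    ```

    With these toy lists, two codes (111200 and 111335) are matched, two appear only in the master data, and one appears only in the using data.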


    • #3
      Code:
       
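      * documentation for the commands used below
      help levelsof
      help macrolists
      * collect the distinct codes from each dataset into local macros
      use data1, clear
      levelsof industryA, local(A)
      use data2, clear
      levelsof industryB, local(B)
      * codes that appear in both lists (use | instead of & for the union)
      local both : list A & B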
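
      The macro list operators can also separate the shared codes from those found in only one list. A small sketch, in which the two locals are hypothetical stand-ins for the levelsof results and hold only a few of the posted codes:

      ```stata
      * hypothetical stand-ins for the levelsof results
      local A 1111A0 111200 111335 1113A0
      local B 111200 111335 112300

      * intersection and the two set differences
      local shared : list A & B
      local onlyA  : list A - B
      local onlyB  : list B - A

      display "in both lists: `shared'"
      display "only in A:     `onlyA'"
      display "only in B:     `onlyB'"

      * number of codes in the overlap
      display "overlap count: `: list sizeof shared'"
      ```

      This approach keeps everything in macros, so it suits a quick check. The merge route is likely more convenient when the matched and unmatched codes are needed as a dataset.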


      • #4
        Hi Rochelle,

        With respect to comparing the datasets, you may consider using the cf command, which compares the dataset in memory to a dataset on disk. For more sophisticated comparisons, you may consider exploring related commands. For instance, cfvars lets you compare the variables available in both datasets, and CF3 can produce lists of observations missing between the two datasets. With respect to merging, I would expect that you can combine the data with the merge command and then use the values of the _merge variable to see whether each observation was matched or was present only in the master or the using dataset.
        Kind regards,
        Konrad
        Version: Stata/IC 13.1


        • #5
          Many thanks to Clyde, Nick and Konrad !!!!

          Best,
          Rochelle
