Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging datasets with different, misspelled, or changing identifiers?

    Hi Everyone,

    I am working with Stata 17 on Mac.

    I have project-level data on energy production across provinces in Canada. My goal is to do some analysis at the firm-month level across Canada by combining these datasets.

    My problem is that each province publishes its own data, so the identifiers are not consistent across datasets. Moreover, so datasets have changed their coding over time. For example, in one dataset, Shell is coded as "SHELL", while in the other it is coded as "Shell Canada Energy"

    Data from B.C.:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str16(OPERATOR well_id)
    "SHELL" "100162907819W600"
    "SHELL" "100090708018W600"
    end

    Data from Alberta:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str43 OPERATOR str11 BATTERY
    "Shell Canada Energy" "ABBT0147964"
    "Shell Canada Energy" "ABBT0143302"
    end

    The best solution I can think of is to manually recode all identifiers, but there are hundreds of firms across hundreds of months. So, I'm hoping there is another method that doesn't involve me manually scanning through all of the firm identifiers

    Thank you for all of your suggestions/solutions
    Jerome

  • #2
    there are at least 3 community-contributed programs that might help; use -search- to find and download: matchit, reclink, reclink2

    Comment


    • #3
      Before you use them, it helps to clean the data by using commands such as replace var = lower(var); replace var = strtrim(var), and trying to get common iterations of words to be the same (e.g., (replace var = subinstr(var, ", inc.", "", .) so the program doesn't have to deal with that difference)

      Comment


      • #4
        Thank you for these suggestions!

        Comment

        Working...
        X