Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge not lining up

    My apologies, I don't even know where to begin on how to use dataex to show my problem. I am merging two data sets. Each data set has a "TXid" number variable per year (2018, 2019, 2021, 2022). So each data set has 4 observations per TXid. I am merging these two datasets by TXid and year. So they are all unique. In my second data set, the TXid matches the name of the elementary school that that first data set has. For example, pretend the TXid is 2103 and the school name is Help Elem. BOTH data sets say this. They both have 2103 being help elementary. However, when I merge the two datasets, the second data set says that TXid now goes with Franklin Elementary while the first still has it as Help elementary. Any ideas why this is? How can I use dataex to show this? Everything seems perfect for a merge, I don't understand why this is happening.

  • #2
    Data set 1
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double TXidnew str15 cname int year
    1902001902103 "CAYUGA EL"       1718
    1902001902103 "CAYUGA ELEM."    2021
    1902001902103 "CAYUGA ELEM."    1819
    1902001902103 "CAYUGA EL"       2122
    1903001903102 "ELKHART INT"     1718
    1903001903102 "ELKHART INT"     2122
    1903001903102 "ELKHART INTERME" 2021
    1903001903102 "ELKHART INTERME" 1819
    1904001904102 "FRANKSTON EL"    2122
    1904001904102 "FRANKSTON ELEM." 1819
    1904001904102 "FRANKSTON EL"    1718
    1904001904102 "FRANKSTON ELEM." 2021
    1906001906102 "NECHES ELEM."    2021
    1906001906102 "NECHES EL"       1718
    1906001906102 "NECHES ELEM."    1819
    1906001906102 "NECHES EL"       2122
    1907001907110 "STORY INTERMEDI" 1819
    1907001907110 "STORY INT"       2122
    1907001907110 "STORY INT"       1718
    1907001907110 "STORY INTERMEDI" 2021
    1908001908101 "WESTWOOD ELEM."  1819
    1908001908101 "WESTWOOD EL"     2122
    1908001908101 "WESTWOOD EL"     1718
    1908001908101 "WESTWOOD ELEM."  2021
    1909001909101 "SLOCUM PK - 8"   2122
    1909001909101 "SLOCUM PRE-K -"  1819
    1909001909101 "SLOCUM PRE-K -"  2021
    1909001909101 "SLOCUM PK - 8"   1718
    2901002901104 "UNDERWOOD EL"    1718
    2901002901104 "UNDERWOOD ELEM." 2021
    2901002901104 "UNDERWOOD ELEM." 1819
    2901002901104 "UNDERWOOD EL"    2122
    3801003801103 "SARAH STRINDEN"  2021
    3801003801103 "SARAH STRINDEN"  1819
    3801003801103 "SARAH STRINDEN"  1718
    3801003801103 "SARAH STRINDEN"  2122
    3902003902102 "W H BONNER EL"   2122
    3902003902102 "W.H. BONNER ELE" 2021
    3902003902102 "W.H. BONNER ELE" 1819
    3902003902102 "W H BONNER EL"   1718
    3903003903123 "COSTON ELEM."    2021
    3903003903123 "COSTON EL"       2122
    3903003903122 "BROOKHOLLOW EL"  2122
    3903003903121 "BRANDON ELEM."   2021
    3903003903129 "SLACK EL"        2122
    3903003903123 "COSTON ELEM."    1819
    3903003903122 "BROOKHOLLOW ELE" 2021
    3903003903120 "ANDERSON EL"     2122
    3903003903121 "BRANDON EL"      2122
    3903003903123 "COSTON EL"       1718
    3903003903120 "ANDERSON ELEM."  1819
    3903003903121 "BRANDON EL"      1718
    3903003903129 "SLACK EL"        1718
    3903003903122 "BROOKHOLLOW ELE" 1819
    3903003903122 "BROOKHOLLOW EL"  1718
    3903003903129 "SLACK ELEM."     2021
    3903003903129 "SLACK ELEM."     1819
    3903003903120 "ANDERSON ELEM."  2021
    3903003903121 "BRANDON ELEM."   1819
    3903003903120 "ANDERSON EL"     1718
    3904003904102 "HUNTINGTON INTE" 2021
    3904003904102 "HUNTINGTON INT"  1718
    3904003904102 "HUNTINGTON INTE" 1819
    3904003904102 "HUNTINGTON INT"  2122
    3905003905103 "H G TEMPLE INT"  2122
    3905003905103 "H.G. TEMPLE INT" 1819
    3905003905103 "H.G. TEMPLE INT" 2021
    3905003905103 "H G TEMPLE INT"  1718
    3906003906101 "ZAVALLA EL"      1718
    3906003906101 "ZAVALLA ELEM."   1819
    3906003906101 "ZAVALLA ELEM."   2021
    3906003906101 "ZAVALLA EL"      2122
    3907003907041 "CENTRAL J H"     2122
    3907003907041 "CENTRAL JR.H."   1819
    3907003907041 "CENTRAL JR.H."   2021
    3907003907041 "CENTRAL J H"     1718
    4901004901101 "FULTON LEARNING" 2021
    4901004901101 "FULTON 4-5 LEAR" 1718
    4901004901101 "FULTON LEARNING" 1819
    4901004901101 "FULTON LEARNING" 2122
    5901005901101 "ARCHER CITY ELE" 1819
    5901005901101 "ARCHER CITY ELE" 2021
    5901005901101 "ARCHER CITY EL"  2122
    5901005901101 "ARCHER CITY EL"  1718
    5902005902101 "HOLLIDAY ELEM."  2021
    5902005902101 "HOLLIDAY ELEM."  1819
    5902005902101 "HOLLIDAY EL"     2122
    5902005902101 "HOLLIDAY EL"     1718
    5904005904101 "WINDTHORST ELEM" 2021
    5904005904101 "WINDTHORST ELEM" 1819
    5904005904101 "WINDTHORST EL"   2122
    5904005904101 "WINDTHORST EL"   1718
    6902006902001 "CLAUDE SCHOOLS"  2122
    6902006902101 "CLAUDE EL"       1718
    6902006902101 "CLAUDE ELEM."    1819
    6902006902001 "CLAUDE SCHOOLS"  2021
    7901007901101 "CHARLOTTE ELEM." 2021
    7901007901041 "CHARLOTTE M.S."  1819
    7901007901101 "CHARLOTTE EL"    2122
    7901007901041 "CHARLOTTE MIDDL" 1718
    end
    Data Set 2


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double TXidnew str50 schoolname int year
    1902001902001 "CAYUGA H S"                        1718
    1902001902001 "CAYUGA H S"                        1819
    1902001902001 "CAYUGA H S"                        2021
    1902001902001 "CAYUGA H S"                        2122
    1902001902041 "CAYUGA MIDDLE"                     1718
    1902001902041 "CAYUGA MIDDLE"                     1819
    1902001902041 "CAYUGA MIDDLE"                     2021
    1902001902041 "CAYUGA MIDDLE"                     2122
    1902001902103 "CAYUGA EL"                         1718
    1902001902103 "CAYUGA EL"                         1819
    1902001902103 "CAYUGA EL"                         2021
    1902001902103 "CAYUGA EL"                         2122
    1903001903001 "ELKHART H S"                       1718
    1903001903001 "ELKHART H S"                       1819
    1903001903001 "ELKHART H S"                       2021
    1903001903001 "ELKHART H S"                       2122
    1903001903041 "ELKHART MIDDLE"                    1718
    1903001903041 "ELKHART MIDDLE"                    1819
    1903001903041 "ELKHART MIDDLE"                    2021
    1903001903041 "ELKHART MIDDLE"                    2122
    1903001903101 "ELKHART EL"                        1718
    1903001903101 "ELKHART EL"                        1819
    1903001903101 "ELKHART EL"                        2021
    1903001903101 "ELKHART EL"                        2122
    1903001903102 "ELKHART INT"                       1718
    1903001903102 "ELKHART INT"                       1819
    1903001903102 "ELKHART INT"                       2021
    1903001903102 "ELKHART INT"                       2122
    1904001904001 "FRANKSTON H S"                     1718
    1904001904001 "FRANKSTON H S"                     1819
    1904001904001 "FRANKSTON H S"                     2021
    1904001904001 "FRANKSTON H S"                     2122
    1904001904041 "FRANKSTON MIDDLE"                  1718
    1904001904041 "FRANKSTON MIDDLE"                  1819
    1904001904041 "FRANKSTON MIDDLE"                  2021
    1904001904041 "FRANKSTON MIDDLE"                  2122
    1904001904102 "FRANKSTON EL"                      1718
    1904001904102 "FRANKSTON EL"                      1819
    1904001904102 "FRANKSTON EL"                      2021
    1904001904102 "FRANKSTON EL"                      2122
    1906001906002 "NECHES H S"                        1718
    1906001906002 "NECHES H S"                        1819
    1906001906002 "NECHES H S"                        2021
    1906001906002 "NECHES H S"                        2122
    1906001906102 "NECHES EL"                         1718
    1906001906102 "NECHES EL"                         1819
    1906001906102 "NECHES EL"                         2021
    1906001906102 "NECHES EL"                         2122
    1907001907001 "PALESTINE H S"                     1718
    1907001907001 "PALESTINE H S"                     1819
    1907001907001 "PALESTINE H S"                     2021
    1907001907001 "PALESTINE H S"                     2122
    1907001907041 "PALESTINE J H"                     1718
    1907001907041 "PALESTINE J H"                     1819
    1907001907041 "PALESTINE J H"                     2021
    1907001907041 "PALESTINE J H"                     2122
    1907001907101 "NORTHSIDE PRI"                     1718
    1907001907101 "NORTHSIDE PRI"                     1819
    1907001907101 "NORTHSIDE PRI"                     2021
    1907001907101 "NORTHSIDE PRI"                     2122
    1907001907102 "WASHINGTON EARLY CHILDHOOD CENTER" 1718
    1907001907102 "WASHINGTON EARLY CHILDHOOD CENTER" 1819
    1907001907102 "WASHINGTON EARLY CHILDHOOD CENTER" 2021
    1907001907102 "WASHINGTON EARLY CHILDHOOD CENTER" 2122
    1907001907107 "SOUTHSIDE EL"                      1718
    1907001907107 "SOUTHSIDE EL"                      1819
    1907001907107 "SOUTHSIDE EL"                      2021
    1907001907107 "SOUTHSIDE EL"                      2122
    1907001907110 "STORY INT"                         1718
    1907001907110 "STORY INT"                         1819
    1907001907110 "STORY INT"                         2021
    1907001907110 "STORY INT"                         2122
    1908001908002 "WESTWOOD H S"                      1718
    1908001908002 "WESTWOOD H S"                      1819
    1908001908002 "WESTWOOD H S"                      2021
    1908001908002 "WESTWOOD H S"                      2122
    1908001908041 "WESTWOOD J H"                      1718
    1908001908041 "WESTWOOD J H"                      1819
    1908001908041 "WESTWOOD J H"                      2021
    1908001908041 "WESTWOOD J H"                      2122
    1908001908101 "WESTWOOD EL"                       1718
    1908001908101 "WESTWOOD EL"                       1819
    1908001908101 "WESTWOOD EL"                       2021
    1908001908101 "WESTWOOD EL"                       2122
    1908001908102 "WESTWOOD PRI"                      1718
    1908001908102 "WESTWOOD PRI"                      1819
    1908001908102 "WESTWOOD PRI"                      2021
    1908001908102 "WESTWOOD PRI"                      2122
    1909001909001 "SLOCUM H S"                        1718
    1909001909001 "SLOCUM H S"                        1819
    1909001909001 "SLOCUM H S"                        2021
    1909001909001 "SLOCUM H S"                        2122
    1909001909101 "SLOCUM PK - 8"                     1718
    1909001909101 "SLOCUM PK - 8"                     1819
    1909001909101 "SLOCUM PK - 8"                     2021
    1909001909101 "SLOCUM PK - 8"                     2122
    2901002901001 "ANDREWS H S"                       1718
    2901002901001 "ANDREWS H S"                       1819
    2901002901001 "ANDREWS H S"                       2021
    2901002901001 "ANDREWS H S"                       2122
    end


    MERGED DATA SET

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double TXidnew str50 schoolname str15 cname int year
    1902001902103 "CAYUGA EL"                  "FRANKSTON EL"    1718
    1902001902103 "CAYUGA EL"                  "FRANKSTON ELEM." 1819
    1902001902103 "CAYUGA EL"                  "FRANKSTON ELEM." 2021
    1902001902103 "CAYUGA EL"                  "FRANKSTON EL"    2122
    1903001903102 "ELKHART INT"                "SLOCUM PK - 8"   1718
    1903001903102 "ELKHART INT"                "SLOCUM PRE-K -"  1819
    1903001903102 "ELKHART INT"                "SLOCUM PRE-K -"  2021
    1903001903102 "ELKHART INT"                "SLOCUM PK - 8"   2122
    1904001904102 "FRANKSTON EL"               "W H BONNER EL"   1718
    1904001904102 "FRANKSTON EL"               "W.H. BONNER ELE" 1819
    1904001904102 "FRANKSTON EL"               "W.H. BONNER ELE" 2021
    1904001904102 "FRANKSTON EL"               "W H BONNER EL"   2122
    1906001906102 "NECHES EL"                  "BRANDON EL"      1718
    1906001906102 "NECHES EL"                  "BRANDON ELEM."   1819
    1906001906102 "NECHES EL"                  "BRANDON ELEM."   2021
    1906001906102 "NECHES EL"                  "BRANDON EL"      2122
    1907001907110 "STORY INT"                  "ZAVALLA EL"      1718
    1907001907110 "STORY INT"                  "ZAVALLA ELEM."   1819
    1907001907110 "STORY INT"                  "ZAVALLA ELEM."   2021
    1907001907110 "STORY INT"                  "ZAVALLA EL"      2122
    1908001908101 "WESTWOOD EL"                "ARCHER CITY EL"  1718
    1908001908101 "WESTWOOD EL"                "ARCHER CITY ELE" 1819
    1908001908101 "WESTWOOD EL"                "ARCHER CITY ELE" 2021
    1908001908101 "WESTWOOD EL"                "ARCHER CITY EL"  2122
    1909001909101 "SLOCUM PK - 8"              ""                1718
    1909001909101 "SLOCUM PK - 8"              ""                1819
    1909001909101 "SLOCUM PK - 8"              "CLAUDE SCHOOLS"  2021
    1909001909101 "SLOCUM PK - 8"              "CLAUDE SCHOOLS"  2122
    2901002901104 "UNDERWOOD EL"               ""                1718
    2901002901104 "UNDERWOOD EL"               ""                1819
    2901002901104 "UNDERWOOD EL"               "ATASCOSA COUNTY" 2021
    2901002901104 "UNDERWOOD EL"               ""                2122
    3801003801103 "SARAH STRINDEN EL"          "O'BRYANT INT"    1718
    3801003801103 "SARAH STRINDEN EL"          "O'BRYANT INTERM" 1819
    3801003801103 "SARAH STRINDEN EL"          "O'BRYANT INTERM" 2021
    3801003801103 "SARAH STRINDEN EL"          "O'BRYANT INT"    2122
    3902003902102 "W H BONNER EL"              ""                1718
    3902003902102 "W H BONNER EL"              "SEALY ELEM."     1819
    3902003902102 "W H BONNER EL"              "SEALY ELEM."     2021
    3902003902102 "W H BONNER EL"              "SEALY EL"        2122
    3903003903120 "ANDERSON EL"                "MARY DESHAZO EL" 1718
    3903003903120 "ANDERSON EL"                "MARY DESHAZO EL" 1819
    3903003903120 "ANDERSON EL"                "MARY DESHAZO EL" 2021
    3903003903120 "ANDERSON EL"                "MARY DESHAZO EL" 2122
    3903003903121 "BRANDON EL"                 "MEDINA SCHOOL"   1718
    3903003903121 "BRANDON EL"                 "MEDINA SCHOOL"   1819
    3903003903121 "BRANDON EL"                 "MEDINA SCHOOL"   2021
    3903003903121 "BRANDON EL"                 "MEDINA SCHOOL"   2122
    3903003903122 "BROOKHOLLOW EL"             "ALKEK EL"        1718
    3903003903122 "BROOKHOLLOW EL"             "ALKEK ELEM."     1819
    3903003903122 "BROOKHOLLOW EL"             "ALKEK ELEM."     2021
    3903003903122 "BROOKHOLLOW EL"             "ALKEK EL"        2122
    3903003903123 "COSTON EL"                  "HILL COUNTRY EL" 1718
    3903003903123 "COSTON EL"                  "HILL COUNTRY EL" 1819
    3903003903123 "COSTON EL"                  "HILL COUNTRY EL" 2021
    3903003903123 "COSTON EL"                  "HILL COUNTRY EL" 2122
    3903003903129 "SLACK EL"                   "NEIDIG EL"       1718
    3903003903129 "SLACK EL"                   "NEIDIG ELEM."    1819
    3903003903129 "SLACK EL"                   ""                2021
    3903003903129 "SLACK EL"                   ""                2122
    3904003904102 "HUNTINGTON INT"             ""                1718
    3904003904102 "HUNTINGTON INT"             "FADDEN-MCKEOWN-" 1819
    3904003904102 "HUNTINGTON INT"             "FADDEN-MCKEOWN-" 2021
    3904003904102 "HUNTINGTON INT"             "FADDEN-MCKEOWN-" 2122
    3905003905103 "H G TEMPLE INT"             "GEORGETOWN CHAR" 1718
    3905003905103 "H G TEMPLE INT"             "GEORGETOWN CHAR" 1819
    3905003905103 "H G TEMPLE INT"             "GEORGETOWN CHAR" 2021
    3905003905103 "H G TEMPLE INT"             "GEORGETOWN CHAR" 2122
    3906003906101 "ZAVALLA EL"                 "CEDAR PARK CHAR" 1718
    3906003906101 "ZAVALLA EL"                 "CEDAR PARK CHAR" 1819
    3906003906101 "ZAVALLA EL"                 "CEDAR PARK CHAR" 2021
    3906003906101 "ZAVALLA EL"                 "CEDAR PARK CHAR" 2122
    3907003907041 "CENTRAL J H"                "NEW HORIZONS"    1718
    3907003907041 "CENTRAL J H"                "NEW HORIZONS"    1819
    3907003907041 "CENTRAL J H"                "NEW HORIZONS"    2021
    3907003907041 "CENTRAL J H"                "NEW HORIZONS"    2122
    4901004901101 "FULTON 4-5 LEARNING CENTER" "ACADEMY INT"     1718
    4901004901101 "FULTON LEARNING CENTER"     "ACAD. INTERMEDI" 1819
    4901004901101 "FULTON LEARNING CENTER"     "ACAD. INTERMEDI" 2021
    4901004901101 "FULTON LEARNING CENTER"     "ACADEMY INT"     2122
    5901005901101 "ARCHER CITY EL"             "SOUTHWEST EL"    1718
    5901005901101 "ARCHER CITY EL"             "SOUTHWEST ELEM." 1819
    5901005901101 "ARCHER CITY EL"             "SOUTHWEST ELEM." 2021
    5901005901101 "ARCHER CITY EL"             "SOUTHWEST EL"    2122
    5902005902101 "HOLLIDAY EL"                "JOE M PIRTLE EL" 1718
    5902005902101 "HOLLIDAY EL"                "JOE M. PIRTLE E" 1819
    5902005902101 "HOLLIDAY EL"                "JOE M. PIRTLE E" 2021
    5902005902101 "HOLLIDAY EL"                "JOE M PIRTLE EL" 2122
    5904005904101 "WINDTHORST EL"              "CHISHOLM TRAIL"  1718
    5904005904101 "WINDTHORST EL"              "CHISHOLM TRAIL"  1819
    5904005904101 "WINDTHORST EL"              "CHISHOLM TRAIL"  2021
    5904005904101 "WINDTHORST EL"              "CHISHOLM TRAIL"  2122
    6902006902001 "CLAUDE SCHOOLS"             "CHARTER OAK ELE" 2021
    6902006902001 "CLAUDE SCHOOLS"             "CHARTER OAK EL"  2122
    6902006902101 "CLAUDE EL"                  "HOLLAND EL"      1718
    6902006902101 "CLAUDE EL"                  "HOLLAND ELEM."   1819
    7901007901041 "CHARLOTTE MIDDLE"           "EAST WARD EL"    1718
    7901007901041 "CHARLOTTE MIDDLE"           "EAST WARD ELEM." 1819
    7901007901101 "CHARLOTTE EL"               "HARKER HEIGHTS"  2021
    7901007901101 "CHARLOTTE EL"               "HARKER HEIGHTS"  2122
    end

    Comment


    • #3
      Sorry, I should have included this all in one post. Look at data set 1 and 2, they both have cayuga elementary with id number 1902001902103. But when I merge them dataset 1 decides that ID number is Frankston Elem. data set 1 uses the variable name "cname" for the school name.

      Comment


      • #4
        Why do you not present any code for how you got this result?

        Comment


        • #5
          I honestly don't know what more I could put?

          merge 1:1 TXidnew year using "dataset2" ?

          I am sorry, I have been on here a handful of times over the last couple of years, and have decided im a perpetual novice at stata.

          Comment


          • #6
            ok, I know this is very basic, but the problem was I guess it was supposed to be a many to one merge. I really thought it would be 1:1 but doing m:1 gave me the correct results.

            Comment


            • #7
              I'm glad you seem to have solved your problem. But I have my doubts. First, unless you data sets are very small, it is really unreliable to decide whether the matching has gone correctly by visual scanning. It is very easy to miss things.

              More important, in your example data, TXidnew and year do uniquely identify observations in both data sets. They are, in that respect, compatible with a 1:1 -merge-. In that situation, just changing to a m:1 -merge- will not change the results at all. If, on the other hand, in the full data sets there really is a need for an m:1 -merge- (because TXnewid and year do not actually uniquely identify observations), your attempt to do a 1:1 -merge- would not have produced the mismatched results you experienced: you would have gotten an error message stating that problem and there would have been no -merge- results at all.

              Finally, I will also note that the association between TXidnew and cname in example dataset 1 is not, in fact, internally consistent. Terms like "ELEMENTARY" and the like appear indiscriminately spelled out in full, or abbreviated to ELEM or EL, with or without periods throughout the data set. Moreover, TXidnew 7901007901041 is associated with four different cnames, two of which imply it is an elementary school, and two of which imply it is a middle school. Perhaps it is a combined 1-8 school, and this is harmless. But are you sure that this TXidnew is not miscoding two different schools as one? These inconsistencies internal to one dataset, assuming they do not amount to assigning different actual schools to the same number, should not affect the correspondence across the two data sets. But if I can so easily find these inconsistencies in the brief example data, I am inclined to believe that more serious data errors might be found in the larger data sets.

              You have at no point shown us the code that you used to create the unsuccessful, nor the putatively successful -merge- commands. But I do not believe the problem is as you have described it. If I were you, I would not trust the result you have obtained with -merge m:1-. I think that the problem has not yet been solved, and I would post back showing the exact code you used.
              Last edited by Clyde Schechter; 04 Mar 2023, 11:05.

              Comment

              Working...
              X