Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • No sufficient identifier variables for merge (Follow up)

    Hi! I am looking to merge two datasets with the goal of including the variables size and group_numb (number of groups) from the using dataset in the master dataset. The only common variables between the two are country and year, which do not uniqiely identify observations in neither dataset. The variable groupid could identify the values for size and group_numb in the using dataset but this variable is not present in the master dataset. I am new to stata so I am currently at a loss as to how to merge these datasets. Thanks for the help!

  • #2
    Presumably *something* enables you as a human being to say "I want the variable value from *this* observation in the using data set to go with *that* observation in the master data set." If you can explain to us how you determine that matchup, and show us relevant data examples, per the StataList FAQ, for both the using and master file, your chance of getting some help in getting Stata to recognize this matchup would be good. From your posting, it almost sounds like *nothing* would distinctly link observations in the two files, a situation that would make a merge impossible, so let's hope that impression is wrong.

    Comment


    • #3
      The first question is, if country and year do not uniquely identify observations in either data set, is this a fact about the real data or does this show that your data set contains errors? In both data sets, I recommend you run
      Code:
      duplicates tag country year, gen(flag)
      browse if flag
      This will show you the offending observations, and it may be that you will find that they are either pure duplicates (they agree on all observations), represent incomplete observations that when put together would reduce to a single observation, contain only missing values, or are contradictory observations (i.e. contain real data that disagree on other variables). In the last case you have a serious problem that requires you to ascertain which, if any, of those observations is actually correct and get rid of the others (or combine them in some way to come up with correct data). In the first three cases, it is fairly straightforward to get rid of the surplus observations, although the presence of these data anomalies suggests that a thorough review of the data management that created the data set is in order. (Where one error has been found, others may lurk.)

      Or, you may find that the data in both sets is correct. In that case, the very notion of merging these data sets is ill-defined. You must make it more precise. If country and year do not uniquely identify the observations in either data set, you must answer this question: if you were doing this by hand, how would you decide, given, say 3 observations for Zaire 2015 in one data set and 4 such observations in the other, how many observations for Zaire 2015 should there be in the result, and which observations from the first data set would be paired with which observations in the second. As this can be difficult to describe in words, I suggest that when you post back, you also provide example data from both data sets (including some observations that should pair up and some others that shouldn't) and then also show what the result would look like. Be sure to use -dataex- to show the examples.

      If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      Added: Crossed with #2, where Mike Lacey has said pretty much the same things, albeit more concisely.

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        The first question is, if country and year do not uniquely identify observations in either data set, is this a fact about the real data or does this show that your data set contains errors? In both data sets, I recommend you run
        Code:
        duplicates tag country year, gen(flag)
        browse if flag
        This will show you the offending observations, and it may be that you will find that they are either pure duplicates (they agree on all observations), represent incomplete observations that when put together would reduce to a single observation, contain only missing values, or are contradictory observations (i.e. contain real data that disagree on other variables). In the last case you have a serious problem that requires you to ascertain which, if any, of those observations is actually correct and get rid of the others (or combine them in some way to come up with correct data). In the first three cases, it is fairly straightforward to get rid of the surplus observations, although the presence of these data anomalies suggests that a thorough review of the data management that created the data set is in order. (Where one error has been found, others may lurk.)

        Or, you may find that the data in both sets is correct. In that case, the very notion of merging these data sets is ill-defined. You must make it more precise. If country and year do not uniquely identify the observations in either data set, you must answer this question: if you were doing this by hand, how would you decide, given, say 3 observations for Zaire 2015 in one data set and 4 such observations in the other, how many observations for Zaire 2015 should there be in the result, and which observations from the first data set would be paired with which observations in the second. As this can be difficult to describe in words, I suggest that when you post back, you also provide example data from both data sets (including some observations that should pair up and some others that shouldn't) and then also show what the result would look like. Be sure to use -dataex- to show the examples.

        If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

        Added: Crossed with #2, where Mike Lacey has said pretty much the same things, albeit more concisely.
        Thank you for your reply!

        here are examples from both datasets:

        First, the master dataset: (variables such as violence repress and escalation addes just for the sake of a more complete example)

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input double eventid int ccode str37 country float(year violence) byte(repress escalation)
        6150002 615 "Algeria"                  1990 0 0 0
        6150005 615 "Algeria"                  1990 0 0 0
        6150004 615 "Algeria"                  1990 1 0 0
        6150001 615 "Algeria"                  1990 0 0 0
        6150003 615 "Algeria"                  1990 0 0 0
        5400002 540 "Angola"                   1990 0 0 0
        5400003 540 "Angola"                   1990 1 0 0
        4340001 434 "Benin"                    1990 0 0 0
        4340002 434 "Benin"                    1990 0 0 0
        4820001 482 "Central African Republic" 1990 0 0 0
        4370001 437 "Cote d'Ivoire"            1990 0 1 0
        4370009 437 "Cote d'Ivoire"            1990 1 0 0
        4370007 437 "Cote d'Ivoire"            1990 0 1 0
        4370005 437 "Cote d'Ivoire"            1990 0 0 0
        4370006 437 "Cote d'Ivoire"            1990 0 1 4
        4370002 437 "Cote d'Ivoire"            1990 0 0 0
        4370004 437 "Cote d'Ivoire"            1990 0 0 4
        4370008 437 "Cote d'Ivoire"            1990 1 0 0
        6510007 651 "Egypt"                    1990 1 2 0
        6510003 651 "Egypt"                    1990 1 0 0
        6510002 651 "Egypt"                    1990 0 2 4
        6510004 651 "Egypt"                    1990 1 1 0
        6510005 651 "Egypt"                    1990 1 0 0
        6510008 651 "Egypt"                    1990 1 2 0
        6510006 651 "Egypt"                    1990 0 2 4
        6510001 651 "Egypt"                    1990 0 1 0
        5300005 530 "Ethiopia"                 1990 0 1 0
        4810004 481 "Gabon"                    1990 1 1 0
        4810001 481 "Gabon"                    1990 1 0 0
        4810004 481 "Gabon"                    1990 1 1 0
        4810002 481 "Gabon"                    1990 0 0 0
        4810003 481 "Gabon"                    1990 1 0 0
        5010002 501 "Kenya"                    1990 0 2 4
        5010002 501 "Kenya"                    1990 0 2 4
        5010001 501 "Kenya"                    1990 0 1 4
        5010001 501 "Kenya"                    1990 0 1 4
        5010003 501 "Kenya"                    1990 1 0 0
        4500002 450 "Liberia"                  1990 1 0 0
        4500003 450 "Liberia"                  1990 0 1 0
        6200002 620 "Libya"                    1990 0 0 0
        6200001 620 "Libya"                    1990 0 0 0
        5800001 580 "Madagascar"               1990 1 2 2
        4360003 436 "Niger"                    1990 0 0 0
        4360001 436 "Niger"                    1990 0 2 0
        4510001 451 "Sierra Leone"             1990 0 2 4
        5200006 520 "Somalia"                  1990 1 0 0
        5200003 520 "Somalia"                  1990 0 2 0
        5200005 520 "Somalia"                  1990 0 0 0
        5600028 560 "South Africa"             1990 0 1 4
        5600051 560 "South Africa"             1990 1 0 0
        5600010 560 "South Africa"             1990 0 2 4
        5600015 560 "South Africa"             1990 0 2 4
        5600064 560 "South Africa"             1990 0 0 0
        5600046 560 "South Africa"             1990 0 0 0
        5600039 560 "South Africa"             1990 0 1 0
        5600064 560 "South Africa"             1990 0 0 0
        5600009 560 "South Africa"             1990 0 1 4
        5600042 560 "South Africa"             1990 0 2 0
        5600001 560 "South Africa"             1990 0 0 4
        5600005 560 "South Africa"             1990 0 0 0
        5600040 560 "South Africa"             1990 1 0 0
        5600033 560 "South Africa"             1990 1 2 4
        5600030 560 "South Africa"             1990 1 2 0
        5600048 560 "South Africa"             1990 1 0 0
        5600021 560 "South Africa"             1990 1 0 0
        5600052 560 "South Africa"             1990 1 0 0
        5600022 560 "South Africa"             1990 0 0 4
        5600037 560 "South Africa"             1990 1 0 0
        5600036 560 "South Africa"             1990 0 0 0
        5600038 560 "South Africa"             1990 0 0 0
        5600055 560 "South Africa"             1990 1 0 0
        5600013 560 "South Africa"             1990 0 0 4
        5600058 560 "South Africa"             1990 1 0 0
        5600044 560 "South Africa"             1990 1 1 4
        5600003 560 "South Africa"             1990 0 0 4
        5600044 560 "South Africa"             1990 1 1 4
        5600008 560 "South Africa"             1990 0 1 0
        5600066 560 "South Africa"             1990 0 0 0
        5600044 560 "South Africa"             1990 1 1 4
        5600027 560 "South Africa"             1990 0 0 0
        5600043 560 "South Africa"             1990 1 0 0
        5600037 560 "South Africa"             1990 1 0 0
        5600035 560 "South Africa"             1990 1 0 0
        5600054 560 "South Africa"             1990 1 0 0
        5600049 560 "South Africa"             1990 1 0 0
        5600047 560 "South Africa"             1990 0 1 0
        5600002 560 "South Africa"             1990 0 1 0
        5600026 560 "South Africa"             1990 1 2 0
        5600059 560 "South Africa"             1990 1 0 0
        5600045 560 "South Africa"             1990 0 0 0
        5600067 560 "South Africa"             1990 0 0 0
        5600056 560 "South Africa"             1990 1 0 0
        5600017 560 "South Africa"             1990 0 0 0
        5600063 560 "South Africa"             1990 1 0 0
        5600023 560 "South Africa"             1990 1 0 0
        5600053 560 "South Africa"             1990 1 0 0
        5600025 560 "South Africa"             1990 0 0 0
        5600016 560 "South Africa"             1990 1 0 0
        5600004 560 "South Africa"             1990 0 1 0
        5600061 560 "South Africa"             1990 1 0 0
        end

        The using dataset:
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input long groupid str84 group str37 country float(year size group_numb)
        1000 "Balanta"      "Guinea-Bissau" 1990   .3 4
        1000 "Balanta"      "Guinea-Bissau" 1991   .3 4
        1000 "Balanta"      "Guinea-Bissau" 1992   .3 4
        1000 "Balanta"      "Guinea-Bissau" 1993   .3 4
        1000 "Balanta"      "Guinea-Bissau" 1994   .3 4
        1000 "Balanta"      "Guinea-Bissau" 1995   .3 4
        1000 "Balanta"      "Guinea-Bissau" 1996   .3 4
        1000 "Balanta"      "Guinea-Bissau" 1997   .3 4
        1000 "Balanta"      "Guinea-Bissau" 1998   .3 4
        1000 "Balanta"      "Guinea-Bissau" 1999   .3 4
        3000 "Manjaco"      "Guinea-Bissau" 1990  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 1991  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 1992  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 1993  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 1994  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 1995  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 1996  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 1997  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 1998  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 1999  .14 4
        5000 "Papel"        "Guinea-Bissau" 1990  .07 4
        5000 "Papel"        "Guinea-Bissau" 1991  .07 4
        5000 "Papel"        "Guinea-Bissau" 1992  .07 4
        5000 "Papel"        "Guinea-Bissau" 1993  .07 4
        5000 "Papel"        "Guinea-Bissau" 1994  .07 4
        5000 "Papel"        "Guinea-Bissau" 1995  .07 4
        5000 "Papel"        "Guinea-Bissau" 1996  .07 4
        5000 "Papel"        "Guinea-Bissau" 1997  .07 4
        5000 "Papel"        "Guinea-Bissau" 1998  .07 4
        5000 "Papel"        "Guinea-Bissau" 1999  .07 4
        6000 "Cape Verdean" "Guinea-Bissau" 1990  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 1991  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 1992  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 1993  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 1994  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 1995  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 1996  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 1997  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 1998  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 1999  .02 4
        1000 "Balanta"      "Guinea-Bissau" 2000   .3 4
        1000 "Balanta"      "Guinea-Bissau" 2001   .3 4
        1000 "Balanta"      "Guinea-Bissau" 2002   .3 4
        1000 "Balanta"      "Guinea-Bissau" 2003   .3 4
        1000 "Balanta"      "Guinea-Bissau" 2004   .3 4
        1000 "Balanta"      "Guinea-Bissau" 2005   .3 4
        3000 "Manjaco"      "Guinea-Bissau" 2000  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 2001  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 2002  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 2003  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 2004  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 2005  .14 4
        5000 "Papel"        "Guinea-Bissau" 2000  .07 4
        5000 "Papel"        "Guinea-Bissau" 2001  .07 4
        5000 "Papel"        "Guinea-Bissau" 2002  .07 4
        5000 "Papel"        "Guinea-Bissau" 2003  .07 4
        5000 "Papel"        "Guinea-Bissau" 2004  .07 4
        5000 "Papel"        "Guinea-Bissau" 2005  .07 4
        6000 "Cape Verdean" "Guinea-Bissau" 2000  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 2001  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 2002  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 2003  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 2004  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 2005  .02 4
        1000 "Balanta"      "Guinea-Bissau" 2006   .3 4
        1000 "Balanta"      "Guinea-Bissau" 2007   .3 4
        1000 "Balanta"      "Guinea-Bissau" 2008   .3 4
        1000 "Balanta"      "Guinea-Bissau" 2009   .3 4
        3000 "Manjaco"      "Guinea-Bissau" 2006  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 2007  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 2008  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 2009  .14 4
        5000 "Papel"        "Guinea-Bissau" 2006  .07 4
        5000 "Papel"        "Guinea-Bissau" 2007  .07 4
        5000 "Papel"        "Guinea-Bissau" 2008  .07 4
        5000 "Papel"        "Guinea-Bissau" 2009  .07 4
        6000 "Cape Verdean" "Guinea-Bissau" 2006  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 2007  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 2008  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 2009  .02 4
        1000 "Balanta"      "Guinea-Bissau" 2010   .3 4
        1000 "Balanta"      "Guinea-Bissau" 2011   .3 4
        1000 "Balanta"      "Guinea-Bissau" 2012   .3 4
        3000 "Manjaco"      "Guinea-Bissau" 2010  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 2011  .14 4
        3000 "Manjaco"      "Guinea-Bissau" 2012  .14 4
        5000 "Papel"        "Guinea-Bissau" 2010  .07 4
        5000 "Papel"        "Guinea-Bissau" 2011  .07 4
        5000 "Papel"        "Guinea-Bissau" 2012  .07 4
        6000 "Cape Verdean" "Guinea-Bissau" 2010  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 2011  .02 4
        6000 "Cape Verdean" "Guinea-Bissau" 2012  .02 4
        1000 "Balanta"      "Guinea-Bissau" 2013   .3 4
        3000 "Manjaco"      "Guinea-Bissau" 2013  .14 4
        5000 "Papel"        "Guinea-Bissau" 2013  .07 4
        6000 "Cape Verdean" "Guinea-Bissau" 2013  .02 4
        1000 "Mandinka"     "Gambia"        1990 .395 5
        1000 "Mandinka"     "Gambia"        1991 .395 5
        1000 "Mandinka"     "Gambia"        1992 .395 5
        1000 "Mandinka"     "Gambia"        1993 .395 5
        end


        What I aim for the merged dataset to look like is that the number of groups as well as their sizes are added to fit their respective countries and years in the master dataset so for example something like this:
        4370005 "Cote d'Ivoire" 1990 0 0 0 .07(some group's size in that country, at that time) 4 (the number of groups in that country, at that time).

        What seems important are the groupid variable in the using dataset and the eventid in the master dataset. I am thinking that I may need to create a sort of new identifier variable in each dataset to merge but I dont really know how to do this.
        Last edited by Santiago Molano; 07 Jun 2023, 23:17.

        Comment


        • #5
          Originally posted by Mike Lacy View Post
          Presumably *something* enables you as a human being to say "I want the variable value from *this* observation in the using data set to go with *that* observation in the master data set." If you can explain to us how you determine that matchup, and show us relevant data examples, per the StataList FAQ, for both the using and master file, your chance of getting some help in getting Stata to recognize this matchup would be good. From your posting, it almost sounds like *nothing* would distinctly link observations in the two files, a situation that would make a merge impossible, so let's hope that impression is wrong.
          Hello and thank you for reaching out!

          I will provide examples of the data here

          First, the master dataset: (variables such as violence repress and escalation addes just for the sake of a more complete example)

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input double eventid int ccode str37 country float(year violence) byte(repress escalation)
          6150002 615 "Algeria"                  1990 0 0 0
          6150005 615 "Algeria"                  1990 0 0 0
          6150004 615 "Algeria"                  1990 1 0 0
          6150001 615 "Algeria"                  1990 0 0 0
          6150003 615 "Algeria"                  1990 0 0 0
          5400002 540 "Angola"                   1990 0 0 0
          5400003 540 "Angola"                   1990 1 0 0
          4340001 434 "Benin"                    1990 0 0 0
          4340002 434 "Benin"                    1990 0 0 0
          4820001 482 "Central African Republic" 1990 0 0 0
          4370001 437 "Cote d'Ivoire"            1990 0 1 0
          4370009 437 "Cote d'Ivoire"            1990 1 0 0
          4370007 437 "Cote d'Ivoire"            1990 0 1 0
          4370005 437 "Cote d'Ivoire"            1990 0 0 0
          4370006 437 "Cote d'Ivoire"            1990 0 1 4
          4370002 437 "Cote d'Ivoire"            1990 0 0 0
          4370004 437 "Cote d'Ivoire"            1990 0 0 4
          4370008 437 "Cote d'Ivoire"            1990 1 0 0
          6510007 651 "Egypt"                    1990 1 2 0
          6510003 651 "Egypt"                    1990 1 0 0
          6510002 651 "Egypt"                    1990 0 2 4
          6510004 651 "Egypt"                    1990 1 1 0
          6510005 651 "Egypt"                    1990 1 0 0
          6510008 651 "Egypt"                    1990 1 2 0
          6510006 651 "Egypt"                    1990 0 2 4
          6510001 651 "Egypt"                    1990 0 1 0
          5300005 530 "Ethiopia"                 1990 0 1 0
          4810004 481 "Gabon"                    1990 1 1 0
          4810001 481 "Gabon"                    1990 1 0 0
          4810004 481 "Gabon"                    1990 1 1 0
          4810002 481 "Gabon"                    1990 0 0 0
          4810003 481 "Gabon"                    1990 1 0 0
          5010002 501 "Kenya"                    1990 0 2 4
          5010002 501 "Kenya"                    1990 0 2 4
          5010001 501 "Kenya"                    1990 0 1 4
          5010001 501 "Kenya"                    1990 0 1 4
          5010003 501 "Kenya"                    1990 1 0 0
          4500002 450 "Liberia"                  1990 1 0 0
          4500003 450 "Liberia"                  1990 0 1 0
          6200002 620 "Libya"                    1990 0 0 0
          6200001 620 "Libya"                    1990 0 0 0
          5800001 580 "Madagascar"               1990 1 2 2
          4360003 436 "Niger"                    1990 0 0 0
          4360001 436 "Niger"                    1990 0 2 0
          4510001 451 "Sierra Leone"             1990 0 2 4
          5200006 520 "Somalia"                  1990 1 0 0
          5200003 520 "Somalia"                  1990 0 2 0
          5200005 520 "Somalia"                  1990 0 0 0
          5600028 560 "South Africa"             1990 0 1 4
          5600051 560 "South Africa"             1990 1 0 0
          5600010 560 "South Africa"             1990 0 2 4
          5600015 560 "South Africa"             1990 0 2 4
          5600064 560 "South Africa"             1990 0 0 0
          5600046 560 "South Africa"             1990 0 0 0
          5600039 560 "South Africa"             1990 0 1 0
          5600064 560 "South Africa"             1990 0 0 0
          5600009 560 "South Africa"             1990 0 1 4
          5600042 560 "South Africa"             1990 0 2 0
          5600001 560 "South Africa"             1990 0 0 4
          5600005 560 "South Africa"             1990 0 0 0
          5600040 560 "South Africa"             1990 1 0 0
          5600033 560 "South Africa"             1990 1 2 4
          5600030 560 "South Africa"             1990 1 2 0
          5600048 560 "South Africa"             1990 1 0 0
          5600021 560 "South Africa"             1990 1 0 0
          5600052 560 "South Africa"             1990 1 0 0
          5600022 560 "South Africa"             1990 0 0 4
          5600037 560 "South Africa"             1990 1 0 0
          5600036 560 "South Africa"             1990 0 0 0
          5600038 560 "South Africa"             1990 0 0 0
          5600055 560 "South Africa"             1990 1 0 0
          5600013 560 "South Africa"             1990 0 0 4
          5600058 560 "South Africa"             1990 1 0 0
          5600044 560 "South Africa"             1990 1 1 4
          5600003 560 "South Africa"             1990 0 0 4
          5600044 560 "South Africa"             1990 1 1 4
          5600008 560 "South Africa"             1990 0 1 0
          5600066 560 "South Africa"             1990 0 0 0
          5600044 560 "South Africa"             1990 1 1 4
          5600027 560 "South Africa"             1990 0 0 0
          5600043 560 "South Africa"             1990 1 0 0
          5600037 560 "South Africa"             1990 1 0 0
          5600035 560 "South Africa"             1990 1 0 0
          5600054 560 "South Africa"             1990 1 0 0
          5600049 560 "South Africa"             1990 1 0 0
          5600047 560 "South Africa"             1990 0 1 0
          5600002 560 "South Africa"             1990 0 1 0
          5600026 560 "South Africa"             1990 1 2 0
          5600059 560 "South Africa"             1990 1 0 0
          5600045 560 "South Africa"             1990 0 0 0
          5600067 560 "South Africa"             1990 0 0 0
          5600056 560 "South Africa"             1990 1 0 0
          5600017 560 "South Africa"             1990 0 0 0
          5600063 560 "South Africa"             1990 1 0 0
          5600023 560 "South Africa"             1990 1 0 0
          5600053 560 "South Africa"             1990 1 0 0
          5600025 560 "South Africa"             1990 0 0 0
          5600016 560 "South Africa"             1990 1 0 0
          5600004 560 "South Africa"             1990 0 1 0
          5600061 560 "South Africa"             1990 1 0 0
          end

          The using dataset:
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input long groupid str84 group str37 country float(year size group_numb)
          1000 "Balanta"      "Guinea-Bissau" 1990   .3 4
          1000 "Balanta"      "Guinea-Bissau" 1991   .3 4
          1000 "Balanta"      "Guinea-Bissau" 1992   .3 4
          1000 "Balanta"      "Guinea-Bissau" 1993   .3 4
          1000 "Balanta"      "Guinea-Bissau" 1994   .3 4
          1000 "Balanta"      "Guinea-Bissau" 1995   .3 4
          1000 "Balanta"      "Guinea-Bissau" 1996   .3 4
          1000 "Balanta"      "Guinea-Bissau" 1997   .3 4
          1000 "Balanta"      "Guinea-Bissau" 1998   .3 4
          1000 "Balanta"      "Guinea-Bissau" 1999   .3 4
          3000 "Manjaco"      "Guinea-Bissau" 1990  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 1991  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 1992  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 1993  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 1994  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 1995  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 1996  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 1997  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 1998  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 1999  .14 4
          5000 "Papel"        "Guinea-Bissau" 1990  .07 4
          5000 "Papel"        "Guinea-Bissau" 1991  .07 4
          5000 "Papel"        "Guinea-Bissau" 1992  .07 4
          5000 "Papel"        "Guinea-Bissau" 1993  .07 4
          5000 "Papel"        "Guinea-Bissau" 1994  .07 4
          5000 "Papel"        "Guinea-Bissau" 1995  .07 4
          5000 "Papel"        "Guinea-Bissau" 1996  .07 4
          5000 "Papel"        "Guinea-Bissau" 1997  .07 4
          5000 "Papel"        "Guinea-Bissau" 1998  .07 4
          5000 "Papel"        "Guinea-Bissau" 1999  .07 4
          6000 "Cape Verdean" "Guinea-Bissau" 1990  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 1991  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 1992  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 1993  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 1994  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 1995  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 1996  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 1997  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 1998  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 1999  .02 4
          1000 "Balanta"      "Guinea-Bissau" 2000   .3 4
          1000 "Balanta"      "Guinea-Bissau" 2001   .3 4
          1000 "Balanta"      "Guinea-Bissau" 2002   .3 4
          1000 "Balanta"      "Guinea-Bissau" 2003   .3 4
          1000 "Balanta"      "Guinea-Bissau" 2004   .3 4
          1000 "Balanta"      "Guinea-Bissau" 2005   .3 4
          3000 "Manjaco"      "Guinea-Bissau" 2000  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 2001  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 2002  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 2003  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 2004  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 2005  .14 4
          5000 "Papel"        "Guinea-Bissau" 2000  .07 4
          5000 "Papel"        "Guinea-Bissau" 2001  .07 4
          5000 "Papel"        "Guinea-Bissau" 2002  .07 4
          5000 "Papel"        "Guinea-Bissau" 2003  .07 4
          5000 "Papel"        "Guinea-Bissau" 2004  .07 4
          5000 "Papel"        "Guinea-Bissau" 2005  .07 4
          6000 "Cape Verdean" "Guinea-Bissau" 2000  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 2001  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 2002  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 2003  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 2004  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 2005  .02 4
          1000 "Balanta"      "Guinea-Bissau" 2006   .3 4
          1000 "Balanta"      "Guinea-Bissau" 2007   .3 4
          1000 "Balanta"      "Guinea-Bissau" 2008   .3 4
          1000 "Balanta"      "Guinea-Bissau" 2009   .3 4
          3000 "Manjaco"      "Guinea-Bissau" 2006  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 2007  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 2008  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 2009  .14 4
          5000 "Papel"        "Guinea-Bissau" 2006  .07 4
          5000 "Papel"        "Guinea-Bissau" 2007  .07 4
          5000 "Papel"        "Guinea-Bissau" 2008  .07 4
          5000 "Papel"        "Guinea-Bissau" 2009  .07 4
          6000 "Cape Verdean" "Guinea-Bissau" 2006  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 2007  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 2008  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 2009  .02 4
          1000 "Balanta"      "Guinea-Bissau" 2010   .3 4
          1000 "Balanta"      "Guinea-Bissau" 2011   .3 4
          1000 "Balanta"      "Guinea-Bissau" 2012   .3 4
          3000 "Manjaco"      "Guinea-Bissau" 2010  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 2011  .14 4
          3000 "Manjaco"      "Guinea-Bissau" 2012  .14 4
          5000 "Papel"        "Guinea-Bissau" 2010  .07 4
          5000 "Papel"        "Guinea-Bissau" 2011  .07 4
          5000 "Papel"        "Guinea-Bissau" 2012  .07 4
          6000 "Cape Verdean" "Guinea-Bissau" 2010  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 2011  .02 4
          6000 "Cape Verdean" "Guinea-Bissau" 2012  .02 4
          1000 "Balanta"      "Guinea-Bissau" 2013   .3 4
          3000 "Manjaco"      "Guinea-Bissau" 2013  .14 4
          5000 "Papel"        "Guinea-Bissau" 2013  .07 4
          6000 "Cape Verdean" "Guinea-Bissau" 2013  .02 4
          1000 "Mandinka"     "Gambia"        1990 .395 5
          1000 "Mandinka"     "Gambia"        1991 .395 5
          1000 "Mandinka"     "Gambia"        1992 .395 5
          1000 "Mandinka"     "Gambia"        1993 .395 5
          end

          What I aim for the merged dataset to look like is that the number of groups as well as their sizes are added to fit their respective countries and years in the master dataset so for example something like this:
          4370005 "Cote d'Ivoire" 1990 0 0 0 .07(some group's size in that country, at that time) 4 (the number of groups in that country, at that time).

          What seems important are the groupid variable in the using dataset and the eventid in the master dataset. I am thinking that I may need to create a sort of new identifier variable in each dataset to merge but I dont really know how to do this.
          Last edited by Santiago Molano; 07 Jun 2023, 23:16.

          Comment


          • #6
            Thanks for giving us example data.

            That being said, helping you here was more difficult because you didn't answer my question about how *you* know which observations match in your files. Having seen your data, it appears that the answer is "There is no way to match up observations between the two files because three variables -- country, year, and groupid -- are necessary to uniquely identify an observation in the using file, but the variable -groupid- is not available in the master file." Another way to say this is that at least your "size" variable differs with sets of observations that have the same country/year values. You need to have -groupid- in the master file, so you need to get that data entered into your master file or somehow created.

            Another possibility is that you want several values of -group- and -size- attached to each country/year observation in your master file. This would mean that you want a different observation in your master file for each value of -groupid-. Going any further in this direction would require and answer to this question and my prior question.

            Comment


            • #7
              Yes, Mike Lacy has hit the nail on the head. Let me elaborate a bit.

              4370005 "Cote d'Ivoire" 1990 0 0 0 .07(some group's size in that country, at that time) 4 (the number of groups in that country, at that time)
              At least in the example data, group_num is not a problem because it is always the same in all observations for a given country and year. But the size variable is, indeed, different from one observation to the next within a country-year combination. So the question is, why the one with size = .07 and not one of the others? How would you know to pick that one if you were doing this by hand?

              Comment


              • #8
                Thank you for reaching out again! To bring some context, what I am aiming to do with the merged dataset is running a regression with the dichotomous variable of "violence" as the dependent variable and among others using size and group_numb (number of groups) as independent variables. The case is exactly as you have described, the variables of size and group number both differ with sets of observations that have the same country & year values. The reason for this is of course that there are specific groups per country, identified by the groupid variable so when viewing, for example: size, this variable is different for each group included in the country at a given year. I think its definitely the case that country, year and groupid are needed as identifier variables but I do not have the groupid variable in the master dataset so I have to bring the groupid data into the master dataset or perhaps create some sort of new identifier variables in both the user and the master datasets but I genuinely dont know how this can be done. Similarly, when having the merged dataset, the expectation is that size and group number will differ per country and year depending on the groupid, like in the using dataset. To be exact about the datasets I am using, these are the Ethnic Power Relations and Social Conflict Analysis database.
                Last edited by Santiago Molano; 08 Jun 2023, 09:13.

                Comment


                • #9
                  Originally posted by Clyde Schechter View Post
                  Yes, Mike Lacy has hit the nail on the head. Let me elaborate a bit.


                  At least in the example data, group_num is not a problem because it is always the same in all observations for a given country and year. But the size variable is, indeed, different from one observation to the next within a country-year combination. So the question is, why the one with size = .07 and not one of the others? How would you know to pick that one if you were doing this by hand?
                  To elaborate some more, each country/year has a specific number of groups, which are named by the group variable such as "Manjaco" or "Papel", then of course size is respective to each group. If I was doing this by hand I would include every size of each group per country/year and not just for example .7, with my goal being that the merged dataset would display the different value of size per country/year as well. For this, I am thinking that it is necessary to have groupid included in the master dataset but as it currently stands, it is not there yet so the merge with size cannot happen. This very detail is what I am not aware of how to do. Thank you for replying!

                  Comment


                  • #10
                    OK, but this doesn't answer the question. How would you know which groupid to pair up with which eventid in the same country and year? Clearly neither data set contains direct information about this. I am not familiar with the Ethnic Power Relations and Social Conflict Analysis database. But perhaps there is some third data set contained in it that links eventid's with corresponding groupid's. I would start with a thorough review of the user documentation that accompanies that database. If it contains a dataset that links eventid and groupid, a series of merges will get you there. If you cannot find the answer in the database documentation, there might be somebody at the organization that manages that database whom you could contact for help.

                    As it stands, it it simply not possible to merge the two data sets you have shown.

                    Comment


                    • #11
                      Originally posted by Clyde Schechter View Post
                      OK, but this doesn't answer the question. How would you know which groupid to pair up with which eventid in the same country and year? Clearly neither data set contains direct information about this. I am not familiar with the Ethnic Power Relations and Social Conflict Analysis database. But perhaps there is some third data set contained in it that links eventid's with corresponding groupid's. I would start with a thorough review of the user documentation that accompanies that database. If it contains a dataset that links eventid and groupid, a series of merges will get you there. If you cannot find the answer in the database documentation, there might be somebody at the organization that manages that database whom you could contact for help.

                      As it stands, it it simply not possible to merge the two data sets you have shown.
                      Yes, it seems like no variables in neither dataset can directly identify size, seeems like the merge is not possible.

                      Comment

                      Working...
                      X