Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • merging two panel data sets - 1:m merging

    Hello, I need help with two merging panel data sets according to the following variables: h_id, year, g_m1 and gender_m.

    The form that I want to achieve by merging is something like:
    h_id h_pid year g_m1 gender_m c_age c_age
    15 1502 2017 20 1 8 12

    The problems that I am facing is that stata says that the variables do not uniquely identify observations in the master data. (I can't understand why they are not unique.)

    Another is it's 1:m merging. (because the master data set is observations of parents and the using data set are children - and I want to match children to parents)

    I'm sorry i made the explanation very confusing but it'd be really appreciated if I can get some help/ idea to get through this. I have attached the datasets below. Please let me know if you need further explanation/ information. Thank you.


    The below is the master data set.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int h_id long h_pid float(year age g_m1 gender_m)
    15 1502 2017 42 20 1
    15 1502 2005 30 20 1
    15 1501 2020 42 10 0
    15 1501 2012 34 10 0
    15 1501 2008 30 10 0
    15 1501 2011 33 10 0
    15 1502 2014 39 20 1
    15 1502 2015 40 20 1
    15 1502 2018 43 20 1
    15 1502 2008 33 20 1
    15 1501 2018 40 10 0
    15 1502 2016 41 20 1
    15 1502 2013 38 20 1
    15 1501 2010 32 10 0
    15 1501 2013 35 10 0
    15 1502 2007 32 20 1
    15 1501 2015 37 10 0
    15 1501 2009 31 10 0
    15 1501 2007 29 10 0
    15 1501 2014 36 10 0
    15 1502 2012 37 20 1
    15 1502 2006 31 20 1
    15 1502 2011 36 20 1
    15 1501 2016 38 10 0
    15 1501 2005 27 10 0
    15 1502 2010 35 20 1
    15 1501 2017 39 10 0
    15 1502 2019 44 20 1
    15 1501 2019 41 10 0
    15 1502 2009 34 20 1
    15 1502 2020 45 20 1
    15 1501 2006 28 10 0
    38 3801 2020 48 10 0
    38 3801 2010 38 10 0
    38 3802 2005 31 20 1
    38 3801 2006 34 10 0
    38 3801 2014 42 10 0
    38 3802 2011 37 20 1
    38 3802 2014 40 20 1
    38 3801 2016 44 10 0
    38 3802 2016 42 20 1
    38 3801 2009 37 10 0
    38 3802 2017 43 20 1
    38 3801 2019 47 10 0
    38 3802 2009 35 20 1
    38 3802 2018 44 20 1
    38 3801 2012 40 10 0
    38 3801 2017 45 10 0
    38 3802 2010 36 20 1
    38 3801 2011 39 10 0
    38 3802 2015 41 20 1
    38 3802 2008 34 20 1
    38 3802 2013 39 20 1
    38 3801 2008 36 10 0
    38 3802 2012 38 20 1
    38 3801 2005 33 10 0
    38 3801 2018 46 10 0
    38 3802 2019 45 20 1
    38 3801 2007 35 10 0
    38 3801 2013 41 10 0
    38 3802 2007 33 20 1
    38 3801 2015 43 10 0
    38 3802 2020 46 20 1
    38 3802 2006 32 20 1
    45 4501 2013 41 10 0
    45 4502 2015 42 20 1
    45 4501 2010 38 10 0
    45 4501 2005 33 10 0
    45 4501 2011 39 10 0
    45 4502 2012 39 20 1
    45 4501 2016 44 10 0
    45 4502 2011 38 20 1
    45 4502 2008 35 20 1
    45 4502 2016 43 20 1
    45 4501 2007 35 10 0
    45 4501 2006 34 10 0
    45 4501 2020 48 10 0
    45 4501 2019 47 10 0
    45 4501 2009 37 10 0
    45 4502 2005 32 20 1
    45 4502 2013 40 20 1
    45 4502 2010 37 20 1
    45 4502 2009 36 20 1
    45 4501 2018 46 10 0
    45 4501 2014 42 10 0
    45 4502 2019 46 20 1
    45 4502 2007 34 20 1
    45 4501 2008 36 10 0
    45 4501 2015 43 10 0
    45 4502 2017 44 20 1
    45 4502 2018 45 20 1
    45 4501 2017 45 10 0
    45 4502 2006 33 20 1
    45 4502 2014 41 20 1
    45 4501 2012 40 10 0
    45 4502 2020 47 20 1
    48 4802 2017 35 20 1
    48 4805 2018 42 10 0
    48 4801 2007 73 10 1
    48 4802 2011 29 20 1
    end
    And the below is the using data set that I want to merge into the master data set.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int h_id float(year g_m2 gender_m c_age)
    15 2018 20 1 14
    15 2015 20 1 11
    15 2012 20 1  8
    15 2016 20 1  8
    15 2014 20 1  6
    15 2008 20 1  4
    15 2017 20 1  9
    15 2006 20 1  2
    15 2016 20 1 12
    15 2013 20 1  9
    15 2005 20 1  1
    15 2009 20 1  5
    15 2017 20 1 13
    15 2020 20 1 16
    15 2011 20 1  7
    15 2010 20 1  2
    15 2014 20 1 10
    15 2019 20 1 11
    15 2009 20 1  1
    15 2007 20 1  3
    15 2015 20 1  7
    15 2012 20 1  4
    15 2010 20 1  6
    15 2019 20 1 15
    15 2018 20 1 10
    15 2013 20 1  5
    15 2011 20 1  3
    15 2020 20 1 12
    38 2014 20 1  8
    38 2018 20 1 12
    38 2012 20 1  6
    38 2008 20 1  2
    38 2011 20 1  5
    38 2019 20 1 13
    38 2013 20 1  7
    38 2016 20 1 10
    38 2009 20 1  3
    38 2010 20 1  4
    38 2014 20 1  8
    38 2007 20 1  1
    38 2007 20 1  1
    38 2015 20 1  9
    38 2016 20 1 10
    38 2015 20 1  9
    38 2013 20 1  7
    38 2020 20 1 14
    38 2010 20 1  4
    38 2019 20 1 13
    38 2018 20 1 12
    38 2017 20 1 11
    38 2008 20 1  2
    38 2012 20 1  6
    38 2011 20 1  5
    38 2017 20 1 11
    38 2020 20 1 14
    38 2009 20 1  3
    45 2012 20 1  8
    45 2007 20 1  3
    45 2015 20 1 11
    45 2018 20 1 14
    45 2010 20 1  6
    45 2017 20 1 13
    45 2011 20 1  7
    45 2006 20 1  2
    45 2005 20 1  1
    45 2013 20 1  9
    45 2020 20 1 16
    45 2008 20 1  4
    45 2019 20 1 15
    45 2016 20 1 12
    45 2014 20 1 10
    45 2009 20 1  5
    48 2014 20 1  4
    48 2011 20 1  1
    48 2018 20 1  8
    48 2017 20 1  5
    48 2016 20 1  6
    48 2013 20 1  3
    48 2020 20 1  8
    48 2012 20 1  2
    48 2018 20 1  6
    48 2013 20 1  1
    48 2017 20 1  7
    48 2014 20 1  2
    48 2015 20 1  3
    48 2016 20 1  4
    48 2019 20 1  9
    48 2019 20 1  7
    48 2015 20 1  5
    48 2020 20 1 10
    end

  • #2
    The problems that I am facing is that stata says that the variables do not uniquely identify observations in the master data. (I can't understand why they are not unique.)
    In the master data, check for duplicates. There is a set of commands under duplicates, and here are two examples. The first reports the amount of duplicated cases, the second one generates a variable called FLAG so that we can find which cases are the problems

    Code:
    duplicates report h_pid year g_m1 gender_m
    duplicates tag h_pid year g_m1 gender_m, gen(FLAG)
    To learn more, use help duplicates.

    Comment

    Working...
    X