Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging datasets

    Hello,

    I am trying to merge the datasets to see whether the age at which people move out of parental home is affected by unemployment. Here is the code for initial dataset and below for the second dataset:

    input long idno str2 cntry int(yrbrn lvpntyr)
    27 "AT" 1975 1996
    137 "AT" 1951 1971
    194 "AT" 1978 1998
    208 "AT" 1955 1970
    220 "AT" 1947 1966
    254 "AT" 1954 1978
    290 "AT" 1962 1983
    301 "AT" 1944 1965
    305 "AT" 1981 2011
    400 "AT" 1996 2015
    413 "AT" 1970 1989
    438 "AT" 1959 1978
    459 "AT" 1941 1960
    472 "AT" 1977 2000
    586 "AT" 1956 1978
    592 "AT" 1991 2007
    614 "AT" 1959 1978
    651 "AT" 1958 1978
    687 "AT" 1969 0
    703 "AT" 1976 1996
    722 "AT" 1959 1980
    815 "AT" 1968 1986
    890 "AT" 1975 1996
    912 "AT" 1988 .b
    915 "AT" 1959 1979
    919 "AT" 1999 0
    932 "AT" 1948 1965
    950 "AT" 1948 1970
    976 "AT" 1972 1990
    1005 "AT" 1983 2003
    1037 "AT" 1982 2010
    1046 "AT" 1951 1982
    1049 "AT" 1995 2015
    1093 "AT" 1969 1995
    1171 "AT" 1950 1966
    1196 "AT" 1960 1979
    1254 "AT" 1974 1996
    1290 "AT" 1956 1979
    1309 "AT" 1968 0
    1338 "AT" 1946 .b
    1373 "AT" 1945 1959
    1381 "AT" 1973 2000
    1382 "AT" 1977 1997
    1393 "AT" 1985 2005
    1406 "AT" 1979 2002
    1473 "AT" 1968 1988
    1478 "AT" 1971 1989
    1516 "AT" 1958 1995
    1539 "AT" 1980 1998
    1564 "AT" 1987 2008
    1597 "AT" 1947 1967
    1641 "AT" 1969 1991
    1644 "AT" 1969 1995
    1732 "AT" 1942 1956
    1746 "AT" 1982 2004
    1753 "AT" 1988 2007
    1766 "AT" 1972 1987
    1775 "AT" 1968 1990
    1804 "AT" 1972 1993
    1840 "AT" 1978 2003
    1863 "AT" 1970 1990
    1870 "AT" 1941 1965
    1893 "AT" 1945 .b
    1903 "AT" 1965 .b
    1924 "AT" 1993 2016
    1942 "AT" 1940 1958
    1956 "AT" 1961 1980
    1958 "AT" 1977 1998
    1961 "AT" 1942 1963
    1984 "AT" 2000 0
    2042 "AT" 1986 2004
    2103 "AT" 1940 0
    2127 "AT" 1989 2008
    2135 "AT" 1964 .b
    2153 "AT" 1948 1963
    2188 "AT" 1961 1982
    2237 "AT" 1940 1958
    2247 "AT" 1976 1988
    2284 "AT" 1945 1965
    2295 "AT" 1958 1974
    2370 "AT" 1979 2000
    2437 "AT" 1946 1969
    2470 "AT" 1954 1969
    2480 "AT" 1992 2013
    2508 "AT" 1958 1969
    2511 "AT" 1967 .b
    2580 "AT" 1958 1976
    2606 "AT" 1982 2010
    2614 "AT" 1963 .b
    2616 "AT" 1946 1970
    2696 "AT" 1968 1990
    2702 "AT" 1991 2009
    2706 "AT" 1955 1973
    2724 "AT" 1969 1993
    2898 "AT" 1950 1969
    2908 "AT" 1982 2003
    2911 "AT" 1986 2005
    2926 "AT" 1986 2003
    2963 "AT" 1963 1982
    3008 "AT" 1994 2016
    end
    label values yrbrn yrbrn
    label values lvpntyr lvpntyr
    label def lvpntyr 0 "Still in parental home, never left 2 months", modify
    label def lvpntyr .b "Don't know", modify
    [/CODE]


    "AT" 1991 3.42000007629395
    "AT" 1992 3.58999991416931
    "AT" 1993 4.25
    "AT" 1994 3.53999996185303
    "AT" 1995 4.34999990463257
    "AT" 1996 5.28000020980835
    "AT" 1997 5.15000009536743
    "AT" 1998 5.51999998092651
    "AT" 1999 4.69999980926514
    "AT" 2000 4.69000005722046
    "AT" 2001 4.01000022888184
    "AT" 2002 4.84999990463257
    "AT" 2003 4.78000020980835
    "AT" 2004 5.82999992370605
    "AT" 2005 5.63000011444092
    "AT" 2006 5.23999977111816
    "AT" 2007 4.8600001335144
    "AT" 2008 4.13000011444092
    "AT" 2009 5.30000019073486
    "AT" 2010 4.82000017166138
    "AT" 2011 4.55999994277954
    "AT" 2012 4.86999988555908
    "AT" 2013 5.32999992370605
    "AT" 2014 5.61999988555908
    "AT" 2015 5.71999979019165
    "AT" 2016 6.01000022888184
    "AT" 2017 5.5
    "AT" 2018 4.84999990463257
    "AT" 2019 4.48999977111816
    "AT" 2020 5.3600001335144
    "AT" 2021 6.30100011825562
    "BE" 1991 6.98999977111816
    "BE" 1992 6.69999980926514
    "BE" 1993 8.07999992370605
    "BE" 1994 9.64999961853027
    "BE" 1995 9.34000015258789
    "BE" 1996 9.47999954223633
    "BE" 1997 8.96000003814697
    "BE" 1998 9.31999969482422
    "BE" 1999 8.64999961853027
    "BE" 2000 6.59000015258789
    "BE" 2001 6.17999982833862
    "BE" 2002 6.90999984741211
    "BE" 2003 7.67999982833862
    "BE" 2004 7.3600001335144
    "BE" 2005 8.4399995803833
    "BE" 2006 8.25
    "BE" 2007 7.46000003814697
    "BE" 2008 6.98000001907349
    "BE" 2009 7.90999984741211
    "BE" 2010 8.28999996185303
    "BE" 2011 7.1399998664856
    "BE" 2012 7.53999996185303
    "BE" 2013 8.43000030517578
    "BE" 2014 8.52000045776367
    "BE" 2015 8.47999954223633
    "BE" 2016 7.82999992370605
    "BE" 2017 7.09000015258789
    "BE" 2018 5.94999980926514
    "BE" 2019 5.3600001335144
    "BE" 2020 5.55000019073486
    "BE" 2021 6.42000007629395
    "BG" 1991 11.1000003814697
    "BG" 1992 15.3000001907349
    "BG" 1993 16.3999996185303
    "BG" 1994 12.8000001907349
    "BG" 1995 11.1000003814697
    "BG" 1996 12.5
    "BG" 1997 13.6999998092651
    "BG" 1998 12.1999998092651
    "BG" 1999 14.1000003814697
    "BG" 2000 16.2199993133545
    "BG" 2001 19.9200000762939
    "BG" 2002 18.1100006103516
    "BG" 2003 13.7299995422363
    "BG" 2004 12.039999961853
    "BG" 2005 10.0799999237061
    "BG" 2006 8.94999980926514
    "BG" 2007 6.88000011444092
    "BG" 2008 5.6100001335144
    "BG" 2009 6.82000017166138
    "BG" 2010 10.2799997329712
    "BG" 2011 11.2600002288818
    "BG" 2012 12.2700004577637
    "BG" 2013 12.9399995803833
    "BG" 2014 11.4200000762939
    "BG" 2015 9.14000034332275
    "BG" 2016 7.57000017166138
    "BG" 2017 6.15999984741211
    "BG" 2018 5.21000003814697
    "BG" 2019 4.23000001907349
    "BG" 2020 5.11999988555908
    "BG" 2021 5.42399978637695
    "CY" 1991 3
    "CY" 1992 2.08999991416931
    "CY" 1993 2.70000004768372
    "CY" 1994 2.70000004768372
    "CY" 1995 2.59999990463257
    "CY" 1996 3.09999990463257
    "CY" 1997 3.40000009536743


    How exactly can I merge the unemployment data to the age of departure from home? Here is what I tried so far:

    gen start = 0
    gen duration = yrbrn - lvpntyr + 1
    gen newid = _n // where _n is the position of an observation
    stset duration, f(event) id(newid) // defining single-episode data
    expand duration
    by newid, sort: replace start = duration[_n-1] if newid==newid[_n-1]
    by newid, sort: gen year = lvpntyr[_n-1]+ _n-1 if newid==newid[_n-1]
    merge 1:m cntry year using "C:\Users\sofiy\Desktop\Studies 2021-2022\Unemployment Data.dta"

    However, the 1:m command does not work in this case. What am I doing wrong?

  • #2
    Most probably

    expand duration
    renders "cntry" and "yrbrn" a non-unique identifier. So merge first and then do your manipulations.

    Code:
    merge m:1 cntry yrbrn using "C:\Users\sofiy\Desktop\Studies 2021-2022\Unemployment Data.dta"
    gen start = 0
    gen duration = yrbrn - lvpntyr + 1
    ...

    Comment


    • #3
      Originally posted by Andrew Musau View Post
      Most probably



      renders "cntry" and "yrbrn" a non-unique identifier. So merge first and then do your manipulations.

      Code:
      merge m:1 cntry yrbrn using "C:\Users\sofiy\Desktop\Studies 2021-2022\Unemployment Data.dta"
      gen start = 0
      gen duration = yrbrn - lvpntyr + 1
      ...
      Thank you , Andrew. I am interested in merging the dataset by the duration (i.e. age at which people move out and country). Hence, I need to do manipulations first. Anything else that can be done to make it work?

      Comment


      • #4
        If the identifiers are not unique across both datasets, look at

        Code:
        help joinby
        I do not follow what you are doing, but I suggest that you provide a dataex example of the dataset after manipulation and the other dataset. For example, there are no identifiers in the second dataset in #1. See FAQ Advice #12 for details on dataex.
        Last edited by Andrew Musau; 18 Feb 2023, 06:29.

        Comment


        • #5
          Originally posted by Andrew Musau View Post
          If the identifiers are not unique across both datasets, look at

          Code:
          help joinby
          I do not follow what you are doing, but I suggest that you provide a dataex example of the dataset after manipulation and the other dataset. For example, there are no identifiers in the second dataset in #1. See FAQ Advice #12 for details on dataex.
          I am doing an event-history survival analysis and trying to see whether the level of unemployment has any effect on the age when people move out. I first created a duration variable: year of move out (lvpntyr) - year of birth (yrbrn). However, I am not sure now how to merge the unemployment dataset with young person's age by country and year.


          input long idno str2 cntry int(yrbrn lvpntyr)
          27 "AT" 1975 1996
          137 "AT" 1951 1971
          194 "AT" 1978 1998
          208 "AT" 1955 1970
          220 "AT" 1947 1966
          254 "AT" 1954 1978
          290 "AT" 1962 1983
          301 "AT" 1944 1965
          305 "AT" 1981 2011
          400 "AT" 1996 2015
          413 "AT" 1970 1989
          438 "AT" 1959 1978
          459 "AT" 1941 1960
          472 "AT" 1977 2000
          586 "AT" 1956 1978
          592 "AT" 1991 2007
          614 "AT" 1959 1978
          651 "AT" 1958 1978
          687 "AT" 1969 0
          703 "AT" 1976 1996
          722 "AT" 1959 1980
          815 "AT" 1968 1986
          890 "AT" 1975 1996
          912 "AT" 1988 .b
          915 "AT" 1959 1979
          919 "AT" 1999 0
          932 "AT" 1948 1965
          950 "AT" 1948 1970
          976 "AT" 1972 1990
          1005 "AT" 1983 2003
          1037 "AT" 1982 2010
          1046 "AT" 1951 1982
          1049 "AT" 1995 2015
          1093 "AT" 1969 1995
          1171 "AT" 1950 1966
          1196 "AT" 1960 1979
          1254 "AT" 1974 1996
          1290 "AT" 1956 1979
          1309 "AT" 1968 0
          1338 "AT" 1946 .b
          1373 "AT" 1945 1959
          1381 "AT" 1973 2000
          1382 "AT" 1977 1997
          1393 "AT" 1985 2005
          1406 "AT" 1979 2002
          1473 "AT" 1968 1988
          1478 "AT" 1971 1989
          1516 "AT" 1958 1995
          1539 "AT" 1980 1998
          1564 "AT" 1987 2008
          1597 "AT" 1947 1967
          1641 "AT" 1969 1991
          1644 "AT" 1969 1995
          1732 "AT" 1942 1956
          1746 "AT" 1982 2004
          1753 "AT" 1988 2007
          1766 "AT" 1972 1987
          1775 "AT" 1968 1990
          1804 "AT" 1972 1993
          1840 "AT" 1978 2003
          1863 "AT" 1970 1990
          1870 "AT" 1941 1965
          1893 "AT" 1945 .b
          1903 "AT" 1965 .b
          1924 "AT" 1993 2016
          1942 "AT" 1940 1958
          1956 "AT" 1961 1980
          1958 "AT" 1977 1998
          1961 "AT" 1942 1963
          1984 "AT" 2000 0
          2042 "AT" 1986 2004
          2103 "AT" 1940 0
          2127 "AT" 1989 2008
          2135 "AT" 1964 .b
          2153 "AT" 1948 1963
          2188 "AT" 1961 1982
          2237 "AT" 1940 1958
          2247 "AT" 1976 1988
          2284 "AT" 1945 1965
          2295 "AT" 1958 1974
          2370 "AT" 1979 2000
          2437 "AT" 1946 1969
          2470 "AT" 1954 1969
          2480 "AT" 1992 2013
          2508 "AT" 1958 1969
          2511 "AT" 1967 .b
          2580 "AT" 1958 1976
          2606 "AT" 1982 2010
          2614 "AT" 1963 .b
          2616 "AT" 1946 1970
          2696 "AT" 1968 1990
          2702 "AT" 1991 2009
          2706 "AT" 1955 1973
          2724 "AT" 1969 1993
          2898 "AT" 1950 1969
          2908 "AT" 1982 2003
          2911 "AT" 1986 2005
          2926 "AT" 1986 2003
          2963 "AT" 1963 1982
          3008 "AT" 1994 2016




          input str2 cntry int year double UE
          "AT" 1991 3.42000007629395
          "AT" 1992 3.58999991416931
          "AT" 1993 4.25
          "AT" 1994 3.53999996185303
          "AT" 1995 4.34999990463257
          "AT" 1996 5.28000020980835
          "AT" 1997 5.15000009536743
          "AT" 1998 5.51999998092651
          "AT" 1999 4.69999980926514
          "AT" 2000 4.69000005722046
          "AT" 2001 4.01000022888184
          "AT" 2002 4.84999990463257
          "AT" 2003 4.78000020980835
          "AT" 2004 5.82999992370605
          "AT" 2005 5.63000011444092
          "AT" 2006 5.23999977111816
          "AT" 2007 4.8600001335144
          "AT" 2008 4.13000011444092
          "AT" 2009 5.30000019073486
          "AT" 2010 4.82000017166138
          "AT" 2011 4.55999994277954
          "AT" 2012 4.86999988555908
          "AT" 2013 5.32999992370605
          "AT" 2014 5.61999988555908
          "AT" 2015 5.71999979019165
          "AT" 2016 6.01000022888184
          "AT" 2017 5.5
          "AT" 2018 4.84999990463257
          "AT" 2019 4.48999977111816
          "AT" 2020 5.3600001335144
          "AT" 2021 6.30100011825562
          "BE" 1991 6.98999977111816
          "BE" 1992 6.69999980926514
          "BE" 1993 8.07999992370605
          "BE" 1994 9.64999961853027
          "BE" 1995 9.34000015258789
          "BE" 1996 9.47999954223633
          "BE" 1997 8.96000003814697
          "BE" 1998 9.31999969482422
          "BE" 1999 8.64999961853027
          "BE" 2000 6.59000015258789
          "BE" 2001 6.17999982833862
          "BE" 2002 6.90999984741211
          "BE" 2003 7.67999982833862
          "BE" 2004 7.3600001335144
          "BE" 2005 8.4399995803833
          "BE" 2006 8.25
          "BE" 2007 7.46000003814697
          "BE" 2008 6.98000001907349
          "BE" 2009 7.90999984741211
          "BE" 2010 8.28999996185303
          "BE" 2011 7.1399998664856
          "BE" 2012 7.53999996185303
          "BE" 2013 8.43000030517578
          "BE" 2014 8.52000045776367
          "BE" 2015 8.47999954223633
          "BE" 2016 7.82999992370605
          "BE" 2017 7.09000015258789
          "BE" 2018 5.94999980926514
          "BE" 2019 5.3600001335144
          "BE" 2020 5.55000019073486
          "BE" 2021 6.42000007629395
          "BG" 1991 11.1000003814697
          "BG" 1992 15.3000001907349
          "BG" 1993 16.3999996185303
          "BG" 1994 12.8000001907349
          "BG" 1995 11.1000003814697
          "BG" 1996 12.5
          "BG" 1997 13.6999998092651
          "BG" 1998 12.1999998092651
          "BG" 1999 14.1000003814697
          "BG" 2000 16.2199993133545
          "BG" 2001 19.9200000762939
          "BG" 2002 18.1100006103516
          "BG" 2003 13.7299995422363
          "BG" 2004 12.039999961853
          "BG" 2005 10.0799999237061
          "BG" 2006 8.94999980926514
          "BG" 2007 6.88000011444092
          "BG" 2008 5.6100001335144
          "BG" 2009 6.82000017166138
          "BG" 2010 10.2799997329712
          "BG" 2011 11.2600002288818
          "BG" 2012 12.2700004577637
          "BG" 2013 12.9399995803833
          "BG" 2014 11.4200000762939
          "BG" 2015 9.14000034332275
          "BG" 2016 7.57000017166138
          "BG" 2017 6.15999984741211
          "BG" 2018 5.21000003814697
          "BG" 2019 4.23000001907349
          "BG" 2020 5.11999988555908
          "BG" 2021 5.42399978637695
          "CY" 1991 3
          "CY" 1992 2.08999991416931
          "CY" 1993 2.70000004768372
          "CY" 1994 2.70000004768372
          "CY" 1995 2.59999990463257
          "CY" 1996 3.09999990463257
          "CY" 1997 3.40000009536743
          end
          Last edited by Sofiya Volvakova; 18 Feb 2023, 06:54.

          Comment


          • #6
            Can someone help?
            Last edited by Sofiya Volvakova; 18 Feb 2023, 07:30.

            Comment


            • #7
              Starting with your first dataset

              Code:
              gen year = lvpntyr
              merge m:1 cntry year using "Unemployment Data", keep(1 3)
              People who are still home/haven't moved out will not have unemployment data.

              Comment

              Working...
              X