Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to add datasets?

    I have the dataset below (first dataex) and I want to add another dataset (second dataex). The common variables are "country" and "year". The first dataset has the variable "sex" that includes three case (female, male amd total). I want to add the second dataset in a way that the values of the variable "gdp_pc" mach the "total" of the variable sex in the first dataset. Can you help me with this? Below I send an example of the main dataset (first example) and a second example with the file I want to add (second example).
    Thank you very much in advance.

    First dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 sex int year float(fam1 fam10a) str4 cou float fam9c
    "TOTAL"  1990    3    . "ARG"    .
    "TOTAL"  1991 2.97    . "ARG"    .
    "TOTAL"  1992 2.93    . "ARG"    .
    "TOTAL"  1993 2.88    . "ARG"    .
    "TOTAL"  1994 2.83    . "ARG"    .
    "TOTAL"  1995 2.77    . "ARG"    .
    "TOTAL"  1996 2.72    . "ARG"    .
    "TOTAL"  1997 2.67    . "ARG"    .
    "TOTAL"  1998 2.62    . "ARG"    .
    "TOTAL"  1999 2.58    . "ARG"    .
    "TOTAL"  2000 2.54    . "ARG"    .
    "TOTAL"  2001 2.51    . "ARG"    .
    "TOTAL"  2002 2.49    . "ARG"    .
    "TOTAL"  2003 2.46    . "ARG"    .
    "TOTAL"  2004 2.44    . "ARG"    .
    "TOTAL"  2005 2.42    . "ARG"    .
    "TOTAL"  2006  2.4    . "ARG"    .
    "TOTAL"  2007 2.38    . "ARG"    .
    "TOTAL"  2008 2.37    . "ARG"    .
    "TOTAL"  2009 2.36    . "ARG"    .
    "TOTAL"  2010 2.35    . "ARG"    .
    "TOTAL"  2011 2.34    . "ARG"    .
    "TOTAL"  2012 2.33    . "ARG"    .
    "TOTAL"  2013 2.32    . "ARG"    .
    "TOTAL"  2014 2.31    . "ARG"    .
    "TOTAL"  2015  2.3    . "ARG"    .
    "TOTAL"  2016 2.29    . "ARG"    .
    "TOTAL"  2017 2.28    . "ARG"    .
    "FEMALE" 2018    .    . "ARG"    .
    "MALE"   2018    .    . "ARG"    .
    "TOTAL"  2018 2.26    . "ARG"    .
    "TOTAL"  2019 2.25    . "ARG"    .
    "TOTAL"  2020 2.23    . "ARG"    .
    "TOTAL"  1990  1.9    . "AUS"    .
    "TOTAL"  1991 1.85    . "AUS"    .
    "TOTAL"  1992 1.89    . "AUS"    .
    "TOTAL"  1993 1.86    . "AUS"    .
    "TOTAL"  1994 1.84    . "AUS"    .
    "TOTAL"  1995 1.82    . "AUS"    .
    "TOTAL"  1996  1.8    . "AUS"    .
    "TOTAL"  1997 1.78    . "AUS"    .
    "TOTAL"  1998 1.76    . "AUS"    .
    "TOTAL"  1999 1.76    . "AUS"    .
    "FEMALE" 2000    .    . "AUS"    .
    "MALE"   2000    .    . "AUS"    .
    "TOTAL"  2000 1.76    . "AUS"    .
    "FEMALE" 2001    .    . "AUS"    .
    "MALE"   2001    .    . "AUS"    .
    "TOTAL"  2001 1.73    . "AUS"    .
    "FEMALE" 2002    .    . "AUS"    .
    "MALE"   2002    .    . "AUS"    .
    "TOTAL"  2002 1.77    . "AUS"    .
    "FEMALE" 2003    .    . "AUS"    .
    "MALE"   2003    .    . "AUS"    .
    "TOTAL"  2003 1.77    . "AUS"    .
    "FEMALE" 2004    .    . "AUS"    .
    "MALE"   2004    .    . "AUS"    .
    "TOTAL"  2004 1.78 19.6 "AUS" 16.6
    "FEMALE" 2005    .    . "AUS"    .
    "MALE"   2005    .    . "AUS"    .
    "TOTAL"  2005 1.85 19.8 "AUS" 14.8
    "FEMALE" 2006    .    . "AUS"    .
    "MALE"   2006    .    . "AUS"    .
    "TOTAL"  2006 1.88 19.4 "AUS" 14.3
    "FEMALE" 2007    .    . "AUS"    .
    "MALE"   2007    .    . "AUS"    .
    "TOTAL"  2007 1.99 22.5 "AUS" 13.5
    "FEMALE" 2008    .    . "AUS"    .
    "MALE"   2008    .    . "AUS"    .
    "TOTAL"  2008 2.02 27.1 "AUS" 12.5
    "FEMALE" 2009    .    . "AUS"    .
    "MALE"   2009    .    . "AUS"    .
    "TOTAL"  2009 1.97 22.1 "AUS" 15.1
    "FEMALE" 2010    .    . "AUS"    .
    "MALE"   2010    .    . "AUS"    .
    "TOTAL"  2010 1.95 23.3 "AUS" 13.9
    "FEMALE" 2011    .    . "AUS"    .
    "MALE"   2011    .    . "AUS"    .
    "TOTAL"  2011 1.92 25.2 "AUS"   14
    "FEMALE" 2012    .    . "AUS"    .
    "MALE"   2012    .    . "AUS"    .
    "TOTAL"  2012 1.93    . "AUS"    .
    "FEMALE" 2013    .    . "AUS"    .
    "MALE"   2013    .    . "AUS"    .
    "TOTAL"  2013 1.88    . "AUS"    .
    "FEMALE" 2014    .    . "AUS"    .
    "MALE"   2014    .    . "AUS"    .
    "TOTAL"  2014 1.79    . "AUS"    .
    "FEMALE" 2015    .    . "AUS"    .
    "MALE"   2015    .    . "AUS"    .
    "TOTAL"  2015 1.79    . "AUS"    .
    "FEMALE" 2016    .    . "AUS"    .
    "MALE"   2016    .    . "AUS"    .
    "TOTAL"  2016 1.79    . "AUS"    .
    "FEMALE" 2017    .    . "AUS"    .
    "MALE"   2017    .    . "AUS"    .
    "TOTAL"  2017 1.74    . "AUS"    .
    "FEMALE" 2018    .    . "AUS"    .
    "MALE"   2018    .    . "AUS"    .
    "TOTAL"  2018 1.74    . "AUS"    .
    end



    Second dataset:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str49 country int year float gdp_pc
    "Australia"      2015  47226.76
    "Australia"      2016  47555.96
    "Australia"      2017  48109.64
    "Australia"      2018  48405.05
    "Australia"      2019  47649.92
    "Australia"      2020  48094.34
    "Australia"      2021  49747.45
    "Austria"        2015  49942.05
    "Austria"        2016  50292.87
    "Austria"        2017   51105.6
    "Austria"        2018  52092.59
    "Austria"        2019  52645.18
    "Austria"        2020  49030.93
    "Austria"        2021  51066.61
    "Belgium"        2015  46201.69
    "Belgium"        2016  46551.56
    "Belgium"        2017  47122.51
    "Belgium"        2018  47749.11
    "Belgium"        2019  48555.65
    "Belgium"        2020  45733.46
    "Belgium"        2021  48346.04
    "Canada"         2015  44670.05
    "Canada"         2016  44609.38
    "Canada"         2017  45417.38
    "Canada"         2018  45868.34
    "Canada"         2019  46055.95
    "Canada"         2020  43299.66
    "Canada"         2021  45018.61
    "Czech Republic" 2015  33909.31
    "Czech Republic" 2016  34696.16
    "Czech Republic" 2017 36405.973
    "Czech Republic" 2018 37447.773
    "Czech Republic" 2019 38427.285
    "Czech Republic" 2020  36208.01
    "Czech Republic" 2021  37501.26
    "Denmark"        2015  49058.14
    "Denmark"        2016  50235.02
    "Denmark"        2017  51329.97
    "Denmark"        2018  52089.17
    "Denmark"        2019  52658.08
    "Denmark"        2020  51492.68
    "Denmark"        2021  53771.66
    "Finland"        2015  42490.21
    "Finland"        2016  43567.15
    "Finland"        2017   44852.7
    "Finland"        2018  45298.11
    "Finland"        2019  45808.05
    "Finland"        2020  44724.21
    "Finland"        2021  45964.21
    "France"         2015  40829.89
    "France"         2016  41122.76
    "France"         2017  41886.43
    "France"         2018  42456.99
    "France"         2019  43039.73
    "France"         2020  39548.03
    "France"         2021  42111.95
    "Germany"        2015  47609.56
    "Germany"        2016  48279.98
    "Germany"        2017  49389.27
    "Germany"        2018  49724.11
    "Germany"        2019  50136.41
    "Germany"        2020  48243.49
    "Germany"        2021  49490.01
    "Greece"         2015  26760.15
    "Greece"         2016 26740.924
    "Greece"         2017  27086.48
    "Greece"         2018  27594.35
    "Greece"         2019  28144.07
    "Greece"         2020  25664.93
    "Greece"         2021 27881.627
    "Hungary"        2015 26798.854
    "Hungary"        2016 27469.645
    "Hungary"        2017  28719.39
    "Hungary"        2018  30297.81
    "Hungary"        2019 31783.996
    "Hungary"        2020 30404.014
    "Hungary"        2021  32702.99
    "Japan"          2015  40908.78
    "Japan"          2016  41274.64
    "Japan"          2017  42041.31
    "Japan"          2018  42374.82
    "Japan"          2019  42365.48
    "Japan"          2020  40603.66
    "Japan"          2021  41344.38
    "Poland"         2015 26495.814
    "Poland"         2016  27298.25
    "Poland"         2017  28705.12
    "Poland"         2018  30418.82
    "Poland"         2019  31794.82
    "Poland"         2020  31178.52
    "Poland"         2021 33481.156
    "Spain"          2015  34945.48
    "Spain"          2016 35976.254
    "Spain"          2017 36980.742
    "Spain"          2018 37666.945
    "Spain"          2019 38107.176
    "Spain"          2020 33612.746
    "Spain"          2021  35486.12
    "Switzerland"    2015  66020.21
    "Switzerland"    2016  66638.74
    end

  • #2
    There are several problems here. The first is that I would not work with a data set like dataset1 in the first place. Having the total and then the separate male and female statistics in the same dataset seldom works well, because there are few if any statistical analyses that would use both the total and the sex-specific data. So, at best, you will constantly be having to add -if sex == "TOTAL"- or -if sex != "TOTAL"- to every data analysis command. (In fact, your current request fits this description as you want to bring in the gdp_pc variable, but only for the TOTAL observations. And the odds are good that at some point you will forget to do that, and then you will get results that are just invalid. So if I were you, I would split this up into two data sets. One for TOTAL only, and the other for MALE and FEMALE only. Then for any given analysis, I would use whichever of the two data sets is appropriate.

    That said, I will nevertheless show you how to work with the data set you have on this problem. The major hurdle here is that the country variables in the two data sets are coded differently. One is 3-letter codes, and the other is full names written out. In order to combine these data sets, you will need to convert one of them to match the other (or both of them to match some other country code.) Fortunately, for this task there is Rafal Raciborski's -kountry- command, which you can get from either Stata Journal or SSC.

    Once that is done, it's pretty straightforward to merge the two data sets, and then just replace the gdp_pc values in the MALE and FEMALE observations with missing values.

    Code:
    use dataset2
    kountry country, from(other) stuck
    rename _ISO3N_ n_country
    kountry n_country, from(iso3n) to(iso3c)
    rename _ISO3C_ cou
    tempfile dataset2_modified
    save `dataset2_modified'
    
    use dataset1
    merge m:1 cou year using `dataset2_modified', keepusing(gdp_pc) keep(match master)
    replace gdp_pc = . if sex != "TOTAL"
    Added: In your example data, the first data set has only two countries, whereas the second has many. And also, the years in the two data sets don't overlap all that much. As a result, in the example data, the -merge- leads to only a small number of matches. But hopefully this will not be a problem in the full data sets.

    Comment


    • #3
      Dear Clyde Schechter ,

      For what understood it is better to create variables for each "total", "male" and "female". I´m goimg to ask some help for this issue in anoter post.
      Thank you very much

      Comment


      • #4
        For what understood it is better to create variables for each "total", "male" and "female". I´m goimg to ask some help for this issue in anoter post.
        Yes, that's another way to handle this that may be better. It depends on how these variables will be used in analysis. Anyway, getting to that is not hard:

        Code:
        rename (fam1 fam10a fam9c) =_
        replace sex = lower(sex)
        reshape wide fam1_ fam10a_ fam9c_, i(cou year) j(sex) string
        When you do that with data set 1, the code for merging in the gdp_c variable changes slightly:
        Code:
        merge 1:1 cou year using `dataset2_modified', keepusing(gdp_pc) keep(match master)
        You still have to modify dataset 2 in the same way so that the country variables are compatible.

        Comment


        • #5
          Thank you very much Clyde Schechter

          Comment

          Working...
          X