Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge command on STATA

    Hi everyone,


    I am using STATA 18.0 on a MacBook.
    I try to merge two datasets, one is containing government spending in the home currency of several countries and the other is containing the exchange rate to the Euro/ECU.
    The goal at the end is to know for every country and for every year the government spending in Euro.
    If the EuroECUExchangeRates is a ".", it means the country is using the Euro itself.

    The code to merge the two datasets:
    (the first tow do files, reorganise a wide data format to long etc.
    Code:
    clear
    do "/Users/qvh/Documents/Documenten - MacBook Air van Quinten /MASTERPROEF MIBEM/Data/EuroECUExchangeRates/EuroECUExchangeRates.do"
    do "/Users/qvh/Documents/Documenten - MacBook Air van Quinten /MASTERPROEF MIBEM/Data/GovernmentConsolidatedGrossDebtByComponents/GovernmentConsolidatedGrossDebtByComponents.do"
    
    merge 1:1 country year using "/Users/qvh/Documents/Documenten - MacBook Air van Quinten /MASTERPROEF MIBEM/Data/EuroECUExchangeRates/EuroECUExchangeRates.dta"
    sort country year
    The next table is given after he merge.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str16 country int year double(GovernmentConsolidatedGrossDebt EuroECUExchangeRates) byte _merge
    "Austria"  1995   120660      . 1
    "Austria"  1996 124601.6      . 1
    "Austria"  1997 119827.1      . 1
    "Austria"  1998 125384.9      . 1
    "Austria"  1999   135949      . 1
    "Austria"  2000 141245.5      . 1
    "Austria"  2001 147154.2      . 1
    "Austria"  2002 151295.9      . 1
    "Austria"  2003 152688.1      . 1
    "Austria"  2004 157985.4      . 1
    "Austria"  2005 174403.5      . 1
    "Austria"  2006   180270      . 1
    "Austria"  2007 184674.5      . 1
    "Austria"  2008 201809.9      . 1
    "Austria"  2009 230014.6      . 1
    "Austria"  2010 244696.8      . 1
    "Austria"  2011 255683.8      . 1
    "Austria"  2012 261040.8      . 1
    "Austria"  2013 263229.5      . 1
    "Austria"  2014   280001      . 1
    "Austria"  2015 292264.6      . 1
    "Austria"  2016 296244.5      . 1
    "Austria"  2017 289951.3      . 1
    "Austria"  2018 285421.8      . 1
    "Austria"  2019 280540.4      . 1
    "Austria"  2020 315980.9      . 1
    "Austria"  2021   334284      . 1
    "Austria"  2022 350698.4      . 1
    "Belgium"  1995 276343.7      . 1
    "Belgium"  1996 276440.2      . 1
    "Belgium"  1997 278492.6      . 1
    "Belgium"  1998 277254.7      . 1
    "Belgium"  1999 279531.2      . 1
    "Belgium"  2000 280959.5      . 1
    "Belgium"  2001 286055.5      . 1
    "Belgium"  2002 288110.8      . 1
    "Belgium"  2003 285866.7      . 1
    "Belgium"  2004 288418.8      . 1
    "Belgium"  2005 294975.3      . 1
    "Belgium"  2006 297494.9      . 1
    "Belgium"  2007 300063.7      . 1
    "Belgium"  2008 327683.2      . 1
    "Belgium"  2009 347223.7      . 1
    "Belgium"  2010 364132.1      . 1
    "Belgium"  2011 389106.6      . 1
    "Belgium"  2012 404752.3      . 1
    "Belgium"  2013 414432.4      . 1
    "Belgium"  2014 431384.3      . 1
    "Belgium"  2015 438504.8      . 1
    "Belgium"  2016 451619.9      . 1
    "Belgium"  2017 454048.4      . 1
    "Belgium"  2018 459388.5      . 1
    "Belgium"  2019 467249.4      . 1
    "Belgium"  2020 515211.9      . 1
    "Belgium"  2021 548456.4      . 1
    "Belgium"  2022 578063.2      . 1
    "Bulgaria" 1971        .      . 2
    "Bulgaria" 1972        .      . 2
    "Bulgaria" 1973        .      . 2
    "Bulgaria" 1974        .      . 2
    "Bulgaria" 1975        .      . 2
    "Bulgaria" 1976        .      . 2
    "Bulgaria" 1977        .      . 2
    "Bulgaria" 1978        .      . 2
    "Bulgaria" 1979        .      . 2
    "Bulgaria" 1980        .      . 2
    "Bulgaria" 1981        .      . 2
    "Bulgaria" 1982        .      . 2
    "Bulgaria" 1983        .      . 2
    "Bulgaria" 1984        .      . 2
    "Bulgaria" 1985        .  .0005 2
    "Bulgaria" 1986        .  .0009 2
    "Bulgaria" 1987        .  .0013 2
    "Bulgaria" 1988        .  .0014 2
    "Bulgaria" 1989        .  .0011 2
    "Bulgaria" 1990        .  .0045 2
    "Bulgaria" 1991        .  .0338 2
    "Bulgaria" 1992        .  .0511 2
    "Bulgaria" 1993        .  .0323 2
    "Bulgaria" 1994        .  .0644 2
    "Bulgaria" 1995        .  .0879 3
    "Bulgaria" 1996        .  .2225 3
    "Bulgaria" 1997  18314.2  1.891 3
    "Bulgaria" 1998  17848.5 1.9691 3
    "Bulgaria" 1999  18868.6 1.9558 3
    end
    label values _merge _merge
    label def _merge 1 "Master only (1)", modify
    label def _merge 2 "Using only (2)", modify
    label def _merge 3 "Matched (3)", modify
    Is the merge done correctly if:
    Bulgaria 1995 until Bulgaria 1996: only data form using, but _merge==3
    Bulgaria 1971 until Bulgaria 1994: not any variable form the two datasets, but _merge==2
    How should I get _merge==3 only if two observations (form each dataset one) are correctly matched?

    Tank you in advance

  • #2
    The merge output does not depend on the exchange rate variable, but on the identifiers. If you want to keep observations that have nonmissing values on both government debt and exchange rate, then

    Code:
    clear
    do "/Users/qvh/Documents/Documenten - MacBook Air van Quinten /MASTERPROEF MIBEM/Data/EuroECUExchangeRates/EuroECUExchangeRates.do"
    do "/Users/qvh/Documents/Documenten - MacBook Air van Quinten /MASTERPROEF MIBEM/Data/GovernmentConsolidatedGrossDebtByComponents/GovernmentConsolidatedGrossDebtByComponents.do"
    
    keep if !missing(GovernmentConsolidatedGrossDebt)
    merge 1:1 country year using "/Users/qvh/Documents/Documenten - MacBook Air van Quinten /MASTERPROEF MIBEM/Data/EuroECUExchangeRates/EuroECUExchangeRates.dta", keep(master match)
    After this, you need a variable that identifies EU countries. Call this variable "EU" (=1 if country \(i\) in year \(t\) is an EU member, and 0 otherwise).

    Code:
    drop if missing(EuroECUExchangeRates) & !EU
    Last edited by Andrew Musau; 29 Oct 2023, 09:55.

    Comment


    • #3
      Hello everyone,

      The previous dataset was correct after using the command
      Code:
      keep if !missing(GovernmentConsolidatedGrossDebt)
      ,
      but I still don't get the _merge variable outcome.

      In the manual merge, the explanation is:
      Numeric code Equivalent word (result) Description
      1 master observation appeared in master only
      2 using observation appeared in using only
      3 match observation appeared in both
      4 match_update observation appeared in both, missing values updated
      5 match_conflict observation appeared in both, conflicting nonmissing values
      The result I get after a 1:1 merge is:
      Country Variable 1 Variable 2 _merge
      United Kingdom 2019 110.312 3 (is a match)
      United Kingdom 2020 . (is empty in STATA) 3 (is a match)
      I think the _merge variable in the last row in my example should be 1 or 2, depending with which database is master.

      Can somebody explain how STATA assigns the _merge variable after a merge?

      Thank you,
      Quinten

      Comment


      • #4
        As you are not using the -update- or -replace- options in your -merge- command, we can focus on just the values 1, 2, and 3 of the -merge- variable. It is done by Stata exactly as in the first table you show in #3. Understand, as was pointed out in #2, that "observation found in dataset" means that the data set(s) in question contains at least one observation with the corresponding values of the merge key variables (which are country and year, in your case). It does not matter whether the values of other variables are missing or not.
        So the fact that you got -merge == 3- for United Kingdom and year 2020 means that Stata found an observation with country == "United Kingdom" and year == 2020 in both data sets.

        I have never known Stata to be wrong about this. So my prediction is that if you open each of the data sets and -browse if country == "United Kingdom" & year == 2020- you will find that there are observations like that in both data sets. Now, you have asserted that in the exchange rates data set, a missing value for the exchange rate implies that the country uses the Euro in that year. We know that the UK has never used the Euro. So it follows that either you are misinformed about the meaning of missing values in the exchange rate data set or the data set itself contains incorrect information.

        Comment


        • #5
          Thank you for the respons

          Comment

          Working...
          X