Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping data from wide to long

    Hello! I am working with the following data obtained from World Bank's World Development Indicators. I have country level data for years 2005 and 2010 on 3 variables- var1, var2 and var3. Currently, the data is wide and looks like this:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str52 countryname str3 countrycode str97 seriesname str11(yr2005 yr2010)
    "Afghanistan"            "AFG" "var1" "365.5487336" "550.514974" 
    "Afghanistan"            "AFG" "var2" "88.80699921" "88.35099792"
    "Afghanistan"            "AFG" "var3" "16.69099998" "15.24300003"
    "Albania"                "ALB" "var1" "3062.592504" "4094.360204"
    "Albania"                "ALB" "var2" "74.65299988" "71.75900269"
    "Albania"                "ALB" "var3" "53.83599854" "52.68199921"
    "Algeria"                "DZA" "var1" "4273.312751" "4463.394675"
    "Algeria"                "DZA" "var2" "76.07900238" "74.65899658"
    "Algeria"                "DZA" "var3" "13.79599953" "15.49800014"
    "American Samoa"         "ASM" "var1" "10644.58107" "10352.82276"
    "American Samoa"         "ASM" "var2" ".."          ".."         
    "American Samoa"         "ASM" "var3" ".."          ".."         
    "Andorra"                "AND" "var1" "48831.92936" "39736.35406"
    "Andorra"                "AND" "var2" ".."          ".."         
    "Andorra"                "AND" "var3" ".."          ".."         
    "Angola"                 "AGO" "var1" "2866.434694" "3585.905553"
    "Angola"                 "AGO" "var2" "80.5039978"  "80.47399902"
    "Angola"                 "AGO" "var3" "75.81999969" "76.22100067"
    "Antigua and Barbuda"    "ATG" "var1" "12857.16429" "12174.6978" 
    "Antigua and Barbuda"    "ATG" "var2" ".."          ".."         
    "Antigua and Barbuda"    "ATG" "var3" ".."          ".."         
    "Argentina"              "ARG" "var1" "8522.522732" "10276.2605" 
    "Argentina"              "ARG" "var2" "81.88500214" "80.80200195"
    "Argentina"              "ARG" "var3" "56.69800186" "53.70700073"
    "Armenia"                "ARM" "var1" "2571.985756" "3218.381655"
    "Armenia"                "ARM" "var2" "70.58200073" "76.43699646"
    "Armenia"                "ARM" "var3" "53.12200165" "54.84700012"
    end
    I am trying to reshape this data such that I have 1 column variable "yr" which takes the value 2005 and 2010. Variables var1, var2 and var3 would also be reshaped into columns. In the end, ideally, the data would look something like this:
    Countryname Countrycode yr var1 var2 var3
    Afganistan AFG 2005 365.549 88.807 16.691
    Afganistan AFG 2010 550.515 88.351 15.243
    Albania ALB 2005 3062.59 74.653 53.835999
    Albania ALB 2010 4094.36 71.759 52.681999
    Algeria DZA 2005 4273.31 76.079 13.796
    Algeria DZA 2010 4463.39 74.659 15.498
    American Samoa ASM 2005 10644.6 .. ..
    American Samoa ASM 2010 10352.8 .. ..
    Andorra AND 2005 48831.9 .. ..
    Andorra AND 2010 39736.4 .. ..
    Angola AGO 2005 2866.43 80.504 75.82
    Angola AGO 2010 3585.91 80.474 76.221001
    Antigua and Barbuda ATG 2005 12857.2 .. ..
    Antigua and Barbuda ATG 2010 12174.7 .. ..
    Argentina ARG 2005 8522.52 81.885 56.698002
    Argentina ARG 2010 10276.3 80.802 53.707001
    Armenia ARM 2005 2571.99 70.582 53.122002
    Armenia ARM 2010 3218.38 76.437 54.847

    I would appreciate any guidance or help in working on this! Thank you!

  • #2
    So this data is long in seriesname and wide in year, and you need it the other way around. The easiest way is to first go fully long, and then go wide in series name:

    Code:
    reshape long yr, i(countryname countrycode seriesname) j(year)
    rename yr _
    reshape wide _, i(countryname countrycode year) j(seriesname) string
    rename _* *
    Added: Just out of curiosity, this kind of situation comes up from time to time here on the Forum. I understand that some packages might work very well with fully wide data (wide on both seriesname and year), but I'm not familiar with any software at all for which this layout, wide on seriesname and long on year, is even convenient, let alone optimal. Yet we find data sets constructed this way. Is there software that works best with this layout? If so, what is it? If not, why do data warehouses construct data sets like this?
    Last edited by Clyde Schechter; 24 Mar 2019, 13:56.

    Comment


    • #3
      Thank you Clyde!

      Comment


      • #4
        See https://www.stata-journal.com/sjpdf....iclenum=dm0031 for further discussion.

        Comment

        Working...
        X