Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape Problem

    Dear Statalist,

    I downloaded data from the UN National Accounts. Unfortunately the layout of the data is not in the format I would like it. There is a line for each variable (e.g. output) for each year. That means that for a given country, there are multiple lines for a year. I would assume I would reshape this to wide and then back to long to do this, but I am really having trouble reshaping it.

    The data is in long now and I would like it long but with a line for each country-year and more rows for the value of each variable at that year for that country. I would like something like this:
    Country Year output intermed_cons gross_VA
    Albania 2018 2.610e+12 1.180e+12 1.430e+12
    Albania 2019 2.640e+12 1.160e+12 1.480e+12
    Albania 2020 2.540e+12 1.100e+12 1.440e+12

    And here is a sample of my data:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str46 country float sna93tablecode str13 Unit str1 subgroup str21 item int(year series) str40 currency int snasystem double value
    "Albania" 2.6 "Total Economy" " " "output"              2020 1000 "lek"  2008   2.540e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2020 1000 "lek"  2008   1.100e+12
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2020 1000 "lek"  2008   1.440e+12
    "Albania" 2.6 "Total Economy" " " "output"              2019 1000 "lek"  2008   2.640e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2019 1000 "lek"  2008   1.160e+12
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2019 1000 "lek"  2008   1.480e+12
    "Albania" 2.6 "Total Economy" " " "output"              2018 1000 "lek"  2008   2.610e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2018 1000 "lek"  2008   1.180e+12
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2018 1000 "lek"  2008   1.430e+12
    "Albania" 2.6 "Total Economy" " " "output"              2017 1000 "lek"  2008   2.510e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2017 1000 "lek"  2008   1.150e+12
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2017 1000 "lek"  2008   1.350e+12
    "Albania" 2.6 "Total Economy" " " "output"              2016 1000 "lek"  2008   2.400e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2016 1000 "lek"  2008   1.110e+12
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2016 1000 "lek"  2008   1.290e+12
    "Albania" 2.6 "Total Economy" " " "output"              2015 1000 "lek"  2008   2.330e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2015 1000 "lek"  2008   1.080e+12
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2015 1000 "lek"  2008   1.260e+12
    "Albania" 2.6 "Total Economy" " " "output"              2014 1000 "lek"  2008   2.250e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2014 1000 "lek"  2008   1.030e+12
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2014 1000 "lek"  2008   1.220e+12
    "Albania" 2.6 "Total Economy" " " "output"              2013 1000 "lek"  2008   2.250e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2013 1000 "lek"  2008   1.070e+12
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2013 1000 "lek"  2008   1.180e+12
    "Albania" 2.6 "Total Economy" " " "output"              2012 1000 "lek"  2008   2.190e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2012 1000 "lek"  2008   1.040e+12
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2012 1000 "lek"  2008   1.150e+12
    "Albania" 2.6 "Total Economy" " " "output"              2011 1000 "lek"  2008   2.190e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2011 1000 "lek"  2008   1.060e+12
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2011 1000 "lek"  2008   1.130e+12
    "Albania" 2.6 "Total Economy" " " "output"              2010 1000 "lek"  2008   2.040e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2010 1000 "lek"  2008   9.620e+11
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2010 1000 "lek"  2008   1.080e+12
    "Albania" 2.6 "Total Economy" " " "output"              2009 1000 "lek"  2008   1.990e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2009 1000 "lek"  2008   9.960e+11
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2009 1000 "lek"  2008   9.900e+11
    "Albania" 2.6 "Total Economy" " " "output"              2008 1000 "lek"  2008   1.870e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2008 1000 "lek"  2008   9.290e+11
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2008 1000 "lek"  2008   9.370e+11
    "Albania" 2.6 "Total Economy" " " "output"              2007 1000 "lek"  2008   1.630e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2007 1000 "lek"  2008   7.890e+11
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2007 1000 "lek"  2008   8.410e+11
    "Albania" 2.6 "Total Economy" " " "output"              2006 1000 "lek"  2008   1.450e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2006 1000 "lek"  2008   6.940e+11
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2006 1000 "lek"  2008   7.550e+11
    "Albania" 2.6 "Total Economy" " " "output"              2005 1000 "lek"  2008   1.340e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2005 1000 "lek"  2008   6.330e+11
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2005 1000 "lek"  2008   7.030e+11
    "Albania" 2.6 "Total Economy" " " "output"              2004 1000 "lek"  2008   1.220e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2004 1000 "lek"  2008   5.760e+11
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2004 1000 "lek"  2008   6.480e+11
    "Albania" 2.6 "Total Economy" " " "output"              2003 1000 "lek"  2008   1.100e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2003 1000 "lek"  2008   5.020e+11
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2003 1000 "lek"  2008   5.960e+11
    "Albania" 2.6 "Total Economy" " " "output"              2002 1000 "lek"  2008   1.020e+12
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2002 1000 "lek"  2008   4.780e+11
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2002 1000 "lek"  2008   5.430e+11
    "Albania" 2.6 "Total Economy" " " "output"              2001 1000 "lek"  2008   9.400e+11
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2001 1000 "lek"  2008   4.370e+11
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2001 1000 "lek"  2008   5.040e+11
    "Albania" 2.6 "Total Economy" " " "output"              2000 1000 "lek"  2008   8.270e+11
    "Albania" 2.6 "Total Economy" " " "intermed_cons"       2000 1000 "lek"  2008   3.850e+11
    "Albania" 2.6 "Total Economy" " " "gross_VA"            2000 1000 "lek"  2008   4.420e+11
    "Armenia" 2.6 "Total Economy" " " "output"              2020 1000 "dram" 2008   8.620e+12
    "Armenia" 2.6 "Total Economy" " " "intermed_cons"       2020 1000 "dram" 2008   3.110e+12
    "Armenia" 2.6 "Total Economy" " " "gross_VA"            2020 1000 "dram" 2008   5.510e+12
    "Armenia" 2.6 "Total Economy" " " "comp_emp"            2020 1000 "dram" 2008   2.180e+12
    "Armenia" 2.6 "Total Economy" " " "taxes_less_subs"     2020 1000 "dram" 2008 7.70734e+10
    "Armenia" 2.6 "Total Economy" " " "other_taxes"         2020 1000 "dram" 2008 7.70734e+10
    "Armenia" 2.6 "Total Economy" " " "gross_OPsurplus"     2020 1000 "dram" 2008   3.260e+12
    "Armenia" 2.6 "Total Economy" " " "adjust_FISM"         2020 1000 "dram" 2008   1.430e+11
    "Armenia" 2.6 "Total Economy" " " "cons_fixcap"         2020 1000 "dram" 2008   8.020e+11
    "Armenia" 2.6 "Total Economy" " " "net_OPsurplus"       2020 1000 "dram" 2008   2.450e+12
    "Armenia" 2.6 "Total Economy" " " "gross_capform"       2020 1000 "dram" 2008   1.220e+12
    "Armenia" 2.6 "Total Economy" " " "gross_fixed_capform" 2020 1000 "dram" 2008   1.100e+12
    "Armenia" 2.6 "Total Economy" " " "invt_change"         2020 1000 "dram" 2008   1.170e+11
    "Armenia" 2.6 "Total Economy" " " "produced_assets"     2020 1000 "dram" 2008   8.280e+12
    "Armenia" 2.6 "Total Economy" " " "emp"                 2020 1000 "dram" 2008     1052400
    "Armenia" 2.6 "Total Economy" " " "output"              2019 1000 "dram" 2008   9.040e+12
    "Armenia" 2.6 "Total Economy" " " "intermed_cons"       2019 1000 "dram" 2008   3.240e+12
    "Armenia" 2.6 "Total Economy" " " "gross_VA"            2019 1000 "dram" 2008   5.800e+12
    "Armenia" 2.6 "Total Economy" " " "comp_emp"            2019 1000 "dram" 2008   2.190e+12
    "Armenia" 2.6 "Total Economy" " " "taxes_less_subs"     2019 1000 "dram" 2008 9.38641e+10
    "Armenia" 2.6 "Total Economy" " " "other_taxes"         2019 1000 "dram" 2008 9.38641e+10
    "Armenia" 2.6 "Total Economy" " " "gross_OPsurplus"     2019 1000 "dram" 2008   3.520e+12
    "Armenia" 2.6 "Total Economy" " " "adjust_FISM"         2019 1000 "dram" 2008   1.220e+11
    "Armenia" 2.6 "Total Economy" " " "cons_fixcap"         2019 1000 "dram" 2008   8.290e+11
    "Armenia" 2.6 "Total Economy" " " "net_OPsurplus"       2019 1000 "dram" 2008   2.690e+12
    "Armenia" 2.6 "Total Economy" " " "gross_capform"       2019 1000 "dram" 2008   1.140e+12
    "Armenia" 2.6 "Total Economy" " " "gross_fixed_capform" 2019 1000 "dram" 2008   1.030e+12
    "Armenia" 2.6 "Total Economy" " " "invt_change"         2019 1000 "dram" 2008   1.040e+11
    "Armenia" 2.6 "Total Economy" " " "produced_assets"     2019 1000 "dram" 2008   7.790e+12
    "Armenia" 2.6 "Total Economy" " " "emp"                 2019 1000 "dram" 2008     1077400
    "Armenia" 2.6 "Total Economy" " " "output"              2018 1000 "dram" 2008   8.390e+12
    "Armenia" 2.6 "Total Economy" " " "intermed_cons"       2018 1000 "dram" 2008   3.010e+12
    "Armenia" 2.6 "Total Economy" " " "gross_VA"            2018 1000 "dram" 2008   5.380e+12
    "Armenia" 2.6 "Total Economy" " " "comp_emp"            2018 1000 "dram" 2008   2.040e+12
    "Armenia" 2.6 "Total Economy" " " "taxes_less_subs"     2018 1000 "dram" 2008 7.77233e+10
    "Armenia" 2.6 "Total Economy" " " "other_taxes"         2018 1000 "dram" 2008 7.77233e+10
    "Armenia" 2.6 "Total Economy" " " "gross_OPsurplus"     2018 1000 "dram" 2008   3.260e+12
    end
    Thank you.

  • #2
    Code:
    reshape wide value, i(country year) j(item) string
    rename value* *
    Note: In your example data, within groups of observations defined by a combination of country and year, the variables sna93tablecode, Unit, subgroup, series, currency, and snasystem are all constant. The code above presumes that this is true throughout your entire data set. I know what currency means, and, other than possibly issues with conversion to the Euro, it strikes me as sensible that that one will indeed be constant for all country year combinations. But the others are mysteries to me, and I do not know if the assumption of constancy is sensible or not..

    If it is not, the -reshape- command will give you an error message stating that there is a variable or variables that is not constant within country year combinations. If this happens, then there are several possibilities:
    1. If the variable(s) involved should be constant within country and year, then there is an error in the data set, which you must fix. But don't just edit in a fixed-up value for that one observation. Review the entire data management process that created the data set and find where the error crept in. Fix that, and any other errors you find along the way. If you received this data set and did no management of it other than importation, then you should notify the provider of the data about the mistake and inquire about the circumstances (or receive instruction from the provider about why it isn't a mistake and your expectation of constancy was wrong.)
    2. If the variable involve does not need to be constant within country and year, then you face a choice.
      1. You can stick to creating a single observation per country per year. In that case, you have to create a new variable that concatenates the non-constant variable(s) with each other and with item. Then drop item and those variables, and use the new concatenated variable in the -j()- option of the -reshape- command. This will leave you with a data set with a (possibly much) larger number of variables, which may prove unwieldy.
      2. You can abandon creating a single observation per country per year. In that case you can leave the variables as they are and just include them in the -i()- option of the -reshape- command. This data set will be compact, but may prove unsuitable for analysis by, say, panel data commands.
      3. In the case of currency (and possibly the others, I don't know) you might be able to harmonize the observations by applying a currency conversion rate to one of the observations and then adding up the values.
      4. You can decide that the offending variables contain only superfluous information that you will not need going forward and just drop them.
    I should note under 2 that you might "mix and match" the three suboptions under 2. above, as appropriate to the variables' meanings and your future use of this data.

    Comment


    • #3
      Clyde, thank you so much for the thorough advice. I really appreciate you taking the time to do that. There were a few small previous steps and thanks to your roadmap, I caught a very minor coding error of mine that was the problem. Thank you again.

      Comment

      Working...
      X