Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape data using getdata

    Hello everyone,

    I'm currently writing a code to extract data from Eurostat by using the user-written tool getdata. I need to get the EMU convergence criterion bond yields for 5 countries/geographical areas: Euro area, Germany, Spain, Italy and Portugal, in monthly frequency between 1980 and 2020. So far, my code is the following:

    getdata xt EUROSTAT, r(irt_lt_mcby_m/M.MCBY.EA+DE+ES+IT+PT) g(GEO country) time(DATE month) s(1980) e(2020) f(m) clear


    However, in Stata I get the data in the following form:

    country month M_MCBY
    EA 1980-01 .
    EA 1980-02 .
    ...
    DE 1980-01 8.06
    DE 1980-02 8.39
    ...
    ES 1980-01 15.5
    ES 1980-02 15.4
    ...
    IT 1980-01 13.96
    IT 1980-02 14.09
    ...
    PT 1980-01 .
    PT 1980-02 .

    Instead of creating 5 different variables for each country, the program creates one variable called "country" that contains the 5 countries.
    In order to export this dataset to Excel, I would like to reshapen the data in the following way:

    month EA DE ES IT PT

    1980-01 . 8.06 15.5 13.96 .
    1980-02 . 8.39 15.4 14.09 .
    .... ... ... ... ... ...


    From the examples that I have seen I know that using reshape or double reshape should solve this problem, but in order to implement it, do I need to change the country codes to have a common prefix, like GEO_EA, GEO_DE, for example? Or could I change the initial getdata code in order to get the intended shape? Unfortunately the help file for the getdata tool could not clarify me on this subject.

    I'm a new Stata user, and I know this should be a relatively easy task to perform, but I really need your help. Thank you advance for your attention!


    Best regards,

    Erica




  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str3 country str7 month float m_mcby
    "EA" "1980-01"     .
    "EA" "1980-02"     .
    "DE" "1980-01"  8.06
    "DE" "1980-02"  8.39
    "ES" "1980-01"  15.5
    "ES" "1980-02"  15.4
    "IT" "1980-01" 13.96
    "IT" "1980-02" 14.09
    "PT" "1980-01"     .
    "PT" "1980-02"     .
    end
    
    replace country = "_" + country
    reshape wide m_mcby, i(month) j(country) string
    The -replace country = ...- command is optional: I like to have these suffixes separated from the stem of the new variable names by an underscore, but that's just a matter of style.

    All of that said, you probably shouldn't do this. Nearly all data management and analysis in Stata is best done (or, sometimes, only possible) with the data in long layout. Wide layout is really only useful for certain kinds of graphs, and certain kinds of visual displays of the data. Unless you are sure that your next steps can be applied to wide data, you should just leave the data the way you got it, in long layout.

    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Thank you so much for your prompt help! I'm sorry I didn't present my example using the -dataex- command, I didn't know about it. I'm running version 13, so I will run -ssc install dataex- to get it for future use. The code works wonderfully, but it is not very practical for my purpose, which is, in this case, to use this code to automatize the extraction of Eurostat data into Excel, via Stata. Since sometimes Eurostat revises a whole series, this would imply that I will have to change the whole values of each one of the five series (with monthly data since 1980-01) every time there is an update, which is not very practical. I know data in long layout is generally better to work within Stata, but in this specific case, the main purpose is just to extract in that format to update regularly an Excel database that needs to be in that specific format. Perhaps I will try to use another tool (like eurostatuse) to extract the Eurostat data that don't require to input each observation one by one to rename the country name and allows a more direct use of the reshape command. But thank you so much for your help and advice!

      Comment

      Working...
      X