  • Confused about reshape wide

    Hello,
    I have the following dataset listed in the attachment.

    I am trying to reshape the data so that the years are the variables and the country names are the observations. I believe this means I need to use reshape wide, but I am confused about the command, as I am new to Stata.

    Here is what I have

    reshape wide USA GreatBritain France Germany FormerUSSR Japan Italy Canada Australia, i(USA GreatBritain France Germany FormerUSSR Japan Italy Canada Australia) j(Year)

    I think my problem is that I am unsure what to put in the i() part of the command, as I am not really sure what it would be in this situation.

    How can I reshape the data correctly?

    Thanks

  • #2
    This will do something similar to what you ask. As you posted data as a screenshot, which is not importable into Stata, I have made up a toy data set similar to it to illustrate the code.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(Year Argentina Brazil Germany)
    1800 1594 600 958
    1801    .   .   .
    1820 1710 600   .
    end
    
    
    ds Year, not
    rename (`r(varlist)') v=
    reshape long v, i(Year) j(Country) string
    reshape wide v, i(Country) j(Year) // I RECOMMEND YOU SKIP THIS STEP
    This departs from your request in that the variable names cannot be the years in Stata because legal variable names cannot begin with a digit. So the variable names here begin with the letter v, followed by the year.

    That said, you probably should not do this anyway. All you are doing is exchanging one wide data layout for another. But wide data layouts are not very useful in Stata: most of Stata's data management and analysis commands work best, or only, with long data layouts. So what I urge you to do, instead, is omit the final -reshape wide- command. Leave the data in long layout, where both Country and Year are variables, and the variable v contains the corresponding value. This is the most usable way to organize this data for Stata. Only if you specifically know you will be working with those few Stata commands that require a wide data layout should you go all the way to the layout you asked for.
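
    As a rough illustration of the long layout recommended above, the sketch below rebuilds the toy data, stops after -reshape long-, and then runs two typical per-country calculations. The variable names mean_v and change are made up for this example and are not part of the original question.

    Code:
    clear
    input float(Year Argentina Brazil Germany)
    1800 1594 600 958
    1801    .   .   .
    1820 1710 600   .
    end

    ds Year, not
    rename (`r(varlist)') v=
    reshape long v, i(Year) j(Country) string

    * Long layout: one observation per Country-Year pair, value stored in v
    assert _N == 9
    sort Country Year
    list, sepby(Country) noobs

    * Per-country calculations are one-liners in this layout
    by Country: egen mean_v = mean(v)          // country average, ignoring missing values
    by Country: gen change = v - v[_n-1]       // change from the previous available year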

    In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    • #3
      In reply to Clyde Schechter (#2):
      Thanks for your prompt and detailed response. I will be sure to use the dataex command going forward. What you're saying definitely makes sense, especially for my data. The whole reason I was trying to reshape to wide was that I wanted to generate, for each country, a new value equal to its 1939 GDP minus its 1929 GDP, and I thought that would be easier if the years were variables.

      I've tried something like this

      bysort USA GreatBritain France Germany FormerUSSR Japan Italy Canada Australia: gen GD_GDP_Change = Year[1939] - Year[1929]

      as well as

      bysort USA GreatBritain France Germany FormerUSSR Japan Italy Canada Australia: gen GD_GDP_Change2 = Year[_n] - Year[_n-10]

      but I'm only getting missing values. I am fairly sure that the by command would be correct here since I need to perform the command on all the country variables listed.

      Thanks again for your help.



      • #4
        OK. For the purpose of calculating the difference between the 1939 and 1929 values, that is probably most easily done in the wide layout you asked for. So go ahead and do that. Then it's just gen gdp_change = v1939-v1929. But I would recommend that you then follow that with -reshape long- (you don't have to specify any variables, or -i()- or -j()- this time) to get back to the long layout I recommended.
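
        A minimal sketch of that round trip, assuming the data are already in the long layout from #2 (variables Country, Year, and v). The values below are invented purely for illustration.

        Code:
        * Invented toy data in long layout
        clear
        input str9 Country float(Year v)
        "Canada" 1929 10
        "Canada" 1939 12
        "Japan"  1929 20
        "Japan"  1939  .
        end

        reshape wide v, i(Country) j(Year)      // creates v1929 and v1939
        gen gdp_change = v1939 - v1929          // missing if either year is missing
        reshape long                            // reuses the i() and j() stored by the previous reshape

        * Back to one observation per Country-Year; gdp_change is constant within country
        assert _N == 4
        assert gdp_change == 2 if Country == "Canada"
        assert missing(gdp_change) if Country == "Japan"
        list, sepby(Country) noobs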

        Or, it can be done, a bit awkwardly, in the long layout I recommended:
        Code:
        by Country (Year), sort: egen gdp_1939 = max(cond(Year == 1939, v, .))
        by Country (Year): egen gdp_1929 = max(cond(Year == 1929, v, .))
        gen gdp_change = gdp_1939 - gdp_1929
        and then you can drop gdp_1939 and gdp_1929 themselves if you have no further use for them.
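
        As for why the attempts in #3 returned only missing values, a likely explanation is that a subscript such as Year[1939] refers to an observation number, not to the row where Year equals 1939. A small sketch with invented values; the variable names wrong and is_1939 are hypothetical:

        Code:
        * Invented toy data with only two observations
        clear
        input str9 Country float(Year v)
        "Canada" 1929 10
        "Canada" 1939 12
        end

        * Year[1939] asks for Year in observation number 1939, which does not exist here
        gen wrong = Year[1939]
        assert missing(wrong)

        * Picking out a year by its value needs a condition instead
        gen is_1939 = Year == 1939
        list, noobs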



        • #5
          If you were absolutely certain that there were always data for 1929 and 1939 you could ask for the total

          Code:
          by Country, sort: egen gdp_change = total(((Year == 1939) * v) - ((Year == 1929) * v))
          but this would produce the wrong answer if either or both values were missing. So, the code of Clyde Schechter is the safe and not sorry choice.
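
          To make the missing-value problem concrete, here is a small sketch with invented numbers, using the variable names from #2 (Country, Year, v). Japan has no 1939 value, so the total() shortcut silently returns minus the 1929 value, while the approach from #4 returns missing. The names change_total and change_safe are made up for this comparison.

          Code:
          * Invented toy data; Japan's 1939 value is missing
          clear
          input str9 Country float(Year v)
          "Canada" 1929 10
          "Canada" 1939 12
          "Japan"  1929 20
          "Japan"  1939  .
          end

          * Shortcut: egen's total() treats missing terms as zero
          by Country (Year), sort: egen change_total = total(((Year == 1939) * v) - ((Year == 1929) * v))

          * Approach from #4: missing input gives a missing result
          by Country (Year): egen gdp_1939 = max(cond(Year == 1939, v, .))
          by Country (Year): egen gdp_1929 = max(cond(Year == 1929, v, .))
          gen change_safe = gdp_1939 - gdp_1929

          assert change_total == 2 & change_safe == 2 if Country == "Canada"
          assert change_total == -20 if Country == "Japan"    // misleading: looks like a real change
          assert missing(change_safe) if Country == "Japan"   // correctly flagged as unknown
          list Country Year v change_total change_safe, sepby(Country) noobs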
