  • Having Trouble With Reshape Long Command

    Hello,

    I'm a beginner with Stata, and I've been trying to reshape a dataset. An example of the data is below:

    input str9 Season double(ATL BOS) str17 CHA
    "2016-17" .78 .82 ".78"
    "2015-16" .87 .85 ".51"
    "2014-15" .88 .5 ".61"
    "2013-14" .53 .59 ".64"
    "2012-13" .52 .53 ".49"
    "01dec2011" .72 .65 ".57"
    "01nov2010" .88 .76 ".7"
    "01oct2009" .8 .79 ".59"
    "01sep2008" .79 .94 ".52"
    "01aug2007" .76 .5 ".66"
    "01jul2006" .72 .75 ".72"
    "01jun2005" .5 .72 ".66"
    "01may2004" .11 .67 "-"
    "01apr2003" .52 .5 ""
    "01mar2002" .74 .78 ""
    "01feb2001" .61 .79 ".79"
    "01jan2000" .64 .65 ".55"
    "1999-00" .45 .68 ".67"
    "1998-99" .68 .61 ".36"

    I have been trying to reshape using this code: reshape long Team, i(Season) j(Team)

    My goal is to have a single column "team" whose rows hold the cities, instead of a separate column for each city. My final dataset needs a column for season, a column for team, and a column with the percentages from my example. I keep getting an error that says "no xij variables found", and I'm not sure what it means. Does anyone have any tips? Thank you.
    Last edited by Gary Hammersmite; 24 Sep 2023, 16:16.

  • #2
    OK, I can fix this. But this data set looks pretty messy, like it was patched together from a bunch of different source files that use different ways of showing the data. I think you are going to have a series of problems working with it. Although ATL and BOS are numeric variables, your CHA percentages, for example, are shown as a string variable, with missing values inconsistently shown as "" (correct) or "-" (this is a problem). And your Season variable is also a hodgepodge of year-xx configurations and some actual dates that correspond to the beginning dates of some months. If, as I suspect is the case, at some point you will have to arrange these in chronological order, it's going to be a bear to do that. It would be easy enough to convert things like "01dec2011" into Stata internal format numeric date variables--which is the only truly useful way to represent dates in Stata. But I have no idea what something like "2016-17" or "1998-99" is supposed to represent. Anyway, I just want to alert you to these problems that, I suspect, will trip you up down the line as you work with this data set. You would be well advised to fix them first.
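
    As a rough sketch of the date conversion I mentioned, assuming the date-like entries are all in day-month-year form such as "01dec2011" (the new variable name season_date is just illustrative):
    Code:
    //   CONVERT THE DATE-LIKE STRINGS TO STATA INTERNAL FORMAT DAILY DATES
    gen season_date = daily(Season, "DMY")
    format season_date %td
    
    //   SHOW THE ENTRIES THAT COULD NOT BE READ AS DATES
    list Season season_date if missing(season_date)
    Entries like "2016-17" will come out as missing here, so they would still need to be handled separately once you know what they represent.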

    Turning to your reshape problem. The xij that Stata refers to in the error message you got is to be understood as follows: i is the variable in the -i()- option of the -reshape- command; j is the variable in the -j()- option, and x refers to the variable(s) that immediately follow -reshape long-. And the arrangement that Stata expects to find is that there will be variables named xj (i.e. the concatenation of an x followed by a j) and that the observations in the data set will be uniquely identified by the variable(s) in -i()-. Stata then rearranges the data, stripping the j's off of the xj variables, and making a new variable out of the j's, and stacking up the values of the original xj variables in a single variable x.
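
    To make the pattern concrete, here is a minimal sketch with made-up data. The variable names id, inc2019, and inc2020 are hypothetical and have nothing to do with your data set: the x is inc, the j values are 2019 and 2020, and id plays the role of i.
    Code:
    //   HYPOTHETICAL WIDE DATA THAT FITS THE xj PATTERN
    clear
    input id inc2019 inc2020
    1 100 110
    2 200 220
    end
    
    //   STACK inc2019 AND inc2020 INTO A SINGLE inc, WITH A NEW VARIABLE year
    reshape long inc, i(id) j(year)
    list
    The result has one observation per combination of id and year, with variables id, year, and inc.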

    The difficulty with the command you gave is that you are asking Stata to find a variable named Teamsomething, when no such variable exists. Indeed, Team is the variable you want to create with this command--which means it belongs in the -j()- option, not immediately after -reshape long-. But then there is another problem. The variables you want to "stack vertically" are ATL, BOS, and CHA--but they do not fit an xj pattern. They are a bunch of j's with no common x preceding them. So you have to first supply that.

    That's the fundamental problem with what you tried. To fix it, one other problem that I mentioned above also needs to be dealt with, namely the type mismatch between numeric ATL and BOS vs. string CHA. And to turn CHA into a numeric variable, we have to get rid of the pesky "-" values that cannot be construed as numbers.

    So putting these problems together:
    Code:
    //   FIX UP CHA AND MAKE IT NUMERIC TO MATCH ATL AND BOS
    replace CHA = "" if CHA == "-"
    destring CHA, replace
    
    //   GIVE THE TEAM VARIABLE NAMES A COMMON PREFIX
    rename (ATL BOS CHA) pct=
    
    //    NOW WE CAN RESHAPE
    reshape long pct, i(Season) j(team) string
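
    Afterward, a quick check (a sketch, not part of the fix itself) can confirm that the data are in the long layout, with one observation per Season and team:
    Code:
    //   VERIFY THAT Season AND team TOGETHER UNIQUELY IDENTIFY OBSERVATIONS
    isid Season team
    
    //   LOOK AT THE FIRST FEW OBSERVATIONS
    list Season team pct in 1/6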



    • #3
      Clyde Schechter thank you for the response and for bringing up these other issues! It looks like the dataset has many variables that need to be destringed. Do you know of a way to destring multiple variables at once?



      • #4
        You can do as many as you like in a single -destring- command:

        Code:
        destring var1 var3-var5 abc* x?, replace
        This will destring var1, all variables from var3 to var5 in the data set, any variable that begins with abc, and any variable whose name is x followed by a single character.

        If the variable is actually already numeric, or if any of its values contain material that cannot be read as a pure number, that variable will not be replaced and you will get a message informing you to that effect.

        If you encounter a variable that Stata says cannot be converted to numeric and you don't know why, you can find the observations that have non-number values with:
        Code:
        browse variable_name if missing(real(variable_name)) & !missing(variable_name)
        Then you can figure out whether this represents data errors that you should fix, or if the variable is genuinely not numeric and needs to be dealt with in some other way.
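
        If there are many string variables, a loop can run the same check on all of them at once. This is only a sketch: it assumes you want to examine every string variable in the data set, so it will also flag variables that are legitimately non-numeric (Season, in the example data).
        Code:
        //   COUNT NON-NUMERIC VALUES IN EVERY STRING VARIABLE
        ds, has(type string)
        foreach v of varlist `r(varlist)' {
            quietly count if missing(real(`v')) & !missing(`v')
            display "`v': " r(N) " values that cannot be read as numbers"
        }
        Any variable reporting a count of zero should go through -destring- without complaint.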
        Last edited by Clyde Schechter; 24 Sep 2023, 16:54.



        • #5
          Clyde Schechter Thank you again for the help and tips! Much appreciated.
