
  • Reshaping from wide to long


    I know this topic has been discussed several times already; however, none of the solutions seems to work for me.

    I have a time-series dataset for GDP that I got from the BEA. This is a screenshot from Stata.

    I want to reshape it from wide to long, and I use this command in Stata:

    reshape long GDP, i(GeoFips) j(year)

    It gives me the error "variable id does not uniquely identify the observations".

    Any suggestions on how to deal with this problem?

    Thank you,

  • #2
    All Forum members are requested to read the Forum FAQ before posting. Had you done that, you would know that screenshots are often not visible to others (which is the case with this one), and even when they can be seen, they are pretty much the least helpful way to show example data. Please read the FAQ at your earliest convenience and pay particular attention to #12.

    The message means exactly what it says: there are multiple observations with the same value for GeoFips, and -reshape long- requires the variable(s) in -i()- to uniquely identify the observations in the wide data.

    There are several possibilities that you need to explore. All of them begin with identifying the observations that are causing this problem.

    duplicates tag GeoFips, gen(flag)
    browse if flag
    will show them to you.
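
    If you prefer a summary in the Results window before browsing, something along these lines may help. It is only a sketch: the toy data and the variable names GDP2019 and GDP2020 are made up for illustration, so substitute your own data.

    ```stata
    * Toy wide data (hypothetical values): GeoFips "01000" appears twice
    clear
    input str5 GeoFips GDP2019 GDP2020
    "01000" 100 110
    "01000"  20  22
    "02000"  50  55
    end

    * Count how many observations share a GeoFips value
    duplicates report GeoFips

    * isid exits with an error when GeoFips is not a unique identifier;
    * capture noisily displays the message without stopping a do-file
    capture noisily isid GeoFips

    * List the observations that share a GeoFips value
    duplicates list GeoFips
    ```

    -isid- is also a handy check to rerun after any fix, since it stays silent once GeoFips (or whatever you put in -i()-) really is unique.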

    Now you have to consider which of the following applies:

    1. These observations are complete duplicates, agreeing exactly on all variables. In that case, running -duplicates drop- will eliminate them, and you can then proceed to -reshape- the data. But before you just do that, it is worth considering why those duplicate observations were there in the first place. It often indicates that there are errors in the data management steps that produced your data. You should review that carefully to see if there are errors. And consider that where there is one error, others often lurk as well.

    2. The observations are duplicates as far as their values of GeoFips go, but they disagree on some other variable(s). Then you have a more serious problem. You need to then chase down where these duplicate conflicting observations came from and try to resolve which (if any) of them is correct. Then you need to, ideally, revise the data management so that it produces a clean data set with only the correct observations, or, less ideally, just write some additional code that deletes the incorrect ones.

    2a. Another possibility is that the correct solution to the conflicting observations is to take means, medians, minimums, or maximums of the values. That obviously depends on the substantive aspects of the project. -collapse- will be your friend in this situation.

    3. There is nothing wrong with the data; you have just not conceptualized the -reshape- properly. The multiple observations for the same GeoFips are all correct and are supposed to be there because they refer to different aspects of the same GeoFips. For example, the observations might refer to different industrial sectors, or to different population age groups or something like that. In that case, the data are fine, and the solution is to add the name(s) of the variable(s) which disambiguate the multiple observations to the -i()- option of your -reshape- command.
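
    To make the cases above concrete, here is a rough sketch on made-up data. Everything in it is an assumption for illustration: the variable Description (standing in for whatever distinguishes the rows in your data), the names GDP2019 and GDP2020, and the values. Your BEA extract will almost certainly look different.

    ```stata
    * Toy wide data (hypothetical): one exact duplicate row, plus rows that
    * share a GeoFips but describe different sectors
    clear
    input str5 GeoFips str13 Description GDP2019 GDP2020
    "01000" "All industry"  100 110
    "01000" "All industry"  100 110
    "01000" "Manufacturing"  20  22
    "02000" "All industry"   50  55
    "02000" "Manufacturing"   5   6
    end

    * Case 1: drop observations that agree on every variable
    duplicates drop

    * Case 2a, shown on a temporary copy: collapse to one row per GeoFips.
    * Illustration only; whether a mean is meaningful depends on your data.
    preserve
    collapse (mean) GDP2019 GDP2020, by(GeoFips)
    reshape long GDP, i(GeoFips) j(year)
    list
    restore

    * Case 3: GeoFips alone is not unique, but GeoFips and Description are
    isid GeoFips Description
    reshape long GDP, i(GeoFips Description) j(year)
    list, sepby(GeoFips Description)
    ```

    In practice you would run only the branch that matches your situation. The -isid- line is there as a guard: if it stops with an error, the variables listed in -i()- still do not uniquely identify the observations, and -reshape- would fail for the same reason.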