Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Advice on reshaping dataset: Transforming City-Level Data into Panel Structure

    hello lovely community,

    I hope to get your help. I am stuck with an easy task that turns out not to be trivial at all (what a surprise).
    I have a large dataset with 2000 names of cities and their characteristics (population for different years, id, and so on).
    The dataset looks like this: city1 id ...

    I need them to look like the following panel structure:

    year1 city1 ...
    year2 city1 ...
    year3 city1 ...
    ...
    year1 city2 ...
    year2 city2 ...
    year3 city3 ...
    ...

    What is the best way to do this?

    Thank you so much x x

  • #2
    The dataset looks like this: city1 id ...
    That's a very meager description of the data set and, honestly, it leaves me with no idea what your data set actually is like. What is this id, and why does it have no place in the tableau you show of what you want to get. And where are the years in your existing data set? Also, we really need to see what the actual names of those city* variables are to discern how to work with them for the reshaping.

    The helpful way to show example data (which you should always do when asking for help with code) is to use the -dataex- command. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Hi Clyde, thank you for your answer and sorry for being so imprecise.
      First, to the dataset: My dataset includes city names of 2420 cities alongside with their city id, their country id, state id, urban area (yes/no), population in certain years:
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str50 name long id float county_id double(lon pop_1939)
      "Abtsbessingen"            16065001 16065 10.762472   833
      "Admannshagen-Bargeshagen" 13072001 13072 11.984106   616
      "Adorf/Vogtl., Stadt"      14523010 14523 12.256069 10152
      end
      label var id "Official Key (AGS)" 
      label var county_id "County Key" 
      label var lon "Longitude" 
      label var pop_1939 "Population (17 May 1939)"
      I do not have existing years in my dataset, therefore, I want to include them. More precisely, in the end I would like to have an observation of every city for every year from 1933-1960, looking like this (sorry if this is not well explained, but this is the main idea):
      year1 city1 ...
      year2 city1 ...
      year3 city1 ...
      ...
      year1 city2 ...
      year2 city2 ...
      year3 city3 ...
      ...
      This should result in 67.760 observations (28 times every city).

      In a second step I have to include the population data. I need to transfer the population data, which are still in columns, into the panel structure. In other words, I generate a "Population" variable, the corresponding population data should then be entered into these. I do not have population data for every year, only for some of them in the time frame between 1933-1960. Therefore, I have to transfer the population data I have (still in the columns) to the corresponding years..

      Comment


      • #4
        You don't have to create the dataset with all combinations beforehand. merge will create the existing combinations automatically. For the enclosed dataset:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str50 name long id float county_id double(lon pop_1939)
        "Abtsbessingen"            16065001 16065 10.762472   833
        "Admannshagen-Bargeshagen" 13072001 13072 11.984106   616
        "Adorf/Vogtl., Stadt"      14523010 14523 12.256069 10152
        end
        label var id "Official Key (AGS)" 
        label var county_id "County Key" 
        label var lon "Longitude" 
        label var pop_1939 "Population (17 May 1939)"
        
        reshape long pop_, i(name) j(year)
        Res.:

        Code:
        . l
        
             +---------------------------------------------------------------------------+
             |                     name   year         id   county~d         lon    pop_ |
             |---------------------------------------------------------------------------|
          1. |            Abtsbessingen   1939   16065001      16065   10.762472     833 |
          2. | Admannshagen-Bargeshagen   1939   13072001      13072   11.984106     616 |
          3. |      Adorf/Vogtl., Stadt   1939   14523010      14523   12.256069   10152 |
             +---------------------------------------------------------------------------+
        You can attach a sample of the population data if you are unable to combine it with the above. Then once all the data is combined, see

        Code:
        help tsfill
        for a way to create all combinations.

        Comment

        Working...
        X