Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Converting repeated cross section data to panel data by transferring the information from long to wide format

    In my data race has 4 categories as : W stands for white, B stand for Black , AIAN stands for asian and API stands for Asian Pacific Islander. Gender has two categories : M stands for male and F stands for female. Age is coded as from 5-13 in 9 categories covering people aged as 20-64.

    The value of pop_race variable in each row indicates given the category of race column ( white, black , asian or asian Pacific Islander ) in each row how much population of that particular race the county has in that specific year.

    The pop-hispanic row tells how much hispanic people the county has in that specific year.

    The pop_gender in each row indicates given the category of gender ( male and female) how much population of that particular gender the county has in that specific year.

    The value of pop_age in each row indicates given the category of age ( 20-25, 25-29, 30-34, 35-39 etc) how much population of that particular age bracket the county has in that specific year.

    My research goal is to find 7 variables for each county in each specific year in a panel data form : total white population, total Asian population, total black population, total Asian Pacific Islander population, total male population, total Hispanic population, total population for each age bracket from 20-64(9 age brackets coded as from 5-13)

    I have a feeling that I can transform this repeated cross section to panel data ( My aim is for each county in each specific there is going to be only 1 row unlike in the provided data below where I have multiple rows for each county for a unique year). Statalist were kind to me before , therefore posting this to find the solution which I couldn't do by myself?

    Any idea is much appreciated.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year float county str4 race str2 hispanic str1 gender str11 age float(pop_race pop_hispanic pop_gender pop_age)
    2001 10003 "W"    "H"  "F" "20_24" 231345  16036 156865 37296
    2001 10003 "AIAN" "H"  "F" "20_24"    888  16036 156865 37296
    2001 10003 "API"  "NH" "F" "20_24"  10267 290136 156865 37296
    2001 10003 "W"    "NH" "F" "20_24" 231345 290136 156865 37296
    2001 10003 "B"    "NH" "F" "20_24"  63672 290136 156865 37296
    2001 10003 "W"    "NH" "M" "20_24" 231345 290136 149307 37296
    2001 10003 "B"    "H"  "M" "20_24"  63672  16036 149307 37296
    2001 10003 "API"  "H"  "M" "20_24"  10267  16036 149307 37296
    2001 10003 "W"    "NH" "M" "40_44" 231345 290136 149307 41762
    2001 10003 "W"    "H"  "M" "40_44" 231345  16036 149307 41762
    2001 10003 "W"    "H"  "F" "40_44" 231345  16036 156865 41762
    2001 10003 "AIAN" "H"  "M" "40_44"    888  16036 149307 41762
    2001 10003 "B"    "NH" "F" "40_44"  63672 290136 156865 41762
    2001 10003 "AIAN" "H"  "M" "35_39"    888  16036 149307 41813
    2001 10003 "W"    "NH" "F" "35_39" 231345 290136 156865 41813
    2001 10003 "AIAN" "H"  "F" "35_39"    888  16036 156865 41813
    2001 10003 "API"  "H"  "M" "35_39"  10267  16036 149307 41813
    2001 10003 "B"    "H"  "F" "35_39"  63672  16036 156865 41813
    2001 10003 "W"    "H"  "M" "35_39" 231345  16036 149307 41813
    1991 10003 "W"    "H"  "F" "20_24" 225463   7413 141056 39313
    1991 10003 "AIAN" "NH" "F" "20_24"    552 268721 141056 39313
    1991 10003 "B"    "H"  "F" "20_24"  44874   7413 141056 39313
    1991 10003 "API"  "NH" "F" "20_24"   5245 268721 141056 39313
    1991 10003 "W"    "NH" "F" "20_24" 225463 268721 141056 39313
    1991 10003 "API"  "H"  "M" "20_24"   5245   7413 135078 39313
    1991 10003 "B"    "NH" "M" "20_24"  44874 268721 135078 39313
    1991 10003 "W"    "H"  "M" "35_39" 225463   7413 135078 37009
    1991 10003 "B"    "NH" "M" "35_39"  44874 268721 135078 37009
    1991 10003 "W"    "H"  "F" "35_39" 225463   7413 141056 37009
    1991 10003 "B"    "NH" "F" "35_39"  44874 268721 141056 37009
    1991 10003 "W"    "NH" "F" "35_39" 225463 268721 141056 37009
    1991 10003 "AIAN" "NH" "M" "35_39"    552 268721 135078 37009
    1990 1001 "W"    "H"  "F" "20_24" 16393   118 10230 2226
    1990 1001 "B"    "NH" "M" "20_24"  3336 19739  9627 2226
    1990 1001 "W"    "NH" "F" "20_24" 16393 19739 10230 2226
    1990 1001 "W"    "NH" "M" "20_24" 16393 19739  9627 2226
    1990 1001 "API"  "NH" "F" "20_24"    87 19739 10230 2226
    1990 1001 "B"    "NH" "F" "20_24"  3336 19739 10230 2226
    1990 1001 "W"    "H"  "M" "30_34" 16393   118  9627 2914
    1990 1001 "B"    "NH" "F" "30_34"  3336 19739 10230 2914
    1990 1001 "API"  "H"  "F" "30_34"    87   118 10230 2914
    1990 1001 "B"    "H"  "M" "30_34"  3336   118  9627 2914
    1990 1001 "W"    "NH" "F" "30_34" 16393 19739 10230 2914
    1990 1001 "API"  "NH" "F" "30_34"    87 19739 10230 2914
    1991 1001 "B"    "NH" "M" "20_24"  3370 20178  9841 2288
    1991 1001 "B"    "NH" "F" "20_24"  3370 20178 10473 2288
    1991 1001 "AIAN" "NH" "F" "20_24"    46 20178 10473 2288
    1991 1001 "AIAN" "NH" "M" "20_24"    46 20178  9841 2288
    1991 1001 "W"    "NH" "M" "20_24" 16805 20178  9841 2288
    end

  • #2
    Code:
    rename (race hispanic gender age) attribute_=
    bysort county year: gen n =_n
    reshape long attribute pop, i(county year n) j(j) string
    replace j = strtoname(substr(j,2,.)+"_"+attribute)
    collapse (first) pop, by(county year j)
    reshape wide pop, i(county year) j(j) string
    rename pop* *
    Resulting in:

    Code:
         +-----------------------------------------------------------------------------------------------------------------------------------------------+
         | county   year   age_2~24   age_3~34   age_3~39   age_4~44   gender_F   gender_M   hispa~_H   hispa~NH   race_A~N   race_API   race_B   race_W |
         |-----------------------------------------------------------------------------------------------------------------------------------------------|
      1. |   1001   1990       2226       2914          .          .      10230       9627        118      19739          .         87     3336    16393 |
      2. |   1001   1991       2288          .          .          .      10473       9841          .      20178         46          .     3370    16805 |
      3. |  10003   1991      39313          .      37009          .     141056     135078       7413     268721        552       5245    44874   225463 |
      4. |  10003   2001      37296          .      41813      41762     156865     149307      16036     290136        888      10267    63672   231345 |
         +-----------------------------------------------------------------------------------------------------------------------------------------------+
    Last edited by Ali Atia; 18 Sep 2022, 11:18.

    Comment


    • #3
      Wow amazing how well this well thought out snippet of codes just boiled down the data to exactly how I needed this data format ! highly obliged for the invaluable contribution to my concern! appreciate your time and generosity so much

      Comment

      Working...
      X