Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • create a panel dataset and calculate the total population for different races, sexes, and age groups

    In the following data

    2000 -2022 County-level: Expanded Races (White, Black, American Indian/Alaska Native, Asian/Pacific Islander) by Origin (Hispanic, Non-Hispanic);
    County-level population files with 19 age groups (<1, 1-4, ..., 80-84, 85+)

    I want to create a panel dataset and calculate the total population for different races, sexes, and age groups (20-64 years) separately for each county and year level (2000-2022). How can i code that on stata ?

    This is the description of my data

    Code:
    1 White
    2 Black
    3 Other (1969+)/American Indian/Alaska Native (1990+)
    4 Asian or Pacific Islander (1990+)
    hispanic:
    0 Non-Hispanic
    1 Hispanic
    9 Not applicable in 1969-2004 W
    sex:
    1 Male
    2 Female
    age:
    0 0 years
    1 1-4 years
    2 5-9 years
    3 10-14 years
    4 15-19 years
    5 20-24 years
    6 25-29 years
    7 30-34 years
    8 35-39 years
    9 40-44 years
    10 45-49 years
    11 50-54 years
    12 55-59 years
    13 60-64 years
    14 65-69 years
    15 70-74 years
    16 75-79 years
    17 80-84 years
    18 85+ years

    This is the sample of my data

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(year county) byte(race hispanic sex) float(age population)
    2000 1001 1 0 1  0  217
    2000 1001 1 0 1  1  963
    2000 1001 1 0 1  2 1487
    2000 1001 1 0 1  3 1551
    2000 1001 1 0 1  4 1284
    2000 1001 1 0 1  5  889
    2000 1001 1 0 1  6 1010
    2000 1001 1 0 1  7 1239
    2000 1001 1 0 1  8 1638
    2000 1001 1 0 1  9 1479
    2000 1001 1 0 1 10 1249
    2000 1001 1 0 1 11 1079
    2000 1001 1 0 1 12  949
    2000 1001 1 0 1 13  808
    2000 1001 1 0 1 14  623
    2000 1001 1 0 1 15  428
    2000 1001 1 0 1 16  273
    2000 1001 1 0 1 17  144
    2000 1001 1 0 1 18   87
    2000 1001 1 1 1  1   18
    2000 1001 1 1 1  2   19
    2000 1001 1 1 1  3   27
    2000 1001 1 1 1  4   30
    2000 1001 1 1 1  5   45
    2000 1001 1 1 1  6   34
    2000 1001 1 1 1  7   24
    2000 1001 1 1 1  8   27
    2000 1001 1 1 1  9   23
    2000 1001 1 1 1 10   16
    2000 1001 1 1 1 11    4
    2000 1001 2 0 1  1  248
    2000 1001 2 0 1  2  350
    2000 1001 2 0 1  3  376
    2000 1001 2 0 1  4  363
    2000 1001 2 0 1  5  242
    2000 1001 2 0 1  6  215
    2000 1001 2 0 1  7  203
    2001 1003 1 0 1  0  773
    2001 1003 1 0 1  1 2873
    2001 1003 1 0 1  2 4015
    2001 1003 1 0 1  3 4468
    2001 1003 1 0 1  4 4093
    2001 1003 1 0 1  5 2950
    2001 1003 1 0 1  6 3189
    2001 1003 1 0 1  7 3796
    2001 1003 1 0 1  8 4510
    2001 1003 1 0 1  9 4888
    2001 1003 1 0 1 10 4639
    2001 1003 1 1 1  0   32
    2001 1003 1 1 1  1   93
    2001 1003 1 1 1  2  121
    2001 1003 1 1 1  3   91
    2001 1003 1 1 1  4  138
    2001 1003 1 1 1  5  188
    2001 1003 1 1 1  6  180
    2001 1003 1 1 1  7  128
    2001 1003 1 1 1  8  120
    2001 1003 1 1 1  9   98
    2001 1003 2 0 1  1  474
    2001 1003 2 0 1  2  651
    2001 1003 2 0 1  3  752
    2001 1003 2 0 1  4  716
    2001 1003 2 0 1  5  609
    2001 1003 2 0 1  6  490
    2001 1003 2 0 1  7  459
    2001 1003 2 0 1  8  519
    2001 1003 2 0 1  9  555
    2001 1003 2 0 1 10  480
    2001 1003 2 0 1 11  366
    2001 1003 2 0 1 12  272
    2001 1003 2 0 1 13  224
    end

  • #2
    I applied the following code to solve this issue. Need to mention that I previously was assited with a similar problem by statalist mentor.

    Code:
    drop population /*from*/
    rename (race hispanic sex age) attribute_=
    bysort county year: gen n =_n
    reshape long attribute population, i(county year n) j(j) string
    replace j = strtoname(substr(j,2,.)+"_"+attribute)
    collapse (first) population, by(county year j)
    reshape wide population, i(county year) j(j) string
    rename population* *
    however, I'm getting this following error:

    Code:
    Data                               Wide   ->   Long
    -----------------------------------------------------------------------------
    Number of observations       20,242,626   ->   80,970,504  
    Number of variables                   9   ->   8          
    j variable (4 values)                     ->   j
    xij variables:
    attribute_age attribute_hispanic ... attribute_sex->attribute
           pop_age pop_hispanic ... pop_sex   ->   pop
    -----------------------------------------------------------------------------
    
    . replace j = strtoname(substr(j,2,.)+"_"+attribute)
    type mismatch
    r(109);
    
    end of do-file
    
    r(109);
    Last edited by Tariq Abdullah; 17 Apr 2024, 22:41.

    Comment


    • #3
      This is how i am extrracting the data from data file

      Code:
      * Read the data using the infix command
      infix ///
          year 1-4 ///
          state 5-6 ///
          county 7-11 ///
          registry 12-13 ///
          race 14 ///
          hispanic 15 ///
          sex 16 ///
          age 17-18 ///
          population 19-26 ///
      using "us.1990_2022.19ages.adjusted.txt", clear
      
      drop state
      drop registry
      This is how i am reshaping the long file into wide file to get the panel data for all the age , sex and race cateogries.
      Code:
      drop population /*from seer_final*/
      rename (race hispanic sex age) attribute_=
      bysort county year: gen n =_n
      reshape long attribute population, i(county year n) j(j) string
      replace j = strtoname(substr(j,2,.)+"_"+attribute)
      collapse (first) population, by(county year j)
      reshape wide population, i(county year) j(j) string
      rename population* *
      This is the error message

      Code:
      (j = _age _hispanic _race _sex)
      (variable population_age not found)
      (variable population_hispanic not found)
      (variable population_race not found)
      (variable population_sex not found)
      
      Data                               Wide   ->   Long
      -----------------------------------------------------------------------------
      Number of observations       20,242,626   ->   80,970,504  
      Number of variables                   7   ->   6           
      j variable (4 values)                     ->   j
      xij variables:
      attribute_age attribute_hispanic ... attribute_sex->attribute
      population_age population_hispanic ... population_sex->population
      -----------------------------------------------------------------------------
      
      . replace j = strtoname(substr(j,2,.)+"_"+attribute)
      type mismatch
      r(109);
      
      end of do-file
      
      r(109);

      Comment


      • #4
        That specific error is because attribute is a numeric variable, and you need to convert it to a string,

        Code:
        replace j = strtoname(substr(j,2,.)+"_"+ string(attribute))

        Comment


        • #5
          Could you provide a small sample of the final form in which you want your data?

          Comment


          • #6
            Originally posted by Hemanshu Kumar View Post
            That specific error is because attribute is a numeric variable, and you need to convert it to a string,

            Code:
            replace j = strtoname(substr(j,2,.)+"_"+ string(attribute))
            After correcting your method I get only missing values in my reshaped data

            This is the code I'm using

            Code:
            
            * Read the data using the infix command
            infix ///
            year 1-4 ///
            state 5-6 ///
            county 7-11 ///
            registry 12-13 ///
            race 14 ///
            hispanic 15 ///
            sex 16 ///
            age 17-18 ///
            population 19-26 ///
            using "us.1990_2022.19ages.adjusted.txt", clear
            
            drop state
            drop registry
            
            
            drop population /*from seer_final*/
            rename (race hispanic sex age) attribute_=
            bysort county year: gen n =_n
            reshape long attribute population, i(county year n) j(j) string
            replace j = strtoname(substr(j,2,.)+"_"+ string(attribute)) /*CORRECTED LINE*/
            collapse (first) population, by(county year j)
            reshape wide population, i(county year) j(j) string
            rename population* *
            [CODE]
            dataex county year sex_1 sex_2

            ----------------------- copy starting from the next line -----------------------
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input float(county year) byte(sex_1 sex_2)
            1001 1990 . .
            1001 1991 . .
            1001 1992 . .
            1001 1993 . .
            1001 1994 . .
            1001 1995 . .
            1001 1996 . .
            1001 1997 . .
            1001 1998 . .
            1001 1999 . .
            1001 2000 . .
            1001 2001 . .
            1001 2002 . .
            1001 2003 . .
            1001 2004 . .
            1001 2005 . .
            1001 2006 . .
            1001 2007 . .
            1001 2008 . .
            1001 2009 . .
            1001 2010 . .
            1001 2011 . .
            1001 2012 . .
            1001 2013 . .
            1001 2014 . .
            1001 2015 . .
            1001 2016 . .
            1001 2017 . .
            1001 2018 . .
            1001 2019 . .
            1001 2020 . .
            1001 2021 . .
            1001 2022 . .
            1003 1990 . .
            1003 1991 . .
            1003 1992 . .
            1003 1993 . .
            1003 1994 . .
            1003 1995 . .
            1003 1996 . .
            1003 1997 . .
            1003 1998 . .
            1003 1999 . .
            1003 2000 . .
            1003 2001 . .
            1003 2002 . .
            1003 2003 . .
            Still this part is shown which could be the reason for missing values ?? But the whole code runs sucessfully. Do you think i should remove the string from the following code snippet ?

            Code:
            reshape long attribute population, i(county year n) j(j) string
            (j = _age _hispanic _race _sex)
            (variable population_age not found)
            (variable population_hispanic not found)
            (variable population_race not found)
            (variable population_sex not found)
            Last edited by Tariq Abdullah; 17 Apr 2024, 23:47.

            Comment


            • #7
              Well, for one you're dropping population right at the start, so your code will never contain what you intend (the sums of population).

              Try this instead (and don't drop population):
              Code:
              rename (race hispanic sex age) attribute_=
              bysort county year: gen n =_n
              
              reshape long attribute , i(county year n population) j(j) string
              
              replace j = strtoname(substr(j, 2, .) + "_" + string(attribute, "%02.0f"))
              
              collapse (sum) population, by(county year j)
              reshape wide population, i(county year) j(j) string
              rename population* *

              Comment


              • #8
                Originally posted by Hemanshu Kumar View Post
                Well, for one you're dropping population right at the start, so your code will never contain what you intend (the sums of population).

                Try this instead (and don't drop population):
                Code:
                rename (race hispanic sex age) attribute_=
                bysort county year: gen n =_n
                
                reshape long attribute , i(county year n population) j(j) string
                
                replace j = strtoname(substr(j, 2, .) + "_" + string(attribute, "%02.0f"))
                
                collapse (sum) population, by(county year j)
                reshape wide population, i(county year) j(j) string
                rename population* *
                thank you so much ! tjis worked perfectly ! appreciate your patience with me!

                Comment

                Working...
                X