  • Stacked bar chart using pre-aggregated census data

    Hello,
    I have a table of aggregated data that I downloaded from the Census. The first three years of the data are included below. I would like to make a stacked bar chart with year on the horizontal axis and the share of firms by age (in 5 new bins) on the vertical axis, so that each year's bar adds up to 100%. The aim is to show how the share of firms by age has changed over time.

    There are many examples on this site and elsewhere of charts like this made with twoway bar and tabplot, but I have been unsuccessful in adapting any of that code to already-aggregated data like mine.

    While trying to use twoway bar and tabplot, I generated the last four of the 7 variables in the data example below. The ones I made are:
    - totalfirms, the total number of firms in each year
    - agegroup, because I'd like to use different age bins from the ones the data came with
    - countfirms_agegroup, the number of firms in each age group in each year
    - percentfirms_agegroup, which is 100 * countfirms_agegroup/totalfirms (a percentage). This is what I'm trying to graph.
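    For reference, here is a minimal sketch of how those four variables could be rebuilt from just year, fage4 and firms. The mapping of fage4 classes to bins is my reading of the data example (every class not listed, including "26+" and "Left Censored", goes to bin 5), and only the 2001 rows are typed in to keep it short. Run it from a do-file, since the /// continuations do not work at the command line.

```stata
* Hypothetical rebuild of the derived variables from the raw download
* (2001 rows of the data example only).
clear
input int year str16 fage4 long firms
2001 "0"             471211
2001 "1"             374542
2001 "2"             327127
2001 "3"             298402
2001 "4"             260500
2001 "5"             231767
2001 "6 to 10"       863051
2001 "11 to 15"      634854
2001 "16 to 20"      475569
2001 "21 to 25"      285459
2001 "Left Censored" 659119
end

* Map the downloaded age classes to the five new bins;
* any class not listed here falls into bin 5 (11+).
gen byte agegroup = cond(inlist(fage4, "0", "1"), 1,  ///
                    cond(inlist(fage4, "2", "3"), 2,  ///
                    cond(inlist(fage4, "4", "5"), 3,  ///
                    cond(fage4 == "6 to 10", 4, 5))))
label define agegrouplabel 1 "0-1" 2 "2-3" 3 "4-5" 4 "6-10" 5 "11+"
label values agegroup agegrouplabel

* Firms per year, firms per age group within year, and the percent
bysort year: egen double totalfirms = total(firms)
bysort year agegroup: egen double countfirms_agegroup = total(firms)
gen double percentfirms_agegroup = 100 * countfirms_agegroup / totalfirms
```

    With the 2001 rows this reproduces the values in the data example (for instance totalfirms = 4881601 and 17.32532 for the 0-1 group).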

    I know Stata is not really built for data like what I have downloaded because this does not have individual observations. However, I think the biggest problem is that I'm not used to working with already-aggregated data in Stata, so it is hard for me to think "outside the box". Thank you for any suggestions.

    Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int year str16 fage4 float agegroup long firms float(totalfirms countfirms_agegroup percentfirms_agegroup)
      2001 "1"             1 374542 4881601  845753  17.32532
      2001 "0"             1 471211 4881601  845753  17.32532
      2001 "2"             2 327127 4881601  625529 12.814013
      2001 "3"             2 298402 4881601  625529 12.814013
      2001 "5"             3 231767 4881601  492267  10.08413
      2001 "4"             3 260500 4881601  492267  10.08413
      2001 "6 to 10"       4 863051 4881601  863051  17.67967
      2001 "11 to 15"      5 634854 4881601 2055001  42.09687
      2001 "21 to 25"      5 285459 4881601 2055001  42.09687
      2001 "Left Censored" 5 659119 4881601 2055001  42.09687
      2001 "16 to 20"      5 475569 4881601 2055001  42.09687
      2002 "1"             1 368030 4908740  864168 17.604681
      2002 "0"             1 496138 4908740  864168 17.604681
      2002 "3"             2 287290 4908740  607289 12.371586
      2002 "2"             2 319999 4908740  607289 12.371586
      2002 "4"             3 263886 4908740  498119 10.147593
      2002 "5"             3 234233 4908740  498119 10.147593
      2002 "6 to 10"       4 874913 4908740  874913 17.823576
      2002 "Left Censored" 5 603782 4908740 2064251  42.05256
      2002 "21 to 25"      5 348478 4908740 2064251  42.05256
      2002 "11 to 15"      5 626335 4908740 2064251  42.05256
      2002 "16 to 20"      5 485656 4908740 2064251  42.05256
      2003 "0"             1 500847 4963081  877087 17.672228
      2003 "1"             1 376240 4963081  877087 17.672228
      2003 "3"             2 282181 4963081  598962  12.06835
      2003 "2"             2 316781 4963081  598962  12.06835
      2003 "5"             3 239564 4963081  497553 10.025084
      2003 "4"             3 257989 4963081  497553 10.025084
      2003 "6 to 10"       4 895085 4963081  895085 18.034866
      2003 "Left Censored" 5 576707 4963081 2094394  42.19947
      2003 "21 to 25"      5 338754 4963081 2094394  42.19947
      2003 "16 to 20"      5 499208 4963081 2094394  42.19947
      2003 "26+"           5  65894 4963081 2094394  42.19947
      2003 "11 to 15"      5 613831 4963081 2094394  42.19947
      end
      label values agegroup agegrouplabel
      label def agegrouplabel 1 "0-1", modify
      label def agegrouplabel 2 "2-3", modify
      label def agegrouplabel 3 "4-5", modify
      label def agegrouplabel 4 "6-10", modify
      label def agegrouplabel 5 "11+", modify
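    One possible route to the 100% stacked chart described above, sketched under the assumption that the data example is in memory: graph bar can sum the raw firms counts itself, so the repeated count and percent variables are not needed. The asyvars option turns the age groups into the stacked segments, and percentages rescales each year's bar to 100.

```stata
* Assumes the data example above is in memory (year, agegroup, firms).
* (sum) adds up firms within each agegroup-by-year cell; asyvars makes the
* age groups the segments; stack plus percentages makes each year's bar 100%.
graph bar (sum) firms, over(agegroup) over(year) asyvars stack percentages ///
    ytitle("% of firms") legend(title("Age group (years)") rows(1))
```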



    PS. From reading many threads on stacked bar charts on this site, I can see that people almost always recommend a sort of "spaced out" bar chart, like the second graph posted by Maarten Buis here: https://www.statalist.org/forums/for...ked-bar-charts. I am certainly open to exploring that and other ways of charting this, as traditional stacked bar charts do have their drawbacks, but I am taking this one step at a time.



  • #2
    I know Stata is not really built for data like what I have downloaded because this does not have individual observations.
    This is a needless worry. Consider the dataset census.dta that is supplied with Stata:

    Code:
    . sysuse census, clear
    (1980 Census data by state)
    
    . d
    
    Contains data from C:\Program Files\Stata15\ado\base/c/census.dta
      obs:            50                          1980 Census data by state
     vars:            13                          6 Apr 2016 15:43
     size:         2,900                          
    --------------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    --------------------------------------------------------------------------------------
    state           str14   %-14s                 State
    state2          str2    %-2s                  Two-letter state abbreviation
    region          int     %-8.0g     cenreg     Census region
    pop             long    %12.0gc               Population
    poplt5          long    %12.0gc               Pop, < 5 year
    pop5_17         long    %12.0gc               Pop, 5 to 17 years
    pop18p          long    %12.0gc               Pop, 18 and older
    pop65p          long    %12.0gc               Pop, 65 and older
    popurban        long    %12.0gc               Urban population
    medage          float   %9.2f                 Median age
    death           long    %12.0gc               Number of deaths
    marriage        long    %12.0gc               Number of marriages
    divorce         long    %12.0gc               Number of divorces
    --------------------------------------------------------------------------------------
    Sorted by:
    Stata really doesn't need to have a few hundred million observations, one for each person. It just holds the frequencies in distinct variables, one observation for each state.
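    As a small illustration of the same point, here is a sketch that draws a 100% stacked bar chart straight from those state-level counts, with no individual records involved. Treating poplt5, pop5_17 and pop18p as the parts of each bar is my choice for the example; percentages rescales each region's bar so that the three bands sum to 100.

```stata
sysuse census, clear
* One observation per state; the age-band variables already hold counts.
* (sum) adds them over the states in each region; stack plus percentages
* shows each band as a share of the region's bar.
graph bar (sum) poplt5 pop5_17 pop18p, over(region) stack percentages ///
    legend(order(1 "Under 5" 2 "5 to 17" 3 "18 and older") rows(1))
```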

    In your case, here is a token tabplot (from the Stata Journal). Because each age group's percent is repeated on every row for that group and year, I needed to ignore the duplicated values:

    Code:
    * tag one observation per agegroup-year pair, so each percent is used once
    egen tag = tag(agegroup year)
    tabplot agegroup year if tag [iw=percent], showval(format(%3.2f)) yreverse ///
    scheme(s1color) bfcolor(none) ytitle(Age group (years)) xtitle("") subtitle(%)
    [Image: agegroup.png]

    As the percents aren't integers, specify the variable to use as an iweight ([iw=percent]).
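    An alternative to tagging, sketched here under the assumption that the data example from #1 is in memory and tabplot is installed: collapse to one row per age group and year first, then compute the percents. Note that collapse replaces the data in memory; wrap it in preserve/restore if the original rows are still needed.

```stata
* One row per agegroup-year cell, summing firms over the original fage4 rows
collapse (sum) firms, by(year agegroup)

* Percent of each year's firms in each age group
bysort year: egen double totalfirms = total(firms)
gen double percent = 100 * firms / totalfirms

* No tag needed now: each cell appears exactly once
tabplot agegroup year [iw=percent], showval(format(%3.2f)) yreverse ///
    scheme(s1color) bfcolor(none) ytitle(Age group (years)) xtitle("") subtitle(%)
```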

    • #3
      Thank you Nick! The "tag" and "iw = percent" helped me to make a few more figures, too.
