Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using foreach loops to generate summary statistics

    Hi all,

    I have a large census data set (12 million observations) that I want to generate summary statistics for. I have a range of socioeconomic characteristics that I want to have generated but I can't seem to get for loops to work in order to run through all observations. Essentially, I want to have a table that generates 4 columns (but the code below only shows for three races, White,Black and Chinese) that has all the characteristics I want as new rows.

    Below is an excerpt of my code with just a few of the socioeconomic characteristics I want to generate (age, emp_dummy(percentage of a specific race that is employed), foodstmp_dummy(percentage on food stamps), hsorless (those with a high school degree or less).

    Race=1 for white, Race==4 for Chinese, Race=2 for Black
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(region race) int age float(emp_dummy foodstmp_dummy hsorless)
    32 1 49 1 1 1
    32 1 59 0 1 0
    32 1 37 0 1 0
    32 1 15 . 1 .
    32 1 14 . 1 .
    32 1 34 0 1 1
    32 1 51 1 0 1
    32 1 53 1 0 1
    32 1 51 1 0 1
    32 1 48 1 0 0
    32 1 49 1 1 1
    32 1 38 1 1 1
    32 1 21 . 1 .
    32 1 18 . 1 .
    32 1  4 . 1 .
    32 1  3 . 1 .
    32 1  2 . 1 .
    32 1  0 . 1 .
    32 1 40 1 0 0
    32 4 35 0 0 0
    32 1 10 . 0 .
    32 4  2 . 0 .
    32 1 66 . 0 .
    32 1 28 0 0 0
    32 2 80 . 0 .
    32 2 48 1 1 1
    32 2 15 . 1 .
    32 2 19 . 1 .
    32 2 19 . 0 .
    32 1 67 . 0 .
    32 1 67 . 0 .
    32 1 43 1 0 1
    32 1 41 1 0 0
    32 1 77 . 0 .
    32 1 55 1 0 0
    32 1 45 0 0 0
    32 2 56 0 0 1
    32 1 73 . 0 .
    32 1 78 . 0 .
    32 1 28 1 0 1
    32 1 24 . 0 .
    32 1  0 . 0 .
    32 1  5 . 0 .
    32 1 32 1 0 0
    32 1 31 1 0 0
    32 1  0 . 0 .
    32 1  0 . 0 .
    32 1 28 1 0 0
    32 1 26 1 0 0
    32 1 28 1 0 1
    32 1  0 . 0 .
    32 1 29 1 0 0
    32 1 30 1 0 0
    32 1 34 1 0 1
    32 1 77 . 0 .
    32 1 46 1 0 1
    32 1 51 1 0 1
    32 1 18 . 0 .
    32 1 61 0 0 1
    32 1 60 0 0 1
    32 1 28 0 0 0
    32 1 28 0 0 0
    32 1  3 . 0 .
    32 1  1 . 0 .
    32 2 50 0 0 1
    32 1 87 . 0 .
    32 1 25 1 0 1
    32 1 27 0 0 0
    32 1 45 1 0 0
    32 1 44 0 0 0
    32 1 13 . 0 .
    32 1 11 . 0 .
    32 1 76 . 0 .
    32 1 84 . 0 .
    32 1 76 . 0 .
    32 1 40 1 0 0
    32 1 44 1 0 0
    32 1 14 . 0 .
    32 1 56 1 0 0
    32 1 52 0 0 0
    32 1 25 1 0 0
    32 1 22 . 0 .
    32 1 34 0 0 1
    32 1  9 . 0 .
    32 1  7 . 0 .
    32 1 31 1 0 1
    32 1 66 . 0 .
    32 1 21 . 0 .
    32 1 78 . 0 .
    32 1 73 . 0 .
    32 1 75 . 0 .
    32 1 58 1 0 1
    32 1 64 1 0 1
    32 1 64 0 0 1
    32 1 57 0 1 1
    32 1 31 1 0 1
    32 2 20 . 0 .
    32 1 45 1 0 0
    32 1 46 1 0 0
    32 1 18 . 0 .
    end
    Code I have tried so far that hasn't worked (no error, but no .tex file is generated):
    Code:
    global output "C:\Users\User\Documents\Stata Code"
    local sumstats age emp_dummy foodstmp_dummy hsorless
    local race white black chinese
    #delimit ;
    foreach r of local race{;
    foreach s of local sumstats{;
    eststo white: estpost tabstat s,
    stat(mean sd) columns(stat) listwise;
    esttab using "$output/summarystats_overallavg_test.tex", replace  label 
    title ("Table 1: ACS Summary Statistics Across Races,") 
        addnotes("The means are reported with the standard deviations in parenthesis. Significance stars: * 0.05 ** 0.01 *** 0.001")
        cells(mean(pattern(1 1 1 1) fmt(3)) sd(par pattern(1 1 1 1)fmt(2)) b(star pattern(0 0 1) fmt(2)))
        mtitles( "White" "Black" "Chinese") 
        varlabel(
        emp_dummy "Percentage employed (Ages 25-65)"
        hsorless "Percentage with only a high school degree or less"
        age "Mean age"
        foodstmp_dummy "Percentage on Food Stamps") ;
    };
    #delimit cr
    If I try to run using the "brute force" method, where I run each variable separately for each race category the code works and looks like:

    Code:
    eststo white: estpost tabstat age `sumstats' if race ==1,
    stat(mean sd)  columns(stat) listwise;
    esttab using "$output/summarystats_overallavg_test2.tex", replace label 
    title ("Table 1: ACS Summary Statistics Across Races, South Region") 
        addnotes("The means are reported with the standard deviations in parenthesis. Significance stars: * 0.05 ** 0.01 *** 0.001")
        cells(mean(pattern(1 1 1 1) fmt(3)) sd(par pattern(1 1 1 1)fmt(2)) b(star pattern(0 0 1) fmt(2)))
        mtitles( "White" "Black" "Chinese") ;
    
    eststo black: estpost tabstat age `sumstats' if race ==2,
    stat(mean sd) columns(stat) listwise;
    esttab using "$output/summarystats_overallavg_test2.tex", append label
    ........
    ........
    One .tex file is generated at the end of running all of this, but I can't seem to get one generated if I try to use loops instead of brute force method.

    Thanks for any help you might be able to give!

  • #2
    There are several errors in the code for the loops.
    1. You do not reference the local macro r, hence you do not iterate over the variable race. You also did not close the loop with }-bracket. Therefore, the loop should not run in the original code.
    2. The iteration over race is not identical to the brute-force approach. In the code below, I iterate over the value of the variable instead of over the value label what you seem to have intended originally.
    I highlighted in bold the changes that I made to your code.
    Code:
    local sumstats age emp_dummy foodstmp_dummy hsorless
    local race white black Chinese
    local racenum 1 2 4
    #delimit ;
    forvalues r=1/3{;
    eststo `=word("`race'",`r')': estpost tabstat `sumstats' if race==`=word("`racenum'",`r')', stat(mean sd) columns(stat) listwise;
    };
    esttab `race'  using "$output/summarystats_overallavg_test.tex", replace  label 
    title ("Table 1: ACS Summary Statistics Across Races,") 
        addnotes("The means are reported with the standard deviations in parenthesis. Significance stars: * 0.05 ** 0.01 *** 0.001")
        cells(mean(pattern(1 1 1 1) fmt(3)) sd(par pattern(1 1 1 1)fmt(2)) b(star pattern(0 0 1) fmt(2)))
        mtitles( "White" "Black" "Chinese") 
        varlabel(
        emp_dummy "Percentage employed (Ages 25-65)"
        hsorless "Percentage with only a high school degree or less"
        age "Mean age"
        foodstmp_dummy "Percentage on Food Stamps") ;
    #delimit cr

    Comment

    Working...
    X