Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Table with count, percent for categories and mean & standard deviation for continuous data using Stata17 collect commands & Table Builder

    I just purchased Stata 17 and was very excited to start using the Table Builder. It's very straightforward for simple tables, but I'd like to create tables that mix both count/percent data for categorical variables with mean and standard deviation for continuous data. If I can successfully create a table that looks like the following, I can create all the tables I need to. Obviously, I could create the simple table below by running sum or tab for each variable and then manually copying the data into a Word or Excel table, but I'd really like to use the flexibility that seems to be available in the collect command and corresponding Table Builder. Unfortunately, I'm a novice user and would appreciate some help to create the correct structure of commands for my do file. I've read through about the first 30 pages of the STATACUSTOMIZABLE TABLES ANDCOLLECTED RESULTS REFERENCE MANUAL RELEASE 17, but still need help. Here's the example table I'd like to create (instead of values in the table, I've just used N, %, Mean, and SD):

    [survey] Baseline Year 1 Year 2
    [group] Lifting Bending Lifting Bending Lifting Bending
    Females N (%) N (%) N (%) N (%) N (%) N (%)
    Males N (%) N (%) N (%) N (%) N (%) N (%)
    Total N (%) N (%) N (%) N (%) N (%) N (%)
    Age Mean (SD) Mean (SD) Mean (SD) Mean (SD) Mean (SD) Mean (SD)
    Language
    English N (%) N (%) N (%) N (%) N (%) N (%)
    Spanish N (%) N (%) N (%) N (%) N (%) N (%)
    Hours Sleep Mean (SD) Mean (SD) Mean (SD) Mean (SD) Mean (SD) Mean (SD)
    Stress Past Week
    None N (%) N (%) N (%) N (%) N (%) N (%)
    Some N (%) N (%) N (%) N (%) N (%) N (%)
    Most N (%) N (%) N (%) N (%) N (%) N (%)
    If I can create the code needed in a Do file, I think I'd be able to use the Table Builder to modify borders, alignment, fonts, etc.

    A sample of my data (in long form for this longitudinal study):

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int idnum byte(group sex survey language stress) double sleep byte age
     1 2 2 0 1 0    5 56
     1 2 2 1 . .    .  .
     1 2 2 2 . .    .  .
     2 2 1 0 1 0    9 45
     2 2 1 1 1 0  6.5 47
     2 2 1 2 1 1    . 48
     3 2 1 0 1 0  6.5 49
     3 2 1 1 . .    .  .
     3 2 1 2 1 0    6 52
    11 2 1 0 1 0    6 57
    11 2 1 1 . .    .  .
    11 2 1 2 . .    .  .
    12 2 2 0 1 1    . 53
    12 2 2 1 1 1    5 54
    12 2 2 2 1 1  4.5 55
    18 2 1 0 1 1    6 49
    18 2 1 1 1 0    6 50
    18 2 1 2 1 0    5 51
    19 2 1 0 1 0    6 55
    19 2 1 1 1 0    5 57
    19 2 1 2 1 0    6 57
    30 2 1 0 1 1    4 32
    30 2 1 1 1 1    5 33
    30 2 1 2 1 1    7 34
    31 2 2 0 2 0    . 39
    31 2 2 1 2 0    7 20
    31 2 2 2 1 0    7 42
    32 2 1 0 2 0    7 47
    32 2 1 1 2 .    8 48
    32 2 1 2 2 0    7 49
    33 2 1 0 2 1    5 40
    33 2 1 1 1 2    6 41
    33 2 1 2 1 0    1 42
    34 2 1 0 1 1  5.5 29
    34 2 1 1 1 0    6 30
    34 2 1 2 . .    .  .
    35 2 2 0 2 .    7 62
    35 2 2 1 2 0    7 62
    35 2 2 2 2 0    8 63
    36 2 2 0 2 0    4 40
    36 2 2 1 2 0  6.5 43
    36 2 2 2 2 .    6 43
    37 2 1 0 2 2  7.5 40
    37 2 1 1 1 1    8 43
    37 2 1 2 1 1    8 43
    38 2 2 0 1 1    7 24
    38 2 2 1 1 0    5 25
    38 2 2 2 1 0    6 26
    39 2 2 0 1 1  6.5 40
    39 2 2 1 1 0    7 41
    39 2 2 2 1 0    6 42
    40 2 2 0 1 1    6 30
    40 2 2 1 . .    .  .
    40 2 2 2 . .    .  .
    41 2 1 0 2 .    7 47
    41 2 1 1 2 0  8.5 48
    41 2 1 2 2 .    . 49
    42 2 1 0 1 0    7 48
    42 2 1 1 1 0    . 49
    42 2 1 2 1 0    . 50
    43 2 2 0 2 .   10 51
    43 2 2 1 . .    .  .
    43 2 2 2 2 .    8 53
    44 2 2 0 1 1    7 39
    44 2 2 1 1 0  6.5 61
    44 2 2 2 1 0    6 61
    45 2 1 0 1 0  7.5 29
    45 2 1 1 . .    .  .
    45 2 1 2 1 2    7 31
    46 2 2 0 1 1    8 52
    46 2 2 1 . .    .  .
    46 2 2 2 . .    .  .
    47 2 1 0 1 0    8 21
    47 2 1 1 . .    .  .
    47 2 1 2 . .    .  .
    48 2 2 0 2 .    8 47
    48 2 2 1 2 0    8 48
    48 2 2 2 2 0    2 49
    49 2 1 0 1 0    6 54
    49 2 1 1 2 0    7 55
    49 2 1 2 1 0    5 56
    50 2 2 0 2 0    . 48
    50 2 2 1 2 .    . 49
    50 2 2 2 2 0    . 50
    59 2 1 0 1 1    7 37
    59 2 1 1 1 1    7 38
    59 2 1 2 1 1    7 39
    65 2 1 0 1 0  6.5 34
    65 2 1 1 1 0    9 35
    65 2 1 2 . .    .  .
    66 2 1 0 1 1    7 56
    66 2 1 1 . .    .  .
    66 2 1 2 1 1    5 58
    67 2 1 0 1 2 6.75 46
    67 2 1 1 . .    .  .
    67 2 1 2 1 0    8 48
    68 2 1 0 1 0   10 76
    68 2 1 1 1 0    7 77
    68 2 1 2 1 0    8 78
    end
    label values group GroupLabel
    label def GroupLabel 2 "Bending", modify
    label values sex MaleFemaleLabel
    label def MaleFemaleLabel 1 "Male", modify
    label def MaleFemaleLabel 2 "Female", modify
    label values survey SurveyLabel
    label def SurveyLabel 0 "Baseline Survey", modify
    label def SurveyLabel 1 "Year1 Survey", modify
    label def SurveyLabel 2 "Year2 Survey", modify
    label values language LanguageLabel
    label def LanguageLabel 1 "English", modify
    label def LanguageLabel 2 "Spanish", modify
    label values stress PastWkStressLabel
    label def PastWkStressLabel 0 "None", modify
    label def PastWkStressLabel 1 "Some", modify
    label def PastWkStressLabel 2 "Most", modify
    Note: If language is missing (.) then that subject did not complete the Survey for that year. All subjects completed (at least) the Baseline survey.

    I started with the Do file code:

    collect clear
    sort survey group sex
    collect: by survey group: tab sex if language!=.
    collect: by survey group: sum age if language!=.
    collect: by survey group: tab language if language!=.
    collect: by survey group: sum sleep if language!=.
    collect: by survey group: tab stress if language!=.

    But that's as far as I got.

    Any help would be appreciated!

    Kind Regards,
    George Brogmus

  • #2
    I did a bit more reading (Chuck Huber's Blog articles). The following gets me close, but I can't figure out how to get the N and % and means and sd's next to each other.

    table (var) (survey group) if language!=., statistic(fvfrequency sex) statistic(fvpercent sex) statistic(mean age) statistic(sd age) statistic(fvfrequency language) statistic(fvpercent language) statistic(mean sleep) statistic(sd sleep) statistic(fvfrequency stress) statistic(fvpercent stress) nototals nformat(%9.0fc fvfrequency fvpercent) sformat("%s%%" fvpercent) nformat(%6.2f mean sd) sformat("(%s)" sd)

    collect label levels result fvfrequency "N" fvpercent "Percent" sd "SD", modify

    collect style putdocx, layout(autofitcontents) title("Table 1: Descriptive Statistics")
    collect export MyTableClose1.docx, as(docx) replace


    The result:
    Table 1: Descriptive Statistics
    Survey 0(Baseline), 1(Year1), 2(Year2)
    Baseline Survey Year1 Survey Year2 Survey
    Intervention-Sham Group Intervention-Sham Group Intervention-Sham Group
    Lifting Bending Lifting Bending Lifting Bending
    Sex=Male
    N 48 40 28 23 33 27
    Percent 62% 50% 62% 47% 59% 47%
    Sex=Female
    N 29 40 17 26 23 30
    Percent 38% 50% 38% 53% 41% 53%
    Age (yrs)
    Mean 48.74 46.24 49.20 46.22 51.70 49.28
    SD (9.33) (10.33) (9.41) (10.74) (8.78) (9.11)
    Language 1(English), 2(Spainish)=English
    N 47 52 33 33 36 37
    Percent 61% 65% 73% 67% 64% 65%
    Language 1(English), 2(Spainish)=Spainish
    N 30 28 12 16 20 20
    Percent 39% 35% 27% 33% 36% 35%
    Sleep Time (hrs) on Average per 24hrs
    Mean 6.43 6.73 7.22 6.76 6.81 6.51
    SD (1.55) (1.74) (1.94) (1.47) (1.64) (1.61)
    Stress level during past week=None
    N 46 43 30 30 34 30
    Percent 68% 58% 68% 65% 63% 59%
    Stress level during past week=Some
    N 16 25 9 13 13 17
    Percent 24% 34% 20% 28% 24% 33%
    Stress level during past week=Most
    N 6 6 5 3 7 4
    Percent 9% 8% 11% 7% 13% 8%
    Thank you!
    George Brogmus

    Comment


    • #3
      This got me a bit closer, but still don't understand how to get the N & % and Mean & SD on the same line.

      table (var) (survey group) if language!=., statistic(fvfrequency sex) statistic(fvpercent sex) statistic(mean age) statistic(sd age) statistic(fvfrequency language) statistic(fvpercent language) statistic(mean sleep) statistic(sd sleep) statistic(fvfrequency stress) statistic(fvpercent stress) nototals nformat(%9.0fc fvfrequency fvpercent) sformat("%s%%" fvpercent) nformat(%6.2f mean sd) sformat("(%s)" sd)

      collect label levels result fvfrequency "N" fvpercent "Percent" sd "SD", modify

      collect style header survey, title(hide)

      collect style header group, title(hide)

      collect label levels survey 0 "Baseline" 1 "Year1" 2 "Year2", modify

      collect label levels group 1 "Sham" 2 "Treatment", modify

      Comment


      • #4
        I'm not sure I completely follow all of the above but I think that what you want is shown in Chuck's 3rd blog and relies on using "collect recode result" where you follow that with something like "mean = column 1" and "sd = column 2" - as I say, a full example is in Chuck's third blog
        "the classic table 1"

        Comment


        • #5
          Thank you Rich. I'll read over that in more detail. The first time I tried the column1, column2 code it didn't work. I think I've got to have a better understanding of Dimensions, Levels and Tags and then perhaps I'll be able to structure the code properly. I'll work on that later today and when I get it to work I'll post it.

          Comment

          Working...
          X