Formatting binary and categorical variables and p-values using table

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 697

#16

23 Jul 2023, 21:07

I'm going to split my response across multiple posts. This post is for
the first request. The next post is for the second request. Then I'll
post the entire program with an example showing off both new features.

---

You can add the zeros with something like

Code:

    collect get fvfrequency=0 fvpercent=0, tags(...)

it is just a matter of specifying option tags(). If you want to
add non-integer numeric categories, then you must use the bracket notation.
For example,

Code:

    collect get fvfrequency=0 fvpercent=0, tags(cat2[3.5] female[0])
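
To see the tags() mechanism on its own, here is a small sketch that is not part of mw_table. It uses the auto dataset as a stand-in, and it collects the frequency statistic instead of fvfrequency and fvpercent to keep things short. Variable rep78 has no level 6, so the sketch adds an explicit zero count for that level in each level of foreign.

Code:

```stata
* Stand-alone sketch (auto data as a stand-in): add zero results
* for a level of rep78 that was never observed
sysuse auto, clear
quietly table (rep78) (foreign), statistic(frequency) nototals

* one zero result per level of foreign, tagged with the new level rep78[6]
collect get frequency=0, tags(rep78[6] foreign[0])
collect get frequency=0, tags(rep78[6] foreign[1])

* list the rep78 levels explicitly so that the new level 6 is shown
collect layout (rep78[1 2 3 4 5 6]) (foreign)
```

Listing the levels in collect layout sidesteps any autolevels that table may have set for rep78; mw_table instead resets the autolevels for each categorical variable.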

First we change the syntax specification for option
categorical() from

Code:

        CATegorical(varlist)    ///

to

Code:

        CATegorical(string)    ///

and call a custom subroutine that parses the contents of the option into
local macro categorical, containing the categorical variables, and a new
local macro, FVzeros, that specifies where to add the zero results among
the categorical variables.

Code:

    ParseCategorical `categorical'
    local categorical `"`s(varlist)'"'
    local FVzeros `"`s(zeros)'"'

Here is how I coded this subroutine.

Code:

program ParseCategorical, sclass
    syntax [varlist(default=none)] [, ZEROs(string)]

    if `:list sizeof varlist' == 0 {
        if `:list sizeof zeros' {
            di as err ///
            "option {bf:zeros()} requires categorical variables"
            exit 198
        }
    }

    gettoken spec zeros : zeros , parse(" []")
    while `:length local spec' {
        capture noisily unab names : `spec'
        if c(rc) {
            di as err "in option {bf:zeros()}"
            exit c(rc)
        }
        foreach name of local names {
            if `:list posof "`name'" in varlist' == 0 {
                di as err "invalid {bf:zeros()} option"
                di as err ///
        "variable {bf:`name'} not found in list of categorical variables"
                exit 198
            }
        }
        gettoken open zeros : zeros , parse(" []")
        if `"`open'"' == "" {
            di as err "invalid {bf:zeros()} option"
            di as err `"nothing found where {bf:[} expected"'
            exit 198
        }
        if `"`open'"' != "[" {
            di as err "invalid {bf:zeros()} option"
            di as err `"{bf:`open'} found where {bf:[} expected"'
            exit 198
        }
        gettoken tok zeros : zeros , parse(" []")
        while !inlist(`"`tok'"', "", "]") {
            capture noisily confirm number `tok'
            if c(rc) {
                di as err "in option {bf:zeros()}"
                exit c(rc)
            }
            foreach name of local names {
                local ZEROS `ZEROS' `name'[`tok']
            }
            gettoken tok zeros : zeros , parse(" []")
        }
        if `"`tok'"' != "]" {
            di as err "invalid {bf:zeros()} option"
            di as err `"closing square bracket '{bf:]}' not found"'
            exit 198
        }
        gettoken spec zeros : zeros , parse(" []")
    }
    sreturn local varlist `"`varlist'"'
    sreturn local zeros `"`ZEROS'"'
end

After calling this parsing code and then table, we can add the
specified zero results. Here is a code snippet showing how I did this.

Code:

    quietly collect levelsof `by'
    local by_levels = s(levels)
    foreach l of local by_levels {
        foreach z of local FVzeros {
            collect get fvfrequency=0 fvpercent=0, ///
                tags(`z' `by'[`l'])
        }
    }
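
Because FVzeros holds one tag per zero result, this loop issues one collect get for every combination of a zero tag and a by() level, including .m, the level table uses for the totals. A quick dry run makes that product visible. The macro values below are hypothetical stand-ins for what ParseCategorical and collect levelsof would return for the example at the end of this thread, and the loop only displays the commands instead of running them.

Code:

```stata
* Dry run with hypothetical stand-in values (nothing is collected)
local by female
local by_levels 0 1 .m
local FVzeros cat1[2.5] cat2[0] cat3[9] cat3[16]

local n 0
foreach l of local by_levels {
    foreach z of local FVzeros {
        local ++n
        * show the command mw_table would run at this point
        display `"collect get fvfrequency=0 fvpercent=0, tags(`z' `by'[`l'])"'
    }
}
display "`n' zero results would be added"
```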

I noticed a flaw in the autolevels logic for categorical variables. The
call to

Code:

         collect style autolevels `x' _hide `s(levels)', clear

could yield an unexpected level order if the levels do not all have the
same number of digits. To fix this, I added a call to a short Mata function
that sorts the levels returned by collect levelsof by their numeric values.
Here is the definition of the Mata function I added at the end of the
ado-file:

Code:

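mata:

void mw_table_sort_cat_levels()
{
    string  vector    levels
    real    vector    sel
    real    vector    order

    // levels returned by -collect levelsof-
    levels = tokens(st_global("s(levels)"))
    // drop _hide; it is placed first in the autolevels call
    sel = levels :!= "_hide"
    levels = select(levels, sel)
    // sort the remaining levels by numeric value, not as strings
    order = order(strtoreal(levels)', 1)
    levels = levels[order]
    st_global("s(levels)", invtokens(levels))
}

end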

and here is how I modified the loop over the categorical variables:

Code:

    foreach x of local categorical {
        quietly tabulate `x' `by', chi2
        collect get nobs=(r(N)) p=(r(p)), tag(`x'[_hide])
        collect style header `x', title(label)
        quietly collect levelsof `x'
        mata: mw_table_sort_cat_levels()
        collect style autolevels `x' _hide `s(levels)', clear
    }
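
Here is the sorting idea in isolation. The helper sort_levels_numeric() below is hypothetical and not part of mw_table; it takes the levels as an argument instead of reading s(levels), which makes it easy to compare a plain string sort with a numeric sort of the same levels.

Code:

```stata
capture mata: mata drop sort_levels_numeric()

mata:
// Hypothetical helper: drop "_hide" and sort the remaining levels
// by numeric value
string rowvector sort_levels_numeric(string rowvector levels)
{
    string colvector keep
    real   colvector p

    keep = select(levels, levels :!= "_hide")'   // column vector
    p = order(strtoreal(keep), 1)                // numeric sort order
    keep = keep[p]
    return(keep')
}

lv = tokens("11 12 13 14 15 9 16 _hide")

// string sort: "9" ends up after "16"
invtokens(sort(select(lv, lv :!= "_hide")', 1)')

// numeric sort: 9 comes first
invtokens(sort_levels_numeric(lv))
end
```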

Jeff Pitblado (StataCorp)

#17

23 Jul 2023, 21:11

Adding a second row for continuous results is somewhat trickier, but it
is possible with composite results and some extra tags.

As with the zeros, we change the syntax specification for option
continuous() from

Code:

        CONTinuous(varlist)    ///

to

Code:

        CONTinuous(string)    ///

and call a custom subroutine that parses the contents of the option into
local macro continuous, containing the continuous variables, and a new
local macro, CVranges, that indicates whether a second row of range
statistics was requested.

Code:

    ParseContinuous `continuous'
    local continuous `"`s(varlist)'"'
    local CVranges `"`s(ranges)'"'

Here is how I coded this subroutine.

Code:

program ParseContinuous, sclass
    syntax [varlist(default=none)] [, RANGEs]

    if `:list sizeof varlist' == 0 {
        if `:list sizeof ranges' {
            di as err ///
            "option {bf:ranges} requires continuous variables"
            exit 198
        }
    }

    sreturn local varlist `"`varlist'"'
    sreturn local ranges `"`ranges'"'
end

After calling this parsing code, we need to add the range statistics to the
call to table, provided the ranges were requested. Here is how I
did this.

Code:

    if `:list sizeof continuous' {
        local CVopts    statistic(mean `continuous') ///
                statistic(sd `continuous')
        if `:list sizeof CVranges' {
            local CVopts `CVopts'    ///
                statistic(min `continuous') ///
                statistic(max `continuous') ///
                statistic(iqr `continuous')
        }
    }

Adding a separate row of results for the continuous variables posed some
interesting challenges. I decided to add a new dimension named __CONT,
which holds the row header information, and used it to tag the continuous
variable results so that they show up in separate rows while still being
specified as columns in the layout. I did this by replacing

Code:

    collect composite define col1 = mean fvfrequency
    collect composite define col2 = sd fvpercent

with

Code:

    if `:length local CVranges' {
        collect composite define rangei = min max, trim
        collect style cell result[rangei], sformat("[%s]")
        collect style cell result[iqr], sformat("(%s)")
        local i 0
        foreach v of local continuous {
            local ++i
            quietly collect addtags __CONT[v`i'name], ///
                fortags(var[`v']#result[mean sd p nobs])
            quietly collect addtags __CONT[v`i'ranges], ///
                fortags(var[`v']#result[min max iqr])

            local lab : variable label `v'
            if `"`lab'"' == "" {
                local lab `v'
            }

            collect label levels __CONT ///
                v`i'name `"`lab'"' ///
                v`i'ranges "Range (IQR)"
        }
        collect style header __CONT, title(hide)
        local contspec __CONT
        local col1extra rangei
        local col2extra iqr
    }
    collect composite define col1 = mean fvfrequency `col1extra'
    collect composite define col2 = sd fvpercent `col2extra'
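
Here is the __CONT idea in miniature, as a sketch with the auto dataset standing in for the real data. The dimension name rowgrp and its levels stats and ranges are hypothetical; price and mpg play the role of the continuous variables, and only price is given a second row.

Code:

```stata
sysuse auto, clear
quietly table () (foreign result), ///
    statistic(mean price mpg) statistic(sd price mpg) ///
    statistic(min price mpg) statistic(max price mpg)

* tag the price results with a new row dimension: one level for
* mean/sd and a second level for min/max
quietly collect addtags rowgrp[stats], ///
    fortags(var[price]#result[mean sd])
quietly collect addtags rowgrp[ranges], ///
    fortags(var[price]#result[min max])
collect label levels rowgrp stats "Price" ranges "Range"
collect style header rowgrp, title(hide)

* composites let both rows fill the same two columns
collect composite define c1 = mean min
collect composite define c2 = sd max

collect layout (rowgrp) (foreign#result[c1 c2])
```

Because each cell holds only one of the composite's elements, the mean and the minimum share column c1, and the standard deviation and the maximum share column c2.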

Jeff Pitblado (StataCorp)

#18

23 Jul 2023, 21:14

Here is the fully modified program. You may notice I also changed how I
collect results from anova. This new version collects only the
results of interest.

Code:

*! version 1.0.1  23jul2023
program mw_table
    version 17

    syntax ,            ///
        by(string asis)        ///
    [                ///
        BINary(varlist)        ///
        CATegorical(string)    ///
        CONTinuous(string)    ///
        GROUPed(string asis)    ///
        *            ///
    ]

    capture noisily ParseByOption `by'
    if c(rc) {
        di as err "in option {bf:by()}"
        exit c(rc)
    }
    local by = s(by)
    local byfirst "`s(first)'"
    local bylabel "`s(label)'"

    // Parse -categorical()- and -continuous()- options.

    ParseCategorical `categorical'
    local categorical `"`s(varlist)'"'
    local FVzeros `"`s(zeros)'"'

    ParseContinuous `continuous'
    local continuous `"`s(varlist)'"'
    local CVranges `"`s(ranges)'"'

    // Do not allow variables to be specified in more than one
    // option.

    local duplist binary categorical continuous
    local k_duplist : list sizeof duplist

    forval i = 1/`k_duplist' {
        local opt1 : word `i' of `duplist'
        forval j = `=`i'+1'/`k_duplist' {
            local opt2 : word `j' of `duplist'
            local both : list `opt1' & `opt2'
            CheckDupVars "`both'" `opt1'() `opt2'()
        }
    }

    // Parse -grouped()- options.
    // Check that grouped variables are not specified in the other
    // options.
    // Check that grouped names are not specified in
    // the other options.
    // Remaining options go to -table-.

    local gid 0
    while `:length local grouped' {
        local ++gid
        capture noisily ParseGroupOption `grouped'
        if c(rc) {
            di as err "in option {bf:grouped()}"
            exit c(rc)
        }
        local group`gid'vars = s(varlist)
        local group`gid'name = s(name)
        local group`gid'label = s(label)
        forval i = 1/`k_duplist' {
            local opt1 : word `i' of `duplist'
            local both : list `opt1' & group`gid'vars
            CheckDupVars "`both'" `opt1'() group()
        }
        local both : list allgroupvars & group`gid'vars
        CheckDupVars "`both'" group() group()
        CheckDupNames `group`gid'name' `allgroupnames'
        local allgroupvars `allgroupvars' `group`gid'vars'
        local allgroupnames `allgroupnames' `group`gid'name'
        local 0 `", `options'"'
        syntax [, GROUPed(string asis) * ]
    }
    local k_grouped = `gid'

    // Check that grouped names are not also being used as
    // variables.

    local both : list allgroupnames & binary
    CheckNameVarConflict "`both'" binary()
    local both : list allgroupnames & categorical
    CheckNameVarConflict "`both'" categorical()
    local both : list allgroupnames & continuous
    CheckNameVarConflict "`both'" continuous()
    local both : list allgroupnames & allgroupvars
    CheckNameVarConflict "`both'" group()

    // Build the call to -table-.

    if `:list sizeof continuous' {
        local CVopts    statistic(mean `continuous') ///
                statistic(sd `continuous')
        if `:list sizeof CVranges' {
            local CVopts `CVopts'    ///
                statistic(min `continuous') ///
                statistic(max `continuous') ///
                statistic(iqr `continuous')
        }
    }
    local fvlist `binary' `categorical' `allgroupvars'
    if `:list sizeof fvlist' {
        local FVopts    statistic(fvfrequency `fvlist') ///
                statistic(fvpercent `fvlist')
    }

    quietly table () (`by' result), `CVopts' `FVopts' `options'

    quietly collect levelsof `by'
    local by_levels = s(levels)
    foreach l of local by_levels {
        foreach z of local FVzeros {
            collect get fvfrequency=0 fvpercent=0, ///
                tags(`z' `by'[`l'])
        }
    }

    if `:list sizeof continuous' {
        // -anova- needs a numeric by variable.
        local bytype : type `by'
        if substr("`bytype'",1,3) == "str" {
            tempvar numby
            encode `by', generate(`numby')
        }
        else {
            local numby `by'
        }

        // continuous variables layout specification
        local contspec var
        collect style autolevels var `continuous', clear
    }

    foreach x of local continuous {
        quietly anova `x' `numby'
        collect get nobs=(e(N)) p=Ftail(e(df_m),e(df_r),e(F)) ///
            , tag(var[`x'])
    }

    foreach x of local categorical {
        quietly tabulate `x' `by', chi2
        collect get nobs=(r(N)) p=(r(p)), tag(`x'[_hide])
        collect style header `x', title(label)
        quietly collect levelsof `x'
        mata: mw_table_sort_cat_levels()
        collect style autolevels `x' _hide `s(levels)', clear
    }

    foreach x of local binary {
        quietly tabulate `x' `by', chi2
        collect get nobs=(r(N)) p=(r(p)), tag(var[1.`x'])
        collect style header `x'[1], title(label) level(hide)
        collect style autolevels `x' 1, clear
    }

    foreach x of local allgroupvars {
        quietly tabulate `x' `by', chi2
        collect get nobs=(r(N)) p=(r(p)), tag(var[1.`x'])
        collect style autolevels `x' 1, clear
    }

    forval i = 1/`k_grouped' {
        local vars : copy local group`i'vars
        local name : copy local group`i'name
        local label : copy local group`i'label
        local j 0
        foreach x of local vars {
            local ++j
            local lab : variable label `x'
            if `"`lab'"' == "" {
                local lab `x'
            }
            quietly collect remap `x'[1] = `name'[`j']
            collect label levels `name' `j' `"`lab'"', modify
        }
        collect label dim `name' `"`label'"', modify
    }

    // Define some composites for the columns. This allows us to
    // let -table- handle result formats.

    if `:length local CVranges' {
        collect composite define rangei = min max, trim
        collect style cell result[rangei], sformat("[%s]")
        collect style cell result[iqr], sformat("(%s)")
        local i 0
        foreach v of local continuous {
            local ++i
            quietly collect addtags __CONT[v`i'name], ///
                fortags(var[`v']#result[mean sd p nobs])
            quietly collect addtags __CONT[v`i'ranges], ///
                fortags(var[`v']#result[min max iqr])

            local lab : variable label `v'
            if `"`lab'"' == "" {
                local lab `v'
            }

            collect label levels __CONT ///
                v`i'name `"`lab'"' ///
                v`i'ranges "Range (IQR)"
        }
        collect style header __CONT, title(hide)
        local contspec __CONT
        local col1extra rangei
        local col2extra iqr
    }
    collect composite define col1 = mean fvfrequency `col1extra'
    collect composite define col2 = sd fvpercent `col2extra'

    // p-value styles

    collect style cell result[p], nformat(%6.2f)
    collect label levels result p "p-value", modify

    // sample size styles

    collect style cell result[nobs], nformat(%18.0fc)
    collect label levels result nobs "N", modify

    // header styles

    collect style header `by', title(hide)
    collect style header result[col1 col2], level(hide)
    collect style row stack, nobinder spacer

    // border style

    collect style cell border_block, border(right, pattern(nil))

    // handle by Total styles

    if "`byfirst'" != "" {
        quietly collect levelsof `by'
        collect style autolevels `by' .m `s(levels)', clear
    }
    if `"`bylabel'"' != "" {
        collect label levels `by' .m `"`bylabel'"', modify
    }

    collect layout ///
        (`contspec' `categorical' `allgroupnames' `binary') ///
        (`by'#result[col1 col2] result[p nobs])
end

program ParseCategorical, sclass
    syntax [varlist(default=none)] [, ZEROs(string)]

    if `:list sizeof varlist' == 0 {
        if `:list sizeof zeros' {
            di as err ///
            "option {bf:zeros()} requires categorical variables"
            exit 198
        }
    }

    gettoken spec zeros : zeros , parse(" []")
    while `:length local spec' {
        capture noisily unab names : `spec'
        if c(rc) {
            di as err "in option {bf:zeros()}"
            exit c(rc)
        }
        foreach name of local names {
            if `:list posof "`name'" in varlist' == 0 {
                di as err "invalid {bf:zeros()} option"
                di as err ///
        "variable {bf:`name'} not found in list of categorical variables"
                exit 198
            }
        }
        gettoken open zeros : zeros , parse(" []")
        if `"`open'"' == "" {
            di as err "invalid {bf:zeros()} option"
            di as err `"nothing found where {bf:[} expected"'
            exit 198
        }
        if `"`open'"' != "[" {
            di as err "invalid {bf:zeros()} option"
            di as err `"{bf:`open'} found where {bf:[} expected"'
            exit 198
        }
        gettoken tok zeros : zeros , parse(" []")
        while !inlist(`"`tok'"', "", "]") {
            capture noisily confirm number `tok'
            if c(rc) {
                di as err "in option {bf:zeros()}"
                exit c(rc)
            }
            foreach name of local names {
                local ZEROS `ZEROS' `name'[`tok']
            }
            gettoken tok zeros : zeros , parse(" []")
        }
        if `"`tok'"' != "]" {
            di as err "invalid {bf:zeros()} option"
            di as err `"closing square bracket '{bf:]}' not found"'
            exit 198
        }
        gettoken spec zeros : zeros , parse(" []")
    }
    sreturn local varlist `"`varlist'"'
    sreturn local zeros `"`ZEROS'"'
end

program ParseContinuous, sclass
    syntax [varlist(default=none)] [, RANGEs]

    if `:list sizeof varlist' == 0 {
        if `:list sizeof ranges' {
            di as err ///
            "option {bf:ranges} requires continuous variables"
            exit 198
        }
    }

    sreturn local varlist `"`varlist'"'
    sreturn local ranges `"`ranges'"'
end

program ParseByOption, sclass
    syntax varname [, first label(string)]
    sreturn local by "`varlist'"
    sreturn local first `"`first'"'
    sreturn local label `"`label'"'
end

program ParseGroupOption, sclass
    syntax varlist , name(name) [label(string)]
    sreturn local varlist `"`varlist'"'
    sreturn local name `"`name'"'
    sreturn local label `"`label'"'
end

program CheckDupVars
    args vars opt1 opt2

    local k : list sizeof vars
    if `k' == 0 {
        exit
    }
    if `k' > 1 {
        local s s
    }
    if "`opt1'" == "`opt2'" {
        di as err ///
        "variable`s' duplicated in separate {bf:`opt1'} options"
    }
    else {
        di as err ///
        "variable`s' duplicated in options {bf:`opt1'} and {bf:`opt2'}"
    }
    di as err "{p}offending variable`s': {bf:`vars'}{p_end}"
    exit 198
end

program CheckDupNames
    gettoken first rest : 0
    if `:list posof "`first'" in rest' == 0 {
        exit
    }
    di as err "name {bf:`first'} used in more than one {bf:group()} option"
    exit 198
end

program CheckNameVarConflict
    args found opt

    local k : list sizeof found
    if `k' == 0 {
        exit
    }
    gettoken first : found
    di as err "{p}"
    di as err "{bf:grouped()} suboption {bf:name(`first')} is not allowed;{break}"
    di as err "variable {bf:`first'} was specified in option {bf:`opt'}"
    di as err "{p_end}"
    exit 198
end

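mata:

void mw_table_sort_cat_levels()
{
    string  vector    levels
    real    vector    sel
    real    vector    order

    // levels returned by -collect levelsof-
    levels = tokens(st_global("s(levels)"))
    // drop _hide; it is placed first in the autolevels call
    sel = levels :!= "_hide"
    levels = select(levels, sel)
    // sort the remaining levels by numeric value, not as strings
    order = order(strtoreal(levels)', 1)
    levels = levels[order]
    st_global("s(levels)", invtokens(levels))
}

end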

exit
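
A note on the p-values for the continuous variables: the program computes them as Ftail(e(df_m), e(df_r), e(F)), the upper-tail probability of the model F statistic from anova. As a quick sanity check, sketched with the auto dataset as a stand-in, the same formula gives the same p-value after the equivalent regression on indicator variables.

Code:

```stata
sysuse auto, clear

* p-value of the model F test, computed the way mw_table does
quietly anova price rep78
local p_anova = Ftail(e(df_m), e(df_r), e(F))

* a one-way ANOVA is the same model as a regression on rep78 indicators
quietly regress price i.rep78
local p_reg = Ftail(e(df_m), e(df_r), e(F))

display "anova: " `p_anova' "   regress: " `p_reg'
```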

Using the data from my last example, I composed a do-file that does some
syntax and error message checks, adds some variable labels to verify that
the new __CONT dimension picks them up, and calls mw_table with
the new options for categorical variable levels with zero observations
and ranges for continuous variables.

Code:

rcof "noisily mw_table, by(female) categorical(, zero(cat1[0])) " == 198
rcof "noisily mw_table, by(female) categorical(cat?, zero(dude[0])) " == 111
rcof "noisily mw_table, by(female) categorical(cat2, zero(cat1[0])) " == 198
rcof "noisily mw_table, by(female) categorical(cat?, zero(cat1)) " == 198
rcof "noisily mw_table, by(female) categorical(cat?, zero(cat1 foo)) " == 198
rcof "noisily mw_table, by(female) categorical(cat?, zero(cat1[a])) " == 7
rcof "noisily mw_table, by(female) categorical(cat?, zero(cat1[1)) " == 198

label variable cont1 "Measure 1"
label variable cont2 "Measure 2"
label variable cont3 "Measure 3"

mw_table, ///
    by(female, first label(Overall)) ///
    binary(bin?) ///
    categorical(cat?, ///
        zeros(cat1[2.5] cat2[0] cat3[9 16]) ///
    ) ///
    continuous(cont?, ranges) ///
    grouped(nib?, name(nibs) label(Group1 indicators)) ///
    grouped(bib?, name(bibs) label(Group2 indicators)) ///
    nformat(%6.1f mean sd min max iqr) ///
    nformat(%6.1f fvpercent percent) ///
    sformat("%s%%" fvpercent percent) ///
    sformat("(%s)" sd) ///
    name(mytable)

Here is the resulting table. You'll notice that I enclose the range
values in square brackets instead of using a dash. I'm reluctant to use a
dash since continuous variables could have negative values (as in the
example).

Code:

---------------------------------------------------------------------------------------------------
                         Overall                Male                 Female         p-value       N
---------------------------------------------------------------------------------------------------
Measure 1                  5.0   (2.1)          4.5   (2.1)           5.5   (2.0)      0.00     956
Range (IQR)          [1.0 9.0]   (3.5)    [1.0 8.0]   (3.6)     [2.0 9.0]   (3.5)
Measure 2                  0.4   (3.1)         -0.1   (3.0)           0.9   (3.1)      0.00   1,000
Range (IQR)        [-8.6 12.7]   (4.2)   [-8.2 8.6]   (4.2)   [-8.6 12.7]   (4.0)
Measure 3                  0.0   (1.0)          0.1   (1.0)          -0.0   (1.1)      0.20   1,000
Range (IQR)         [-3.0 3.0]   (1.4)   [-2.4 3.0]   (1.3)    [-3.0 2.8]   (1.4)

cat1                                                                                   0.00   1,000
  1                         65    6.5%           65   13.0%             0    0.0%
  2                        414   41.4%          208   41.5%           206   41.3%
  2.5                        0    0.0%            0    0.0%             0    0.0%
  3                        521   52.1%          228   45.5%           293   58.7%

cat2                                                                                   0.05   1,000
  0                          0    0.0%            0    0.0%             0    0.0%
  1                        272   27.2%          143   28.5%           129   25.9%
  2                        228   22.8%           99   19.8%           129   25.9%
  3                        235   23.5%          113   22.6%           122   24.4%
  4                        265   26.5%          146   29.1%           119   23.8%

cat3                                                                                   0.24   1,000
  9                          0    0.0%            0    0.0%             0    0.0%
  11                       213   21.3%          104   20.8%           109   21.8%
  12                       194   19.4%           88   17.6%           106   21.2%
  13                       194   19.4%           98   19.6%            96   19.2%
  14                       198   19.8%           97   19.4%           101   20.2%
  15                       201   20.1%          114   22.8%            87   17.4%
  16                         0    0.0%            0    0.0%             0    0.0%

Group1 indicators
  nib1                     290   29.0%          152   30.3%           138   27.7%      0.35   1,000
  nib2                     215   21.5%          143   28.5%            72   14.4%      0.00   1,000
  nib3                     264   26.4%          130   25.9%           134   26.9%      0.75   1,000

Group2 indicators
  bib1                     710   71.0%          349   69.7%           361   72.3%      0.35   1,000
  bib2                     785   78.5%          358   71.5%           427   85.6%      0.00   1,000
  bib3                     736   73.6%          371   74.1%           365   73.1%      0.75   1,000

bin1                       645   64.5%          284   56.7%           361   72.3%      0.00   1,000

bin2                       496   49.6%          257   51.3%           239   47.9%      0.28   1,000
---------------------------------------------------------------------------------------------------

Matthew Wallace

Join Date: Jul 2023

Posts: 16
#19

25 Jul 2023, 15:47

Thanks so much, Jeff. I've got all of this working perfectly.