Matrices from tabstat: formating, combining & exporting

Mel de Lange

Join Date: Feb 2022

Posts: 21
#1

Matrices from tabstat: formating, combining & exporting

16 Jun 2023, 12:18

Hi there,

I am using Stata 16 and am ultimately trying to create a table that shows the % of e.g. males/female, different age categories etc. that are an insomnia case (insomniacase is a binary yes/no variable).

I can get the data I need using the following code:

Code:

foreach varname of varlist sex age_cat ethnicity { tab `varname' insomniacase, row nofreq }

However, I would then have to copy & paste each individual number from the Stata output into a word table which my supervisor doesn't want me to do as I actually have 30 variables(all binary/categorical) and am likely to have to repeat this analysis numerous times in different iterations.

He has suggested I use the tabstat command as follows:

Code:

tabstat insomniacase, by(age_cat) stats(mean semean) save return list matrix A= r(Stat1) \ r(Stat2) \ r(Stat3) \ r(Stat4) svmat A

(FYI we have included semean here as we will use this in some plots later on).

However, this gives me the data I need in a single column (8x1) and without row names (the name of each age category), when what I actually want is to have a matrix in the same format as the tabstat output (4 x 2) with the names of the different categories). How can I do this? I have tried transposing the matrix (e.g. mat A= A') but that just puts the data into one row rather than one column....

Once I've got my matrix for one variable in the right format I'll need to repeat this so that I have a matrix for each of my variables. Is there then a way to then combine all of my matrices and then export the final big matrices into excel or word?

I am new to Stata and have never used tabstat or matrices before so any help gratefully received!

Many thanks,

Mel
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35726

17 Jun 2023, 04:18

This is perhaps an example of the x-y problem. https://xyproblem.info/

tabstat is a very useful command I use often, but I wouldn't choose to work with it here, for the reason that prompts this post, how to collate the results. But tabstatmat is on SSC. If anyone reads its help file, be advised: that child has left home, and I will not advise on how to handle it, despite some lingering affection.

Setting aside the SE of the mean, which calls just for an extension of the main ideas here, you want to loop over variables and over categories of variables and compile results for a binary outcome of percent yes, which is just 100 times the mean of a (0, 1) indicator.

There is I guess an application of the new table machinery that makes this easier, but I fall back on what I know better.

Without a data example, I reach for a similar example with a binary outcome and various predictors. To show willing, I also add a silly string variable just in case you have any string categorical variables.

Code:

webuse nlswork, clear 

gen frogtoad = cond(runiform() < 0.5, "frog", "toad")

* start here 
gen varname = ""
gen category = "" 
gen percent = . 

* first results will go in observation 1 
local i = 1 

* use your variables, naturally 
quietly foreach v in race msp nev_mar frogtoad { 
    capture confirm numeric variable `v'
    local isnumeric = _rc == 0 
    
    levelsof `v', local(levels)
        
    tokenize `"`r(levels)'"' 
        
    * loop over the distinct levels of that predictor 
    forval j = 1/`r(r)' { 
        * predictor is numeric: show value label if defined, otherwise value if numeric 
        if `isnumeric' { 
            su union if `v' == ``j'', meanonly 
            local show : label (`v') ``j''
        } 
        
        * predictor is string 
        else {
            su union if `v' == "``j''", meanonly
            local show "``j''"
        } 
        
        replace varname = "`v'" in `i'
        replace category = "`show'" in `i'
        replace percent = 100 * r(mean) in `i'
        
        * bump up the observation number 
        local ++i 
    }
}

format percent %2.1f 
list varname category percent if percent < . , sepby(varname)

      +-------------------------------+
       |  varname   category   percent |
       |-------------------------------|
    1. |     race      White      20.7 |
    2. |     race      Black      30.4 |
    3. |     race      Other      20.9 |
       |-------------------------------|
    4. |      msp          0      25.5 |
    5. |      msp          1      22.2 |
       |-------------------------------|
    6. |  nev_mar          0      23.0 |
    7. |  nev_mar          1      25.1 |
       |-------------------------------|
    8. | frogtoad       frog      23.5 |
    9. | frogtoad       toad      23.4 |
       +-------------------------------+

There is a presumption here that the dataset is big enough to hold the results you want. If that's not true, there will be a solution using postfile or frames.

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35726

17 Jun 2023, 09:42

Now extended to include SE of mean.

Code:

webuse nlswork, clear 

gen frogtoad = cond(runiform() < 0.5, "frog", "toad")

* start here 
gen varname = ""
gen category = "" 
gen percent = . 
gen SE = . 

* first results will go in observation 1 
local i = 1 

* use your variables, naturally 
quietly foreach v in race msp nev_mar frogtoad { 
    capture confirm numeric variable `v'
    local isnumeric = _rc == 0 
    
    levelsof `v', local(levels)
        
    tokenize `"`r(levels)'"' 
        
    * loop over the distinct levels of that predictor 
    forval j = 1/`r(r)' { 
        * predictor is numeric: show value label if defined, otherwise value if numeric 
        if `isnumeric' { 
            su union if `v' == ``j''
            local show : label (`v') ``j''
        } 
        
        * predictor is string 
        else {
            su union if `v' == "``j''" 
            local show "``j''"
        } 
        
        replace varname = "`v'" in `i'
        replace category = "`show'" in `i'
        replace percent = 100 * r(mean) in `i'
        replace SE = 100 * r(sd) / sqrt(r(N)) in `i' 
        
        * bump up the observation number 
        local ++i 
    }
}

format percent  %2.1f
format SE %3.2f  
list varname category percent SE if percent < . , sepby(varname)

Comment

Mel de Lange

Join Date: Feb 2022

Posts: 21
#4

19 Jun 2023, 09:48

Hi Nick,
Thank you so much for taking the time to do that - it has been really, really useful. And yes, I see your point about the x y problem!
Thanks again,
Mel
1 like
Comment

Announcement