Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Matrices from tabstat: formating, combining & exporting

    Hi there,

    I am using Stata 16 and am ultimately trying to create a table that shows the % of e.g. males/female, different age categories etc. that are an insomnia case (insomniacase is a binary yes/no variable).

    I can get the data I need using the following code:
    Code:
    foreach varname of varlist sex age_cat ethnicity {
    tab `varname' insomniacase, row nofreq
    }
    However, I would then have to copy & paste each individual number from the Stata output into a word table which my supervisor doesn't want me to do as I actually have 30 variables(all binary/categorical) and am likely to have to repeat this analysis numerous times in different iterations.


    He has suggested I use the tabstat command as follows:
    Code:
    tabstat insomniacase, by(age_cat) stats(mean semean) save
    return list
    matrix A= r(Stat1) \ r(Stat2) \ r(Stat3) \ r(Stat4)
    svmat A
    (FYI we have included semean here as we will use this in some plots later on).

    However, this gives me the data I need in a single column (8x1) and without row names (the name of each age category), when what I actually want is to have a matrix in the same format as the tabstat output (4 x 2) with the names of the different categories). How can I do this? I have tried transposing the matrix (e.g. mat A= A') but that just puts the data into one row rather than one column....

    Once I've got my matrix for one variable in the right format I'll need to repeat this so that I have a matrix for each of my variables. Is there then a way to then combine all of my matrices and then export the final big matrices into excel or word?

    I am new to Stata and have never used tabstat or matrices before so any help gratefully received!

    Many thanks,

    Mel




  • #2
    This is perhaps an example of the x-y problem. https://xyproblem.info/

    tabstat is a very useful command I use often, but I wouldn't choose to work with it here, for the reason that prompts this post, how to collate the results. But tabstatmat is on SSC. If anyone reads its help file, be advised: that child has left home, and I will not advise on how to handle it, despite some lingering affection.

    Setting aside the SE of the mean, which calls just for an extension of the main ideas here, you want to loop over variables and over categories of variables and compile results for a binary outcome of percent yes, which is just 100 times the mean of a (0, 1) indicator.

    There is I guess an application of the new table machinery that makes this easier, but I fall back on what I know better.

    Without a data example, I reach for a similar example with a binary outcome and various predictors. To show willing, I also add a silly string variable just in case you have any string categorical variables.

    Code:
    webuse nlswork, clear 
    
    gen frogtoad = cond(runiform() < 0.5, "frog", "toad")
    
    * start here 
    gen varname = ""
    gen category = "" 
    gen percent = . 
    
    * first results will go in observation 1 
    local i = 1 
    
    * use your variables, naturally 
    quietly foreach v in race msp nev_mar frogtoad { 
        capture confirm numeric variable `v'
        local isnumeric = _rc == 0 
        
        levelsof `v', local(levels)
            
        tokenize `"`r(levels)'"' 
            
        * loop over the distinct levels of that predictor 
        forval j = 1/`r(r)' { 
            * predictor is numeric: show value label if defined, otherwise value if numeric 
            if `isnumeric' { 
                su union if `v' == ``j'', meanonly 
                local show : label (`v') ``j''
            } 
            
            * predictor is string 
            else {
                su union if `v' == "``j''", meanonly
                local show "``j''"
            } 
            
            replace varname = "`v'" in `i'
            replace category = "`show'" in `i'
            replace percent = 100 * r(mean) in `i'
            
            * bump up the observation number 
            local ++i 
        }
    }
    
    format percent %2.1f 
    list varname category percent if percent < . , sepby(varname)
    
          +-------------------------------+
           |  varname   category   percent |
           |-------------------------------|
        1. |     race      White      20.7 |
        2. |     race      Black      30.4 |
        3. |     race      Other      20.9 |
           |-------------------------------|
        4. |      msp          0      25.5 |
        5. |      msp          1      22.2 |
           |-------------------------------|
        6. |  nev_mar          0      23.0 |
        7. |  nev_mar          1      25.1 |
           |-------------------------------|
        8. | frogtoad       frog      23.5 |
        9. | frogtoad       toad      23.4 |
           +-------------------------------+
    There is a presumption here that the dataset is big enough to hold the results you want. If that's not true, there will be a solution using postfile or frames.

    Comment


    • #3
      Now extended to include SE of mean.

      Code:
      webuse nlswork, clear 
      
      gen frogtoad = cond(runiform() < 0.5, "frog", "toad")
      
      * start here 
      gen varname = ""
      gen category = "" 
      gen percent = . 
      gen SE = . 
      
      * first results will go in observation 1 
      local i = 1 
      
      * use your variables, naturally 
      quietly foreach v in race msp nev_mar frogtoad { 
          capture confirm numeric variable `v'
          local isnumeric = _rc == 0 
          
          levelsof `v', local(levels)
              
          tokenize `"`r(levels)'"' 
              
          * loop over the distinct levels of that predictor 
          forval j = 1/`r(r)' { 
              * predictor is numeric: show value label if defined, otherwise value if numeric 
              if `isnumeric' { 
                  su union if `v' == ``j''
                  local show : label (`v') ``j''
              } 
              
              * predictor is string 
              else {
                  su union if `v' == "``j''" 
                  local show "``j''"
              } 
              
              replace varname = "`v'" in `i'
              replace category = "`show'" in `i'
              replace percent = 100 * r(mean) in `i'
              replace SE = 100 * r(sd) / sqrt(r(N)) in `i' 
              
              * bump up the observation number 
              local ++i 
          }
      }
      
      format percent  %2.1f
      format SE %3.2f  
      list varname category percent SE if percent < . , sepby(varname)

      Comment


      • #4
        Hi Nick,
        Thank you so much for taking the time to do that - it has been really, really useful. And yes, I see your point about the x y problem!
        Thanks again,
        Mel

        Comment

        Working...
        X