Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Formatting the by() variable in tabstat

    Hello,

    I am working with a database structured at the firm-year-month level. I am using --tabstat-- to create some summary statistics by the time variable. If I use the option "columns(statistics)", --tabstat-- retains the formatting of the time variable (%tm). However, if I use "columns(variables)", it does not retain the formatting. Is this behavior expected or a bug? Is there a way to format the time variable as %tm in the last table (please, see formating of the "year_month" variable in the first three tables versus the last one in the code below)? If not, can I achieve what I want using --table--? I tried --table--, but was facing the same issue.

    Any thoughts are appreciated.

    Code:
    version 17
    cls
    clear all
    
    set seed 928502853
    set obs 24
    gen year_month = tm(2010,Jan) + _n - 1
    format %tm year_month
    
    expand 30
    bysort year_month: gen firm_id = _n
    
    foreach num of numlist 1(1)3 {
        gen var_`num' = rnormal()
    }
    
    tabstat var_1, by(year_month) statistics(mean sd min p25 p50 p75 max count) columns(statistics) format(%9.2fc)
    tabstat var_2, by(year_month) statistics(mean sd min p25 p50 p75 max count) columns(statistics) format(%9.2fc)
    tabstat var_3, by(year_month) statistics(mean sd min p25 p50 p75 max count) columns(statistics) format(%9.2fc)
    
    tabstat var_1 var_2 var_3, by(year_month) statistics(mean) columns(variables) format(%9.2fc)

  • #2
    I've found labmask (from SSC written by Nick Cox) is useful for instances such as these. You can also specify your desired format in your string() call.

    Code:
    cls
    clear all
    
    set seed 928502853
    set obs 24
    gen year_month = tm(2010,Jan) + _n - 1
    format %tm year_month
    
    expand 30
    bysort year_month: gen firm_id = _n
    
    foreach num of numlist 1(1)3 {
        gen var_`num' = rnormal()
    }
    
    
    gen month = string(year_month, "%tm")
    labmask year_month, val(month)
    drop month
    
    tabstat var_1, by(year_month) statistics(mean sd min p25 p50 p75 max count) columns(statistics) format(%9.2fc)
    tabstat var_2, by(year_month) statistics(mean sd min p25 p50 p75 max count) columns(statistics) format(%9.2fc)
    tabstat var_3, by(year_month) statistics(mean sd min p25 p50 p75 max count) columns(statistics) format(%9.2fc)
    
    tabstat var_1 var_2 var_3, by(year_month) statistics(mean) columns(variables) format(%9.2fc)
    Code:
    year_mo~h |      mean        sd       min       p25       p50       p75       max         N
    ----------+--------------------------------------------------------------------------------
       2010m1 |      0.02      1.05     -2.50     -0.91      0.26      0.93      1.46     30.00
       2010m2 |     -0.06      1.13     -3.33     -0.82      0.02      0.68      1.94     30.00
       2010m3 |      0.04      1.02     -2.10     -0.52     -0.08      0.46      3.13     30.00
       2010m4 |      0.11      0.92     -1.80     -0.55      0.22      0.66      1.83     30.00
       2010m5 |      0.58      0.85     -1.25     -0.07      0.43      1.20      2.20     30.00
       2010m6 |     -0.11      0.88     -2.04     -0.66     -0.13      0.39      1.49     30.00
       2010m7 |      0.21      0.95     -2.21     -0.57      0.27      0.72      2.55     30.00
       2010m8 |     -0.13      1.15     -2.65     -0.94     -0.02      0.54      2.01     30.00
       2010m9 |      0.16      0.80     -1.65     -0.34      0.10      0.69      1.90     30.00
      2010m10 |      0.14      0.88     -1.63     -0.57      0.19      0.85      1.57     30.00
      2010m11 |     -0.42      1.27     -2.16     -1.36     -0.72      0.52      2.57     30.00
      2010m12 |      0.29      1.01     -1.54     -0.60      0.39      1.03      2.51     30.00
       2011m1 |     -0.00      0.97     -2.67     -0.48     -0.03      0.57      2.18     30.00
       2011m2 |      0.26      1.17     -2.05     -0.52      0.19      0.98      2.94     30.00
       2011m3 |     -0.19      1.41     -2.87     -1.25     -0.33      0.92      3.39     30.00
       2011m4 |     -0.13      0.67     -1.37     -0.45     -0.18      0.31      1.17     30.00
       2011m5 |      0.11      0.88     -1.52     -0.45      0.11      0.48      1.95     30.00
       2011m6 |      0.08      1.20     -2.41     -0.36      0.25      0.86      2.28     30.00
       2011m7 |     -0.12      1.28     -2.72     -1.17      0.07      0.76      2.59     30.00
       2011m8 |     -0.31      0.91     -2.22     -0.97     -0.47      0.15      1.92     30.00
       2011m9 |      0.08      1.10     -1.88     -0.66     -0.03      0.82      2.20     30.00
      2011m10 |      0.08      1.09     -2.02     -0.72      0.28      0.83      2.15     30.00
      2011m11 |      0.12      0.98     -1.46     -0.51     -0.10      0.72      2.24     30.00
      2011m12 |      0.01      0.80     -1.47     -0.57     -0.03      0.57      1.62     30.00
    ----------+--------------------------------------------------------------------------------
        Total |      0.04      1.03     -3.33     -0.62      0.05      0.72      3.39    720.00
    -------------------------------------------------------------------------------------------
    Last edited by Justin Niakamal; 19 Jul 2021, 17:43.

    Comment


    • #3
      Hi Justin,

      Thank you very much for your response. I had tried multiple workarounds without success. It seems the way to go is to encode the time variable, and --labmask-- makes things much easier! Thank you, and thanks to Nick Cox for --labmask--!

      For those that are having the same issue, --labmask-- is not on SSC, but on the Stata Journal's webpage. It can be installed with the following command:

      Code:
      net install "http://www.stata-journal.com/software/sj8-2/gr0034"
      Finally, just to be clear, the output that Justin reports above can be obtained without the use of --encode-- or --labmask--. It is the last --tabstat-- command that requires one of the two commands.

      Code:
      tabstat var_1 var_2 var_3, by(year_month) statistics(mean) columns(variables) format(%9.2fc)
      Justin Niakamal, thank you one more time. Regards,

      Rafael

      Comment


      • #4
        Glad labmask is useful. On SSC it is part of the labutil package, but it’s better cited in terms of a Stata Journal link, given the associated 2008 paper.

        Comment

        Working...
        X