Output tables - Summary statistics

Jay howard

Join Date: Jan 2017

Posts: 12
#1


02 Apr 2017, 09:04

Dear all

I've unfortunately been struggling for quite some time now with my output tables. I have tried using different functions/packages, including estout and outreg2 but have been unsuccessful so far. Would anyone be able to kindly help me achieve the following:

I have several variables ("var1", "var2" and "var3") and would like to create a table that gives their average on a yearly basis (ideally also indicating whether each is different from 0 at a statistically significant level). My output table would therefore ideally look something like this (including a fifth column for the difference between var3 and var1):

year    var1    var2    var3    var3-var1
2000    0.2     0.3     0.3     0.1
2001    0.1     0.2     0.1     0.0
2002    0.2     0.1     0.2     0.0
...     ---     ---     ---     ---

So far I have been trying the following, but apparently it doesn't work for more than two groups: estpost ttest var1 var2 var3 var4(var3-var1) , by(year)

I would be very grateful for any suggestions anyone may have.

Thanks in advance and best regards,
Jay

Attaullah Shah

Join Date: Aug 2014
Posts: 1669

02 Apr 2017, 10:23

By chance, I have a program that I use for my personal projects. In case it is of any help, I am posting it here. Since the program was written for personal use, I have not yet written a help file for it. Here are some details.
To download it, type the following:

Code:

net install ast, from(https://sites.google.com/site/imspeshawar)

To use it, here is a simple example

Code:

clear
webuse grunfeld
ast invest, by(year)

Which produces the following output file

Code:

         Means T-tests Results  
-----------------------------------------------   
Var:  Obs.    Mean invest    t-value
-----------------------------------------------
1935    10    72.746         2.202340964
1936    10    101.6069975    2.218223109
1937    10    122.4809995    2.282455622
1938    10    77.55499957    2.518295316
1939    10    80.52599823    2.334870578
1940    10    113.2650023    2.234818649
1941    10    139.7189988    2.347402201
1942    10    122.6650005    2.252323191
1943    10    117.7900013    2.204823267
1944    10    120.9250015    2.260694951
1945    10    124.1590019    2.327498541
1946    10    161.3589957    2.31997082
1947    10    147.135002     2.452109737
1948    10    153.9480011    2.532219125
1949    10    139.2439992    2.379833517
1950    10    151.0610014    2.285328082
1951    10    199.5830034    2.46709474
1952    10    224.0330017    2.396103548
1953    10    275.5830019    2.164894203
1954    10    273.7809953    1.948081547
============================================

Please note that ast currently computes statistics for a single variable only. Further, I am not sure whether it meets your aesthetic criteria.

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.


Nick Cox

Join Date: Mar 2014
Posts: 35616

03 Apr 2017, 03:49

Attaullah's program could do with some formatting options! I don't think t values ever deserve 10 significant figures, unless you are checking for numerical consistency with other programs.

I don't know whether there is a way to tweak (e.g.) esttab or outreg (SSC) or several of the other substantial tabulation programs to do this. The programs I have named are outstanding programs, but I have never used them. I don't often produce elaborate tables of regression results and tend to write my own code when I do.

That aside, I'd commend to beginners and experienced users alike a simpler strategy, more or less along the lines of the tutorial in http://www.stata-journal.com/sjpdf.h...iclenum=pr0053

1. Build up a dataset including results.

2. Reduce to the dataset you want.

The code here is less than it seems, as the first block is just building up a sandbox dataset for play, absent a decent data example in the original posting. (The Grunfeld dataset is a good sandbox, but I couldn't think of an example of getting the difference between two of the variables there that didn't seem really silly, even to my minimal inner economist.)

This example makes use of Tips 1, 3, 4, 6, 8 and 11 of the cited paper, offered as a miniature course on working out your own output. I didn't work hard at variable labels etc. People who ask for this often do that kind of thing in their word or text processor anyway.

Code:

* just setting up sandbox 
clear 
set obs 200 
egen year = seq(), from(2001) to(2010) block(20)
set seed 2803 

forval j = 1/3 { 
    gen var`j' = rnormal() 
}

* easiest to calculate difference in advance 
gen var4 = var3 - var1 

* the work of putting results in variables starts here! 
bysort year: gen n = _N 

* loop over variables and over groups 
quietly forval j = 1/4 { 
     gen t`j' = . 
     gen mean`j' = . 
     
     forval y = 2001/2010 { 
         ttest var`j' = 0 if year == `y' 
         replace mean`j' = r(mu_1) if year == `y' 
         replace t`j' = r(t) if year == `y' 
     }
} 

* we have our results: they are constant within groups, so the means suffice 
collapse n mean? t?, by(year) 

format mean? t? %3.2f 

list, sep(0) 

     +---------------------------------------------------------------------------+
     | year    n   mean1   mean2   mean3   mean4      t1      t2      t3      t4 |
     |---------------------------------------------------------------------------|
  1. | 2001   20    0.31   -0.03   -0.22   -0.54    1.17   -0.13   -1.04   -1.75 |
  2. | 2002   20   -0.16    0.04    0.30    0.46   -0.79    0.18    1.17    1.23 |
  3. | 2003   20   -0.28    0.10   -0.23    0.05   -2.05    0.44   -1.11    0.19 |
  4. | 2004   20   -0.24   -0.20    0.03    0.26   -0.93   -1.16    0.13    0.97 |
  5. | 2005   20   -0.13   -0.18   -0.03    0.11   -0.64   -0.81   -0.12    0.32 |
  6. | 2006   20    0.30    0.55   -0.05   -0.35    1.23    2.54   -0.23   -1.09 |
  7. | 2007   20    0.09   -0.26   -0.10   -0.19    0.45   -1.13   -0.43   -0.50 |
  8. | 2008   20   -0.05    0.54   -0.31   -0.26   -0.18    2.46   -1.98   -0.94 |
  9. | 2009   20   -0.05    0.28    0.19    0.24   -0.23    1.24    0.99    0.76 |
 10. | 2010   20   -0.09   -0.19    0.27    0.36   -0.39   -0.86    0.92    0.95 |
     +---------------------------------------------------------------------------+

Naturally if it made more sense to list mean1 t1 mean2 t2 etc. you can just do that.
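
The original question also asked whether each mean differs significantly from 0. A minimal sketch of one way to show that, assuming the same sandbox data as above: store the two-sided p-value that ttest leaves in r(p) next to each mean and t value, then use order to interleave the columns. The variable names p1 to p4 and the display formats are choices made here for illustration only.

```stata
* sandbox data as in the example above (random numbers, not real data)
clear
set obs 200
egen year = seq(), from(2001) to(2010) block(20)
set seed 2803
forval j = 1/3 {
    gen var`j' = rnormal()
}
gen var4 = var3 - var1

* store mean, t value and two-sided p-value for each variable and year
quietly forval j = 1/4 {
    gen mean`j' = .
    gen t`j' = .
    gen p`j' = .
    forval y = 2001/2010 {
        ttest var`j' == 0 if year == `y'
        replace mean`j' = r(mu_1) if year == `y'
        replace t`j' = r(t) if year == `y'
        replace p`j' = r(p) if year == `y'
    }
}

* results are constant within year, so their means reduce to one row per year
collapse mean? t? p?, by(year)

* interleave the columns: mean, t, p for each variable in turn
order year mean1 t1 p1 mean2 t2 p2 mean3 t3 p3 mean4 t4 p4
format mean? t? %5.2f
format p? %5.3f
list, sep(0)
```

Reading down a p column then answers the significance question directly, without looking up critical values for each sample size.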


Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#4

03 Apr 2017, 07:17

"Attaullah's program could do with some formatting options! I don't think t values ever deserve 10 significant figures, unless you are checking for numerical consistency with other programs."

Dear Nick, I did not intend to imply that 10 values are sufficient for a t-test. For lack of any other good example data, I just used grunfeld data set.

Regards
Nick Cox

Join Date: Mar 2014

Posts: 35616
#5

03 Apr 2017, 07:25

10 significant figures means how many digits are shown in display. See e.g. your high school science texts or https://en.wikipedia.org/wiki/Significant_figures

It's not another name for sample size!

Evidently by default your program produces an output file which doesn't limit how many digits are stored. That's fine, but I am suggesting that your program, or at least your users, need to exercise some rounding options for display. Perhaps you will build that in to future versions or perhaps you regard it as a separate issue.
Your choice entirely, but a reporting command should surely include options for controlling display.

I didn't look long at the code for ast, but note that it assumes that the by() variable is numeric.

Otherwise put, the number of figures displayed comes from whatever you used to show results, and is not a matter of the example data (which are fine for your purpose, but not for mine).

Last edited by Nick Cox; 03 Apr 2017, 07:30.
Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#6

03 Apr 2017, 09:03

Thanks for the correction; I misread your message in #3. Thanks for your advice on limiting the number of decimal places shown for t-statistics.

Regards
Nick Cox

Join Date: Mar 2014

Posts: 35616
#7

03 Apr 2017, 11:12

Attaullah: For your kind of problem as tackled by ast you might also want two or more grouping variables. Elsewhere you have used industry and year combinations. If you allow a general varlist as argument to by(), then you can do this inside any program, assuming a prior marksample:

Code:

tempvar group
egen `group' = group(`by') if `touse'
su `group', meanonly

Then you can cut out the levelsof call in favour of a loop over

Code:

forval j = 1/`r(max)' {
    .<code using -if `group' == `j'->
}

That way neither string variables nor two or more variables as arguments to by() are problematic.
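
Put together, the two fragments above might look like the following toy program. This is only a sketch under stated assumptions: the program name mygroupt is made up for illustration, it is not the code of ast, and it simply displays each group's mean and t value rather than writing an output file. The value of r(max) is copied into a local macro because ttest overwrites the r() results inside the loop.

```stata
capture program drop mygroupt
program define mygroupt
    version 12
    * one numeric variable to test; by() may hold one or more
    * variables, numeric or string
    syntax varname(numeric) [if] [in], by(varlist)
    marksample touse
    markout `touse' `by', strok

    * number the distinct combinations of the by() variables 1, 2, ...
    tempvar group
    egen `group' = group(`by') if `touse'
    su `group', meanonly
    local G = r(max)

    * one-sample t test against 0 within each group
    forval j = 1/`G' {
        quietly ttest `varlist' == 0 if `group' == `j'
        display "group `j': mean = " %9.3f r(mu_1) "  t = " %6.2f r(t)
    }
end

* usage with a dataset shipped with Stata and two grouping variables
sysuse auto, clear
mygroupt price, by(foreign rep78)
```

Groups are identified only by their number here; mapping the numbers back to the values of the by() variables is part of the output problem.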

You still have to tackle the output.

Meanwhile, consider what you could do with statsby as an alternative.
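
As a rough illustration of the statsby route, assuming the Grunfeld data used earlier in the thread: statsby runs the command after the colon once per group and replaces the data in memory with one observation of saved results per group. The new variable names (n, mean, t) and the display format are arbitrary choices for this sketch.

```stata
webuse grunfeld, clear

* one-sample t test of invest against 0, separately for each year;
* clear replaces the data in memory with the collected results
statsby n=r(N_1) mean=r(mu_1) t=r(t), by(year) clear: ttest invest == 0

format mean t %9.2f
list, sep(0)
```

Because the results arrive as an ordinary dataset, rounding for display is just a matter of format, and the by() option accepts several grouping variables.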

Last edited by Nick Cox; 03 Apr 2017, 11:18.
Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#8

03 Apr 2017, 14:24

Thank you, Nick, for these helpful suggestions. I shall incorporate them in the next version of the program, which I am trying to make more dynamic using the putexcel capabilities.

Regards
Jay howard

Join Date: Jan 2017

Posts: 12
#9

03 Apr 2017, 16:29

Many thanks Attaullah and Nick. As always, your help is very much appreciated.

@Attaullah: your code does do what I want, so thank you for that.

@Nick: your code obviously has the advantage of calculating the results for several variables at the same time. I nevertheless have a quick follow-up question: If I understand your code correctly, you are currently attributing random numbers to each of the three variables using rnormal. For the interest of my study, I would rather use my own variables I've previously computed. Could you help me understand what would be the best way to feed them into the equation? Would I need to rename them or what would you suggest as the best solution?

Many thanks,
Jay
Nick Cox

Join Date: Mar 2014

Posts: 35616
#10

03 Apr 2017, 16:43

Indeed: you should naturally use your own data, not random normal deviates. As I wrote

The code here is less than it seems, as the first block is just building up a sandbox dataset for play, absent a decent data example in the original posting.

and the code itself repeats the comment about a sandbox. Hereabouts a sandbox is just a place to play; the term may not translate to where you are, especially if your first language is not English.

Your data example showed one observation per year and so a t test for each year would just fail. It was easier to make up my own example using code.

You told us that your variables were named year var1 var2 var3 so I followed suit. If your real names are different then the code naturally won't work and you have a choice between changing the code (which I advise) and changing your variable names. One clear constraint is that var3 - var1 cannot be a legal variable name, so you need to choose something different. I chose var4 for the difference purely to make the code simpler. In general, I strongly advise informative, evocative variable names, but you can make your own choices.
Jay howard

Join Date: Jan 2017

Posts: 12
#11

03 Apr 2017, 17:36

OK, understood, thank you for the clarification. I apologize if I'm asking rather silly questions, but do I need to have my variables labeled with numbers as you did above? For instance, in my dataset, my three variables are named annual_CS, annual_CT and annual_AS. I tried the following to incorporate them into the loop, but I get the error "too few variables specified".

* loop over variables and over groups
quietly foreach var of varlist annual_CS annual_CT annual_AS {
    gen `t' = .
    gen `mean' = .

    forval y = 1994/2012 {
        ttest `var' = 0 if year == `y'
        replace `mean' = r(mu_1) if year == `y'
        replace `t' = r(t) if year == `y'
    }
}
Nick Cox

Join Date: Mar 2014

Posts: 35616
#12

03 Apr 2017, 17:55

If that's the entirety of the code, then you're using local macros you never define and the code will fail when it sees

Code:

gen `t' = .

which is evaluated as

Code:

gen = .

as undefined macros are treated as empty strings. This is closer to what you want (and closer to my code). Note that you need different variables to hold means and t values for each original variable. Otherwise the outer loop would fail second time around as the variables being generated already exist (and in any case they would be overwritten with new results).

Code:

quietly foreach var of varlist annual_CS annual_CT annual_AS {
    gen t_`var' = .
    gen mean_`var' = .

    forval y = 1994/2012 {
        ttest `var' = 0 if year == `y'
        replace mean_`var' = r(mu_1) if year == `y'
        replace t_`var' = r(t) if year == `y'
    }
}
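
For completeness, here is a sketch of the whole workflow with named variables, from data to listed table. The data are made-up random numbers standing in for the real dataset, and the name diff_AS_CS for the difference is hypothetical. It also uses levelsof to pick up the years actually present instead of hard-coding 1994/2012, which is a small variation on the loop above.

```stata
* hypothetical sandbox: 19 years x 20 observations, names as in #11
clear
set obs 380
egen year = seq(), from(1994) to(2012) block(20)
set seed 2803
foreach var in annual_CS annual_CT annual_AS {
    gen `var' = rnormal()
}

* the difference needs a legal variable name of its own
gen diff_AS_CS = annual_AS - annual_CS

* loop over variables and over the years present in the data
levelsof year, local(years)
quietly foreach var of varlist annual_CS annual_CT annual_AS diff_AS_CS {
    gen mean_`var' = .
    gen t_`var' = .
    foreach y of local years {
        ttest `var' == 0 if year == `y'
        replace mean_`var' = r(mu_1) if year == `y'
        replace t_`var' = r(t) if year == `y'
    }
}

* results are constant within year, so reduce to one row per year
collapse mean_* t_*, by(year)
format mean_* t_* %5.2f
list, sep(0)
```

With real data, the first block is dropped and the variable names in the foreach line are replaced by whatever the dataset actually uses.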
Jay howard

Join Date: Jan 2017

Posts: 12
#13

04 Apr 2017, 16:26

Thank you very much Nick, I'm very grateful for your help with this. I definitely need to spend some more time looking into these loops, as I've rarely had to use them until now.

Have a great evening/day.

Best regards,
Jay