
  • Output tables - Summary statistics

    Dear all

    I've unfortunately been struggling for quite some time now with my output tables. I have tried using different functions/packages, including estout and outreg2 but have been unsuccessful so far. Would anyone be able to kindly help me achieve the following:

    I have several variables (var1, var2 and var3) and would like to create a table that gives their averages on a yearly basis, ideally also indicating whether each is statistically significantly different from 0. My output table would therefore ideally look something like this, with a fifth column for the difference between var3 and var1:


    year    var1    var2    var3    var3-var1

    2000    0.2     0.3     0.3     0.1
    2001    0.1     0.2     0.1     0.0
    2002    0.2     0.1     0.2     0.0
    ...     ---     ---     ---


    I have so far been trying the following, but apparently it doesn't work for more than two groups: estpost ttest var1 var2 var3 var4(var3-var1) , by(year)

    I would be very grateful for any suggestions anyone may have.

    Thanks in advance and best regards,
    Jay

  • #2
    By chance, I have a program that I use for my personal projects. In case it is of any help, I am posting it here. Since the program was written for personal use, I have not yet written a help file for it. Some details follow.
    To download, type the following:
    Code:
    net install ast, from(https://sites.google.com/site/imspeshawar)
    To use it, here is a simple example
    Code:
    clear
    webuse grunfeld
    ast invest, by(year)
    Which produces the following output file

    Code:
             Means T-tests Results  
    -----------------------------------------------   
    Var:  Obs.    Mean invest    t-value
    -----------------------------------------------
    1935    10    72.746         2.202340964
    1936    10    101.6069975    2.218223109
    1937    10    122.4809995    2.282455622
    1938    10    77.55499957    2.518295316
    1939    10    80.52599823    2.334870578
    1940    10    113.2650023    2.234818649
    1941    10    139.7189988    2.347402201
    1942    10    122.6650005    2.252323191
    1943    10    117.7900013    2.204823267
    1944    10    120.9250015    2.260694951
    1945    10    124.1590019    2.327498541
    1946    10    161.3589957    2.31997082
    1947    10    147.135002     2.452109737
    1948    10    153.9480011    2.532219125
    1949    10    139.2439992    2.379833517
    1950    10    151.0610014    2.285328082
    1951    10    199.5830034    2.46709474
    1952    10    224.0330017    2.396103548
    1953    10    275.5830019    2.164894203
    1954    10    273.7809953    1.948081547
    ============================================
    Please note that ast currently allows statistics for a single variable only. Further, I am not sure whether it meets your aesthetic criteria.
    Regards
    --------------------------------------------------
    Attaullah Shah, PhD.
    Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
    FinTechProfessor.com
    https://asdocx.com
    Check out my asdoc program, which sends outputs to MS Word.
    For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.



    • #3
      Attaullah's program could do with some formatting options! I don't think t values ever deserve 10 significant figures, unless you are checking for numerical consistency with other programs.

      I don't know whether there is a way to tweak (e.g.) esttab or outreg (SSC) or several of the other substantial tabulation programs to do this. The programs I have named are outstanding programs, but I have never used them. I don't often produce elaborate tables of regression results and tend to write my own code when I do.

      That aside, I'd commend to beginners and experienced users alike a simpler strategy, more or less along the lines of the tutorial in http://www.stata-journal.com/sjpdf.h...iclenum=pr0053

      1. Build up a dataset including results.

      2. Reduce to the dataset you want.

      The code here is less than it seems, as the first block is just building up a sandbox dataset for play, absent a decent data example in the original posting. (The Grunfeld dataset is a good sandbox, but I couldn't think of an example of getting the difference between two of the variables there that didn't seem really silly, even to my minimal inner economist.)

      This example makes use of Tips 1, 3, 4, 6, 8 and 11 of the cited paper, offered as a miniature course on working out your own output. I didn't work hard at variable labels etc. People who ask for this often do that kind of thing in their word or text processor anyway.

      Code:
      * just setting up sandbox 
      clear 
      set obs 200 
      egen year = seq(), from(2001) to(2010) block(20)
      set seed 2803 
      
      forval j = 1/3 { 
          gen var`j' = rnormal() 
      }
      
      * easiest to calculate difference in advance 
      gen var4 = var3 - var1 
      
      * the work of putting results in variables starts here! 
      bysort year: gen n = _N 
      
      * loop over variables and over groups 
      quietly forval j = 1/4 { 
           gen t`j' = . 
           gen mean`j' = . 
           
           forval y = 2001/2010 { 
               ttest var`j' = 0 if year == `y' 
               replace mean`j' = r(mu_1) if year == `y' 
               replace t`j' = r(t) if year == `y' 
           }
      } 
      
      * we have our results: they are constant within groups, so the means suffice 
      collapse n mean? t?, by(year) 
      
      format mean? t? %3.2f 
      
      list, sep(0) 
      
           +---------------------------------------------------------------------------+
           | year    n   mean1   mean2   mean3   mean4      t1      t2      t3      t4 |
           |---------------------------------------------------------------------------|
        1. | 2001   20    0.31   -0.03   -0.22   -0.54    1.17   -0.13   -1.04   -1.75 |
        2. | 2002   20   -0.16    0.04    0.30    0.46   -0.79    0.18    1.17    1.23 |
        3. | 2003   20   -0.28    0.10   -0.23    0.05   -2.05    0.44   -1.11    0.19 |
        4. | 2004   20   -0.24   -0.20    0.03    0.26   -0.93   -1.16    0.13    0.97 |
        5. | 2005   20   -0.13   -0.18   -0.03    0.11   -0.64   -0.81   -0.12    0.32 |
        6. | 2006   20    0.30    0.55   -0.05   -0.35    1.23    2.54   -0.23   -1.09 |
        7. | 2007   20    0.09   -0.26   -0.10   -0.19    0.45   -1.13   -0.43   -0.50 |
        8. | 2008   20   -0.05    0.54   -0.31   -0.26   -0.18    2.46   -1.98   -0.94 |
        9. | 2009   20   -0.05    0.28    0.19    0.24   -0.23    1.24    0.99    0.76 |
       10. | 2010   20   -0.09   -0.19    0.27    0.36   -0.39   -0.86    0.92    0.95 |
           +---------------------------------------------------------------------------+
      Naturally if it made more sense to list mean1 t1 mean2 t2 etc. you can just do that.



      • #4
        "Attaullah's program could do with some formatting options! I don't think t values ever deserve 10 significant figures, unless you are checking for numerical consistency with other programs."
        Dear Nick, I did not intend to imply that 10 values are sufficient for a t-test. For lack of any other good example data, I just used the Grunfeld dataset.
        Regards


        • #5
          10 significant figures means how many digits are shown in display. See e.g. your high school science texts or https://en.wikipedia.org/wiki/Significant_figures

          It's not another name for sample size!

          Evidently by default your program produces an output file which doesn't limit how many digits are stored. That's fine, but I am suggesting that your program, or at least your users, need to exercise some rounding options for display. Perhaps you will build that into future versions or perhaps you regard it as a separate issue.
          Your choice entirely, but a reporting command should surely include options for controlling display.

          I didn't look long at the code for ast, but note that it assumes that the by() variable is numeric.

          Otherwise put, the number of figures displayed comes from whatever you used to show results, and is not a matter of the example data (which are fine for your purpose, but not for mine).
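To illustrate the point about display, here is a minimal sketch. It is not part of ast: the variable name t and the single observation (a t value copied from the output in #2) are assumptions made purely for illustration. It shows that a display format changes how many digits are shown, not what is stored.

```stata
* sketch only: one observation holding a t value copied from the output in #2
clear
set obs 1
generate double t = 2.202340964

* a display format affects what is shown, not what is stored
format t %4.2f
list t

* the same format can be applied on the fly with display
display %4.2f t[1]
```

The stored value keeps all its digits, so nothing is lost by choosing a short format for reporting.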
          Last edited by Nick Cox; 03 Apr 2017, 07:30.



          • #6
            Thanks for the correction; I misread your message in #3. Thanks for your advice on using limited decimal points for t-statistics.
            Regards


            • #7
              Attaullah: For your kind of problem as tackled by ast you might also want two or more grouping variables. Elsewhere you have used industry and year combinations. If you allow a general varlist as argument to by(), then you can do this inside any program, assuming a prior marksample:

              Code:
              tempvar group
              egen `group' = group(`by') if `touse'
              su `group', meanonly
              Then you can cut out the levelsof call in favour of a loop over

              Code:
              forval j = 1/`r(max)' {
                    * code using -if `group' == `j'- goes here
              }
              That way neither string variables nor two or more variables as arguments to by() are problematic.
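The two fragments above can be put together outside a program. The following is only a sketch under stated assumptions: it uses the auto dataset shipped with Stata, takes foreign and rep78 as the grouping variables, uses a plain variable named group where a program would use a tempvar, and omits the touse condition that marksample would supply. The names mean_mpg, t_mpg and n are made up for illustration.

```stata
* sketch only: auto data and grouping variables chosen for illustration
sysuse auto, clear

* one integer code per combination of the grouping variables;
* observations with missing rep78 get a missing group code
egen group = group(foreign rep78)
summarize group, meanonly
local G = r(max)

generate mean_mpg = .
generate t_mpg = .

* loop over groups 1..G rather than over levelsof results
forvalues j = 1/`G' {
    quietly ttest mpg == 0 if group == `j'
    quietly replace mean_mpg = r(mu_1) if group == `j'
    quietly replace t_mpg = r(t) if group == `j'
}

* keep grouped observations only; results are constant within groups
drop if missing(group)
collapse (count) n = mpg (mean) mean_mpg t_mpg, by(foreign rep78)
format mean_mpg t_mpg %5.2f
list, sepby(foreign)
```

Because egen's group() function accepts string as well as numeric variables, the same loop should work unchanged whatever the type of the by() variables.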

              You still have to tackle the output.

              Meanwhile, consider what you could do with statsby as an alternative.
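As a hedged sketch of what the statsby route might look like: this is not code from the thread, it assumes the Grunfeld data used in #2, and the result names mean, t and n are made up for illustration.

```stata
* sketch only: one-sample t test of invest against 0, separately by year
webuse grunfeld, clear

* statsby runs the command for each year and replaces the data in memory
* with one observation per year holding the requested results
statsby mean=r(mu_1) t=r(t) n=r(N_1), by(year) clear: ttest invest == 0

format mean t %9.2f
list, sep(0)
```

This handles one variable per call; for several variables the resulting datasets would still need to be combined, which is where the loop in #3 may be simpler.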
              Last edited by Nick Cox; 03 Apr 2017, 11:18.



              • #8
                Thank you, Nick, for these helpful suggestions. I shall incorporate them in the next version of the program, which I am trying to make more dynamic using the putexcel capabilities.
                Regards


                • #9
                  Many thanks Attaullah and Nick. As always, your help is very much appreciated.

                  @Attaullah: your code does do what I want, so thank you for that.

                  @Nick: your code obviously has the advantage of calculating the results for several variables at the same time. I nevertheless have a quick follow-up question: If I understand your code correctly, you are currently attributing random numbers to each of the three variables using rnormal. For the interest of my study, I would rather use my own variables I've previously computed. Could you help me understand what would be the best way to feed them into the equation? Would I need to rename them or what would you suggest as the best solution?

                  Many thanks,
                  Jay



                  • #10
                    Indeed: you should naturally use your own data, not random normal deviates. As I wrote

                    The code here is less than it seems, as the first block is just building up a sandbox dataset for play, absent a decent data example in the original posting.
                    and the code itself repeats the comment about a sandbox. Hereabouts a sandbox is just a place to play; the term may not translate to where you are, especially if your first language is not English.

                    Your data example showed one observation per year and so a t test for each year would just fail. It was easier to make up my own example using code.

                    You told us that your variables were named year var1 var2 var3 so I followed suit. If your real names are different then the code naturally won't work and you have a choice between changing the code (which I advise) and changing your variable names. One clear constraint is that var3 - var1 cannot be a legal variable name, so you need to choose something different. I chose var4 for the difference purely to make the code simpler. In general, I strongly advise informative, evocative variable names, but you can make your own choices.



                    • #11
                      OK, understood; thank you for the clarification. I apologize if I'm asking rather silly questions, but do I need to have my variables labeled with numbers, as you did above? For instance, in my dataset, my three variables are named annual_CS, annual_CT and annual_AS. I tried the following to incorporate them into the loop, but I get the error "too few variables specified".

                      * loop over variables and over groups
                      quietly foreach var of varlist annual_CS annual_CT annual_AS {
                          gen `t' = .
                          gen `mean' = .

                          forval y = 1994/2012 {
                              ttest `var' = 0 if year == `y'
                              replace `mean' = r(mu_1) if year == `y'
                              replace `t' = r(t) if year == `y'
                          }
                      }



                      • #12
                        If that's the entirety of the code, then you're using local macros you never define and the code will fail when it sees

                        Code:
                        gen `t' = .
                        which is evaluated as

                        Code:
                        gen = .
                        as undefined macros are treated as empty strings. This is closer to what you want (and closer to my code). Note that you need different variables to hold means and t values for each original variable. Otherwise the outer loop would fail the second time around, as the variables being generated would already exist (and in any case they would be overwritten with new results).

                        Code:
                        quietly foreach var of varlist annual_CS annual_CT annual_AS { 
                            gen t_`var' = .
                            gen mean_`var' = .
                        
                            forval y = 1994/2012 {
                                ttest `var' = 0 if year == `y'
                                replace mean_`var' = r(mu_1) if year == `y'
                                replace t_`var' = r(t) if year == `y'
                            }
                        }
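To round this off, here is a sketch of how the loop might be combined with the collapse step from #3. It has not been run against the real data: the first block fabricates a sandbox using the variable names from #11, and diff_AS_CS is a hypothetical name for the difference between annual_AS and annual_CS, standing in for the var3 - var1 column in the original question.

```stata
* sandbox only: fake data with the variable names from #11
clear
set obs 380
egen year = seq(), from(1994) to(2012) block(20)
set seed 2803
foreach v in annual_CS annual_CT annual_AS {
    gen `v' = rnormal()
}

* hypothetical name for the difference; calculate it in advance
gen diff_AS_CS = annual_AS - annual_CS

bysort year: gen n = _N

* loop over variables and over years, storing results in new variables
quietly foreach var of varlist annual_CS annual_CT annual_AS diff_AS_CS {
    gen t_`var' = .
    gen mean_`var' = .

    forval y = 1994/2012 {
        ttest `var' == 0 if year == `y'
        replace mean_`var' = r(mu_1) if year == `y'
        replace t_`var' = r(t) if year == `y'
    }
}

* results are constant within years, so the means suffice
collapse n mean_* t_*, by(year)
format mean_* t_* %5.2f
list, sep(0)
```

With real data, the sandbox block would be dropped and only the year range in the inner loop would need to match the years actually present.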



                        • #13
                          Thank you very much Nick, I'm very grateful for your help with this. I definitely need to spend some more time looking into these loops, as I've rarely had to use them until now.

                          Have a great evening/day.

                          Best regards,
                          Jay
