  • Using esttab to output multiple tabstat, by(varname) commands with missing cells and spaces in value labels

    Hi all,

    When using esttab to output statistics created by multiple tabstat commands, one or more of the cells defined by tabstat's by(varname) will sometimes be empty. This does not normally cause a problem, even when combining multiple tabstat estimates, some with and some without empty cells. However, if a value label of the tabstat by variable contains a space, the output can be formatted incorrectly. For example, this code produces correct results.

    Code:
    #delimit ;
       sysuse auto;  
      
       label define rep78 1 "One" 2 "Two" 3 "Three" 4 "Four" 5 "Five-or-more";
       label values rep78 rep78;
         
       eststo : estpost tabstat price if foreign==0, by(rep78)
                        statistics(n mean semean) columns(statistics) nototal;
       eststo : estpost tabstat price if foreign==1, by(rep78)
                        statistics(n mean semean) columns(statistics) nototal;
    
       esttab ,
              cells(mean(fmt(%12.3f)) semean(fmt(%12.3f) par(( ))))
              replace label unstack noobs nonumbers collabels(none)
              mlabels("Domestic" "Foreign") nomtitles nonotes;
       eststo clear;
    
    ----------------------------------------------
                             Domestic      Foreign
    ----------------------------------------------
    One                      4564.500            
                            (369.500)            
    Two                      5967.625            
                           (1265.494)            
    Three                    6607.074     4828.667
                            (704.611)    (742.249)
    Four                     5881.556     6261.444
                            (530.673)    (632.031)
    Five-or-more             4204.500     6292.667
                            (220.500)    (921.876)
    ----------------------------------------------

    However, if I change the label for 5 to "Five or more", with spaces, I get this.


    Code:
       label define rep78 5 "Five or more", modify;
    
       eststo : estpost tabstat price if foreign==0, by(rep78)
                        statistics(n mean semean) columns(statistics) nototal;
       eststo : estpost tabstat price if foreign==1, by(rep78)
                        statistics(n mean semean) columns(statistics) nototal;
    
       esttab ,
              cells(mean(fmt(%12.3f)) semean(fmt(%12.3f) par(( ))))
              replace label unstack noobs nonumbers collabels(none)
              mlabels("Domestic" "Foreign") nomtitles nonotes;
       eststo clear;
    
    ----------------------------------------------
                             Domestic      Foreign
    ----------------------------------------------
    One                      4564.500            
                            (369.500)            
    Two                      5967.625            
                           (1265.494)            
    Three                    6607.074     4828.667
                            (704.611)    (742.249)
    Four                     5881.556     6261.444
                            (530.673)    (632.031)
    Five or more             4204.500            
                            (220.500)            
    Five                                  6292.667
                                         (921.876)
    ----------------------------------------------

    Obviously I can work around this by not using spaces in value labels, but is there a solution that will allow me to include the spaces?

  • #2
    estout is from the Stata Journal; you are asked to explain where community-contributed commands come from (refer to FAQ Advice #12). See

    Code:
    help quotes
    to learn how to define a label with embedded spaces properly. In particular, you will want to specify the label as

    Code:
    `""label with a space""' 
    In this way, estpost will store the value labels in a macro named -e(labels)-, which you can then pass to esttab using the option

    Code:
    esttab...., varlabels(`e(labels)')
    However, if you are using the same label across models, just define the label using the -varlabels()- option of esttab directly

    Code:
    sysuse auto, clear
    eststo clear
    eststo : estpost tabstat price if foreign==0, by(rep78) ///
    statistics(n mean semean) columns(statistics) nototal
    eststo : estpost tabstat price if foreign==1, by(rep78) ///
    statistics(n mean semean) columns(statistics) nototal
    local labels 1 "One" 2 "Two" 3 "Three" 4 "Four" 5 "Five or more"
    esttab ,cells(mean(fmt(%12.3f)) semean(fmt(%12.3f) par(( )))) ///
    replace label unstack noobs nonumbers collabels(none) ///
    mlabels("Domestic" "Foreign") nomtitles nonotes varlabels(`labels')
    Result:

    Code:
    . esttab ,cells(mean(fmt(%12.3f)) semean(fmt(%12.3f) par(( )))) ///
    > replace label unstack noobs nonumbers collabels(none) ///
    > mlabels("Domestic" "Foreign") nomtitles nonotes varlabels(`labels')
    
    ----------------------------------------------
                             Domestic      Foreign
    ----------------------------------------------
    One                      4564.500            
                            (369.500)            
    Two                      5967.625            
                           (1265.494)            
    Three                    6607.074     4828.667
                            (704.611)    (742.249)
    Four                     5881.556     6261.444
                            (530.673)    (632.031)
    Five or more             4204.500     6292.667
                            (220.500)    (921.876)
    ----------------------------------------------
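    As a minimal sketch of what the compound double quotes described in -help quotes- do (the macro names a and b are only for illustration, and the lines should be run together so that the local macros persist), compare a label stored with plain quotes and one stored with compound quotes:

    Code:
    * Plain double quotes are stripped, so the macro holds three separate words
    local a "Five or more"
    display `: word count `a''
    * Compound double quotes keep the inner quotes, so the macro holds one quoted word
    local b `""Five or more""'
    display `: word count `b''
    display `"`b'"'
    The first display should show 3 and the second 1; the last shows the text with its double quotes intact.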
    Last edited by Andrew Musau; 06 Jun 2020, 10:56.

    • #3

      Hi Andrew,

      Thanks for your reply. I could not get the esttab command to format properly when defining the value labels with embedded spaces. These are the best results I could get.

      Code:
      sysuse auto, clear
      eststo clear
      label define rep78_lbl 1 `"One"' 2 `"Two"' 3 `"Three"' 4 `"Four"' 5 `""Five or more""'
      label values rep78 rep78_lbl
      eststo : estpost tabstat price if foreign==0, by(rep78) ///
                       statistics(n mean semean) columns(statistics) nototal
      eststo : estpost tabstat price if foreign==1, by(rep78) ///
                       statistics(n mean semean) columns(statistics) nototal
      esttab , cells(mean(fmt(%12.3f)) semean(fmt(%12.3f) par(( )))) ///
               label unstack noobs nonumbers collabels(none) varlabels(`e(labels)') ///
               mlabels("Domestic" "Foreign") nomtitles nonotes
      
      
      ----------------------------------------------
                               Domestic      Foreign
      ----------------------------------------------
      1                        4564.500            
                              (369.500)            
      2                        5967.625            
                             (1265.494)            
      Three                    6607.074     4828.667
                              (704.611)    (742.249)
      Four                     5881.556     6261.444
                              (530.673)    (632.031)
      "Five or more"           4204.500     6292.667
                              (220.500)    (921.876)
      ----------------------------------------------
      I know I can get the correct results if I restore the first estimates before the esttab command.
      Code:
      estimates restore est1
      However, in a foreach loop, I would not know which estimates to restore. Plus, I could not get this to work without showing the quotes around “Five or more”.

      Your code that defines the labels in the local macro works. However, it breaks down if there are already value labels assigned to the variable. For example:
      Code:
      sysuse auto, clear
      eststo clear
      label define rep78_lbl 1 "One" 2 "Two" 3 "Three" 4 "Four" 5 "Five or more"
      label values rep78 rep78_lbl
      eststo : estpost tabstat price if foreign==0, by(rep78) ///
      statistics(n mean semean) columns(statistics) nototal
      eststo : estpost tabstat price if foreign==1, by(rep78) ///
      statistics(n mean semean) columns(statistics) nototal
      local labels 1 "One" 2 "Two" 3 "Three" 4 "Four" 5 "Five or more"
      esttab , cells(mean(fmt(%12.3f)) semean(fmt(%12.3f) par(( )))) ///
      replace label unstack noobs nonumbers collabels(none) ///
      mlabels("Domestic" "Foreign") nomtitles nonotes varlabels(`labels')
      
      ----------------------------------------------
                               Domestic      Foreign
      ----------------------------------------------
      One                      4564.500            
                              (369.500)            
      Two                      5967.625            
                             (1265.494)            
      Three                    6607.074     4828.667
                              (704.611)    (742.249)
      Four                     5881.556     6261.444
                              (530.673)    (632.031)
      Five or more             4204.500            
                              (220.500)            
      Five                                  6292.667
                                           (921.876)
      ----------------------------------------------
      The bigger issue is that, as you indicated, this would only work if I am using the same labels across all models, which is not always the case. Also, it seems risky to manually define the variable labels in every Stata program in a project. I prefer to define variables and labels one time in one place. Building off your code, this is what I came up with.
      Code:
      sysuse auto, clear
      eststo clear
      
      label define rep78_lbl 1 "One" 2 "Two" 3 "Three" 4 "Four" 5 "Five or more"
      label values rep78 rep78_lbl
      
      levelsof rep78, local(rep78_levels)
      foreach val of local rep78_levels {
        local rep78_`val' : label rep78_lbl `val'
        local labels = `"`labels' "' + "`val' " + `""`rep78_`val''""'
      }
      
      label values rep78 .
      
      eststo : estpost tabstat price if foreign==0, by(rep78) ///
                       statistics(n mean semean) columns(statistics) nototal
      eststo : estpost tabstat price if foreign==1, by(rep78) ///
                       statistics(n mean semean) columns(statistics) nototal
      
      esttab , cells(mean(fmt(%12.3f)) semean(fmt(%12.3f) par(( )))) ///
      replace label unstack noobs nonumbers collabels(none) ///
      mlabels("Domestic" "Foreign") nomtitles nonotes varlabels(`labels')
      
      label values rep78 rep78_lbl
      
      ----------------------------------------------
                               Domestic      Foreign
      ----------------------------------------------
      One                      4564.500            
                              (369.500)            
      Two                      5967.625            
                             (1265.494)            
      Three                    6607.074     4828.667
                              (704.611)    (742.249)
      Four                     5881.556     6261.444
                              (530.673)    (632.031)
      Five or more             4204.500     6292.667
                              (220.500)    (921.876)
      ----------------------------------------------
      I am not sure why it is necessary to drop the value labels when I specify varlabels(`labels'). Is there a more straightforward solution?

      • #4
        The problem you are working around is introduced by tabstat. In Stata, variable names do not contain spaces, and this was the principle guiding Ben Jann when writing the estout code. However, tabstat uses value labels in place of names, thus violating the no-spaces rule.
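
        To see this for yourself, here is a small sketch (the label name rep78_lbl follows the earlier posts; nothing else is assumed beyond the auto data) that asks tabstat to save its results and then inspects the name it stores for the fifth group:

        Code:
        sysuse auto, clear
        label define rep78_lbl 1 "One" 2 "Two" 3 "Three" 4 "Four" 5 "Five or more"
        label values rep78 rep78_lbl
        * The save option leaves the group names in r(name1), r(name2), ...
        tabstat price, by(rep78) statistics(mean) save
        return list
        display `"`r(name5)'"'
        If r(name5) holds the label text rather than the value 5, that is the string with spaces that ends up being used as a name downstream.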

        "The bigger issue is that, as you indicated, this would only work if I am using the same labels across all models, which is not always the case."
        I do not see why you would want to merge models that have no common element. If you mean that some models have more categories than others, then you need the full set of value labels, which is why we put them in a macro and pass it through the option -varlabels()-.

        "I know I can get the correct results if I restore the first estimates before the esttab command."
        Code:
        estimates restore est1
        "However, in a foreach loop, I would not know which estimates to restore."
        Indeed, if you use this approach, you need to use the labels extracted from the model with the full set of labels. There is no intelligent combination of labels stored in the macro -e(labels)-. Therefore, if no model contains the full set of labels, you are in trouble. As you point out, you always need to know in advance which model this is, so that you can restore it just before running the esttab command. Bottom line: it is safest and easiest to have the full set of labels stored in your own macro.
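
        As a sketch of building that macro once from the value label itself, rather than typing the labels by hand (the macro names vlname, levels, lab and labels are arbitrary, and this assumes the labels themselves contain no double quotes):

        Code:
        sysuse auto, clear
        label define rep78_lbl 1 "One" 2 "Two" 3 "Three" 4 "Four" 5 "Five or more"
        label values rep78 rep78_lbl
        * Look up the name of the value label attached to rep78
        local vlname : value label rep78
        * Start from an empty list, then append pairs of the form: value "label"
        local labels
        quietly levelsof rep78, local(levels)
        foreach val of local levels {
            local lab : label `vlname' `val'
            local labels `"`labels' `val' "`lab'""'
        }
        * Inspect the list before passing it to esttab as varlabels(`labels')
        display `"`labels'"'
        Appending with compound quotes avoids a string expression after an = sign, and looking up the value label name means the same lines can be reused for another by variable, provided that variable has a value label attached.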

        "Plus, I could not get this to work without showing the quotes around “Five or more”."

        "Your code that defines the labels in the local macro works. However, it breaks down if there are already value labels assigned to the variable."
        Finally, to address these two points, which are what you are ultimately interested in: a further step is needed to stop the quotes from being displayed. However, because the main aim of my approach was to get the labels stored in the macro -e(labels)-, I will not pursue that extra step, as it is less efficient than directly including the option -elabels- in the estpost command. Below I illustrate how you can revise both sets of code.

        Code:
        sysuse auto, clear
        eststo clear
        label define rep78_lbl 1 "One" 2 "Two" 3 "Three" 4 "Four" 5 "Five or more"
        label values rep78 rep78_lbl
        eststo : estpost tabstat price if foreign==0, by(rep78) ///
                         statistics(n mean semean) columns(statistics) nototal elabels
        eststo : estpost tabstat price if foreign==1, by(rep78) ///
                         statistics(n mean semean) columns(statistics) nototal elabels
        est restore est1
        esttab , cells(mean(fmt(%12.3f)) semean(fmt(%12.3f) par(( )))) ///
                 label unstack noobs nonumbers collabels(none) varlabels(`e(labels)') ///
                 mlabels("Domestic" "Foreign") nomtitles nonotes

        Code:
        sysuse auto, clear
        eststo clear
        label define rep78_lbl 1 "One" 2 "Two" 3 "Three" 4 "Four" 5 "Five or more"
        label values rep78 rep78_lbl
        levelsof rep78, local(rep78_levels)
        foreach val of local rep78_levels {
          local rep78_`val' : label rep78_lbl `val'
          local labels = `"`labels' "' + "`val' " + `""`rep78_`val''""'
        }
        eststo : estpost tabstat price if foreign==0, by(rep78) ///
                         statistics(n mean semean) columns(statistics) nototal elabels
        eststo : estpost tabstat price if foreign==1, by(rep78) ///
                         statistics(n mean semean) columns(statistics) nototal elabels
        esttab , cells(mean(fmt(%12.3f)) semean(fmt(%12.3f) par(( )))) ///
        replace label unstack noobs nonumbers collabels(none) ///
        mlabels("Domestic" "Foreign") nomtitles nonotes varlabels(`labels')
        Last edited by Andrew Musau; 07 Jun 2020, 02:45.

        • #5

          Andrew,

          Thanks!

          "I do not get why then you will want to merge models where there is no common element between them."
          I cannot think of a situation where I would want to merge models with no common element in a single table. However, I often write foreach loops that create multiple tables for a set of variables where each table uses a different by(`varname').

          I did not know about the -elabels- option in the estpost command. I cannot find this option documented anywhere. Other than looking at the estpost code, I do not know how you knew about it. I have actually been frustrated for years thinking that estpost did not have a way to always store labels in e(labels), because it solves much more common problems than the one I asked about in this thread: specifically, situations where some sets of labels contain a “.” or are over 30 characters, but others do not.

          This is super helpful.

          Brian

          • #6
            "I did not know about the -elabels- option in the estpost command. I cannot find this option documented anywhere. Other than looking at the estpost code, I do not know how you knew about it."
            In
            Code:
            help estpost
            under tabstat, you will find

            elabels to enforce saving the by() values/labels in macro e(labels).
            Do you have the latest version?

            Code:
            . which estpost
            c:\ado\plus\e\estpost.ado
            *! version 1.2.0  13jun2019  Ben Jann

            • #7
              No, I did not have the latest version of estpost. I just ran adoupdate, update. Thanks again.
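
              For anyone else checking their installation, a short sketch of the steps (this assumes estout was installed from SSC; adoupdate only updates packages it can trace back to their source):

              Code:
              * Show which estpost.ado Stata finds, along with its version line
              which estpost
              * Update only the estout package, which provides estpost, eststo and esttab
              adoupdate estout, update
              * Alternatively, reinstall the package from SSC
              ssc install estout, replace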
