Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Tables for descriptive and correlations

    Dear Forum,

    I would like to create a table for descriptive statistics and a table for correlations. I have panel data and three dependent variables (emissions, tq and rdi), one independent variable (purpose) and seven control variables.
    When I run the regressions, stata automatically drops the observations that are empty in one place and that stata cannot use.
    For the tables, stata does not do this and accordingly the data for the variables are skewed. Namely, for example, the mean values are wrong, because more values are considered than later in the regression.
    Is there a command so that only those observations are included in the two tables that are also included in the regressions? My current commands are these:

    asdoc sum emissions tq rdi purpose_l2 dirty_l2 b2c_l2 cf_l2 growth_l2 capexi_l2 adexi_l2 sl_l2 fs_l2 om_l2, label s(count sum mean max min variance sd) dec(3) save(Descriptive statistics.doc)replace

    asdoc pwcorr emissions tq rdi purpose_l2 dirty_l2 b2c_l2 cf_l2 growth_l2 capexi_l2 adexi_l2 sl_l2 fs_l2 om_l2, obs label star(0.05) fhc(\b) fhr(\b) sig dec(3) save(Correlations table) replace

    Thank you very much and best regards,
    Jana

  • #2
    Code:
    . sysuse nlsw88, clear
    (NLSW, 1988 extract)
    
    .
    . //creat an indicator variable that marks all observations that will be used
    . gen touse = !missing(grade, wage, hours, tenure)
    
    .
    .
    . // ======================================================= tabel of descriptives
    . local vars grade wage hours tenure
    
    .
    . //create the basic table
    . table (var) (result) if touse, stat(count `vars') ///
    >                                stat(total `vars') ///
    >                                stat(mean  `vars') ///
    >                                stat(max   `vars') ///
    >                                stat(min   `vars') ///
    >                                stat(var   `vars') ///
    >                                stat(sd    `vars')
    
    --------------------------------------------------------------------------------------------------------------------------------------------
                            |  Number of nonmissing values      Total       Mean   Maximum value   Minimum value   Variance   Standard deviation
    ------------------------+-------------------------------------------------------------------------------------------------------------------
    Current grade completed |                        2,225     29,155   13.10337              18               0    6.36431             2.522758
    Hourly wage             |                        2,225   17,359.7   7.802101        40.74659        1.004952   33.27648             5.768577
    Usual hours worked      |                        2,225     82,853    37.2373              80               1   109.9392             10.48519
    Job tenure (years)      |                        2,225   13,293.2   5.974457        25.91667               0   30.32255             5.506591
    --------------------------------------------------------------------------------------------------------------------------------------------
    
    .                       
    . // determine the format of numbers                   
    . collect style cell result[count] , nformat(%7.0fc)                      
    
    . collect style cell result[total] , nformat(%8.0fc)
    
    . collect style cell result[mean] , nformat(%7.1fc)
    
    . collect style cell result[min] , nformat(%7.1fc)
    
    . collect style cell result[max] , nformat(%7.1fc)
    
    . collect style cell result[variance] , nformat(%7.1fc)
    
    . collect style cell result[sd] , nformat(%7.1fc)
    
    .
    . // I don't like the vertical line separating variables from statistics
    . collect style cell border_block, border(right, pattern(nil))
    
    .
    . // shorten the statistics names
    . collect label levels result count "N"        ///
    >                             max   "maximum"  ///
    >                             min   "minimum"  ///
    >                             sd    "std. dev.", modify
    
    .
    . // admire the result                            
    . collect preview
    
    -----------------------------------------------------------------------------------------
                                 N    Total   Mean   maximum   minimum   Variance   std. dev.
    -----------------------------------------------------------------------------------------
    Current grade completed  2,225   29,155   13.1      18.0       0.0        6.4         2.5
    Hourly wage              2,225   17,360    7.8      40.7       1.0       33.3         5.8
    Usual hours worked       2,225   82,853   37.2      80.0       1.0      109.9        10.5
    Job tenure (years)       2,225   13,293    6.0      25.9       0.0       30.3         5.5
    -----------------------------------------------------------------------------------------
    
    .
    . // now you can use -collect export- to export the table to Word
    .
    . // ========================================================== correlation matrix
    . // create the basic table
    . collect get r(C) : corr grade wage hours tenure
    (obs=2,225)
    
                 |    grade     wage    hours   tenure
    -------------+------------------------------------
           grade |   1.0000
            wage |   0.3256   1.0000
           hours |   0.0891   0.1588   1.0000
          tenure |   0.1208   0.1779   0.1607   1.0000
    
    
    . collect layout (rowname) (colname)
    
    Collection: Table
          Rows: rowname
       Columns: colname
       Table 1: 4 x 4
    
    --------------------------------------------------------------------------------------------------------
                             Current grade completed   Hourly wage   Usual hours worked   Job tenure (years)
    --------------------------------------------------------------------------------------------------------
    Current grade completed                        1      .3255688             .0891138             .1207746
    Hourly wage                             .3255688             1             .1588106             .1779269
    Usual hours worked                      .0891138      .1588106                    1             .1606707
    Job tenure (years)                      .1207746      .1779269             .1606707                    1
    --------------------------------------------------------------------------------------------------------
    
    .
    . // determine format of numbers
    . collect style cell rowname, nformat(%7.3f)
    
    .
    . // admire the result
    . collect preview
    
    --------------------------------------------------------------------------------------------------------
                             Current grade completed   Hourly wage   Usual hours worked   Job tenure (years)
    --------------------------------------------------------------------------------------------------------
    Current grade completed                    1.000         0.326                0.089                0.121
    Hourly wage                                0.326         1.000                0.159                0.178
    Usual hours worked                         0.089         0.159                1.000                0.161
    Job tenure (years)                         0.121         0.178                0.161                1.000
    --------------------------------------------------------------------------------------------------------
    
    .
    . // now you can use -collect export- to export the table to Word
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    Comment


    • #3
      Hello Maarten,

      thank you very much for the code, it helped me a lot!

      However, I still have a few problems with the creation of the correlation table. When I enter the code from you, I get the error: "Your layout specification does not uniquely match any items. Dimension cmdset might help uniquely match items.". When I leave out the -collect layout (rowname) (colname)- part, the table strangely outputs the info from the descriptive table.

      Furthermore, in the correlation table, do I also need to add again an indicator variable that marks all observations that will be used, or is that already included in your code?

      Thank you so much!

      Comment


      • #4
        Here is the do file without the output. You need to run everything. (Maybe you forgot the line collect get r(C) : corr grade wage hours tenure ?) Another thing to look at is whether your version of Stata is up to date. Type update query to find that out.

        Assuming that the list of variables is also the list of variables used in your models (dependent and independent variables, then no, you don't need touse for correlate. It automatically ignores all observations with one or more missing values on any of the variables specified. It doesn't do any harm, but it won't help either.

        Code:
        sysuse nlsw88, clear
        
        //creat an indicator variable that marks all observations that will be used
        gen touse = !missing(grade, wage, hours, tenure)
        
        // ======================================================= tabel of descriptives
        local vars grade wage hours tenure
        
        //create the basic table
        table (var) (result) if touse, stat(count `vars') ///
                                       stat(total `vars') ///
                                       stat(mean  `vars') ///
                                       stat(max   `vars') ///
                                       stat(min   `vars') ///
                                       stat(var   `vars') ///
                                       stat(sd    `vars')
        
        // determine the format of numbers                   
        collect style cell result[count] , nformat(%7.0fc)                      
        collect style cell result[total] , nformat(%8.0fc)
        collect style cell result[mean] , nformat(%7.1fc)
        collect style cell result[min] , nformat(%7.1fc)
        collect style cell result[max] , nformat(%7.1fc)
        collect style cell result[variance] , nformat(%7.1fc)
        collect style cell result[sd] , nformat(%7.1fc)
        
        // I don't like the vertical line separating variables from statistics
        collect style cell border_block, border(right, pattern(nil))
        
        
        // shorten the statistics names
        collect label levels result count "N"        ///
                                    max   "maximum"  ///
                                    min   "minimum"  ///
                                    sd    "std. dev.", modify
        
        // admire the result                            
        collect preview
        
        // now you can use -collect export- to export the table to Word
        
        // ========================================================== correlation matrix
        // create the basic table
        collect get r(C) : corr grade wage hours tenure
        collect layout (rowname) (colname)
        
        // determine format of numbers
         collect style cell rowname, nformat(%7.3f)
        
        // admire the result
        collect preview
        
        // now you can use -collect export- to export the table to Word
        Last edited by Maarten Buis; 09 Mar 2023, 08:38.
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        Comment


        • #5
          Thank you very much again, I updated stata and now it works! This really helped me a lot!
          One last question would be if it is also possible to leave out the values above the 1.000 values? Thanks in advance

          Comment


          • #6
            Originally posted by Jana Schue View Post
            One last question would be if it is also possible to leave out the values above the 1.000 values?
            I don't know what that means

            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------

            Comment


            • #7
              I think my question was a bit misleading, sorry for that!
              In the correlation table, there are always the same variables in row and column in a straight line from top left to bottom right, resulting in a 1.0 in the table. This makes the values double because each combination of variables occurs twice, once above this 1.0 line and once below. I have often seen it in papers that to avoid the duplicate values, only the values below this line with 1.0 standing values remain and those above are empty. If this is possible to show, I would be very grateful for a tip!

              Comment

              Working...
              X