Tables for descriptive and correlations

Jana Schue

Join Date: Oct 2021

Posts: 116
#1

Tables for descriptive and correlations

09 Mar 2023, 02:55

Dear Forum,

I would like to create a table for descriptive statistics and a table for correlations. I have panel data and three dependent variables (emissions, tq and rdi), one independent variable (purpose) and seven control variables.
When I run the regressions, stata automatically drops the observations that are empty in one place and that stata cannot use.
For the tables, stata does not do this and accordingly the data for the variables are skewed. Namely, for example, the mean values are wrong, because more values are considered than later in the regression.
Is there a command so that only those observations are included in the two tables that are also included in the regressions? My current commands are these:

asdoc sum emissions tq rdi purpose_l2 dirty_l2 b2c_l2 cf_l2 growth_l2 capexi_l2 adexi_l2 sl_l2 fs_l2 om_l2, label s(count sum mean max min variance sd) dec(3) save(Descriptive statistics.doc)replace

asdoc pwcorr emissions tq rdi purpose_l2 dirty_l2 b2c_l2 cf_l2 growth_l2 capexi_l2 adexi_l2 sl_l2 fs_l2 om_l2, obs label star(0.05) fhc(\b) fhr(\b) sig dec(3) save(Correlations table) replace

Thank you very much and best regards,
Jana
Tags: None

Maarten Buis

Join Date: Mar 2014
Posts: 3460

09 Mar 2023, 03:42

Code:

. sysuse nlsw88, clear
(NLSW, 1988 extract)

.
. //creat an indicator variable that marks all observations that will be used
. gen touse = !missing(grade, wage, hours, tenure)

.
.
. // ======================================================= tabel of descriptives
. local vars grade wage hours tenure

.
. //create the basic table
. table (var) (result) if touse, stat(count `vars') ///
>                                stat(total `vars') ///
>                                stat(mean  `vars') ///
>                                stat(max   `vars') ///
>                                stat(min   `vars') ///
>                                stat(var   `vars') ///
>                                stat(sd    `vars')

--------------------------------------------------------------------------------------------------------------------------------------------
                        |  Number of nonmissing values      Total       Mean   Maximum value   Minimum value   Variance   Standard deviation
------------------------+-------------------------------------------------------------------------------------------------------------------
Current grade completed |                        2,225     29,155   13.10337              18               0    6.36431             2.522758
Hourly wage             |                        2,225   17,359.7   7.802101        40.74659        1.004952   33.27648             5.768577
Usual hours worked      |                        2,225     82,853    37.2373              80               1   109.9392             10.48519
Job tenure (years)      |                        2,225   13,293.2   5.974457        25.91667               0   30.32255             5.506591
--------------------------------------------------------------------------------------------------------------------------------------------

.                       
. // determine the format of numbers                   
. collect style cell result[count] , nformat(%7.0fc)                      

. collect style cell result[total] , nformat(%8.0fc)

. collect style cell result[mean] , nformat(%7.1fc)

. collect style cell result[min] , nformat(%7.1fc)

. collect style cell result[max] , nformat(%7.1fc)

. collect style cell result[variance] , nformat(%7.1fc)

. collect style cell result[sd] , nformat(%7.1fc)

.
. // I don't like the vertical line separating variables from statistics
. collect style cell border_block, border(right, pattern(nil))

.
. // shorten the statistics names
. collect label levels result count "N"        ///
>                             max   "maximum"  ///
>                             min   "minimum"  ///
>                             sd    "std. dev.", modify

.
. // admire the result                            
. collect preview

-----------------------------------------------------------------------------------------
                             N    Total   Mean   maximum   minimum   Variance   std. dev.
-----------------------------------------------------------------------------------------
Current grade completed  2,225   29,155   13.1      18.0       0.0        6.4         2.5
Hourly wage              2,225   17,360    7.8      40.7       1.0       33.3         5.8
Usual hours worked       2,225   82,853   37.2      80.0       1.0      109.9        10.5
Job tenure (years)       2,225   13,293    6.0      25.9       0.0       30.3         5.5
-----------------------------------------------------------------------------------------

.
. // now you can use -collect export- to export the table to Word
.
. // ========================================================== correlation matrix
. // create the basic table
. collect get r(C) : corr grade wage hours tenure
(obs=2,225)

             |    grade     wage    hours   tenure
-------------+------------------------------------
       grade |   1.0000
        wage |   0.3256   1.0000
       hours |   0.0891   0.1588   1.0000
      tenure |   0.1208   0.1779   0.1607   1.0000


. collect layout (rowname) (colname)

Collection: Table
      Rows: rowname
   Columns: colname
   Table 1: 4 x 4

--------------------------------------------------------------------------------------------------------
                         Current grade completed   Hourly wage   Usual hours worked   Job tenure (years)
--------------------------------------------------------------------------------------------------------
Current grade completed                        1      .3255688             .0891138             .1207746
Hourly wage                             .3255688             1             .1588106             .1779269
Usual hours worked                      .0891138      .1588106                    1             .1606707
Job tenure (years)                      .1207746      .1779269             .1606707                    1
--------------------------------------------------------------------------------------------------------

.
. // determine format of numbers
. collect style cell rowname, nformat(%7.3f)

.
. // admire the result
. collect preview

--------------------------------------------------------------------------------------------------------
                         Current grade completed   Hourly wage   Usual hours worked   Job tenure (years)
--------------------------------------------------------------------------------------------------------
Current grade completed                    1.000         0.326                0.089                0.121
Hourly wage                                0.326         1.000                0.159                0.178
Usual hours worked                         0.089         0.159                1.000                0.161
Job tenure (years)                         0.121         0.178                0.161                1.000
--------------------------------------------------------------------------------------------------------

.
. // now you can use -collect export- to export the table to Word

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------

Comment

Jana Schue

Join Date: Oct 2021

Posts: 116
#3

09 Mar 2023, 06:02

Hello Maarten,

thank you very much for the code, it helped me a lot!

However, I still have a few problems with the creation of the correlation table. When I enter the code from you, I get the error: "Your layout specification does not uniquely match any items. Dimension cmdset might help uniquely match items.". When I leave out the -collect layout (rowname) (colname)- part, the table strangely outputs the info from the descriptive table.

Furthermore, in the correlation table, do I also need to add again an indicator variable that marks all observations that will be used, or is that already included in your code?

Thank you so much!
Comment

Maarten Buis

Join Date: Mar 2014
Posts: 3460

09 Mar 2023, 08:35

Here is the do file without the output. You need to run everything. (Maybe you forgot the line collect get r(C) : corr grade wage hours tenure ?) Another thing to look at is whether your version of Stata is up to date. Type update query to find that out.

Assuming that the list of variables is also the list of variables used in your models (dependent and independent variables, then no, you don't need touse for correlate. It automatically ignores all observations with one or more missing values on any of the variables specified. It doesn't do any harm, but it won't help either.

Code:

sysuse nlsw88, clear

//creat an indicator variable that marks all observations that will be used
gen touse = !missing(grade, wage, hours, tenure)

// ======================================================= tabel of descriptives
local vars grade wage hours tenure

//create the basic table
table (var) (result) if touse, stat(count `vars') ///
                               stat(total `vars') ///
                               stat(mean  `vars') ///
                               stat(max   `vars') ///
                               stat(min   `vars') ///
                               stat(var   `vars') ///
                               stat(sd    `vars')

// determine the format of numbers                   
collect style cell result[count] , nformat(%7.0fc)                      
collect style cell result[total] , nformat(%8.0fc)
collect style cell result[mean] , nformat(%7.1fc)
collect style cell result[min] , nformat(%7.1fc)
collect style cell result[max] , nformat(%7.1fc)
collect style cell result[variance] , nformat(%7.1fc)
collect style cell result[sd] , nformat(%7.1fc)

// I don't like the vertical line separating variables from statistics
collect style cell border_block, border(right, pattern(nil))


// shorten the statistics names
collect label levels result count "N"        ///
                            max   "maximum"  ///
                            min   "minimum"  ///
                            sd    "std. dev.", modify

// admire the result                            
collect preview

// now you can use -collect export- to export the table to Word

// ========================================================== correlation matrix
// create the basic table
collect get r(C) : corr grade wage hours tenure
collect layout (rowname) (colname)

// determine format of numbers
 collect style cell rowname, nformat(%7.3f)

// admire the result
collect preview

// now you can use -collect export- to export the table to Word

Last edited by Maarten Buis; 09 Mar 2023, 08:38.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------

Comment

Jana Schue

Join Date: Oct 2021

Posts: 116
#5

09 Mar 2023, 09:01

Thank you very much again, I updated stata and now it works! This really helped me a lot!
One last question would be if it is also possible to leave out the values above the 1.000 values? Thanks in advance
Comment
Maarten Buis

Join Date: Mar 2014

Posts: 3460
#6

09 Mar 2023, 11:09

Originally posted by Jana Schue View Post

One last question would be if it is also possible to leave out the values above the 1.000 values?

I don't know what that means

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Comment
Jana Schue

Join Date: Oct 2021

Posts: 116
#7

09 Mar 2023, 12:20

I think my question was a bit misleading, sorry for that!
In the correlation table, there are always the same variables in row and column in a straight line from top left to bottom right, resulting in a 1.0 in the table. This makes the values double because each combination of variables occurs twice, once above this 1.0 line and once below. I have often seen it in papers that to avoid the duplicate values, only the values below this line with 1.0 standing values remain and those above are empty. If this is possible to show, I would be very grateful for a tip!
Comment

Announcement