Generate and export summary statistics by group using dtable

Anne-Claire Jo

Join Date: Feb 2021
Posts: 162

Generate and export summary statistics by group using dtable

12 May 2025, 07:28

Hello Stata users,

I am trying to generate and export summary statistic table using dtable command.
First of all, my data looks something like:

country	year	ind	LP_quantile	jcr	jdr
A	2016	3	0-10	0.1	0.2
A	2016	3	10-40	0.1	0.1
A	2016	3	40-60	0.1	0.1
A	2016	3	60-90	0.1	0.1
A	2016	3	90-100	0.1	0.1
A	2016	6	0-10	0.2	0.2
A	2016	6	10-40	0.2	0.2
A	2016	6	40-60	0.2	0.3
A	2016	6	60-90	0.3	0.3
A	2016	6	90-100	0.4	0.4
A	2017	3	0-10	0.4	0.5

.. and so on. It is panel data: country x year x industry.
I would like to generate two separate summary statistics of variables "jcr" and "jdr" by (quantile group) & by (quantile group*ind).

So, I began with the following code:

Code:

local vars_sumstat jcr jdr
local varlist `vars_sumstat'
local quant LP_quantile

collect clear

dtable , ///
    sample( , stat(freq)) ///
    cont(`varlist', stat(p1 p5 p10 p25 p50 p75 p90 p95 p99)) ///
    by(`quant')
    
collect label levels result frequency "N", modify
foreach n of numlist 1 5 10 25 50 75 90 95 99 {
    collect remap result = percentile, fortags(result[p`n'])
    collect label levels percentile p`n' "`n'%", modify
}

collect label dim percentile "Percentiles", modify
collect style header percentile, title(label)
collect style header result, title(hide) level(label)
collect style cell result[frequency] percentile, nformat(%3.0f)
collect style cell result[mean], nformat(%5.2f)
collect style cell percentile#cell_type[column-header], border(bottom, color(black))
collect style header var, title(hide) level(hide)

foreach v of varlist `varlist' {
    collect layout (var[`v']#quant) (result[frequency mean min max] percentile[p1 p5 p10 p25 p50 p75 p90 p95 p99])
    collect title "Distribution of `v' by quantile group"
    collect export "sum_stat.xlsx", sheet("`v'", replace) modify
}

However, the output says:

Code:

(dimension quant not found)
(level max of dimension result not found)
(level min of dimension result not found)

Collection: DTable
      Rows: var[jcr]#quant
   Columns: result[frequency mean min max] percentile[p1 p5 p10 p25 p50 p75 p90 p95 p99]

Your layout specification does not uniquely match any items. One or more of the following
dimensions might help uniquely match items: colname, LP_quantile, statcmd, var.
(collection DTable exported to file

and there weren't any table, but it was written:

Your layout specification does not uniquely match any items. One or more of the following dimensions might help uniquely match items: colname, LP_quantile, statcmd, var.

Could someone help me with this issue and get the desired table, please?

Thanks so much!

Tags: None

Hemanshu Kumar

Join Date: Mar 2015
Posts: 1391

13 May 2025, 00:26

Your most critical error was to not enclose quant in single quotes, to allow for macro substitution. This should work:

Code:

local vars_sumstat jcr jdr
local varlist `vars_sumstat'
local quant LP_quantile

collect clear

dtable , ///
    sample( , stat(freq)) ///
    cont(`varlist', stat(mean min max p1 p5 p10 p25 p50 p75 p90 p95 p99)) ///
    by(`quant')

foreach n of numlist 1 5 10 25 50 75 90 95 99 {
    collect remap result = percentile, fortags(result[p`n'])
    collect label levels percentile p`n' "`n'%", modify
}

collect label levels result frequency "N" mean "Mean" min "Min" max "Max", replace
collect label dim percentile "Percentiles", modify
collect style header percentile, title(label)
collect style header result, title(hide) level(label)
collect style cell result[mean], nformat(%3.2f)
collect style cell result[min max]  percentile, nformat(%3.1f)
collect style cell percentile#cell_type[column-header], border(bottom, color(black))
collect style header var, title(hide) level(hide)

foreach v of varlist `varlist' {
    collect addtags var[`v'], fortags(result[frequency]) replace
    collect title "Distribution of `v' by quantile group", clear
    collect layout (var[`v']#`quant') (result[frequency mean min max] percentile[p1 p5 p10 p25 p50 p75 p90 p95 p99])
    collect export "sum_stat.xlsx", sheet("`v'", replace) modify
}

from which you get

Code:

Distribution of jcr by quantile group
---------------------------------------------------------------
             N Mean Min Max             Percentiles            
                            -----------------------------------
                             1%  5% 10% 25% 50% 75% 90% 95% 99%
---------------------------------------------------------------
LP_quantile                                                    
  0-10       3 0.23 0.1 0.4 0.1 0.1 0.1 0.1 0.2 0.4 0.4 0.4 0.4
  10-40      2 0.15 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.2 0.2
  40-60      2 0.15 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.2 0.2
  60-90      2 0.20 0.1 0.3 0.1 0.1 0.1 0.1 0.2 0.3 0.3 0.3 0.3
  90-100     2 0.25 0.1 0.4 0.1 0.1 0.1 0.1 0.3 0.4 0.4 0.4 0.4
  Total     11 0.20 0.1 0.4 0.1 0.1 0.1 0.1 0.2 0.3 0.4 0.4 0.4
---------------------------------------------------------------

and

Code:

Distribution of jdr by quantile group
---------------------------------------------------------------
             N Mean Min Max             Percentiles            
                            -----------------------------------
                             1%  5% 10% 25% 50% 75% 90% 95% 99%
---------------------------------------------------------------
LP_quantile                                                    
  0-10       3 0.30 0.2 0.5 0.2 0.2 0.2 0.2 0.2 0.5 0.5 0.5 0.5
  10-40      2 0.15 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.2 0.2
  40-60      2 0.20 0.1 0.3 0.1 0.1 0.1 0.1 0.2 0.3 0.3 0.3 0.3
  60-90      2 0.20 0.1 0.3 0.1 0.1 0.1 0.1 0.2 0.3 0.3 0.3 0.3
  90-100     2 0.25 0.1 0.4 0.1 0.1 0.1 0.1 0.3 0.4 0.4 0.4 0.4
  Total     11 0.23 0.1 0.5 0.1 0.1 0.1 0.1 0.2 0.3 0.4 0.5 0.5
---------------------------------------------------------------

Also, Anne-Claire Jo you've posted several times on this forum by now, so I know you know this already -- please post your data extract using the dataex command, not by copy-pasting a table.

Last edited by Hemanshu Kumar; 13 May 2025, 00:30.

Comment

Anne-Claire Jo

Join Date: Feb 2021

Posts: 162
#3

13 May 2025, 02:41

Hemanshu Kumar Thanks a lot for your help.
And apologies for the data, there was confidentiality issue so I just had to create a sample table to describe the dataset.

I tried with the code you provided, however, it is giving me this error:

Code:

. collect clear . . dtable , /// > sample( , stat(freq)) /// > cont(`varlist', stat(mean min max p1 p5 p10 p25 p50 p75 p90 p95 p99)) /// > by(`quant') no observations

I ensure that the dataset is quite heavy and it's weird that there's no observation..
Comment
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1391
#4

13 May 2025, 02:56

That's strange. I'm not sure what is going on. The code works fine on the data extract you provided. Remember to not run the code in parts, but at one go, with your dataset in memory.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35685
#5

13 May 2025, 03:03

Once again, reporting that the same code gives different results doesn't help us much without a data example to show what is going on at your end.

As the FAQ Advice underlines, confidentiality as a constraint just implies that your data example should be based on realistic but fake data, where realistic means only: use variables and values and data layout that reproduce the problem.

My only guess is that your local macros are not visible to the command that fails. You could check that by putting

Code:

set trace on

just before the command to see if the local macros are indeed visible. More on this pitfall at https://journals.sagepub.com/doi/pdf...36867X20931028

That is, we need to see the end of the trace just before the error message.
Comment

Hemanshu Kumar

Join Date: Mar 2015
Posts: 1391

13 May 2025, 03:07

Originally posted by Nick Cox View Post

My only guess is that your local macros are not visible to the command that fails.

That was my first thought too. But I tested that on the data extract, and it still did not produce that error.

Code:

. clear

. input str1 country      int year        byte ind        str6 LP_quantile        float(jcr       jdr)

       country      year       ind  LP_quan~e        jcr        jdr
  1. A       2016    3       "0-10"  0.1     0.2
  2. A       2016    3       "10-40" 0.1     0.1
  3. A       2016    3       "40-60" 0.1     0.1
  4. A       2016    3       "60-90" 0.1     0.1
  5. A       2016    3       "90-100"        0.1     0.1
  6. A       2016    6       "0-10"  0.2     0.2
  7. A       2016    6       "10-40" 0.2     0.2
  8. A       2016    6       "40-60" 0.2     0.3
  9. A       2016    6       "60-90" 0.3     0.3
 10. A       2016    6       "90-100"        0.4     0.4
 11. A       2017    3       "0-10"  0.4     0.5
 12. end

. 
. /*
> local vars_sumstat jcr jdr
> local varlist `vars_sumstat'
> local quant LP_quantile
> */
. collect clear

. 
. dtable , ///
>     sample( , stat(freq)) ///
>     cont(`varlist', stat(mean min max p1 p5 p10 p25 p50 p75 p90 p95 p99)) ///
>     by(`quant')

---------
  Summary
---------
N      11
---------

Comment

Anne-Claire Jo

Join Date: Feb 2021

Posts: 162
#7

13 May 2025, 03:37

Hemanshu Kumar Sorry for the confusion, I made an error in the previous commands (that are irrelvant here) and that made the observations dropped.
But it's working super well! Thanks a lot for your help
Comment

Announcement

Generate and export summary statistics by group using dtable

Comment

Comment

Comment

Comment

Comment

Comment