Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate and export summary statistics by group using dtable

    Hello Stata users,

    I am trying to generate and export summary statistic table using dtable command.
    First of all, my data looks something like:
    country year ind LP_quantile jcr jdr
    A 2016 3 0-10 0.1 0.2
    A 2016 3 10-40 0.1 0.1
    A 2016 3 40-60 0.1 0.1
    A 2016 3 60-90 0.1 0.1
    A 2016 3 90-100 0.1 0.1
    A 2016 6 0-10 0.2 0.2
    A 2016 6 10-40 0.2 0.2
    A 2016 6 40-60 0.2 0.3
    A 2016 6 60-90 0.3 0.3
    A 2016 6 90-100 0.4 0.4
    A 2017 3 0-10 0.4 0.5
    .. and so on. It is panel data: country x year x industry.
    I would like to generate two separate summary statistics of variables "jcr" and "jdr" by (quantile group) & by (quantile group*ind).

    So, I began with the following code:
    Code:
    local vars_sumstat jcr jdr
    local varlist `vars_sumstat'
    local quant LP_quantile
    
    collect clear
    
    dtable , ///
        sample( , stat(freq)) ///
        cont(`varlist', stat(p1 p5 p10 p25 p50 p75 p90 p95 p99)) ///
        by(`quant')
        
    collect label levels result frequency "N", modify
    foreach n of numlist 1 5 10 25 50 75 90 95 99 {
        collect remap result = percentile, fortags(result[p`n'])
        collect label levels percentile p`n' "`n'%", modify
    }
    
    collect label dim percentile "Percentiles", modify
    collect style header percentile, title(label)
    collect style header result, title(hide) level(label)
    collect style cell result[frequency] percentile, nformat(%3.0f)
    collect style cell result[mean], nformat(%5.2f)
    collect style cell percentile#cell_type[column-header], border(bottom, color(black))
    collect style header var, title(hide) level(hide)
    
    foreach v of varlist `varlist' {
        collect layout (var[`v']#quant) (result[frequency mean min max] percentile[p1 p5 p10 p25 p50 p75 p90 p95 p99])
        collect title "Distribution of `v' by quantile group"
        collect export "sum_stat.xlsx", sheet("`v'", replace) modify
    }
    However, the output says:
    Code:
    (dimension quant not found)
    (level max of dimension result not found)
    (level min of dimension result not found)
    
    Collection: DTable
          Rows: var[jcr]#quant
       Columns: result[frequency mean min max] percentile[p1 p5 p10 p25 p50 p75 p90 p95 p99]
    
    Your layout specification does not uniquely match any items. One or more of the following
    dimensions might help uniquely match items: colname, LP_quantile, statcmd, var.
    (collection DTable exported to file
    and there weren't any table, but it was written:
    Your layout specification does not uniquely match any items. One or more of the following dimensions might help uniquely match items: colname, LP_quantile, statcmd, var.
    Could someone help me with this issue and get the desired table, please?

    Thanks so much!

  • #2
    Your most critical error was to not enclose quant in single quotes, to allow for macro substitution. This should work:

    Code:
    local vars_sumstat jcr jdr
    local varlist `vars_sumstat'
    local quant LP_quantile
    
    collect clear
    
    dtable , ///
        sample( , stat(freq)) ///
        cont(`varlist', stat(mean min max p1 p5 p10 p25 p50 p75 p90 p95 p99)) ///
        by(`quant')
    
    foreach n of numlist 1 5 10 25 50 75 90 95 99 {
        collect remap result = percentile, fortags(result[p`n'])
        collect label levels percentile p`n' "`n'%", modify
    }
    
    collect label levels result frequency "N" mean "Mean" min "Min" max "Max", replace
    collect label dim percentile "Percentiles", modify
    collect style header percentile, title(label)
    collect style header result, title(hide) level(label)
    collect style cell result[mean], nformat(%3.2f)
    collect style cell result[min max]  percentile, nformat(%3.1f)
    collect style cell percentile#cell_type[column-header], border(bottom, color(black))
    collect style header var, title(hide) level(hide)
    
    foreach v of varlist `varlist' {
        collect addtags var[`v'], fortags(result[frequency]) replace
        collect title "Distribution of `v' by quantile group", clear
        collect layout (var[`v']#`quant') (result[frequency mean min max] percentile[p1 p5 p10 p25 p50 p75 p90 p95 p99])
        collect export "sum_stat.xlsx", sheet("`v'", replace) modify
    }
    from which you get
    Code:
    Distribution of jcr by quantile group
    ---------------------------------------------------------------
                 N Mean Min Max             Percentiles            
                                -----------------------------------
                                 1%  5% 10% 25% 50% 75% 90% 95% 99%
    ---------------------------------------------------------------
    LP_quantile                                                    
      0-10       3 0.23 0.1 0.4 0.1 0.1 0.1 0.1 0.2 0.4 0.4 0.4 0.4
      10-40      2 0.15 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.2 0.2
      40-60      2 0.15 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.2 0.2
      60-90      2 0.20 0.1 0.3 0.1 0.1 0.1 0.1 0.2 0.3 0.3 0.3 0.3
      90-100     2 0.25 0.1 0.4 0.1 0.1 0.1 0.1 0.3 0.4 0.4 0.4 0.4
      Total     11 0.20 0.1 0.4 0.1 0.1 0.1 0.1 0.2 0.3 0.4 0.4 0.4
    ---------------------------------------------------------------
    and
    Code:
    Distribution of jdr by quantile group
    ---------------------------------------------------------------
                 N Mean Min Max             Percentiles            
                                -----------------------------------
                                 1%  5% 10% 25% 50% 75% 90% 95% 99%
    ---------------------------------------------------------------
    LP_quantile                                                    
      0-10       3 0.30 0.2 0.5 0.2 0.2 0.2 0.2 0.2 0.5 0.5 0.5 0.5
      10-40      2 0.15 0.1 0.2 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.2 0.2
      40-60      2 0.20 0.1 0.3 0.1 0.1 0.1 0.1 0.2 0.3 0.3 0.3 0.3
      60-90      2 0.20 0.1 0.3 0.1 0.1 0.1 0.1 0.2 0.3 0.3 0.3 0.3
      90-100     2 0.25 0.1 0.4 0.1 0.1 0.1 0.1 0.3 0.4 0.4 0.4 0.4
      Total     11 0.23 0.1 0.5 0.1 0.1 0.1 0.1 0.2 0.3 0.4 0.5 0.5
    ---------------------------------------------------------------
    Also, Anne-Claire Jo you've posted several times on this forum by now, so I know you know this already -- please post your data extract using the dataex command, not by copy-pasting a table.
    Last edited by Hemanshu Kumar; 13 May 2025, 00:30.

    Comment


    • #3
      Hemanshu Kumar Thanks a lot for your help.
      And apologies for the data, there was confidentiality issue so I just had to create a sample table to describe the dataset.

      I tried with the code you provided, however, it is giving me this error:
      Code:
      . collect clear
      
      . 
      . dtable , ///
      >     sample( , stat(freq)) ///
      >     cont(`varlist', stat(mean min max p1 p5 p10 p25 p50 p75 p90 p95 p99)) ///
      >     by(`quant')
      no observations
      I ensure that the dataset is quite heavy and it's weird that there's no observation..

      Comment


      • #4
        That's strange. I'm not sure what is going on. The code works fine on the data extract you provided. Remember to not run the code in parts, but at one go, with your dataset in memory.

        Comment


        • #5
          Once again, reporting that the same code gives different results doesn't help us much without a data example to show what is going on at your end.

          As the FAQ Advice underlines, confidentiality as a constraint just implies that your data example should be based on realistic but fake data, where realistic means only: use variables and values and data layout that reproduce the problem.

          My only guess is that your local macros are not visible to the command that fails. You could check that by putting

          Code:
          set trace on 
          just before the command to see if the local macros are indeed visible. More on this pitfall at https://journals.sagepub.com/doi/pdf...36867X20931028

          That is, we need to see the end of the trace just before the error message.

          Comment


          • #6
            Originally posted by Nick Cox View Post
            My only guess is that your local macros are not visible to the command that fails.
            That was my first thought too. But I tested that on the data extract, and it still did not produce that error.

            Code:
            . clear
            
            . input str1 country      int year        byte ind        str6 LP_quantile        float(jcr       jdr)
            
                   country      year       ind  LP_quan~e        jcr        jdr
              1. A       2016    3       "0-10"  0.1     0.2
              2. A       2016    3       "10-40" 0.1     0.1
              3. A       2016    3       "40-60" 0.1     0.1
              4. A       2016    3       "60-90" 0.1     0.1
              5. A       2016    3       "90-100"        0.1     0.1
              6. A       2016    6       "0-10"  0.2     0.2
              7. A       2016    6       "10-40" 0.2     0.2
              8. A       2016    6       "40-60" 0.2     0.3
              9. A       2016    6       "60-90" 0.3     0.3
             10. A       2016    6       "90-100"        0.4     0.4
             11. A       2017    3       "0-10"  0.4     0.5
             12. end
            
            . 
            . /*
            > local vars_sumstat jcr jdr
            > local varlist `vars_sumstat'
            > local quant LP_quantile
            > */
            . collect clear
            
            . 
            . dtable , ///
            >     sample( , stat(freq)) ///
            >     cont(`varlist', stat(mean min max p1 p5 p10 p25 p50 p75 p90 p95 p99)) ///
            >     by(`quant')
            
            ---------
              Summary
            ---------
            N      11
            ---------

            Comment


            • #7
              Hemanshu Kumar Sorry for the confusion, I made an error in the previous commands (that are irrelvant here) and that made the observations dropped.
              But it's working super well! Thanks a lot for your help

              Comment

              Working...
              X