counting elements in a collection?

Paul Rathouz

Join Date: Oct 2023

Posts: 35
#1

counting elements in a collection?

03 Jul 2025, 07:02

Hi -- I have been learning all about collections and tables. I see how one can list out the dimensions of a collection and the levels of a dimension. But, when trying to figure out what is what in a collection, like debugging a routine, it would be really useful to be able to *count up the elements* that have a certain combination of tags (where a tag is -dim[level]- or -dim1[level1]#dim2[level2]-, etc.). Does anyone know how to do this? -- P
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#2

04 Jul 2025, 00:33

Paul:
I would start by taking a look at the -group()- function available from -egen-.

Kind regards,
Carlo
(Stata 19.0)
Paul Rathouz

Join Date: Oct 2023

Posts: 35
#3

04 Jul 2025, 06:36

Right, but that just helps with the data set, not with what is in the -collection-. Let me ask something else. Is there a way to export a collection to a Stata dataset for inspection?
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1401
#4

05 Jul 2025, 01:23

You can use -collect save- to write a collection to a JSON file, which you can then inspect in any plain text editor.
Paul Rathouz

Join Date: Oct 2023

Posts: 35
#5

05 Jul 2025, 05:56

I will try that! Thx.

Bjarte Aagnes

Join Date: Apr 2014
Posts: 784

Yesterday, 14:23

Also, see https://jsonformatter.org/json-parser and, in Stata, -help Tables Builder-. One may parse the JSON file and import it back into Stata to use Stata commands for searching, listing, etc. A first draft, using Java to return a CSV file for Stata, follows below. (A "Collection browser" web app could also be made using Python Dash.)

Code:

* code will create directory and install java code
capt mkdir tempdir
cd tempdir

clear all
use https://www.stata-press.com/data/r18/nhanes2l  
quietly: collect: regress bpsystol weight i.diabetes i.sex
quietly: collect: regress bpsystol weight diabetes##sex
collect addtags extra[Rsquared], fortags(result[r2])
collect addtags extra[N], fortags(result[N])
collect recode result r2=_r_b N=_r_b
collect layout (colname extra) (cmdset#result[_r_b _r_se])
collect style use table-reg2-fv1
collect style showbase off
collect label levels extra Rsquared "R-squared"
collect style cell result[_r_b], halign(right) nformat(%6.2f)
collect style cell result[_r_se], halign(left) nformat(%6.2f) sformat("(%s)")
collect style cell extra[Rsquared]#result, halign(right) nformat(%6.4f)
collect style cell extra[N]#result, halign(right) nformat(%4.0f)
collect style cell colname[_cons], border(bottom, pattern(single))

tablesbuilder // nice tool

* save JSON representation of collection
collect save "mytable.json", replace

*****************************************************
* parse JSON, return flattened CSV for import to Stata
*****************************************************

*NB: will download jar files.

local urlbase https://repo1.maven.org/maven2/com/fasterxml/jackson/core/

foreach pkg in core annotations databind {

    capt noi confirm file jackson-`pkg'-2.19.1.jar 

    if ( _rc==601 ) {
        
        copy `urlbase'jackson-`pkg'/2.19.1/jackson-`pkg'-2.19.1.jar . 
    }
}

dir *.jar 

*****************************************************
* ad-hoc using Stata java shell
*****************************************************

java

/cp 'jackson-databind-2.19.1.jar'
/cp 'jackson-core-2.19.1.jar'
/cp 'jackson-annotations-2.19.1.jar'

/* will raise warning "this feature is not allowed with Java/JShell integration" */

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.JsonNode;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class JsonFlattenerCSV {

    public static class Row {
        List<String> path;
        String value;
        Row(List<String> path, String value) {
            this.path = path;
            this.value = value;
        }
    }

    public void flatten(JsonNode node, List<String> path, List<Row> rows) {
        if (node.isObject()) {
            Iterator<String> fields = node.fieldNames();
            while (fields.hasNext()) {
                String f = fields.next();
                List<String> newPath = new ArrayList<>(path);
                newPath.add(f);
                flatten(node.get(f), newPath, rows);
            }
        } else if (node.isArray()) {
            int i = 0;
            for (JsonNode item : node) {
                List<String> newPath = new ArrayList<>(path);
                newPath.add(String.valueOf(i));
                flatten(item, newPath, rows);
                i++;
            }
        } else {
            rows.add(new Row(new ArrayList<>(path), node.asText()));
        }
    }

    public void exportToCSV(List<Row> rows, File file) throws Exception {
        int maxDepth = 0;
        for (Row r : rows) {
            if (r.path.size() > maxDepth) maxDepth = r.path.size();
        }
        try (BufferedWriter w = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            for (int i = 0; i < maxDepth; i++) w.write("c" + i + ";");
            w.write("value\n");
            for (Row r : rows) {
                for (int i = 0; i < maxDepth; i++) {
                    if (i < r.path.size()) w.write(r.path.get(i) + ";");
                    else w.write(";");
                }
                w.write(r.value + "\n");
            }
        }
    }

    public void run(String inputFile, String outputFile) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        JsonNode root = mapper.readTree(new File(inputFile));
        List<Row> rows = new ArrayList<>();
        flatten(root, new ArrayList<>(), rows);
        exportToCSV(rows, new File(outputFile));
    }
}

JsonFlattenerCSV jf = new JsonFlattenerCSV();

jf.run("mytable.json", "output.csv");

end

*****************************************************
* The JSON is recursively flattened to rows, each row contains path elements + leaf value.
* The max path depth determines how many c0, c1, ... columns you get.

import delimited using output.csv, ///
    delimiters(";") /// the Java code above writes semicolon-separated fields
    varnames(1) /// the CSV header holds c0, c1, ... (up to the maximum depth) plus a value column
    favorstrfixed clear

* BE AWARE of duplicate information in data structure 

levelsof c0, clean  
qui foreach el in `r(levels)' {
    qui duplicates report c1 if c0 == "`el'"
    noi di "number of `el':"  `r(unique_value)'
}

*Example: for Items: reshape to one row per Item (c1)
frame put c0 c1 c2 value if c0 =="Items", into(items)
frame change items
qui {
    gen sorder=_n
    bysort c1 (sorder): gen i = _n
    drop sorder
    rename (c0 c1 c2) (object item key)
    reshape wide key value, i(object item) j(i) 
    compress
}

isid item  
list in 1/5 
list in -5/-1 
tab key2    

exit

Code:

. * BE AWARE of duplicate information in this data structure 
. 
. levelsof c0, clean  
Items Labels Layout MacroResults MatrixResults ScalarResults StripeFactorDimNames Styles Tags ci-level collection-name

. qui foreach el in `r(levels)' {
number of Items:151
number of Labels:1
number of Layout:3
number of MacroResults:10
number of MatrixResults:1
number of ScalarResults:12
number of StripeFactorDimNames:2
number of Styles:4
number of Tags:13
number of ci-level:1
number of collection-name:1

. 
. *Example: for Items: reshape to one row per Item (c1)
. frame put c0 c1 c2 value if c0 =="Items", into(items)

. frame change items

. qui {

. 
. isid item  

. list in 1/5 

     +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
     | object                                                                                                                                         item   key1               value1        key2   value2        key3   value3 |
     |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  1. |  Items    cmdset[1]#coleq[bpsystol]#colname[0.diabetes]#colname_remainder[_cons]#diabetes[0]#program_class[eclass]#result[_r_b]#result_type[matrix]      d                  0.0   omit-type     base   term-type   factor |
  2. |  Items   cmdset[1]#coleq[bpsystol]#colname[0.diabetes]#colname_remainder[_cons]#diabetes[0]#program_class[eclass]#result[_r_df]#result_type[matrix]      s                                                                |
  3. |  Items   cmdset[1]#coleq[bpsystol]#colname[0.diabetes]#colname_remainder[_cons]#diabetes[0]#program_class[eclass]#result[_r_se]#result_type[matrix]      d                  0.0   omit-type     base   term-type   factor |
  4. |  Items    cmdset[1]#coleq[bpsystol]#colname[1.diabetes]#colname_remainder[_cons]#diabetes[1]#program_class[eclass]#result[_r_b]#result_type[matrix]      d   14.341150538351911                                           |
  5. |  Items   cmdset[1]#coleq[bpsystol]#colname[1.diabetes]#colname_remainder[_cons]#diabetes[1]#program_class[eclass]#result[_r_df]#result_type[matrix]      d              10345.0                                           |
     +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

. list in -5/-1 

     +---------------------------------------------------------------------------------------------------------------------------------------+
     | object                                                               item   key1               value1   key2   value2   key3   value3 |
     |---------------------------------------------------------------------------------------------------------------------------------------|
147. |  Items   cmdset[2]#program_class[eclass]#result[rank]#result_type[scalar]      d                  5.0                                 |
148. |  Items   cmdset[2]#program_class[eclass]#result[rmse]#result_type[scalar]      d   22.139057122174968                                 |
149. |  Items    cmdset[2]#program_class[eclass]#result[rss]#result_type[scalar]      d    5069985.923078333                                 |
150. |  Items   cmdset[2]#program_class[eclass]#result[title]#result_type[macro]      s    Linear regression                                 |
151. |  Items     cmdset[2]#program_class[eclass]#result[vce]#result_type[macro]      s                  ols                                 |
     +---------------------------------------------------------------------------------------------------------------------------------------+

. tab key2    

      2 key |      Freq.     Percent        Cum.
------------+-----------------------------------
  omit-type |         14      100.00      100.00
------------+-----------------------------------
      Total |         14      100.00
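
Coming back to the original question: once the Items are in a frame with one row per item, counting the elements that carry a given combination of tags comes down to string matching on -item-. What follows is only a minimal sketch. It assumes that the frame -items- built above is the current frame and that each -item- string lists all tags of one item, separated by #, as the listing above suggests. The variable names padded and result_level are made up for illustration.

```stata
* Sketch only: assumes the current frame is -items- (one row per item)
* and that -item- holds the #-separated tag list of that item.

* pad with # on both sides so that every tag can be matched as "#dim[level]#"
gen padded = "#" + item + "#"

* count the items tagged with both cmdset[1] and result[_r_b]
count if strpos(padded, "#cmdset[1]#") & strpos(padded, "#result[_r_b]#")

* tabulate the number of items for each level of one dimension, here -result-
gen result_level = ustrregexs(1) if ustrregexm(padded, "#result\[([^\]]*)\]#")
tab result_level
```

Padding with # keeps a tag such as result[r2] from matching inside a longer dimension name. The counts refer to the items as stored in the JSON file, so the warning above about duplicate information in the data structure applies here as well.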
