  • counting elements in a collection?

    Hi -- I have been learning all about collections and tables. I see how one can list out the dimensions of a collection and the levels of a dimension. But, when trying to figure out what is what in a collection, like debugging a routine, it would be really useful to be able to *count up the elements* that have a certain combination of tags (where a tag is -dim[level]- or -dim1[level1]#dim2[level2]-, etc.). Does anyone know how to do this? -- P

  • #2
    Paul:
    I would start by looking at the -group()- function of -egen-.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Right, but that only helps with the dataset, not with what is in the -collection-. Let me ask something else: is there a way to export a collection to a Stata dataset for inspection?



      • #4
        You can use -collect save- to save a collection to a JSON file, which you can then inspect with any plain-text editor.



        • #5
          I will try that! Thx.



          • #6
            Also see https://jsonformatter.org/json-parser and Stata's -help Tables Builder-. One may parse the JSON file and import it back into Stata, so that Stata commands can be used for searching, listing, etc. A first draft using Java that returns a CSV file for Stata follows below (a "Collection browser" web app could also be built with Python Dash).
            Code:
            * create a working directory; the code below downloads Java libraries (jar files) into it
            capt mkdir tempdir
            cd tempdir
            
            clear all
            use https://www.stata-press.com/data/r18/nhanes2l  
            quietly: collect: regress bpsystol weight i.diabetes i.sex
            quietly: collect: regress bpsystol weight diabetes##sex
            collect addtags extra[Rsquared], fortags(result[r2])
            collect addtags extra[N], fortags(result[N])
            collect recode result r2=_r_b N=_r_b
            collect layout (colname extra) (cmdset#result[_r_b _r_se])
            collect style use table-reg2-fv1
            collect style showbase off
            collect label levels extra Rsquared "R-squared"
            collect style cell result[_r_b], halign(right) nformat(%6.2f)
            collect style cell result[_r_se], halign(left) nformat(%6.2f) sformat("(%s)")
            collect style cell extra[Rsquared]#result, halign(right) nformat(%6.4f)
            collect style cell extra[N]#result, halign(right) nformat(%4.0f)
            collect style cell colname[_cons], border(bottom, pattern(single))
            
            tablesbuilder // nice tool
            
            * save JSON representation of collection
            collect save "mytable.json", replace
            
            *****************************************************
            * parse the JSON and write a flattened CSV for import into Stata
            *****************************************************
            
            * NB: this downloads the Jackson jar files (core, annotations, databind) from Maven Central
            
            local urlbase https://repo1.maven.org/maven2/com/fasterxml/jackson/core/
            
            foreach pkg in core annotations databind {
            
                capt noi confirm file jackson-`pkg'-2.19.1.jar 
            
                if ( _rc==601 ) {
                    
                    copy `urlbase'jackson-`pkg'/2.19.1/jackson-`pkg'-2.19.1.jar . 
                }
            }
            
            dir *.jar 
            
            *****************************************************
            * ad hoc, using Stata's Java (JShell) integration
            *****************************************************
            
            java
            
            /cp 'jackson-databind-2.19.1.jar'
            /cp 'jackson-core-2.19.1.jar'
            /cp 'jackson-annotations-2.19.1.jar'
            
            /* raises the warning "this feature is not allowed with Java/JShell integration" */
            
            import com.fasterxml.jackson.databind.ObjectMapper;
            import com.fasterxml.jackson.databind.JsonNode;
            import java.io.BufferedWriter;
            import java.io.File;
            import java.io.FileOutputStream;
            import java.io.OutputStreamWriter;
            import java.nio.charset.StandardCharsets;
            import java.util.ArrayList;
            import java.util.Iterator;
            import java.util.List;
            
            class JsonFlattenerCSV {
            
                public static class Row {
                    List<String> path;
                    String value;
                    Row(List<String> path, String value) {
                        this.path = path;
                        this.value = value;
                    }
                }
            
                public void flatten(JsonNode node, List<String> path, List<Row> rows) {
                    if (node.isObject()) {
                        Iterator<String> fields = node.fieldNames();
                        while (fields.hasNext()) {
                            String f = fields.next();
                            List<String> newPath = new ArrayList<>(path);
                            newPath.add(f);
                            flatten(node.get(f), newPath, rows);
                        }
                    } else if (node.isArray()) {
                        int i = 0;
                        for (JsonNode item : node) {
                            List<String> newPath = new ArrayList<>(path);
                            newPath.add(String.valueOf(i));
                            flatten(item, newPath, rows);
                            i++;
                        }
                    } else {
                        rows.add(new Row(new ArrayList<>(path), node.asText()));
                    }
                }
            
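                // quote a field (doubling embedded quotes) so that a ';' inside a key or value
                // does not shift the columns on import; embedded line breaks would in addition
                // need the bindquote(strict) option of -import delimited-
                static String csv(String s) {
                    return "\"" + s.replace("\"", "\"\"") + "\"";
                }
            
                public void exportToCSV(List<Row> rows, File file) throws Exception {
                    // the longest path determines the number of c0, c1, ... columns
                    int maxDepth = 0;
                    for (Row r : rows) {
                        if (r.path.size() > maxDepth) maxDepth = r.path.size();
                    }
                    try (BufferedWriter w = new BufferedWriter(
                            new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
                        // header: c0;c1;...;value
                        for (int i = 0; i < maxDepth; i++) w.write("c" + i + ";");
                        w.write("value\n");
                        // one line per leaf: path elements padded with empty fields, then the value
                        for (Row r : rows) {
                            for (int i = 0; i < maxDepth; i++) {
                                if (i < r.path.size()) w.write(csv(r.path.get(i)));
                                w.write(";");
                            }
                            w.write(csv(r.value) + "\n");
                        }
                    }
                }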
            
                public void run(String inputFile, String outputFile) throws Exception {
                    ObjectMapper mapper = new ObjectMapper();
                    JsonNode root = mapper.readTree(new File(inputFile));
                    List<Row> rows = new ArrayList<>();
                    flatten(root, new ArrayList<>(), rows);
                    exportToCSV(rows, new File(outputFile));
                }
            }
            
            JsonFlattenerCSV jf = new JsonFlattenerCSV();
            
            jf.run("mytable.json", "output.csv");
            
            end
            
            *****************************************************
            * The JSON is recursively flattened to rows, each row contains path elements + leaf value.
            * The max path depth determines how many c0, c1, ... columns you get.
            
            import delimited using output.csv, ///
                delimiters(";") /// the file written above is semicolon-separated
                varnames(1) /// The CSV header reflects the maximum depth with c0, c1, ... columns and a value column.
                favorstrfixed clear
            
            * BE AWARE of duplicate information in the data structure
            
            * count the distinct second-level keys (c1) within each top-level object (c0)
            levelsof c0, clean  
            qui foreach el in `r(levels)' {
                qui duplicates report c1 if c0 == "`el'"
                noi di "number of `el':"  `r(unique_value)'
            }
            
            * Example: reshape Items to one row per item (c1)
            frame put c0 c1 c2 value if c0 =="Items", into(items)
            frame change items
            qui {
                gen sorder=_n
                bysort c1 (sorder): gen i = _n
                drop sorder
                rename (c0 c1 c2) (object item key)
                reshape wide key value, i(object item) j(i) 
                compress
            }
            
            isid item  
            list in 1/5 
            list in -5/-1 
            tab key2    
            
            exit
            Code:
            . * BE AWARE of duplicate information in this data structure 
            . 
            . levelsof c0, clean  
            Items Labels Layout MacroResults MatrixResults ScalarResults StripeFactorDimNames Styles Tags ci-level collection-name
            
            . qui foreach el in `r(levels)' {
            number of Items:151
            number of Labels:1
            number of Layout:3
            number of MacroResults:10
            number of MatrixResults:1
            number of ScalarResults:12
            number of StripeFactorDimNames:2
            number of Styles:4
            number of Tags:13
            number of ci-level:1
            number of collection-name:1
            
            . 
            . *Example: for Items: reshape to one row per Item (c1)   
            . frame put c0 c1 c2 value if c0 =="Items", into(items)
            
            . frame change items
            
            . qui {
            
            . 
            . isid item  
            
            . list in 1/5 
            
                 +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
                 | object                                                                                                                                         item   key1               value1        key2   value2        key3   value3 |
                 |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
              1. |  Items    cmdset[1]#coleq[bpsystol]#colname[0.diabetes]#colname_remainder[_cons]#diabetes[0]#program_class[eclass]#result[_r_b]#result_type[matrix]      d                  0.0   omit-type     base   term-type   factor |
              2. |  Items   cmdset[1]#coleq[bpsystol]#colname[0.diabetes]#colname_remainder[_cons]#diabetes[0]#program_class[eclass]#result[_r_df]#result_type[matrix]      s                                                                |
              3. |  Items   cmdset[1]#coleq[bpsystol]#colname[0.diabetes]#colname_remainder[_cons]#diabetes[0]#program_class[eclass]#result[_r_se]#result_type[matrix]      d                  0.0   omit-type     base   term-type   factor |
              4. |  Items    cmdset[1]#coleq[bpsystol]#colname[1.diabetes]#colname_remainder[_cons]#diabetes[1]#program_class[eclass]#result[_r_b]#result_type[matrix]      d   14.341150538351911                                           |
              5. |  Items   cmdset[1]#coleq[bpsystol]#colname[1.diabetes]#colname_remainder[_cons]#diabetes[1]#program_class[eclass]#result[_r_df]#result_type[matrix]      d              10345.0                                           |
                 +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
            
            . list in -5/-1 
            
                 +---------------------------------------------------------------------------------------------------------------------------------------+
                 | object                                                               item   key1               value1   key2   value2   key3   value3 |
                 |---------------------------------------------------------------------------------------------------------------------------------------|
            147. |  Items   cmdset[2]#program_class[eclass]#result[rank]#result_type[scalar]      d                  5.0                                 |
            148. |  Items   cmdset[2]#program_class[eclass]#result[rmse]#result_type[scalar]      d   22.139057122174968                                 |
            149. |  Items    cmdset[2]#program_class[eclass]#result[rss]#result_type[scalar]      d    5069985.923078333                                 |
            150. |  Items   cmdset[2]#program_class[eclass]#result[title]#result_type[macro]      s    Linear regression                                 |
            151. |  Items     cmdset[2]#program_class[eclass]#result[vce]#result_type[macro]      s                  ols                                 |
                 +---------------------------------------------------------------------------------------------------------------------------------------+
            
            . tab key2    
            
                  2 key |      Freq.     Percent        Cum.
            ------------+-----------------------------------
              omit-type |         14      100.00      100.00
            ------------+-----------------------------------
                  Total |         14      100.00
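For readers who want to see the recursion in isolation, here is a stdlib-only Java sketch of the flattening step (not a replacement for the Jackson code above). It assumes the parsed JSON is held as plain Java values (Map for objects, List for arrays, anything else a leaf) instead of Jackson's JsonNode; the class FlattenSketch, its methods and the toy input are hypothetical, made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stdlib-only sketch of the recursive flattening done by JsonFlattenerCSV.
// Assumption: JSON is represented as Map (object), List (array), anything else = leaf.
class FlattenSketch {

    // One flattened row: keys/array indices from the root down to a leaf, plus the leaf value.
    static class Row {
        final List<String> path;
        final String value;
        Row(List<String> path, String value) {
            this.path = path;
            this.value = value;
        }
    }

    static void flatten(Object node, List<String> path, List<Row> rows) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                List<String> next = new ArrayList<>(path);
                next.add(String.valueOf(e.getKey()));   // object field name becomes a path element
                flatten(e.getValue(), next, rows);
            }
        } else if (node instanceof List) {
            List<?> arr = (List<?>) node;
            for (int i = 0; i < arr.size(); i++) {
                List<String> next = new ArrayList<>(path);
                next.add(String.valueOf(i));            // array element gets its index as path element
                flatten(arr.get(i), next, rows);
            }
        } else {
            rows.add(new Row(new ArrayList<>(path), String.valueOf(node)));
        }
    }

    // The longest path decides how many c0, c1, ... columns the CSV gets.
    static int maxDepth(List<Row> rows) {
        int d = 0;
        for (Row r : rows) d = Math.max(d, r.path.size());
        return d;
    }

    // One CSV line: path padded with empty fields up to depth, then the value.
    static String toLine(Row r, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            if (i < r.path.size()) sb.append(r.path.get(i));
            sb.append(';');
        }
        return sb.append(r.value).toString();
    }

    public static void main(String[] args) {
        // toy input, loosely modelled on the Items listing above (not a real collection file)
        Map<String, Object> item = new LinkedHashMap<>();
        item.put("d", "14.34");
        item.put("omit-type", "base");
        Map<String, Object> items = new LinkedHashMap<>();
        items.put("cmdset[1]#result[_r_b]", item);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("Items", items);
        root.put("ci-level", "95");

        List<Row> rows = new ArrayList<>();
        flatten(root, new ArrayList<>(), rows);
        int depth = maxDepth(rows);
        for (Row r : rows) System.out.println(toLine(r, depth));
    }
}
```

Run as is, main() prints `Items;cmdset[1]#result[_r_b];d;14.34`, `Items;cmdset[1]#result[_r_b];omit-type;base` and `ci-level;;;95`. The longest path has three elements, so shorter paths are padded with empty fields, which is why the c-columns in output.csv are partly empty.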
            [Attached image: JSONtree.png]
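To come back to the original question: once the Items keys are available (the distinct c1 values with c0 == "Items" in output.csv, or the item variable in the items frame), counting the elements with a given tag combination amounts to checking that an item's tags include every requested tag. A minimal sketch, assuming item keys have the dim[level]#dim[level]... form seen in the listing above; TagCounter and its methods are hypothetical names, not part of Stata or Jackson. Because a level may itself contain # (for example an interaction term), the sketch splits only on # outside square brackets.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: count collection items whose tags include every tag in a spec
// such as "cmdset[1]#result[_r_b]". Assumes keys are dim[level] tags joined by '#'.
class TagCounter {

    // Split "a[1]#b[x#y]" into ["a[1]", "b[x#y]"]: a '#' inside brackets is kept,
    // because a level may itself contain '#'.
    static List<String> splitTags(String key) {
        List<String> tags = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int depth = 0;
        for (char ch : key.toCharArray()) {
            if (ch == '[') depth++;
            else if (ch == ']') depth--;
            if (ch == '#' && depth == 0) {
                if (cur.length() > 0) tags.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(ch);
            }
        }
        if (cur.length() > 0) tags.add(cur.toString());
        return tags;
    }

    // True if the item carries every tag of the spec; order does not matter,
    // and tags are compared whole, not as substrings.
    static boolean matches(String itemKey, String spec) {
        Set<String> have = new HashSet<>(splitTags(itemKey));
        return have.containsAll(splitTags(spec));
    }

    // Number of distinct item keys matching the spec.
    static long count(Collection<String> itemKeys, String spec) {
        return new HashSet<>(itemKeys).stream().filter(k -> matches(k, spec)).count();
    }

    public static void main(String[] args) {
        // four item keys copied from the Items listing above
        List<String> keys = List.of(
            "cmdset[1]#coleq[bpsystol]#colname[0.diabetes]#colname_remainder[_cons]#diabetes[0]#program_class[eclass]#result[_r_b]#result_type[matrix]",
            "cmdset[1]#coleq[bpsystol]#colname[0.diabetes]#colname_remainder[_cons]#diabetes[0]#program_class[eclass]#result[_r_se]#result_type[matrix]",
            "cmdset[1]#coleq[bpsystol]#colname[1.diabetes]#colname_remainder[_cons]#diabetes[1]#program_class[eclass]#result[_r_b]#result_type[matrix]",
            "cmdset[2]#program_class[eclass]#result[rss]#result_type[scalar]");
        System.out.println(count(keys, "result[_r_b]"));
        System.out.println(count(keys, "cmdset[1]#diabetes[0]"));
        System.out.println(count(keys, "result_type[scalar]"));
    }
}
```

Run as is, main() prints 2, 2 and 1. In Stata itself, with the items frame from #6 in memory, a rough equivalent is -count if strpos("#" + item + "#", "#result[_r_b]#") & strpos("#" + item + "#", "#cmdset[1]#")-; padding with # makes strpos() match whole tags rather than any substring.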



