
  • Ahmed Abdalla, re #96: one helpful package for running many models across thousands of 'omics features is the parallel package. For example, to run mixed models across 5000+ proteins we can use code like that below. While Stata/MP is a multiprocessor version of Stata, I think the ability to put our cores to use manually when needed could be part of official Stata as well.

    Code:
    program define parfor
        levelsof gene, local(mygenes)
        global mygenes = r(levels) // a global macro is required
        foreach i of global mygenes {
            // mixed model here for protein differential expression
            xsvmat, from(r(table)') rowname(parm) names(col) idstr("`i'") /// 
                saving(.\parallel\gene_`i', replace)
        }
    end
    
    parallel initialize 8 // request 8 of 14 available cores
    parallel, prog(parfor) by(gene): parfor // by(gene) stops splitting a gene between cores
    
    // code to combine results files here
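
    A minimal sketch of the combining step, assuming xsvmat wrote one .dta file per gene into the parallel folder used above. The output file name all_genes.dta is hypothetical.

    ```stata
    * Hypothetical sketch: append the per-gene result files written by parfor.
    * Assumes the files are Stata datasets named gene_*.dta in ./parallel.
    local files : dir "./parallel" files "gene_*.dta"
    clear
    foreach f of local files {
        append using "./parallel/`f'"
    }
    * The output name is an assumption; it does not match the gene_* pattern,
    * so rerunning the loop will not pick it up.
    save "./parallel/all_genes.dta", replace
    ```

    The idstr() option in the xsvmat call stores the gene identifier in each file, so the appended rows should remain distinguishable by gene.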


    • I very much like the navigator and bookmarks. Would it be possible to create two types of bookmarks, e.g. for "chapters" and "subchapters" in the code? I know I can use capitals etc., but maybe this could be handled on the program side as well.


      • Originally posted by Matej Seifert
        I very much like the navigator and bookmarks. Would it be possible to create two types of bookmarks, e.g. for "chapters" and "subchapters" in the code? I know I can use capitals etc., but maybe this could be handled on the program side as well.

        You can create a hierarchy of bookmarks by using multiple # characters:

        **# top level

        **## second level

        **### third level

        I'm not sure how many levels are available; I've only gone up to 3 or maybe 4.


        • Originally posted by Bert Lloyd

          You can create a hierarchy of bookmarks by using multiple # characters

          **# top level

          **## second level

          **### third level

          I'm not sure how many levels are available; I've only gone up to 3 or maybe 4.

          Thank you, Bert. I was unaware of this.


          • Allow users to split marker labels (in twoway graphs) into multiple lines, either by word or by character count. Additionally, allow users to right-, center-, or left-justify the resulting lines.


            • Allow users to import JSON files into Stata. I see we have *.parquet now, but it would be nice to include JSON too.


              • Re: #111: JSON allows arbitrary nesting of key-value pairs, with values that can be scalars, arrays, or objects. There's no single, standard way to represent a rectangular (observations x variables) dataset in JSON, which makes it impossible to write a general, structure-agnostic importer. That said, I'd love to see a JSON parser/writer in Mata.


                • Originally posted by daniel klein
                  Re: #111: JSON allows arbitrary nesting of key-value pairs, with values that can be scalars, arrays, or objects. There's no single, standard way to represent a rectangular (observations x variables) dataset in JSON, which makes it impossible to write a general, structure-agnostic importer. That said, I'd love to see a JSON parser/writer in Mata.

                  A JSON writer shouldn't be too tricky to write for a simple rectangular dataset. What did you have in mind?
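
                  For illustration, a rough sketch of such a writer using file write rather than Mata. It assumes one particular layout (an array with one object per observation), which is only one of several possible JSON representations of a rectangular dataset. The file name auto.json and the choice of variables are arbitrary.

                  ```stata
                  * Sketch: write the data in memory as a JSON array, one object per observation.
                  * Assumptions: the "array of records" layout and the file name are illustrative.
                  * String values are NOT escaped, so embedded quotes or backslashes would
                  * produce invalid JSON; numbers use file write's default numeric format.
                  sysuse auto, clear
                  keep make price rep78
                  keep in 1/5

                  tempname fh
                  file open `fh' using "auto.json", write text replace
                  file write `fh' "[" _n
                  forvalues i = 1/`=_N' {
                      file write `fh' "  {"
                      local sep ""
                      foreach v of varlist _all {
                          * key: the variable name in double quotes (_char(34) is ")
                          file write `fh' "`sep'" _char(34) "`v'" _char(34) ": "
                          capture confirm string variable `v'
                          if _rc == 0 {
                              file write `fh' _char(34) (`v'[`i']) _char(34)
                          }
                          else if missing(`v'[`i']) {
                              file write `fh' "null"    // JSON has no missing-value code
                          }
                          else {
                              file write `fh' (`v'[`i'])
                          }
                          local sep ", "
                      }
                      * comma after every object except the last
                      file write `fh' "}" (cond(`i' < _N, ",", "")) _n
                  }
                  file write `fh' "]" _n
                  file close `fh'
                  ```

                  Value labels, dates, and extended missing values would each need a decision of their own, which is part of why a general-purpose importer is harder than a writer.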


                  • It would be easy to add flogit for fractional responses to telasso, which would make it comparable to teffects. Estimation is identical to logit, except that in the objective function y can take any value in [0,1] rather than just zero or one. Robust inference is already used, so this literally amounts to removing the data check that y must be binary in the logit version.
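
                    A small sketch of the point about the objective function, using existing commands outside telasso. The simulated data and all variable names are illustrative assumptions; the point estimates from the two commands should agree, because both maximize the Bernoulli quasi-log-likelihood with a logit mean.

                    ```stata
                    * Sketch with simulated data (names and parameter values are illustrative).
                    clear
                    set seed 12345
                    set obs 1000
                    generate x = rnormal()
                    generate y = invlogit(0.5*x + rnormal())   // fractional outcome strictly inside (0,1)

                    * Fractional logit; robust standard errors are the default
                    fracreg logit y x

                    * Same quasi-likelihood via glm; Stata notes the noninteger outcome and proceeds
                    glm y x, family(binomial) link(logit) vce(robust)
                    ```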


                    • Would it be possible to show a filtering history in the Data Editor window, to allow easy navigation between previously created filters?


                      • I would like to see some changes in how -collapse- handles value labels. Specifically, when we aggregate with (max), (min), (firstnm), (lastnm), (first), or (last), the values that appear in the collapsed variable are necessarily values that occurred in the original variable, or, at worst, are missing. Consequently it would be helpful if the value label of the original variable were carried over to the collapsed data set and applied to the variable.

                        Also, there is a defined sort order for string variables, so I don't understand why (max) and (min) are not allowed with them.
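
                        Until something like this is built in, a workaround along these lines should restore the label by hand. This is a sketch using the auto dataset; the label save step is a precaution for the case where the label definition does not survive collapse, which I have not verified either way.

                        ```stata
                        * Sketch: keep the value label of a variable aggregated with (max).
                        sysuse auto, clear
                        local lbl : value label foreign        // name of the attached value label
                        tempfile labels
                        label save `lbl' using "`labels'"      // precaution: store the definition as commands

                        collapse (max) foreign, by(rep78)

                        * Redefine the label only if it is no longer in memory, then reattach it
                        capture label list `lbl'
                        if _rc do "`labels'"
                        label values foreign `lbl'
                        ```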


                        • I feel I've asked this before, but I could not locate it in the search. How come there is no option "vce(none)" for essentially any Stata command? When running a simulation to evaluate bias and efficiency, one does not need to, or want to, compute the standard errors. For some commands and scenarios the extra computation is trivial but I think for some it's enough to noticeably slow the simulations. If I'm wrong, I'd be glad to know that, too. I'm thinking specifically of teffects.


                          • One workflow I often struggle with as a Stata user is extracting structured data from text. In medical research, we frequently receive patient information in the form of PDF medical charts. To process these data, I usually rely on colleagues who use Python to extract the text and convert it into variables, which I then import into Stata and further process with code to create structured variables.

                            With the increasing availability of large language models (LLMs), it would be extremely useful if Stata could support this workflow more directly. For example, Stata could allow users to extract text from PDFs into variables and then use integrated LLM-based tools to identify and extract relevant information (e.g., diagnoses, dates, laboratory values) and convert them into structured numerical or categorical variables.

                            Such functionality could significantly streamline the process of transforming unstructured clinical text into analysis-ready datasets, while keeping the entire workflow within Stata and maintaining reproducibility.


                            • #118 Ahmad Abbadi: Not speaking for StataCorp, but speaking as a user about how I want StataCorp to work!

                              I sense your need here -- which I guess is shared by others. But I think we all benefit long-term from extreme caution by StataCorp on what is supported. The company tends to be reluctant to adopt or to depend on third-party software because

                              1. StataCorp doesn't want to adopt functionality that it can't control.

                              2. StataCorp doesn't want to document functionality that may become rapidly out-of-date, not to say obsolete.

                              I have no special expertise or experience with LLMs, but one wild guess is that the landscape is changing fast and that even major players may disappear with minimal notice.

                              In any case, which functionality do you have in mind?

                              It's possibly not a sensitive issue any more, but I can recall some years of requests that StataCorp give special support to external text editors, with two extremes to the request:

                              A. Support my own favourite (naturally).

                              B. Support all common text editors.

                              By and large, StataCorp's solution was just to work intermittently on its own text editor (I mean the do-file editor).


                              • Re: #117 Jeff Wooldridge: Adding a vce(none) option would also be useful for non-parametric power analyses. This was mentioned in the Stata 18 wishlist.
                                Associate Professor of Finance and Economics
                                University of Illinois
                                www.julianreif.com
