  • #61
    Well, but I don't get it. A do-file is running automatically. At some point you get it to pop some result into the clipboard. Now you have some other program running, say in Python or C++, and it, too, is running automatically. So either you've got to program in a delay that makes that other program sleep until the clipboard has something to offer it, or you're going to have a mess. Also, if you have several programs in operation that also use the clipboard (I sometimes, for example, do word processing while I am waiting for a long Stata program to finish running), you need to program some way for the receiving program to figure out which clipboard it's supposed to capture data from. I don't know enough Python or C++ to say for sure these things can't be done--probably they can with some relatively obscure functions--but I certainly don't know how to do it.

    So the synchronization of these programs sounds like a nightmare to me, and my instinct would be to pass the result between programs using, say, a text file as a go-between: it's very easy for the receiving program to test for the existence of the anticipated file and, if necessary, wait for it to appear. The file-open functions themselves would provide the needed information.
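    That file-as-go-between approach is easy to sketch on the receiving side. Here is a minimal Python version (the file name, timeout, and polling interval are hypothetical, not anything Stata-specific):

```python
import os
import time

def wait_for_result(path, timeout=30.0, poll=0.5):
    """Poll until the file the do-file writes appears, then return its text.

    Raises TimeoutError if the file never shows up, so the receiving
    program doesn't hang forever if the Stata run fails.
    """
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result file at {path!r} after {timeout}s")
        time.sleep(poll)
    with open(path) as f:
        return f.read().strip()
```

    On the Stata side, the do-file would write the result to the agreed path as its last step; writing to a temporary name and then renaming avoids the receiver catching a half-written file.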



    • #62
      Clyde Schechter it can be a bit restrictive, yes, but it is fairly easy to implement if you have some experience in other languages. In the simplest case you just run the scripts sequentially, either on a standalone computer or in a virtual machine. Using text files is not always the best option; it depends on what you want to do, I guess. More generally, I think Stata needs to interface better with the operating system for more elaborate programs.
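      In Python, "run the scripts sequentially" can be as simple as a blocking subprocess chain, where each step only starts after the previous one has exited (the Stata command line shown is hypothetical; the batch invocation varies by platform and flavor):

```python
import subprocess

def run_pipeline(steps):
    """Run each command in order.

    subprocess.run blocks until the child exits, so step n+1 only
    starts after step n finishes, and check=True aborts the whole
    pipeline if any step returns a nonzero exit code.
    """
    for argv in steps:
        subprocess.run(argv, check=True)

# Hypothetical two-step pipeline:
# run_pipeline([["stata-mp", "-b", "do", "analysis.do"],
#               ["python3", "postprocess.py"]])
```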



      • #63
        It would be great if the hard-coded limit of 10,000 results for the Mata function dir() was lifted (see this post that mentions that unicode translate is affected by the limit). The issue also affects filelist (SSC); see this recent thread about a user with 2,000,000 text files split into 20 (annual) directories.



        • #64
          Originally posted by Weiwen Ng View Post
          ...Then, automating the bootstrapped LR test described in the paper linked would be very nice. It appears that the bootstrapped LR test is the current gold standard for model selection.
          ...
          I see that I failed to supply the paper where the bootstrapped LR test for latent class models was discussed. This paper discusses the issue, and it also describes some stopping rules that Mplus uses (if you keep getting -2LL differences in favor of the k-class model vs the k-1 class model, you can presumably stop short of doing 1000 bootstraps to prove your point). As it turns out, the bootstrap LR test is simple to perform: you bootstrap latent class/profile models for k classes and k-1 classes, then calculate -2 * the difference in log likelihoods. In fact, you could do this in Stata 15, but for the fact that the program prohibits the bootstrap prefix from accepting GSEM with categorical latent variables.

          On the one hand, this is for good reason. There are numerous convergence issues, and the order in which the latent classes are identified will not be the same from model to model. That explains why we don't get to use the bootstrap to estimate parameter standard errors here.

          On the other hand, it really does seem like the BLRT is a very well-accepted test for the correct number of classes, to the point where people in the know might question why a paper produced by a Stata user didn't include BLRT statistics. I feel like the BLRT probably provides information that is complementary to the various information criteria. And with this test, you don't care what order the classes get assigned; all you want is the empirical distribution of -2LL. And you can feed the bootstrapped datasets the parameters from your validated model (perhaps with some random noise added), and/or each observation's predicted class probabilities from the k- and k-1 class models, to ensure convergence.
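          For concreteness, once you have the refitted log likelihoods from each bootstrap replicate, the test statistic and its empirical p-value amount to no more than this (a sketch in Python; the function and the numbers in the test are illustrative, not part of any Stata command):

```python
import numpy as np

def blrt_pvalue(ll_small, ll_big, observed_stat):
    """Empirical p-value for the bootstrap likelihood ratio test.

    ll_small, ll_big: log likelihoods of the (k-1)- and k-class models,
    each refit on bootstrap samples generated under the (k-1)-class model.
    observed_stat: -2 * (LL_{k-1} - LL_k) computed from the real data.
    """
    boot_stats = -2.0 * (np.asarray(ll_small) - np.asarray(ll_big))
    # Share of bootstrap replicates at least as extreme as observed.
    return float(np.mean(boot_stats >= observed_stat))
```

          With, say, 500 replicates, the p-value is just the share of bootstrap -2ΔLL values at least as large as the observed one, which is why class relabeling across replicates doesn't matter for this particular test.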

          Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

          When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



          • #65
            I suggest adding an option to margins to make it compute the prediction interval for an individual Y at a given value of X. Here are a couple of relevant threads. Cheers,
            Bruce
            --
            Bruce Weaver
            Email: [email protected]
            Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
            Version: Stata/MP 18.0 (Windows)
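            To be clear about what is being asked for: the prediction interval for an individual Y is wider than the usual confidence interval for the mean prediction, because it also carries the residual variance. A sketch of the textbook simple-regression version in Python (illustration only; the critical t value is passed in, e.g. from invttail(n-2, .025)):

```python
import numpy as np

def prediction_interval(x, y, x0, tcrit):
    """Prediction interval for a single new Y at x0, simple OLS.

    tcrit is the critical t value with n-2 df, supplied by the caller
    (e.g. invttail(n-2, .025) in Stata).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))  # root MSE
    # The leading 1 under the square root is what distinguishes the
    # prediction interval from the confidence interval for the mean.
    se = s * np.sqrt(1.0 + 1.0 / n + (x0 - x.mean()) ** 2 / sxx)
    yhat = b0 + b1 * x0
    return yhat - tcrit * se, yhat + tcrit * se
```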



            • #66
              Inclusion of standard errors in the coeflegend option to regress. Currently it only displays the betas.



              • #67
                A problem that I face more and more is managing large chunks of code in the do-file editor. It can literally be a nightmare to navigate through different parts of a program consisting of hundreds of lines of code, or a do-file containing multiple programs.

                I suggested in a previous post of mine implementing permanent bookmarks, but I am now convinced that some sort of section-folding solution, similar to the one implemented in RStudio, is required. Not sure if this has already been suggested.



                • #68
                  Originally posted by Belinda Foster View Post
                  A problem that I face more and more is managing large chunks of code in the do-file editor. It can literally be a nightmare to navigate through different parts of a program consisting of hundreds of lines of code, or a do-file containing multiple programs.

                  I suggested in a previous post of mine implementing permanent bookmarks, but I am now convinced that some sort of section-folding solution, similar to the one implemented in RStudio, is required. Not sure if this has already been suggested.

                  FWIW, I address this in two ways:
                  1. I edit my code in an external editor (Sublime Text, but probably Atom, Visual Studio Code, or any free editor would work). This gives me a lot of things out of the box, including permanent bookmarks. And to be honest, I don't expect Stata to be able to match all the features of standalone code editors.
                  2. I split my code into smallish chunks. Huge files are unwieldy and play poorly with git and version control. Instead, I have a master.do file that calls a bunch of smaller do-files.



                  • #69
                    Sergio Correia I do not expect Stata to match all the features of an external editor, but suggesting that it is not worthwhile for StataCorp to continue improving the built-in editor does not sound right. I have tried (1), and in fact I use VSC for programming in other languages, but for Stata it is not worth the hassle if I cannot directly run the code and see the results. It is simply not practical. I agree with (2), but this only works for simple do-files and not for more complex programs in ado-files.



                    • #70
                      Originally posted by Belinda Foster View Post
                      Sergio Correia I do not expect Stata to match all the features of an external editor, but suggesting that it is not worthwhile for StataCorp to continue improving the built-in editor does not sound right. I have tried (1), and in fact I use VSC for programming in other languages, but for Stata it is not worth the hassle if I cannot directly run the code and see the results. It is simply not practical. I agree with (2), but this only works for simple do-files and not for more complex programs in ado-files.
                      At the risk of derailing this thread, one solution I have found is to place my code in if-blocks. Say I have a do-file that does a lot of side analyses; I'll put each one in an if-block, and at the top of the do-file I have a list of "parameters" determining which sections to execute. This has two advantages: I can easily set which part of the do-file is executed (I don't like selecting code), and I can close any sections I'm not interested in.

                      Code:
                      ** Parameters
                      global prepData "0"
                      global combineData "1"
                      
                      [...]
                      
                      ** Prep data
                      if "$prepData" == "1" {
                      [..]
                      }
                      
                      ** Combine data
                      if "$combineData" == "1" {
                      [..]
                      }



                      • #71
                        Originally posted by Belinda Foster View Post
                        for Stata it is not worth the hassle if i cannot directly run the code and see the results...
                        I think you can actually do it, through Stata automation. Are you on Windows or OSX? (I know it can be done on those, not sure about Linux)



                        • #72
                          Sergio Correia You can do it in Linux by piping commands through a console session, but unlike Windows or OSX you can't do it through the GUI.

                          Speaking of which, I would actually really appreciate the ability to pipe commands to the Stata GUI on Linux.



                          • #73
                            I'd like a data type option for fvrevar and tsrevar. A data type option for fvrevar in particular might sound weird - why would you want to store binary factor variable tempvars as doubles? - but there's a reason.

                            I keep running into situations where I create factor variable tempvars that have to be transformed along with the rest of the model - e.g., the within transformation in a fixed effects model - and I want to use double precision for the transformed variables. recast is slow, and other ways of dealing with it are a hassle.

                             Much better would be if the fvrevar/tsrevar temporary variables were created as doubles when this is specified as an option.



                            • #74
                               Some kind of timer for how long estimations are going to take. I know this is massively dependent on what else the user is doing on their computer and the amount of RAM they have, but it would be useful to get a rough estimate, or a progress bar that doesn't display an amount of time but just shows whether the estimation is, for example, 10% or 80% complete.

                              Whenever I do twoway lowess graphs I have the issue of not knowing how far along they are.



                              • #75
                                Well, I can envision how one might implement a progress bar for lowess graphs. But I cannot imagine how one would do this for estimation commands. It would be great if one could, but likelihood functions can be very irregular and bizarre, and I don't think there is any way to know how close to or far from the maximum you are until you actually arrive there, nor even whether you will ever get there.

