
  • #76
    Mike Murphy Well, this is about saying what people want and you did.

    The rest is secondary:

    My mind boggles at wanting (to be able) to work with variable names 500 characters long that differ in the last character, but there you go.

    It's not a zero-sum game, as accommodating a small number of users who want what may be a very difficult thing to implement is not what I want StataCorp to spend their time doing -- unless it's my want!

    I think every user has incentives to make the output they get as informative and easy to read as possible -- for them and their readers (anyone else who has to work with their output). How much you're paying yourself is undoubtedly personal context.



    • #77
      "I'm paying for software, I want features that fit my use case."
      I pay for mine, too, and I also want features that fit my use case. Problem is, my use cases are probably different from yours, and from Nick's, and from everybody else's. And if StataCorp attempted to give all of us what we want, the package would probably have to be priced at $50,000 so they could support an army of programmers to create and maintain the world's most bloated piece of software. And performance would be in the tank. So they have to make choices.

      Despite the fact that, as best I can recall, not a single one of my wish list posts has ever been fulfilled, I think they do a great job of identifying the really important improvements in the package, including things that I never thought of but which turn out to be really useful. And I think they do a great job of identifying the needs of the large bulk of users. For example, I don't recall anybody in the Wishlist threads ever requesting the whole suite of DID estimators that have recently been introduced to Stata. And I don't actually use them myself. But you can see from the threads on this forum that lots of people are using them, and while there is a learning curve involved, they are providing the huge number of users who do all manner of DID estimations with a simpler way of specifying and estimating their models. I think this is a good example of a business recognizing its customers' needs better than the customers themselves did.

      Moreover, many needs are handled by user-written programs that are readily available from SSC, Stata Journal, or other sources. True, you cannot write an ado-file that will change the 32 character word length limit. But most wishes that StataCorp passes over can be fulfilled in this way.



      • #78
        Clyde Schechter I agree; naturally, StataCorp cannot (and should not) attempt to implement every request in these threads. I was asked to provide a rationale for my feature request, and was attempting to do so.



        • #79
          I would like to see two additional Mata functions:

          st_datalabel()

          should complement st_varlabel() and st_varvaluelabel() and provide access to the dataset label.

          st_label_dir()

          should return a column vector of the value label names defined in memory, i.e., replicate Stata's label dir in Mata.
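
          In the meantime, something close can be pieced together from existing building blocks. A minimal sketch (the my_* names are made up, and it assumes the usual stata()/st_global() round trip is acceptable):
          Code:
          mata:
          // Dataset label: stash it in a global via -: data label-, then read it back
          string scalar my_datalabel()
          {
              string scalar lbl
              stata("global MY_DLBL : data label")
              lbl = st_global("MY_DLBL")
              stata("macro drop MY_DLBL")
              return(lbl)
          }

          // Value label names: run -label dir- and split its r(names) result
          string colvector my_label_dir()
          {
              stata("quietly label dir")
              return(tokens(st_global("r(names)"))')
          }
          end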



          • #80
            Support for reading Parquet files would be really helpful. Right now I use CSV files to build a link between Python and Stata. It takes 20 minutes each time, while Parquet takes just a few seconds. Please, please, please support the Parquet format as soon as possible.
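
            A workaround sketch under current versions, assuming a Python installation with pandas and pyarrow available on the system path (file names here are hypothetical): convert the Parquet file to .dta once and then -use- it, instead of round-tripping through CSV.
            Code:
            * one-off conversion: Parquet -> .dta via pandas
            shell python -c "import pandas as pd; pd.read_parquet('mydata.parquet').to_stata('from_parquet.dta', write_index=False)"
            use from_parquet, clear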



            • #81
              My suggestion is related to structural equation modelling (the -sem- command). It would be great if we could type an -sem- command and have Stata produce a sketch (or path diagram) for it, and vice versa.



              • #82
                XGBoost XGBoost XGBoost XGBoost XGBoost XGBoost XGBoost XGBoost XGBoost XGBoost LightGBM



                • #83
                  Allow the append option when using collect export to .docx.
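
                  Until then, one way to get several tables into a single .docx is to route collections through putdocx, which accumulates content in one document before saving. A sketch (the file name is hypothetical, and it assumes the putdocx collect bridge introduced alongside collect):
                  Code:
                  putdocx begin
                  putdocx collect            // add the current collection's table
                  * ... build or modify the collection for the next table here ...
                  putdocx collect            // add it to the same document
                  putdocx save combined.docx, replace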



                  • #84
                    Allow code-based editing of graphs.
                    After the initial plot, it would be nice to be able to change features by issuing something like
                    graph edit, xtitle("new title")

                    That is a trivial example that is easy to do in the Graph Editor, but more complicated changes are either impossible or very tedious (e.g., changing the y-axis labels on a Kaplan-Meier plot from the 0-to-1 scale to percentages).
                    Code-based graph editing would also allow a programmer to break a twoway command into multiple steps, which could be helpful for sanity.

                    Sorry if this has been posted before. (And yes, I am aware of the unofficial gr_edit and serset commands and the Graph Editor's recording feature.)
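
                    For the Kaplan-Meier example specifically, a workaround that avoids the editor altogether is to specify the labels at plot time. A sketch, assuming the data are already stset:
                    Code:
                    * relabel the 0-to-1 survival axis as percentages when the graph is drawn
                    sts graph, ylabel(0 "0%" .25 "25%" .5 "50%" .75 "75%" 1 "100%")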



                    • #85
                      I agree with #60. Not computing standard errors could give huge speedups on many occasions. My other wishes are the same as for Stata 18 (I think none of these changed w.r.t. Stata 17):

                      1. Implement the estimation of otherwise identical regression equations with only different outcome variables using a single command (as in fixest for R). Looping over different outcome variables currently repeats unnecessary matrix calculations (see the sketch after this list).
                      2. Allow import delimited and export delimited to read/write compressed files.
                      3. Native support for parquet files.
                      4. frlink should support 1:m
                      5. Append for frames
                      6. Collapse to frame
                      7. Allow frlink and merge based on label values. Currently, it's not possible to encode and label string IDs from different data sources before merging because the keys won't match. This is very inconvenient with large data sets.
                      8. Related to the previous part, a new data type 'category' would probably be a good idea. Or a new, more efficient data type for strings (like in data.table/datatable for R and Python).
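
                      For item 1, the current workaround looks roughly like the sketch below (variable names are made up); every pass rebuilds the whole design matrix even though only the outcome changes:
                      Code:
                      * same regressors, different outcomes: each call redoes the full computation
                      foreach y of varlist y1 y2 y3 {
                          quietly regress `y' x1 x2 x3, vce(robust)
                          estimates store m_`y'
                      }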



                      • #86
                        I am working in a secure environment where the data is stored on a remote machine and accessed from Stata via the local network.

                        During the analysis I am running Stata code coming from 3 sources: StataCorp (Stata's base code), my own code, and 3rd party code. Any of the 3 may rely on preserve/restore commands. The concern is that an event that crashes the computer may leave the files, preserved by Stata in the temporary folder, exposed and easily accessible in a subsequent session.

                        While I can modify my own code to avoid preserve/restore, I can't modify the code from the other 2 sources (and I wouldn't want to modify every small procedure which relies on them, and then monitor all of the updates and changes in those tools).
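
                        For code one controls, a frames-based pattern can already keep the working copy in memory rather than on disk. A sketch (the frame and variable names are hypothetical):
                        Code:
                        * work on an in-memory copy so the original never needs preserving to disk
                        frame copy default work, replace
                        frame work {
                            keep if !missing(somevar)    // destructive steps happen only in the copy
                            * ... analysis on the modified copy ...
                        }
                        frame drop work
                        * the data in the default frame remain untouched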

                        My preferred solution, which I am asking StataCorp to consider implementing in future versions, would be a way to tell Stata to preserve in memory, something like:
                        Code:
                        set mempreserve on
                        after which all preserves would be stored in RAM rather than saved to any disk, without any modification to the existing code.

                        I am confident I have ample memory on the executing machine to store 100s of copies of the datasets being processed, so that capacity should not be an issue, while data privacy is of a higher priority.

                        I realize that some commands use tempfile+save+use, rather than preserve+restore, to achieve essentially the same thing. I'll have to deal with them individually.

                        I know I can create a RAM drive and point Stata to save the temporary files there, by changing the corresponding environment variable, but creating a RAM drive will require admin rights on all machines where this is executed, and thus it remains the last resort measure.

                        Any relevant advice on avoiding data exposure from Stata to other processes is welcome.

                        Thank you,
                        Sergiy Radyakin



                        • #87
                          Is there a way that I can drop the date "01jan1960" attached to the time?
                          Example:

                          01jan1960 00:01:18
                          so that I have only 00:01:18 standing alone, without 01jan1960 in the background?

                          I used the format below:
                          format timediff %tcHH:MM:SS

                          I got the desired results all right, but it still shows "01jan1960 00:01:18" in the contents.

                          And surprisingly, when I copied it to Excel, the formula bar shows 12:01:18 AM while the cell itself shows 00:01:18.
                          I attached the example data set.

                          It is my wish that Stata considers this scenario in a subsequent version.

                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input double(MRESPONSE_ID timediff)
                          1 78000
                          2 515000
                          3 28000
                          4 93000
                          end
                          format %tc timediff
                          format timediff %tcHH:MM:SS
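
                          One workaround under current versions is to carry the clock time as a string, so nothing date-like survives the trip to Excel. A sketch using the variable above:
                          Code:
                          * keep only the HH:MM:SS portion as text
                          generate timestr = string(timediff, "%tcHH:MM:SS")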

                          Thank you.
                          Great Job!
                          Sham



                          • #88
                            Expanding a wish from one posted much earlier for v18 (or v17 or...): It would simplify some programming if cumulative distribution functions for multiparameter distributions were available that accepted all parameters as arguments, e.g., expand
                            Code:
                            normal(x)
                            to
                            Code:
                            normal(x,m,s)
                            and
                            Code:
                            gammap(a,x)
                            to
                            Code:
                            gammap(a,b,g,x)
                            The help file
                            Code:
                            help density_functions##gamma
                            notes that
                            Probabilities for the three-parameter gamma distribution (see gammaden()) can be calculated by shifting and scaling x; that is, gammap(a,(x - g)/b).
                            But I see no compelling reason not to provide functions that do this directly.
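
                            For reference, the shift-and-scale identities the wish would wrap look like this in current Stata (the numbers are arbitrary):
                            Code:
                            * normal with mean m = 2 and sd s = 0.5, evaluated at x = 1.5
                            display normal((1.5 - 2)/0.5)
                            * three-parameter gamma with shape a = 2, scale b = 0.75, shift g = 0, at x = 1.5
                            display gammap(2, (1.5 - 0)/0.75)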

                            Note: I am still using v17, so this may have already been implemented in v18.



                            • #89
                              Endorse #60 and #85 regarding suppressing standard error computation. margins allows this with the nose option. Why not make it available for other commands, or at least those commands where computing SEs may entail nontrivial computation beyond that required for estimation of the "main" parameters?
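
                              The precedent cited above, for reference; this runs after any estimation command that margins supports:
                              Code:
                              * average marginal effects without the delta-method standard errors
                              margins, dydx(*) nose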



                              • #90
                                Collapse command: replace (semean) with missing values if the variable is binary.

                                I have the following example, which I currently handle manually. When I plot means with 95% CIs by age group, I think Stata should by default always use the correct standard errors. When the variable is a proportion (i.e., its only possible values are 0 and 1), sebinomial should be used. I do not see a use for semean with a proportion.

                                Code:
                                collapse (mean) `y'_mean = `y' (sd) `y'_sd = `y' (semean) `y'_sem = `y' ///
                                    (sebinomial) `y'_seb = `y' (max) `y'_max = `y', by(`agegrp')
                                * if the max is 1 (i.e., y is binomial/a proportion), use seb in the
                                * confidence interval calculation, otherwise sem
                                * (the -if- command checks the first observation, hence the [1])
                                if `y'_max[1] == 1 {
                                    local se "seb"
                                }
                                else {
                                    local se "sem"
                                }
                                gen upper = `y'_mean + 1.96 * `y'_`se'    // 95% confidence interval
                                gen lower = `y'_mean - 1.96 * `y'_`se'
                                twoway ///
                                    || rcap upper lower `agegrp' ///
                                    || scatter `y'_mean `agegrp'

