Wishlist for Stata 18

Mead Over

Join Date: Sep 2014

Posts: 112
#361

29 Apr 2022, 14:06

Wish: To facilitate future replication of Stata results, a StataCorp utility to help users "freeze" a collection of user-contributed ADO, Mata and MLIB programs for publication/posting with the Stata DO files that call those programs

Many of the more responsible journals now require that referees and eventual readers be able to replicate the analytical results in submitted papers. Through support for the Stata Journal and periodic Stata user conferences, StataCorp also encourages and helps users to produce and publish Stata programs that extend Stata's capabilities in small and large ways. But when a researcher later attempts to replicate the results in a published paper, the community-contributed programs originally used might not be available in the same version, or at all.

Thus a user wishing to enable future replication of a set of interlocking DO files and community-contributed ADO/Mata/Mlib files must figure out how to assemble and "freeze" the community-contributed ADO files used in a given research project. This is doable and many users are already doing it, each in his or her own way. But it would be great if there were a set of StataCorp supported conventions and utilities to standardize the process. (Ideally some journals, starting with Stata Journal, would even require that Stata users conform to such StataCorp-recommended conventions and use the recommended program-freezing utilities.)

diana gold's SSC program -dependencies- seems to me to be an excellent model for a Stata-supported way to "freeze" (her word) a set of user-contributed Stata ADO/MATA/MLIB files in order to facilitate future replication of research results. I like the fact that it allows the future replicator to temporarily modify their -adopath- while replicating, and then to undo this change and delete the replication-specific collection of ADO/Mata/MLIB programs at will. Other SSC programs that accomplish some of the same objectives include -zippkg-, -rqrs-, -which_version-, -copycode-, -adolist- and -usepackage-.
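For readers unfamiliar with the idea, here is a minimal sketch of one way to freeze community-contributed files by hand, using only official commands. It is not how -dependencies- or any of the other SSC programs work. The folder name replication/ado and the package -fre- are placeholders chosen for illustration.

```stata
* Sketch only: "replication/ado" is a hypothetical project folder.
local frozen "`c(pwd)'/replication/ado"
capture mkdir "`c(pwd)'/replication"
capture mkdir "`frozen'"

* Remember the current PLUS location so the change can be undone.
local oldplus "`c(sysdir_plus)'"

* Point PLUS at the project folder: installs land there and Stata searches it.
sysdir set PLUS "`frozen'"
ssc install fre, replace      // fre is only an example package

* ... run the replication do-files here ...

* Restore the original PLUS directory afterwards.
sysdir set PLUS "`oldplus'"
```

The redirect lasts only for the current session, and the frozen folder can be zipped and posted alongside the do-files. Other ado-path entries, such as PERSONAL, are still searched, so a stricter setup would need to deal with those as well.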

I take the point that results produced using the updated community-contributed ADO files may differ from those originally published exactly because the ADO file's bugs have been fixed. The new results might be "better". But I think this is an argument in favor of, rather than against, requiring authors to publish their frozen ADO files as part of a journal submission. I think that replicators need to start with a script that reproduces as exactly as possible the published result, before they experiment to discover the sensitivity of those results to different approaches and/or data. It is the replicator's responsibility to discover that the newer version of the community-contributed program produces a different result.

diana gold, daniel klein, Nick Cox, Sergio Correia and others have discussed these issues extensively in these threads:
https://www.statalist.org/forums/for...lable-from-ssc
https://www.statalist.org/forums/for...ge-require-ado
https://www.statalist.org/forums/for...o-local-folder
https://www.statalist.org/forums/for...os#post1523554
https://www.statalist.org/forums/for...79#post1662079

Last edited by Mead Over; 29 Apr 2022, 14:55.
9 likes
Jean-Michel Galarneau

Join Date: Aug 2018

Posts: 39
#362

04 May 2022, 09:59

A slightly bigger arrow on the Replace All button in the Do-file Editor, so that "Replace All in Selection" is easier to reach.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2400
#363

05 May 2022, 07:45

An extension of the existing -tabulate- and -table- commands: an option that would look at the value labels attached to the input variables and include every level defined in those labels in the tabulation, even when a level has no observations. In the one-dimensional case, this is closest to Ben Jann's -fre- command with the -i()- option, which lets the user include specific values that would otherwise have zero frequency. One related request of mine has now been implemented as the new -table, zerocounts- option. However, this option is limited in scope to cases where zero counts are implied by the cross-tabulation.

There is no support for this currently in any of the official commands, but this can be a useful feature when trying to create tabulations where you specifically want to show zero frequency counts.

As a quick example to demonstrate this, consider the following.

Code:

tabi 0 1 2 \ 0 3 4

The output eliminates the first column because there are no observations in any of those cells.

Code:

. tabi 0 1 2 \ 0 3 4

           |          col
       row |         2          3 |     Total
-----------+----------------------+----------
         1 |         1          2 |         3
         2 |         3          4 |         7
-----------+----------------------+----------
     Total |         4          6 |        10
4 likes
Lili Bulfone

Join Date: Apr 2021

Posts: 11
#364

06 May 2022, 23:09

It'd be useful to have some documentation on the gr_edit command
5 likes
Clyde Schechter

Join Date: Apr 2014

Posts: 30084
#365

07 May 2022, 10:02

`
EDIT: Sorry--no intended post here. Cat walked on the keyboard and somehow that led to a save.
7 likes
Ali Atia

Join Date: May 2020

Posts: 737
#366

07 May 2022, 14:35

Seconding #364. At present it is easy enough to record actions in the graph editor and copy and paste them into a gr_edit command, but it would be great to have documentation on how to write those commands from scratch.

Better yet, it would be great to be able to do everything that can be done via the graph editor within the original graph or twoway command used to generate a graph, which sometimes seems impossible (recent example here: https://www.statalist.org/forums/for...bol-color-size)
Clyde Schechter

Join Date: Apr 2014

Posts: 30084
#367

09 May 2022, 15:21

It would be nice to have a convenient one-step command to copy value labels from one frame to another. You can accomplish it by copying any variable to which that value label is attached (assuming there is one, which isn't always the case) into the second frame, then applying the label to the desired other variable, and then dropping the variable you brought in. Or you can -label save- the label from the original frame to a tempfile and -run- it in the other. But it would be convenient to be able to do it in a single command.
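The -label save- route mentioned above might look like the following minimal sketch. It assumes a value label named lblname is defined in the current frame and that a frame named other_frame already exists; both names are placeholders.

```stata
* Sketch only: lblname and other_frame are hypothetical names.
tempfile lbl

* Write the label definition as Stata commands to a temporary file.
label save lblname using `lbl'

* Execute those commands quietly inside the target frame.
frame other_frame: run `lbl'
```

This is still three lines rather than one, which is the point of the wish.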
2 likes
daniel klein

Join Date: Mar 2014

Posts: 3845
#368

10 May 2022, 00:38

Originally posted by Clyde Schechter

It would be nice to have a convenient one-step command to copy value labels from one frame to another. You can accomplish it by copying any variable to which that value label is attached (assuming there is one, which isn't always the case) into the second frame, then applying the label to the desired other variable, and then dropping the variable you brought in. Or you can -label save- the label from the original frame to a tempfile and -run- it in the other.

It could be even simpler than that. To copy a value label to the current frame:

Code:

frame other_frame : mata : st_vlload("lblname", values=., labels="")
mata : st_vlmodify("lblname", values, labels)

To copy a value label to another frame:

Code:

mata : st_vlload("lblname", values=., labels="")
frame other_frame : mata : st_vlmodify("lblname", values, labels)

I see how a more general and robust approach would be convenient.
3 likes
Niels Henrik Bruun

Join Date: Aug 2014

Posts: 555
#369

18 May 2022, 04:53

I've become quite fond of the -describe using-.

It would be nice when working with large datasets to peek into the first or the last rows.
I suggest options for -use- like -use var1 var2 using "dataset.dta" in 1/100, last- to see the last 100 rows for var1 and var2 in "dataset.dta".

If a min/max/missing report could be saved in the dataset as metadata, that would be nice too. Maybe this should be an option to -save-.

Kind regards

nhb
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2400
#370

18 May 2022, 05:59

Originally posted by Niels Henrik Bruun

I've become quite fond of the -describe using-.

It would be nice when working with large datasets to peek into the first or the last rows.
I suggest options for -use- like -use var1 var2 using "dataset.dta" in 1/100, last- to see the last 100 rows for var1 and var2 in "dataset.dta".

If a min/max/missing report could be saved in the dataset as metadata, that would be nice too. Maybe this should be an option to -save-.

The usual -in- range for the last observations would be as below (note that the final character is a lowercase L).

Code:

mycmd in -100/l
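
As a small illustration of that range syntax on data already in memory, here is a sketch using the auto dataset that ships with Stata; -list- merely stands in for mycmd.

```stata
sysuse auto, clear

* First five observations.
list make price in 1/5

* Last five observations: a negative number counts back from the end,
* and the lowercase letter l means the last observation.
list make price in -5/l
```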
1 like
George Ford

Join Date: Aug 2014

Posts: 3142
#371

19 May 2022, 06:23

I would like for -capture drop x y z- to work even if one of the variables does not exist. Currently, if y does not exist, then neither x nor z is dropped. As a result, I have to use multiple lines of code when one should do.
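The multi-line workaround is presumably something along these lines. This is a sketch with x, y and z as placeholder variable names, not necessarily the code actually in use.

```stata
* Drop each variable separately so that a missing one does not
* prevent the others from being dropped.
foreach v in x y z {
    capture confirm variable `v', exact
    if !_rc drop `v'
}
```

The exact option stops an abbreviation from matching, and so dropping, a different variable whose name merely begins with the same letters.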
daniel klein

Join Date: Mar 2014

Posts: 3845
#372

19 May 2022, 06:49

Veto #371. First, such a change would violate the (implicit) rule of "do it all or do nothing at all", which is implemented pretty much throughout Stata and ensures that you will never have to guess on the current state of the dataset.

More generally, while I see how the behavior can be frustrating in this situation, a change would necessarily lead to inconsistencies. If

Code:

capture drop x y z

would drop only existing variables then

Code:

drop x y z

would need to do the same. This is because capture does not change the behavior of any other command. If it did, we would need to look up the specific modifications capture did to each specific command. This is clearly way more inconvenient than the current (predictable) behavior.

If instead, we changed the way drop works, we would not even need capture. The problem with that is that whenever a command is referring to a varlist, it refers either to existing variables or to new variables, never to a mixture. Commands that may refer to existing and new variables, e.g., generate, have these groups of variables clearly separated in their syntax diagram.

Changing drop would also be inconsistent with keep because we can obviously not keep variables that do not exist.

Last edited by daniel klein; 19 May 2022, 06:55.
3 likes
Ali Atia

Join Date: May 2020

Posts: 737
#373

19 May 2022, 07:35

Seconding #372. Also -- if a captured command "partially" succeeds, what is the return code stored in _rc? Is it 0, because of "partial" success, or is it (in this case) 111, because of "partial" failure? This would likely mess with a lot of the traditional usage of capture to lead in to a conditional argument based on the value of _rc.
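
The current all-or-nothing behavior, and the return code it leaves behind, can be checked with a short sketch. It uses the auto dataset that ships with Stata, and nosuchvar is a deliberately nonexistent name.

```stata
sysuse auto, clear

* One of the three names does not exist, so drop fails as a whole.
capture drop price nosuchvar mpg
display _rc                      // nonzero because nosuchvar does not exist

* No error here: price and mpg are both still in the dataset.
confirm variable price mpg

* The traditional pattern: branch on _rc right after the captured command.
capture drop nosuchvar
if _rc {
    display "nothing was dropped"
}
```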
1 like
Daniel Fernandes

Join Date: Oct 2020

Posts: 7
#374

20 May 2022, 05:38

Originally posted by Ali Atia

Seconding #372. Also -- if a captured command "partially" succeeds, what is the return code stored in _rc? Is it 0, because of "partial" success, or is it (in this case) 111, because of "partial" failure? This would likely mess with a lot of the traditional usage of capture to lead in to a conditional argument based on the value of _rc.

That would be a very strange behaviour for the capture command. I would rather have

Code:

drop var1 var2, force

as an option in the command. I still think this is not a very good option. If you absolutely need this behaviour you can also program it yourself with something along the lines of:

Code:

program define checkvars, rclass
    syntax namelist
    unab varlist: *
    return local newvars: list namelist - varlist
    return local vars: list namelist & varlist
end

Code:

checkvars var1 var2 var3 var4
drop `r(vars)'
4 likes
Ali Atia

Join Date: May 2020

Posts: 737
#375

20 May 2022, 07:54

Originally posted by Daniel Fernandes

That would be a very strange behaviour for the capture command. I would rather have

Code:

drop var1 var2, force

as an option in the command. I still think this is not a very good option. If you absolutely need this behaviour you can also program it yourself with something along the lines of:

Code:

program define checkvars, rclass
    syntax namelist
    unab varlist: *
    return local newvars: list namelist - varlist
    return local vars: list namelist & varlist
end

Code:

checkvars var1 var2 var3 var4
drop `r(vars)'

I believe you meant to respond to #371 -- we are in agreement that altering the behavior of capture in this way wouldn't be a good idea.