Wishlist for Stata 18

FernandoRios

Join Date: Apr 2014

Posts: 2470
#526

07 Nov 2022, 18:38

Two additional wishes for Mata:
1) It would be great if one could run Mata scripts from the Do-file Editor once Mata is active (currently this usually gives an error, because "do" is not valid from within Mata).
2) Perhaps it would be useful to have a dedicated Mata editor, so that programming in Mata is as flexible as in other languages.
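For context on wish 1, a minimal sketch of the usual workaround as I understand it: wrap the Mata code in a mata: ... end block inside an ordinary do-file and run it while Stata (not Mata) is at the prompt. The vector x is just an illustrative placeholder.

```stata
* Contents of an ordinary do-file. Run it from the Do-file Editor while
* Stata, not Mata, is at the prompt; running it from inside interactive
* Mata fails because "do" is not a Mata command.
mata:
x = (1, 2, 3)    // illustrative data only
sum(x)           // displays the sum of the elements of x
end
```

This does not help once an interactive Mata session is already open, which is the case the wish is about.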
F
4 likes
Comment
JanDitzen

Join Date: Jan 2015

Posts: 350
#527

08 Nov 2022, 23:47

I am increasingly frustrated with managing ado paths when working with coauthors or helping people find bugs. I also fear that a lot of people are not really aware of what is in their ado folders and which version of a program is actually used. Thus I would suggest the following (I think I mentioned some of it somewhere before but can't find it):
adopath clear to reset the environment to the default when Stata is loaded

ensure that the path exists when adding one using adopath +

ignore a trailing backslash or forward slash at the end of the path when using adopath -

extend which so that it shows the path of the program that is actually used, plus the alternative paths if the program appears in several ado folders.
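Until something like this is built in, parts of it can be approximated with existing tools. A rough sketch, not a definitive implementation: safe_adopath is a hypothetical helper name, it relies on Mata's direxists(), and the final line uses the all option of which to list every matching file along the ado-path.

```stata
* Hypothetical helper: add a directory to the ado-path only if it exists
capture program drop safe_adopath
program define safe_adopath
    args dir
    * direxists() returns 1 if the directory exists and 0 otherwise
    mata: st_local("found", strofreal(direxists(st_local("dir"))))
    if !`found' {
        display as error "directory `dir' not found; ado-path left unchanged"
        exit 601
    }
    adopath + "`dir'"
end

safe_adopath "`c(pwd)'"   // the current directory exists, so it is added
adopath                   // list the ado-path as it now stands
which xtreg, all          // report every xtreg.ado found along the ado-path
```

This covers only the "ensure that the path exists" point and the listing of duplicates; it does not reset the ado-path or normalize trailing slashes.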

Often I find that people have installed several versions of a program without realising it. This causes incomplete updates and errors, and can even lead to inconsistent results. The problem is that one can install the same package from SSC and from other sources. For example, a user can install version 1 from SSC and then an update from, say, GitHub. I have had cases where Stata thought there were two versions and the update was incomplete: some files were overwritten, others not. The user went ahead and the package produced at best an error, at worst results from an outdated version of a program. It would be great if it were possible to give a program an identifier that is independent of the source from which it is installed, together with a program-specific version indicator.

I think the points raised are important because some of Stata's great advantages over R are (from my point of view) its capabilities for version control, easy installation of packages/community-contributed programs, and reproducibility.
8 likes
Comment
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#528

09 Nov 2022, 10:38

Subject: Extend -permute- to shuffle more than one variable.

I'd recommend extending the -permute- command to allow shuffling of more than one variable. (This occurs to me in the context of responding to some Statalist questions about what are being called "placebo tests" in econometrics, which as near as I can tell often are a type of permutation test with multiple variables being permuted.) This should be easy to do, as all the "hard stuff" to program in -permute- is already there, i.e., keeping track of and reporting the results. Shuffling multiple variables at each replication should not be hard.
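To illustrate what is meant, here is a hand-rolled sketch of such a test, not how -permute- works internally. It assumes the variables are shuffled jointly (the same row permutation for both), uses Mata's jumble() to reorder rows, and picks the auto data, the seed, and 200 replications purely for illustration.

```stata
sysuse auto, clear
set seed 2022

* Observed statistic
quietly regress price weight length
scalar b_obs = _b[weight]

local reps 200
local extreme 0
forvalues r = 1/`reps' {
    * Shuffle weight and length together: jumble() puts the rows of the
    * two-column matrix in random order, so pairs stay intact
    mata: st_store(., ("weight", "length"), jumble(st_data(., ("weight", "length"))))
    quietly regress price weight length
    if abs(_b[weight]) >= abs(scalar(b_obs)) {
        local ++extreme
    }
}

* Two-sided permutation p-value with the usual +1 correction
display "permutation p-value for weight: " (`extreme' + 1)/(`reps' + 1)
```

Shuffling each variable independently would need one jumble() call per variable instead; which of the two is wanted depends on the placebo design.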
6 likes
Comment
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#529

11 Nov 2022, 08:35

I or others may have mentioned it, but integrating R code into Stata would be awesome too, as has been done with Python. Too often, we have to "choose" between Python or R... when instead we should have the best of the three worlds: data cleaning in Stata, and fancier statistical analyses in R/Python. It would go a long way toward making people bi- or trilingual, as well as expand the scope of Stata's use to others.
2 likes
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2594
#530

11 Nov 2022, 10:10

I have long asked for xthtaylor to work with the special case in which there are no exogenous time-invariant regressors. Currently, the command aborts with an error message. There is no econometric reason for preventing the estimation of such a model. This should be a relatively straightforward update. See also: https://www.statalist.org/forums/for...nous-variables

https://www.kripfganz.de/stata/
1 like
Comment
Lee Tucker

Join Date: Nov 2022

Posts: 1
#531

16 Nov 2022, 13:22

I don't think I've seen this one so far: it would be really great to have a simple syntax for frame iteration (looping through the rows of a frame, loading variable values into local macros that can then be used in the execution of code within a loop). A common use case for me would be iterating through candidate specifications where each iteration of the loop requires multiple pieces of information to run. More generally, frame iteration would accomplish many of the same patterns that are accomplished in Python with list or dictionary iteration, and in SAS with hash object iterators.

A very simple and nonsensical example (more typical use would have the specifications frame loaded from another source):

Code:

sysuse auto
frame create specifications strL( name xvars vcetype )
frame post specifications ("weight only") ("weight") ("ols")
frame post specifications ("weight and length, robust") ("weight length") ("robust")
frame post specifications ("weight and mpg, bootstrap") ("weight mpg") ("bootstrap")
frame create results strL( name b se )
foreach of frame specifications {
    regress price i.foreign `xvars', vce(`vcetype')
    frame post results ("`name'") (_b[1.foreign]) (_se[1.foreign])
}

I'm not wedded to this specific syntax, but you get the idea. Iterating over observations of a frame now is very clunky, because each local macro variable has to be explicitly assigned inside an observation counter loop, rather than being automatically assigned using the name of the corresponding variable in the frame.

Also want to +1 on appending frames to one another as built-in functionality. I have found the various frame appending packages to be buggy.
1 like
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#532

16 Nov 2022, 18:42

Originally posted by Lee Tucker (#531)

What you've described can already be done with a little tweaking. It wouldn't be too much more work to load in specifications from an external file (say an Excel sheet), and to split out part of the loop into a utility program that fetches the relevant specs from a specific row. Nevertheless, this is one minimal technique that works.

Code:

* Prepare Specs and results frames
frame create specifications str64( name xvars vcetype )
frame post specifications ("weight only") ("weight") ("ols")
frame post specifications ("weight and length, robust") ("weight length") ("robust")
frame post specifications ("weight and mpg, bootstrap") ("weight mpg") ("bootstrap")
frame specifications: compress
frame create results str64(name) double(b se)

// Load data, and step through regressions, driven by the Specs
mkf data
cwf data
sysuse auto
frame specifications: local nspecs = _N

forval i = 1/`nspecs' {
    cwf specifications
    local i_name = "`=name[`i']'"
    local i_xvars = "`=xvars[`i']'"
    local i_vce = "`=vcetype[`i']'"

    cwf data
    regress price i.foreign `i_xvars', vce(`i_vce')
    frame post results ("`i_name'") (_b[1.foreign]) (_se[1.foreign])
}

cwf results
list

As an aside, it's better to store estimation results as numeric types (here, double) so you can manipulate them later if needed. You can also keep your specs to strings shorter than strL because it's unlikely (or impossible) to have such long strings in most cases, and you can save some memory by being more conservative.
2 likes
Comment

Bjarte Aagnes

Join Date: Apr 2014
Posts: 785

#533

17 Nov 2022, 05:32

Re #531/#532: another variant of "very clunky"

Code:

frame change specifications

des, varlist
local varlist `r(varlist)'  

forvalues i = 1/`=_N' {
    
    foreach arg of local varlist {

        loc `arg' = `arg'[`i']
    }    

    frame default {
        
        regress price i.foreign `xvars', vce(`vcetype')
        frame post results ("`name'") (_b[1.foreign]) (_se[1.foreign])
    }
}

Comment

wbuchanan

Join Date: Mar 2014

Posts: 1362
#534

18 Nov 2022, 06:07

Leonardo Guizzetti
Assuming there is only one categorical variable of interest, I’ll post a different solution I developed to this challenge (which also includes some other functionality related to the task) a bit later. The solution I created requires an additional command call to set up the infrastructure, but will deal with any number of categories (and assumes the intercept is not of interest), captures the model command call, and allows additional info to be passed to describe the model. The approach I took uses a Mata object that I defined after I figured out that struct objects didn’t seem to persist between calls to an ado. So all the metadata needed gets stored in an object and the methods defined on the object handle all the storage and retrieval work.
1 like
Comment
Aaron Wolf

Join Date: Aug 2017

Posts: 2
#535

20 Nov 2022, 11:40

A Stata kernel for Jupyter notebooks. The current Stata magic command works, but it is a little clunky, and I am not a fan of needing to write the magic command at the start of each cell. It is also a bit slow in my experience, much slower than running in Stata directly. Likewise, Stata should find a way to give notebooks syntax highlighting.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2594
#536

21 Nov 2022, 03:23

It would be very useful if there was an option to fvexpand which removes base values, as Mark Schaffer's fvstrip does (which is also in Sergio Correia's ftools under the name ms_fvstrip):
https://www.statalist.org/forums/for...-fvexpand-list
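To show what is being asked for, a rough do-it-yourself sketch using the auto data. It assumes base levels always appear in r(varlist) with a "b." marker and simply drops those terms; it is not a substitute for fvstrip and ignores omitted ("o.") terms and the other cases fvstrip handles.

```stata
sysuse auto, clear

* fvexpand leaves the expanded list, base levels included, in r(varlist)
fvexpand i.rep78 i.foreign
local full `r(varlist)'

* Keep only the terms that do not carry the base-level marker "b."
local nobase
foreach term of local full {
    if strpos("`term'", "b.") == 0 {
        local nobase `nobase' `term'
    }
}

display "expanded:      `full'"
display "without bases: `nobase'"
```

Because a variable name cannot contain a period, searching for "b." should only ever match the base-level marker.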

https://www.kripfganz.de/stata/
Comment
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#537

21 Nov 2022, 08:01

Another thing I would most appreciate is to allow pinpoint customizations of the legend placement with coordinates, e.g.,

Code:

legend(pos(coord(1970, 45)))
3 likes
Comment
wbuchanan

Join Date: Mar 2014

Posts: 1362
#538

22 Nov 2022, 06:16

Jared Greathouse what would those coordinates represent? Also, are you just asking to define where the upper left corner would begin/start or something else?
Comment

Jared Greathouse

Join Date: Sep 2021
Posts: 2172

#539

22 Nov 2022, 09:33

Hey, so say I have a time series that looks like this

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double cf float(cigsale3 year)
122.09479671055374   123 1970
 121.6065529858943   121 1971
123.65496555673786 123.5 1972
124.10483385256285 124.4 1973
126.75462121002437 126.7 1974
126.63825689453941 127.1 1975
127.92824675473713   128 1976
 126.4873238266998 126.4 1977
125.37864959522437 126.1 1978
122.34822098598946 121.9 1979
  119.928632673551 120.2 1980
119.10175992318514 118.6 1981
115.68795915144617 115.4 1982
111.04282414949492 110.8 1983
104.56636628133799 104.8 1984
 102.7682761771388 102.8 1985
 99.35677151118881  99.7 1986
 97.39989301281054  97.5 1987
 91.15104874688336  90.1 1988
 88.66315821472102  82.4 1989
 85.21665593484636  77.8 1990
 80.76789213762831  68.7 1991
 79.08976003988303  67.5 1992
 79.33037122087953  63.4 1993
 76.85184614706827  58.6 1994
 74.46827712093958  56.4 1995
 73.64585997258112  54.5 1996
 73.61649595338298  53.8 1997
 71.64467330066938  52.3 1998
 70.90756101627477  47.2 1999
 65.74723052634468  41.6 2000
end
format %ty year

cap set scheme gg_w3d 

if _rc {
    
net install schemepack, ///
from("https://raw.githubusercontent.com/asjadnaqvi/stata-schemepack/main/installation/") ///
replace    
    
}

lab var cf "Counterfactual Sales"
lab var cig "Real Cigarette Sales"

tsset year, y
cls
tsline cig, lcol("12 10 0") lwidth(thick) || /// Observed Sales
tsline cf, lcol("237 41 57") lwidth(medthick) lpat(--) /// Counterfactual
scheme(gg_w3d) ///
tline(1989, lcol(blue) lpat(solid) lwid(medthick)) ///
legend(ring(0) pos(3)) ///
tti(Year) ///
yti(Cigarette Sales) ///
plotregion(fcol(gs12))

The coordinates are meant to represent x-axis and y-axis values, respectively, allowing the user to place the legend literally wherever they please. At present the legend appears such that it intersects with the counterfactual (I know I can just delete the color from within the legend box, but suppose I like the color and wanna keep it). Say I also, arbitrarily, wanna keep the legend on the right side of the graph. In a situation where I could do, for example,

Code:

legend(pos(coord(1990, 99.5)))

we'd have the legend's box moved inward, such that its left edge touches 1990 and its centroid sits at the y value of 99.5. Now of course, this could get a lot trickier when multiple things are plotted, so no doubt some tinkering and adjustment would be needed. I like the clock way, but I always wondered why we couldn't simply be arbitrary and place it wherever our heart desires. wbuchanan

Comment

Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#540

22 Nov 2022, 09:41

Now that it's on my mind, it would also be cool to integrate hex colors into Stata's customization scheme. So if I like the color imperial red, I could just do

Code:

lcol("#ed2939")

instead of

Code:

lcol("237 41 57")
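In the meantime the conversion is easy to script. A small sketch: hex2rgb is a hypothetical helper name, it uses Mata's frombase() to read each two-digit channel, and it does only a length check (invalid hex digits are not caught).

```stata
* Hypothetical helper: convert "#rrggbb" into the "R G B" triplet that
* color options such as lcolor() accept
capture program drop hex2rgb
program define hex2rgb, rclass
    args hex
    local hex = strlower(subinstr("`hex'", "#", "", 1))
    if strlen("`hex'") != 6 {
        display as error "expected a six-digit hex color such as #ed2939"
        exit 198
    }
    local pos 1
    foreach ch in r g b {
        * frombase(16, s) converts a base-16 string to a number
        mata: st_local("`ch'", strofreal(frombase(16, substr(st_local("hex"), `pos', 2))))
        local pos = `pos' + 2
    }
    return local rgb "`r' `g' `b'"
end

hex2rgb "#ed2939"
display "`r(rgb)'"
```

The result can then be passed along in the usual way, e.g. lcol("`r(rgb)'").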
2 likes
Comment
