Wish list for Stata 14

Attaullah Shah replied

27 Aug 2014, 11:40
I would like Stata to have more memory than 500 MB and more than 32,000 variables. Computers now have significantly more RAM. I often have to compress and delete variables to create more space for additional calculations and new variables in Stata.
László Sándor replied

25 Aug 2014, 08:51
Originally posted by Roberto Ferrer:

Add a semicolon, and you get an automatic Enter:

Code:

global F3 "set trace on;"
global F4 "set trace off;"

Oh, of course, thank you. I would edit/delete my post if I could. Btw. the formatting JavaScript (?) is broken in Safari (8) as well, and editing of previous posts is rarely available.

Sorry about your signature, I should have known. Those are perfectly fine general principles, of course. (And I should have formatted my post better if I could.)
Roberto Ferrer replied

24 Aug 2014, 21:11
Originally posted by László:

Here is the solution Roberto meant to share with us:

Code:

global F3 "set trace on"
global F4 "set trace off"

Then pressing these function keys will spell these commands out on the command line. You still need to press Enter, ...

Add a semicolon, and you get an automatic Enter:

Code:

global F3 "set trace on;"
global F4 "set trace off;"

... and there will be two different keys switching it on and off. I consider this inferior to, say, the excellent Cmd+1, Cmd+9 switching between the command line and the do-file editor that StataCorp built in on a Mac.

I too think a shortcut that allows toggling would be superior. I just meant to suggest a substitute procedure for the time being.

Note that I have not typed anything before (exactly), Stata did nothing in response (exactly) [N.B. exactly], and I can describe my data without -list-, and you don't need -input- either, you only need -clear all-.

Formatting works horribly in Chrome here, and you cannot edit previous posts, only the last one. I tried to clean up posts before. (E.g. often the cursor just jumps to the end of the line, you cannot edit what you typed before, like now I could not write this to the worst sentence.)

This is in response to my "signature". It's meant to be a message to those who do not follow Statalist guidelines and was purposeless in your case.
László Sándor replied

24 Aug 2014, 14:47
Originally posted by Roberto Ferrer:

One way is using F-keys. Limited, but might work for you. See [U] 10.2 F-keys.

Here is the solution Roberto meant to share with us:

Code:

global F3 "set trace on"
global F4 "set trace off"

Then pressing these function keys will spell these commands out on the command line. You still need to press Enter, and there will be two different keys switching it on and off. I consider this inferior to, say, the excellent Cmd+1, Cmd+9 switching between the command line and the do-file editor that StataCorp built in on a Mac. Thus this post would be relevant to this thread.

Note that I have not typed anything before (exactly), Stata did nothing in response (exactly) [N.B. exactly], and I can describe my data without -list-, and you don't need -input- either, you only need -clear all-.

Formatting works horribly in Chrome here, and you cannot edit previous posts, only the last one. I tried to clean up posts before. (E.g. often the cursor just jumps to the end of the line, you cannot edit what you typed before, like now I could not write this to the worst sentence.)
Roberto Ferrer replied

24 Aug 2014, 14:29
Originally posted by László:

It would be great to have keyboard shortcuts for the most common "set" commands, namely 'set trace on/off'. Setting it on and off should be very easy. (Otherwise I am often tracing the ensuing -help- command…)

One way is using F-keys. Limited, but might work for you. See [U] 10.2 F-keys.
László Sándor replied

24 Aug 2014, 13:14
Originally posted by László:

Note that F(-1) or L(-1) does not work in expressions but works as varnames. You can do -regress y F(-1).y-. But -g t = F(-1).y- indeed gives you an "unknown function ()" error. Strange, unfortunate and inconsistent to my eye.

Also note that Stata will parse this specification, and you won't be able to refer to, say, _b[F(-1).y], only _b[L.y], after the -regress- above. You can work around this (viz. reproduce the name _b sees) with -tsunab-.
László Sándor replied

24 Aug 2014, 12:48
It would be great to have keyboard shortcuts for the most common "set" commands, namely 'set trace on/off'. Setting it on and off should be very easy. (Otherwise I am often tracing the ensuing -help- command…)
László Sándor replied

24 Aug 2014, 12:15
OK, a more specific, constructive comment on the memory space issue (keeping up with Sergiy's humble suggestion): Please let us have a system-wide setting that lets -preserve- write to RAM. At least in the cases where the user assures that the data will always be < ~50% of system memory *and* the user allows Stata to "abuse" the system (server) freely, disk I/O should not be a limitation for all the operations Stata needs tempfiles for.

This would still not make -preserve-/-restore- free, but it would go a long way.

This would not help with Stata re-sorting variables unnecessarily either. As Mata does have a separate memory space and sorts only data (views) it loaded, we could hope for StataCorp building more of its operations into Mata if they need sorting. However, sorting (or the need for presorted data) seems so prevalent that this is not very realistic.
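
For readers unfamiliar with the pattern under discussion, here is a minimal sketch of a -preserve-/-restore- round trip. It assumes only the auto dataset that ships with Stata; the -collapse- step is just a stand-in for any operation that destroys the data in memory.

Code:

* Built-in example dataset, used here only for illustration
sysuse auto, clear

* Snapshot the data in memory before a destructive operation
preserve
collapse (mean) price, by(foreign)
list

* Bring back the original dataset from the snapshot
restore

The snapshot taken at -preserve- and read back at -restore- is the step the post asks to keep in RAM rather than in a tempfile on disk.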
László Sándor replied

24 Aug 2014, 10:41
This is a deeper issue in Stata, probably hard to change, but here it is: I don't like the idea of -marksample- in all our ado files. If you use -marksample-, you need to use "if" everywhere afterwards, which loops over the entire data to flag variables all the time, while an original (though rare) "in" condition (or simply no "if" and nothing missing) would simply refer to the right arrays in memory, with immediate indices. It is a very convenient feature, but needless overhead in big data. I don't see what optimization of -marksample- could ever help ensuing lines, which only see "if `touse'", to know that they could use smart indexing right away.
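
A minimal sketch of the pattern in question may help. The command name -mymean- is hypothetical and exists only for illustration; -syntax-, -marksample- and -summarize- are used as documented, and the auto dataset ships with Stata.

Code:

* Hypothetical command -mymean-: reports the mean of one numeric variable
program define mymean, rclass
    version 13
    syntax varname(numeric) [if] [in]
    * touse becomes a 0/1 flag marking the observations to use
    marksample touse
    * Every later command filters on the flag, even when only -in- was given
    summarize `varlist' if `touse', meanonly
    display as text "mean = " as result r(mean)
    return scalar mean = r(mean)
end

sysuse auto, clear
mymean price in 1/10

Although the call restricts the sample with "in 1/10", -summarize- inside the program only sees "if `touse'", which is the situation the paragraph above describes.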

This might be related or unrelated, but there seem to be more and more features of Stata (factor variables, large-N small-T panels etc.) which would benefit greatly from sparse matrices in Mata. One wonders how hard it is to add.

Separate memory spaces came up before, but note that the huge costs of sorting and preserving-restoring in data with many covariates (esp. if unused in a line) or irrelevant observations come from the fact that the rest of the big data is also moved in memory needlessly.
László Sándor replied

24 Aug 2014, 10:32
Originally posted by Michael Anbar:

I would love to see several aspects of time series made more standard. For one thing, why doesn't this work?

Code:

g x = L(-1).t

to refer to the first lead of a variable? I know I can use F1.t or F.t, but a consistent syntax would make loops much easier. I often find myself looping through leads and lags of variables, and it's needlessly cumbersome to have to throw in an if statement when indexes switch from positive to negative, because every other major time series package actually matches mathematical notation.

Furthermore, I would love to see egen's functions, e.g. anymatch, support time series operators.

Note that F(-1) or L(-1) does not work in expressions but works as varnames. You can do -regress y F(-1).y-. But -g t = F(-1).y- indeed gives you an "unknown function ()" error. Strange, unfortunate and inconsistent to my eye.
Laurence Lester replied

20 Aug 2014, 18:40
To follow up on my previous question to the list: an option to send do-file line numbers to the screen as the do-file runs.
Michael Anbar replied

20 Aug 2014, 17:38
I would love to see several aspects of time series made more standard. For one thing, why doesn't this work?

Code:

g x = L(-1).t

to refer to the first lead of a variable? I know I can use F1.t or F.t, but a consistent syntax would make loops much easier. I often find myself looping through leads and lags of variables, and it's needlessly cumbersome to have to throw in an if statement when indexes switch from positive to negative, because every other major time series package actually matches mathematical notation.

Furthermore, I would love to see egen's functions, e.g. anymatch, support time series operators.
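
The looping workaround described above can be sketched roughly as follows. The toy data and the names time, y, y_lead and y_lag are assumptions made up for illustration; only the sign-dependent switch between F and L is the point.

Code:

* Hypothetical toy data: ten periods of a series y
clear
set obs 10
generate time = _n
generate y = time^2
tsset time

* Shifts run from -2 (two leads) to 2 (two lags); zero is skipped
forvalues k = -2/2 {
    if `k' < 0 {
        * Negative shift: switch to the lead operator with a positive index
        local j = abs(`k')
        generate y_lead`j' = F`j'.y
    }
    else if `k' > 0 {
        generate y_lag`k' = L`k'.y
    }
}

With a signed operator accepted in expressions, the whole branch would collapse to a single -generate- line.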
Sergiy Radyakin replied

20 Aug 2014, 17:06
Clyde, I totally agree. I requested "find in project files" at the Boston Conference recently (for example, Visual Studio does this with Ctrl+Shift+F). Alternatively, I envision "Search in all open files", since there is not always a project that unites them all. Currently, to create manageable code one needs to break it down into pieces, but to find something in the code one needs external tools (which exist, and depend on the platform and the user's personal preferences). Proper folding (1 procedure = 1 line) should also be useful in big projects. Unfortunately, programs in .ado files (yes, there may be multiple) are currently not foldable (perhaps because of possible variations in program/program define), and even Mata code folds imperfectly, imho.

As for versioning - that is imho a totally separate task, which is not a statistical package's job. Most version control systems will track any files you want, and anything Stata uses as input can be tracked. Given the multiple platforms, OSs, and VCS assumptions, I think it makes no sense to incorporate this into Stata. Besides, many free tools are available for that. On the other hand, Stata could be a bit more friendly to version control systems and not use binary formats where possible. For example, the project file is currently binary. An XML, JSON, or simple INI file could do the same job while allowing conflicts to be resolved gracefully.

Finally, on this thread, may I humbly suggest splitting suggestions from wishes? Some suggestions are actually resolved quickly by other users pointing to already available functionality, but such suggestions really clutter this thread. I tend to think of a wish in this context as something that is not doable by the user in principle, but that should be relatively easy for developers with access to the internals. For example, if the list of variables can be exposed so that the user can pick variable names from it, why not expose the list of globals? The rest (I wish Stata (program) did my job, or I wish Stata (program) was smart enough to understand what I want from it) I put into the group "dreams", which is not worth discussing. But I think what could help is some weighting of features (easily done here in the forum with opinion polls), such as "what do you prefer: 3D charts or a Mata debugger?". Both are useful, and maybe even equivalent in man-hours. But the market imho will strongly signal the former, since the latter is interesting only to a few developers. With some other features it is less clear.

Best regards, Sergiy Radyakin

Clyde Schechter replied

20 Aug 2014, 15:10
But in the code - simply separate the code into several files.

I'm all for breaking things up into manageable pieces. But if you're working on a relatively small screen, such as my laptop, then even with the window maximized, I can only see 35 lines. Given that I liberally use whitespace to separate sections of the code, and I tend to throw in a lot of comments to remind myself later what I was thinking, that probably boils down to about 15 lines of working code in the window. It really wouldn't make sense to chunk my do-files that small. Also, I would be faced with the problem of searching a bunch of do-files to find which one the particular piece I'm looking for is in. It's much easier to just be able to split the window and then scroll in the second half to the place needed. (Often you can locate it with "Find".)
Sergiy Radyakin replied

20 Aug 2014, 14:58
Sarah, I think such a split might actually be quite useful for the output window: if I need to do some immediate calculations (with display 2*3, etc.) after a long output, I hate scrolling it back after every computation. But in the code - simply separate the code into several files. Eyeballing StataCorp's source, it seems their ado files are ~12-15 screens on average. And it is pretty manageable, if only we could collapse Stata's programs, which is apparently not possible now.
Regards, Sergiy
