Wishlist for Stata 16

Andrea Discacciati replied

15 Mar 2018, 07:18
Originally posted by Nick Cox:

(Or, more likely, they just end up abbreviated anyway, so where's the gain?)

Clyde (#83) illustrated one example where longer variable names are welcome. The usefulness of longer variable names in that example is orthogonal to how variable names are displayed (abbreviated or not).
Nick Cox replied

15 Mar 2018, 04:48
I think the problem with longer variable names and variable labels is not that people don't want them sometimes (they naturally do) or that there is a problem of principle in changing the limit (there presumably isn't). The problem is where Stata is to find the space to put them in output. There is a fairly substantial area of difficulty there in revising many commands. (Or, more likely, they just end up abbreviated anyway, so where's the gain?)
Jesse Wursten replied

15 Mar 2018, 03:51
Originally posted by Clyde Schechter:

...

I'd like to echo that for macro names and variable labels. 80 characters to describe a variable is often way too short, severely limiting the use of this feature. The macro problem is essentially identical to the one posed by Clyde: I might make macros to hold means of variables or regression results across a wide set of specifications (e.g. small vs. large firms, estimation in logs or not, different dependent variables, etc.), and you very quickly reach the 32-character limit that way.
Clyde Schechter replied

14 Mar 2018, 13:47
I would like to see the 32-character limit on variable names relaxed to, say, 48 or even 64.

I know, when we first started out the limit was much more stringent. But I really like using names that are explanatory, and I dislike abbreviating names by omitting vowels or the like. And sometimes you have to distinguish between variations on a theme (admission_date, discharge_date, surgery_date, etc.). I agree that 32 characters usually accommodates things; I rejoiced when the limit was first raised to 32.

But sometimes in data management it becomes necessary to create new variables based on the names of existing variables. For example, you might loop over a bunch of variables looking for certain kinds of data problems, and you might want to create a new variable doing something like

Code:

foreach v of varlist whatever {
    // MAYBE CALCULATE SOME STUFF FIRST
    // TO IDENTIFY PROBLEMATIC OBSERVATIONS FOR
    // VARIABLE V
    gen byte problem_`v' = some_logical_expression
}

Well, that entails adding 8 characters to the variable name. So if you started out with 25, the code breaks. Of course you can use -strtoname()- to get you a substitute name that will fit, but then you have the problem that your problem_* names are no longer completely parallel with the names of the variables they are referring to, so that now writing another loop to fix the problems gets complicated with dancing around the name differences. Yes, of course, instead of problem_, one might use flag_, p_, or even just _, particularly if the variables are only needed in the interim and will not be saved with the data set. But even these will cause a break if applied to 28, 31 or 32 character names.
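
The length problem described above can at least be detected before the loop breaks. The following is only a sketch: the auto dataset, the problem_ prefix applied to every variable, and the missing() check are placeholders chosen for illustration, not anything from a real project.

```stata
* Sketch: skip (and report) any variable whose flagged name would
* exceed the 32-character limit, instead of letting generate fail.
* Dataset, prefix, and the missing() test are illustrative assumptions.
sysuse auto, clear

foreach v of varlist _all {
    local newname problem_`v'
    if ustrlen("`newname'") > 32 {
        display as error "`newname' is too long for a variable name; skipping `v'"
        continue
    }
    generate byte `newname' = missing(`v')
}
```

This only reports the failure cleanly; it does nothing to keep the problem_* names parallel with the originals, which is the real cost of the limit.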

I know, there is probably no upper limit that will satisfy every need. But I can't help thinking that raising it to 48 or 64 would do little harm and would be welcomed by a non-negligible number of us who run into these problems.
Chris Larkin replied

06 Mar 2018, 12:48
Nick Cox: I have seen the dots on some commands, e.g. bootstrapping, and personally I find them quite helpful! I'm pretty ignorant as to their underlying limitations, though, so perhaps knowing these would frustrate me more. And the textbook you mention offers sage advice, except I'd replace 'coffee' with 'scotch' for those late-night coding sessions.

Clyde Schechter: I've used runby a couple of times (mostly when Robert Picard suggested it on this forum). I hadn't previously had a good look at the help file, though, and wasn't aware of the -status- option. It sounds like a sensible approach, balancing the information provided to users against potentially flooding their output window, and it would be great if StataCorp could integrate your code into future releases.
Sule Yaylaci replied

06 Mar 2018, 11:56
Adding the 'mlmv' option to the gsem command would be great!
Clyde Schechter replied

06 Mar 2018, 09:03
Originally posted by Nick Cox:

Stata already has the same attitude, insofar as dots are issued with some commands as Stata loops repeatedly through the same kind of calculation.
(Equally, these dots are often just irritating and I cherish the option to turn them off.)

Yes, particularly if you are, say, doing a simulation with 50,000 reps, the dots get very annoying!
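
For reference, a minimal sketch of switching the dots off with the nodots option on a prefix command. The dataset, model, seed, and number of replications are arbitrary placeholders.

```stata
* Sketch: suppress the replication dots on a bootstrap run.
* Dataset, model, seed, and reps() are illustrative assumptions.
sysuse auto, clear
set seed 12345
bootstrap, reps(100) nodots: regress price weight
```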

In response to this, when Robert Picard and I developed the -runby- command (SSC), which was explicitly designed for, in effect, looping with a large number of iterations, we struggled with how to, on the one hand, let the user know that progress is being made, but not flood the output log with dots or messages. Robert came up with a solution that I think is brilliant. -runby- will run silently by default, but you can get a progress report by specifying the -status- option. If you specify -status-, you will initially get a progress report roughly every second, and then after 5 seconds, the reporting rate slows to every 5 seconds, then later to every 15 seconds, and then to just every minute. Each progress report indicates the number of iterations processed so far, the elapsed time so far, and an estimate (based on extrapolation to the total number of iterations needed) of how much time remains. There is also some information about the amount of data generated, and a running tally of the number of -by()- groups that generated errors (analogous to red x's in StataCorp's -dots-).
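
A minimal sketch of calling -runby- with the -status- option follows. It assumes -runby- has been installed from SSC; the program name one_group_regress, the auto dataset, and the grouping variable are hypothetical choices for illustration only.

```stata
* Sketch: run a small program once per by-group, with progress reports.
* Assumes runby is installed (ssc install runby). The program name,
* dataset, and grouping variable are illustrative assumptions.
sysuse auto, clear

capture program drop one_group_regress
program define one_group_regress
    // runby calls this with one by-group's observations in memory
    regress price weight
    generate double b_weight = _b[weight]
end

runby one_group_regress, by(foreign) status
```

With only two groups the run finishes before more than one report appears; the slowing report schedule described above only becomes visible on long runs.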

I really hope that StataCorp will take a look at the code for this and adopt his approach, in lieu of -dots- for its multiple-iterations commands.

But this will not be applicable to iterations of likelihood maximization for the reasons noted earlier.
Nick Cox replied

06 Mar 2018, 08:31
Stata already has the same attitude, insofar as dots are issued with some commands as Stata loops repeatedly through the same kind of calculation.
(Equally, these dots are often just irritating and I cherish the option to turn them off.)

It doesn't seem difficult to add this to a simulation program such as you're describing. That's not quite what you're asking, but it is often equivalent.
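
As a rough sketch of what adding such dots to a hand-written simulation loop might look like: one dot per replication and a running count every 50. The resampling step and the model are placeholders, and this imitates the look of Stata's dots rather than using StataCorp's own code.

```stata
* Sketch: hand-rolled progress dots in a simulation loop.
* The bsample/regress body and the count of 200 are illustrative assumptions.
sysuse auto, clear
set seed 12345
local reps 200
local done 0

forvalues i = 1/`reps' {
    preserve
    bsample                          // resample observations with replacement
    quietly regress price weight
    restore
    local ++done
    display as text "." _continue    // one dot per replication, no newline
    if mod(`i', 50) == 0 {
        display as text " `i'"       // end the line with a running count
    }
}
```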

I still remember one computer programming book from the 1980s issuing repeated advice "This may take some time, so go and get yourself a coffee". No allowance for other tastes!
Chris Larkin replied

05 Mar 2018, 21:25
Very fair points, Clyde Schechter and Nick Cox. Even though there are instances where it is not possible to predict the time for estimations, there are still some where it is possible to give users a sense of it. If I'm running simulations that estimate 10,000 OLS models (or more), I resort to timing the first 50 or so and then manually doing the maths to figure out how long the whole run will take. It's crude, as I'm often not doing anything else while I time the first 50, whereas if simulations are running in the background I will likely be using my computer for other things at the same time. It's better than not having any idea, though!
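
The time-then-extrapolate approach described here could be sketched with Stata's timer commands. The dataset, the model, and the pilot and total counts below are placeholders, and the estimate is only as good as the assumption that later replications cost about the same as the first 50.

```stata
* Sketch: time a pilot batch of models, then extrapolate linearly.
* Dataset, model, and both counts are illustrative assumptions.
sysuse auto, clear

local pilot 50
local total 10000

timer clear 1
timer on 1
forvalues i = 1/`pilot' {
    quietly regress price mpg weight
}
timer off 1

quietly timer list 1                 // leaves elapsed seconds in r(t1)
local est_total = r(t1) / `pilot' * `total'
display as text "Estimated time for `total' models: " %9.1f `est_total' " seconds"
```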
Ronán Conroy replied

05 Mar 2018, 13:02
If I were StataCorp, I'd be looking at RStudio and thinking "we have to look better than that, and fast". Package management, help search, ability to browse multiple file types, and (big selling point) ability to do literate programming, all within 'one window to rule them all': it's a splendid piece of design.

And it's just a small thing, but I repeatedly have to explain to my students that, in the dialog for two-way tabulate, when Stata means row percentages it calls them relative frequencies. It would be good if it said percentages.
Nick Cox replied

05 Mar 2018, 11:42
Not the question, but I have stopped using lowess, for two reasons. First, if I use lowess then I have to explain at some point what it is, which for most readerships is awkward as the Stata idea of lowess isn't equivalent to many others. The method has morphed and mutated in various ways over 40 or so years and through several hands. Second, lpoly is much more flexible (pun intended) in how it can be used and much easier to link to standard literature. (But it isn't faster.)

But the main point is that made by Clyde: often one can't tell how long it takes to climb the mountain before you've done it.
Clyde Schechter replied

04 Mar 2018, 13:44
Well, I can envision how one might implement a progress bar for lowess graphs. But I cannot imagine how one would do this for estimation commands. It would be great if one could, but likelihood functions can be very irregular and bizarre, and I don't think there is any way to know how close to or far from the maximum you are until you actually arrive there, nor even whether you will ever get there.
Chris Larkin replied

04 Mar 2018, 12:46
Some kind of timer for how long estimations are going to take. I know this is massively dependent on what else the user will do on their computer and the amount of RAM they have, but it would be useful to get a rough estimate / a progress bar that doesn't display an amount of time but just shows whether the estimation is 10% complete or 80% complete, for example.

Whenever I do twoway lowess graphs I have the issue of not knowing how far along they are.
Mark Schaffer replied

05 Feb 2018, 15:20
I'd like a data type option for fvrevar and tsrevar. A data type option for fvrevar in particular might sound weird - why would you want to store binary factor variable tempvars as doubles? - but there's a reason.

I keep running into situations where I create factor variable tempvars that have to be transformed along with the rest of the model - e.g., the within transformation in a fixed effects model - and I want to use double precision for the transformed variables. recast is slow, and other ways of dealing with it are a hassle.

Much better would be if the fvrevar/tsrevar temporary variables are created as doubles if this is specified as an option.
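
For context, the recast workaround mentioned above looks roughly like the sketch below. It assumes that r(varlist) from fvrevar holds plain temporary variable names, and the dataset and factor variable are placeholders.

```stata
* Sketch: create factor-variable tempvars, then promote them to double.
* Assumes r(varlist) contains only temporary variable names;
* the dataset and i.rep78 are illustrative assumptions.
sysuse auto, clear

fvrevar i.rep78
local tmpvars `r(varlist)'

recast double `tmpvars'              // the extra pass the requested option would avoid
describe `tmpvars'
```

A data type option on fvrevar and tsrevar would fold the second step into the first.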
Mauricio Caceres replied

05 Feb 2018, 12:03
Sergio Correia: You can do it in Linux by piping commands through a console session, but unlike Windows or OSX you can't do it through the GUI.

Speaking of which, I would actually really appreciate the ability to pipe commands to the Stata GUI on Linux.