Wish list for Stata 14

László Sándor

Join Date: Apr 2014

Posts: 120
#91

25 Aug 2014, 08:51

Originally posted by Roberto Ferrer

Add a semicolon, and you get an automatic Enter:

Code:

global F3 "set trace on;"
global F4 "set trace off;"

Oh, of course, thank you. I would edit or delete my post if I could. By the way, the formatting JavaScript (?) is broken in Safari (8) as well, and editing of previous posts is rarely available.

Sorry about your signature, I should have known. Those are perfectly fine general principles, of course. (And I would have formatted my post better if I could.)
Comment
Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#92

27 Aug 2014, 11:40

I would like Stata to allow more than 500 MB of memory and more than 32,000 variables. Computers have significantly more RAM these days. I often have to compress and delete variables to make room for additional calculations and new variables in Stata.

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#93

27 Aug 2014, 12:11

Attaullah, I'm not sure where you got the 500 MB limit, but that is not Stata's limit; e.g., I have 32 GB of RAM and have often loaded files larger than 25 GB.
Comment
Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#94

27 Aug 2014, 12:52

Rich Goldstein, what I meant by the 500 MB limit was the limits set with Stata commands such as set memory 50000, set maxvar 32000, or set matsize 11000.

Regards
Comment
Sarah Edgington

Join Date: Apr 2014

Posts: 284
#95

27 Aug 2014, 13:03

What version of Stata are you currently using? In recent versions there is no reason to use the set memory command because memory management is handled automatically.

Memory for current versions of Stata is limited by what the operating system will provide. As for the maxvar and matsize considerations, I'm skeptical of any data management task that really involves more than 32,000 separate variables. If I actually had data with that many variables, I'd almost certainly want to work with it in smaller subsets to increase efficiency regardless of Stata's theoretical limits.
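For anyone who wants to check what their own installation is doing, here is a minimal sketch. It assumes Stata 12 or later, where memory is managed automatically; the auto dataset is just a stand-in for your own data.

```stata
* Show the current memory settings (max_memory, segmentsize, and so on)
query memory

* Load a small example dataset and report how much memory Stata is using
sysuse auto, clear
memory
```

If the reported usage is close to the machine's physical RAM, the constraint is the hardware or the operating system rather than a Stata setting.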
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#96

27 Aug 2014, 13:20

My wish list for Stata 14 is pretty modest. I'd like all panel data commands to support cluster-robust variance matrix estimators. Currently, xthtaylor and xtivreg do not allow this. Thus, when one computes standard errors to compare them with, say, output from xtreg, the standard errors are not comparable. Sure, one can bootstrap to obtain cluster-robust inference, but there's no reason one should have to. The analytical formulas are simple. Someone at Stata could add these features in less than a day.

Oh, and while the user-written command xtivreg2 allows clustering, it does not allow a random effects option. I was pleased that xtmixed now allows a cluster option, which makes it somewhat puzzling that some more basic commands, such as xtivreg, do not.
2 likes
Comment
Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#97

27 Aug 2014, 22:05

Rich Goldstein and Sarah Edgington, thanks for your responses. I am using Stata 13.1 SE, and I still have to delete observations or compress variables; otherwise Stata warns that there is not enough memory. I have 4 GB of RAM.
Regards
Attaullah Shah

Comment
Maarten Buis

Join Date: Mar 2014

Posts: 3458
#98

28 Aug 2014, 02:05

Attaullah Shah: if you work with such large datasets, using compress on all variables would be a good idea anyhow. That command is explicitly designed so that you don't lose any information but, where possible, free up memory. In large datasets that gain can be substantial, at no cost other than the time it takes to run compress.
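A minimal sketch of what that looks like in practice, assuming the auto dataset that ships with Stata. The variable rep_copy is made up for illustration: it holds small integers but is deliberately stored as a double.

```stata
sysuse auto, clear

* Store a small-integer variable wastefully as an 8-byte double
generate double rep_copy = rep78
describe rep_copy

* compress demotes each variable to the smallest storage type
* that holds its values exactly, so no information is lost
compress
describe rep_copy
```

After compress, describe should show rep_copy with a smaller storage type, and the values themselves are unchanged.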

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Comment
Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#99

28 Aug 2014, 05:02

Thanks, Maarten Buis. Yes, I do use compress often. Could you please elaborate on what you specifically mean by "gain memory"? Are you talking about increasing the RAM?

Regards
Comment
Erika Kociolek

Join Date: Apr 2014

Posts: 83
#100

29 Aug 2014, 07:57

I'd like to see a variable format for percents similar to what is available for commas. I would also like to see a more straightforward way to add information to graphs that isn't necessarily what is being shown in the graph itself (see the link attached to this post). Having graphic schemes that are cleaner and a bit more modern-looking would be helpful. I agree with the many comments about generating outputs that can be easily dropped into Word or other programs without too much formatting.

Link to question about adding other data to graphs

http://www.statalist.org
1 like
Comment
skolenik

Join Date: Mar 2014

Posts: 102
#101

02 Sep 2014, 09:47

lookfor is begging for an option valuelabels that would search the text of the value labels, along with the variable labels that it searches now. More advanced search capabilities, such as regular expressions, would also be highly appreciated. In not-so-well-documented data sets with hundreds of variables, I *know* the variable gender *must* be there. But in the latest incarnation that I faced, it was QB10, whose variable label contained the truncated question text "Because it is sometimes difficult to determine over the phone, I am asked" (the original went on "to verify if you are..."), with category labels "Male" and "Female". There is no way on Earth I could have found that in the data set with the existing lookfor capabilities, although lookfor male, valuelabels would have found it.

-- Stas Kolenikov || http://stas.kolenikov.name
-- Principal Survey Scientist, Abt SRBI
-- Opinions stated in this post are mine only
1 like
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35700
#102

02 Sep 2014, 10:12

Stas: findname (SJ) has functionality for finding variables depending on the names and/or the contents of variable and value labels.
1 like
Comment
Michael Anbar

Join Date: Aug 2014

Posts: 116
#103

02 Sep 2014, 10:23

Originally posted by László

Note that F(-1) or L(-1) does not work in expressions but works as varnames. You can do -regress y F(-1).y-. But -g t = F(-1).y- indeed gives you an "unknown function ()" error. Strange, unfortunate and inconsistent to my eye.

Yes, this is quite inconsistent, and I think this is something that StataCorp could work on to take market share away from RATS, R, and other programs that are consistent in how they approach time series. See my previous post on p. 4 about how certain Stata commands require date literals like 2000q4, while others require integers of the form tq(2000q4). This makes for needlessly verbose syntax.
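To make the date-literal point concrete, here is a small sketch on made-up quarterly data (the variable names qdate and y are only for illustration). tin() accepts the literal directly, while an ordinary if expression needs the tq() wrapper:

```stata
clear
set obs 8

* Quarterly date variable running from 2000q1 to 2001q4
generate qdate = tq(2000q1) + _n - 1
format qdate %tq
tsset qdate
generate y = _n

* tin() takes date literals as they are
list qdate y if tin(2000q2, 2000q4)

* A plain expression needs the integer, so the literal goes through tq()
list qdate y if qdate >= tq(2000q4)

* tq() returns quarters elapsed since 1960q1
display tq(2000q4)    // 163
```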
1 like
Comment
Michael Anbar

Join Date: Aug 2014

Posts: 116
#104

04 Sep 2014, 13:54

I would also love to see Stata have a built-in driver for SQLite (http://en.wikipedia.org/wiki/Sqlite). This would be useful for institutions and organizations (like mine) that use SQLite for data storage and processing, but would like to interface with Stata directly. Whether or not this driver was implemented through ODBC probably wouldn't make much difference to the end user. I started thinking about this because SAS has the PROC SQL procedure, which allows you to use SQL syntax with a dataset and would be a great asset for Stata to have. PROC SQL is a slightly different issue than SQLite, but it would be convenient, at least for many of the people I work with, to have a quick interface in Stata for reading from SQLite databases. SQLite is a small C library, and many languages, e.g. Python, actually have the library built in, so I doubt it would add much in terms of disk space.
1 like
Comment
László Sándor

Join Date: Apr 2014

Posts: 120
#105

04 Sep 2014, 15:31

Originally posted by László

Other little things:

Multiple variables to absorb with -areg-.
Multiple variables to cluster by/on. (Which can be very slow without a neat C implementation.)
Detrending in -xtreg- or -areg-, i.e. actually allowing group-level trends/coefficients without blowing up -regress- with i.group##c.time. (There is a reason why -xtreg- and -areg- are orders of magnitude faster.)

Note that -reghdfe- on SSC seems to go a long way on the first and the last points. If it's still not as fast as (reasonably) possible, StataCorp should take this on and build the improved version in. If Sergio (Correia) came close to the efficiency frontier, all the better reason to incorporate this into version 14. Way too many processor cycles and PhD days are wasted on waiting for these models to be estimated. (Or they are just never attempted unless a referee is adamant about another robustness check.)

By the way, I am not sure I see the reason why -xtreg, fe- should be three times slower than -areg-, and even -areg- only about a third as fast as -_regress, absorb()- (see the timings below). Surely some flexibility is built into the more generic commands, but I don't think the extra parsing and eclass posting caused these speed differences (on 64 cores, so the more complex commands are not better parallelized). As panel methods are a major selling point of Stata, maybe -xtreg- and -areg- could be faster still, and offer multiple fixed effects. (And also multiple variables to cluster on.)

Code:

clear all
set obs 100000000
mata:
idx = st_addvar("double",("x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","x12","x13","x14","x15","x16","x17","x18","x19"),1)
V = J(0,0,.)
st_view(V,.,idx)
V[.,.] = runiform(100000000,19)
end
g long id = floor(_n/10)
g byte time = mod(_n,10)
timer on 1
_regress x1 x2-x19, absorb(id)
timer off 1
timer on 2
areg x1 x2-x19, absorb(id)
timer off 2
xtset id time
timer on 3
xtreg x1 x2-x19, fe
timer off 3
timer list
exit

Code:

. timer list
   1:    103.89 /        1 =     103.8850
   2:    316.13 /        1 =     316.1350
   3:   1116.49 /        1 =    1116.4890
1 like
Comment
