Wishlist for Stata 18

Jean-Michel Galarneau replied

12 Sep 2022, 14:33
Margins and marginsplot support for restricted cubic splines.
3 likes
Nobuaki Michihata replied

12 Sep 2022, 04:09
To be able to estimate subdistribution hazard ratios with more than two groups and to perform Gray's test with Stata.
Fahad Mirza replied

12 Sep 2022, 03:35
Wishful thinking here, but it would be nice if Stata 18 had at least 2-core support for the basic editions up to SE, and 4-core support for MP.

Datasets are getting large and some speed will be a bonus.
4 likes
wbuchanan replied

09 Sep 2022, 07:06
Hua Peng (StataCorp)
Thanks again as always for the insight. All that said, any chance for the buffer functions in Mata to make it possible to parse 8-byte unsigned integers?
Hua Peng (StataCorp) replied

08 Sep 2022, 16:54
ustrregexm(), ustrregexs(), and ustrregexra() are not going to work for what you intend here. All three functions are implemented using the ICU library, which works with well-formed text strings in UTF-16 encoding, not binary data. Hence the functions convert Stata/Mata strings from UTF-8 to UTF-16 before handing them to ICU, and the results are converted back to UTF-8. Any invalid Unicode sequence will be changed.

Code:

mapelem = ustrregexm(test, ".*(<map>.*</map>).*")

will not work consistently, depending on whether there is a line feed in test. The following modified regular expression should handle line feeds:

Code:

mapelem = ustrregexm(test, "(.|\n)*(<map>(.|\n)*</map>)(.|\n)*")

After that,

Code:

mapstr= ustrregexs(1)

will convert the byte sequence between <map> and </map> to UTF-16 and then convert it back to UTF-8, so any invalid Unicode sequences will be changed during the conversion; that is, you will most likely get back something different from the original bytes. Here I believe strpos() and substr() should work:

Code:

mata:
fh = fopen("auto.dta", "r")
test = fread(fh, 612)
// test = "ab<map>cd</map>ef"
p1 = strpos(test, "<map>")
p1
p2 = strpos(test, "</map>")
len = p2 - p1 - 5
len
start = p1 + 5
b = substr(test, start, len)
b
end
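The strpos()/substr() steps above can be wrapped in a small reusable function. This is only a sketch: bytes_between() is a hypothetical name, not a built-in, and it assumes the first occurrence of each tag is the one wanted. Because strpos(), strlen(), and substr() all operate on bytes, nothing is re-encoded along the way.

```stata
mata:
// Hypothetical helper: return the bytes strictly between the first
// occurrence of otag and the first occurrence of ctag in s.
// Returns "" if either tag is missing or ctag precedes the content.
string scalar bytes_between(string scalar s, string scalar otag,
                            string scalar ctag)
{
    real scalar p1, p2, start

    p1 = strpos(s, otag)               // byte position of the opening tag
    p2 = strpos(s, ctag)               // byte position of the closing tag
    if (p1 == 0 | p2 == 0) return("")
    start = p1 + strlen(otag)          // first byte after the opening tag
    if (p2 < start) return("")
    return(substr(s, start, p2 - start))
}

// Same toy string as in the comment above; this displays cd
bytes_between("ab<map>cd</map>ef", "<map>", "</map>")
end
```

With the file buffer from the code above, the call would be bytes_between(test, "<map>", "</map>").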
Last edited by Hua Peng (StataCorp); 08 Sep 2022, 17:12.
4 likes
wbuchanan replied

07 Sep 2022, 12:11
I'm unable to edit my previous comment, but there seem to be some inconsistencies in what -ustrregexm/s()- returns when using a different .dta file. I will continue trying to see if I can figure out what is wrong with the way I applied the logic of daniel klein's function.
wbuchanan replied

07 Sep 2022, 07:07
daniel klein, I just looked at the code you referenced and tried an experiment with it to see if it would recover the correct information, but it doesn't seem to parse 8-byte unsigned integers correctly (assuming I was interpreting things correctly). My approach was going to be to read in enough bytes initially to get the <map> element, and then use those 8-byte unsigned integers to quickly locate all of the other elements in the file header. When I used the same method you are using in your hexread function, the result was definitely not correct:

Code:

. mata
------------------------------------------------- mata (type end to exit) ------------------------
: fh = fopen("Sample.dta", "r")
: test = fread(fh, 612)
: mapelem = ustrregexm(test, ".*(<map>.*</map>).*")
: mapstr= ustrregexs(1)
: map = ustrregexra(mapstr, "</?map>", "")
: x = ascii(substr(map, 9, 16))
: y = inbase(16, x)
: for(i = 2; i <= cols(y); ++ i) y[1] = y[1] + substr("0" + y[1], -2)
: frombase(16, y[1])
  3.18931e+38    <- This is a really small file, so it seems unlikely that the second value in the map element would be at this byte position
: x = ascii(strreverse(substr(map, 9, 16)))
: y = inbase(16, x)
: for(i = 2; i <= cols(y); ++ i) y[1] = y[1] + substr("0" + y[1], -2)
: frombase(16, y[1])
  0    <- The second value in <map> should be the byte value where the <map> element is located

Regardless, if the point of the buffer functions in Mata is to allow the reading/writing of files, it seems like it would be reasonable that they be able to read/write the same types of values used in a .dta file.
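Until the buffer functions handle such values natively, one possible workaround is to decode the bytes arithmetically with ascii(). The sketch below is an illustration under stated assumptions, not a StataCorp routine: bytes2uint() is a hypothetical helper, the "LSF"/"MSF" labels follow the byte-order values stored in the .dta header, and the result is exact only for values below 2^53 because Mata holds reals as doubles. That limit should be ample for byte offsets within a file.

```stata
mata:
// Hypothetical helper: interpret the bytes of s as one unsigned integer.
// order is "LSF" (least significant byte first); anything else is
// treated as "MSF" (most significant byte first).
real scalar bytes2uint(string scalar s, string scalar order)
{
    real rowvector b
    real scalar    i, j, n, v

    b = ascii(s)                 // byte values, each in 0..255
    n = cols(b)
    v = 0
    for (i = 1; i <= n; i++) {
        // always consume the most significant remaining byte first
        j = (order == "LSF" ? n - i + 1 : i)
        v = v*256 + b[j]
    }
    return(v)
}

// Two-byte check: bytes 1 and 2 read LSF give 1 + 2*256 = 513
bytes2uint(char((1, 2)), "LSF")
end
```

For the session above, the second entry of the map would be read with something like bytes2uint(substr(map, 9, 8), "LSF"), assuming the file is stored LSF and that map was extracted with strpos() and substr() rather than the ustrregex functions, so its bytes are unaltered.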
1 like
daniel klein replied

06 Sep 2022, 09:18
Originally posted by wbuchanan

[email protected] correction, not possible to do in Mata, currently.

Probably, I misunderstand the request, or there is a bug in one of my routines. I have a clumsy way of reading the variable names from the dta file in my usesome command. I was under the impression that you could, in principle, read the other metadata fields as well.
wbuchanan replied

06 Sep 2022, 09:00
[email protected] correction, not possible to do in Mata, currently. That said, if StataCorp is able to allow the buffer functions in Mata to read/write 8-byte unsigned/signed integers it wouldn't be terribly difficult to do what you're asking in Mata and only read in the metadata while skipping the data and strls entirely.
Karen Strope (StataCorp) replied

06 Sep 2022, 07:10
Originally posted by Tom Dietz

This isn't a part of the code but a thought on policies. I am moving to emeriti status and so my university will no longer pay for Stata. With the end of perpetual licenses I will have to pay out of pocket for Stata in about two years. So after 30 years of using and teaching Stata almost exclusively I will reluctantly switch to R as I plan to continue doing research. I'm wondering if Stata could have a pricing policy of the sort used by many scientific societies--with a special rate for retirees.

Hi Tom! I wanted to reassure you that perpetual licenses are still available. You can upgrade your existing perpetual license online at https://www.stata.com/order. If you would like to purchase a new perpetual license, or if you have any questions, you can contact us at [email protected]. We are happy to go over licensing options, including licensing options for retirees, with you.
3 likes
Ali Atia replied

05 Sep 2022, 11:43
Originally posted by Federico Bindi

Introduce new data structures (such as lists and tuples). I know, there's Python for that, but still...

In many ways, Stata's macros replicate (or can be made to replicate) the functionality of lists -- see here for more: pmacrolists.pdf (stata.com).
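As a small illustration of that point, the macro list operators already cover the usual set-style operations. This is a minimal sketch; the macro names and contents are made up for the example.

```stata
* Two space-separated lists held in local macros
local a "x y z"
local b "y w"

* Elements common to both lists
local both : list a & b
* Elements of a that are not in b
local onlya : list a - b
* Union of the two lists
local all : list a | b
* Number of elements in a
local n : list sizeof a
* Position of z within a (0 if absent)
local pos : list posof "z" in a

display "common: `both'"
display "only in a: `onlya'"
display "union: `all'"
display "size of a: `n', position of z: `pos'"
```

Lists built this way can be looped over with foreach, which is how they typically stand in for a list type in do-files and ado-files.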
John Mullahy replied

05 Sep 2022, 07:22
re: #470

Code:

ssc describe tuples

While not a new data structure it may still prove useful.
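For readers who have not used it, a minimal sketch of how the command is typically called follows. It assumes the community-contributed tuples package installs from SSC and that it leaves its results in the local macros ntuples and tuple1, tuple2, and so on, as its help file describes; the names a, b, and c are placeholders.

```stata
* One-time install from SSC (community-contributed command)
ssc install tuples

* Build every non-empty combination of the three names
tuples a b c

* ntuples holds the number of combinations; tuple1, tuple2, ... hold them
forvalues i = 1/`ntuples' {
    display "`tuple`i''"
}
```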
Maarten Buis replied

05 Sep 2022, 04:56
Originally posted by Federico Bindi

Introduce new data structures (such as lists and tuples). I know, there's Python for that, but still...

Do you want those in Mata? I cannot imagine why you would want those in Stata (but that could also have something to do with my imagination...). If you present a use case for those data structures, then your request can become more convincing. Everybody can say they want something, but there is only a limited amount of resources.

Right now it sounds like a common problem that many people who migrate from language A to language B have: they miss some aspect of language A, but don't know yet that there is some other way that language B does things that makes that aspect unnecessary and/or even inefficient. This is not criticism of you; it is a common process we all go through at some point.
4 likes
Federico Bindi replied

05 Sep 2022, 00:25
Introduce new data structures (such as lists and tuples). I know, there's Python for that, but still...
1 like
Daniel Feenberg replied

03 Sep 2022, 15:25
No doubt Stata works better in interactive mode, but I don't.