  • Jean-Michel Galarneau
    replied
    Margins and marginsplot support for restricted cubic splines.


  • Nobuaki Michihata
    replied
    To be able to estimate subdistribution hazard ratios with more than two groups and to perform Gray's test with Stata.


  • Fahad Mirza
    replied
    Wishful thinking here, but it would be nice if Stata 18 had at least 2-core support for the basic versions up to SE, and 4-core support for MP.

    Datasets are getting large, and some extra speed would be a bonus.


  • wbuchanan
    replied
    Hua Peng (StataCorp)
    Thanks again as always for the insight. All that said, any chance for the buffer functions in Mata to make it possible to parse 8-byte unsigned integers?


  • Hua Peng (StataCorp)
    replied

    ustrregexm(), ustrregexs(), and ustrregexra() are not going to work for what you intend here. All three functions are implemented using the ICU library, which works with well-formed text strings in UTF-16 encoding, not with binary data. Hence the functions convert Stata/Mata strings from UTF-8 to UTF-16 before handing them to ICU, and the results are converted back to UTF-8. Any invalid Unicode sequence will be changed in the process.
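
    A minimal Mata sketch of that behavior, assuming Stata 14 or later, where char() returns a single byte and strlen() and substr() count bytes. Byte 255 never occurs in valid UTF-8, so s below is deliberately invalid. The byte-based substr() hands the bytes back untouched, whereas the ustrregexs() result has gone through the UTF-8 to UTF-16 round trip described above.

```stata
mata:
// s is not valid UTF-8: byte 255 never appears in UTF-8 text
s = "ab" + char(255) + "cd"
strlen(s)                   // length in bytes
// byte-based extraction: the bytes come back as they are
substr(s, 1, 5) == s
// ICU-based extraction: s is converted UTF-8 -> UTF-16 -> UTF-8 on the way
ustrregexm(s, "(.*)")
r = ustrregexs(1)
// compare the round-tripped string with the original
r == s
strlen(r)
end
```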


    Code:
    mapelem = ustrregexm(test, ".*(<map>.*</map>).*")
    will not work consistently, depending on whether test contains a line feed. The following modified regular expression should handle line feeds:

    Code:
    mapelem = ustrregexm(test, "(?:.|\n)*(<map>(?:.|\n)*</map>)(?:.|\n)*")
    After that,

    Code:
    mapstr= ustrregexs(1)
    will convert the byte sequence between <map> and </map> to UTF-16 and then convert it back to UTF-8, so any invalid Unicode sequence will be changed during the conversion; i.e., you will very likely get something back that is different. Here I believe strpos() and substr() should work:

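    Code:
    mata:
    // read the first 612 bytes of the file as a binary string
    fh = fopen("auto.dta", "r")
    test = fread(fh, 612)
    fclose(fh)
    // test = "ab<map>cd</map>ef"
    // byte positions of the opening and closing tags
    p1 = strpos(test, "<map>")
    p1
    p2 = strpos(test, "</map>")
    // the content starts right after "<map>", which is 5 bytes long
    len = p2 - p1 - 5
    len
    start = p1 + 5
    // substr() is byte-based, so no Unicode conversion takes place
    b = substr(test, start, len)
    b
    end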
    Last edited by Hua Peng (StataCorp); 08 Sep 2022, 17:12.
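
    The strpos()/substr() approach generalizes to other elements of the .dta header. A minimal sketch, assuming Stata 14 or later; between_tags() is a hypothetical name, not an existing Mata function. It searches for the closing tag only after the opening one and returns an empty string when either tag is missing.

```stata
mata:
// Hypothetical helper (not a built-in Mata function): return the bytes between
// <tag> and </tag> in the binary string s, using only the byte-based strpos()
// and substr(), so no Unicode conversion takes place. Returns "" if either tag
// is missing.
string scalar between_tags(string scalar s, string scalar tag)
{
    string scalar otag, ctag
    real scalar   p1, p2, start

    otag = "<" + tag + ">"
    ctag = "</" + tag + ">"
    p1 = strpos(s, otag)
    if (p1 == 0) return("")
    start = p1 + strlen(otag)
    // look for the closing tag only after the opening tag
    p2 = strpos(substr(s, start, .), ctag)
    if (p2 == 0) return("")
    return(substr(s, start, p2 - 1))
}

between_tags("ab<map>cd</map>ef", "map")
end
```

    On the commented-out test string from the post, this returns "cd".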


  • wbuchanan
    replied
    I'm unable to edit my previous comment, but there seem to be some inconsistencies in what -ustrregexm/s()- returns when using a different .dta file. I will continue trying to see if I can figure out what is wrong with the way I applied the logic of daniel klein's function.


  • wbuchanan
    replied
    daniel klein I just looked at the code you referenced and tried an experiment with it to see whether it would recover the correct information, but it doesn't seem to parse 8-byte unsigned integers correctly (assuming I interpreted things correctly). My approach was going to be to read in enough bytes initially to get the <map> element, and then use those 8-byte unsigned integers to quickly locate all of the other elements in the file header. When I used the same method you use in your hexread function, the result was definitely not correct:

    Code:
    . mata
    ------------------------------------------------- mata (type end to exit) ------------------------
    : fh = fopen("Sample.dta", "r")
    
    : test = fread(fh, 612)
    
    : mapelem = ustrregexm(test, ".*(<map>.*</map>).*")
    
    : mapstr= ustrregexs(1)
    
    : map = ustrregexra(mapstr, "</?map>", "")
    
    : x = ascii(substr(map, 9, 16))
    
    : y = inbase(16, x)
    
    : for(i = 2; i <= cols(y); ++ i) y[1] = y[1] + substr("0" + y[1], -2)
    
    : frombase(16, y[1])
      3.18931e+38 <- This is a really small file, so it seems unlikely that the second value in the map element would be at this byte position
    
    : x = ascii(strreverse(substr(map, 9, 16)))
    
    : y = inbase(16, x)
    
    : for(i = 2; i <= cols(y); ++ i) y[1] = y[1] + substr("0" + y[1], -2)
    
    : frombase(16, y[1])
      0 <- The second value in <map> should be the byte value where the <map> element is located
    Regardless, if the point of the buffer functions in Mata is to allow the reading/writing of files, it seems like it would be reasonable that they be able to read/write the same types of values used in a .dta file.
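
    One way to decode such values without the hex-string round trip is sketched below, assuming Stata 14 or later (where ascii() and substr() work on bytes). uint8_at() is a hypothetical helper, not part of Mata: it rebuilds the integer from the eight byte codes in the order given by the file's <byteorder> tag (LSF or MSF). Because Mata stores numbers as doubles, the result is exact only below 2^53, which comfortably covers file offsets. Note also that substr()'s third argument is a length, not an end position, so the second 8-byte entry of the map contents is substr(mapstr, 9, 8), not substr(mapstr, 9, 16).

```stata
mata:
// Hypothetical helper (not a built-in Mata function): decode the 8-byte
// unsigned integer that starts at 1-based byte position pos of the binary
// string s. lsf = 1 for files whose header has <byteorder>LSF</byteorder>
// (least significant byte first), lsf = 0 for MSF.
// The result is a double, so it is exact only for values below 2^53.
real scalar uint8_at(string scalar s, real scalar pos, real scalar lsf)
{
    real rowvector b
    real scalar    k, v

    b = ascii(substr(s, pos, 8))          // byte codes 0-255, no Unicode conversion
    if (cols(b) != 8) _error(3300, "fewer than 8 bytes at pos")
    v = 0
    if (lsf) {
        for (k = 8; k >= 1; k--) v = v * 256 + b[k]   // last byte is most significant
    }
    else {
        for (k = 1; k <= 8; k++) v = v * 256 + b[k]   // first byte is most significant
    }
    return(v)
}
end
```

    For example, with mapstr holding the raw bytes between <map> and </map> (extracted with strpos()/substr() rather than the ustrregex*() functions), uint8_at(mapstr, 9, 1) would return the second map entry of an LSF file.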


  • daniel klein
    replied
    Originally posted by wbuchanan
    [email protected] correction, not possible to do in Mata, currently.
    Probably, I misunderstand the request, or there is a bug in one of my routines. I have a clumsy way of reading the variable names from the dta file in my usesome command. I was under the impression that you could, in principle, read the other metadata fields as well.


  • wbuchanan
    replied
    [email protected] correction, not possible to do in Mata, currently. That said, if StataCorp is able to allow the buffer functions in Mata to read/write 8-byte unsigned/signed integers it wouldn't be terribly difficult to do what you're asking in Mata and only read in the metadata while skipping the data and strls entirely.
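
    A minimal sketch of the skipping part, assuming the byte offsets are already known (for example, decoded from the <map> element). read_section() is a hypothetical helper, not a built-in: it uses fseek() to jump to a 0-based offset and fread() to read only the requested bytes, so sections outside that range, such as the data and strLs, need not be read at all.

```stata
mata:
// Hypothetical helper (not a built-in Mata function): read the bytes of file
// fn from 0-based offset start up to, but not including, offset stop. fseek()
// jumps straight to start, so nothing before it is read.
string scalar read_section(string scalar fn, real scalar start, real scalar stop)
{
    real scalar   fh
    string matrix bytes    // fread() returns J(0,0,"") at end of file

    if (stop <= start) return("")
    fh = fopen(fn, "r")
    fseek(fh, start, -1)               // -1: offset measured from start of file
    bytes = fread(fh, stop - start)
    fclose(fh)
    return(rows(bytes) ? bytes : "")
}
end
```

    Given the offsets of two consecutive header elements, read_section("Sample.dta", off1, off2) would return only the bytes of the first of them.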


  • Karen Strope (StataCorp)
    replied
    Originally posted by Tom Dietz
    This isn't a part of the code but a thought on policies. I am moving to emeriti status and so my university will no longer pay for Stata. With the end of perpetual licenses I will have to pay out of pocket for Stata in about two years. So after 30 years of using and teaching Stata almost exclusively I will reluctantly switch to R as I plan to continue doing research. I'm wondering if Stata could have a pricing policy of the sort used by many scientific societies--with a special rate for retirees.
    Hi Tom! I wanted to reassure you that perpetual licenses are still available. You can upgrade your existing perpetual license online at https://www.stata.com/order. If you would like to purchase a new perpetual license, or if you have any questions, you can contact us at [email protected]. We are happy to go over licensing options, including licensing options for retirees, with you.


  • Ali Atia
    replied
    Originally posted by Federico Bindi
    Introduce new data structures (such as lists and tuples). I know, there's Python for that, but still...
    In many ways, Stata's macros replicate (or can be made to replicate) the functionality of lists -- see here for more: pmacrolists.pdf (stata.com).
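
    A few of the macro-list extended functions documented there, as a minimal sketch; the list contents are made up for illustration.

```stata
// treat local macros as lists of space-separated elements
local a "apple banana cherry"
local b "banana"
local n : list sizeof a              // number of elements in a
local diff : list a - b              // elements of a that are not in b
local both : list a | b              // union: a, then elements of b not in a
local pos : list posof "cherry" in a // 1-based position, 0 if absent
display "`n' | `diff' | `both' | `pos'"
```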


  • John Mullahy
    replied
    re: #470
    Code:
    ssc describe tuples
    While not a new data structure, it may still prove useful.


  • Maarten Buis
    replied
    Originally posted by Federico Bindi
    Introduce new data structures (such as lists and tuples). I know, there's Python for that, but still...
    Do you want those in Mata? I cannot imagine why you would want those in Stata (but that could also have something to do with my imagination...). If you present a use case for those data structures, your request will become more convincing. Everybody can say they want something, but there is only a limited amount of resources.

    Right now it sounds like a common problem that many people who migrate from language A to language B have: they miss some aspect of language A, but don't know yet that language B does things in another way that makes that aspect unnecessary and/or even inefficient. This is not criticism of you; it is a common process we all go through at some point.


  • Federico Bindi
    replied
    Introduce new data structures (such as lists and tuples). I know, there's Python for that, but still...


  • Daniel Feenberg
    replied
    No doubt Stata works better in interactive mode, but I don't.
