
  • Clyde Schechter
    replied
    Here's one I've been mulling over for a while. The way missing values work with logical expressions is often problematic. A missing value is treated as true, because any nonzero value counts as true. But in most contexts, a missing value on an expression or variable really means "could be true or false, we don't know." Adjusting simple logical expressions to reflect that quickly gets complicated. When calculating a conjunction (&), we would want 1 & . = ., 0 & . = 0, and . & . = . (missing). For disjunction, we would want 1 | . = 1, 0 | . = ., and . | . = . (missing).

    With existing Stata features you can work around this by recoding . to 0.5 and then using min(a, b) for a & b and max(a, b) for a | b. But if you have a lengthy logical expression with many operators, and perhaps parenthesized expressions nested within, this kind of translation becomes tedious and error-prone. Moreover, the resulting code is opaque.
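
    A minimal sketch of the 0.5 workaround described above. The toy data and the variable names (ta, tb, t_and, t_or, t_nota) are made up for illustration, and 1 - a for negation is an added assumption, not something stated in the post. The recoding step is needed because min() and max() ignore missing arguments.

    ```stata
    clear
    input a b
    1 .
    0 .
    . .
    1 1
    1 0
    0 0
    end

    * Working copies with missing recoded to 0.5 ("unknown")
    gen double ta = cond(missing(a), 0.5, a)
    gen double tb = cond(missing(b), 0.5, b)

    * Three-valued conjunction, disjunction, and negation
    gen double t_and  = min(ta, tb)
    gen double t_or   = max(ta, tb)
    gen double t_nota = 1 - ta

    * Map the "unknown" code 0.5 back to missing
    foreach v of varlist t_and t_or t_nota {
        replace `v' = . if `v' == 0.5
    }

    list a b t_and t_or t_nota, noobs
    ```

    For the first row (a = 1, b = .), this gives missing for the conjunction and 1 for the disjunction, which matches the truth table above.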

    Now, redefining the operation of & and | to do this would break lots of existing code, and would create chaos even if the old behavior were maintained under version control. But why not define new logical operators && and || that would behave this way? It would also be nice to have a negation operator under which the negation of . is . (missing). !! would not work as the analogous negation operator, because !! is itself a legitimate expression of double negation (and one that I find helpful and use fairly often), so one would have to find some other expression for it (or perhaps a function). && would never clash with anything else, as it is never legal syntax currently. || is "taken" as a separator between the fixed and random components of mixed models, but I think that it would seldom if ever create confusion, as that is a highly distinct context that the parser could recognize, though it might require some lookahead. (In linguistic terms, I think the two meanings of || would be in complementary distribution.)



  • Jim Steiner
    replied
    A few little things I'd like (or that may already exist without my knowing):

    1. Find-Replace in the do-file editor that tells you how many instances of the find term were replaced.
    2. An option to open multiple do-files from an explorer window in one editor window, rather than one Stata instance per do-file. I don't really want to have to create .stpr files for that purpose.
    3. An option to set do-file editor preferences (e.g. font/colors) permanently (this may already be an option; I just haven't seen it if it exists).
    4. An inlist() function that can hold more than 10 string arguments directly, rather than having to loop or use multiple "or" statements.
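
    Until something like that exists, one possible workaround is to loop over a list of values. This is only a sketch: it uses the auto dataset shipped with Stata, and the macro names makes1 and makes2 and the indicator pick are made up for illustration.

    ```stata
    sysuse auto, clear

    * Twelve string values, more than inlist() accepts for strings
    local makes1 `""AMC Concord" "AMC Pacer" "AMC Spirit" "Buick Century" "Buick Electra" "Buick LeSabre""'
    local makes2 `""Buick Opel" "Buick Regal" "Buick Riviera" "Buick Skylark" "Cad. Deville" "Cad. Eldorado""'

    * Flag observations whose make matches any value in either list
    gen byte pick = 0
    foreach m in `makes1' `makes2' {
        replace pick = 1 if make == `"`m'"'
    }

    count if pick
    ```

    The loop is exactly the kind of extra code the wish is meant to avoid, but it keeps the list of values in one place.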



  • Jean-Claude Arbaut
    replied
    I would have other ideas, but on the top of my wishlist:

    * Ability to call an external DLL from Mata. Might need additional types to help (byte, short...) in writing more or less the equivalent of a Declare in VBA.
    (if it's flexible enough, it would open many many manyyyyyyy other possibilities: call libraries for numerical computations and special functions, plotting, multiprecision, OS services, file I/O in other formats...), and of course add Mata functions that are not easy (or not fast enough) to implement in pure Mata code.
    * In the preceding, a way to pass directly Mata matrices (or even a dataset) back and forth would be very valuable.
    * Ability to call Mata functions (especially user-defined ones) in Stata, in some places where only Stata functions are currently available, for instance in gen/egen (maybe with the help of a generic function, e.g. callmata("functionname", arg1, ...)).
    * Ability to plot from Mata, especially to plot data from a vector/matrix, and to update a plot with subsequent Mata code.



  • Jesse Wursten
    replied
    Perhaps an arcane/unfeasible request... but is it possible to split the dofile editor and "main stata" processes? Imagine you are running some heavy regressions, you might still want to work on your dofile while it's running. On modern computers with multiple CPU cores, that's often feasible in theory, given that Stata rarely ever uses all cores to their maximum capacity. In practice, the dofile editor always seems to hang up or be very slow at least.

    I know you can edit your dofiles in separate programs, but in the end they are never as integrated with Stata as its own dofile editor, so I'd prefer to keep using it.



  • Dave Airey
    replied
    How about power and sample size for ROC AUC analysis, maybe similar to power.roc.test() from pROC in R?



  • Jesse Wursten
    replied
    I wish Stata would stop printing lines to the command window once it encounters an error within an if condition. Especially in larger code blocks, scrolling up to find where the error actually occurred gets tedious really quickly.

    Here's an example.

    Code
    Code:
    local this "example"
    if "`this'" == "example" {
        di "`value'"
        di "something else"
        error 413
        di "1. I don't want to see this line in the output"
        di "2. I don't want to see this line either"
        di "3. You get the idea by now"
    }
    Actual output
    Code:
    . if "`this'" == "example" {
    .         di "`value'"
    
    .         di "something else"
    something else
    .         error 413
    r(413);
    .         di "1. I don't want to see this line in the output"
    .         di "2. I don't want to see this line either"
    .         di "3. You get the idea by now"
    . }
    r(413);
    
    end of do-file
    r(413);
    What I would like
    Code:
    . if "`this'" == "example" {
    .         di "`value'"
    
    .         di "something else"
    something else
    .         error 413
    r(413);
    
    end of do-file
    
    r(413);

    Some context on when this becomes annoying. When I have a dofile that does three separate (but related) things, I enclose each block in an if-condition (e.g. if "$runA" == "1"). At the top of the dofile I can then set global runA to 1 or 0, depending on whether I want to run that part at this point. No hassle with commenting out parts, no issues with common macros not being defined yet, no remembering which lines do and do not need to be selected to get the thing to run. These blocks can get very long (hundreds of lines is not uncommon). Correspondingly, any small error, or even a manually added stop (error 1), means I have to scroll all the way up to see where it happened. Yet I've never in my life needed all those printed lines, because they are verbatim copies of my dofile anyway.
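
    For readers who have not seen this pattern, a minimal sketch of the run-switch layout described above. The global name runB and the part descriptions are hypothetical.

    ```stata
    * Switches at the top of the do-file: 1 = run, 0 = skip
    global runA 1
    global runB 0

    if "$runA" == "1" {
        display "Part A: data preparation"
        * ... commands for part A ...
    }

    if "$runB" == "1" {
        display "Part B: regressions"
        * ... commands for part B ...
    }
    ```

    With these settings only part A runs. An error anywhere inside a long block still echoes the rest of that block, which is the behavior the wish is about.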



  • George Hoffman
    replied
    Nick Cox , Mauricio Caceres :

    thank you for these suggestions.
    tabcount works for tab. for other commands, i'm going to play around with varparse. it's not a complete solution but it's a great start.
    thanks again.



  • Nick Cox
    replied
    George Hoffman Mauricio Caceres #134ff

    See also tabcount from a while back:

    SJ-3-4 pr0011 . . . . . . . . Speaking Stata: Problems with tables, Part II
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
    Q4/03 SJ 3(4):420--439 (no commands)
    reviews three user-written commands (tabcount, makematrix,
    and groups) as different approaches to tabulation problems

    tabcount from http://fmwww.bc.edu/RePEc/bocode/t
    'TABCOUNT': module to tabulate frequencies, with zeros explicit / tabcount
    tabulates frequencies for up to 7 variables. Its main / distinctive
    features are that zero frequencies of one or more / specified values or
    conditions are always shown in the table / (i.e. entirely empty rows,




    Code:
    clear
    input float(var1 var2)
    10 10
    20 20
    30 30
    40 40
    50 50
    60 60
    70 70
    80 80
    90 90
    10 90
    20 80
    30 70
    40 60
    50 50
    60 40
    30 70
    20 80
    10 90
    end
    
    tabcount var1 var2, c1(<=60 >60) c2(<=70 >70)
    
    ----------------------
              |    var2  
         var1 | <=70   >70
    ----------+-----------
         <=60 |   11     4
          >60 |    1     2
    ----------------------



  • Mauricio Caceres
    replied
    Originally posted by George Hoffman
    Nick - no, not that. i was not precise enough in my description of what I was thinking!
    i envision a way to generate the indicator variables on the fly.
    more generally, the temporary variable need not be an indicator variable.
    the syntax engine would evaluate expressions and create a temporary variable from the expression.

    example:

    Code:
     reg y x1 x2 {x3<=5} {1/x4}
    thanks

    this would regress y against x1, x2, an indicator for x3<=5, and the value of 1/x4
    A basic version of this is not too difficult to implement via a separate command, though it would certainly be nice to have such a thing built in, with everything labeled by the expression that generated the on-the-fly variable rather than by a temporary variable name. This requires Stata 14+.

    Code:
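    program varparse
        // Usage: varparse: command, with expressions in curly braces
        _on_colon_parse `0'
        local 1 `s(after)'
        local ix 0
        // Swap each braced expression, left to right, for a temporary variable
        qui while ustrregexm(`"`1'"', "\{(.+?)\}") {
            tempvar v`++ix'
            local g`ix' = ustrregexs(1)
            gen `v`ix'' = `g`ix''
            // Label the variable with the expression that generated it
            label var `v`ix'' `"`=ustrregexs(1)'"'
            local 1 = ustrregexrf(`"`1'"', "\{(.+?)\}", `"`v`ix''"')
        }
        // Run the rewritten command
        `1'
    end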
    
    clear
    set seed 1729
    set obs 100
    gen x1 = runiform()
    gen x2 = rnormal()
    gen x3 = runiform() * 10
    gen x4 = rnormal()
    gen y  = 1 + x1 - x2 + 2 * (x3 <= 5) - 3 / x4 + rnormal() * 2
    gen var1 = int(100 * runiform())
    gen var2 = int(100 * runiform())
    
    varparse: tab {var1<=60} {var2<=70}
    varparse: reg y x1 x2 {x3<=5} {1/x4}
    This gives what you want, I think. However, while tab uses the variable label, reg does not, and I am not sure how to make that happen. (Perhaps swapping the variable names for the labels in regress can be added to the wishlist? It's not obvious how to do it in esttab etc. either, since the variables no longer exist in memory.)



  • Tom Poulton
    replied
    My wish for the next Stata update: I would really appreciate the ability to add headers and footers to documents produced using putpdf.



  • George Hoffman
    replied
    Nick - no, not that. i was not precise enough in my description of what I was thinking!
    i am looking for on-the-fly variable creation.
    my "var1<=60" evaluates to 0 or 1 depending on the value of var1. likewise for var2. thus the tab statement that i envisioned would yield a 2x2 table.
    in the code below, i generated two indicator variables var160 and var270 to demonstrate the desired effect. this particular example takes 3 lines of code to tabulate, but more complex conditions would require more.

    Code:
     input var1 var2
    
              var1       var2
      1. 10 10
      2. 20 20
      3. 30 30
      4. 40 40
      5. 50 50
      6. 60 60
      7. 70 70
      8. 80 80
      9. 90 90
     10. 10 90
     11. 20 80
     12. 30 70
     13. 40 60
     14. 50 50
     15. 60 40
     16. 30 70
     17. 20 80
     18. 10 90
     19. end
    
    . tab var1 var2 if var1<=60 & var2<=70
    
               |                                     var2
          var1 |        10         20         30         40         50         60         70 |     Total
    -----------+-----------------------------------------------------------------------------+----------
            10 |         1          0          0          0          0          0          0 |         1
            20 |         0          1          0          0          0          0          0 |         1
            30 |         0          0          1          0          0          0          2 |         3
            40 |         0          0          0          1          0          1          0 |         2
            50 |         0          0          0          0          2          0          0 |         2
            60 |         0          0          0          1          0          1          0 |         2
    -----------+-----------------------------------------------------------------------------+----------
         Total |         1          1          1          2          2          2          2 |        11
    
    
    . gen var160 = var1<=60
    
    . gen var270 = var2<=70
    
    
    . tab var160 var270
    
               |        var270
        var160 |         0          1 |     Total
    -----------+----------------------+----------
             0 |         2          1 |         3
             1 |         4         11 |        15
    -----------+----------------------+----------
         Total |         6         12 |        18
    
    .
    i envision a way to generate the indicator variables on the fly.
    more generally, the temporary variable need not be an indicator variable.
    the syntax engine would evaluate expressions and create a temporary variable from the expression.

    example:

    Code:
     reg y x1 x2 {x3<=5} {1/x4}
    thanks

    this would regress y against x1, x2, an indicator for x3<=5, and the value of 1/x4
    Last edited by George Hoffman; 08 Oct 2018, 06:21. Reason: correct typo



  • Nick Cox
    replied
    George Hoffman You can do this with

    Code:
    tab var1 var2 if var1<=60 & var2<=70
    but I think you already know that. So, you're asking for more concise syntax.



  • George Hoffman
    replied
    i'm wondering if there is any work on, or interest in, using on-the-fly expressions as temporary arguments to commands.
    for example, to do a 2x2 crosstab of continuous variable var1 cut at 60 and var2 cut at 70:
    tab var1<=60 var2<=70

    i think there would have to be an expression delimiter like {var1<=60} to signify generation of a temporary variable, to distinguish from a logical statement.

    maybe there is a way to do this currently on a single command line; if so, i apologize in advance!
    thanks for considering
    george



  • Sergio Correia
    replied
    Originally posted by Mike Zyphur
    Loops should be run in parallel whenever possible by default, and GPUs should be automatically recruited for this and other processes.
    I agree, but a few caveats:

    1) Mata's compiler is not as smart as a C++ compiler (for obvious reasons), so you might benefit from some of these Mata speedup tips: http://scorreia.com/blog/2016/10/06/mata-tips.html
    2) The low hanging fruit for Mata speedups would be adding integer types within Mata (so we can use them within loops).
    3) Parallelizing operations with either CPU or GPU is a good goal, but incredibly hard and easy to do incorrectly. Perhaps looking at the approach of other languages might be useful (Julia? or is their approach too JIT to be useful?). In other words, I don't think you will get #3 for a while.
    4) Have you tried the parallel package? It is multiprocessing rather than multithreading, but it might help if you have many cores and CPU-heavy operations that don't require much back-and-forth memory sharing.






  • Mike Zyphur
    replied
    Hi,
    After a few years of using Stata, this is my first post to the Stata list. I am running 15.1 and would like to request something for Stata 16 that is consistent with a recurring theme here and my experiences whenever computations get serious (particularly with big datasets, which are becoming the norm in many areas of research):

    Faster commands and operations, and better parallelization. Ironically, to set up my subscription to this list requesting greater optimization, I had to interrupt a .do file that had been running for a few hours and was scheduled to run for a few more. A key reason why the .do file and the Mata code on which it is based run so slowly is that Stata does not automatically detect when loops produce results that are independent and therefore can be parallelized. This seems like a very simple thing to automatically detect. I recognize there's a user-written protocol for this, but it's not optimal and does not work in many circumstances.

    For Stata to be competitive it needs to be optimized, including its looping functions for multiple processing cores. What would be best, of course, is if this were done while taking advantage of the GPU capabilities of most modern computers. Serious computing is being parallelized in this fashion to take advantage of the thousands of processing cores that many workstations now have. Stata should consider how best to keep pace with this so that it can stay competitive.

    I can see the appeal of Stata and I appreciate many of its features and functions, but in the long run it will only stay competitive and practically usable if it is optimized in a serious way. Loops should be run in parallel whenever possible by default, and GPUs should be automatically recruited for this and other processes.

    Thanks!
    Mike

