  • Belinda Foster
    started a topic Wishlist for Stata 16

    Wishlist for Stata 16

    As per Nick Cox's "request"!

  • Dave Airey
    replied
    It would be nice to add tests for comparing Pearson correlations:

    https://journals.plos.org/plosone/ar...l.pone.0121945

    and Spearman correlations:

    https://www.omicsonline.org/open-acc....php?aid=54592
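
    In the meantime, the simplest case (two Pearson correlations from independent samples) can be computed by hand with a Fisher z-test. A minimal sketch, in which r1, n1, r2, and n2 are made-up values for illustration only; it does not cover dependent correlations or the Spearman case:

    ```stata
    * Sketch: Fisher z-test for two independent Pearson correlations.
    * r1, n1, r2, n2 are hypothetical inputs, not results from any dataset.
    scalar r1 = 0.50
    scalar n1 = 100
    scalar r2 = 0.30
    scalar n2 = 120
    * Fisher transformation of each correlation, then a standard normal test
    scalar z = (atanh(r1) - atanh(r2)) / sqrt(1/(n1 - 3) + 1/(n2 - 3))
    scalar p = 2 * normal(-abs(z))
    display "z = " %6.3f z "   two-sided p = " %6.4f p
    ```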


  • daniel klein
    replied
    Originally posted by wbuchanan
    [...] I remember either for Stata 14 or 15 there was a counter up on the Stata homepage for a few weeks prior to the release.
    Yes, and then the countdown stopped and the page would not update for a day or so ... I prefer the old-fashioned announcement (traditionally on Statalist): Stata 16 is shipping now.


  • wbuchanan
    replied
    Is anyone else wondering when the release date for Stata 16 will be announced? I remember either for Stata 14 or 15 there was a counter up on the Stata homepage for a few weeks prior to the release.


  • Bruce Weaver
    replied
    Please consider tweaking -ranksum- to make it report (at least optionally) the Mann-Whitney U statistic and Wilcoxon's W (as some authors call it). As noted in this thread, -ranksum- currently reports neither. Thanks.
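
    Until then, both statistics can probably be recovered from what -ranksum- leaves behind in r(). A sketch, assuming the auto dataset and the stored results r(sum_obs), r(N_1), and r(N_2); W is taken here to be the rank sum of the first group, and conventions differ on which group's W and which U to report:

    ```stata
    * Sketch: Wilcoxon's W and the Mann-Whitney U from -ranksum- stored results.
    sysuse auto, clear
    quietly ranksum mpg, by(foreign)
    * observed rank sum of the first group
    scalar W_obs = r(sum_obs)
    * U for the first group, and its complement for the second group
    scalar U1 = W_obs - r(N_1)*(r(N_1) + 1)/2
    scalar U2 = r(N_1)*r(N_2) - U1
    display "W = " W_obs "   U = " min(U1, U2)
    ```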


  • Rich Goldstein
    replied
    I agree with daniel klein and have previously discussed this with Stata personnel as noted in #11 in https://www.statalist.org/forums/for...ple-imputation


  • daniel klein
    replied
    Probably too late for 16, but could we please have a persist option in mi impute chained, similar to the one in community-contributed ice (Royston; SSC or SJ)?

    The problem: Multiple imputation via chained equations often fails because one of the models, usually mlogit, fails to converge. If this happens on observed data or if it happens on each iteration, I do not mind Stata stopping with an error; probably there is something wrong with my model. However, it is terribly annoying to have your machine running for a day, only to find that mlogit did not converge in iteration 7 on m=42. The model converged 410 times before (10 iterations * 41 datasets, not counting the runs on observed data); chances are it will converge in iteration 8 on m=42. So, I really want to be able to tell Stata to just skip this one iteration for the respective variable, not terminate the complete process.

    Ideally, I want a model-specific option, like

    Code:
    mi impute chained ... (mlogit, skipnonconvergence(#)) ...
    that specifies the maximum number of iterations per imputed dataset that I am willing to skip if the model does not converge. This seems far less dangerous than giving us the already existing force option that just happily accepts missing imputed values.

    Best
    Daniel
    Last edited by daniel klein; 12 Jun 2019, 02:51. Reason: formatting of option names


  • Clyde Schechter
    replied
    Thank you, that is very useful.


  • Bjarte Aagnes
    replied
    #242: While waiting for the stripnonprintable()

    ustrregexra() can be used to strip off "non-printable" characters using Unicode categories:
    Code:
    scalar S2 = ustrregexra(S1,"[^\p{L}\p{M}\p{N}\p{P}\p{S}\p{Zs}]","")
    If only single U+0020 SPACE characters are wanted, the remaining space-separator characters can be replaced, and runs of consecutive blanks collapsed, with:
    Code:
    scalar S2 = itrim(ustrregexra(S2,"\p{Zs}",ustrunescape("\u0020")))
    Code:
    \p{L}  or \p{Letter}: any kind of letter from any language
    \p{M}  or \p{Mark}  : a character intended to be combined with another character
    \p{N}  or \p{Number}: any kind of numeric character in any script.
    \p{P}  or \p{Punctuation}: any kind of punctuation character.
    \p{S}  or \p{Symbol}: math symbols, currency signs, dingbats, box-drawing characters, etc.
    \p{Zs} or \p{Space_Separator}: a whitespace character that is invisible, but does take up space.
    List of Unicode characters of category “Space Separator”: https://www.compart.com/en/unicode/category/Zs
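
    The same expression works on a string variable, for example to clean a key before -merge-. A minimal sketch: the variable names id_str and id_clean and the two toy observations are made up for illustration, and the second observation gets a trailing tab to stand in for a non-printing character:

    ```stata
    * Sketch: strip non-printing characters from a string variable.
    * id_str and id_clean are hypothetical names; the data are toy values.
    clear
    set obs 2
    generate str10 id_str = "A1"
    * append a tab (a control character) to the second observation
    replace id_str = id_str + char(9) in 2
    generate id_clean = ustrregexra(id_str, "[^\p{L}\p{M}\p{N}\p{P}\p{S}\p{Zs}]", "")
    * count the observations that contained characters that were stripped
    count if id_clean != id_str
    ```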


  • Clyde Schechter
    replied
    Problems frequently arise from non-printing characters in strings. By definition, they can't be seen by the user, but Stata sees them and takes them seriously. They often crop up in string variables in data sets that have been imported from various sources. And they can create havoc when you are trying to merge two data sets from different sources with different kinds of non-printing characters contaminating them. Can we have a string function that eliminates all non-printing characters from a string? Perhaps it could even be built out from the existing -egen, sieve()- function in the egenmore package as a new class of characters "printable", but updated to cope with Unicode.


  • John Mullahy
    replied
    Could the capabilities of cformat be extended to other types of displayed output? I'm thinking specifically of correlation matrixes from correlate, but I suspect there are others as well where such functionality might be useful.
    Code:
    . set cformat
    
    . reg y x
    
          Source |       SS           df       MS      Number of obs   =     1,000
    -------------+----------------------------------   F(1, 998)       =    288.46
           Model |  437.799556         1  437.799556   Prob > F        =    0.0000
        Residual |   1514.6571       998  1.51769249   R-squared       =    0.2242
    -------------+----------------------------------   Adj R-squared   =    0.2235
           Total |  1952.45666       999  1.95441107   Root MSE        =    1.2319
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |   .4625608   .0272347    16.98    0.00     .4091169    .5160047
           _cons |   .0619798   .0389869     1.59    0.11    -.0145258    .1384855
    ------------------------------------------------------------------------------
    
    . corr y x
    (obs=1,000)
    
                 |        y        x
    -------------+------------------
               y |   1.0000
               x |   0.4735   1.0000
    
    
    . set cformat %5.2f
    
    . reg y x
    
          Source |       SS           df       MS      Number of obs   =     1,000
    -------------+----------------------------------   F(1, 998)       =    288.46
           Model |  437.799556         1  437.799556   Prob > F        =    0.0000
        Residual |   1514.6571       998  1.51769249   R-squared       =    0.2242
    -------------+----------------------------------   Adj R-squared   =    0.2235
           Total |  1952.45666       999  1.95441107   Root MSE        =    1.2319
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |       0.46       0.03    16.98    0.00         0.41        0.52
           _cons |       0.06       0.04     1.59    0.11        -0.01        0.14
    ------------------------------------------------------------------------------
    
    . corr y x
    (obs=1,000)
    
                 |        y        x
    -------------+------------------
               y |   1.0000
               x |   0.4735   1.0000
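
    As a stopgap, the correlation matrix that -correlate- stores in r(C) can be listed with an explicit format. A sketch, assuming the auto dataset rather than the y and x shown above; this is a workaround, not the cformat behavior being requested:

    ```stata
    * Sketch: display a correlation matrix with a chosen numeric format.
    sysuse auto, clear
    quietly correlate price mpg
    * copy the stored correlation matrix, then list it with two decimals
    matrix C = r(C)
    matrix list C, format(%5.2f)
    ```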


  • George Hoffman
    replied
    dear Leonardo and Clyde -
    thank you for your responses.
    @ leonardo - yes, the information is available in other ways already. the properties window does display most of the fields that I referenced. i'm not sure how most people use Stata, but the more windows I have open, the less room I have for the results pane, which is where I'm usually focused.
    @ clyde - yes, i acknowledge a bad habit. others have asked for version control to be built into the save function. i am aware of some user-written options. i will investigate.
    i am a long-time user of this fantastic program and usergroup. i appreciate the help. thank you all again.


  • Clyde Schechter
    replied
    Re #237, with respect to the problem that would have ensued after -save, replace-, I would argue that it is just bad programming practice to ever overwrite a source data file with a derived data file. Even with all the information you ask for in the status bar, there is always the possibility that the code taking you from start to end contains errors, errors that don't happen to show up in the information shown in the status bar. To be prepared for that possibility, whenever you transform a data set you should save it as a new data set under a new name. Never overwrite the data you started with, and always save the do-file. If you do that, if an error is discovered later, you can always fix the error and re-run.
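
    In a do-file, that habit might look like the following sketch. It assumes the auto dataset as the source, and auto_nomiss is a hypothetical name for the derived file; the replace option here only ever overwrites the derived file, never the source:

    ```stata
    * Sketch: leave the source data untouched and save results under a new name.
    * "auto_nomiss" is a made-up file name for the derived dataset.
    sysuse auto, clear
    drop if missing(rep78)
    save auto_nomiss, replace
    ```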


  • Leonardo Guizzetti
    replied
    Originally posted by George Hoffman
    a 'simple' suggestion: the bottom of the Stata main window should/could act as a more versatile status bar. it already shows the results of `pwd'. most typically, it could show number of obs (_N), the number of variables, memory consumption, sort order, last _rc, it could also indicate if data had changed since last `use'. perhaps, the user could select what could be displayed in the status bar....

    this suggestion arose because i spent the last two days working with a dataset that had some observations dropped (because of an errant .ado that i was building). i'm not sure how many times that might have happened previously - but i came very close to 'save, replace' as i usually do, which would have led to a very bad situation. perhaps, if the obs and var count were readily visible, i would notice the dataset status without explicit query.
    thank you for considering
    george hoffman
    What's wrong with how Stata already displays this information? For example, last _rc code is displayed next to its command in the cmdlog, and the viewer pane displays _N, memory usage (for that dataset only, not active operations), number of variables, etc. The only thing I don't think it automatically shows is sort order, but that is found quickly enough by using a -describe- command.
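
    For a quick check from the command line or at the end of a do-file, most of these fields can also be printed in one go. A sketch, assuming the auto dataset; it uses the c-class values c(N), c(k), and c(changed), the sortedby extended macro function, and _rc:

    ```stata
    * Sketch: print the dataset status that the proposed status bar would show.
    sysuse auto, clear
    * variables by which the data are currently sorted
    local sortvars : sortedby
    display "obs = " c(N) ", vars = " c(k) ", changed = " c(changed) ", sorted by: `sortvars'"
    display "last return code = " _rc
    ```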


  • George Hoffman
    replied
    a 'simple' suggestion: the bottom of the Stata main window should/could act as a more versatile status bar. it already shows the results of `pwd'. most typically, it could show number of obs (_N), the number of variables, memory consumption, sort order, last _rc, it could also indicate if data had changed since last `use'. perhaps, the user could select what could be displayed in the status bar....

    this suggestion arose because i spent the last two days working with a dataset that had some observations dropped (because of an errant .ado that i was building). i'm not sure how many times that might have happened previously - but i came very close to 'save, replace' as i usually do, which would have led to a very bad situation. perhaps, if the obs and var count were readily visible, i would notice the dataset status without explicit query.
    thank you for considering
    george hoffman
    Last edited by George Hoffman; 12 May 2019, 11:23.
