Wishlist for Stata 16

John Mullahy

Join Date: Dec 2016
Posts: 750

#241

15 May 2019, 12:56

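Could the capabilities of cformat be extended to other types of displayed output? I'm thinking specifically of correlation matrices from correlate, but I suspect there are other commands where such functionality would be useful. In the log below, set cformat %5.2f changes the coefficient table from regress but leaves the output of correlate untouched.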

Code:

. set cformat

. reg y x

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(1, 998)       =    288.46
       Model |  437.799556         1  437.799556   Prob > F        =    0.0000
    Residual |   1514.6571       998  1.51769249   R-squared       =    0.2242
-------------+----------------------------------   Adj R-squared   =    0.2235
       Total |  1952.45666       999  1.95441107   Root MSE        =    1.2319

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .4625608   .0272347    16.98    0.00     .4091169    .5160047
       _cons |   .0619798   .0389869     1.59    0.11    -.0145258    .1384855
------------------------------------------------------------------------------

. corr y x
(obs=1,000)

             |        y        x
-------------+------------------
           y |   1.0000
           x |   0.4735   1.0000


. set cformat %5.2f

. reg y x

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(1, 998)       =    288.46
       Model |  437.799556         1  437.799556   Prob > F        =    0.0000
    Residual |   1514.6571       998  1.51769249   R-squared       =    0.2242
-------------+----------------------------------   Adj R-squared   =    0.2235
       Total |  1952.45666       999  1.95441107   Root MSE        =    1.2319

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |       0.46       0.03    16.98    0.00         0.41        0.52
       _cons |       0.06       0.04     1.59    0.11        -0.01        0.14
------------------------------------------------------------------------------

. corr y x
(obs=1,000)

             |        y        x
-------------+------------------
           y |   1.0000
           x |   0.4735   1.0000
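
In the meantime, one possible workaround is to pick up the matrix that correlate leaves behind in r(C) and list it with an explicit display format. This is only a sketch: price and mpg from the auto dataset shipped with Stata stand in for y and x, and the matrix name C is arbitrary.

```stata
* Workaround sketch: format the correlation matrix by hand.
* price and mpg (shipped auto data) stand in for y and x above.
sysuse auto, clear
quietly correlate price mpg

* correlate stores the correlation matrix in r(C)
matrix C = r(C)

* list the matrix with an explicit display format
matrix list C, format(%5.2f)
```

This does not make correlate itself honor cformat; it only reformats a copy of the stored results.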


Clyde Schechter

Join Date: Apr 2014

Posts: 30078
#242

08 Jun 2019, 16:46

Problems frequently arise from non-printing characters in strings. By definition, the user can't see them, but Stata sees them and takes them seriously. They often crop up in string variables in data sets imported from various sources, and they can create havoc when you try to merge two data sets from different sources that are contaminated with different kinds of non-printing characters. Can we have a string function that eliminates all non-printing characters from a string? Perhaps it could even be built out from the existing -egen, sieve()- function in the egenmore package as a new class of characters, "printable", but updated to cope with Unicode.

Bjarte Aagnes

Join Date: Apr 2014
Posts: 783

#243

09 Jun 2019, 14:42

#242: While waiting for the stripnonprintable()

ustrregexra() can be used to strip off "non-printable" characters using Unicode categories:

Code:

scalar S2 = ustrregexra(S1,"[^\p{L}\p{M}\p{N}\p{P}\p{S}\p{Zs}]","")

If only single U+0020 SPACE characters are wanted, the remaining whitespace characters can be replaced and runs of blanks collapsed with:

Code:

scalar S2 = itrim(ustrregexra(S2,"\p{Zs}",ustrunescape("\u0020")))

Code:

\p{L}  or \p{Letter}: any kind of letter from any language
\p{M}  or \p{Mark}  : a character intended to be combined with another character
\p{N}  or \p{Number}: any kind of numeric character in any script.
\p{P}  or \p{Punctuation}: any kind of punctuation character.
\p{S}  or \p{Symbol}: math symbols, currency signs, dingbats, box-drawing characters, etc.
\p{Zs} or \p{Space_Separator}: a whitespace character that is invisible, but does take up space.

List of Unicode characters of category “Space Separator”: https://www.compart.com/en/unicode/category/Zs
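
The same expression can be applied to a whole string variable, for instance before a merge. A minimal sketch follows; the variable names raw and clean and the test value are made up for illustration. The value contains a tab (char(9)) and a zero-width space (uchar(8203)), neither of which falls in the categories kept by the pattern.

```stata
* Sketch: strip non-printing characters from a string variable.
* raw and clean are hypothetical variable names.
clear
set obs 1

* build a value containing a tab and a zero-width space (U+200B)
generate raw = "ab" + char(9) + "cd" + uchar(8203)

* keep only letters, marks, numbers, punctuation, symbols and space separators
generate clean = ustrregexra(raw, "[^\p{L}\p{M}\p{N}\p{P}\p{S}\p{Zs}]", "")

* compare lengths in Unicode characters before and after
display ustrlen(raw[1]) " characters before, " ustrlen(clean[1]) " after"
```

Comparing ustrlen() before and after is a quick way to flag which observations were contaminated.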


Clyde Schechter

Join Date: Apr 2014

Posts: 30078
#244

09 Jun 2019, 14:50

Thank you, that is very useful.

daniel klein

Join Date: Mar 2014

Posts: 3844
#245

12 Jun 2019, 02:47

Probably too late for 16, but could we please have a persist option in mi impute chained, similar to the one in community-contributed ice (Royston; SSC or SJ)?

The problem: Multiple imputation via chained equations often fails because one of the models, usually mlogit, fails to converge. If this happens on observed data or if it happens on each iteration, I do not mind Stata stopping with an error; probably there is something wrong with my model. However, it is terribly annoying to have your machine running for a day, only to find that mlogit did not converge in iteration 7 on m=42. The model converged 410 times before (10 iterations * 41 datasets, not counting the runs on observed data); chances are it will converge in iteration 8 on m=42. So, I really want to be able to tell Stata to just skip this one iteration for the respective variable, not terminate the complete process.

Ideally, I want a model-specific option, like

Code:

mi impute chained ... (mlogit, skipnonconvergence(#)) ...

that specifies the maximum number of iterations per imputed dataset that I am willing to skip if the model does not converge. This seems far less dangerous than giving us the already existing force option that just happily accepts missing imputed values.

Best
Daniel

Last edited by daniel klein; 12 Jun 2019, 02:51. Reason: formatting of option names

Rich Goldstein

Join Date: Mar 2014

Posts: 4459
#246

12 Jun 2019, 02:57

I agree with daniel klein and have previously discussed this with Stata personnel as noted in #11 in https://www.statalist.org/forums/for...ple-imputation

Bruce Weaver

Join Date: May 2014

Posts: 1130
#247

14 Jun 2019, 12:21

Please consider tweaking -ranksum- to make it report (at least optionally) the Mann-Whitney U statistic and Wilcoxon's W (as some authors call it). As noted in this thread, -ranksum- currently reports neither. Thanks.
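
For what it's worth, both statistics can be recovered by hand from the results that -ranksum- stores. The sketch below assumes r(sum_obs) is the observed rank sum for the first group, as documented under the command's stored results; the scalar names and the choice of the auto data are illustrative only.

```stata
* Sketch: recover Wilcoxon's W and the Mann-Whitney U from ranksum's r() results.
sysuse auto, clear
quietly ranksum mpg, by(foreign)

* W: observed rank sum of the first group
scalar W  = r(sum_obs)
scalar n1 = r(N_1)
scalar n2 = r(N_2)

* U for the first group, and its complement for the second group
scalar U1 = W - n1*(n1 + 1)/2
scalar U2 = n1*n2 - U1

display "W = " W "   U1 = " U1 "   U2 = " U2
```

Authors differ on whether "U" means U1, U2, or the smaller of the two, which is one more reason an option in -ranksum- itself would help.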

--
Bruce Weaver
Version: Stata/MP 18.5 (Windows)

wbuchanan

Join Date: Mar 2014

Posts: 1361
#248

17 Jun 2019, 07:46

Is anyone else wondering when the release date for Stata 16 will be announced? I remember either for Stata 14 or 15 there was a counter up on the Stata homepage for a few weeks prior to the release.

daniel klein

Join Date: Mar 2014

Posts: 3844
#249

17 Jun 2019, 09:04

Originally posted by wbuchanan

[...] I remember either for Stata 14 or 15 there was a counter up on the Stata homepage for a few weeks prior to the release.

Yes, and then the countdown stopped and the page would not update for a day or so ... I prefer the old-fashioned announcement (traditionally on Statalist): Stata 16 is shipping now.

Dave Airey

Join Date: Apr 2014

Posts: 396
#250

17 Jun 2019, 12:57

It would be nice to add tests for comparing Pearson correlations:

https://journals.plos.org/plosone/ar...l.pone.0121945

and Spearman correlations:

https://www.omicsonline.org/open-acc....php?aid=54592
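
As an illustration of the kind of test meant here, the simplest case (two Pearson correlations from independent groups) can be done by hand with Fisher's z transformation. This is only a sketch of that one classic test, not necessarily the procedure the linked papers recommend, and it does not cover dependent correlations. The auto data and the scalar names are placeholders.

```stata
* Sketch: Fisher z test for two Pearson correlations from independent groups.
sysuse auto, clear

* correlation of price and mpg among domestic cars
quietly correlate price mpg if foreign == 0
scalar r1 = r(rho)
scalar n1 = r(N)

* correlation of price and mpg among foreign cars
quietly correlate price mpg if foreign == 1
scalar r2 = r(rho)
scalar n2 = r(N)

* difference of Fisher-transformed correlations over its standard error
scalar z = (atanh(r1) - atanh(r2)) / sqrt(1/(n1 - 3) + 1/(n2 - 3))
scalar p = 2*normal(-abs(z))

display "r1 = " %6.4f r1 "  r2 = " %6.4f r2 "  z = " %6.3f z "  p = " %6.4f p
```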