
  • Could the capabilities of cformat be extended to other types of displayed output? I'm thinking specifically of correlation matrices from correlate, but I suspect there are other commands where such functionality would be useful.
    Code:
    . set cformat
    
    . reg y x
    
          Source |       SS           df       MS      Number of obs   =     1,000
    -------------+----------------------------------   F(1, 998)       =    288.46
           Model |  437.799556         1  437.799556   Prob > F        =    0.0000
        Residual |   1514.6571       998  1.51769249   R-squared       =    0.2242
    -------------+----------------------------------   Adj R-squared   =    0.2235
           Total |  1952.45666       999  1.95441107   Root MSE        =    1.2319
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |   .4625608   .0272347    16.98    0.00     .4091169    .5160047
           _cons |   .0619798   .0389869     1.59    0.11    -.0145258    .1384855
    ------------------------------------------------------------------------------
    
    . corr y x
    (obs=1,000)
    
                 |        y        x
    -------------+------------------
               y |   1.0000
               x |   0.4735   1.0000
    
    
    . set cformat %5.2f
    
    . reg y x
    
          Source |       SS           df       MS      Number of obs   =     1,000
    -------------+----------------------------------   F(1, 998)       =    288.46
           Model |  437.799556         1  437.799556   Prob > F        =    0.0000
        Residual |   1514.6571       998  1.51769249   R-squared       =    0.2242
    -------------+----------------------------------   Adj R-squared   =    0.2235
           Total |  1952.45666       999  1.95441107   Root MSE        =    1.2319
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |       0.46       0.03    16.98    0.00         0.41        0.52
           _cons |       0.06       0.04     1.59    0.11        -0.01        0.14
    ------------------------------------------------------------------------------
    
    . corr y x
    (obs=1,000)
    
                 |        y        x
    -------------+------------------
               y |   1.0000
               x |   0.4735   1.0000
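
Until something like this exists, one possible workaround is to pick up the matrix that correlate leaves behind in r(C) and display it with matrix list, which accepts a format() option. This is only a sketch of that idea, not a substitute for cformat support; the auto dataset and the variables used are arbitrary choices for illustration.

```stata
* Workaround sketch: show a correlation matrix with a chosen display format
sysuse auto, clear
quietly correlate price mpg weight
* correlate saves the correlation matrix in r(C)
matrix C = r(C)
* format() controls how the matrix elements are displayed
matrix list C, format(%5.2f)
```

The same pattern should work for any command that returns its results as a matrix, although the layout is that of matrix list rather than the command's own table.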



    • Problems frequently arise from non-printing characters in strings. By definition, the user cannot see them, but Stata sees them and takes them seriously. They often crop up in string variables in data sets that have been imported from various sources, and they can create havoc when you try to merge two data sets from different sources that are contaminated with different kinds of non-printing characters. Can we have a string function that eliminates all non-printing characters from a string? Perhaps it could even be built out from the existing -egen, sieve()- function in the egenmore package as a new class of characters, "printable", updated to cope with Unicode.



      • #242: While waiting for the stripnonprintable()

        ustrregexra() can be used to strip off "non-printable" characters using Unicode categories:
        Code:
        scalar S2 = ustrregexra(S1,"[^\p{L}\p{M}\p{N}\p{P}\p{S}\p{Zs}]","")
        If only single U+0020 SPACE characters are wanted, the remaining whitespace characters can be replaced by U+0020 and runs of spaces collapsed with:
        Code:
        scalar S2 = itrim(ustrregexra(S2,"\p{Zs}",ustrunescape("\u0020")))
        Code:
        \p{L}  or \p{Letter}: any kind of letter from any language
        \p{M}  or \p{Mark}  : a character intended to be combined with another character
        \p{N}  or \p{Number}: any kind of numeric character in any script.
        \p{P}  or \p{Punctuation}: any kind of punctuation character.
        \p{S}  or \p{Symbol}: math symbols, currency signs, dingbats, box-drawing characters, etc.
        \p{Zs} or \p{Space_Separator}: a whitespace character that is invisible, but does take up space.
        List of Unicode characters of category “Space Separator”: https://www.compart.com/en/unicode/category/Zs
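
For the merge scenario described above, the same regular expression can be applied to a string variable rather than a scalar. The following is a minimal sketch under stated assumptions: the variable names id and id_clean and the test string are hypothetical, and the string is built with a tab (category Cc) and a zero-width space (U+200B, category Cf), both of which fall outside the character classes that the expression keeps.

```stata
* Sketch: strip "non-printable" characters from a string variable
clear
set obs 1
* Hypothetical key contaminated with a tab and a zero-width space
generate id = "AB" + char(9) + "12" + uchar(8203)
* Keep only letters, marks, numbers, punctuation, symbols, and space separators
generate id_clean = ustrregexra(id, "[^\p{L}\p{M}\p{N}\p{P}\p{S}\p{Zs}]", "")
* Compare lengths in Unicode characters before and after cleaning
generate len_before = ustrlen(id)
generate len_after  = ustrlen(id_clean)
list len_before len_after
```

Cleaning the key variables in both data sets this way before merge is one way to keep invisible characters from producing spurious non-matches.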



        • Thank you, that is very useful.



          • Probably too late for 16, but could we please have a persist option in mi impute chained, similar to the one in community-contributed ice (Royston; SSC or SJ)?

            The problem: Multiple imputation via chained equations often fails because one of the models, usually mlogit, fails to converge. If this happens on observed data or if it happens on each iteration, I do not mind Stata stopping with an error; probably there is something wrong with my model. However, it is terribly annoying to have your machine running for a day, only to find that mlogit did not converge in iteration 7 on m=42. The model converged 410 times before (10 iterations * 41 datasets, not counting the runs on observed data); chances are it will converge in iteration 8 on m=42. So, I really want to be able to tell Stata to just skip this one iteration for the respective variable, not terminate the complete process.

            Ideally, I want a model-specific option, like

            Code:
            mi impute chained ... (mlogit, skipnonconvergence(#)) ...
            that specifies the maximum number of iterations per imputed dataset that I am willing to skip if the model does not converge. This seems far less dangerous than giving us the already existing force option that just happily accepts missing imputed values.

            Best
            Daniel
            Last edited by daniel klein; 12 Jun 2019, 02:51. Reason: formatting of option names
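
A rough stopgap, not equivalent to the requested option: imputations can be added one at a time with add(1), wrapping each call in capture so that a failure costs one imputation rather than the whole run. The sketch below assumes that a failed mi impute call leaves the data unchanged, uses the mheart8s0 example dataset (webuse needs internet access), and uses regress only to keep the example short; the target of 5 imputations, the cap of 20 attempts, and the seed scheme are arbitrary illustrations.

```stata
* Sketch: add imputations one by one and retry after a failure
webuse mheart8s0, clear
local target 5
local added  0
local try    0
while `added' < `target' & `try' < 20 {
    local ++try
    * capture keeps an error in one call from stopping the loop
    capture noisily mi impute chained (regress) bmi age = attack smokes hsgrad female, add(1) rseed(`=1000 + `try'')
    if _rc == 0 {
        local ++added
    }
}
mi describe
```

Unlike the proposed skipnonconvergence(#), this discards the whole failed imputation instead of skipping a single iteration for one variable, so it saves the earlier datasets but not the time spent on the failed one.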



            • I agree with daniel klein and have previously discussed this with Stata personnel as noted in #11 in https://www.statalist.org/forums/for...ple-imputation



              • Please consider tweaking -ranksum- to make it report (at least optionally) the Mann-Whitney U statistic and Wilcoxon's W (as some authors call it). As noted in this thread, -ranksum- currently reports neither. Thanks.
                --
                Bruce Weaver
                Email: [email protected]
                Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
                Version: Stata/MP 18.0 (Windows)
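
In the meantime, both statistics can be recovered from the results that ranksum saves. A minimal sketch, assuming the usual definitions: W is the rank sum of the first group, and U1 = W - n1(n1 + 1)/2. The auto dataset and the scalar names are arbitrary choices for illustration.

```stata
* Sketch: Wilcoxon's W and Mann-Whitney U from ranksum's saved results
sysuse auto, clear
quietly ranksum mpg, by(foreign)
* W: observed rank sum of the first group
scalar W  = r(sum_obs)
scalar n1 = r(N_1)
scalar n2 = r(N_2)
* U for the first group and its complement
scalar U1 = W - n1*(n1 + 1)/2
scalar U2 = n1*n2 - U1
display "W = " W "   U1 = " U1 "   U2 = " U2 "   min(U1, U2) = " min(U1, U2)
```

Authors differ on whether U means U1, U2, or the smaller of the two, so all three are displayed.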



                • Is anyone else wondering when the release date for Stata 16 will be announced? I remember either for Stata 14 or 15 there was a counter up on the Stata homepage for a few weeks prior to the release.



                  • Originally posted by wbuchanan
                    [...] I remember either for Stata 14 or 15 there was a counter up on the Stata homepage for a few weeks prior to the release.
                    Yes, and then the countdown stopped and the page would not update for a day or so ... I prefer the old-fashioned announcement (traditionally on Statalist): Stata 16 is shipping now.



                    • It would be nice to add tests for comparing Pearson correlations:

                      https://journals.plos.org/plosone/ar...l.pone.0121945

                      and Spearman correlations:

                      https://www.omicsonline.org/open-acc....php?aid=54592
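
For the simplest case, two Pearson correlations from independent groups, the standard Fisher z test can be computed by hand from the results that correlate saves. This is a sketch of that one test only, not of the full set of comparisons (dependent or overlapping correlations, Spearman) covered in the literature; the auto dataset, the grouping by foreign, and the scalar names are arbitrary choices for illustration.

```stata
* Sketch: Fisher z test for two independent Pearson correlations
sysuse auto, clear
quietly correlate price mpg if foreign == 0
scalar r1 = r(rho)
scalar n1 = r(N)
quietly correlate price mpg if foreign == 1
scalar r2 = r(rho)
scalar n2 = r(N)
* Difference of Fisher-transformed correlations over its standard error
scalar z = (atanh(r1) - atanh(r2)) / sqrt(1/(n1 - 3) + 1/(n2 - 3))
scalar p = 2*normal(-abs(z))
display "z = " %6.3f z "   two-sided p = " %6.4f p
```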

