This is a sticky topic.

  • #31
    (1) Add native support for reading Parquet files.

    (2) Add built-in support in ivregress for the Sanderson-Windmeijer (SW) first-stage test of weak identification when there are multiple endogenous regressors. (This test is currently available in ivreg2.)
    Associate Professor of Finance and Economics
    University of Illinois
    www.julianreif.com



    • #32
      Allow/create a few fill patterns for graphs!



      • #33
        With the detail option (, d), the sum command displays additional statistics in the Results window in this standard format:

        Code:
        . sysuse auto, clear
        . sum mpg, d
        
                                Mileage (mpg)
        -------------------------------------------------------------
              Percentiles      Smallest
         1%           12             12
         5%           14             12
        10%           14             14       Obs                  74
        25%           18             14       Sum of wgt.          74
        
        50%           20                      Mean            21.2973
                                Largest       Std. dev.      5.785503
        75%           25             34
        90%           29             35       Variance       33.47205
        95%           34             35       Skewness       .9487176
        99%           41             41       Kurtosis       3.975005
        However, the stored results r(sum), r(min) and r(max) are not shown in that output:
        Code:
        . return list
        
        scalars:
                          r(N) =  74
                      r(sum_w) =  74
                       r(mean) =  21.2972972972973
                        r(Var) =  33.47204738985561
                         r(sd) =  5.785503209735141
                   r(skewness) =  .9487175964588155
                   r(kurtosis) =  3.97500459645325
                        r(sum) =  1576
                        r(min) =  12
                        r(max) =  41
                         r(p1) =  12
                         r(p5) =  14
                        r(p10) =  14
                        r(p25) =  18
                        r(p50) =  20
                        r(p75) =  25
                        r(p90) =  29
                        r(p95) =  34
                        r(p99) =  41
        although there is ample room available to include them.

        My proposal for including them in the Results window output is:
        Code:
                                Mileage (mpg)
        -------------------------------------------------------------
              Percentiles      Smallest
         1%           12             12       Obs                  74
         5%           14             12       Sum of wgt.          74
        10%           14             14       Mean            21.2973
        25%           18             14       Std. dev.      5.785503
        
        50%           20                      Variance       33.47205
                                Largest       Skewness       .9487176
        75%           25             34       Kurtosis       3.975005
        90%           29             35       Sum                1576
        95%           34             35       Min                  12
        99%           41             41       Max                  41
        I suppose the above does not meet the criterion for the next rocket-science contribution to (medical) statistics or econometrics, but using sum, d is a daily routine, and having all results visible on the fly might be useful for many Stata users.
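        Until something like this is built in, the three values can be shown right after the command from its stored results. A minimal sketch, using only the r() results listed above:
        Code:
        sysuse auto, clear
        * run the detailed summary silently, then print the stored results
        quietly summarize mpg, detail
        display "Sum = " r(sum) "   Min = " r(min) "   Max = " r(max)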
        http://publicationslist.org/eric.melse



        • #34
          I agree with adding the sum to the statistics reported in the Results window. But min and max are redundant: Stata already shows the four smallest and four largest values, so the first and last of those, respectively, are the values of the min and max.



          • #35
            Ben Jann wrote a module called moremata, which interestingly includes a routine to calculate percentiles. What differentiates it from the existing percentile calculation in Stata is the option to choose among multiple methods. Judging from the code, mm_quantile() allows for 12 different definitions of a percentile. Of these 12 definitions, Stata uses definition 2 (the default) and, alternatively, definition 6. Python, R, and other programs use a different definition.

            Just a thought, but it would be nice to include all definitions to help people replicate results from other programs. These definitions also apply to the calculation of the median and the interquartile range (IQR).

            Link to moremata: https://ideas.repec.org/c/boc/bocode/s455001.html

            Definitions listed below:
            [Image: table of the percentile definitions, image (6).png]
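
            For comparison, the two definitions that official Stata already offers can be put side by side with _pctile and its altdef option. This is only a sketch of what is available today; it assumes that altdef corresponds to what the post above calls definition 6, and it does not cover the other definitions in moremata.
            Code:
            sysuse auto, clear
            * default definition
            _pctile mpg, percentiles(25 50 75)
            display "default: " r(r1) "  " r(r2) "  " r(r3)
            * alternative definition (altdef option)
            _pctile mpg, percentiles(25 50 75) altdef
            display "altdef:  " r(r1) "  " r(r2) "  " r(r3)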



            • #36
              For some time, I have been requesting the ability to read raster files in Stata. It would be useful because many economists (at least in my circle) look for ways to build routines that run an entire analysis, especially a geospatial one, in a single program. Recently, a team from Xiamen University and Hefei University released a package called readraster (link below) that uses Java integration to allow raster analysis in Stata.

              Maybe, if it is worth the time and effort, the team at Stata could consider building on this? It would be useful, I believe.
              GitHub: kerrydu/readraster (read and process raster data in Stata)
              Last edited by Fahad Mirza; Yesterday, 10:48.



              • #37
                I would (still) like to see the documentation for ttest updated to clarify that the welch option produces Welch's (1947) adjustment, whereas unequal produces the adjustment that was developed by Welch (1938) and (apparently) independently by Satterthwaite (1946). See this old thread for details. I think this is important because I believe that when people talk about Welch's t-test, they usually mean the Welch (1938) test, a.k.a. the Welch-Satterthwaite method.
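
                To see the difference in practice, the two options can be run on the same data and their degrees of freedom compared. A minimal sketch using the auto dataset; the formulas in the comments are my reading of the Methods and formulas section of the ttest manual entry, so please check them there before relying on them.
                Code:
                sysuse auto, clear
                * With a = s1^2/n1 and b = s2^2/n2:
                *   unequal (Satterthwaite): df = (a+b)^2 / (a^2/(n1-1) + b^2/(n2-1))
                *   welch (Welch 1947):      df = -2 + (a+b)^2 / (a^2/(n1+1) + b^2/(n2+1))
                ttest mpg, by(foreign) unequal
                display "df with unequal: " r(df_t)
                ttest mpg, by(foreign) unequal welch
                display "df with welch:   " r(df_t)
                The t statistic is the same in both runs; only the degrees of freedom, and therefore the p-values and confidence limits, differ.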
                --
                Bruce Weaver
                Email: [email protected]
                Version: Stata/MP 19.5 (Windows)

