Wishlist for Stata 19

Bruce Weaver

Join Date: May 2014

Posts: 1133
#31

11 May 2023, 14:14

I wish the porder option for -ranksum- displayed a confidence interval for P{score(group==1) > score(group==2)}. As Conroy (2012) showed, one can get that CI via Roger Newson's -somersd- package (SJ). But it would be far more convenient to get it directly from -ranksum-, IMO.

Newson, R. 2002. Parameters behind "nonparametric" statistics: Kendall's tau, Somers' D and median differences. Stata Journal 2: 45-64.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)

Bruce Weaver

Join Date: May 2014
Posts: 1133

#32

11 May 2023, 15:36

Following up on #31, I just noticed that for the particular example I was tinkering with, -somersd- gives a 95% CI with the lower limit < 0, which seems problematic, as I am trying to get the CI for a population proportion. The CI from -roctab-, on the other hand, has a lower limit = 0, which seems more sensible.

Here is the example.

Output:

Code:

. ranksum score, by(group) porder

Two-sample Wilcoxon rank-sum (Mann–Whitney) test

       group |      Obs    Rank sum    Expected
-------------+---------------------------------
           1 |        4          17          20
           2 |        5          28          25
-------------+---------------------------------
    Combined |        9          45          45

Unadjusted variance       16.67
Adjustment for ties        0.00
                     ----------
Adjusted variance         16.67

H0: score(group==1) = score(group==2)
         z = -0.735
Prob > |z| = 0.4624
Exact prob = 0.5556

P{score(group==1) > score(group==2)} = 0.350

. * Let g1 be an indicator for group 1 membership
. generate byte g1 = group==1

. * Use -somersd- (SJ) to get a CI for the MW statistic,
. * as suggested by Conroy (2012, SJ):
. somersd g1 score, transf(c) tdist
Somers' D with variable: g1
Transformation: Harrell's c
Valid observations: 9
Degrees of freedom: 8

Symmetric 95% CI for Harrell's c
------------------------------------------------------------------------------
             |              Jackknife
          g1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       score |        .35   .2263734     1.55   0.161    -.1720179    .8720179
------------------------------------------------------------------------------

. * Use roctab to get a CI for the MW statistic
. roctab g1 score

                      ROC                     Asymptotic normal  
           Obs       area     Std. err.      [95% conf. interval]
     ------------------------------------------------------------
             9     0.3500       0.2102        0.00000     0.76190

. * Notice that the asymptotic CI from -roctab-
. * does not match the CI from -somersd-.
. * Notice too that -somersd- is yielding a
. * lower limit < 0 for this particular example.
. * That seems problematic, given that probability
. * values must fall in the range 0 to 1.

Code:

* Read in the data:
clear
input group score
1 12   
1 17   
1  9    
1 21
2  8   
2 18    
2 26      
2 15   
2 23
end
* Use the -ranksum- command:
ranksum score, by(group) porder
* Let g1 be an indicator for group 1 membership
generate byte g1 = group==1
* Use -somersd- (SJ) to get a CI for the MW statistic,
* as suggested by Conroy (2012, SJ):
somersd g1 score, transf(c) tdist
* Use roctab to get a CI for the MW statistic
roctab g1 score
* Notice that the asymptotic CI from -roctab-
* does not match the CI from -somersd-.
* Notice too that -somersd- is yielding a
* lower limit < 0 for this particular example.
* That seems problematic, given that probability
* values must fall in the range 0 to 1.
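
One way to keep the interval inside the range 0 to 1 is to build it on the logit scale and back-transform the limits. The sketch below is not what -somersd- or -roctab- does internally; it is a delta-method calculation done by hand, taking the point estimate (.35), the jackknife standard error (.2263734), and the degrees of freedom (8) from the -somersd- output above as given. The scalar names are made up for illustration.

```stata
* Inputs copied from the -somersd- output above
scalar p_hat = .35
scalar se    = .2263734
scalar tcrit = invttail(8, 0.025)

* Symmetric interval on the probability scale (reproduces -somersd-)
scalar lb_sym = p_hat - tcrit*se
scalar ub_sym = p_hat + tcrit*se

* Delta-method standard error on the logit scale, then back-transform
scalar se_logit = se / (p_hat*(1 - p_hat))
scalar lb_logit = invlogit(logit(p_hat) - tcrit*se_logit)
scalar ub_logit = invlogit(logit(p_hat) + tcrit*se_logit)

display "Symmetric CI:         " %8.4f lb_sym   "  " %8.4f ub_sym
display "Logit-transformed CI: " %8.4f lb_logit "  " %8.4f ub_logit
```

With only nine observations any large-sample interval is rough, so this is a way to respect the 0 to 1 range, not a claim about coverage.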


Dorothea Ekoka Mbassi

Join Date: Aug 2022

Posts: 33
#33

12 May 2023, 06:57

There should be built-in options for publication-quality graph exporting! i.e., choose height/width and dpi and export to all customary image and vector graphic file formats!
5 likes
Joseph Coveney

Join Date: Apr 2014

Posts: 4421
#34

15 May 2023, 00:36

Add the log-F(df1, df2) density function to the list of built-in priors for bayes and bayesmh. It can be hand-coded using a substitutable expression, but it's a bit laborious if the regression model has several parameters (think: factor-variable predictors), and it seems that built-in density functions execute much faster.
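
For readers who have not seen the hand-coded route, here is a minimal sketch of what it might look like for a single coefficient. It assumes the symmetric log-F(m,m) case, whose density is proportional to exp(m*b/2)/(1 + exp(b))^m, with m = 1, and it uses the auto dataset purely as a stand-in; neither choice comes from the request above. The prior is passed through the generic logdensity() specification of -bayesmh-, and the flat prior on the constant is also just an assumption for the illustration.

```stata
* Illustration only: auto.dta and m = 1 are assumptions
sysuse auto, clear

* log-F(1,1) prior on the mpg slope, written as a log density:
*   ln f(b) = b/2 - ln(1 + exp(b)) + constant
bayesmh foreign mpg, likelihood(logit)                                ///
    prior({foreign:mpg},                                              ///
          logdensity(0.5*{foreign:mpg} - ln(1 + exp({foreign:mpg})))) ///
    prior({foreign:_cons}, flat)                                      ///
    rseed(12345)
```

With several factor-variable predictors, one prior() of this form is needed per coefficient, which is where the laboriousness mentioned above comes from.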

Selected references
B. W. Brown, F. M. Spears & L. B. Levy, The log F: A Distribution for All Seasons. Computational Statistics (2002) 17:47–58
S. Greenland & M. A. Mansournia, Penalization, bias reduction, and default priors in logistic and related categorical and survival regressions. Statistics in Medicine (2015) 34:3133–43
1 like
Chinh Nguyen (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 182
#35

15 May 2023, 14:54

Originally posted by Dorothea Ekoka Mbassi

There should be built-in options for publication-quality graph exporting! i.e., choose height/width and dpi and export to all customary image and vector graphic file formats!

Can you expand on this? What other vector file formats are worth supporting since we already support PDF and SVG?

You can specify the width and/or height in inches and pixels when exporting a graph to SVG. You can use the xsize and ysize options to specify the width and height in inches when exporting a graph to PDF.

Last edited by Chinh Nguyen (StataCorp); 15 May 2023, 15:10.

-Chinh Nguyen
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#36

15 May 2023, 16:40

Originally posted by Chinh Nguyen (StataCorp)

Can you expand on this? What other vector file formats are worth supporting since we already support PDF and SVG?

You can specify the width and/or height in inches and pixels when exporting a graph to SVG. You can use the xsize and ysize options to specify the width and height in inches when exporting a graph to PDF.

In the spirit of #33, though I wouldn't attempt to speak on Dorothea's behalf, I can think of some convenience options that may be considered. Many of the graphics formats support height and width in pixels. Users often think about their graphics in terms of physical dimensions and DPI. For example, journals often have (at least) a requirement for minimum DPI of submitted images. I understand that one could simply do the conversion calculation to obtain the pixel size needed for a desired DPI (pixels = width in inches times DPI). I also understand that if all of the pixels are there, the DPI is arbitrary. Then again, when users double-click on a newly created JPG image, it just looks small on the screen because of whatever default dpi is assumed (for example, on my Windows machine using IrfanView, this is recorded as 96x96 dpi in the image metadata). One option could be to add a -dpi()- option that will override this metadata and perform the back-calculation to pixel size behind the scenes.
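
The conversion can be done by hand today. The following is a minimal sketch, assuming a hypothetical target of a 6 x 4 inch figure at 300 dpi; the local macro names and the file name are made up, and the auto dataset is only a placeholder. It relies on the width() and height() pixel options of -graph export- for PNG files.

```stata
* Hypothetical target: a 6 x 4 inch figure at 300 dpi
local width_in  = 6
local height_in = 4
local dpi       = 300

* pixels = inches x dpi
local width_px  = `width_in'  * `dpi'
local height_px = `height_in' * `dpi'
display "Export at `width_px' x `height_px' pixels"

* Draw the graph with the same aspect ratio, then export at that pixel size
sysuse auto, clear
scatter mpg weight, xsize(`width_in') ysize(`height_in')
graph export "fig_300dpi.png", width(`width_px') height(`height_px') replace
```

This produces enough pixels for the target size, but as far as I can tell it does not change the dpi value recorded in the file's metadata, which is the part a -dpi()- option would take care of.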

As a second, somewhat related request going back at least 4 years, would you consider adding some form of (lossless) compressed TIFF image option (such as LZMA or zlib)? Exporting high-resolution raw TIFF generates massive files.
4 likes
Niels Henrik Bruun

Join Date: Aug 2014

Posts: 555
#37

22 May 2023, 01:23

I have become quite fond of -mplotoffset-.

The offset feature should be a standard in the -marginsplot- command.

Thank you to Nicholas Winter for a very nice command

Last edited by Niels Henrik Bruun; 22 May 2023, 01:26.

Kind regards

nhb
2 likes
Mike Murphy

Join Date: Jul 2014

Posts: 88
#38

23 May 2023, 08:51

Increase the limit on the length of variable names, preferably by an order of magnitude.
2 likes
Nick Cox

Join Date: Mar 2014

Posts: 35724
#39

23 May 2023, 09:27

Mike Murphy If you want to ask for variable names say 320 characters long, that perhaps should come with an idea of where they are going to fit in output. Stata already has to abbreviate variable names often because the rest of the output has to be shown. Otherwise put, what is the rationale for this?
Mike Murphy

Join Date: Jul 2014

Posts: 88
#40

23 May 2023, 09:57

Nick Cox The rationale is twofold. One is that I frequently work with secondary data with large numbers of variables, where I have no control over the initial variable naming conventions. Going through a large dataset and resolving cases where Stata has renamed a variable "v2938" etc. is tedious, as is resolving cases where a loop adding a prefix (e.g., "log_`v'") fails. Two is simple preference: I value legibility and would like the ability to name a variable "log_income_conditional_first_sample" rather than "linc_c_s1" or similar. As you note, Stata already abbreviates variables in output, so I don't see what the cost is; users who prefer shorter variable names would be unaffected.
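
To make the prefix problem concrete, here is a small sketch with made-up data. A 31-character name is legal, but adding "log_" pushes it past the current limit, so -generate- fails. The fallback shown (truncate the new name and keep the full name in the variable label) is only one possible workaround, and truncation can of course produce duplicate names in a real dataset.

```stata
clear
set obs 5
generate income_conditional_first_sample = _n   // 31 characters: allowed
display c(namelenchar)                          // current limit on name length

* Adding a prefix pushes the name past the limit, so -generate- fails
local v income_conditional_first_sample
capture generate log_`v' = ln(`v')
display "return code: " _rc

* Fallback: truncate the new name and store the full name in the label
local newname = substr("log_`v'", 1, c(namelenchar))
generate `newname' = ln(`v')
label variable `newname' "log_`v'"
describe
```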
Nick Cox

Join Date: Mar 2014

Posts: 35724
#41

23 May 2023, 11:10

There will be costs. Here are some.

If Stata allows this in Stata 19, then Stata 19 datasets will be unreadable in any earlier versions.

I take your point about data input if external providers are using long names, or their equivalent, but while Stata 19 may allow longer names, there won't be more room available for variable names in most statistical or graphical commands.

Longer variable names make it harder to produce intelligible abbreviations.

Last edited by Nick Cox; 23 May 2023, 11:56.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#42

24 May 2023, 09:57

I'm not sure I really want this, but I wonder what others might think. For many years on this Forum I have repeatedly inveighed against the indiscriminate use of global macros. At least as far as I can tell from what I see here on Statalist, I have persuaded few people about this. So I'm thinking of switching my approach from a war on global macros (not to be confused with the global war on terrorism) to a harm reduction approach.

The danger of global macros arises from the fact that their contents can be modified anywhere, including places that may not even be visible to the programmer when side effects of the modification arise. This makes for intractable bugs. On the other hand, I do appreciate that global macros can be more convenient to use than local macros. So what about a write-once version of global macro? Something analogous to #define in C, or const string in C++. The user could create and define the macro at one point in the code, but its value would not be changeable thereafter. And any attempt to change its value would be an error and cause a break. My observation is that global macros are commonly (mis)used for things like lists of variables that will be used repeatedly in the code. It seems to me that a write-once global macro would provide all the convenience of a global macro, but would pose no danger. It would serve the purpose for this kind of usage, but would not be capable of causing the kind of havoc that unrestricted global macros can wreak.

To be clear, I am not suggesting here that changeable global macros be eliminated. If nothing else, too much working code would be broken by doing that. I'm suggesting that we create a new, third kind of macro. I'm not sure what to call it, nor precisely what kind of syntax would be best for defining and referencing these. But what about this concept?
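
As a rough illustration of the concept with today's tools, here is a sketch of a hypothetical helper, -setonce-, that refuses to redefine a global macro that already has contents. The program name, the macro name demo_vars, and the choice of return code 110 are all made up for the example. It only guards definitions that go through the helper: a plain -global- command elsewhere can still overwrite the macro, which is exactly the gap a built-in write-once macro would close.

```stata
capture program drop setonce
program define setonce
    // Hypothetical helper. Usage: setonce name contents
    gettoken name 0 : 0
    if `"${`name'}"' != "" {
        display as error "global `name' is already defined"
        exit 110
    }
    global `name' `0'
end

* Usage: the first call defines the macro, the second is refused
capture macro drop demo_vars
setonce demo_vars price mpg weight
display "$demo_vars"
capture noisily setonce demo_vars rep78
display "return code: " _rc
```

Because an empty global is indistinguishable from an undefined one, the helper treats both as "not yet defined".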
8 likes
Bruce Weaver

Join Date: May 2014

Posts: 1133
#43

24 May 2023, 14:22

Hi Clyde Schechter. I like your suggestion in #42. I reckon this harm reduction approach would be far more palatable for many users, including me. But I think it would be especially helpful to folks who are relative newbies to Stata (or to coding generally). I'm sure the students in one of my courses would appreciate it, for example.

Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#44

24 May 2023, 14:36

Originally posted by Clyde Schechter

I'm not sure I really want this, but I wonder what others might think. For many years on this Forum I have repeatedly inveighed against the indiscriminate use of global macros. At least as far as I can tell from what I see here on Statalist, I have persuaded few people about this. So I'm thinking of switching my approach from a war on global macros (not to be confused with the global war on terrorism) to a harm reduction approach.

The danger of global macros arises from the fact that their contents can be modified anywhere, including places that may not even be visible to the programmer when side effects of the modification arise. This makes for intractable bugs. On the other hand, I do appreciate that global macros can be more convenient to use than local macros. So what about a write-once version of global macro? Something analogous to #define in C, or const string in C++. The user could create and define the macro at one point in the code, but its value would not be changeable thereafter. And any attempt to change its value would be an error and cause a break. My observation is that global macros are commonly (mis)used for things like lists of variables that will be used repeatedly in the code. It seems to me that a write-once global macro would provide all the convenience of a global macro, but would pose no danger. It would serve the purpose for this kind of usage, but would not be capable of causing the kind of havoc that unrestricted global macros can wreak.

To be clear, I am not suggesting here that changeable global macros be eliminated. If nothing else, too much working code would be broken by doing that. I'm suggesting that we create a new, third kind of macro. I'm not sure what to call it, nor precisely what kind of syntax would be best for defining and referencing these. But what about this concept?

There is a workaround to your harm-reduction approach that could be implemented now. You could write a Stata program whose sole task is to set all of the global macro values. Call it once at the start of your program, and as needed later on to aid in debugging.

Code:

cap program drop myglobals
program myglobals
    global msg "Hello World"
end

qui myglobals
di "$msg"
daniel klein

Join Date: Mar 2014

Posts: 3859
#45

24 May 2023, 16:48

#44 does not prevent other programs (or do-files) from resetting the respective global macros, which, as I understand it, is Clyde's primary concern.

I believe Clyde suggests an extended global command that defines something like a static final (class member) variable in Mata. Fantasy syntax could be

Code:

global foo "bar" , final

The global macro foo could then not be altered.

Obviously, you would need to think about the scope of such global macros. Do you have to restart Stata to change (or delete) one? That seems cumbersome. What do you do if two programs (or do-files) that depend on each other (but might have different authors) define the same (final) global macro? First come, first served? I am not sure that errors due to such collisions would be easier to debug than what we have now.

Last edited by daniel klein; 24 May 2023, 16:56.
3 likes