
  • #31
    I wish the porder option for -ranksum- displayed a confidence interval for P{score(group==1) > score(group==2)}. As Conroy (2012) showed, one can get that CI via Roger Newson's -somersd- package (SJ). But it would be far more convenient to get it directly from -ranksum-, IMO.


    Newson, R. 2002. Parameters behind "nonparametric" statistics: Kendall's tau, Somers' D and median differences. Stata Journal 2: 45-64.

    --
    Bruce Weaver
    Email: [email protected]
    Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
    Version: Stata/MP 18.0 (Windows)



    • #32
      Following up on #31, I just noticed that for the particular example I was tinkering with, -somersd- gives a 95% CI with the lower limit < 0, which seems problematic, as I am trying to get the CI for a population proportion. The CI from -roctab-, on the other hand, has a lower limit = 0, which seems more sensible.

      Here is the example.

      Output:

      Code:
      . ranksum score, by(group) porder
      
      Two-sample Wilcoxon rank-sum (Mann–Whitney) test
      
             group |      Obs    Rank sum    Expected
      -------------+---------------------------------
                 1 |        4          17          20
                 2 |        5          28          25
      -------------+---------------------------------
          Combined |        9          45          45
      
      Unadjusted variance       16.67
      Adjustment for ties        0.00
                           ----------
      Adjusted variance         16.67
      
      H0: score(group==1) = score(group==2)
               z = -0.735
      Prob > |z| = 0.4624
      Exact prob = 0.5556
      
      P{score(group==1) > score(group==2)} = 0.350
      
      . * Let g1 be an indicator for group 1 membership
      . generate byte g1 = group==1
      
      . * Use -somersd- (SJ) to get a CI for the MW statistic,
      . * as suggested by Conroy (2012, SJ):
      . somersd g1 score, transf(c) tdist
      Somers' D with variable: g1
      Transformation: Harrell's c
      Valid observations: 9
      Degrees of freedom: 8
      
      Symmetric 95% CI for Harrell's c
      ------------------------------------------------------------------------------
                   |              Jackknife
                g1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
             score |        .35   .2263734     1.55   0.161    -.1720179    .8720179
      ------------------------------------------------------------------------------
      
      . * Use roctab to get a CI for the MW statistic
      . roctab g1 score
      
                            ROC                     Asymptotic normal  
                 Obs       area     Std. err.      [95% conf. interval]
           ------------------------------------------------------------
                   9     0.3500       0.2102        0.00000     0.76190
      
      . * Notice that the asymptotic CI from -roctab-
      . * does not match the CI from -somersd-.
      . * Notice too that -somersd- is yielding a
      . * lower limit < 0 for this particular example.
      . * That seems problematic, given that probability
      . * values must fall in the range 0 to 1.
      Code:
      Code:
      * Read in the data:
      clear
      input group score
      1 12   
      1 17   
      1  9    
      1 21
      2  8   
      2 18    
      2 26      
      2 15   
      2 23
      end
      * Use the -ranksum- command:
      ranksum score, by(group) porder
      * Let g1 be an indicator for group 1 membership
      generate byte g1 = group==1
      * Use -somersd- (SJ) to get a CI for the MW statistic,
      * as suggested by Conroy (2012, SJ):
      somersd g1 score, transf(c) tdist
      * Use roctab to get a CI for the MW statistic
      roctab g1 score
      * Notice that the asymptotic CI from -roctab-
      * does not match the CI from -somersd-.
      * Notice too that -somersd- is yielding a
      * lower limit < 0 for this particular example.
      * That seems problematic, given that probability
      * values must fall in the range 0 to 1.
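For readers who want to see what the porder line is estimating, here is a minimal sketch that reproduces the 0.350 by brute force: it pairs every group 1 score with every group 2 score and takes the share of pairs in which the group 1 score is larger. The variable names score1, score2, and win and the tempfile name g2 are my own choices for illustration. Ignoring ties is only safe here because this example has none.

```stata
* Read in the same nine observations as in the example
clear
input group score
1 12
1 17
1  9
1 21
2  8
2 18
2 26
2 15
2 23
end

* Save the group 2 scores in a temporary file
preserve
keep if group == 2
keep score
rename score score2
tempfile g2
save `g2'
restore

* Keep the group 1 scores and form all 4 x 5 = 20 pairs
keep if group == 1
keep score
rename score score1
cross using `g2'

* Share of pairs in which the group 1 score is larger (7 of 20 here)
generate byte win = score1 > score2
summarize win, meanonly
display "P{score(group==1) > score(group==2)} = " %5.3f r(mean)
```

Seven of the 20 pairs favour group 1, which gives the 0.350 reported by -ranksum-, -somersd-, and -roctab-.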
      --
      Bruce Weaver
      Email: [email protected]
      Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
      Version: Stata/MP 18.0 (Windows)



      • #33
        There should be built-in options for publication-quality graph exporting, i.e., the ability to choose height/width and DPI and to export to all customary image and vector graphic file formats!



        • #34
          Add the log-F(df1, df2) density function to the list of built-in priors for bayes and bayesmh. It can be hand-coded using a substitutable expression, but it's a bit laborious if the regression model has several parameters (think: factor-variable predictors), and it seems that built-in density functions execute much faster.

          Selected references
          B. W. Brown, F. M. Spears & L. B. Levy, The log F: A Distribution for All Seasons. Computational Statistics (2002) 17:47–58
          S. Greenland & M. A. Mansournia, Penalization, bias reduction, and default priors in logistic and related categorical and survival regressions. Statistics in Medicine (2015) 34:3133–43



          • #35
            Originally posted by Dorothea Ekoka Mbassi:
            There should be built-in options for publication-quality graph exporting, i.e., the ability to choose height/width and DPI and to export to all customary image and vector graphic file formats!
            Can you expand on this? What other vector file formats are worth supporting since we already support PDF and SVG?

            You can specify the width and/or height in inches and pixels when exporting a graph to SVG. You can use the xsize and ysize options to specify the width and height in inches when exporting a graph to PDF.
            Last edited by Chinh Nguyen (StataCorp); 15 May 2023, 15:10.
            -Chinh Nguyen



            • #36
              Originally posted by Chinh Nguyen (StataCorp):

              Can you expand on this? What other vector file formats are worth supporting since we already support PDF and SVG?

              You can specify the width and/or height in inches and pixels when exporting a graph to SVG. You can use the xsize and ysize options to specify the width and height in inches when exporting a graph to PDF.
              In the spirit of #33, though I wouldn't attempt to speak on Dorothea's behalf, I can think of some convenience options that may be considered. Many of the graphics formats support height and width in pixels, but users often think about their graphics in terms of physical dimensions and DPI. For example, journals often have (at least) a minimum-DPI requirement for submitted images. I understand that one could simply do the conversion to obtain the pixel dimensions needed for a desired DPI (pixels = size in inches times DPI). I also understand that if all of the pixels are there, the DPI is arbitrary. Then again, when users double-click on a newly created JPG image, it just looks small on the screen because of whatever default DPI is assumed (for example, on my Windows machine using IrfanView, this is recorded as 96x96 dpi in the image metadata). One option could be to add a -dpi()- option that will override this metadata and perform the back-calculation to pixel size behind the scenes.

              As a second, somewhat related request going back at least 4 years, would you consider adding some form of (lossless) compressed TIFF image option (such as LZMA or zlib)? Exporting high-resolution raw TIFF generates massive files.
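The inches-and-DPI arithmetic discussed in this post can be sketched with options that already exist. This is only an illustration: the 6.5 x 4 inch size, the 300 DPI target, the file name figure1.png, and the local macro names are assumptions of mine. It sets the pixel dimensions only, so the DPI value recorded in the file's metadata is not changed.

```stata
* Hypothetical target: a 6.5 x 4 inch figure at 300 DPI
local width_in  = 6.5
local height_in = 4
local dpi       = 300

* Back-calculate the pixel dimensions: pixels = inches x DPI
local width_px  = round(`width_in'  * `dpi')
local height_px = round(`height_in' * `dpi')
display "Exporting at `width_px' x `height_px' pixels"

* Draw the graph with the matching aspect ratio, then export as PNG
sysuse auto, clear
scatter mpg weight, xsize(`width_in') ysize(`height_in')
graph export "figure1.png", width(`width_px') height(`height_px') replace
```

A -dpi()- option as proposed above would presumably do the same multiplication internally and also write the DPI into the image metadata, which this sketch cannot do.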



              • #37
                I have become quite fond of -mplotoffset-.
                [Image: image_31175.png]

                The offset feature should be standard in the -marginsplot- command.

                Thank you to Nicholas Winter for a very nice command.
                Last edited by Niels Henrik Bruun; 22 May 2023, 01:26.
                Kind regards

                nhb



                • #38
                  Increase the limit on the length of variable names, preferably by an order of magnitude.



                  • #39
                    Mike Murphy If you want to ask for variable names say 320 characters long, that perhaps should come with an idea of where they are going to fit in output. Stata already has to abbreviate variable names often because the rest of the output has to be shown. Otherwise put, what is the rationale for this?



                    • #40
                      Nick Cox The rationale is twofold. One is that I frequently work with secondary data with large numbers of variables, where I have no control over the initial variable naming conventions. Going through a large dataset and resolving cases where Stata has renamed a variable "v2938" etc. is tedious, as is resolving cases where a loop adding a prefix (e.g., "log_`v'") fails. Two is simple preference: I value legibility and would like the ability to name a variable "log_income_conditional_first_sample" rather than "linc_c_s1" or similar. As you note, Stata already abbreviates variables in output, so I don't see what the cost is: users who prefer shorter variable names would be unaffected.
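To make the prefix-loop failure concrete, here is a minimal sketch of the problem together with a guard that skips names that would exceed the current 32-character limit. The auto dataset, the deliberately long 29-character name, and the local macro newname are purely illustrative assumptions, not anything from my actual data.

```stata
sysuse auto, clear

* A hypothetical long name (29 characters) to trigger the problem
rename price price_in_1978_us_dollars_base

foreach v of varlist price_in_1978_us_dollars_base mpg {
    local newname log_`v'
    * Variable names are currently limited to 32 characters
    if strlen("`newname'") > 32 {
        display as text "`newname' is too long (" strlen("`newname'") " characters); skipped"
        continue
    }
    generate double `newname' = ln(`v')
}
```

Without the length check, the -generate- line stops the whole loop with an error on the first name that is too long, which is the tedium I am describing.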



                      • #41
                        There will be costs. Here are some.

                        If Stata allows this in Stata 19, then Stata 19 datasets will be unreadable in any earlier versions.

                        I take your point about data input if external providers are using long names, or their equivalent, but while Stata 19 may allow longer names, there won't be more room available for variable names in most statistical or graphical commands.

                        Longer variable names make it harder to produce intelligible abbreviations.
                        Last edited by Nick Cox; 23 May 2023, 11:56.



                        • #42
                          I'm not sure I really want this, but I wonder what others might think. For many years on this Forum I have repeatedly inveighed against the indiscriminate use of global macros. At least as far as I can tell from what I see here on Statalist, I have persuaded few people about this. So I'm thinking of switching my approach from a war on global macros (not to be confused with the global war on terrorism) to a harm reduction approach.

                          The danger of global macros arises from the fact that their contents can be modified anywhere, including places that may not even be visible to the programmer when side effects of the modification arise. This makes for intractable bugs. On the other hand, I do appreciate that global macros can be more convenient to use than local macros. So what about a write-once version of global macro? Something analogous to #define in C, or const string in C++. The user could create and define the macro at one point in the code, but its value would not be changeable thereafter. And any attempt to change its values would be an error and cause a break. My observation is that global macros are commonly (mis)used for things like lists of variables that will be used repeatedly in the code. It seems to me that a write-once global macro would provide all the convenience of a global macro, but would pose no danger. It would serve the purpose for this kind of usage, but would not be capable of causing the kind of havoc that unrestricted global macros can wreak.

                          To be clear, I am not suggesting here that changeable global macros be eliminated. If nothing else, too much working code would be broken by doing that. I'm suggesting that we create a new, third kind of macro. I'm not sure what to call it, nor precisely what kind of syntax would be best for defining and referencing these. But what about this concept?
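As a rough sketch of the concept, assuming nothing beyond what Stata already offers, a small wrapper program can refuse to overwrite a global macro that already has contents. The name setonce and the macro xvars are hypothetical, and this is not an existing Stata command. It also works only by convention: a plain -global- command elsewhere can still change the value, which is exactly the gap a built-in write-once macro would close.

```stata
capture program drop setonce
program define setonce
    * Hypothetical helper (not part of Stata): setonce name contents
    gettoken name 0 : 0
    confirm name `name'
    * A global macro with nonempty contents counts as already defined
    if `"${`name'}"' != "" {
        display as error "global macro `name' is already defined"
        exit 110
    }
    global `name' `0'
end

* First definition succeeds
capture macro drop xvars
setonce xvars "price mpg weight"
display "$xvars"

* Second definition is refused with return code 110, contents unchanged
capture noisily setonce xvars "price"
display _rc
display "$xvars"
```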



                          • #43
                            Hi Clyde Schechter. I like your suggestion in #42. I reckon this harm reduction approach would be far more palatable for many users, including me. But I think it would be especially helpful to folks who are relative newbies to Stata (or to coding generally). I'm sure the students in one of my courses would appreciate it, for example.
                            --
                            Bruce Weaver
                            Email: [email protected]
                            Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
                            Version: Stata/MP 18.0 (Windows)



                            • #44
                              Originally posted by Clyde Schechter:
                              I'm not sure I really want this, but I wonder what others might think. For many years on this Forum I have repeatedly inveighed against the indiscriminate use of global macros. At least as far as I can tell from what I see here on Statalist, I have persuaded few people about this. So I'm thinking of switching my approach from a war on global macros (not to be confused with the global war on terrorism) to a harm reduction approach.

                              The danger of global macros arises from the fact that their contents can be modified anywhere, including places that may not even be visible to the programmer when side effects of the modification arise. This makes for intractable bugs. On the other hand, I do appreciate that global macros can be more convenient to use than local macros. So what about a write-once version of global macro? Something analogous to #define in C, or const string in C++. The user could create and define the macro at one point in the code, but its value would not be changeable thereafter. And any attempt to change its values would be an error and cause a break. My observation is that global macros are commonly (mis)used for things like lists of variables that will be used repeatedly in the code. It seems to me that a write-once global macro would provide all the convenience of a global macro, but would pose no danger. It would serve the purpose for this kind of usage, but would not be capable of causing the kind of havoc that unrestricted global macros can wreak.

                              To be clear, I am not suggesting here that changeable global macros be eliminated. If nothing else, too much working code would be broken by doing that. I'm suggesting that we create a new, third kind of macro. I'm not sure what to call it, nor precisely what kind of syntax would be best for defining and referencing these. But what about this concept?
                              There is a workaround to your harm-reduction approach that could be implemented now. You could write a Stata program whose sole task is to set all of the global macro values. Call it once at the start of your program, and as needed later on to aid in debugging.

                              Code:
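                              * Define every global macro in one place
                              cap program drop myglobals
                              program myglobals
                                global msg "Hello World"
                              end
                              
                              * Run once at the start; rerun later to restore the values
                              qui myglobals
                              di "$msg"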



                              • #45
                                #44 does not prevent other programs (or do-files) from resetting the respective global macros, which, as I understand it, is Clyde's primary concern.

                                I believe Clyde suggests an extended global command that defines something like a static final (class member) variable in Mata. Fantasy syntax could be

                                Code:
                                global foo "bar" , final
                                The global macro foo could then not be altered.

                                Obviously, you would need to think about the scope of such global macros. Do you have to restart Stata to change (or delete) one? That seems cumbersome. What do you do if two programs (or do-files) that depend on each other (but might have different authors) define the same (final) global macro? First come, first served? I am not sure that errors due to such collisions would be easier to debug than what we have now.
                                Last edited by daniel klein; 24 May 2023, 16:56.
