  • New package on SSC: sgpv - Second Generation P-Values based on Blume et al. (2018, 2019)

    Thanks to Kit Baum, the sgpv package is now also available from SSC (in addition to my own Github page):

    The commands were inspired by this thread.
    Second Generation P-Values (SGPVs) were first proposed in Blume et al. (2018) (references are at the bottom of this post) as an alternative to standard p-values.
    An SGPV is the proportion of the hypotheses supported by the data (the interval estimate of a parameter of interest) that are also null hypotheses (that is, lie within the interval null hypothesis).
    SGPVs are easier to understand than normal p-values.
    Remember that the usual p-value is the probability of observing a test statistic at least as extreme as the one observed, given that the null hypothesis is true.
    The p-values do not tell you the probability that the null-hypothesis is true.
    SGPVs also lie between 0 and 1. Blume et al. denote them by pδ.
    A pδ of 0 indicates that none of the hypotheses supported by the data are null hypotheses.
    A pδ of 1 indicates that all of the hypotheses supported by the data are null hypotheses.
    A pδ between 0 and 1 indicates inconclusive evidence.
    A pδ of 1/2 indicates strictly inconclusive evidence.
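
    To make the definition concrete, here is a minimal sketch of the arithmetic in plain Stata, assuming the formula from Blume et al. (2018): pδ is the share of the interval estimate I that overlaps the interval null H0, multiplied by the correction factor max(|I|/(2|H0|), 1). It does not use the sgpv package; the scalar names are made up for illustration, and the bounds are the third interval of the Table 1 example shown in #14.

    ```stata
    * Interval estimate (e.g. a 95% CI) and interval null hypothesis
    scalar estlo  = 142.55
    scalar esthi  = 147.45
    scalar nulllo = 144
    scalar nullhi = 148

    * Length of the overlap between the two intervals (0 if they are disjoint)
    scalar overlap = max(min(esthi, nullhi) - max(estlo, nulllo), 0)
    scalar estlen  = esthi - estlo
    scalar nulllen = nullhi - nulllo

    * SGPV: share of the interval estimate inside the null, times the correction factor
    scalar pdelta = (overlap/estlen) * max(estlen/(2*nulllen), 1)
    display "SGPV = " pdelta
    ```

    The result should agree with the .7040816 that sgpvalue reports for the same bounds in #14.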

    The sgpv-package contains all of the R-functions from the R-package translated into Stata.
    The sgpv-package consists of:
    • sgpv - a wrapper around the other commands, sgpvalue and fdrisk, to be used after estimation commands
    • sgpvalue - calculate the SGPVs
    • sgpower - power functions for the SGPVs
    • fdrisk - false confirmation/discovery risks for the SGPVs
    • plotsgpv - plot the SGPVs
    The sgpv command is the main command for normal users. Each command has its own dialog box to make using the commands easier.
    The package offers features that are not available in the original R code, such as calculating SGPVs after estimation commands and dialog boxes (GUIs).
    There are some limitations with regard to the accepted input compared to the original R code, but these should not matter for the average user.
    Examples of how to work around these limitations are also provided in the respective help files.

    The examples below are taken from the help file of the sgpv-command and show how easy it is to get the SGPVs side-by-side with the normal p-values after an estimation command.
    Code:
    * sgpv as a prefix-command:
      sysuse auto, clear
      sgpv: regress price mpg weight foreign

    * Save estimation for later usage
      estimates store pricereg

    * The same result but this time after the last estimation.
      sgpv

    * Now run a quantile regression instead
      qreg price mpg weight foreign
      estimates store priceqreg

    * Calculate SGPVs for the stored estimations and only for the foreign coefficient
      sgpv, estimate(pricereg) coefficient("foreign")
      sgpv, estimate(priceqreg) coefficient("foreign")
    References:
    Blume JD, D’Agostino McGowan L, Dupont WD, Greevy RA Jr. (2018). Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses. PLoS ONE 13(3): e0188299. https://doi.org/10.1371/journal.pone.0188299


    Blume JD, Greevy RA Jr., Welty VF, Smith JR, Dupont WD (2019). An Introduction to Second-generation p-values. The American Statistician. In press. https://doi.org/10.1080/00031305.2018.1537893

  • #2
    looks promising
    but there appears to be some error on the ssc


    . ssc install sgpv
    checking sgpv consistency and verifying not already installed...
    file http://fmwww.bc.edu/repec/bocode/p/p...ia-example.ado not found
    could not copy http://fmwww.bc.edu/repec/bocode/p/p...ia-example.ado
    (no action taken)

    ssc install: apparent error in package file for sgpv; please notify [email protected], providing package name
    r(601);

    • #3
      Thanks for the notice
      I will try to reach repec and see if they can fix it. The problem is definitely on their side. The file is correctly on the server, only the filename in the package file is incorrect.
      I also noticed another mistake on my side. I accidentally included a file in my submission to Kit Baum which contains random older code fragments.

      I will try to get it removed from the package.
      It might take a while for these corrections to be made.
      If nothing else helps, I will resubmit the package to Kit Baum as an update.
      While waiting for the correction, you can also install the package from my Github page by running
      Code:
      net install sgpv, from(https://raw.githubusercontent.com/skbormann/stata-tools/master/) replace

      • #4
        Thanks to Kit Baum, everything should work now as expected.

        • #5
          Dear Sven-Kristjan,

          Exploring the help file as well as the
          Code:
          net describe sgpv, from(http://fmwww.bc.edu/RePEc/bocode/s)
          I tried to download the ANCILLARY FILES but this fails with the error message:
          Code:
          file http://fmwww.bc.edu/RePEc/bocode/s/sgpv-examples.do not found
          could not copy http://fmwww.bc.edu/RePEc/bocode/s/sgpv-examples.do
          (no action taken)
          Possibly this can be corrected.

          Best,
          Eric
          http://publicationslist.org/eric.melse

          • #6
            Dear Eric,
            thank you for the notice. I will try to get it corrected. The file in the error message does not exist and should not be mentioned in the package file for this command.
            Until this error gets corrected, the additional files can also be downloaded from my Github page with the following command.
            Code:
            net get sgpv, from(https://raw.githubusercontent.com/skbormann/stata-tools/master/) replace
            Best regards,
            Sven-Kristjan

            • #7
              Thanks again to Kit Baum. Everything should work now as expected.

              • #8
                Dear Sven,

                I have not had a chance to look at this command in detail, but was running through your documentation just now and had a question.

                In the supplemental remarks to the referenced 2018 paper, a multiple regression is covered in Remark 9. Because interval null bounds and the bounds of an uncertainty interval (e.g., CI) are required to compute an SGPV (my understanding), multiple p-values representing multiple hypotheses would require multiple interval nulls, no? In Remark 9 they get around this by changing the question to a single p-value derived from an R^2 test, and an interval null associated with a change in R^2. In the remark they say, “We could compute the second-generation p-value for the three-dimensional vector, but this requires specifying a three-dimensional interval null and obtaining simultaneous confidence intervals, i.e. a CI for the entire vector as opposed to three independent CIs for each element.”, which must be a harder problem.

                I had interpreted SGPVs to really be useful when you had multiple studies or p-values related to the same interval null hypothesis, as the examples in the paper demonstrate. Your documentation examples seem to be computing SGPVs over multiple hypotheses? How are you calculating the SGPVs in the example multiple regression in the supporting documentation to your program? I'll send another note to you offline too.

                Dave

                • #9
                  Related to my question in #8: how are you calculating SGPVs on a multiple regression without supplying bounds for the null intervals? In this case you must be using a default? From looking at the documentation, it appears you default to a point null hypothesis of 0 if bounds are not supplied. I think some of the statistical properties of SGPVs come from the use of an interval null hypothesis, rather than the point null? Isn't that the whole idea of SGPVs? While the choice of using a non-interval null hypothesis might allow you to make a wrapper command on any Stata estimation result, and avoid the hard work of achieving a meaningful null interval hypothesis, it seems at odds with the SGPV literature to date.

                  • #10
                    Dear Dave,
                    thank you for your message. I will take a look at the said paper again and see if I can answer your question.
                    In general, my command(s) is/are just a translation into Stata of the original R-code which you can find here. So everything that Blume et al. do in their paper should be feasible with my command(s).
                    The only differences between Blume et al. and my command(s) are related to (internal) differences between Stata and R and to my attempts to make the commands easier to use in Stata for/after estimations. The calculations are exactly the same.

                    The default null-hypothesis for the sgpv-command is a point null-hypothesis of 0. This is the same null-hypothesis used by Stata to calculate the traditional p-values.

                    Can you tell me where in the documentation of this command I give the impression that the SGPVs are calculated across multiple hypotheses?

                    I looked at the Remark 9 of the supplementary material but I don't understand yet where the numbers come from.


                    "Isn't that the whole idea of SGPVs? While the choice of using a non-interval null hypothesis might allow you to make a wrapper command on any Stata estimation result, and avoid the hard work of achieving a meaningful null interval hypothesis, it seems at odds with the SGPV literature to date."
                    I agree with you that an interval null hypothesis should be used. The meaningful interval null hypothesis needs to be supplied by the user of the sgpv-command.
                    I should probably make this more explicit in the documentation and write more examples with different interval null hypotheses for individual coefficients.
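
                    One such example could look like the sketch below, which computes the SGPV for the foreign coefficient by hand in plain Stata instead of calling sgpv. The null interval of ±1% of the mean price is purely hypothetical and chosen only for illustration; the scalar names are likewise made up. The confidence limits are taken from r(table), which regress leaves behind.

                    ```stata
                    sysuse auto, clear
                    regress price mpg weight foreign

                    * 95% confidence limits of the foreign coefficient, taken from r(table)
                    matrix T = r(table)
                    scalar estlo = T[rownumb(T, "ll"), colnumb(T, "foreign")]
                    scalar esthi = T[rownumb(T, "ul"), colnumb(T, "foreign")]

                    * Hypothetical null interval: +/- 1% of the mean of the dependent variable
                    quietly summarize price
                    scalar delta  = 0.01 * r(mean)
                    scalar nulllo = -delta
                    scalar nullhi =  delta

                    * SGPV by hand: overlap share times the correction factor max(|I|/(2|H0|), 1)
                    scalar overlap = max(min(esthi, nullhi) - max(estlo, nulllo), 0)
                    scalar estlen  = esthi - estlo
                    scalar pdelta  = (overlap/estlen) * max(estlen/(2*(nullhi - nulllo)), 1)
                    display "null interval: [" nulllo ", " nullhi "]   SGPV = " pdelta
                    ```

                    Whether a width like this is meaningful for a given coefficient is something only the user can judge; the sketch only shows where the interval enters the calculation.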

                    The broader problem is that interval null hypotheses are not widely used yet, at least not in the economics literature. Therefore, I cannot expect a new user of these commands to use them for more than another way of summarizing the statistical evidence in favor of observing an effect.
                    Using a default point null-hypothesis is an easy way to introduce SGPVs to the general public, in my opinion.
                    I thought about deriving default interval hypotheses with something like ±1% or ±0.1% of the values of the dependent variable as the intervals. Even if it was possible, it would only move the problem of providing a meaningful interval null-hypothesis one step further away.
                    But in the end, I have no idea how to "force" the user to supply a meaningful interval hypothesis.

                    I hope that these remarks answer your questions. If not, let me know.

                    As a side note: There is now a new testing version of the sgpv-package available on my Github page which can be installed via
                    Code:
                    net install sgpv, from(https://raw.githubusercontent.com/skbormann/stata-tools/testing/) replace
                    if you want to help improve the commands.
                    Besides some bugfixes, the selection of coefficients should now work as expected and as "promised" by the dialog box for the sgpv-command. The bonus statistics (delta gaps and false discovery risks) are now only displayed and calculated if explicitly requested. I believe that for most users the bonus statistics are not meaningful.
                    Last but not least, the sgpv-command understands new subcommands, which allow calling the other commands from the sgpv-command.
                    For example the code below is equal to the first example for one-sided intervals in the sgpvalue-helpfile.
                    Code:
                    sgpv value, estlo(log(1.3)) esthi(.) nulllo(.) nullhi(log(1.1))
                    This new version should soon become available via SSC.

                    • #11
                      Hi Sven,

                      My comment #8 above was a thought in development. Sorry about that. Your documentation doesn't discuss joint hypotheses, I agree. Remark 9 in the supplement to the 2018 paper was about multiple regression, but focused on the joint testing of some coefficients.

                      My comment #9 was a constructive one. I actually don't think you should make available such an 'easy' SGPV implementation that defaults to zero without a lot of warnings, as I don't think this offers any improvements over a traditional p-value? A default will be taken as what is usual. But you are saying the R code defaults to zero? The SGPV offers three conclusions, one more than the traditional p-value. It concludes the result is not different from the null interval, the result is inconclusive, or that the result is different from the null. The traditional p-value concludes only that you have evidence against the null or that the result is inconclusive. I think this depends on the interval null being used? Also, there is the bit of theory about the type I and II error going to zero with increasing sample size (point 6 in section 2.3 of the 2018 paper). As I understand it, this is critical for multiplicity issues, and is also dependent on an interval null?

                      With regard to uptake of this method, the major impediment was recognized by the authors as the need to state a meaningful null interval hypothesis. With respect, I don't think encouraging a zero point null is the solution but this is what will happen with your package I fear.

                      It is fantastic that you have implemented code in the R package in Stata. Thank you for your efforts.

                      Cheers,

                      Dave

                      • #12
                        "A default will be taken as what is usual. But you are saying the R code defaults to zero?"
                        The R code does not have a default value. There are no examples for the R codes of how to calculate the SGPVs after estimation commands in R. The default of 0 is my own choice.

                        "With regard to uptake of this method, the major impediment was recognized by the authors as the need to state a meaningful null interval hypothesis. With respect, I don't think encouraging a zero point null is the solution but this is what will happen with your package I fear."
                        I agree with you that my default settings might encourage continuing using a zero point null hypothesis. I could add warnings if a point null hypothesis is used.
                        I don't think that the major problem is to state a meaningful null interval hypothesis. You can always default to something very small like ±0.01% or similar for an interval null hypothesis. At least in the economics literature, a distinction is sometimes made between statistically significant and economically significant. Economic significance implies an interval null hypothesis, which most economists should have in mind when interpreting the results.
                        I see the major impediments somewhere else. Researchers need to know about SGPVs in the first place to see a need to use my commands. This assumes that said researchers/practitioners really understand the problem with the traditional p-values and understand the benefit of SGPVs.
                        But without pressure from reviewers and editors to use SGPVs instead of or alongside traditional p-values, many researchers probably won't change their practice of reporting and relying on the traditional p-values for showing the significance of their results.
                        When I imagine giving a research seminar about why the institute of economics of my university should add SGPVs to their analyses, then I believe that I won't see too much enthusiasm in their eyes. Maybe unless SGPVs can be integrated easily in the existing workflow. Which is one of the reasons why I wrote this wrapper command.
                        Of course, thinking in terms of interval hypotheses needs to be taught to students, which is not happening currently because of the reliance on the traditional zero-effect point null hypothesis.

                        For the moment, I would wait and see how often the sgpv-package is downloaded from SSC. I hope that people who are drawn to this package understand how to use SGPVs or at least are willing to read the linked articles.
                        I am considering submitting an article to the Stata Journal about this package. In this article, I could emphasize using an interval null-hypothesis instead of a point null hypothesis. At the moment, I give this "hint" only in the documentation when discussing the interpretation of the results from an example.

                        I perceive all your comments as constructive. So no worries. You ask questions which I have asked myself before or point me to parts of the papers which I have not read yet very thoroughly.
                        I have to admit that I did not look into the statistical consequences of specifying a point null hypothesis for SGPVs, because I was more interested in writing working code.
                        I perceive a trade-off between the ease of using a new statistical tool like the SGPVs and the need to change the existing practices. For now, I have chosen the seemingly easier way of having a default null hypothesis in the hope that SGPVs will be picked up and users educate themselves.
                        Overall, I don't see an easy solution to this trade-off. But I am willing to discuss these issues further. Maybe together we can find a better solution.

                        • #13
                          Hi Sven,

                          As the R package doesn't default to zero, you are doing something different from what the SGPV authors do. Comparing a 0 point null to a confidence interval is just the usual comparison and doesn't define an SGPV, which I thought requires an interval H0? My understanding is that there is no overlap possible between a point null and an estimate interval, and a point null can never contain the estimate interval, and so can never conclude the estimate is not different from the null. This is similar to equivalency testing, where an equivalency interval must contain the estimate interval to declare equivalency. What does your code produce for the data in the 2018 paper Figure 2 and Table 1 assuming a point null? There the SGPV depends on the shown null interval.

                          Dave

                          • #14
                            The R-package does not default to zero because it does not have such a wrapper command like the sgpv-command to process the results of previous estimations.
                            A point null can be contained within the estimation interval. Then the SGPV is 0.5 and the evidence is deemed strictly inconclusive.
                            Therefore, it is possible to conclude that the estimate is not different from the null.
                            I agree however with you that "Comparing a 0 point null to a confidence interval is just the usual comparison".

                            Having a point null interval is explicitly supported by the original R-codes. I give you the link to the R-codes for the 2018 paper. These codes are simpler than the official package linked elsewhere. If you look at these codes then you will see that a point null-hypothesis is explicitly accounted for.
                            SGPVs are defined by the overlap of an interval estimate and an interval null hypothesis. But the null-hypothesis interval can be an interval of the "length" 0, containing only one number.
                            An interval null-hypothesis is just deemed the desired case but not required. The authors of the SGPV do not discuss (more) explicitly using a point null-hypothesis for SGPV calculation because they are not in favor of having a point null-hypothesis.
                            See the discussion in section 1.1 and 1.2 of the 2018 paper.

                            The SGPVs for Table 1 and Figure 2 can be exactly reproduced with my commands. I list the necessary command below. I cannot reproduce their figure without translating the R-codes into Stata first. The same goes for the maximum p-value.
                            To reproduce the numbers and Figure yourself, you need the R-codes linked above.


                            Code:
                            sgpvalue , nulllo(144) nullhi(148) estlo(145.02 145.01 142.55 141.59 142.04 142.52 140.04 140.02) esthi(146.98 145.99 147.45 150.41 145.96 144.48 143.96 141.98)
                            
                            Second Generation P-Values
                            
                                 SGPV  Delta-Gap 
                            ---------------------
                                    1          . 
                                    1          . 
                             .7040816          . 
                                   .5          . 
                                   .5          . 
                              .244898          . 
                                    0        .02 
                                    0       1.01
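
                            The numbers in this table can also be checked by hand. The following sketch in plain Stata (no sgpv commands; variable and scalar names are made up) applies the overlap formula from the 2018 paper to the eight intervals. The delta gap is assumed here to be the distance between the two intervals divided by the half-width of the null interval, which is consistent with the .02 and 1.01 shown above.

                            ```stata
                            clear
                            input double(estlo esthi)
                            145.02 146.98
                            145.01 145.99
                            142.55 147.45
                            141.59 150.41
                            142.04 145.96
                            142.52 144.48
                            140.04 143.96
                            140.02 141.98
                            end

                            scalar nulllo = 144
                            scalar nullhi = 148

                            * Overlap share times the correction factor max(|I|/(2|H0|), 1)
                            generate double overlap = max(min(esthi, scalar(nullhi)) - max(estlo, scalar(nulllo)), 0)
                            generate double estlen  = esthi - estlo
                            generate double pdelta  = (overlap/estlen) * max(estlen/(2*(scalar(nullhi) - scalar(nulllo))), 1)

                            * Delta gap (assumed definition): gap in units of the null half-width, only if there is no overlap
                            generate double gap      = max(estlo - scalar(nullhi), scalar(nulllo) - esthi)
                            generate double deltagap = gap / ((scalar(nullhi) - scalar(nulllo))/2) if pdelta == 0

                            list estlo esthi pdelta deltagap, noobs
                            ```

                            Up to rounding, the pdelta and deltagap columns should match the SGPV and Delta-Gap columns above.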
                            You can get the results for the traditional point null-hypothesis with the command below. There will be explicit warnings about using an interval hypothesis of length 0.
                            The error message could probably be more informative about the underlying problem.

                            Code:
                            sgpvalue , nulllo(146) nullhi(146) estlo(145.02 145.01 142.55 141.59 142.04 142.52 140.04 140.02) esthi(146.98 145.99 147.45 150.41 145.96 144.48 143.96 141.98)
                            
                            Second Generation P-Values
                            
                                 SGPV  Delta-Gap 
                            ---------------------
                                   .5          . 
                                    0        .01 
                                   .5          . 
                                   .5          . 
                                    0        .04 
                                    0       1.52 
                                    0       2.04 
                                    0       4.02
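
                            For comparison, here is a hand calculation for the point null, again in plain Stata with made-up names. It assumes the convention described above for a null interval of length 0: the SGPV is 0.5 if the point lies inside the interval estimate and 0 otherwise. Because the half-width of the null is 0, the gap is left unscaled, which is an assumption that matches the Delta-Gap column above.

                            ```stata
                            clear
                            input double(estlo esthi)
                            145.02 146.98
                            145.01 145.99
                            142.55 147.45
                            141.59 150.41
                            142.04 145.96
                            142.52 144.48
                            140.04 143.96
                            140.02 141.98
                            end

                            scalar h0 = 146

                            * Zero-length null: 1/2 if the point lies inside the interval estimate, 0 otherwise
                            generate double pdelta = cond(inrange(scalar(h0), estlo, esthi), 0.5, 0)

                            * Raw distance between the point null and the interval estimate, only if the point is outside
                            generate double gap = max(estlo - scalar(h0), scalar(h0) - esthi) if pdelta == 0

                            list estlo esthi pdelta gap, noobs
                            ```

                            By construction, pdelta takes only the values 0 and 0.5 here.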

                            I think that you are confusing my setting of a default zero point null-hypothesis for the sgpv-command with the general calculation of the SGPVs.
                            The default is just a maybe questionable decision by me, but it has nothing to do with the ability of the underlying code to use an interval null-hypothesis.

                            • #15
                              Hi Sven,

                              This is what I suspected. In the second table above with a point null the SGPV column never is 1, unlike in the 2018 paper or your successful reproduction in the first table above where an interval null is used. By definition the SGPV can never be 1 with a point null hypothesis, and in that case is just an indicator variable for significance and is rather pointless to compute, no pun intended. Please correct me if I'm wrong.

                              Dave
