New package on SSC: sgpv - Second-Generation P-Values based on Blume et al. (2018, 2019)

Dave Airey

Join Date: Apr 2014

Posts: 396
#16

09 May 2020, 12:13

Hi Sven,

One reason why the R code does not have a wrapper for previous estimations is because we would need an interval hypothesis for each term in the model, and it is unlikely to be the same, even on a common z beta scale. At the moment I understand your code provides for the choice of a common interval null hypothesis, which likely won't always make sense.

Dave
Dave Airey

#17

09 May 2020, 12:43

Hi Sven,

I am hoping my comments are coming across as constructive and encouraging. This next one might be bothersome, because you've done a lot of work. My suggestion would be to focus the SGPV package away from a wrapper of previous estimations as you have presented it, for two reasons: (1) presenting SGPVs based upon 0 point nulls provides no added value; (2) we need multiple null intervals for each term if we really did want an SGPV for each term. That's asking a lot. I would instead rework your package to be simpler, and just have it focus on a single interval null hypothesis and not allow a 0 point null. If you do that, people will find use for it.

Dave
Sven-Kristjan Bormann

Join Date: Jul 2018

Posts: 310
#18

12 May 2020, 14:34

I will respond to Dave Airey's last three comments in three separate posts, starting with comment #15, quoted here:

Hi Sven,

This is what I suspected. In the second table above with a point null, the SGPV column is never 1, unlike in the 2018 paper or your successful reproduction in the first table above, where an interval null is used. By definition, the SGPV can never be 1 with a point null hypothesis, and in that case it is just an indicator variable for significance and rather pointless to compute, no pun intended. Please correct me if I'm wrong.

Dave

I don't see it as a problem that the SGPV is never 1 with a point null-hypothesis.
Unless you are interested in confirming some hypothesis, any SGPV greater than 0 shows that the data support some hypotheses that you do not deem relevant or interesting (whatever that may mean for you).
I think you are correct that the SGPV acts like an indicator variable for a significant deviation from the point null-hypothesis.
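To make the point-null case concrete, here is a minimal Python sketch (used only for illustration; it is not the sgpv package's code) of the SGPV definition from Blume et al., p_delta = |I intersect H0| / |I| * max(|I| / (2|H0|), 1). The handling of zero-length intervals (a point null inside the interval estimate gives 1/2) follows my understanding of the R implementation and should be read as an assumption. The sketch shows that a point null can only yield 0 or 0.5, while an interval null can reach 1.

```python
def sgpv(est_lo, est_hi, null_lo, null_hi):
    """Second-generation p-value for finite bounds (minimal sketch)."""
    est_len = est_hi - est_lo
    null_len = null_hi - null_lo
    # length of the intersection of the interval estimate I and the null set H0
    overlap = max(min(est_hi, null_hi) - max(est_lo, null_lo), 0.0)
    touches = est_lo <= null_hi and est_hi >= null_lo
    if est_len == 0:
        # point estimate: 1 if it lies inside H0, otherwise 0
        return 1.0 if touches else 0.0
    if null_len == 0:
        # point null: 1/2 if the point lies inside I, otherwise 0 (never 1)
        return 0.5 if touches else 0.0
    # |I n H0| / |I| * max(|I| / (2|H0|), 1) simplifies to overlap / min(|I|, 2|H0|)
    return overlap / min(est_len, 2 * null_len)


print(sgpv(-0.2, 0.5, 0.0, 0.0))   # CI contains the point null 0 -> 0.5
print(sgpv(0.1, 0.5, 0.0, 0.0))    # CI excludes 0 -> 0.0
print(sgpv(-0.2, 0.2, -0.3, 0.3))  # CI inside an interval null -> 1.0
```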
The indicator might just be easier to understand than significance at 0.01, 0.00121, or other small traditional p-value thresholds.
But as I understand it, the SGPV just summarizes the evidence in the data against my set of null-hypotheses.
Even with interval null-hypotheses, I can treat any SGPV greater than 0 as no significant deviation from my set of null-hypotheses, and thus use SGPVs no differently from traditional p-values.
If I had, for example, an SGPV of 0.1 for some term of interest, I could report and explain that most of the relevant null-hypotheses are not supported and that there is therefore a significant effect. Alternatively, if I want to be 100% certain that no null-hypothesis is supported, I would treat the same result as insignificant.
These are the same decisions I have to make when reporting and interpreting p-values.
The caveat is that the only guidance for doing so is the papers by Blume et al., whereas p-values have a much longer tradition.

The advantage is that SGPVs are a descriptive summary of the evidence and not the probability P(data|H0), which is easily confused with P(H0|data), the common but wrong conclusion.
I am not sure how different SGPVs are from traditional p-values in terms of acting as an indicator of significance.
It is up to the users of these concepts (and the reviewers of papers) how the results are presented and framed.
Maybe we approach SGPVs from different perspectives.
Therefore, I am not sure whether I was able to correct you, because you may see issues that, for me, are unimportant or nonexistent.
Sven-Kristjan Bormann

#19

12 May 2020, 14:44

Quoting Dave Airey (#16):

Hi Sven,

One reason why the R code does not have a wrapper for previous estimations is because we would need an interval hypothesis for each term in the model, and it is unlikely to be the same, even on a common z beta scale. At the moment I understand your code provides for the choice of a common interval null hypothesis, which likely won't always make sense.

Dave

My guess as to why the R code has no wrapper is that such a general wrapper is harder to write in R, and that the authors of the R code simply did not consider it useful for their paper.
But I don't know enough about R to judge how easy or difficult it is to write general post-estimation commands.
You could set an interval null-hypothesis for each term individually by iterating over all coefficients in your estimation.
See the pseudo-code below for an example, and my reply to your third comment.

Code:

<some estimation command>
local coeflist <list of coefficients of interest>
local nulllb <list of lower bounds of null intervals, one per coefficient>
local nullub <list of upper bounds of null intervals, one per coefficient>
capture matrix drop results   // start from an empty results matrix
local i 1
foreach coef of local coeflist {
    // pick the i-th lower and upper bound for the current coefficient
    sgpv, coef(`coef') nulllo(`=word("`nulllb'", `i')') nullhi(`=word("`nullub'", `i')') q <further options>
    matrix res = r(comparison)                  // collect the results in a matrix for further processing
    matrix results = (nullmat(results) \ res)
    local ++i
}
matlist results, title(<some fancy title>) rowtitle(Coefficients)
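The same idea, one null interval per coefficient, can be sketched in Python. The coefficient names, confidence intervals, and null bounds below are hypothetical, and the sgpv() helper only covers finite, non-degenerate intervals (a simplification of the Blume et al. formula, not the package's implementation).

```python
def sgpv(est_lo, est_hi, null_lo, null_hi):
    """Minimal SGPV sketch for finite, non-degenerate intervals."""
    overlap = max(min(est_hi, null_hi) - max(est_lo, null_lo), 0.0)
    return overlap / min(est_hi - est_lo, 2 * (null_hi - null_lo))

# Hypothetical confidence intervals from some estimation command
cis = {"x1": (-0.30, 0.10), "x2": (0.20, 0.60), "x3": (0.50, 2.50)}
# Hypothetical coefficient-specific interval nulls (lower, upper)
nulls = {"x1": (-0.05, 0.05), "x2": (-0.10, 0.10), "x3": (-1.00, 1.00)}

results = {}
for coef, (lo, hi) in cis.items():      # iterate over all coefficients
    null_lo, null_hi = nulls[coef]      # each term gets its own null interval
    results[coef] = sgpv(lo, hi, null_lo, null_hi)

for coef, p in results.items():
    print(f"{coef:4s} SGPV = {p:.3f}")
```

Keeping the bounds in a mapping keyed by coefficient name plays the same role as the paired `nulllb`/`nullub` lists indexed by `i` in the Stata loop.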

I could probably remove the need for a foreach loop by adding direct support for multiple null-hypotheses or a different null-hypothesis for each coefficient.
I would probably still restrict this possibility somewhat to make parsing the option(s) easier. But I see such a feature more as a convenience than as genuinely new and necessary functionality.
But if requested, I will try to implement it for the next update or the one after that.

For my personal research, a common interval null-hypothesis is enough.
And given that interval null-hypotheses are not well known or used at least in my field, no PhD defense committee will ask for them ;-)
But I guess that in your field, having different null-intervals is much more reasonable, which might explain our different opinions on this topic.
Dave Airey

#20

12 May 2020, 15:09

Hi Sven,

Thank you for taking the time again to respond to my thoughts! I've enjoyed thinking about this a little with you. As I'm not an expert in this area, I'll stop here.

Cheers,

Dave
Sven-Kristjan Bormann

#21

12 May 2020, 15:38

Quoting Dave Airey (#17):

Hi Sven,

I am hoping my comments are coming across as constructive and encouraging. This next one might be bothersome, because you've done a lot of work. My suggestion would be to focus the SGPV package away from a wrapper of previous estimations as you have presented it, for two reasons: (1) presenting SGPVs based upon 0 point nulls provides no added value; (2) we need multiple null intervals for each term if we really did want an SGPV for each term. That's asking a lot. I would instead rework your package to be simpler, and just have it focus on a single interval null hypothesis and not allow a 0 point null. If you do that, people will find use for it.

Dave

My current development plans for the sgpv-package are:
Fix potential bugs (e.g., some nonsensical input is still possible in the sgpvalue-command)

Make minor adjustments in the code of the sgpv-command

Improve the speed of the fdrisk-command by using the numerical integration from the moremata package. This is the most time-consuming task because of the amount of rewriting and rethinking of the fdrisk-command it requires.

Write an article for the Stata Journal about the package.

Some other potential ideas can be found in the ado-file for the sgpv-command. Beyond this, I have no further plans for the package.
My reason to focus on the wrapper is simply that the wrapper is the only code which is based upon my own ideas.
The rest is just a hopefully faithful translation of the R-code into Stata.
I also feel that I don't have a good understanding of the statistical properties of the SGPVs.
Therefore, I am also reluctant to extend or modify the other commands beyond internal changes.

You would need to tell me exactly what you want to do with the sgpv-package that cannot be done with the current code.
My guess is that you want to do things which are already possible, but for which I have not yet written the necessary examples into the help files.

Regarding "presenting SGPVs based upon 0 point nulls provides no added value":

The problem with the wrapper command probably lies in its "lazily" chosen default value.
I need a default value; otherwise, no calculations would be done if the user has not made up their mind beforehand.

The SGPV for a point 0 null-hypothesis provides a simple decision criterion:
either 0 is included or not.
This is fine for my purposes and might lead to a simpler presentation of the results than stars indicating different p-values or something similar.
So there is some added value for me.
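The decision criterion for a point null can be checked against the classical test. The sketch below assumes a Wald confidence interval (estimate plus or minus z times the standard error); the function names and numbers are hypothetical and not part of the sgpv package. Under that assumption the point-null SGPV is 0 exactly when the two-sided p-value falls below 1 minus the confidence level.

```python
from statistics import NormalDist

def point_null_sgpv(est, se, level=0.95, null=0.0):
    """SGPV for a point null with a Wald confidence interval (minimal sketch)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lo, hi = est - z * se, est + z * se
    # A point null has zero length, so the SGPV is 1/2 if the CI covers it, else 0
    return 0.5 if lo <= null <= hi else 0.0

def two_sided_p(est, se, null=0.0):
    """Classical two-sided Wald p-value."""
    z = abs(est - null) / se
    return 2 * (1 - NormalDist().cdf(z))

# Compare both criteria for a few hypothetical estimates with se = 0.2
for est in (0.1, 0.3, 0.5, -0.45):
    p = two_sided_p(est, 0.2)
    print(f"est={est:+.2f}  p={p:.4f}  SGPV={point_null_sgpv(est, 0.2)}")
```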
But on the other hand, I am not going to report SGPVs explicitly in a text in the foreseeable future, except in the potential Stata Journal article or on request.

The rest is the responsibility of the user. The user can always provide a better fitting interval hypothesis.
I can add more warnings and more examples to discourage the use of the point 0 null-hypothesis, but the default remains a convenient solution for me.

Below is example code for calculating the SGPVs of my coefficients of interest after an estimation command, using a narrow interval null-hypothesis of ± 0.01 percentage points.
The dependent variable was the log wage. The results for such a narrow interval null-hypothesis are not different from the point 0 null-hypothesis.

Code:

sgpv, coef(0b.RusFluency2#1.EngFluency2 1.RusFluency2#0b.EngFluency2 1.RusFluency2#1.EngFluency2) q nulllo(-0.01) nullhi(0.01)
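A short Python sketch can show why such a narrow interval null behaves like the point null. If the null interval of width 0.02 lies entirely inside a confidence interval wider than 0.04, the SGPV is 0.02 / (2 × 0.02) = 0.5, the same value a point null would give; the results differ only when a confidence bound falls inside ±0.01. The confidence intervals below are hypothetical, and the helper covers only finite, non-degenerate intervals.

```python
def sgpv(est_lo, est_hi, null_lo, null_hi):
    """SGPV for finite, non-degenerate intervals (minimal sketch)."""
    overlap = max(min(est_hi, null_hi) - max(est_lo, null_lo), 0.0)
    return overlap / min(est_hi - est_lo, 2 * (null_hi - null_lo))

null_lo, null_hi = -0.01, 0.01   # narrow interval null around 0

# Hypothetical CIs for log-wage coefficients
print(sgpv(-0.05, 0.12, null_lo, null_hi))  # null inside a wide CI -> 0.5
print(sgpv(0.03, 0.15, null_lo, null_hi))   # CI clear of the null -> 0.0
print(sgpv(0.005, 0.20, null_lo, null_hi))  # CI bound inside the null -> about 0.125
```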

Which brings us to the next point.

Regarding "we need multiple null intervals for each term if we really did want an SGPV for each term":

Here, I don't understand what you mean by "multiple null intervals". Do you mean n-dimensional null intervals? Or do you mean a different null interval for each term of an estimation?
The SGPVs are already calculated individually for each term, so I am a bit confused what you mean.

I hope that I could dispel at least some doubts or explain my point of view better.
Sven-Kristjan Bormann

#22

13 May 2020, 10:40

I have just published another test release on my GitHub page.
Use the command below to install the new test release.

Code:

net install sgpv, from(https://raw.githubusercontent.com/skbormann/stata-tools/testing/) replace

I fixed the bonus-option in the sgpv-command. The false discovery risks are now calculated and displayed if the bonus-option has the value "fdrisk" or "all". Previously, this did not work.
Based on the recent discussions in this thread, I added more explicit and better visible warnings against using the default point 0 null-hypothesis for the sgpv wrapper command.
At some point, I could remove the default point null-hypothesis, but for now I hope that users of this package know what they are doing.
Unless I find more bugs during the next few days, this update should be available via SSC within the next two weeks.

If somebody tests this test release and sends me feedback, I would be very happy and grateful.
Sven-Kristjan Bormann

#23

21 May 2020, 14:53

Thanks to Kit Baum, an update for the sgpv package is now also available from SSC.

Compared to the initial release, the following improvements have been made:
The type of returned results has been changed from macro to scalar to be more in line with standard practice for the commands sgpower and fdrisk.

sgpv-command:
Major changes:
Coefficients can now be selected via the coefficient-option using one of three specifications:

Variable names, e.g. "price"

Equation names e.g. "var."

Full names e.g. "var:price"

Changed the name of the nobonus option to bonus, thereby changing the behavior of this option: bonus statistics are now shown only when requested.

Minor changes:
Added new subcommands so that the other commands of the package can be called from the sgpv-command with something like "sgpv value, estlo(log(1.3)) esthi(.) nulllo(.) nullhi(log(1.1))" to run the sgpvalue-command

Added better visible warnings against using the default point 0 null-hypothesis after the displayed results

Made the title of the displayed matrix adapt to the type of null-hypothesis (Interval vs. Point)

Fixed a wrong file name for the dataset in the sgpv-leukemia-example.do

Minor improvements to the example section of the help file, and new examples:
Added a new example showing how to apply a different null-hypothesis for each coefficient; the do-file with the corresponding code will not be downloaded when running adoupdate and needs to be downloaded manually if you do not want to copy & paste the code into Stata yourself.

Added an example of how to export results using estout by Ben Jann

Fixed various inconsistencies between the help file and the ado-file.

The sgpv-command now ensures that it is used for only one thing at a time: as a prefix command, for a stored estimation, or for a matrix. These three possibilities always take precedence over using previously estimated results.

Shortened subcommand "menuInstall" to "menu" and changed the name of the option 'perm' to 'permanent' for this subcommand
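The subcommand example above, `sgpv value, estlo(log(1.3)) esthi(.) nulllo(.) nullhi(log(1.1))`, uses `.` for an unbounded limit. A Python sketch of the same calculation uses `math.inf` instead. Returning a tiny positive value when the overlap is finite but the denominator is infinite is an assumption modeled on the R implementation's infinity correction; zero-length intervals and infinite overlaps are deliberately not handled here.

```python
import math

def sgpv(est_lo, est_hi, null_lo, null_hi, inf_correction=1e-5):
    """SGPV allowing infinite bounds (minimal sketch; zero-length intervals not handled)."""
    est_len = est_hi - est_lo
    null_len = null_hi - null_lo
    overlap = max(min(est_hi, null_hi) - max(est_lo, null_lo), 0.0)
    if overlap == 0:
        return 0.0
    if math.isinf(overlap):
        raise ValueError("infinite overlap is not handled in this sketch")
    bottom = min(est_len, 2 * null_len)
    if math.isinf(bottom):
        # finite overlap but an infinite denominator: report a tiny positive value
        return inf_correction
    return overlap / bottom

inf = math.inf
# Roughly mirrors: sgpv value, estlo(log(1.3)) esthi(.) nulllo(.) nullhi(log(1.1))
print(sgpv(math.log(1.3), inf, -inf, math.log(1.1)))   # -> 0.0 (no overlap)
print(sgpv(math.log(1.05), inf, -inf, math.log(1.1)))  # -> 1e-05
```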

Another test version is already available via

Code:

net install sgpv, from(https://raw.githubusercontent.com/skbormann/stata-tools/testing/) replace

This next version should add to the sgpv-command:
A noconstant-option to remove the constant-term before calculating SGPVs

Explicit support for an individual null-hypothesis for each coefficient in the coefficient-option -> Currently, the lower and upper bounds for each null-hypothesis are displayed next to the associated coefficient.

Fix minor inconsistencies in the help-file

Unless there are feature requests, I plan to release only bugfixes after the next update.
Sven-Kristjan Bormann

#24

24 May 2020, 05:51

I noticed an unfortunate confusion in the help file for the sgpv-command: the text for the option "nulllo" is the text for the option "nullhi", and vice versa. Basically, I mixed up the words "lower" and "upper" when modifying the help file. This will be fixed in the next update. My apologies for any confusion.
Sven-Kristjan Bormann

Join Date: Jul 2018

Posts: 310
#25

07 Jul 2020, 17:45

Thanks to Kit Baum, an update for the sgpv package is now available from SSC.

Compared to the last release, the following improvements have been made.
For the sgpv-command:
Added support for multiple null-hypotheses

Added a noconstant-option to remove the constant from the list of coefficients

The options "nulllo" and "nullhi" now allow expressions/formulas as inputs.

For the plotsgpv-command:
Fixed/improved the support for matrices as input for options "esthi" and "estlo".

Changed the legend slightly to be more in line with R-code.

For the sgpvalue-command:
Added/improved support for matrices as inputs for options "esthi" and "estlo".

The noshow-option now works as expected.

Beyond these improvements, unused code has been removed and various bug fixes have been made.
Sven-Kristjan Bormann

#26

08 Jun 2021, 03:25

Thanks to Kit Baum, an update for the sgpv package is now available from SSC.

Compared to the last release, the syntax of most commands is now more Stata-like and less R-like. Several options were removed and replaced with default settings. The old syntax still works.
Several bugs were fixed.

For the fdrisk command:
Option sgpval became the single option 'fcr'. The default is to calculate the Fdr.

Option nullweights became option 'nulltruncnormal'. The former option nullweights("Point") is automatically selected if option nullspace contains only one element. If option nullspace contains two elements then the Uniform distribution is used as the default distribution.

Option altweights became 'alttruncnormal'. The option altweights("Point") is automatically selected if option altspace contains only one element. If option altspace contains two elements then the Uniform distribution is used as the default distribution.

Options inttype and intlevel became options level(#) and likelihood(#). If neither option is set, the confidence interval at the default confidence level is used.
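The default weighting rule described above can be summarized as a tiny decision function. This is a hypothetical Python illustration of the stated rule only; the function name and the returned labels are assumptions and do not reproduce fdrisk's code.

```python
def default_weights(space, truncnormal=False):
    """Hypothetical illustration of the default rule for null/alternative weights."""
    if len(space) == 1:
        return "Point"            # a single value means a point null/alternative
    if len(space) == 2:
        # an interval: Uniform by default, truncated Normal only when requested
        return "TruncNormal" if truncnormal else "Uniform"
    raise ValueError("space must contain one or two elements")

print(default_weights([0.0]))                           # Point
print(default_weights([-0.1, 0.1]))                     # Uniform
print(default_weights([-0.1, 0.1], truncnormal=True))   # TruncNormal
```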

For the sgpower command: Options "inttype" and "intlevel" were renamed to "level" and "likelihood". The old syntax still works.

For the sgpv command:
Changed the name of the option permanent to permdialog to clarify the meaning of the option.

Fixed the format option in the Dialog box.

Added a remove option for the menu subcommand to remove the entries in the profile.do created by the option permdialog.

Renamed the dialog tab "Display" to "Reporting". Moved the options from the dialog tab "Fdrisk" to dialog tab "Reporting".

Deprecated the option bonus() and replaced it with the new options "deltagap", "fdrisk" and "all", which have the same effect as the previous bonus() option. This is more in line with standard Stata practice. The bonus option still works but is no longer supported.

Added a forgotten option to calculate the bonus statistics in the example file sgpv-leukemia-example.do and fixed the size of the final matrix.

Removed the fdrisk-options "nullspace" and "nullweights" because they were redundant, and added a new option "truncnormal" to request the truncated Normal distribution for the null and alternative space.

Renamed the options "intlevel" and "inttype" to "level" and "likelihood". The level-option works like the option of the same name in other estimation commands: it sets the level of the confidence interval. This option overrides the level option of an estimation command.

The likelihood-option is meant to be used together with the matrix-option.

The previous inttype and intlevel options did not work as intended.

The title of the results matrix now shows the level and type of interval used to calculate the SGPVs (and the delta-gaps and Fdrs).

Calculating SGPVs for stored estimations will only show the SGPV results and not the saved estimation results.
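The two interval types behind the level and likelihood options can be sketched for a normally distributed estimate. The confidence interval is estimate ± z × se; the 1/k likelihood support interval under a normal likelihood is estimate ± se × sqrt(2 ln k). That likelihood(#) takes k in this sense is an assumption, and the estimate and standard error below are hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval(est, se, level=95):
    """Wald confidence interval; level given in percent as in Stata's level(#)."""
    z = NormalDist().inv_cdf(0.5 + level / 200)
    return est - z * se, est + z * se

def support_interval(est, se, k=8):
    """1/k likelihood support interval under a normal likelihood (assumed meaning of k)."""
    half_width = se * math.sqrt(2 * math.log(k))
    return est - half_width, est + half_width

est, se = 0.30, 0.10   # hypothetical coefficient and standard error
print(confidence_interval(est, se))       # about (0.104, 0.496)
print(confidence_interval(est, se, 90))   # narrower than the 95% interval
print(support_interval(est, se, k=8))     # about (0.096, 0.504)
```

Either interval can then serve as the interval estimate I in the SGPV calculation.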
