Wishlist for Stata 16

Bruce Weaver

Join Date: May 2014

Posts: 1130
#226

24 Apr 2019, 12:03

Apologies if this has already been suggested.

For ttest, it would be great if one could specify a nonzero difference between mu₁ and mu₂ for unpaired t-tests.

Context. I teach my students that all t-tests have a common format, as follows: t = (statistic - parameter|H₀) / SE_statistic. (See attached slides for a nicer view of that.) For an unpaired t-test, the statistic = the difference between the two sample means, the parameter|H₀ = the specified difference between the two population means (which does not necessarily have to be zero), and SE_statistic = the standard error of the difference between two independent means. To illustrate a null specifying a nonzero difference between population means, I made up an example stating that in 1960, the population difference in height between men and women was 5 inches. Someone collecting data currently wishes to test the null hypothesis that the difference is still 5 inches. It would be great if statistical software allowed one to test hypotheses like this without having to resort to the trickery of subtracting a given amount (5 inches in this case) from the scores of one group.
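As a rough sketch of that common format in Stata, the statistic can be assembled by hand from the results that ttest leaves in r(). The dataset (nhanes2f), the outcome (weight), and the hypothesized difference of 10 are assumptions chosen only for illustration; they are not taken from the attached slides.

```stata
* Sketch: t = (statistic - parameter|H0) / SE_statistic, unpaired case.
* Dataset and hypothesized difference are illustrative assumptions.
webuse nhanes2f, clear
quietly ttest weight, by(sex)

* Copy what is needed out of r() before running anything else
* statistic: difference between the two sample means
local diff  = r(mu_1) - r(mu_2)
* SE of the difference (pooled, equal-variance version)
local se    = r(se)
* degrees of freedom reported by ttest (N - 2 here)
local df    = r(df_t)
* parameter|H0: hypothesized value of mu1 - mu2
local delta = 10

local t = (`diff' - `delta') / `se'
* two-sided p-value
local p = 2 * ttail(`df', abs(`t'))
display "t = " `t' "   df = " `df' "   p = " `p'
```

Because the standard error and degrees of freedom come straight from ttest, this should agree with what one gets by shifting one group's scores by 10 and running ttest on the shifted variable.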

Cheers,
Bruce

Attached Files

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)
Comment
Richard Williams

Join Date: Apr 2014

Posts: 4983
#227

25 Apr 2019, 12:05

Bruce Weaver is this what you have in mind?

Code:

webuse nhanes2f, clear
mean weight, over(sex)
test Male = Female
test Male = Female + 10

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Comment

Bruce Weaver

Join Date: May 2014
Posts: 1130

#228

25 Apr 2019, 15:26

Originally posted by Richard Williams

Bruce Weaver is this what you have in mind?

Code:

webuse nhanes2f, clear
mean weight, over(sex)
test Male = Female
test Male = Female + 10

Hi Richard. If I use ttest on the same data, I get a different result. It appears that -test- after -mean- is using N-1 as the df, not N-2. The difference won't matter much with really large samples, I suppose, but people often have relatively small samples when doing t-tests.

Code:

webuse nhanes2f, clear
ttest weight, by(sex)
display "t^2 = " r(t)^2
mean weight, over(sex)
test Male = Female
* -test- is using df = N-1, not N-2.
test Male = Female + 10
* To get the right result with correct df:
generate wt = weight
replace wt = weight + 10 if sex==2
ttest wt, by(sex)
display "F = " r(t)^2
* I would like to be able to do something like this:
*   ttest weight, by(sex) delta(10)
* where delta = mu1-mu2|H0

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)

Comment

Bruce Weaver

Join Date: May 2014

Posts: 1130
#229

25 Apr 2019, 15:52

You pointed me in the right direction, Richard. Re #227 and #228, -regress- followed by -lincom- gives the right result.

Code:

webuse nhanes2f, clear
* Test H0: mu1-mu2 = 10
generate wt = weight
replace wt = weight + 10 if sex==2
ttest wt, by(sex)
* Perhaps -regress- followed by -lincom- can give the result I want.
quietly regress weight i.sex
lincom 1.sex - 2.sex - 10
display "t = " r(t)

But I still think most users would find it easier if ttest had an option allowing one to specify the value of mu₁-mu₂|H₀. Also, regress does allow various vce() options to deal with variance heterogeneity, but I don't think those options match exactly what you get with the unequal or welch options for ttest.

Cheers,
Bruce

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)
Comment
George Hoffman

Join Date: Mar 2014

Posts: 71
#230

25 Apr 2019, 16:02

there are multiple user-written modules to do variations of quantile regression. these have evolved to meet the many limitations of xtqreg, most notably the lack of support for factor variables or interactions.
my suggestion/request: enhance xtqreg functionality to include more of the other xt-class commands.
if this functionality already exists i apologize and would appreciate education.
thank you
George Hoffman
Comment
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#231

02 May 2019, 07:21

Would it be possible to raise the limit on "estimates store"? Currently one can only store 300 estimations, a limit quickly reached if you need to verify robustness to many different specification tweaks.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17706
#232

02 May 2019, 08:30

Would it be feasible, in the outcome of any regression model, to list the variable(s) a given predictor is collinear with?

Last edited by Carlo Lazzaro; 02 May 2019, 09:12.

Kind regards,
Carlo
(Stata 19.0)
Comment
Maarten Buis

Join Date: Mar 2014

Posts: 3449
#233

02 May 2019, 09:08

Carlo Lazzaro I don't think so, because sometimes a given variable is collinear with a combination of variables rather than a single other variable.

The classic example would be year of interview, year of birth, and age. If we have interviews that were taken in different years (so year of interview is not a constant), then we might think that the current (at time of interview) situation may influence the outcome, the age of the respondent could influence the outcome, and the circumstances in which the respondent grew up (the year of birth) might influence the outcome. However, age = year of interview - year of birth, so if you know two, you also know the third. So it is the combination of variables that results in perfect collinearity.
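A small simulated illustration of this point follows. The variable names, seed, sample size, and year ranges are all invented for the sketch. No single pair of predictors is perfectly correlated, yet regress has to omit one of the three.

```stata
* Hypothetical survey data: interviews spread over several years
clear
set seed 2019
set obs 500
generate yint = 2000 + floor(10*runiform())
generate yob  = 1940 + floor(50*runiform())
generate age  = yint - yob
generate y    = rnormal()

* No pairwise correlation is exactly 1 or -1 ...
correlate age yint yob

* ... but the three together are perfectly collinear, so one is omitted
regress y age yint yob
```

The note that regress prints names the omitted variable, but which of the other two it is "collinear with" has no single answer: it is determined by both of them jointly.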

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17706
#234

02 May 2019, 09:17

Maarten:
I do share your point.
Maybe the "collinearity list" can work for simple cases only; when things get messier, Stata could return something like "collinearity depends on a combination of variables. List unfeasible".

Kind regards,
Carlo
(Stata 19.0)
Comment
John Mullahy

Join Date: Dec 2016

Posts: 750
#235

10 May 2019, 08:15

Perhaps this already is possible (but if so I haven't seen how to do it): Could twoway scatteri be modified to accept numlists for the coordinates? E.g. instead of

Code:

tw scatteri 4 5 4 6 4 7

one could use something like

Code:

tw scatteri 4 (5(1)7)

and instead of

Code:

tw scatteri 4 5 4 6 4 7 5 5 5 6 5 7

one could use something like

Code:

tw scatteri (4 5) (5(1)7)
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2400
#236

10 May 2019, 13:01

Originally posted by John Mullahy

Perhaps this already is possible (but if so I haven't seen how to do it): Could twoway scatteri be modified to accept numlists for the coordinates?

One workaround can be to build your immediate coordinate list in a macro and then call -scatteri-, as in the following example.

Code:

tw scatteri 4 5 4 6 4 7 5 5 5 6 5 7, name(have)
local coordlist ""
foreach a of numlist 4 5 {
    foreach b of numlist 5(1)7 {
        local coordlist = "`coordlist' `a' `b'"
    }
}
di "`coordlist'"
tw scatteri `coordlist', name(want)
Comment
George Hoffman

Join Date: Mar 2014

Posts: 71
#237

12 May 2019, 11:21

a 'simple' suggestion: the bottom of the Stata main window should/could act as a more versatile status bar. it already shows the results of `pwd'. most typically, it could show number of obs (_N), the number of variables, memory consumption, sort order, and last _rc. it could also indicate if data had changed since last `use'. perhaps the user could select what is displayed in the status bar....

this suggestion arose because i spent the last two days working with a dataset that had some observations dropped (because of an errant .ado that i was building). i'm not sure how many times that might have happened previously - but i came very close to 'save, replace' as i usually do, which would have led to a very bad situation. perhaps, if the obs and var count were readily visible, i would notice the dataset status without explicit query.
thank you for considering
george hoffman

Last edited by George Hoffman; 12 May 2019, 11:23.
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2400
#238

12 May 2019, 12:02

Originally posted by George Hoffman

a 'simple' suggestion: the bottom of the Stata main window should/could act as a more versatile status bar. it already shows the results of `pwd'. most typically, it could show number of obs (_N), the number of variables, memory consumption, sort order, and last _rc. it could also indicate if data had changed since last `use'. perhaps the user could select what is displayed in the status bar....

this suggestion arose because i spent the last two days working with a dataset that had some observations dropped (because of an errant .ado that i was building). i'm not sure how many times that might have happened previously - but i came very close to 'save, replace' as i usually do, which would have led to a very bad situation. perhaps, if the obs and var count were readily visible, i would notice the dataset status without explicit query.
thank you for considering
george hoffman

What's wrong with how Stata already displays this information? For example, last _rc code is displayed next to its command in the cmdlog, and the viewer pane displays _N, memory usage (for that dataset only, not active operations), number of variables, etc. The only thing I don't think it automatically shows is sort order, but that is found quickly enough by using a -describe- command.
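For what it's worth, most of these quantities can also be queried from the command line through c(). A minimal sketch, using the auto dataset purely as an example:

```stata
sysuse auto, clear
display "obs: " c(N) "   vars: " c(k)
display "sorted by: `c(sortedby)'"
display "changed since last save: " c(changed)

* After an edit, c(N) and c(changed) reflect the new state
drop if foreign
display "obs: " c(N) "   changed since last save: " c(changed)
```

This is only a way to check the state on demand; it does not give the always-visible display that the suggestion asks for.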
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30078
#239

12 May 2019, 13:25

Re #237, with respect to the problem that would have ensued after -save, replace-, I would argue that it is just bad programming practice to ever overwrite a source data file with a derived data file. Even with all the information you ask for in the status bar, there is always the possibility that the code taking you from start to end contains errors, errors that don't happen to show up in the information shown in the status bar. To be prepared for that possibility, whenever you transform a data set you should save it as a new data set under a new name. Never overwrite the data you started with, and always save the do-file. If you do that, then whenever an error is discovered later, you can fix it and re-run.
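A minimal sketch of that workflow, with the auto dataset standing in for the source file and auto_derived as a hypothetical name for the derived file:

```stata
* Read the source data; this file is never written to
sysuse auto, clear

* Transformations
keep if !missing(rep78)
generate gpm = 1/mpg

* Guard against unintended drops: auto has 69 cars with nonmissing rep78
assert _N == 69

* Save under a new name; -replace- only ever overwrites the derived file
save auto_derived, replace
```

The assert line is an optional extra, not part of the advice above: it stops the do-file before the save if the observation count is not what was expected.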
Comment
George Hoffman

Join Date: Mar 2014

Posts: 71
#240

12 May 2019, 18:20

dear Leonardo and Clyde -
thank you for your responses.
@ leonardo - yes, the information is available in other ways already. the properties window does display most of the fields that I referenced. i'm not sure how most people use Stata, but the more windows I have open, the less room I have for the results pane, which is where I'm usually focused.
@ clyde - yes, i acknowledge a bad habit. others have asked for version control to be built into the save function. i am aware of some user-written options. i will investigate.
i am a long-time user of this fantastic program and usergroup. i appreciate the help. thank you all again.