
  • Mike Lacy
    replied
    Subject: Extend -permute- to shuffle more than one variable.

    I'd recommend extending the -permute- command to allow shuffling of more than one variable. (This occurs to me in the context of responding to some Statalist questions about what are being called "placebo tests" in econometrics, which as near as I can tell are often a type of permutation test with multiple variables being permuted.) This should be easy to do, as all the "hard stuff" to program in -permute- is already there, i.e., keeping track of and reporting the results. Shuffling multiple variables at each repetition should not be hard.
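
    Until something like that exists, the shuffling itself can be scripted; below is a minimal workaround sketch using -simulate- (the program name jointshuffle and the variables y, x1, and x2 are illustrative, and all of the bookkeeping that -permute- would normally handle is omitted).

    Code:
    capture program drop jointshuffle
    program define jointshuffle, rclass
        // jointly shuffle the rows of (x1, x2) while leaving y in place
        tempvar id u newpos px1 px2
        generate long `id' = _n
        generate double `u' = runiform()
        sort `u', stable
        generate long `newpos' = _n           // a random permutation of 1..N
        sort `id'                             // restore the original order of y
        generate double `px1' = x1[`newpos']  // x1 and x2 travel together
        generate double `px2' = x2[`newpos']
        regress y `px1' `px2'
        return scalar F = e(F)
    end
    
    // null distribution of the joint F statistic; compare it with the
    // observed F from:  regress y x1 x2
    simulate F=r(F), reps(500) seed(12345) nodots: jointshuffle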



  • JanDitzen
    replied
    I am getting more and more frustrated with managing ado paths when working with coauthors or helping people find bugs. I also fear that a lot of people are not really aware of what is in their ado folders and which version of a program is actually used. Thus I would suggest the following (I think I mentioned some of it somewhere before but can't find it):
    • adopath clear to reset the environment to the default in place when Stata is loaded
    • ensure that the path exists when adding one using adopath +
    • ignore a trailing backslash/forward slash at the end of a path when using adopath -
    • extend which so it shows the path of the program actually used, plus alternative paths if a program appears in several ado folders (a rough diagnostic along these lines is sketched below).
    • Often I find that people have installed several versions of a program without realising it. This causes incomplete updates and errors and can even lead to inconsistent results. The problem is that one can install the same package from SSC and from other sources side by side. For example, a user can install version 1 from SSC and then an update from, say, GitHub. I have had cases in which Stata thought there were two versions and the update was incomplete: some files were overwritten, others not. The user went ahead and the package produced at best an error, at worst results from an outdated version of a program. It would be great if it were possible to give a program an identifier that is independent of the source from which it is installed, together with a program-specific version indicator.

    I think these points are important because, from my point of view, one of Stata's great advantages over R is its capability for version control, easy installation of packages/community-contributed programs, and reproducibility.
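
    On the -which- point, here is a rough diagnostic sketch one can run today to list every copy of an ado-file found in the standard system directories (the program name mypkg is illustrative; it assumes the usual layout in which net/ssc installs files into a first-letter subdirectory).

    Code:
    local prog "mypkg"                        // illustrative ado-file name
    local l = substr("`prog'", 1, 1)
    foreach d in base site plus personal oldplace {
        local dir "`c(sysdir_`d')'"
        foreach f in "`dir'`prog'.ado" "`dir'`l'/`prog'.ado" {
            capture confirm file `"`f'"'
            if !_rc display as text "`d': " as result `"`f'"'
        }
    }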



  • FernandoRios
    replied
    Two additional wishes for Mata:
    1) It would be great if one could run Mata scripts from the Do-file Editor once Mata is active (it usually gives an error because -do- is not valid from within Mata).
    2) Perhaps it would be useful to have a dedicated Mata editor, so that programming in Mata is as flexible as in other languages.
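
    For what it is worth, a minimal sketch of the current workaround: Mata code can already be run from the Do-file Editor by wrapping it in a mata/end block inside an ordinary do-file (the snippet is purely illustrative), although that is of course not the same as a dedicated Mata editor.

    Code:
    // contents of an ordinary .do file, executed from the Do-file Editor
    mata:
        X = runiform(5, 2)      // illustrative Mata code
        X' * X                  // result is displayed in the Results window
    end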
    F



  • David Speed
    replied
    Bruce Weaver

    Hi Bruce,

    I also had to use Field's equations to figure out what was going on with the robust versions. I wrote code a while back to produce the Brown-Forsythe and Welch F statistics, but it's much clunkier than yours - I'd never seen the statsby command before!

    Cheers,

    David.



  • Luis Pecht
    replied
    I would love to see some built-in JSON functionality, despite JSON's flexibility and non-rectangular philosophy. Currently we depend on Python libraries via SFI, or on community packages such as -jsonio- (https://wbuchanan.github.io/StataJSON/) and -insheetjson- (http://fmwww.bc.edu/RePEc/bocode/i/insheetjson.html).

    A drop-down entry in the File menu to convert JSON into a Mata matrix or associative array would be a possible first step.
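
    For reference, a minimal sketch of the Python/SFI route (it assumes a Python installation is already linked to Stata; the file name example.json and the quantity passed back are illustrative).

    Code:
    python:
    import json
    from sfi import Macro
    # parse a JSON file and hand one piece of it back to Stata as a local
    with open("example.json") as f:
        obj = json.load(f)
    Macro.setLocal("nkeys", str(len(obj)))
    end
    display "number of top-level elements: `nkeys'"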



  • David Speed
    replied
    Bruce Weaver Thanks for the code in #514!



  • jerome falken
    replied
    According to the Stata/MP performance report, many commands benefit from having multiple cores, with the exception of mgarch and mswitch. The problem with mgarch and mswitch is that they can take a very long time to run. My suggestion for Stata 18 would be to parallelize those two commands, but I would like to ask the community first whether that is even possible: do the underlying algorithms allow parallelization?



  • Jeffrey Chan
    replied
    LASSO for survival analysis (Cox)? Having the capability to do Tukey post hoc tests after ANOVA would be great too.
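
    On the Tukey point, something close is already available: -pwcompare- with mcompare(tukey) after -anova-, or -pwmean- directly. A small sketch with the auto data, in case it is useful in the meantime:

    Code:
    sysuse auto, clear
    // Tukey-adjusted pairwise comparisons of mean mpg across rep78 groups
    pwmean mpg, over(rep78) mcompare(tukey) effects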



  • Niels Henrik Bruun
    replied
    I haven't checked whether anyone else has asked for this.
    I've just started using frames, and it would be of great help if -frame put- had a replace option and a change option.
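
    In the meantime, a minimal workaround sketch for the missing replace behaviour (the frame name sub and the variable list are illustrative; the last two lines only matter if "change" means switching to the newly created frame).

    Code:
    sysuse auto, clear
    capture frame drop sub                          // emulate a replace option
    frame put make price mpg if foreign, into(sub)
    frame change sub                                // stand-in for a change option
    frame change default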



  • John Mullahy
    replied
    A minor but nice option would be to allow the xline and yline options in twoway to have suboptions that would result in the line(s) appearing atop the plotted information rather than beneath it.

    In the following code I've used the standard approach and then generated the desired result using twoway function to draw the lines atop the plotted information.

    Code:
    cap preserve
    cap drop _all
    
    sysuse auto
    
    qui sum mpg
    loc m=r(mean)
    
    tw hist mpg, plotr(marg(zero)) xli(`m', lco(red)) saving(gsl1, replace)
    
    tw (hist mpg, plotr(marg(zero)) leg(off)) ///
       (function y=`m', horiz ra(0 .1) lco(red) lpa(solid) saving(gsl2, replace))
       
       
    drop _all
    
    set seed 2345
    set obs 2000
    
    gen x=rnormal(0,1)
    gen y=rnormal(0,1)
    qui sum x
    loc mx=r(mean)
    loc mnx=r(min)
    loc mxx=r(max)
    
    qui sum y
    loc my=r(mean)
    loc mny=r(min)
    loc mxy=r(max)
    
    tw scatter y x, plotr(marg(zero)) xli(`mx', lco(red)) yli(`my',lco(red)) xla(-2(1)5) saving(gsl3, replace)
    
    tw (scatter y x, plotr(marg(zero)) xsc(r(`mnx' `mxx')) xla(-2(1)5) ysc(r(`mny' `mxy')) leg(off)) ///
       (function y=`my', ra(`mnx' `mxx') lco(red) lpa(solid) ) ///
       (function y=`mx', horiz ra(`mny' `mxy') lpa(solid) lco(red) saving(gsl4, replace))
    
    cap restore



  • Mead Over
    replied
    John Mullahy #515, indeed Jeroen Weesie contributed -exprcmd- and -expr- to the STB way back in 1998. I still use it for quick exploration.
    His -expr:- can be used as a prefix as follows.

    Code:
    . sysuse auto
    (1978 automobile data)
    
    . expr: reg log(price) weight rep78
    
    Expression _Expr0 := log(price)
    -> reg  _Expr0 weight rep78  
    
          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(2, 66)        =     21.24
           Model |  4.00927365         2  2.00463683   Prob > F        =    0.0000
        Residual |  6.22817243        66  .094366249   R-squared       =    0.3916
    -------------+----------------------------------   Adj R-squared   =    0.3732
           Total |  10.2374461        68  .150550678   Root MSE        =    .30719
    
    ------------------------------------------------------------------------------
          _Expr0 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          weight |    .000333   .0000513     6.50   0.000     .0002307    .0004354
           rep78 |   .1272241   .0410658     3.10   0.003     .0452337    .2092146
           _cons |   7.196296   .2500145    28.78   0.000     6.697125    7.695466
    ------------------------------------------------------------------------------
    
    . expr: poisson price weight weight^2 rep78
    
    Expression _Expr0 := weight^2
    -> poisson  price weight _Expr0 rep78  
    
    Iteration 0:   log likelihood = -21667.605  
    Iteration 1:   log likelihood = -21655.464  
    Iteration 2:   log likelihood = -21655.463  
    
    Poisson regression                                    Number of obs =       69
                                                          LR chi2(2)    = 36753.75
                                                          Prob > chi2   =   0.0000
    Log likelihood = -21655.463                           Pseudo R2     =   0.4591
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          weight |  -.0005043   .0000145   -34.90   0.000    -.0005326    -.000476
          _Expr0 |   1.36e-07   2.20e-09    61.86   0.000     1.32e-07    1.40e-07
           rep78 |   .0955141   .0018534    51.53   0.000     .0918815    .0991466
           _cons |   8.555204    .025696   332.94   0.000      8.50484    8.605567
    ------------------------------------------------------------------------------
    Alternatively, issuing his -exprcmd- once turns on his interpreter until it is subsequently turned off like this:

    Code:
    . sysuse auto
    (1978 automobile data)
    
    . exprcmd
    
    (expr) . reg log(price) weight rep78
    
    Expression _Expr0 := log(price)
    -> reg  _Expr0 weight rep78  
    
          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(2, 66)        =     21.24
           Model |  4.00927365         2  2.00463683   Prob > F        =    0.0000
        Residual |  6.22817243        66  .094366249   R-squared       =    0.3916
    -------------+----------------------------------   Adj R-squared   =    0.3732
           Total |  10.2374461        68  .150550678   Root MSE        =    .30719
    
    ------------------------------------------------------------------------------
          _Expr0 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          weight |    .000333   .0000513     6.50   0.000     .0002307    .0004354
           rep78 |   .1272241   .0410658     3.10   0.003     .0452337    .2092146
           _cons |   7.196296   .2500145    28.78   0.000     6.697125    7.695466
    ------------------------------------------------------------------------------
    
    (expr) . poisson price weight weight^2 rep78
    
    Expression _Expr0 := weight^2
    -> poisson  price weight _Expr0 rep78  
    
    Iteration 0:   log likelihood = -21667.605  
    Iteration 1:   log likelihood = -21655.464  
    Iteration 2:   log likelihood = -21655.463  
    
    Poisson regression                                    Number of obs =       69
                                                          LR chi2(2)    = 36753.75
                                                          Prob > chi2   =   0.0000
    Log likelihood = -21655.463                           Pseudo R2     =   0.4591
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          weight |  -.0005043   .0000145   -34.90   0.000    -.0005326    -.000476
          _Expr0 |   1.36e-07   2.20e-09    61.86   0.000     1.32e-07    1.40e-07
           rep78 |   .0955141   .0018534    51.53   0.000     .0918815    .0991466
           _cons |   8.555204    .025696   332.94   0.000      8.50484    8.605567
    ------------------------------------------------------------------------------
    
    (expr) . .
    His package also includes a cute utility -diest- to display the variable definitions of the temporary variables constructed by -expr- or -exprcmd-:


    Code:
    . sysuse auto
    (1978 automobile data)
    
    . expr: reg log(price) weight weight^2 rep78
    
    Expression _Expr0 := log(price)
    Expression _Expr1 := weight^2
    -> reg  _Expr0 weight _Expr1 rep78  
    
          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(3, 65)        =     18.23
           Model |  4.67772488         3  1.55924163   Prob > F        =    0.0000
        Residual |   5.5597212        65  .085534172   R-squared       =    0.4569
    -------------+----------------------------------   Adj R-squared   =    0.4319
           Total |  10.2374461        68  .150550678   Root MSE        =    .29246
    
    ------------------------------------------------------------------------------
          _Expr0 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          weight |    -.00067   .0003621    -1.85   0.069    -.0013932    .0000532
          _Expr1 |   1.60e-07   5.74e-08     2.80   0.007     4.58e-08    2.75e-07
           rep78 |   .0963816   .0406237     2.37   0.021     .0152506    .1775127
           _cons |   8.768601   .6107286    14.36   0.000     7.548892     9.98831
    ------------------------------------------------------------------------------
    
    . diest
    ------------------------------------------------------------------------------
      _Expr0 log(price)                      |    Coef.   Std. Err.     t    P>|t|
    -----------------------------------------+------------------------------------
      weight Weight (lbs.)                   |   -.00067  .0003621  -1.850   0.069
      _Expr1 weight^2                        |  1.60e-07  5.74e-08   2.796   0.007
       rep78 Repair record 1978              |  .0963816  .0406237   2.373   0.021
       _cons                                 |  8.768601  .6107286  14.358   0.000
    ------------------------------------------------------------------------------
    It would be nice if these commands were updated to accommodate some of the capabilities of newer Stata versions.



  • FernandoRios
    replied
    Interesting.
    I think someone else already did something like that some time ago. I also wrote a small script that works at least with regress: -fgreg- (function for regress).
    It actually does what John suggests, with the caveat that it "recognizes" functions based on spaces.
    fgreg y x1 x2 x1*x2 x1^2
    is valid, but this:
    fgreg y x1 x2 x1 ^ 2
    is not.
    Something I haven't thought about: I could technically combine this with -f_able- to still be able to estimate marginal effects!
    Attached Files



  • JanDitzen
    replied
    Originally posted by John Mullahy:
    I'd guess that if this was technically feasible it would have been done a long time ago but I'll put in a pitch for "on the fly" use of variable transformations in estimation commands, e.g.
    Code:
    reg log(y) x1 x2
    poisson y x x^2 x^3
    etc.
    A subset of RHS transformations can be effected using factor variable notation, of course.

    I think it must be easily implementable, since first differences in the time domain are possible and that is a simple mathematical operation.
    Even from a coding perspective it should be easy to implement: neither "(", "^", nor other mathematical operators are allowed in variable names.
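
    To illustrate how little parsing is needed, here is a bare-bones sketch of such a wrapper (the name xreg is made up, it ignores if/in and options, and it breaks if an expression contains spaces).

    Code:
    capture program drop xreg
    program define xreg
        gettoken cmd 0 : 0                    // first token is the estimator
        local newlist
        foreach tok of local 0 {
            capture confirm variable `tok'
            if _rc {                          // not a variable: evaluate it
                tempvar t
                quietly generate double `t' = `tok'
                local newlist `newlist' `t'
            }
            else local newlist `newlist' `tok'
        }
        `cmd' `newlist'
    end
    
    sysuse auto, clear
    xreg regress log(price) weight rep78
    xreg poisson price weight weight^2 rep78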



  • John Mullahy
    replied
    I'd guess that if this was technically feasible it would have been done a long time ago but I'll put in a pitch for "on the fly" use of variable transformations in estimation commands, e.g.
    Code:
    reg log(y) x1 x2
    poisson y x x^2 x^3
    etc.
    A subset of RHS transformations can be effected using factor variable notation, of course.



  • Bruce Weaver
    replied
    David Speed, I agree with your suggestions in #513. In the meantime, here is some code I cobbled together earlier this year to compute Welch's F; maybe it will help.

    Code:
    // Date:  15-Jun-2021
    // Use equations shown in Andy Field's document to compute
    // Welch's F-test.
    // https://www.discoveringstatistics.com/docs/welchf.pdf
    // The data AF uses come from this document:
    // https://www.discoveringstatistics.com/repository/contrasts.pdf
    
    clear
    input byte group y
    1 3  
    1 2  
    1 1  
    1 1  
    1 4
    2 5  
    2 2  
    2 4  
    2 2  
    2 3
    3 7  
    3 4  
    3 5  
    3 3  
    3 6
    end
    
    preserve
    drop if missing(group)
    statsby Ybar=r(mean) Var=r(Var) n=r(N), by(group) clear:  summarize y
    list
    quietly {
    generate double  Wt        = n/Var
    generate double  WtMean    = Wt*Ybar
    egen     double  SumWt     = total(Wt)
    egen     double  SumWtMean = total(WtMean)
    generate double  WelchGM   = SumWtMean/SumWt
    generate double  WtSqDev   = Wt*(Ybar-WelchGM)^2
    egen     double  SSmodel   = total(WtSqDev)
    generate byte    k         = _N // # of groups   
    generate double  MSmodel   = SSmodel/(k-1)
    generate double  Lterm     = (1-Wt/SumWt)^2 / (n-1)
    egen     double  lambda    = total(Lterm)
    replace          lambda    = lambda*3/(k^2-1)
    generate double  MSerror   = 1+2*lambda*(k-2)/3
    generate double  WelchF    = MSmodel/MSerror
    generate byte    df1       = k-1
    generate double  df2       = 1/lambda
    generate double  p         = Ftail(df1,df2,WelchF)
    }
    list WelchF-p in 1
    display               _newline ///
    "From SPSS ONEWAY:"   _newline ///
    "Welch F = "4.320451  _newline ///
    "    df1 = " 2              _newline ///
    "    df2 = " 7.943375     _newline ///    
    "      p = " .053738
    
    restore
    
    // Date:  17-Aug-2021
    // Compare Welch's F-test to multilevel modeling approach
    // in post #3 here:
    // https://www.statalist.org/forums/forum/general-stata-discussion/general/1427959-anova-rejection-of-bartlett-test
    // Note that I have added the dfmethod(anova) option.
    
    mixed y i.group, ml residuals(independent, by(group)) ///
    nolrtest nolog dfmethod(anova)
    
    margins group
    margins group, pwcompare(effects) mcompare(noadjust)
    
    // Compare to -regress- with vce(robust)
    regress y i.group, vce(robust)

