
  • Mike Lacy
    replied
    Subject: Extend -permute- to shuffle more than one variable.

    I'd recommend extending the -permute- command to allow shuffling of more than one variable. (This occurs to me in the context of responding to some Statalist questions about what are being called "placebo tests" in econometrics, which as near as I can tell are often a type of permutation test with multiple variables being permuted.) This should be easy to do, as all the "hard stuff" to program in -permute- is already there, i.e., keeping track of and reporting the results. Shuffling multiple variables at each repetition should not be hard.
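
    Until something like that exists, the shuffling itself can be scripted; below is a minimal workaround sketch using -simulate- (the program name jointshuffle and the variables y, x1, and x2 are illustrative, and all of the bookkeeping that -permute- would normally handle is omitted).

    Code:
    capture program drop jointshuffle
    program define jointshuffle, rclass
        // jointly shuffle the rows of (x1, x2) while leaving y in place
        tempvar id u newpos px1 px2
        generate long `id' = _n
        generate double `u' = runiform()
        sort `u', stable
        generate long `newpos' = _n           // a random permutation of 1..N
        sort `id'                             // restore the original order of y
        generate double `px1' = x1[`newpos']  // x1 and x2 travel together
        generate double `px2' = x2[`newpos']
        regress y `px1' `px2'
        return scalar F = e(F)
    end
    
    // null distribution of the joint F statistic; compare it with the
    // observed F from:  regress y x1 x2
    simulate F=r(F), reps(500) seed(12345) nodots: jointshuffle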



  • JanDitzen
    replied
    I am getting more and more frustrated with managing ado paths when working with coauthors or helping people find bugs. I also fear that a lot of people are not really aware of what is in their ado folders and which version of a program is actually used. Thus I would suggest the following (I think I mentioned some of it somewhere before but can't find it):
    • adopath clear to reset the environment to the default in place when Stata is loaded
    • ensure that the path exists when adding one using adopath +
    • ignore a trailing backslash/forward slash at the end of a path when using adopath -
    • extend which so it shows the path of the program actually used, plus alternative paths if a program appears in several ado folders (a rough diagnostic along these lines is sketched below).
    • Often I find that people have installed several versions of a program without realising it. This causes incomplete updates and errors and can even lead to inconsistent results. The problem is that one can install the same package from SSC and from other sources side by side. For example, a user can install version 1 from SSC and then an update from, say, GitHub. I have had cases in which Stata thought there were two versions and the update was incomplete: some files were overwritten, others not. The user went ahead and the package produced at best an error, at worst results from an outdated version of a program. It would be great if it were possible to give a program an identifier that is independent of the source from which it is installed, together with a program-specific version indicator.

    I think these points are important because, from my point of view, one of Stata's great advantages over R is its capability for version control, easy installation of packages/community-contributed programs, and reproducibility.
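
    On the -which- point, here is a rough diagnostic sketch one can run today to list every copy of an ado-file found in the standard system directories (the program name mypkg is illustrative; it assumes the usual layout in which net/ssc installs files into a first-letter subdirectory).

    Code:
    local prog "mypkg"                        // illustrative ado-file name
    local l = substr("`prog'", 1, 1)
    foreach d in base site plus personal oldplace {
        local dir "`c(sysdir_`d')'"
        foreach f in "`dir'`prog'.ado" "`dir'`l'/`prog'.ado" {
            capture confirm file `"`f'"'
            if !_rc display as text "`d': " as result `"`f'"'
        }
    }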



  • FernandoRios
    replied
    Two additional wishes for Mata:
    1) It would be great if one could run Mata scripts from the Do-file Editor once Mata is active (it usually gives an error because -do- is not valid from within Mata).
    2) Perhaps it would be useful to have a dedicated Mata editor, so that programming in Mata is as flexible as in other languages.
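
    For what it is worth, a minimal sketch of the current workaround: Mata code can already be run from the Do-file Editor by wrapping it in a mata/end block inside an ordinary do-file (the snippet is purely illustrative), although that is of course not the same as a dedicated Mata editor.

    Code:
    // contents of an ordinary .do file, executed from the Do-file Editor
    mata:
        X = runiform(5, 2)      // illustrative Mata code
        X' * X                  // result is displayed in the Results window
    end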
    F



  • David Speed
    replied
    Bruce Weaver

    Hi Bruce,

    I also had to use Field's equations to figure out what was going on with the robust versions. I wrote code a while back to produce the Brown-Forsythe and Welch F statistics, but it's much clunkier than yours - I'd never seen the statsby command before!

    Cheers,

    David.



  • Luis Pecht
    replied
    I would love to see some built-in JSON functionality, despite JSON's flexibility and non-rectangular philosophy. Currently we depend on Python libraries via SFI, or on community packages such as -jsonio- (https://wbuchanan.github.io/StataJSON/) and -insheetjson- (http://fmwww.bc.edu/RePEc/bocode/i/insheetjson.html).

    A drop-down entry in the File menu to convert JSON into a Mata matrix or associative array would be a possible first step.
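
    For reference, a minimal sketch of the Python/SFI route (it assumes a Python installation is already linked to Stata; the file name example.json and the quantity passed back are illustrative).

    Code:
    python:
    import json
    from sfi import Macro
    # parse a JSON file and hand one piece of it back to Stata as a local
    with open("example.json") as f:
        obj = json.load(f)
    Macro.setLocal("nkeys", str(len(obj)))
    end
    display "number of top-level elements: `nkeys'"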



  • David Speed
    replied
    Bruce Weaver Thanks for the code in #514!



  • jerome falken
    replied
    According to the Stata/MP performance report, many commands benefit from having multiple cores, with the exception of mgarch and mswitch. The problem with mgarch and mswitch is that they can take a very long time to run. My suggestion for Stata 18 would be to parallelize those two commands, but I would like to ask the community first whether that is even possible: do the underlying algorithms allow parallelization?



  • Jeffrey Chan
    replied
    LASSO for survival analysis (Cox)? Having the capability to do Tukey post hoc tests after ANOVA would be great too.
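
    On the Tukey point, something close is already available: -pwcompare- with mcompare(tukey) after -anova-, or -pwmean- directly. A small sketch with the auto data, in case it is useful in the meantime:

    Code:
    sysuse auto, clear
    // Tukey-adjusted pairwise comparisons of mean mpg across rep78 groups
    pwmean mpg, over(rep78) mcompare(tukey) effects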



  • Niels Henrik Bruun
    replied
    I haven't checked whether anyone else has asked for this.
    I've just started using frames, and it would be of great help if -frame put- had a replace option and a change option.
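
    In the meantime, a minimal workaround sketch for the missing replace behaviour (the frame name sub and the variable list are illustrative; the last two lines only matter if "change" means switching to the newly created frame).

    Code:
    sysuse auto, clear
    capture frame drop sub                          // emulate a replace option
    frame put make price mpg if foreign, into(sub)
    frame change sub                                // stand-in for a change option
    frame change default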



  • John Mullahy
    replied
    A minor but nice option would be to allow the xline and yline options in twoway to have suboptions that would result in the line(s) appearing atop the plotted information rather than beneath it.

    In the following code I've used the standard approach and then generated the desired result using twoway function to draw the lines atop the plotted information.

    Code:
    cap preserve
    cap drop _all
    
    sysuse auto
    
    qui sum mpg
    loc m=r(mean)
    
    tw hist mpg, plotr(marg(zero)) xli(`m', lco(red)) saving(gsl1, replace)
    
    tw (hist mpg, plotr(marg(zero)) leg(off)) ///
       (function y=`m', horiz ra(0 .1) lco(red) lpa(solid) saving(gsl2, replace))
       
       
    drop _all
    
    set seed 2345
    set obs 2000
    
    gen x=rnormal(0,1)
    gen y=rnormal(0,1)
    qui sum x
    loc mx=r(mean)
    loc mnx=r(min)
    loc mxx=r(max)
    
    qui sum y
    loc my=r(mean)
    loc mny=r(min)
    loc mxy=r(max)
    
    tw scatter y x, plotr(marg(zero)) xli(`mx', lco(red)) yli(`my',lco(red)) xla(-2(1)5) saving(gsl3, replace)
    
    tw (scatter y x, plotr(marg(zero)) xsc(r(`mnx' `mxx')) xla(-2(1)5) ysc(r(`mny' `mxy')) leg(off)) ///
       (function y=`my', ra(`mnx' `mxx') lco(red) lpa(solid) ) ///
       (function y=`mx', horiz ra(`mny' `mxy') lpa(solid) lco(red) saving(gsl4, replace))
    
    cap restore



  • Mead Over
    replied
    John Mullahy #515, indeed Jeroen Weesie contributed -exprcmd- and -expr- to the STB way back in 1998. I still use it for quick exploration.
    His -expr:- can be used as a prefix as follows.

    Code:
    . sysuse auto
    (1978 automobile data)
    
    . expr: reg log(price) weight rep78
    
    Expression _Expr0 := log(price)
    -> reg  _Expr0 weight rep78  
    
          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(2, 66)        =     21.24
           Model |  4.00927365         2  2.00463683   Prob > F        =    0.0000
        Residual |  6.22817243        66  .094366249   R-squared       =    0.3916
    -------------+----------------------------------   Adj R-squared   =    0.3732
           Total |  10.2374461        68  .150550678   Root MSE        =    .30719
    
    ------------------------------------------------------------------------------
          _Expr0 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          weight |    .000333   .0000513     6.50   0.000     .0002307    .0004354
           rep78 |   .1272241   .0410658     3.10   0.003     .0452337    .2092146
           _cons |   7.196296   .2500145    28.78   0.000     6.697125    7.695466
    ------------------------------------------------------------------------------
    
    . expr: poisson price weight weight^2 rep78
    
    Expression _Expr0 := weight^2
    -> poisson  price weight _Expr0 rep78  
    
    Iteration 0:   log likelihood = -21667.605  
    Iteration 1:   log likelihood = -21655.464  
    Iteration 2:   log likelihood = -21655.463  
    
    Poisson regression                                    Number of obs =       69
                                                          LR chi2(2)    = 36753.75
                                                          Prob > chi2   =   0.0000
    Log likelihood = -21655.463                           Pseudo R2     =   0.4591
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          weight |  -.0005043   .0000145   -34.90   0.000    -.0005326    -.000476
          _Expr0 |   1.36e-07   2.20e-09    61.86   0.000     1.32e-07    1.40e-07
           rep78 |   .0955141   .0018534    51.53   0.000     .0918815    .0991466
           _cons |   8.555204    .025696   332.94   0.000      8.50484    8.605567
    ------------------------------------------------------------------------------
    Alternatively, issuing his -exprcmd- once turns on his interpreter until it is subsequently turned off like this:

    Code:
    . sysuse auto
    (1978 automobile data)
    
    . exprcmd
    
    (expr) . reg log(price) weight rep78
    
    Expression _Expr0 := log(price)
    -> reg  _Expr0 weight rep78  
    
          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(2, 66)        =     21.24
           Model |  4.00927365         2  2.00463683   Prob > F        =    0.0000
        Residual |  6.22817243        66  .094366249   R-squared       =    0.3916
    -------------+----------------------------------   Adj R-squared   =    0.3732
           Total |  10.2374461        68  .150550678   Root MSE        =    .30719
    
    ------------------------------------------------------------------------------
          _Expr0 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          weight |    .000333   .0000513     6.50   0.000     .0002307    .0004354
           rep78 |   .1272241   .0410658     3.10   0.003     .0452337    .2092146
           _cons |   7.196296   .2500145    28.78   0.000     6.697125    7.695466
    ------------------------------------------------------------------------------
    
    (expr) . poisson price weight weight^2 rep78
    
    Expression _Expr0 := weight^2
    -> poisson  price weight _Expr0 rep78  
    
    Iteration 0:   log likelihood = -21667.605  
    Iteration 1:   log likelihood = -21655.464  
    Iteration 2:   log likelihood = -21655.463  
    
    Poisson regression                                    Number of obs =       69
                                                          LR chi2(2)    = 36753.75
                                                          Prob > chi2   =   0.0000
    Log likelihood = -21655.463                           Pseudo R2     =   0.4591
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          weight |  -.0005043   .0000145   -34.90   0.000    -.0005326    -.000476
          _Expr0 |   1.36e-07   2.20e-09    61.86   0.000     1.32e-07    1.40e-07
           rep78 |   .0955141   .0018534    51.53   0.000     .0918815    .0991466
           _cons |   8.555204    .025696   332.94   0.000      8.50484    8.605567
    ------------------------------------------------------------------------------
    
    (expr) . .
    His package also includes a cute utility -diest- to display the variable definitions of the temporary variables constructed by -expr- or -exprcmd-:


    Code:
    . sysuse auto
    (1978 automobile data)
    
    . expr: reg log(price) weight weight^2 rep78
    
    Expression _Expr0 := log(price)
    Expression _Expr1 := weight^2
    -> reg  _Expr0 weight _Expr1 rep78  
    
          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(3, 65)        =     18.23
           Model |  4.67772488         3  1.55924163   Prob > F        =    0.0000
        Residual |   5.5597212        65  .085534172   R-squared       =    0.4569
    -------------+----------------------------------   Adj R-squared   =    0.4319
           Total |  10.2374461        68  .150550678   Root MSE        =    .29246
    
    ------------------------------------------------------------------------------
          _Expr0 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          weight |    -.00067   .0003621    -1.85   0.069    -.0013932    .0000532
          _Expr1 |   1.60e-07   5.74e-08     2.80   0.007     4.58e-08    2.75e-07
           rep78 |   .0963816   .0406237     2.37   0.021     .0152506    .1775127
           _cons |   8.768601   .6107286    14.36   0.000     7.548892     9.98831
    ------------------------------------------------------------------------------
    
    . diest
    ------------------------------------------------------------------------------
      _Expr0 log(price)                      |    Coef.   Std. Err.     t    P>|t|
    -----------------------------------------+------------------------------------
      weight Weight (lbs.)                   |   -.00067  .0003621  -1.850   0.069
      _Expr1 weight^2                        |  1.60e-07  5.74e-08   2.796   0.007
       rep78 Repair record 1978              |  .0963816  .0406237   2.373   0.021
       _cons                                 |  8.768601  .6107286  14.358   0.000
    ------------------------------------------------------------------------------
    It would be nice if these commands were updated to accommodate some of the capabilities of newer Stata versions.



  • FernandoRios
    replied
    Interesting.
    I think someone else already did something like that some time ago. I also wrote a small script that works at least with regress: -fgreg- (function for regress).
    It actually does what John suggests, with the caveat that it "recognizes" functions based on spaces.
    fgreg y x1 x2 x1*x2 x1^2
    is valid, but this:
    fgreg y x1 x2 x1 ^ 2
    is not.
    Something I haven't thought about: I could technically combine this with -f_able- to still be able to estimate marginal effects!
    Attached Files



  • JanDitzen
    replied
    Originally posted by John Mullahy:
    I'd guess that if this was technically feasible it would have been done a long time ago but I'll put in a pitch for "on the fly" use of variable transformations in estimation commands, e.g.
    Code:
    reg log(y) x1 x2
    poisson y x x^2 x^3
    etc.
    A subset of RHS transformations can be effected using factor variable notation, of course.

    I think it must be easily implementable, since first differences in the time domain are possible and that is a simple mathematical operation.
    Even from a coding perspective it should be easy to implement: neither "(", "^", nor other mathematical operators are allowed in variable names.
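
    To illustrate how little parsing is needed, here is a bare-bones sketch of such a wrapper (the name xreg is made up, it ignores if/in and options, and it breaks if an expression contains spaces).

    Code:
    capture program drop xreg
    program define xreg
        gettoken cmd 0 : 0                    // first token is the estimator
        local newlist
        foreach tok of local 0 {
            capture confirm variable `tok'
            if _rc {                          // not a variable: evaluate it
                tempvar t
                quietly generate double `t' = `tok'
                local newlist `newlist' `t'
            }
            else local newlist `newlist' `tok'
        }
        `cmd' `newlist'
    end
    
    sysuse auto, clear
    xreg regress log(price) weight rep78
    xreg poisson price weight weight^2 rep78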



  • John Mullahy
    replied
    I'd guess that if this was technically feasible it would have been done a long time ago but I'll put in a pitch for "on the fly" use of variable transformations in estimation commands, e.g.
    Code:
    reg log(y) x1 x2
    poisson y x x^2 x^3
    etc.
    A subset of RHS transformations can be effected using factor variable notation, of course.



  • Bruce Weaver
    replied
    David Speed, I agree with your suggestions in #513. In the meantime, here is some code I cobbled together earlier this year to compute Welch's F; maybe it will help.

    Code:
    // Date:  15-Jun-2021
    // Use equations shown in Andy Field's document to compute
    // Welch's F-test.
    // https://www.discoveringstatistics.com/docs/welchf.pdf
    // The data AF uses come from this document:
    // https://www.discoveringstatistics.com/repository/contrasts.pdf
    
    clear
    input byte group y
    1 3  
    1 2  
    1 1  
    1 1  
    1 4
    2 5  
    2 2  
    2 4  
    2 2  
    2 3
    3 7  
    3 4  
    3 5  
    3 3  
    3 6
    end
    
    preserve
    drop if missing(group)
    statsby Ybar=r(mean) Var=r(Var) n=r(N), by(group) clear:  summarize y
    list
    quietly {
    generate double  Wt        = n/Var
    generate double  WtMean    = Wt*Ybar
    egen     double  SumWt     = total(Wt)
    egen     double  SumWtMean = total(WtMean)
    generate double  WelchGM   = SumWtMean/SumWt
    generate double  WtSqDev   = Wt*(Ybar-WelchGM)^2
    egen     double  SSmodel   = total(WtSqDev)
    generate byte    k         = _N // # of groups   
    generate double  MSmodel   = SSmodel/(k-1)
    generate double  Lterm     = (1-Wt/SumWt)^2 / (n-1)
    egen     double  lambda    = total(Lterm)
    replace          lambda    = lambda*3/(k^2-1)
    generate double  MSerror   = 1+2*lambda*(k-2)/3
    generate double  WelchF    = MSmodel/MSerror
    generate byte    df1       = k-1
    generate double  df2       = 1/lambda
    generate double  p         = Ftail(df1,df2,WelchF)
    }
    list WelchF-p in 1
    display               _newline ///
    "From SPSS ONEWAY:"   _newline ///
    "Welch F = "4.320451  _newline ///
    "    df1 = " 2              _newline ///
    "    df2 = " 7.943375     _newline ///    
    "      p = " .053738
    
    restore
    
    // Date:  17-Aug-2021
    // Compare Welch's F-test to multilevel modeling approach
    // in post #3 here:
    // https://www.statalist.org/forums/forum/general-stata-discussion/general/1427959-anova-rejection-of-bartlett-test
    // Note that I have added the dfmethod(anova) option.
    
    mixed y i.group, ml residuals(independent, by(group)) ///
    nolrtest nolog dfmethod(anova)
    
    margins group
    margins group, pwcompare(effects) mcompare(noadjust)
    
    // Compare to -regress- with vce(robust)
    regress y i.group, vce(robust)

