In my opinion, Stata is an excellent statistics package, or almost. For many, the criteria for choosing a statistics package may be ease of use, expandability, cost, or compatibility with other programs. For me, the most impressive feature is that my statistical thinking (and the methods I use) has improved significantly since I started using Stata, and the Stata Forum and its friendly and knowledgeable members have contributed substantially to that.
Stata also impresses with its consistency, integrity, and error tolerance: it largely protects against the use of statistical models that are unsuitable for the nature of the variables, and it is almost impossible to change variables unintentionally.
However, the qualifier “almost” in “almost impossible” is both the decisive point and the downside for me.
As far as I know, there are two situations in which protection against unintended modification of data is not guaranteed. For the second, I sketch a possible remedy; perhaps someone will volunteer to take up the idea and write universally applicable alternatives to mark and markout? The two situations are:
- Stata allows you to save data that contain temporary variables. The negative (and probably unnoticed) consequences of this became apparent in my last post, and I demonstrated them elsewhere (admittedly with complex examples). I am aware that changing save so that users must explicitly allow temporary variables to be saved risks breaking older code. Nevertheless, I think the advantages outweigh the disadvantages, and version control, which is recommended anyway, could prevent this scenario (it is also conceivable that the changed behavior of save would apply only if the current .do-file sets version somewhere).
- Another situation is markout. Here, it is easy to make the mistake of not specifying the markout variable as the first variable. The result is a (probably unnoticed) change to whichever variable was specified first instead. One solution could be that mark sets a characteristic on the markout variable it creates, and markout requires this characteristic to exist. However, changing mark and markout accordingly would break existing code. Another possibility would be to offer two alternative commands, mark2 and markout2 (or perhaps better do_mark and do_markout for .do files), so that mark and markout can still be used in .ado files, and to point out the dangers and the alternative explicitly in the manual entry for mark and markout. A first, not (!) yet universally applicable example of this would be
Code:
    program do_mark
        syntax newvarlist(max=1)
        mark `varlist'
        notes `varlist': markout variable
    end

    program do_markout, rclass
        syntax varlist [, NUMeric]
        local mov : word 1 of `varlist'
        local note : char `mov'[note1]
        if "`note'" != "markout variable" {
            di as err "{bf:`mov'} is not a to-use variable for -do_markout-"
            error 499
            exit
        }
        else {
            local varlist : list varlist - mov
            local varlist : list varlist - mov // if _all has been used for varlist
            if "`numeric'"=="" {
                markout `mov' `varlist'
                return local markvars `varlist'
            }
            else {
                foreach v of varlist `varlist' {
                    cap confirm numeric variable `v'
                    if _rc local strvars "`strvars' `v'"
                }
                local nstr : word count `strvars'
                local numvars : list varlist-strvars
                markout `mov' `numvars'
                di as txt "(`nstr' string variable(s) not used by -markout-)"
                return local strvars `strvars'
                return local markvars `numvars'
            }
            return local tousevar `mov'
        }
    end
Example:
Code:
    . sysuse auto, clear
    (1978 automobile data)

    . 
    . do_mark valid

    . do_markout valid rep78

    . sum _all if valid

        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
            make |          0
           price |         69    6146.043     2912.44       3291      15906
             mpg |         69    21.28986    5.866408         12         41
           rep78 |         69    3.405797    .9899323          1          5
        headroom |         69           3    .8531947        1.5          5
    -------------+---------------------------------------------------------
           trunk |         69    13.92754    4.343077          5         23
          weight |         69    3032.029    792.8515       1760       4840
          length |         69    188.2899     22.7474        142        233
            turn |         69     39.7971    4.441051         31         51
    displacement |         69         198    93.14789         79        425
    -------------+---------------------------------------------------------
      gear_ratio |         69    2.999275    .4626818       2.19       3.89
         foreign |         69    .3043478    .4635016          0          1
           valid |         69           1           0          1          1

    . 
    . drop valid

    . mark valid

    . do_markout valid
    valid is not a to-use variable for -do_markout-
    r(499);

    end of do-file

    r(499);
With such improvements, Stata would indeed be excellent and, at least for me, clearly superior in many respects to statistical programs such as R and SAS (and definitely SPSS).