  • Nick Cox
    replied
    Clyde's point, I imagine, is that using variables and also an if qualifier would not be consistent with other syntax. Consider

    Code:
    summarize x y if foo == 42 


    There is no problem here because selecting on variables and selecting on observations are just different ways of selecting data and compatible with each other.

    But


    Code:
    keep x y
    and

    Code:
    keep if foo == 42
    select data in different dimensions: you can keep (or drop) variables OR you can keep (or drop) observations. As far as Stata is concerned

    Code:
     keep x y if foo == 42
    could only mean selecting in both directions at once, which at its simplest amounts to cutting a rectangular hole in the dataset and throwing those data away. Stata doesn't allow that.

    I (we) understand that you want

    Code:
    keep x y if foo == 42
    to be acceptable short-hand for

    Code:
    keep x y
    keep if foo == 42
    but I too would support StataCorp never, ever allowing that. Consider that

    Code:
    summarize x y 
    summarize if foo == 42
    are not at all equivalent to

    Code:
    summarize x y if foo == 42
    But you can program this for yourself! Something like

    Code:
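    * not recommended by its author
    program nigelkeep
       * parse an optional varlist and an optional if qualifier
       syntax [varlist] [if]
       * exit with error 198 (invalid syntax) if both macros are empty
       if `"`varlist'`if'"' == "" error 198
       * select observations first, so that variables named in the
       * condition still exist when it is evaluated
       if `"`if'"' != "" keep `if'
       if "`varlist'" != "" keep `varlist'
    end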
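
    A usage sketch, assuming the program above has already been defined in the current session and using the auto dataset that ships with Stata. Here foreign is included in the varlist so that the condition can be evaluated regardless of the order in which the two keep steps run:

    ```stata
    * load the example dataset shipped with Stata
    sysuse auto, clear

    * keep three variables and only the observations with foreign == 1
    nigelkeep make price foreign if foreign == 1

    * inspect what is left
    describe, short
    ```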



  • Nigel Moore
    replied
    Originally posted by Clyde Schechter
    Re #20: I don't think that would be a good idea. It would make the semantics of -if- qualifiers different for just the -keep- and -drop- commands. The -if- qualifier is very specific: it identifies a subset of the observations to which the command will be applied. That's not relevant to -keep-ing and -drop-ing variables.
    I'm not sure that I follow your logic, Clyde. Using the -if- qualifier with -keep- and -drop- limits those commands to the subset just like any other command.



  • Richard Williams
    replied
    Originally posted by Bruce Weaver
    Making nestreg support factor variable operators would be a nice improvement, IMO. Is there some technical reason why this is not currently supported?

    Cheers,
    Bruce
    I agree. But I'll note that, much as I hate hate hate the old xi: prefix, you could use it with nestreg if you want to avoid creating dummies and interactions yourself.



  • Bruce Weaver
    replied
    Making nestreg support factor variable operators would be a nice improvement, IMO. Is there some technical reason why this is not currently supported?

    Cheers,
    Bruce



  • Clyde Schechter
    replied
    Re #20: I don't think that would be a good idea. It would make the semantics of -if- qualifiers different for just the -keep- and -drop- commands. The -if- qualifier is very specific: it identifies a subset of the observations to which the command will be applied. That's not relevant to -keep-ing and -drop-ing variables. Some more flexible syntax for specifying varlists beyond what is afforded by wildcards might be useful, for this and other purposes, but overloading -if- to do it would, in my view, be an invitation to trouble.

    Re #21:

    1) See https://www.statalist.org/forums/for...-inside-a-loop for a way around this limitation. You still have to do 50 read operations, but they do not read the entire data set each time.

    2) If the subset of the using data that will be needed for any given observation can be specified by an interval to which the value of a variable belongs, then Robert Picard's -rangejoin- command will handle this, and it is much faster than -merge-. See also Sergio Correia's -ftools- package, which deals with the speed issue. (I don't know if it also deals with the memory issue.)

    3) Yes, I agree. This behavior is quite unexpected, and the syntactic acceptance of a vector where only a scalar is admissible leads unwary users down the garden path. More important than the number of people who post here because of their "inexplicable" results from doing this, is the possibly larger number of people who are producing garbage results and are completely unaware of it! On the other hand, Stata has always been fairly generous in allowing people to take shortcuts in syntax that reduce typing. I suspect there is a fair amount of legacy code out there that relies on this convention and would break if this change were made. Perhaps implementing this prohibition, but allowing it under version control would be a reasonable compromise.

    4) Why isn't -quietly- the solution to this?

    5) Yes, that might be helpful.

    6) No comment. Not relevant to my work, but I can see where this would be useful to others. On the other hand, I wonder if there are unintended consequences.
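
    To illustrate the behavior discussed in point 3, here is a minimal sketch using the auto dataset that ships with Stata (the choice of dataset and variable is only for illustration). The if command evaluates its expression once, so a bare variable name is taken to mean its value in the first observation, whereas the if qualifier is evaluated for every observation:

    ```stata
    sysuse auto, clear

    * if COMMAND: foreign here is read as foreign[1], silently
    if foreign == 1 {
        display "branch taken because foreign[1] == 1"
    }
    else {
        display "branch taken because foreign[1] != 1"
    }

    * if QUALIFIER: the condition is checked observation by observation
    count if foreign == 1
    ```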
    Last edited by Clyde Schechter; 24 Sep 2017, 09:38.



  • Daniel Feenberg
    replied
    I have several requests:

    1) All input and output commands should allow subsetting by variable or observation. I find it frustrating that to
    save a national file into 51 separate files (one for each state) I have to reread the national file 50 times.

    2) Merge is very slow compared to use: order(s) of magnitude slower. Also, it would be nice if the user could (optionally)
    specify the amount of memory required for the result, so that cases where the Stata estimate is larger than available memory,
    but smaller than the true requirement, could still be run. We run into this at NBER because typically only a tiny fraction
    of a very large dataset is kept.

    3) An attempt to use the -if- command with a variable should return an error (or at least a warning) rather than using the
    first observation.

    4) Provide a way to suppress the "NNN values changed" message in do files, without suppressing actual error messages.

    5) Provide a way to add the variable name to the "NNN values changed" message.

    6) Allow http uploads as well as downloads. We operate a remote SAAS that calculates income tax liability. Currently
    we provide an ado file (-taxsim9.ado-) that invokes ftp to upload the data, but that is a hack.
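
    For context on request 1, a common workaround looks something like the sketch below. The file name national.dta, the numeric variable state, and the output file names are all hypothetical. The national file is read from disk only once, but each pass through the loop still makes and restores a copy of the data via preserve and restore, so this is not necessarily fast on very large files:

    ```stata
    * hypothetical input: national.dta with a numeric identifier named state
    use national, clear

    * collect the distinct state codes in a local macro
    levelsof state, local(states)

    foreach s of local states {
        preserve                       // set aside a copy of the full data
        keep if state == `s'           // reduce to one state
        save state_`s', replace        // hypothetical output file name
        restore                        // bring back the full data
    }
    ```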



  • Nigel Moore
    replied
    Originally posted by Jesse Wursten
    It would be nice (but not essential) if the keep command had an "order" option, which orders the variables you keep in the order you specify them in the command. This is of course trivial to implement as a user-written command, but it's something I find myself doing quite regularly. Doing it manually every time is a bit inconvenient because the actual variables to keep might change regularly and then it's a hassle to change it in both. Using a local varlist seems overkill.
    I recently found myself trying to use
    Code:
    keep varlist if exp
    to avoid two -drop- commands (one to drop the unwanted variables, the other to truncate the desired variables to a subset of observations). Being able to mix a varlist and an -if- qualifier with -keep- and -drop- would be nice.



  • Jesse Wursten
    replied
    It would be nice (but not essential) if the keep command had an "order" option, which orders the variables you keep in the order you specify them in the command. This is of course trivial to implement as a user-written command, but it's something I find myself doing quite regularly. Doing it manually every time is a bit inconvenient because the actual variables to keep might change regularly and then it's a hassle to change it in both. Using a local varlist seems overkill.
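
    A minimal sketch of the user-written version mentioned above, assuming the wrapper is called keeporder (a made-up name, not an official or existing command). It keeps the listed variables and then moves them into the order in which they were typed:

    ```stata
    * hypothetical helper: keep variables and order them as specified
    program keeporder
        version 15
        syntax varlist
        keep `varlist'
        order `varlist'
    end

    * usage with the auto dataset shipped with Stata
    sysuse auto, clear
    keeporder price make mpg
    describe, simple
    ```

    With this, the variable list only has to be maintained in one place.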
    Last edited by Jesse Wursten; 15 Sep 2017, 06:53.



  • Belinda Foster
    replied
    In addition to the various enhancements that others have suggested for the do-file editor, I would really like to see permanent bookmarks. Every time I close Stata, or if there is a crash, all my bookmarks disappear, which is very annoying as it is very time-consuming to set them all up again. And that's assuming I can remember them all!



  • Nigel Moore
    replied
    Originally posted by Nigel Moore
    AFAIK, the two Dunnett's tests supported by Stata require balanced datasets (certainly Theresa Powell's does).
    Confirmed, -pwcompare, mcompare(dunnett)- does require balanced datasets.

    In the biological/medical sciences we may well start off with balanced designs, but we often end up with unbalanced datasets. Other software packages can deal with this. In 2017 why cannot Stata? It's pretty lame.



  • Cynthia Inglesias
    replied

    It would be very helpful if the next version of Stata provided easy access to operating system metrics (through c-class values perhaps). For example:

    Code:
     - the number of monitors
     - the resolution of each monitor
     - an indicator about whether a Stata window is maximised
     - the Stata results window dimensions
     - the number of monitors that the Stata results window spans across

    It would also be great if the programmer could dynamically adjust the dimensions of the Stata results window and set where a dialog box appears on screen.
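
    For readers unfamiliar with c-class values, this is how the existing ones are read today. The sketch shows only values that already exist; the monitor and window metrics requested above are not among them:

    ```stata
    * existing c-class values describing the system and the Results window
    display c(os)          // operating system family
    display c(linesize)    // current line width of the Results window
    display c(pagesize)    // number of lines per page of output

    * list every c-class value currently available
    creturn list
    ```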



  • Richard Williams
    replied
    Cynthia Inglesias, I'll add that Stata/MP doesn't seem to speed up sem, at least not the sem models I run. Indeed, when I run sem on these monster UNIX machines, tasks with enormous data sets run much faster but sem doesn't. As I understand it, the UNIX machines have enormous amounts of memory but their processors are no faster than my desktop, maybe even slower.

    So yes, it would be great if Stata were faster with large data sets. But there are other programs, such as sem, which also would be nice to speed up, presumably by better algorithms. (I repeat my request that Stata buy out MPlus or else figure out how to reverse-engineer whatever it is it does to zip past the competition -- not only Stata, but perhaps everybody else.)



  • Cynthia Inglesias
    replied
    Richard Williams Recently, there was an interesting presentation by Sergio Correia regarding speeding up inefficient Stata commands: https://www.stata.com/meeting/baltim...17_Correia.pdf

    StataCorp should finally implement these suggestions in Stata. I understand that the company has certain priorities, but the latest version was a disappointment with regard to speed improvements and Mata. Not everyone has access to Stata/MP, and datasets keep getting bigger. Asking people to buy Stata/MP for data manipulation is ridiculous.



  • Dario Maimone Ansaldo Patti
    replied
    I would like to see more models that can be estimated using bayes. In particular, I would like to estimate Bayesian spatial models in Stata instead of being forced to use Matlab or R.



  • Andrew Wade
    replied
    Hi,
    Having just had trouble replicating results first generated in SAS, I suggest that the egen suite of functions have the option of using weights.
    This may of course not be relevant for all of the egen functions. But it was for what I was doing, which used the std() function.
    Regards,
    Andrew
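
    Until something like that exists, a weighted standardization can be done by hand. This is a minimal sketch using the auto dataset shipped with Stata, where weight merely stands in for an analytic weight variable. Whether the result matches another package depends on the divisor that package uses for the weighted variance, so this is an illustration and not a guaranteed replication of SAS output:

    ```stata
    sysuse auto, clear

    * weighted mean and standard deviation (weight is a stand-in aweight)
    summarize price [aweight = weight]

    * standardize using the stored weighted results
    generate double z_price = (price - r(mean)) / r(sd)
    ```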

