  • Richard Williams
    replied
    Originally posted by Daniel Bela
    This clearly answers eric_a_booth's question (2) with "No"; can anyone additionally test this with Stata/IC or Stata/SE 15 [as of question (1)]?
    I should have added that I am using Stata 15/MP. I don't know if Stata 15/SE could handle monster files by limiting the number of variables selected.

    It is an interesting question. I have a student using 13/IC who tried to work with a file having over 5,000 variables. She couldn't do it so she switched to a machine that could.


  • Daniel Bela
    replied
    The same question came into my mind when I read this; thanks for having a look into this, Richard Williams.

    This clearly answers eric_a_booth's question (2) with "No"; can anyone additionally test this with Stata/IC or Stata/SE 15 [as of question (1)]?

    Regards
    Bela


  • Richard Williams
    replied
    Just for the heck of it, I created a data set with 40,000 variables. I then went to Stata 14.2.

    Code:
    .  use  var1 var2 var500  using mylargedataset.dta, clear
    .dta too modern
        File mylargedataset.dta is from a more recent version of Stata.  Type update query to determine whether a free update of Stata is
        available, and browse http://www.stata.com/ to determine if a new version is available.
    r(610);
    Going back to 15,

    Code:
    . save "C:\StataData\mylargedataset.dta", replace
    file C:\StataData\mylargedataset.dta saved
    
    . keep var1 var2 var500  
    . 
    . save   mylargedataset.dta, replace
    file mylargedataset.dta saved
    Back to 14.2,
    Code:
    . use mylargedataset, clear
    No problem.

    So no to your first idea, yes to your second.

    I wonder if 14.2 could yet be tweaked to work with your option 1.
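
    For anyone who wants to repeat this test, a file like the one above could be built roughly as follows. This is only a sketch: the post does not show how the dataset was created, the variable names simply follow the var1, var2, ... pattern used in this thread, and a maxvar setting above 32,767 assumes Stata/MP 15.

    ```stata
    * Sketch: build a 40,000-variable test file (assumes Stata/MP 15)
    clear all
    set maxvar 40100
    set obs 1
    forvalues i = 1/40000 {
        generate byte var`i' = 0
    }
    save mylargedataset.dta, replace
    ```

    The loop is slow but simple; one observation of byte variables keeps the file small.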




  • eric_a_booth
    replied
    Originally posted by Alan Riley (StataCorp)

    Format 119 is almost identical to format 118, but allows for the larger variable numbers. Stata 14 and earlier cannot load datasets with 32,768 or more variables, so it doesn't really matter whether the format is 118 or 119 in that case -- there is no way they can load a dataset of that size.

    In any case, the dataset format has not changed in Stata/IC 15, has not changed in Stata/SE 15, has not changed by default in Stata/MP 15, and only changes to allow larger variable numbers in Stata/MP when absolutely necessary.

    Finally, saveold in Stata 15 allows datasets to be saved back to Stata 11 format.


    Thanks - this is helpful information.
    So, if I save a dataset using my Stata 15 MP and the dataset has >32768 variables (let's call it "mylargedataset.dta") can a collaborator using (1) Stata 15 SE and/or (2) Stata 14 of any flavor use the command

    Code:
     use  var1 var2 var500  using mylargedataset.dta, clear
    to access a portion of the data from the Stata 15 MP file?
    Or would this create an error because the internal format (118/119) is inconsistent?

    Would it be a better idea to save the large Stata/MP dataset as:

    Code:
    preserve
    keep var1 var2 var500  
    save   mylargedataset.dta, replace
    restore
    (or would the 119 format prevent the Stata 14 or Stata 15 SE user from accessing this dataset from Stata/MP no matter what?)


  • Alan Riley (StataCorp)
    replied
    Originally posted by Friedrich Huebler
    Is the format of Stata datasets the same as before or has it changed?
    Both.

    Stata 14's dataset format is what we call internally '118' format, as that is the version number in its header. Format 118 is also used by Stata/IC 15, Stata/SE 15, and, by default, in Stata/MP 15.

    Format 118 uses two bytes to represent variable numbers, and as such, Stata/MP 15's new maximum of 120,000 variables is impossible to save in a format 118 dataset. If a dataset in Stata/MP 15 has anywhere between 32,768 variables and 120,000 variables, then format 119 is used. Format 119 is almost identical to format 118, but allows for the larger variable numbers. Stata 14 and earlier cannot load datasets with 32,768 or more variables, so it doesn't really matter whether the format is 118 or 119 in that case -- there is no way they can load a dataset of that size.

    In any case, the dataset format has not changed in Stata/IC 15, has not changed in Stata/SE 15, has not changed by default in Stata/MP 15, and only changes to allow larger variable numbers in Stata/MP when absolutely necessary.

    Finally, saveold in Stata 15 allows datasets to be saved back to Stata 11 format.
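
    To check which format a given file actually uses, and to write a copy for older releases, something along these lines may help. This is a sketch, assuming the dtaversion command is available in your release and that mydata.dta (a hypothetical file name) is small enough to fit the limits of the older formats.

    ```stata
    * Report the .dta format number of a file on disk
    dtaversion mydata.dta
    display r(version)

    * Save a copy that Stata 11 and later can read
    use mydata.dta, clear
    saveold mydata_v11.dta, version(11) replace
    ```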


  • Richard Williams
    replied
    From a note I got from Stata:

    In general, the dataset format for Stata 15 remains the same as for Stata 14, and I believe Stat/Transfer 13 will still work. The exception is that Stata/MP now allows for up to 120,000 variables, so a new dataset format is used for those sets between 32,768 and 120,000 variables.


  • Nick Cox
    replied
    Friedrich: I read this as Steve Dubnoff needing to change his code to accommodate the larger number of variables allowed. So, the dataset format is the same, but that doesn't mean that other programs can read the larger datasets. Clearly this is a guess.


  • Friedrich Huebler
    replied
    Originally posted by Bill Gould (StataCorp)
    Stata 15's dataset format is the same as Stata 14's.
    Originally posted by Steve Dubnoff
    We have just been informed by StataCorp (thank you), that there are changes in the file format to accommodate the larger number of variables (> 32K) that are permitted in Stata/MP version 15.
    Is the format of Stata datasets the same as before or has it changed?


  • Ariel Karlinsky
    replied
    Sebastian,

    1. I was referring to other commands, such as summarize or table.

    4. Many workarounds exist. Yet typing commands (and thus changing the file, even if only temporarily) instead of specifying breakpoints is bad practice in all programming languages I'm familiar with.

    5. You're correct regarding the new sp commands. When I looked into the spatial models on the Stata 15 page, I thought they used examples with old user packages such as shp2dta; I didn't understand that sp is a new set of commands in vanilla Stata. I applaud this suite of commands indeed!


  • Sebastian Kripfganz
    replied
    Ariel Karlinsky:

    1. Isn't it already the default setting that variable names are not abbreviated but instead the regression output tables are stretched? See option nolstretch in help estimation options##display_options.

    4. You can put an exit in your do-files to force a break.

    5. The new sp commands are such an example where user-written addons now became part of native Stata. As you mentioned the spmap command, the new command grmap steps into its place in native Stata if I interpret the documentation correctly.
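
    A minimal illustration of point 4, using the auto dataset shipped with Stata; the particular commands are only placeholders:

    ```stata
    sysuse auto, clear
    regress price mpg weight
    exit
    * Nothing below this line runs until the exit above is removed
    summarize price, detail
    ```

    Moving the exit line further down the do-file steps the "break" forward, which is a workaround rather than a true breakpoint.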


  • Ariel Karlinsky
    replied
    I'm sorry to say that on the face of it, this looks like a very disappointing release.
    Most of these additions would be better characterized as similar to user add-ons than as what I would expect from Stata itself. The release also seems to ignore most if not all of the suggestions and requests made by avid Stata users in this very forum. I understand, of course, that StataCorp can't do everything and please everyone, but it seems to me that there is ignorance of what users want and wish for.
    I would name a few broad issues that I and others have mentioned and that should have been dealt with at the software level:

    1. Interface and results window: The coercive abbreviation of output (of long variable names etc.) - At least give users the ability to decide whether or not they wish to abbreviate (the infamous ~) output.

    2. No multi-core support in non-MP versions: Other statistical software supports multiple cores natively (multi-core machines are standard nowadays, and have been for some time). The price differentiation between MP and non-MP flavors prevents users from utilizing the speed benefits of multiple cores. A single "flavor" that utilizes multiple cores has, I think, been a long time coming.

    3. Limited number of variables - While this has increased, there have been several discussions on this very forum about how today's "big data" (or even "medium data") sets can have hundreds of thousands of variables. The current limit is not big enough for 2017, and many users (not in my field, by the way) would not use Stata for this reason.

    4. Better debugging - Being unable to even set a breakpoint in a do-file can be extremely frustrating. Debugging programs in Stata is more art than science, with the user writing nonsense code where they want the program to stop (as it will exit due to the error) just to "break" at a given point.

    5. Incorporating general-use add-ons into vanilla Stata - User add-ons are great, but I would have expected StataCorp to work with package authors to get their packages into native Stata, especially packages that a large percentage of users use daily and that even appear in the Stata FAQ, such as outreg/estout, spmap, ivreg2, etc.

    6. Working with several databases at the same time - I understand that this would mean a major shift in Stata philosophy, but since other statistical software does this with ease, I see no reason for Stata not to have this pretty basic feature. Instead, the user has to juggle multiple instances of Stata, or keep clearing and using each dataset separately.

    7. Speed improvements - I see very little mention of "under the hood" improvements. For example, are there not still built-in Stata commands that have not yet been Mata-ized?


  • Kimin Eom
    replied
    Has Stata 15 improved its estimation speed by any chance (for generalized SEM in particular)? I tried to run a multilevel mediation model using gsem; it ran for more than a week and I ended up giving up...


  • Steve Dubnoff
    replied
    We have just been informed by StataCorp (thank you), that there are changes in the file format to accommodate the larger number of variables (> 32K) that are permitted in Stata/MP version 15. Stat/Transfer will support this change in the next release. In the meantime, those with more moderately sized datasets will have no problems using version 13 of Stat/Transfer with Version 15 of Stata.


  • Richard Williams
    replied
    Originally posted by Richard Williams
    Is Stat/Transfer 14 also on the way? I didn't see an option to order it.
    To answer my own question, Stat/Transfer wrote "Stat/Transfer version 14 is available in the upcoming weeks. Version 13 will work for now with Stata 15 but no prior versions of Stat/Transfer will."


  • Richard Williams
    replied
    Is Stat/Transfer 14 also on the way? I didn't see an option to order it.
