Stata 15 is here - Statalist

eric_a_booth

Join Date: Apr 2014

Posts: 288
#16

07 Jun 2017, 18:54

Originally posted by Alan Riley (StataCorp)

Format 119 is almost identical to format 118, but allows for the larger variable numbers. Stata 14 and earlier cannot load datasets with 32,768 or more variables, so it doesn't really matter whether the format is 118 or 119 in that case -- there is no way they can load a dataset of that size.

In any case, the dataset format has not changed in Stata/IC 15, has not changed in Stata/SE 15, has not changed by default in Stata/MP 15, and only changes to allow larger variable numbers in Stata/MP when absolutely necessary.

Finally, saveold in Stata 15 allows datasets to be saved back to Stata 11 format.

Thanks - this is helpful information.
So, if I save a dataset using my Stata 15 MP and the dataset has >32768 variables (let's call it "mylargedataset.dta") can a collaborator using (1) Stata 15 SE and/or (2) Stata 14 of any flavor use the command

Code:

use var1 var2 var500 using mylargedataset.dta, clear

to access a portion of the data from the Stata 15 MP file?
Or would this create an error because the internal format (118/119) is inconsistent?

Would it be a better idea to save the large stata MP data set as:

Code:

preserve
keep var1 var2 var500
save mylargedataset.dta, replace
restore

(Or would the 119 format prevent the Stata 14 or Stata 15 SE user from accessing this dataset from Stata MP no matter what?)

Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX
Comment
Richard Williams

Join Date: Apr 2014

Posts: 4959
#17

07 Jun 2017, 19:23

Just for the heck of it, I created a data set with 40,000 variables. I then went to Stata 14.2.

Code:

. use var1 var2 var500 using mylargedataset.dta, clear
.dta too modern
    File mylargedataset.dta is from a more recent version of Stata. Type update query to determine whether a free update of Stata is available, and browse http://www.stata.com/ to determine if a new version is available.
r(610);

Going back to 15,

Code:

. save "C:\StataData\mylargedataset.dta", replace
file C:\StataData\mylargedataset.dta saved

. keep var1 var2 var500

. save mylargedataset.dta, replace
file mylargedataset.dta saved

Back to 14.2,

Code:

. use mylargedataset, clear

No problem.

So no to your first idea, yes to your second.

I wonder if 14.2 could yet be tweaked to work with your option 1.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Comment
Daniel Bela

Join Date: Apr 2014

Posts: 246
#18

08 Jun 2017, 05:29

The same question came into my mind when I read this; thanks for having a look into this, Richard Williams.

This clearly answers eric_a_booth's question (2) with "No"; can anyone additionally test this with Stata/IC or Stata/SE 15 [as in question (1)]?

Regards
Bela
Comment
Richard Williams

Join Date: Apr 2014

Posts: 4959
#19

08 Jun 2017, 05:44

This clearly answers eric_a_booth's question (2) with "No"; can anyone additionally test this with Stata/IC or Stata/SE 15 [as in question (1)]?

I should have added that I am using Stata 15/MP. I don't know if Stata 15/SE could handle monster files by limiting the number of variables selected.

It is an interesting question. I have a student using 13/IC who tried to work with a file having over 5,000 variables. She couldn't do it so she switched to a machine that could.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Comment
Alan Riley (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 168
#20

08 Jun 2017, 08:44

Stata/IC (14 or 15) can read any dataset created with Stata/IC (14 or 15), Stata/SE (14 or 15), or Stata/MP (14 or 15) so long as the dataset has no more than 2,047 variables.

Stata/IC (14 or 15) can read subsets of variables from datasets created with Stata/IC (14 or 15), Stata/SE (14 or 15), or Stata/MP (14 or 15) so long as those datasets have no more than 32,767 variables.

Stata/SE and Stata/MP (14 or 15) can read any dataset or subset of variables from datasets created with Stata/IC (14 or 15), Stata/SE (14 or 15), or Stata/MP (14 or 15) so long as those datasets have no more than 32,767 variables.

If a dataset has 32,768 or more variables, it must have been created by Stata/MP 15, and only Stata/MP 15 can read it.

Edit: and, to be clear, of course versions 14 and 15 of Stata/IC, Stata/SE, and Stata/MP can read all earlier Stata dataset formats created on any platform, all the way back to the first version of Stata 30-odd years ago.

Last edited by Alan Riley (StataCorp); 14 Jun 2017, 09:47.
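
The limits listed above can be turned into a quick check before a file is shared. What follows is only a rough sketch, not StataCorp code: it relies on c(k), Stata's count of variables in the dataset currently in memory, and on the 2,047 and 32,767 thresholds quoted in the post. The wording of the messages is illustrative.

```stata
* Sketch: report who can read the dataset in memory, per the limits quoted above.
* c(k) is the number of variables currently in memory.
local k = c(k)
if `k' <= 2047 {
    display as text "`k' variables: Stata/IC, SE, and MP (14 or 15) can read the whole file"
}
else if `k' <= 32767 {
    display as text "`k' variables: Stata/SE and MP (14 or 15) can read it; Stata/IC only a subset"
}
else {
    display as text "`k' variables: only Stata/MP 15 can read this file"
}
```

Running this just before -save- gives a collaborator-facing answer without having to inspect the file format afterwards.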
Comment
Richard Williams

Join Date: Apr 2014

Posts: 4959
#21

08 Jun 2017, 09:15

If a dataset has 32,768 or more variables, it must have been created by Stata/MP 15, and only Stata/MP 15 can read it.

I was kind of disappointed with 14MP. By far my biggest jobs use sem, and MP didn't seem to speed them up. But 15MP is a little cheaper, I think, and its ability to handle a monstrous number of variables could make it more attractive to some people.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Comment
eric_a_booth

Join Date: Apr 2014

Posts: 288
#22

08 Jun 2017, 11:53

Originally posted by Alan Riley (StataCorp)

Stata/IC (14 or 15) can read any dataset created with Stata/IC (14 or 15), Stata/SE (14 or 15), or Stata/MP (14 or 15) so long as the dataset has no more than 2,047 variables.

Stata/IC (14 or 15) can read subsets of variables from datasets created with Stata/IC (14 or 15), Stata/SE (14 or 15), or Stata/MP (14 or 15) so long as those datasets have no more than 32,767 variables.

Stata/SE (14 or 15) can read any dataset or subset of variables from datasets created with Stata/IC (14 or 15), Stata/SE (14 or 15), or Stata/MP (14 or 15) so long as those datasets have no more than 32,767 variables.

If a dataset has 32,768 or more variables, it must have been created by Stata/MP 15, and only Stata/MP 15 can read it.

Edit: and, to be clear, of course versions 14 and 15 of Stata/IC, Stata/SE, and Stata/MP can read all earlier Stata dataset formats created on any platform, all the way back to the first version of Stata 30-odd years ago.

Thanks Alan for this information.
I was also curious about a condition where I have a large >32768 variable dataset managed in MP that I want to send to a collaborator with SE. I wanted to know whether the internal 119 version you describe would persist if I reduced the dataset (thereby making the file unreadable to an SE version even if it had less than 32k vars), and separately whether the SE version could open a subset of the large file created by MP (via use varlist using file.dta).

The example below shows that a large (>32,768 variable) file created by MP cannot be accessed by SE, even when only a subset of the variables is requested. It also shows that when the large file is reduced to fewer than 32k vars, the internal format id changes back to 118 and the file is readable by SE (I was able to -use- the extract with Stata 15 SE and Stata 14 SE on Mac and Windows).

Output from this test:

Code:

. cap program drop whatversion

. program define whatversion
  1. syntax anything
  2. file open handle using "`1'", read text
  3. file seek handle 28
  4. file read handle test
  5. file close handle
  6. noi di as smcl `"For file: `1'"'
  7. noi di as smcl `"Header: `=substr(`"`test'"', 1, 3)'"'
  8. end

.
. **test small file first
. global mydir `"/users/ebooth/desktop//"'

. sysuse auto, clear
(1978 Automobile Data)

. sa `"${mydir}/test.dta"', replace
file /users/ebooth/desktop///test.dta saved

. whatversion `"${mydir}/test.dta"'
For file: /users/ebooth/desktop///test.dta
Header: 118

.
. **more than 35000 vars
. clear

. set maxvar 36000

. set obs 100
number of observations (_N) was 0, now 100

. forval n = 1/35000 {
  2. g var`n' = 1
  3. }

. desc, sh

Contains data
  obs:           100
 vars:        35,000
 size:    14,000,000
Sorted by:
     Note: Dataset has changed since last saved.

. sa `"${mydir}mpfile.dta"', replace
file /users/ebooth/desktop//mpfile.dta saved

. whatversion `"${mydir}mpfile.dta"'
For file: /users/ebooth/desktop//mpfile.dta
Header: 119

.
. u var1-var10 using `"${mydir}mpfile.dta"'

. sa `"${mydir}mpfile_extract.dta"' , replace
file /users/ebooth/desktop//mpfile_extract.dta saved

. whatversion `"${mydir}mpfile_extract.dta"'
For file: /users/ebooth/desktop//mpfile_extract.dta
Header: 118
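
For anyone who wants to reuse the header check, here is a possible variant that reads the file in binary mode and returns the result in r(). It is a sketch only. It assumes the file is in format 117 or later, where the header begins with the 28 bytes "<stata_dta><header><release>", so the three-character release number starts at byte offset 28. The program name dtarelease is hypothetical.

```stata
* Sketch: read the 3-character release number from a .dta header.
* Assumes format 117+ (header starts with "<stata_dta><header><release>").
capture program drop dtarelease
program define dtarelease, rclass
    args filename
    tempname fh
    file open `fh' using `"`filename'"', read binary
    file seek `fh' 28                 // skip the 28-byte opening tags
    file read `fh' %3s release        // e.g. 118 or 119
    file close `fh'
    display as text `"`filename': format `release'"'
    return local release `"`release'"'
end
```

It could be called as dtarelease "mpfile.dta", after which r(release) holds the format number for use in later code.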

Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX
Comment
Dave Airey

Join Date: Apr 2014

Posts: 396
#23

08 Jun 2017, 13:57

Respectfully, I had different feelings about this release. Nonlinear mixed models. Thank you. Groups in generalized SEM. Thank you. Bayes prefix. Thank you. Better reporting built in. Thanks again. Anyway, it's good to get different thoughts on this release as clearly you were disappointed across some issues of workflow / ease of use.

Dave

Originally posted by Ariel Karlinsky

I'm sorry to say that on the face of it, this looks like a very disappointing release.
Most of these additions would better be characterized as similar to user addons than what I would expect from stata itself. It also seems to ignore most if not all of the suggestions and requests made by avid stata users in this very forum. I understand of course that statacorp can't do everything and please everyone, but it seems to me like there's ignorance of what users want and wish for.
I would name a few, broad issues that I and others have mentioned and that should've been dealt with on the software level:

1. Interface and results window: The coercive abbreviation of output (of long variable names etc.) - At least give users the ability to decide whether or not they wish to abbreviate (the infamous ~) output.

2. No multi-core support in non-MP versions: Other stat software supports multi-core (which is standard in computers nowadays, and has been for some time) natively. The price differentiation between MP and non-MP flavors prevents users from utilizing the speed benefits of multiple cores. A single "flavour" that utilizes multiple cores is, I think, a long time coming.

3. Limited number of variables - while this has increased, there have been several discussions on this very forum how today's "big data" (or even "medium data") sets can have hundreds of thousands of variables. The current limit is not big enough for 2017 and many users (not in my field, btw) would not use stata due to this reason.

4. Better debugging - Being unable to even set a breakpoint in a do file can be extremely frustrating. Debugging programs in stata is more art than science, with the user writing nonsense code where they want the program to stop (as it will exit due to error) just to "break" at a given point.

5. Incorporating general-use addons into vanilla stata - User addons are great, but I would have expected statacorp to work with package authors to get their packages into native stata. Packages that a large percentage of users use daily, and that even appear on the stata FAQ. such as outreg/estout, spmap, ivreg2 etc.

6. Working with several databases at the same time - I understand that this will mean a major shift in stata-philosophy, but since other stat software does this with ease, I see no reason for stata not to have this pretty basic feature - Instead the user has to juggle multiple instances of stata, or keep clearing and using each dataset separately.

7. Speed improvements - I see very little mentioning of "under the hood" improvements, for example - are there not still built-in stata commands which have not yet been mata-ized?
Comment
Ben Kriechel

Join Date: Jan 2015

Posts: 1
#24

09 Jun 2017, 04:43

Another thing.

I noticed that the download page only allows five downloads of the software while it seems to be the default option. I am not certain whether this was introduced now or in some earlier version already.

This struck me as inconvenient as I am not sure how often I will use this download option. Of course, I can download and archive the installation file(s), but is it necessary? It would be quite easy to fix, but I might overlook the underlying reason or overestimate the nuisance to the user.

--- Ben

------------------------------------------
Ben Kriechel, Economix Research & Consulting (Munich)
Stata Version: 15SE
E: [email protected]
WWW: http://www.kriechel.eu/
Comment
Alan Riley (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 168
#25

09 Jun 2017, 08:47

Originally posted by Ben Kriechel

I noticed that the download page only allows five downloads of the software while it seems to be the default option. I am not certain this is introduced now or has been introduced at some earlier version already.

This struck me as inconvenient as I am not sure how often I will use this download option. Of course, I can download and archive the installation file(s), but is it necessary? It would be quite easy to fix, but I might overlook the underlying reason or overestimate the nuisance to the user.

This isn't a new limit. The idea is that if you accidentally leave your download information where someone else sees it, an entire class won't accidentally download your copy. If you run into an issue with the limit though, just email us and our sales group is happy to reset it.
Comment
Emmanuel Segui

Join Date: Mar 2017

Posts: 3
#26

09 Jun 2017, 14:41

I'm very interested in the markdown module to create html and docx. Do I need a full upgrade just for this feature? Thanks.
Comment
Richard Williams

Join Date: Apr 2014

Posts: 4959
#27

09 Jun 2017, 15:01

Originally posted by Emmanuel Segui

I'm very interested in the markdown module to create html and docx. Do I need a full upgrade just for this feature? Thanks.

If you do -findit markdown-, various user-written routines show up. I don't know how they compare with what is in Stata 15.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Comment
Mohammad Shoaib

Join Date: Jan 2016

Posts: 4
#28

12 Jun 2017, 01:47

Is it possible in Stata 15 to import data in the Persian language? It is not working very well in Stata 14.
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4375
#29

12 Jun 2017, 04:41

Assuming that you're talking about a delimited text file, import delimited in Stata 14 should be able to import a file containing Persian-language text if it is encoded in one of the encodings that Java uses. See help import delimited for more information about which encodings are available.
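
A minimal sketch of such an import, assuming a comma-delimited file saved as UTF-8 with variable names in the first row. The filename persian_data.csv is hypothetical, and a different encoding name would need to be substituted if the file was saved in another encoding.

```stata
* Hypothetical file; assumes UTF-8 text and variable names in row 1
import delimited using "persian_data.csv", encoding("utf-8") varnames(1) clear
describe
```

If the text still displays incorrectly after the import, the encoding assumed here is the first thing to check.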
Comment
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#30

12 Jun 2017, 14:03

Mohammad Shoaib could you please elaborate on the import/export of the Persian language:

It is not working very well in Stata 14

Thank you, Sergiy
Comment
