  • #16
    Originally posted by Alan Riley (StataCorp)

    Format 119 is almost identical to format 118, but allows for the larger variable numbers. Stata 14 and earlier cannot load datasets with 32,768 or more variables, so it doesn't really matter whether the format is 118 or 119 in that case -- there is no way they can load a dataset of that size.

    In any case, the dataset format has not changed in Stata/IC 15, has not changed in Stata/SE 15, has not changed by default in Stata/MP 15, and only changes to allow larger variable numbers in Stata/MP when absolutely necessary.

    Finally, saveold in Stata 15 allows datasets to be saved back to Stata 11 format.


    Thanks - this is helpful information.
    So, if I save a dataset using my Stata 15 MP and the dataset has >32768 variables (let's call it "mylargedataset.dta") can a collaborator using (1) Stata 15 SE and/or (2) Stata 14 of any flavor use the command

    Code:
     use  var1 var2 var500  using mylargedataset.dta, clear
    to access a portion of the data from the Stata 15 MP file ?
    or would this create an error because the internal format (118/119) is inconsistent??

    Would it be a better idea to save the large stata MP data set as:

    Code:
    preserve
    keep var1 var2 var500  
    save   mylargedataset.dta, replace
    restore
    (or would the 119 format prevent the Stata 14 or Stata 15 SE user from accessing this dataset from Stata MP no matter what?)
    Eric A. Booth | Senior Research Statistician | FH LLC | Austin TX
    Specs: Stata 16 MP (4 core) Mac OSX and Windows 10 Pro


    • #17
      Just for the heck of it, I created a data set with 40,000 variables. I then went to Stata 14.2.

      Code:
      .  use  var1 var2 var500  using mylargedataset.dta, clear
      .dta too modern
          File mylargedataset.dta is from a more recent version of Stata.  Type update query to determine whether a free update of Stata is
          available, and browse http://www.stata.com/ to determine if a new version is available.
      r(610);
      Going back to 15,

      Code:
      . save "C:\StataData\mylargedataset.dta", replace
      file C:\StataData\mylargedataset.dta saved
      
      . keep var1 var2 var500  
      . 
      . save   mylargedataset.dta, replace
      file mylargedataset.dta saved
      Back to 14.2,
      Code:
      . use mylargedataset, clear
      No problem.

      So no to your first idea, yes to your second.

      I wonder if 14.2 could yet be tweaked to work with your option 1.
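
The "second idea" above can be sketched as a small wrapper that writes the subset to its own file, so the full dataset on disk is not overwritten by the reduced one. This is only an illustration: the program name savesubset and the output filename in the usage line are hypothetical, not anything shipped with Stata.

```stata
* Sketch only: save a subset of the data in memory to a separate file.
* savesubset is a hypothetical helper name, not a built-in command.
capture program drop savesubset
program define savesubset
    syntax varlist using/ [, replace]
    preserve                      // keep a copy of the full dataset
    keep `varlist'                // reduce to the requested variables
    save `"`using'"', `replace'   // written as format 118 when under 32,768 variables
    restore                       // bring the full dataset back into memory
end
```

Usage would look like `savesubset var1 var2 var500 using mysubset.dta, replace`, after which the collaborator opens mysubset.dta with a plain -use-.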



      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      Stata Version: 16.1MP (2 processor)

      EMAIL: rwilliam@ND.Edu
      WWW: https://www3.nd.edu/~rwilliam


      • #18
        The same question came into my mind when I read this; thanks for having a look into this, Richard Williams.

        This clearly answers eric_a_booth 's question (2) with "No"; can anyone additionally test this with Stata/IC or Stata/SE 15 [as of question (1)]?

        Regards
        Bela


        • #19
          This clearly answers eric_a_booth 's question (2) with "No"; can anyone additionally test this with Stata/IC or Stata/SE 15 [as of question (1)]?
          I should have added that I am using Stata 15/MP. I don't know if Stata 15/SE could handle monster files by limiting the number of variables selected.

          It is an interesting question. I have a student using 13/IC who tried to work with a file having over 5,000 variables. She couldn't do it so she switched to a machine that could.
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          Stata Version: 16.1MP (2 processor)

          EMAIL: rwilliam@ND.Edu
          WWW: https://www3.nd.edu/~rwilliam


          • #20
            Stata/IC (14 or 15) can read any dataset created with Stata/IC (14 or 15), Stata/SE (14 or 15), or Stata/MP (14 or 15) so long as the dataset has no more than 2,047 variables.

            Stata/IC (14 or 15) can read subsets of variables from datasets created with Stata/IC (14 or 15), Stata/SE (14 or 15), or Stata/MP (14 or 15) so long as those datasets have no more than 32,767 variables.

            Stata/SE and Stata/MP (14 or 15) can read any dataset or subset of variables from datasets created with Stata/IC (14 or 15), Stata/SE (14 or 15), or Stata/MP (14 or 15) so long as those datasets have no more than 32,767 variables.

            If a dataset has 32,768 or more variables, it must have been created by Stata/MP 15, and only Stata/MP 15 can read it.

            Edit: and, to be clear, of course versions 14 and 15 of Stata/IC, Stata/SE, and Stata/MP can read all earlier Stata dataset formats created on any platform, all the way back to the first version of Stata 30-odd years ago.
            Last edited by Alan Riley (StataCorp); 14 Jun 2017, 09:47.
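
As a rough illustration of the rules above, the following sketch classifies the dataset currently in memory by its variable count. The program name whocanread is hypothetical, and the thresholds are simply the ones quoted in this post for Stata 14 and 15; c(k) is Stata's stored count of variables in memory.

```stata
* Sketch only: report which Stata 14/15 flavors can read the data in memory,
* using the limits quoted in the post above. whocanread is a hypothetical name.
capture program drop whocanread
program define whocanread, rclass
    local k = c(k)                      // number of variables in memory
    if `k' <= 2047 {
        local readers "IC, SE, MP (14 or 15)"
    }
    else if `k' <= 32767 {
        local readers "SE, MP (14 or 15); IC for subsets only"
    }
    else {
        local readers "MP 15 only"
    }
    display as text "`k' variables: readable by `readers'"
    return local readers "`readers'"
end
```

Running whocanread before -save- gives a quick check of whether a collaborator on another flavor will be able to open the file.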


            • #21
              If a dataset has 32,768 or more variables, it must have been created by Stata/MP 15, and only Stata/MP 15 can read it.
              I was kind of disappointed with 14MP. By far my biggest jobs use sem and MP didn't seem to speed them up. But 15MP is a little cheaper I think and this monstrous number of variables ability could make it more attractive to some people.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              Stata Version: 16.1MP (2 processor)

              EMAIL: rwilliam@ND.Edu
              WWW: https://www3.nd.edu/~rwilliam


              • #22
                Originally posted by Alan Riley (StataCorp)
                Stata/IC (14 or 15) can read any dataset created with Stata/IC (14 or 15), Stata/SE (14 or 15), or Stata/MP (14 or 15) so long as the dataset has no more than 2,047 variables.

                Stata/IC (14 or 15) can read subsets of variables from datasets created with Stata/IC (14 or 15), Stata/SE (14 or 15), or Stata/MP (14 or 15) so long as those datasets have no more than 32,767 variables.

                Stata/SE (14 or 15) can read any dataset or subset of variables from datasets created with Stata/IC (14 or 15), Stata/SE (14 or 15), or Stata/MP (14 or 15) so long as those datasets have no more than 32,767 variables.

                If a dataset has 32,768 or more variables, it must have been created by Stata/MP 15, and only Stata/MP 15 can read it.

                Edit: and, to be clear, of course versions 14 and 15 of Stata/IC, Stata/SE, and Stata/MP can read all earlier Stata dataset formats created on any platform, all the way back to the first version of Stata 30-odd years ago.

                Thanks Alan for this information.
                I was also curious about a condition where I have a large >32768 variable dataset managed in MP that I want to send to a collaborator with SE. I wanted to know whether the internal 119 version you describe would persist if I reduced the dataset (thereby making the file unreadable to an SE version even if it had fewer than 32k vars), and separately whether the SE version could open a subset of the large file created by MP (via use varlist using file.dta).

                The example below shows that a large >32768 variable file created by MP cannot be accessed by SE using a subset of the data, and also that when the large file is reduced to <32k vars the internal ID changes back to 118 and the file is readable by SE (I was able to -use- this with Stata 15 SE and Stata 14 SE on Mac and Windows).

                Output from this test:
                Code:
                . cap program drop whatversion
                
                . program define whatversion 
                  1. syntax anything
                  2.                 file open handle using "`1'", read text
                  3.                 file seek handle 28
                  4.         file read handle test
                  5.         file close handle
                  6.                 noi di as smcl `"For file: `1'"'
                  7.                 noi di as smcl `"Header: `=substr(`"`test'"', 1, 3)'"'
                  8. end
                
                . 
                . 
                . 
                . **test small file first
                . global mydir `"/users/ebooth/desktop//"'
                
                . sysuse auto, clear
                (1978 Automobile Data)
                
                . sa `"${mydir}/test.dta"', replace
                file /users/ebooth/desktop///test.dta saved
                
                .         whatversion `"${mydir}/test.dta"'
                For file: /users/ebooth/desktop///test.dta
                Header: 118
                
                . 
                .         
                .         
                . **more than 35000 vars
                . clear
                
                . set maxvar 36000
                
                
                . set obs 100
                number of observations (_N) was 0, now 100
                
                . forval n = 1/35000 {
                  2.         g var`n' = 1
                  3.         }
                
                .         desc, sh
                
                Contains data
                  obs:           100                          
                 vars:        35,000                          
                 size:    14,000,000                          
                Sorted by: 
                     Note: Dataset has changed since last saved.
                
                . sa `"${mydir}mpfile.dta"', replace
                file /users/ebooth/desktop//mpfile.dta saved
                
                .         whatversion `"${mydir}mpfile.dta"'
                For file: /users/ebooth/desktop//mpfile.dta
                Header: 119
                
                . 
                .         
                . u var1-var10 using `"${mydir}mpfile.dta"'
                
                . sa `"${mydir}mpfile_extract.dta"'       , replace
                file /users/ebooth/desktop//mpfile_extract.dta saved
                
                .         whatversion `"${mydir}mpfile_extract.dta"'      
                For file: /users/ebooth/desktop//mpfile_extract.dta
                Header: 118
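
For anyone who wants to rerun this check from a do-file, here is a minimal sketch of the same idea in binary mode. It assumes the file begins with the `<stata_dta><header><release>` tags used by formats 117 and later, which puts the three-digit release at byte offset 28 (the same offset as in the log above). The program name dtarelease is hypothetical.

```stata
* Sketch only: read the 3-character release number from a .dta header.
* Assumes a format 117+ file that starts with <stata_dta><header><release>.
capture program drop dtarelease
program define dtarelease, rclass
    args filename
    tempname fh
    file open `fh' using `"`filename'"', read binary
    file seek `fh' 28                   // skip the 28 bytes of opening tags
    file read `fh' %3s release          // e.g. 118 or 119
    file close `fh'
    display as text `"`filename': format `release'"'
    return local release `"`release'"'
end
```

Older formats (Stata 12 and earlier) use a different binary header, so this sketch would not report a meaningful value for them.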
                Eric A. Booth | Senior Research Statistician | FH LLC | Austin TX
                Specs: Stata 16 MP (4 core) Mac OSX and Windows 10 Pro


                • #23
                  Respectfully, I had different feelings about this release. Nonlinear mixed models. Thank you. Groups in generalized SEM. Thank you. Bayes prefix. Thank you. Better reporting built in. Thanks again. Anyway, it's good to get different thoughts on this release as clearly you were disappointed across some issues of workflow / ease of use.

                  Dave

                  Originally posted by Ariel Karlinsky
                  I'm sorry to say that on the face of it, this looks like a very disappointing release.
                  Most of these additions would be better characterized as similar to user add-ons than what I would expect from Stata itself. It also seems to ignore most if not all of the suggestions and requests made by avid Stata users in this very forum. I understand of course that StataCorp can't do everything and please everyone, but it seems to me that there is ignorance of what users want and wish for.
                  I would name a few broad issues that I and others have mentioned and that should have been dealt with at the software level:

                  1. Interface and results window: The coercive abbreviation of output (of long variable names etc.) - At least give users the ability to decide whether or not they wish to abbreviate output (the infamous ~).

                  2. No multi-core support in non-MP versions: Other statistical software supports multiple cores natively (multi-core machines are standard nowadays, and have been for some time). The price differentiation between MP and non-MP flavors prevents users from utilizing the speed benefits of multiple cores. A single "flavor" that utilizes multiple cores is, I think, a long time coming.

                  3. Limited number of variables - while this has increased, there have been several discussions on this very forum about how today's "big data" (or even "medium data") sets can have hundreds of thousands of variables. The current limit is not big enough for 2017, and many users (not in my field, btw) would not use Stata for this reason.

                  4. Better debugging - Being unable to even set a breakpoint in a do-file can be extremely frustrating. Debugging programs in Stata is more art than science, with the user writing nonsense code where they want the program to stop (as it will exit due to the error) just to "break" at a given point.

                  5. Incorporating general-use add-ons into vanilla Stata - User add-ons are great, but I would have expected StataCorp to work with package authors to get their packages into native Stata: packages that a large percentage of users use daily, and that even appear in the Stata FAQ, such as outreg/estout, spmap, ivreg2 etc.

                  6. Working with several datasets at the same time - I understand that this would mean a major shift in Stata philosophy, but since other statistical software does this with ease, I see no reason for Stata not to have this pretty basic feature. Instead the user has to juggle multiple instances of Stata, or keep clearing and using each dataset separately.

                  7. Speed improvements - I see very little mention of "under the hood" improvements. For example, are there not still built-in Stata commands which have not yet been Mata-ized?


                  • #24
                    Another thing.

                    I noticed that the download page only allows five downloads of the software while it seems to be the default option. I am not certain this is introduced now or has been introduced at some earlier version already.

                    This struck me as inconvenient as I am not sure how often I will use this download option. Of course, I can download and archive the installation file(s), but is it necessary? It would be quite easy to fix, but I might overlook the underlying reason or overestimate the nuisance to the user.

                    --- Ben
                    ------------------------------------------
                    Ben Kriechel, Economix Research & Consulting (Munich)
                    Stata Version: 15SE
                    E: ben@kriechel.eu

                    WWW: http://www.kriechel.eu/


                    • #25
                      Originally posted by Ben Kriechel
                      I noticed that the download page only allows five downloads of the software while it seems to be the default option. I am not certain this is introduced now or has been introduced at some earlier version already.

                      This struck me as inconvenient as I am not sure how often I will use this download option. Of course, I can download and archive the installation file(s), but is it necessary? It would be quite easy to fix, but I might overlook the underlying reason or overestimate the nuisance to the user.
                      This isn't a new limit. The idea is that if you accidentally leave your download information where someone else sees it, an entire class won't accidentally download your copy. If you run into an issue with the limit though, just email us and our sales group is happy to reset it.


                      • #26
                        I'm very interested in the markdown module to create html and docx. Do I need a full upgrade just for this feature? Thanks.


                        • #27
                          Originally posted by Emmanuel Segui
                          I'm very interested in the markdown module to create html and docx. Do I need a full upgrade just for this feature? Thanks.
                          If you do -findit markdown-, various user-written routines show up. I don't know how they compare with what is in Stata 15.
                          -------------------------------------------
                          Richard Williams, Notre Dame Dept of Sociology
                          Stata Version: 16.1MP (2 processor)

                          EMAIL: rwilliam@ND.Edu
                          WWW: https://www3.nd.edu/~rwilliam


                          • #28
                            Is it possible in Stata 15 to import data in the Persian language? It is not working very well in Stata 14.


                            • #29
                              Assuming that you're talking about a delimited text file, import delimited in Stata 14 should be able to import a file containing Persian text if it is encoded in one of the encodings that Java supports. See help import delimited for more information about which encodings are available.


                              • #30
                                Mohammad Shoaib could you please elaborate on the import/export of the Persian language:
                                It is not working very well in Stata 14
                                Thank you, Sergiy
