
  • Stata I/O error on save despite ample disk space

    I'm running Stata 15.1 MP (4-core) on macOS 10.14.5 with 32 GB of memory.

    I'm appending a series of 4 .dta files to create one large master file of survey data: 45 million observations, size 29,666 MB, memory 34,944 MB.

    I append the 4 files onto each other, which succeeds without a hitch. Then I try to save the aggregated master dta, and I repeatedly get an I/O error. The append and save calls are the only ones I'm running.

    Right now, my hard drive shows 120 GB of free space, roughly four times the size of the file I'm trying to save. I've tried quitting and restarting both Stata and the OS, but the I/O errors persist.

    Thoughts?

  • #2
    A sort of obvious question, but one I have to ask since you don't give the details of your I/O errors: are you certain you are trying to save the dataset into a directory for which you have write permission?
    Code:
    . shell ls -ld foo
    
    dr-xr-xr-x  2 lisowskiw  staff  64 Jul 11 21:58 foo/
    
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . save foo/newauto
    file foo/newauto.dta could not be opened
    r(603);



    • #3
      I wasn't aware that there were different varieties of I/O errors. Here is the full error output:

      Code:
      I/O error writing .dta file
          Usually such I/O errors are caused by the disk or file system being full.
      r(693);
      And yes, I have full permissions to the directory I'm saving to.



      • #4
        Just in case:

        1. Make sure the drive is formatted with a file system that supports large files (some, such as FAT16, are limited to 2 GB per file). A Mac probably isn't using FAT16, but other file systems have similar limitations.
        2. Windows can impose per-user disk quotas, so a drive may show free space yet still refuse the save.
        3. Isolate the issue: can you create a large file on that drive from another program? If not, Stata isn't to blame.
        4. If yes, can you write (as in file write) a large file to the drive from within Stata? A sketch follows below.
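
        Here is a minimal sketch of point 4: it writes roughly 110 MB of plain text from within Stata. The filename is hypothetical, so point it at the drive you want to test.
        Code:
        * Write ~110 MB of text to test raw I/O on the target drive.
        * "bigtest.txt" is a hypothetical name; use a path on the drive in question.
        file open fh using "bigtest.txt", write text replace
        forvalues i = 1/10000000 {
            file write fh "0123456789" _n
        }
        file close fh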


        Best, Sergiy



        • #5
          I think the next step is to gather more information from your process. Consider the following example code.
          Code:
          . use `data1', clear
          
          . describe, short
          
          Contains data from /var/folders/xr/lm5ccr996k7dspxs35yqzyt80000gp/T//S_20496.000001
            obs:        10,000                          
           vars:           100                          12 Jul 2019 09:47
           size:     4,000,000                          
          Sorted by: 
          
          . append using `data2'
          
          . describe, short
          
          Contains data from /var/folders/xr/lm5ccr996k7dspxs35yqzyt80000gp/T//S_20496.000001
            obs:        20,000                          
           vars:           200                          12 Jul 2019 09:47
           size:    16,000,000                          
          Sorted by: 
               Note: Dataset has changed since last saved.
          
          . shell df -H .
          
          Filesystem     Size   Used  Avail Capacity iused               ifree %iused  Mounted on
          /dev/disk1s1   121G    69G    48G    59% 1072149 9223372036853703658    0%   /
          
          . save data12, replace
          file data12.dta saved
          
          .
          After each addition to the combined dataset, describe, short tells us how large the current dataset is. We see that the first dataset has 10,000 observations of 100 variables. After appending the second dataset, we see 20,000 observations of 200 variables. This surprise occurred because, while both datasets have 10,000 observations of 100 variables, the first dataset has variables v1-v100 and the second has variables v101-v200. Be sure that something similar isn't happening in your case.

          Then, shell df -H . shows that 48G are indeed available on the filesystem to which the process writes. This confirms that at the time the save is attempted there is sufficient free space, after accounting for the temporary files and swap space created by the Stata task, to hold the 16,000,000 bytes of data.
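
          To see the append surprise concretely, here is a minimal sketch that reproduces it; the variable names and temporary files are illustrative.
          Code:
          * Two datasets of identical shape but with disjoint variable names.
          clear
          set obs 10000
          forvalues j = 1/100 {
              generate float v`j' = runiform()
          }
          tempfile d1
          save `d1'

          clear
          set obs 10000
          forvalues j = 101/200 {
              generate float v`j' = runiform()
          }
          append using `d1'

          * describe reports 20,000 observations of 200 variables, not 100,
          * because the variable names did not overlap and append filled the
          * gaps with missing values.
          describe, short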

          Beyond that, there's still a lot of missing information. Is the filesystem you are writing to on your local drive, or is it on a network drive somewhere? That can make a difference; in particular, on shared server filesystems administrators often apply a quota to each user's allotment to ensure that a runaway process doesn't kill the filesystem for all users.



          • #6
            I have no familiarity with macOS, but I did once run into a bizarre issue like this in a Windows environment on a virtual machine. Numerically, I seemed to have enough RAM and disk space to easily save and manipulate my large dataset, and yet certain operations would fail with an I/O error. It turned out that, in the background, Stata was saving temporary datasets to disk as part of the program being run, and the temp folder was actually on another (much smaller) hard drive, which filled up quickly and therefore caused the I/O error. On Windows, this was solved by changing a registry value to point Stata to a different TEMP directory on the larger hard drive rather than the Windows default. (The details are described in a Stata FAQ.) Perhaps something analogous is going on here?
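
            If you want to check whether something similar is happening in your case, Stata reports its temporary directory in c(tmpdir); a quick check (the df call assumes macOS or Linux):
            Code:
            * Where is Stata writing its temporary datasets?
            display c(tmpdir)

            * How much space is free on that filesystem?
            shell df -H "`c(tmpdir)'"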



            • #7
              Should you need to change the location of your macOS Stata TMPDIR directory, the following topic gives a recipe for doing so.

              https://www.statalist.org/forums/for...nment-variable
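
              In outline, the idea is to set TMPDIR to a larger volume before Stata starts, so that its temporary datasets land there. A minimal sketch, assuming Stata is launched from a Terminal session; the path and application location are hypothetical, so adjust them for your setup.
              Code:
              # Point TMPDIR at a volume with plenty of free space, then launch
              # Stata from the same shell so that it inherits the setting.
              export TMPDIR=/Volumes/BigDrive/statatmp
              mkdir -p "$TMPDIR"
              /Applications/Stata/StataMP.app/Contents/MacOS/StataMP &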

              I'll add that the possibility of losing space to temporary datasets, raised in post #6, is why I recommended the shell df -H . command in post #5. Leonardo's experience matches what others have reported: what looks like sufficient space before you launch Stata may be less so by the time it comes to write the dataset to disk.



              • #8
                Please, how can I overcome this error message in my Stata 16 SE when setting maxvar as (set maxvar 20000bytes)?
                The error message I got is (no; dataset in memory has changed since last saved).



                • #9
                  The error message is misleading. You can only set maxvar when there is no dataset in memory - before you use or import your data, or following a clear command that removes the dataset from memory.

                  With that said, the command you want is
                  Code:
                  set maxvar 20000
                  without the "bytes" that you show in your post. See the output of help set maxvar, which also documents three memory-related commands that take a b, k, m, or g suffix on the number to indicate bytes, kilobytes, megabytes, or gigabytes.
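
                  Putting the order of operations together, a minimal sketch (the filename is hypothetical):
                  Code:
                  clear                // maxvar can only be changed with no data in memory
                  set maxvar 20000
                  use mydata.dta       // load the data only after raising maxvar
                  set max_memory 48g   // a memory command that does take the b/k/m/g suffix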

