
  • Stata I/O error on save despite ample disk space

    I'm running Stata 15.1 MP (4-core) on macOS 10.14.5 with 32 GB of memory.

    I'm appending a series of 4 .dta files to create one large master file of survey data: 45 million observations, size 29,666 MB, memory 34,944 MB.

    I append the 4 files onto each other, which succeeds without a hitch. Then I try to save the aggregated master dta, and I repeatedly get an I/O error. The append and save calls are the only ones I'm running.

    Right now, my hard drive shows 120 GB of free space, roughly four times the size of the file I'm trying to save. I've tried quitting and restarting both Stata and the OS, but the I/O errors persist.

    Thoughts?

  • #2
    A sort of obvious question, but one I have to ask since you don't give the details of your I/O errors: are you certain you are trying to save the dataset into a directory for which you have write permission?
    Code:
    . shell ls -ld foo
    
    dr-xr-xr-x  2 lisowskiw  staff  64 Jul 11 21:58 foo/
    
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . save foo/newauto
    file foo/newauto.dta could not be opened
    r(603);



    • #3
      I wasn't aware that there were different varieties of I/O errors. Here is the full error output:

      Code:
      I/O error writing .dta file
          Usually such I/O errors are caused by the disk or file system being full.
      r(693);
      And yes, I have full permissions to the directory I'm saving to.



      • #4
        Just in case:

        1. Make sure the drive is formatted with a file system that supports large files (some, such as FAT16, are limited to 2 GB per file). A Mac probably isn't using FAT16, but other file systems have similar limitations.
        2. Windows can impose per-user disk quotas, so a drive may show free space yet still refuse the save.
        3. Isolate the issue: can you create a large file on that drive from another program? If not, Stata isn't to blame.
        4. If yes, can you write (as in file write) a large file to the drive from within Stata? A sketch follows below.
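
        Here is a minimal sketch of point 4: it writes roughly 110 MB of plain text from within Stata. The filename is hypothetical, so point it at the drive you want to test.
        Code:
        * Write ~110 MB of text to test raw I/O on the target drive.
        * "bigtest.txt" is a hypothetical name; use a path on the drive in question.
        file open fh using "bigtest.txt", write text replace
        forvalues i = 1/10000000 {
            file write fh "0123456789" _n
        }
        file close fh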


        Best, Sergiy



        • #5
          I think the next step is to gather more information from your process. Consider the following example code.
          Code:
          . use `data1', clear
          
          . describe, short
          
          Contains data from /var/folders/xr/lm5ccr996k7dspxs35yqzyt80000gp/T//S_20496.000001
            obs:        10,000                          
           vars:           100                          12 Jul 2019 09:47
           size:     4,000,000                          
          Sorted by: 
          
          . append using `data2'
          
          . describe, short
          
          Contains data from /var/folders/xr/lm5ccr996k7dspxs35yqzyt80000gp/T//S_20496.000001
            obs:        20,000                          
           vars:           200                          12 Jul 2019 09:47
           size:    16,000,000                          
          Sorted by: 
               Note: Dataset has changed since last saved.
          
          . shell df -H .
          
          Filesystem     Size   Used  Avail Capacity iused               ifree %iused  Mounted on
          /dev/disk1s1   121G    69G    48G    59% 1072149 9223372036853703658    0%   /
          
          . save data12, replace
          file data12.dta saved
          
          .
          After each addition to the combined dataset, describe, short tells us how large the current dataset is. We see that the first dataset has 10,000 observations of 100 variables. After appending the second dataset, we see 20,000 observations of 200 variables. This surprise occurred because, while both datasets have 10,000 observations of 100 variables, the first dataset has variables v1-v100 and the second has variables v101-v200. Be sure that something similar isn't happening in your case.

          Then, shell df -H . shows that 48G are indeed available on the filesystem to which the process writes. This confirms that at the time the save is attempted there is sufficient free space, after accounting for the temporary files and swap space created by the Stata task, to hold the 16,000,000 bytes of data.
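
          To see the append surprise concretely, here is a minimal sketch that reproduces it; the variable names and temporary files are illustrative.
          Code:
          * Two datasets of identical shape but with disjoint variable names.
          clear
          set obs 10000
          forvalues j = 1/100 {
              generate float v`j' = runiform()
          }
          tempfile d1
          save `d1'

          clear
          set obs 10000
          forvalues j = 101/200 {
              generate float v`j' = runiform()
          }
          append using `d1'

          * describe reports 20,000 observations of 200 variables, not 100,
          * because the variable names did not overlap and append filled the
          * gaps with missing values.
          describe, short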

          Beyond that, there's still a lot of missing information. Is the filesystem you are writing to on your local drive, or is it on a network drive somewhere? That can make a difference; in particular, on shared server filesystems administrators often apply a quota to each user's allotment to ensure that a runaway process doesn't kill the filesystem for all users.



          • #6
            I have no familiarity with macOS, but I did once run into a bizarre issue like this in a Windows environment on a virtual machine. Numerically, I seemed to have enough RAM and disk space to easily save and manipulate my large dataset, and yet certain operations would fail with an I/O error. It turned out that, in the background, Stata was saving temporary datasets to disk as part of the program being run, and the temp folder was actually on another (much smaller) hard drive, which filled up quickly and therefore caused the I/O error. On Windows, this was solved by changing a registry value to point Stata to a different TEMP directory on the larger hard drive rather than the Windows default. (The details are described in a Stata FAQ.) Perhaps something analogous is going on here?
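
            If you want to check whether something similar is happening in your case, Stata reports its temporary directory in c(tmpdir); a quick check (the df call assumes macOS or Linux):
            Code:
            * Where is Stata writing its temporary datasets?
            display c(tmpdir)

            * How much space is free on that filesystem?
            shell df -H "`c(tmpdir)'"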



            • #7
              Should you need to change the location of your macOS Stata TMPDIR directory, the following topic gives a recipe for doing so.

              https://www.statalist.org/forums/for...nment-variable
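
              In outline, the idea is to set TMPDIR to a larger volume before Stata starts, so that its temporary datasets land there. A minimal sketch, assuming Stata is launched from a Terminal session; the path and application location are hypothetical, so adjust them for your setup.
              Code:
              # Point TMPDIR at a volume with plenty of free space, then launch
              # Stata from the same shell so that it inherits the setting.
              export TMPDIR=/Volumes/BigDrive/statatmp
              mkdir -p "$TMPDIR"
              /Applications/Stata/StataMP.app/Contents/MacOS/StataMP &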

              I'll add that the possibility of losing space to temporary datasets, raised in post #6, is why I recommended the shell df -H . command in post #5. Leonardo's experience matches what others have reported: what looks like sufficient space before you launch Stata may be less so by the time it comes to write the dataset to disk.



              • #8
                Please, how can I overcome this error message in my Stata 16 SE when setting maxvar as (set maxvar 20000bytes)?
                The error message I got is (no; dataset in memory has changed since last saved).



                • #9
                  The error message is misleading. You can only set maxvar when there is no dataset in memory - before you use or import your data, or following a clear command that removes the dataset from memory.

                  With that said, the command you want is
                  Code:
                  set maxvar 20000
                  without the "bytes" that you show in your post. See the output of help set maxvar, which also documents three memory-related commands that take a b, k, m, or g suffix on the number to indicate bytes, kilobytes, megabytes, or gigabytes.
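
                  Putting the order of operations together, a minimal sketch (the filename is hypothetical):
                  Code:
                  clear                // maxvar can only be changed with no data in memory
                  set maxvar 20000
                  use mydata.dta       // load the data only after raising maxvar
                  set max_memory 48g   // a memory command that does take the b/k/m/g suffix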

