
  • Error message: I/O error writing .dta file

    Hello everyone,

    I am running the "nopomatch" command, which decomposes the gender pay gap for each company in my dataset and saves it in a separate .dta file. The command is as follows: nopomatch (varlist), outcome(var) by(var) reportby(var) replace filename

    There are around 15,000 firms in my dataset and 100,000 individuals.

    After running the command for around 2000 companies, Stata gives an error:
    "I/O error writing .dta file
    Usually such I/O errors are caused by the disk or file system being full.
    r(693);"

    What I tried so far based on the answers to similar posts on this forum:
    1. Deleting temporary Stata files manually, which increased the number of companies processed by ~400
    2. I checked the storage on the drives: both have around 25 GB, which I think should be enough
    3. I tried running the command on a smaller sample with fewer variables; in this case, Stata did not give an error
    4. When I run the command with fewer variables, performance improves only marginally (e.g. by 100 companies)

    Does anyone have an idea of what could be causing this error and how I could solve it?

    Best regards,
    Natalie



  • #2
    This is just a guess. But given the stochastic nature of the place at which things fall apart, my guess is that -nopomatch- is creating output faster than the operating system is capable of dispatching it to the disk drive, resulting, eventually, in buffer overflows. The error message you are getting is not particularly informative: that's because the OS doesn't provide very specific information back to Stata about what is going on, and Stata really just knows that I/O is failing. So you get this rather unhelpful error message.

    The first thing to think about is where the output files are being written to. Are they on a remote network drive? The connections to those drives are typically quite slow, and iteratively writing to them often fails in this way. Consider creating the files on a local hard drive instead, and then later copying them to the remote drive if necessary.

    Now, I don't know much about -nopomatch-, never even heard of it before your post. I gather it is available from SSC. It looks like it only lets you specify a single output file per call, so I'm assuming you are running it in some kind of loop*, generating one output file for each firm in your data set or something like that. So if I have correctly understood what you are doing, try inserting a -sleep- command into the loop just after -nopomatch-. I would start by trying -sleep 500-. That is often a long enough respite to allow the OS to catch up. If that doesn't do the trick, try -sleep 1000- or perhaps an even longer rest period.

    *If, by chance, you are not looping but doing this under -runby-, just put the -sleep- command at the end of the program that -runby- is iterating.
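
    The loop-plus-pause pattern described above might look roughly like the sketch below. It is only an illustration: the built-in auto dataset and the variable foreign stand in for the firm data and the firm identifier, and a plain -save- stands in for the -nopomatch- call, whose exact options are not shown here.

    ```stata
    * Sketch only: -sysuse auto- and -foreign- are stand-ins for the
    * real data and firm identifier (assumptions, not from the post).
    sysuse auto, clear
    tempfile out
    levelsof foreign, local(groups)
    foreach g of local groups {
        preserve
        keep if foreign == `g'
        * Stand-in for the step that writes a .dta file;
        * the actual -nopomatch- call would go here.
        save "`out'", replace
        restore
        * Pause 500 milliseconds so the OS can catch up on pending writes.
        sleep 500
    }
    ```

    If the error persists, the same loop can be retried with -sleep 1000- or a longer pause.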



    • #3
      Clyde,

      Thank you very much for your suggestions, much appreciated!

      1. I have used a local hard drive, but unfortunately the number of firms processed improves only slightly (~20 firms).

      2. For -nopomatch-, I don't use a loop. I use the -reportby- option within -nopomatch-, which allows for decomposing the gender pay gap at the firm level, with the results stored in a single output file.

      Many thanks
      Natalie



      • #4
        Well, let me make a different suggestion then. While it is probably possible to hack -nopomatch- itself to stick a -sleep- command in some appropriate place, if I can assume that the calculations performed in each -reportby()- group are independent of those performed in other groups and do not require using any data from other groups, then you may be able to solve this problem with -runby-. Something like this:
        Code:
        capture program drop one_firm
        program define one_firm
            nopomatch (varlist), outcome(var) by(var) replace filename
            sleep 500
            exit
        end
        
        runby one_firm, by(firm)
        The idea is to run -nopomatch- just for one firm at a time and then give the OS 500 ms to catch up.

        -runby- was written by Robert Picard and me, and is available from SSC. If -runby- is telling you that it is encountering errors in processing your data, you can bring the errors to light by adding the -verbose- option to the -runby- command. That will cause Stata to show, rather than suppress, the output and error messages of program one_firm each time it runs. That should enable any debugging needed.
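
        As a minimal sketch of the -verbose- option just mentioned, the example below runs a toy program under -runby-. It assumes -runby- is already installed from SSC; the program name one_group, the auto dataset, and the grouping variable foreign are placeholders chosen for illustration, not part of the original problem.

        ```stata
        * Assumes -runby- is installed (ssc install runby).
        * one_group, auto, and foreign are illustrative stand-ins.
        sysuse auto, clear
        capture program drop one_group
        program define one_group
            * Whatever is run per group; output is hidden unless -verbose- is set.
            summarize price
            sleep 500
        end

        * -verbose- shows the output and any error messages from each by-group.
        runby one_group, by(foreign) verbose
        ```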

        As I said earlier, I don't know anything about -nopomatch-, so you may need to make modifications to the -nopomatch- command so that it is correctly configured to work with the data on a single firm in memory at one time.

