
  • codebook2, type(appx)- data storage problem

    Hi all:
    I’m using Stata 13.1. I’m trying to create a codebook for several variables with an appendix showing a list of values and frequencies, using the following command:

    . codebook2 *ethnic*, header type(appx)

    [Edit: This is a user-written command from SSC.]

    Stata goes through the standard codebook2 output, for example:
    --------------------------------------------------------------------------------
    variable name: aeethnic08
    variable label: Ethnicity, Aprenda Environment 2007-08
    type: numeric
    range: 1 to 5
    unique values: 5
    missing obs: 364904
    The list of values for this variable is in the Appendix
    --------------------------------------------------------------------------------

    But when Stata gets to the point where it should list the Appendix, I get the following error message:

    I/O error writing .dta file
    Usually such I/O errors are caused by the disk or file system being full.
    r(693);

    I am working with a large data set (about 18GB), but the data read into Stata with no problems, and I still have plenty of disk storage space (at least 36GB) and 3GB of memory remaining. I have the same problem when I try the same command for only one variable, so I don’t think it’s the number of variables that is the problem. When I try the command without the -type(appx)- option, I have no problems, and have not found this problem with any other Stata command. So there is something going wrong with the -type(appx)- part of this command. I’m guessing that Appendix writes something to the .dta file, and whatever this is uses a lot of disk storage space?

    Does anyone have any ideas about what is going wrong, and what I can do to fix this? Thanks, Holly
    Last edited by Holly Heard; 15 May 2014, 13:50. Reason: To note the source of the user-written command.

  • #2
    Advice from the FAQ (section 12)

    If you are using user-written commands, explain that and say where they came from: the Stata Journal, SSC, or other archives. This helps (often crucially) in explaining your precise problem, and it alerts readers to commands that may be interesting or useful to them.
    Here are some examples:
    I am using xtreg in Stata 12.1.
    I am using estout from SSC in Stata 12.1.



    • #3
      Hi Nick. Sorry, new user here. By "where they came from," do you mean the author? I don't see any info about an archive from the codebook2.hlp. Thanks, HH



      • #4
        "where they came from" is explained in the original, as quoted above: the Stata Journal, SSC, or other archives.

        You installed codebook2 from somewhere, or someone installed it on your system.

        help codebook2 names an author, whereas official commands do not, and official commands specify a manual entry.

        Either way, search codebook2 or (in a Stata that is not up to date) findit codebook2 finds locations, in this case SSC.

        The general point is that users reading your post might want to try it out, but be mystified because they can't find it on their system.

        All that said, the interesting question is why it doesn't work. The extra option that fails for you does require extra space to store files, and my guess is that Stata's way of using memory is biting here. codebook2 runs under version control, which may be a crucial detail, but I am speculating.






        • #5
          Holly,

          The code in codebook2 that produces the appendix includes a preserve command. This is because it uses the contract command to generate the frequencies and the contract command destroys the original data set. The preserve command needs enough room to store a temporary copy of your data set on disk. If it is crashing at that point, it is either because the disk where temp files are stored is too small or because you do not have write access to that location. You can do set trace on to see exactly where the program is failing.
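
          A minimal sketch of those checks, assuming default settings and using the variable name from post #1. c(tmpdir) reports the directory Stata uses for temporary files, which is where the copy made by preserve is written; capture noisily is used here only so that tracing is switched off again even if the command fails.

          ```stata
          * Show the directory where Stata writes temporary files
          display c(tmpdir)

          * Echo each program line as it runs to see where the failure occurs
          set trace on
          capture noisily codebook2 aeethnic08, header type(appx)
          set trace off
          ```

          If that directory turns out to be on a small or read-only drive, Stata can be pointed at a different one before it is launched; on Windows, for instance, this is done by defining the STATATMP environment variable.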

          If it turns out the problem is one of space, you might want to contact the author. S/he probably didn't anticipate its use on huge data sets and there are probably other ways to get those frequencies without requiring a preserve.
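
          As one possible workaround in the meantime (this is not what codebook2 does, only a sketch), tabulate lists values and frequencies without changing the data in memory, so it needs no preserve. The *ethnic* pattern is taken from post #1.

          ```stata
          * List values and frequencies, including missing values, for each
          * matching variable; the data in memory are left untouched
          foreach v of varlist *ethnic* {
              tabulate `v', missing
          }
          ```

          This gives the frequency listing only, not the formatted appendix that type(appx) would produce.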

          Regards,
          Joe



          • #6
            Hi All,

            [Stata 13.1]
            Data Size: 18.26 GB
            RAM in my PC: 64 GB
            Hard Disk Free Space: 1.4 TB
            I also have a similar problem with the -collapse- command. I'm trying to collapse daily stock returns from CRSP.

            collapse (first) g_retC g_retF g_retxC g_retxF SD_C SD_F SD_Cx SD_Fx, by(permno cusip8 fyear)

            And this is the entire process after -set trace on- in Stata:

            ----------------------------------------------------------------------------------------------------------------------------------- begin collapse ---
            - version 8, missing
            - if _caller() < 5 {
            collaps4 `0'
            exit
            }
            - syntax anything(name=clist id=clist equalok) [if] [in] [aw fw iw pw] [, BY(varlist) CW FAST CLABEL GRAPHBAR(name) ]
            - if ("`fast'"=="") preserve
            = if (""=="") preserve
            I/O error writing .dta file
            Usually such I/O errors are caused by the disk or file system being full.
            ------------------------------------------------------------------------------------------------------------------------------------- end collapse ---

            I have run -set max_memory 65g-, -set segmentsize 32g-, and -set matsize 11000-; basically, I have set all the memory settings to the maximum, even though I know some of them are not relevant to -collapse-. I assume -collapse- is an official Stata command and, as Joe Canner explains above, the failure has something to do with the -preserve- command. Is there a way to increase the memory available to Stata for -preserve-?

            Thanks a lot!

            Regards,
            Nampuna



            • #7
              If you don't get attention that solves your problem, I would re-post under a new thread, as your question has nothing to do with codebook2.

              As the code you cite implies, a preserve is avoided by specifying the fast option to collapse. I can't promise that will solve your problem. You should ensure that your original dataset is saved somewhere.
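
              A sketch of that suggestion, using the collapse call from #6. The backup file name is hypothetical, and the save step assumes there is room on disk for another copy of the data.

              ```stata
              * Keep a copy of the original data on disk first
              * (crsp_daily_backup.dta is a hypothetical file name)
              save "crsp_daily_backup.dta", replace

              * fast skips the preserve step shown in the trace above, so no
              * temporary copy is written; if collapse is interrupted, the
              * original data in memory are not restored
              collapse (first) g_retC g_retF g_retxC g_retxF SD_C SD_F SD_Cx SD_Fx, by(permno cusip8 fyear) fast
              ```

              Whether this avoids the r(693) error depends on preserve being the only step that writes to disk, which the trace suggests but does not prove.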

