  • Stata memory problem for large dataset

    I have a new laptop with 1 terabyte of memory.
    I got it for Stata.
    I am dealing with a large dataset.
    I have 8,390 observations and I want to cross them with 2,200 alternatives,
    so I will have 18,458,000 rows in the Stata editor.
    The problem is that Stata refuses to run the cross command and tells me:
    "
    op. sys. refuses to provide memory
    r(909);
    "

    I read in the Stata manual that Stata supports up to 2.14 billion observations,
    so I do not understand the problem.

    Thanks in advance for any help,
    Mina Sami

  • #2
    Stata is not balking at the size of your data set. Stata can work with that (unless each observation takes up many thousands of bytes). But there may be other things eating up your memory. If your Stata program has had to create large matrices, for example, there may not be enough left over for all that data. You can check how much memory Stata has allocated and how much it is using with the -memory- command. Remember also that Stata and everything it uses has to share memory with your operating system and any other programs and processes that are running at the time.
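
    A minimal sketch of that check, assuming Stata 12 or later (where memory is managed automatically and -query memory- lists the related settings):

    ```stata
    * report how much memory Stata is currently using and has allocated
    memory
    * list the memory settings (max_memory, segmentsize, and so on)
    query memory
    ```

    Running both before and after loading the data shows how much headroom is left.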

    • #3
      Mina,

      Clyde is correct, of course, but just to cover all the bases, please give us a description of each data set (number of variables, number of bytes, etc.) and tell us what version of Stata you are using.

      Regards,
      Joe
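
      One way to gather those details without loading either file, as a sketch (the file names observations and alternatives are placeholders, not Mina's actual files):

      ```stata
      * number of observations, number of variables, and size of each file on disk
      describe using observations, short
      describe using alternatives, short
      * Stata version, flavor, and build
      about
      ```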

      • #4
        I have a data set with 15 variables, 8,390 observations, and size 3,708,380, and I want to cross it with a dataset that contains 1 variable, 2,000 observations, and size 24,079, in order to build my conditional logit model.
        I tried set segmentsize 1g, but Stata refuses to set segmentsize above 600m. I am using Stata/SE 12.

        When I type memory in Stata with the first dataset open, I get the following results:

        Memory usage
                                         used      allocated
        data (incl. buffers)        3,708,822     35,554,432
        var. names, %fmts, ...          1,832         24,400
        overhead                    1,064,964      1,065,360
        stata matrices                      0              0
        ado files                           0              0
        saved results                       0              0
        mata matrices                       0              0
        mata functions                      0              0
        set maxvar usage            9,110,954      9,110,954
        other                           1,080          1,080
        total                      13,886,788     43,756,226
        Last edited by Mina Sami; 01 Nov 2014, 04:33.
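
        A back-of-the-envelope estimate of what the crossed data would need, as a sketch. It assumes the sizes quoted above are in bytes and that every variable keeps its current storage type after -cross-; neither assumption is confirmed in the post.

        ```stata
        * approximate bytes per observation in each file (size / observations)
        display 3708380/8390
        display 24079/2000
        * number of rows after -cross-
        display %15.0fc 8390*2000
        * approximate bytes for the crossed data: rows times combined width
        display %15.0fc 8390*2000*(3708380/8390 + 24079/2000)
        ```

        Under those assumptions the result is on the order of 7.6 billion bytes for the data alone, so the crossed file is far larger than either input, even though its row count is well below Stata's observation limit.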

        • #5
          Sounds like others have you covered, but I'm just trying to imagine a laptop with 1 terabyte of memory. With 4 GB DIMMs, that's about 250 slots. Maybe you can get bigger DIMMs, but still maybe 100 slots? That's a heck of a laptop! I think you mean you have a 1 TB hard drive.

          Memory doesn't seem to be an issue at first. See, for example, the output below from when I maxed out a machine with 4 GB (that's about 0.004 of a TB). I had just generated 626 random uniform variables with a million observations:
          Code:
          Memory usage
                                                      used                allocated
              ---------------------------------------------------------------------
              data                           2,504,000,000            2,986,344,448
              strLs                                      0                        0
              ---------------------------------------------------------------------
              data & strLs                   2,504,000,000            2,986,344,448
          
              ---------------------------------------------------------------------
              data & strLs                   2,504,000,000            2,986,344,448
              var. names, %fmts, ...                77,000                   86,715
              overhead                           2,130,624                2,130,720
          
              Stata matrices                             0                        0
              ado-files                              3,148                    3,148
              stored results                             0                        0
          
              Mata matrices                              0                        0
              Mata functions                             0                        0
          
              set maxvar usage                   1,366,792                1,366,792
          
              other                                  1,158                    1,158
              ---------------------------------------------------------------------
              grand total                    2,507,577,322            2,989,932,981
          Last edited by ben earnhart; 01 Nov 2014, 15:27.
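
          The post does not say how the test data were built. One way to set up something similar, as a sketch: the variable names x1 to x626 and the float storage type are assumptions, chosen because 626 four-byte variables times a million observations matches the 2,504,000,000 bytes reported above.

          ```stata
          clear
          set obs 1000000
          * 626 uniform random variables, 4 bytes each
          forvalues i = 1/626 {
              generate float x`i' = runiform()
          }
          * report used and allocated memory
          memory
          ```

          This needs roughly 3 GB of free memory, so it will fail on a machine or a Stata build that cannot provide that much.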

          • #6
            It seems to me that Ben is using a 64-bit Stata, while Mina is using a 32-bit Stata. For large datasets use 64-bit Stata.
            Best, Sergiy
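
            A quick way to check which build is running, as a sketch (c(bit) is a c-class value that reports 32 or 64):

            ```stata
            * 32 or 64, depending on the installed Stata executable
            display c(bit)
            * version, flavor, and platform of the running Stata
            about
            ```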
