
  • Huge dataset - stata not responding

    Dear Statalisters,

    I am working with a panel dataset of 328 million records on my two-year-old Dell laptop (Stata 13). I was running a simple

    bysort year industry: egen total_x = total(x)

    statement, and after a long time Stata stopped responding.

    The dataset is 6.7 GB. My question is: are there any settings I can change to reduce the chance that Stata freezes on me?

    I stopped the execution and am now trying a second approach: breaking the data into single years (my panel starts in 1975 and ends in 2012).

    Code:
    use master, clear
    forvalue i=1975/2012 {

    keep if year==`x'

    save data`x'
    }

    I then plan to rerun egen on each year's data.

    I am not sure whether this will get around the size problem. So far, Stata has not produced any results.

    If you could, please comment or suggest better ways.

    Thanks,
    Rochelle

  • #2
    You wrote:
    Code:
    use master, clear
    forvalue i=1975/2012 {
    keep if year==`x'
    save data`x'
    }
    In the forvalues command you define the macro i, but in the next two lines you refer to the macro x, which is undefined and therefore empty. Are you sure you did not get an error message like this?
    Code:
    invalid syntax
    r(198);
    Even with that fixed, you would run into another problem. After the first pass through the loop executes
    Code:
    keep if year==1975
    the dataset in memory contains only 1975, so there are no observations from 1976 left to keep when the next pass executes
    Code:
    keep if year==1976



    • #3
      How much RAM does your laptop have? Stata loads the entire dataset into memory, so the freezing may well be due to insufficient RAM. Note that the loop also needs preserve and restore to get around the problem Svend pointed out:

      Code:
      use master.dta, clear
      forvalues i = 1975/2012 {
      preserve
      keep if year == `i'
      save data`i'.dta, replace
      restore
      }
      If that loop does not work (and it might very well not, since preserve and restore must copy the full dataset on every pass), you could try saving the yearly datasets without a loop:

      Code:
      use master.dta, clear
      keep if year == 1975
      save data1975.dta, replace
      
      use master.dta, clear
      keep if year == 1976
      save data1976.dta, replace
      .
      .
      .
      Otherwise you might have to find access to a computer with more RAM than yours.
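
      One further option, offered only as a sketch: use accepts an if qualifier together with using, so each year can be read straight from disk and the full 6.7 GB file never has to sit in memory. That removes the need for preserve and restore. The file and variable names below (master.dta, year, industry, x) are taken from the posts above; listing only those three variables assumes nothing else is needed for the egen step.

      ```stata
      * Sketch: load one year at a time directly from the file on disk.
      * Only year, industry and x are read, which reduces memory use further.
      forvalues i = 1975/2012 {
          use year industry x if year == `i' using master.dta, clear
          save data`i'.dta, replace
      }
      ```

      Each pass still has to scan master.dta, so this is not necessarily fast, but memory use is bounded by the largest single year.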



      • #4
        Thank you, Svend. You are correct about my error.

        Thank you, Martin! I was able to run the forvalues loop with 2 GB of memory.
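
        For the remaining step, rerunning egen on each year's data, a minimal sketch might look like the following. It assumes the yearly files data1975.dta through data2012.dta were created as above and that each contains industry and x. Because year is constant within a file, grouping by industry alone gives the same totals as grouping by year and industry.

        ```stata
        * Sketch: compute industry totals within each yearly file.
        forvalues i = 1975/2012 {
            use data`i'.dta, clear
            * year is constant in this file, so by-industry equals by-year-industry
            bysort industry: egen total_x = total(x)
            save data`i'.dta, replace
        }
        ```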
