Hi,
I have a task that takes in a huge dataset (180GB after merging several files together). Loading the dataset (merging the files) is the main RAM expense, followed by housing the data while I run some analysis. I want to know whether it would be more efficient to load each subsequent file, store it as a matrix, merge the matrices, and then run my analysis in Mata, or whether the improvement in RAM usage would be minimal. Put simply: is the RAM needed to house the data as it is loaded and processed smaller when the data is kept as a matrix compared with keeping it in a data frame, or about the same?
I need the entire dataset, so I cannot subset.
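For reference, a minimal sketch of the matrix-based approach I am describing (the file names file1.dta, file2.dta, etc. are placeholders for my actual files):

```stata
mata:
X = .
for (k = 1; k <= 3; k++) {
    // load the k-th file into Stata's data area
    stata("use file" + strofreal(k) + ".dta, clear")
    // copy all observations of all numeric variables into a Mata matrix
    A = st_data(., .)
    // append below the rows accumulated so far
    X = (k == 1 ? A : X \ A)
}
end
```

Note that with this pattern each file is briefly held twice (once as the Stata dataset, once as the Mata copy from st_data()), which is part of what I am unsure about.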
Thanks!