  • Importing large csv file into Stata

    Hi,
    I have multiple large (6 GB) csv files that I am trying to import into Stata. Is there a way to import only a sample (random rows) of the original data from a csv file into Stata?

    One option is to write a loop that reads the data in pieces with rowrange() and appends the resulting datasets, as sketched below. But is there a function or package that can do this directly?
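
    A rough sketch of what I mean (the file name, the 1,000,000-row block size, the block count, and the 1% sampling rate are all placeholders; how the header row interacts with rowrange() would also need checking):
    Code:
    clear
    tempfile building                                  // accumulates the sampled rows
    forvalues block = 1/10 {
        local first = (`block' - 1) * 1000000 + 1
        local last  = `block' * 1000000
        import delimited using "bigfile.csv", ///
            rowrange(`first':`last') varnames(1) clear
        keep if runiform() < 0.01                      // keep roughly 1% of this block
        if `block' == 1 {
            save `building', replace
        }
        else {
            append using `building'
            save `building', replace
        }
    }
    use `building', clear                              // the combined random sample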

  • #2
    The -chunky- package might help you with some of this, as it automates the breaking of a large text file into smaller chunks. Try -ssc describe chunky-. With -chunky-, you'd still need to pull samples out of the chunks it creates. You might be better off with a loop as you describe, though, since the programming effort is fairly minimal.
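
    For the sampling step, once the chunks exist as separate csv files, something along these lines would do (the file names piece_1.csv ... piece_20.csv and the 5% sampling rate are hypothetical, not what -chunky- actually produces):
    Code:
    * sample each chunk, then combine the sampled pieces
    forvalues i = 1/20 {
        import delimited using "piece_`i'.csv", varnames(1) clear
        keep if runiform() < 0.05             // keep roughly 5% of each chunk
        save sampled_`i'.dta, replace
    }
    use sampled_1.dta, clear
    forvalues i = 2/20 {
        append using sampled_`i'.dta
    }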



    • #3
      Thank you Mike.

      I solved the problem by importing the csv file into Stata on a computer with more RAM. To reduce the data so that I can use it on my PC, I did the following:
      Code:
      * halve the sample eight times, keeping roughly 1/256 of the observations
      forvalues parse = 1/8 {
          gen sample_`parse' = rnormal(0,1)    // one standard-normal draw per observation
          quietly sum sample_`parse'
          drop if sample_`parse' > r(mean)     // drop about half of the observations
          drop sample_`parse'
      }



      • #4
        I'm glad you have a workable solution. Just as a teaching point, each pass through the loop cuts the sample approximately in half: the normal distribution is symmetrical, so roughly half of the random variates fall at or below the mean and are retained. So you could have accomplished essentially the same thing more simply by:
        Code:
        keep if runiform() < 2^(-8)
        That wouldn't give you the exact same subset you got, but it would give you a subset of essentially the same size, namely 1/256th of the original.
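
        One small usage note: setting the seed beforehand makes the random subset reproducible, for example:
        Code:
        set seed 12345                  // any fixed seed; makes the draw repeatable
        keep if runiform() < 2^(-8)     // retains roughly 1/256 of the observations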



        • #5
          Continuing on this topic: are there any new ways of opening a HUGE csv file that my computer can't handle? I have Stata MP 15 and I get the following error message:

          Code:
          . import delimited D:\Vishal\synthetic_opioid_project\LAB_OUT.CSV
          op. sys. refuses to provide memory
          Stata ran out of room to track where observations are stored. Right now, Stata has 1236m bytes
          allocated to track observations. Stata requested an extra 1m bytes and the operating system said no.
          Stata is currently tracking 648019968 observations and was asked to track 648019968. You are up
          against the memory limits of this computer.
          an error occurred while writing data
          r(198);

          Any thoughts on how to open this up? My plan is to run a few simple commands on only 2 variables in the dataset, but I think I have to do this in chunks.

          thanks
          Vishal



          • #6
            Originally posted by Vishal Sharma
            any new ways of opening a HUGE csv file that my computer can't handle . . . my plan is to run a few simple commands on only 2 variables in the dataset . . .
            Have you tried the colrange() option of the import delimited command?

            If limiting to two variables still isn't enough paring, then you could use it in conjunction with the command's rowrange() option to get digestible portions.
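
            A rough sketch of combining the two (the column range 3:4, the 10-million-row blocks, the block count, and the final summarize are all placeholders; it also keeps only a random 1% from each block so the combined file stays small, and the header row's interaction with rowrange() is worth checking on a small block first):
            Code:
            clear
            tempfile kept                              // accumulates the reduced blocks
            forvalues block = 1/65 {
                local first = (`block' - 1) * 10000000 + 1
                local last  = `block' * 10000000
                import delimited using "D:\Vishal\synthetic_opioid_project\LAB_OUT.CSV", ///
                    colrange(3:4) rowrange(`first':`last') varnames(1) clear
                keep if runiform() < 0.01              // sample each block (or run your commands block by block here)
                if `block' == 1 {
                    save `kept', replace
                }
                else {
                    append using `kept'
                    save `kept', replace
                }
            }
            use `kept', clear
            summarize                                  // or whatever simple commands you need on the 2 variables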



            • #7
              I've tried rowrange() and still get the memory error message.

