  • 12 million observations for Stata/BE

    My issue is that my code runs very slowly on just 12 million observations in Stata/BE. My laptop is a Swift 5 SF514-56T 14-inch. I would expect the code to run in a few seconds, but instead removing duplicates takes hours. I am not tech-savvy, so please advise. I would not have thought 12 million observations need splitting to run faster. I have heard that I might need to change a laptop setting or something so that the code runs faster, but again, I really do not know what to do. Thank you for any help.

  • #2
    Stata Basic Edition (BE) is, on average, slower than MP or SE. You don't show your code, so beyond that I can't comment. Welcome to Statalist.



    • #3
      My code is very simple...
      Code:
      use "${tempdata}\weekly_pattern\2018\01\01\poi", clear
      duplicates drop
      save "${tempdata}\weekly_pattern\2018\01\01\brand", replace

      The dataset contains 12 million observations and 30 variables, and it takes more than an hour to run this simple code.
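
      If plain duplicates drop stays this slow, one speed-up often suggested on Statalist is the community-contributed gtools package, whose gduplicates command is a faster, plugin-based drop-in. This is a sketch, not something the posters above tried; it assumes gtools is not yet installed:
      Code:
      * install the community-contributed gtools package (one-time setup)
      ssc install gtools
      * gduplicates mirrors the syntax of duplicates, but runs in a compiled plugin
      gduplicates drop

      gtools tends to help most on exactly this kind of task: a single pass over millions of observations.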



      • #4
        So wait. Do you expect all your variables to uniquely identify your observations? Unlikely! You have panel data. You should do
        Code:
        duplicates drop id time, force
        (note that duplicates drop with a varlist requires the force option). Either way, you have BE, and you have big-boy observations: 12 million will be slower anyway, so that's kinda par for the course.
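
        Before dropping on a subset of variables, it can be worth seeing how many duplicates there actually are, and confirming the result afterwards. A minimal sketch; the variable names id and time are just the hypothetical panel identifiers assumed in this reply, not names from the original dataset:
        Code:
        * count observations that share the same id/time pair
        duplicates report id time
        * drop surplus copies (force is required when a varlist is given)
        duplicates drop id time, force
        * verify that id and time now uniquely identify observations
        isid id time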



        • #5
          A couple of considerations outside of the version of Stata may also be bottlenecks when reading data into memory: a slow mechanical disk drive (not solid state), or reading data from a network location, meaning the file must be transferred over the network first. My hunch is that a network transfer might be involved, because 12M records is a lot, but an hour is a long time.
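
          If the .dta file does sit on a network share, one workaround is to copy it to a local folder once and read from there. A sketch using the same global macro path as post #3; the local folder C:\temp is an assumption, and note that copy needs the full filename including the .dta extension:
          Code:
          * one-time copy from the (possibly networked) location to local disk
          copy "${tempdata}\weekly_pattern\2018\01\01\poi.dta" "C:\temp\poi.dta", replace
          * subsequent reads then hit the local drive, not the network
          use "C:\temp\poi.dta", clear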
