How to insert records (observations) into a DTA file that is NOT loaded

Luis Pecht

Join Date: May 2017

Posts: 151
#1

26 Feb 2023, 05:14

Hi Gurus,

I like persisting data in DTA files. The use ... using command gives flexibility in loading data into memory and, as Stata reminds me every day, "More than 2 billion observations are allowed".

However, I am limited to 8 GB of RAM, and when dealing with a very large file I need to insert observations without first loading the full dataset (>6 GB) into RAM.

Note that I am NOT talking about the append command, which requires the DTA file to be loaded, but about something more like an INSERT in SQL.

Does anyone have previous experience with this?
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

26 Feb 2023, 06:27

Likely not possible, if I understand you correctly.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#3

26 Feb 2023, 06:33

Not possible, sorry. Stata must first load data into memory to make modifications to the data.
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#4

26 Feb 2023, 06:36

I'm also a little confused about what you want, or about what Stata tells you... your data file demands that you... insert more observations? I've never heard of this.
Luis Pecht

Join Date: May 2017

Posts: 151
#5

26 Feb 2023, 06:53

I did some research before asking this, and I am resigned to the fact that it is not possible.

Jared, just to be clearer: suppose I have access to a daily weather API, for example, which allows me to retrieve only yesterday's data, so I need to accumulate the data myself.
Now imagine I have been adding this data for 50 years, so EVERY day I need to load my WHOLE .dta file into RAM just to "append" yesterday's data.
My point here is: what is the use of Stata supporting 2B observations if there is no practical way to accumulate that amount of data on a "normal" computer (8-16 GB RAM)?

Thanks anyway.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#6

26 Feb 2023, 09:50

No, Stata is not SQL. To my knowledge, many statistical languages must load data into memory before changes can be made (including appending), whether it’s Stata, SAS or R. My guess is that the reason has to do with how each language writes its data file.

I can imagine some alternatives, though none of them seems ideal to me.

1) Write your data updates to a format that can be extended without reading the whole thing into memory. CSV is an obvious choice here.

2) Import the data into an SQL database, then load it from there into Stata.

3) Since you suggest you have time-series data, why not create datasets by some natural group? For example, one dataset per calendar year, or company, or whatever.
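
To make option 1 concrete, here is a minimal sketch of appending a single observation to a CSV file without reading the existing rows. It is written in Python rather than Stata, and the function name, file name, and column names (date, temp_c) are hypothetical, chosen only to match the weather example in this thread.

```python
import csv
import os


def append_observation(path, row, fieldnames):
    """Append one observation to a CSV file without reading existing rows.

    Hypothetical helper: `path`, `row`, and `fieldnames` are illustrative.
    """
    # Write the header only when the file is new or empty.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # Mode "a" positions the writer at the end of the file, so the cost of
    # adding a row does not depend on how many rows are already stored.
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

A daily job could call append_observation("weather.csv", {"date": "2023-02-25", "temp_c": 12.0}, ["date", "temp_c"]) once per day. The accumulated file can later be read into Stata with import delimited, at which point the memory limits discussed in this thread apply again.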

In all cases, you will sooner or later be required to load this data into memory to work with it. But, do you really need *all* of this data all the time? You might naturally only need some subset which might be smaller and easier to use. Furthermore, supposing you can get Stata to do what you want, you still have the same problem of loading data into memory, so I do not see how such a solution is really as helpful as you imagine.
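
As a rough illustration of the database route combined with loading only a subset, the sketch below uses Python's built-in sqlite3 module. The table name (weather), its columns, and both function names are assumptions made for this example and are not anything Stata provides.

```python
import sqlite3


def add_observation(conn, date, temp_c):
    """Insert one day's observation; existing rows are never loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather (date TEXT PRIMARY KEY, temp_c REAL)"
    )
    # INSERT OR REPLACE keeps a single row per date if the job is rerun.
    conn.execute(
        "INSERT OR REPLACE INTO weather (date, temp_c) VALUES (?, ?)",
        (date, temp_c),
    )
    conn.commit()


def load_year(conn, year):
    """Return only the rows for one calendar year, ordered by date."""
    # ISO dates (YYYY-MM-DD) sort correctly as plain text.
    cur = conn.execute(
        "SELECT date, temp_c FROM weather "
        "WHERE date >= ? AND date < ? ORDER BY date",
        (f"{year}-01-01", f"{year + 1}-01-01"),
    )
    return cur.fetchall()
```

The point of the design is that the daily insert touches one row, while the analysis step pulls only the subset it needs. From Stata, the odbc command could then read such a subset, assuming a suitable ODBC driver for the database is set up.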

Lastly, I think you misunderstand what Stata’s capabilities are with respect to dataset size (max number of observations and variables). This is only the limit of what Stata is able to address in memory and work with, and there are no guarantees that these datasets will be useable and practical on all computer hardware. Typically the amount of RAM is the limiting factor.