[Merging datasets] The difference between dataset on disk and in memory

Yao Zhao

Join Date: Feb 2017

Posts: 226
#1

09 Nov 2018, 09:14

Merging two datasets involves adding information from a dataset on disk to a dataset in memory. The dataset in memory is known as the master dataset.

Can anyone tell me what "on disk" and "in memory" mean here?

Many thanks in advance!
Clyde Schechter

Join Date: Apr 2014

Posts: 30080
#2

09 Nov 2018, 10:20

So, I assume you understand that computers have both active memory, which holds whatever programs and information are being run and processed at any given time, and mass storage devices, which passively hold information. These mass storage devices are often physically implemented as disks (although nowadays many of them use a technology that does not have to be shaped like a disk). The term "on disk" is often used to refer to any information that is stored in the mass storage device but is not currently in active memory. When a data set is on disk, you cannot do any calculations with it until it gets "read" into active memory.
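
To make the distinction concrete, here is a minimal sketch, meant to be run as a do-file. It uses the auto example dataset that ships with Stata; the temporary file name `ondisk` is my own choice for illustration only.

```stata
* Read the shipped example dataset from disk into memory
sysuse auto, clear

* Write a copy of the data in memory to a temporary file on disk
tempfile ondisk
save `ondisk'

* Empty memory; the file on disk is untouched
clear
count

* Inspect the file on disk without reading it into memory
describe using `ondisk'

* Read the file back into memory; only now can you calculate with it
use `ondisk', clear
count
summarize price
```

The first -count- should report 0 observations, because nothing is in memory at that point even though the file still exists on disk. After -use-, the data are back in memory and -summarize- works.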

When you do a merge, there are always two data sets involved. It is a characteristic of Stata that it will only hold one data set in memory at any given time. So, one of the data sets in a merge is the one that you have been actively working with up to the time of the merge; this is the master data set. The other data set is sitting in the mass storage device (i.e., it is "on disk"), and the -merge- command will read its contents and combine them with the data set that was already in memory.
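
Here is a small sketch of such a merge, assuming you run it as a do-file. The variables id, x, and y and the temporary file name `using_data` are made up purely for illustration.

```stata
* Build a small data set and save it to disk; this will be the "using" data
clear
input id x
1 10
2 20
3 30
end
tempfile using_data
save `using_data'

* Replace the contents of memory with a different data set: the master
clear
input id y
1 100
2 200
4 400
end

* -merge- reads the file on disk and combines it with the data in memory
merge 1:1 id using `using_data'

* _merge records where each observation came from
list
```

After the merge, the variable _merge should be 3 for id 1 and 2 (found in both data sets), 1 for id 4 (master only, i.e., in memory), and 2 for id 3 (using only, i.e., on disk).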