  • How to insert records (observations) into a DTA file that is NOT loaded

    Hi Gurus,

    I like persisting data in DTA files. The use ... using command gives flexibility in what gets loaded into memory and, as Stata reminds me every day, "More than 2 billion observations are allowed".

    However, I am limited to 8 GB of RAM, and when dealing with a very large file I need to insert observations without first loading the full dataset (>6 GB) into RAM.

    Note that I am NOT talking about the append command, which requires the DTA file to be loaded, but about something more like an INSERT in SQL.

    Any previous experience on this topic?


  • #2
    Likely not possible, if I understand you well.


    • #3
      Not possible, sorry. Stata must first load data into memory to make modifications to the data.


      • #4
        I'm also a little confused about what you want, or about what Stata tells you. Your data file demands that you insert more observations? I've never heard of this.


        • #5
          I did some research before asking this, and I am resigned to it not being possible.

          Jared, just to be clearer: suppose I have access to a daily weather API, for example, which only lets me retrieve yesterday's data, so I need to accumulate the data myself.
          Now imagine I have been adding this data for 50 years, so EVERY day I need to load my WHOLE .dta file into RAM just to "append" yesterday's data.
          My point here is: what is the use of Stata supporting 2B observations if there is no practical way to accumulate such an amount of data on a "normal" computer (8-16 GB of RAM)?

          Thanks anyway.



          • #6
            No, Stata is not SQL. Many statistical languages to my knowledge must load in data before changes can be made (including appending), whether it’s Stata, SAS or R. My guess is that the reason has to do with how each language writes its data file.

            I can imagine some alternatives, though none of them seems like a standard approach to me.

            1) Write your data updates to a format that can be extended without reading the whole thing into memory. CSV is an obvious choice here.

            2) Store the data in an SQL database, then import it from within Stata.

            3) Since you suggest you have time-series data, why not create datasets by some natural group? For example, one dataset per calendar year, or company, or whatever.
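
Options 1 and 3 can be combined. What follows is only a rough sketch in Python, not anything Stata provides: the helper names (append_rows, yearly_path), the weather_YYYY.csv naming, and the date/temp_c columns are all made up for illustration. Opening a file in append mode writes at the end without reading what is already there, so the daily cost does not grow with the size of the file.

```python
import csv
import os


def append_rows(path, fieldnames, rows):
    """Append rows to a CSV file without reading its existing contents."""
    # Write the header only when the file is new or empty.
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if new_file:
            writer.writeheader()
        writer.writerows(rows)


def yearly_path(directory, date_str):
    """Hypothetical naming scheme: one file per calendar year (option 3).

    Assumes date_str is an ISO date such as "2023-12-31".
    """
    return os.path.join(directory, f"weather_{date_str[:4]}.csv")
```

Each day's record would go through something like append_rows(yearly_path("data", "2023-12-31"), ["date", "temp_c"], [{"date": "2023-12-31", "temp_c": 4.5}]). On the Stata side, import delimited can then read whichever yearly file is actually needed.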

            In all cases, you will sooner or later be required to load this data into memory to work with it. But, do you really need *all* of this data all the time? You might naturally only need some subset which might be smaller and easier to use. Furthermore, supposing you can get Stata to do what you want, you still have the same problem of loading data into memory, so I do not see how such a solution is really as helpful as you imagine.
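
To make option 2 and the subset point concrete, here is a minimal sketch using SQLite through Python's standard sqlite3 module. The table name weather, its columns, and both function names are assumptions chosen for the example; any SQL database would work along the same lines. The daily insert touches only one row, and a query pulls back just the date range needed for analysis.

```python
import sqlite3


def insert_observation(db_path, date_str, temp_c):
    """Insert (or overwrite) one day's record without loading the table."""
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute(
                "CREATE TABLE IF NOT EXISTS weather "
                "(date TEXT PRIMARY KEY, temp_c REAL)"
            )
            con.execute(
                "INSERT OR REPLACE INTO weather (date, temp_c) VALUES (?, ?)",
                (date_str, temp_c),
            )
    finally:
        con.close()


def load_range(db_path, start, end):
    """Return only the rows between two ISO dates (inclusive), sorted by date."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(
            "SELECT date, temp_c FROM weather "
            "WHERE date BETWEEN ? AND ? ORDER BY date",
            (start, end),
        )
        return cur.fetchall()
    finally:
        con.close()
```

Dates are stored as ISO text (YYYY-MM-DD) so that string comparison matches chronological order. From within Stata, the usual route to such a database is odbc load with a SELECT statement, assuming an ODBC driver for the database is configured.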

            Lastly, I think you misunderstand what Stata’s capabilities are with respect to dataset size (max number of observations and variables). This is only the limit of what Stata is able to address in memory and work with, and there are no guarantees that these datasets will be useable and practical on all computer hardware. Typically the amount of RAM is the limiting factor.
