
  • Best way to store and call large data file from shared folder

    Greetings Statalist. My question is as follows: I am working on a very large public dataset with a colleague, and for a variety of reasons we are using Dropbox as our shared folder for code and data storage.

    However, the full dta file is so large that it uses nearly all of our available Dropbox storage, and I am trying to figure out the best way to fix this. One idea I had was to go through the raw data and create a second, much smaller dta file with only the variables I know we want, then zip the raw file so it uses less space (or delete it altogether, since it is fairly simple to re-download).

    Another idea was to zip the file and then load the data directly from the file inside the zipped folder. Is this possible?

    I am very open to other solutions, as well.
    Last edited by Todd Motiwalla; 23 Jan 2023, 08:37.

  • #2
    A third alternative: each person who needs access to the full dta file keeps a read-only copy of it on their local drive. Only the derived datasets - for example the dataset with the variables you "know we want" (I can almost guarantee that you will later find at least one more variable that you want) - are written to Dropbox.
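
    A minimal sketch of that setup, assuming a local copy of the full file kept outside Dropbox. The paths, file names, and variable names (id, year, income) are hypothetical placeholders, not taken from the actual project:

    ```stata
    * Hypothetical locations: adjust both to your own machine
    * rawdata = read-only local copy of the full file, outside Dropbox
    * shared  = the Dropbox folder both collaborators sync
    local rawdata "C:/data_local/full_dataset.dta"
    local shared  "C:/Users/me/Dropbox/project/derived"

    * Load only the variables needed, not the whole file
    * (id, year, income are placeholder variable names)
    use id year income using "`rawdata'", clear

    * Reduce storage types where this loses no information
    compress

    * Write only the derived dataset to the shared folder
    save "`shared'/analysis_subset.dta", replace
    ```

    When another variable turns out to be needed, adding it to the varlist on the use line and rerunning the do-file regenerates the derived dataset.
    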

    But I will warn you: Statalist has plenty of posts from people who have had difficulties using datasets on network drives and on cloud storage. With cloud storage, if your program writes a dataset to the synced folder and then immediately tries to reread it (say, in a merge), there can be timing problems, because the upload to the cloud (which is really a copy of your local file) may not have finished before the read starts.

    So in your work, I would advise writing datasets to local storage - or to tempfiles - and only write the datasets to be shared to the cloud as the last step in the program.
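
    As a rough sketch of that workflow, assuming hypothetical paths, file names, and variable names, and assuming id uniquely identifies observations in the lookup file:

    ```stata
    * Hypothetical locations: local working folder and shared Dropbox folder
    local localdir "C:/data_local"
    local shared   "C:/Users/me/Dropbox/project/derived"

    * Intermediate results go to a tempfile on the local disk
    tempfile lookup
    use id region using "`localdir'/regions.dta", clear
    save "`lookup'"

    * The merge rereads the tempfile from local disk, never from the cloud
    use id year income using "`localdir'/full_dataset.dta", clear
    merge m:1 id using "`lookup'", keep(match master) nogenerate

    * Last step: write only the dataset to be shared to Dropbox
    save "`shared'/analysis_merged.dta", replace
    ```

    Stata erases tempfiles automatically when the do-file finishes, so nothing intermediate is left behind for Dropbox to sync.
    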

    Perhaps others here will share their experience on using cloud storage.



    • #3
      William Lisowski, yes, I agree that we will certainly find additional variables we want later, which is why I am hoping for alternative solutions. Your proposed alternative is a promising one; I think my colleague and I will try to divide the work so that most of it happens locally and only converges in Dropbox in the final stages.
