  • 12 million observations for Stata/BE

    My issue is that my code runs very slowly on just 12 million observations in Stata/BE. My laptop is a Swift 5 SF514-56T 14-inch. I would expect the code to run in a few seconds, but instead removing duplicates takes hours. I am not tech-savvy, so please advise. I would not have thought 12 million observations need splitting to run faster. I have heard that I might need to change a laptop setting or something so that the code runs faster, but again, I really do not know what to do. Thank you for any help.

  • #2
    Stata Basic Edition (BE) is, on average, slower than MP or SE. You don't show your code, so beyond that I can't comment. Welcome to Statalist.



    • #3
      My code is very simple...
      Code:
      use "${tempdata}\weekly_pattern\2018\01\01\poi", clear
      duplicates drop
      save "${tempdata}\weekly_pattern\2018\01\01\brand", replace

      The dataset contains 12 million observations and 30 variables, and it takes more than an hour to run this simple code.
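
      If plain duplicates drop stays this slow, one speed-up often suggested on Statalist is the community-contributed gtools package, whose gduplicates command is a faster, plugin-based drop-in. This is a sketch, not something the posters above tried; it assumes gtools is not yet installed:
      Code:
      * install the community-contributed gtools package (one-time setup)
      ssc install gtools
      * gduplicates mirrors the syntax of duplicates, but runs in a compiled plugin
      gduplicates drop

      gtools tends to help most on exactly this kind of task: a single pass over millions of observations.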



      • #4
        So wait. Do you expect all your variables to uniquely identify your observations? Unlikely! You have panel data. You should do
        Code:
        duplicates drop id time, force
        (note that duplicates drop with a varlist requires the force option). Either way, you have BE, and you have big-boy observations: 12 million will be slower anyway, so that's kinda par for the course.
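
        Before dropping on a subset of variables, it can be worth seeing how many duplicates there actually are, and confirming the result afterwards. A minimal sketch; the variable names id and time are just the hypothetical panel identifiers assumed in this reply, not names from the original dataset:
        Code:
        * count observations that share the same id/time pair
        duplicates report id time
        * drop surplus copies (force is required when a varlist is given)
        duplicates drop id time, force
        * verify that id and time now uniquely identify observations
        isid id time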



        • #5
          A couple of considerations outside of the version of Stata may also be bottlenecks when reading data into memory: a slow mechanical disk drive (not solid state), or reading data from a network location, meaning the file must be transferred over the network first. My hunch is that a network transfer might be involved, because 12M records is a lot, but an hour is a long time.
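
          If the .dta file does sit on a network share, one workaround is to copy it to a local folder once and read from there. A sketch using the same global macro path as post #3; the local folder C:\temp is an assumption, and note that copy needs the full filename including the .dta extension:
          Code:
          * one-time copy from the (possibly networked) location to local disk
          copy "${tempdata}\weekly_pattern\2018\01\01\poi.dta" "C:\temp\poi.dta", replace
          * subsequent reads then hit the local drive, not the network
          use "C:\temp\poi.dta", clear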
