Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • ODBC tables not loading

    I'm having trouble loading datatables through an odbc connection. I haven't used Stata's odbc functionality much, so not sure if my issue is with how I'm using Stata or with my database server. I have successfully loaded a couple of smallish tables, but when I go to load large tables (~30million x 100) it gets infinitely stuck in loading mode, even if I limit the pull to a handful of rows. For example, the below code should pull the first five rows of the data table (there are about 100 columns). I've let it run for up to 12 hours (with an onsite LAN connection to the server) without any success. I'm guessing it's still running through all 30million rows to find _n<=5, but not sure why that would be. I'm using Stata 17 MP on a Windows 10 machine with a high-end processor, 64MBs of RAM, and ample storage. Loading similar sized data using other methods is usually pretty quick. Is there a better way to get this odbc data loaded? All suggestions appreciated.

    Code:
    odbc load in 1/5, table("databse.table") clear noquote

  • #2
    I would try something like this instead:

    Code:
     
     odbc load, exec("SELECT TOP 5 * FROM databse.table;") clear dsn("YOUR_DSN_HERE")
    I believe the issue is that your code grabs the whole table from the DB but only loads the first 5 observations into Stata. The code above only grabs the top 5 rows.

    Comment


    • #3
      Thanks Dimitriy your suggestion works (with a slight modification to conform to Oracle 12c syntax). I can pull the first few rows very quickly. But it's still getting stuck in load mode when I try to pull any meaningful number of rows. My back up plan is to just pull a million rows at a time, then eventually append them all, but at the rate I'm seeing, that's looking like a month-long endeavor. I recognize that this might just be an issue of server limitations and there may be nothing I can do about it; nonetheless, if you have any advice on how to speed up the process, I would be most grateful.

      For posterity's sake, here is Dimitriy's code (witch is in SQL Server / MS Access syntax) modified for Oracle 12c syntax:

      Code:
      odbc load, exec("SELECT * FROM database.table ORDER BY row_id  FETCH FIRST 5 ROWS ONLY;") clear dsn("your_dsn") noquote

      Comment


      • #4
        My other advice is to make sure you have the most recent version of the odbc driver and manager or try jdbc.

        Sometimes doing this when the DB is not very busy can help.

        Another approach is to try to parallelize the loading by month.

        Comment


        • #5
          Here is an example of the last suggestion.

          Comment


          • #6
            Good suggestion; running it parallel is working nicely. Thanks Dimitriy, I appreciate your help.

            Comment

            Working...
            X