  • Batch download URL using copy command

    Dear statalist,

    I would like to batch download about 100.000.000 files (all .csv) from a URL using the copy command in a forval loop. However, the links are behind a username and password login, so I get the error message: "authorization required by server r(673);"

    Is there a way to work around this?

    Thank you!

    Kind regards,

    Stein

  • #2
    The only way I can think of is to use an interface to curl to build the HTTP header for the request and pass it along with the appropriate method (e.g., curl GET '...'). Another option might be to ssh into the server and copy the files that way, or to use some other file-transfer method such as FTP. I also suspect you would run into memory issues unless those files are exceptionally small.
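As a rough sketch of what building the request header could look like, the Python snippet below attaches an Authorization header to a GET request. It assumes the server accepts HTTP Basic authentication, which the thread does not confirm (a form-based login would need a different approach). The function names, URL, user name, and password are all hypothetical placeholders.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying the credentials (assumes Basic auth)."""
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", basic_auth_header(user, password))
    return request


def download(url: str, dest: str, user: str, password: str) -> None:
    """Fetch one file and write it to dest; raises on HTTP errors such as 401."""
    with urllib.request.urlopen(build_request(url, user, password)) as response:
        with open(dest, "wb") as out:
            out.write(response.read())
```

If the server does use Basic authentication, curl's `-u user:password` option sends the same header from the command line.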



    • #3
      Wget might work. You could pass the parameters from Stata directly to Wget, or use Stata to create a batch file that contains the commands. You don't say what platform you're using, but Wget works the same under Linux, OS X, and Windows.
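One possible shape for the batch-file idea, sketched in Python rather than Stata: write the URLs to a text file and hand that file to Wget. The URL pattern (`file_<i>.csv`), the function names, and the file paths are hypothetical, and the sketch assumes the server accepts the credentials Wget sends via `--user`.

```python
from pathlib import Path
from typing import List


def write_url_list(base_url: str, first: int, last: int, list_path: str) -> int:
    """Write one URL per line for file numbers first..last; return the count."""
    # The file_<i>.csv naming scheme is an assumption for illustration.
    urls = [f"{base_url}/file_{i}.csv" for i in range(first, last + 1)]
    Path(list_path).write_text("\n".join(urls) + "\n", encoding="utf-8")
    return len(urls)


def wget_command(list_path: str, user: str) -> List[str]:
    """Build a Wget invocation that reads its URLs from list_path.

    --ask-password makes Wget prompt for the password instead of
    exposing it on the command line.
    """
    return ["wget", f"--user={user}", "--ask-password", "-i", list_path]
```

The returned list can be passed to `subprocess.run`, or joined into a single line and saved in a batch file.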



      • #4
        If you really need to get "100.000.000" (I suppose that means 100 million) files, your approach will not work well even if you find a way to submit the user name and password on the fly. Half a second is considered a good average load time for a URL. At that rate, 100 million URLs will take 100000000*0.5/60/60/24 ≈ 579 days to download.
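The back-of-the-envelope estimate above can be reproduced with a few lines of Python. The `workers` parameter is a hypothetical addition that shows how the figure would change with parallel connections, under the optimistic assumption that throughput scales linearly.

```python
SECONDS_PER_DAY = 60 * 60 * 24


def download_days(n_files: int, seconds_per_file: float, workers: int = 1) -> float:
    """Estimated wall-clock days, assuming each worker fetches files back to back."""
    return n_files * seconds_per_file / workers / SECONDS_PER_DAY


print(round(download_days(100_000_000, 0.5), 1))  # 578.7
```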

