  • Problem copying files from URL

    In order to make my do-files more easily replicable, I'm trying to include direct links to datasets downloaded from the Internet by using the copy command. I'm running into a problem when URLs don't actually include the file itself.

    Consider the following: I go here and click the "SystemMembership2011.zip" link at the end of the page. This downloads a working .zip file. I right-click on the hyperlink and copy the link URL into my do file and use the copy command:
    Code:
    copy "http://www.correlatesofwar.org/data-sets/state-system-membership/state-system-membership-zip/at_download/file" "COWsystem.zip"
    This adds a text file to the directory, but trying to open even that produces an error dialogue box.

    Harvard's Dataverse also presents this problem. I go here and right-click on the "Download" button next to "Dyadic13undirected.dta." I copy that link URL and use the copy command:
    Code:
    copy "https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/12379#" "UNdata.dta"
    This too downloads what looks like a Stata data file to the directory, but trying to open it results in the following error: "file ~/UNdata.dta not Stata format." I've gotten in touch with Dataverse's technical support, and there are plans to include permalinks directly to datasets in the future.

    I want to know if there's any way to direct Stata to the file when the link doesn't point directly to it, or—more generally and less Stata-specific—if anyone knows how to find a permalink to a file when a hyperlink doesn't point directly to it.
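
A quick way to see what the server actually returned is to look at the first bytes of each saved file. The sketch below assumes both copy commands above have already run and that the files are in the working directory; it relies on Stata's hexdump command with its to() option.

```stata
* Sketch: dump the first 64 bytes of each download to see what was really saved.
* The file names are the ones used in the copy commands above.
hexdump "COWsystem.zip", to(63)
hexdump "UNdata.dta", to(63)

* How to read the character column of the output:
*   "PK"                        -> a zip archive
*   "<stata_dta>"               -> a Stata dataset in one of the newer .dta formats
*   "<!DOCTYPE html" or "<html" -> a web page was saved, not the data file
```

If the dump shows HTML markup, the link points to a landing page rather than to the file, which would explain the "not Stata format" error.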

  • #2
    Jeremy, try this for Dyadic13undirected.dta:
    copy "https://dataverse.harvard.edu/api/access/datafile/2709627" "c:\temp\UNdata2222.dta"
    Best, Sergiy Radyakin

    • #3
      Sergiy's suggestion worked, but I'd like to know how he derived that URL so I can replicate for other sites (like the Correlates of War example I also included). I did get an "r(610): .dta too modern" message for the Dataverse file, but I'm running v13.1.

      • #4
        Jeremy, you are combining several different problems into one question.

        1) Stata 13.1 is not the most recent version of Stata. Stata 14 with updates from 21dec2015 is the most recent version. You are getting an error 610 because of this. The UNData file is saved according to specification 118 which is supported only in Stata 14+. Sadly some data providers assume that if they have the software to produce a data file then users have adequate software to read their data.

        2) Downloading the state-system-membership file gives me a zip file. I am not sure how you determined that it is a text file, but I receive a zip file that is 250,829 bytes long. You need to unzip that file before importing the data into Stata. You can use external tools or Stata's built-in commands.

        3) Deciphering the links is, in general, not possible. You are getting a server response, and it is up to the server to decide what to return to you. It may return different content based on the time of day, your location, language, previous browsing history, or simply at random. You will need to contact the site owner to check whether there is a plain URL for the data resources you need, or hire a specialist to examine the particular resources you are interested in.
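
For point 2, a minimal sketch of the built-in route, assuming the Correlates of War link still returns the zip archive. The names of the files inside the archive are not given in this thread, so the sketch only lists the directory after unpacking instead of guessing which file to import.

```stata
* Sketch: download the archive, unpack it into the working directory, and list the result.
copy "http://www.correlatesofwar.org/data-sets/state-system-membership/state-system-membership-zip/at_download/file" "COWsystem.zip", replace
unzipfile "COWsystem.zip", replace

* Inspect what was extracted before choosing a file to import.
dir
```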

        Best regards, Sergiy Radyakin

        • #5
          To request files from a server, I usually use Stata's "shell" command to call the "wget" command or Python's "requests" library. Note that you need to install that software beforehand.
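
A sketch of the wget route, assuming wget is installed and on the system path (it does not ship with Stata). The URL and output file name are taken from the first post.

```stata
* Assumption: wget is installed and can be found by the operating system's shell.
* -O sets the name of the file that wget writes.
shell wget -O "COWsystem.zip" "http://www.correlatesofwar.org/data-sets/state-system-membership/state-system-membership-zip/at_download/file"

* Stop the do-file with an error if the download did not produce a file.
confirm file "COWsystem.zip"
```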

          • #6
            Re: Sergiy
            (1) Yes, I understand this point.
            (2) Now I'm receiving a .zip file. I'm not sure what was happening before, but that problem seems to have resolved itself.
            (3) All I'm asking is how you got the URL "https://dataverse.harvard.edu/api/access/datafile/2709627", which is different from what I originally posted in my code above, and which I don't see anywhere on the source page. From what I've been told by Dataverse support, there are no persistent URLs for Dataverse files. So where did your URL come from? (Oh, and as soon as my department offers me the funding to hire assistants, or NSF gives me a multimillion dollar grant, I'll start hunting for all kinds of specialists.)

            Re: Jimmy
            Thanks, I'll look into those. For now, Stata's commands (along with a few shell commands to Mac's Terminal) are getting me by.

            • #7
              Hi Jeremy, for what it's worth I think Sergiy used the file id and the API link to generate the direct link in his comment. It looks like the files have changed since you originally wrote your post, but hovering my mouse over the first dataset, Dyadicdata.tab, on the Dataverse site you link to I see the url: https://dataverse.harvard.edu/file.x...SED&version=.0

              From the API documentation (http://guides.dataverse.org/en/lates...ic-file-access), I see the basic call to a file is: /api/access/datafile/$id.

              Combining this with the file id value above, I get: https://dataverse.harvard.edu/api/ac...tafile/2802729. This appears to be a valid link for me. Hope that answers that part of your question.
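
The same construction can be sketched in a do-file. The local macro names are hypothetical; the file id is the one from Sergiy's link in #2, and the base path is the /api/access/datafile/$id call described above.

```stata
* Sketch: build the direct download link from a Dataverse file id.
* fileid and url are illustrative macro names, not anything Dataverse defines.
local fileid 2709627
local url "https://dataverse.harvard.edu/api/access/datafile/`fileid'"

copy "`url'" "UNdata.dta", replace
```

Swapping in another file id should fetch a different file the same way, as long as the site keeps serving files under that path.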
