  • Directly importing from dropbox.com

    Hi all,

    I have Dropbox Plus, both as the desktop app and online. My datasets are very large, and I cannot keep them in the app's synced folder without using up a lot of local storage. Instead, I sync them online only, so that they are stored on Dropbox's servers.
    The problem is that I now need to use those .dta files directly from dropbox.com, which is apparently not as easy as I thought. My naive approach was simply to copy and paste the link provided by dropbox.com:
    Code:
     use "https://www.dropbox.com/s/cxmbo2gsw8yuoic/pcs.dta",clear
    However, it did not work. Can someone please help me with this?

  • #2
    Hi Federico,

    What error does this line produce?


    • #3
      Hi,

      the error is the following:
      Code:
      file https://www.dropbox.com/s/cxmbo2gsw8yuoic/pcs.dta not Stata format


      • #4
        Okay, so my guess is that the link is serving you HTML for the webpage and not the file itself. Check out this piece of dropbox documentation: https://help.dropbox.com/files-folde...force-download

        Does this work?

        Code:
        use "https://www.dropbox.com/s/cxmbo2gsw8yuoic/pcs.dta?dl=1",clear
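
        For anyone who wants to build that link programmatically, here is a minimal Python sketch of the same idea: it rewrites a Dropbox share link so that its query string carries dl=1. The function name force_direct_download is made up for illustration, and the sketch only edits the URL text. It does not contact Dropbox, and it does not guarantee that -use- will accept the result.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_direct_download(share_url: str) -> str:
    """Return share_url with its query string set to request a direct download (dl=1)."""
    parts = urlsplit(share_url)
    # Keep unrelated query parameters, but drop any existing dl/raw flag.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ("dl", "raw")
    ]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(force_direct_download("https://www.dropbox.com/s/cxmbo2gsw8yuoic/pcs.dta?dl=0"))
```

        The returned string can then be pasted into the -use- line above in place of the original link.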


        • #5
          Actually, it is taking a very long time, so I forced a break. This happens even with a small .dta file.
          Last edited by Federico Nutarelli; 23 Jun 2022, 09:50.


          • #6
            Okay, are we talking about something on the order of 5 minutes, or 20 to 30 minutes of wait time before you force a break? Keep in mind that in this setup you will have to download all of the content each time you invoke the -use- command. Can you please try this instead?

            Code:
             use "https://www.dropbox.com/s/cxmbo2gsw8yuoic/pcs.dta?raw=1",clear
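
            Since every -use- on a URL downloads the whole file again, one workaround is to fetch the file once and reuse the local copy afterwards. Below is a rough Python sketch under that assumption. The names fetch_once and opener are hypothetical and not part of Stata or Dropbox, and note that the file still ends up on your disk.

```python
import os
import shutil
import urllib.request


def fetch_once(url, dest, opener=urllib.request.urlopen):
    """Download url to dest unless dest already exists; return dest.

    opener is a hypothetical hook so the download step can be swapped out,
    for example in a test; by default it is urllib.request.urlopen.
    """
    if os.path.exists(dest):
        return dest  # reuse the copy from an earlier run, no network traffic

    tmp = dest + ".part"
    with opener(url) as response, open(tmp, "wb") as out:
        # Copy in 1 MiB chunks so the whole file is never held in RAM at once.
        shutil.copyfileobj(response, out, length=1024 * 1024)
    # Rename only after a complete download, so a broken transfer is not reused.
    os.replace(tmp, dest)
    return dest
```

            After something like fetch_once("https://www.dropbox.com/s/cxmbo2gsw8yuoic/pcs.dta?dl=1", "pcs.dta") has run once, a plain use "pcs.dta", clear reads the local copy, assuming the link really does serve the file itself.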


            • #7
              Cross-posted and answered at https://stackoverflow.com/questions/...y-from-dropbox

              Please note our policy on cross-posting, which is that you should tell us about it.


              • #8
                Let me be the one to tell you that this is a Python problem. I know, I know, you may not know Python, but assuming you have Stata 17, Python will literally be your best friend in this situation, specifically the Selenium WebDriver library.

                I haven't looked at the post Nick mentioned, but if this were my problem I'd likely use Python to grab the file from online.


                Suppose we wanna download CDC data on vaccinations for COVID-19.

                Code:
                python:
                import os
                from selenium import webdriver
                from selenium.webdriver.chrome.service import Service
                from webdriver_manager.chrome import ChromeDriverManager
                from selenium.webdriver.support.ui import WebDriverWait
                from selenium.webdriver.common.by import By
                from selenium.webdriver.support import expected_conditions as EC
                
                options = webdriver.ChromeOptions()
                # Save downloads to the current working directory
                preferences = {"download.default_directory": os.getcwd(), "download.directory_upgrade": True}
                options.add_experimental_option("prefs", preferences)
                # options.add_argument("--headless")  # uncomment to run without a visible browser window
                options.add_experimental_option('excludeSwitches', ['enable-logging'])
                
                url = "https://tinyurl.com/ygxx9ede"
                
                # webdriver_manager fetches a matching ChromeDriver; Selenium 4 takes its path via Service
                driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
                
                wait = WebDriverWait(driver, 20)
                
                
                # to maximize the browser window
                driver.maximize_window()
                
                #get method to launch the URL
                driver.get(url)
                
                paths = ["#app > div > div:nth-child(2) > div > div > div.entry-header > div > div.entry-actions > div > div:nth-child(3) > button",
                "#export-flannel > section > ul > li:nth-child(1) > a"]
                
                for x in paths:
                    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, x))).click()
                
                
                end
                My Python code is not perfect, and there are Pythonistas who can run circles around me. But this works from a Stata terminal. It grabs the same data from the same place pretty much every single time. It is efficient, and it allows you to fully automate your data collection process. You'll likely need to learn how to fill in boxes and forms to log into your Dropbox, and all the other relevant stuff. But even though I'm an atheist, I swear to God you wanna learn Python, particularly if you're a young researcher like me who uses datasets from a wide variety of places and doesn't wanna manually recollect data each time you write a paper.
                Last edited by Jared Greathouse; 23 Jun 2022, 11:33.


                • #9
                  From the stack overflow thread:

                  To be clear, you can pass a URL in use, but then you need a URL that returns a Stata dataset and not instructions to a browser, the way Dropbox does. And the dataset is nevertheless downloaded to your computer when you do that, because in order for Stata to read a dataset it first needs to be on your computer. Whether you download it manually yourself first or let Stata download it to a temporary folder makes no difference to your disk space requirements.
                  TheIceBear
                  This is exactly the problem that ?dl=1 or ?raw=1 are supposed to solve. These are two slightly different implementations of a way to get the file directly rather than the HTML, and in general this is how one should programmatically download files from Dropbox. Of course, I have no idea whether or not Stata's -use- command can handle a redirect (as with ?raw=1), and OP has clearly concluded that ?dl=1 doesn't work. One downside of a high-level language like Ado is that you don't usually have low-level control of things like this. TheIceBear also makes an excellent point in the other thread when he says the data needs to be downloaded anyway. Your 7 gig file will almost certainly not fit in memory (RAM), and will likely have to be written to the disk when you download it regardless.

                  EDIT: ?dl=1 might trigger a browser command of some kind, which is why I think raw might be better. It renders the file in the browser, which is a bit of a red herring actually. When a server provides a web page or other file for rendering, it is really just allowing a client to download the object directly. The trick, of course, is handling the redirect and getting the direct URI for the file.
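
                  One way to tell whether a link hands back the dataset or a web page is to look at the first bytes of whatever comes down. The sketch below is only an illustration under one assumption: as far as I know, .dta files in format 117 and later begin with the literal tag <stata_dta>. Older formats start with a version byte instead, so this check reports them as "unknown". The name classify_payload is hypothetical.

```python
def classify_payload(head: bytes) -> str:
    """Guess what the first bytes of a download are: 'dta', 'html', or 'unknown'.

    Assumption: newer Stata files (format 117 and later) start with the
    literal tag <stata_dta>; older .dta formats are not recognised here.
    """
    if head.startswith(b"<stata_dta>"):
        return "dta"
    # HTML pages may begin with whitespace and use any letter case.
    stripped = head.lstrip().lower()
    if stripped.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

                  If the first few hundred bytes of the response classify as "html", the link is serving the preview page, which would be consistent with the "not Stata format" error earlier in the thread.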
                  Last edited by Daniel Schaefer; 23 Jun 2022, 11:59.


                  • #10
                    Jared Greathouse, it looks like your Python script loads the webpage, goes through every clickable <div> on the page, and then clicks each one? I think this is probably overkill, and it isn't really what OP wants, since it will ultimately just download the file to the filesystem anyway, right?


                    • #11
                      My code loads the webpage and clicks two buttons. It doesn't go through all the <div> items, as this indeed would be overkill.


                      And once we've clicked those two buttons, the file begins to download to the user's current working directory. Isn't that about what OP wanted? Daniel Schaefer, perhaps I've misunderstood?


                      • #12
                        Jared Greathouse, I could be wrong, but I believe OP is looking for a way to load a .dta file hosted on dropbox directly into his local RAM so that he doesn't need to store it on his hard drive. Not that it particularly matters: my guess is that OP has discovered that this isn't practical for a few reasons.

                        It's a neat script regardless. I also prefer python for crawling websites.


                        • #13
                          Nick Cox, I am sorry about cross-posting. I did not know the rule.
                          Thank you all for the replies. I actually do use Python and Selenium, but the analyses that we want to do are better performed in Stata.
                          Daniel Schaefer is right, actually. I found out that this is not possible, so in the end I am trying to keep locally only the .dta files that are strictly needed, compressing them, and storing the other ones online. However, if this also turns out to be infeasible, I will have to move to Python.

                          Thanks all for the kind replies and sorry again for cross-posting
