Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Building URL queries in import delimited

    I am looking to access an API by building a URL string with queries. The URL string works and will download the data I am looking for directly from a browser. However, when trying to implement it using the import delimited function, I receive an error "server refused to send file could not open url". Below is my code.

    Code:
    import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv?$where=report_date >= '2022-06-27T00:00:00.000'"

  • #2
    I don't think import delim can do that

    Comment


    • #3
      Notice that your browser will convert the expression to a URL before downloading the file. Does the browser generated URL work?

      Code:
      import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv?$where=report_date%20%3E=%20%272022-06-27T00:00:00.000%27"
      I can't test myself for reasons too tedious to go into, but if the above doesn't work see post #2.

      Comment


      • #4
        If the above URL works and you want a general, programable way to access the api, then you will need to dynamically generate the appropriate URL strings yourself. This is definitely more work, but far from impossible. After a quick google search, see https://www.stata.com/manuals/m-5urlencode.pdf

        Comment


        • #5
          This was fun - I knew what the problem was, but it took a while for me to realize that the easiest solution lies outside Stata.

          The problem is getting the dollar sign to be delivered to data.ct.gov without Stata treating it as a request for global macro substitution.

          Since the URL is in fact interpreted by the recipient server as an HTML string, a straightforward approach is to replace the literal dollar sign — $ — with the HTML code for the dollar sign — $ — as in the following example. Luckily neither & nor # nor ; have special meanings to Stata when they appear in a string.
          Code:
          . import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv?$where=report_date >= '2022-06-27T00:00:00.000'"
          (encoding automatically selected: ISO-8859-1)
          (94 vars, 8 obs)
          
          . describe, fullnames
          
          Contains data
           Observations:             8                  
              Variables:            94                  
          ------------------------------------------------------------------------------------------------
          Variable      Storage   Display    Value
              name         type    format    label      Variable label
          ------------------------------------------------------------------------------------------------
          report_date     str23   %23s                  
          cumulative_cases
                          long    %12.0g                
          cumulative_cases_confirmed
                          long    %12.0g                
          cumulative_cases_probable
                          long    %12.0g                
          cumulative_tests_reportable
                          long    %12.0g                
          cumulative_deaths
                          int     %8.0g                 
          cases_7days     int     %8.0g                 
          confirmed_cases_7days
                          int     %8.0g                 
          probable_cases_7days
                          int     %8.0g                 
          tests_reportable_7days
                          long    %12.0g                
          newly_reported_deaths
                          byte    %8.0g                 
          positive_naat_7days
                          int     %8.0g                 
          tests_naat_7days
                          long    %12.0g                
          naat_positivity_7days
                          float   %9.0g                 
          cumulative_positive_naat
                          long    %12.0g                
          cumulative_tests_naat
                          long    %12.0g                
          positive_ag_7days
                          int     %8.0g                 
          census_today    int     %8.0g                 
          fullyvax_today  int     %8.0g                 
          partialvax_today
                          byte    %8.0g                 
          nonvax_today    byte    %8.0g                 
          adddose_today   byte    %8.0g                 
          census_7days_ago
                          int     %8.0g                 
          census_change   byte    %8.0g                 
          census_not_fully_vax
                          byte    %8.0g                 
          census_pct_not_fully_vax
                          float   %9.0g                 
          cases_age_group_0_9_7days
                          int     %8.0g                 
          cases_age_group_10_19_7days
                          int     %8.0g                 
          cases_age_group_20_29_7days
                          int     %8.0g                 
          cases_age_group_30_39_7days
                          int     %8.0g                 
          cases_age_group_40_49_7days
                          int     %8.0g                 
          cases_age_group_50_59_7days
                          int     %8.0g                 
          cases_age_group_60_69_7days
                          int     %8.0g                 
          cases_age_group_70_79_7days
                          int     %8.0g                 
          cases_age_group_80_7days
                          int     %8.0g                 
          cases_age_group_unknown_7days
                          byte    %8.0g                 
          cumulative_cases_agegroup
                          long    %12.0g                
          cumulative_cases_agegroup_1
                          long    %12.0g                
          cumulative_cases_agegroup_2
                          long    %12.0g                
          cumulative_cases_agegroup_3
                          long    %12.0g                
          cumulative_cases_agegroup_4
                          long    %12.0g                
          cumulative_cases_agegroup_5
                          long    %12.0g                
          cumulative_cases_agegroup_6
                          long    %12.0g                
          cumulative_cases_agegroup_7
                          long    %12.0g                
          cumulative_cases_agegroup_8
                          int     %8.0g                 
          cumulative_cases_agegroup_9
                          int     %8.0g                 
          cases_hisp_race_hispanic_7days
                          int     %8.0g                 
          cases_hisp_race_nh_american_indi
                          byte    %8.0g                 cases_hisp_race_nh_american_indian
          cases_hisp_race_nh_asian_or_paci
                          int     %8.0g                 cases_hisp_race_nh_asian_or_pacific
          cases_hisp_race_nh_black_7days
                          int     %8.0g                 
          cases_hisp_race_nh_multiracial
                          byte    %8.0g                 
          cases_hisp_race_nh_white_7days
                          int     %8.0g                 
          cases_hisp_race_unknown_7days
                          int     %8.0g                 
          cumulative_cases_hisp_race
                          long    %12.0g                
          cumulative_cases_hisp_race_1
                          int     %8.0g                 
          cumulative_cases_hisp_race_2
                          int     %8.0g                 
          cumulative_cases_hisp_race_3
                          long    %12.0g                
          cumulative_cases_hisp_race_4
                          int     %8.0g                 
          cumulative_cases_hisp_race_5
                          long    %12.0g                
          cumulative_cases_hisp_race_6
                          long    %12.0g                
          cases_gender_female_7days
                          int     %8.0g                 
          cases_gender_male_7days
                          int     %8.0g                 
          cases_gender_other_7days
                          byte    %8.0g                 
          cases_gender_unknown_7days
                          byte    %8.0g                 
          cumulative_cases_gender_female
                          long    %12.0g                
          cumulative_cases_gender_male
                          long    %12.0g                
          cumulative_cases_gender_other
                          int     %8.0g                 
          cumulative_cases_gender
                          int     %8.0g                 
          cumulative_cases_gender_na
                          int     %8.0g                 
          case_rate_weekly
                          float   %9.0g                 
          state_population
                          long    %12.0g                
          cumulative_deaths_agegroup
                          byte    %8.0g                 
          cumulative_deaths_agegroup_1
                          byte    %8.0g                 
          cumulative_deaths_agegroup_2
                          byte    %8.0g                 
          cumulative_deaths_agegroup_3
                          int     %8.0g                 
          cumulative_deaths_agegroup_4
                          int     %8.0g                 
          cumulative_deaths_agegroup_5
                          int     %8.0g                 
          cumulative_deaths_agegroup_6
                          int     %8.0g                 
          cumulative_deaths_agegroup_7
                          int     %8.0g                 
          cumulative_deaths_agegroup_8
                          int     %8.0g                 
          cumulative_deaths_agegroup_9
                          byte    %8.0g                 
          cumulative_deaths_hisp_race
                          int     %8.0g                 
          cumulative_deaths_hisp_race_1
                          byte    %8.0g                 
          cumulative_deaths_hisp_race_2
                          int     %8.0g                 
          cumulative_deaths_hisp_race_3
                          int     %8.0g                 
          cumulative_deaths_hisp_race_4
                          int     %8.0g                 
          cumulative_deaths_hisp_race_5
                          int     %8.0g                 
          cumulative_deaths_hisp_race_6
                          int     %8.0g                 
          cumulative_deaths_gender
                          int     %8.0g                 
          cumulative_deaths_gender_1
                          int     %8.0g                 
          cumulative_deaths_gender_2
                          byte    %8.0g                 
          cumulative_deaths_gender_3
                          byte    %8.0g                 
          fips            byte    %8.0g                 
          data_updated    str23   %23s                  
          ------------------------------------------------------------------------------------------------
          Sorted by: 
               Note: Dataset has changed since last saved.
          
          .

          Comment


          • #6
            This is a nice trick, and I frankly wish it were better documented, but when I do
            Code:
            import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv", clear
            
            br
            I believe I get the same results, no?

            Comment


            • #7
              Very nice, William Lisowski.

              What I find strange is that you cannot do this simply by using \$ within the string representing the URL. Stata does ordinarily interpret \$ as a literal dollar sign:
              Code:
              . display `"$S_ADO"'
              BASE;SITE;.;PERSONAL;PLUS;OLDPLACE
              
              . display `"\$S_ADO"'
              $S_ADO
              But for some reason, when you try it in this context, it does not work:
              Code:
              . import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv?$where=report_date >= '2022-06-27T00:00:00.000'"
              (encoding automatically selected: ISO-8859-1)
              (94 vars, 8 obs)
              
              . clear
              
              . import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv?\$where=report_date >= '2022-06-27T00:00:00.000'"
              server refused to send file
              could not open url
              r(603);
              I'm surprised that this doesn't work. It seems inconsistent with the way that escape sequences are normally handled in Stata strings.
              Last edited by Clyde Schechter; 29 Jun 2022, 11:33.

              Comment


              • #8
                Jared Greathouse I suspect that file will not be filtered by dates. This is what the where clause in the URL does.

                Comment


                • #9
                  Thank you all, however the solution proposed by William Lisowski is still incorrect. Looking at the data, after passing the $ into the URL string all 8 observations are imported. The goal is to only import data after 6/26/2022 or just data or 6/27/2022 and 6/28/2022. Clyde Schechter I also tried to use the \$ however for some reason it does not work.

                  Comment


                  • #10
                    The browser will also fail to download the filtered results when &#36 is included in the string rather than $.

                    Comment


                    • #11
                      Is it possible that the escape character actually works as expected, but some other part of the query is malformed? What about this?

                      Code:
                       import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv?\$where=report_date>=%272022-06-27T00:00:00.000%27"
                      Edit: Again, I'm sorry. Although I know it is best practice, I still can't test this myself.

                      Comment


                      • #12
                        Ah, well, I was so glad to get the URL to be recognized that I failed to notice that the query portion of the URL had been ignored, rather than tried and failed. I'm guessing the problem is at data.ct.gov - perhaps the portion of the URL following the ? is not in fact translated from encoded HTML, although I do not understand why the $ did not throw an error.

                        As to not being able to escape the dollar sign with a backslash, my guess is that import_delimited.ado is a back-and-forth between Stata and Mata and calls from Mata back to Stata and somewhere along the way the URL is being parsed by Stata enough times that the bare dollar sign that had been escaped with the backslash is parsed again and treated as request for global macro substitution. My attempts to do devious things using `macval()' also did not succeed, as also when I simplified the task to
                        Code:
                        tempfile foo
                        copy "https://data.ct.gov/resource/qmgw-5kp6.csv?\$where=report_date >= '2022-06-27T00:00:00.000'" "`foo'"
                        Post #11 came in while I was writing this; I tried it and it failed. We can simplify the query to
                        Code:
                        import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv?$where=cumulative_cases>=826779", clear
                        to return the same two observations, and it performs similarly, which suggests the problem is not in the characters following $where= in the query.



                        Comment

                        Working...
                        X