Building URL queries in import delimited

Adam Gallaher

Join Date: Feb 2021

Posts: 41
#1

Building URL queries in import delimited

29 Jun 2022, 08:19

I am looking to access an API by building a URL string with queries. The URL string works and will download the data I am looking for directly from a browser. However, when trying to implement it using the import delimited function, I receive an error "server refused to send file could not open url". Below is my code.

Code:

import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv?$where=report_date >= '2022-06-27T00:00:00.000'"
Tags: None
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

29 Jun 2022, 09:09

I don't think import delim can do that
Comment
Daniel Schaefer

Join Date: Mar 2020

Posts: 814
#3

29 Jun 2022, 10:59

Notice that your browser will convert the expression to a URL before downloading the file. Does the browser generated URL work?

Code:

import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv?$where=report_date%20%3E=%20%272022-06-27T00:00:00.000%27"

I can't test myself for reasons too tedious to go into, but if the above doesn't work see post #2.
Comment
Daniel Schaefer

Join Date: Mar 2020

Posts: 814
#4

29 Jun 2022, 11:10

If the above URL works and you want a general, programable way to access the api, then you will need to dynamically generate the appropriate URL strings yourself. This is definitely more work, but far from impossible. After a quick google search, see https://www.stata.com/manuals/m-5urlencode.pdf
1 like
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

29 Jun 2022, 11:13

This was fun - I knew what the problem was, but it took a while for me to realize that the easiest solution lies outside Stata.

The problem is getting the dollar sign to be delivered to data.ct.gov without Stata treating it as a request for global macro substitution.

Since the URL is in fact interpreted by the recipient server as an HTML string, a straightforward approach is to replace the literal dollar sign — $ — with the HTML code for the dollar sign — $ — as in the following example. Luckily neither & nor # nor ; have special meanings to Stata when they appear in a string.

Code:

. import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv?&#36;where=report_date >= '2022-06-27T00:00:00.000'"
(encoding automatically selected: ISO-8859-1)
(94 vars, 8 obs)

. describe, fullnames

Contains data
 Observations:             8                  
    Variables:            94                  
------------------------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
------------------------------------------------------------------------------------------------
report_date     str23   %23s                  
cumulative_cases
                long    %12.0g                
cumulative_cases_confirmed
                long    %12.0g                
cumulative_cases_probable
                long    %12.0g                
cumulative_tests_reportable
                long    %12.0g                
cumulative_deaths
                int     %8.0g                 
cases_7days     int     %8.0g                 
confirmed_cases_7days
                int     %8.0g                 
probable_cases_7days
                int     %8.0g                 
tests_reportable_7days
                long    %12.0g                
newly_reported_deaths
                byte    %8.0g                 
positive_naat_7days
                int     %8.0g                 
tests_naat_7days
                long    %12.0g                
naat_positivity_7days
                float   %9.0g                 
cumulative_positive_naat
                long    %12.0g                
cumulative_tests_naat
                long    %12.0g                
positive_ag_7days
                int     %8.0g                 
census_today    int     %8.0g                 
fullyvax_today  int     %8.0g                 
partialvax_today
                byte    %8.0g                 
nonvax_today    byte    %8.0g                 
adddose_today   byte    %8.0g                 
census_7days_ago
                int     %8.0g                 
census_change   byte    %8.0g                 
census_not_fully_vax
                byte    %8.0g                 
census_pct_not_fully_vax
                float   %9.0g                 
cases_age_group_0_9_7days
                int     %8.0g                 
cases_age_group_10_19_7days
                int     %8.0g                 
cases_age_group_20_29_7days
                int     %8.0g                 
cases_age_group_30_39_7days
                int     %8.0g                 
cases_age_group_40_49_7days
                int     %8.0g                 
cases_age_group_50_59_7days
                int     %8.0g                 
cases_age_group_60_69_7days
                int     %8.0g                 
cases_age_group_70_79_7days
                int     %8.0g                 
cases_age_group_80_7days
                int     %8.0g                 
cases_age_group_unknown_7days
                byte    %8.0g                 
cumulative_cases_agegroup
                long    %12.0g                
cumulative_cases_agegroup_1
                long    %12.0g                
cumulative_cases_agegroup_2
                long    %12.0g                
cumulative_cases_agegroup_3
                long    %12.0g                
cumulative_cases_agegroup_4
                long    %12.0g                
cumulative_cases_agegroup_5
                long    %12.0g                
cumulative_cases_agegroup_6
                long    %12.0g                
cumulative_cases_agegroup_7
                long    %12.0g                
cumulative_cases_agegroup_8
                int     %8.0g                 
cumulative_cases_agegroup_9
                int     %8.0g                 
cases_hisp_race_hispanic_7days
                int     %8.0g                 
cases_hisp_race_nh_american_indi
                byte    %8.0g                 cases_hisp_race_nh_american_indian
cases_hisp_race_nh_asian_or_paci
                int     %8.0g                 cases_hisp_race_nh_asian_or_pacific
cases_hisp_race_nh_black_7days
                int     %8.0g                 
cases_hisp_race_nh_multiracial
                byte    %8.0g                 
cases_hisp_race_nh_white_7days
                int     %8.0g                 
cases_hisp_race_unknown_7days
                int     %8.0g                 
cumulative_cases_hisp_race
                long    %12.0g                
cumulative_cases_hisp_race_1
                int     %8.0g                 
cumulative_cases_hisp_race_2
                int     %8.0g                 
cumulative_cases_hisp_race_3
                long    %12.0g                
cumulative_cases_hisp_race_4
                int     %8.0g                 
cumulative_cases_hisp_race_5
                long    %12.0g                
cumulative_cases_hisp_race_6
                long    %12.0g                
cases_gender_female_7days
                int     %8.0g                 
cases_gender_male_7days
                int     %8.0g                 
cases_gender_other_7days
                byte    %8.0g                 
cases_gender_unknown_7days
                byte    %8.0g                 
cumulative_cases_gender_female
                long    %12.0g                
cumulative_cases_gender_male
                long    %12.0g                
cumulative_cases_gender_other
                int     %8.0g                 
cumulative_cases_gender
                int     %8.0g                 
cumulative_cases_gender_na
                int     %8.0g                 
case_rate_weekly
                float   %9.0g                 
state_population
                long    %12.0g                
cumulative_deaths_agegroup
                byte    %8.0g                 
cumulative_deaths_agegroup_1
                byte    %8.0g                 
cumulative_deaths_agegroup_2
                byte    %8.0g                 
cumulative_deaths_agegroup_3
                int     %8.0g                 
cumulative_deaths_agegroup_4
                int     %8.0g                 
cumulative_deaths_agegroup_5
                int     %8.0g                 
cumulative_deaths_agegroup_6
                int     %8.0g                 
cumulative_deaths_agegroup_7
                int     %8.0g                 
cumulative_deaths_agegroup_8
                int     %8.0g                 
cumulative_deaths_agegroup_9
                byte    %8.0g                 
cumulative_deaths_hisp_race
                int     %8.0g                 
cumulative_deaths_hisp_race_1
                byte    %8.0g                 
cumulative_deaths_hisp_race_2
                int     %8.0g                 
cumulative_deaths_hisp_race_3
                int     %8.0g                 
cumulative_deaths_hisp_race_4
                int     %8.0g                 
cumulative_deaths_hisp_race_5
                int     %8.0g                 
cumulative_deaths_hisp_race_6
                int     %8.0g                 
cumulative_deaths_gender
                int     %8.0g                 
cumulative_deaths_gender_1
                int     %8.0g                 
cumulative_deaths_gender_2
                byte    %8.0g                 
cumulative_deaths_gender_3
                byte    %8.0g                 
fips            byte    %8.0g                 
data_updated    str23   %23s                  
------------------------------------------------------------------------------------------------
Sorted by: 
     Note: Dataset has changed since last saved.

.

Comment

Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#6

29 Jun 2022, 11:19

This is a nice trick, and I frankly wish it were better documented, but when I do

Code:

import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv", clear br

I believe I get the same results, no?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#7

29 Jun 2022, 11:21

Very nice, William Lisowski.

What I find strange is that you cannot do this simply by using \$ within the string representing the URL. Stata does ordinarily interpret \$ as a literal dollar sign:

Code:

. display `"$S_ADO"' BASE;SITE;.;PERSONAL;PLUS;OLDPLACE . display `"\$S_ADO"' $S_ADO

But for some reason, when you try it in this context, it does not work:

Code:

. import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv?$where=report_date >= '2022-06-27T00:00:00.000'" (encoding automatically selected: ISO-8859-1) (94 vars, 8 obs) . clear . import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv?\$where=report_date >= '2022-06-27T00:00:00.000'" server refused to send file could not open url r(603);

I'm surprised that this doesn't work. It seems inconsistent with the way that escape sequences are normally handled in Stata strings.

Last edited by Clyde Schechter; 29 Jun 2022, 11:33.
Comment
Daniel Schaefer

Join Date: Mar 2020

Posts: 814
#8

29 Jun 2022, 11:24

Jared Greathouse I suspect that file will not be filtered by dates. This is what the where clause in the URL does.
Comment
Adam Gallaher

Join Date: Feb 2021

Posts: 41
#9

29 Jun 2022, 11:40

Thank you all, however the solution proposed by William Lisowski is still incorrect. Looking at the data, after passing the $ into the URL string all 8 observations are imported. The goal is to only import data after 6/26/2022 or just data or 6/27/2022 and 6/28/2022. Clyde Schechter I also tried to use the \$ however for some reason it does not work.
Comment
Daniel Schaefer

Join Date: Mar 2020

Posts: 814
#10

29 Jun 2022, 11:50

The browser will also fail to download the filtered results when &#36 is included in the string rather than $.
Comment
Daniel Schaefer

Join Date: Mar 2020

Posts: 814
#11

29 Jun 2022, 11:55

Is it possible that the escape character actually works as expected, but some other part of the query is malformed? What about this?

Code:

import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv?\$where=report_date>=%272022-06-27T00:00:00.000%27"

Edit: Again, I'm sorry. Although I know it is best practice, I still can't test this myself.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#12

29 Jun 2022, 12:37

Ah, well, I was so glad to get the URL to be recognized that I failed to notice that the query portion of the URL had been ignored, rather than tried and failed. I'm guessing the problem is at data.ct.gov - perhaps the portion of the URL following the ? is not in fact translated from encoded HTML, although I do not understand why the $ did not throw an error.

As to not being able to escape the dollar sign with a backslash, my guess is that import_delimited.ado is a back-and-forth between Stata and Mata and calls from Mata back to Stata and somewhere along the way the URL is being parsed by Stata enough times that the bare dollar sign that had been escaped with the backslash is parsed again and treated as request for global macro substitution. My attempts to do devious things using `macval()' also did not succeed, as also when I simplified the task to

Code:

tempfile foo copy "https://data.ct.gov/resource/qmgw-5kp6.csv?\$where=report_date >= '2022-06-27T00:00:00.000'" "`foo'"

Post #11 came in while I was writing this; I tried it and it failed. We can simplify the query to

Code:

import delimited "https://data.ct.gov/resource/qmgw-5kp6.csv?$where=cumulative_cases>=826779", clear

to return the same two observations, and it performs similarly, which suggests the problem is not in the characters following $where= in the query.
Comment

Announcement