Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • A question on insheetjson - Keep getting "Invalid column name/selector" error message when using -insheetjson-

    I am trying to get some data from the API of American Community Survey by using insheetjson. But when I run the following codes, I got the error message "Invalid column name/selector 'CBSA'. (Possible name candidates are: )"

    Code:
    gen str10 CBSA_temp = ""
    gen str200 Area = ""
    gen str10 temp = ""
    
    insheetjson CBSA_temp Area temp using "https://api.census.gov/data/2017/acs/acs1/profile?get=CBSA,NAME&for=metropolitan%20statistical%20area/micropolitan%20statistical%20area:*", aoa(1) columns("CBSA" "NAME" "metropolitan statistical area/micropolitan statistical area")
    I thought it might be that the column names of the first row from the web server were not "CBSA", "NAME", and "metropolitan statistical area/micropolitan statistical area". Thus, I used the following command to check the data

    Code:
    insheetjson CBSA_temp Area temp using "https://api.census.gov/data/2017/acs/acs1/profile?get=CBSA,NAME&for=metropolitan%20statistical%20area/micropolitan%20statistical%20area:*", showresponse
    But it show that the column names were correct, please see below

    Code:
    Response from server:
    [
     [
      "CBSA",
      "NAME",
      "metropolitan statistical area/micropolitan statistical area"
     ],
     [
      "10140",
      "Aberdeen, WA Micro Area",
      "10140"
     ],
     [
      "10180",
      "Abilene, TX Metro Area",
      "10180"
     ],
    ......
    Does anyone know why this happened?

  • #2
    Can someone help me?

    Comment


    • #3
      A guess with my eyes closed is that insheetjson would also need to download the data from the https that you are feeding to it, and Stata issues an error message for that:
      Code:
      . copy "https://api.census.gov/data/2017/acs/acs1/profile?get=CBSA,NAME&for=metropolitan%
      > 20statistical%20area/micropolitan%20statistical%20area:*" "`tmp1'"
      Received close_notify during handshake
      r(5100);
      The help file says:
      Code:
      [P]     error . . . . . . . . . . . . . . . . . . . . . . . . Return code 5100
              a Java runtime exception occurred
              This indicates a Java runtime exception occurred inside a
              Java program.  Typically exceptions should be caught by the
              Java program and handled in a graceful way.  This error
              indicates the exception was not handled by the program and
              therefore it is handle by the system.
      At least this is the behavior in Stata 15.0 for Windows (on win10). The same URL opens in Chrome fine. With this your best bet is to trace the insheetjson, confirm that it stumbles on the same problem and then contact the support.

      Best, Sergiy.

      Comment


      • #4
        I really liked Sergiy's "guess" but I don't think that's the root of the problem. On Stata 15.1 with macOS Mojave 10.14.2, I have the following experience of successfully copying the data at the URL, but still having the insheetjson fail.
        Code:
        . clear
        
        .
        . tempfile gnxl
        
        . copy "https://api.census.gov/data/2017/acs/acs1/profile?get=CBSA,NAME&for=metropolitan%20stat
        > istical%20area/micropolitan%20statistical%20area:*" "`gnxl'"
        
        . type "`gnxl'", lines(5)
        [["CBSA","NAME","metropolitan statistical area/micropolitan statistical area"],
        ["10140","Aberdeen, WA Micro Area","10140"],
        ["10180","Abilene, TX Metro Area","10180"],
        ["10300","Adrian, MI Micro Area","10300"],
        ["10380","Aguadilla-Isabela, PR Metro Area","10380"],
        .
        . gen str10 CBSA_temp = ""
        
        . gen str200 Area = ""
        
        . gen str10 temp = ""
        
        .
        . insheetjson CBSA_temp Area temp using ///
        >      "https://api.census.gov/data/2017/acs/acs1/profile?get=CBSA,NAME&for=metropolitan%20stat
        > istical%20area/micropolitan%20statistical%20area:*" ///
        >          , aoa(1) columns("CBSA" "NAME" "metropolitan statistical area/micropolitan statistic
        > al area")
        Invalid column name/selector 'CBSA'. (Possible name candidates are: )
         Empty result returned; Nothing to do.
        
        .          
        . insheetjson CBSA_temp Area temp using ///
        >      "`gnxl'" ///
        >          , aoa(1) columns("CBSA" "NAME" "metropolitan statistical area/micropolitan statistic
        > al area")
        Invalid column name/selector 'CBSA'. (Possible name candidates are: )
         Empty result returned; Nothing to do.
        Added in edit: I added an attempt to use insheetjson to read the file copied from the URL, and it failed identically to the attempt to read the data from the URL.
        Last edited by William Lisowski; 29 Jan 2019, 13:07.

        Comment


        • #5
          Originally posted by William Lisowski View Post
          I really liked Sergiy's "guess" but I don't think that's the root of the problem. On Stata 15.1 with macOS Mojave 10.14.2, I have the following experience of successfully copying the data at the URL, but still having the insheetjson fail.
          Code:
          . clear
          
          .
          . tempfile gnxl
          
          . copy "https://api.census.gov/data/2017/acs/acs1/profile?get=CBSA,NAME&for=metropolitan%20stat
          > istical%20area/micropolitan%20statistical%20area:*" "`gnxl'"
          
          . type "`gnxl'", lines(5)
          [["CBSA","NAME","metropolitan statistical area/micropolitan statistical area"],
          ["10140","Aberdeen, WA Micro Area","10140"],
          ["10180","Abilene, TX Metro Area","10180"],
          ["10300","Adrian, MI Micro Area","10300"],
          ["10380","Aguadilla-Isabela, PR Metro Area","10380"],
          .
          . gen str10 CBSA_temp = ""
          
          . gen str200 Area = ""
          
          . gen str10 temp = ""
          
          .
          . insheetjson CBSA_temp Area temp using ///
          > "https://api.census.gov/data/2017/acs/acs1/profile?get=CBSA,NAME&for=metropolitan%20stat
          > istical%20area/micropolitan%20statistical%20area:*" ///
          > , aoa(1) columns("CBSA" "NAME" "metropolitan statistical area/micropolitan statistic
          > al area")
          Invalid column name/selector 'CBSA'. (Possible name candidates are: )
          Empty result returned; Nothing to do.
          
          .
          . insheetjson CBSA_temp Area temp using ///
          > "`gnxl'" ///
          > , aoa(1) columns("CBSA" "NAME" "metropolitan statistical area/micropolitan statistic
          > al area")
          Invalid column name/selector 'CBSA'. (Possible name candidates are: )
          Empty result returned; Nothing to do.
          Added in edit: I added an attempt to use insheetjson to read the file copied from the URL, and it failed identically to the attempt to read the data from the URL.
          Thanks William. That's exactly the problem that I had encountered. Do you have a solution to this? Or do you encountered this problem before using -insheetjson- ?

          Just an add-on, I suspect that it could be -insheetjson- doesn't read the whole data. Because when I use the following command, I found one "]" is missing at the end of the results.

          Code:
           insheetjson CBSA_temp Area temp using "https://api.census.gov/data/2017/acs/acs1/profile?get=CBSA,NAME&for=metropolitan%20statistical%20area/micropolitan%20statistical%20area:*", showresponse

          Comment


          • #6
            I have no solution to this. I agree that the showresponse option does not show the final right bracket "]" in the returned data. I can confirm that the character is indeed in the data returned by the server.

            You might write to the author of insheetjson and libjson sending him a link to this disucssion and ask for his advice.

            Comment


            • #7
              This is likely to be related to this issue.

              In any case, I'd recommend (if downloading via that API works for you) to download the file into a temporary file, open it for append and write a couple of empty lines to its tail. That should not affect the JSON content, but if the trailing bracket is missing because of the above mentioned problem with the last line, this should fix it (and if confirmed the insheetjson can also be patched to do the same automatically).

              Best, Sergiy

              Comment


              • #8
                I considered that possibility and ruled it out while writing post #6.

                Comment


                • #9
                  My error mentioned in post #3 above seems to be caused by an unrelated certificates issue, as clarified by James Hassell in the neighboring thread.


                  For the task at hand the following is a crude way:
                  Code:
                  do http://www.radyakin.org/statalist/2019/area_import.do
                  Which brings in:
                  Code:
                  Contains data
                    obs:           518                          
                   vars:             3                          
                   size:        33,670                          
                  -----------------------------------------------------------------------------------------
                                storage   display    value
                  variable name   type    format     label      variable label
                  -----------------------------------------------------------------------------------------
                  cbsa            long    %12.0g                CBSA
                  name            str57   %57s                  NAME
                  metropolitans~o long    %12.0g                metropolitan statistical area/micropolitan
                                                                  statistical area
                  -----------------------------------------------------------------------------------------
                  Substitute the infpfile to refer to that API point, as I can't tweak Java on this machine I have to settle for debugging with a local cached copy.

                  Best, Sergiy

                  Comment


                  • #10


                    Code:
                    version 15.0
                    
                    clear all
                    
                    local inpfile "https://api.census.gov/data/2017/acs/acs1/profile?get=CBSA,NAME&for=metropolitan%20statistical%20area/micropolitan%20statistical%20area:*"
                    
                    tempfile tmp tmp2
                    
                    quietly {
                        copy `"`inpfile'"' `"`tmp2'"'
                        filefilter `"`tmp2'"' `"`tmp'"', from("[") to("") replace
                        filefilter `"`tmp'"' `"`tmp2'"', from("]") to("") replace
                        filefilter `"`tmp2'"' `"`tmp'"', from(",`=char(10)'") to("`=char(10)'") replace
                    }
                    
                    insheet using `"`tmp'"', comma names
                     
                    describe

                    Also, for someone who wants to know why Sergiy uses `''`, just read the section "18.3.5 Double quotes" in this pdf: https://www.stata.com/manuals13/u18.pdf#u18.3Macros
                    Last edited by Victor Smith; 31 Jan 2019, 06:29.

                    Comment


                    • #11
                      Originally posted by William Lisowski View Post
                      I have no solution to this. I agree that the showresponse option does not show the final right bracket "]" in the returned data. I can confirm that the character is indeed in the data returned by the server.

                      You might write to the author of insheetjson and libjson sending him a link to this disucssion and ask for his advice.
                      Hi William, I actually had emailed the authors before I posted the questions here. Unfortunately, no one replies, yet.

                      Comment


                      • #12
                        Victor - Thanks for having contacted the authors. Because they have not weighted in, I'll mention a possible second problem with insheetjson that I did not point out in my previous posts, hoping to allow them to discover it for themselves, rather than appear to be "piling on".

                        I ran the code with trace turned on, and it appears to me that insheetjson will not function properly when a selector has embedded blanks, as the third selector in the ACS JSON does. I ruled out the third selector being the cause of the immediate problem by using a local copy of the data that had been edited down to just the first three lines (with the second "]" added at the end for the third) and just the first two selectors, deleting the third in the first line, and the corresponding data in the two lines of data. It still failed to parse the selectors on the first line, exactly as before, and at that point I threw in the towel.

                        My take at this point is that I would build plans around using insheetjson in a production system without serious testing on representative data first. And for array-of-array tabular data, presented nicely one array on each line of input, I'd go with Sergiy's approach instead.

                        Comment


                        • #13
                          Victor Smith & William Lisowski,

                          I would say that the bigger issue here is that the structure of the JSON is invalid for the data the server is returning. The array of arrays structure that they are using is really just a bad way to represent a CSV file with arrays; it’s really bad because the column headings are really attributes to which values would be attached/assigned to return a complete object (row). Can’t copy and paste the snippet above, but will try to make it easy to illustrate.

                          Code:
                          [
                          [
                          “CBSA”,
                          ”NAME”,
                          ”metro”
                          ],
                          [ 123, here, nowheresville],
                          ...
                          ]
                          JSON arrays, like vectors in Mata, can store an arbitrary number of values of the same type. Unlike Mata, however, the array is a type unto itself, so you could probably store an array of strings followed by an array of floats, etc... basically an array is a stupid/naive container. Values in one array are not related in any way to values in others in the strict sense. The only reason why we would say there is a relationship is that we are all thinking about the payload in terms of how it would appear if we opened the CSV in MS Excel.

                          What should happen is the data should be structured as an array of objects:

                          Code:
                          [
                          {
                          “CSBA” : 123,
                          ”NAME”: “here”,
                          ”metro” : “nowheresville”
                          },
                          ...
                          ]
                          in this case the two dimensional structure of the data is a bit clearer and the attribution of values to variables is explicitly declared. I haven’t tried to use jsonio to do anything like this, but imagine it could be feasible.

                          Comment


                          • #14
                            Inspired by wbuchanan I have experimented further and found that if we replace the column selectors in the columns() option with column numbers
                            Code:
                            insheetjson CBSA_temp Area temp using "https://api.census.gov/data/2017/acs/acs1/profile?get=CBSA,NAME&for=metropolitan%20statistical%20area/micropolitan%20statistical%20area:*", aoa(1) columns(1 2 3)
                            we get what we expected.
                            Code:
                            . gen str10 CBSA_temp = ""
                            
                            . gen str200 Area = ""
                            
                            . gen str10 temp = ""
                            
                            .
                            . insheetjson CBSA_temp Area temp using "https://api.census.gov/data/2017/acs/acs1/profile?get=
                            > CBSA,NAME&for=metropolitan%20statistical%20area/micropolitan%20statistical%20area:*", aoa(1)
                            > columns(1 2 3)
                             518 observations updated/written.
                            
                            . describe
                            
                            Contains data
                              obs:           518                          
                             vars:             3                          
                             size:       113,960                          
                            -----------------------------------------------------------------------------------------------
                                          storage   display    value
                            variable name   type    format     label      variable label
                            -----------------------------------------------------------------------------------------------
                            CBSA_temp       str10   %10s                  
                            Area            str200  %200s                
                            temp            str10   %10s                  
                            -----------------------------------------------------------------------------------------------
                            Sorted by:
                                 Note: Dataset has changed since last saved.
                            
                            . list in 1/5, noobs abbreviate(10)
                            
                              +------------------------------------------------------+
                              | CBSA_temp                               Area    temp |
                              |------------------------------------------------------|
                              |     10140            Aberdeen, WA Micro Area   10140 |
                              |     10180             Abilene, TX Metro Area   10180 |
                              |     10300              Adrian, MI Micro Area   10300 |
                              |     10380   Aguadilla-Isabela, PR Metro Area   10380 |
                              |     10420               Akron, OH Metro Area   10420 |
                              +------------------------------------------------------+
                            I will freely admit that I have no justification in the output of help insheetjson for using column numbers rather than the "header keys" (their term, not mine) from the JSON feed. There is no example shown using the aoa() option, and the discussions of the aoa() option and of selectors are largely opaque to me, perhaps because I have no actual experience with JSON, I just act as if I do here on Statalist. And sometimes I get lucky.

                            Added in edit: All attempts to search the internet for a discussion of formatting a JSON feed in the "array of arrays" ("hybrid table") format failed me. I can find nothing for it that explains what the items in the first row correspond to in standard JSON terminology. Having such documentation might better explain why what I did worked. Using column numbers does avoid the potential problem I describe in post #12, so perhaps that is what aoa() is designed for.
                            Last edited by William Lisowski; 31 Jan 2019, 09:43.

                            Comment


                            • #15
                              William Lisowski
                              the reason why the column numbers are likely providing the intended behavior is that it is referencing the array indices instead of attempting to assign the value to its associated attribute name (variable). The danger with this method is that there is no true guarantee that the values in the array are ordered consistently with the attribute names. The census bureau likely works to ensure that they are consistently ordered, but there is nothing preventing the order from changing in the JSON specification. This is why I mentioned previously that the census bureau should return an array of objects. It will ensure that the values are consistently attributed to their associated attributes.

                              Comment

                              Working...
                              X