Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calling in SQL through ODBC--String variables & choosing data

    Hello Statalist ,

    I am using Stata 17 on Windows. I am new to the ODBC process and I am trying to call in a large dataset from SQL. The dataset is mostly string variables, including the year variable. I have been successful with smaller datasets--they were able to load fairly quickly and I could manipulate the variables.
    The code that works is:

    Code:
    odbc load, table("MySmallTable")
    I want choose what data is loaded based on year to make it more manageable. The code I am trying (without luck) is:

    Code:
    odbc load if year=="2015", table("MyTable) dsn("DataSource")
    Is there a way to:

    a) destring all the variables prior to loading to make the data more manageable
    b) a way to destring just one variable (in this case year) so that I could try if year>2015
    c) a way to indicate which specific values of year I want. for example if year=="2015" & "2016" & "2017"

    Thank you so much in advance!
    I am not able to share screenshots or the data due to the confidential nature of the data (accessed through a secure lab).
    Last edited by Elena Draghici; 27 Jul 2022, 14:46. Reason: ODBC SQL

  • #2
    Originally posted by Elena Draghici View Post
    The code I am trying (without luck) is:

    Code:
    odbc load if year=="2015", table("MyTable) dsn("DataSource")
    What doesn't work about it?

    It might be better to do the subsetting in the database using SQL WHERE year = '2015' predicate in the odbc load, exec() style syntax of the Stata command.

    Is there a way to:

    a) destring all the variables prior to loading to make the data more manageable
    SQL has the standard CAST() function and various relational database management system vendors have their own extensions, e.g., CONVERT().

    b) a way to destring just one variable (in this case year) so that I could try if year>2015
    The datatype conversion functions take a single argument.

    c) a way to indicate which specific values of year I want. for example if year=="2015" & "2016" & "2017"
    The SQL statement would use the OR operator or IN condition in the predicate, e.g., WHERE year IN ('2015', '2016', '2017')

    Comment


    • #3
      I think there is a corollary, implicit in the answer provided in #2, that OP would be better served with a deeper understanding of SQL (rather than Stata). Essentially, you want to talk to a SQL database. So you need to use your understanding of the structure of the underlying database to write a SQL interface that will produce the data you want. You will need a clear understanding of SQL to do that.

      Comment


      • #4
        Thank you so much for the responses. Joseph, the reason the data is not loading is because it is too big (and consists mostly of string variables which take up a lot of space). I also do not have the ability to manipulate any of the data in SQL without the Stata interface. When I attempt to load just the 2015 data, Stata still has trouble loading the data. But, the program accepted this code:

        Code:
        odbc load, exec("select * from [MyTable] where year in ('2015', '2016', '2017')") dsn("DataSOurce")
        And the data loaded! However, I would still like to understand how to get the variables destrung prior to loading into Stata (I cannot save a dataset, so it adds a layer of complexity to my situation because I would just load it using the code above, destring, and save). if you could please advise on the specific code I could use to load the data in destrung format, I would greatly appreciate it. My guess is it might look something like:

        Code:
        CONVERT(char, int)
        My question is where would that go in the code below:

        Code:
        odbc load, exec("select * from [MyTable] where year in ('2015', '2016', '2017')") dsn("DataSOurce")
        Also, since some of the variables are indeed strings (names, for instance) would I need to specify a list of the variables I want to convert or would the program automatically skip over those?

        Thank you so much for all the help!

        Comment


        • #5
          https://www.w3schools.com/sql/ is a good resource for SQL.

          I think you have to build the select statement: You have to list those cast statements and the other cols to fetch. A example of building a part of a select using mysql dialect:
          Code:
          jdbc load , exec("SELECT * FROM TABLENAME LIMIT 1")   clear
          
          local tocast Mydb_Patient_id // space sep "list" of colnames to cast
          qui ds // get all vars/cols to fetch
          local cols `r(varlist)'
          local asis : list cols - tocast 
          local asis = subinstr("`asis'", " ", ", ", .)
          
          * build part of SELECT following the dialect (below mysql)  
          
          foreach colname of local tocast { 
              
              local cast CAST( `colname' AS REAL) AS `colname' ,   
          }
           
          jdbc load , exec("SELECT `cast' `asis' FROM TABLENAME LIMIT 10")  clear

          Comment


          • #6
            correction to #5:
            Code:
            local cast `cast' CAST( `colname' AS REAL) AS `colname' ,

            Comment


            • #7
              Thank you to everyone for their input. Further to this question, is there a way to break up the SQL syntax in the select statement so the code does not have to be one continuous line? For example,

              Code:
               
               odbc load, exec("select * from [MyTable] where year in ('2015', '2016', '2017')") and where in var ('1','2','3') dsn("DataSOurce")
              Here I used a short example but the code I am referring to used multiple min max functions and is hard to read. If I wanted to break up the statement to look more readable such as:

              odbc load, exec("select * from [MyTable] where year in ('2015', '2016', '2017')") and where in var ('1','2','3') dsn("DataSOurce")

              Comment


              • #8
                Thank you to everyone for their input. Further to this question, is there a way to break up the SQL syntax in the select statement so the code does not have to be one continuous line? For example,

                Code:
                 odbc load, exec("select * from [MyTable] where year in ('2015', '2016', '2017')") and where in var ('1','2','3') dsn("DataSOurce")
                Here I used a short example but the code I am referring to used multiple min and max functions and is hard to read. If I wanted to break up the statement to look more readable such as:

                Code:
                odbc load, exec("select * from [MyTable]
                
                                         where year in ('2015', '2016', '2017')") and
                                         where in var ('1','2','3')
                                          dsn("DataSOurce")
                Whenever I try the second code, Stata does not read it as belonging to one statement and I have to have the entire code in one line. Does anyone know how to do this?

                Thank you in advance!

                Comment


                • #9
                  Code:
                  odbc sqlfile("myselect.sql")
                  or using macro
                  Code:
                  #delim;
                  
                  local select  `"
                  
                  select 
                    * 
                  from 
                    [MyTable] 
                  where 
                    year in ('2015', '2016', '2017') ") and where in var ('1','2','3')
                   
                  "'
                  ;
                  #delim cr
                  
                   
                  
                  odbc load, exec(`"`select'"')

                  Comment


                  • #10
                    Originally posted by Bjarte Aagnes View Post
                    Code:
                    odbc sqlfile("myselect.sql")
                    or using macro
                    Code:
                    #delim;
                    
                    local select `"
                    
                    select
                    *
                    from
                    [MyTable]
                    where
                    year in ('2015', '2016', '2017') ") and where in var ('1','2','3')
                    
                    "'
                    ;
                    #delim cr
                    
                    
                    
                    odbc load, exec(`"`select'"')
                    Thank you!!

                    Comment


                    • #11
                      re #9 ignore the
                      Code:
                      odbc sqlfile("myselect.sql")
                      you may try:

                      Code:
                      local select = fileread("myselect.sql")
                       
                      odbc load, exec(`"`select'"')

                      Comment

                      Working...
                      X