Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Truncated data when using - odbc load -

    Hey Everyone,

    I'm working with air travel data stored in SQL Server. One of the tables, called "FLEET" identifies various attributes of different airline's fleet (e.g. aircraft model and series, engine model, etc.). the "OPERATOR" column identifies the carrier (e.g. "UNITED AIRLINES", "AMERICAN AIRLINES", etc.). [Side note: everything in the database is stored in all caps, hence the way I'm typing it]. In terms of character length, no OPERATOR exceeds 45 characters (including spaces), so I am not encountering long string limitations.

    When using the Code:
    Code:
     odbc load, table("FLEET") dsn("DSNNAME")
    or the Code:
    Code:
     odbc load, exec("SELECT * FROM FLEET") dsn("DSNNAME")
    each observation in the OPERATOR column is truncated to five characters with extra nonstandard characters after the fifth. For example, AMERICAN AIRLINES appears as AMERI\0\0\0\0Žï® in some instances (but not in all instances). The only airlines that are not truncated are those that have five or fewer characters. I doubt this matters, but many (although not all) of the nonstandard instances involve the Žï® characters in that order (although there are some that do not have any of those characters).

    If I enter the SQL command found in Code 2 directly into SQL Server Management Studio, I do not see any truncation at all. Also, there is another table that contains some of the same airlines (but not all of the same airlines), that table does not experience this error. It is just the OPERATOR column of the FLEET table that experiences this error. [I realize that this indicates that my database does not follow standard normalization principles. Unfortunately I did not create the database and do not have the authority to change it].

    I need to perform operations on the data in the FLEET table, which means I must have the correct OPERATOR data. I'm not sure how to accomplish that, so I would appreciate any advice on the matter.

    Final Notes (maybe they'll mean more to you than to me):
    ​Operating System: Windows 7 Enterprise
    Stata Version: IC 13.1 64-bit
    SQL Server Management Studio Version: 11.0.5343.0
    SQL Server 2012
    ODBC Data Source Administrator Driver: SQL Server Native Client 10.0 Version 2009.100.4042.00

    Thank you,

    Andrew

  • #2
    Check the string encoding in the SQL Server instance. It is likely/possible that the server is storing the string data with an encoding different from what Stata is expecting. You can check the default encoding via the server's collation property. Do you know what type the variable is stored in? You might want to try something like:

    Code:
    odbc lo, exec("SELECT CAST(NULLIF(LTRIM(RTRIM(OPERATOR)), '') AS NVARCHAR(50)) FROM FLEET") dsn(DSNNAME)
    SQL Server uses four different types to store/manage string data: CHAR, NCHAR, VARCHAR, and NVARCHAR.

    Comment


    • #3
      Thank you Sir!

      I did a comparison of the tables that stored airline names and there was indeed a difference in the encoding. Selecting as NVARCHAR(50) instead of NVARCHAR(MAX) appears to have solved the problem.

      Thank you again,

      Andrew

      Comment


      • #4
        Not a problem. If you're able to spin up an instance of Postgres, there is a SQL Server foreign data wrapper available for it. Then you can dump copies in nightly ETL to Postgres and not have to deal with any of that ridiculousness.

        Comment

        Working...
        X