  • Apparently undocumented rule for the delimiter in -import delimited-

    Short version: -import delimited- appears not to work well with delimiters above char(127), but that's not documented.

    While testing an answer I had suggested for a previous posting here, I used an unusual character (upper ASCII 171) as the delimiter for a file to be read with -import delimited-, because such a character is unlikely to appear in ordinary text. At least in my older version of Stata (15.1), -import delimited- will not parse variable values if the delimiter is a character whose code is above 127. Such a delimiter is admittedly strange, but there are times when something like it could be useful, and I can't find any documentation saying it isn't allowed. The behavior might differ in a newer version of Stata, so before I send this on to Stata tech support, I'd like feedback on the following example from someone running a newer version:

    Code:
    // Create a test file with an upper ASCII custom delimiter
    local delimiter = char(171)  
    sysuse auto, clear
    tempfile temp
    export delimited using "`temp'", delimiter("`delimiter'") nolabel datafmt quote replace
    // File looks ok although the delimiter isn't printable.
    type "`temp'"  
    //
    // Import this file and see if Stata parses it properly.
    clear
    import delimited using "`temp'", delimiter("`delimiter'")
    browse


  • #2
    Stata/MP4 version 18.0 for Windows (64-bit x86-64)

    Code:
    . // Create a test file with an upper ASCII custom delimiter
    . local delimiter = char(171)  
    
    . sysuse auto, clear
    (1978 automobile data)
    
    . tempfile temp
    
    . export delimited using "`temp'", delimiter("`delimiter'") nolabel datafmt quote replace
    (file C:\Users\JENKINSS\AppData\Local\Temp\ST_457c_000001.tmp not found)
    file C:\Users\JENKINSS\AppData\Local\Temp\ST_457c_000001.tmp saved
    
    . // File looks ok although the delimiter isn't printable.
    . type "`temp'"  
    make�price�mpg�rep78�headroom�trunk�weight�length�turn�displacement�gear_ratio�foreign
    "AMC Concord"�4099�22�3�2.5�11�2930�186�40�121�3.58�0
    "AMC Pacer"�4749�17�3�3.0�11�3350�173�40�258�2.53�0
    "AMC Spirit"�3799�22��3.0�12�2640�168�35�121�3.08�0
    "Buick Century"�4816�20�3�4.5�16�3250�196�40�196�2.93�0
    "Buick Electra"�7827�15�4�4.0�20�4080�222�43�350�2.41�0
    "Buick LeSabre"�5788�18�3�4.0�21�3670�218�43�231�2.73�0
    "Buick Opel"�4453�26��3.0�10�2230�170�34�304�2.87�0
    "Buick Regal"�5189�20�3�2.0�16�3280�200�42�196�2.93�0
    "Buick Riviera"�10372�16�3�3.5�17�3880�207�43�231�2.93�0
    "Buick Skylark"�4082�19�3�3.5�13�3400�200�42�231�3.08�0
    "Cad. Deville"�11385�14�3�4.0�20�4330�221�44�425�2.28�0
    "Cad. Eldorado"�14500�14�2�3.5�16�3900�204�43�350�2.19�0
    "Cad. Seville"�15906�21�3�3.0�13�4290�204�45�350�2.24�0
    "Chev. Chevette"�3299�29�3�2.5�9�2110�163�34�231�2.93�0
    "Chev. Impala"�5705�16�4�4.0�20�3690�212�43�250�2.56�0
    "Chev. Malibu"�4504�22�3�3.5�17�3180�193�31�200�2.73�0
    "Chev. Monte Carlo"�5104�22�2�2.0�16�3220�200�41�200�2.73�0
    "Chev. Monza"�3667�24�2�2.0�7�2750�179�40�151�2.73�0
    "Chev. Nova"�3955�19�3�3.5�13�3430�197�43�250�2.56�0
    "Dodge Colt"�3984�30�5�2.0�8�2120�163�35�98�3.54�0
    "Dodge Diplomat"�4010�18�2�4.0�17�3600�206�46�318�2.47�0
    "Dodge Magnum"�5886�16�2�4.0�17�3600�206�46�318�2.47�0
    "Dodge St. Regis"�6342�17�2�4.5�21�3740�220�46�225�2.94�0
    "Ford Fiesta"�4389�28�4�1.5�9�1800�147�33�98�3.15�0
    "Ford Mustang"�4187�21�3�2.0�10�2650�179�43�140�3.08�0
    "Linc. Continental"�11497�12�3�3.5�22�4840�233�51�400�2.47�0
    "Linc. Mark V"�13594�12�3�2.5�18�4720�230�48�400�2.47�0
    "Linc. Versailles"�13466�14�3�3.5�15�3830�201�41�302�2.47�0
    "Merc. Bobcat"�3829�22�4�3.0�9�2580�169�39�140�2.73�0
    "Merc. Cougar"�5379�14�4�3.5�16�4060�221�48�302�2.75�0
    "Merc. Marquis"�6165�15�3�3.5�23�3720�212�44�302�2.26�0
    "Merc. Monarch"�4516�18�3�3.0�15�3370�198�41�250�2.43�0
    "Merc. XR-7"�6303�14�4�3.0�16�4130�217�45�302�2.75�0
    "Merc. Zephyr"�3291�20�3�3.5�17�2830�195�43�140�3.08�0
    "Olds 98"�8814�21�4�4.0�20�4060�220�43�350�2.41�0
    "Olds Cutl Supr"�5172�19�3�2.0�16�3310�198�42�231�2.93�0
    "Olds Cutlass"�4733�19�3�4.5�16�3300�198�42�231�2.93�0
    "Olds Delta 88"�4890�18�4�4.0�20�3690�218�42�231�2.73�0
    "Olds Omega"�4181�19�3�4.5�14�3370�200�43�231�3.08�0
    "Olds Starfire"�4195�24�1�2.0�10�2730�180�40�151�2.73�0
    "Olds Toronado"�10371�16�3�3.5�17�4030�206�43�350�2.41�0
    "Plym. Arrow"�4647�28�3�2.0�11�3260�170�37�156�3.05�0
    "Plym. Champ"�4425�34�5�2.5�11�1800�157�37�86�2.97�0
    "Plym. Horizon"�4482�25�3�4.0�17�2200�165�36�105�3.37�0
    "Plym. Sapporo"�6486�26��1.5�8�2520�182�38�119�3.54�0
    "Plym. Volare"�4060�18�2�5.0�16�3330�201�44�225�3.23�0
    "Pont. Catalina"�5798�18�4�4.0�20�3700�214�42�231�2.73�0
    "Pont. Firebird"�4934�18�1�1.5�7�3470�198�42�231�3.08�0
    "Pont. Grand Prix"�5222�19�3�2.0�16�3210�201�45�231�2.93�0
    "Pont. Le Mans"�4723�19�3�3.5�17�3200�199�40�231�2.93�0
    "Pont. Phoenix"�4424�19��3.5�13�3420�203�43�231�3.08�0
    "Pont. Sunbird"�4172�24�2�2.0�7�2690�179�41�151�2.73�0
    "Audi 5000"�9690�17�5�3.0�15�2830�189�37�131�3.20�1
    "Audi Fox"�6295�23�3�2.5�11�2070�174�36�97�3.70�1
    "BMW 320i"�9735�25�4�2.5�12�2650�177�34�121�3.64�1
    "Datsun 200"�6229�23�4�1.5�6�2370�170�35�119�3.89�1
    "Datsun 210"�4589�35�5�2.0�8�2020�165�32�85�3.70�1
    "Datsun 510"�5079�24�4�2.5�8�2280�170�34�119�3.54�1
    "Datsun 810"�8129�21�4�2.5�8�2750�184�38�146�3.55�1
    "Fiat Strada"�4296�21�3�2.5�16�2130�161�36�105�3.37�1
    "Honda Accord"�5799�25�5�3.0�10�2240�172�36�107�3.05�1
    "Honda Civic"�4499�28�4�2.5�5�1760�149�34�91�3.30�1
    "Mazda GLC"�3995�30�4�3.5�11�1980�154�33�86�3.73�1
    "Peugeot 604"�12990�14��3.5�14�3420�192�38�163�3.58�1
    "Renault Le Car"�3895�26�3�3.0�10�1830�142�34�79�3.72�1
    "Subaru"�3798�35�5�2.5�11�2050�164�36�97�3.81�1
    "Toyota Celica"�5899�18�5�2.5�14�2410�174�36�134�3.06�1
    "Toyota Corolla"�3748�31�5�3.0�9�2200�165�35�97�3.21�1
    "Toyota Corona"�5719�18�5�2.0�11�2670�175�36�134�3.05�1
    "VW Dasher"�7140�23�4�2.5�12�2160�172�36�97�3.74�1
    "VW Diesel"�5397�41�5�3.0�15�2040�155�35�90�3.78�1
    "VW Rabbit"�4697�25�4�3.0�15�1930�155�35�89�3.78�1
    "VW Scirocco"�6850�25�4�2.0�16�1990�156�36�97�3.78�1
    "Volvo 260"�11995�17�5�2.5�14�3170�193�37�163�2.98�1
    
    . //
    . // Import this file and see if Stata parses it properly.
    . clear
    
    . import delimited using "`temp'", delimiter("`delimiter'")
    (encoding automatically selected: ISO-8859-1)
    (1 var, 75 obs)



    • #3
      Originally posted by Mike Lacy
      -import delimited- appears not to work well with delimiters above char(127)
      I think that it has to do more with the vicissitudes of encoding than with the -import delimited- command per se.

      Add the line -local delimiter = uchar(0171)- just before -import delimited-, as below, and it will work well.
      Code:
      // Create a test file with an upper ASCII custom delimiter
      local delimiter = char(171)  
      sysuse auto, clear
      tempfile temp
      export delimited using "`temp'", delimiter("`delimiter'") nolabel datafmt quote replace
      // File looks ok although the delimiter isn't printable.
      type "`temp'"  
      //
      // Import this file and see if Stata parses it properly.
      clear
      
      // Redefine the delimiter as Unicode code point 171 (U+00AB), which Stata stores as UTF-8
      local delimiter = uchar(0171)
      import delimited using "`temp'", delimiter("`delimiter'")
      browse



      • #4
        Thanks to Joseph Coveney here. "Vicissitudes" is right <grin>. I have to say I'm confused that -export- can properly dereference a local containing a character created with char() and insert it in the file (which can be verified with -hexdump-), while -import- won't pick up that character from a local unless it was created with uchar(). I could understand it if char() failed in both cases.
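A quick way to see the difference between the two locals is to compare their byte lengths. This is a minimal sketch assuming Stata 14 or later, where strings are stored as UTF-8, strlen() counts bytes and ustrlen() counts Unicode characters; the local names byte_delim and uni_delim are only illustrative.

```stata
// Minimal sketch (assumes Stata 14 or later, where strings are UTF-8)
local byte_delim = char(171)    // a single byte with value 171 (hex AB)
local uni_delim  = uchar(171)   // Unicode U+00AB, stored as two UTF-8 bytes
display strlen("`byte_delim'")  // bytes in the char() version: 1
display strlen("`uni_delim'")   // bytes in the uchar() version: 2
display ustrlen("`uni_delim'")  // Unicode characters in the uchar() version: 1
```

A plausible reading, not something the documentation confirms: if -import delimited- decodes the file as ISO-8859-1 (the encoding reported in #2), the byte AB in the file becomes the character U+00AB, whose UTF-8 form matches the two-byte uchar(171) delimiter but not the one-byte char(171) delimiter.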



        • #5
          Notice that the declared encoding differs between the two cases when -import delimited- executes. I think that explicitly making the character Unicode prompts -import delimited- to switch to a double-byte encoding, and I guess that it is then better able to detect delimiters that lie above ASCII's range.
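To check which form of the delimiter -import delimited- actually splits on in a given Stata version, one option is to write the file once with the char(171) byte and re-import it with each form, reporting how many variables and observations come back. This is a sketch assuming Stata 14 or later; the loop over char and uchar and the local names fn and delim are illustrative, not an official diagnostic.

```stata
// Write auto.dta to a temporary file, delimited by the single byte 171
sysuse auto, clear
tempfile temp
local delimiter = char(171)
export delimited using "`temp'", delimiter("`delimiter'") nolabel replace

// Re-import with each form of the delimiter and report what was parsed
foreach fn in char uchar {
    local delim = `fn'(171)
    quietly import delimited using "`temp'", delimiter("`delim'") clear
    display "`fn'(171) as delimiter: " c(k) " variable(s), " _N " obs"
}
```

In the Stata 18 run shown in #2, the char(171) form produced 1 variable and 75 observations; per #3, the uchar() form should recover all 12 variables of auto.dta.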

