  • import delimited suddenly doesn't understand dashes

    I've been using a plain text file copied from fbref.com (select "Share and Export -> table as csv") to analyze scores of Major League Soccer all season. I simply use the code

    Code:
    import delimited "~/results.txt", clear
    where the data are stored as "results.txt". Today, after I updated the text file, Stata 18 read the dashes in the scores as â. Running the command chartab shows that the dashes are being interpreted as

    decimal = 226
    hexadecimal = \u00e2
    character = â
    unique name = LATIN SMALL LETTER A WITH CIRCUMFLEX

    Any ideas why this is suddenly happening and how to fix it? I've been doing this process without problems for months and have not recently updated Stata, so I'm at a loss as to why things are different now.

  • #2
    I can't reproduce this problem in Stata 18 or 17. I copied the results to results.csv and let import delimited be as smart as it can.

    chartab is from SSC.

    Code:
    . db import delimited
    
    . import delimited "C:\Users\Laptop\Downloads\results.csv", clear
    (encoding automatically selected: UTF-8)
    (13 vars, 493 obs)
    
    . tab score
    
          Score |      Freq.     Percent        Cum.
    ------------+-----------------------------------
            0–0 |         20       11.17       11.17
            0–1 |         10        5.59       16.76
            0–2 |          4        2.23       18.99
            0–3 |          2        1.12       20.11
            0–4 |          1        0.56       20.67
            1–0 |         20       11.17       31.84
            1–1 |         23       12.85       44.69
            1–2 |         14        7.82       52.51
            1–3 |          3        1.68       54.19
            1–4 |          1        0.56       54.75
            2–0 |         14        7.82       62.57
            2–1 |         21       11.73       74.30
            2–2 |          7        3.91       78.21
            2–3 |          3        1.68       79.89
            3–0 |         10        5.59       85.47
            3–1 |          7        3.91       89.39
            3–2 |          6        3.35       92.74
            3–3 |          1        0.56       93.30
            4–0 |          5        2.79       96.09
            4–1 |          2        1.12       97.21
            4–2 |          1        0.56       97.77
            5–0 |          1        0.56       98.32
            5–1 |          2        1.12       99.44
            6–1 |          1        0.56      100.00
    ------------+-----------------------------------
          Total |        179      100.00
    
    . chartab score
    
       decimal  hexadecimal   character |     frequency    unique name
    ------------------------------------+-----------------------------
            48       \u0030       0     |           107    DIGIT ZERO
            49       \u0031       1     |           127    DIGIT ONE
            50       \u0032       2     |            77    DIGIT TWO
            51       \u0033       3     |            33    DIGIT THREE
            52       \u0034       4     |            10    DIGIT FOUR
            53       \u0035       5     |             3    DIGIT FIVE
            54       \u0036       6     |             1    DIGIT SIX
         8,211       \u2013       –     |           179    EN DASH
    ------------------------------------+-----------------------------
    
                                        freq. count   distinct
    ASCII characters              =             358          7
    Multibyte UTF-8 characters    =             179          1
    Unicode replacement character =               0          0
    Total Unicode characters      =             537          8

    • #3
      Thanks for checking, Nick. Your multi-version test gave me an idea, so I ran the version command. For some reason, my version was set to 10; I have no idea why. When I changed it back to 18, import delimited recognized the dashes again.
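
      For anyone who hits the same symptom, here is a minimal sketch of the check and fix described above. It assumes the same file path and the score variable from the earlier posts. The encoding("UTF-8") option is an added precaution that was not part of the original fix, and nothing in this thread establishes whether it would be enough on its own while version is still set to 10. As background, decimal 226 is hex E2, which is the first byte of the three-byte UTF-8 sequence for EN DASH (E2 80 93). That is consistent with the file's bytes being read one at a time rather than as UTF-8.

      ```stata
      * Display the version the interpreter is currently set to
      version

      * Set the interpreter back to the installed release (18 in this thread)
      version 18

      * Re-import the file; naming the encoding explicitly is an assumption
      * added here as a precaution, not something the original fix required
      import delimited "~/results.txt", clear encoding("UTF-8")

      * Check how the separator is read; chartab is from SSC
      chartab score
      ```

      If the fix has worked, chartab should list the separator as EN DASH (\u2013), as in the output in #2, rather than as LATIN SMALL LETTER A WITH CIRCUMFLEX.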

      • #4
        Good. Thanks for closure.