  • Split Column

    Dear Statalist, I am facing a problem with the data below. I want to generate two new daily-date variables (named "from" and "to") from the string variable "Fn04003".

    Is there any way to handle the unidentified characters �� and then split the column into two separate columns?

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str46 Fn04003
    "2008-03-21��2008-10-28"
    ""                      
    ""                      
    ""                      
    "2009-05-26"            
    ""                      
    "2008-03-21��2008-10-28"
    "2008-03-21��2008-10-28"
    "2008-03-21��2008-10-28"
    "2009-06-26��2009-06-29"
    end

  • #2
    How about:
    Code:
    gen from = date(substr(Fn04003,1,10),"YMD")
    gen to = date(substr(Fn04003,-10,10),"YMD") if length(Fn04003)>10
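
    The code above never needs to identify the separator, because substr() counts bytes and the first and last 10 bytes of each value are plain ASCII digits and hyphens. A minimal sketch of that reasoning, assuming the separator is two U+FFFD characters (as the chartab output later in this thread suggests); the one-row dataset and the variable names bytes, chars, first10, and last10 are made up for illustration:

    ```stata
    * Hypothetical one-row example; uchar(65533) builds the U+FFFD character
    clear
    set obs 1
    gen str46 Fn04003 = "2008-03-21" + uchar(65533) + uchar(65533) + "2008-10-28"

    * strlen() counts bytes, ustrlen() counts Unicode characters
    gen bytes = strlen(Fn04003)
    gen chars = ustrlen(Fn04003)

    * substr() works on bytes; a negative start counts back from the end
    gen first10 = substr(Fn04003, 1, 10)
    gen last10  = substr(Fn04003, -10, 10)

    list bytes chars first10 last10
    ```

    Each U+FFFD takes three bytes in UTF-8, so the value is 26 bytes but 22 characters long. The condition length(Fn04003)>10 keeps "to" missing for rows that hold a single date.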


    • #3
      Wow, magical. The code worked perfectly. Thanks, Øyvind Snilsberg. Below is the outcome.


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str46 Fn04003 float(from to)
      "2008-03-21��2008-10-28" 17612 17833
      ""                           .     .
      ""                           .     .
      ""                           .     .
      "2009-05-26"             18043     .
      ""                           .     .
      "2008-03-21��2008-10-28" 17612 17833
      "2008-03-21��2008-10-28" 17612 17833
      "2008-03-21��2008-10-28" 17612 17833
      "2009-06-26��2009-06-29" 18074 18077
      end
      format %tdCCYY-NN-DD from
      format %tdCCYY-NN-DD to


      • #4

        Oyvind Snilsberg's code worked fine. On how to identify odd characters, consider this dialogue with the example data.

        Code:
        . charlist Fn04003
        -01235689���
        
        . 
        . return li 
        
        macros:
                      r(chars) : "-01235689���"
                   r(sepchars) : "- 0 1 2 3 5 6 8 9 � � � "
                      r(ascii) : "45 48 49 50 51 53 54 56 57 189 191 239 "
        
        . 
        . chartab Fn04003
        
           decimal  hexadecimal   character |     frequency    unique name
        ------------------------------------+---------------------------------------
                45       \u002d       -     |            22    HYPHEN-MINUS
                48       \u0030       0     |            33    DIGIT ZERO
                49       \u0031       1     |             8    DIGIT ONE
                50       \u0032       2     |            22    DIGIT TWO
                51       \u0033       3     |             4    DIGIT THREE
                53       \u0035       5     |             1    DIGIT FIVE
                54       \u0036       6     |             4    DIGIT SIX
                56       \u0038       8     |            12    DIGIT EIGHT
                57       \u0039       9     |             4    DIGIT NINE
            65,533       \ufffd       �     |            10    REPLACEMENT CHARACTER
        ------------------------------------+---------------------------------------
        
                                            freq. count   distinct
        ASCII characters              =             110          9
        Multibyte UTF-8 characters    =               0          0
        Unicode replacement character =              10          1
        Total Unicode characters      =             120         10
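        charlist from SSC is an old ado (originally from 2002) that is still sometimes useful. It only understands single-byte character codes up to 255, which is why it reports the replacement character above as three separate bytes (239 191 189). It is fine for identifying character 160 (the non-breaking space), which is a common nuisance character.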

        chartab from SSC (Robert Picard) is the better and more versatile tool that understands Unicode.
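
        Once chartab has identified the odd character, one option is to replace it and split on the result rather than rely on fixed positions. A minimal sketch, assuming the separator consists only of U+FFFD characters as in the output above; the three-row dataset and the names clean, part1, and part2 are made up for illustration, and uchar() needs Stata 14 or later:

        ```stata
        * Hypothetical data mimicking the example: two dates, one date, empty
        clear
        set obs 3
        gen str46 Fn04003 = ""
        replace Fn04003 = "2008-03-21" + uchar(65533) + uchar(65533) + "2008-10-28" in 1
        replace Fn04003 = "2009-05-26" in 2

        * Turn every U+FFFD into a space, then collapse and trim the blanks
        gen clean = strtrim(stritrim(subinstr(Fn04003, uchar(65533), " ", .)))

        * Split on the space into part1 and part2, then convert to daily dates
        split clean, parse(" ") generate(part)
        gen from = daily(part1, "YMD")
        gen to   = daily(part2, "YMD")
        format from to %tdCCYY-NN-DD

        list Fn04003 from to
        ```

        Unlike the substr() approach, this does not depend on the dates being exactly 10 bytes long, but it does depend on knowing which character separates them.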
