  • Delete only part of a string variable

    Hi Statalist

    I want to delete all the brackets and double quotes from the following:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str9 diagnosis
    `"["99"]"'   
    `"["99"]"'   
    `"["5"]"'    
    `"["5"]"'    
    `"["9"]"'    
    `"["5"]"'    
    `"["5"]"'    
    `"["9"]"'    
    `"["5"]"'    
    `"["5"]"'    
    `"["5"]"'    
    `"["5"]"'    
    `"["5"]"'    
    `"["5"]"'    
    `"["5"]"'    
    `"["6"]"'    
    `"["9"]"'    
    `"["6"]"'    
    `"["6"]"'    
    `"["6"]"'    
    `"["6"]"'    
    `"["6"]"'    
    `"["6"]"'    
    `"["6"]"'    
    `"["6"]"'    
    ""           
    `"["6"]"'    
    `"["1"]"'    
    `"["5"]"'    
    `"["5","6"]"'
    `"["6"]"'    
    `"["6"]"'    
    end
    I want to end up with only a numeric variable. I am unsure whether that is possible when a single observation contains multiple numbers. Each number represents a diagnosis.

  • #2
    You cannot have multiple numbers in a single numeric variable. Perhaps this example code will start you in a useful direction.
    Code:
    generate diag = ustrregexra(diagnosis,`"[^\d,]"',"")
    split diag, parse(",") destring
    describe *
    list in 25/l, clean
    Code:
    . describe *
    
    Variable      Storage   Display    Value
        name         type    format    label      Variable label
    ------------------------------------------------------------------------------------------------
    diagnosis       str9    %9s                  
    diag            str3    %9s                  
    diag1           byte    %10.0g                
    diag2           byte    %10.0g                
    
    . list in 25/l, clean
    
           diagnosis   diag   diag1   diag2  
     25.       ["6"]      6       6       .  
     26.                          .       .  
     27.       ["6"]      6       6       .  
     28.       ["1"]      1       1       .  
     29.       ["5"]      5       5       .  
     30.   ["5","6"]    5,6       5       6  
     31.       ["6"]      6       6       .  
     32.       ["6"]      6       6       .  
    
    .
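    To see the effect of the ustrregexra() call on individual values, it can be run on string literals with display. This is a minimal sketch, with the literals copied from the example data.
    Code:
    * strip everything except digits and commas from a multi-diagnosis value
    display ustrregexra(`"["5","6"]"', `"[^\d,]"', "")
    * same call on a single-diagnosis value
    display ustrregexra(`"["99"]"', `"[^\d,]"', "")
    The first command should display 5,6, matching diag in observation 30 of the listing above, and the second should display 99.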
    Added in edit: I simplified the ustrregexra() second argument from what I originally posted. This function - admittedly incomprehensible to the novice - deletes every character that is neither a digit nor a comma. You may find the following code easier to follow, and more instructive in Stata basics.
    Code:
    generate diag = diagnosis
    replace diag = subinstr(diag,"[","",.)
    replace diag = subinstr(diag,"]","",.)
    replace diag = subinstr(diag,`"""',"",.)
    split diag, parse(",") destring
    describe *
    list in 25/l, clean
    The results are effectively identical on your example data. The trickiest part of the new code is using compound double quotes to surround the string containing a single double quote: the `"""' argument in the third replace command above. For details on quoting in Stata, see the output of
    Code:
    help quotes
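    To see compound double quotes in isolation, a few display commands may help. This is a small sketch rather than part of the solution; the literal `"["5"]"' is copied from the example data.
    Code:
    * the compound-quoted string holds exactly one character, a double quote
    display strlen(`"""')
    * compound quotes let a string literal contain double quotes
    display `"["5"]"'
    * remove every double quote, as the subinstr() call with `"""' does
    display subinstr(`"["5"]"', `"""', "", .)
    These should display 1, then ["5"], then [5].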
    Last edited by William Lisowski; 30 Nov 2022, 07:49.
