Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replacing comma separated values within a string variable with letters

    Hello,
    I have a question regarding string variables.

    I have a string variable (calendar) that represents a monthly calendar over a three year period (36 months) for people in my survey. Each month is currently separated by commas. Each number or letter represent a monthly status.

    For example:

    ,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5 ,5,5,0,0,0,0,0,0,0,0
    ,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9 ,9,9,9,9,9,9,9,9,9,9

    Eventually I want to have a 36-character string variable so that I can manipulate the data and find out what everyone’s status was at the same month. The variable I want for the two observations above would end up looking like:

    55555555555555555555555555500000000
    99999999999999999999999999999999999


    This is easy enough- I know how to remove commas and trim leading spaces etc.


    My problem is that some of the monthly statuses are coded as numbers above 10, and I need to replace any monthly value above 10 with a single character (probably a letter since numbers 0-9 are already taken) so that every observation’s string variable is the same length. I need to standardize the length of the string like this so that later I can extract the 12th month value across observations and compare them, reshape, etc.

    So, in the following two observations for the calendar variable, I need the code to replace the 14s and 30s with another value, such as “X” and “Y”, respectively:

    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,14,14,14,14,14,14,14
    P,P,P,P,P,P,30,30,30,30,30,30,14,14,14,14,14,14,14 ,14,14,14,14,14,14,14,14,14,14,14,14,B,P,P,P,P


    I need the code to make the above two values become:

    00000000000000000000000000000XXXXXXX
    PPPPPPYYYYYYXXXXXXXXXXXXXXXXXXXBPPPP



    Any help would be greatly appreciated.
    Dana

  • #2
    There are several ways to accomplish this. I would just split the string into several variables, apply the transformations and concatenate everything again.

    Code:
    split var1, parse(,) gen(monthly_)
    
    foreach var of varlist monthly_*{
       replace `var' = strtrim(`var')
    
       replace `var' = "A" if (`var' == 1)
       replace `var' = "B" if (`var' == 2)
       ...
    }
    
    egen my_new_var = concat(monthly_*)

    Comment


    • #3
      If you don't have any missing values in calendar (that is, two commas with nothing between them), and if a space is not a possible value for a monthly status, then you could replace all the commas with spaces, leave the 2-digit numbers as they are, and use the word() function to extract the 36 "words" in the resulting string.

      However, I would be more likely to use the split command to create 36 new variables status1 through status36 each containing the status from a single month.

      Comment


      • #4
        In #2 you would need commands like


        replace `var' = "A" if `var' == "10"
        because each such variable is string and its values can only be strings. Let's imagine a code 10 to A 11 to B and so on. Then you could go


        Code:
        clear 
        input str42 var1 
        "1 ,2 ,3 ,10 ,11 ,12 ,32"
        end 
        
        split var1, parse(,) gen(monthly_)
        
        foreach var of varlist monthly_* {
           replace `var' = word("`c(ALPHA)'", real(`var') - 9) if real(`var') > 9 
        }
        and avoid a series of individual replacements.

        Comment


        • #5
          Thank you so much! These are great suggestions!

          Comment

          Working...
          X