Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Split string variable by the last word that meets character length limit

    I am trying to upload a csv on website however, it requires that the character length for the field should not be greater than 500.

    Since the field contains occupations and splitting by the exact character limit would mean trimming occupation names, is there a way to split the field in a manner that follows the character limit while keeping full occupation name? Text example is below:

    Code:
    clear
    input strL job
    Sales/Marketing Officer, Manager/Assistant Manager, Customer Service Officer/Enumerator, Data Entry Operator, Teacher, Accountant/Cashier, Administration/Operations Officer/Clerk, Computer Operator, Receptionist/Front Desk Officer/ Telephone Operator, Supervisor/Controller, Lab Assistant, Software Developer/Graphic Designer/IT Specialist, Doctors/Nurses, Designer, Lawyer, Journalist/Media Officer, Armed Forces - Police, Army, Fireman, Security Guard,etc, Telemarketing Officer/Call Centre Agent, Research and Writing Jobs: Content Writer/Research Assistant/Analyst, Lab Assistant,
    end

  • #2
    Code:
    clear
    input strL job
    "Sales/Marketing Officer, Manager/Assistant Manager, Customer Service Officer/Enumerator, Data Entry Operator, Teacher, Accountant/Cashier, Administration/Operations Officer/Clerk, Computer Operator, Receptionist/Front Desk Officer/ Telephone Operator, Supervisor/Controller, Lab Assistant, Software Developer/Graphic Designer/IT Specialist, Doctors/Nurses, Designer, Lawyer, Journalist/Media Officer, Armed Forces - Police, Army, Fireman, Security Guard,etc, Telemarketing Officer/Call Centre Agent, Research and Writing Jobs: Content Writer/Research Assistant/Analyst, Lab Assistant,"
    end
    
    *CHARACTER LIMIT=100
    gen holding= job
    local i 1
    while !missing(holding){
        quietly{
            gen wanted`i'= substr(holding, 1, 100)
            replace wanted`i'= ustrregexra(wanted`i',"(.*\s.*),([^,]+$)", "$1")
            replace holding= substr(subinstr(holding, wanted`i', "", 1), 2,.)
            local ++i
        }
    }
    Res.:

    Code:
     forval i=1/7{
      2.
    .     l wanted`i', notrim
      3.
    . }
    
                                                                                           wanted1  
      1.   Sales/Marketing Officer, Manager/Assistant Manager, Customer Service Officer/Enumerator  
    
                                                                                              wanted2  
      1.    Data Entry Operator, Teacher, Accountant/Cashier, Administration/Operations Officer/Clerk  
    
                                                                                                  wanted3  
      1.    Computer Operator, Receptionist/Front Desk Officer/ Telephone Operator, Supervisor/Controller  
    
                                                                                                        wanted4  
      1.    Lab Assistant, Software Developer/Graphic Designer/IT Specialist, Doctors/Nurses, Designer, Lawyer,  
    
                                                                                      wanted5  
      1.   Journalist/Media Officer, Armed Forces - Police, Army, Fireman, Security Guard,etc  
    
                                            wanted6  
      1.    Telemarketing Officer/Call Centre Agent  
    
                                                                                         wanted7  
      1.    Research and Writing Jobs: Content Writer/Research Assistant/Analyst, Lab Assistant,  
    
    .
    Last edited by Andrew Musau; 27 May 2022, 03:28.

    Comment


    • #3
      The syntax will work only if the content of the strings using the -input- command are enclosed in double quotes.

      Could it make a difference if you would use -usubstr- and -usubinstr- instead of -substr- and -subinstr- ? I noticed that some strings end with a comma, some not -- how should one change the commands such that ", " will be preserved so that you can add the strings together to get back the original string?

      My solution (obviously not so elegant) was:
      Code:
      clear
      input strL job
      "Sales/Markéting Officer, Manager/Assistant Manager, Customer Service Officer/Enumerator, Data Entry Operator, Teacher, Accountant/Cashier, Administration/Operations Officer/Clerk, Computer Operator, Receptionist/Front Desk Officer/ Telephone Operator, Supervisor/Controller, Lab Assistant, Software Developer/Graphic Designer/IT Specialist, Doctors/Nurses, Designer, Lawyer, Journalist/Media Officer, Armed Forces - Police, Army, Fireman, Security Guard,etc, Telemarketing Officer/Call Centre Agent, Research and Writing Jobs: Content Writer/Research Assistant/Analyst, Lab Assistant, Sales/Marketing Officer, Manager/Assistant Manager, Customer Service Officer/Enumerator, Data Entry Operator, Teacher, Accountant/Cashier, Administration/Operations Officer/Clerk, Computer Operator, Receptionist/Front Desk Officer/ Telephone Operator, Supervisor/Controller, Lab Assistant, Software Developer/Graphic Designer/IT Specialist, Doctors/Nurses, Designer, Lawyer, Journalist/Media Officer, Armed Forces - Police, Army, Fireman, Security Guard,etc, Telemarketing Officer/Call Centre Agent, Research and Writing Jobs: Content Writer/Research Assistant/Analyst, Lab Assistant,"
      "Sales/Marketing Officer, Manager/Assistant Manager, Customer Service Officer/Enumerator,"
      end
      
      tempvar tmpstr
      gen `tmpstr' = job
      local fstr : format `tmpstr'
      
      // CHARACTER LIMIT=500
      local bits = ceil(real(substr("`fstr'",2,strlen("`fstr'")-2))/500)
      forvalues i = 1/`bits' {
         local spacepos = 500 - ustrpos(reverse(usubstr(`tmpstr',1,498))," ,")
         qui gen job_`i' = usubstr(`tmpstr',1,`spacepos'-1)
         qui replace `tmpstr' = usubstr(`tmpstr',`spacepos',ustrlen(`tmpstr'))
      }
      d job_*

      Comment

      Working...
      X