
  • removing irregular length backslashes in string

    Hello,

    I am trying to remove all of the backslashes of a variable that are located after the last letter. Example below.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str9 hhid str99 spouse_names
    "00030-001" "HABSA MOUSSA BA/////"              
    "00030-002" "SALA KA/////"                      
    "00030-003" ""                                  
    "00030-004" "AISSATA DJIBY SY/////"             
    "00030-005" "FATY  Sy/////"                     
    "00030-006" "DIEYNABA  DIAKHO/////"             
    "00030-007" "DIENABA YERO  KA/HABY DJIBY BA////"
    "00030-008" "FATY YERO BA/////"                 
    "00030-009" "HAWA SAMBA  BA/////"               
    "00030-010" ""                                  
    "00030-011" ""                                  
    "00030-012" "Salamata Dem/////"                 
    "00030-013" "Hawa Thiogo/////"                  
    "00030-014" ""                                  
    "00030-015" "Aissata Ba/////"                   
    "00030-016" "Thioya Diallo/////"                
    "00030-017" "Habsa  Demba Kebe/////"            
    "00030-018" "Maimouna Diallo/////"              
    "00030-019" "Salamata  Ba/////"                 
    "00030-020" "Gueda Diallo/////"                 
    "00030-021" ""                                  
    "00030-022" "Hawa  Sy/////"                     
    "00030-023" "Houleye  Soumare/////"             
    "00030-024" "Fama  Ba/////"                     
    "00030-025" ""                                  
    "00030-026" ""                                  
    "00030-027" "Kadia  DIALLO/////"                
    "00030-028" "Rougui Abou  Sy/////"              
    "00030-029" ""                                  
    "00030-030" "Dieynaba  DIALLO/////"             
    "00030-031" "Aïssata Ardo  DIALLO/////"        
    "00030-032" "Ramata  Awe/////"                  
    "00030-033" "Hawa Aw/////"                      
    "00030-034" "Bolo Sow/////"                     
    "00030-035" "Dieynaba Ka/Salamata Sow////"      
    "00030-036" "Hawa Tamboura/////"                
    "00030-037" "Bolo Ba/////"                      
    "00030-038" "Farmata Thioye/////"               
    "00030-039" "Houleye Diallo/Ramata Ba////"      
    "00030-040" "Haby Ba/////"                      
    "00030-041" "Dieynaba issaka Ba/////"           
    "00030-042" "DIARIATA  AW/////"                 
    "00030-043" "KARDIATOU PATHE DIALLO/////"       
    "00030-044" ""                                  
    "00030-045" "HABY DEMBA DIOUM/////"             
    "00030-047" "RAMATOULAYE DIOUM/////"            
    end
    But the number of backslashes differs across observations, so I am not sure how to set up a strpos() command that works consistently across them.
    The "spouse_names" variable was generated by concatenating 6 variables, with "/" separating the names. If an observation did not have 6 names, which is the maximum, the backslashes were still added to the end.

    Thanks.

  • #2
    Either

    Code:
    replace spouse_names= ustrregexra(spouse_names, "/", "")
    or

    Code:
    replace spouse_names= subinstr(spouse_names, "/", "",.)
    will do.



    • #3
      But this also removes the slashes used to separate the names. I only want to remove the extra slashes at the end.



      • #4
        FWIW, these are forward slashes.

        Does this help? I look for the first slash and cut just before.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str9 hhid str99 spouse_names
        "00030-001" "HABSA MOUSSA BA/////"              
        "00030-002" "SALA KA/////"                      
        "00030-003" ""                                  
        "00030-004" "AISSATA DJIBY SY/////"             
        "00030-005" "FATY  Sy/////"                     
        "00030-006" "DIEYNABA  DIAKHO/////"             
        "00030-007" "DIENABA YERO  KA/HABY DJIBY BA////"
        "00030-008" "FATY YERO BA/////"                 
        "00030-009" "HAWA SAMBA  BA/////"               
        "00030-010" ""                                  
        "00030-011" ""                                  
        "00030-012" "Salamata Dem/////"                 
        "00030-013" "Hawa Thiogo/////"                  
        "00030-014" ""                                  
        "00030-015" "Aissata Ba/////"                   
        "00030-016" "Thioya Diallo/////"                
        "00030-017" "Habsa  Demba Kebe/////"            
        "00030-018" "Maimouna Diallo/////"              
        "00030-019" "Salamata  Ba/////"                 
        "00030-020" "Gueda Diallo/////"                 
        "00030-021" ""                                  
        "00030-022" "Hawa  Sy/////"                     
        "00030-023" "Houleye  Soumare/////"             
        "00030-024" "Fama  Ba/////"                     
        "00030-025" ""                                  
        "00030-026" ""                                  
        "00030-027" "Kadia  DIALLO/////"                
        "00030-028" "Rougui Abou  Sy/////"              
        "00030-029" ""                                  
        "00030-030" "Dieynaba  DIALLO/////"             
        "00030-031" "Aïssata Ardo  DIALLO/////"        
        "00030-032" "Ramata  Awe/////"                  
        "00030-033" "Hawa Aw/////"                      
        "00030-034" "Bolo Sow/////"                     
        "00030-035" "Dieynaba Ka/Salamata Sow////"      
        "00030-036" "Hawa Tamboura/////"                
        "00030-037" "Bolo Ba/////"                      
        "00030-038" "Farmata Thioye/////"               
        "00030-039" "Houleye Diallo/Ramata Ba////"      
        "00030-040" "Haby Ba/////"                      
        "00030-041" "Dieynaba issaka Ba/////"           
        "00030-042" "DIARIATA  AW/////"                 
        "00030-043" "KARDIATOU PATHE DIALLO/////"       
        "00030-044" ""                                  
        "00030-045" "HABY DEMBA DIOUM/////"             
        "00030-047" "RAMATOULAYE DIOUM/////"            
        end
        
        gen safecopy = substr(spouse_names, 1, strpos(spouse_names, "/") - 1) 
        
        l spouse_names safecopy 
        
             +-------------------------------------------------------------+
             |                       spouse_names                 safecopy |
             |-------------------------------------------------------------|
          1. |               HABSA MOUSSA BA/////          HABSA MOUSSA BA |
          2. |                       SALA KA/////                  SALA KA |
          3. |                                                             |
          4. |              AISSATA DJIBY SY/////         AISSATA DJIBY SY |
          5. |                      FATY  Sy/////                 FATY  Sy |
             |-------------------------------------------------------------|
          6. |              DIEYNABA  DIAKHO/////         DIEYNABA  DIAKHO |
          7. | DIENABA YERO  KA/HABY DJIBY BA////         DIENABA YERO  KA |
          8. |                  FATY YERO BA/////             FATY YERO BA |
          9. |                HAWA SAMBA  BA/////           HAWA SAMBA  BA |
         10. |                                                             |
             |-------------------------------------------------------------|
         11. |                                                             |
         12. |                  Salamata Dem/////             Salamata Dem |
         13. |                   Hawa Thiogo/////              Hawa Thiogo |
         14. |                                                             |
         15. |                    Aissata Ba/////               Aissata Ba |
             |-------------------------------------------------------------|
         16. |                 Thioya Diallo/////            Thioya Diallo |
         17. |             Habsa  Demba Kebe/////        Habsa  Demba Kebe |
         18. |               Maimouna Diallo/////          Maimouna Diallo |
         19. |                  Salamata  Ba/////             Salamata  Ba |
         20. |                  Gueda Diallo/////             Gueda Diallo |
             |-------------------------------------------------------------|
         21. |                                                             |
         22. |                      Hawa  Sy/////                 Hawa  Sy |
         23. |              Houleye  Soumare/////         Houleye  Soumare |
         24. |                      Fama  Ba/////                 Fama  Ba |
         25. |                                                             |
             |-------------------------------------------------------------|
         26. |                                                             |
         27. |                 Kadia  DIALLO/////            Kadia  DIALLO |
         28. |               Rougui Abou  Sy/////          Rougui Abou  Sy |
         29. |                                                             |
         30. |              Dieynaba  DIALLO/////         Dieynaba  DIALLO |
             |-------------------------------------------------------------|
         31. |          Aïssata Ardo  DIALLO/////     Aïssata Ardo  DIALLO |
         32. |                   Ramata  Awe/////              Ramata  Awe |
         33. |                       Hawa Aw/////                  Hawa Aw |
         34. |                      Bolo Sow/////                 Bolo Sow |
         35. |       Dieynaba Ka/Salamata Sow////              Dieynaba Ka |
             |-------------------------------------------------------------|
         36. |                 Hawa Tamboura/////            Hawa Tamboura |
         37. |                       Bolo Ba/////                  Bolo Ba |
         38. |                Farmata Thioye/////           Farmata Thioye |
         39. |       Houleye Diallo/Ramata Ba////           Houleye Diallo |
         40. |                       Haby Ba/////                  Haby Ba |
             |-------------------------------------------------------------|
         41. |            Dieynaba issaka Ba/////       Dieynaba issaka Ba |
         42. |                  DIARIATA  AW/////             DIARIATA  AW |
         43. |        KARDIATOU PATHE DIALLO/////   KARDIATOU PATHE DIALLO |
         44. |                                                             |
         45. |              HABY DEMBA DIOUM/////         HABY DEMBA DIOUM |
             |-------------------------------------------------------------|
         46. |             RAMATOULAYE DIOUM/////        RAMATOULAYE DIOUM |
             +-------------------------------------------------------------+
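
        As the listing shows, cutting at the first slash also drops any second name (observations 7, 35 and 39). A minimal sketch of a variant that avoids regular expressions and removes only the slashes at the end, assuming the same spouse_names variable; the new variable name trimmed and the three-row test data are illustrative only:

        ```stata
        clear
        input str40 spouse_names
        "SALA KA/////"
        "Dieynaba Ka/Salamata Sow////"
        ""
        end

        * Work on a copy so the original variable is left untouched
        gen trimmed = spouse_names

        * Drop the last character for as long as any value still ends in "/"
        quietly count if substr(trimmed, -1, 1) == "/"
        while r(N) > 0 {
            quietly replace trimmed = substr(trimmed, 1, strlen(trimmed) - 1) if substr(trimmed, -1, 1) == "/"
            quietly count if substr(trimmed, -1, 1) == "/"
        }

        list spouse_names trimmed
        ```

        The loop runs at most as many times as the longest run of trailing slashes, so it stays cheap even on a large dataset. Slashes between names are never touched because only the final character is tested.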



        • #5
          Yes, I didn't notice intermediate slashes

          "00030-007" "DIENABA YERO KA/HABY DJIBY BA////"
          Try

          Code:
          replace spouse_names= ustrregexra(spouse_names,  "(^.*[a-zA-Z].*[a-zA-Z])(.*)$", "$1")
          Res.:

          Code:
          . l, sep(0)
          
               +--------------------------------------------+
               |      hhid                     spouse_names |
               |--------------------------------------------|
            1. | 00030-001                  HABSA MOUSSA BA |
            2. | 00030-002                          SALA KA |
            3. | 00030-003                                  |
            4. | 00030-004                 AISSATA DJIBY SY |
            5. | 00030-005                         FATY  Sy |
            6. | 00030-006                 DIEYNABA  DIAKHO |
            7. | 00030-007   DIENABA YERO  KA/HABY DJIBY BA |
            8. | 00030-008                     FATY YERO BA |
            9. | 00030-009                   HAWA SAMBA  BA |
           10. | 00030-010                                  |
           11. | 00030-011                                  |
           12. | 00030-012                     Salamata Dem |
           13. | 00030-013                      Hawa Thiogo |
           14. | 00030-014                                  |
           15. | 00030-015                       Aissata Ba |
           16. | 00030-016                    Thioya Diallo |
           17. | 00030-017                Habsa  Demba Kebe |
           18. | 00030-018                  Maimouna Diallo |
           19. | 00030-019                     Salamata  Ba |
           20. | 00030-020                     Gueda Diallo |
           21. | 00030-021                                  |
           22. | 00030-022                         Hawa  Sy |
           23. | 00030-023                 Houleye  Soumare |
           24. | 00030-024                         Fama  Ba |
           25. | 00030-025                                  |
           26. | 00030-026                                  |
           27. | 00030-027                    Kadia  DIALLO |
           28. | 00030-028                  Rougui Abou  Sy |
           29. | 00030-029                                  |
           30. | 00030-030                 Dieynaba  DIALLO |
           31. | 00030-031             Aïssata Ardo  DIALLO |
           32. | 00030-032                      Ramata  Awe |
           33. | 00030-033                          Hawa Aw |
           34. | 00030-034                         Bolo Sow |
           35. | 00030-035         Dieynaba Ka/Salamata Sow |
           36. | 00030-036                    Hawa Tamboura |
           37. | 00030-037                          Bolo Ba |
           38. | 00030-038                   Farmata Thioye |
           39. | 00030-039         Houleye Diallo/Ramata Ba |
           40. | 00030-040                          Haby Ba |
           41. | 00030-041               Dieynaba issaka Ba |
           42. | 00030-042                     DIARIATA  AW |
           43. | 00030-043           KARDIATOU PATHE DIALLO |
           44. | 00030-044                                  |
           45. | 00030-045                 HABY DEMBA DIOUM |
           46. | 00030-047                RAMATOULAYE DIOUM |
               +--------------------------------------------+
          Last edited by Andrew Musau; 26 Aug 2022, 04:34.



          • #6
            Oops!

            This is better than my previous:

            Code:
            replace spouse_names = trim(itrim(subinstr(spouse_names, "/", " ", .)))



            • #7
              These work well. Thank you.



              • #8
                For those who might find this topic at a later date, the regular expression coding in post #5 can be simplified to
                Code:
                replace spouse_names = ustrregexra(spouse_names,"/+$","")
                which will remove a sequence of one or more slashes (/+) that is immediately followed by the end of the string ($).
                Code:
                . list in 34/36, clean
                
                            hhid                   spouse_names  
                 34.   00030-034                  Bolo Sow/////  
                 35.   00030-035   Dieynaba Ka/Salamata Sow////  
                 36.   00030-036             Hawa Tamboura/////  
                
                . replace spouse_names = ustrregexra(spouse_names,"/+$","")
                (37 real changes made)
                
                . list in 34/36, clean
                
                            hhid               spouse_names  
                 34.   00030-034                   Bolo Sow  
                 35.   00030-035   Dieynaba Ka/Salamata Sow  
                 36.   00030-036              Hawa Tamboura
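
                A self-contained sketch of the same idea on three of the rows above, with a check that no trailing slash survives. The split step and the stub name spouse are illustrative additions, not part of the original question; they assume the individual names are wanted back as separate variables:

                ```stata
                clear
                input str9 hhid str40 spouse_names
                "00030-034" "Bolo Sow/////"
                "00030-035" "Dieynaba Ka/Salamata Sow////"
                "00030-044" ""
                end

                * Strip only the run of slashes anchored at the end of the string
                replace spouse_names = ustrregexra(spouse_names, "/+$", "")

                * Verify that no value ends in a slash any more
                assert !ustrregexm(spouse_names, "/$")

                * Optional: recover the individual names as spouse1, spouse2, ...
                split spouse_names, parse("/") generate(spouse)

                list, clean
                ```

                Because the pattern is anchored with $, a slash between two names is left in place, which is what makes the later split on "/" possible.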
