Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Sub in string within same variable

    Hi all,

    Code:
    input str11 onetSocCode
    "53-5011.00"
    "53.5011.00"
    "53501100"
    "53501100"
    "11-2021.00"
    "11202100"
    The data are as above.

    However, ideally, I would want:

    Code:
    "53-5011.00"
    "53-5011.00"
    "53-5011.00"
    "53-5011.00"
    "11-2021.00"
    "11-2021.00"
    So there has to be a "-" after the two first digits, and a "." before the two last digits everywhere, i.e. for all observations.

    In this variable, some observations are already correctly specified, e.g. 53-5011.00, and some are not, e.g. 53501100.

    How can I add "-" and "." where they should be, wherever applicable?

  • #2
    There is probably some clever way to do this with regular expressions. But here's what I would do:
    Code:
    replace onetSocCode = subinstr(onetSocCode, "-", "", .)
    replace onetSocCode = subinstr(onetSocCode, ".", "", .)
    replace onetSocCode = substr(onetSocCode, 1, 2) + "-" + substr(onetSocCode, 3, 4) ///
        + "." + substr(onetSocCode, 7, 2)
    Two steps backward, one step forward!

    Comment


    • #3
      I was about to post the same solution that Clyde Schechter posted in #2, but he beat me to it. In my original code, I had n=2 as the final argument in the subinstr() function, but I noticed Clyde has the system missing indicator . in that position. That led me back to the documentation where I found this statement:

      If n is missing, all occurrences are replaced
      As an old SPSSer, I first thought that meant that if I omit entirely the fourth argument, all occurrences would be replaced. But I found through trial and error that it means I must set the 4th argument to a missing value, such as ., but I can also plug in .a, .b, .c, etc.

      I also noted in my code that (I believe) it assumes all of the strings are the same length. If that is not so, you'll need to make your code a bit more complicated, probably using the strlen() function.

      In hopes that it might help someone, here is my version of the code with missing values as the final n argument in subinstr(). Thanks for educating me once again, Clyde.

      Code:
      clear
      input str11 onetSocCode
      "53-5011.00"
      "53.5011.00"
      "53501100"
      "53501100"
      "11-2021.00"
      "11202100"
      end
      clonevar Copy = onetSocCode
      * First, remove - and . from all rows.
      * From the documentation for subinstr(s1,s2,s3,n):
      * "If n is missing, all occurrences are replaced".
      * This means n must be a missing value, not that
      * it can be omitted from the command.
      replace Copy = subinstr(Copy,"-","",.a)
      replace Copy = subinstr(Copy,".","",.z)  
      list, clean
      * The following assumes all strings are the same length.
      * If this is not correct, you'll have to make the code
      * more flexible, using strleng(s), I should imagine.
      replace Copy = substr(Copy,1,2) + "-" + substr(Copy,3,4) + "." ///
      + substr(Copy,7,2)
      list, clean
      Output from the final list command:
      Code:
      . list, clean
      
             onetSocC~e         Copy  
        1.   53-5011.00   53-5011.00  
        2.   53.5011.00   53-5011.00  
        3.     53501100   53-5011.00  
        4.     53501100   53-5011.00  
        5.   11-2021.00   11-2021.00  
        6.     11202100   11-2021.00
      --
      Bruce Weaver
      Email: [email protected]
      Version: Stata/MP 18.5 (Windows)

      Comment


      • #4
        This is more by way of fun in showing a quite different solution than by way of showing anything better.

        Code:
        clear
        input str11 onetSocCode
        "53-5011.00"
        "53.5011.00"
        "53501100"
        "53501100"
        "11-2021.00"
        "11202100"
        end
        
        destring onetSocCode, ignore("-.") gen(work)
        
        gen wanted = strofreal(floor(work/1e6)) + "-"
        replace wanted = wanted + strofreal(floor(mod(work, 1e6)/100)) + "."
        replace wanted = wanted + strofreal(mod(work, 100), "%02.0f")
        
        list, sep(0)
        
            +------------------------------------+
             | onetSocC~e       work       wanted |
             |------------------------------------|
          1. | 53-5011.00   53501100   53-5011.00 |
          2. | 53.5011.00   53501100   53-5011.00 |
          3. |   53501100   53501100   53-5011.00 |
          4. |   53501100   53501100   53-5011.00 |
          5. | 11-2021.00   11202100   11-2021.00 |
          6. |   11202100   11202100   11-2021.00 |
             +------------------------------------+
        Last edited by Nick Cox; 22 Oct 2023, 12:29.

        Comment


        • #5
          My version. A bit literal one. Nothing as creative as Nick Cox's version. Designed only to work in a use case with the existing 6 observations.

          Code:
          tempvar x
          gen `x' = ustrregexra(onetSocCode,"-|\.","")
          gen wanted = substr(`x',1,2)+ "-" + substr(`x',3,4) + "." + substr(`x',-2,.)
          Last edited by Girish Venkataraman; 22 Oct 2023, 15:16. Reason: Irksome grammatical error corrected.

          Comment


          • #6
            Code:
            clear
            input str11 onetSocCode
            "53-5011.00"
            "53.5011.00"
            "53501100"
            "53501100"
            "11-2021.00"
            "11202100"
            end
            
            gen wanted= ustrregexra(ustrregexra(onetSocCode, "[^\d]", ""), "(\d{2})(\d{4})(\d{2})", "$1-$2\.$3")
            Res.:

            Code:
            . l, sep(0)
            
                 +-------------------------+
                 | onetSocC~e       wanted |
                 |-------------------------|
              1. | 53-5011.00   53-5011.00 |
              2. | 53.5011.00   53-5011.00 |
              3. |   53501100   53-5011.00 |
              4. |   53501100   53-5011.00 |
              5. | 11-2021.00   11-2021.00 |
              6. |   11202100   11-2021.00 |
                 +-------------------------

            Comment


            • #7
              #5 is excellent.

              Comment


              • #8
                Originally posted by Nick Cox View Post
                #5 is excellent.
                Made my day!!

                Comment


                • #9
                  Thanks a lot everyone! Really insightful, and it taught me a lot about various commands!

                  Comment


                  • #10
                    #6 is excellent too!

                    The moral is: Get friendly with functions!

                    Comment

                    Working...
                    X