  • Removing numbers and text from string data

    Hello,

    I am seeking advice on removing text and numbers from a string variable so that only the month and year remain. A sample of test data follows. I have read posts suggesting regular expressions and others that use only substr() and similar functions.

    The basic structure of the data is a name and report creation date, followed by a space and then date, time, and file size data. The task is complicated by the year being reported in either two or four digits. I want to remove the names, retain the creation month and year (ideally as separate variables), and remove the space and all the date/time data (the latter records when a file was last opened, not necessarily when it was created).

    Any advice is appreciated.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str102 TestData
    "JONES, William 09.07.14 Oth.pdf                                   10/07/2014 1:07:04 PM  259 KB    "  
    "SMITH, Suzanne 15.08.2016. HCR20.pdf                             15/08/2016 4:00:20 PM  1,190 KB  "   
    "BLACK, Merton 29.08.16 FPO.pdf                                       29/08/2016 11:07:20 AM 1,421 KB  "
    "ISAAC, Joseph 03.03.16 FPO.pdf                                      3/03/2016 11:39:12 AM  337 KB    "
    "THORN, Christopher  07.09.2016  FPO.pdf                           7/09/2016 11:11:08 AM  331 KB    "  
    "BORN, Christopher  28.10.2016  FPO.pdf                           28/10/2016 10:02:46 AM 307 KB    "   
    "BORN, Christopher 23.08.2016 HCR20.pdf                           23/08/2016 6:40:02 PM  1,742 KB  "   
    "NEWTON , Teeek  06.08.15 snfp.pdf                                    6/08/2015 1:29:36 PM   330 KB    "
    "WAFER, Flynn  17.02.16 SNFP.pdf                                     17/02/2016 2:34:00 PM  367 KB    "
    "BROWN, Blake  30.7.14 SNFP.pdf                                      31/07/2014 8:27:02 AM  389 KB    "
    end



  • #2
    Thanks for the data example.

    substr() is a function, not a command. In Stata the two classes are distinct and documented separately.
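
    To illustrate the distinction, here is a minimal sketch. It assumes the auto dataset shipped with Stata; the variable name make3 is made up for the example.

    ```stata
    * A function returns a value and is used inside an expression
    display substr("JONES, William", 1, 5)

    * A command acts on the data; here generate calls the function substr()
    sysuse auto, clear
    generate make3 = substr(make, 1, 3)
    list make make3 in 1/3
    ```

    Functions are documented under help functions, while each command has its own help file.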

    I can't see what code you tried given the advice you are already aware of, but this may help.
    Code:
    clear 
    
    input str102 TestData
    "JONES, William 09.07.14 Oth.pdf                                   10/07/2014 1:07:04 PM  259 KB    "  
    "SMITH, Suzanne 15.08.2016. HCR20.pdf                             15/08/2016 4:00:20 PM  1,190 KB  "   
    "BLACK, Merton 29.08.16 FPO.pdf                                       29/08/2016 11:07:20 AM 1,421 KB  "
    "ISAAC, Joseph 03.03.16 FPO.pdf                                      3/03/2016 11:39:12 AM  337 KB    "
    "THORN, Christopher  07.09.2016  FPO.pdf                           7/09/2016 11:11:08 AM  331 KB    "  
    "BORN, Christopher  28.10.2016  FPO.pdf                           28/10/2016 10:02:46 AM 307 KB    "   
    "BORN, Christopher 23.08.2016 HCR20.pdf                           23/08/2016 6:40:02 PM  1,742 KB  "   
    "NEWTON , Teeek  06.08.15 snfp.pdf                                    6/08/2015 1:29:36 PM   330 KB    "
    "WAFER, Flynn  17.02.16 SNFP.pdf                                     17/02/2016 2:34:00 PM  367 KB    "
    "BROWN, Blake  30.7.14 SNFP.pdf                                      31/07/2014 8:27:02 AM  389 KB    "
    end
    
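    * Keep everything up to and including the first "pdf"
    gen TestData2 = substr(TestData, 1, strpos(TestData, "pdf") + 2)
    * The report date is the second-to-last word of that prefix
    gen strDate = word(TestData2, -2)
    * Convert to a daily date; 2050 is the latest year allowed for two-digit years
    gen DDate = daily(strDate, "DMY", 2050)
    gen Year = yofd(DDate)
    gen Month = month(DDate)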
    
    l Year Month
    
         +--------------+
         | Year   Month |
         |--------------|
      1. | 2014       7 |
      2. | 2016       8 |
      3. | 2016       8 |
      4. | 2016       3 |
      5. | 2016       9 |
         |--------------|
      6. | 2016      10 |
      7. | 2016       8 |
      8. | 2015       8 |
      9. | 2016       2 |
     10. | 2014       7 |
         +--------------+
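
    For comparison, the regular-expression route mentioned in #1 might look like the sketch below. It is not the code used above. It assumes Stata 14 or later (for ustrregexm() and ustrregexs()) and assumes that the first run of digits separated by two periods is always the report date. Only three rows of the example data are used, with the padding shortened.

    ```stata
    clear
    input str102 TestData
    "JONES, William 09.07.14 Oth.pdf    10/07/2014 1:07:04 PM  259 KB"
    "SMITH, Suzanne 15.08.2016. HCR20.pdf    15/08/2016 4:00:20 PM  1,190 KB"
    "BROWN, Blake  30.7.14 SNFP.pdf    31/07/2014 8:27:02 AM  389 KB"
    end

    * Capture the first d.m.y pattern: 1-2 digits, 1-2 digits, 2-4 digits
    * (assumption: no earlier part of the string has this shape)
    gen strDate = ustrregexs(1) if ustrregexm(TestData, "([0-9]{1,2}\.[0-9]{1,2}\.[0-9]{2,4})")

    * Same conversion as before; 2050 caps the century for two-digit years
    gen DDate = daily(strDate, "DMY", 2050)
    format DDate %td
    gen Year = year(DDate)
    gen Month = month(DDate)

    list strDate DDate Year Month
    ```

    Listing strDate and DDate next to Year and Month makes it easy to check each step against the original string.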



    • #3
      Nick,

      Many thanks for the elegant code, which did exactly what I wanted. I can now also experiment with the arguments in the first two lines to get a better idea of what they do. My own attempts were based on answers to other enquiries on the Stata forum. Thanks again, Bob
