
  • Extracting Surname when preceded by abbreviations of first and/or middle names

    Hi Statalisters,

    Apologies for this newbie question. I have searched here for a solution but have not been able to find one so far.

    Below is an excerpt of my dataset. Each observation lists the authors of a peer-reviewed article.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str509 author
    "O. B. Tofler and T. L. Woodings"                                                                                                                    
    "S. Hodgins, S. Lovenhag, M. Rehn and K. W. Nilsson"                                                                                                
    "G. E. Vaillant"                                                                                                                                    
    "W. A. Pridemore, S. Tomkins, K. Eckhardt, N. Kiryanov and L. Saburova"                                                                              
    "W. Zheng, J. K. McLaughlin, G. Gridley, E. Bjelke, L. M. Schuman, D. T. Silverman, S. Wacholder, H. T. Co-Chien, W. J. Blot and J. F. Fraumeni, Jr."
    "U. Bauer and A. Hasenohrl"                                                                                                                          
    "D. J. Pittman and R. L. Tate"                                                                                                                      
    "S. Pell and C. A. D'Alonzo"                                                                                                                        
    "J. Norvig and B. Nielsen"                                                                                                                          
    "H. M. Pettinati, A. A. Sugerman, N. DiDonato and H. S. Maurer"                                                                                      
    "A. M. Gallagher, J. M. Savage, L. J. Murray, G. Davey Smith, I. S. Young, P. J. Robson, C. E. Neville, G. Cran, J. J. Strain and C. A. Boreham"    
    "G. E. Vaillant"                                                                                                                                    
    "B. A. Shaw and N. Agahi"                                                                                                                            
    "J. Storbjork and S. Ullman"                                                                                                                        
    "R. J. Goldberg, C. M. Burchfiel, D. M. Reed, G. Wergowske and D. Chiu"                                                                              
    "E. M. Smith and C. R. Cloninger"                                                                                                                    
    "A. Nunomura, T. Shingae, A. Ikeda, K. Ohta and T. Miyagishi"                                                                                        
    "M. A. Schuckit, J. H. Atkinson, P. L. Miller and J. Berman"                                                                                        
    "W. A. Pridemore and M. B. Chamlin"                                                                                                                  
    "N. Rathod, E. Gregory, D. Blows and G. Thomas"                                                                                                      
    end
    I would like to generate a variable in which each string begins with the surname of the first author in the list, that is, with any preceding initials (abbreviations of first and/or middle names) removed.

    For instance, the first observation would now be:

    "Tofler and T. L. Woodings"

    Thank you in advance for your consideration of this query.

    Kareem


    Last edited by Abdul-Kareem Abdul-Rahman; 01 Jul 2016, 06:55.

  • #2
    A first approximation to an appropriate regular expression would appear to be a capital letter followed by one or more lowercase letters. Initials such as "O." are skipped because the capital is followed by a period, not a lowercase letter. This won't work for all possibilities, e.g. O'Brien, von Neumann, Maynard Smith, but it's a start. Here I use moss (SSC).

    Code:
    clear
    input str509 author
    "O. B. Tofler and T. L. Woodings"                                                                                                                    
    "S. Hodgins, S. Lovenhag, M. Rehn and K. W. Nilsson"                                                                                                 
    "G. E. Vaillant"                                                                                                                                     
    "W. A. Pridemore, S. Tomkins, K. Eckhardt, N. Kiryanov and L. Saburova"                                                                              
    "W. Zheng, J. K. McLaughlin, G. Gridley, E. Bjelke, L. M. Schuman, D. T. Silverman, S. Wacholder, H. T. Co-Chien, W. J. Blot and J. F. Fraumeni, Jr."
    "U. Bauer and A. Hasenohrl"                                                                                                                          
    "D. J. Pittman and R. L. Tate"                                                                                                                       
    "S. Pell and C. A. D'Alonzo"                                                                                                                         
    "J. Norvig and B. Nielsen"                                                                                                                           
    "H. M. Pettinati, A. A. Sugerman, N. DiDonato and H. S. Maurer"                                                                                      
    "A. M. Gallagher, J. M. Savage, L. J. Murray, G. Davey Smith, I. S. Young, P. J. Robson, C. E. Neville, G. Cran, J. J. Strain and C. A. Boreham"     
    "G. E. Vaillant"                                                                                                                                     
    "B. A. Shaw and N. Agahi"                                                                                                                            
    "J. Storbjork and S. Ullman"                                                                                                                         
    "R. J. Goldberg, C. M. Burchfiel, D. M. Reed, G. Wergowske and D. Chiu"                                                                              
    "E. M. Smith and C. R. Cloninger"                                                                                                                    
    "A. Nunomura, T. Shingae, A. Ikeda, K. Ohta and T. Miyagishi"                                                                                        
    "M. A. Schuckit, J. H. Atkinson, P. L. Miller and J. Berman"                                                                                         
    "W. A. Pridemore and M. B. Chamlin"                                                                                                                  
    "N. Rathod, E. Gregory, D. Blows and G. Thomas"                                                                                                      
    end
    
    * keep only the first match: a capital letter followed by one or more lowercase letters
    moss author, match("([A-Z][a-z]+)") regex max(1)
    list
    Alternatively, there is an older command for person name extraction, extrname, from the Stata Technical Bulletin:

    Code:
    .  search extrname, stb historical
    
    Search of official help files, FAQs, Examples, SJs, and STBs
    
    STB-13  dm13  . . . . . . . . . . . . . . . . . . . . . Person name extraction
            (help extrname, replstr if installed) . . . . . . . . . . . . W. Gould
            5/93    pp.6--11; STB Reprints Vol 3, pp.25--31
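
    Since #1 asks for the whole string with the leading initials stripped, rather than the surname alone, one possible variation is to delete the initials instead of matching the surname. The following is a minimal sketch, assuming every leading initial is written as a single capital letter followed by a period and a space, as in the posted data. It uses the built-in regexr() function on three rows taken from the example; the variable name wanted is only illustrative.

```stata
* Minimal sketch: drop the leading run of initials from each author string.
clear
input str60 author
"O. B. Tofler and T. L. Woodings"
"G. E. Vaillant"
"U. Bauer and A. Hasenohrl"
end

* ^([A-Z]\. )+ matches one or more "capital, period, space" groups,
* anchored at the start of the string; regexr() replaces that match with "".
* The variable name wanted is an assumption made for illustration.
generate wanted = regexr(author, "^([A-Z]\. )+", "")
list wanted, clean noobs
```

    For the first observation this should give "Tofler and T. L. Woodings", as requested in #1. Initials written differently (for example without a space after the period, or joined by a hyphen) would not be removed by this pattern and would need a looser expression.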
