
  • strpos and whitespace

    I have a list of names and I want to know how many "words" there are by counting the empty space after trimming the front/back/double spaces.
    I am getting numbers that make no sense to me. The table below is an example of my data. I expect the top name to have just 1 space and the other to have 2.
    I manually typed these in to make sure there was no extra space anywhere in the name (before or after)

    Code:
    gen count=.
    forval i=0/10{
    replace count=`i' if strpos(name, " ")==`i'
    }

    name                 count
    john smith               5
    john jacob smith         5

    I searched for info on strpos but the only whitespace related content I got was php stuff which I didn't quite understand.

    I tested this on both Stata IC 12.1 and 13.1.

  • #2
    You have misunderstood strpos(): it returns the position of the first occurrence of the string you are looking for, not a count. Here you get 5 in both cases because the first four characters are "john" and the fifth is " ".

    It is not clear exactly what you are looking for, but see -h wordcount-, which may help (it will tell you how many words you have, not how many spaces).
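
    To make the difference concrete, here is a small sketch using the two names from the original post. It only uses display with string literals, so it does not depend on any dataset:

    ```stata
    * strpos() returns the position of the first match, not a count
    display strpos("john smith", " ")          // 5: the first space is character 5
    display strpos("john jacob smith", " ")    // 5 again: still the first space
    * wordcount() returns the number of space-separated words
    display wordcount("john jacob smith")      // 3
    ```

    If the number of spaces between words is what you want, it should be the word count minus one for any non-empty name.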



    • #3
      Very true. What command would I use if I want to count the instances of whitespace?

      edit: I figured out something that works fine using the wordcount command you suggested.

      Code:
      forval i=0/10{
      replace count=`i' if wordcount(var1)==`i'
      }
      Last edited by Jack Stiles; 23 Jun 2014, 19:33.



      • #4
        The command
        gen foo = length(trim(itrim(var1))) - length(subinstr(trim(itrim(var1))," ","",.))
        will place the number of internal spaces in foo, where multiple consecutive spaces are counted as one. If you truly want to count the instances of internal whitespace (i.e., including tab, newline, etc.), then you'll need to work a bit harder.
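
        One possible way to do that extra work, as a rough sketch only: convert the other whitespace characters to ordinary spaces first, then count as before. The assumption here is that "whitespace" means space, tab (char(9)), newline (char(10)) and carriage return (char(13)); the variable names s and nws are made up for illustration.

        ```stata
        * Work on a copy so var1 itself is left unchanged
        generate s = var1

        * Turn tab, newline and carriage return into ordinary spaces (assumed whitespace set)
        foreach c in 9 10 13 {
            replace s = subinstr(s, char(`c'), " ", .)
        }

        * Drop leading/trailing blanks and collapse internal runs to a single space
        replace s = trim(itrim(s))

        * Spaces remaining = length with spaces minus length without them
        generate nws = length(s) - length(subinstr(s, " ", "", .))
        drop s
        ```

        As with the one-line version, a run of consecutive whitespace characters is counted once, so a value containing "john", a tab, "jacob", two spaces and "smith" would be expected to give nws equal to 2.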
        Last edited by Phil Schumm; 23 Jun 2014, 20:15.



        • #5
          Originally posted by Jack Stiles
          I have a list of names and I want to know how many "words" there are ...
          Code:
          clear all
          
          input str100 name
          "john"
          "john smith"
          "john jacob smith"
          "mike goodwin"
          "cassiopea"
          "  jeff   ivanoff  "
          end
          
          generate w=wordcount(strtrim(name))
          
          list, clean noobs

          Code:
                            name   w  
                            john   1  
                      john smith   2  
                john jacob smith   3  
                    mike goodwin   2  
                       cassiopea   1  
                jeff   ivanoff     2



          • #6
            As others have pointed out, wordcount() (which is a function, not a command) is the most obvious tool here.

            Your comments make me wonder how you are looking for documentation. The on-line help is always the first place to look, and then the manuals. It sounds as if you are just Googling. Looking in turn at

            Code:
            help functions
            help string functions
            would (for example) have made it clear that the function strpos() returns a position, not a count.
