Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Working with strings containing multiple values. Need to replace the values and count the number of records (not counting any record twice)

    I am a Salesforce admin and a Stata novice. A solution to my problem would help working with Salesforce multi-select fields and any similar strings containing multiple values.

    I put together an example to explain my problem better:

    record number - value - result I need to get and keep

    1 - a,b,c - book
    2 - c,b,d - book
    3 - b,c - magazine
    4 - c - newspaper

    When I used the code below, I did not get the correct count, since all values were counted every time they appeared in each string:

    replace value = "book" if strpos(value, "a") | if strpos(value, "d")
    replace value = "magazine" if strpos(value, "b")
    replace value = "newspaper" if strpos(value, "c")

    If I drop each value after it gets counted, I get the correct counts for all observations, but almost the whole dataset gets deleted!

    The preserve and restore commands do not work, any changes made to the data after preserve get undone by restore.

    Seems like some kind of a "for" loop would work for this task if I could loop through the data without deleting them, but I don't know how to write it.

    Your help would be greatly appreciated!

    Dina

  • #2
    So what you need is to make your -replace- statements hierarchical, so that they do not overwrite the previous results:

    Code:
    gen value = "book" if strpos(value, "a") | if strpos(value, "d")
    replace value = "magazine" if strpos(value, "b") & missing(value)
    replace value = "newspaper" if strpos(value, "c") & missing(value)
    This way, once you pick a "value" in one command , it will not be overwritten by the later commands which are lower in the hierarchy.

    You have already noticed that -preserve- and -restore- do not serve this purpose. They are specifically for the purpose of doing things that require changes to the data along the way, but that need to then be "undone." Their effect is, in a loose sense, the exact opposite of what you need here.
    Last edited by Clyde Schechter; 19 Apr 2016, 12:51. Reason: Add bold-face to salient parts of code3.

    Comment


    • #3
      I don't understand this at all. It would help to give literal examples of your data, not schematic examples. Alternatively if the words "book" "magazine" "newspaper" really do occur in your data, then you just need to look for them.

      Similarly, you should give exact code, not abstract code.

      Code:
      ... if strpos(value, "a") | if strpos(value, "d")
      would be an illegal statement as the code can only include one if, so I guess that you are rewriting what you had.

      More crucially, perhaps, looking for any single letter using strpos() would return true if the letter is found anywhere in the text. That seems likely to be fatal to your aim.

      Complaining that
      restore undoes preserve is puzzling, as that's exactly what it is supposed to do, but as you don't give any examples of using it I can't make a more positive suggestion.

      Comment


      • #4
        Nick correctly identifies a syntax error in the first -replace- statement which I overlooked and replicated in my corresponding -gen- statement.

        I'm guessing that Dina's use of "a", "b", etc. may be literal, or she is just using these as examples and really refer to more complex expressions in her data.

        Comment


        • #5
          Thank you Clyde and Nick! The syntax error was a mistake in my post; I did not have it in the code. Clyde was right, I used a,b,c as examples, since I did not want to copy my data - would be too cumbersome,

          Clyde, you solved my problem, thank you so much! Here is what I did wrong (now I am using a part of my real code) if you care to know:

          My first try:
          gen source = sourceofmonthlyincome
          replace source = "Employment" if strpos(sourceofmonthlyincome, "Full-time") | strpos(sourceofmonthlyincome, "Part-time")
          replace source = "Retirement" if strpos(sourceofmonthlyincome, "Retirement")
          etc.....

          Since the calculations were incorrect, I did this (second try):
          gen source = sourceofmonthlyincome
          replace source = "Employment" if strpos(sourceofmonthlyincome, "Full-time") | strpos(sourceofmonthlyincome, "Part-time")
          drop if strpos(sourceofmonthlyincome, "Full-time") | strpos(sourceofmonthlyincome, "Part-time")
          replace source = "Retirement" if strpos(sourceofmonthlyincome, "Retirement")
          drop if strpos(sourceofmonthlyincome, "Retirement")
          etc...

          When I dropped the generated values, I got the correct counts, but lost my data :-)

          When I looked at your code above, I realized that instead of replacing values in the original field "sourceofmonthlyincome", I should have replaced them in the "source"... This worked:

          gen source = sourceofmonthlyincome
          replace source = "Employment" if strpos(source, "Full-time") | strpos(source, "Part-time")
          replace source = "Retirement" if strpos(source, "Retirement")
          etc...

          Thanks again,
          Dina

          Comment

          Working...
          X