  • Splitting a string using parse

    I have the following variable "location" which contains the city and state name. I need to split it into a new variable which contains only the city name since I already have the state names.
    [Screenshot: Screen Shot 2017-08-14 at 6.25.15 PM.png]



    I have used the following command to split the data:

    . split location, parse(,) generate(temp)

    Unfortunately, some observations, such as the first one, contain additional text (18th and vine) that I do not need and that shifts the split pieces out of alignment (I need Kansas City to be in the first column).
    [Screenshot: Screen Shot 2017-08-14 at 6.30.22 PM.png]



    How can I split the data so that I can extract only the city name?

    Thanks

  • #2
    Since you show your data as a screen shot (which in the FAQ you are specifically told not to do) I can't import your example into Stata to test this code. So it may contain errors.

    Code:
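    * Reverse the string so that the state comes first and the city second
    gen rloc = reverse(location)
    * Split the reversed string at each comma
    split rloc, parse(",") gen(temp)
    * Reverse the second piece back to recover the city
    replace temp2 = reverse(temp2)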
    As I say, not tested because you made it impossible to do so, but I believe this will give you the city in temp2.

    There is probably a way to do this with regular expressions as well, though I don't have a good grasp of those.
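
    A minimal sketch of what a regular-expression version might look like, assuming Stata 14 or later (for the Unicode regex functions) and assuming every value ends in ", state". The toy data and the variable name city are hypothetical, and this has not been run against the data in #1.

    ```stata
    * Toy data (hypothetical; mimics the layout described in #1)
    clear
    input str60 location
    "18th and Vine, Kansas City, MO"
    "Kansas City, MO"
    "St. Louis, MO"
    end

    * Capture the text just before the last comma (that is, between the
    * last two commas, or from the start when there is only one comma),
    * then trim stray spaces
    gen city = strtrim(ustrregexs(1)) if ustrregexm(location, "([^,]+),[^,]*$")
    list location city
    ```

    Values that contain no comma at all are left empty in city, so they are easy to find with missing(city).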

    Please before posting again make sure you read FAQ #12 and in the future follow those instructions for posting example data (the -dataex- command) and code or Stata output (code delimiters). These are the optimal ways to show these things and they facilitate easy transfer of information between questioners and responders.



    • #3
      I apologize for not posting the data properly. Thank you for your help though.



      • #4
        Hi family,
        Please help me resolve this. I am trying to split my data, which has multiple responses, by using the following commands:
        split HowcanonegetinfectedwithHI
        split HowcanonegetinfectedwithHI, parse(",") gen (HIVtransmission)
        split HowcanonegetinfectedwithHI, parse(",") gen (HIVmodeoftransmission) notrim
        split HowcanonegetinfectedwithHI, parse(",") gen (HIVmodeoftransmi) limit(8)
        split HowcanonegetinfectedwithHI, parse(",") gen (HIVmodeoftransmi2) destring
        tab HIVmodeoftransmi2

        The results I obtained had one of the responses missing, and this gave me 361 instead of my total sample size of 403:

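                             HIVmodeoftransmi2 |      Freq.     Percent        Cum.
        ---------------------------------------+-----------------------------------
                 From infected mother to child |         20        5.54        5.54
        Having unprotected sex with an infec.. |        228       63.16       68.70
                    Kissing an infected person |          8        2.22       70.91
        Sharing sharps with an infected person |         67       18.56       89.47
                                 Through curse |         18        4.99       94.46
                                    witchcraft |         20        5.54      100.00
        ---------------------------------------+-----------------------------------
                                         Total |        361      100.00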



        • #5
          I think we need to see the original data that are not shown here, say through the results of

          Code:
          tab HowcanonegetinfectedwithHI if missing(HIVmodeoftransmi2), missing
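
          For readers following along, here is a small sketch of why such a check can be informative. It rests on an assumption that cannot be confirmed without the data in #4: that some respondents gave fewer comma-separated answers than others. The toy data and the names response and mode are hypothetical.

          ```stata
          * Toy data (hypothetical): the first respondent gave only one answer
          clear
          input str80 response
          "Kissing an infected person"
          "Through curse, witchcraft"
          "From infected mother to child, Sharing sharps with an infected person"
          end

          * -split- creates mode1 and mode2; mode2 is empty when there is no second answer
          split response, parse(",") generate(mode)

          * -tabulate- drops empty strings by default, so the total falls short
          tab mode2

          * The -missing- option and -count- make the shortfall visible
          tab mode2, missing
          count if missing(mode2)
          ```

          If that is what happened, the observations absent from the table are those with nothing in the second piece, and the command above lists what they originally contained.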



          • #6
            Thank you Nick. When I typed the command, the missing respondents showed up with their various categories. Is there a way I can now add them to their respective parent categories in Stata before running the analysis?



            • #7
              Not trying to be flippant, but the answer to #6 is likely to be Yes, but as in #5 you need to show us those results so that we can advise.
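
              In the meantime, here is a hedged sketch of one common pattern for multiple-response data: put every split piece into a single long variable so that each answer is counted under its category. Whether this fits depends on results not yet shown. The toy data and the names id, response, mode and which are all hypothetical.

              ```stata
              * Toy data (hypothetical): one row per respondent, answers separated by commas
              clear
              input id str80 response
              1 "Kissing an infected person"
              2 "Through curse, witchcraft"
              3 "witchcraft, Kissing an infected person"
              end

              * One variable per answer: mode1, mode2, ...
              split response, parse(",") generate(mode)

              * Stack the answers: one row per respondent-answer pair
              reshape long mode, i(id) j(which)
              drop if missing(mode)

              * Each answer now counts under its own category
              tab mode
              ```

              Note that the total is then the number of answers given, not the number of respondents.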

