  • Conditioning strpos?

    I am wondering whether it would be possible to extract information on a number of different words within a string variable in more or less the same command. I have a string variable containing several city names/places and I want to identify a selection of these in my dataset.

    My variable looks like this (which is why I need to extract):
    actiongeo_fullname
    Damascus, Dimashq, Syria
    Irbid, Halab, Syria
    Irbid, Halab, Syria
    Irbid, Halab, Syria
    Irbid, Halab, Syria
    Golan Heights, Syria (general), Syria
    Majdal Shams, Al Qunaytirah, Syria
    Majdal Shams, Al Qunaytirah, Syria

    I thought the following command would be helpful, but it only returns largecity = 1 for all observations:
    gen largecity = strpos(actiongeo_fullname, "aleppo") | strpos(actiongeo_fullname, "halab") | strpos(actiongeo_fullname, "damascus") | strpos(actiongeo_fullname, "dimashq") | strpos(actiongeo_fullname, "homs")


    Is it the case that I have to make separate variables for each word/city, or is there a way to do this without separating the different places/city names from the string?

    Thanks!

  • #2
    You can use split to make separate variables without affecting your original string variable, something like the example below. Keep in mind that Stata is case-sensitive.

    . version 14.1

    . clear *

    . set more off

    . input str244 actiongeo_fullname

                                      actiongeo_fullname
      1. "Damascus, Dimashq, Syria"
      2. "Irbid, Halab, Syria"
      3. "Irbid, Halab, Syria"
      4. "Irbid, Halab, Syria"
      5. "Irbid, Halab, Syria"
      6. "Golan Heights, Syria (general), Syria"
      7. "Majdal Shams, Al Qunaytirah, Syria"
      8. "Majdal Shams, Al Qunaytirah, Syria"
      9. end

    . *
    . * Begin here
    . *
    . split actiongeo_fullname, generate(geo) parse(", ")
    variables created as string:
    geo1  geo2  geo3

    . generate byte large_city = inlist(lower(geo2), "aleppo", "halab", "damascus", "dimashq", "homs")

    . list, noobs separator(0) abbreviate(20)

      +----------------------------------------------------------------------------------------------+
      |                    actiongeo_fullname            geo1              geo2    geo3   large_city |
      |----------------------------------------------------------------------------------------------|
      |              Damascus, Dimashq, Syria        Damascus           Dimashq   Syria            1 |
      |                   Irbid, Halab, Syria           Irbid             Halab   Syria            1 |
      |                   Irbid, Halab, Syria           Irbid             Halab   Syria            1 |
      |                   Irbid, Halab, Syria           Irbid             Halab   Syria            1 |
      |                   Irbid, Halab, Syria           Irbid             Halab   Syria            1 |
      | Golan Heights, Syria (general), Syria   Golan Heights   Syria (general)   Syria            0 |
      |    Majdal Shams, Al Qunaytirah, Syria    Majdal Shams     Al Qunaytirah   Syria            0 |
      |    Majdal Shams, Al Qunaytirah, Syria    Majdal Shams     Al Qunaytirah   Syria            0 |
      +----------------------------------------------------------------------------------------------+

    . exit

    end of do-file


    • #3
      I forgot to mention that if you need to scan the first set of words (geo1 above) as well as the second, then you can add the following to what is shown above.
      Code:
      quietly replace large_city = 1 if inlist(lower(geo1), "aleppo", "halab", "damascus", "dimashq", "homs")
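
      For anyone who would rather not split the string at all, a minimal sketch of a substring-based alternative follows. It assumes that matching anywhere in the full name is acceptable; strpos() is case-sensitive, so the string is lowercased once before searching. The names lc_name and large_city2 are made up for this illustration, and the input data are a shortened copy of the example above. Because this is substring matching, a search term would also match inside a longer place name, so the split-and-inlist approach is safer when exact names matter.

      ```stata
      clear
      input str244 actiongeo_fullname
      "Damascus, Dimashq, Syria"
      "Irbid, Halab, Syria"
      "Golan Heights, Syria (general), Syria"
      "Majdal Shams, Al Qunaytirah, Syria"
      end

      * Lowercase once so that the case-sensitive strpos() can find lowercase terms
      * (lc_name is a hypothetical helper variable)
      generate str244 lc_name = lower(actiongeo_fullname)

      * Flag an observation if any search term occurs anywhere in the string;
      * strpos() returns 0 when the term is absent, a positive position otherwise
      generate byte large_city2 = 0
      foreach city in aleppo halab damascus dimashq homs {
          quietly replace large_city2 = 1 if strpos(lc_name, "`city'") > 0
      }

      list actiongeo_fullname large_city2, noobs
      ```

      Adding another place only means adding a word to the foreach list, but multi-word names would need quoting in that list.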


      • #4
        Thanks a lot Joseph!
