
  • Keep observations with matching keyword in string variable text

    I would like to restrict my analysis to observations in which a string variable contains a particular keyword.

    Each date variable records when the corresponding food combination was taken.

    I want to restrict my analysis to observations with any food combination that contains "FF", so long as that combination was taken in 2014 or later.


    keep if ...............(food item 1 or food item 2 or food item 3 contains "FF") & (date for the food item with "FF" is >= 01/01/2014)

    An example below:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str36(ID FOOD_1  FOOD_2 FOOD_3) long(date_1 date_2 date_3)
    1 "FF_TO_SS"  "BUSY"        "FF_GA_SS"      17342 18216 18596
    2 "TO_EA_EA"  ""                 ""                 19044     .     .
    3 "TO_EA_SS"  "BUSY"        "TO_EA_SS"      19970 20495 20507
    4 "FF_GA"      "TO_EA_SS"      "BUSY"        17609 17784 18226
    5 "FF_GA_SS"  "BUSY"        "FF_GA_LP"     16138 16272 16457
    6 "TO_EA_SS"  "BUSY"        "TO_EA_SS"      20677 20794 21018
    7  "FF_GA_SS"  "TO_EA_SS"      ""                 15805 19940     .
    8 "TO_EA_SS"  "BUSY"        ""                 18788 21300     .
    9 "TO_EA_SS"  "BUSY"        "FF_GA_SS"      17981 18167 18444
    10 "FF_GA_SS"  "TO_EA_SS"      "BUSY"        17804 18226 18721
    11 "FF_GA_SS"  "TO_EA_SS"      ""                 16120 20465     .
    12 "TO_EA_SS"  "BUSY"        "TO_EA_SS"      17721 17916 18113
    13 "FF_GA_NA"  "GA_TO_NA"      ""                 16437 21074     .
    14 "TO_EA_SS"  ""                 ""                 19753     .     .
    15 "TO_EA_SS"  "GA_LP_ABC"     ""                 19100 19711     .
    16 "FF_GA_SS"  "LP_TO_EA"     "LP_TO_AT_EA" 17792 18568 19436
    17 "TO_EA_SS"  ""                 ""                 19955     .     .
    18 "FF_GA_SS"  "GA_TO_SS"      "TO_EA_SS"      17735 18975 19872
    19 "TT_GA_SS"  "FF_GA_SS"      "TO_EA_SS"      17006 17123 18080
    end

  • #2
    As with so many things in Stata, this problem appears complicated only because you are working with your data in wide layout. If you switch to long, it is easy:

    Code:
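    * Stack the three FOOD_#/date_# pairs: one row per ID and food item
    reshape long FOOD_ date_, i(ID) j(_j)
    * Flag every row of an ID that has an "FF" item dated 2014 or later;
    * missing dates are excluded because missing compares as larger than any number
    by ID, sort: egen condition = max(strpos(FOOD_, "FF") > 0 & year(date_) >= 2014 ///
        & !missing(date_))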
    From there you can either -keep if condition-, or you can do your analysis qualified by -if condition-.
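
    A minimal sketch of those two options, assuming the long-layout data and the condition variable created above. The -count- command is only a stand-in for whatever analysis is actually intended.

    ```stata
    * Option A: leave the data intact and qualify each command with -if-
    * (this counts rows, i.e. food items, belonging to qualifying IDs)
    count if condition

    * Option B: permanently drop every ID with no "FF" item dated 2014 or later
    * (all rows of a qualifying ID are kept, not just its "FF" rows)
    keep if condition
    ```

    Because condition is constant within ID, both options select whole IDs rather than individual food items. To select only the qualifying "FF" rows themselves, the expression inside max() could be used directly as the -if- qualifier instead.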

    If there is some compelling reason you need to go back to wide layout, you can do that with just -reshape wide-. But think twice before doing that. There are only a small number of things in Stata that are easier to do with wide data. Unless you know you're going to do one of them, stick with long.
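
    If wide layout is needed after all, the return trip might look like the sketch below. It assumes the data are still in the long layout produced by the -reshape long- call above and that no new variable varying within ID has been added since, as -reshape wide- will stop with an error if it finds one.

    ```stata
    * Go back to one row per ID; FOOD_1-FOOD_3 and date_1-date_3 are rebuilt
    * from the stub names, and condition is carried along because it is
    * constant within ID
    reshape wide FOOD_ date_, i(ID) j(_j)
    ```

    The variable lists and options are spelled out here for clarity. Since Stata remembers the settings of the preceding -reshape long-, typing -reshape wide- alone should do the same thing.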
