
  • Generate dummies from list of strings

    Dear STATAlisters,

    I have a string variable that tells me what insurances are accepted by individual doctors.
    For example, for the first doctor in the dataset we have that:

    (The "|" characters are the result of a scraping algorithm developed with other software.)

    Each doctor accepts different insurance brands, and a different number of them.
    I would like to create a dummy variable for every insurance.
    I think this involves (at least) two tasks:

    1) I presume the first step is to create the list of all the insurances accepted by the doctors in my data set.
    Is there a way to do this other than creating the list by hand?

    2) Once I have this list (or vector) of all the insurances accepted by doctors in the data set, I can create a dummy for each insurance company.
    For example:

    gen Aetna = .
    replace Aetna = 0 if Insurance!="NA"
    replace Aetna = 1 if regexm(Insurance, "(Aetna)")
    One question here: can I use regexm to set Aetna = 0 when Insurance does not contain "Aetna"?
    Last edited by FLuca; 09 Oct 2019, 09:30.

  • #2
    Cross-posted elsewhere. Please note our policy on cross-posting, which is explicit in the FAQ Advice: you should tell us about it.

    Regular expressions are great, but very often even more basic string functions are overlooked.

    gen Aetna = strpos(Insurance, "Aetna") > 0
    is a perfectly good way to get an indicator variable.
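
    As a minimal sketch of the two approaches side by side: the toy data below are invented for illustration, and the variable name Aetna2 is hypothetical. It assumes, as in the question, that "NA" marks doctors with no insurance information, and it shows that regexm() can be negated with ! to code the zeros.

```stata
* Toy data (invented); the "|" delimiter and the "NA" code follow the question
clear
input str40 Insurance
"Aetna|Cigna"
"Cigna|Humana"
"NA"
end

* strpos(): 1 if "Aetna" appears, 0 otherwise, missing where Insurance is "NA"
generate byte Aetna = strpos(Insurance, "Aetna") > 0 if Insurance != "NA"

* regexm(): the same indicator, with ! negating the match to code the zeros
generate byte Aetna2 = .
replace Aetna2 = 0 if !regexm(Insurance, "Aetna") & Insurance != "NA"
replace Aetna2 = 1 if regexm(Insurance, "Aetna")

* The two indicators should agree on every observation
assert Aetna == Aetna2
```

    Note that without the "if Insurance != "NA"" qualifier, the strpos() one-liner codes the "NA" rows as 0 rather than missing.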


    • #3
      Thank you Nick Cox for the strpos suggestion on issue 2).
      [I mentioned the cross-posting on Stack Overflow but forgot to do so here; apologies for that.]
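
      For issue 1), one possible approach (a sketch only, not something proposed in the thread) is to let Stata build the list: split the string on "|", collect the distinct pieces with levelsof, and loop over them. The toy data, the ins_part stub, and the local macro names are all hypothetical; strtrim() assumes Stata 14 or later (trim() in older versions).

```stata
* Toy data (invented); the "|" delimiter and the "NA" code follow the question
clear
input str40 Insurance
"Aetna|Cigna"
"Cigna|Blue Cross"
"NA"
end

* One new variable per "|"-separated piece (ins_part1, ins_part2, ...)
split Insurance, parse("|") generate(ins_part)
unab parts : ins_part*

* Collect the distinct insurer names across all pieces
local all
foreach v of local parts {
    quietly replace `v' = strtrim(`v')
    quietly levelsof `v', local(lv)
    local all : list all | lv
}

* One indicator per insurer: missing for "NA", else 1 if listed, 0 if not
foreach name of local all {
    if `"`name'"' == "NA" continue
    * strtoname() turns e.g. "Blue Cross" into the legal name Blue_Cross
    local vname = strtoname(`"`name'"')
    generate byte `vname' = 0 if Insurance != "NA"
    foreach v of local parts {
        quietly replace `vname' = 1 if `v' == `"`name'"'
    }
}

list Insurance Aetna Cigna Blue_Cross
```

      Matching whole pieces rather than substrings avoids false positives when one insurer's name is contained in another's. Two names that map to the same variable name, or names longer than Stata's 32-character limit, would need handling by hand.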