Dear Statalist,
I am struggling with a specific task of simplifying my data.
Basically, I am working with diseases and for my further research, I would like to group them by specific keywords.
As there are over 150,000 possible diseases in my dataset, it is impossible to group them by hand.
What I would like to do is search the string variable "disease" for a keyword and, wherever it is found, replace the value of my "disease_group" variable with that keyword.
Here is an example of my data:
As a remark: the disease_group is missing in all observations at the moment.
I hope I have posted the data in an understandable fashion.
As an example, I want to group all diseases containing the keyword "typhus" (as in A011 - A014) into one "disease_group" with the value "Typhus".
Code:
clear
input str5 icd10 str222 disease str1 disease_group
"A00" "Cholera" ""
"A000" "Cholera durch Vibrio cholerae O:1, Biovar cholerae" ""
"A001" "Cholera durch Vibrio cholerae O:1, Biovar eltor" ""
"A009" "Cholera, nicht näher bezeichnet" ""
"A01" "Typhus abdominalis und Paratyphus" ""
"A010" "Typhus abdominalis" ""
"A011" "Paratyphus A" ""
"A012" "Paratyphus B" ""
"A013" "Paratyphus C" ""
"A014" "Paratyphus, nicht näher bezeichnet" ""
"A02" "Sonstige Salmonelleninfektionen" ""
"A020" "Salmonellenenteritis" ""
"A021" "Salmonellensepsis" ""
"A022" "Lokalisierte Salmonelleninfektionen" ""
"A028" "Sonstige näher bezeichnete Salmonelleninfektionen" ""
"A029" "Salmonelleninfektion, nicht näher bezeichnet" ""
"A03" "Shigellose [Bakterielle Ruhr]" ""
"A030" "Shigellose durch Shigella dysenteriae" ""
"A031" "Shigellose durch Shigella flexneri" ""
"A032" "Shigellose durch Shigella boydii" ""
"A033" "Shigellose durch Shigella sonnei" ""
end
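To make the intended operation concrete, here is a rough sketch on a few rows of the example data. It is only an illustration under assumptions, not a worked-out solution: the keyword list is hypothetical, each keyword is reused as the group label, and the matching relies on strpos() with lower() for a case-insensitive substring search. Whether this scales sensibly to around 100 groups is exactly what is unclear.

```stata
clear
input str5 icd10 str60 disease str1 disease_group
"A009" "Cholera, nicht näher bezeichnet" ""
"A010" "Typhus abdominalis" ""
"A011" "Paratyphus A" ""
"A020" "Salmonellenenteritis" ""
end

* Hypothetical keyword list; each keyword also serves as the group label
local keywords "Cholera Typhus Salmonell"

foreach k of local keywords {
    * strpos() returns 0 when the keyword does not occur in the string;
    * lower() on both sides makes the match case-insensitive,
    * so "Paratyphus" is matched by the keyword "Typhus".
    * Only rows that are still ungrouped are filled, so the first
    * matching keyword in the list wins.
    replace disease_group = "`k'" ///
        if strpos(lower(disease), lower("`k'")) > 0 & disease_group == ""
}

list icd10 disease disease_group, noobs
```

Since replace promotes the string storage type by default, disease_group can start out as str1 as in the data example. Keywords containing spaces would not work with this simple space-separated list and would need to be quoted individually.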
The goal is to break the diseases down into around 100 disease groups, so I will have to repeat this for many different keywords.
I tried the foreach command in combination with lookfor, but did not really get close to a solution.
I would really appreciate some help and hope to have explained my problem adequately.
With kind regards,
Torben