Dear all,
I have generated one indicator variable per nationality from the variable DMNationality, using this code:
Code:
tab DMNationality, generate(nationality)
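However, DMNationality can contain several nationalities per person, separated by a semicolon (;). For example, person A might have "France; Italy" as their value of DMNationality and person B "Italy;France;Canada".
I can split these into separate variables with the following code: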
Code:
split DMNationality, parse(;) generate(Parse)
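In the example above, "France; Italy" has a space after the semicolon. As far as I know, split trims the original string but not each resulting piece, so Parse2 could end up as " Italy" and then never match "Italy". A minimal sketch that strips such spaces from every piece; the two rows are typed in only for illustration:

```stata
* Two illustrative rows; the first has a space after the semicolon
clear
input str40 DMNationality
"France; Italy"
"Italy;France;Canada"
end

* Split on semicolons into Parse1, Parse2, ...
split DMNationality, parse(;) generate(Parse)

* Remove leading and trailing spaces from every piece
foreach v of varlist Parse* {
    replace `v' = strtrim(`v')
}
list
```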
This results in the following output:
Code:
. split DMNationality, parse(;) generate(Parse)
variables created as string:
Parse1  Parse2  Parse3  Parse4  Parse5
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str63 DMNationality str37 Parse1 str27(Parse2 Parse3) str20 Parse4 str16 Parse5
"Czech Republic;Germany" "Czech Republic" "Germany" "" "" ""
"Czech Republic;Germany" "Czech Republic" "Germany" "" "" ""
"Italy;Austria" "Italy" "Austria" "" "" ""
"Italy;Austria" "Italy" "Austria" "" "" ""
"Austria" "Austria" "" "" "" ""
"Austria" "Austria" "" "" "" ""
"Austria" "Austria" "" "" "" ""
"Austria" "Austria" "" "" "" ""
"Germany" "Germany" "" "" "" ""
"Germany" "Germany" "" "" "" ""
"Austria;Germany" "Austria" "Germany" "" "" ""
"Austria" "Austria" "" "" "" ""
"Austria" "Austria" "" "" "" ""
"Austria" "Austria" "" "" "" ""
"Austria" "Austria" "" "" "" ""
"Switzerland" "Switzerland" "" "" "" ""
"Switzerland" "Switzerland" "" "" "" ""
"Switzerland" "Switzerland" "" "" "" ""
"Switzerland" "Switzerland" "" "" "" ""
"Switzerland;Austria;Australia" "Switzerland" "Austria" "Australia" "" ""
"Switzerland;Austria;Australia" "Switzerland" "Austria" "Australia" "" ""
"Austria" "Austria" "" "" "" ""
"Austria" "Austria" "" "" "" ""
"Switzerland" "Switzerland" "" "" "" ""
"Switzerland" "Switzerland" "" "" "" ""
"Switzerland" "Switzerland" "" "" "" ""
"Switzerland" "Switzerland" "" "" "" ""
"Switzerland" "Switzerland" "" "" "" ""
"Austria;Germany" "Austria" "Germany" "" "" ""
"Germany" "Germany" "" "" "" ""
end
The issue is that Parse1-Parse5 can contain the same values (nationalities). If I used tabulate with generate() on each of these variables separately, several of the resulting indicators would refer to the same nationality.
For example, the variable nationality_Parse1_1, created from Parse1, and the variable nationality_Parse2_13, created from Parse2, might both refer to France (their labels would be "Parse1==France" and "Parse2==France").
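One way around this (a sketch, not the only approach) is to build the indicators from the nationalities themselves rather than from each ParseN separately: first collect the distinct values found in any of Parse1-Parse5, then create one 0/1 variable per nationality that is 1 if that nationality appears in any position. The prefix nat_ is a hypothetical naming choice, and the rows below are taken from the data example above:

```stata
clear
input str40 DMNationality
"Czech Republic;Germany"
"Italy;Austria"
"Austria;Germany"
"Switzerland;Austria;Australia"
"Germany"
end
split DMNationality, parse(;) generate(Parse)

* Collect the distinct nationalities that occur in any Parse variable
preserve
keep Parse*
generate long obs = _n
reshape long Parse, i(obs) j(position)
replace Parse = strtrim(Parse)
drop if Parse == ""
levelsof Parse, local(nats)
restore

* One indicator per nationality, whatever position it was listed in
foreach n of local nats {
    * strtoname() makes a legal name, e.g. nat_Czech_Republic
    local name = strtoname("nat_`n'")
    generate byte `name' = 0
    foreach v of varlist Parse* {
        replace `name' = 1 if strtrim(`v') == "`n'"
    }
    * Label with the nationality itself rather than "Parse1==..."
    label variable `name' "`n'"
}
list DMNationality nat_*
```

Because each indicator is named after the nationality, Austria listed first or second ends up in the same variable, nat_Austria.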
My final goal is to compute a Blau diversity index from these variables.
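The Blau index of a group is usually computed as B = 1 - sum over k of p_k^2, where p_k is the share of the group in category k. The post does not say what the groups are or how a person with several nationalities should count, so the sketch below assumes a hypothetical grouping variable team and treats each listed nationality as one mention (p_k = mentions of k / all mentions in the team). Weighting each person's nationalities by 1/(number held) would be an equally defensible choice. The rows are made up for illustration from values in the data example:

```stata
* Hypothetical teams built from values in the data example
clear
input team str40 DMNationality
1 "Austria"
1 "Austria;Germany"
1 "Germany"
2 "Switzerland"
2 "Switzerland"
2 "Switzerland;Austria;Australia"
end
generate long person = _n
split DMNationality, parse(;) generate(Parse)

* One row per person-nationality pair
keep team person Parse*
reshape long Parse, i(person) j(position)
replace Parse = strtrim(Parse)
drop if Parse == ""

* Mentions of each nationality per team, and their shares
contract team Parse, freq(n_k)
bysort team: egen double total = total(n_k)
generate double p2 = (n_k / total)^2

* Blau index per team: 1 minus the sum of squared shares
collapse (sum) sum_p2 = p2, by(team)
generate double blau = 1 - sum_p2
list
```

With these rows, team 1 has two Austria and two Germany mentions, so B = 1 - (0.5^2 + 0.5^2) = 0.5.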
Does anyone have a solution for my issue?
For example, is there a way to use the last part of the label (e.g. "==France") to update the existing variable instead of creating a new one?
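To do literally what is asked here, i.e. update an existing indicator instead of creating a second one, the indicator can be keyed on the nationality itself (through its variable name) rather than on its label. A minimal sketch that walks through Parse1, Parse2, ... in turn, creates nat_<nationality> the first time a nationality is seen and only replaces it afterwards; the nat_ prefix is again a hypothetical choice:

```stata
clear
input str40 DMNationality
"Czech Republic;Germany"
"Italy;Austria"
"Austria;Germany"
end
split DMNationality, parse(;) generate(Parse)

foreach v of varlist Parse* {
    replace `v' = strtrim(`v')
    * Skip a Parse variable that is empty in every observation
    quietly count if `v' != ""
    if r(N) == 0 continue
    levelsof `v', local(vals)
    foreach n of local vals {
        local name = strtoname("nat_`n'")
        * First time this nationality appears: create the indicator
        * (exact stops e.g. nat_Guinea matching nat_Guinea_Bissau)
        capture confirm variable `name', exact
        if _rc {
            generate byte `name' = 0
            label variable `name' "`n'"
        }
        * Always: flag observations holding it in this position
        replace `name' = 1 if `v' == "`n'"
    }
}
list DMNationality nat_*
```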
Thank you in advance for your time and help!
Best regards,
Laura