Hi,
I have a dataset of firms with different characteristics. One variable was a string variable holding many tags separated by commas.
I created the tags* variables using the following command:
. split tags, p(",")
variables created as string:
tags1 tags2 tags3
Then I used multencode to create a numeric identifier for each tag:
. multencode tags1-tags3, gen(ntags1-ntags3)
See the dataex example below.
The number of tags is not uniform: some firms have 49 tags and some have 3, but all the tags* variables draw on the same set of values.
1) I would like a frequency table pooled over all the tags* variables I created (not tags1, tags2, tags3 separately, but rather how many times, say, "robots" appears anywhere in the whole matrix of tags).
2) I would like to create a dummy variable for each tag. Instead of a column tags1 that can hold different values, each company would have one column per tag, for example a variable named tec which equals 1 if tags1 == "tec" or tags2 == "tec" or tags3 == "tec", and so forth for all the tags and all the values.
I created the example below, but in reality I have more than 5000 firms and 50 possible tags, so doing it manually is impossible.
Thanks a lot
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 company str50 tags str6 tags1 str8 tags2 str7 tags3 byte(ntags1 ntags2 ntags3)
"a" "tec, medical, COV19" "tec" " medical" " COV19" 7 3 2
"b" "robots, COV19, yeast" "robots" " COV19" " yeast" 6 2 5
"c" "tec, AI, mobile" "tec" " AI" " mobile" 7 1 4
"d" "robots, AI" "robots" " AI" "" 6 1 .
end
label values ntags1 tags1
label values ntags2 tags1
label values ntags3 tags1
label def tags1 6 "robots", modify
label def tags1 7 "tec", modify
label def tags1 1 " AI", modify
label def tags1 2 " COV19", modify
label def tags1 3 " medical", modify
label def tags1 4 " mobile", modify
label def tags1 5 " yeast", modify
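
To make the two requests concrete, here is a rough sketch of one possible approach, not necessarily the best one. It re-enters only company and tags from the example so that it runs on its own, works on the string variables rather than the multencode output, and trims the blank that split leaves after each comma. The d_ prefix on the dummy names is my own assumption, added so that a tag can never clash with an existing variable name; with 49 pieces the range tags1-tags3 would become tags1-tags49.

```stata
clear
input str2 company str50 tags
"a" "tec, medical, COV19"
"b" "robots, COV19, yeast"
"c" "tec, AI, mobile"
"d" "robots, AI"
end

* Split on commas; this creates tags1 tags2 tags3
split tags, parse(",")

* Remove the leading blank left after each comma
foreach x of varlist tags1-tags3 {
    replace `x' = trim(`x')
}

* 1) Pooled frequency table: stack all pieces into one long column
preserve
keep company tags1-tags3
reshape long tags, i(company) j(position)
drop if missing(tags)
tabulate tags, sort
* Keep the list of distinct tags for step 2
levelsof tags, local(alltags)
restore

* 2) One 0/1 dummy per distinct tag, equal to 1 if the tag
*    appears in any of the pieces for that company
foreach t of local alltags {
    * strtoname() turns the tag into a legal variable name
    local v = strtoname("d_`t'")
    generate byte `v' = 0
    foreach x of varlist tags1-tags3 {
        replace `v' = 1 if `x' == "`t'"
    }
}

list company d_*
```

The reshape is done inside preserve/restore so the wide layout is back in memory before the dummies are built. Tags that differ only in case or internal spacing would still count as different tags here.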