The problem is that the variable ProductSector in #10 is too complicated to transform into a minerals variable like the one you also show there. In addition to the names of some minerals, its values contain extraneous information such as product numbers and long modifiers. It is going to be difficult or impossible to extract just the name of the primary mineral from these, as I can see no structured pattern for finding it. And some values name multiple minerals: "HS - 8112 - Beryllium, chromium, germanium, vanadium, gallium, hafnium, indium, niobium (columbium), rhenium and thallium, and articles of these metals, including waste and scrap." What on earth are we to do with that?
The best I can think of is for you to create a new data set containing two variables. The first variable is just the ProductSector variable you already have. Drop all the duplicates, so that each distinct value appears once. Then, by hand, create a second variable, called minerals, that contains the short description you want, like the ones you show in the second example in #10. Save that data set. Then -merge- it with the original data, and drop the ProductSector variable. Then you can use -collapse- to get the aggregated (summed) data you want.
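In case it helps, here is a rough sketch of those steps. It is untested against your data, and several names are my own placeholders rather than anything from #10: the file names trade_data.dta and mineral_crosswalk.dta, and the numeric variable value that gets summed. If you want totals within year, country, or the like, add those variables to the by() option.

```stata
* Sketch only. trade_data, mineral_crosswalk, and value are hypothetical names.

* Step 1: build a crosswalk with one row per distinct ProductSector value
use trade_data, clear
keep ProductSector
duplicates drop
generate minerals = ""      // empty for now, to be filled in by hand
save mineral_crosswalk, replace

* Step 2 (manual): open mineral_crosswalk.dta, type the short name you want
* into minerals for every row (Data Editor or a series of -replace- commands),
* and save the file again before running the rest.

* Step 3: attach the hand-coded names to the original data
use trade_data, clear
* assert(match) stops with an error if any ProductSector value lacks a crosswalk row
merge m:1 ProductSector using mineral_crosswalk, assert(match) nogenerate
drop ProductSector

* Step 4: aggregate (sum) within each mineral
collapse (sum) value, by(minerals)
```

For the values that name several minerals, you will have to decide in Step 2 what single label they should get; the code cannot make that choice for you. Also note that -merge- will not accept a strL variable as the key, so if ProductSector is stored as strL it would first need to be converted to a str# type.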