In the data below, if the value of probability_weight for a given cpc4 is at least 0.5, I need to create a final variable, naics07_final, that takes the value of the naics07_2 column for that observation.
NAICS codes identify US industries. The complication is that some sectors span several two-digit codes: 31-33 are manufacturing, 44-45 are retail, and 48-49 are transportation. For each CPC4, I want to sum the probability weights of the observations that fall within the same group (31-33, 44-45, or 48-49). That way I can tell whether the combined probability_weight for that CPC4 reaches my threshold of 0.5 within a single NAICS sector. Can anyone tell me how I can do this?
My initial attempt is below, but it doesn't sum the probabilities across 31-33, 44-45, and 48-49.
Code:
* Define labels for industries
* (no trailing /// or -end-: label define ends with its last value/label pair)
label define industry_labels ///
    11 "Agriculture, Forestry, Fishing and Hunting" ///
    31 "Manufacturing" ///
    32 "Manufacturing" ///
    33 "Manufacturing" ///
    42 "Wholesale Trade" ///
    44 "Retail Trade" ///
    45 "Retail Trade" ///
    48 "Transportation and Warehousing" ///
    49 "Transportation and Warehousing"
Code:
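* Start with missing, then copy the NAICS code where the weight reaches 0.5
gen naics07_final = .
* cpc4 is a string variable, so missing is "" (not "."); missing() handles that
replace naics07_final = naics07_2 if probability_weight >= 0.5 & !missing(cpc4)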
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str4 cpc4 byte naics07_2 float probability_weight
"A01B" 11        1
"A01C" 11        1
"A01D" 11 .1844643
"A01D" 23 .8155357
"A01F" 11 .4023609
"A01F" 23 .5976391
"A01G" 11 .8568231
"A01G" 22 .0223281
"A01G" 23 .1208488
"A01H" 11  .097597
"A01H" 31 .9024029
"A01J" 31        1
"A01K" 11 .9318168
"A01K" 23 .0312263
"A01K" 31 .0369569
"A01L" 32 .9562588
"A01L" 33 .0437413
"A01M" 11  .787376
"A01M" 23 .1239584
"A01M" 32 .0886656
"A01N" 11 .1843447
"A01N" 31 .0819803
"A01N" 32  .733675
"A21B" 31        1
"A21C" 31        1
"A21D" 31        1
"A22B" 11 .0859366
"A22B" 31 .8831139
"A22B" 33 .0309495
"A22C" 11 .2808393
"A22C" 31 .7191607
"A23B" 11 .2927963
"A23B" 31 .7072037
"A23C" 11 .0557356
"A23C" 31 .9442644
"A23D" 11 .1084658
"A23D" 31 .8915342
"A23F" 31        1
"A23G" 31        1
"A23J" 11 .1541328
"A23J" 31 .8458672
"A23K" 11 .3007674
"A23K" 31 .6992326
"A24C" 33 .0269198
"A24D" 31 .9561672
"A24D" 32 .0438328
"A24F" 31        1
"A41B" 31 .9496588
"A41B" 32 .0503412
"A41C" 31 .8472361
"A41C" 32 .1527639
"A41D" 31        1
"A41F" 31 .9394662
"A41F" 32 .0316944
"A41F" 33 .0288394
"A41G" 32 .9763525
"A41G" 33 .0236475
"A41H" 31 .7940938
"A41H" 32  .047757
"A41H" 33 .1581493
end
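One possible way to get the summed weights described above is sketched here. It is only a sketch under some assumptions: naics_sector and sector_weight are hypothetical helper names, the first code of each range (31, 44, 48) is used arbitrarily to stand for the whole sector, and naics07_final is assumed not to exist yet (i.e. this is run on the data example as loaded, not after my attempt above).

```stata
* Hypothetical helper: map multi-code sectors onto one representative code
gen byte naics_sector = naics07_2
replace naics_sector = 31 if inrange(naics07_2, 31, 33)   // manufacturing
replace naics_sector = 44 if inrange(naics07_2, 44, 45)   // retail
replace naics_sector = 48 if inrange(naics07_2, 48, 49)   // transportation

* Sum the probability weights within each cpc4-by-sector cell
bysort cpc4 naics_sector: egen double sector_weight = total(probability_weight)

* Keep the sector code only where the combined weight reaches the threshold
* (assumes naics07_final has not been created already)
gen byte naics07_final = naics_sector if sector_weight >= 0.5
```

With the example data, the two manufacturing rows for "A01N" (codes 31 and 32) would share a combined weight of about 0.816, so both would get naics07_final = 31, while the code 11 row stays missing. If the original two-digit code should be kept instead of the representative one, the last line could assign naics07_2 rather than naics_sector.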