Hi everyone, I have a question about syntax. I have a huge text database, and I'm searching through hundreds of thousands of entries for keywords. I want to create indicator variables flagging the observations that contain each keyword, and then combine those indicators (using the egen group command) into a single variable showing which keywords appear in each observation. Sample data and code are below. My code creates the indicators I want but not the labels. I'd like the variable act_Industry_A to have the label A and the variable act_Industry_B to have the label B. After grouping, I'd like the variable act_Industry to have labels indicating A, B, A B, or blank. In my code, the label var command does not appear to create a label, and I cannot figure out why. Please help if you can.
clear
input year str10 title_proper
year title_proper
1900 "z z z A"
1901 "z z z B"
1902 "z z A C"
1903 "z z A B"
1904 "z z z z"
end
global Industries "A B"
foreach I in $Industries {
    gen act_Industry_`I' = 1 if regexm(title_proper,"`I'")==1
    replace act_Industry_`I' = 0 if act_Industry_`I'==.
    label var act_Industry_`I' "`I'"
}
egen act_Industry = group(act_Industry_*), label
br
label list
Note the output from browse is:
year  title_proper  act_Industry_A  act_Industry_B  act_Industry
1900  z z z A       1               0               1 0
1901  z z z B       0               1               0 1
1902  z z A C       1               0               1 0
1903  z z A B       1               1               1 1
1904  z z z z       0               0               0 0
The indicators are correct, but act_Industry_A and act_Industry_B lack labels. The labels for the last variable are strings of numbers (e.g. 1 0), but I wanted strings of letters (e.g. A, B, A B, or blank).
Thanks
Gary
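For reference, here is a minimal sketch of one way the letter labels might be produced. It rests on two points: label var attaches a variable label (visible with describe, not with label list, which shows only value labels), and egen group() with the label option builds its labels from the values or value labels of the source variables, not from their variable labels. The sketch sidesteps that by assembling the text directly and encoding it. The names act_Industry_txt and act_Industry_grp are hypothetical, and the code assumes the 0/1 indicators from the loop above already exist.

```stata
* Sketch only: assumes act_Industry_A and act_Industry_B exist as 0/1 indicators
* and that the global Industries is still defined as "A B".
* act_Industry_txt and act_Industry_grp are hypothetical names.
gen act_Industry_txt = ""
foreach I of global Industries {
    * Append the keyword, space-separated, wherever its indicator equals 1
    replace act_Industry_txt = trim(act_Industry_txt + " `I'") if act_Industry_`I' == 1
}
* encode turns the distinct strings into a value-labeled numeric variable;
* observations with an empty string are left as missing
encode act_Industry_txt, generate(act_Industry_grp)
label list act_Industry_grp
```

With the sample data, the value labels should come out as A, A B, and B, with the 1904 observation left missing rather than given a blank label.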