I still have a question.
As shown in the table I posted in my previous reply, some variables, such as dpb1_229, have longer prefixes than others, such as a_9_f or a_62_g.
In the code Andrew wrote, the prefix length is fixed at 4:
Code:
if regexm("`pairs'", substr("`var'", 1, 4)){
In my real data, several variables share the same prefix, such as dpb1_170_t/dpb1_170_y or dpb1_178_s/dpb1_178_x/dpb1_178_s, so I listed these prefixes in the local macro "pairs" at the start of the code as follows:
Code:
local pairs "a_9_ a_62_ a_76_ a_95_ a_97_ a_99_ a_114_ a_116_ a_152_ a_156_ c_275_ c_156_ c_152_ c_116_ ///
    c_99_ c_95_ c_9_ b_325_ b_305_ b_282_ b_163_ b_156_ b_116_ b_114_ b_99_ b_97_ b_95_ b_80_ b_77_ b_70_ b_69_ ///
    b_67_ b_66_ b_45_ b_24_ b_z8_ b_z10_ b_z16_ b_z21_ b_z23_ drb1_233_ drb1_231_ drb1_189_ drb1_181_ drb1_180_ ///
    drb1_166_ drb1_149_ drb1_142_ drb1_140_ drb1_133_ drb1_120_ drb1_112_ drb1_104_ drb1_98_ drb1_96_ drb1_74_ ///
    drb1_71_ drb1_70_ drb1_67_ drb1_60_ drb1_57_ drb1_38_ drb1_37_ drb1_31_ drb1_30_ drb1_28_ drb1_26_ drb1_16_ ///
    drb1_13_ drb1_11_ drb1_10_ drb1_9_ drb1_4_ drb1_z1_ drb1_z16_ drb1_z17_ drb1_z24_ drb1_z25_ dqb1_224_ dqb1_221_ ///
    dqb1_220_ dqb1_203_ dqb1_197_ dqb1_185_ dqb1_182_ dqb1_167_ dqb1_140_ dqb1_130_ dqb1_126_ dqb1_125_ ///
    dqb1_116_ dqb1_87_ dqb1_86_ dqb1_74_ dqb1_71_ dqb1_70_ dqb1_57_ dqb1_55_ dqb1_37_ dqb1_30_ dqb1_26_ ///
    dqb1_9_ dqb1_3_ dqb1_z4_ dqb1_z5_ dqb1_z6_ dqb1_z9_ dqb1_z10_ dqb1_z17_ dqb1_z18_ dqb1_z21_ dqb1_z27_ ///
    dpb1_9_ dpb1_35_ dpb1_55_ dpb1_65_ dpb1_76_ dpb1_96_ dpb1_170_ dpb1_178_ dpb1_205_"
When I ran the program with a prefix length of 4, Stata could not distinguish some prefixes, such as a_114_ and a_116_ (both are truncated to a_11), so I changed the length from 4 to 5.
However, Stata still could not distinguish longer prefixes, such as dpb1_170_ and dpb1_178_ (both are truncated to dpb1_), and the resulting regressions included many variables with the prefixes dqb1_, drb1_, and dpb1_.
Conversely, when I changed the length from 4 to 8, Stata could not properly handle variables with shorter prefixes, such as a_9_ or a_62_ (a name like a_9_f is kept whole and is not found in "pairs"), so variables with these prefixes were entered in separate regressions.
So my question is whether there is any way to tell Stata to handle all of the prefixes listed above properly.
Any comments and suggestions would be highly appreciated.
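A minimal sketch of why a fixed length cannot work, and of one possible variable-length alternative. It assumes that every name has the form prefix + suffix, where the suffix is whatever follows the last underscore (dpb1_170_t gives dpb1_170_, a_9_f gives a_9_), and it uses a short, hypothetical list of names instead of the real data. The prefix is taken with strrpos() (Stata 14 or later) and compared with the entries of "pairs" as whole words via the extended macro function list posof, rather than by a regexm() substring search. The locals vars, grp1, grp2, ... and single are illustrative names only, not part of Andrew's code.

```stata
* Hypothetical example names and a shortened "pairs" list
local vars  "a_9_f a_62_g a_114_k a_116_m dpb1_170_t dpb1_170_y dpb1_178_s dpb1_178_x"
local pairs "a_9_ a_62_ a_114_ a_116_ dpb1_170_ dpb1_178_"

* Why a fixed length fails: with 4 characters both names reduce to "a_11"
display substr("a_114_k", 1, 4)
display substr("a_116_m", 1, 4)

* Start with one empty group per prefix and an empty list of unmatched names
local npairs : word count `pairs'
forvalues i = 1/`npairs' {
    local grp`i'
}
local single

foreach var of local vars {
    * Prefix = everything up to and including the last underscore
    * (assumes the suffix never contains an underscore; strrpos() needs Stata 14+)
    local pre = substr("`var'", 1, strrpos("`var'", "_"))
    * Exact, whole-word lookup: returns 0 if the prefix is not in "pairs"
    local pos : list posof "`pre'" in pairs
    if `pos' > 0 {
        local grp`pos' `grp`pos'' `var'
    }
    else {
        local single `single' `var'
    }
}

* List the variables collected under each prefix
forvalues i = 1/`npairs' {
    local p : word `i' of `pairs'
    display "`p': `grp`i''"
}
display "no listed prefix: `single'"
```

Because the prefix length now follows each name, a_9_ and dpb1_170_ are handled by the same rule, and a name whose extracted prefix is not in "pairs" ends up in single. In a real program the lists in grp1, grp2, ... would take the place of the fixed-length regexm() test.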