I've got a bundle of about 1200 Excel files that should have uniform column headers (and thus variable names upon import), but don't. That's too many for me to manually review and fix, so I'd like to try to semi-automate this process.
My pseudocode is something like this:
1) Read in first Excel file, having verified that the variables are as I want them to be.
2) Save as my destination .dta file
3) Create a local macro ("approved_vars") containing the varlist (*)
4) Read next Excel file into memory
5) Tokenize the varlist (of Excel file #2) and loop through the tokens, checking each against the approved varlist
6) Where a variable from Excel file 2...n is missing from the approved varlist, pause and report back both the erroneous variable and the source file
7) I can then investigate the erroneous variable and source file and make modifications either to my subsequent cleaning code or to the file itself
8) Resume
The problem is, I'm trying to use -strpos- to verify the presence or absence of the token within the approved_vars local macro, and it's not behaving as expected. My code is below. (I have elided the reporting of the filename because right now I can't figure out why the strpos is failing.)
It just occurred to me that the length of the local macro approved_vars might be too long - it's 279 characters - but I just ran it again with a truncated version of the macro that only had 200 chars. and it still failed in the same way. Screenshot of the output is below:

My pseudocode is something like this:
1) Read in first Excel file, having verified that the variables are as I want them to be.
2) Save as my destination .dta file
3) Create a local macro ("approved_vars") containing the varlist (*)
4) Read next Excel file into memory
5) Tokenize the varlist (of Excel file #2) and loop through the tokens, checking each against the approved varlist
6) Where a variable from Excel file 2...n is missing from the approved varlist, pause and report back both the erroneous variable and the source file
7) I can then investigate the erroneous variable and source file and make modifications either to my subsequent cleaning code or to the file itself
8) Resume
The problem is, I'm trying to use -strpos- to verify the presence or absence of the token within the approved_vars local macro, and it's not behaving as expected. My code is below. (I have elided the reporting of the filename because right now I can't figure out why the strpos is failing.)
Code:
local approved_vars "num_att_reg name_full num_phone_topay name_topay name_num_lookup name_num_match_conf sex zone_name county_name school_name attendee_title presence_day1am presence_day1pm presence_day2am presence_day2pm reimbursement_per_sess reimbursement_travel reimbursement_tot paylater source" local current_vars "num_att_reg name_full num_phone motorcar houseboat" // These are nonsense variables to test this loop local token_ct = 5 tokenize "`current_vars'" foreach x of num 1/`token_ct'{ di as error "``x''" if strpos("``x''","`approved_vars'")==0 { di as input "Searching for [``x''] in [`approved_vars'] was unsuccessful." pause "The variable being tested is [``x'']: How will you rename them?" } else if strpos("``x''","`approved_vars'")!=0 di as result "[``x''] was found in [`approved_vars']." }
Comment