That approach would not work: once you keep only the observations with form1, you will not find any with form3 because they have already been dropped. The error itself occurs because the pattern is missing the right single quote that closes the reference to the local macro j:
If you are going to prune files upfront, you might as well break up the filename into the parts you want from the start. That way you can make sure that every filename you want to process matches your expectations. You could do something like:
Note that inlist() accepts at most 10 arguments when they are strings, and that count includes the variable being tested. If you have more match strings than that, you can make a separate dataset containing the list and use merge to reduce the observations to those that match the list.
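The merge approach mentioned above could look something like the following sketch. The file names and the list of wanted forms are made up for illustration, and the toy data stands in for what filelist would return; in practice the list could be as long as needed.

```stata
clear
* toy stand-in for the filelist results (hypothetical file names)
input str30 filename
"101_20200101_form1.txt"
"102_20200102_form2.txt"
"103_20200103_form3.txt"
end

* split the file name into parts, as in the code above
gen s = subinstr(filename, ".txt", "", 1)
split s, parse("_")
rename (s1 s2 s3) (id date form)

* build a separate dataset holding the forms to keep (hypothetical list)
preserve
clear
input str10 form
"form1"
"form3"
end
tempfile wanted
save `wanted'
restore

* keep only the observations whose form appears in the list
merge m:1 form using `wanted', keep(match) nogenerate
```

Because the code uses a tempfile, it should be run in one go from a do-file. The keep(match) option drops both the files whose form is not in the list and any listed form that has no file.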
Code:
keep if strmatch(filename, "*`j'*.txt")
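If you still prefer to select with strmatch() in a loop, one way around the problem of dropping observations too early is to flag the matches first and run a single keep once the loop is done. This is only a sketch: the file names are invented, and the variable wanted is a hypothetical helper.

```stata
clear
* toy stand-in for the filelist results (hypothetical file names)
input str30 filename
"101_20200101_form1.txt"
"102_20200102_form2.txt"
"103_20200103_form3.txt"
end

* flag every file that matches any of the wanted patterns
gen byte wanted = 0
foreach j in form1 form3 {
    replace wanted = 1 if strmatch(filename, "*`j'*.txt")
}

* drop the rest only after all patterns have been checked
keep if wanted
drop wanted
```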
Code:
clear all
filelist, dir("text_files")
* reduce to files with a ".txt" file extension
keep if strmatch(filename, "*.txt")
* split the file name into parts
gen s = subinstr(filename,".txt", "", 1)
split s, parse("_")
rename (`r(varlist)') (id date form)
assert !mi(id, date, form)
* reduce to form1 and form3
keep if inlist(form, "form1", "form3")
Here's an expanded version of the program that handles the extra part variables:
Code:
* code to import one text file
program import_txt
    // move values of interest from variables to locals
    local dsource = dirname
    local fsource = filename
    local id1 = id
    local date1 = date
    local form1 = form

    import delimited using `"`dsource'/`fsource'"', clear stringcols(_all) varnames(nonames)

    // get the desired info
    keep if strpos(v1,"name:")
    gen name = subinstr(v1,"name:","",1)

    // copy over the file's information
    gen sourcefile = `"`fsource'"'
    gen sourcedir = `"`dsource'"'
    gen id = "`id1'"
    gen date = "`date1'"
    gen form = "`form1'"
end

runby import_txt, by(dirname filename) verbose
