I'm trying to determine what causes Stata to crash in the middle of a repeated regular expression search. I do mean
"crash": Stata suddenly shuts down, I get an operating system message indicating that this happened, and I have
to restart Stata. This occurs in the context of what was supposed to be some time trials with -fileread()-,
strLs, and regular expressions, but that may not be the essence of the problem.
I'll give a high-level algorithmic version of what I'm doing, some indications of what I've observed, and some
actual code that, on my hardware/software, produces the crash. I'm running 64-bit Stata MP2 13.1 on a Windows
machine. I know there are a lot of details below, but perhaps someone will have ideas about what further
observations might be relevant to diagnosis.
Comments:
1. The crash always occurs for me within the while loop that does the "search" step, but not always at the same place, and not at the same value of the loop counter I put in the while loop. This might be wrong, though, since all I have to work with is what gets retained in the log file before the crash. The problem area is only about the last 10 lines; see "crash" below. Everything else, more or less, is of interest only to reproduce the problem in a self-contained chunk of code.
2. The forvalues loop serves only to keep going until a crash occurs. In the real situation, the "make example ascii data" step was inside the forval loop, and I incremented the number of records at each trial, but that complexity is not necessary to create the crash on my machine.
3. I thought perhaps there was some sort of hardware- or operating-system-related timing issue, so I tried putting a
delay (sleep 100) in the while loop, but this does not prevent the crash.
4. I don't see anything strange in the ascii data file.
Regards, Mike
Code:
Algorithm:
Make simulated ascii text containing U.S. phone number strings, and save it to a file.
forval trials = 1/3 {
    Get this file into a strL with fileread()
    while not all such phone number strings have been found {
        Search for and record any string that matches the pattern of a phone number.
    }
}
Code:
// Self-contained example
// Make example data with phone numbers
log using c:/temp/test.txt, text replace
set seed 774656 // gives a crash with reps = 1e3
local nrec = 1e3 // gives about a 60K data file
clear
qui set obs `nrec'
local D "strofreal(floor(runiform()*10))" // syntax for a single random digit
local text = " The quick brown fox jumped over the lazy dog." + char(13) + char(10) // text, cr/lf
gen strL s = "(" + `D' + `D' + `D' + ")" + `D' + `D' + `D' + ///
    "-" + `D' + `D' + `D' + `D' + "`text'"
compress
tempfile temp
gen b = filewrite("`temp'", s,2) // Each observation gets appended to an ascii file
keep in 1 // Don't need the whole file anymore, just one record and a strL
// `temp' is as an ascii file with `nrec' records. Each line as follows, but with random digits:
// (999)-999-9999 The quick brown fox jumped over the lazy dog. cr/lf
//
// Now, start the processing phase.
// Bring the ascii file into a strL, look for phone numbers, and keep doing
// this until Stata crashes. 3 trials almost always does it on my machine .
gen strL phone = "" // will hold phone numbers
gen str mstring = "" // will hold a string that matches a phone # pattern
gen byte match = .
set tr on
forval trials = 1/3 {
    qui replace s = fileread("`temp'") // read the file
    local D3 = "[0-9]" * 3 // 3 digits for a reg exp.
    local D4 = "[0-9]" * 4 // 4 digits
    replace match = regexm(s, "\(`D3'\)`D3'\-`D4'") // Any instance of a phone number string
    replace mstring = regexs(0)
    cap assert match == 0 // _rc > 0 if any match
    local i 1
    // Crash occurs within this loop.
    while (_rc > 0) { // loop until all phone numbers are found
        qui replace phone = phone + ", " + mstring // record the phone number just found
        di "i = `i', _rc = " _rc
        qui replace s = subinstr(s, mstring, "", 1) // remove # so as not to find it again.
        qui replace match = regexm(s, "\(`D3'\)`D3'\-`D4'") // look for the next #
        cap assert match == 0
        qui if (_rc > 0) replace mstring = regexs(0) // retain it
        local ++i
    } //
}
log close
//