Dear Statalisters,
I am using Stata/MP2 14 for Windows. I have a folder with more than 40,000 text files.
Since those files come in different encodings, I want to use Stata's new -unicode translate- feature to translate them to UTF-8.
In this context, I stumbled across a problem.
Here is an example. The code generates a folder with 22,222 files and then tells Stata to -unicode analyze- each file.
Code:
* remember the starting directory so we can return to it at the end
local initial_dir "`c(pwd)'"

**** create a folder
capture mkdir folder
cd folder

**** create and store 22,222 datasets
qui {
    forvalues f=1/22222 {
        noisily di "`f' of 22222"
        clear
        set obs 1
        gen str string=`"test string"'
        save file`f', replace
    }
}

**** unicode analysis
clear
set more on
unicode analyze *

clear all
cd "`initial_dir'"
However, it seems that Stata analyzes only 10,000 of those files. Here is the result:
Code:
File summary (before starting):
10000 file(s) specified
10000 file(s) to be examined ...
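So 12,222 files are not analyzed. The same happens with -unicode translate-, so 10,000 seems to be a general limit.
Can anybody reproduce this result, or does anyone have a solution? Please let me know; I would be very grateful.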
Many thanks
Ali
