Dear all,
I am facing the following problem. The university provided me with a Windows machine specifically to work with some data. Previously I worked with the same data and wrote my Stata code on my Mac. Unfortunately, the same code and files that worked on my Mac no longer work on the new machine.
In short, my code imports CSV files, runs some cleaning and merging, and then saves the files in a different location as .dta files, all inside a loop. This has always worked on my Mac. Moreover, the files are the original CSV files obtained from the provider and have only been transferred from the Mac to an external hard disk, without any manipulation. If I import them individually, the problem does not arise.
Now, instead, Stata appears to process and save each file twice, the second time using the wrong encoding, so that the resulting file cannot be read any further. Let me show you what I mean.
Here is a reduced form of the code that I wrote to trace the problem:
Code:
clear
foreach j of num 16/21 {
    local yr = 2000 + `j'
    local files : dir "$uni/`yr'" file "*.csv"
    foreach file of local files {
        cd "$uni/`yr'"
        import delimited "`file'", clear
        cd "$dbs/Universe/txs"
        local new : subinstr local file ".csv" "_.dta", all
        save "`new'", replace
        clear
    }
}
When I run the code I get the following output. I have obscured some names (###) for privacy; the unexpected and problematic lines are those in the blocks that report windows-1252 encoding:
D:\#####\csv\History\Universe\2016
(encoding automatically selected: ISO-8859-1)
(25 vars, 780,167 obs)
D:\#####\Stata\dtas\Universe\txs
(file clientoutputmain_2016_01_04_2016_01_10_.dta not found)
file clientoutputmain_2016_01_04_2016_01_10_.dta saved
D:\######\csv\History\Universe\2016
(encoding automatically selected: windows-1252)
Note: 3,903 binary zeros were ignored in the source file. The first instance
occurred on line 3. Binary zeros are not valid in text data. Inspect
your data carefully.
(2 vars, 3 obs)
D:\#####\Stata\dtas\Universe\txs
(file ._clientoutputmain_2016_01_04_2016_01_10_.dta not found)
file ._clientoutputmain_2016_01_04_2016_01_10_.dta saved
D:\######\csv\History\Universe\2016
(encoding automatically selected: ISO-8859-1)
(25 vars, 882,454 obs)
D:\#####\Stata\dtas\Universe\txs
(file clientoutputmain_2016_01_11_2016_01_17_.dta not found)
file clientoutputmain_2016_01_11_2016_01_17_.dta saved
D:\#####\csv\History\Universe\2016
(encoding automatically selected: windows-1252)
Note: 3,903 binary zeros were ignored in the source file. The first instance
occurred on line 3. Binary zeros are not valid in text data. Inspect
your data carefully.
(2 vars, 3 obs)
D:\#####\Stata\dtas\Universe\txs
(file ._clientoutputmain_2016_01_11_2016_01_17_.dta not found)
file ._clientoutputmain_2016_01_11_2016_01_17_.dta saved
As you can see, the program reads each file a second time as windows-1252 and then saves it under the same name but with the prefix "._". The problem persists even if I specify the encoding when importing the CSV.
I have absolutely no idea what is going on, and I could not find any resources online.
Any help is highly appreciated,
Regards,
Brian