I am trying to find an efficient way of processing weather data that is nominally presented as a CSV file but arrives in two separate sections. A file of the data for a single location is formatted as follows:
"lat", "lon", "elev", "tzone"
55.0, 10.0, 150, CET
"tstamp", "temp", "wind", "total_solar", "direct_solar"
"2023-07-01T00:00:00Z", 19.0, 2.5, 0, 0
"2023-07-01T01:00:00Z", 18.5, 2.4, 0, 0
"2023-07-01T02:00:00Z", 18.0, 2.6, 0, 0
"2023-07-01T03:00:00Z", 17.5, 2.4, 80, 100
....
The point is that there are, in effect, two CSV files joined together, with the first part containing common information that applies to every time-series observation in the second part.
It is not difficult to read this data using two separate import delimited blocks, writing each part out to a temporary file and then merging the files. However, this is pretty inefficient when processing a large number of locations, because you end up with a minimum of three temporary files per location. It is possible to skip the write and merge operations by using the first import delimited block to store the location values in local macros, which are then used to generate the lat, lon, elev and tzone variables after the second import delimited block. But this is also inefficient because it involves repeated overwriting of local macros. A third option is to use Python to carry out the same transformation.
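The Python route could be sketched roughly as follows — a minimal, self-contained example assuming the two-section layout shown above; `read_split_csv` is a hypothetical helper name, not part of any library:

```python
import csv
import io

def read_split_csv(text):
    """Parse a two-section weather CSV: one metadata row (lat, lon,
    elev, tzone) under its own header, then a time-series section
    under a second header. Returns a list of dicts in which the
    location metadata is repeated on every time-series record."""
    # skipinitialspace handles the ", " separators in the sample data
    rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
    meta = dict(zip(rows[0], rows[1]))   # section 1: location metadata
    ts_names = rows[2]                   # section 2: time-series header
    records = []
    for r in rows[3:]:
        rec = dict(meta)                 # broadcast the common fields
        rec.update(zip(ts_names, r))
        records.append(rec)
    return records

sample = '''"lat", "lon", "elev", "tzone"
55.0, 10.0, 150, CET
"tstamp", "temp", "wind", "total_solar", "direct_solar"
"2023-07-01T00:00:00Z", 19.0, 2.5, 0, 0
"2023-07-01T01:00:00Z", 18.5, 2.4, 0, 0
'''

records = read_split_csv(sample)
```

The resulting flat records could then be written out as a single ordinary CSV per location and read into Stata with one import delimited call.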
Can anyone think of a more efficient way in Stata of reading this data to generate a single Stata file containing the merged dataset with variables <lat lon elev tzone tstamp temp wind total_solar direct_solar>?