I am currently working with historical minute-by-minute intraday stock price data of S&P 500 firms.
For example, I have one data file for Apple where each row includes 5 variables (datetime, open, high, low, close, volume) in the time period of Jan 3, 2005, until Jul 01, 2022.
So the rows look like this
datetime; open; high; low; close; volume
2005-01-30 09:30:00; 38.00; 38.10; 37.99; 38.02; 64600
My problem concerns the variable datetime that is originally stored as a string variable (type:str19, format %19s). In order to run a time-series analysis, I would like to convert this variable into a numeric variable.
The command I use is the following:
generate datetime_num = clock(datetime, "YMDhms")
format %-tc datetime_num
The problem now is that the values of the newly generate variable datetime_num do NOT match the original string variable. For instance, 2005-01-30 09:30:00 is converted to 03jan2005 09:30:25. Somehow 25 seconds are added.
Also problematic is that the second timestamp 2005-01-30 09:31:00 is also converted to 03jan2005 09:30:25 - as such, I do not have unique observations any longer.
As a further note, it seems to be unsystematic how many seconds are added or subtracted to the original and correct time stamps.
Any ideas about what (a) caused this problem, and (b) how to fix it?
Many thanks in advance!
For example, I have one data file for Apple where each row includes 5 variables (datetime, open, high, low, close, volume) in the time period of Jan 3, 2005, until Jul 01, 2022.
So the rows look like this
datetime; open; high; low; close; volume
2005-01-30 09:30:00; 38.00; 38.10; 37.99; 38.02; 64600
My problem concerns the variable datetime that is originally stored as a string variable (type:str19, format %19s). In order to run a time-series analysis, I would like to convert this variable into a numeric variable.
The command I use is the following:
generate datetime_num = clock(datetime, "YMDhms")
format %-tc datetime_num
The problem now is that the values of the newly generate variable datetime_num do NOT match the original string variable. For instance, 2005-01-30 09:30:00 is converted to 03jan2005 09:30:25. Somehow 25 seconds are added.
Also problematic is that the second timestamp 2005-01-30 09:31:00 is also converted to 03jan2005 09:30:25 - as such, I do not have unique observations any longer.
As a further note, it seems to be unsystematic how many seconds are added or subtracted to the original and correct time stamps.
Any ideas about what (a) caused this problem, and (b) how to fix it?
Many thanks in advance!

Comment