I have dataset that was provided to me as an excel file. It includes a date variable. if I open the file in Excel and visually inspect it, the date variable appears to be a variable of the format mm/dd/yy. However, if I click on one of the cells in the date column it becomes clear that the variable was actually entered into Excel as date and time (i.e. mm/dd/yy hh:mm:ss am/pm).
When I use import excel to import the file I get a date variable that is stored as a double and formatted %td. If I list the data, everything looks fine. A visual comparison of the data and the excel spreadsheet show that everything matches and it appears that I have a daily date. When I tabulate the date variable, though, it becomes clear that something is wrong, as shown below. (Note that the date variable addmission was misspelled on arrival)
I'm only interested in the date portion of the date so it's easy enough to solve this problem with
gen admitdate=int(addmission)
This seems like odd behavior to me, though. By default I would expect Stata to ignore the Excel formatting and import the variable as date and time (integer formatted as %tc). If it didn't do that, I would expect Stata to respect the excel formatting and just import the date portion (integer formatted as %td). The fact that the variables is imported as some sort of weird mix of daily date and fractional time doesn't seem to be desirable to me.
I'm reluctant to declare this a bug, but I thought it worth noting publicly since it struck me as unexpected behavior.
When I use import excel to import the file I get a date variable that is stored as a double and formatted %td. If I list the data, everything looks fine. A visual comparison of the data and the excel spreadsheet show that everything matches and it appears that I have a daily date. When I tabulate the date variable, though, it becomes clear that something is wrong, as shown below. (Note that the date variable addmission was misspelled on arrival)
Code:
import excel "patients V4 7-1 to 7-8.xls", sheet("Sheet1") cellrange(A4:Z115) firstrow clear case(lower) . desc addmission storage display value variable name type format label variable label ---------------------------------------------------------------- addmission double %td Addmission . sort addmission . list addmission in 1/5 +-----------+ | addmiss~n | |-----------| 1. | 01jul2014 | 2. | 01jul2014 | 3. | 01jul2014 | 4. | 01jul2014 | 5. | 01jul2014 | +-----------+ . . tab addmission in 1/5 Addmission | Freq. Percent Cum. ------------+----------------------------------- 01jul2014 | 1 20.00 20.00 01jul2014 | 1 20.00 40.00 01jul2014 | 1 20.00 60.00 01jul2014 | 1 20.00 80.00 01jul2014 | 1 20.00 100.00 ------------+----------------------------------- Total | 5 100.00 . . /*some investigating reveals that the underlying variable that Stata has imported is not an integer as I would expect */ . . format addmission %25.0g . . list addmission in 1/5 +--------------------+ | addmission | |--------------------| 1. | 19905.013888888891 | 2. | 19905.050694444442 | 3. | 19905.055555555555 | 4. | 19905.386111111111 | 5. | 19905.435416666667 | +--------------------+ .
I'm only interested in the date portion of the date so it's easy enough to solve this problem with
gen admitdate=int(addmission)
This seems like odd behavior to me, though. By default I would expect Stata to ignore the Excel formatting and import the variable as date and time (integer formatted as %tc). If it didn't do that, I would expect Stata to respect the excel formatting and just import the date portion (integer formatted as %td). The fact that the variables is imported as some sort of weird mix of daily date and fractional time doesn't seem to be desirable to me.
I'm reluctant to declare this a bug, but I thought it worth noting publicly since it struck me as unexpected behavior.
Comment