I have a db with dates in this format: 44378.097351.
If I put that in Excel and choose the date format, it says it is 01/07/2021 2:20:11, which is correct.
The problem is that I now need to plot these dates with Python. I extract the data directly from the db, but I don't know how to convert it to an understandable date format.
Any suggestions?
Thank you
From Date systems in Excel, we can find how such a number was converted to a datetime.
The 1900 date system
In the 1900 date system, dates are calculated by using January 1, 1900, as a starting point. When you enter a date, it is converted into a serial number that represents the number of days elapsed since January 1, 1900. For example, if you enter July 5, 2011, Excel converts the date to the serial number 40729. This is the default date system in Excel for Windows, Excel 2016 for Mac, and Excel for Mac 2011. If you choose to convert the pasted data, Excel adjusts the underlying values, and the pasted dates match the dates that you copied.
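from datetime import timedelta, datetime
# Subtract 2: serial 1 (not 0) is January 1, 1900, and Excel also counts a nonexistent February 29, 1900
dt = datetime(1900, 1, 1) + timedelta(days=44378.097351 - 2)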
This format corresponds to the number of days since 01/01/1900.
you should check out this or this post to find an adequate solution ;)
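For the plotting use case in the question, a minimal sketch of converting a whole list of serial numbers could look like the following. The helper name `excel_serial_to_datetime` and the second sample value are made up for illustration, and the code assumes the 1900 date system with dates on or after March 1, 1900. The epoch of 1899-12-30 is simply January 1, 1900 minus the 2 days used in the answer above.

```python
from datetime import datetime, timedelta

# Assumption: Excel's 1900 date system, all serials >= 61 (1 March 1900 or later),
# so a fixed epoch of 1899-12-30 (1900-01-01 minus 2 days) is valid.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial day number to a datetime (hypothetical helper)."""
    return EXCEL_EPOCH + timedelta(days=serial)


# The first value is from the question; the second is an invented sample.
serials = [44378.097351, 44379.5]
dates = [excel_serial_to_datetime(s) for s in serials]

for d in dates:
    print(d.strftime("%d/%m/%Y %H:%M:%S"))
```

The resulting `datetime` objects can usually be passed straight to a plotting library as x-axis values.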
I have a dataset and I realized the values in the date column are in two different formats.
I tried resolving this issue using date parser but it confuses the day of the month with the month.
For example :
'27/02/21 13:40' is converted correctly to '2021-02-27 13:40:00', BUT '01/03/21 15:09' is converted to '2021-01-03 15:09:00' (instead of March 1st, it becomes January 3rd).
I really don't understand why in the first case the conversion is correct, while in the second it's not. Both dates are in the same column and have the same format.
This is a preview of the dataset with the two different dates:
This is the date that was converted correctly
This is the date that was not converted correctly
These are the steps I followed:
I converted the date columns to a list of strings
I created this function:
date parser function
I previewed my converted list and noticed that not all dates had been converted in the same way:
This date was converted correctly
Here, month and day of the month are switched
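A likely explanation, assuming the parser guesses the field order (as `dateutil` does by default): it tries month-first, and only falls back to day-first when the first field cannot be a month. 27 cannot be a month, so '27/02/21' comes out right, while '01/03/21' is read as January 3rd. A minimal sketch that avoids the guessing by stating the format explicitly, assuming every string really is day-first `DD/MM/YY HH:MM`:

```python
from datetime import datetime

# Assumption: every string in the column is day-first, "DD/MM/YY HH:MM".
DAY_FIRST_FORMAT = "%d/%m/%y %H:%M"

raw_dates = ["27/02/21 13:40", "01/03/21 15:09"]

# strptime never guesses: the first field is always the day of the month.
parsed = [datetime.strptime(s, DAY_FIRST_FORMAT) for s in raw_dates]

for p in parsed:
    print(p)
```

If the column genuinely mixes two formats, each format would need its own explicit pattern. A single guessing parser cannot tell them apart for days 1 to 12.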
I have seen that Excel identifies dates with specific serial numbers. For example:
09/07/2018 = 43290
10/07/2018 = 43291
I know that we use the DATEVALUE, VALUE and TEXT functions to convert between these types.
But what is the logic behind this conversion? Why 43290 for 09/07/2018?
Also, if I have a list of these dates in the number format in a dataframe (Python), how can I convert these numbers to the date format?
Similarly with time, I see decimal values in place of a regular time format. What is the logic behind these time conversions?
The following question, which was linked in the comments, is informative, but it does not answer my question about the logic behind the conversion between the Date and Text formats:
convert numerical representation of date (excel format) to python date and time, then split them into two seperate dataframe columns in pandas
It is simply the number of days (or fraction of days, if talking about date and time) since January 1st 1900:
The DATEVALUE function converts a date that is stored as text to a
serial number that Excel recognizes as a date. For example, the
formula =DATEVALUE("1/1/2008") returns 39448, the serial number of the
date 1/1/2008. Remember, though, that your computer's system date
setting may cause the results of a DATEVALUE function to vary from
this example
...
Excel stores dates as sequential serial numbers so that they can be used in calculations. By default, January 1, 1900 is serial number 1, and January 1, 2008 is serial number 39448 because it is 39,447 days after January 1, 1900.
from DATEVALUE docs
if I have a list of these dates in the number format in a dataframe
(Python), how can I convert this number to the date format?
Since we know this number represents the number of days since 1/1/1900 it can be easily converted to a date:
from datetime import datetime, timedelta
day_number = 43290
print(datetime(1900, 1, 1) + timedelta(days=day_number - 2))
# 2018-07-09 00:00:00
# Subtract 2 because 1/1/1900 is "day 1", not "day 0", and because
# Excel also counts the nonexistent February 29, 1900.
However pd.read_excel should be able to handle this automatically.
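To make the logic concrete in both directions, here is a small sketch. The helper name `to_excel_serial` is hypothetical, and the code assumes the 1900 date system and dates on or after March 1, 1900. The integer part of the serial is the day count and the fractional part is the fraction of a 24-hour day, so 0.75 means 18:00.

```python
from datetime import datetime, timedelta

# Assumption: 1900 date system, dates on or after 1 March 1900.
# 1899-12-30 is 1900-01-01 minus the 2-day correction discussed above.
EXCEL_EPOCH = datetime(1899, 12, 30)


def to_excel_serial(dt):
    """Days (including the fraction of a day) since the epoch; hypothetical helper."""
    return (dt - EXCEL_EPOCH) / timedelta(days=1)


print(to_excel_serial(datetime(2018, 7, 9)))         # 43290.0
print(to_excel_serial(datetime(2018, 7, 9, 18, 0)))  # 43290.75
print(timedelta(days=0.75))                          # 18:00:00
```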
How do you instantiate a datetime.timedelta to a number of seconds in python? For example, 43880.6543
Edit (as per clarification by OP): I have an Excel sheet, where a cell shows 43880.6543 (when the format is General) instead of a date. How do I convert this value to a datetime in python?
Just use
value = timedelta(seconds=43880.6543)
You can check the value is ok with
value.total_seconds()
Edit: If, in an Excel spreadsheet (using the default 1900 date system), you have a date expressed as a number, you can use python to convert that number into a datetime by doing this:
value = datetime(1899, 12, 30) + timedelta(days=43880.6543)
# value will be February 19th, 2020 in the afternoon
I was trying to read a multiple sheet Excel workbook into SPSS when I stumbled upon the following problem: when I read a date variable from Excel into Python with xlrd, it seems to add 2 days to the date. Or perhaps my conversion from the Excel format to a more human friendly representation is not correct. Could anybody tell me what's wrong in the code below?
import xlwt,datetime
wb=xlwt.Workbook()
ws=wb.add_sheet("date_1")
fmt = xlwt.easyxf(num_format_str='M/D/YY')
ws.write(0,0,datetime.datetime.now(),fmt)
wb.save(r"d:\temp\datetest.xls")
#Now open Excel file manually -> date is correct
import xlrd
wb=xlrd.open_workbook(r"d:\temp\datetest.xls")
ws=wb.sheets()[0]
Data = ws.row_values(0)[0]
print datetime.datetime(1900,1,1,0,0,0)+datetime.timedelta(days=Data)
#Now date is 2 days off
I'm pretty sure that xlrd is able to tell when the cell is formatted in Excel as a date, and make the conversion to Python date object on its own. It's not foolproof, though.
Your issue is probably caused by starting with datetime.datetime(1900,1,1,0,0,0) and adding the timedelta to it--you might want to try:
datetime.date(1899,12,31) + datetime.timedelta(days=Data)
This should avoid (a) the one day you're adding by starting at 1/1/1900 and (b) the one day you're adding (I'm guessing) by using a datetime object rather than a date, which may be pushing it over into the next day. This is just a guess, though.
Alternatively, if you already know that it's consistently two days, why don't you just do this?
print datetime.datetime(1900,1,1,0,0,0) + datetime.timedelta(days=Data - 2)
Nope. There are two things going on here.
1 - in Excel, "1" rather than "0" corresponds to January 1, 1900
2 - Excel includes Feb 29, 1900 (which never occurred), accounting for the second day of difference. This is done on purpose for backward compatibility reasons.
Taking these two points into account seems to solve all issues.
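The two points can be sketched as a small function. The name `serial_1900_to_datetime` is made up for illustration, and treating serial 60 and serials below 1 as errors is a design choice of this sketch, not something Excel itself does.

```python
from datetime import datetime, timedelta


def serial_1900_to_datetime(serial):
    """Convert a serial from Excel's 1900 date system (hypothetical helper)."""
    if serial < 1:
        raise ValueError("serials below 1 are not dates in the 1900 system")
    if int(serial) == 60:
        raise ValueError("serial 60 is 29 February 1900, which never existed")
    # Point 1: serial 1 is 1 January 1900, so the zero point is 31 December 1899.
    epoch = datetime(1899, 12, 31)
    # Point 2: from serial 61 onwards the count includes the phantom leap day,
    # so one more day has to be taken off.
    if serial > 60:
        epoch -= timedelta(days=1)
    return epoch + timedelta(days=serial)


print(serial_1900_to_datetime(1))      # 1900-01-01 00:00:00
print(serial_1900_to_datetime(59))     # 1900-02-28 00:00:00
print(serial_1900_to_datetime(61))     # 1900-03-01 00:00:00
print(serial_1900_to_datetime(43290))  # 2018-07-09 00:00:00
```

For any realistic modern date the serial is far above 60, which is why a flat "subtract 2 days" works in practice.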
Earlier answers are only partially correct.
Extra info:
There are TWO Excel date systems: 1900 (Windows) and 1904 (Mac).
1900 system: earliest non-ambiguous datetime is 1900-03-01T00:00:00, represented as 61.0.
1904 system: earliest non-ambiguous datetime is 1904-01-02T00:00:00, represented as 1.0.
Which date system is in effect is available in xlrd from Book.datemode.
xlrd supplies a function called xldate_as_tuple that takes care of all of the above. This code:
print datum
print datetime.datetime(1900, 1, 1) + datetime.timedelta(days=datum)
print datetime.datetime(1900, 3, 1) + datetime.timedelta(days=datum - 61)
tup = xlrd.xldate_as_tuple(datum, wb.datemode)
print tup
print datetime.datetime(*tup)
produces:
41274.4703588
2013-01-02 11:17:19
2012-12-31 11:17:19
(2012, 12, 31, 11, 17, 19)
2012-12-31 11:17:19
when wb.datemode is 0 (1900).
This information is all contained in the documentation that is distributed with xlrd.
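When xlrd is not available, the same idea can be approximated with the standard library. This is only a sketch and not xlrd's implementation: the function name `serial_to_datetime` is hypothetical, the `datemode` argument follows the `Book.datemode` convention (0 for 1900, 1 for 1904), rounding to the nearest second is a choice made here, and for mode 0 it assumes serials of 61.0 or more (the non-ambiguous range noted above).

```python
from datetime import datetime, timedelta

# Epochs chosen so that epoch + serial days gives the calendar date.
# Assumption for mode 0: serial >= 61, i.e. on or after 1900-03-01.
EPOCHS = {
    0: datetime(1899, 12, 30),  # 1900 date system
    1: datetime(1904, 1, 1),    # 1904 date system
}


def serial_to_datetime(serial, datemode=0):
    """Hypothetical stand-in for the conversion; not xlrd's own code."""
    exact = EPOCHS[datemode] + timedelta(days=serial)
    # Round to the nearest whole second (a choice made for this sketch).
    return (exact + timedelta(microseconds=500_000)).replace(microsecond=0)


print(serial_to_datetime(41274.4703588, 0))         # 2012-12-31 11:17:19
# The two systems are 1462 days apart, so the same moment in a 1904 workbook
# would be stored as a serial that is 1462 smaller.
print(serial_to_datetime(41274.4703588 - 1462, 1))  # 2012-12-31 11:17:19
```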
Python and Matlab quite often have integer date representations as follows:
733828.0
733829.0
733832.0
733833.0
733834.0
733835.0
733836.0
733839.0
733840.0
733841.0
These numbers correspond to some dates this year. Do you know which function can convert them back to YYYYMMDD format?
thanks a million!
The datetime.datetime class can help you here. The following works, if those values are treated as integer days (you don't specify what they are).
>>> from datetime import datetime
>>> dt = datetime.fromordinal(733828)
>>> dt
datetime.datetime(2010, 2, 25, 0, 0)
>>> dt.strftime('%Y%m%d')
'20100225'
You show the values as floats, and the above doesn't take floats. If you can give more detail about what the data is (and where it comes from) it will be possible to give a more complete answer.
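If the values really are day ordinals with an optional fraction of a day, one way to accept floats is to split off the whole days first. This is a sketch under that assumption, and `from_fractional_ordinal` is a made-up helper name.

```python
from datetime import datetime, timedelta


def from_fractional_ordinal(value):
    """Treat value as a proleptic Gregorian ordinal plus a fraction of a day."""
    whole_days = int(value)
    day_fraction = value - whole_days
    # fromordinal only accepts integers, so the fraction is added separately.
    return datetime.fromordinal(whole_days) + timedelta(days=day_fraction)


print(from_fractional_ordinal(733828.0).strftime("%Y%m%d"))  # 20100225
print(from_fractional_ordinal(733828.5))                     # 2010-02-25 12:00:00
```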
Since a Python example has already been shown, here is the MATLAB one:
>> datestr(733828, 'yyyymmdd')
ans =
20090224
Also, note that while looking similar these are actually different things in Matlab and Python:
Matlab
A serial date number represents the whole and fractional number of days
from a specific date and time, where datenum('Jan-1-0000 00:00:00') returns
the number 1. (The year 0000 is merely a reference point and is not intended
to be interpreted as a real year in time.)
Python, datetime.date.fromordinal
Return the date corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1.
So they would differ by 366 days, which is apparently the length of the year 0.
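That 366-day offset gives a simple way to read a Matlab serial date number in Python. A minimal sketch, with `matlab_datenum_to_datetime` as a hypothetical helper name:

```python
from datetime import datetime, timedelta

# Matlab's datenum is 366 days ahead of Python's proleptic Gregorian ordinal.
MATLAB_TO_PYTHON_OFFSET_DAYS = 366


def matlab_datenum_to_datetime(datenum):
    """Convert a Matlab serial date number to a Python datetime."""
    whole_days = int(datenum)
    day_fraction = datenum - whole_days
    ordinal = whole_days - MATLAB_TO_PYTHON_OFFSET_DAYS
    return datetime.fromordinal(ordinal) + timedelta(days=day_fraction)


# Matches the datestr output shown above.
print(matlab_datenum_to_datetime(733828.0).strftime("%Y%m%d"))  # 20090224
```

So the same number, 733828, means 2009-02-24 if it came from Matlab and 2010-02-25 if it is a Python ordinal. Knowing the source of the data matters.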
Dates like 733828.0 are Rata Die dates, counted from January 1, 1 A.D. (and decimal fraction of days). They may be UTC or by your timezone.
Julian Dates, used mostly by astronomers, count the days (and decimal fraction of days) since January 1, 4713 BC Greenwich noon. Julian date is frequently confused with Ordinal date, which is the date count from January 1 of the current year (Feb 2 = ordinal day 33).
So datetime is calling these things ordinal dates, but I think this only makes sense locally, in the world of python.
Is 733828.0 a timestamp? If so, you can do the following:
import datetime as dt
dt.date.fromtimestamp(733828.0).strftime('%Y%m%d')
PS
I think Peter Hansen is right :)
I am not a native English speaker. Just trying to help. I don't quite know the difference between a timestamp and an ordinal :(
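On the difference between a timestamp and an ordinal: a Unix timestamp counts seconds since 1970-01-01 UTC, while an ordinal counts days since January 1 of year 1. A quick sketch interpreting the same number both ways shows why the ordinal reading is the plausible one for these values:

```python
from datetime import datetime, timezone

value = 733828.0

# Read as a Unix timestamp: about 8.5 days' worth of seconds after 1970-01-01 UTC.
as_timestamp = datetime.fromtimestamp(value, tz=timezone.utc)

# Read as a proleptic Gregorian ordinal: a count of days since 0001-01-01.
as_ordinal = datetime.fromordinal(int(value))

print(as_timestamp)  # 1970-01-09 11:50:28+00:00
print(as_ordinal)    # 2010-02-25 00:00:00
```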