I have to parse an XML file that gives me datetimes in Excel style, for example 42580.3333333333.
Does Pandas provide a way to convert that number into a regular datetime object?
OK, I think the easiest thing is to construct a TimedeltaIndex from the floats and add it to the scalar datetime for 1900-01-01:
In [85]:
import datetime as dt
import pandas as pd
df = pd.DataFrame({'date':[42580.3333333333, 10023]})
df
Out[85]:
date
0 42580.333333
1 10023.000000
In [86]:
df['real_date'] = pd.TimedeltaIndex(df['date'], unit='d') + dt.datetime(1900,1,1)
df
Out[86]:
date real_date
0 42580.333333 2016-07-31 07:59:59.971200
1 10023.000000 1927-06-12 00:00:00.000000
OK, it seems that Excel is a bit weird with its dates (thanks @ayhan); the base date has to be 1899-12-30:
In [89]:
df['real_date'] = pd.TimedeltaIndex(df['date'], unit='d') + dt.datetime(1899, 12, 30)
df
Out[89]:
date real_date
0 42580.333333 2016-07-29 07:59:59.971200
1 10023.000000 1927-06-10 00:00:00.000000
See related: How to convert a python datetime.datetime to excel serial date number
You can parse the values directly with pd.to_datetime, using the keywords unit='D' and origin='1899-12-30':
import pandas as pd
df = pd.DataFrame({'xldate': [42580.3333333333]})
df['date'] = pd.to_datetime(df['xldate'], unit='D', origin='1899-12-30')
df['date']
Out[2]:
0 2016-07-29 07:59:59.999971200
Name: date, dtype: datetime64[ns]
Further reading:
What is the story behind December 30, 1899 as the base date?
An answer from Martijn Pieters on how to handle Excel ordinal values < 60 correctly
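The float arithmetic above leaves the result a few microseconds short of 08:00:00. Here is a minimal sketch of one way around that, assuming second precision is enough for your data: round the serial to whole seconds before adding it to the 1899-12-30 origin. The helper name excel_serial_to_datetime is made up for illustration. The origin is only valid for serials of 61 or more (dates from 1900-03-01 on), because Excel counts a nonexistent 1900-02-29.

```python
import pandas as pd

EXCEL_ORIGIN = pd.Timestamp("1899-12-30")  # only valid for serials >= 61
SECONDS_PER_DAY = 86400

def excel_serial_to_datetime(serials):
    """Hypothetical helper: convert Excel serial day numbers to datetimes.

    Assumes second precision is sufficient; sub-second fractions are rounded away.
    """
    # Convert days to seconds and round, which removes the float error
    seconds = (pd.Series(serials, dtype="float64") * SECONDS_PER_DAY).round()
    return pd.to_timedelta(seconds, unit="s") + EXCEL_ORIGIN

df = pd.DataFrame({"xldate": [42580.3333333333, 10023]})
df["date"] = excel_serial_to_datetime(df["xldate"])
print(df)
```

If you need sub-second precision, round to a finer unit instead (for example milliseconds) rather than dropping the rounding step.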
You can use the 3rd party xlrd library before passing to pd.to_datetime:
import xlrd
def read_date(date):
    # datemode 0 selects the 1900-based Excel date system
    return xlrd.xldate.xldate_as_datetime(date, 0)
df = pd.DataFrame({'date':[42580.3333333333, 10023]})
df['new'] = pd.to_datetime(df['date'].apply(read_date), errors='coerce')
print(df)
date new
0 42580.333333 2016-07-29 08:00:00
1 10023.000000 1927-06-10 00:00:00
I have a CSV file with a 'date' column containing dates in many different formats, such as ddmmyy, mmddyy, and yymmdd. I want to convert all the dates to y-m-d format:
df = pd.read_csv(file)
df = df['date'].dt.strftime('%y-%m-%d')
This code gives the error: "Can only use .dt accessor with datetimelike values"
You can utilise pd.to_datetime -
>>> import pandas as pd
>>>
>>> df = pd.DataFrame(['1/2/2020','12/31/2020','20-Jun-20'],columns=['Date'])
>>> df
Date
0 1/2/2020
1 12/31/2020
2 20-Jun-20
>>>
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> df
Date
0 2020-01-02
1 2020-12-31
2 2020-06-20
>>>
>>> df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%y-%m-%d')
>>>
>>> df
Date
0 20-01-02
1 20-12-31
2 20-06-20
>>>
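Depending on the pandas version, a single pd.to_datetime call may not accept several formats in one column. A minimal sketch of an alternative, assuming you know the candidate formats in advance: try each format with errors='coerce' and keep the first successful parse. The helper name parse_mixed_dates is hypothetical. Note that the order of the formats matters, because a value like 010220 is valid as both ddmmyy and mmddyy and no code can tell the two apart.

```python
import pandas as pd

def parse_mixed_dates(values, formats):
    """Hypothetical helper: try each format in turn, keeping the first successful parse."""
    values = pd.Series(values, dtype="object")
    # Values that do not match the format become NaT instead of raising
    parsed = pd.to_datetime(values, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        attempt = pd.to_datetime(values, format=fmt, errors="coerce")
        parsed = parsed.fillna(attempt)  # fill only the rows still unparsed
    return parsed

df = pd.DataFrame(["1/2/2020", "12/31/2020", "20-Jun-20"], columns=["Date"])
parsed = parse_mixed_dates(df["Date"], ["%m/%d/%Y", "%d-%b-%y"])
df["Date"] = parsed.dt.strftime("%Y-%m-%d")
print(df)
```

Rows that match none of the formats stay NaT, so they are easy to find and inspect afterwards.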
Step 0:
Load your DataFrame:
df = pd.read_csv('your file name.csv')
Step 1:
First convert the 'date' column to datetime with the to_datetime() method:
df['date'] = pd.to_datetime(df['date'])
Step 2:
If you then want the dates as strings, use:
df['date'] = df['date'].astype(str)
Now print df (or just evaluate df if you are using a Jupyter notebook).
Output:
0 2020-01-01
1 2020-12-31
2 2020-06-20
I have a CSV file and want to use H2O for deep learning. The data contains Chinese text and datetimes, and when I finish training and save the output to CSV, I can't get the original values back.
Here is a small example that shows the problem.
In[1]: df = pd.DataFrame({'datetime':['2016-12-17 00:00:00'],'time':['00:00:30'],'month':['月'], 'weekend':['周六']})
print(df.dtypes)
df
out[1]: datetime object
time object
month object
weekend object
dtype: object
datetime time month weekend
0 2016-12-17 00:00:00 00:00:30 月 周六
In[2]: h2o_frame = h2o.H2OFrame(df);h2o_frame ;h2o_frame.types ;h2o_frame
C:\Users\thi\Anaconda3\lib\site-packages\h2o\utils\shared_utils.py:170: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
data = _handle_python_lists(python_obj.as_matrix().tolist(), -1)[1]
out[2]: Parse progress: |█████████████████████████████████████████████████████████| 100%
datetime time month weekend
2016-12-17 00:00:00 1970-01-01 00:00:30 <0xA4EB> <0xA9>P<0xA4BB>
I want the time column to be just 00:00:30. Is there any way to fix that?
I haven't found a way to make month and weekend display Chinese, but I can still finish the deep learning step.
However, when I convert the H2OFrame back to a DataFrame and save it to a CSV file, it saves <0xA4EB> instead of 月, and datetime changes to int:
In[3]: dff = h2o_frame.as_data_frame();dff
out[3]: datetime time month weekend
0 1481932800000 30000 <0xA4EB> <0xA9>P<0xA4BB>
How do I correctly convert characters from an H2OFrame back to a DataFrame?
How do I correctly convert datetimes from an H2OFrame back to a DataFrame?
The simplest way to solve this is to pass the column_types argument when you convert the pandas frame to an H2OFrame, as below:
In [69]: col_types
Out[69]: ['categorical', 'categorical', 'categorical', 'categorical']
In [70]: h2o_frame = h2o.H2OFrame(df,column_types=col_types);h2o_frame ;h2o_frame.types ;h2o_frame
Parse progress: |█████████████████████████████████████████████████████████████████████████████| 100%
Out[70]:
datetime month time weekend
------------------- ------- -------- ---------
2016-12-17 00:00:00 月 00:00:30 周六
[1 row x 4 columns]
In [71]: dff = h2o_frame.as_data_frame();dff
Out[71]:
datetime month time weekend
0 2016-12-17 00:00:00 月 00:00:30 周六
allfiles = h2o.import_file(path='data/', pattern=".csv")
df = allfiles.as_data_frame()
df['datetime'] = pd.to_datetime(df["datetime"], unit='ms')
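The unit='ms' conversion above can be tried without H2O. This is a minimal sketch that uses the integer values from the as_data_frame() output in the question; it assumes the time column holds milliseconds since midnight and stays below 24 hours.

```python
import pandas as pd

# Values copied from the as_data_frame() output shown in the question
dff = pd.DataFrame({"datetime": [1481932800000], "time": [30000]})

# Milliseconds since the Unix epoch -> datetime64
dff["datetime"] = pd.to_datetime(dff["datetime"], unit="ms")

# Milliseconds since midnight -> "HH:MM:SS" string (assumes values below 24 hours)
dff["time"] = pd.to_datetime(dff["time"], unit="ms").dt.strftime("%H:%M:%S")
print(dff)
```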
Given the following data frame:
Name Loc DateTime RecordId RST
0 SwingBridge (47.57, -122.35) 08/01/2016 12:00:00 AM 1854751 76.06
1 Roosevelt (47.69, -122.31) 08/01/2016 12:00:00 AM 1941744 2.26
which is read in from a csv file using
filename='my.csv'
df=pd.read_csv(filename,
dtype={'Name': str, 'Loc': object,
'RecordId':int, 'RST': float})
If I read the dates using parse_dates=['DateTime'] or recast them with
df['DateTime'] = pd.to_datetime(df['DateTime']), why do the bloody hours disappear??
Name Loc DateTime RecordId RST
0 SwingBridge (47.57, -122.35) 08/01/2016 1854751 76.06
1 Roosevelt (47.69, -122.31) 08/01/2016 1941744 2.26
This is just a display issue. By default, when every time in the column is 00:00:00 (12:00 AM in your case), pandas does not display the time. If you look at an individual element's value, the time component is still there.
Example:
In[4]:
import pandas as pd
import io
t="""DateTime
08/01/2016 12:00:00 AM"""
df = pd.read_csv(io.StringIO(t), parse_dates=[0])
df
Out[4]:
DateTime
0 2016-08-01
In[5]:
df['DateTime'].iloc[0]
Out[5]: Timestamp('2016-08-01 00:00:00')
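If you want the time to show even when it is midnight, one option is to format the column explicitly for display. This is a minimal sketch; the column name DateTime_str is made up for illustration, and the explicit format string is an assumption based on the sample data.

```python
import pandas as pd

df = pd.DataFrame({"DateTime": ["08/01/2016 12:00:00 AM"]})
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%m/%d/%Y %I:%M:%S %p")

# strftime returns strings, so keep them in a separate column used only for display
df["DateTime_str"] = df["DateTime"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(df)
```

The original DateTime column keeps its datetime64 dtype, so date arithmetic and filtering still work on it.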
I have a Pandas Dataframe df:
a date
1 2014-06-29 00:00:00
df.dtypes returns:
a object
date object
I want to convert the date column to a date without the time, but:
df['date']=df['date'].astype('datetime64[s]')
returns:
a date
1 2014-06-28 22:00:00
df.dtypes returns:
a object
date datetime64[ns]
But the value is wrong.
I'd like to have:
a date
1 2014-06-29
or:
a date
1 2014-06-29 00:00:00
I would start by converting your dates with pd.to_datetime:
df['date'] = pd.to_datetime(df.date)
Now, you can see that the time component is still there:
df.date.values
array(['2014-06-28T19:00:00.000000000-0500'], dtype='datetime64[ns]')
If you are OK with having an object (string) column again, you want:
df['date'] = [x.strftime("%Y-%m-%d") for x in df.date]
To end up with datetime.date objects instead:
df['date'] = [x.date() for x in df.date]
df.date
datetime.date(2014, 6, 29)
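The list comprehensions above can also be written with the vectorized .dt accessor. This is a minimal sketch on a small stand-in frame; the column names date_midnight and date_only are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "date": ["2014-06-29 00:00:00"]})
df["date"] = pd.to_datetime(df["date"])

# Option 1: keep the datetime64 dtype, with the time set to midnight
df["date_midnight"] = df["date"].dt.normalize()

# Option 2: plain datetime.date objects (the column becomes object dtype)
df["date_only"] = df["date"].dt.date
print(df.dtypes)
```

Option 1 keeps the column usable with the .dt accessor and date arithmetic; option 2 is closer to what the list comprehension produces.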
Here you go. Just use this pattern:
pd.to_datetime(df['date']).dt.date