I have a dataframe with unix times and prices in it. I want to convert the index column so that it shows in human readable dates.
So for instance I have date as 1349633705 in the index column but I'd want it to show as 10/07/2012 (or at least 10/07/2012 18:15).
For some context, here is the code I'm working with and what I've tried already:
import json
import urllib2
from datetime import datetime
response = urllib2.urlopen('http://blockchain.info/charts/market-price?&format=json')
data = json.load(response)
df = DataFrame(data['values'])
df.columns = ["date","price"]
#convert dates
df.date = df.date.apply(lambda d: datetime.strptime(d, "%Y-%m-%d"))
df.index = df.date
As you can see I'm using
df.date = df.date.apply(lambda d: datetime.strptime(d, "%Y-%m-%d")) here which doesn't work since I'm working with integers, not strings. I think I need to use datetime.date.fromtimestamp but I'm not quite sure how to apply this to the whole of df.date.
Thanks.
These appear to be seconds since epoch.
In [20]: df = DataFrame(data['values'])
In [21]: df.columns = ["date","price"]
In [22]: df
Out[22]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 358 entries, 0 to 357
Data columns (total 2 columns):
date 358 non-null values
price 358 non-null values
dtypes: float64(1), int64(1)
In [23]: df.head()
Out[23]:
date price
0 1349720105 12.08
1 1349806505 12.35
2 1349892905 12.15
3 1349979305 12.19
4 1350065705 12.15
In [25]: df['date'] = pd.to_datetime(df['date'],unit='s')
In [26]: df.head()
Out[26]:
date price
0 2012-10-08 18:15:05 12.08
1 2012-10-09 18:15:05 12.35
2 2012-10-10 18:15:05 12.15
3 2012-10-11 18:15:05 12.19
4 2012-10-12 18:15:05 12.15
In [27]: df.dtypes
Out[27]:
date datetime64[ns]
price float64
dtype: object
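To tie this back to the question (a readable date in the index), here is a minimal sketch. It assumes the integers are seconds since the Unix epoch in UTC and reuses a few of the sample values shown above; the `labels` variable is only an illustrative name.

```python
import pandas as pd

# Sample values copied from the output above.
df = pd.DataFrame({
    "date": [1349720105, 1349806505, 1349892905],
    "price": [12.08, 12.35, 12.15],
})

# Interpret the integers as seconds since the Unix epoch (UTC).
df["date"] = pd.to_datetime(df["date"], unit="s")

# Use the converted column as the index, as the question intended.
df = df.set_index("date")

# Optional: MM/DD/YYYY strings for display only.
# These are plain strings, so keep the datetime index for date arithmetic.
labels = df.index.strftime("%m/%d/%Y")
print(labels[0])
```

Keeping the index as datetimes and formatting only at display time is probably the safer choice, since string dates cannot be resampled or sliced by date.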
If you try:
df[DATE_FIELD] = pd.to_datetime(df[DATE_FIELD], unit='s')
and receive an error such as:
"pandas.tslib.OutOfBoundsDatetime: cannot convert input with unit 's'"
it usually means DATE_FIELD is not expressed in seconds.
In my case, the values were epoch time in milliseconds.
The conversion worked with:
df[DATE_FIELD] = pd.to_datetime(df[DATE_FIELD], unit='ms')
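If you do not know the unit in advance, one rough approach is to guess it from the magnitude of the values. The sketch below uses a hypothetical helper, `guess_epoch_unit`, which is not part of pandas. Its thresholds are assumptions that hold for present-day timestamps, and they will misclassify very old or very small values, so treat the result as a hint rather than a guarantee.

```python
import pandas as pd

def guess_epoch_unit(values):
    """Hypothetical helper: guess the epoch unit from the largest absolute value."""
    largest = int(pd.Series(values).abs().max())
    if largest < 10**11:   # seconds stay below this for thousands of years
        return "s"
    if largest < 10**14:   # milliseconds
        return "ms"
    if largest < 10**17:   # microseconds
        return "us"
    return "ns"            # nanoseconds

# Millisecond versions of the sample timestamps from the question.
millis = pd.Series([1349720105000, 1349806505000])
unit = guess_epoch_unit(millis)
converted = pd.to_datetime(millis, unit=unit)
print(unit, converted[0])
```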
Assuming we imported pandas as pd and df is our dataframe
pd.to_datetime(df['date'], unit='s')
works for me.
The pandas documentation gives this and other examples, and none of the previous answers linked to it:
https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
Code
pd.to_datetime(1490195805, unit='s')
Timestamp('2017-03-22 15:16:45')
pd.to_datetime(1490195805433502912, unit='ns')
Timestamp('2017-03-22 15:16:45.433502912')
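Alternatively, change one line of the question's code:
# df.date = df.date.apply(lambda d: datetime.strptime(d, "%Y-%m-%d"))
df.date = df.date.apply(lambda d: datetime.fromtimestamp(int(d)).strftime('%Y-%m-%d'))
This should also work with the question's "from datetime import datetime" import. Note that it produces strings rather than datetimes, and that fromtimestamp converts to the machine's local time zone.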
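Because the standard-library route depends on the local time zone, a variant that pins the conversion to UTC may be more predictable. This is a sketch using only the standard library; `epoch_to_date_string` is an illustrative name, not something from the question.

```python
from datetime import datetime, timezone

def epoch_to_date_string(seconds, fmt="%Y-%m-%d"):
    """Format epoch seconds as a string, interpreting them in UTC."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc).strftime(fmt)

print(epoch_to_date_string(1349720105))
print(epoch_to_date_string(1349720105, "%m/%d/%Y %H:%M"))
```

With a DataFrame this could be applied as df.date.apply(epoch_to_date_string), although pd.to_datetime with unit='s' is likely faster on large columns.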
I'm using Python 3.6 and Pandas 0.20.3.
I have a column that I've converted to date type from datetime. All I need is the date. I have it as a derived column for ease of use. But I'm looking to do some further operations via a day of the week calculation. I can get the day of week from a datetime type but not from the date. It seems to me that this should be possible but I've tried multiple variations and not found success.
Here is an example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'date':['2017-5-16','2017-5-17']})
df['trade_date']=pd.to_datetime(df['date'])
I can get the day of the week from the datetime column 'trade_date'.
df['dow']=df['trade_date'].dt.dayofweek
df
date trade_date dow
0 2017-5-16 2017-05-16 1
1 2017-5-17 2017-05-17 2
But if I have a date, rather than a datetime, no dice:
For instance:
df['trade_date_2']=pd.to_datetime(df['date']).dt.date
And then:
df['dow_2']=df['trade_date_2'].dt.dayofweek
I get (at the end):
AttributeError: Can only use .dt accessor with datetimelike values
I've tried various combinations of dayofweek(), weekday, weekday() which, I realize, highlight my ignorance of exactly how Pandas works. So ... any suggestions besides adding another column which is the datetime version of column trade_date?
I'll also welcome explanations of why this is not working.
The problem is the difference between pandas datetimes (Timestamps), which implement the .dt methods, and Python date objects, which do not.
#return python date
df['trade_date_2']= pd.to_datetime(df['date']).dt.date
print (df['trade_date_2'].apply(type))
0 <class 'datetime.date'>
1 <class 'datetime.date'>
Name: trade_date_2, dtype: object
#cannot work with python date
df['dow_2']=df['trade_date_2'].dt.dayofweek
You need to convert back to a pandas datetime:
df['dow_2']= pd.to_datetime(df['trade_date_2']).dt.dayofweek
print (df)
date trade_date_2 dow_2
0 2017-5-16 2017-05-16 1
1 2017-5-17 2017-05-17 2
So the best approach is:
df['date'] = pd.to_datetime(df['date'])
print (df['date'].apply(type))
0 <class 'pandas._libs.tslib.Timestamp'>
1 <class 'pandas._libs.tslib.Timestamp'>
Name: date, dtype: object
df['trade_date_2']= df['date'].dt.date
df['dow_2']=df['date'].dt.dayofweek
print (df)
date trade_date_2 dow_2
0 2017-05-16 2017-05-16 1
1 2017-05-17 2017-05-17 2
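One way to see why the .dt accessor is rejected is to check the column's dtype before using it. The sketch below assumes the same two sample dates as the question; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2017-5-16", "2017-5-17"]})

timestamps = pd.to_datetime(df["date"])   # datetime64 dtype, supports .dt
plain_dates = timestamps.dt.date          # object dtype holding datetime.date values

# Only the datetime64 column is "datetimelike" in the sense of the error message.
print(pd.api.types.is_datetime64_any_dtype(timestamps))
print(pd.api.types.is_datetime64_any_dtype(plain_dates))

# Day of week is computed from the datetime64 series (Monday is 0).
dow = timestamps.dt.dayofweek
print(dow.tolist())
```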
EDIT:
Thank you to Bharath shetty for a solution that works with Python dates, although it fails with NaT:
df = pd.DataFrame({'date':['2017-5-16',np.nan]})
df['trade_date_2']= pd.to_datetime(df['date']).dt.date
df['dow_2'] = df['trade_date_2'].apply(lambda x: x.weekday())
AttributeError: 'float' object has no attribute 'weekday'
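For comparison, here is a sketch of the same missing-value case handled through the .dt accessor on the datetime64 series, which propagates the missing value instead of raising. It assumes the same two-row frame as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"date": ["2017-5-16", np.nan]})

# Keep a datetime64 helper series and derive both columns from it.
dates = pd.to_datetime(df["date"])
df["trade_date_2"] = dates.dt.date
df["dow_2"] = dates.dt.dayofweek   # the missing date yields NaN instead of an error

print(df)
```

The day-of-week column becomes floating point here because NaN cannot be stored in a plain integer column.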
Comparing solutions:
df = pd.DataFrame({'date':['2017-5-16','2017-5-17']})
df = pd.concat([df]*10000).reset_index(drop=True)
def a(df):
    df['trade_date_2']= pd.to_datetime(df['date']).dt.date
    df['dow_2'] = df['trade_date_2'].apply(lambda x: x.weekday())
    return df
def b(df):
    df['date1'] = pd.to_datetime(df['date'])
    df['trade_date_21']= df['date1'].dt.date
    df['dow_21']=df['date1'].dt.dayofweek
    return (df)
def c(df):
    #dont write to column, but to helper series
    dates = pd.to_datetime(df['date'])
    df['trade_date_22']= dates.dt.date
    df['dow_22']= dates.dt.dayofweek
    return (df)
In [186]: %timeit (a(df))
10 loops, best of 3: 101 ms per loop
In [187]: %timeit (b(df))
10 loops, best of 3: 90.8 ms per loop
In [188]: %timeit (c(df))
10 loops, best of 3: 91.9 ms per loop
I have a Pandas data frame, one of the column contains date strings in the format YYYY-MM-DD
For e.g. '2013-10-28'
At the moment the dtype of the column is object.
How do I convert the column values to Pandas date format?
Essentially equivalent to @waitingkuo, but I would use pd.to_datetime here (it seems a little cleaner, and offers some additional functionality, e.g. dayfirst):
In [11]: df
Out[11]:
a time
0 1 2013-01-01
1 2 2013-01-02
2 3 2013-01-03
In [12]: pd.to_datetime(df['time'])
Out[12]:
0 2013-01-01 00:00:00
1 2013-01-02 00:00:00
2 2013-01-03 00:00:00
Name: time, dtype: datetime64[ns]
In [13]: df['time'] = pd.to_datetime(df['time'])
In [14]: df
Out[14]:
a time
0 1 2013-01-01 00:00:00
1 2 2013-01-02 00:00:00
2 3 2013-01-03 00:00:00
Handling ValueErrors
If you run into a situation where
df['time'] = pd.to_datetime(df['time'])
throws
ValueError: Unknown string format
that means you have invalid (non-coercible) values. If you are okay with having them converted to pd.NaT, you can add an errors='coerce' argument to to_datetime:
df['time'] = pd.to_datetime(df['time'], errors='coerce')
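Before discarding the invalid values, it can help to look at which rows failed to parse. The following is a small sketch with made-up sample strings; the explicit format argument is an assumption that matches the YYYY-MM-DD layout described in the question.

```python
import pandas as pd

# Made-up sample with one value that is not a date.
raw = pd.Series(["2013-10-28", "not a date", "2013-10-30"])

# Unparseable entries become NaT instead of raising ValueError.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# Inspect the original strings that could not be converted.
bad_rows = raw[parsed.isna()]
print(bad_rows.tolist())
```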
Use astype
In [31]: df
Out[31]:
a time
0 1 2013-01-01
1 2 2013-01-02
2 3 2013-01-03
In [32]: df['time'] = df['time'].astype('datetime64[ns]')
In [33]: df
Out[33]:
a time
0 1 2013-01-01 00:00:00
1 2 2013-01-02 00:00:00
2 3 2013-01-03 00:00:00
I imagine a lot of data comes into Pandas from CSV files, in which case you can simply convert the date during the initial CSV read:
dfcsv = pd.read_csv('xyz.csv', parse_dates=[0]) where the 0 refers to the column the date is in.
You could also add , index_col=0 in there if you want the date to be your index.
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
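As a self-contained illustration of the read_csv approach, the sketch below reads from an in-memory string instead of a file. The column names and values are invented for the example.

```python
import io
import pandas as pd

# Stand-in for 'xyz.csv'; the contents are invented for this example.
csv_text = "day,price\n2013-10-28,12.08\n2013-10-29,12.35\n"

# Column 0 is parsed as dates and also used as the index.
dfcsv = pd.read_csv(io.StringIO(csv_text), parse_dates=[0], index_col=0)

print(type(dfcsv.index).__name__)
print(dfcsv.index[0])
```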
Now you can do df['column'].dt.date
Note that if datetime values display without the time when it is 00:00:00 for every row, that is not pandas changing the data. It is the IPython notebook display trying to make things look pretty.
If you want to get the DATE and not DATETIME format:
df["id_date"] = pd.to_datetime(df["id_date"]).dt.date
Another way to do this, which works well if you have multiple columns to convert to datetime:
cols = ['date1','date2']
df[cols] = df[cols].apply(pd.to_datetime)
It may be the case that dates need to be converted to a different frequency. In this case, I would suggest setting an index by dates.
#set an index by dates
df.set_index(['time'], drop=True, inplace=True)
After this, you can more easily convert to the type of date format you will need most. Below, I sequentially convert to a number of date formats, ultimately ending up with a set of daily dates at the beginning of the month.
#Convert to daily dates
df.index = pd.DatetimeIndex(data=df.index)
#Convert to monthly dates
df.index = df.index.to_period(freq='M')
#Convert to strings
df.index = df.index.strftime('%Y-%m')
#Convert to daily dates
df.index = pd.DatetimeIndex(data=df.index)
For brevity, I don't show that I run the following code after each line above:
print(df.index)
print(df.index.dtype)
print(type(df.index))
This gives me the following output:
Index(['2013-01-01', '2013-01-02', '2013-01-03'], dtype='object', name='time')
object
<class 'pandas.core.indexes.base.Index'>
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03'], dtype='datetime64[ns]', name='time', freq=None)
datetime64[ns]
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
PeriodIndex(['2013-01', '2013-01', '2013-01'], dtype='period[M]', name='time', freq='M')
period[M]
<class 'pandas.core.indexes.period.PeriodIndex'>
Index(['2013-01', '2013-01', '2013-01'], dtype='object')
object
<class 'pandas.core.indexes.base.Index'>
DatetimeIndex(['2013-01-01', '2013-01-01', '2013-01-01'], dtype='datetime64[ns]', freq=None)
datetime64[ns]
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
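If the goal is only to snap daily dates to the first day of their month, the round trip through strings can probably be skipped by converting the PeriodIndex straight back to timestamps. This is a sketch with an invented three-date index, not the index from the output above.

```python
import pandas as pd

# Invented sample index spanning two months.
idx = pd.DatetimeIndex(["2013-01-01", "2013-01-02", "2013-02-15"], name="time")

# to_period('M') drops the day; to_timestamp() returns the start of each period.
month_start = idx.to_period("M").to_timestamp()

print(month_start)
```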
For the sake of completeness, another option, which might not be the most straightforward one and is a bit similar to the one proposed by @SSS, uses the datetime library instead:
import datetime
df["Date"] = df["Date"].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%d').date())
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 startDay 110526 non-null object
1 endDay 110526 non-null object
import pandas as pd
df['startDay'] = pd.to_datetime(df.startDay)
df['endDay'] = pd.to_datetime(df.endDay)
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 startDay 110526 non-null datetime64[ns]
1 endDay 110526 non-null datetime64[ns]
Try converting one of the rows to a timestamp with the pd.to_datetime function, and then use .map to apply the formula to the entire column.
I have a dataframe in pandas called 'munged_data' with two columns, 'entry_date' and 'dob', which I have converted to Timestamps using pd.to_timestamp. I am trying to figure out how to calculate people's ages based on the time difference between 'entry_date' and 'dob'. To do this I need the difference in days between the two columns (so that I can then do something like round(days/365.25)). I do not seem to be able to find a way to do this using a vectorized operation. When I do munged_data.entry_date-munged_data.dob I get the following:
internal_quote_id
2 15685977 days, 23:54:30.457856
3 11651985 days, 23:49:15.359744
4 9491988 days, 23:39:55.621376
7 11907004 days, 0:10:30.196224
9 15282164 days, 23:30:30.196224
15 15282227 days, 23:50:40.261632
However, I do not seem to be able to extract the days as an integer so that I can continue with my calculation.
Any help appreciated.
Using the pandas type Timedelta, available since v0.15.0, you can also do:
In[1]: import pandas as pd
In[2]: df = pd.DataFrame([ pd.Timestamp('20150111'),
pd.Timestamp('20150301') ], columns=['date'])
In[3]: df['today'] = pd.Timestamp('20150315')
In[4]: df
Out[4]:
date today
0 2015-01-11 2015-03-15
1 2015-03-01 2015-03-15
In[5]: (df['today'] - df['date']).dt.days
Out[5]:
0 63
1 14
dtype: int64
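Applied to the question's own column names, the same idea might look like the sketch below. The dates are invented sample values, and truncating with astype(int) is an assumption about how an age should be rounded; the question mentions round(days/365.25), which could be swapped in.

```python
import pandas as pd

# Invented sample data using the column names from the question.
munged_data = pd.DataFrame({
    "dob": pd.to_datetime(["2001-01-01", "2004-06-01"]),
    "entry_date": pd.to_datetime(["2013-04-19", "2013-04-19"]),
})

# Vectorized difference in whole days between the two columns.
days = (munged_data["entry_date"] - munged_data["dob"]).dt.days

# Approximate age in completed years (365.25 accounts for leap years on average).
munged_data["age"] = (days / 365.25).astype(int)

print(munged_data)
```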
You need 0.11 for this (0.11rc1 is out, final prob next week)
In [9]: df = DataFrame([ Timestamp('20010101'), Timestamp('20040601') ])
In [10]: df
Out[10]:
0
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00
In [11]: df = DataFrame([ Timestamp('20010101'),
Timestamp('20040601') ],columns=['age'])
In [12]: df
Out[12]:
age
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00
In [13]: df['today'] = Timestamp('20130419')
In [14]: df['diff'] = df['today']-df['age']
In [16]: df['years'] = df['diff'].apply(lambda x: float(x.item().days)/365)
In [17]: df
Out[17]:
age today diff years
0 2001-01-01 00:00:00 2013-04-19 00:00:00 4491 days, 00:00:00 12.304110
1 2004-06-01 00:00:00 2013-04-19 00:00:00 3244 days, 00:00:00 8.887671
You need this odd apply at the end because there is not yet full support for timedelta64[ns] scalars (similar to how Timestamps are now used for datetime64[ns]; this is coming in 0.12).
Not sure if you still need it, but in pandas 0.14 I usually use the .astype('timedelta64[X]') method
http://pandas.pydata.org/pandas-docs/stable/timeseries.html (frequency conversion)
df = pd.DataFrame([ pd.Timestamp('20010101'), pd.Timestamp('20040605') ])
df.ix[0]-df.ix[1]
Returns:
0 -1251 days
dtype: timedelta64[ns]
(df.ix[0]-df.ix[1]).astype('timedelta64[Y]')
Returns:
0 -4
dtype: float64
Hope that will help
Suppose you have a pandas series named time_difference which has type
numpy.timedelta64[ns]
One way of extracting just the day (or whatever attribute is desired) is the following:
just_day = time_difference.apply(lambda x: pd.Timedelta(x).days)
This function is used because the numpy.timedelta64 object does not have a 'days' attribute.
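On pandas versions that provide the .dt accessor for timedelta series, the per-element apply can likely be avoided. The sketch below uses invented durations and shows both whole days and fractional days.

```python
import pandas as pd

# Invented sample durations.
time_difference = pd.Series([
    pd.Timedelta(days=2, hours=12),
    pd.Timedelta(days=63),
])

# Whole days only (the hours component is dropped).
whole_days = time_difference.dt.days

# Fractional days: divide by a one-day Timedelta.
fractional_days = time_difference / pd.Timedelta(days=1)

print(whole_days.tolist(), fractional_days.tolist())
```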
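To convert a duration into days, use pd.Timedelta(...).days:
pd.Timedelta(1985, unit='Y').days
84494
Caution: the year unit 'Y' is ambiguous and recent pandas versions reject it. Also, 1985 years is far more than the roughly 292 years a nanosecond-resolution Timedelta can hold, so the value above should not be relied on. A fixed-length unit such as unit='D' or unit='h' is safer.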