How to convert Pandas time series into a dict with string key - python

I need to convert a Pandas time series object into a dict with the datetime as the key. I tried dict(my_ts_obj), but the keys come out as Timestamps, not strings.
Thanks a million for your help!

You could use s.index.format() to convert the Timestamps into strings:
In [86]: import numpy as np
In [87]: rng = pd.date_range('12/1/2012', periods=4, freq='D')
In [88]: s = pd.Series(np.random.randn(len(rng)), index=rng)
In [89]: s
Out[89]:
2012-12-01 -1.673655
2012-12-02 1.447061
2012-12-03 -0.672347
2012-12-04 0.202692
Freq: D, dtype: float64
In [90]: dict(zip(s.index.format(), s))
Out[90]:
{'2012-12-01': -1.6736553219187384,
'2012-12-02': 1.4470613776383001,
'2012-12-03': -0.67234662513200982,
'2012-12-04': 0.20269246374288372}
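An alternative that doesn't go through index.format() is a dict comprehension over the Series items; the '%Y-%m-%d' format string here is an assumption that matches the output above:
# build {date string: value} pairs directly from the Series
{ts.strftime('%Y-%m-%d'): value for ts, value in s.items()}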

Related

Subtracting a fixed date from a column in Pandas

Consider
In [99]: d = pd.to_datetime({'year': [2016], 'month': [6], 'day': [1]})
In [100]: d1 = pd.to_datetime({'year': [2016], 'month': [1], 'day': [1]})
In [101]: d - d1
Out[101]:
0 152 days
dtype: timedelta64[ns]
But when I try to do this for a whole column, it gives me trouble. Consider:
df['Age'] = map(lambda x: x - pd.to_datetime({'year': [2016], 'month': [6], 'day': [1]}), df['Manager_DoB'])
df['Manager_DoB'] is a column of datetime objects.
It flags the following error:
TypeError: can only operate on a datetime with a rhs of a timedelta/DateOffset for addition and subtraction, but the operator [__rsub__] was passed
You don't need to use map*, you can subtract a Timestamp from a datetime column/Series:
In [11]: d = pd.to_datetime({'year':[2016], 'month':[6], 'day':[1]})
In [12]: d
Out[12]:
0 2016-06-01
dtype: datetime64[ns]
In [13]: d[0] # This is the Timestamp you are actually interested in subtracting
Out[13]: Timestamp('2016-06-01 00:00:00')
In [14]: dates = pd.date_range(start="2016-01-01", periods=4)
In [15]: dates - d[0]
Out[15]: TimedeltaIndex(['-152 days', '-151 days', '-150 days', '-149 days'], dtype='timedelta64[ns]', freq=None)
You can get the Timestamp more directly using the constructor:
In [21]: pd.Timestamp("2016-06-01")
Out[21]: Timestamp('2016-06-01 00:00:00')
*You should avoid Python's built-in map with pandas; prefer .apply or, better, vectorized operations.
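Applied to the DataFrame from the question, that vectorized subtraction is a one-liner (a sketch; 'Manager_DoB' and 'Age' are the column names used in the question):
# subtracting a fixed Timestamp from a datetime64 column yields a timedelta64 column
df['Age'] = df['Manager_DoB'] - pd.Timestamp("2016-06-01")
# flip the operands if you want a positive, age-like difference instead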

Convert unix time to readable date in pandas dataframe

I have a dataframe with unix times and prices in it. I want to convert the index column so that it shows in human readable dates.
So for instance I have date as 1349633705 in the index column but I'd want it to show as 10/07/2012 (or at least 10/07/2012 18:15).
For some context, here is the code I'm working with and what I've tried already:
import json
import urllib2
from datetime import datetime
response = urllib2.urlopen('http://blockchain.info/charts/market-price?&format=json')
data = json.load(response)
df = DataFrame(data['values'])
df.columns = ["date","price"]
#convert dates
df.date = df.date.apply(lambda d: datetime.strptime(d, "%Y-%m-%d"))
df.index = df.date
As you can see, I'm using
df.date = df.date.apply(lambda d: datetime.strptime(d, "%Y-%m-%d"))
here, which doesn't work since I'm working with integers, not strings. I think I need to use datetime.date.fromtimestamp, but I'm not quite sure how to apply this to the whole of df.date.
Thanks.
These appear to be seconds since epoch.
In [20]: df = DataFrame(data['values'])
In [21]: df.columns = ["date","price"]
In [22]: df
Out[22]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 358 entries, 0 to 357
Data columns (total 2 columns):
date 358 non-null values
price 358 non-null values
dtypes: float64(1), int64(1)
In [23]: df.head()
Out[23]:
date price
0 1349720105 12.08
1 1349806505 12.35
2 1349892905 12.15
3 1349979305 12.19
4 1350065705 12.15
In [25]: df['date'] = pd.to_datetime(df['date'],unit='s')
In [26]: df.head()
Out[26]:
date price
0 2012-10-08 18:15:05 12.08
1 2012-10-09 18:15:05 12.35
2 2012-10-10 18:15:05 12.15
3 2012-10-11 18:15:05 12.19
4 2012-10-12 18:15:05 12.15
In [27]: df.dtypes
Out[27]:
date datetime64[ns]
price float64
dtype: object
If you try using:
df[DATE_FIELD] = pd.to_datetime(df[DATE_FIELD], unit='s')
and receive an error:
"pandas.tslib.OutOfBoundsDatetime: cannot convert input with unit 's'"
this means the DATE_FIELD is not specified in seconds. In my case it was milliseconds since epoch, and the conversion worked using:
df[DATE_FIELD]=(pd.to_datetime(df[DATE_FIELD],unit='ms'))
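As a rough sanity check before converting (a heuristic sketch, not a pandas feature), you can look at the magnitude of the raw values to guess the unit:
raw = df[DATE_FIELD].iloc[0]
# ~10-digit values are seconds since epoch, ~13-digit values are milliseconds
unit = 's' if raw < 1e11 else 'ms'
df[DATE_FIELD] = pd.to_datetime(df[DATE_FIELD], unit=unit)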
Assuming we imported pandas as pd and df is our DataFrame,
pd.to_datetime(df['date'], unit='s')
works for me.
The pandas documentation gives this and other format examples that weren't included in any of the previous answers:
https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
Code
pd.to_datetime(1490195805, unit='s')
Timestamp('2017-03-22 15:16:45')
pd.to_datetime(1490195805433502912, unit='ns')
Timestamp('2017-03-22 15:16:45.433502912')
Alternatively, changing one line of the above code should also work (with the question's from datetime import datetime, the call is datetime.fromtimestamp; note this produces date strings rather than datetimes):
# df.date = df.date.apply(lambda d: datetime.strptime(d, "%Y-%m-%d"))
df.date = df.date.apply(lambda d: datetime.fromtimestamp(int(d)).strftime('%Y-%m-%d'))
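For completeness, a Python 3 version of the original snippet with the unit='s' conversion folded in (a sketch; the endpoint and JSON layout are taken from the question and may have changed since):
import json
import urllib.request
import pandas as pd

with urllib.request.urlopen('http://blockchain.info/charts/market-price?&format=json') as response:
    data = json.load(response)

df = pd.DataFrame(data['values'])
df.columns = ["date", "price"]
df['date'] = pd.to_datetime(df['date'], unit='s')  # seconds since epoch
df = df.set_index('date')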

Pandas date_range from DatetimeIndex to Date format

Pandas date_range returns a pandas.DatetimeIndex whose entries are formatted as timestamps (date plus time). For example:
In [114]: rng = pandas.date_range('1/1/2013', '1/31/2013', freq='D')
In [115]: rng
Out[115]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-01-31 00:00:00]
Length: 31, Freq: D, Timezone: None
Given I am not using timestamps in my application, I would like to convert this index to a date such that:
In [117]: rng[0]
Out[117]:
<Timestamp: 2013-01-02 00:00:00>
Will be in the form 2013-01-02.
I am using pandas version 0.9.1
to_pydatetime returns a NumPy array of Python datetime.datetime objects:
In [8]: dates = rng.to_pydatetime()
In [9]: print(dates[0])
2013-01-01 00:00:00
In [10]: print(dates[0].strftime('%Y-%m-%d'))
2013-01-01
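If all you need are string labels, newer pandas versions also let you format the whole index in one call (a small addition, not part of the original answer):
labels = rng.strftime('%Y-%m-%d')  # Index of strings such as '2013-01-01'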
For me the current answer is not satisfactory because internally it is still stored as a timestamp with hours, minutes, seconds.
Pandas version: 0.22.0
My solution has been to convert it to datetime.date:
In[30]: import pandas as pd
In[31]: rng = pd.date_range('1/1/2013','1/31/2013', freq='D')
In[32]: date_rng = rng.date # Here it becomes date
In[33]: date_rng[0]
Out[33]: datetime.date(2013, 1, 1)
In[34]: print(date_rng[0])
2013-01-01

How do I convert strings in a Pandas data frame to a 'date' data type?

I have a Pandas data frame, one of the column contains date strings in the format YYYY-MM-DD
For e.g. '2013-10-28'
At the moment the dtype of the column is object.
How do I convert the column values to Pandas date format?
Essentially equivalent to @waitingkuo's answer, but I would use pd.to_datetime here (it seems a little cleaner, and offers some additional functionality, e.g. dayfirst):
In [11]: df
Out[11]:
a time
0 1 2013-01-01
1 2 2013-01-02
2 3 2013-01-03
In [12]: pd.to_datetime(df['time'])
Out[12]:
0 2013-01-01 00:00:00
1 2013-01-02 00:00:00
2 2013-01-03 00:00:00
Name: time, dtype: datetime64[ns]
In [13]: df['time'] = pd.to_datetime(df['time'])
In [14]: df
Out[14]:
a time
0 1 2013-01-01 00:00:00
1 2 2013-01-02 00:00:00
2 3 2013-01-03 00:00:00
Handling ValueErrors
If running
df['time'] = pd.to_datetime(df['time'])
throws a
ValueError: Unknown string format
that means you have invalid (non-coercible) values. If you are okay with having them converted to pd.NaT, you can add an errors='coerce' argument to to_datetime:
df['time'] = pd.to_datetime(df['time'], errors='coerce')
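A quick way to see which rows are the problem before (or instead of) coercing them, as a small diagnostic sketch:
# rows whose 'time' value cannot be parsed become NaT under errors='coerce'
bad_rows = df[pd.to_datetime(df['time'], errors='coerce').isna()]
print(bad_rows)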
Use astype
In [31]: df
Out[31]:
a time
0 1 2013-01-01
1 2 2013-01-02
2 3 2013-01-03
In [32]: df['time'] = df['time'].astype('datetime64[ns]')
In [33]: df
Out[33]:
a time
0 1 2013-01-01 00:00:00
1 2 2013-01-02 00:00:00
2 3 2013-01-03 00:00:00
I imagine a lot of data comes into Pandas from CSV files, in which case you can simply convert the date during the initial CSV read:
dfcsv = pd.read_csv('xyz.csv', parse_dates=[0])
where the 0 refers to the column the date is in.
You could also add , index_col=0 in there if you want the date to be your index.
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
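Put together, a minimal sketch (the file name and column position are taken from the example above):
import pandas as pd

# parse the first column as dates and use it as the index
dfcsv = pd.read_csv('xyz.csv', parse_dates=[0], index_col=0)
print(dfcsv.index.dtype)  # datetime64[ns]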
Now you can do df['column'].dt.date
Note that for datetime objects, if you don't see the hour when they're all 00:00:00, that's not pandas; it's the IPython/Jupyter notebook trying to make things look pretty.
If you want to get the DATE and not DATETIME format:
df["id_date"] = pd.to_datetime(df["id_date"]).dt.date
Another way to do this, which works well if you have multiple columns to convert to datetime:
cols = ['date1','date2']
df[cols] = df[cols].apply(pd.to_datetime)
It may be the case that dates need to be converted to a different frequency. In this case, I would suggest setting an index by dates.
#set an index by dates
df.set_index(['time'], drop=True, inplace=True)
After this, you can more easily convert to the type of date format you will need most. Below, I sequentially convert to a number of date formats, ultimately ending up with a set of daily dates at the beginning of the month.
#Convert to daily dates
df.index = pd.DatetimeIndex(data=df.index)
#Convert to monthly dates
df.index = df.index.to_period(freq='M')
#Convert to strings
df.index = df.index.strftime('%Y-%m')
#Convert to daily dates
df.index = pd.DatetimeIndex(data=df.index)
For brevity, I don't show that I run the following code after each line above:
print(df.index)
print(df.index.dtype)
print(type(df.index))
This gives me the following output:
Index(['2013-01-01', '2013-01-02', '2013-01-03'], dtype='object', name='time')
object
<class 'pandas.core.indexes.base.Index'>
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03'], dtype='datetime64[ns]', name='time', freq=None)
datetime64[ns]
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
PeriodIndex(['2013-01', '2013-01', '2013-01'], dtype='period[M]', name='time', freq='M')
period[M]
<class 'pandas.core.indexes.period.PeriodIndex'>
Index(['2013-01', '2013-01', '2013-01'], dtype='object')
object
<class 'pandas.core.indexes.base.Index'>
DatetimeIndex(['2013-01-01', '2013-01-01', '2013-01-01'], dtype='datetime64[ns]', freq=None)
datetime64[ns]
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
For the sake of completeness, another option (perhaps not the most straightforward one), a bit similar to the one proposed by @SSS, is to use the datetime library instead:
import datetime
df["Date"] = df["Date"].apply(lambda x: datetime.datetime.strptime(x, '%Y-%d-%m').date())
Before converting, df.info() shows both columns as object dtype:
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   startDay  110526 non-null   object
 1   endDay    110526 non-null   object
import pandas as pd
df['startDay'] = pd.to_datetime(df.startDay)
df['endDay'] = pd.to_datetime(df.endDay)
After converting, both columns are datetime64[ns]:
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   startDay  110526 non-null   datetime64[ns]
 1   endDay    110526 non-null   datetime64[ns]
Try converting one of the rows into a timestamp using the pd.to_datetime function, and then use .map to apply the same conversion to the entire column, as in the sketch below.
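A minimal sketch of that approach (the column name 'time' follows the earlier examples and is an assumption):
# check that a single value parses as expected, then map the same call over the whole column
pd.to_datetime(df['time'].iloc[0])
df['time'] = df['time'].map(pd.to_datetime)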

pandas.to_datetime inconsistent time string format

I am attempting to convert the index of a pandas.DataFrame from string format to a datetime index, using pandas.to_datetime().
Import pandas:
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: '0.10.1'
Create an example DataFrame:
In [3]: d = {'data' : pd.Series([1.,2.], index=['26/12/2012', '10/01/2013'])}
In [4]: df=pd.DataFrame(d)
Look at indices. Note that the date format is day/month/year:
In [5]: df.index
Out[5]: Index([26/12/2012, 10/01/2013], dtype=object)
Convert index to datetime:
In [6]: pd.to_datetime(df.index)
Out[6]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-12-26 00:00:00, 2013-10-01 00:00:00]
Length: 2, Freq: None, Timezone: None
Already at this stage, you can see that the date format for each entry has been formatted differently. The first is fine, the second has swapped month and day.
This is what I want to write, but avoiding the inconsistent formatting of date strings:
In [7]: df.set_index(pd.to_datetime(df.index))
Out[7]:
data
2012-12-26 1
2013-10-01 2
I guess the first entry is correct because the function 'knows' there aren't 26 months, and so does not choose the default month/day/year format.
Is there another/better way to do this? Can I pass the format into the to_datetime() function?
Thank you.
EDIT:
I have found a way to do this, without pandas.to_datetime:
from datetime import datetime as dt
date_string_list = df.index.tolist()
datetime_list = [ dt.strptime(date_string_list[x], '%d/%m/%Y') for x in range(len(date_string_list)) ]
df.index=datetime_list
but it's a bit messy. Any improvements welcome.
There is a (hidden?) dayfirst argument to to_datetime:
In [23]: pd.to_datetime(df.index, dayfirst=True)
Out[23]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-12-26 00:00:00, 2013-01-10 00:00:00]
Length: 2, Freq: None, Timezone: None
In pandas 0.11 (onwards) you'll be able to use the format argument:
In [24]: pd.to_datetime(df.index, format='%d/%m/%Y')
Out[24]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-12-26 00:00:00, 2013-01-10 00:00:00]
Length: 2, Freq: None, Timezone: None
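To put the parsed index back onto the DataFrame, a short usage note (not part of the original answer):
df.index = pd.to_datetime(df.index, format='%d/%m/%Y')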
