Python Pandas business day range bdate_range doesn't take 1min freq?

I am trying to use bdate_range with freq='1min' to get minute-by-minute data on all business days:
df = pd.bdate_range('20130101 9:30', '20130106 16:00', freq='1min')
The output ends with:
......
2013-01-05 23:59:00
2013-01-06 00:00:00
Notice that 2013-01-05 and 2013-01-06 fall on a weekend, and the range does not respect the 9:30-16:00 time limit.
I think freq='1min' completely overrides the 'B' (business-day) frequency implied by the name bdate_range.
I also tried date_range. It worked for the 9:30-16:00 time range, but it can't exclude weekends.
Thanks!

You could do it like this:
In [28]: rng = pd.date_range('2012-01-01', '2013-01-01', freq="1min")
In [29]: rng
Out[29]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-01 00:00:00, ..., 2013-01-01 00:00:00]
Length: 527041, Freq: T, Timezone: None
Limit the times that I want
In [30]: x = rng[rng.indexer_between_time('9:30','16:00')]
In [31]: x
Out[31]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-01 09:30:00, ..., 2012-12-31 16:00:00]
Length: 143106, Freq: None, Timezone: None
Keep only days that are Mon-Fri:
In [32]: x = x[x.dayofweek<5]
In [33]: x
Out[33]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-02 09:30:00, ..., 2012-12-31 16:00:00]
Length: 102051, Freq: None, Timezone: None
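Applied to the original question's dates, the three steps above can be chained into one short sketch (2013-01-05 and 2013-01-06 fall on a weekend, so they drop out of the result):

```python
import pandas as pd

# Build a minute-level range, then keep only 9:30-16:00 on weekdays.
rng = pd.date_range('2013-01-01 09:30', '2013-01-06 16:00', freq='1min')
rng = rng[rng.indexer_between_time('9:30', '16:00')]  # trading hours only
rng = rng[rng.dayofweek < 5]                          # Mon-Fri only (5=Sat, 6=Sun)
```

The last timestamp kept is Friday 2013-01-04 16:00, since the weekend days are filtered out.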

Related

Pandas DataFrame index - month and day only

I'd like to have a DataFrame with a DatetimeIndex, but I only want the months and days; not years. I'd like it to look like the following:
(index) (values)
01-01 56.2
01-02 59.6
...
01-31 62.3
02-01 61.6
...
12-31 44.0
I've tried creating a date_range but this seems to require the year input, so I can't seem to figure out how to achieve the above.
You can do it this way:
In [78]: df = pd.DataFrame({'val':np.random.rand(10)}, index=pd.date_range('2000-01-01', freq='10D', periods=10))
In [79]: df
Out[79]:
val
2000-01-01 0.422023
2000-01-11 0.215800
2000-01-21 0.186017
2000-01-31 0.804285
2000-02-10 0.014004
2000-02-20 0.296644
2000-03-01 0.048683
2000-03-11 0.239037
2000-03-21 0.129382
2000-03-31 0.963110
In [80]: df.index.dtype_str
Out[80]: 'datetime64[ns]'
In [81]: df.index.dtype
Out[81]: dtype('<M8[ns]')
In [82]: df.index = df.index.strftime('%m-%d')
In [83]: df
Out[83]:
val
01-01 0.422023
01-11 0.215800
01-21 0.186017
01-31 0.804285
02-10 0.014004
02-20 0.296644
03-01 0.048683
03-11 0.239037
03-21 0.129382
03-31 0.963110
In [84]: df.index.dtype_str
Out[84]: 'object'
In [85]: df.index.dtype
Out[85]: dtype('O')
NOTE: the index dtype is a string (object) now
PS: of course you can do it in one step if you need:
In [86]: pd.date_range('2000-01-01', freq='10D', periods=5).strftime('%m-%d')
Out[86]:
array(['01-01', '01-11', '01-21', '01-31', '02-10'],
dtype='<U5')
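If you want the mm-dd strings at construction time, the same idea can be sketched in one go (note: on recent pandas, DatetimeIndex.strftime returns an Index of strings rather than a NumPy array, but the values are the same):

```python
import numpy as np
import pandas as pd

# Build the index already formatted as 'mm-dd' strings, then attach the values.
idx = pd.date_range('2000-01-01', freq='10D', periods=10).strftime('%m-%d')
df = pd.DataFrame({'val': np.random.rand(10)}, index=idx)
```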

Python re-sample at a uniform semiannual period (equivalent of 'BQ' in pandas resample)

Is there a 'BQ'-equivalent semiannual resample in Python? I didn't find it here:
http://pandas.pydata.org/pandas-docs/dev/timeseries.html#up-and-downsampling
I have a set of records; some of them follow Jun-Dec, some Jan-Jul, some Feb-Aug, etc. How do I resample all of them to Jun-Dec (as-is for the Jun-Dec records, and to the following Jun/Dec for the others)?
Thank you.
How about '2BQ'?
In [57]: ts = pd.Series(range(1000), index=pd.date_range('2000-4-15', periods=1000))
In [58]: ts.resample('2BQ', how='sum')
Out[58]:
2000-06-30 2926
2000-12-29 30485
2001-06-29 63609
2001-12-31 98605
2002-06-28 127985
2002-12-31 166935
2003-06-30 8955
Freq: 2BQ-DEC, dtype: int64
The two-quarter offset will be anchored on the first timestamp in the series, so if your data happens to start in Jan-Mar or Jul-Sep, the anchor will be wrong. One way to fix it is to insert a dummy date at the beginning of the series so that the anchor is right.
import numpy as np
from datetime import datetime

ts = pd.Series(range(1000), index=pd.date_range('2000-3-15', periods=1000))

if ts.index[0].month in [1, 2, 3]:
    ts.loc[datetime(ts.index[0].year - 1, 12, 1)] = np.nan
elif ts.index[0].month in [7, 8, 9]:
    ts.loc[datetime(ts.index[0].year, 6, 1)] = np.nan
This should give the right answer (and you can drop the first NaN entry).
In [85]: ts.resample('2BQ', how='sum')
Out[85]:
1999-12-31 NaN
2000-06-30 5778
2000-12-29 36127
2001-06-29 69251
2001-12-31 104340
2002-06-28 133534
2002-12-31 150470
Freq: 2BQ-DEC, dtype: float64
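On modern pandas the how= keyword is gone, and the 'BQ' alias was renamed 'BQE' in pandas 2.2; a version-tolerant sketch of the same resample (same data as In [57] above):

```python
import pandas as pd

ts = pd.Series(range(1000), index=pd.date_range('2000-04-15', periods=1000))

# 'BQ' became 'BQE' in pandas 2.2; try the new alias first, fall back to the old.
try:
    result = ts.resample('2BQE').sum()
except ValueError:
    result = ts.resample('2BQ').sum()
```

The bins are right-closed and right-labeled (the default for quarter-like frequencies), so this reproduces the Out[58] values above, starting with 2926 at 2000-06-30.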

Converting time zone pandas dataframe

I have data:
Symbol bid ask
Timestamp
2014-01-01 21:55:34.378000 EUR/USD 1.37622 1.37693
2014-01-01 21:55:40.410000 EUR/USD 1.37624 1.37698
2014-01-01 21:55:47.210000 EUR/USD 1.37619 1.37696
2014-01-01 21:55:57.963000 EUR/USD 1.37616 1.37696
2014-01-01 21:56:03.117000 EUR/USD 1.37616 1.37694
The timestamp is in GMT. Is there a way to convert that to Eastern?
Note when I do:
data.index
I get output:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-01 21:55:34.378000, ..., 2014-01-01 21:56:03.117000]
Length: 5, Freq: None, Timezone: None
Localize the index (using tz_localize) to UTC (to make the Timestamps timezone-aware) and then convert to Eastern (using tz_convert):
import pytz
eastern = pytz.timezone('US/Eastern')
df.index = df.index.tz_localize(pytz.utc).tz_convert(eastern)
For example:
import pandas as pd
import pytz
index = pd.date_range('20140101 21:55', freq='15S', periods=5)
df = pd.DataFrame(1, index=index, columns=['X'])
print(df)
# X
# 2014-01-01 21:55:00 1
# 2014-01-01 21:55:15 1
# 2014-01-01 21:55:30 1
# 2014-01-01 21:55:45 1
# 2014-01-01 21:56:00 1
# [5 rows x 1 columns]
print(df.index)
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 21:55:00, ..., 2014-01-01 21:56:00]
# Length: 5, Freq: 15S, Timezone: None
eastern = pytz.timezone('US/Eastern')
df.index = df.index.tz_localize(pytz.utc).tz_convert(eastern)
print(df)
# X
# 2014-01-01 16:55:00-05:00 1
# 2014-01-01 16:55:15-05:00 1
# 2014-01-01 16:55:30-05:00 1
# 2014-01-01 16:55:45-05:00 1
# 2014-01-01 16:56:00-05:00 1
# [5 rows x 1 columns]
print(df.index)
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2014-01-01 16:55:00-05:00, ..., 2014-01-01 16:56:00-05:00]
# Length: 5, Freq: 15S, Timezone: US/Eastern
The simplest way is to use to_datetime with utc=True:
df = pd.DataFrame({'Symbol': ['EUR/USD'] * 5,
                   'bid': [1.37622, 1.37624, 1.37619, 1.37616, 1.37616],
                   'ask': [1.37693, 1.37698, 1.37696, 1.37696, 1.37694]})
df.index = pd.to_datetime(['2014-01-01 21:55:34.378000',
                           '2014-01-01 21:55:40.410000',
                           '2014-01-01 21:55:47.210000',
                           '2014-01-01 21:55:57.963000',
                           '2014-01-01 21:56:03.117000'],
                          utc=True)
For more flexibility, you can convert timezones with tz_convert(). If your data column/index is not timezone-aware, you will get a warning, and should first make the data timezone-aware with tz_localize.
df = pd.DataFrame({'Symbol': ['EUR/USD'] * 5,
                   'bid': [1.37622, 1.37624, 1.37619, 1.37616, 1.37616],
                   'ask': [1.37693, 1.37698, 1.37696, 1.37696, 1.37694]})
df.index = pd.to_datetime(['2014-01-01 21:55:34.378000',
                           '2014-01-01 21:55:40.410000',
                           '2014-01-01 21:55:47.210000',
                           '2014-01-01 21:55:57.963000',
                           '2014-01-01 21:56:03.117000'])
df.index = df.index.tz_localize('GMT')
df.index = df.index.tz_convert('America/New_York')
This also works for datetime columns, but you need the .dt accessor when working with a column:
df['column'] = df['column'].dt.tz_convert('America/New_York')
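A minimal self-contained sketch of the column case (with a made-up two-row series; the localize-then-convert order matters, since a naive column must be made timezone-aware first):

```python
import pandas as pd

# Hypothetical column of naive (timezone-unaware) datetimes.
s = pd.Series(pd.to_datetime(['2014-01-01 21:55:34', '2014-01-01 21:56:03']))
s = s.dt.tz_localize('UTC')              # make it timezone-aware first
s = s.dt.tz_convert('America/New_York')  # then convert
```

In January New York is on EST (UTC-5), so 21:55 UTC becomes 16:55-05:00.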
To convert an EST-localized index into an Asian timezone:
df.index = df.index.tz_localize('EST')
df.index = df.index.tz_convert('Asia/Kolkata')
Pandas now has built-in timezone-conversion support.

Get MM-DD-YYYY from pandas Timestamp

Dates seem to be a tricky thing in Python, and I am having a lot of trouble simply stripping the date out of a pandas Timestamp. I would like to get from 2013-09-29 02:34:44 to simply 09-29-2013.
I have a dataframe with a column Created_date:
Name: Created_Date, Length: 1162549, dtype: datetime64[ns]
I have tried applying the .date() method on this Series, eg: df.Created_Date.date(), but I get the error AttributeError: 'Series' object has no attribute 'date'
Can someone help me out?
Map over the elements:
In [239]: from operator import methodcaller
In [240]: s = Series(date_range(Timestamp('now'), periods=2))
In [241]: s
Out[241]:
0 2013-10-01 00:24:16
1 2013-10-02 00:24:16
dtype: datetime64[ns]
In [238]: s.map(lambda x: x.strftime('%d-%m-%Y'))
Out[238]:
0 01-10-2013
1 02-10-2013
dtype: object
In [242]: s.map(methodcaller('strftime', '%d-%m-%Y'))
Out[242]:
0 01-10-2013
1 02-10-2013
dtype: object
You can get the raw datetime.date objects by calling the date() method of the Timestamp elements that make up the Series:
In [249]: s.map(methodcaller('date'))
Out[249]:
0 2013-10-01
1 2013-10-02
dtype: object
In [250]: s.map(methodcaller('date')).values
Out[250]:
array([datetime.date(2013, 10, 1), datetime.date(2013, 10, 2)], dtype=object)
Yet another way you can do this is by calling the unbound Timestamp.date method:
In [273]: s.map(Timestamp.date)
Out[273]:
0 2013-10-01
1 2013-10-02
dtype: object
This method is the fastest, and IMHO the most readable. Timestamp is accessible in the top-level pandas module, like so: pandas.Timestamp. I've imported it directly for expository purposes.
The date attribute of DatetimeIndex objects does something similar, but returns a numpy object array instead:
In [243]: index = DatetimeIndex(s)
In [244]: index
Out[244]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-10-01 00:24:16, 2013-10-02 00:24:16]
Length: 2, Freq: None, Timezone: None
In [246]: index.date
Out[246]:
array([datetime.date(2013, 10, 1), datetime.date(2013, 10, 2)], dtype=object)
For larger datetime64[ns] Series objects, calling Timestamp.date is faster than operator.methodcaller which is slightly faster than a lambda:
In [263]: f = methodcaller('date')
In [264]: flam = lambda x: x.date()
In [265]: fmeth = Timestamp.date
In [266]: s2 = Series(date_range('20010101', periods=1000000, freq='T'))
In [267]: s2
Out[267]:
0 2001-01-01 00:00:00
1 2001-01-01 00:01:00
2 2001-01-01 00:02:00
3 2001-01-01 00:03:00
4 2001-01-01 00:04:00
5 2001-01-01 00:05:00
6 2001-01-01 00:06:00
7 2001-01-01 00:07:00
8 2001-01-01 00:08:00
9 2001-01-01 00:09:00
10 2001-01-01 00:10:00
11 2001-01-01 00:11:00
12 2001-01-01 00:12:00
13 2001-01-01 00:13:00
14 2001-01-01 00:14:00
...
999985 2002-11-26 10:25:00
999986 2002-11-26 10:26:00
999987 2002-11-26 10:27:00
999988 2002-11-26 10:28:00
999989 2002-11-26 10:29:00
999990 2002-11-26 10:30:00
999991 2002-11-26 10:31:00
999992 2002-11-26 10:32:00
999993 2002-11-26 10:33:00
999994 2002-11-26 10:34:00
999995 2002-11-26 10:35:00
999996 2002-11-26 10:36:00
999997 2002-11-26 10:37:00
999998 2002-11-26 10:38:00
999999 2002-11-26 10:39:00
Length: 1000000, dtype: datetime64[ns]
In [269]: timeit s2.map(f)
1 loops, best of 3: 1.04 s per loop
In [270]: timeit s2.map(flam)
1 loops, best of 3: 1.1 s per loop
In [271]: timeit s2.map(fmeth)
1 loops, best of 3: 968 ms per loop
Keep in mind that one of the goals of pandas is to provide a layer on top of numpy so that (most of the time) you don't have to deal with the low level details of the ndarray. So getting the raw datetime.date objects in an array is of limited use since they don't correspond to any numpy.dtype that is supported by pandas (pandas only supports datetime64[ns] [that's nanoseconds] dtypes). That said, sometimes you need to do this.
Maybe this only came in recently, but there are built-in methods for this. Try:
In [27]: s = pd.Series(pd.date_range(pd.Timestamp('now'), periods=2))
In [28]: s
Out[28]:
0 2016-02-11 19:11:43.386016
1 2016-02-12 19:11:43.386016
dtype: datetime64[ns]
In [29]: s.dt.to_pydatetime()
Out[29]:
array([datetime.datetime(2016, 2, 11, 19, 11, 43, 386016),
datetime.datetime(2016, 2, 12, 19, 11, 43, 386016)], dtype=object)
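Along the same lines, modern pandas can produce the exact MM-DD-YYYY string the question asked for through the .dt accessor, without map (a minimal sketch):

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2013-09-29 02:34:44']))
out = s.dt.strftime('%m-%d-%Y')  # vectorized strftime over the whole Series
```

This turns 2013-09-29 02:34:44 into '09-29-2013' (as a string/object Series, like the map-based approaches above).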
You can use .dt.date on a datetime64[ns] column of the dataframe, e.g.:
df['Created_date'] = df['Created_date'].dt.date
Input dataframe named as test_df:
print(test_df)
Result:
Created_date
0 2015-03-04 15:39:16
1 2015-03-22 17:36:49
2 2015-03-25 22:08:45
3 2015-03-16 13:45:20
4 2015-03-19 18:53:50
Checking dtypes:
print(test_df.dtypes)
Result:
Created_date datetime64[ns]
dtype: object
Extracting date and updating Created_date column:
test_df['Created_date'] = test_df['Created_date'].dt.date
print(test_df)
Result:
Created_date
0 2015-03-04
1 2015-03-22
2 2015-03-25
3 2015-03-16
4 2015-03-19
Well, I would do it this way (timeStamp, years and i here are from my own code):
pdTime = pd.date_range(timeStamp, periods=len(years), freq="D")
pdTime[i].strftime('%m-%d-%Y')

Pandas date_range from DatetimeIndex to Date format

Pandas date_range returns a pandas.DatetimeIndex whose entries are formatted as timestamps (date plus time). For example:
In [114] rng=pandas.date_range('1/1/2013','1/31/2013',freq='D')
In [115] rng
Out [116]
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-01-31 00:00:00]
Length: 31, Freq: D, Timezone: None
Given that I am not using timestamps in my application, I would like to convert this index to dates, so that
In [117] rng[0]
Out [118]
<Timestamp: 2013-01-02 00:00:00>
will instead display in the form 2013-01-02.
I am using pandas version 0.9.1
to_pydatetime returns a NumPy array of Python datetime.datetime objects:
In [8]: dates = rng.to_pydatetime()
In [9]: print(dates[0])
2013-01-01 00:00:00
In [10]: print(dates[0].strftime('%Y-%m-%d'))
2013-01-01
For me the current answer is not satisfactory, because internally the value is still stored as a timestamp with hours, minutes and seconds.
Pandas version : 0.22.0
My solution has been to convert it to datetime.date:
In[30]: import pandas as pd
In[31]: rng = pd.date_range('1/1/2013','1/31/2013', freq='D')
In[32]: date_rng = rng.date # Here it becomes date
In[33]: date_rng[0]
Out[33]: datetime.date(2013, 1, 1)
In[34]: print(date_rng[0])
2013-01-01
