Difference of datetimes in hours, excluding the weekend - python

I currently have a dataframe where a unique ID has multiple dates in another column. I want to extract the hours between each date, but ignore the weekend if the next date is after the weekend. For example, if today is Friday at 12 pm
and the following date is Tuesday at 12 pm, then the difference in hours between these two dates would be 48 hours.
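As a quick sanity check of that arithmetic (made-up timestamps, not from my dataset):
import pandas as pd

start = pd.Timestamp("2018-12-07 12:00")               # a Friday
end = pd.Timestamp("2018-12-11 12:00")                 # the following Tuesday
total_hours = (end - start) / pd.Timedelta(hours=1)    # 96.0
weekend_hours = 2 * 24                                 # Saturday + Sunday
print(total_hours - weekend_hours)                     # 48.0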
Here is my dataset with the expected output:
df = pd.DataFrame({"UniqueID": ["A","A","A","B","B","B","C","C"],"Date":
["2018-12-07 10:30:00","2018-12-10 14:30:00","2018-12-11 17:30:00",
"2018-12-14 09:00:00","2018-12-18 09:00:00",
"2018-12-21 11:00:00","2019-01-01 15:00:00","2019-01-07 15:00:00"],
"ExpectedOutput": ["28.0","27.0","Nan","48.0","74.0","NaN","96.0","NaN"]})
df["Date"] = df["Date"].astype(np.datetime64)
This is what I have so far, but it includes the weekends:
df["date_diff"] = df.groupby(["UniqueID"])["Date"].apply(lambda x: x.diff()
/ np.timedelta64(1 ,'h')).shift(-1)
Thanks!

The idea is to floor the datetimes to remove the time component, then use numpy.busday_count to get the number of business days between the start day plus one day and the shifted day, converted to hours in the hours3 column. Then create the hours1 and hours2 columns for the partial start and end days, but only when they do not fall on a weekend. Finally, sum all the hours columns together:
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values(['UniqueID','Date'])
df["shifted"] = df.groupby(["UniqueID"])["Date"].shift(-1)
df["hours1"] = df["Date"].dt.floor('d')
df["hours2"] = df["shifted"].dt.floor('d')
mask = df['shifted'].notnull()
f = lambda x: np.busday_count(x['hours1'] + pd.Timedelta(1, unit='d'), x['hours2'])
df.loc[mask, 'hours3'] = df[mask].apply(f, axis=1) * 24
mask1 = df['hours1'].dt.dayofweek < 5
hours1 = df['hours1'] + pd.Timedelta(1, unit='d') - df['Date']
df['hours1'] = np.where(mask1, hours1, np.nan) / np.timedelta64(1, 'h')
mask1 = df['hours2'].dt.dayofweek < 5
df['hours2'] = np.where(mask1, df['shifted'] - df['hours2'], np.nan) / np.timedelta64(1, 'h')
df['date_diff'] = df['hours1'].fillna(0) + df['hours2'] + df['hours3']
print (df)
UniqueID Date ExpectedOutput shifted hours1 \
0 A 2018-12-07 10:30:00 28.0 2018-12-10 14:30:00 13.5
1 A 2018-12-10 14:30:00 27.0 2018-12-11 17:30:00 9.5
2 A 2018-12-11 17:30:00 Nan NaT 6.5
3 B 2018-12-14 09:00:00 48.0 2018-12-18 09:00:00 15.0
4 B 2018-12-18 09:00:00 74.0 2018-12-21 11:00:00 15.0
5 B 2018-12-21 11:00:00 NaN NaT 13.0
6 C 2019-01-01 15:00:00 96.0 2019-01-07 15:00:00 9.0
7 C 2019-01-07 15:00:00 NaN NaT 9.0
hours2 hours3 date_diff
0 14.5 0.0 28.0
1 17.5 0.0 27.0
2 NaN NaN NaN
3 9.0 24.0 48.0
4 11.0 48.0 74.0
5 NaN NaN NaN
6 15.0 72.0 96.0
7 NaN NaN NaN
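As a cross-check of where those numbers come from, take row 3 (UniqueID B, 2018-12-14 09:00:00 to 2018-12-18 09:00:00); this is a small standalone illustration, not part of the original answer:
import numpy as np

# full business days strictly between the two dates: only Monday 2018-12-17
print(np.busday_count('2018-12-15', '2018-12-18'))   # 1  -> hours3 = 24.0
# hours1 = 15.0 (rest of Friday after 09:00), hours2 = 9.0 (Tuesday up to 09:00),
# so date_diff = 15.0 + 24.0 + 9.0 = 48.0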
The first solution was removed for two reasons: it was not accurate and it was slow:
np.random.seed(2019)
dates = pd.date_range('2015-01-01','2018-01-01', freq='H')
df = pd.DataFrame({"UniqueID": np.random.choice(list('ABCDEFGHIJ'), size=100),
"Date": np.random.choice(dates, size=100)})
print (df)
def old(df):
    df["Date"] = pd.to_datetime(df["Date"])
    df = df.sort_values(['UniqueID','Date'])
    df["shifted"] = df.groupby(["UniqueID"])["Date"].shift(-1)

    def f(x):
        a = pd.date_range(x['Date'], x['shifted'], freq='T')
        return ((a.dayofweek < 5).sum() / 60).round()

    mask = df['shifted'].notnull()
    df.loc[mask, 'date_diff'] = df[mask].apply(f, axis=1)
    return df

def new(df):
    df["Date"] = pd.to_datetime(df["Date"])
    df = df.sort_values(['UniqueID','Date'])
    df["shifted"] = df.groupby(["UniqueID"])["Date"].shift(-1)
    df["hours1"] = df["Date"].dt.floor('d')
    df["hours2"] = df["shifted"].dt.floor('d')

    mask = df['shifted'].notnull()
    f = lambda x: np.busday_count(x['hours1'] + pd.Timedelta(1, unit='d'), x['hours2'])
    df.loc[mask, 'hours3'] = df[mask].apply(f, axis=1) * 24

    mask1 = df['hours1'].dt.dayofweek < 5
    hours1 = df['hours1'] + pd.Timedelta(1, unit='d') - df['Date']
    df['hours1'] = np.where(mask1, hours1, np.nan) / np.timedelta64(1, 'h')

    mask1 = df['hours2'].dt.dayofweek < 5
    df['hours2'] = np.where(mask1, df['shifted'] - df['hours2'], np.nan) / np.timedelta64(1, 'h')

    df['date_diff'] = df['hours1'].fillna(0) + df['hours2'] + df['hours3']
    return df
print (new(df))
print (old(df))
In [44]: %timeit (new(df))
22.7 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [45]: %timeit (old(df))
1.01 s ± 8.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Related

Fetching Standard Meteorological Week from pandas dataframe date column

I have a pandas dataframe holding long-term data:
point_id issue_date latitude longitude rainfall
0 1.0 2020-01-01 6.5 66.50 NaN
1 2.0 2020-01-02 6.5 66.75 NaN
... ... ... ... ... ...
6373888 17414.0 2020-12-30 38.5 99.75 NaN
6373889 17415.0 2020-12-31 38.5 100.00 NaN
6373890 rows × 5 columns
I want to extract the Standard Meteorological Week from its issue_date column, as
given in this figure.
I have tried two ways.
1st
lulc_gdf['smw'] = lulc_gdf['issue_date'].astype('datetime64[ns]').dt.strftime('%V')
2nd
lulc_gdf['iso'] = lulc_gdf['issue_date'].astype('datetime64[ns]').dt.isocalendar().week
The output in both cases is the same:
point_id issue_date latitude longitude rainfall smw iso
0 1.0 2020-01-01 6.5 66.50 NaN 01 1
1 2.0 2020-01-02 6.5 66.75 NaN 01 1
... ... ... ... ... ... ... ...
6373888 17414.0 2020-12-30 38.5 99.75 NaN 53 53
6373889 17415.0 2020-12-31 38.5 100.00 NaN 53 53
6373890 rows × 7 columns
The issue is that the week numbering here takes Sunday or Monday as the starting day of the week, irrespective of the year.
For example, in 2020 the 1st of January falls on a Wednesday (not a Monday),
so the 1st week has only 5 days (Wed, Thu, Fri, Sat and Sun).
year week day issue_date
0 2020 1 3 2020-01-01
1 2020 1 4 2020-01-02
2 2020 1 5 2020-01-03
3 2020 1 6 2020-01-04
... ... ... ...
6373889 2020 53 4 2020-12-31
But in the case of Standard Meteorological Weeks, I want the output, for every year, to be:
1st week - always from 1st January to 7th January
2nd week - from 8th January to 14th January
3rd week - from 15th January to 21st January
------------------------------- and so on,
irrespective of the starting day of the year (Sunday, Monday etc.).
How can I do this?
Use:
df = pd.DataFrame({'issue_date': pd.date_range('2000-01-01','2000-12-31')})

# inspired by https://stackoverflow.com/a/61592907/2901002
normal_year = np.append(np.arange(363) // 7 + 1, np.repeat(52, 5))
leap_year = np.concatenate((normal_year[:59], [9], normal_year[59:366]))
days = df['issue_date'].dt.dayofyear

df['smw'] = np.where(df['issue_date'].dt.is_leap_year,
                     leap_year[days - 1],
                     normal_year[days - 1])
print (df[df['smw'] == 9])
issue_date smw
56 2000-02-26 9
57 2000-02-27 9
58 2000-02-28 9
59 2000-02-29 9
60 2000-03-01 9
61 2000-03-02 9
62 2000-03-03 9
63 2000-03-04 9
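To illustrate what the two lookup arrays encode (my own reading of them, shown with a few spot checks): normal_year maps day-of-year to week number and folds the last days of December into week 52, while leap_year inserts one extra day into week 9 so that 29 February does not shift the later weeks:
import numpy as np

normal_year = np.append(np.arange(363) // 7 + 1, np.repeat(52, 5))
leap_year = np.concatenate((normal_year[:59], [9], normal_year[59:366]))

print(normal_year[[0, 6, 7]])   # [1 1 2] -> 1-7 Jan is week 1, 8 Jan starts week 2
print(leap_year[59])            # 9       -> 29 Feb (day 60) is folded into week 9
print(leap_year[64])            # 10      -> 5 Mar (day 65) starts week 10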
Performance:
#11323 rows
df = pd.DataFrame({'issue_date': pd.date_range('2000-01-01','2030-12-31')})
In [6]: %%timeit
...: normal_year = np.append(np.arange(363) // 7 + 1, np.repeat(52, 5))
...: leap_year = np.concatenate((normal_year[:59], [9], normal_year[59:366]))
...: days = df['issue_date'].dt.dayofyear
...:
...: df['smw'] = np.where(df['issue_date'].dt.is_leap_year, leap_year[days - 1], normal_year[days - 1])
...:
3.51 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [7]: %%timeit
...: df['smw1'] = get_smw(df['issue_date'])
...:
17.2 ms ± 312 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#51500 rows
df = pd.DataFrame({'issue_date': pd.date_range('1900-01-01','2040-12-31')})
In [9]: %%timeit
...: normal_year = np.append(np.arange(363) // 7 + 1, np.repeat(52, 5))
...: leap_year = np.concatenate((normal_year[:59], [9], normal_year[59:366]))
...: days = df['issue_date'].dt.dayofyear
...:
...: df['smw'] = np.where(df['issue_date'].dt.is_leap_year, leap_year[days - 1], normal_year[days - 1])
...:
...:
11.9 ms ± 1.47 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [10]: %%timeit
...: df['smw1'] = get_smw(df['issue_date'])
...:
...:
64.3 ms ± 483 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
You can write a custom function to calculate Standard Meteorological Weeks.
The normal calculation takes the difference in days from 1st January of the same year, divides by 7 and adds 1.
A special adjustment gives Week No. 9 eight days in a leap year, and another gives the last week of the year eight days:
import numpy as np

# convert to datetime format if not already in datetime
df['issue_date'] = pd.to_datetime(df['issue_date'])

def get_smw(date_s):
    # get day-of-the-year minus 1, in range [0..364/365], for division by 7
    days_diff = date_s.dt.dayofyear - 1
    # adjust for leap years so that Week No. 9 has 8 days
    # (subtract one day for 29 Feb onwards in the same year)
    leap_adj = date_s.dt.is_leap_year & (date_s > pd.to_datetime(date_s.dt.year.astype(str) + '-02-28'))
    days_diff = np.where(leap_adj, days_diff - 1, days_diff)
    # adjust for the last week of the year to have 8 days:
    # make the value for 31 Dec 363 instead of 364 to keep it in the same week as 24 Dec
    days_diff = np.clip(days_diff, 0, 363)
    smw = days_diff // 7 + 1
    return smw

df['smw'] = get_smw(df['issue_date'])
Result:
print(df)
point_id issue_date latitude longitude rainfall smw
0 1.0 2020-01-01 6.5 66.50 NaN 1
1 2.0 2020-01-02 6.5 66.75 NaN 1
2 3.0 2020-01-03 6.5 66.75 NaN 1
3 4.0 2020-01-04 6.5 66.75 NaN 1
4 5.0 2020-01-05 6.5 66.75 NaN 1
5 6.0 2020-01-06 6.5 66.75 NaN 1
6 7.0 2020-01-07 6.5 66.75 NaN 1
7 8.0 2020-01-08 6.5 66.75 NaN 2
8 9.0 2020-01-09 6.5 66.75 NaN 2
40 40.0 2020-02-26 6.5 66.75 NaN 9
41 41.0 2020-03-04 6.5 66.75 NaN 9
42 42.0 2020-03-05 6.5 66.75 NaN 10
43 43.0 2020-03-12 6.5 66.75 NaN 11
6373880 17414.0 2020-12-23 38.5 99.75 NaN 51
6373881 17414.0 2020-12-24 38.5 99.75 NaN 52
6373888 17414.0 2020-12-30 38.5 99.75 NaN 52
6373889 17415.0 2020-12-31 38.5 100.00 NaN 52
7000040 40.0 2021-02-26 6.5 66.75 NaN 9
7000041 41.0 2021-03-04 6.5 66.75 NaN 9
7000042 42.0 2021-03-05 6.5 66.75 NaN 10
7000042 43.0 2021-03-12 6.5 66.75 NaN 11
7373880 17414.0 2021-12-23 38.5 99.75 NaN 51
7373881 17414.0 2021-12-24 38.5 99.75 NaN 52
7373888 17414.0 2021-12-30 38.5 99.75 NaN 52
7373889 17415.0 2021-12-31 38.5 100.00 NaN 52

Is there a faster way to read the following pandas dataframe?

I have a huge .csv file (2.3 GB) which I have to read into a pandas dataframe.
start_date,wind_90.0_0.0,wind_90.0_5.0,wind_87.5_2.5
1948-01-01,15030.64,15040.64,16526.35
1948-01-02,15050.14,15049.28,16526.28
1948-01-03,15076.71,15075.0,16525.28
I want to process the above data into the structure below:
start_date lat lon wind
0 1948-01-01 90.0 0.0 15030.64
1 1948-01-01 90.0 5.0 15040.64
2 1948-01-01 87.5 2.5 16526.35
3 1948-01-02 90.0 0.0 15050.14
4 1948-01-02 90.0 5.0 15049.28
5 1948-01-02 87.5 2.5 16526.28
6 1948-01-03 90.0 0.0 15076.71
7 1948-01-03 90.0 5.0 15075.0
8 1948-01-03 87.5 2.5 16525.28
Here is the code I have so far; it does what I want but it is too slow and uses a lot of memory:
def load_data_as_pandas(fileName, featureName):
    df = pd.read_csv(fileName)
    df = pd.melt(df, id_vars=df.columns[0])
    df['lat'] = df['variable'].str.split('_').str[-2]
    df['lon'] = df['variable'].str.split('_').str[-1]
    df = df.drop('variable', axis=1)
    df.columns = ['start_date', featureName, 'lat', 'lon']
    df = df.groupby(['start_date','lat','lon']).first()
    df = df.reset_index()
    df['start_date'] = pd.to_datetime(df['start_date'], format='%Y-%m-%d', errors='coerce')
    return df
This should speed up your code.
We can use melt to unpivot your data from wide to long. Then we use str.split with expand=True on the former column names (the variable column) to get a new column for each split part. Finally, we join these newly created columns back to our original dataframe:
melt = df.melt(id_vars='start_date').sort_values('start_date').reset_index(drop=True)
newcols = melt['variable'].str.split('_', expand=True).iloc[:, 1:].rename(columns={1:'lat', 2:'lon'})
final = melt.drop(columns='variable').join(newcols)
Output
start_date value lat lon
0 1948-01-01 15030.64 90.0 0.0
1 1948-01-01 15040.64 90.0 5.0
2 1948-01-01 16526.35 87.5 2.5
3 1948-01-02 15050.14 90.0 0.0
4 1948-01-02 15049.28 90.0 5.0
5 1948-01-02 16526.28 87.5 2.5
6 1948-01-03 15076.71 90.0 0.0
7 1948-01-03 15075.00 90.0 5.0
8 1948-01-03 16525.28 87.5 2.5
Timeit test on 800k rows:
3.55 s ± 347 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
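Folded back into the original helper, the whole thing could look roughly like this (a sketch that assumes every value column is named <feature>_<lat>_<lon>, as in the sample):
import pandas as pd

def load_data_as_pandas(fileName, featureName):
    df = pd.read_csv(fileName, parse_dates=['start_date'])
    # wide -> long, then split the former column names into lat/lon
    melt = df.melt(id_vars='start_date').sort_values('start_date').reset_index(drop=True)
    newcols = (melt['variable'].str.split('_', expand=True)
                               .iloc[:, 1:]
                               .rename(columns={1: 'lat', 2: 'lon'})
                               .astype(float))
    out = melt.drop(columns='variable').join(newcols)
    return out.rename(columns={'value': featureName})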

Localize each row of pandas timestamp index separately

I have a timeseries consisting of a list of dicts as follows:
import pandas as pd

data = []
for i in range(10):
    d = {
        'ts': i,
        'ts_offset': 6 * 60 * 60,
        'value': 1234.0
    }
    if i >= 5:
        d['ts_offset'] = 12 * 60 * 60
    data.append(d)

frame = pd.DataFrame(data)
frame.index = pd.to_datetime(frame.ts, unit='s')
ts ts_offset value
ts
1970-01-01 00:00:00 0 21600 1234.0
1970-01-01 00:00:01 1 21600 1234.0
1970-01-01 00:00:02 2 21600 1234.0
1970-01-01 00:00:03 3 21600 1234.0
1970-01-01 00:00:04 4 21600 1234.0
1970-01-01 00:00:05 5 43200 1234.0
1970-01-01 00:00:06 6 43200 1234.0
1970-01-01 00:00:07 7 43200 1234.0
1970-01-01 00:00:08 8 43200 1234.0
1970-01-01 00:00:09 9 43200 1234.0
The index is a timestamp plus a localization-dependent offset (in seconds). As you can see, my use case is that the offset may change at any point during the timeseries. I would like to convert this construct to a series whose index is a localized DatetimeIndex, but so far I was only able to find localization functions that work on the entire index.
Is anybody aware of an efficient method to convert each index entry with a (possibly) separate timezone? The series can consist of up to a few thousand rows and this function would be called a lot, so I would like to vectorize as much as possible.
Edit:
I took the liberty of timing FLab's grouping solution against a simple Python loop with the following script:
import pandas as pd
import numpy as np
import datetime

def to_series1(data, metric):
    idx = []
    values = []
    for i in data:
        tz = datetime.timezone(datetime.timedelta(seconds=i["ts_offset"]))
        idx.append(pd.Timestamp(i["ts"] * 10**9, tzinfo=tz))
        values.append(np.float(i["value"]))
    series = pd.Series(values, index=idx, name=metric)
    return series

def to_series2(data, metric):
    frame = pd.DataFrame(data)
    frame.index = pd.to_datetime(frame.ts, unit='s', utc=True)
    grouped = frame.groupby('ts_offset')
    out = {}
    for name, group in grouped:
        out[name] = group
        tz = datetime.timezone(datetime.timedelta(seconds=name))
        out[name].index = out[name].index.tz_convert(tz)
    out = pd.concat(out, axis=0).sort_index(level='ts')
    out.index = out.index.get_level_values('ts')
    series = out.value
    series.name = metric
    series.index.name = None
    return series

metric = 'bla'
data = []
for i in range(100000):
    d = {
        'ts': i,
        'ts_offset': 6 * 60 * 60,
        'value': 1234.0
    }
    if i >= 50000:
        d['ts_offset'] = 12 * 60 * 60
    data.append(d)

%timeit to_series1(data, metric)
%timeit to_series2(data, metric)
The results were as follows:
2.59 s ± 113 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.03 s ± 125 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So I'm still open to suggestions that are possibly faster.
You can group by ts_offset, so that you can apply a single offset to each sub-dataframe (a vectorised operation):
grouped = frame.groupby('ts_offset')
out = {}
for name, group in grouped:
    print(name)
    out[name] = group
    out[name].index = out[name].index + pd.DateOffset(seconds=name)

out = pd.concat(out, axis=0, names=['offset', 'ts']).sort_index(level='ts')
Showing the applied offset just to verify the results, you have:
Out[17]:
ts ts_offset value
ts
21600 1970-01-01 06:00:00 0 21600 1234.0
1970-01-01 06:00:01 1 21600 1234.0
1970-01-01 06:00:02 2 21600 1234.0
1970-01-01 06:00:03 3 21600 1234.0
1970-01-01 06:00:04 4 21600 1234.0
43200 1970-01-01 12:00:05 5 43200 1234.0
1970-01-01 12:00:06 6 43200 1234.0
1970-01-01 12:00:07 7 43200 1234.0
1970-01-01 12:00:08 8 43200 1234.0
1970-01-01 12:00:09 9 43200 1234.0
Finally, you can remove the first index:
out.index = out.index.get_level_values('ts')

How to get date after subtracting days in pandas

I have a dataframe:
In [15]: df
Out[15]:
date day
0 2015-10-10 23
1 2015-12-19 9
2 2016-03-05 34
3 2016-09-17 23
4 2016-04-30 2
I want to subtract the number of days from the date and create a new column.
In [16]: df.dtypes
Out[16]:
date datetime64[ns]
day int64
Desired output something like:
In [15]: df
Out[15]:
date day date1
0 2015-10-10 23 2015-09-17
1 2015-12-19 9 2015-12-10
2 2016-03-05 34 2016-01-29
3 2016-09-17 23 2016-08-25
4 2016-04-30 2 2016-04-28
I tried but this does not work:
df['date1']=df['date']+pd.Timedelta(df['date'].dt.day-df['day'])
It throws this error:
TypeError: unsupported type for timedelta days component: Series
You can use to_timedelta:
df['date1'] = df['date'] - pd.to_timedelta(df['day'], unit='d')
print (df)
date day date1
0 2015-10-10 23 2015-09-17
1 2015-12-19 9 2015-12-10
2 2016-03-05 34 2016-01-31
3 2016-09-17 23 2016-08-25
4 2016-04-30 2 2016-04-28
If you need Timedelta, use apply, but it is slower:
df['date1'] = df['date'] - df.day.apply(lambda x: pd.Timedelta(x, unit='D'))
print (df)
date day date1
0 2015-10-10 23 2015-09-17
1 2015-12-19 9 2015-12-10
2 2016-03-05 34 2016-01-31
3 2016-09-17 23 2016-08-25
4 2016-04-30 2 2016-04-28
Timings:
#[5000 rows x 2 columns]
df = pd.concat([df]*1000).reset_index(drop=True)
In [252]: %timeit df['date'] - df.day.apply(lambda x: pd.Timedelta(x, unit='D'))
10 loops, best of 3: 45.3 ms per loop
In [253]: %timeit df['date'] - pd.to_timedelta(df['day'], unit='d')
1000 loops, best of 3: 1.71 ms per loop
import dateutil.relativedelta

def calculate_diff(v):
    # days= gives a relative shift; day= would set the absolute day of the month
    return v['date'] - dateutil.relativedelta.relativedelta(days=v['day'])

df['date1'] = df.apply(calculate_diff, axis=1)
given that v['date'] is a datetime object

Pandas resample bug?

Trying to downsample 8 weekly time points to 2 points, each representing the average over 4 weeks, I use resample(). I started by defining the rule as (60*60*24*7*4) seconds and ended up with 3 time points, the last one being a dummy. While checking this, I noticed that if I define the rule as 4W or 28D it is fine, but going down to 672H or smaller units (minutes, seconds, ...) the extra fake column appears. This is the testing code:
import numpy as np
import pandas as pd

d = np.arange(16).reshape(2, 8)
res = []
for month in range(1, 13):
    start_date = str(month) + '/1/2014'
    df = pd.DataFrame(data=d, index=['A', 'B'],
                      columns=pd.date_range(start_date, periods=8, freq='7D'))
    print(df, '\n')

    dfw = df.resample(rule='4W', how='mean', axis=1, closed='left', label='left')
    print('4 Weeks:\n', dfw, '\n')

    dfd = df.resample(rule='28D', how='mean', axis=1, closed='left', label='left')
    print('28 Days:\n', dfd, '\n')

    dfh = df.resample(rule='672H', how='mean', axis=1, closed='left', label='left')
    print('672 Hours:\n', dfh, '\n')

    dfm = df.resample(rule='40320T', how='mean', axis=1, closed='left', label='left')
    print('40320 Minutes:\n', dfm, '\n')

    dfs = df.resample(rule='2419200S', how='mean', axis=1, closed='left', label='left')
    print('2419200 Seconds:\n', dfs, '\n')

    res.append(([start_date], dfh.shape[1] == dfd.shape[1]))
    print('\n\n--------------------------\n\n')

[print(res[i]) for i in range(12)]
pass
Its output (I pasted here only the printout of the last iteration) is:
2014-11-01 2014-11-29 2014-12-27
A 1.5 5.5 NaN
B 9.5 13.5 NaN
2014-12-01 2014-12-08 2014-12-15 2014-12-22 2014-12-29 2015-01-05 \
A 0 1 2 3 4 5
B 8 9 10 11 12 13
2015-01-12 2015-01-19
A 6 7
B 14 15
4 Weeks:
2014-11-30 2014-12-28
A 1.5 5.5
B 9.5 13.5
28 Days:
2014-12-01 2014-12-29
A 1.5 5.5
B 9.5 13.5
672 Hours:
2014-12-01 2014-12-29 2015-01-26
A 1.5 5.5 NaN
B 9.5 13.5 NaN
40320 Minutes:
2014-12-01 2014-12-29 2015-01-26
A 1.5 5.5 NaN
B 9.5 13.5 NaN
2419200 Seconds:
2014-12-01 2014-12-29 2015-01-26
A 1.5 5.5 NaN
B 9.5 13.5 NaN
--------------------------
(['1/1/2014'], False)
(['2/1/2014'], True)
(['3/1/2014'], True)
(['4/1/2014'], True)
(['5/1/2014'], False)
(['6/1/2014'], False)
(['7/1/2014'], False)
(['8/1/2014'], False)
(['9/1/2014'], False)
(['10/1/2014'], False)
(['11/1/2014'], False)
(['12/1/2014'], False)
So there is an error for date ranges starting at the beginning of 9 of the months, and no error for 3 of them (February-April). Either I am missing something, or it's a bug. Which is it?
Thanks @DSM and @Andy; indeed I had pandas 0.15.1, and upgrading to the latest 0.15.2 solved it.
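For reference, the how= keyword was removed from resample in later pandas versions, and axis=1 resampling was eventually deprecated as well; on a recent pandas the 672H call above would be spelled roughly like this (a sketch, not tested against the 0.15.x behaviour discussed here):
# resample along the columns by transposing, averaging, and transposing back
dfh = df.T.resample('672H', closed='left', label='left').mean().T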
