Pandas dataframe timedelta is giving exceptions - python

I am trying to get the first day of the next month from the billDate column in a DataFrame.
I first tried the logic on a single value:
import pandas as pd
import datetime
from datetime import timedelta
dt = pd.to_datetime('15/4/2019', errors='coerce')
print(dt)
print((dt.replace(day=1) + datetime.timedelta(days=32)).replace(day=1))
This works as expected, and the output is:
2019-04-15 00:00:00
2019-05-01 00:00:00
Now I am applying the same logic to my DataFrame with the code below:
df[comNewColName] = (pd.to_datetime(df['billDate'], errors='coerce').replace(day=1) + datetime.timedelta(days=32)).replace(day=1)
But I get an error like this:
---> 69 df[comNewColName] = (pd.to_datetime(df['billDate'], errors='coerce').replace(day=1) + datetime.timedelta(days=32)).replace(day=1)
70 '''print(df[['billDate']])'''
71 '''df = df.assign(Product=lambda x: (x['Field_1'] * x['Field_2'] * x['Field_3']))'''
TypeError: replace() got an unexpected keyword argument 'day'

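The error occurs because pd.to_datetime returns a Series here, and Series.replace substitutes values rather than datetime fields, so it has no day argument. You can use Series.dt.to_period to convert to monthly periods, add 1 for the next month, and then convert back to datetimes with Series.dt.to_timestamp: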
print (df)
billDate
0 15/4/2019
1 30/4/2019
2 15/8/2019
df['billDate'] = (pd.to_datetime(df['billDate'], errors='coerce', dayfirst=True)
                    .dt.to_period('M')
                    .add(1)
                    .dt.to_timestamp())
print (df)
billDate
0 2019-05-01
1 2019-05-01
2 2019-09-01
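As a possible alternative to the period round trip, a date offset may also do the job. The sketch below assumes billDate holds day-first strings with no time component; the sample rows and the variable names via_period and via_offset are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in day/month/year order
df = pd.DataFrame({"billDate": ["15/4/2019", "30/4/2019", "1/8/2019", "31/12/2019"]})
bill = pd.to_datetime(df["billDate"], format="%d/%m/%Y", errors="coerce")

# Route 1: month period + 1, then back to the timestamp at the period start
via_period = (bill.dt.to_period("M") + 1).dt.to_timestamp()

# Route 2: MonthBegin(1) rolls forward to the next first-of-month,
# also when the date already falls on the 1st
via_offset = bill + pd.offsets.MonthBegin(1)

print(via_offset.dt.strftime("%Y-%m-%d").tolist())
```

Note that the offset route keeps any time-of-day component, while the period route drops it, so the two only agree for midnight timestamps.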

Related

Convert mins:secs:milliseconds to proper time in python

I have an excel with such format:
I'm trying to get the 12.07.41am in python (I need to do all edits in python)
from datetime import timedelta, date, datetime
df['start'] = pd.to_datetime(df['start'])
df['start'] = pd.to_datetime(df['start'], format = '%y/%m/%d #H:%M:%S')
but it gives me the error: hour must be in 0..23
Given a df of the form:
import pandas as pd
tin = [['07:04.2', '08:12.6'], ['12:14.2', "13:12.8"], ['07:24.0', '07:36.6'], ['09:14.2', "10:12.8"]]
df=pd.DataFrame(data=tin, columns=["start", 'end'])
which creates the df:
start end
0 07:04.2 08:12.6
1 12:14.2 13:12.8
2 07:24.0 07:36.6
3 09:14.2 10:12.8
You can convert the time data in start and end columns to datetime objects using:
df['start'] = [pd.to_datetime(t[0:t.index('.')] + f":{int(float(t[t.index('.'):])*60)}", format='%H:%M:%S') for t in df['start'].to_list()]
df['end'] = [pd.to_datetime(t[0:t.index('.')] + f":{int(float(t[t.index('.'):])*60)}", format='%H:%M:%S') for t in df['end'].to_list()]
Producing an updated df:
start end
0 1900-01-01 07:04:12 1900-01-01 08:12:36
1 1900-01-01 12:14:12 1900-01-01 13:12:48
2 1900-01-01 07:24:00 1900-01-01 07:36:36
3 1900-01-01 09:14:12 1900-01-01 10:12:48
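The list comprehensions above can be hard to read, so here is a minimal sketch of the same idea pulled into a helper. It keeps the answer's interpretation (the digits after the dot are a fraction of a minute) and uses round instead of int to avoid truncation surprises. The function name fraction_to_seconds is hypothetical.

```python
import pandas as pd

def fraction_to_seconds(value: str) -> str:
    """Turn 'HH:MM.f' into 'HH:MM:SS', reading .f as a fraction of a minute."""
    clock, fraction = value.split(".")
    # Cap at 59 so a fraction such as .999 cannot round up to an invalid 60
    seconds = min(round(float("0." + fraction) * 60), 59)
    return f"{clock}:{seconds:02d}"

df = pd.DataFrame({"start": ["07:04.2", "12:14.2"], "end": ["08:12.6", "13:12.8"]})
for col in ["start", "end"]:
    df[col] = pd.to_datetime(df[col].map(fraction_to_seconds), format="%H:%M:%S")

print(df)
```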

Python: Create new column that counts the days between current date and a lag date

I want to create a function that counts the days, as an integer, between a date and the date shifted back a number of periods (e.g. df['new_col'] = df['date'].shift(#periods) - df['date']). The date variable is datetime64[D].
As an example: df['report_date'].shift(39) = '2008-09-26' and df['report_date'] = '2008-08-18' and df['delta'] = 39.
import numpy as np
import pandas as pd
dates =pd.Series(np.tile(['2012-08-01','2012-08-15','2012-09-01','2012-08-15'],4)).astype('datetime64[D]')
dates2 =pd.Series(np.tile(['2012-08-01','2012-09-01','2012-10-01','2012-11-01'],4)).astype('datetime64[D]')
stocks = ['A','A','A','A','G','G','G','G','B','B','B','B','F','F','F','F']
stocks = pd.Series(stocks)
df = pd.DataFrame(dict(stocks = stocks, dates = dates,report_date = dates2)).reset_index()
df.head()
print('df info:',df.info())
The code below is my latest attempt to create this variable, but the code produces incorrect results.
df['delta'] = df.groupby(['stocks','dates'])['report_date'].transform(lambda x: (x.shift(1).rsub(x).dt.days))
I came up with the solution of using a for loop and zip function, to simply subtract each pair like so...
from datetime import datetime
import pandas as pd
dates = ['2012-08-01', '2012-08-15', '2012-09-01', '2012-08-15']
dates2 = ['2012-08-01', '2012-09-01', '2012-10-01', '2012-11-01']
diff = []
for i, x in zip(dates, dates2):
    i = datetime.strptime(i, '%Y-%m-%d')
    x = datetime.strptime(x, '%Y-%m-%d')
    diff.append(i - x)
df = {'--col1--': dates, '--col2--': dates2, '--difference--': diff}
df = pd.DataFrame(df)
print(df)
Output:
--col1-- --col2-- --difference--
0 2012-08-01 2012-08-01 0 days
1 2012-08-15 2012-09-01 -17 days
2 2012-09-01 2012-10-01 -30 days
3 2012-08-15 2012-11-01 -78 days
I hope that solves your problem.
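If the dates are already datetime columns, the loop may not be needed. The following is a rough sketch, assuming the lag should be taken within each stock and measured on report_date; the value of periods and the trimmed sample data are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "stocks": ["A", "A", "A", "G", "G", "G"],
    "report_date": pd.to_datetime(
        ["2012-08-01", "2012-09-01", "2012-10-01",
         "2012-08-01", "2012-09-01", "2012-10-01"]
    ),
})

periods = 1  # hypothetical lag length

# Shift within each stock so one stock's dates never leak into another
lagged = df.groupby("stocks")["report_date"].shift(periods)

# Subtracting datetimes gives timedeltas; .dt.days extracts whole days
df["delta"] = (df["report_date"] - lagged).dt.days

print(df)
```

Rows without a lagged value come out as NaN, so the column is float unless it is filled or cast afterwards.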

Cannot find index of corresponding date in pandas DataFrame

I have the following DataFrame with a Date column,
0 2021-12-13
1 2021-12-10
2 2021-12-09
3 2021-12-08
4 2021-12-07
...
7990 1990-01-08
7991 1990-01-05
7992 1990-01-04
7993 1990-01-03
7994 1990-01-02
I am trying to find the index for a specific date in this DataFrame using the following code,
# import raw data into DataFrame
df = pd.DataFrame.from_records(data['dataset']['data'])
df.columns = data['dataset']['column_names']
df['Date'] = pd.to_datetime(df['Date'])
# sample date to search for
sample_date = dt.date(2021,12,13)
print(sample_date)
# return index of sample date
date_index = df.index[df['Date'] == sample_date].tolist()
print(date_index)
The output of the program is,
2021-12-13
[]
I can't understand why. I have cast the Date column in the DataFrame to a DateTime and I'm doing a like-for-like comparison.
I reproduced your DataFrame with a minimal sample. The Date column is datetime64, while sample_date is a plain datetime.date, and pandas does not treat the two as equal. Comparing against a datetime object instead works, as below.
import pandas as pd
import datetime as dt
df = pd.DataFrame({'Date':['2021-12-13','2021-12-10','2021-12-09','2021-12-08']})
df['Date'] = pd.to_datetime(df['Date'].astype(str), format='%Y-%m-%d')
sample_date = dt.datetime.strptime('2021-12-13', '%Y-%m-%d')
date_index = df.index[df['Date'] == sample_date].tolist()
print(date_index)
output:
[0]
The date being searched for is at index 0 of the DataFrame.
Please let me know if this has any issues.
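If the value to look up already exists as a datetime.date, wrapping it in pd.Timestamp is one way to keep the comparison working. This is a minimal sketch on made-up sample data; normalize is only needed if the column may carry a time-of-day component.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["2021-12-13", "2021-12-10", "2021-12-09"])})
sample_date = dt.date(2021, 12, 13)

# Convert the plain date to a Timestamp so both sides are datetime-like
target = pd.Timestamp(sample_date)
date_index = df.index[df["Date"] == target].tolist()

# Variant that ignores any time-of-day part in the column
date_index_normalized = df.index[df["Date"].dt.normalize() == target].tolist()

print(date_index)
```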

Datetime difference between 2 columns with datetime/str - Python

I have a dataset - below
Create Complete
0 2005-01-02 01:15:00 2005-01-05 14:05:00
1 2005-01-06 00:00:00 open
I want to get the difference in minutes between the two using the below code. However as the 'complete' column also contains a string value, how can I get pandas to ign
df['diff_mins'] = df.Create - df.Complete
You can use pd.to_datetime, for example:
import pandas as pd
df = pd.DataFrame([
    ['2005-01-02 01:15:00', '2005-01-05 14:05:00'],
    ['2005-01-06 00:00:00', 'open']],
    columns=('Create', 'Complete')
)
and then (errors='coerce' turns the non-date value 'open' into NaT):
df['diff_mins'] = (
    pd.to_datetime(df.Create) - pd.to_datetime(df.Complete, errors='coerce')
)
To get the value in hours, apply a simple lambda function, lambda x: x.total_seconds() / 60 / 60:
df['diff_mins_hours'] = (
    pd.to_datetime(df.Create) - pd.to_datetime(df.Complete, errors='coerce')
).apply(lambda x: x.total_seconds() / 60 / 60)
This gives:
print(df)
Create Complete diff_mins diff_mins_hours
0 2005-01-02 01:15:00 2005-01-05 14:05:00 -4 days +11:10:00 -84.833333
1 2005-01-06 00:00:00 open NaT NaN
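Since the question asks for minutes, a vectorized variant may be closer to what is wanted. The sketch below assumes a positive result is preferred (Complete minus Create) and that rows still marked 'open' should stay empty.

```python
import pandas as pd

df = pd.DataFrame({
    "Create": ["2005-01-02 01:15:00", "2005-01-06 00:00:00"],
    "Complete": ["2005-01-05 14:05:00", "open"],
})

create = pd.to_datetime(df["Create"])
# 'open' cannot be parsed, so errors='coerce' turns it into NaT
complete = pd.to_datetime(df["Complete"], errors="coerce")

# .dt.total_seconds() works on the whole column without apply
df["diff_mins"] = (complete - create).dt.total_seconds() / 60

print(df)
```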
I tried to do it using map. It should look something like this:
import datetime
def get_diff_mins(elem_a, elem_b):
    # Treat an open entry as if it were completed now
    if elem_b == 'open':
        elem_b = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    a = elem_a.replace(' ', '-').replace(':', '-').split('-')
    b = elem_b.replace(' ', '-').replace(':', '-').split('-')
    # Roughly converts yearly time to mins
    # since month is always considered 30 days
    f = [60*24*30*12, 60*24*30, 60*24, 60, 1, 0]
    mins_a = sum(int(part) * factor for part, factor in zip(a, f))
    mins_b = sum(int(part) * factor for part, factor in zip(b, f))
    return mins_a - mins_b
# map returns an iterator in Python 3, so build a list before assigning
df['diff_mins'] = list(map(get_diff_mins, df.Create, df.Complete))

Convert Dataframe column to time format in python

I have a dataframe column which looks like this :
It reads M:S.MS. How can I convert it into a M:S:MS timeformat so I can plot it as a time series graph?
If I plot it as it is, python throws an Invalid literal for float() error.
Note: This dataframe contains one hour's worth of data, with values between 0:0.0 and 59:59.9.
df = pd.DataFrame({'date':['00:02.0','00:05.0','00:08.1']})
print (df)
date
0 00:02.0
1 00:05.0
2 00:08.1
It is possible to convert to datetimes:
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f')
print (df)
date
0 1900-01-01 00:00:02.000
1 1900-01-01 00:00:05.000
2 1900-01-01 00:00:08.100
Or to timedeltas:
df['date'] = pd.to_timedelta(df['date'].radd('00:'))
print (df)
date
0 00:00:02
1 00:00:05
2 00:00:08.100000
EDIT:
For custom date use:
date = '2015-01-04'
td = pd.to_datetime(date) - pd.to_datetime('1900-01-01')
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f') + td
print (df)
date
0 2015-01-04 00:00:02.000
1 2015-01-04 00:00:05.000
2 2015-01-04 00:00:08.100
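For plotting, a plain numeric axis is sometimes easier to work with than datetimes. Here is a small sketch, assuming the strings follow the M:S.f pattern described in the question; the seconds column name is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"date": ["00:02.0", "00:05.0", "59:59.9"]})

# Prefix an hour field so the strings parse as HH:MM:SS.f timedeltas
elapsed = pd.to_timedelta("00:" + df["date"])

# Elapsed seconds as floats, usable directly as an x axis
df["seconds"] = elapsed.dt.total_seconds()

print(df)
```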
