pandas difference between 2 dates - python

I am trying to find the day difference between today, and dates in my dataframe.
Below is my conversion of dates in my dataframe
df['Date']=pd.to_datetime(df['Date'])
Below is my code to get today
today1=dt.datetime.today().strftime('%Y-%m-%d')
today1=pd.to_datetime(today1)
Both are converted with pd.to_datetime, but when I subtract them, I get the error below.
ValueError: Cannot add integral value to Timestamp without offset.
Can someone help to advise? Thanks!

Here is a simple example of how you can do this:
import pandas as pd
import datetime as dt
First, get today's date:
today1=dt.datetime.today().strftime('%Y-%m-%d')
today1=pd.to_datetime(today1)
Then construct the data frame (the date string is parsed with pd.to_datetime so the column is a datetime, not a string):
df = pd.DataFrame({'Date': pd.to_datetime('2016-11-24 11:03:10.050000'), 'today1': today1}, index=[0])
In this example I just have 2 columns, each with one value.
Next, you should check the data types:
print(df.dtypes)
Date datetime64[ns]
today1 datetime64[ns]
If both data types are datetime64[ns], you can then subtract df.Date from df.today1.
print(df.today1 - df.Date)
The output:
0 19 days 12:56:49.950000
dtype: timedelta64[ns]
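If only whole days are needed, the subtraction above can be reduced to an integer column. This is a minimal sketch, not the original poster's code: the sample dates and the column name days_diff are made up for illustration, and a fixed reference date stands in for today so the result is reproducible.

```python
import pandas as pd

# Hypothetical sample data; the real dataframe would already have a 'Date' column.
df = pd.DataFrame({"Date": ["2016-11-24", "2016-12-01"]})
df["Date"] = pd.to_datetime(df["Date"])

# Fixed reference date for a reproducible example;
# use pd.Timestamp.today().normalize() for the real current date.
today = pd.Timestamp("2016-12-14")

# Timestamp minus a datetime column gives a timedelta column;
# .dt.days extracts the whole-day component as integers.
df["days_diff"] = (today - df["Date"]).dt.days
print(df)
```

Calling normalize() on today's timestamp sets the time to midnight, so the difference is not skewed by the current time of day.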

Related

Create date from one year with string and int error - PYTHON

I have the following problem. I want to create a date from another. To do this, I extract the year from the database date and then create the chosen date (day = 30 and month = 9) being the year extracted from the database.
The code is the following
bbdd20Q3['year']=(pd.DatetimeIndex(bbdd20Q3['datedaymonthyear']).year)
y=(bbdd20Q3['year'])
m=int(9)
d=int(30)
bbdd20Q3['mydate']=dt.datetime(y,m,d)
But error message is this
"cannot convert the series to <class 'int'>"
I think dt means datetime, so the line 'dt.datetime(y,m,d)' creates a datetime object.
Should bbdd20Q3['mydate'] hold an int?
If so, try another way to store the date (an 8-digit number, maybe).
Hope this helps :)
I assume that you did import datetime as dt then by doing:
bbdd20Q3['year']=(pd.DatetimeIndex(bbdd20Q3['datedaymonthyear']).year)
y=(bbdd20Q3['year'])
m=int(9)
d=int(30)
bbdd20Q3['mydate']=dt.datetime(y,m,d)
You are passing a Series as the first argument to datetime.datetime, which expects an int or something that can be converted to an int. You should create one datetime.datetime for each element of the Series, not a single datetime.datetime. Consider the following example:
import datetime
import pandas as pd
df = pd.DataFrame({"year":[2001,2002,2003]})
df["day"] = df["year"].apply(lambda x:datetime.datetime(x,9,30))
print(df)
Output:
year day
0 2001 2001-09-30
1 2002 2002-09-30
2 2003 2003-09-30
Here's a sample code with the required logic -
import pandas as pd
df = pd.DataFrame.from_dict({'date': ['2019-12-14', '2020-12-15']})
print(df.dtypes)
# convert the date in string format to datetime object,
# if the date column(Series) is already a datetime object then this is not required
df['date'] = pd.to_datetime(df['date'])
print(f'after conversion \n {df.dtypes}')
# logic to create a new data column
df['new_date'] = pd.to_datetime({'year':df['date'].dt.year,'month':9,'day':30})
@eollon I see that you are also new to Stack Overflow. It would be better if you could add a simple code sample that others can try out independently.
(keeping the comment here since I don't have permission to comment :) )

pandas Groupby MonthStart with two business days offset

I have a DataFrame that is indexed by date and has daily data.
As described, I wish to group and aggregate this data by calendar month start minus 2 business days. My idea is to use groupby and MonthBegin with a 2-day BDay offset to do this.
When I try to run the code
import pandas as pd
import pandas.tseries.offsets as of
days = of.MonthBegin() - of.BDay(2)
g = df.groupby(pd.Grouper(freq=days, level='Date')).sum()
I get an error
TypeError: Argument 'other' has incorrect type (expected
datetime.datetime, got BusinessDay)
Perhaps I need to use the rollback method on MonthBegin but when I try
days = of.MonthBegin()
days.rollback(of.BDay(2))
g_df = df.groupby(pd.Grouper(freq=days, level='Date')).sum()
TypeError: Cannot convert input [<2 * BusinessDays>] of type to Timestamp
Does anyone have any ideas how to correctly use the offsets to groupby MonthBegin - 2BDay ?
It is hard to tell what you want to achieve without any of your data, but here is how you could do it:
df = pd.DataFrame({"dates": ["2018-01-02", "2018-01-03", "2018-02-02", "2018-01-04"],
"vals": [10, 20, 10, 5]})
df.groupby((pd.to_datetime(df.dates) - of.MonthBegin() - of.BDay(2)).dt.month).vals.sum()
Output:
dates
1 10
12 35
Name: vals, dtype: int64
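Grouping by the month number alone merges the same month from different years. A possible variation, sketched below with the same sample data, uses a year-month label as the key instead. The names shifted, key, and result are illustrative only, and subtracting BDay from a Series may emit a PerformanceWarning because that offset is applied element by element.

```python
import pandas as pd
import pandas.tseries.offsets as of

df = pd.DataFrame({"dates": ["2018-01-02", "2018-01-03", "2018-02-02", "2018-01-04"],
                   "vals": [10, 20, 10, 5]})

# Roll each date back to the previous month start, then back two more business days.
shifted = pd.to_datetime(df["dates"]) - of.MonthBegin() - of.BDay(2)

# Use a year-month label as the key so that, for example, December 2017
# and December 2018 would end up in different groups.
key = shifted.dt.strftime("%Y-%m")
result = df.groupby(key)["vals"].sum()
print(result)
```

Note that a date which already falls on the first of a month is rolled back to the start of the previous month by the MonthBegin subtraction, so check that this matches the grouping you intend.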

I have a date column in a dataframe. I want to change the format of the dates,in that column

I have a date column in a dataset where the dates are like 'Apr-12','Jan-12' format. I would like to change the format to 04-2012,01-2012. I am looking for a function which can do this.
I think I know one guy with the same name. Jokes apart here is the solution to your problem.
There is a built-in function named strptime() that parses a string into a datetime object according to a format describing the input; strftime() then renders that object in the format you want.
You need to import datetime first, since it is part of Python's standard datetime module. There is no need to install anything, just import it.
It works like this: datetime.strptime(your_string, input_format).strftime(output_format)
# You can also do this: from datetime import * (this imports everything from datetime)
from datetime import datetime
date_str = 'Apr-12'
# '%b-%y' matches the abbreviated month name and two-digit year of the input
date_object = datetime.strptime(date_str, '%b-%y')
print(date_object.strftime('%m-%Y'))
I hope this will work for you. Happy coding :)
You can do following:
import pandas as pd
df = pd.DataFrame({
'date': ['Apr-12', 'Jan-12', 'May-12']
})
pd.to_datetime(df['date'], format='%b-%y')
This will output:
0 2012-04-01
1 2012-01-01
2 2012-05-01
Name: date, dtype: datetime64[ns]
Which means you can update your date column right away:
df['date'] = pd.to_datetime(df['date'], format='%b-%y')
You can chain a couple of pandas methods together to get the desired output:
df = pd.DataFrame({'date_fmt':['Apr-12','Jan-12']})
df
Input dataframe:
date_fmt
0 Apr-12
1 Jan-12
Use pd.to_datetime chained with .dt date accessor and strftime
pd.to_datetime(df['date_fmt'], format='%b-%y').dt.strftime('%m-%Y')
Output:
0 04-2012
1 01-2012
Name: date_fmt, dtype: object

How to convert int64 to datetime in pandas

I have a pandas dataframe that has a column of type int64, but this column represents a date, e.g. 20180501. I'd like to convert this column to datetime. I have the following code, but it returns an error message
df['new_date'] = pd.to_datetime(df['old_date'].astype('str'), format = '%y%m%d')
I'm getting the following error message
ValueError: unconverted data remains: 0501
How can I fix my code?
You need a capital Y. See Python's strftime directives for a complete reference.
df = pd.DataFrame({'old_date': [20180501, 20181230, 20181001]})
df['new_date'] = pd.to_datetime(df['old_date'].astype(str), format='%Y%m%d')
print(df)
old_date new_date
0 20180501 2018-05-01
1 20181230 2018-12-30
2 20181001 2018-10-01
It could be that the problem arises from entries in the dataframe that do not match the format.
You could try setting the parameter errors="coerce", which sets entries that cannot be parsed to NaT instead of raising an error.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
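As a small illustration of errors="coerce", the sketch below uses made-up values, one of which (20181340) is not a valid date. The column names follow the question; the data is hypothetical.

```python
import pandas as pd

# Hypothetical data: the second value has month 13 and cannot be parsed.
df = pd.DataFrame({"old_date": [20180501, 20181340]})

# With errors="coerce", unparseable entries become NaT instead of raising ValueError.
df["new_date"] = pd.to_datetime(
    df["old_date"].astype(str), format="%Y%m%d", errors="coerce"
)
print(df)

# Rows that failed to parse can then be inspected separately.
bad_rows = df[df["new_date"].isna()]
print(bad_rows)
```

Coercing hides bad input rather than fixing it, so it is worth looking at the NaT rows before relying on the converted column.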

How to convert timedelta to time of day in pandas?

I have a SQL table that contains data of the mySQL time type as follows:
time_of_day
-----------
12:34:56
I then use pandas to read the table in:
df = pd.read_sql('select * from time_of_day', engine)
Looking at df.dtypes yields:
time_of_day timedelta64[ns]
My main issue is that, when writing my df to a csv file, the data comes out all messed up, instead of essentially looking like my SQL table:
time_of_day
0 days 12:34:56.000000000
I'd like to instead (obviously) store this record as a time, but I can't find anything in the pandas docs that talk about a time dtype.
Does pandas lack this functionality intentionally? Is there a way to solve my problem without requiring janky data casting?
Seems like this should be elementary, but I'm confounded.
Pandas does not support a time dtype series
Pandas (and NumPy) do not have a time dtype. Since you wish to avoid Pandas timedelta, you have 3 options: Pandas datetime, Python datetime.time, or Python str. Below they are presented in order of preference. Let's assume you start with the following dataframe:
df = pd.DataFrame({'time': pd.to_timedelta(['12:34:56', '05:12:45', '15:15:06'])})
print(df['time'].dtype) # timedelta64[ns]
Pandas datetime series
You can use Pandas datetime series and include an arbitrary date component, e.g. today's date. Underlying such a series are integers, which makes this solution the most efficient and adaptable.
The default date, if unspecified, is 1-Jan-1970:
df['time'] = pd.to_datetime(df['time'])
print(df)
# time
# 0 1970-01-01 12:34:56
# 1 1970-01-01 05:12:45
# 2 1970-01-01 15:15:06
You can also specify a date, such as today:
df['time'] = pd.Timestamp('today').normalize() + df['time']
print(df)
# time
# 0 2019-01-02 12:34:56
# 1 2019-01-02 05:12:45
# 2 2019-01-02 15:15:06
Pandas object series of Python datetime.time values
The Python datetime module from the standard library supports datetime.time objects. You can convert your series to an object dtype series containing pointers to a sequence of datetime.time objects. Operations will no longer be vectorised, but each underlying value will be represented internally by a number.
df['time'] = pd.to_datetime(df['time']).dt.time
print(df)
# time
# 0 12:34:56
# 1 05:12:45
# 2 15:15:06
print(df['time'].dtype)
# object
print(type(df['time'].at[0]))
# <class 'datetime.time'>
Pandas object series of Python str values
Converting to strings is only recommended for presentation purposes that are not supported by other types, e.g. Pandas datetime or Python datetime.time. For example:
df['time'] = pd.to_datetime(df['time']).dt.strftime('%H:%M:%S')
print(df)
# time
# 0 12:34:56
# 1 05:12:45
# 2 15:15:06
print(df['time'].dtype)
# object
print(type(df['time'].at[0]))
# <class 'str'>
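Each snippet above assumes it starts from the original timedelta column. In case pd.to_datetime rejects timedelta64 input in your pandas version, a possible alternative is to add the timedeltas to an explicit anchor date, as the "today" example already does. This is a sketch under that assumption; the column names time_obj and time_str are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"time": pd.to_timedelta(["12:34:56", "05:12:45", "15:15:06"])})

# Anchor the timedeltas to an arbitrary date (the Unix epoch here) to get datetimes.
as_datetime = pd.Timestamp("1970-01-01") + df["time"]

# Object column of datetime.time values.
df["time_obj"] = as_datetime.dt.time

# Object column of plain 'HH:MM:SS' strings, e.g. for writing to CSV.
df["time_str"] = as_datetime.dt.strftime("%H:%M:%S")
print(df)
```

Keeping the original timedelta column and deriving the others from it avoids overwriting the source data between steps.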
It's a hack, but you can pull out the components to create a string and convert that string to a datetime.time object:
from datetime import datetime
def convert(td):
    # Build an 'H:M:S' string from the Timedelta components
    parts = [str(td.components.hours), str(td.components.minutes),
             str(td.components.seconds)]
    return datetime.strptime(':'.join(parts), '%H:%M:%S').time()
df['time'] = df['time'].apply(convert)
found a solution, but i feel like it's gotta be more elegant than this:
def convert(x):
    return pd.to_datetime(x).strftime('%H:%M:%S')
df['time_of_day'] = df['time_of_day'].apply(convert)
df['time_of_day'] = pd.to_datetime(df['time_of_day']).apply(lambda x: x.time())
Adapted this code
