Comparing today date with date in dataframe - python

Sample Data
id date
1 1/2/2018
2 1/5/2019
3 5/3/2018
4 23/11/2018
Desired output
id date
2 1/5/2019
4 23/11/2018
My current code
dfdateList = pd.DataFrame()
dfDate= self.df[["id", "date"]]
today = datetime.datetime.now()
today = today.strftime("%d/%m/%Y").lstrip("0").replace(" 0", "")
expList = []
for dates in dfDate["date"]:
    if dates <= today:
        expList.append(dates)
dfdateList = pd.DataFrame(expList)
Currently my code prints every single row regardless of the condition. Can anyone guide me? Thanks.

Pandas has native support for a large class of operations on datetimes, so one solution here would be to use pd.to_datetime to convert your dates from strings to pandas' representation of datetimes, pd.Timestamp, then just create a mask based on the current date:
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df[df['date'] > pd.Timestamp.now()]
For example:
In [34]: df['date'] = pd.to_datetime(df['date'], dayfirst=True)
In [36]: df
Out[36]:
id date
0 1 2018-02-01
1 2 2019-05-01
2 3 2018-03-05
3 4 2018-11-23
In [37]: df[df['date'] > pd.Timestamp.now()]
Out[37]:
id date
1 2 2019-05-01
3 4 2018-11-23
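The loop in the question compares strings, so the values are ordered character by character rather than chronologically (for instance, "1/5/2019" sorts before "23/11/2018"). The following is a minimal, self-contained sketch of the mask approach on the sample data. It assumes a fixed, hypothetical reference date (2018-04-01) in place of pd.Timestamp.now() so that the result is reproducible, and it uses an explicit format instead of dayfirst=True.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "date": ["1/2/2018", "1/5/2019", "5/3/2018", "23/11/2018"],
})

# Parse the day-first strings into real datetimes.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Assumption: a fixed reference date for reproducibility.
# In real code this would be pd.Timestamp.now().
reference = pd.Timestamp("2018-04-01")

# Boolean mask: keep only rows whose date is after the reference date.
future = df[df["date"] > reference]
print(future)
```

With this reference date the filter keeps ids 2 and 4, which matches the desired output in the question.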

Related

String dates into unixtime in a pandas dataframe

I have a dataframe with a column like this:
Date
3 mins
2 hours
9-Feb
13-Feb
Every value is a string. What is the easiest way to convert these dates to integer Unix time?
One idea is to convert the column to datetimes and to timedeltas:
import numpy as np
df['dates'] = pd.to_datetime(df['Date']+'-2020', format='%d-%b-%Y', errors='coerce')
# rewrite "3 mins" as "00:3:00" and "2 hours" as "2:00:00" so to_timedelta can parse them
times = df['Date'].replace({r'(\d+)\s+mins': r'00:\1:00',
                            r'\s+hours': ':00:00'}, regex=True)
df['times'] = pd.to_timedelta(times, errors='coerce')
# keep only rows where either a date or a time was parsed
df = df[df['dates'].notna() | df['times'].notna()].copy()
# Series.append was removed in pandas 2.0, so combine the two parts with pd.concat
df['all'] = pd.concat([df['dates'].dropna().astype(np.int64),
                       df['times'].dropna().astype(np.int64)])
print (df)
Date dates times all
0 3 mins NaT 00:03:00 180000000000
1 2 hours NaT 02:00:00 7200000000000
2 9-Feb 2020-02-09 NaT 1581206400000000000
3 13-Feb 2020-02-13 NaT 1581552000000000000
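Note that the integers in the all column are nanoseconds (the resolution pandas used for these values), not Unix seconds. A minimal sketch of getting integer Unix seconds from a parsed datetime column, using the epoch-subtraction idiom, might look like this. The variable names are illustrative and not part of the answer above.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2020-02-09", "2020-02-13"]))

# Seconds since the Unix epoch: subtract the epoch,
# then floor-divide by a one-second Timedelta.
unix_seconds = (dates - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
print(unix_seconds.tolist())  # [1581206400, 1581552000]
```

Relative entries such as "3 mins" are durations rather than points in time, so they need a reference moment (for example, the time the data was collected) before they can be turned into Unix times. That choice depends on the data and is left out of this sketch.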

Days before end of month in pandas

I would like to get the number of days before the end of the month, from a string column representing a date.
I have the following pandas dataframe :
df = pd.DataFrame({'date':['2019-11-22','2019-11-08','2019-11-30']})
df
date
0 2019-11-22
1 2019-11-08
2 2019-11-30
I would like the following output :
df
date days_end_month
0 2019-11-22 8
1 2019-11-08 22
2 2019-11-30 0
The package pd.tseries.MonthEnd with rollforward seemed a good pick, but I can't figure out how to use it to transform a whole column.
Subtract the day of the month (Series.dt.day) from the number of days in that month (Series.dt.daysinmonth):
df['date'] = pd.to_datetime(df['date'])
df['days_end_month'] = df['date'].dt.daysinmonth - df['date'].dt.day
Or add offsets.MonthEnd(0) to roll each date forward to its month end, subtract the original date, and convert the resulting timedeltas to days with Series.dt.days:
df['days_end_month'] = (df['date'] + pd.offsets.MonthEnd(0) - df['date']).dt.days
print (df)
date days_end_month
0 2019-11-22 8
1 2019-11-08 22
2 2019-11-30 0

Converting date formats in pandas dataframe

I have a dataframe and the Date column has two different types of date formats going on.
eg. 1983-11-10 00:00:00 and 10/11/1983
I want them all to be the same type, how can I iterate through the Date column of my dataframe and convert the dates to one format?
I believe you need parameter dayfirst=True in to_datetime:
df = pd.DataFrame({'Date': {0: '1983-11-10 00:00:00', 1: '10/11/1983'}})
print (df)
Date
0 1983-11-10 00:00:00
1 10/11/1983
df['Date'] = pd.to_datetime(df.Date, dayfirst=True)
print (df)
Date
0 1983-11-10
1 1983-11-10
because without it the second value is parsed month-first:
df['Date'] = pd.to_datetime(df.Date)
print (df)
Date
0 1983-11-10
1 1983-10-11
Or you can specify both formats and then use combine_first:
d1 = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S', errors='coerce')
d2 = pd.to_datetime(df.Date, format='%d/%m/%Y', errors='coerce')
df['Date'] = d1.combine_first(d2)
print (df)
Date
0 1983-11-10
1 1983-11-10
General solution for multiple formats:
from functools import reduce
def convert_formats_to_datetimes(col, formats):
    out = [pd.to_datetime(col, format=x, errors='coerce') for x in formats]
    return reduce(lambda l,r: pd.Series.combine_first(l,r), out)
formats = ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y']
df['Date'] = df['Date'].pipe(convert_formats_to_datetimes, formats)
print (df)
Date
0 1983-11-10
1 1983-11-10
I want them all to be the same type, how can I iterate through the
Date column of my dataframe and convert the dates to one format?
Your input data is ambiguous: is 10 / 11 10th November or 11th October? You need to specify logic to determine which is appropriate. A function is useful if you wish to try multiple date formats sequentially:
def date_apply_formats(s, form_lst):
    # parse with the first format; unparsed values become NaT
    out = pd.to_datetime(s, format=form_lst[0], errors='coerce')
    for form in form_lst[1:]:
        # fill the remaining gaps by parsing the original strings with the next format
        out = out.fillna(pd.to_datetime(s, format=form, errors='coerce'))
    return out
df['Date'] = date_apply_formats(df['Date'], ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y'])
Priority is given to the first item in form_lst. The solution extends to any number of provided formats.
Input data is
NSECODE Date Close
1 NSE500 20000103 1291.5500
2 NSE500 20000104 1335.4500
3 NSE500 20000105 1303.8000
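# cast to str first: integer values like 20000103 would otherwise be read as nanoseconds since the epoch
history_nseindex_df["Date"] = pd.to_datetime(history_nseindex_df["Date"].astype(str), format="%Y%m%d")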
history_nseindex_df["Date"] = history_nseindex_df["Date"].dt.strftime("%Y-%m-%d")
Output is now
NSECode Date Close
1 NSE500 2000-01-03 1291.5500
2 NSE500 2000-01-04 1335.4500
3 NSE500 2000-01-05 1303.8000

pd.to_datetime is getting half my dates with flipped day / months

My dataset has dates in the European format, and I'm struggling to convert it into the correct format before I pass it through a pd.to_datetime, so for all day < 12, my month and day switch.
Is there an easy solution to this?
import pandas as pd
import datetime as dt
df = pd.read_csv(loc,dayfirst=True)
df['Date']=pd.to_datetime(df['Date'])
Is there a way to force datetime to acknowledge that the input is formatted as dd/mm/yy?
Thanks for the help!
Edit, a sample from my dates:
renewal["Date"].head()
Out[235]:
0 31/03/2018
2 30/04/2018
3 28/02/2018
4 30/04/2018
5 31/03/2018
Name: Earliest renewal date, dtype: object
After running the following:
renewal['Date']=pd.to_datetime(renewal['Date'],dayfirst=True)
I get:
Out[241]:
0 2018-03-31 #Correct
2 2018-04-01 #<-- this number is wrong and should be 01-04 instead
3 2018-02-28 #Correct
Pass an explicit format:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
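For context, dayfirst=True is a preference rather than a strict rule, whereas an explicit format is applied to every value. In read_csv, dayfirst only affects columns that are actually parsed as dates through parse_dates, so passing dayfirst=True on its own, as in the question, leaves the column as plain strings. The sketch below shows both routes on a small in-memory CSV; the CSV text and variable names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the real file.
csv_text = "Date\n01/03/2018\n06/08/2018\n31/03/2018\n"

# Option 1: read the column as strings, then parse with an explicit
# day-first format.
df = pd.read_csv(io.StringIO(csv_text))
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Option 2: let read_csv parse the column. dayfirst is only used for
# the columns listed in parse_dates.
df2 = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], dayfirst=True)

print(df["Date"].dt.month.tolist())
print(df2["Date"].dt.month.tolist())
```

Both routes should report months 3, 8 and 3 for this sample. The explicit format is the safer choice when every value is known to follow the same layout.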
You can control the date construction directly if you define separate columns for 'year', 'month' and 'day', like this:
import pandas as pd
df = pd.DataFrame(
{'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']}
)
date_parts = df['Date'].apply(lambda d: pd.Series(int(n) for n in d.split('/')))
date_parts.columns = ['day', 'month', 'year']
df['Date'] = pd.to_datetime(date_parts)
date_parts
# day month year
# 0 1 3 2018
# 1 6 8 2018
# 2 31 3 2018
# 3 30 4 2018
df
# Date
# 0 2018-03-01
# 1 2018-08-06
# 2 2018-03-31
# 3 2018-04-30

Reformat Dataframe column to date only format

I have a dataframe (df) with a 'Date of birth' column:
Date of birth
0 1957-04-30 00:00:00
1 1966-11-10 00:00:00
2 1966-11-10 00:00:00
3 1962-03-28 00:00:00
4 1958-10-28 00:00:00
5 1958-06-04 00:00:00
How can I reformat the column to a date only format? After I reformat I'm going to work out age from a specific date:
Date of birth
0 1957-04-30
1 1966-11-10
2 1966-11-10
3 1962-03-28
4 1958-10-28
5 1958-06-04
I have tried using
df["Date of birth"] = pd.to_datetime(df['Date of birth'], format='%d%b%Y')
df["Date of birth"] = df["Date of birth"].dt.strftime('%m/%d/%Y')
but with no joy.
Once the column has been converted to datetime, use the .dt.date accessor to keep only the date part:
df["Date of birth"] = pd.to_datetime(df['Date of birth']).dt.date
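One caveat: .dt.date produces plain Python date objects in an object column, so the .dt accessor is no longer available afterwards. If the next step is an age calculation, it may be simpler to keep the datetime values and compute from them. The sketch below assumes a hypothetical reference date of 2020-01-01 and counts age in completed years; both choices are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Date of birth": ["1957-04-30 00:00:00", "1966-11-10 00:00:00"]}
)
dob = pd.to_datetime(df["Date of birth"])

# Assumption: the "specific date" from the question; pick your own.
ref = pd.Timestamp("2020-01-01")

# True where the birthday has already occurred in the reference year.
had_birthday = (dob.dt.month < ref.month) | (
    (dob.dt.month == ref.month) & (dob.dt.day <= ref.day)
)

# Completed years: year difference, minus one if the birthday is still ahead.
df["age"] = ref.year - dob.dt.year - (~had_birthday).astype(int)
print(df)
```

For the two sample rows this gives ages 62 and 53 on the assumed reference date.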
