How to compare date difference in python pandas - python

I'm appending a column to my pandas dataframe which is the time difference between two dates.
df['time_diff'] = datetime.datetime(2018, 1, 1) - df['IN_TIME']
The type of the new column is <m8[ns]. I'm trying to filter the rows whose 'time_diff' is greater than 30 days, but I can't compare <m8[ns] with a number. How can I do this comparison?

Here's one way. Note you don't need to use the datetime module for these calculations as Pandas has some intuitive functionality for these operations.
df['time_diff'] = pd.to_datetime('2018-01-01') - df['IN_TIME']
df = df[df['time_diff'].dt.days > 30]
This solution assumes df['IN_TIME'] is a datetime series; if it is not, you can convert via df['IN_TIME'] = pd.to_datetime(df['IN_TIME']).
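As an alternative sketch, the timedelta column can also be compared with a pd.Timedelta directly instead of going through .dt.days. The sample values below are made up for illustration; only the IN_TIME and time_diff names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the IN_TIME column name comes from the question.
df = pd.DataFrame(
    {"IN_TIME": pd.to_datetime(["2017-11-01", "2017-12-15", "2017-12-31"])}
)

df["time_diff"] = pd.Timestamp("2018-01-01") - df["IN_TIME"]

# Compare the timedelta column with a Timedelta rather than a bare number.
older_than_30_days = df[df["time_diff"] > pd.Timedelta(days=30)]
print(older_than_30_days)
```

The two approaches differ at the edges: .dt.days keeps only the whole-day part, so a difference of 30 days and 1 hour is not greater than 30 there, while it is greater than pd.Timedelta(days=30).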

Related

Pandas dataframe drop multiple rows based on datetime difference

I store datetimes in a pandas dataframe which look like dd/mm/yyyy hh:mm:ss
I want to drop all rows where values in column x (datetime) are within 24 hours of one another.
On a 1 by 1 basis, I was previously doing this, which doesn't seem to work within the drop function:
df.drop(df[(df['d2'] - df['d1']).seconds / 3600 < 24].index)
>> AttributeError: 'Series' object has no attribute 'seconds'
This should work
df.loc[ (df.d2 - df.d1) >= datetime.timedelta(days=1) ]
The answer is straightforward. First parse both columns as datetimes:
import pandas as pd
df = pd.read_csv("test.csv")
df["d1"] = pd.to_datetime(df["d1"])
df["d2"] = pd.to_datetime(df["d2"])
Now, if you subtract one column from the other
df["d2"] - df["d1"]
the result is a timedelta Series, so, as @kaan suggested, you can compare it with a Timedelta:
df.loc[(df["d2"] - df["d1"]) >= pd.Timedelta(days=1)]
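Since the question mentions dd/mm/yyyy hh:mm:ss strings, here is a small sketch of the same filter with an explicit day-first format. The two rows are invented for illustration, and the format string is an assumption based on the description in the question.

```python
import pandas as pd

# Hypothetical rows in the dd/mm/yyyy hh:mm:ss layout described in the question.
df = pd.DataFrame(
    {
        "d1": ["01/02/2020 08:00:00", "03/02/2020 09:30:00"],
        "d2": ["01/02/2020 20:00:00", "05/02/2020 09:30:00"],
    }
)

# An explicit format makes pandas read 01/02/2020 as 1 February, not 2 January.
fmt = "%d/%m/%Y %H:%M:%S"
df["d1"] = pd.to_datetime(df["d1"], format=fmt)
df["d2"] = pd.to_datetime(df["d2"], format=fmt)

# Keep only rows whose two timestamps are at least 24 hours apart.
kept = df.loc[(df["d2"] - df["d1"]) >= pd.Timedelta(hours=24)]
print(kept)
```

Selecting the rows to keep with a boolean mask avoids the drop call altogether.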

How to join pandas Series of numbers to make it one number

I'm using Pandas library.
I have three columns in dataset named 'hours', 'minutes' and 'seconds'
I want to join the three columns to make it in time format.
For e.g the first column should read as 9:33:09
How can I do that?
Convert to timedelta and add -
pd.to_timedelta(df["hours"], unit='h') + pd.to_timedelta(df["minutes"], unit='m') + pd.to_timedelta(df["sec"], unit='s')
Looking at your example, I think the sec column actually holds microseconds. If that's the case, use -
pd.to_timedelta(df["hours"], unit='h') + pd.to_timedelta(df["minutes"], unit='m') + pd.to_timedelta(df["sec"], unit='us')
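If the goal is text such as 9:33:09 rather than a timedelta, one possible sketch builds the string from the timedelta components. The sample values are invented, the column names follow the question ('hours', 'minutes', 'seconds'), and it assumes every row is under 24 hours so that no days component appears.

```python
import pandas as pd

# Hypothetical data; the column names follow the question.
df = pd.DataFrame({"hours": [9, 14], "minutes": [33, 5], "seconds": [9, 47]})

total = (
    pd.to_timedelta(df["hours"], unit="h")
    + pd.to_timedelta(df["minutes"], unit="min")
    + pd.to_timedelta(df["seconds"], unit="s")
)

# .dt.components splits each timedelta into days, hours, minutes, seconds, ...
# Assumption: totals stay below 24 hours, so the days component is always 0.
parts = total.dt.components
df["time"] = (
    parts["hours"].astype(str)
    + ":"
    + parts["minutes"].astype(str).str.zfill(2)
    + ":"
    + parts["seconds"].astype(str).str.zfill(2)
)
print(df["time"].tolist())
```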
You can use string operations and pandas for this.
import pandas as pd
# Read csv
data = pd.read_csv("data.csv")
# Create a DataFrame object
df = pd.DataFrame(data, columns=["hour", "mins", "sec"])
# Iterate through records and print the values.
for ind in df.index:
    hour = str(df['hour'][ind])
    minute = str(df['mins'][ind])
    sec = str(df['sec'][ind])
    # Drop the last four characters of the seconds string
    sec = sec[:len(sec)-4]
    if len(sec) == 1:
        sec = "0" + sec
    print(hour + ":" + minute + ":" + sec)
Output:
HH:MM:SS
A leading 0 is added when the seconds value has only one digit.

Length between two dates in a time series in pandas data frame

I have a time series composed of weekdays with anomalous/unpredictable holidays. On any given day, I want to know the length/number of rows to a date specified under column 'date1'. See below.
len(df.loc['2019-10-18':'2019-11-15']) returns the correct answer
I am trying to create a column 'shift' that will calculate the above.
Both DatetimeIndex and the 'date1' are dtype 'datetime64[ns]'
df['shift']=len(df.loc[df.index : df['date1']]) clearly doesn't work but might there be a solution that does?
IIUC, use the following, which gives the difference in calendar days (not a count of rows):
df['len'] = (df.index - df['date1']).dt.days
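If what is needed is the number of rows between each index date and date1 (matching the len(df.loc[start:end]) call in the question), one possible sketch uses searchsorted on the sorted index. The four dates below are made up, and the approach assumes the DatetimeIndex is sorted and unique.

```python
import numpy as np
import pandas as pd

# Hypothetical weekday series with gaps; 'date1' is the target date per row.
idx = pd.to_datetime(["2019-10-18", "2019-10-21", "2019-10-22", "2019-10-24"])
df = pd.DataFrame(
    {"date1": pd.to_datetime(["2019-10-22", "2019-10-24", "2019-10-24", "2019-10-24"])},
    index=idx,
)

# Position just past the last index entry <= date1 (requires a sorted index).
end_pos = df.index.to_numpy().searchsorted(df["date1"].to_numpy(), side="right")
# Position of each row itself.
start_pos = np.arange(len(df))

# Inclusive row count from the current row to date1, like len(df.loc[a:b]).
df["shift"] = end_pos - start_pos
print(df)
```

For the first row this counts 2019-10-18, 2019-10-21 and 2019-10-22, which is the same as len(df.loc['2019-10-18':'2019-10-22']).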

Pandas - Python, deleting rows based on Date column

I'm trying to delete rows of a dataframe based on one date column; [Delivery Date]
I need to delete rows which are older than 6 months old but not equal to the year '1970'.
I've created 2 variables:
from datetime import date, timedelta
sixmonthago = date.today() - timedelta(188)
import time
nineteen_seventy = time.strptime('01-01-70', '%d-%m-%y')
but I don't know how to delete rows based on these two variables, using the [Delivery Date] column.
Could anyone provide the correct solution?
You can just filter them out:
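df[(df['Delivery Date'].dt.year == 1970) | (df['Delivery Date'] >= pd.Timestamp(sixmonthago))]
This returns all rows where the year is 1970 or the delivery date is less than six months old. Wrapping the date in pd.Timestamp avoids comparing a datetime column with a plain datetime.date.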
You can use boolean indexing and pass multiple conditions to filter the df. For multiple conditions you need the element-wise operators, so | instead of or, with parentheses around each condition because of operator precedence.
Check the docs for an explanation of boolean indexing.
Be sure the calculation itself is accurate for "6 months" prior. You may not want to hardcode 188 days, since months differ in length.
from datetime import date
from dateutil.relativedelta import relativedelta
#http://stackoverflow.com/questions/546321/how-do-i-calculate-the-date-six-months-from-the-current-date-using-the-datetime
six_months = date.today() - relativedelta( months = +6 )
Then you can apply the following logic.
import time
nineteen_seventy = time.strptime('01-01-70', '%d-%m-%y')
df = df[(df['Delivery Date'].dt.year == nineteen_seventy.tm_year) | (df['Delivery Date'] >= pd.Timestamp(six_months))]
If you would rather drop the unwanted rows explicitly, select the inverse condition and pass its index to drop:
df = df.drop(df[(df['Delivery Date'].dt.year != nineteen_seventy.tm_year) & (df['Delivery Date'] < pd.Timestamp(six_months))].index)
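A pandas-only sketch of the same idea, using pd.DateOffset for a calendar-aware six months. The sample dates and the fixed "today" are invented so the result is reproducible; only the 'Delivery Date' column name comes from the question.

```python
import pandas as pd

# Hypothetical data; only the 'Delivery Date' column name comes from the question.
today = pd.Timestamp("2024-07-01")  # fixed here so the example is reproducible
df = pd.DataFrame(
    {"Delivery Date": pd.to_datetime(["1970-01-01", "2023-06-15", "2024-05-20"])}
)

# Calendar-aware six months back, instead of a hardcoded number of days.
cutoff = today - pd.DateOffset(months=6)

is_1970 = df["Delivery Date"].dt.year == 1970
is_recent = df["Delivery Date"] >= cutoff

# Keep placeholder 1970 rows and anything newer than the cutoff.
kept = df[is_1970 | is_recent]
print(kept)
```

In real use, today would presumably be pd.Timestamp.today().normalize() rather than a fixed date.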

Pandas: select all dates with specific month and day

I have a dataframe full of dates and I would like to select all dates where the month==12 and the day==25 and replace the zero in the xmas column with a 1.
Is there any way to do this? The second line of my code errors out.
df = DataFrame({'date':[datetime(2013,1,1).date() + timedelta(days=i) for i in range(0,365*2)], 'xmas':np.zeros(365*2)})
df[df['date'].month==12 and df['date'].day==25] = 1
A pandas Series of datetimes now behaves differently: the date parts are reached through the .dt accessor.
This is how it should be done now:
df.loc[(df['date'].dt.day==25) & (df['date'].dt.month==12), 'xmas'] = 1
Basically, what you tried won't work because you need & to combine arrays element-wise, and you need parentheses because of operator precedence. On top of this, you should use loc to perform the indexing (on a datetime column, month and day are reached through .dt):
df.loc[(df['date'].dt.month==12) & (df['date'].dt.day==25), 'xmas'] = 1
An update was needed in reply to this question. As of today, there's a slight difference in how you extract months from datetime objects in a pd.Series.
So from the very start, in case you have a raw date column, first convert it to datetime objects with a simple function:
import datetime as dt
def read_as_datetime(str_date):
    # replace %Y-%m-%d with your own date format
    return dt.datetime.strptime(str_date, '%Y-%m-%d')
Then apply this function to your dates column and save the results in a new column named datetime:
df['datetime'] = df.dates.apply(read_as_datetime)
Finally, to select dates by day and month, use the same piece of code that @Shayan RC explained, with one slight change: note the .dt accessor after the datetime column:
df.loc[(df['datetime'].dt.month==12) & (df['datetime'].dt.day==25), 'xmas'] = 1
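Putting the pieces together, here is a minimal end-to-end sketch that rebuilds the frame from the question. It assumes the column is converted with pd.to_datetime first, because a column of datetime.date objects has object dtype and no .dt accessor.

```python
from datetime import date, timedelta

import numpy as np
import pandas as pd

# Rebuild the frame from the question: two years of datetime.date objects.
df = pd.DataFrame(
    {
        "date": [date(2013, 1, 1) + timedelta(days=i) for i in range(365 * 2)],
        "xmas": np.zeros(365 * 2),
    }
)

# Convert the object column to datetime64 so that .dt is available.
df["date"] = pd.to_datetime(df["date"])

# Element-wise & with parentheses, and .loc to assign into the 'xmas' column.
df.loc[(df["date"].dt.month == 12) & (df["date"].dt.day == 25), "xmas"] = 1
print(df[df["xmas"] == 1])
```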
