How to join pandas Series of numbers to make it one number - python

I'm using the pandas library.
I have three columns in my dataset named 'hours', 'minutes' and 'seconds'.
I want to join the three columns into a single value in time format.
For example, the first row should read 9:33:09.
How can I do that?

Convert each column to a timedelta and add them:
pd.to_timedelta(df["hours"], unit='h') + pd.to_timedelta(df["minutes"], unit='m') + pd.to_timedelta(df["sec"], unit='s')
Looking at your example, I think the sec column actually holds microseconds. If that's the case, use:
pd.to_timedelta(df["hours"], unit='h') + pd.to_timedelta(df["minutes"], unit='m') + pd.to_timedelta(df["sec"], unit='us')
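As a rough illustration of the timedelta approach, here is a minimal sketch that also formats the result as H:MM:SS text. The sample values and the `time` column are made up for the example, and the `sec` column is assumed to hold whole seconds.

```python
import pandas as pd

# Hypothetical sample data; column names follow the answer above.
df = pd.DataFrame({"hours": [9, 14], "minutes": [33, 5], "sec": [9, 42]})

# Add the three parts together as timedeltas.
elapsed = (
    pd.to_timedelta(df["hours"], unit="h")
    + pd.to_timedelta(df["minutes"], unit="m")
    + pd.to_timedelta(df["sec"], unit="s")
)

# Build an H:MM:SS string from the total number of seconds.
total = elapsed.dt.total_seconds().astype(int)
df["time"] = (
    (total // 3600).astype(str)
    + ":" + ((total % 3600) // 60).astype(str).str.zfill(2)
    + ":" + (total % 60).astype(str).str.zfill(2)
)
print(df["time"].tolist())
```

Keeping `elapsed` as a timedelta Series is usually more useful than the string if you plan to do further arithmetic on it.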

You can use string operations and pandas for this.
import pandas as pd
# Read the CSV file
data = pd.read_csv("data.csv")
# Keep only the three columns of interest
df = pd.DataFrame(data, columns=["hour", "mins", "sec"])
# Iterate through the records and print the values
for ind in df.index:
    hour = str(df['hour'][ind])
    minute = str(df['mins'][ind])  # renamed to avoid shadowing the built-in min()
    sec = str(df['sec'][ind])
    sec = sec[:len(sec)-4]  # drop the last four characters of the seconds value
    if len(sec) == 1:
        sec = "0" + sec  # pad single-digit seconds with a leading zero
    print(hour + ":" + minute + ":" + sec)
Output:
HH:MM:SS
It pads the seconds with a leading 0 if they have only one digit.
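The same string-building idea can probably be written without an explicit loop. The sketch below assumes the three columns already hold plain integers (so no trailing characters need trimming) and, unlike the loop above, also pads the minutes; the sample values and the `time` column are hypothetical.

```python
import pandas as pd

# Hypothetical sample data using the column names from the answer above.
df = pd.DataFrame({"hour": [9, 10], "mins": [33, 7], "sec": [9, 45]})

# Convert each column to text and pad minutes and seconds to two digits.
df["time"] = (
    df["hour"].astype(str)
    + ":" + df["mins"].astype(str).str.zfill(2)
    + ":" + df["sec"].astype(str).str.zfill(2)
)
print(df["time"].tolist())
```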

Related

Pandas dataframe drop multiple rows based on datetime difference

I store datetimes in a pandas dataframe which look like dd/mm/yyyy hh:mm:ss
I want to drop all rows where values in column x (datetime) are within 24 hours of one another.
On a 1 by 1 basis, I was previously doing this, which doesn't seem to work within the drop function:
df.drop(df[(df['d2'] - df['d1']).seconds / 3600 < 24].index)
>> AttributeError: 'Series' object has no attribute 'seconds'
This should work (after import datetime):
df.loc[(df.d2 - df.d1) >= datetime.timedelta(days=1)]
The answer is straightforward. First convert both columns to datetimes:
import pandas as pd
df = pd.read_csv("test.csv")
# dayfirst=True because the dates are stored as dd/mm/yyyy
df["d1"] = pd.to_datetime(df["d1"], dayfirst=True)
df["d2"] = pd.to_datetime(df["d2"], dayfirst=True)
Now, if you subtract the columns from each other,
df["d2"] - df["d1"]
the result is a Series of timedeltas, so, as @kaan suggested, you can compare it with a Timedelta:
df.loc[(df["d2"] - df["d1"]) >= pd.Timedelta(days=1)]
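To make the filtering concrete, here is a small sketch with invented rows, assuming the columns are called `d1` and `d2` as in the question. It also shows the Series equivalent of the `.seconds` attempt from the question: on a Series, timedelta components are reached through the `.dt` accessor.

```python
import pandas as pd

# Hypothetical rows in the dd/mm/yyyy hh:mm:ss layout from the question.
df = pd.DataFrame({
    "d1": ["01/03/2021 08:00:00", "01/03/2021 08:00:00", "05/03/2021 12:00:00"],
    "d2": ["01/03/2021 20:00:00", "03/03/2021 09:30:00", "06/03/2021 12:00:00"],
})

fmt = "%d/%m/%Y %H:%M:%S"
df["d1"] = pd.to_datetime(df["d1"], format=fmt)
df["d2"] = pd.to_datetime(df["d2"], format=fmt)

gap = df["d2"] - df["d1"]  # Series of timedeltas

# Keep only rows whose two datetimes are at least 24 hours apart.
kept = df.loc[gap >= pd.Timedelta(days=1)]

# Same filter expressed in hours via the .dt accessor.
kept_hours = df.loc[gap.dt.total_seconds() / 3600 >= 24]
print(kept)
```

Note that `.dt.seconds` only returns the seconds component within a day, so `.dt.total_seconds()` is the safer choice when the gap can exceed one day.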

How to create a "duration" column from two "dates" columns?

I have two columns in my "expeditions" dataframe: a start date ("basecamp_date") and an end date ("highpoint_date"). I would like to create a new column that expresses the duration between these two dates, but I have no idea how to do it.
import pandas as pd
expeditions = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/expeditions.csv")
In read_csv, convert the columns to datetimes with parse_dates, then subtract the columns and use Series.dt.days to get the difference in days:
file = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/expeditions.csv"
expeditions = pd.read_csv(file, parse_dates=['basecamp_date','highpoint_date'])
expeditions['diff'] = expeditions['highpoint_date'].sub(expeditions['basecamp_date']).dt.days
You can convert those columns to datetime and then subtract them to get the duration:
tstart = pd.to_datetime(expeditions['basecamp_date'])
tend = pd.to_datetime(expeditions['highpoint_date'])
expeditions['duration'] = tend - tstart
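For a quick check without downloading the CSV, the sketch below applies both ideas to a tiny hand-made frame. The rows are invented and the `duration_days` column name is an assumption; only the two date column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the expeditions data, including a missing start date.
expeditions = pd.DataFrame({
    "basecamp_date": ["2019-04-01", "2019-09-10", None],
    "highpoint_date": ["2019-04-20", "2019-09-10", "2019-10-01"],
})

start = pd.to_datetime(expeditions["basecamp_date"])
end = pd.to_datetime(expeditions["highpoint_date"])

# Subtracting two datetime Series gives a timedelta Series.
expeditions["duration"] = end - start

# Whole days as numbers; rows with a missing date become NaN.
expeditions["duration_days"] = expeditions["duration"].dt.days
print(expeditions)
```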

How to sort dates imported from a CSV file?

I'm trying to write a program that prints a list of sorted dates, but it keeps sorting by the day instead of the full date (day, month, year).
I'm very new to Python, so there's probably a lot I'm doing wrong, but any help would be greatly appreciated.
I have it set up so that you can view the list over two pages.
The dates sort like this:
12/03/2004
13/08/2001
15/10/2014
but I need them sorted by the full date.
df = pd.read_csv('Employee.csv')
df = df.sort_values('Date of Employment.')
List1 = df.iloc[:50, 1:]
List2 = df.iloc[50:99, 1:]
The dates have to be stored as the datetime data type to be sorted correctly.
Use either of these approaches to convert the dates to datetime objects:
Approach 1
pd.to_datetime + DataFrame.sort_values:
df['Date of Employment.'] = pd.to_datetime(df['Date of Employment.'], dayfirst=True)
Approach 2
You can parse the dates while the pandas DataFrame is being loaded:
df = pd.read_csv('Employee.csv', parse_dates=['Date of Employment.'], dayfirst=True)
This is equivalent to the first approach, except that everything is done in one step.
Next you need to sort the datetime values in either ascending or descending order.
Ascending:
`df.sort_values('Date of Employment.')`
Descending:
`df.sort_values('Date of Employment.',ascending=False)`
You need to convert Date of Employment. to a datetime before sorting:
df['Date of Employment.'] = pd.to_datetime(df['Date of Employment.'], format='%d/%m/%Y')
Otherwise the values are just strings to Python, and strings sort character by character, which is why the day comes first.
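The difference is easy to see on the three dates from the question. This is only a sketch: the frame is built inline instead of being read from Employee.csv, and the variable names `as_text` and `as_dates` are made up.

```python
import pandas as pd

# The three sample dates from the question, in dd/mm/yyyy form.
df = pd.DataFrame({"Date of Employment.": ["15/10/2014", "12/03/2004", "13/08/2001"]})

# Sorting the raw strings compares them character by character (day first).
as_text = df.sort_values("Date of Employment.")

# After conversion, the same call sorts chronologically.
df["Date of Employment."] = pd.to_datetime(df["Date of Employment."], format="%d/%m/%Y")
as_dates = df.sort_values("Date of Employment.")

print(as_text["Date of Employment."].tolist())
print(as_dates["Date of Employment."].dt.strftime("%d/%m/%Y").tolist())
```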

How to compare date difference in python pandas

I'm appending a column to my pandas dataframe which is the time difference between two dates.
df['time_diff'] = datetime.datetime(2018,1,1) - df['IN_TIME']
The type of the new column is <m8[ns]. I'm trying to filter the rows whose 'time_diff' is greater than 30 days, but I can't compare <m8[ns] with a number. How can I do this comparison?
Here's one way. Note you don't need to use the datetime module for these calculations as Pandas has some intuitive functionality for these operations.
df['time_diff'] = pd.to_datetime('2018-01-01') - df['IN_TIME']
df = df[df['time_diff'].dt.days > 30]
This solution assumes df['IN_TIME'] is a datetime series; if it is not, you can convert via df['IN_TIME'] = pd.to_datetime(df['IN_TIME']).
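An alternative that may be worth knowing is to compare the timedelta column with a `pd.Timedelta` directly instead of going through `.dt.days`. The sketch below uses invented timestamps; the point is that `.dt.days` drops the sub-day part, so a gap of 30 days and 5 hours counts as 30 there but as more than 30 days in the direct comparison.

```python
import pandas as pd

# Hypothetical IN_TIME values, all in the same timestamp format.
df = pd.DataFrame({
    "IN_TIME": pd.to_datetime(
        ["2017-11-01 00:00", "2017-12-15 00:00", "2017-12-01 19:00"]
    )
})

df["time_diff"] = pd.Timestamp("2018-01-01") - df["IN_TIME"]

# Whole-day comparison, as in the answer above.
by_days = df[df["time_diff"].dt.days > 30]

# Direct comparison against a Timedelta keeps the hours into account.
by_timedelta = df[df["time_diff"] > pd.Timedelta(days=30)]

print(by_days.index.tolist(), by_timedelta.index.tolist())
```

Which of the two you want depends on whether partial days should count towards the 30-day threshold.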

Pandas: select all dates with specific month and day

I have a dataframe full of dates, and I would like to select all dates where month==12 and day==25 and replace the zero in the xmas column with a 1.
Is there any way to do this? The second line of my code raises an error.
df = DataFrame({'date':[datetime(2013,1,1).date() + timedelta(days=i) for i in range(0,365*2)], 'xmas':np.zeros(365*2)})
df[df['date'].month==12 and df['date'].day==25] = 1
A pandas Series of datetimes now behaves differently; see the .dt accessor.
This is how it should be done now:
df.loc[(df['date'].dt.day==25) & (df['date'].dt.month==12), 'xmas'] = 1
What you tried won't work because you need & rather than and to combine boolean arrays element-wise, and you need parentheses because of operator precedence. You should also use loc for the assignment, and on a Series the month and day are reached through the .dt accessor:
df.loc[(df['date'].dt.month==12) & (df['date'].dt.day==25), 'xmas'] = 1
This question needed an updated answer. In current pandas, the month and day of a datetime Series are extracted through the .dt accessor.
So, starting from a raw date column, first convert it to datetime objects with a simple function:
import datetime as dt
def read_as_datetime(str_date):
    # replace %Y-%m-%d with your own date format
    return dt.datetime.strptime(str_date, '%Y-%m-%d')
Then apply this function to your dates column and save the results in a new column named datetime:
df['datetime'] = df.dates.apply(read_as_datetime)
Finally, to select dates by day and month, use the same code that @Shayan RC explained; note the .dt accessor after the datetime column:
df.loc[(df['datetime'].dt.month==12) & (df['datetime'].dt.day==25), 'xmas'] = 1
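Putting the pieces together, a minimal end-to-end sketch might look like this. It builds the two years of dates with `pd.date_range` rather than the list comprehension from the question, and the `is_xmas` mask name is made up. If the column holds plain `datetime.date` objects, as in the question, it would need `pd.to_datetime` first so that the `.dt` accessor is available.

```python
import pandas as pd

# Two years of daily dates starting 2013-01-01, as in the question.
df = pd.DataFrame({"date": pd.date_range("2013-01-01", periods=365 * 2, freq="D")})
df["xmas"] = 0

# Boolean mask: combine the two conditions element-wise with &.
is_xmas = (df["date"].dt.month == 12) & (df["date"].dt.day == 25)

# Set the flag only on the matching rows.
df.loc[is_xmas, "xmas"] = 1
print(df.loc[is_xmas])
```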
