I'm working in Python with a pandas df and trying to convert a column that contains nanoseconds to a time with days, hours, minutes and seconds, but I'm not succeeding.
My original df looks like this:
ID TIME_NANOSECONDS
1 47905245000000000
2 45018244000000000
3 40182582000000000
The result should look like this:
ID TIME_NANOSECONDS TIME
1 47905245000000000 554 days 11:00:45.000000000
2 45018244000000000 521 days 01:04:04.000000000
3 40182582000000000 465 days 01:49:42.000000000
I've found some answers that advise using timedelta, but the following code returns a date, which is not what I want:
temp_fc_col['TIME_TO_REPAIR_D'] = datetime.timedelta(temp_fc_col['TIME_TO_REPAIR'], unit='ns')
Alternatively,
temp_fc_col['TIME_TO_REPAIR_D'] = timedelta(microseconds=round(temp_fc_col['TIME_TO_REPAIR'], -3))
returns an error: unsupported type for timedelta microseconds component: Series. This is probably because the statement can only process one value at a time.
Use to_timedelta, which works with a whole Series; unit='ns' can be omitted because nanoseconds are the default unit for numbers:
temp_fc_col['TIME_TO_REPAIR_D'] = pd.to_timedelta(temp_fc_col['TIME_NANOSECONDS'])
print(temp_fc_col)
ID TIME_NANOSECONDS TIME_TO_REPAIR_D
0 1 47905245000000000 554 days 11:00:45
1 2 45018244000000000 521 days 01:04:04
2 3 40182582000000000 465 days 01:49:42
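If you also need the individual parts as numbers (for example, a separate days column), the .dt.components accessor on a timedelta Series splits each value into days, hours, minutes, seconds and smaller units. A minimal sketch using the sample values from the question; the DAYS and HMS column names are illustrative choices, not from the original:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'TIME_NANOSECONDS': [47905245000000000, 45018244000000000, 40182582000000000],
})

# Plain integers are interpreted as nanoseconds by default
df['TIME'] = pd.to_timedelta(df['TIME_NANOSECONDS'])

# Split each timedelta into its parts (days, hours, minutes, seconds, ...)
parts = df['TIME'].dt.components
df['DAYS'] = parts['days']

# Rebuild an HH:MM:SS string from the zero-padded parts
df['HMS'] = (parts['hours'].astype(str).str.zfill(2) + ':'
             + parts['minutes'].astype(str).str.zfill(2) + ':'
             + parts['seconds'].astype(str).str.zfill(2))
print(df)
```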
Related
Let's say I have a column 'MEMBERSHIP_LENGTH' whose values are integers; for example, a value of 100 means this ID has been a member for 100 days. I want to know the date on which this ID applied for membership, counting back from today.
So what I am thinking is:
df['APPLIED_DATE'] = pd.to_datetime('2023-01-21') - timedelta(days=df['MEMBERSHIP_LENGTH'])
but I got the error TypeError: unsupported type for timedelta days component: Series.
How should I do it?
How about
from datetime import datetime, timedelta
df['APPLIED_DATE']=df['MEMBERSHIP_LENGTH'].apply(lambda x: datetime.today() - timedelta(days=x))
df
Out[28]:
MEMBERSHIP_LENGTH APPLIED_DATE
0 100 2022-10-13 09:56:49.854174
1 101 2022-10-12 09:56:49.854174
2 200 2022-07-05 09:56:49.854174
3 201 2022-07-04 09:56:49.854174
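The apply call runs a Python function once per row. A vectorized alternative is to convert the whole integer column to timedeltas with pd.to_timedelta(..., unit='D') and subtract it from a single timestamp. This sketch uses the date from the question (2023-01-21) as a fixed reference so the output is reproducible; swap in pd.Timestamp('today').normalize() to count back from the start of the current day:

```python
import pandas as pd

df = pd.DataFrame({'MEMBERSHIP_LENGTH': [100, 101, 200, 201]})

# Fixed reference date for reproducibility;
# use pd.Timestamp('today').normalize() for today's date at midnight
reference = pd.Timestamp('2023-01-21')

# Convert the whole integer column to day-based timedeltas at once and subtract
df['APPLIED_DATE'] = reference - pd.to_timedelta(df['MEMBERSHIP_LENGTH'], unit='D')
print(df)
```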
I have a df with a column like this:
df
time
0.5
30.5
60.3
90.2
120.3
and so on, with each time represented as a float number of seconds; for example, 60.3 is simply 60.3 seconds. How can I turn this into a time data type? I have tried:
df['time'] = pd.to_timedelta(df['time'])
but it is returning incorrect values. For example, time of 60.3 seconds is converted into:
0 days 00:00:00.000000060
which is not only incorrect, as it should be 0 days 00:00:60.3, but I would also like to extract only the minutes and seconds, leaving me with a column of data that looks like:
00:60.3
Thank you.
I'd use .to_datetime and set the unit to seconds ('s'). Then you can apply a format to the datetime object.
Try this out:
df['time'] = pd.to_datetime(df["time"], unit='s').apply(lambda x: x.strftime("%M:%S.%f"))
You should get:
time
0 00:00.500000
1 00:30.500000
2 01:00.300000
3 01:30.200000
4 02:00.300000
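The original attempt went wrong because pd.to_timedelta treats plain numbers as nanoseconds unless told otherwise; passing unit='s' gives the intended durations. A sketch that keeps the values as timedeltas and only builds the text at the end; the MM:SS.s format with one decimal place and the mm_ss column name are assumptions based on the example in the question:

```python
import pandas as pd

df = pd.DataFrame({'time': [0.5, 30.5, 60.3, 90.2, 120.3]})

# Without unit=, the floats would be read as nanoseconds; unit='s' reads them as seconds
td = pd.to_timedelta(df['time'], unit='s')

# Format total seconds as zero-padded minutes and seconds with one decimal place
df['mm_ss'] = [f"{int(t // 60):02d}:{t % 60:04.1f}" for t in td.dt.total_seconds()]
print(df)
```

Because the formatting rounds to one decimal, a value such as 59.96 seconds would print as 00:60.0; round the seconds first if that matters for your data.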
I have a pandas dataframe that has columns:
['A'] departure time, stored as an integer (e.g. 700 or 403 for 7:00 and 4:03);
['B'] elapsed time, stored as an integer number of minutes (e.g. 70 or 656 for 70 minutes and 656 minutes);
['C'] arrival time, stored as an integer (e.g. 1810 or 355 for 18:10 and 03:55).
I need to find a way to develop a new column ['D'] with a boolean value that returns True if the arrival is on the following day and False if arrival is on the same day.
I thought of splitting column A at the -2 index to separate the hours from the minutes, converting the hours to minutes and adding the remaining minutes to normalize the values, but I'm not sure how to do that, or whether there's a simpler way. The idea is to get the total minutes elapsed since the start of the day; if that exceeds the total minutes in a day, I'd have my answer, but I'm unsure whether this would work.
Similar to the method you outlined, you can accomplish the task by converting the integers in column A to a 24-hour datetime (starting from 1900-01-01), adding the integer number of minutes from column B as a timedelta, and then checking whether the result is still on day 1 of the month. As a sanity check, I added a last row that should return True.
You can probably combine these steps without creating a new column, but I think the code is more readable this way.
import numpy as np
import pandas as pd
import datetime as dt
df = pd.DataFrame({
'A':[700,403,2359],
'B':[70,656,2],
'C':[810,1059,1]
})
# convert to string, add leading zeros, then convert column A to datetime
df['arrival'] = pd.to_datetime(df['A'].astype(str).str.zfill(4), format='%H%M') + pd.to_timedelta(df['B'],'m')
# check if you are on day 1 of the month still
df['D'] = np.where(df.arrival.dt.day > 1, True, False)
Output:
A B C arrival D
0 700 70 810 1900-01-01 08:10:00 False
1 403 656 1059 1900-01-01 14:59:00 False
2 2359 2 1 1900-01-02 00:01:00 True
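The minutes-since-midnight idea from the question also works with plain integer arithmetic, without any datetime conversion. A sketch, assuming A is an HHMM integer and B is a whole number of minutes (C is not needed for this check); MINUTES_PER_DAY and departure_minutes are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'A': [700, 403, 2359], 'B': [70, 656, 2]})

MINUTES_PER_DAY = 24 * 60

# HHMM integer -> minutes since midnight (e.g. 403 -> 4*60 + 3 = 243)
departure_minutes = df['A'] // 100 * 60 + df['A'] % 100

# Arrival is on a later day if departure + elapsed reaches midnight or beyond
df['D'] = departure_minutes + df['B'] >= MINUTES_PER_DAY
print(df)
```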
I want to calculate the difference in days between this pandas date series:
0 2013-02-16
1 2013-01-29
2 2013-02-21
3 2013-02-22
4 2013-03-01
5 2013-03-14
6 2013-03-18
7 2013-03-21
and today's date.
I tried but could not come up with a logical solution.
Please help me with the code. I am new to Python, and I keep running into syntax errors whenever I apply a function.
You could do something like
# generate time data
data = pd.to_datetime(pd.Series(["2018-09-1", "2019-01-25", "2018-10-10"]))
pd.to_datetime("now") > data
returns:
0 False
1 True
2 False
You could then use that to select the data:
data[pd.to_datetime("now") > data]
Hope it helps.
Edit: I misread the question, but you can easily alter this example to calculate the difference:
data - pd.to_datetime("now")
returns:
0 -122 days +13:10:37.489823
1 24 days 13:10:37.489823
2 -83 days +13:10:37.489823
dtype: timedelta64[ns]
You can try the following:
>>> from datetime import datetime
>>> df
col1
0 2013-02-16
1 2013-01-29
2 2013-02-21
3 2013-02-22
4 2013-03-01
5 2013-03-14
6 2013-03-18
7 2013-03-21
Make sure to convert the column to datetime with to_datetime (the infer_datetime_format argument is deprecated since pandas 2.0, which infers the format by default):
>>> df['col1'] = pd.to_datetime(df['col1'])
Set the current datetime so you can compute the difference:
>>> curr_time = pd.to_datetime("now")
Now get the difference as follows:
>>> df['col1'] - curr_time
0 -2145 days +07:48:48.736939
1 -2163 days +07:48:48.736939
2 -2140 days +07:48:48.736939
3 -2139 days +07:48:48.736939
4 -2132 days +07:48:48.736939
5 -2119 days +07:48:48.736939
6 -2115 days +07:48:48.736939
7 -2112 days +07:48:48.736939
Name: col1, dtype: timedelta64[ns]
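Since the question asks for a number of days, the .dt.days accessor turns the timedelta result into whole days. A sketch, assuming you want positive numbers for past dates (hence today minus the date) and want to compare against the start of the day; a fixed reference date and a hypothetical days_ago column are used so the output is reproducible:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['2013-02-16', '2013-01-29', '2013-02-21', '2013-02-22',
                            '2013-03-01', '2013-03-14', '2013-03-18', '2013-03-21']})
df['col1'] = pd.to_datetime(df['col1'])

# Fixed reference for reproducibility; use pd.Timestamp('today').normalize()
# to compare against the start of the current day
today = pd.Timestamp('2013-03-21')

# Subtracting gives timedeltas; .dt.days extracts the whole-day part as integers
df['days_ago'] = (today - df['col1']).dt.days
print(df)
```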
With NumPy you can solve it as in difference-two-dates-days-weeks-months-years-pandas-python-2. Bottom line:
df['diff_days'] = df['First dates column'] - df['Second Date column']
# for days use 'D', for weeks use 'W'; months ('M') and years ('Y') have no fixed length, so support for them varies between pandas versions
df['diff_days']=df['diff_days']/np.timedelta64(1,'D')
print(df)
If you want the days as an int rather than a float, use:
df['diff_days']=df['diff_days']//np.timedelta64(1,'D')
From the pandas docs under Converting To Timestamps you will find:
"Converting to Timestamps To convert a Series or list-like object of date-like objects e.g. strings, epochs, or a mixture, you can use the to_datetime function"
I haven't used pandas before, but this suggests that your pandas date series (a list-like object) is iterable. Once it has been converted with to_datetime, each element of the series is a pandas Timestamp, which has a to_pydatetime method.
Assuming that is correct, the following function takes such a series and returns a list of timedeltas (datetime.timedelta objects representing the difference between two datetimes).
from datetime import datetime

def convert(pandas_series):
    # get the current date and time
    now = datetime.now()
    # Use a list comprehension; each element of a datetime64 Series is a pandas
    # Timestamp, which to_pydatetime() converts to a Python datetime.
    return [now - pandas_element.to_pydatetime() for pandas_element in pandas_series]

# assuming 'some_pandas_series' is a pandas Series of datetimes
list_of_timedeltas = convert(some_pandas_series)
I have a data frame in pandas which includes number of days since an event occurred. I want to create a new column that calculates the date of the event by subtracting the number of days from the current date. Every time I attempt to apply pd.offsets.Day or pd.Timedelta I get an error stating that Series are an unsupported type. This also occurs when I use apply. When I use map I receive a runtime error saying "maximum recursion depth exceeded while calling a Python object".
For example, assume my data frame looked like this:
index days_since_event
0 5
1 7
2 3
3 6
4 0
I want to create a new column with the date of the event, so my expected outcome (using today's date of 12/29/2015) is:
index days_since_event event_date
0 5 2015-12-24
1 7 2015-12-22
2 3 2015-12-26
3 6 2015-12-23
4 0 2015-12-29
I have attempted multiple ways to do this, but have received errors for each.
One method I tried was:
now = pd.datetime.date(pd.datetime.now())
df['event_date'] = now - df.days_since_event.apply(pd.offsets.Day)
With this I received an error saying that Series are an unsupported type.
I tried the above with .map instead of .apply, and received the error that "maximum recursion depth exceeded while calling a Python object".
I also attempted to convert the days into timedelta, such as:
df.days_since_event = (dt.timedelta(days = df.days_since_event)).apply
This also received an error referencing the series being an unsupported type.
First, to convert the column with integers to a timedelta, you can use to_timedelta:
In [60]: pd.to_timedelta(df['days_since_event'], unit='D')
Out[60]:
0 5 days
1 7 days
2 3 days
3 6 days
4 0 days
Name: days_since_event, dtype: timedelta64[ns]
Then you can create a new column with the current date and subtract those timedeltas:
In [62]: df['event_date'] = pd.Timestamp('2015-12-29')
In [63]: df['event_date'] = df['event_date'] - pd.to_timedelta(df['days_since_event'], unit='D')
In [64]: df['event_date']
Out[64]:
0 2015-12-24
1 2015-12-22
2 2015-12-26
3 2015-12-23
4 2015-12-29
dtype: datetime64[ns]
To follow up on joris' response: you can convert an int or a float into whatever time unit you want with pd.to_timedelta(x, unit=''), changing only the entry for unit=:
# Years, Months, Days:
# 'Y' and 'M' are not fixed-length units; recent pandas versions reject them with a ValueError.
# The values shown for them correspond to 3 years and 3 months (365.2425 and 30.436875 days each), not 3.5.
pd.to_timedelta(3.5, unit='Y') # returned '1095 days 17:27:36' in older pandas
pd.to_timedelta(3.5, unit='M') # returned '91 days 07:27:18' in older pandas
pd.to_timedelta(3.5, unit='D') # returns '3 days 12:00:00'
# Hours, Minutes, Seconds:
pd.to_timedelta(3.5, unit='h') # returns '0 days 03:30:00'
pd.to_timedelta(3.5, unit='m') # returns '0 days 00:03:30'
pd.to_timedelta(3.5, unit='s') # returns '0 days 00:00:03.500000'
Note that arithmetic works once the values are Timedelta objects:
pd.to_timedelta(3.5, unit='h') - pd.to_timedelta(3.25, unit='h') # returns '0 days 00:15:00'
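Going the other way, dividing one Timedelta by another gives a plain float, which is a convenient way to express a duration in a chosen unit; total_seconds() does the same for seconds. A short sketch:

```python
import pandas as pd

duration = pd.to_timedelta(3.5, unit='h')

# Timedelta / Timedelta -> float, i.e. the duration measured in that unit
minutes = duration / pd.Timedelta(minutes=1)
seconds = duration.total_seconds()

print(minutes)  # 210.0
print(seconds)  # 12600.0
```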