I have the following table:
DayTime
1 days 19:55:00
134 days 15:34:00
How can I convert the DayTime to a total number of days? That is, the hours should be converted to days (divided by 24).
You can convert Timedeltas to numerical units of time by dividing by units of Timedelta. For instance,
import pandas as pd
df = pd.DataFrame({'DayTime':['1 days 19:55:00', '134 days 15:34:00']})
df['DayTime'] = pd.to_timedelta(df['DayTime'])
days = df['DayTime'] / pd.Timedelta(hours=24)
print(days)
yields
0 1.829861
1 134.648611
Name: DayTime, dtype: float64
Note that above I'm assuming that 1 day = 24 hours. That's not always exactly true. Some days are 24 hours + 1 leap second long.
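Equivalently, under the same 24-hour-day assumption, you could go through total_seconds(); a minimal sketch:
# same result as dividing by pd.Timedelta(hours=24)
days = df['DayTime'].dt.total_seconds() / 86400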
Without using pandas, in Python 2.7 (in Python 3, timedeltas can be divided directly):
import re
from datetime import timedelta

def full_days(day_time):
    d, h, m, s = map(int, re.split(r'\D+', day_time))
    delta = timedelta(hours=h, minutes=m, seconds=s)
    return d + delta.total_seconds() / timedelta(days=1).total_seconds()

print full_days('1 days 19:55:00')
print full_days('0 days 43:55:00')
print full_days('134 days 15:34:00')
Outputs:
1.82986111111
1.82986111111
134.648611111
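For reference, here is a sketch of the direct division mentioned above for Python 3 (plain datetime, no pandas; the function name is illustrative, not from the original post):
from datetime import timedelta

def full_days_py3(delta):
    # In Python 3, dividing one timedelta by another yields a float
    return delta / timedelta(days=1)

print(full_days_py3(timedelta(days=1, hours=19, minutes=55)))    # 1.8298611111111112
print(full_days_py3(timedelta(days=134, hours=15, minutes=34)))  # 134.64861111111111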
Related
I have a dataframe df and its first column is timedelta64
df.info():
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 686 entries, 0 to 685
Data columns (total 6 columns):
0 686 non-null timedelta64[ns]
1 686 non-null object
2 686 non-null object
3 686 non-null object
4 686 non-null object
5 686 non-null object
If I print(df[0][2]), for example, it will give me 0 days 05:01:11. However, I don't want the 0 days field; I only want 05:01:11 to be printed. Could someone teach me how to do this? Thanks so much!
It is possible with:
df['duration1'] = df['duration'].astype(str).str[-18:-10]
But this solution is not general: if the input is 3 days 05:01:11, it removes the 3 days too.
So it only works correctly for timedeltas of less than one day.
A more general solution is to create a custom format:
import numpy as np
import pandas as pd

N = 10
np.random.seed(11230)
rng = pd.date_range('2017-04-03 15:30:00', periods=N, freq='13.5H')
df = pd.DataFrame({'duration': np.abs(np.random.choice(rng, size=N) -
                                      np.random.choice(rng, size=N))})

df['duration1'] = df['duration'].astype(str).str[-18:-10]

def f(x):
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return '{}:{:02d}:{:02d}'.format(int(hours), int(minutes), int(seconds))

df['duration2'] = df['duration'].apply(f)
print (df)
          duration duration1 duration2
0  2 days 06:00:00  06:00:00  54:00:00
1  2 days 19:30:00  19:30:00  67:30:00
2  1 days 03:00:00  03:00:00  27:00:00
3  0 days 00:00:00  00:00:00   0:00:00
4  4 days 12:00:00  12:00:00 108:00:00
5  1 days 03:00:00  03:00:00  27:00:00
6  0 days 13:30:00  13:30:00  13:30:00
7  1 days 16:30:00  16:30:00  40:30:00
8  0 days 00:00:00  00:00:00   0:00:00
9  1 days 16:30:00  16:30:00  40:30:00
Here's a short and robust version using apply():
df['timediff_string'] = df['timediff'].apply(
lambda x: f'{x.components.hours:02d}:{x.components.minutes:02d}:{x.components.seconds:02d}'
if not pd.isnull(x) else ''
)
This leverages the components attribute of pandas Timedelta objects and also handles empty values (NaT).
If the timediff column does not contain pandas Timedelta objects, you can convert it:
df['timediff'] = pd.to_timedelta(df['timediff'])
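Note that this drops the days component entirely. If you would rather fold the days into the hours (the 54:00:00 style shown above), a possible variation of the same idea is:
df['timediff_string'] = df['timediff'].apply(
    lambda x: f'{x.components.days * 24 + x.components.hours}:{x.components.minutes:02d}:{x.components.seconds:02d}'
    if not pd.isnull(x) else ''
)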
datetime.timedelta already formats the way you'd like. The crux of this issue is that Pandas internally converts to numpy.timedelta64.
import pandas as pd
from datetime import timedelta
time_1 = timedelta(days=3, seconds=3400)
time_2 = timedelta(days=0, seconds=3400)
print(time_1)
print(time_2)
times = pd.Series([time_1, time_2])
# Times are converted to Numpy timedeltas.
print(times)
# Convert to string after converting to datetime.timedelta.
times = times.apply(
lambda numpy_td: str(timedelta(seconds=numpy_td.total_seconds())))
print(times)
So, convert to a datetime.timedelta and then str (to prevent conversion back to numpy.timedelta) before printing.
3 days, 0:56:40
0:56:40
0   3 days 00:56:40
1   0 days 00:56:40
dtype: timedelta64[ns]
0    3 days, 0:56:40
1            0:56:40
dtype: object
I came here looking for answers to the same question, so I felt I should add further clarification. : )
You can convert it into a Python timedelta, then to str and finally back to a Series:
pd.Series(df["duration"].dt.to_pytimedelta().astype(str), name="start_time")
Given OP is ok with an object column (a little verbose):
def splitter(td):
    td = str(td).split(' ')[-1:][0]
    return td
df['split'] = df['timediff'].apply(splitter)
Basically we're taking the timedelta column, transforming the contents to a string, then splitting the string (creates a list) and taking the last item of that list, which would be the hh:mm:ss component.
Note that specifying ' ' for what to split by is redundant here.
Alternative one liner:
df['split2'] = df['timediff'].astype('str').str.split().str[-1]
which is very similar, but not very pretty IMHO. Also, the output includes milliseconds, which is not the case in the first solution. I'm not sure what the reason for that is (please comment if you do). If your data is big it might be worthwhile to time these different approaches.
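For example, a quick comparison could look like this (just a sketch; it assumes the timedelta column df['timediff'] and the splitter function from above are defined):
import timeit

# Rough timing of the two approaches over 100 repetitions each
print(timeit.timeit(lambda: df['timediff'].apply(splitter), number=100))
print(timeit.timeit(lambda: df['timediff'].astype('str').str.split().str[-1], number=100))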
If you want to remove all zero components (not only days), you can do it like this:
def pd_td_fmt(td):
    import pandas as pd
    abbr = {'days': 'd', 'hours': 'h', 'minutes': 'min', 'seconds': 's', 'milliseconds': 'ms',
            'microseconds': 'us', 'nanoseconds': 'ns'}
    fmt = lambda td: "".join(f"{v}{abbr[k]}" for k, v in td.components._asdict().items() if v != 0)
    if isinstance(td, pd.Timedelta):
        return fmt(td)
    elif isinstance(td, pd.TimedeltaIndex):
        return td.map(fmt)
    else:
        raise ValueError
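For example, with a hypothetical value:
import pandas as pd

print(pd_td_fmt(pd.Timedelta('3 days 05:01:11')))  # 3d5h1min11s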
If you can be sure that your timedelta is less than a day, this might work. To do this in as few lines as possible, I convert the timedelta to a datetime by adding the Unix epoch 0 and then use the datetime .dt accessor to format it as a string.
df['duration1'] = (df['duration'] + pd.to_datetime(0)).dt.strftime('%H:%M:%S')
My DataFrame looks like this:
id        date  value
 1  2021-07-16    100
 2  2021-09-15     20
 1  2021-04-10     50
 1  2021-08-27     30
 2  2021-07-22     15
 2  2021-07-22     25
 1  2021-06-30     40
 3  2021-10-11    150
 2  2021-08-03     15
 1  2021-07-02     90
I want to group by id and return the difference between the total values in two 90-day periods.
Specifically, I want the sum of values over the last 90 days counted from today, and over the last 90 days counted from 30 days ago.
For example, considering today is 2021-10-13, I would like to get:
the sum of all values per id between 2021-10-13 and 2021-07-15
the sum of all values per id between 2021-09-13 and 2021-06-15
And finally, subtract them to get the variation.
I've already managed to calculate it by creating separate temporary dataframes containing only the dates in those 90-day periods, grouping by id, and then merging the temp dataframes into a final one.
But I guess there should be an easier or simpler way to do it. Appreciate any help!
Btw, sorry if the explanation was a little messy.
If I understood correctly, you need something like this:
import pandas as pd
import datetime
## Calculate the dates we are going to need.
today = datetime.datetime.now()

# Date 120 days ago
hundredTwentyDaysAgo = today - datetime.timedelta(days=120)
# Date 90 days ago
ninetyDaysAgo = today - datetime.timedelta(days=90)
# Date 30 days ago
thirtyDaysAgo = today - datetime.timedelta(days=30)

## Initialize an example df.
df = pd.DataFrame({"id": [1, 2, 1, 1, 2, 2, 1, 3, 2, 1],
                   "date": ["2021-07-16", "2021-09-15", "2021-04-10", "2021-08-27", "2021-07-22",
                            "2021-07-22", "2021-06-30", "2021-10-11", "2021-08-03", "2021-07-02"],
                   "value": [100, 20, 50, 30, 15, 25, 40, 150, 15, 90]})

## Cast the date column
df['date'] = pd.to_datetime(df['date']).dt.date

grouped = df.groupby('id')

# Sum of the last 90 days, per id
ninetySum = grouped.apply(lambda x: x[x['date'] >= ninetyDaysAgo.date()]['value'].sum())

# Sum of the 90-day window ending 30 days ago, per id
hundredTwentySum = grouped.apply(lambda x: x[(x['date'] >= hundredTwentyDaysAgo.date()) &
                                             (x['date'] <= thirtyDaysAgo.date())]['value'].sum())
The output is
ninetySum - hundredTwentySum
id
1 -130
2 20
3 150
dtype: int64
You can double check to make sure these are the numbers you wanted by printing ninetySum and hundredTwentySum variables.
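A more compact variant of the same idea, using boolean masks instead of groupby.apply (a sketch based on the df and date variables above; note that ids absent from both windows simply won't appear in the result):
mask_last90 = df['date'] >= ninetyDaysAgo.date()
mask_prev90 = (df['date'] >= hundredTwentyDaysAgo.date()) & (df['date'] <= thirtyDaysAgo.date())

variation = (df[mask_last90].groupby('id')['value'].sum()
             .sub(df[mask_prev90].groupby('id')['value'].sum(), fill_value=0))
print(variation)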
I have a column with many dates; a sample of the list is shown below:
Dates
1 2019-02-01
2 2018-03-10
3 2019-08-01
4 2020-02-07
I would like to be able to input a date of any year and get its week number.
However, the fiscal year starts on Aug 1 of any given year.
I tried just shifting the date to Jan 1 but it's different for every year due to leap years.
data['Dates'] = pd.to_datetime(data['Dates'])
data['Week'] = (data['Dates'] - timedelta(days=215)).week
print(data)
How can I get a result similar to the one below?
Dates Week
1 2019-02-01 27
2 2018-03-10 32
3 2019-08-01 1
4 2020-02-07 28
Note: the weeks shown are probably incorrect.
The other answer ignores the fiscal year part of the OP. I am leaving the fiscal year start date calc to the reader but this will calculate the week number (where Monday is the start of the week) from an arbitrary start date.
from dateutil import relativedelta
from datetime import date, datetime, timedelta
NEXT_MONDAY = relativedelta.relativedelta(weekday=relativedelta.MO)
LAST_MONDAY = relativedelta.relativedelta(weekday=relativedelta.MO(-1))
ONE_WEEK = timedelta(weeks=1)
def week_in_fiscal_year(d: date, fiscal_year_start: date) -> int:
    fy_week_2_monday = fiscal_year_start + NEXT_MONDAY
    if d < fy_week_2_monday:
        return 1
    else:
        cur_week_monday = d + LAST_MONDAY
        return int((cur_week_monday - fy_week_2_monday) / ONE_WEEK) + 2
adapted from this post
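As a usage sketch, the fiscal-year-start calculation could be filled in like this (an assumption for illustration, taking Aug 1 of the current or previous calendar year):
import pandas as pd

data['Dates'] = pd.to_datetime(data['Dates'])
data['Week'] = data['Dates'].apply(
    lambda d: week_in_fiscal_year(
        d.date(),
        # fiscal year starts on Aug 1 of the current or previous calendar year
        date(d.year if (d.month, d.day) >= (8, 1) else d.year - 1, 8, 1)))
print(data)
With the sample dates from the question this yields weeks 27, 32, 1 and 28.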
Convert it to a datetime, then call datetime.date(2010, 6, 16).strftime("%V").
You can also use isocalendar, which returns a tuple rather than a string: datetime.date(2010, 6, 16).isocalendar()[1]
How to get week number in Python?
A column in my pandas data frame represents a time delta that I calculated with datetime then exported into a csv and read back into a pandas data frame. Now the column's dtype is object whereas I want it to be a timedelta so I can perform a groupby function on the dataframe. Below is what the strings look like. Thanks!
0 days 00:00:57.416000
0 days 00:00:12.036000
0 days 16:46:23.127000
49 days 00:09:30.813000
50 days 00:39:31.306000
55 days 12:39:32.269000
-1 days +22:03:05.256000
Update, my best attempt at writing a for-loop to iterate over a specific column in my pandas dataframe:
def delta(i):
    days, timestamp = i.split(" days ")
    timestamp = timestamp[:len(timestamp)-7]
    t = datetime.datetime.strptime(timestamp, "%H:%M:%S") + \
        datetime.timedelta(days=int(days))
    delta = datetime.timedelta(days=t.day, hours=t.hour,
                               minutes=t.minute, seconds=t.second)
    delta.total_seconds()

data['diff'].map(delta)
Use pd.to_timedelta
pd.to_timedelta(df.iloc[:, 0])
0 0 days 00:00:57.416000
1 0 days 00:00:12.036000
2 0 days 16:46:23.127000
3 49 days 00:09:30.813000
4 50 days 00:39:31.306000
5 55 days 12:39:32.269000
6 -1 days +22:03:05.256000
Name: 0, dtype: timedelta64[ns]
import datetime

# Parse your string
days, timestamp = "55 days 12:39:32.269000".split(" days ")
timestamp = timestamp[:len(timestamp)-7]

# Generate a datetime object for the time-of-day part
t = datetime.datetime.strptime(timestamp, "%H:%M:%S")

# Generate a timedelta (use the parsed day count directly; t.day would be
# wrong once the added days roll past the end of a month)
delta = datetime.timedelta(days=int(days), hours=t.hour, minutes=t.minute, seconds=t.second)

# Represent in seconds
delta.total_seconds()
You could do something like this, looping through each value from the CSV in place of stringdate:
import datetime

stringdate = "2 days 00:00:57.416000"
days_v_hms = stringdate.split('days')
hms = days_v_hms[1].split(':')
dt = datetime.timedelta(days=int(days_v_hms[0]), hours=int(hms[0]), minutes=int(hms[1]), seconds=float(hms[2]))
Cheers!
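Wrapped into a function and applied to the whole column, that might look like this (a sketch; to_td is a hypothetical helper and 'diff' is the column name used in the question's loop):
import datetime
import pandas as pd

def to_td(stringdate):
    days_v_hms = stringdate.split('days')
    hms = days_v_hms[1].split(':')
    return datetime.timedelta(days=int(days_v_hms[0]), hours=int(hms[0]),
                              minutes=int(hms[1]), seconds=float(hms[2]))

data['diff'] = data['diff'].apply(to_td)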
I am new to Pandas time series and dataframes and am struggling to get this simple task done.
I have a dataset "data" (1-dimensional float32-Numpy array) for each day from 1/1/2004 - 12/31/2008. The dates are stored as a list of datetime objects "dates".
Basically, I would like to calculate a complete "standard year" - the average value of each day of all years (1-365).
I started from this similar (?) question (Getting the average of a certain hour on weekdays over several years in a pandas dataframe), but could not get to the desired result - a time series of 365 "average" days, e.g. the average of all four 1st of January's, 2nd of January's ...
A small example script:
import numpy as np
import pandas as pd
import datetime
startdate = datetime.datetime(2004, 1, 1)
enddate = datetime.datetime(2008, 1, 1)
days = (enddate + datetime.timedelta(days=1) - startdate).days
data = np.random.random(days)
dates = [startdate + datetime.timedelta(days=x) for x in range(0, days)]
ts = pd.Series(data, dates)
test = ts.groupby(lambda x: (x.year, x.day)).mean()
Group by the month and day, rather than the year and day:
test = ts.groupby([ts.index.month, ts.index.day]).mean()
yields
1 1 0.499264
2 0.449357
3 0.498883
...
12 17 0.408180
18 0.317682
19 0.467238
...
29 0.413721
30 0.399180
31 0.828423
Length: 366, dtype: float64
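If you want exactly 365 entries, one possible refinement (a sketch; the assumption is that leap days should simply be dropped) is to exclude Feb 29 before grouping:
# Drop Feb 29 so the averaged "standard year" has 365 days
ts_no_leap = ts[~((ts.index.month == 2) & (ts.index.day == 29))]
test = ts_no_leap.groupby([ts_no_leap.index.month, ts_no_leap.index.day]).mean()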