How to convert a Python date into an integer - python

Struggling with something that should be easy:
today = '26/8/2018'
start = '1/8/2018'
diff = today - start
diff gives us 26 days
How do I take the integer value of this timedelta, i.e. 26?
Basically, I'm trying to calculate a daycount fraction, say (diff / 365) * 10,000, but it won't work.
My actual values are:
0 304.548
1 371.397
2 350.466
3 -3574.36
4 255.452
and I'm trying to multiply them by:
duration
0 13 days
1 2 days
2 1 days
3 20 days
4 7 days
But I get:
0 TimedeltaIndex(['3959 days 02:57:32.054794', ...
1 TimedeltaIndex([ '4828 days 03:56:42.739725', ...
2 TimedeltaIndex([ '4556 days 01:18:54.246575', ...
3 TimedeltaIndex(['-46467 days +08:52:36.164383'...
4 TimedeltaIndex(['3320 days 21:02:27.945204', ...
The desired output is
0 3959.124 (i.e. 304.548 * 13) as a plain number, not as a day count

Perhaps something like this might work:
In [1]: import datetime
In [4]: diff = datetime.datetime.today() - datetime.datetime(year=2018, month=8, day=1)
In [5]: diff.days
Out[5]: 25
Then you can do something like:
In [10]: diff.days / 365 * 10000
Out[10]: 684.931506849315
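If the values live in a pandas DataFrame like the one in the question, the same idea applies column-wise: pull the whole-day count out of the timedeltas with the .dt.days accessor before multiplying. A minimal sketch, with hypothetical column names value and duration (the question does not show the real ones):
import pandas as pd

# Hypothetical column names -- adjust to the real DataFrame.
df = pd.DataFrame({
    'value': [304.548, 371.397, 350.466, -3574.36, 255.452],
    'duration': pd.to_timedelta([13, 2, 1, 20, 7], unit='D'),
})

# .dt.days returns plain integers, so the product is a number, not a timedelta
df['product'] = df['value'] * df['duration'].dt.days
print(df['product'].head(2))  # 0    3959.124
                              # 1     742.794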

Related

Convert multiple time format objects to datetime format

I have a dataframe with a column of time values stored as object that I need to convert to datetime. The issue is that they are not all in the same format, so when I try:
df['Total call time'] = pd.to_datetime(df['Total call time'], format='%H:%M:%S')
it gives me an error
ValueError: time data '3:22' does not match format '%H:%M:%S' (match)
or if I use this code
df['Total call time'] = pd.to_datetime(df['Total call time'], format='%H:%M')
I get this error
ValueError: unconverted data remains: :58
These are the values on my data
Total call time
2:04:07
3:22:41
2:30:41
2:19:06
1:45:55
1:30:08
1:32:15
1:43:28
**45:48**
1:41:40
5:08:37
**3:22**
4:29:05
2:47:25
2:39:29
2:29:32
2:09:52
3:31:57
2:27:58
2:34:28
3:14:10
2:12:10
2:46:58
times = """\
2:04:07
3:22:41
2:30:41
2:19:06
1:45:55
1:30:08
1:32:15
1:43:28
45:48
1:41:40
5:08:37
3:22
4:29:05
2:47:25
2:39:29
2:29:32
2:09:52
3:31:57
2:27:58
2:34:28
3:14:10
2:12:10
2:46:58""".split()
import pandas as pd
df = pd.DataFrame(times, columns=['elapsed'])
def pad(s):
    # 'M:SS'  (4 chars) -> '00:0M:SS'
    if len(s) == 4:
        return '00:0' + s
    # 'MM:SS' (5 chars) -> '00:MM:SS'
    elif len(s) == 5:
        return '00:' + s
    # already 'H:MM:SS' or 'HH:MM:SS'
    return s
print(pd.to_timedelta(df['elapsed'].apply(pad)))
Output:
0 0 days 02:04:07
1 0 days 03:22:41
2 0 days 02:30:41
3 0 days 02:19:06
4 0 days 01:45:55
5 0 days 01:30:08
6 0 days 01:32:15
7 0 days 01:43:28
8 0 days 00:45:48
9 0 days 01:41:40
10 0 days 05:08:37
11 0 days 00:03:22
12 0 days 04:29:05
13 0 days 02:47:25
14 0 days 02:39:29
15 0 days 02:29:32
16 0 days 02:09:52
17 0 days 03:31:57
18 0 days 02:27:58
19 0 days 02:34:28
20 0 days 03:14:10
21 0 days 02:12:10
22 0 days 02:46:58
Name: elapsed, dtype: timedelta64[ns]
As an alternative to grovina's answer, instead of using apply you can use the dt accessor directly.
Here's a sample:
>>> data = [['2017-12-01'], ['2017-12-30'], ['2018-01-01']]
>>> df = pd.DataFrame(data=data, columns=['date'])
>>> df
date
0 2017-12-01
1 2017-12-30
2 2018-01-01
>>> df.date
0 2017-12-01
1 2017-12-30
2 2018-01-01
Name: date, dtype: object
Note how df.date is an object? Let's turn it into a date like you want
>>> df.date = pd.to_datetime(df.date)
>>> df.date
0 2017-12-01
1 2017-12-30
2 2018-01-01
Name: date, dtype: datetime64[ns]
The format you want is for string formatting. I don't think you'll be able to convert the actual datetime64 to look like that format. For now, let's make a newly formatted string version of your date in a separate column
>>> df['new_formatted_date'] = df.date.dt.strftime('%d/%m/%y %H:%M')
>>> df.new_formatted_date
0 01/12/17 00:00
1 30/12/17 00:00
2 01/01/18 00:00
Name: new_formatted_date, dtype: object
Finally, since the df.date column is now datetime64, you can use the dt accessor right on it. No need to use apply
>>> df['month'] = df.date.dt.month
>>> df['day'] = df.date.dt.day
>>> df['year'] = df.date.dt.year
>>> df['hour'] = df.date.dt.hour
>>> df['minute'] = df.date.dt.minute
>>> df
        date new_formatted_date  month  day  year  hour  minute
0 2017-12-01     01/12/17 00:00     12    1  2017     0       0
1 2017-12-30     30/12/17 00:00     12   30  2017     0       0
2 2018-01-01     01/01/18 00:00      1    1  2018     0       0
Another idea is to test whether a value contains two colons and, if not, pad it before converting to timedeltas with to_timedelta. It also tests whether the number before the first colon is greater than 23: if it is, the value is parsed as MM:SS (prepend '00:'); otherwise it is parsed as HH:MM (append ':00'):
import numpy as np

m1 = df['Total call time'].str.count(':').ne(2)
m2 = df['Total call time'].str.extract(r'^(\d+):', expand=False).astype(float).gt(23)
s = np.select([m1 & m2, m1 & ~m2],
              ['00:' + df['Total call time'], df['Total call time'] + ':00'],
              df['Total call time'])
df['Total call time'] = pd.to_timedelta(s)
print (df)
Total call time
0 0 days 02:04:07
1 0 days 03:22:41
2 0 days 02:30:41
3 0 days 02:19:06
4 0 days 01:45:55
5 0 days 01:30:08
6 0 days 01:32:15
7 0 days 01:43:28
8 0 days 00:45:48
9 0 days 01:41:40
10 0 days 05:08:37
11 0 days 03:22:00
12 0 days 04:29:05
13 0 days 02:47:25
14 0 days 02:39:29
15 0 days 02:29:32
16 0 days 02:09:52
17 0 days 03:31:57
18 0 days 02:27:58
19 0 days 02:34:28
20 0 days 03:14:10
21 0 days 02:12:10
22 0 days 02:46:58

Pandas read format %D:%H:%M:%S with python

Currently I am reading in a data frame with film timestamps of the form 00(days):00(hours, rolling over to a day at 24):00(min):00(sec).
pandas reads the time formats HH:MM:SS and YYYY-MM-DD HH:MM:SS fine.
But is there a way of having pandas read a duration of time such as DD:HH:MM:SS?
Alternatively, using timedelta, how would I go about folding the DD into the HH in the data frame, so that pandas can render it as "1 day HH:MM:SS", for example?
Data sample
00:00:00:00
00:07:33:57
02:07:02:13
00:00:13:11
00:00:10:11
00:00:00:00
00:06:20:06
01:12:13:25
Expected output for last sample
36:13:25
Thanks
If you want timedelta objects, a simple way is to replace the first colon with 'days ':
df['timedelta'] = pd.to_timedelta(df['col'].str.replace(':', 'days ', n=1))
output:
col timedelta
0 00:00:00:00 0 days 00:00:00
1 00:07:33:57 0 days 07:33:57
2 02:07:02:13 2 days 07:02:13
3 00:00:13:11 0 days 00:13:11
4 00:00:10:11 0 days 00:10:11
5 00:00:00:00 0 days 00:00:00
6 00:06:20:06 0 days 06:20:06
7 01:12:13:25 1 days 12:13:25
>>> df.dtypes
col object
timedelta timedelta64[ns]
dtype: object
From there it's also relatively easy to combine the days and hours back into a string:
c = df['timedelta'].dt.components
df['str_format'] = ((c['hours'] + c['days'] * 24).astype(str)
                    + df['col'].str.split('(?=:)', n=2).str[-1]).str.zfill(8)
output:
col timedelta str_format
0 00:00:00:00 0 days 00:00:00 00:00:00
1 00:07:33:57 0 days 07:33:57 07:33:57
2 02:07:02:13 2 days 07:02:13 55:02:13
3 00:00:13:11 0 days 00:13:11 00:13:11
4 00:00:10:11 0 days 00:10:11 00:10:11
5 00:00:00:00 0 days 00:00:00 00:00:00
6 00:06:20:06 0 days 06:20:06 06:20:06
7 01:12:13:25 1 days 12:13:25 36:13:25
Convert the days separately, add them to the times, and finally apply a custom formatting function:
def f(x):
    # format a Timedelta as H:MM:SS with no day component
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return '{}:{:02d}:{:02d}'.format(int(hours), int(minutes), int(seconds))
d = pd.to_timedelta(df['col'].str[:2].astype(int), unit='d')
td = pd.to_timedelta(df['col'].str[3:])
df['col'] = d.add(td).apply(f)
print (df)
col
0 0:00:00
1 7:33:57
2 55:02:13
3 0:13:11
4 0:10:11
5 0:00:00
6 6:20:06
7 36:13:25

How to turn multiple values within a period in a pandas df into their corresponding "weight" for that period?

So I have a pandas table like this:
DateTime        Score
Day 1 - Time 1  4
Day 1 - Time 2  3
Day 1 - Time 3  3
Day 2 - Time 1  10
Day 2 - Time 2  10
Day 3 - Time 1  184
Day 4 ...       ...
I want to turn it into this
DateTime        Score  DailyScoreWeight
Day 1 - Time 1  4      0.4
Day 1 - Time 2  3      0.3
Day 1 - Time 3  3      0.3
Day 2 - Time 1  10     0.5
Day 2 - Time 2  10     0.5
Day 3 - Time 1  184    1.0
Day 4 ...       ...    ...
Use GroupBy.transform with 'sum' to get a new Series of daily totals, then divide the original column by it:
df['DateTime'] = pd.to_datetime(df['DateTime'])
s = df.groupby(pd.Grouper(freq='d', key='DateTime'))['Score'].transform('sum')
df['new'] = df['Score'].div(s)
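A minimal runnable sketch of the same answer, assuming the DateTime column holds real timestamps that pd.to_datetime can parse (the "Day 1 - Time 1" labels above are placeholders, so the dates below are made up):
import pandas as pd

# Made-up timestamps standing in for "Day 1 - Time 1", "Day 1 - Time 2", ...
df = pd.DataFrame({
    'DateTime': ['2021-01-01 08:00', '2021-01-01 12:00', '2021-01-01 18:00',
                 '2021-01-02 09:00', '2021-01-02 15:00', '2021-01-03 10:00'],
    'Score': [4, 3, 3, 10, 10, 184],
})
df['DateTime'] = pd.to_datetime(df['DateTime'])

# Daily total aligned to each row, then each row's share of its day's total.
s = df.groupby(pd.Grouper(freq='d', key='DateTime'))['Score'].transform('sum')
df['DailyScoreWeight'] = df['Score'].div(s)
print(df['DailyScoreWeight'].tolist())  # [0.4, 0.3, 0.3, 0.5, 0.5, 1.0]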

Group by id and calculate variation in sales based on the date

My DataFrame looks like this:
id  date        value
1   2021-07-16  100
2   2021-09-15  20
1   2021-04-10  50
1   2021-08-27  30
2   2021-07-22  15
2   2021-07-22  25
1   2021-06-30  40
3   2021-10-11  150
2   2021-08-03  15
1   2021-07-02  90
I want to group by the id and return the difference in total value over a 90-day period.
Specifically, I want the values of last 90 days based on today, and based on 30 days ago.
For example, considering today is 2021-10-13, I would like to get:
the sum of all values per id between 2021-10-13 and 2021-07-15
the sum of all values per id between 2021-09-13 and 2021-06-15
And finally, subtract them to get the variation.
I've already managed to calculate it by creating separate temporary dataframes containing only the dates in those 90-day periods, grouping by id, and then merging the temp dataframes into a final one.
But I guess there should be an easier or simpler way to do it. Appreciate any help!
Btw, sorry if the explanation was a little messy.
If I understood correctly, you need something like this:
import pandas as pd
import datetime
## Calculate the dates we are going to need.
today = datetime.datetime.now()

# Date 120 days ago
hundredTwentyDaysAgo = today - datetime.timedelta(days=120)
# Date 90 days ago
ninetyDaysAgo = today - datetime.timedelta(days=90)
# Date 30 days ago
thirtyDaysAgo = today - datetime.timedelta(days=30)

## Initialize an example df.
df = pd.DataFrame({"id": [1, 2, 1, 1, 2, 2, 1, 3, 2, 1],
                   "date": ["2021-07-16", "2021-09-15", "2021-04-10", "2021-08-27", "2021-07-22",
                            "2021-07-22", "2021-06-30", "2021-10-11", "2021-08-03", "2021-07-02"],
                   "value": [100, 20, 50, 30, 15, 25, 40, 150, 15, 90]})
## Casting date column
df['date'] = pd.to_datetime(df['date']).dt.date
grouped = df.groupby('id')
# Sum of last 90 days per id
ninetySum = grouped.apply(lambda x: x[x['date'] >= ninetyDaysAgo.date()]['value'].sum())
# Sum of last 90 days, starting from 30 days ago per id
hundredTwentySum = grouped.apply(lambda x: x[(x['date'] >= hundredTwentyDaysAgo.date()) & (x['date'] <= thirtyDaysAgo.date())]['value'].sum())
The output is
ninetySum - hundredTwentySum
id
1 -130
2 20
3 150
dtype: int64
You can double check that these are the numbers you wanted by printing the ninetySum and hundredTwentySum variables.
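If you want to avoid the groupby/apply lambdas, a possible variation (a sketch reusing the variable names from the answer above, not part of the original answer) is to filter with boolean masks first and use a plain groupby sum:
# Same sums without apply: mask the rows first, then group (names assumed as above).
mask_recent = df['date'] >= ninetyDaysAgo.date()
mask_lagged = (df['date'] >= hundredTwentyDaysAgo.date()) & (df['date'] <= thirtyDaysAgo.date())

ninetySum = df[mask_recent].groupby('id')['value'].sum()
hundredTwentySum = df[mask_lagged].groupby('id')['value'].sum()

# fill_value=0 keeps ids that only appear in one of the two windows.
variation = ninetySum.sub(hundredTwentySum, fill_value=0)
print(variation)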

Signed time deltas to signed seconds in Pandas

Consider the following series:
> df['time_delta']
0 -1 days +00:08:11
1 0 days 01:57:46
2 0 days 00:58:34
3 0 days 17:30:23
4 -1 days +21:44:34
5 -2 days +22:01:56
6 0 days 03:18:57
7 -1 days +21:44:48
8 -1 days +00:07:56
Name: time_delta, dtype: timedelta64[ns]
Say I want to convert this timedelta to total signed seconds. That is:
Positive deltas should convert to positive seconds
Negative deltas should convert to negative seconds
For example:
0 days 00:01:05 => 65 seconds
-1 days +23:58:30 => -90 seconds
How can I get this conversion?
Failed attempt
When I try the usual:
temp_df['seconds'] = temp_df['time_delta'].dt.seconds
I end up with:
time_delta seconds
0 -1 days +00:08:11 491
1 0 days 01:57:46 7066
2 0 days 00:58:34 3514
3 0 days 17:30:23 63023
4 -1 days +21:44:34 78274
5 -2 days +22:01:56 79316
6 0 days 03:18:57 11937
7 -1 days +21:44:48 78288
8 -1 days +00:07:56 476
which correctly handled positive deltas, but not the negative ones. To see this, note that the negative deltas seem to ignore the sign of the day offset. That is, in the example above:
-1 days +21:44:48 should convert to -8112 seconds, not 78288 seconds (wrong sign and value).
If it's a Timedelta object, just divide it by Timedelta(seconds=1):
>>> pd.Timedelta(days=-1) / pd.Timedelta(seconds=1)
-86400.0
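The same division works element-wise on a whole timedelta column, giving signed floats directly (a small sketch, assuming the column name from the question):
# timedelta64 Series divided by a Timedelta yields float seconds, sign preserved.
df['seconds'] = df['time_delta'] / pd.Timedelta(seconds=1)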
Just call abs prior to dt.total_seconds to get the absolute values:
df['seconds'] = df['time_delta'].abs().dt.total_seconds()
Example:
In [63]:
import pandas as pd
import datetime as dt
df = pd.DataFrame({'date_time': pd.date_range(dt.datetime(2015,1,1,12,10,32), dt.datetime(2015,1,3,12,12,30,2))})
df['time_delta'] = df['date_time'] - dt.datetime(2015,1,2)
df
Out[63]:
date_time time_delta
0 2015-01-01 12:10:32 -1 days +12:10:32
1 2015-01-02 12:10:32 0 days 12:10:32
2 2015-01-03 12:10:32 1 days 12:10:32
In [64]:
df['time_delta'].abs().dt.total_seconds()
Out[64]:
0 42568
1 43832
2 130232
Name: time_delta, dtype: float64
To add the signs back you can compare against pd.Timedelta(0):
In [78]:
df['seconds'] = df['time_delta'].abs().dt.total_seconds()
df.loc[df['time_delta'] < pd.Timedelta(0), 'seconds'] = -df['seconds']
df
Out[78]:
date_time time_delta seconds
0 2015-01-01 12:10:32 -1 days +12:10:32 -42568
1 2015-01-02 12:10:32 0 days 12:10:32 43832
2 2015-01-03 12:10:32 1 days 12:10:32 130232
However, I think @Ami Tamory's answer is superior.
EDIT
After sleeping on this I realised that this is just dt.total_seconds:
In [137]:
df['time_delta'].dt.total_seconds()
Out[137]:
0 -42568
1 43832
2 130232
Name: time_delta, dtype: float64
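A quick check against the two examples from the question (a minimal sketch):
import pandas as pd

s = pd.Series(pd.to_timedelta(['0 days 00:01:05', '-1 days +23:58:30']))
print(s.dt.total_seconds())  # 65.0 and -90.0 -- signed, as desired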
