Is there a one- or two-line way to convert a time period string like "1h30min" to "90min" in Python/Pandas?
Come on, guys, that's too boring. Let's make it a little more challenging:
from collections import defaultdict
import re

def humantime2minutes(s):
    d = {
        'w': 7*24*60,
        'week': 7*24*60,
        'weeks': 7*24*60,
        'd': 24*60,
        'day': 24*60,
        'days': 24*60,
        'h': 60,
        'hr': 60,
        'hour': 60,
        'hours': 60,
    }
    # Any unit not listed above (e.g. 'm', 'min') counts as minutes
    mult_items = defaultdict(lambda: 1).copy()
    mult_items.update(d)
    parts = re.search(r'^(\d+)([^\d]*)', s.lower().replace(' ', ''))
    if parts:
        return int(parts.group(1)) * mult_items[parts.group(2)] + humantime2minutes(re.sub(r'^(\d+)([^\d]*)', '', s.lower()))
    else:
        return 0

print(humantime2minutes('1 week 3 days 1 hour 30 min'))
print(humantime2minutes('2h1m'))
Output:
14490
121
Now seriously, there is a pandas.Timedelta():
import pandas as pd
print(pd.Timedelta('1h30min'))
Output:
0 days 01:30:00
Very simple. Just split by the "h" and convert:
hour, minute = "1h30min".split("h")
minutes = (60 * int(hour)) + int(minute.strip("min"))
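The split above assumes the string always contains both an hour part and a minute part. Here is a minimal sketch, assuming the inputs look like "1h30min", "2h" or "45min", that tolerates a missing part. The function name to_minutes is made up for illustration:

```python
import re

def to_minutes(s):
    """Convert strings such as '1h30min', '2h' or '45min' to total minutes."""
    # Both the hour group and the minute group are optional
    match = re.fullmatch(r'(?:(\d+)h)?(?:(\d+)min)?', s.replace(' ', ''))
    if not match or not s.strip():
        raise ValueError(f'unrecognised duration: {s!r}')
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return 60 * hours + minutes

print(f"{to_minutes('1h30min')}min")  # 90min
```

Anything that does not match the assumed pattern raises a ValueError instead of silently giving a wrong number.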
You can use to_datetime with parameter format:
import pandas as pd
s = "1h30min"
s = pd.to_datetime(s, format='%Hh%Mmin')
print(s)
1900-01-01 01:30:00
print(type(s))
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
print(s.hour * 60 + s.minute)
90
print(str(s.hour * 60 + s.minute) + 'min')
90min
Other format codes can be used as well.
Related
I have a list of date ranges and want to find the total number of days between those ranges. However, the ranges may, or may not, have overlap. And I want to exclude overlapped time. There may also be gaps between the ranges which I also want to exclude.
I'm curious about the most efficient way to calculate this.
An example:
ranges = [
    {'start': '1/1/2001', 'end': '1/1/2002'},
    {'start': '1/1/2000', 'end': '1/1/2002'},
    {'start': '1/1/2003', 'end': '1/1/2004'},
]
Total range time in days: 1/1/2000 through 1/1/2002 plus 1/1/2003 through 1/1/2004
from datetime import datetime

ranges = [
    {'start': '1/1/2001', 'end': '1/1/2002'},
    {'start': '1/1/2000', 'end': '1/1/2002'},
    {'start': '1/1/2003', 'end': '1/1/2004'},
]
# Sort the list of date ranges by the start date
ranges = sorted(ranges, key=lambda x: datetime.strptime(x['start'], '%m/%d/%Y'))
# Initialize the start and end dates for the non-overlapping and non-gapped ranges
start_date = datetime.strptime(ranges[0]['start'], '%m/%d/%Y')
end_date = datetime.strptime(ranges[0]['end'], '%m/%d/%Y')
total_days = 0
# Iterate through the list of date ranges
for i in range(1, len(ranges)):
    current_start_date = datetime.strptime(ranges[i]['start'], '%m/%d/%Y')
    current_end_date = datetime.strptime(ranges[i]['end'], '%m/%d/%Y')
    # Check for overlaps and gaps
    if current_start_date <= end_date:
        end_date = max(end_date, current_end_date)
    else:
        total_days += (end_date - start_date).days
        start_date = current_start_date
        end_date = current_end_date
# Add the last range to the total days
total_days += (end_date - start_date).days
print(total_days)
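The same sort-and-merge idea can be wrapped in a reusable function. This is only a sketch under the assumption that every range is a dict with 'start' and 'end' strings in month/day/year format. The name covered_days is hypothetical:

```python
from datetime import datetime

def covered_days(ranges, fmt='%m/%d/%Y'):
    """Total days covered by the ranges, counting overlapping days once."""
    spans = sorted(
        (datetime.strptime(r['start'], fmt), datetime.strptime(r['end'], fmt))
        for r in ranges
    )
    if not spans:
        return 0
    total = 0
    cur_start, cur_end = spans[0]
    for start, end in spans[1:]:
        if start <= cur_end:
            # Overlapping or touching range: extend the current block
            cur_end = max(cur_end, end)
        else:
            # Gap: close the current block and start a new one
            total += (cur_end - cur_start).days
            cur_start, cur_end = start, end
    return total + (cur_end - cur_start).days

ranges = [
    {'start': '1/1/2001', 'end': '1/1/2002'},
    {'start': '1/1/2000', 'end': '1/1/2002'},
    {'start': '1/1/2003', 'end': '1/1/2004'},
]
print(covered_days(ranges))  # 1096
```

For the example data the merged blocks are 1/1/2000 to 1/1/2002 (731 days, since 2000 is a leap year) and 1/1/2003 to 1/1/2004 (365 days).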
You can easily do it with Pandas. Here is some example code; note that it prints the length of each range separately and does not remove overlaps:
import pandas as pd
data = [
    {'start': '1/1/2001', 'end': '1/1/2002'},
    {'start': '1/1/2000', 'end': '1/1/2002'},
    {'start': '1/1/2003', 'end': '1/1/2004'},
]
def numDays(start, end):
    # Number of days between two date strings
    dt = pd.to_datetime(start, format='%d/%m/%Y')
    dt1 = pd.to_datetime(end, format='%d/%m/%Y')
    return (dt1 - dt).days
for i in data:
    print(numDays(i["start"], i["end"]))
Convert the values to datetime.datetime objects; the difference of two such objects is a datetime.timedelta object, which contains the amount of time between the two. Note that summing the differences this way counts overlapping days more than once.
>>> from datetime import datetime
>>> parse = lambda x: datetime.strptime(x, "%m/%d/%Y")
>>> t1 = [parse(d['end']) - parse(d['start']) for d in ranges]
>>> print(sum(td.days for td in t1))
1461
Here are my date details:
start_Date_time = 20200709 15:48:26.603
end_Date_time = 20200709 15:58:26.648
I need the elapsed time: end_Date_time - start_Date_time.
You can use the datetime module to achieve this, my friend.
import datetime
date_time_str_1 = '20200709 15:48:26.603'
date_time_obj_1 = datetime.datetime.strptime(date_time_str_1, '%Y%m%d %H:%M:%S.%f')
date_time_str_2 = '20200709 15:58:26.648'
date_time_obj_2 = datetime.datetime.strptime(date_time_str_2, '%Y%m%d %H:%M:%S.%f')
Then:
date_time_obj_2 - date_time_obj_1
You will get:
datetime.timedelta(seconds=600, microseconds=45000)
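If a plain number is more convenient than a timedelta, total_seconds() gives the elapsed time as a float. A short sketch using the same two timestamps (the variable names are only for illustration):

```python
from datetime import datetime

fmt = '%Y%m%d %H:%M:%S.%f'
start = datetime.strptime('20200709 15:48:26.603', fmt)
end = datetime.strptime('20200709 15:58:26.648', fmt)

elapsed = end - start
seconds = elapsed.total_seconds()

# Split into whole minutes and leftover seconds for display
minutes, remainder = divmod(seconds, 60)
print(seconds)  # 600.045
print(f'{int(minutes)} min {remainder:.3f} s')
```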
Use Python's built-in datetime module, as in the example below. Note that the last argument of datetime() is microseconds, so 603 ms is written as 603000:
>>> from datetime import datetime
>>>
>>> # read
>>> d1 = datetime(2020,7,9,15,48,26,603000)
>>> d2 = datetime(2020,7,9,15,58,26,648000)
>>>
>>> # calculate difference and print total amount of seconds
>>> diff = d2 - d1
>>> print(diff.total_seconds())
600.045
>>>
>>> # keep the difference as a timedelta object
>>> diff
datetime.timedelta(seconds=600, microseconds=45000)
>>>
>>> # you can check the calculation
>>> d1 + diff == d2
True
If I have an array of dates like the following:
array = [{'date': '09-Jul-2018'},
{'date': '09-Aug-2018'},
{'date': '09-Sep-2018'}]
and I have a date such as 17-Aug-2018, can anyone advise on the best way to find the closest date that is always in the past?
I have tried the following, but to no avail:
closest_date
for i in range(len(array)):
    if(date > array[i].date and date < array[i + 1].date):
        closest_date = array[i]
Here is yet another approach:
from datetime import datetime
convert = lambda e: datetime.strptime(e, '%d-%b-%Y')
array = [{'date': '09-Jul-2018'},
{'date': '09-Aug-2018'},
{'date': '09-Sep-2018'}]
ref = convert("17-Aug-2018")
transform = ((convert(elem['date']), elem['date']) for elem in array)
_, closest_date = max((elem for elem in transform if (elem[0] - ref).days < 0), key = lambda e: e[0])
print(closest_date)
Output is
09-Aug-2018
Hope this helps.
My approach first creates a list of datetime objects from your list of dicts, and then simply sorts the dates by their distance from the input date.
from datetime import datetime
input_dt = datetime.strptime('17-Aug-2018', '%d-%b-%Y')
sorted(
    map(lambda date: datetime.strptime(date['date'], '%d-%b-%Y'), array),
    key=lambda dt: (input_dt - dt).total_seconds() if dt < input_dt else float("inf"),
)[0].strftime('%d-%b-%Y')
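Sorting works, but when no date lies before the input date every key is infinity and the first (future) date is returned. A minimal sketch that uses max() with default=None instead, assuming the same list-of-dicts layout. The helper name closest_past is made up for illustration:

```python
from datetime import datetime

def closest_past(array, target, fmt='%d-%b-%Y'):
    """Return the latest date string strictly before target, or None."""
    ref = datetime.strptime(target, fmt)
    # Keep only the entries that lie in the past relative to target
    earlier = [d['date'] for d in array if datetime.strptime(d['date'], fmt) < ref]
    return max(earlier, key=lambda s: datetime.strptime(s, fmt), default=None)

array = [{'date': '09-Jul-2018'},
         {'date': '09-Aug-2018'},
         {'date': '09-Sep-2018'}]
print(closest_past(array, '17-Aug-2018'))  # 09-Aug-2018
print(closest_past(array, '01-Jan-2018'))  # None
```

This is a single pass over the list rather than a full sort, and the None result makes the "nothing in the past" case explicit.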
This is one approach.
Ex:
import datetime
array = [{'date': '09-Jul-2018'},
         {'date': '09-Aug-2018'},
         {'date': '09-Sep-2018'}]
to_check = "17-Aug-2018"
to_check = datetime.datetime.strptime(to_check, "%d-%b-%Y")
closest_dates = []
for date in array:
    date_val = datetime.datetime.strptime(date["date"], "%d-%b-%Y")
    if date_val <= to_check:
        # Map the gap in days to the original date string
        closest_dates.append({(to_check - date_val).days: date["date"]})
# list(...) is needed on Python 3, where dict.items() is not subscriptable
print(min(closest_dates, key=lambda x: list(x.items())[0]))
Output:
{8: '09-Aug-2018'}
If the dates in your dictionary are date objects, here is a way to do it:
from datetime import date
closest_date = min([x['date'] for x in array])
target = date(2018, 8, 17)
for element in array:
    current_date = element['date']
    if current_date < target and current_date > closest_date:
        closest_date = current_date
# closest_date is now datetime.date(2018, 8, 9)
If your dates are still strings, here is a way to convert them easily:
from datetime import datetime
array = [{'date': datetime.strptime(s['date'], '%d-%b-%Y').date()} for s in array]
I would advise using NumPy array operations where you can; they are usually faster on large arrays. I would do it this way:
import numpy as np
import datetime
dates = np.array(list(map(lambda d: datetime.datetime.strptime(d["date"], "%d-%b-%Y"), array)))
differences = dates - datetime.datetime.strptime("17-Aug-2018", "%d-%b-%Y")
differences = np.vectorize(lambda d: d.days)(differences)
differences[differences >= 0] = -9e9
most_recent_date = dates[np.argmax(differences)]
I'm currently automating a browser game. In the game you can upgrade things, and the time an upgrade takes is given to me as a string like this: 2d 4h 20m 19s
I want to compare different upgrade times, so I'd like to convert the time into seconds so it's easier to compare.
My idea was to check which unit is given, get the index of that letter, and then look at the numbers in front of the letter, but I think that's too many lines of code, especially if I have to do it more than once.
My idea would have been something like this:
if "d" in string:
    a = string.index("d")
    if a == 2:
        b = string[a-2] * 10 + string[a-1]
        seconds = b * 86400
You can split the string up with .split() giving you a list: ['2d', '4h', '20m', '19s']
Now we can do each part separately.
We can also use a conversion dictionary to give us what number to use depending on the suffix:
mod = {"d": 60*60*24, "h": 60*60, "m": 60, "s": 1}
Then we just sum the list, multiplying the number of each with the mod from above:
sum(int(value[:-1]) * mod[value[-1]] for value in ds.split())
This is equivalent to:
total = 0
for value in ds.split():
    number = int(value[:-1])  # strip off the units and convert the number to an int
    unit = value[-1]  # take the last character
    total += number * mod[unit]
where ds is the date string input.
Below I've tried to split the process up into the basic steps:
In [1]: timestring = '2d 4h 20m 19s'
Out[1]: '2d 4h 20m 19s'
In [2]: items = timestring.split()
Out[2]: ['2d', '4h', '20m', '19s']
In [3]: splititems = [(int(i[:-1]), i[-1]) for i in items]
Out[3]: [(2, 'd'), (4, 'h'), (20, 'm'), (19, 's')]
In [4]: factors = {'h': 3600, 'm': 60, 's': 1, 'd': 24*3600}
Out[4]: {'h': 3600, 'm': 60, 's': 1, 'd': 86400}
In [5]: sum(a*factors[b] for a, b in splititems)
Out[5]: 188419
Like all code, this makes some basic assumptions:
Different units are separated by whitespace.
The unit is only one code-point (character).
Allowed units are days, hours, minutes and seconds.
Numbers are integers.
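If the whitespace assumption is too strict, a regular expression can pick out the number/unit pairs directly. This is a sketch only: it still assumes integer numbers and the single-letter units d, h, m and s. The names FACTORS and to_seconds are hypothetical:

```python
import re

# Seconds per unit, same factors as above
FACTORS = {'d': 86400, 'h': 3600, 'm': 60, 's': 1}

def to_seconds(text):
    """Sum every <number><unit> pair found in text; whitespace is optional."""
    pairs = re.findall(r'(\d+)\s*([dhms])', text.lower())
    return sum(int(number) * FACTORS[unit] for number, unit in pairs)

print(to_seconds('2d 4h 20m 19s'))  # 188419
print(to_seconds('2d4h20m19s'))     # 188419
```

Text that does not match a number followed by a known unit is silently ignored, so add validation if malformed input is a concern.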
There's a helpful total_seconds() method on timedeltas.
Given
import datetime as dt
names = ["weeks", "days", "minutes", "hours", "seconds"]
s = "2d 4h 20m 19s"
Code
Make a dict of remapped name-value pairs and pass to timedelta:
remap = {n[0]: n for n in names}
name_time = {remap[x[-1]]: int(x[:-1]) for x in s.split()}
td = dt.timedelta(**name_time)
td.total_seconds()
# 188419.0
I would like to download real-time rainfall data from the Data.gov.sg API (in json format) with Python from weather station S17 from this link, but my code is throwing:
TypeError: string indices must be integers
I'd appreciate it if anyone could help! Thanks!
Code
#!/usr/bin/env python
import requests
import json
import datetime as dt
from xlwt import Workbook
start_date = dt.datetime(2018, 8, 14, 00, 00, 00)
end_date = dt.datetime(2018, 8, 19, 00, 00, 00)
total_days = (end_date - start_date).days + 1
neadatasum = []
for day_number in range(total_days):
    for day_time in range(0, 24, 86400):
        current_date = (start_date + dt.timedelta(days = day_number)).date()
        current_time = (start_date + dt.timedelta(hours = day_time)).time()
        url = 'https://api.data.gov.sg/v1/environment/rainfall?date_time=' + str(current_date) + 'T' + \
              str(current_time)
        headers = {"api-key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
        data = requests.get(url, headers=headers).json()
        actualtime = str(current_date) + 'T' + str(current_time) + '+08:00'
        print(current_date, current_time)
        if not data['items'][0]:
            station_id = 'S71'
            value = 'Nan'
            neadatasum1 = [current_date, current_time, value]
            neadatasum.append(neadatasum1)
        else:
            datatime = data['items'][0]['timestamp']
            if actualtime != datatime:
                station_id = 'S71'
                value = 'Nan'
                neadatasum1 = [current_date, current_time, value]
                neadatasum.append(neadatasum1)
            else:
                station_id = 'S71'
                value = data['items'][0]['timestamp']['readings']['station_id', 'value']
                neadatasum1 = [current_date, current_time, value]
                neadatasum.append(neadatasum1)
print(neadatasum)
wb = Workbook()
sheet1 = wb.add_sheet('Rainfall 5 minutes(mm)')
sheet1.write(0, 0, 'Date')
sheet1.write(0, 1, 'Time')
sheet1.write(0, 2, 'Rainfall')
for i, j in enumerate(neadatasum):
    for k, l in enumerate(j):
        sheet1.write(i+1, k, l)
wb.save('Rainfall 5 minutes(mm)(14082018 to 19082018).xls')
Traceback (most recent call last):
  File "C:/Users/erilpm/AppData/Local/Programs/Python/Python36-32/Rainfall.py", line 36, in <module>
    value = data['items'][0]['timestamp']['readings']['station_id', 'value']
TypeError: string indices must be integers
I am not 100% sure if I understand what it is you want this code to do, but I am giving this my best shot. Please comment if I misunderstood your question!
OK, so first of all, you can't index with
data['items'][0]['timestamp']['readings']['station_id', 'value']
... if your data doesn't look like
data[ items , ..., ... ]
        ^
        [ 0, 1, ... ]
          ^
          [..., timestamp, ... ]
                ^
                [readings, ..., ...]
                 ^
                 [ (station_id, value), (...,...), ..., ... ]
... and indeed, that isn't really what your data seems to look like. In fact if I do print(data['items'][0].keys()), all I get back are ['timestamp','readings'].
Note that when I print data['items'] I get:
[{'timestamp': '2018-08-14T00:00:00+08:00', 'readings': [{'station_id': 'S77', 'value': 0}, {'station_id': 'S109', 'value': 0}, {'station_id': 'S117', 'value': 0}, {'station_id': 'S55', 'value': 0}, {'station_id': 'S64', 'value': 0}, {'station_id': 'S90', 'value': 0}, etc etc etc
Ok, so I think you want a list of things like (current time, current date, some value recorded at that specific station). The problem is you're looking at a list of values, not all of them from the station you want ... not just one value. So, here is my proposal. Replace everything in the end of your loop with the following:
# ok, so you like station S71, right?
station_id = 'S71'
# use list comprehension to get only values from that station
values = [a['value'] for a in data['items'][0]['readings'] if a['station_id'] == station_id]
# iterate over all the matching values
for value in values:
    # and add them to your nifty list!
    neadatasum1 = [current_date, current_time, value]
    neadatasum.append(neadatasum1)
I hope this makes sense! If this does not work, or does not answer your question, or if I misunderstood your question, please comment accordingly and I (or someone more talented) will try and fix this or otherwise answer whatever remains to be answered :)
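To try the lookup without calling the API, here is a small sketch on a hard-coded dict shaped like the data['items'] output printed above (only the first three readings are reproduced). The helper name reading_for is made up for illustration:

```python
# Sample shaped like the API response shown above (truncated to three readings)
sample = {
    'items': [{
        'timestamp': '2018-08-14T00:00:00+08:00',
        'readings': [
            {'station_id': 'S77', 'value': 0},
            {'station_id': 'S109', 'value': 0},
            {'station_id': 'S117', 'value': 0},
        ],
    }]
}

def reading_for(data, station_id):
    """Return the first reading value for station_id, or None if absent."""
    item = data['items'][0]
    # 'readings' is a list of dicts, so it has to be searched, not indexed by key
    for reading in item['readings']:
        if reading['station_id'] == station_id:
            return reading['value']
    return None

print(reading_for(sample, 'S109'))  # 0
print(reading_for(sample, 'S71'))   # None
```

Returning None for a missing station lets the calling code decide what placeholder to store for that timestamp.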