I have a list of date ranges and want to find the total number of days covered by those ranges. However, the ranges may or may not overlap, and I want to count overlapping time only once. There may also be gaps between the ranges, which I want to exclude as well.
I'm curious about the most efficient way to calculate this.
An example:
ranges = [
    {'start': '1/1/2001', 'end': '1/1/2002'},
    {'start': '1/1/2000', 'end': '1/1/2002'},
    {'start': '1/1/2003', 'end': '1/1/2004'},
]
Total range time in days -- 1/1/2000 through 1/1/2002 + 1/1/2003 through 1/1/2004
from datetime import datetime, timedelta
ranges = [
{'start': '1/1/2001', 'end': '1/1/2002'},
{'start': '1/1/2000', 'end': '1/1/2002'},
{'start': '1/1/2003', 'end': '1/1/2004'},
]
# Sort the list of date ranges by the start date
ranges = sorted(ranges, key=lambda x: datetime.strptime(x['start'], '%m/%d/%Y'))
# Initialize the start and end dates for the non-overlapping and non-gapped ranges
start_date = datetime.strptime(ranges[0]['start'], '%m/%d/%Y')
end_date = datetime.strptime(ranges[0]['end'], '%m/%d/%Y')
total_days = 0
# Iterate through the list of date ranges
for i in range(1, len(ranges)):
    current_start_date = datetime.strptime(ranges[i]['start'], '%m/%d/%Y')
    current_end_date = datetime.strptime(ranges[i]['end'], '%m/%d/%Y')
    # Check for overlaps and gaps
    if current_start_date <= end_date:
        end_date = max(end_date, current_end_date)
    else:
        total_days += (end_date - start_date).days
        start_date = current_start_date
        end_date = current_end_date
# Add the last range to the total days
total_days += (end_date - start_date).days
print(total_days)
You can also do it with pandas. Here is some example code; note that it prints the length of each range separately and does not merge overlapping ranges:
import pandas as pd
data = [
    {'start': '1/1/2001', 'end': '1/1/2002'},
    {'start': '1/1/2000', 'end': '1/1/2002'},
    {'start': '1/1/2003', 'end': '1/1/2004'},
]
def numDays(start, end):
    # Parse both endpoints (month/day/year) and return the whole days between them
    dt = pd.to_datetime(start, format='%m/%d/%Y')
    dt1 = pd.to_datetime(end, format='%m/%d/%Y')
    return (dt1 - dt).days
for i in data:
    print(numDays(i["start"], i["end"]))
Convert the values to datetime.datetime objects; the difference of two such objects is a datetime.timedelta object, which contains the amount of time between the two.
>>> from datetime import datetime
>>> parse = lambda x: datetime.strptime(x, "%m/%d/%Y")
>>> t1 = [parse(d['end']) - parse(d['start']) for d in ranges]
>>> print(sum(td.days for td in t1))
1461
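Note that summing the raw differences counts the overlap between the first two ranges twice, which is why the total above is 1461 rather than the 1096 days the question asks for. A minimal sketch of a merge-first variant follows; the function name `covered_days` and the default format string are illustrative choices, not something taken from the answers above.

```python
from datetime import datetime

def covered_days(ranges, fmt="%m/%d/%Y"):
    """Total days covered by the ranges, counting overlaps once and skipping gaps."""
    # Parse and sort by start date (ties broken by end date).
    spans = sorted(
        (datetime.strptime(r["start"], fmt), datetime.strptime(r["end"], fmt))
        for r in ranges
    )
    total = 0
    cur_start = cur_end = None
    for start, end in spans:
        if cur_end is None:
            cur_start, cur_end = start, end
        elif start <= cur_end:
            # Overlapping or touching: extend the current merged span.
            cur_end = max(cur_end, end)
        else:
            # Gap: close the current span and start a new one.
            total += (cur_end - cur_start).days
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += (cur_end - cur_start).days
    return total

ranges = [
    {"start": "1/1/2001", "end": "1/1/2002"},
    {"start": "1/1/2000", "end": "1/1/2002"},
    {"start": "1/1/2003", "end": "1/1/2004"},
]
print(covered_days(ranges))  # 1096
```

An empty list returns 0 here, which the loop-based answer above does not handle because it indexes `ranges[0]` directly.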
I have a list of date strings, sorted in order:
sorteddates =['2017-04-26', '2017-05-05', '2017-05-10', '2017-05-11', '2017-05-16']
I have tried using the code below to find consecutive dates in the list, but I am having a difficult time understanding it. I want to find which two dates are consecutive. Only two dates.
dates = [datetime.strptime(d, "%Y-%m-%d") for d in sorteddates]
date_ints = set([d.toordinal() for d in dates])
Convert the list from str to datetime; it will still be in sorted order.
Iterate through the list; for each item, check whether the next item is one day later. The datetime module provides timedelta values for this.
Some code:
from datetime import datetime, timedelta
# Convert list to datetime; you've shown you can do that part.
dates = [datetime.strptime(d, "%Y-%m-%d") for d in sorteddates]
one_day = timedelta(days=1)
for today, tomorrow in zip(dates, dates[1:]):
    if today + one_day == tomorrow:
        print("SUCCESS")
If I understand your question correctly, to get first pair of consecutive dates you can check if their delta is 1 day:
from datetime import datetime
sorteddates =['2017-04-26', '2017-05-05', '2017-05-10', '2017-05-11', '2017-05-16']
dates = [datetime.strptime(d, "%Y-%m-%d") for d in sorteddates]
d = next(((d1, d2) for d1, d2 in zip(dates, dates[1:]) if (d2 - d1).days == 1), None ) # <-- returns pair or None if no consecutive dates are found
print(d)
Prints:
(datetime.datetime(2017, 5, 10, 0, 0), datetime.datetime(2017, 5, 11, 0, 0))
Or formatted:
if d:
    print([datetime.strftime(i, "%Y-%m-%d") for i in d])
Prints:
['2017-05-10', '2017-05-11']
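The question's own attempt converts each date to an ordinal with `toordinal()`. A small sketch along those lines, assuming the goal is to list every date whose successor is also present; the helper name `consecutive_pairs` is hypothetical, and the result is always rendered in ISO format:

```python
from datetime import date, datetime

def consecutive_pairs(date_strings, fmt="%Y-%m-%d"):
    """Return (day, next_day) pairs for every date whose following day is also in the input."""
    # Ordinals are day counts, so "one day later" is simply n + 1.
    ordinals = {datetime.strptime(s, fmt).date().toordinal() for s in date_strings}
    return [
        (date.fromordinal(n).isoformat(), date.fromordinal(n + 1).isoformat())
        for n in sorted(ordinals)
        if n + 1 in ordinals
    ]

sorteddates = ['2017-04-26', '2017-05-05', '2017-05-10', '2017-05-11', '2017-05-16']
print(consecutive_pairs(sorteddates))  # [('2017-05-10', '2017-05-11')]
```

Because membership is checked in a set, the input does not need to be sorted for this variant.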
If I have an array of dates like the following:
array = [{'date': '09-Jul-2018'},
{'date': '09-Aug-2018'},
{'date': '09-Sep-2018'}]
and I have a date like 17-Aug-2018, can anyone advise on the best way to find the closest date that is always in the past?
I have tried the following, but to no avail.
closest_date
for i in range(len(array)):
    if(date > array[i].date and date < array[i + 1].date):
        closest_date = array[i]
Here is yet another approach:
from datetime import datetime
convert = lambda e: datetime.strptime(e, '%d-%b-%Y')
array = [{'date': '09-Jul-2018'},
{'date': '09-Aug-2018'},
{'date': '09-Sep-2018'}]
ref = convert("17-Aug-2018")
transform = ((convert(elem['date']), elem['date']) for elem in array)
_, closest_date = max((elem for elem in transform if (elem[0] - ref).days < 0), key = lambda e: e[0])
print(closest_date)
Output is
09-Aug-2018
Hope this helps.
My approach first creates a list of datetime objects from your list of dicts, then sorts them by how far they lie before the input date, pushing dates that are not in the past to the end.
from datetime import datetime
input_dt = datetime.strptime('17-Aug-2018', '%d-%b-%Y')
sorted(
map(lambda date: datetime.strptime(date['date'], '%d-%b-%Y'), array),
key=lambda dt: (input_dt - dt).total_seconds() if dt < input_dt else float("inf"),
)[0].strftime('%d-%b-%Y')
This is one approach.
Ex:
import datetime
array = [{'date': '09-Jul-2018'},
{'date': '09-Aug-2018'},
{'date': '09-Sep-2018'}]
to_check = "17-Aug-2018"
to_check = datetime.datetime.strptime(to_check, "%d-%b-%Y")
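closest_dates = []
for date in array:
    date_val = datetime.datetime.strptime(date["date"], "%d-%b-%Y")
    if date_val <= to_check:
        # Map the distance in days to the original date string
        closest_dates.append({(to_check - date_val).days: date["date"]})
# Pick the entry with the smallest distance (list() is needed on Python 3)
print(min(closest_dates, key=lambda x: list(x.items())[0]))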
Output:
{8: '09-Aug-2018'}
If the dates in your dictionaries are already date objects, here is a way to do it:
from datetime import date
closest_date = min([x['date'] for x in array])
target = date(2018, 8, 17)
for element in array:
    current_date = element['date']
    if current_date < target and current_date > closest_date:
        closest_date = current_date
# Output : datetime.date(2018, 8, 9)
If your dates are still strings, here is an easy way to convert them:
from datetime import datetime
array = [{'date': datetime.strptime(s['date'], '%d-%b-%Y').date()} for s in array]
I would advise you to use vectorised operations in NumPy where possible; they are often much faster on large arrays :D. I would do it this way:
import numpy as np
import datetime
dates = np.array(list(map(lambda d: datetime.datetime.strptime(d["date"], "%d-%b-%Y"), array)))
differences = dates - datetime.datetime.strptime("17-Aug-2018", "%d-%b-%Y")
differences = np.vectorize(lambda d: d.days)(differences)
differences[differences >= 0] = -9e9
most_recent_date = dates[np.argmax(differences)]
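None of the snippets above spells out what should happen when no date lies before the reference date. Here is a sketch that returns None in that case, assuming strictly earlier dates are wanted; `closest_past_date` and the `FMT` constant are illustrative names.

```python
from datetime import datetime

FMT = "%d-%b-%Y"

def closest_past_date(array, ref, fmt=FMT):
    """Return the latest date string strictly before ref, or None if there is none."""
    ref_dt = datetime.strptime(ref, fmt)
    # Keep only the entries that lie in the past relative to ref.
    past = [d["date"] for d in array if datetime.strptime(d["date"], fmt) < ref_dt]
    # max() with default avoids a ValueError on an empty list.
    return max(past, key=lambda s: datetime.strptime(s, fmt), default=None)

array = [{'date': '09-Jul-2018'},
         {'date': '09-Aug-2018'},
         {'date': '09-Sep-2018'}]
print(closest_past_date(array, "17-Aug-2018"))  # 09-Aug-2018
print(closest_past_date(array, "01-Jan-2018"))  # None
```

The input does not have to be sorted, since every entry is compared against the reference date.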
I would like to download real-time rainfall data from the Data.gov.sg API (in json format) with Python from weather station S17 from this link, but my code is throwing:
TypeError: string indices must be integers
I would appreciate it if anyone could help! Thanks!
Code
#!/usr/bin/env python
import requests
import json
import datetime as dt
from xlwt import Workbook
start_date = dt.datetime(2018, 8, 14, 00, 00, 00)
end_date = dt.datetime(2018, 8, 19, 00, 00, 00)
total_days = (end_date - start_date).days + 1
neadatasum = []
for day_number in range(total_days):
    for day_time in range(0, 24, 86400):
        current_date = (start_date + dt.timedelta(days = day_number)).date()
        current_time = (start_date + dt.timedelta(hours = day_time)).time()
        url = 'https://api.data.gov.sg/v1/environment/rainfall?date_time=' + str(current_date) + 'T' + \
              str(current_time)
        headers = {"api-key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
        data = requests.get(url, headers=headers).json()
        actualtime = str(current_date) + 'T' + str(current_time) + '+08:00'
        print(current_date, current_time)
        if not data['items'][0]:
            station_id = 'S71'
            value = 'Nan'
            neadatasum1 = [current_date, current_time, value]
            neadatasum.append(neadatasum1)
        else:
            datatime = data['items'][0]['timestamp']
            if actualtime != datatime:
                station_id = 'S71'
                value = 'Nan'
                neadatasum1 = [current_date, current_time, value]
                neadatasum.append(neadatasum1)
            else:
                station_id = 'S71'
                value = data['items'][0]['timestamp']['readings']['station_id', 'value']
                neadatasum1 = [current_date, current_time, value]
                neadatasum.append(neadatasum1)
print(neadatasum)
wb = Workbook()
sheet1 = wb.add_sheet('Rainfall 5 minutes(mm)')
sheet1.write(0, 0, 'Date')
sheet1.write(0, 1, 'Time')
sheet1.write(0, 2, 'Rainfall')
for i, j in enumerate(neadatasum):
    for k, l in enumerate(j):
        sheet1.write(i+1, k, l)
wb.save('Rainfall 5 minutes(mm)(14082018 to 19082018).xls')
Traceback (most recent call last):
  File "C:/Users/erilpm/AppData/Local/Programs/Python/Python36-32/Rainfall.py", line 36, in <module>
    value = data['items'][0]['timestamp']['readings']['station_id', 'value']
TypeError: string indices must be integers
I am not 100% sure if I understand what it is you want this code to do, but I am giving this my best shot. Please comment if I misunderstood your question!
OK, so first of all, you can't index in
data['items'][0]['timestamp']['readings']['station_id', 'value']
... if your data doesn't look like
data[ items , ..., ... ]
^
[ 0, 1, ... ]
^
[..., timestamp, ... ]
^
[readings, ..., ...]
^
[ (station_id, value), (...,...), ..., ... ]
... and indeed, that isn't really what your data seems to look like. In fact if I do print(data['items'][0].keys()), all I get back are ['timestamp','readings'].
Note that when I print data['items'] I get:
[{'timestamp': '2018-08-14T00:00:00+08:00', 'readings': [{'station_id': 'S77', 'value': 0}, {'station_id': 'S109', 'value': 0}, {'station_id': 'S117', 'value': 0}, {'station_id': 'S55', 'value': 0}, {'station_id': 'S64', 'value': 0}, {'station_id': 'S90', 'value': 0}, etc etc etc
Ok, so I think you want a list of things like (current time, current date, some value recorded at that specific station). The problem is you're looking at a list of values, not all of them from the station you want ... not just one value. So, here is my proposal. Replace everything in the end of your loop with the following:
# ok, so you like station S71, right?
station_id = 'S71'
# use list comprehension to get only values from that station
values = [a['value'] for a in data['items'][0]['readings'] if a['station_id'] == station_id]
# iterate over all the matching values
for value in values:
    # and add them to your nifty list!
    neadatasum1 = [current_date, current_time, value]
    neadatasum.append(neadatasum1)
I hope this makes sense! If this does not work, or does not answer your question, or if I misunderstood your question, please comment accordingly and I (or someone more talented) will try and fix this or otherwise answer whatever remains to be answered :)
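To make the structure concrete without calling the API, here is a sketch that works on a hand-written payload shaped like the data['items'] printout above. The sample values, including the S71 reading, are made up for illustration, and `station_value` is a hypothetical helper; the 'Nan' placeholder mirrors the one used in the question's code.

```python
def station_value(data, station_id, missing="Nan"):
    """Return the reading for one station from the first item, or a placeholder."""
    items = data.get("items") or []
    if not items:
        return missing
    # 'readings' is a list of {'station_id': ..., 'value': ...} dicts.
    for reading in items[0].get("readings", []):
        if reading.get("station_id") == station_id:
            return reading.get("value", missing)
    return missing

# Hand-written sample in the same shape as the printed response (values are invented).
sample = {
    "items": [
        {
            "timestamp": "2018-08-14T00:00:00+08:00",
            "readings": [
                {"station_id": "S77", "value": 0},
                {"station_id": "S71", "value": 0.2},
            ],
        }
    ]
}
print(station_value(sample, "S71"))   # 0.2
print(station_value(sample, "S999"))  # Nan
```

Returning a placeholder for a missing station keeps one row per timestamp, whereas the list comprehension above adds no row at all when the station is absent.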
There are two lists of Python dictionaries:
buy_lists = [{"buy_date":"2017-01-02","buy_price":10.50},
{"buy_date":"2017-01-15","buy_price":11.25},
{"buy_date":"2017-02-01","buy_price":8.50},
{"buy_date":"2017-02-04","buy_price":12.50}]
sell_lists=[{"sell_date":"2017-01-02","sell_price":10.50},
{"sell_date":"2017-01-03","sell_price":10.75},
{"sell_date":"2017-01-04","sell_price":12.50},
{"sell_date":"2017-01-10","sell_price":11.00},
{"sell_date":"2017-01-25","sell_price":11.25},
{"sell_date":"2017-01-27","sell_price":11.75},
{"sell_date":"2017-01-30","sell_price":7.50},
{"sell_date":"2017-02-01","sell_price":8.50},
{"sell_date":"2017-02-11","sell_price":9.50},
{"sell_date":"2017-02-15","sell_price":14.50},
{"sell_date":"2017-02-04","sell_price":12.50},
{"sell_date":"2017-02-05","sell_price":12.75},
{"sell_date":"2017-02-06","sell_price":12.80}]
How can I select, for each item in buy_lists, the item from sell_lists with the nearest date after it, and update buy_lists so that it looks like this?
buy_lists=[{"buy_date":"2017-01-02","buy_price":10.50,"sell_date":"2017-01-03","sell_price":10.75},
{"buy_date":"2017-01-15","buy_price":11.25,"sell_date":"2017-01-27","sell_price":11.75},
{"buy_date":"2017-02-01","buy_price":8.50,"sell_date":"2017-02-11","sell_price":9.50},
{"buy_date":"2017-02-04","buy_price":12.50,"sell_date":"2017-02-05","sell_price":12.75}]
This is the code I have now:
for x in buy_lists:
    for y in sell_lists:
        # check if the x["buy_date"] is the most closest date after this date
        # Then y add to x
Sort the lists by date; iterate over the sell list; when you find one with a date greater than the current buy, update the buy; get the next buy, continue.
import datetime
def f(d):
    try:
        s = d['buy_date']
    except KeyError:
        s = d['sell_date']
    return datetime.datetime.strptime(s, '%Y-%m-%d')
buy_lists.sort(key=f)
sell_lists.sort(key=f)
bl = iter(buy_lists)
try:
    buy = next(bl)
    for sell in sell_lists:
        if f(sell) > f(buy):
            print(buy, sell)
            buy.update(sell)
            buy = next(bl)
except StopIteration:
    pass
It might not produce what you want if there are multiple buys on the same day.
You can minimize the non-negative difference between the two dates when iterating over the lists. Note that this also accepts a sell on the same day as the buy.
from dateutil import parser
def day_diff(buy, sell):
    x = parser.parse(buy)
    y = parser.parse(sell)
    return (y-x).days
sd = 'sell_date' # shortens code
for d1 in buy_lists:
    buy = d1.get('buy_date')
    best_match = min(sell_lists,
        key=lambda x: day_diff(buy, x.get(sd)) if day_diff(buy, x.get(sd))>=0 else 10**8)
    d1.update(best_match)
buy_lists
# returns:
[{'buy_date': '2017-01-02', 'buy_price': 10.5,
'sell_date': '2017-01-02', 'sell_price': 10.5},
{'buy_date': '2017-01-15', 'buy_price': 11.25,
'sell_date': '2017-01-25', 'sell_price': 11.25},
{'buy_date': '2017-02-01', 'buy_price': 8.5,
'sell_date': '2017-02-01', 'sell_price': 8.5},
{'buy_date': '2017-02-04', 'buy_price': 12.5,
'sell_date': '2017-02-04', 'sell_price': 12.5}]
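If strictly later sell dates are wanted, one option is a binary search over the sorted sell dates. This is a sketch under that assumption: it relies on ISO-formatted date strings sorting chronologically, `attach_next_sell` is an illustrative name, it returns new dictionaries instead of updating in place, and the lists are a shortened version of the ones in the question.

```python
from bisect import bisect_right

def attach_next_sell(buys, sells):
    """For each buy, merge in the first sell dated strictly after the buy date."""
    # ISO 'YYYY-MM-DD' strings compare in chronological order, so no parsing is needed.
    ordered = sorted(sells, key=lambda s: s["sell_date"])
    sell_dates = [s["sell_date"] for s in ordered]
    result = []
    for buy in buys:
        # bisect_right skips sells on the same day as the buy.
        i = bisect_right(sell_dates, buy["buy_date"])
        merged = dict(buy)
        if i < len(ordered):
            merged.update(ordered[i])
        result.append(merged)
    return result

buys = [{"buy_date": "2017-01-02", "buy_price": 10.50},
        {"buy_date": "2017-01-15", "buy_price": 11.25}]
sells = [{"sell_date": "2017-01-02", "sell_price": 10.50},
         {"sell_date": "2017-01-03", "sell_price": 10.75},
         {"sell_date": "2017-01-27", "sell_price": 11.75},
         {"sell_date": "2017-01-25", "sell_price": 11.25}]
for row in attach_next_sell(buys, sells):
    print(row["buy_date"], "->", row.get("sell_date"))
# 2017-01-02 -> 2017-01-03
# 2017-01-15 -> 2017-01-25
```

A buy with no later sell is returned unchanged, so `row.get("sell_date")` is None for it.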
Is there a one- or two-line way to convert a time period string like "1h30min" to "90min" in Python/Pandas?
Come on, guys, that's too boring; let's make it a little more challenging:
from collections import defaultdict
import re
def humantime2minutes(s):
    d = {
        'w': 7*24*60,
        'week': 7*24*60,
        'weeks': 7*24*60,
        'd': 24*60,
        'day': 24*60,
        'days': 24*60,
        'h': 60,
        'hr': 60,
        'hour': 60,
        'hours': 60,
    }
    mult_items = defaultdict(lambda: 1).copy()
    mult_items.update(d)
    parts = re.search(r'^(\d+)([^\d]*)', s.lower().replace(' ', ''))
    if parts:
        return int(parts.group(1)) * mult_items[parts.group(2)] + humantime2minutes(re.sub(r'^(\d+)([^\d]*)', '', s.lower()))
    else:
        return 0
print(humantime2minutes('1 week 3 days 1 hour 30 min'))
print(humantime2minutes('2h1m'))
Output:
14490
121
Now seriously, there is a pandas.Timedelta():
print( pd.Timedelta('1h30min') )
Output:
0 days 01:30:00
Very simple. Just split by the "h" and convert:
hour, minute = "1h30min".split("h")
minutes = (60 * int(hour)) + int(minute.strip("min"))
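The split relies on the string containing both an hour part and a minute part. A sketch that also accepts inputs such as "2h" or "45min", assuming only the units h and min occur; `to_minutes` is a hypothetical helper:

```python
import re

def to_minutes(s):
    """Convert strings like '1h30min', '2h' or '45min' to a '<n>min' string."""
    total = 0
    # Each match is a (number, unit) pair; hours count as 60 minutes.
    for number, unit in re.findall(r"(\d+)\s*(h|min)", s.lower()):
        total += int(number) * (60 if unit == "h" else 1)
    return f"{total}min"

print(to_minutes("1h30min"))  # 90min
print(to_minutes("2h"))       # 120min
print(to_minutes("45min"))    # 45min
```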
You can use to_datetime with parameter format:
import pandas as pd
s = "1h30min"
s = pd.to_datetime(s, format='%Hh%Mmin')
print(s)
1900-01-01 01:30:00
print(type(s))
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
print(s.hour * 60 + s.minute)
90
print(str(s.hour * 60 + s.minute) + 'min')
90min