Closest date when looping over an array in Python [duplicate]

If I have an array of dates like the following:
array = [{'date': '09-Jul-2018'},
{'date': '09-Aug-2018'},
{'date': '09-Sep-2018'}]
and I have a date such as 17-Aug-2018.
Can anyone advise the best way to find the closest date, always in the past?
I have tried the following, but to no avail.
closest_date
for i in range(len(array)):
    if(date > array[i].date and date < array[i + 1].date):
        closest_date = array[i]

Here is yet another approach:
from datetime import datetime
convert = lambda e: datetime.strptime(e, '%d-%b-%Y')
array = [{'date': '09-Jul-2018'},
{'date': '09-Aug-2018'},
{'date': '09-Sep-2018'}]
ref = convert("17-Aug-2018")
transform = ((convert(elem['date']), elem['date']) for elem in array)
_, closest_date = max((elem for elem in transform if (elem[0] - ref).days < 0), key = lambda e: e[0])
print(closest_date)
Output is
09-Aug-2018
Hope this helps.
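One caveat with the max() call above: it raises ValueError when no date lies before the reference. A minimal sketch of a guarded variant follows; the function name closest_past_date and its parameters are made up for illustration and do not come from the answer.

```python
from datetime import datetime

def closest_past_date(records, ref_text, fmt='%d-%b-%Y'):
    """Return the latest date string strictly before ref_text, or None."""
    ref = datetime.strptime(ref_text, fmt)
    # Keep only the date strings that fall before the reference date
    past = (r['date'] for r in records if datetime.strptime(r['date'], fmt) < ref)
    # default=None avoids ValueError when nothing is in the past
    return max(past, key=lambda s: datetime.strptime(s, fmt), default=None)

array = [{'date': '09-Jul-2018'},
         {'date': '09-Aug-2018'},
         {'date': '09-Sep-2018'}]

print(closest_past_date(array, '17-Aug-2018'))  # 09-Aug-2018
print(closest_past_date(array, '01-Jan-2018'))  # None
```

Returning None is only one possible convention; raising a custom error would work just as well if a missing result should not pass silently.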

My approach first converts your list of dicts into datetime objects, and then sorts them by their distance from the input date, pushing every date that is not in the past to the end.
from datetime import datetime
input_dt = datetime.strptime('17-Aug-2018', '%d-%b-%Y')
sorted(
    map(lambda date: datetime.strptime(date['date'], '%d-%b-%Y'), array),
    key=lambda dt: (input_dt - dt).total_seconds() if dt < input_dt else float("inf"),
)[0].strftime('%d-%b-%Y')

This is one approach.
Ex:
import datetime
array = [{'date': '09-Jul-2018'},
{'date': '09-Aug-2018'},
{'date': '09-Sep-2018'}]
to_check = "17-Aug-2018"
to_check = datetime.datetime.strptime(to_check, "%d-%b-%Y")
closest_dates = []
for date in array:
    date_val = datetime.datetime.strptime(date["date"], "%d-%b-%Y")
    if date_val <= to_check:
        closest_dates.append({(to_check - date_val).days: date["date"]})
# Each dict has a single key (the gap in days); pick the smallest gap
print(min(closest_dates, key=lambda x: next(iter(x))))
Output:
{8: '09-Aug-2018'}

If the dates in your dictionaries are datetime.date objects, here is a way to do it:
from datetime import date
closest_date = min([x['date'] for x in array])
target = date(2018, 8, 17)
for element in array:
    current_date = element['date']
    if current_date < target and current_date > closest_date:
        closest_date = current_date
# Output : datetime.date(2018, 8, 9)
If your dates are still strings, here is a way to convert them easily:
from datetime import datetime
array = [{'date': datetime.strptime(s['date'], '%d-%b-%Y').date()} for s in array]

I would advise you to use vectorised operations in NumPy, which are usually faster on large arrays. I would do it this way:
import numpy as np
import datetime
dates = np.array(list(map(lambda d: datetime.datetime.strptime(d["date"], "%d-%b-%Y"), array)))
differences = dates - datetime.datetime.strptime("17-Aug-2018", "%d-%b-%Y")
differences = np.vectorize(lambda d: d.days)(differences)
differences[differences >= 0] = -9e9
most_recent_date = dates[np.argmax(differences)]
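Note that np.vectorize is essentially a Python-level loop, so the snippet above is not truly vectorised. A minimal sketch of an alternative that stores the dates in a native datetime64[D] array is shown below. It assumes the same array of dicts as in the question; the variable names are illustrative.

```python
from datetime import datetime
import numpy as np

array = [{'date': '09-Jul-2018'},
         {'date': '09-Aug-2018'},
         {'date': '09-Sep-2018'}]

# Parse once in Python, then hold the result as a native datetime64[D] array
dates = np.array(
    [datetime.strptime(d['date'], '%d-%b-%Y').date() for d in array],
    dtype='datetime64[D]',
)
ref = np.datetime64('2018-08-17')

# The boolean mask keeps only the dates strictly before the reference
past = dates[dates < ref]
most_recent = past.max() if past.size else None
print(most_recent)  # 2018-08-09
```

The string parsing still happens in a Python loop; only the comparison and the max run inside NumPy, so any speed-up is likely to matter only for large inputs.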

Related

Calculate the total days between a range of dates using Python

I have a list of date ranges and want to find the total number of days covered by those ranges. However, the ranges may or may not overlap, and I want to count overlapping time only once. There may also be gaps between the ranges, which I want to exclude.
I'm curious about the best way to calculate this.
An example:
ranges = [
    {'start': '1/1/2001', 'end': '1/1/2002'},
    {'start': '1/1/2000', 'end': '1/1/2002'},
    {'start': '1/1/2003', 'end': '1/1/2004'},
]
Total range time in days -- 1/1/2000 through 1/1/2002 + 1/1/2003 through 1/1/2004
from datetime import datetime, timedelta
ranges = [
{'start': '1/1/2001', 'end': '1/1/2002'},
{'start': '1/1/2000', 'end': '1/1/2002'},
{'start': '1/1/2003', 'end': '1/1/2004'},
]
# Sort the list of date ranges by the start date
ranges = sorted(ranges, key=lambda x: datetime.strptime(x['start'], '%m/%d/%Y'))
# Initialize the start and end dates for the non-overlapping and non-gapped ranges
start_date = datetime.strptime(ranges[0]['start'], '%m/%d/%Y')
end_date = datetime.strptime(ranges[0]['end'], '%m/%d/%Y')
total_days = 0
# Iterate through the list of date ranges
for i in range(1, len(ranges)):
    current_start_date = datetime.strptime(ranges[i]['start'], '%m/%d/%Y')
    current_end_date = datetime.strptime(ranges[i]['end'], '%m/%d/%Y')
    # Check for overlaps and gaps
    if current_start_date <= end_date:
        end_date = max(end_date, current_end_date)
    else:
        total_days += (end_date - start_date).days
        start_date = current_start_date
        end_date = current_end_date
# Add the last range to the total days
total_days += (end_date - start_date).days
print(total_days)
You can easily do it using Pandas. Here is some example code that prints the number of days in each range:
import pandas as pd
data = [
    {'start': '1/1/2001', 'end': '1/1/2002'},
    {'start': '1/1/2000', 'end': '1/1/2002'},
    {'start': '1/1/2003', 'end': '1/1/2004'},
]
def numDays(start, end):
    dt = pd.to_datetime(start, format='%d/%m/%Y')
    dt1 = pd.to_datetime(end, format='%d/%m/%Y')
    return (dt1 - dt).days
for i in data:
    print(numDays(i["start"], i["end"]))
Convert the values to datetime.datetime objects; the difference of two such objects is a datetime.timedelta object, which contains the amount of time between the two.
>>> from datetime import datetime
>>> parse = lambda x: datetime.strptime(x, "%m/%d/%Y")
>>> t1 = [parse(d['end']) - parse(d['start']) for d in ranges]
>>> print(sum(td.days for td in t1))
1461
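The 1461 above is the plain sum of the three ranges, so the year 2001 is counted twice. A minimal sketch that merges overlapping ranges before summing is shown below; the function name total_covered_days is hypothetical, and the '%m/%d/%Y' format is an assumption about the input strings.

```python
from datetime import datetime

def total_covered_days(ranges, fmt='%m/%d/%Y'):
    """Sum the days covered by the ranges, counting overlaps only once."""
    spans = sorted(
        (datetime.strptime(r['start'], fmt), datetime.strptime(r['end'], fmt))
        for r in ranges
    )
    if not spans:
        return 0
    total = 0
    cur_start, cur_end = spans[0]
    for start, end in spans[1:]:
        if start <= cur_end:
            # Overlapping or touching: extend the current merged span
            cur_end = max(cur_end, end)
        else:
            # Gap: close the current span and start a new one
            total += (cur_end - cur_start).days
            cur_start, cur_end = start, end
    return total + (cur_end - cur_start).days

ranges = [
    {'start': '1/1/2001', 'end': '1/1/2002'},
    {'start': '1/1/2000', 'end': '1/1/2002'},
    {'start': '1/1/2003', 'end': '1/1/2004'},
]
print(total_covered_days(ranges))  # 1096
```

The result is 731 days for 2000 through 2001 (2000 is a leap year) plus 365 days for 2003, which is 365 fewer than the unmerged sum.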

Sorting Array using Sorting Algorithm in Python

As part of my project, I want to build a database that sorts people by age, computed from their birthdate.
import datetime
profile = (
('Joe', 'Clark', '1989-11-20'),
('Charlie', 'Babbitt', '1988-11-20'),
('Frank', 'Abagnale', '2002-11-20'),
('Bill', 'Clark', '2009-11-20'),
('Alan', 'Clark', '1925-11-20'),
)
age_list = []
for prof in profile:
    date = prof[2]
    datem = datetime.datetime.strptime(date, "%Y-%m-%d")
    tod = datem.day
    mos = datem.month
    yr = datem.year
    today_date = datetime.datetime.now()
    dob = datetime.datetime(yr, mos, tod)
    time_diff = today_date - dob
    Age = time_diff.days // 365
    age_list.append(Age)
def insertionsort(age_list):
    for him in range(1, len(age_list)):
        call = him - 1
        # Check the index first so that a negative index is never read
        while call >= 0 and age_list[call] > age_list[call + 1]:
            age_list[call], age_list[call + 1] = age_list[call + 1], age_list[call]
            call -= 1
insertionsort(age_list)
print("")
print("\t\t\t\t\t\t\t\t\t\t\t---Insertion Sort---")
print("Sorted Array of Age: ", age_list)
and the output would be:
---Insertion Sort---
Sorted Array of Age: [12, 19, 32, 33, 96]
But that's not what I want. I don't want just the age; I also want the other elements to be included in the output.
So instead of the earlier output, what I want is:
---Insertion Sort---
Sorted Array of Age: [Bill, Clark, 12]
[Frank, Abagnale, 19]
[Joe, Clark, 32]
[Charlie, Babbitt, 33]
[Alan, Clark, 96]
Thank you in advance!
As you want to keep your own insertion sort implementation, I would suggest putting the date of birth as the first tuple member: that way you can just compare tuples in your sorting implementation. The date of birth is in fact a better value to sort by (but reversed) than the age, as the date has more precision (day) compared to the age (year).
Secondly, your algorithm to calculate the age is error prone, as not all years have 365 days. Use the code as provided in this question:
import datetime
def calculate_age(born):
    today = datetime.date.today()
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))
def insertionsort(lst):
    for i, value in enumerate(lst):
        for j in range(i - 1, -1, -1):
            if lst[j] > value: # this will give a sort in descending order
                break
            lst[j], lst[j + 1] = lst[j + 1], lst[j]
# Your example data as a list
profiles = [
    ('Joe', 'Clark', '1989-11-20'),
    ('Charlie', 'Babbitt', '1988-11-20'),
    ('Frank', 'Abagnale', '2002-11-20'),
    ('Bill', 'Clark', '2009-11-20'),
    ('Alan', 'Clark', '1925-11-20'),
]
# Put date of birth first, and append age
profiles = [(dob, first, last, calculate_age(datetime.datetime.strptime(dob, "%Y-%m-%d")))
            for first, last, dob in profiles]
insertionsort(profiles)
print(profiles)
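To illustrate why dividing the day count by 365 is error prone, here is a small sketch comparing the two calculations on a fixed pair of dates. The helper names age_by_days and age_by_calendar, and the dates themselves, are chosen only for this illustration.

```python
from datetime import date

def age_by_days(born, today):
    # The approach from the question: assumes every year has 365 days
    return (today - born).days // 365

def age_by_calendar(born, today):
    # Subtract one if this year's birthday has not happened yet
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

born = date(2000, 1, 1)
today = date(2019, 12, 28)  # fixed date so the result is reproducible

print(age_by_days(born, today))      # 20
print(age_by_calendar(born, today))  # 19
```

The five leap days between the two dates push the day count to 7301, just past 20 * 365, although the twentieth birthday is still four days away.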
results = sorted(profile, key=lambda x: datetime.datetime.strptime(x[2], "%Y-%m-%d"))
You could do it like this. Note that the strptime function may not be necessary for you but it implicitly validates the format of the date in your input data. Also note that because the dates are in the form of YYYY-MM-DD they can be sorted lexically to give the desired result.
from datetime import datetime
from dateutil.relativedelta import relativedelta
profile = (
('Joe', 'Clark', '1989-11-20'),
('Charlie', 'Babbitt', '1988-11-20'),
('Frank', 'Abagnale', '2002-11-20'),
('Bill', 'Clark', '2009-11-20'),
('Alan', 'Clark', '1925-11-20')
)
for person in sorted(profile, key=lambda e: e[2], reverse=True):
    age = relativedelta(datetime.today(), datetime.strptime(person[2], '%Y-%m-%d')).years
    print(f'{person[0]}, {person[1]}, {age}')
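The claim that YYYY-MM-DD strings sort lexically in date order can be checked with a short sketch. It uses the birthdates from the example data; the variable names are illustrative.

```python
from datetime import datetime

dobs = ['1989-11-20', '1988-11-20', '2002-11-20', '2009-11-20', '1925-11-20']

# Plain string sort versus a sort on the parsed dates
by_text = sorted(dobs)
by_date = sorted(dobs, key=lambda s: datetime.strptime(s, '%Y-%m-%d'))

print(by_text == by_date)  # True
```

This holds because the fields are zero-padded and ordered from most to least significant. A format such as 20-11-1989 would not sort correctly as text.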

How to tell if two dates in a list are consecutive in Python?

I have a list of date strings, sorted in order:
sorteddates =['2017-04-26', '2017-05-05', '2017-05-10', '2017-05-11', '2017-05-16']
I have tried using the following to find consecutive dates, but I am having a difficult time understanding it. I want to see which two dates are consecutive. Only two dates.
dates = [datetime.strptime(d, "%Y-%m-%d") for d in sorteddates]
date_ints = set([d.toordinal() for d in dates])
Convert the list from str to datetime -- still in sorted order.
Iterate through the list; for each item, check to see whether the next item is one day later -- datetime has timedelta values as well.
Some code:
import datetime
# Convert list to datetime; you've shown you can do that part.
dates = [datetime.datetime.strptime(d, "%Y-%m-%d") for d in sorteddates]
one_day = datetime.timedelta(days=1)
for today, tomorrow in zip(dates, dates[1:]):
    if today + one_day == tomorrow:
        print("SUCCESS")
If I understand your question correctly, to get first pair of consecutive dates you can check if their delta is 1 day:
from datetime import datetime
sorteddates =['2017-04-26', '2017-05-05', '2017-05-10', '2017-05-11', '2017-05-16']
dates = [datetime.strptime(d, "%Y-%m-%d") for d in sorteddates]
d = next(((d1, d2) for d1, d2 in zip(dates, dates[1:]) if (d2 - d1).days == 1), None ) # <-- returns pair or None if no consecutive dates are found
print(d)
Prints:
(datetime.datetime(2017, 5, 10, 0, 0), datetime.datetime(2017, 5, 11, 0, 0))
Or formatted:
if d:
print([datetime.strftime(i, "%Y-%m-%d") for i in d])
Prints:
['2017-05-10', '2017-05-11']
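If every consecutive pair is wanted rather than just the first, the same idea can be written with toordinal(), which the question already uses. This is a minimal sketch under that assumption; the names ordinals and pairs are illustrative.

```python
from datetime import datetime

sorteddates = ['2017-04-26', '2017-05-05', '2017-05-10', '2017-05-11', '2017-05-16']

# toordinal() maps each date to a day number, so consecutive days differ by 1
ordinals = [datetime.strptime(d, "%Y-%m-%d").toordinal() for d in sorteddates]

pairs = [
    (a, b)
    for a, b, oa, ob in zip(sorteddates, sorteddates[1:], ordinals, ordinals[1:])
    if ob - oa == 1
]
print(pairs)  # [('2017-05-10', '2017-05-11')]
```

This relies on the list already being sorted, as stated in the question.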

Downloading Real time rainfall data from Data.gov.sg API (json) with Python

I would like to download real-time rainfall data from the Data.gov.sg API (in json format) with Python from weather station S17 from this link, but my code is throwing:
TypeError: string indices must be integers
I would appreciate it if anyone could help. Thanks!
Code
#!/usr/bin/env python
import requests
import json
import datetime as dt
from xlwt import Workbook
start_date = dt.datetime(2018, 8, 14, 00, 00, 00)
end_date = dt.datetime(2018, 8, 19, 00, 00, 00)
total_days = (end_date - start_date).days + 1
neadatasum = []
for day_number in range(total_days):
    for day_time in range(0, 24, 86400):
        current_date = (start_date + dt.timedelta(days = day_number)).date()
        current_time = (start_date + dt.timedelta(hours = day_time)).time()
        url = 'https://api.data.gov.sg/v1/environment/rainfall?date_time=' + str(current_date) + 'T' + \
              str(current_time)
        headers = {"api-key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
        data = requests.get(url, headers=headers).json()
        actualtime = str(current_date) + 'T' + str(current_time) + '+08:00'
        print(current_date, current_time)
        if not data['items'][0]:
            station_id = 'S71'
            value = 'Nan'
            neadatasum1 = [current_date, current_time, value]
            neadatasum.append(neadatasum1)
        else:
            datatime = data['items'][0]['timestamp']
            if actualtime != datatime:
                station_id = 'S71'
                value = 'Nan'
                neadatasum1 = [current_date, current_time, value]
                neadatasum.append(neadatasum1)
            else:
                station_id = 'S71'
                value = data['items'][0]['timestamp']['readings']['station_id', 'value']
                neadatasum1 = [current_date, current_time, value]
                neadatasum.append(neadatasum1)
print(neadatasum)
wb = Workbook()
sheet1 = wb.add_sheet('Rainfall 5 minutes(mm)')
sheet1.write(0, 0, 'Date')
sheet1.write(0, 1, 'Time')
sheet1.write(0, 2, 'Rainfall')
for i, j in enumerate(neadatasum):
    for k, l in enumerate(j):
        sheet1.write(i+1, k, l)
wb.save('Rainfall 5 minutes(mm)(14082018 to 19082018).xls')
Traceback (most recent call last):
  File "C:/Users/erilpm/AppData/Local/Programs/Python/Python36-32/Rainfall.py", line 36, in <module>
    value = data['items'][0]['timestamp']['readings']['station_id', 'value']
TypeError: string indices must be integers
I am not 100% sure if I understand what it is you want this code to do, but I am giving this my best shot. Please comment if I misunderstood your question!
OK, so first of all, you can't index in
data['items'][0]['timestamp']['readings']['station_id', 'value']
... if your data doesn't look like
data[ items , ..., ... ]
^
[ 0, 1, ... ]
^
[..., timestamp, ... ]
^
[readings, ..., ...]
^
[ (station_id, value), (...,...), ..., ... ]
... and indeed, that isn't really what your data seems to look like. In fact if I do print(data['items'][0].keys()), all I get back are ['timestamp','readings'].
Note that when I print data['items'] I get:
[{'timestamp': '2018-08-14T00:00:00+08:00', 'readings': [{'station_id': 'S77', 'value': 0}, {'station_id': 'S109', 'value': 0}, {'station_id': 'S117', 'value': 0}, {'station_id': 'S55', 'value': 0}, {'station_id': 'S64', 'value': 0}, {'station_id': 'S90', 'value': 0}, etc etc etc
Ok, so I think you want a list of things like (current time, current date, some value recorded at that specific station). The problem is you're looking at a list of values, not all of them from the station you want ... not just one value. So, here is my proposal. Replace everything in the end of your loop with the following:
# ok, so you like station S71, right?
station_id = 'S71'
# use list comprehension to get only values from that station
values = [a['value'] for a in data['items'][0]['readings'] if a['station_id'] == station_id]
# iterate over all the matching values
for value in values:
    # and add them to your nifty list!
    neadatasum1 = [current_date, current_time, value]
    neadatasum.append(neadatasum1)
I hope this makes sense! If this does not work, or does not answer your question, or if I misunderstood your question, please comment accordingly and I (or someone more talented) will try and fix this or otherwise answer whatever remains to be answered :)
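The lookup can also be pulled into a small helper that returns a fallback when the station is missing. This is only a sketch: the function name station_value is hypothetical, and the sample payload below is hand-written to mirror the structure printed above, not a real API response.

```python
def station_value(payload, station_id, default=None):
    """Return the reading for station_id from the first item, or default."""
    items = payload.get('items') or []
    if not items:
        return default
    # 'readings' is a list of dicts, each with 'station_id' and 'value'
    for reading in items[0].get('readings', []):
        if reading.get('station_id') == station_id:
            return reading.get('value', default)
    return default

# Hand-written sample with the same shape as the printed data (assumption)
sample = {'items': [{'timestamp': '2018-08-14T00:00:00+08:00',
                     'readings': [{'station_id': 'S77', 'value': 0},
                                  {'station_id': 'S109', 'value': 0}]}]}

print(station_value(sample, 'S77'))         # 0
print(station_value(sample, 'S71', 'Nan'))  # Nan
```

Inside the original loop, the call would take the place of the failing line, with data as the payload and 'Nan' as the default.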

Aggregate Monthly Values

I have a Python list containing multiple lists:
A = [['1/1/1999', '3.0'],
['1/2/1999', '4.5'],
['1/3/1999', '6.8'],
......
......
['12/31/1999', '8.7']]
What I need is to combine all the values corresponding to each month, preferably in the form of a dictionary containing months as keys and their values as values.
Example:
>>> A['1/99']
>>> ['3.0', '4.5', '6.8'.....]
Or in the form of a list of list, so that:
Example:
>>> A[0]
>>> ['3.0', '4.5', '6.8'.....]
Thanks.
Pandas is perfect for this, if you don't mind another dependency:
For example:
import pandas
import numpy as np
# Generate some data
dates = pandas.date_range('1/1/1999', '12/31/1999')
values = (np.random.random(dates.size) - 0.5).cumsum()
df = pandas.DataFrame(values, index=dates)
for month, values in df.groupby(lambda x: x.month):
    print(month)
    print(values)
The really neat thing, though, is aggregation of the grouped DataFrame. For example, if we wanted to see the min, max, and mean of the values grouped by month:
print(df.groupby(lambda x: x.month).agg(['min', 'max', 'mean']))
This yields:
          min       max      mean
1   -0.812627  1.247057  0.328464
2   -0.305878  1.205256  0.472126
3    1.079633  3.862133  2.264204
4    3.237590  5.334907  4.025686
5    3.451399  4.832100  4.303439
6    3.256602  5.294330  4.258759
7    3.761436  5.536992  4.571218
8    3.945722  6.849587  5.513229
9    6.630313  8.420436  7.462198
10   4.414918  7.169939  5.759489
11   5.134333  6.723987  6.139118
12   4.352905  5.854000  5.039873
from collections import defaultdict
from datetime import date
month_aggregate = defaultdict(list)
for d, v in A:
    month, day, year = map(int, d.split('/'))
    # Use the first day of the month as the key (a new name avoids shadowing date)
    key = date(year, month, 1)
    month_aggregate[key].append(v)
I iterate over each date and value, I pull out the year and month and create a date with those values. I then append the value to a list associated with that year and month.
Alternatively, if you want to use a string as a key, then you can do this:
from collections import defaultdict
month_aggregate = defaultdict(list)
for d, v in A:
    month, day, year = d.split('/')
    month_aggregate[month + "/" + year[2:]].append(v)
Here is my solution without any imports:
def getKeyValue(lst):
    a = lst[0].split('/')
    return '%s/%s' % (a[0], a[2][2:]), lst[1]
def createDict(lst):
    d = {}
    for e in lst:
        k, v = getKeyValue(e)
        if k not in d:
            d[k] = [v]
        else:
            d[k].append(v)
    return d
A = [['1/1/1999', '3.0'],
     ['1/2/1999', '4.5'],
     ['1/3/1999', '6.8'],
     ['12/31/1999', '8.7']]
print(createDict(A))
>>>{'1/99': ['3.0', '4.5', '6.8'], '12/99': ['8.7']}
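Since the values are stored as strings, a numeric aggregate needs a conversion step. Below is a minimal sketch that groups by (year, month) and computes a monthly mean. It assumes the dates are in month/day/year order, as the sample data suggests, and the names by_month and means are illustrative.

```python
from collections import defaultdict
from datetime import datetime

A = [['1/1/1999', '3.0'],
     ['1/2/1999', '4.5'],
     ['1/3/1999', '6.8'],
     ['12/31/1999', '8.7']]

by_month = defaultdict(list)
for text, value in A:
    day = datetime.strptime(text, '%m/%d/%Y')
    # Convert to float so the values can be aggregated numerically
    by_month[(day.year, day.month)].append(float(value))

means = {key: sum(vals) / len(vals) for key, vals in by_month.items()}
for key in sorted(means):
    print(key, round(means[key], 2))
```

Using a (year, month) tuple as the key keeps the months in chronological order when sorted, which a string key such as '12/99' would not.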
