find repeating dates between two datetime arrays python


I have two datetime arrays, and I am trying to output an array containing only the dates that appear in both arrays. I feel like this is something I should be able to answer myself, but I have spent a lot of time searching and I do not understand how to solve this.
>>> datetime1[0:4]
array([datetime.datetime(2014, 6, 19, 4, 0),
datetime.datetime(2014, 6, 19, 5, 0),
datetime.datetime(2014, 6, 19, 6, 0),
datetime.datetime(2014, 6, 19, 7, 0)], dtype=object)
>>> datetime2[0:4]
array([datetime.datetime(2014, 6, 19, 3, 0),
datetime.datetime(2014, 6, 19, 4, 0),
datetime.datetime(2014, 6, 19, 5, 0),
datetime.datetime(2014, 6, 19, 6, 0)], dtype=object)
I've tried the following, but I still don't understand why it does not work:
>>> np.where(datetime1==datetime2)
(array([], dtype=int64),)

This:
datetime1==datetime2
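is an element-wise comparison. It compares [0] with [0], then [1] with [1], and so on, and gives you a boolean array. In your sample, datetime2 starts one hour earlier than datetime1, so the values at the same positions never match and np.where finds nothing.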
Instead, try:
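np.isin(datetime1, datetime2)
This gives you a boolean array the same shape as datetime1, set to True for those elements that exist in datetime2. (np.in1d(datetime1, datetime2) does the same, but it is deprecated as of NumPy 2.0.) Pass the result to np.where if you need the indexes.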
If your goal is only to get the values rather than the indexes, use this:
np.intersect1d(datetime1, datetime2)
https://docs.scipy.org/doc/numpy/reference/generated/numpy.intersect1d.html
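A minimal sketch putting the three approaches side by side, built only from the four sample elements shown in the question (the full arrays are assumed to look the same). It uses np.isin for the membership test and np.nonzero to turn the mask into indexes.

```python
import datetime
import numpy as np

# Object arrays modeled on the first four elements shown in the question
datetime1 = np.array([datetime.datetime(2014, 6, 19, h, 0) for h in (4, 5, 6, 7)], dtype=object)
datetime2 = np.array([datetime.datetime(2014, 6, 19, h, 0) for h in (3, 4, 5, 6)], dtype=object)

# Element-wise: [0] vs [0], [1] vs [1], ...; all False because of the one-hour offset
elementwise = datetime1 == datetime2

# Membership: True where an element of datetime1 occurs anywhere in datetime2
mask = np.isin(datetime1, datetime2)
indices = np.nonzero(mask)[0]

# Sorted, de-duplicated values present in both arrays
common = np.intersect1d(datetime1, datetime2)

print(elementwise)  # [False False False False]
print(indices)      # [0 1 2]
print(common)
```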

I would say just iterate over the values of datetime1 and datetime2 and check for containment. So for example:
for date in datetime1:
    if date in datetime2:
        print(date)
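`date in datetime2` scans all of datetime2 for every date, which gets slow for large arrays. A minimal sketch of the same loop with a set lookup instead, assuming the elements are hashable datetime.datetime objects; the helper name common_in_order is hypothetical.

```python
import datetime

def common_in_order(first, second):
    """Return the elements of `first` that also occur in `second`, keeping first's order."""
    # datetime objects are hashable, so a set gives O(1) average membership tests
    lookup = set(second)
    return [d for d in first if d in lookup]

datetime1 = [datetime.datetime(2014, 6, 19, h, 0) for h in (4, 5, 6, 7)]
datetime2 = [datetime.datetime(2014, 6, 19, h, 0) for h in (3, 4, 5, 6)]
print(common_in_order(datetime1, datetime2))
```

Building the set costs one pass over datetime2, after which each check is constant time on average instead of a full scan.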

Related

convert yyyymmdd to serial number python

How do I convert a list of dates that are in the form yyyymmdd to a serial number? For example, if I have this list of dates:
t = [1898-10-12 06:00,1898-10-12 12:00,1932-09-30 08:00,1932-09-30 00:00]
How do I convert each date to a serial number? I'm currently using the datetime toordinal() method, but dates on the same day are being rounded to the same serial number. How do I get the same dates with different times to map to different numbers?
The items in the list are datetime.datetime objects. I then tried:
thurser = []
for i in range(len(t)):
    thurser.append(t[i].toordinal())
But I am not getting serial numbers as floats.

datetime.toordinal() considers only the 'date' part of the datetime object, not the time. So does date.toordinal() - it only has a date part. The first 2 and last 2 elements in your list have datetimes on the same date but at different times, which .toordinal ignores. So, .toordinal will give you the same value for those same-dated datetimes.
In general, the solution would be to calculate the delta between your dates and a pre-determined/fixed one. I'm using datetime.datetime(1, 1, 1), the earliest possible datetime, so all the deltas are positive:
import datetime
thurser = []
# assuming t is a list of datetime objects
for d in t:
    delta = d - datetime.datetime(1, 1, 1)
    thurser.append(delta.days + delta.seconds / (24 * 3600))
>>> print(thurser)
[693149.25, 693149.5, 705555.3333333334, 705555.0]
And if you prefer ints instead of floats, then use seconds instead of days:
thurser.append(int(delta.total_seconds())) # total_seconds has microseconds in the float
>>> print(thurser)
[59888095200, 59888116800, 60959980800, 60959952000]
And to get back the original values in the 2nd example:
>>> [datetime.timedelta(seconds=d) + datetime.datetime(1, 1, 1) for d in thurser]
[datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0),
datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]
>>> _ == t # compare with original values
True
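On Python 3, dividing one timedelta by another returns a float, which gives the fractional-day serial number directly and also keeps microseconds (delta.seconds ignores them). A minimal sketch; the names EPOCH and serial_days are illustrative, and the reference point is the same datetime.datetime(1, 1, 1) used above.

```python
import datetime

EPOCH = datetime.datetime(1, 1, 1)  # same reference point as the answer above

def serial_days(d):
    """Days since EPOCH as a float; the time of day becomes the fractional part."""
    # timedelta / timedelta returns a float and includes seconds and microseconds
    return (d - EPOCH) / datetime.timedelta(days=1)

t = [datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0),
     datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]

# toordinal() drops the time of day, so same-day datetimes collide
print([d.toordinal() for d in t])  # [693150, 693150, 705556, 705556]
# serial_days() keeps them apart
print([serial_days(d) for d in t])
```

One caveat: near 700,000 days a float only resolves to roughly 10 microseconds, so if you need an exact round trip back to datetime, the integer-seconds variant above is the safer choice.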

Let me know if I've misunderstood, but the following gives a distinct number for each value in the list.
I replaced
t = ['1898-10-12 06:00','1898-10-12 12:00','1932-09-30 08:00','1932-09-30 00:00']
with
t = [datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0), datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]
because, as mentioned in the comments, it is a list of datetime.datetime objects.
I compute the total number of milliseconds between 1970-01-01 00:00:00 and each date to generate the number.
Dates before 1970 therefore give negative values, but the values are still distinct.
import datetime
t = [datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0), datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]
thurser = []
x = []
for d in t:
    thurser.append(d.toordinal())
    # milliseconds since 1970-01-01 00:00 (same value as utcfromtimestamp(0), which is deprecated since Python 3.12)
    x.append((d - datetime.datetime(1970, 1, 1)).total_seconds() * 1000.0)
print(thurser)
print(x)
Output:
[693150, 693150, 705556, 705556]
[-2247501600000.0, -2247480000000.0, -1175616000000.0, -1175644800000.0]
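Multiplying total_seconds() by 1000 goes through a float. A minimal sketch that instead uses floor division of timedeltas, which returns an exact int; like the answer, it assumes the naive datetimes represent UTC. The names UNIX_EPOCH and epoch_millis are illustrative.

```python
import datetime

# Naive epoch, equal to datetime.datetime.utcfromtimestamp(0)
UNIX_EPOCH = datetime.datetime(1970, 1, 1)

def epoch_millis(d):
    """Whole milliseconds from 1970-01-01 00:00 to the naive datetime d (negative before 1970)."""
    # timedelta // timedelta returns an int, so there is no float rounding
    return (d - UNIX_EPOCH) // datetime.timedelta(milliseconds=1)

t = [datetime.datetime(1898, 10, 12, 6, 0), datetime.datetime(1898, 10, 12, 12, 0),
     datetime.datetime(1932, 9, 30, 8, 0), datetime.datetime(1932, 9, 30, 0, 0)]
millis = [epoch_millis(d) for d in t]
print(millis)  # [-2247501600000, -2247480000000, -1175616000000, -1175644800000]
```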

Find third latest date in a list

I have a situation where I need to get the third latest date, for example:
INPUT:
['14-04-2001', '29-12-2061', '21-10-2019',
'07-01-1973', '19-07-2014','11-03-1992','21-10-2019']
Also, INPUT (a count followed by that many dates):
6
14-04-2001
29-12-2061
21-10-2019
07-01-1973
19-07-2014
11-03-1992
OUTPUT: 19-07-2014 (21-10-2019 appears twice, but it counts only once)
import datetime
datelist = ['14-04-2001', '29-12-2061', '21-10-2019', '07-01-1973', '19-07-2014','11-03-1992','21-10-2019' ]
for d in datelist:
    x = datetime.datetime.strptime(d, '%d-%m-%Y')
    print(x)
How can I achieve this?

You can sort the list and take the 3rd element from it.
my_list = [datetime.datetime.strptime(d, '%d-%m-%Y') for d in datelist]
# [datetime.datetime(2001, 4, 14, 0, 0), datetime.datetime(2061, 12, 29, 0, 0), datetime.datetime(2019, 10, 21, 0, 0), datetime.datetime(1973, 1, 7, 0, 0), datetime.datetime(2014, 7, 19, 0, 0), datetime.datetime(1992, 3, 11, 0, 0), datetime.datetime(2019, 10, 21, 0, 0)]
my_list.sort(reverse=True)
my_list[2]
# datetime.datetime(2019, 10, 21, 0, 0)
Also, as Kerorin suggested, if you don't need to sort in place and only need the third element, you can simply do:
sorted(my_list, reverse=True)[2]
Update
To remove the duplicates (without this, the repeated 21-10-2019 makes my_list[2] return 2019-10-21 rather than 2014-07-19), taking inspiration from this answer, you can do the following:
import datetime
datelist = ['14-04-2001', '29-12-2061', '21-10-2019', '07-01-1973', '19-07-2014', '11-03-1992', '21-10-2019']
seen = set()
my_list = [datetime.datetime.strptime(d,'%d-%m-%Y')
for d in datelist
if d not in seen and not seen.add(d)]
my_list.sort(reverse=True)
my_list[2]
# datetime.datetime(2014, 7, 19, 0, 0)
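Putting the parsing, de-duplication and sorting together, here is a minimal sketch that returns the answer in the same dd-mm-YYYY format as the expected OUTPUT. The function names are hypothetical, and parse_input assumes the second input form is a count on the first line followed by that many whitespace-separated dates.

```python
import datetime

FMT = '%d-%m-%Y'

def nth_latest(date_strings, n=3):
    """Return the n-th latest distinct date, formatted like the input."""
    # Parse, then use a set to drop duplicates such as the repeated '21-10-2019'
    unique = {datetime.datetime.strptime(s, FMT) for s in date_strings}
    if len(unique) < n:
        raise ValueError(f'need at least {n} distinct dates, got {len(unique)}')
    return sorted(unique, reverse=True)[n - 1].strftime(FMT)

def parse_input(text):
    """Hypothetical reader for the 'count, then dates' form of the input."""
    tokens = text.split()
    count = int(tokens[0])
    return tokens[1:1 + count]

datelist = ['14-04-2001', '29-12-2061', '21-10-2019', '07-01-1973',
            '19-07-2014', '11-03-1992', '21-10-2019']
print(nth_latest(datelist))  # 19-07-2014

raw = '6\n14-04-2001\n29-12-2061\n21-10-2019\n07-01-1973\n19-07-2014\n11-03-1992'
print(nth_latest(parse_input(raw)))  # 19-07-2014
```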

You can use heapq.nlargest to do this.
import heapq
from datetime import datetime
datelist = [
'14-04-2001',
'29-12-2061',
'21-10-2019',
'07-01-1973',
'19-07-2014',
'11-03-1992',
'21-10-2019'
]
heapq.nlargest(3, {datetime.strptime(d, "%d-%m-%Y") for d in datelist})[-1]
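This returns datetime.datetime(2014, 7, 19, 0, 0). The set comprehension drops the duplicate 21-10-2019 before the three largest dates are selected.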

Django Queryset Datetime Filter

I have a column formatted as such in one of my models:
TEMP_START = models.DateTimeField(null=True)
And I am attempting to do an exact lookup using queryset syntax such as
x.filter(TEMP_START=my_datetime_object) # x can be thought of as y.objects.all()
This returns no objects, when it should return more than one. However,
x.filter(TEMP_START__date=my_datetime_object.date()).filter(TEMP_START__hour=my_datetime_object.hour)
Does return the proper objects (they're hourly). Are direct datetime filters not supported, and thus keywords must be used?
Edit, with the failing results:
Searching for: {'TEMP_START': datetime.datetime(2016, 3, 31, 2, 0)}
Values in column: [{'TEMP_START': datetime.datetime(2016, 3, 29, 8, 0)}, {'TEMP_START': datetime.datetime(2016, 3, 29, 14, 0)}, {'TEMP_START': datetime.datetime(2016, 3, 30, 2, 0)}, {'TEMP_START': datetime.datetime(2016, 3, 29, 20, 0)}, {'TEMP_START': datetime.datetime(2016, 3, 30, 8, 0)}, {'TEMP_START': datetime.datetime(2016, 3, 30, 20, 0)}, {'TEMP_START': datetime.datetime(2016, 3, 31, 2, 0)}, {'TEMP_START': datetime.datetime(2016, 3, 30, 14, 0)}]
Values being returned: []
Code:
args_timeframe_start = {variables.temp_start: self.ranked_timeframes[rank][variables.temp_start]}
print(args_timeframe_start)
print(self.query_data.values(variables.temp_start).distinct())
query_data = self.query_data.filter(**args_timeframe_start)
print(query_data.values(variables.temp_start).distinct())

You need to find out what my_datetime_object actually contains. Most likely the issue is that a DateTimeField holds Python datetime.datetime objects, which are made up of year, month, day, hour, minute, second and microsecond. If you compare only the date and the hour, you will get results, but an exact match also requires my_datetime_object to agree with a stored record on minute, second and microsecond, which you cannot guarantee.
Try this quickly to see what a datetime looks like, and see also the Django documentation for DateTimeField:
from datetime import datetime
date = datetime.now()
print(date)
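A minimal, Django-free sketch of the point above: an exact comparison checks every field, so a lookup value with leftover microseconds will not equal an hourly value. truncate_to_hour is a hypothetical helper, and the 123456 microseconds are made up for illustration. In the queryset you could then filter on TEMP_START=truncate_to_hour(my_datetime_object), or use a half-open range such as TEMP_START__gte=start, TEMP_START__lt=start + timedelta(hours=1).

```python
import datetime

def truncate_to_hour(dt):
    """Drop minutes, seconds and microseconds so the value lines up with hourly rows."""
    return dt.replace(minute=0, second=0, microsecond=0)

# An hourly value like the ones stored in TEMP_START
stored = datetime.datetime(2016, 3, 31, 2, 0)
# Hypothetical lookup value carrying leftover microseconds
lookup = datetime.datetime(2016, 3, 31, 2, 0, 0, 123456)

print(stored == lookup)                    # False: an exact match compares every field
print(stored == truncate_to_hour(lookup))  # True
```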

how to perform histogram on numpy array whose dtype is object using histogramdd?

I want to compute a histogram of an (N, 3) NumPy array whose three columns represent longitude, latitude and timestamp respectively, like this:
array([[116.45565032958984, 39.889976501464844,
datetime.datetime(2012, 10, 1, 6, 32, 39)],
[116.45565032958984, 39.889984130859375,
datetime.datetime(2012, 10, 1, 6, 33, 31)],
[116.45565032958984, 39.889984130859375,
datetime.datetime(2012, 10, 1, 6, 33, 33)],
[116.45565032958984, 39.889984130859375,
datetime.datetime(2012, 10, 1, 6, 33, 37)],
[116.45561981201172, 39.89040756225586,
datetime.datetime(2012, 10, 1, 6, 34, 42)],
[116.45561981201172, 39.890411376953125,
datetime.datetime(2012, 10, 1, 6, 36, 40)],
[116.45549774169922, 39.8941650390625,
datetime.datetime(2012, 10, 1, 6, 37, 54)],
[116.45556640625, 39.92431640625,
datetime.datetime(2012, 10, 1, 6, 38, 57)],
[116.45578002929688, 39.93780517578125,
datetime.datetime(2012, 10, 1, 6, 42, 10)],
[116.44468688964844, 39.93989944458008,
datetime.datetime(2012, 10, 1, 6, 43, 21)]], dtype=object)
I tried to use np.histogramdd like this:
import numpy as np
np.histogramdd(my_data, bins = (lon_bin_num, lat_bin_num, time_bin_num),
range = [[lon_min, lon_max], [lat_min, lat_max],
[start_datetime, end_datetime]])
And got TypeError:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-271-58c94eecf21d> in <module>()
1 np.histogramdd(tmp2, bins = (lon_bin_num, lat_bin_num, time_bin_num),
----> 2 range = [[lon_min, lon_max], [lat_min, lat_max], [start_datetime, end_datetime]])
/*/*/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.pyc in histogramdd(sample, bins, range, normed, weights)
318 smax = zeros(D)
319 for i in arange(D):
--> 320 smin[i], smax[i] = range[i]
321
322 # Make sure the bins have a finite width.
TypeError: float() argument must be a string or a number
I know the datetime objects are causing the error, but how do I fix it? More generally, how can I compute a histogram of a NumPy ndarray whose dtype is object?

A lot of NumPy functions do not work with arrays of dtype object. To use np.histogramdd, you'll need an array of shape (N, D), so structured arrays will not be helpful here either (since a structured array would remove the D dimension). You'll need an array of a homogeneous, non-object dtype. Since the first two columns are floats, let's try to represent the third column as floats too:
You could convert the dates into NumPy's native datetime64[s] dtype:
In [102]: dates = np.array(my_data[:, 2],dtype='<M8[s]')
In [103]: dates
Out[103]:
array(['2012-10-01T02:32:39-0400', '2012-10-01T02:33:31-0400',
'2012-10-01T02:33:33-0400', '2012-10-01T02:33:37-0400',
'2012-10-01T02:34:42-0400', '2012-10-01T02:36:40-0400',
'2012-10-01T02:37:54-0400', '2012-10-01T02:38:57-0400',
'2012-10-01T02:42:10-0400', '2012-10-01T02:43:21-0400'], dtype='datetime64[s]')
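and then use astype to convert those datetime64[s] values into floats (the -0400 offsets above come from older NumPy versions, which displayed datetime64 values in the local timezone; the stored values still correspond to 06:32:39 UTC and so on):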
In [104]: float_dates = dates.astype('float')
In [105]: float_dates
Out[105]:
array([ 1.34907316e+09, 1.34907321e+09, 1.34907321e+09,
1.34907322e+09, 1.34907328e+09, 1.34907340e+09,
1.34907347e+09, 1.34907354e+09, 1.34907373e+09,
1.34907380e+09])
Now form a new array with dtype float:
arr = np.empty_like(my_data, dtype='float')
arr[:, 0:2] = my_data[:, 0:2]
arr[:, 2] = float_dates
# xedges, yedges, zedges: the bin edges (or bin counts) you choose for each axis
hist, edges = np.histogramdd(arr, bins=(xedges, yedges, zedges))
While this will give you a histogram, you may also need to re-interpret the floats as dates. You can do that with astype. To obtain datetime64[s]:
In [99]: float_dates.astype('<M8[s]')
Out[99]:
array(['2012-10-01T02:32:39-0400', '2012-10-01T02:33:31-0400',
'2012-10-01T02:33:33-0400', '2012-10-01T02:33:37-0400',
'2012-10-01T02:34:42-0400', '2012-10-01T02:36:40-0400',
'2012-10-01T02:37:54-0400', '2012-10-01T02:38:57-0400',
'2012-10-01T02:42:10-0400', '2012-10-01T02:43:21-0400'], dtype='datetime64[s]')
To obtain Python datetime.datetime objects:
In [116]: float_dates.astype('<M8[s]').tolist()
Out[116]:
[datetime.datetime(2012, 10, 1, 6, 32, 39),
datetime.datetime(2012, 10, 1, 6, 33, 31),
datetime.datetime(2012, 10, 1, 6, 33, 33),
datetime.datetime(2012, 10, 1, 6, 33, 37),
datetime.datetime(2012, 10, 1, 6, 34, 42),
datetime.datetime(2012, 10, 1, 6, 36, 40),
datetime.datetime(2012, 10, 1, 6, 37, 54),
datetime.datetime(2012, 10, 1, 6, 38, 57),
datetime.datetime(2012, 10, 1, 6, 42, 10),
datetime.datetime(2012, 10, 1, 6, 43, 21)]
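The traceback in the question actually comes from the range argument: histogramdd assigns each bound into a float array, and the time bounds are datetimes. A minimal end-to-end sketch that converts both the data column and the time range to epoch seconds; it uses a five-row subset of the question's data, and the bin counts and start/end datetimes are placeholders. Going through int64 before float avoids relying on a direct datetime64-to-float cast.

```python
import datetime
import numpy as np

# A few rows in the question's (lon, lat, timestamp) layout, as an object array
my_data = np.array([
    [116.45565032958984, 39.889976501464844, datetime.datetime(2012, 10, 1, 6, 32, 39)],
    [116.45565032958984, 39.889984130859375, datetime.datetime(2012, 10, 1, 6, 33, 31)],
    [116.45561981201172, 39.89040756225586, datetime.datetime(2012, 10, 1, 6, 34, 42)],
    [116.45556640625, 39.92431640625, datetime.datetime(2012, 10, 1, 6, 38, 57)],
    [116.44468688964844, 39.93989944458008, datetime.datetime(2012, 10, 1, 6, 43, 21)],
], dtype=object)

def to_epoch_seconds(values):
    """datetime objects -> float seconds since 1970-01-01, via datetime64[s] and int64."""
    return np.asarray(values, dtype='datetime64[s]').astype('int64').astype(float)

# Homogeneous float array for histogramdd
arr = np.empty(my_data.shape, dtype=float)
arr[:, :2] = my_data[:, :2].astype(float)
arr[:, 2] = to_epoch_seconds(my_data[:, 2])

# The time range must be numeric too (placeholder start/end)
start_datetime = datetime.datetime(2012, 10, 1, 6, 30)
end_datetime = datetime.datetime(2012, 10, 1, 6, 45)
t_range = to_epoch_seconds([start_datetime, end_datetime])

hist, edges = np.histogramdd(
    arr,
    bins=(4, 4, 3),  # placeholder bin counts for lon, lat, time
    range=[(arr[:, 0].min(), arr[:, 0].max()),
           (arr[:, 1].min(), arr[:, 1].max()),
           tuple(t_range)],
)

# Time-axis bin edges back as datetime64[s] for display
time_edges = edges[2].astype('int64').astype('datetime64[s]')
print(hist.shape, time_edges)
```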

Python intersection of multiple datetime lists

I'm trying to find the intersection list of 5 lists of datetime objects. I know the intersection of lists question has come up a lot on here, but my code is not performing as expected (like the ones from the other questions).
Here are the first 3 elements of each of the 5 lists, with each list's exact length in a trailing comment.
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38790
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38818
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38959
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38802
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 40415
I've made a list of these lists called times. I've tried 2 methods of intersecting.
Method 1:
intersection = times[0]  # make intersection the first list
for i in range(len(times)):
    if i == 0:
        continue
    intersection = [val for val in intersection if val in times[i]]
This method results in a list with length 20189 and takes 104 seconds to run.
Method 2:
intersection = times[0]  # make intersection the first list
for i in range(len(times)):
    if i == 0:
        continue
    intersection = list(set(intersection) & set(times[i]))
This method results in a list with length 20148 and takes 0.1 seconds to run.
I've run into two problems. First, the two methods yield intersections of different sizes, and I have no idea why. Second, the datetime object datetime.datetime(2014, 8, 14, 19, 25, 6) is clearly in all 5 lists (see above), but print(datetime.datetime(2014, 8, 14, 19, 25, 6) in intersection) prints False.

Your first list, times[0], has duplicate elements; that is the reason for the inconsistency. If you did intersection = list(set(times[0])) in your first snippet, the problem would go away.
As for your second snippet, the code will be faster if you don't keep converting between lists and sets:
intersection = set(times[0]) # make a set of the first list
for timeset in times[1:]:
intersection.intersection_update(timeset)
# if necessary make into a list again
intersection = list(intersection)
And since intersection accepts multiple iterables as separate arguments, you can replace all of your code with:
intersection = set(times[0]).intersection(*times[1:])
For the in intersection problem, is the instance an actual datetime.datetime or just pretending to be? At least the timestamps seem not to be timezone aware.
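A small reproduction of the length mismatch with made-up data: a duplicate in times[0] survives the list-comprehension method but not the set method. With genuine naive datetime.datetime objects, membership in the resulting set works as expected, so if it returns False on the real data, the elements are probably not plain datetimes. The helper dt only keeps the sample short, and sorted() turns the unordered set back into a chronological list.

```python
import datetime

def dt(sec):
    """Hypothetical shorthand for the question's timestamps."""
    return datetime.datetime(2014, 8, 14, 19, 25, sec)

# Tiny made-up lists; the first one contains a duplicate timestamp
times = [
    [dt(6), dt(6), dt(7), dt(9)],
    [dt(6), dt(7), dt(9), dt(10)],
    [dt(6), dt(7), dt(9)],
]

# Method 1 (list filtering) keeps the duplicate from times[0]
method1 = times[0]
for other in times[1:]:
    method1 = [val for val in method1 if val in other]

# Set intersection removes it
method2 = set(times[0]).intersection(*times[1:])

print(len(method1), len(method2))  # 4 3
print(dt(6) in method2)            # True
print(sorted(method2))             # chronological list
```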

Lists can have duplicate items, which can cause inconsistencies with the length. To avoid these duplicates, you can turn each list of datetimes into a set:
map(set, times)
This gives you the sets with duplicate times removed (as a list in Python 2; in Python 3, map returns a lazy iterator). To find the intersection, you can use set.intersection:
intersection = set.intersection(*map(set, times))
With your example, intersection will be this set:
set([datetime.datetime(2014, 8, 14, 19, 25, 9), datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7)])

There may be duplicate times; converting each list to a set handles that, and you can do it simply like this:
Python3:
import functools
result = functools.reduce(lambda x, y: set(x) & set(y), times)
Python2:
result = reduce(lambda x, y: set(x) & set(y), times)

intersection = set(*times[:1]).intersection(*times[1:])
