In order to combine time series data, I am left with the following essential step:
>>> xs1
array([ 0, 10, 12, 16, 25, 29])
>>> xs2
array([ 0, 5, 10, 15, 20, 25, 30])
What is the best way to get the following result:
>>> xs1_ = np.array([0,0,10,12,12,16,16,25,29,29])
>>> xs2_ = np.array([0,5,10,10,15,15,20,25,25,30])
This is to align the measurements taken at the times in xs1 and xs2.
Imagine that the measurement from series xs1 at time 0 is valid until the next measurement in this series is made, at time 10. We could interpolate both series onto a grid spaced by their greatest common divisor, but that is most likely 1 and creates a huge amount of bloat. It would therefore be better to interpolate only onto the union of xs1 and xs2. In xs1_ and xs2_, the x-values to compare are aligned by list index, i.e. we compare time 5 in series xs2_ with time 0 in series xs1_, because the next measurement in series xs1_ only comes later, at time 10. Visually, imagine a step plot of both measurements (the y-values are not shown here) where we always compare the line segments lying above each other.
Although I am struggling to name this task, I believe it is a problem of general interest, so I think it is appropriate to ask here for its best solution.
Here is my proposition:
a=np.array([0,10,12,16,25,29])
b=np.array([0,5,10,15,20,25,30])
c = sorted(set(a).union(b))
# c = [0, 5, 10, 12, 15, 16, 20, 25, 29, 30]
xs1_ = [max([i for i in a if i <= j]) for j in c]
# [0, 0, 10, 12, 12, 16, 16, 25, 29, 29]
xs2_ = [max([i for i in b if i <= j]) for j in c]
# [0, 5, 10, 10, 15, 15, 20, 25, 25, 30]
1) a and b are your two original arrays.
2) c is the sorted union of the two arrays, so it contains every value present in either array exactly once.
3) Then, for each element of c, we select the largest value of a (respectively b) that is smaller than or equal to that element.
Here's a vectorised approach:
xs1 = np.array([ 0, 10, 12, 16, 25, 29])
xs2 = np.array([ 0, 5, 10, 15, 20, 25, 30])
# union of both sets
xs = np.array(sorted(set(xs1) | set(xs2)))
# array([ 0, 5, 10, 12, 15, 16, 20, 25, 29, 30])
xs1_ = np.maximum.accumulate(np.in1d(xs, xs1) * xs)
print(xs1_)
array([ 0, 0, 10, 12, 12, 16, 16, 25, 29, 29])
xs2_ = np.maximum.accumulate(np.in1d(xs, xs2) * xs)
print(xs2_)
array([ 0, 5, 10, 10, 15, 15, 20, 25, 25, 30])
Where, for both cases:
np.in1d(xs, xs1) * xs
# array([ 0, 0, 10, 12, 0, 16, 0, 25, 29, 0])
gives an array with the values of xs that are contained in xs1, and 0 for those that aren't. We then just need to forward-fill using np.maximum.accumulate.
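As a side note (not part of the answer above): np.searchsorted can produce the same forward-filled alignment directly, and it also yields the indices you would need to pull the matching y-values. A minimal sketch, assuming both time arrays are sorted as in the example:
import numpy as np

xs1 = np.array([0, 10, 12, 16, 25, 29])
xs2 = np.array([0, 5, 10, 15, 20, 25, 30])

# union of both time grids
xs = np.union1d(xs1, xs2)

# for each union time, index of the last sample taken at or before it
idx1 = np.searchsorted(xs1, xs, side='right') - 1
idx2 = np.searchsorted(xs2, xs, side='right') - 1

xs1_ = xs1[idx1]  # array([ 0,  0, 10, 12, 12, 16, 16, 25, 29, 29])
xs2_ = xs2[idx2]  # array([ 0,  5, 10, 10, 15, 15, 20, 25, 25, 30])
idx1 and idx2 can also be used to index the corresponding y-value arrays, which is usually the end goal of this kind of alignment.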
Related
If I have a large 2D numpy array and 2 arrays which correspond to the x and y indices I want to extract, it's easy enough:
h = np.arange(49).reshape(7,7)
# h = [[0, 1, 2, 3, 4, 5, 6],
# [7, 8, 9, 10, 11, 12, 13],
# [14, 15, 16, 17, 18, 19, 20],
# [21, 22, 23, 24, 25, 26, 27],
# [28, 29, 30, 31, 32, 33, 34],
# [35, 36, 37, 38, 39, 40, 41],
# [42, 43, 44, 45, 46, 47, 48]]
x_indices = np.array([1,3,4])
y_indices = np.array([2,3,5])
reduced_h = h[x_indices, y_indices]
#reduced_h = [ 9, 24, 33]
However, I would like to, for each x,y pair, cut out a square (of size given by 'a', the number of indices in each direction from the centre) surrounding this 'coordinate', and return an array of these little 2D arrays.
For example, for h, x,y_indices as above and a=1:
reduced_h = [[[1,2,3],[8,9,10],[15,16,17]], [[16,17,18],[23,24,25],[30,31,32]], [[25,26,27],[32,33,34],[39,40,41]]]
i.e. one 3x3 array for each x-y index pair, corresponding to the 3x3 square of elements centred on that index. In general, this should return a numpy array of shape (len(x_indices), 2a+1, 2a+1).
By analogy to reduced_h[0] = h[x_indices[0]-a : x_indices[0]+a+1, y_indices[0]-a : y_indices[0]+a+1] = h[1-1:1+2, 2-1:2+2] = h[0:3, 1:4], my first try was the following:
h[x_indices-a : x_indices+a+1, y_indices-a : y_indices+a+1]
However, perhaps unsurprisingly, slicing between the arrays fails.
So the obvious next thing to try is to create this slice manually. np.arange seems to struggle with this but linspace works:
a=1
xrange = np.linspace(x_indices-a, x_indices+a, 2*a+1, dtype=int)
# xrange = [ [0, 2, 3], [1, 3, 4], [2, 4, 5] ]
yrange = np.linspace(y_indices-a, y_indices+a, 2*a+1, dtype=int)
Now I can try h[xrange, yrange], but unsurprisingly this indexes element-wise, meaning I get only one (2a+1)x(2a+1) array (the same dimensions as xrange and yrange). Is there a way to, for every index, take the right slices from these ranges (without loops)? Or is there a way to make the broadcast work initially without having to set up linspace explicitly? Thanks
You can index np.lib.stride_tricks.sliding_window_view using your x and y indices:
import numpy as np
h = np.arange(49).reshape(7,7)
x_indices = np.array([1,3,4])
y_indices = np.array([2,3,5])
a = 1
window = (2*a+1, 2*a+1)
out = np.lib.stride_tricks.sliding_window_view(h, window)[x_indices-a, y_indices-a]
out:
array([[[ 1, 2, 3],
[ 8, 9, 10],
[15, 16, 17]],
[[16, 17, 18],
[23, 24, 25],
[30, 31, 32]],
[[25, 26, 27],
[32, 33, 34],
[39, 40, 41]]])
Note that you may need to pad h first to handle windows around your coordinates that reach "outside" h.
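A minimal sketch of that padding idea (the pad width a and mode='edge' here are assumptions; pick whatever boundary behaviour fits your data):
import numpy as np

h = np.arange(49).reshape(7, 7)
x_indices = np.array([0, 3, 6])  # now includes border coordinates
y_indices = np.array([0, 3, 6])
a = 1

# pad by a on every side so windows centred on border cells still fit
hp = np.pad(h, a, mode='edge')

# after padding, the original cell (x, y) sits at (x + a, y + a) in hp,
# so the top-left corner of its window is back at (x, y)
out = np.lib.stride_tricks.sliding_window_view(hp, (2*a + 1, 2*a + 1))[x_indices, y_indices]
print(out.shape)  # (3, 3, 3)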
Could anyone explain to me how the command marked (<---) below works in Python NumPy?
r = np.arange(36)
r.resize(6,6)
r.reshape(36)[::7] # <---
You just have to run the commands one by one and analyse their output:
Create an array of the numbers 0 through 35.
>>> r = np.arange(36)
>>> r
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35])
Resize the array in place to a 6 x 6 array:
>>> r.resize(6,6) # in-place; equivalent in effect to r = r.reshape(6,6)
>>> r
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
Reshape r back to a 1-dimensional array:
>>> tmp = r.reshape(36)
tmp above has the same values as r in the first step (reshape returns a view of r with shape (36,)).
Take every 7th element:
>>> tmp[::7]
array([ 0, 7, 14, 21, 28, 35])
Slicing/indexing is written as i:j:k, where i = start, j = stop (exclusive) and k = step. Thus, 5:10:2 means: from index 5 up to (but not including) index 10, give me every 2nd element. If i is not present, it defaults to the beginning of the array. If j is not present, it defaults to the end of the array. If k is not present, the step is 1 (all the elements in the range).
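For instance, a quick illustration of the notation (not from the original question):
>>> x = np.arange(12)
>>> x[5:10:2]   # from index 5 up to (but not including) 10, every 2nd element
array([5, 7, 9])
>>> x[::3]      # the whole array, every 3rd element
array([0, 3, 6, 9])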
With all the above, you could rewrite your example in a single line as:
>>> np.arange(36)[::7]
Or if you already have r, which is N-Dimensional:
>>> r.ravel()[::7]
Here ravel returns a 1-dimensional view of r (preferred over reshape(36)).
If you want to know more about slicing, please refer to the numpy documentation.
First, you are using NumPy's ndarray.reshape, which returns the given array in the specified shape. In your case, you are converting it to a 1-dimensional array with 36 elements.
Secondly, with the numbers between the brackets, you are indexing certain values in the array. Slicing takes 3 values per dimension, in the form [number1:number2:number3]. If you leave a value blank (as in your case for number1 and number2), it falls back to its default: number1 defaults to 0 (the start of the array), number2 defaults to the length of the array (the end), and number3 defaults to 1:
The first number indicates the array index where you will begin taking values.
The second number indicates the array index where you will stop taking values.
Finally, the last number is the step between the indices that are read. In your case it is 7, so you are reading every 7th element.
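To make those defaults concrete, a small check (re-creating the r from the question):
>>> import numpy as np
>>> r = np.arange(36)
>>> flat = r.reshape(36)
>>> np.array_equal(flat[::7], flat[0:36:7])  # blank start/stop default to 0 and the array length
True
>>> flat[::7]
array([ 0,  7, 14, 21, 28, 35])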
One point to add: here both the reshape() and resize() methods produce the same 6 x 6 shape; the key difference between them is how they affect the calling array object r:
r.resize() has no return value. It directly changes the shape of the calling array object r.
r.reshape() returns a new reshaped array object and leaves the original r unchanged.
>>> import numpy as np
>>> r = np.arange(36)
>>> r.shape
(36,)
>>> # 1. --- `reshape()` returns a new object and keep the `r` ---
>>> new = r.reshape(6,6)
>>> new.shape
(6, 6)
>>>
>>> # 2. --- resize changes `r` directly and returns `None` ---
>>> nothing = r.resize(6,6)
>>> type(nothing)
<class 'NoneType'>
>>> r.shape
(6, 6)
I have an array
a = [0, 0, 15, 17, 16, 17, 16, 12, 18, 18]
I am trying to find the element value that has the max count, and if there is a tie, I would like all of the elements that share that max count.
As you can see, there are two 0s, two 16s, two 17s, two 18s, one 15 and one 12,
so I want something that would return
[0, 16, 17, 18] (order not important, but I do not want the 15 or the 12).
I was doing np.argmax(np.bincount(a)), but argmax only returns one element (per its documentation), so I only get the first one, which is 0.
I tried
np.argpartition(values, -4)[-4:], which works, but in practice I would not know that there are 4 elements with the same count! (Maybe I am close here!!! The light bulb just went on!!!)
You can use np.unique to get an array of the unique elements together with their counts, then pull the elements whose count is equal to the max:
import numpy as np
a = np.array([0, 0, 15, 17, 16, 17, 16, 12, 18, 18])
un, cnt = np.unique(a, return_counts=True)
print(un[cnt == cnt.max()])
[ 0 16 17 18]
un are the unique elements, cnt is the frequency/count of each:
In [11]: a = np.array([0, 0, 15, 17, 16, 17, 16, 12, 18, 18])
In [12]: un, cnt = np.unique(a, return_counts=True)
In [13]: un, cnt
Out[13]: (array([ 0, 12, 15, 16, 17, 18]), array([2, 1, 1, 2, 2, 2]))
cnt == cnt.max() will give us the mask to pull the elements that are equal to the max:
In [14]: cnt == cnt.max()
Out[14]: array([ True, False, False, True, True, True], dtype=bool)
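As an aside (not part of the answer above): the np.bincount idea from the question can also be completed; it just requires the values to be non-negative integers:
cnt2 = np.bincount(a)
print(np.flatnonzero(cnt2 == cnt2.max()))
# [ 0 16 17 18]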
It is a bit fiddly but you can achieve this using Counter and itemgetter:
from collections import Counter
from operator import itemgetter
a =[0, 0, 15, 17, 16, 17, 16, 12, 18, 18]
counter_list = Counter(a).most_common()
max_occurrences = max(counter_list, key=itemgetter(1))[1]
answer = [item[0] for item in counter_list if item[1] == max_occurrences]
print(answer)
Output
[0, 16, 17, 18]
Here is a neat solution:
from collections import Counter
import numpy as np
a = np.array([0, 0, 15, 17, 16, 17, 16, 12, 18, 18])
freq_count = Counter(a)
high = max(freq_count.values())
res = [key for key in freq_count.keys() if freq_count[key]==high]
Output: [0 16 17 18]
Note: Output order not guaranteed
I'm trying to find the intersection list of 5 lists of datetime objects. I know the intersection of lists question has come up a lot on here, but my code is not performing as expected (like the ones from the other questions).
Here are the first 3 elements of the 5 lists with the exact length of the list at the end.
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38790
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38818
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38959
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38802
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 40415
I've made a list of these lists called times. I've tried 2 methods of intersecting.
Method 1:
intersection = times[0]  # make intersection the first list
for i in range(len(times)):
    if i == 0:
        continue
    intersection = [val for val in intersection if val in times[i]]
This method results in a list with length 20189 and takes 104 seconds to run.
Method 2:
intersection = times[0]  # make intersection the first list
for i in range(len(times)):
    if i == 0:
        continue
    intersection = list(set(intersection) & set(times[i]))
This method results in a list with length 20148 and takes 0.1 seconds to run.
I've run into 2 problems with this. The first is that the two methods yield intersections of different sizes and I have no clue why. The other is that the datetime object datetime.datetime(2014, 8, 14, 19, 25, 6) is clearly in all 5 lists (see above), but when I print datetime.datetime(2014, 8, 14, 19, 25, 6) in intersection it returns False.
Your first list times[0] has duplicate elements; this is the reason for the size inconsistency. If you did intersection = list(set(times[0])) in your first snippet, the problem would go away.
As for your second code, the code will be faster if you never do changes between lists and sets:
intersection = set(times[0])  # make a set of the first list
for timeset in times[1:]:
    intersection.intersection_update(timeset)
# if necessary make into a list again
intersection = list(intersection)
And actually, since intersection supports multiple iterables as separate arguments, you can simply replace all of your code with:
intersection = set(times[0]).intersection(*times[1:])
As for the in intersection problem: is the instance an actual datetime.datetime, or just pretending to be? At least the timestamps shown seem not to be timezone-aware.
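A tiny check of that last point (purely illustrative): a timezone-aware datetime never compares equal to a naive one, so membership tests against naive timestamps will fail:
import datetime

naive = datetime.datetime(2014, 8, 14, 19, 25, 6)
aware = datetime.datetime(2014, 8, 14, 19, 25, 6, tzinfo=datetime.timezone.utc)

print(naive == aware)    # False: aware and naive datetimes never compare equal
print(naive in {aware})  # False for the same reason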
Lists can have duplicate items, which can cause inconsistencies with the length. To avoid these duplicates, you can turn each list of datetimes into a set:
map(set, times)
This will give you one set per list (with duplicate times removed; note that in Python 3, map returns a lazy iterator rather than a list). To find the intersection, you can use set.intersection:
intersection = set.intersection(*map(set, times))
With your example, intersection will be this set:
set([datetime.datetime(2014, 8, 14, 19, 25, 9), datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7)])
There might be duplicate times, and you can handle it simply like this:
Python3:
import functools
result = functools.reduce(lambda x, y: set(x) & set(y), times)
Python2:
result = reduce(lambda x, y: set(x) & set(y), times)
intersection = set(*times[:1]).intersection(*times[1:])
I have a numpy array of numbers, for example,
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
I would like to find all the indexes of the elements within a specific range. For instance, if the range is (6, 10), the answer should be (3, 4, 5). Is there a built-in function to do this?
You can use np.where to get indices and np.logical_and to set two conditions:
import numpy as np
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
np.where(np.logical_and(a>=6, a<=10))
# returns (array([3, 4, 5]),)
As in #deinonychusaur's reply, but even more compact:
In [7]: np.where((a >= 6) & (a <=10))
Out[7]: (array([3, 4, 5]),)
Summary of the answers
To understand which is the best answer, we can do some timing with the different solutions.
Unfortunately, the question was not well posed, so there are answers to different questions; here I try to point them all at the same question. Given the array:
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
The answer should be the indexes of the elements within a certain range, assumed inclusive, in this case 6 to 10.
answer = (3, 4, 5)
Corresponding to the values 6, 9 and 10.
To test the best answer we can use this code.
import timeit
setup = """
import numpy as np
import numexpr as ne
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
# or test it with an array of the similar size
# a = np.random.rand(100)*23 # change the number to an estimate of your array size
# we define the left and right limit
ll = 6
rl = 10
def sorted_slice(a,l,r):
    start = np.searchsorted(a, l, 'left')
    end = np.searchsorted(a, r, 'right')
    return np.arange(start,end)
"""
functions = ['sorted_slice(a,ll,rl)', # works only for sorted values
'np.where(np.logical_and(a>=ll, a<=rl))[0]',
'np.where((a >= ll) & (a <=rl))[0]',
'np.where((a>=ll)*(a<=rl))[0]',
'np.where(np.vectorize(lambda x: ll <= x <= rl)(a))[0]',
'np.argwhere((a>=ll) & (a<=rl)).T[0]', # we transpose to get a single row
'np.where(ne.evaluate("(ll <= a) & (a <= rl)"))[0]',]
functions2 = [
'a[np.logical_and(a>=ll, a<=rl)]',
'a[(a>=ll) & (a<=rl)]',
'a[(a>=ll)*(a<=rl)]',
'a[np.vectorize(lambda x: ll <= x <= rl)(a)]',
'a[ne.evaluate("(ll <= a) & (a <= rl)")]',
]
rdict = {}
for i in functions:
    rdict[i] = timeit.timeit(i, setup=setup, number=1000)
    print("%s -> %s s" % (i, rdict[i]))

print("Sorted:")
for w in sorted(rdict, key=rdict.get):
    print(w, rdict[w])
Results
The results for a small array were reported in a plot (not reproduced here), with the fastest solution on top. As noted by #EZLearner, they may vary depending on the size of the array. sorted_slice could be faster for larger arrays, but it requires your array to be sorted; for arrays with over 10 M entries, ne.evaluate could be an option. It is hence always better to perform this test with an array of the same size as yours.
If instead of the indexes you want to extract the values you can perform the tests using functions2 but the results are almost the same.
I thought I would add this because the a in the example you gave is sorted:
import numpy as np
a = [1, 3, 5, 6, 9, 10, 14, 15, 56]
start = np.searchsorted(a, 6, 'left')
end = np.searchsorted(a, 10, 'right')
rng = np.arange(start, end)
rng
# array([3, 4, 5])
a = np.array([1,2,3,4,5,6,7,8,9])
b = a[(a>2) & (a<8)]
Another way is with:
np.vectorize(lambda x: 6 <= x <= 10)(a)
which returns:
array([False, False, False, True, True, True, False, False, False])
It is sometimes useful for masking time series, vectors, etc.
This code snippet returns all the numbers in a numpy array between two values:
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56] )
a[(a>6)*(a<10)]
It works as follows:
(a>6) returns a numpy array of True (1) and False (0) values, and so does (a<10). By multiplying these two together you get an array that is True where both conditions are True (because 1x1 = 1) and False otherwise (because 0x0 = 0 and 1x0 = 0).
The part a[...] returns all values of array a where the array between the brackets is True.
Of course you can make this more complicated by writing, for instance,
...*~(a<10)
which is an "and not" statement (note that 1-a<10 would be parsed as (1-a)<10, so the ~ operator is the correct way to negate a boolean array).
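For example, a quick check of that negation (~ is the element-wise "not" for boolean NumPy arrays):
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
print(a[(a > 6) * ~(a < 10)])  # greater than 6 and NOT less than 10
# [10 14 15 56]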
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
np.argwhere((a>=6) & (a<=10))
Wanted to add numexpr into the mix:
import numpy as np
import numexpr as ne
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
np.where(ne.evaluate("(6 <= a) & (a <= 10)"))[0]
# array([3, 4, 5], dtype=int64)
This would only make sense for larger arrays with millions of entries, or if you are hitting memory limits.
This may not be the prettiest, but it works for any number of dimensions:
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
ranges = (0,4), (0,4)
def conditionRange(X: np.ndarray, ranges: list) -> np.ndarray:
    idx = set()
    for column, r in enumerate(ranges):
        tmp = np.where(np.logical_and(X[:, column] >= r[0], X[:, column] <= r[1]))[0]
        if idx:
            idx = idx & set(tmp)
        else:
            idx = set(tmp)
    idx = np.array(list(idx))
    return X[idx, :]
b = conditionRange(a, ranges)
print(b)
s=[52, 33, 70, 39, 57, 59, 7, 2, 46, 69, 11, 74, 58, 60, 63, 43, 75, 92, 65, 19, 1, 79, 22, 38, 26, 3, 66, 88, 9, 15, 28, 44, 67, 87, 21, 49, 85, 32, 89, 77, 47, 93, 35, 12, 73, 76, 50, 45, 5, 29, 97, 94, 95, 56, 48, 71, 54, 55, 51, 23, 84, 80, 62, 30, 13, 34]
dic = {}
for i in range(0, len(s), 10):
    dic[i, i+10] = list(filter(lambda x: ((x >= i) & (x < i+10)), s))
print(dic)
for keys, values in dic.items():
    print(keys)
    print(values)
Output:
(0, 10)
[7, 2, 1, 3, 9, 5]
(20, 30)
[22, 26, 28, 21, 29, 23]
(30, 40)
[33, 39, 38, 32, 35, 30, 34]
(10, 20)
[11, 19, 15, 12, 13]
(40, 50)
[46, 43, 44, 49, 47, 45, 48]
(60, 70)
[69, 60, 63, 65, 66, 67, 62]
(50, 60)
[52, 57, 59, 58, 50, 56, 54, 55, 51]
You can use np.clip(), although it does not return quite the same thing:
a = [1, 3, 5, 6, 9, 10, 14, 15, 56]
np.clip(a,6,10)
However, it clamps the values: entries less than 6 are replaced by 6 and entries greater than 10 are replaced by 10, and it returns the values themselves rather than their indexes.
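For reference, this is what the clipped array looks like (clamped values, not indexes):
import numpy as np

a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
print(np.clip(a, 6, 10))
# [ 6  6  6  6  9 10 10 10 10]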