Intersect between 1D ary and every row in 2D ary ? Overlap Count?

You can use numpy.intersect1d(a1, a2), and the docs show how to intersect multiple arrays:
from functools import reduce
reduce(np.intersect1d, ([1, 3, 4, 3], [3, 1, 2, 1], [6, 3, 4, 2]))
What I want to do is to find the intersection between a 1D array and every row in the corresponding 2D array.
Or better yet just the COUNT of the overlapping elements in every row.
I know I can do that with intersect1d() and a loop, but it will be too slow.
How can we count the overlapping elements in every row the numpy-way ?
Ex:
In [59]: a2 = np.random.choice(np.arange(0,100),(10,5), replace=False)
In [60]: a2
Out[60]:
array([[50, 5, 25, 40, 19], 1
[43, 37, 21, 55, 11], 0
[16, 49, 6, 86, 96], 0
[80, 66, 87, 51, 64], 0
[42, 7, 20, 24, 74], 1
[92, 63, 75, 54, 90], 2
[ 9, 91, 88, 85, 22], 0
[ 4, 65, 97, 93, 53], 0
[18, 0, 57, 71, 76], 0
[94, 1, 77, 89, 45]]) 0
In [61]: a1 = np.random.choice(np.arange(0,100),5, replace=False)
In [63]: a1
Out[63]: array([63, 54, 20, 60, 25])

To simply get the count of common elements per row, we can get a mask of matches with np.isin and then just the count per row -
np.isin(arr2D,arr1D).sum(axis=1)
If you want to count each unique element only once when a row contains duplicate occurrences, and the input elements are positive numbers, we need a few more steps -
# https://stackoverflow.com/a/46256361/ @Divakar
def bincount2D_vectorized(a):
    N = a.max()+1
    a_offs = a + np.arange(a.shape[0])[:,None]*N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N)
count = (bincount2D_vectorized(np.isin(arr2D,arr1D)*arr2D)[:,1:]!=0).sum(1)
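As a quick check of the np.isin approach, here is a minimal sketch that reuses a2 and a1 from the question. The second part is an alternative that is not from the answer above: it counts each distinct value once per row by broadcasting, which should also work for zeros and negative numbers, at the cost of a (rows, cols, len(a1)) temporary. The names per_row, dup_rows and per_row_unique are only illustrative.

```python
import numpy as np

a2 = np.array([[50,  5, 25, 40, 19],
               [43, 37, 21, 55, 11],
               [16, 49,  6, 86, 96],
               [80, 66, 87, 51, 64],
               [42,  7, 20, 24, 74],
               [92, 63, 75, 54, 90],
               [ 9, 91, 88, 85, 22],
               [ 4, 65, 97, 93, 53],
               [18,  0, 57, 71, 76],
               [94,  1, 77, 89, 45]])
a1 = np.array([63, 54, 20, 60, 25])

# Boolean mask of matches, summed along each row
per_row = np.isin(a2, a1).sum(axis=1)
print(per_row)  # [1 0 0 0 1 2 0 0 0 0]

# Rows with repeated values: 25 appears twice in the first row
dup_rows = np.array([[25, 25, 63, 1, 2],
                     [ 0,  0,  0, 0, 0]])
targets = np.unique(a1)

# For each row, check which target values occur at least once, then count them
per_row_unique = (dup_rows[:, :, None] == targets).any(axis=1).sum(axis=1)
print(np.isin(dup_rows, a1).sum(axis=1))  # [3 0]  (duplicates counted)
print(per_row_unique)                     # [2 0]  (each value counted once)
```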

Related

Array based indexing of an ndarray

I am not understanding numpy.take though it seems like it is the function I want. I have an ndarray and I want to use another ndarray to index into the first.
import numpy as np
# Create a matrix
A = np.arange(75).reshape((5,5,3))
# Create the index array
idx = np.array([[1, 0, 0, 1, 1],
[1, 1, 0, 1, 1],
[1, 0, 1, 0, 1],
[1, 1, 0, 0, 0],
[1, 1, 1, 1, 0]])
Given the above, I want to index A by the values in idx. I thought take does this, but it doesn't output what I expected.
# Index the 3rd dimension of the A matrix by the idx array.
Asub = np.take(A, idx)
print(f'Value in A at 1,1,1 is {A[1,1,1]}')
print(f'Desired index from idx {idx[1,1]}')
print(f'Value in Asub at [1,1,1] {Asub[1,1]} <- thought this would be 19')
I was expecting each location in Asub to hold the value of A selected by idx at that location, but instead I get:
Value in A at 1,1,1 is 19
Desired index from idx 1
Value in Asub at [1,1,1] 1 <- thought this would be 19

One possibility is to create row and col indices that broadcast with the third dimension one, i.e a (5,1) and (5,) that pair with the (5,5) idx:
In [132]: A[np.arange(5)[:,None],np.arange(5), idx]
Out[132]:
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
This ends up picking values from A[:,:,0] and A[:,:,1]. This takes the values of idx as integers, in the range of valid (0,1,2) (for shape 3). They aren't boolean selectors.
Out[132][1,1] is 19, same as A[1,1,1]; Out[132][1,2] is the same as A[1,2,0].
take_along_axis gets the same values, but with an added dimension:
In [142]: np.take_along_axis(A, idx[:,:,None], 2).shape
Out[142]: (5, 5, 1)
In [143]: np.take_along_axis(A, idx[:,:,None], 2)[:,:,0]
Out[143]:
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
The iterative equivalent might be easier to understand:
In [145]: np.array([[A[i,j,idx[i,j]] for j in range(5)] for i in range(5)])
Out[145]:
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
If you have trouble expressing an action in "vectorized" array ways, go ahead and write an iterative version. It will avoid a lot of ambiguity and misunderstanding.
Another way to get the same values, treating the idx values as True/False booleans is:
In [146]: np.where(idx, A[:,:,1], A[:,:,0])
Out[146]:
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
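To see why np.take(A, idx) gave 1 rather than 19, here is a small sketch using the A and idx from the question. Without an axis argument, np.take indexes the flattened array, so every 0/1 in idx simply picks A.ravel()[0] or A.ravel()[1]. The variable names picked and fancy are only illustrative.

```python
import numpy as np

A = np.arange(75).reshape((5, 5, 3))
idx = np.array([[1, 0, 0, 1, 1],
                [1, 1, 0, 1, 1],
                [1, 0, 1, 0, 1],
                [1, 1, 0, 0, 0],
                [1, 1, 1, 1, 0]])

# With no axis, take() works on the flattened array
Asub = np.take(A, idx)
print(np.array_equal(Asub, A.ravel()[idx]))  # True

# Per-position selection along the last axis, two equivalent ways
picked = np.take_along_axis(A, idx[:, :, None], axis=2)[:, :, 0]
fancy = A[np.arange(5)[:, None], np.arange(5), idx]

print(Asub[1, 1], picked[1, 1])       # 1 19
print(np.array_equal(picked, fancy))  # True
```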

IIUC, you can get the resulting array by broadcasting idx to the shape of A, multiplying, and then taking index 1 along the last axis:
Asub = (A * idx[:, :, None])[:, :, 1] # --> Asub[1, 1] = 19
# [[ 1 0 0 10 13]
# [16 19 0 25 28]
# [31 0 37 0 43]
# [46 49 0 0 0]
# [61 64 67 70 0]]
I think this is the fastest way (or one of the best), particularly for large arrays.
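One thing worth checking before choosing the multiplication approach: it treats idx as a 0/1 mask, so positions where idx is 0 come out as 0 rather than as A[i, j, 0]. The following sketch compares it with np.take_along_axis on the question's data; the names masked and picked are only illustrative.

```python
import numpy as np

A = np.arange(75).reshape((5, 5, 3))
idx = np.array([[1, 0, 0, 1, 1],
                [1, 1, 0, 1, 1],
                [1, 0, 1, 0, 1],
                [1, 1, 0, 0, 0],
                [1, 1, 1, 1, 0]])

# Mask-style result: zero wherever idx is 0
masked = (A * idx[:, :, None])[:, :, 1]

# Index-style result: A[i, j, idx[i, j]] everywhere
picked = np.take_along_axis(A, idx[:, :, None], axis=2)[:, :, 0]

print(masked[0, 1], picked[0, 1])  # 0 3
# The two agree exactly where idx == 1
print(np.array_equal(masked[idx == 1], picked[idx == 1]))  # True
```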

Python: Return max 8 value random by array 2-D

I would like to extract maximum x values in different positions and save their position.
[[ 5, 57, 66, 59, 26],
[23, 66, 97, 96, 33],
[31, 63, 69, 55, 20],
[ 2, 77, 37, 85, 40],
[87, 94, 43, 92, 44],
]
Thanks

Using max with a range over an array is one way to get the index of the maximum element:
>>> [max(range(len(row)), key=row.__getitem__) for row in m]
[2, 2, 2, 3, 1]
Another option would be to use index after getting the max element itself (this is slightly less efficient because now you're scanning each row twice, but the difference is a constant factor):
>>> [row.index(max(row)) for row in m]
[2, 2, 2, 3, 1]

It's not totally clear if you want n randomly chosen or n largest items. I include solutions for both interpretations because they are very similar
Assuming you want 8 randomly chosen items from a 2D array and their positions
import numpy as np
x = np.array(
[[ 5, 57, 66, 59, 26],
[23, 66, 97, 96, 33],
[31, 63, 69, 55, 20],
[ 2, 77, 37, 85, 40],
[87, 94, 43, 92, 44]])
Create a random boolean matrix to choose the items
how_many = 8
choices = [True] * how_many + [False] * (len(x.ravel()) - how_many)
choices = np.random.permutation(choices).reshape(x.shape)
x[choices]
Out:
array([66, 59, 23, 33, 63, 69, 20, 40])
To get their positions
positions_2D = np.vstack(np.unravel_index(np.flatnonzero(choices), x.shape)).T
positions_2D
Out:
array([[0, 2],
[0, 3],
[1, 0],
[1, 4],
[2, 1],
[2, 2],
[2, 4],
[3, 4]])
To get the chosen items by 2D coordinates
x[positions_2D[:,0], positions_2D[:,1]]
Out:
array([66, 59, 23, 33, 63, 69, 20, 40])
If you want the 8 largest items it is the same approach without the boolean array to choose the items
top_8_positions_in_2D = np.vstack(np.unravel_index(x.argsort(None), x.shape)).T[:-9:-1]
x[top_8_positions_in_2D[:,0], top_8_positions_in_2D[:,1]]
Out:
array([97, 96, 94, 92, 87, 85, 77, 69])
To get their 2D coordinates
top_8_positions_in_2D
Out:
array([[1, 2],
[1, 3],
[4, 1],
[4, 3],
[4, 0],
[3, 3],
[3, 1],
[2, 2]])
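For larger arrays, a full argsort may be more work than needed. As a sketch of an alternative (not part of the answer above), np.argpartition can select the k largest entries first, and only those k are then sorted. The names k, flat_idx, rows and cols are illustrative.

```python
import numpy as np

x = np.array([[ 5, 57, 66, 59, 26],
              [23, 66, 97, 96, 33],
              [31, 63, 69, 55, 20],
              [ 2, 77, 37, 85, 40],
              [87, 94, 43, 92, 44]])
k = 8

# Flat indices of the k largest values, in no particular order
flat_idx = np.argpartition(x, -k, axis=None)[-k:]

# Sort just those k indices by value, largest first
flat_idx = flat_idx[np.argsort(x.ravel()[flat_idx])[::-1]]

# Convert flat indices back to (row, col) pairs
rows, cols = np.unravel_index(flat_idx, x.shape)
print(x[rows, cols])                  # [97 96 94 92 87 85 77 69]
print(np.column_stack((rows, cols)))  # one (row, col) pair per line
```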

Reshaping rank > 2 numpy arrays in Python

I am working with numpy arrays as rank > 2 tensors in Python and am trying to reshape such a tensor into a matrix, i.e. a rank-2 array. The standard ndarray.reshape() function doesn't really work for this because I need to group the indices of my tensor in a particular way. What I mean is this: say I start with a rank 3 tensor, T_ijk. I am trying to find a function that will output the rank 2 tensor T_(j)(ik), for instance, i.e. for this example the desired input/output would be
[Input:] T=np.array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
[Output:] array([[1, 2, 5, 6],
[3, 4, 7, 8]])
Also, a friend suggested to me that tensorflow might have functions like this, but I've never used it. Does anyone have any insight here?

Try this -
k = 1
m = 2
i = 5
j = 5
l = 2
#dummy T_ijklm
T = np.array(range(100)).reshape(k,m,i,j,l)
T_new = T.reshape(k*m,i*j*l)
print('Original T:',T.shape)
print('New T:',T_new.shape)
#(km)(ijl) = 2*50
Original T: (1, 2, 5, 5, 2)
New T: (2, 50)
The new tensor is now rank 2:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97,
98, 99]])

In [216]: arr = np.arange(1,9).reshape(2,2,2)
In [217]: arr
Out[217]:
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
reshape keeps elements in the original [1,2,3,4,5...] order
In [218]: arr.reshape(2,4)
Out[218]:
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
Figuring out the correct transpose order can be tricky. Sometimes I just try several things. Here I note that you want to preserve the order on the last dimension, so all we have to do is swap the first 2 axes:
In [219]: arr.transpose(1,0,2)
Out[219]:
array([[[1, 2],
[5, 6]],
[[3, 4],
[7, 8]]])
now the reshape does what we want:
In [220]: arr.transpose(1,0,2).reshape(2,4)
Out[220]:
array([[1, 2, 5, 6],
[3, 4, 7, 8]])
This sequence is, as best I know, the best "built-in" approach.
You comment:
if I wanted to transform T_ijklmno to T_(ilo)(jmnk) having to figure out which axes to switch and how to reshape will probably get out of hand... that's why I'm looking for an in-built solution
The T_.... notation reminds me that we could use einsum to do the transpose:
In [221]: np.einsum('ijk->jik',arr)
Out[221]:
array([[[1, 2],
[5, 6]],
[[3, 4],
[7, 8]]])
So T_ijklmno to T_(ilo)(jmnk) could become
np.einsum('ijklmno->ilojmnk',T).reshape(I*L*O, J*M*N*K)
T.transpose(0,3,6,1,4,5,2).reshape(...)
(I wrote these by just eyeballing your T expression)
There are so many ways you could transpose and reshape an array with 7 dimensions, that there's little point in coming up with anything more general than the existing methods - transpose, swapaxes, einsum. Simply identifying the dimensions as you do with 'ijk...' is the toughest part of the problem.
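If writing out the axis order by hand gets tedious, the transpose-then-reshape recipe can be wrapped in a small helper driven by the index labels. This is only a sketch: group_axes is a hypothetical name, not a NumPy function, and it assumes each label is a single character that appears exactly once in the grouping.

```python
import numpy as np

def group_axes(T, labels, groups):
    """Transpose T so its axes follow `groups`, then merge each group.

    labels: one character per axis of T, e.g. 'ijk'
    groups: strings of labels, e.g. ('j', 'ik') for T_(j)(ik)
    """
    # Axis order after concatenating all groups
    order = tuple(labels.index(c) for g in groups for c in g)
    # Merged size of each group
    shape = tuple(int(np.prod([T.shape[labels.index(c)] for c in g]))
                  for g in groups)
    return T.transpose(order).reshape(shape)

arr = np.arange(1, 9).reshape(2, 2, 2)
print(group_axes(arr, 'ijk', ('j', 'ik')))
# [[1 2 5 6]
#  [3 4 7 8]]

# The 7-axis case from the comment, checked against the einsum version
T = np.arange(2 * 3 * 4 * 5 * 6 * 7 * 8).reshape(2, 3, 4, 5, 6, 7, 8)
out = group_axes(T, 'ijklmno', ('ilo', 'jmnk'))
ref = np.einsum('ijklmno->ilojmnk', T).reshape(2 * 5 * 8, 3 * 6 * 7 * 4)
print(out.shape, np.array_equal(out, ref))  # (80, 504) True
```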

Numpy: How to index 2d array with 1d array?

I have a 2d array:
a = np.random.randint(100, size=(6, 4))
[[72 76 40 11]
[48 82 6 87]
[53 24 25 99]
[ 7 94 82 90]
[28 81 10 9]
[94 99 67 58]]
And a 1d array:
idx = np.random.randint(4, size=6)
[0, 3, 2, 1, 0, 2]
Is it possible to index the 2d array so that the result is:
a[idx]
[72, 87, 25, 94, 28, 67]

Since you have the column indices, all you need are the row indices. You can generate those with arange.
>>> a[np.arange(len(a)), idx]
array([72, 87, 25, 94, 28, 67])

Is there any way to get by this without arange? It seems counterintuitive to me that something like
a[idx.reshape(-1,1)]
or
a[:,idx]
would not produce this result.

You can also use np.diagonal if you want to avoid np.arange.
a = np.array([[72, 76, 40, 11],
[48, 82, 6, 87],
[53, 24, 25, 99],
[ 7, 94, 82, 90],
[28, 81, 10, 9],
[94, 99, 67, 58]])
idx = np.array([0, 3, 2, 1, 0, 2])
Index into each row of the 2D array using idx:
>>> a_idx = a[...,idx]
>>> a_idx
array([[72, 11, 40, 76, 72, 40],
[48, 87, 6, 82, 48, 6],
[53, 99, 25, 24, 53, 25],
[ 7, 90, 82, 94, 7, 82],
[28, 9, 10, 81, 28, 10],
[94, 58, 67, 99, 94, 67]])
The diagonal is where the position in idx and the row of the 2D array line up:
>>> np.diagonal(a_idx)
array([72, 87, 25, 94, 28, 67])

An alternative answer.
(a*np.eye(6,4)[idx]).sum(axis=1)
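Another option that avoids writing arange explicitly is np.take_along_axis, which picks one column per row when given a column of indices. The sketch below uses the array from the answers above and checks it against the arange form. Note that the diagonal approach builds a full (6, 6) intermediate first, which may matter for long arrays.

```python
import numpy as np

a = np.array([[72, 76, 40, 11],
              [48, 82,  6, 87],
              [53, 24, 25, 99],
              [ 7, 94, 82, 90],
              [28, 81, 10,  9],
              [94, 99, 67, 58]])
idx = np.array([0, 3, 2, 1, 0, 2])

# idx[:, None] has shape (6, 1): one column index per row
result = np.take_along_axis(a, idx[:, None], axis=1).ravel()
print(result)  # [72 87 25 94 28 67]

# Same values as explicit row/column indexing
print(np.array_equal(result, a[np.arange(len(a)), idx]))  # True
```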

Slicing a list into n nearly-equal-length partitions [duplicate]

This question already has answers here:
Splitting a list into N parts of approximately equal length
(36 answers)
Closed 5 years ago.
I'm looking for a fast, clean, pythonic way to divide a list into exactly n nearly-equal partitions.
partition([1,2,3,4,5],5)->[[1],[2],[3],[4],[5]]
partition([1,2,3,4,5],2)->[[1,2],[3,4,5]] (or [[1,2,3],[4,5]])
partition([1,2,3,4,5],3)->[[1,2],[3,4],[5]] (there are other ways to slice this one too)
There are several answers in "Iteration over list slices" that run very close to what I want, except they are focused on the size of the list, and I care about the number of the lists (some of them also pad with None). These are trivially converted, obviously, but I'm looking for a best practice.
Similarly, people have pointed out great solutions in "How do you split a list into evenly sized chunks?" for a very similar problem, but I'm more interested in the number of partitions than the specific size, as long as it's within 1. Again, this is trivially convertible, but I'm looking for a best practice.

Just a different take, that only works if [[1,3,5],[2,4]] is an acceptable partition, in your example.
def partition ( lst, n ):
    return [ lst[i::n] for i in xrange(n) ]
This satisfies the example mentioned in @Daniel Stutzbach's answer:
partition(range(105),10)
# [[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
# [1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 101],
# [2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102],
# [3, 13, 23, 33, 43, 53, 63, 73, 83, 93, 103],
# [4, 14, 24, 34, 44, 54, 64, 74, 84, 94, 104],
# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95],
# [6, 16, 26, 36, 46, 56, 66, 76, 86, 96],
# [7, 17, 27, 37, 47, 57, 67, 77, 87, 97],
# [8, 18, 28, 38, 48, 58, 68, 78, 88, 98],
# [9, 19, 29, 39, 49, 59, 69, 79, 89, 99]]

Here's a version that's similar to Daniel's: it divides as evenly as possible, but puts all the larger partitions at the start:
def partition(lst, n):
    q, r = divmod(len(lst), n)
    indices = [q*i + min(i, r) for i in xrange(n+1)]
    return [lst[indices[i]:indices[i+1]] for i in xrange(n)]
It also avoids the use of float arithmetic, since that always makes me uncomfortable. :)
Edit: an example, just to show the contrast with Daniel Stutzbach's solution
>>> print [len(x) for x in partition(range(105), 10)]
[11, 11, 11, 11, 11, 10, 10, 10, 10, 10]
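For readers on Python 3, the same divmod idea can be sketched as follows (xrange becomes range, print becomes a function). The comparison with np.array_split is an addition for illustration: NumPy documents the same "larger pieces first" convention for that function.

```python
import numpy as np

def partition(lst, n):
    # q items per part, with the first r parts getting one extra
    q, r = divmod(len(lst), n)
    indices = [q * i + min(i, r) for i in range(n + 1)]
    return [lst[indices[i]:indices[i + 1]] for i in range(n)]

print(partition([1, 2, 3, 4, 5], 3))  # [[1, 2], [3, 4], [5]]

sizes = [len(p) for p in partition(list(range(105)), 10)]
print(sizes)  # [11, 11, 11, 11, 11, 10, 10, 10, 10, 10]

# np.array_split produces parts of the same sizes for arrays
np_sizes = [len(p) for p in np.array_split(np.arange(105), 10)]
print(np_sizes == sizes)  # True
```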

def partition(lst, n):
    division = len(lst) / float(n)
    return [ lst[int(round(division * i)): int(round(division * (i + 1)))] for i in xrange(n) ]
>>> partition([1,2,3,4,5],5)
[[1], [2], [3], [4], [5]]
>>> partition([1,2,3,4,5],2)
[[1, 2, 3], [4, 5]]
>>> partition([1,2,3,4,5],3)
[[1, 2], [3], [4, 5]]
>>> partition(range(105), 10)
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], [32, 33, 34, 35, 36, 37, 38, 39, 40, 41], [42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52], [53, 54, 55, 56, 57, 58, 59, 60, 61, 62], [63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73], [74, 75, 76, 77, 78, 79, 80, 81, 82, 83], [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94], [95, 96, 97, 98, 99, 100, 101, 102, 103, 104]]
Python 3 version:
def partition(lst, n):
    division = len(lst) / n
    return [lst[round(division * i):round(division * (i + 1))] for i in range(n)]
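One caveat when porting: Python 3's round() rounds exact halves to the nearest even integer, whereas Python 2 rounded them away from zero, so the two versions can cut at different places (for example, partition([1,2,3,4,5],2) no longer returns [[1, 2, 3], [4, 5]]). The sketch below shows the difference and one possible integer-only way to keep the half-up behaviour; partition_round and partition_half_up are illustrative names.

```python
def partition_round(lst, n):
    # Python 3 version from above: round() sends 2.5 to 2
    division = len(lst) / n
    return [lst[round(division * i):round(division * (i + 1))]
            for i in range(n)]

def partition_half_up(lst, n):
    # floor(len * i / n + 1/2) computed with integers only
    size = len(lst)
    bounds = [(2 * size * i + n) // (2 * n) for i in range(n + 1)]
    return [lst[bounds[i]:bounds[i + 1]] for i in range(n)]

print(partition_round([1, 2, 3, 4, 5], 2))    # [[1, 2], [3, 4, 5]]
print(partition_half_up([1, 2, 3, 4, 5], 2))  # [[1, 2, 3], [4, 5]]

lengths = [len(p) for p in partition_half_up(list(range(105)), 10)]
print(lengths)  # [11, 10, 11, 10, 11, 10, 11, 10, 11, 10]
```

Both variants still produce n parts whose lengths differ by at most one; only the placement of the longer parts changes.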

Below is one way.
def partition(lst, n):
    increment = len(lst) / float(n)
    last = 0
    i = 1
    results = []
    while last < len(lst):
        idx = int(round(increment * i))
        results.append(lst[last:idx])
        last = idx
        i += 1
    return results
If len(lst) cannot be evenly divided by n, this version will distribute the extra items at roughly equal intervals. For example:
>>> print [len(x) for x in partition(range(105), 10)]
[11, 10, 11, 10, 11, 10, 11, 10, 11, 10]
The code could be simpler if you don't mind all of the 11s being at the beginning or the end.

This answer provides a function split(list_, n, max_ratio), for people
who want to split their list into n pieces whose lengths differ by at
most a factor of max_ratio. It allows for more variation than the
questioner's 'at most 1 difference in piece length'.
It works by sampling n piece lengths within the desired ratio range
[1 , max_ratio), placing them after each other to form a 'broken
stick' with the right distances between the 'break points' but the wrong
total length. Scaling the broken stick to the desired length gives us
the approximate positions of the break points we want. To get integer
break points requires subsequent rounding.
Unfortunately, the roundings can conspire to make pieces just too short,
and let you exceed the max_ratio. See the bottom of this answer for an
example.
import random
def splitting_points(length, n, max_ratio):
    """n+1 slice points [0, ..., length] for n random-sized slices.
    max_ratio is the largest allowable ratio between the largest and the
    smallest part.
    """
    ratios = [random.uniform(1, max_ratio) for _ in range(n)]
    normalized_ratios = [r / sum(ratios) for r in ratios]
    cumulative_ratios = [
        sum(normalized_ratios[0:i])
        for i in range(n+1)
    ]
    scaled_distances = [
        int(round(r * length))
        for r in cumulative_ratios
    ]
    return scaled_distances
def split(list_, n, max_ratio):
    """Slice a list into n randomly-sized parts.
    max_ratio is the largest allowable ratio between the largest and the
    smallest part.
    """
    # pass max_ratio through (the original referenced an undefined name, ratio)
    points = splitting_points(len(list_), n, max_ratio)
    return [
        list_[ points[i] : points[i+1] ]
        for i in range(n)
    ]
You can try it out like so:
for _ in range(10):
parts = split('abcdefghijklmnopqrstuvwxyz', 4, 2)
print([(len(part), part) for part in parts])
Example of a bad result:
parts = split('abcdefghijklmnopqrstuvwxyz', 10, 2)
# lengths range from 1 to 4, not 2 to 4
[(3, 'abc'), (3, 'def'), (1, 'g'),
(4, 'hijk'), (3, 'lmn'), (2, 'op'),
(2, 'qr'), (3, 'stu'), (2, 'vw'),
(3, 'xyz')]
