Extracting a 2D numpy array from a 2D numpy array - python

I'm working on a problem with numpy arrays and I hit a roadblock. Basically I have two arrays: one is a 2D numpy array and the other is a 1D numpy array holding some indices of the 2D array. What I need is to use pairs of these indices to extract a 2D numpy array from the original 2D array. I got something working, but I'm sure it can be done better, so I'm asking for advice. Here is my code:
import numpy as np
import itertools
x = np.arange(25).reshape(5, 5) #Original Array
#x = [[ 0 1 2 3 4]
# [ 5 6 7 8 9]
# [10 11 12 13 14]
# [15 16 17 18 19]
# [20 21 22 23 24]]
y = np.array([0, 2, 4]) #Indexes
idx = list(itertools.product(y, repeat=2)) #This creates all combinations of the indices to act as my coordinates in the array
#idx = [(0, 0), (0, 2), (0, 4), (2, 0), (2, 2), (2, 4), (4, 0), (4, 2), (4, 4)]
newarray = np.array([x[i] for i in idx]).reshape(3, 3) #This uses the tuples from before to extract the values of the original array
#newarray = [[ 0 2 4]
# [10 12 14] #The extracted array
# [20 22 24]]
So it works, but I think there is a lot to improve. For example, in the final step I use a list comprehension, then a numpy array, and then a reshape. Also, I'm not sure it's a good idea to create all the combinations of the index array; maybe there is an easier way. Any advice will be appreciated, thank you!

x[::2, ::2]
will select every other row and column
For a less regular pattern try
x[y[:,None], y]
which uses advanced indexing
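np.ix_ can build the same open mesh for you; a minimal sketch of the equivalent selection (same x and y as in the question):
import numpy as np
x = np.arange(25).reshape(5, 5)
y = np.array([0, 2, 4])
print(x[np.ix_(y, y)])  # same 3x3 result as x[y[:, None], y]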

Numpy has some sophisticated indexing options. Also remember that reshape is free; never be afraid to reshape.
import numpy as np
import itertools
x = np.arange(25).reshape(5, 5) #Original Array
y = [0, 2, 4]
idx = list(itertools.product(y, repeat=2)) #This creates all combinations of the indices to act as my coordinates in the array
idx0 = [k[0] for k in idx]
idx1 = [k[1] for k in idx]
print(idx)
newarray = x[idx0,idx1].reshape((3,3))
print(newarray)
Output:
[(0, 0), (0, 2), (0, 4), (2, 0), (2, 2), (2, 4), (4, 0), (4, 2), (4, 4)]
[[ 0 2 4]
[10 12 14]
[20 22 24]]
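A small variant of the same idea, in case the two list comprehensions bother you: np.transpose splits the coordinate pairs into row and column index arrays in one go (reusing x and idx from above):
rows, cols = np.transpose(idx)
print(x[rows, cols].reshape(3, 3))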

I think what you dislike is the for keyword (I do too), and in fact you don't need itertools.
So my answer would be:
import numpy as np
x = np.arange(25).reshape(5, 5)
y = np.array([0, 2, 4])
ny = y.size
i = y.reshape(ny, 1)                # column of row indices; broadcasts down the rows
j = y.repeat(ny).reshape(ny, ny).T  # square matrix whose rows are the column indices
print(x[i, j])
Output:
[[ 0 2 4]
[10 12 14]
[20 22 24]]
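An equivalent, possibly more readable way to build the two index arrays is np.meshgrid with indexing='ij' (a sketch, reusing x and y from above):
i, j = np.meshgrid(y, y, indexing='ij')  # i varies down the rows, j across the columns
print(x[i, j])  # same 3x3 result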

Related

How to find the number of pairs of non-coprimes in two given arrays in python?

How to go about finding the number of non-coprimes in a given array?
Suppose
a = [2, 5, 6, 7]
b = [4, 9, 10, 12]
Then the number of non-coprimes will be 3, since you can remove:
(2, 4)
(5, 10)
(6, 9)
import math
n = int(input())
a = list(map(int, input().split()))
b = list(map(int, input().split()))
count = 0
len_a = len(a)
len_b = len(b)
for i in range(len_a):
    for j in range(len_b):
        x = a[i]
        y = b[j]
        if math.gcd(x, y) != 1:
            count += 1
print(count)
This is in reference to: https://www.hackerrank.com/challenges/computer-game/problem
I am receiving 8 as output.
Why do you expect the answer to be 3?
You're pairing 5 and 10, so you're obviously looking at pairs of elements from a and b disregarding their position.
Just print out the pairs and you'll see why you're getting 8...
import math
from itertools import product
a=[2, 5, 6, 7]
b=[4, 9, 10, 12]
print(sum([math.gcd(x, y) != 1 for x, y in product(a, b)])) # 8
print([(x, y) for x, y in product(a, b) if math.gcd(x, y) != 1]) # the pairs
Update: After reading the problem the OP is trying to handle, it's worth pointing out that the expected output (3) is the answer to a different question!
Not how many pairs of elements are not coprime, but rather how many non-coprime pairs can be removed without returning them into the arrays.
This question is actually an order of magnitude more difficult, and is not a matter of fixing one's code, but rather about giving the actual problem a lot of mathematical and algorithmic thought.
See some discussion here
Last edit: a sort-of solution, albeit an extremely inefficient one. The only point is to give the OP some code that illustrates the original question by showing some form of solution, however low-quality or slow it is.
import math
from itertools import product, permutations
n = 4
def get_pairs_list_not_coprime_count(pairs_list):
    x, y = zip(*pairs_list)
    # number of pairs before hitting a coprime pair
    return min(i for i in range(n) if math.gcd(x[i], y[i]) == 1)
a = [2, 5, 6, 7]
b = [4, 9, 10, 12]
a_perms = permutations(a) # so that the pairing product with b includes all pairing options
b_perms = permutations(b) # so that the pairing product with a includes all pairing options
pairing_options = product(a_perms, b_perms) # pairs-off different orderings of a and b
actual_pairs = [zip(*p) for p in pairing_options] # turn a pair of a&b orderings into number-pairs (for each of the orderings possible as realized by the product)
print(max(get_pairs_list_not_coprime_count(pairs_list) for pairs_list in actual_pairs)) # The most pairings managed over all possible options: 3 for this example
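For reference, the removal question is really a maximum bipartite matching problem (each element can only be removed once), so if SciPy is available a far more scalable approach is to build the non-coprime graph and let scipy.sparse.csgraph.maximum_bipartite_matching (SciPy >= 1.4) count the matching. This is my own sketch, not part of the solution above:
import math
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching
a = [2, 5, 6, 7]
b = [4, 9, 10, 12]
# biadjacency matrix: entry (i, j) is 1 when a[i] and b[j] are NOT coprime
graph = csr_matrix([[int(math.gcd(x, y) != 1) for y in b] for x in a])
matching = maximum_bipartite_matching(graph, perm_type='column')
print((matching != -1).sum())  # 3 -- the largest set of disjoint non-coprime pairs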
I believe the answer should be 8 itself. Out of the 4*4 possible combinations of numbers that you are comparing, there are 8 coprimes and 8 non-coprimes.
Here is an implementation with a hand-written gcd function (no math module needed) that uses broadcasting to avoid multiple loops.
import numpy as np
a = '2 5 6 7'
b = '4 9 10 12'
a = np.array(list(map(int, a.split())))
b = np.array(list(map(int, b.split())))
def gcd(p, q):
    while q != 0:
        p, q = q, p % q
    return p
def is_coprime(x, y):
    return gcd(x, y) == 1
is_coprime_v = np.vectorize(is_coprime)
compare = is_coprime_v(a[:, None], b[None, :])
noncoprime_pairs = [(a[i], b[j]) for i, j in np.argwhere(~compare)]
coprime_pairs = [(a[i], b[j]) for i, j in np.argwhere(compare)]
print('non-coprime', noncoprime_pairs)
print('coprime', coprime_pairs)
non-coprime [(2, 4), (2, 10), (2, 12), (5, 10), (6, 4), (6, 9), (6, 10), (6, 12)]
coprime [(2, 9), (5, 4), (5, 9), (5, 12), (7, 4), (7, 9), (7, 10), (7, 12)]
Same solution but using the math.gcd() -
import math
import numpy as np
a = '2 5 6 7'
b = '4 9 10 12'
a = np.array(list(map(int, a.split())))
b = np.array(list(map(int, b.split())))
def f(x, y):
    return math.gcd(x, y) == 1
fv = np.vectorize(f)
compare = fv(a[:, None], b[None, :])
noncoprime_pairs = [(a[i], b[j]) for i, j in np.argwhere(~compare)]
print(noncoprime_pairs)
[(2, 4), (2, 10), (2, 12), (5, 10), (6, 4), (6, 9), (6, 10), (6, 12)]
If you are looking for the answer to be 3 in your example, I would assume you are counting the number of values in a that have at least one non-coprime in b.
If that is the case you could do it like this:
from math import gcd
def nonCoprimes(A,B):
    return sum(any(gcd(a, b) > 1 for b in B) for a in A)
print(nonCoprimes([2,5,6,7],[4,9,10,12])) # 3
So, for each value in a, check whether there is any value in b that doesn't have a gcd of 1 with it.

Numpy select matrix specified by a matrix of indices, from multidimensional array

I have a numpy array a of size 5x5x4x5x5. I have another matrix b of size 5x5. I want to get a[i,j,b[i,j]] for i from 0 to 4 and for j from 0 to 4. This will give me a 5x5x1x5x5 matrix. Is there any way to do this without just using 2 for loops?
Let's think of the matrix a as 100 (= 5 x 5 x 4) matrices of size (5, 5). So, if you could get a linear index for each triplet - (i, j, b[i, j]) - you are done. That's where np.ravel_multi_index comes in. Following is the code.
import numpy as np
import itertools
# create some matrices
a = np.random.randint(0, 10, (5, 5, 4, 5, 5))
b = np.random.randint(0, 4, (5, 5))
# creating all possible triplets - (ind1, ind2, ind3)
inds = list(itertools.product(range(5), range(5)))
(ind1, ind2), ind3 = zip(*inds), b.flatten()
allInds = np.array([ind1, ind2, ind3])
linearInds = np.ravel_multi_index(allInds, (5,5,4))
# reshaping the input array
a_reshaped = np.reshape(a, (100, 5, 5))
# selecting the appropriate indices
res1 = a_reshaped[linearInds, :, :]
# reshaping back into desired shape
res1 = np.reshape(res1, (5, 5, 1, 5, 5))
# verifying with the brute force method
res2 = np.empty((5, 5, 1, 5, 5))
for i in range(5):
    for j in range(5):
        res2[i, j, 0] = a[i, j, b[i, j], :, :]
print(np.all(res1 == res2))  # should print True
There's np.take_along_axis exactly for this purpose -
np.take_along_axis(a,b[:,:,None,None,None],axis=2)
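A quick self-contained check (with made-up random data) that the one-liner matches the explicit double loop:
import numpy as np
a = np.random.randint(0, 10, (5, 5, 4, 5, 5))
b = np.random.randint(0, 4, (5, 5))
res = np.take_along_axis(a, b[:, :, None, None, None], axis=2)
check = np.empty((5, 5, 1, 5, 5), dtype=a.dtype)
for i in range(5):
    for j in range(5):
        check[i, j, 0] = a[i, j, b[i, j]]
print(res.shape, np.array_equal(res, check))  # (5, 5, 1, 5, 5) True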

How to create an xarray from a sparse, denormalized table?

Say I have the following structured array:
import numpy as np
l, h, w = 6, 5, 5
dtype = [('a', int), ('b', '<U3'), ('data', (float, (h, w)))]
table = np.empty(l, dtype)
table['a'] = [1, 2, 3, 1, 2, 3]
table['b'] = ['foo', 'bar'] * 3
table['data'] = np.random.rand(l, h, w)
My data has shape (6, 5, 5), but really its shape is (3, 2, 5, 5); I just have columns a and b denormalized.
Is it possible to create an xarray DataArray directly from this shape (6, 5, 5) by providing columns a and b of length 6 and have xarray figure out the (3, 2, 5, 5) shape? What would coords and dims be?
In reality, table is sparse and has many dimensions, and I'm trying to see if there's any xarray creation machinery I can lean on instead of reshaping table myself.
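One pattern that may do what you want (a sketch only, not a verified answer; the dimension names 'row', 'h' and 'w' are made up) is to attach a and b as coordinates along the flat table dimension, turn them into a MultiIndex with set_index, and then unstack:
import xarray as xr
da = xr.DataArray(
    table['data'],
    dims=('row', 'h', 'w'),
    coords={'a': ('row', table['a']), 'b': ('row', table['b'])},
)
result = da.set_index(row=['a', 'b']).unstack('row')
print(result.sizes)  # should report sizes 3 and 2 for dims 'a' and 'b'
Combinations of a and b that are missing from the table would come out as NaN after the unstack.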

Reduce array over ranges

Say I have an array of numbers
np.array(([1, 4, 2, 1, 2, 5]))
And I want to compute the sum over a list of slices
((0, 3), (2, 4), (2, 6))
Giving
[(1 + 4 + 2), (2 + 1), (2 + 1 + 2 + 5)]
Is there a nice way to do this in numpy?
Looking for something equivalent to
def reduce(a, ranges):
    return np.array(list(np.sum(a[low:high]) for (low, high) in ranges))
Seems like there is probably some fancy numpy way to do this though. Anyone know?
One way is to use np.add.reduceat. If a is the array of values [1, 4, 2, 1, 2, 5]:
>>> np.add.reduceat(a, [0,3, 2,4, 2])[::2]
array([ 7, 3, 10], dtype=int32)
Here the slice indices are passed in a flat list, and reduceat returns [ 7, 1, 3, 2, 10], i.e. the sums of a[0:3], a[3] on its own, a[2:4], a[4] on its own, and a[2:] (whenever an index is not smaller than the next one, reduceat just yields that single element). We only want every other element from this array.
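If you want to build that flattened index list from the slice list programmatically, here is a sketch (it assumes only the final slice may end at len(a), because reduceat rejects indices equal to the array length):
import numpy as np
a = np.array([1, 4, 2, 1, 2, 5])
slices = [(0, 3), (2, 4), (2, 6)]
idx = np.array(slices).ravel()
if idx[-1] == len(a):   # drop a trailing boundary equal to len(a);
    idx = idx[:-1]      # the last segment then simply runs to the end of a
print(np.add.reduceat(a, idx)[::2])  # [ 7  3 10]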
Longer alternative approach...
The fact that the slices are of different lengths makes this slightly trickier to vectorise in NumPy, but here is one way you can approach the problem.
Given an array of values and an array of slices to make...
a = np.array(([1, 4, 2, 1, 2, 5]))
slices = np.array([(0, 3), (2, 4), (2, 6)])
...create a mask-like array z that, for each slice, will be used to "zero-out" the values from a we don't want to sum:
z = np.zeros((3, 6))
s1 = np.arange(6) >= slices[:, 0][:, None]
s2 = np.arange(6) < slices[:, 1][:, None]
z[s1 & s2] = 1
Then you can do:
>>> (z * a).sum(axis=1)
array([ 7., 3., 10.])
A quick %timeit shows this is slightly faster than the list comprehension, even though we had to construct z and z * a. If slices is made to be of length 3000, this method is around 40 times quicker.
However, note that the array z will have shape (len(slices), len(a)), which may not be practical if a and slices are both very long - an iterative approach might be preferred to avoid large temporary arrays in memory.
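If memory is the concern, prefix sums handle arbitrary slices with only an O(len(a)) temporary; a small sketch:
import numpy as np
a = np.array([1, 4, 2, 1, 2, 5])
slices = np.array([(0, 3), (2, 4), (2, 6)])
# csum[k] holds a[:k].sum(), so a[lo:hi].sum() == csum[hi] - csum[lo]
csum = np.concatenate(([0], np.cumsum(a)))
print(csum[slices[:, 1]] - csum[slices[:, 0]])  # [ 7  3 10]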

numpy: ndenumerate for masked arrays?

Is there a way to enumerate over the non-masked locations of a masked numpy ndarray (e.g. in the way that ndenumerate does it for regular ndarrays, but omitting all the masked entries)?
EDIT: to be more precise: the enumeration should not only skip over the masked entries, but also show the indices of the non-masked ones in the original array. E.g. if the first five elements of a 1-d array are masked, and the next one has an unmasked value of 3, then the enumeration should start with something like ((5,), 3), ....
Thanks!
PS: note that, although it is possible to apply ndenumerate to a masked ndarray, the resulting enumeration does not discriminate between its masked and normal entries. In fact, ndenumerate not only does not filter out the masked entries from the enumeration, but it doesn't even replace the enumerated values with the masked constant. Therefore, one can't adapt ndenumerate for this task by just wrapping ndenumerate with a suitable filter.
You can access only the valid entries by using the inverse of the mask as an index:
>>> import numpy as np
>>> import numpy.ma as ma
>>> x = np.array([11, 22, -1, 44])
>>> m_arr = ma.masked_array(x, mask=[0, 0, 1, 0])
>>> for index, i in np.ndenumerate(m_arr[~m_arr.mask]):
...     print(index, i)
(0,) 11
(1,) 22
(2,) 44
See this for details.
To enumerate only the valid entries, with their indices from the original array:
>>> for (index, val), m in zip(np.ndenumerate(m_arr), m_arr.mask):
...     if not m:
...         print(index, val)
(0,) 11
(1,) 22
(3,) 44
How about:
import numpy as np
def maenumerate(marr):
    mask = ~marr.mask.ravel()
    for i, m in zip(np.ndenumerate(marr), mask):
        if m:
            yield i
N = 12
a = np.arange(N).reshape(2, 2, 3) + 10
b = np.ma.array(a, mask=(a % 5 == 0))
for i, val in maenumerate(b):
    print(i, val)
which yields
(0, 0, 1) 11
(0, 0, 2) 12
(0, 1, 0) 13
(0, 1, 1) 14
(1, 0, 0) 16
(1, 0, 1) 17
(1, 0, 2) 18
(1, 1, 0) 19
(1, 1, 2) 21
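As of NumPy 1.23 there is also np.ma.ndenumerate, which by default skips the masked entries while keeping the original indices, so (assuming a recent enough NumPy) the generator above isn't needed:
import numpy as np
a = np.arange(12).reshape(2, 2, 3) + 10
b = np.ma.array(a, mask=(a % 5 == 0))
for i, val in np.ma.ndenumerate(b):  # compressed=True by default: masked entries are skipped
    print(i, val)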
