Delete 2D unique elements in a 2D NumPy array - python

I generate a set of unique coordinate combinations by using:
axis_1 = np.arange(image.shape[0])
axis_1 = np.reshape(axis_1,(axis_1.shape[0],1))
axis_2 = np.arange(image.shape[1])
axis_2 = np.reshape(axis_2,(axis_2.shape[0],1))
coordinates = np.array(np.meshgrid(axis_1, axis_2)).T.reshape(-1,2)
I then check a condition and, if it is satisfied, I want to delete those coordinates from the array.
Something like this:
if image[coordinates[i,0], coordinates[i,1]] != 0:
    remove coordinates i from coordinates
I tried the remove and delete commands but one doesn't work for arrays and the other simply just removes every instance where coordinates[i,0] and coordinates[i,1] appear, rather than the unique combination of both.

You can use np.where to generate the coordinate pairs that should be removed, and np.unique combined with masking to remove them:
y, x = np.where(image > 0.7)
yx = np.column_stack((y, x))
combo = np.vstack((coordinates, yx))
unique, counts = np.unique(combo, axis=0, return_counts=True)
clean_coords = unique[counts == 1]
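The idea here is to stack the original coordinates and the coordinates-to-be-removed in the same array, then drop the ones that occur in both. This works because every row of coordinates is unique, so any row counted twice must also be in the removal list. Note that np.unique returns the remaining rows in sorted order.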

You can use the numpy.delete function, but it returns a new array rather than modifying the array in place (in-place deletion would be quite problematic, especially in a for loop).
So your code would look like this:
nb_rows_deleted = 0
for i in range(0, coordinates.shape[0]):
    corrected_i = i - nb_rows_deleted
    if image[coordinates[corrected_i, 0], coordinates[corrected_i, 1]] != 0:
        coordinates = np.delete(coordinates, corrected_i, 0)
        nb_rows_deleted += 1
The corrected_i takes into consideration that some rows have been deleted during your loop.
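As an alternative to deleting rows one by one, the same filtering can probably be done in a single step with a boolean mask. The following is a minimal sketch, not taken from the answers above; the small image array is a made-up example, and np.indices is used here only as a compact way to build the coordinate list.

```python
import numpy as np

# Hypothetical 3x4 example image; nonzero pixels mark coordinates to drop.
image = np.array([[0, 2, 0, 0],
                  [5, 0, 0, 1],
                  [0, 0, 3, 0]])

# All (row, col) pairs, one pair per row of the array, in row-major order.
coordinates = np.indices(image.shape).reshape(2, -1).T

# True where the pixel at a coordinate is zero, i.e. the rows to keep.
keep = image[coordinates[:, 0], coordinates[:, 1]] == 0
clean_coords = coordinates[keep]
```

Because the mask is computed for all rows at once, no index correction is needed and the original order of the remaining coordinates is preserved.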

Related

Python concatenate 2D array to new list if condition is met

Let's say I have an array:
print(arr1.shape)
(188621, 10)
And in the nth column (let's say 4 for this example), I want to check when a value is above a threshold, t. I want to create a new list (of x instances) of the entire row of arr1 when the ith iteration of the 4th column is above threshold t. In other words, it is extracting the ith row from arr1 when the condition in the 4th column is met. So far I have:
arr2 = []
for i in range(0,len(arr1)):
    if arr1[i,4] > t:
        arr2.append(arr1[i,:])
I have also tried something along the lines of:
for i in range(0,len(arr1)):
    if arr1[i,4] > t:
        if len(arr2) == 0:
            arr2 = arr1[i,:]
        else:
            arr2 = np.concatenate((arr2,arr1[i,:]))
However, both instances seem to be growing in 1D terms of x*10 instead of a 2D list of (x, 10) when the conditions are met. What am I missing here?
Well, it wasn't that difficult.
arr2 = arr1[arr1[:, 4] > t]
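A minimal sketch of boolean-mask row selection, using made-up random data in place of the real arr1 and an assumed threshold t. It also shows that the list built by the append loop becomes a 2D array once it is passed to np.array, whereas np.concatenate on 1D rows joins them end to end, which would explain the x*10 growth.

```python
import numpy as np

rng = np.random.default_rng(0)
arr1 = rng.random((1000, 10))   # stand-in for the real (188621, 10) array
t = 0.9                         # assumed threshold

# Boolean mask over column 4 selects whole rows at once.
mask = arr1[:, 4] > t
arr2 = arr1[mask]               # shape (x, 10)

# Loop equivalent: a list of 1D rows, converted to 2D at the end.
rows = [arr1[i, :] for i in range(len(arr1)) if arr1[i, 4] > t]
arr2_loop = np.array(rows)
```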

Speed up search of array element in second array

I have a pretty simple operation involving two not so large arrays:
1. For every element in the first (larger) array, located in position i,
2. find if it exists in the second (smaller) array.
3. If it does, find its index in the second array: j.
4. Store the float found at position i of a third array (same length as the first array) at position j of a fourth array (same length as the second array).
The for block below works, but gets very slow for not so large arrays (>10000).
Can this implementation be made faster?
import numpy as np
import random
##############################################
# Generate some random data.
# 'Nb' is always smaller than 'Na'
Na, Nb = 50000, 40000
# List of IDs (could be any string, I use integers here for simplicity)
ids_a = random.sample(range(1, Na * 10), Na)
ids_a = [str(_) for _ in ids_a]
random.shuffle(ids_a)
# Some floats associated to these IDs
vals_in_a = np.random.uniform(0., 1., Na)
# Smaller list of repeated IDs from 'ids_a'
ids_b = random.sample(ids_a, Nb)
# Array to be filled
vals_in_b = np.zeros(Nb)
##############################################
# This block needs to be *a lot* more efficient
#
# For each string in 'ids_a'
for i, id_a in enumerate(ids_a):
    # if it exists in 'ids_b'
    if id_a in ids_b:
        # find where in 'ids_b' this element is located
        j = ids_b.index(id_a)
        # store in that position the value taken from 'vals_in_a'
        vals_in_b[j] = vals_in_a[i]
In defense of my approach, here is the authoritative implementation:
import itertools as it
def pp():
    la,lb = len(ids_a),len(ids_b)
    ids = np.fromiter(it.chain(ids_a,ids_b),'<S6',la+lb)
    unq,inv = np.unique(ids,return_inverse=True)
    vals = np.empty(la,vals_in_a.dtype)
    vals[inv[:la]] = vals_in_a
    return vals[inv[la:]]
(juanpa()==pp()).all()
# True
timeit(juanpa,number=100)
# 3.1373191522434354
timeit(pp,number=100)
# 2.5256317732855678
That said, @juanpa.arrivillaga's suggestion can also be implemented better:
import operator as op
def ja():
    return op.itemgetter(*ids_b)(dict(zip(ids_a,vals_in_a)))
(ja()==pp()).all()
# True
timeit(ja,number=100)
# 2.015202699229121
I tried the approaches by juanpa.arrivillaga and Paul Panzer. The first one is the fastest by far, and also the simplest. The second one is faster than my original approach, but considerably slower than the first. It also has the drawback that the line vals[inv_a] = vals_in_a stores floats into a U5 array, thus converting them into strings. They can be converted back to floats at the end, but I lose digits (unless I'm missing something obvious, of course).
Here are the implementations:
def juanpa():
    dict_ids_b = {_: i for i, _ in enumerate(ids_b)}
    for i, id_a in enumerate(ids_a):
        try:
            vals_in_b[dict_ids_b[id_a]] = vals_in_a[i]
        except KeyError:
            pass
    return vals_in_b
def Paul():
    # 1) concatenate ids_a and ids_b
    ids_ab = ids_a + ids_b
    # 2) apply np.unique with keyword return_inverse=True
    vals, idxs = np.unique(ids_ab, return_inverse=True)
    # 3) split the inverse into inv_a and inv_b
    inv_a, inv_b = idxs[:len(ids_a)], idxs[len(ids_a):]
    # 4) map the values to match the order of uniques: vals[inv_a] = vals_in_a
    vals[inv_a] = vals_in_a
    # 5) use inv_b to pick the correct values: result = vals[inv_b]
    vals_in_b = vals[inv_b].astype(float)
    return vals_in_b
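The loss of digits in Paul() appears to come from writing the floats into the string array returned by np.unique. A minimal sketch of the same five steps that instead writes them into a separate float array, as pp() does, is shown below; the short ID lists are made-up examples, and it assumes every ID in ids_b also occurs in ids_a.

```python
import numpy as np

# Made-up example data: string IDs and the floats attached to them.
ids_a = ["17", "4", "230", "9", "58"]
vals_in_a = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
ids_b = ["9", "17", "230"]

# 1) + 2) concatenate the IDs and map each one to its slot among the uniques
unq, inv = np.unique(np.array(ids_a + ids_b), return_inverse=True)

# 3) split the inverse into the part for ids_a and the part for ids_b
inv_a, inv_b = inv[:len(ids_a)], inv[len(ids_a):]

# 4) store the floats in a separate float array (not in the string array unq)
vals = np.empty(len(unq), dtype=vals_in_a.dtype)
vals[inv_a] = vals_in_a

# 5) pick the values in the order of ids_b; no string round trip is involved
vals_in_b = vals[inv_b]
```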

Python-1D indice link with 2D array location

Introduction
Sometimes, I want to get the value of a 2-d array at a random location.
For example, there is an array data with shape (20,20) and a random number pair (5,5). Then I take data[5,5] as my target value.
I need this for a genetic algorithm: I want to draw samples from a 2-d array as individuals. So I want to generate a lookup table which connects a 1-d index to a 2-d position.
My attempt
import numpy as np
import pandas as pd
## data is the 2-d array with shape 20x20
data = np.random.randint(0,1000,400)
data = data.reshape(20,20)
## direction is my lookup table
direction = {"Indice":[],"X":[],"Y":[]}
k = 0
for i in range(0,data.shape[0],1):
    for j in range(0,data.shape[1],1):
        k+=1
        direction["Indice"].append(k)
        direction["X"].append(j)
        direction["Y"].append(i)
direction = pd.DataFrame(direction)
## generate a random int and look up the corresponding 2-d value.
loc = np.random.randint(0,400)
XX = np.array(direction[direction.Indice == loc ].X)
YY = np.array(direction[direction.Indice == loc ].Y)
target_value = data[YY,XX]
My question
Is there a neater way to achieve this?
Any advice would be appreciated!
You could use np.ravel to make data 1-dimensional, then index it using the flat index loc:
target_value = data.ravel()[loc-1]
Or, if you want XX and YY, perhaps you are looking for np.unravel_index. It maps a flat index or an array of flat indices to a tuple of coordinates.
For example, instead of building the direction DataFrame, you could use
np.unravel_index(loc-1, data.shape)
instead of
XX = np.array(direction[direction.Indice == loc ].X)
YY = np.array(direction[direction.Indice == loc ].Y)
Then you could define target_value as:
target_value = data[np.unravel_index(loc-1, data.shape)]
Alternatively, to simply get a random value from the 2D array data, you could use
target_value = np.random.choice(data.flat)
Or to get N random values, use
target_values = np.random.choice(data.flat, size=(N,))
Why the minus one in loc-1:
In your original code, the direction['Indice'] column uses k values which
start at 1, not 0. So when loc equals 1, the 0th-indexed row of direction is
selected. I used loc-1 to make
target_value = data[np.unravel_index(loc-1, data.shape)]
return the same result that
XX = np.array(direction[direction.Indice == loc ].X)
YY = np.array(direction[direction.Indice == loc ].Y)
target_value = data[YY,XX]
returns. Note however, that if loc equals 0, then np.unravel_index(-1, data.shape) raises a ValueError, while your original code would return an empty array for target_value.
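A small sketch of the flat-index mapping described above, using a made-up 20x20 array whose values equal their own flat index so the result is easy to check by eye. np.ravel_multi_index is shown as the inverse of np.unravel_index; the variable names are illustrative only.

```python
import numpy as np

# Each entry equals its own flat (row-major) index: data[r, c] == 20*r + c.
data = np.arange(400).reshape(20, 20)

flat = 104                                   # a 0-based flat index
row, col = np.unravel_index(flat, data.shape)

# The inverse mapping: 2-d position back to the flat index.
back = np.ravel_multi_index((row, col), data.shape)

# Several flat indices can be converted in one call.
flats = np.array([0, 19, 20, 399])
rows, cols = np.unravel_index(flats, data.shape)
values = data[rows, cols]
```

With a 1-based index such as loc from the question, loc-1 would be passed in place of flat.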

Iterating efficiently through indices of arbitrary order array

Say I have an arbitrary array of variable order N. For example:
A is a 2x3x3 array: an order-3 array with sizes 2, 3, and 3 along its three indices.
I would like to efficiently loop through each element. If I knew the order a priori, I could do something like this (in Python):
#for order 3
import numpy as np
shape = np.shape(A)
i = 0
while i < shape[0]:
    j = 0
    while j < shape[1]:
        k = 0
        while k < shape[2]:
            #code using i,j,k
            k += 1
        j += 1
    i += 1
Now suppose I don't know the order of A, i.e. I don't know a priori the length of shape. What is the quickest way to loop through all elements of the array?
There are many ways to do this, e.g. iterating over a.ravel() or a.flat. However, looping over every single element of an array in a Python loop will never be particularly efficient.
I don't think it matters which index you choose to permute over first, which index you choose to permute over second, etc. because your inner-most while statement will always be executed once per combination of i, j, and k.
If you need to keep the results of your operation (and assuming it's a function of A and i,j,k), you'd want to use something like this:
import itertools
import numpy as np
# code(A, indices) stands for whatever you compute at each position
results = ((indices, code(A, indices))
           for indices in itertools.product(*(range(n) for n in np.shape(A))))
Then you can iterate the results getting out the position and return value of code for each position. Or convert the generator expression to a list if you need to access the results multiple times.
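NumPy also ships iterators for this pattern, so the index tuples do not have to be built by hand. The following is a minimal sketch with a made-up 2x3x3 array: np.ndindex yields every index tuple for a given shape, and np.ndenumerate yields (index, value) pairs. As noted above, this is still a Python-level loop, so it is convenient rather than fast.

```python
import numpy as np

A = np.arange(2 * 3 * 3).reshape(2, 3, 3)   # made-up order-3 example

# Works for any order: one index tuple per element, in row-major order.
visited = []
for idx in np.ndindex(*A.shape):
    visited.append((idx, A[idx]))

# Same traversal, with the value handed over alongside the index.
pairs = list(np.ndenumerate(A))
```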
If the array is of the format array = [[[1,2,3,4],[1,2]],[[1],[1,2,3]]],
you could use the following structure:
array = [[[1,2,3,4],[1,2]],[[1],[1,2,3]]]
indices = []
def iter_array(array,indices):
    indices.append(0)
    for a in array:
        if isinstance(a[0],list):
            iter_array(a,indices)
        else:
            indices.append(0)
            for nonlist in a:
                #do something using each element in indices
                #print(indices)
                indices.append(indices.pop()+1)
            indices.pop()
        indices.append(indices.pop()+1)
    indices.pop()
iter_array(array,indices)
This should work for the usual nested-list "arrays". I don't know if it would be possible to mimic this using numpy's array structure.
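The same traversal can be sketched as a recursive generator that yields each leaf together with its index tuple, which avoids mutating a shared indices list. This is an illustrative variant rather than the code from the answer; the name iter_nested is hypothetical.

```python
def iter_nested(node, prefix=()):
    """Yield (index_tuple, value) for every non-list leaf of a nested list."""
    if isinstance(node, list):
        for i, child in enumerate(node):
            # Extend the index prefix by the position of this child.
            yield from iter_nested(child, prefix + (i,))
    else:
        yield prefix, node


array = [[[1, 2, 3, 4], [1, 2]], [[1], [1, 2, 3]]]
leaves = list(iter_nested(array))
```

Each yielded tuple can be used directly, for example array[i][j][k] for a leaf at (i, j, k), and sublists of different lengths are handled without special cases.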

How to pick the largest number in a matrix of lists in python?

I have a list-of-list-of-lists, where the first two act as a "matrix", where I can access the third list as
list3 = m[x][y]
and the third list contains a mix of strings and numbers, but each list has the same size & structure. Let's call a specific entry in this list The Number of Interest. This number always has the same index in this list!
What's the fastest way to get the 'coordinates' (x,y) for the list that has the largest Number of Interest in Python?
Thank you!
(So really, I'm trying to pick the largest number in m[x][y][k] where k is fixed, for all x & y, and 'know' what its address is)
max((cell[k], x, y)
    for (x, row) in enumerate(m)
    for (y, cell) in enumerate(row))[1:]
Also, you can assign the result directly to a couple of variables:
(_, x, y) = max((cell[k], x, y)
                for (x, row) in enumerate(m)
                for (y, cell) in enumerate(row))
This is O(n^2) for an n x n matrix, btw.
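import itertools
indexes = itertools.product(range(len(m)), range(len(m[0])))
print(max(indexes, key=lambda x: m[x[0]][x[1]][k]))
or using numpy
import numpy
# keep only the numeric k-th entries, since the inner lists mix strings and numbers
data = numpy.array([[cell[k] for cell in row] for row in m])
# argmax returns a flat index; unravel_index turns it into (x, y)
print(numpy.unravel_index(numpy.argmax(data), data.shape))
If you are interested in speeding up operations in Python, you really need to look at numpy.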
Assuming "The Number of Interest" is in a known spot in the list, and there will be a nonzero maximum,
maxCoords = [-1, -1]
maxNumOfInterest = -1
rowIndex = 0
for row in m:
    colIndex = 0
    for entry in row:
        if entry[indexOfNum] > maxNumOfInterest:
            maxNumOfInterest = entry[indexOfNum]
            maxCoords = [rowIndex,colIndex]
        colIndex += 1
    rowIndex += 1
This is a naive method that is O(n^2) for an n x n matrix. Since you have to check every element, no asymptotically faster solution is possible.
@Marcelo's method is more succinct, but perhaps less readable.
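For a concrete check, here is a minimal worked sketch on a made-up 2x3 "matrix" of lists, with the Number of Interest assumed to sit at index k = 1. It finds the coordinates once in pure Python and once with NumPy, extracting the numeric entries first because the inner lists mix strings and numbers.

```python
import numpy as np

k = 1  # assumed position of the Number of Interest in each inner list
m = [
    [["a", 3, "x"], ["b", 7, "y"], ["c", 1, "z"]],
    [["d", 9, "u"], ["e", 2, "v"], ["f", 5, "w"]],
]

# Pure Python: take the max over all (i, j) pairs, keyed by the number.
x, y = max(
    ((i, j) for i, row in enumerate(m) for j, _ in enumerate(row)),
    key=lambda ij: m[ij[0]][ij[1]][k],
)

# NumPy: build a 2-d array of the numbers, then convert the flat argmax.
values = np.array([[cell[k] for cell in row] for row in m])
nx, ny = np.unravel_index(np.argmax(values), values.shape)
```

Both variants address the list as m[x][y], matching the indexing used in the question.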
