Delete some elements from numpy array - python

One interesting question:
I would like to delete some elements from a numpy array. In the simplified example below, it works if I don't delete the last element, but it fails if I try to delete the last element.
The code below works fine:
import numpy as np
values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,4,1]:
    values = np.delete(values,i)
print(values)
The output is:
[0 1 2 3 4 5]
[0 2 4]
If we only change 4 to 5, then it will fail:
import numpy as np
values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,5,1]:
    values = np.delete(values,i)
print(values)
The error message:
IndexError: index 5 is out of bounds for axis 0 with size 5
Why does this error only happen when deleting the last element? What is the correct way to do such tasks?

Keep in mind that np.delete(arr, ind) deletes the element at index ind NOT the one with that value.
This means that as you delete things, the array is getting shorter. So you start with
values = np.array([0,1,2,3,4,5])
values = np.delete(values, 3)
# values is now [0,1,2,4,5]: the element at index 3 is gone, so only 5 elements remain
# the next call tries to delete index 5, but the valid indices now only go from 0 to 4
values = np.delete(values, 5) # raises IndexError
One way to solve the problem is to sort the indices that you want to delete in descending order (if you really want to delete from the array one element at a time).
inds_to_delete = sorted([3,1,5], reverse=True) # [5,3,1]
# then delete in order of largest to smallest ind
for ind in inds_to_delete:
    values = np.delete(values, ind)
Or:
inds_to_keep = np.array([0,2,4])
values = values[inds_to_keep]

A probably faster way (because everything is removed at once rather than one value at a time) is to use a boolean mask:
values = np.array([0,1,2,3,4,5])
tobedeleted = np.array([False, True, False, True, False, True])
# Indices 1, 3 and 5 are True, so those elements will be deleted.
values_deleted = values[~tobedeleted]
# That gives you what you want.
This approach is recommended in the NumPy reference for np.delete.
To your question: each deletion makes the array shorter, so after the first one index 5 no longer exists, because the element formerly at index 5 is now at index 4. Delete in descending order if you want to use np.delete in a loop.
If you really want to use np.delete, pass all the indices at once:
np.delete(values, [3,5,1])
If you want to delete by value (not by index), you have to alter the procedure a bit. To delete every 5 in your array you can use:
values[values != 5]
or with multiple values to delete:
to_delete = (values == 5) | (values == 3) | (values == 1)
values[~to_delete]
All of these give you the desired result. I'm not sure what your data really looks like, so I can't say for sure which will be the most appropriate.
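The mask in this answer was typed out by hand. As a minimal sketch (the names `inds_to_delete` and `mask` are only illustrative), the same mask can be built from a list of indices, so everything is removed in one step regardless of the order in which the indices are listed:

```python
import numpy as np

values = np.array([0, 1, 2, 3, 4, 5])
inds_to_delete = [3, 5, 1]  # positions in the original array, in any order

# Start with an all-False mask and mark the positions to drop.
mask = np.zeros(values.shape[0], dtype=bool)
mask[inds_to_delete] = True

result = values[~mask]
print(result)  # [0 2 4]
```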

The problem is that you have already deleted items from values, so when you try to delete the item at index 5 there is no longer a value at that index; it is now at index 4.
If you sort the list of indices to delete and iterate over them from largest to smallest, that should work around this issue.
import numpy as np
values = np.array([0,1,2,3,4,5])
print(values)
for i in [5,3,1]: # iterate in descending order
    values = np.delete(values,i)
print(values)

If you want to remove the elements at indices 3, 4 and 1, just do np.delete(values,[3,4,1]).
In the first case, the loop deletes the fourth item (index 3), then the fifth of what remains, and finally the second of what remains. Because of the order of the operations, this removes the second, fourth and sixth items of the initial array. It is therefore logical that the second case fails.
You can compute the shifts (in the example, fifth becomes sixth) this way:
def multidelete(values,todelete):
    todelete=np.array(todelete)
    shift=np.triu((todelete>=todelete[:,None]),1).sum(0)
    return np.delete(values,todelete+shift)
Some tests:
In [91]: multidelete([0, 1, 2, 3, 4, 5],[3,4,1])
Out[91]: array([0, 2, 4])
In [92]: multidelete([0, 1, 2, 3, 4, 5],[1,1,1])
Out[92]: array([0, 4, 5])
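N.B. In older NumPy versions, np.delete did not complain and did nothing if the bad index was given in a list: np.delete(values,[8]) simply returned the values unchanged. Recent versions raise an IndexError for out-of-bounds indices in this case too.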
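As an alternative to the pairwise shift count, the loop can be simulated directly on a list of positions. This is only a sketch, and `loop_indices_to_original` is a hypothetical helper name, not a NumPy function. It reproduces the two tests above and also handles inputs such as [3, 2, 2], where the shift count and the loop give different results:

```python
import numpy as np

def loop_indices_to_original(todelete, n):
    """Map indices meant for one-at-a-time deletion to positions in the original array."""
    remaining = list(range(n))  # original positions that are still present
    return [remaining.pop(i) for i in todelete]

values = np.array([0, 1, 2, 3, 4, 5])
original = loop_indices_to_original([3, 4, 1], len(values))
print(original)                     # [3, 5, 1]
print(np.delete(values, original))  # [0 2 4]
```

The list `pop` is O(n) per index, so this favors clarity over speed.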

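Boolean indexing itself is not deprecated, but passing a boolean array as the index argument of np.delete used to emit a warning in older NumPy versions. You can use np.where() to turn the mask into integer indices instead. Note that this deletes by value, which matches deleting by index here only because each value equals its original index: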
values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,5,1]:
    values = np.delete(values,np.where(values==i))
    # values = np.delete(values,values==i) # handling of a boolean array here has varied between NumPy versions
print(values)

I know this question is old, but for future reference (as I ran into a similar problem):
Instead of writing a for loop, one solution is to filter the array with NumPy's isin function, like so:
>>> import numpy as np
>>> # np.isin(element, test_elements, assume_unique=False, invert=False)
>>> arr = np.array([1, 4, 7, 10, 5, 10])
>>> ~np.isin(arr, [4, 10])
array([ True, False, True, False, True, False])
>>> arr = arr[ ~np.isin(arr, [4, 10]) ]
>>> arr
array([1, 7, 5])
So for this particular case we can write:
values = np.array([0,1,2,3,4,5])
torem = [3,4,1]
values = values[ ~np.isin(values, torem) ]
which outputs: array([0, 2, 5])

Here's how you can do it without any loop or indexing, using numpy.setdiff1d:
>>> import numpy as np
>>> array_1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> array_1
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> remove_these = np.array([1,3,5,7,9])
>>> remove_these
array([1, 3, 5, 7, 9])
>>> np.setdiff1d(array_1, remove_these)
array([ 2, 4, 6, 8, 10])
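One caveat worth keeping in mind: np.setdiff1d returns the sorted, unique values, so it can reorder the array and drop repeats, whereas an np.isin mask keeps the original order and any duplicates. A small sketch with made-up data:

```python
import numpy as np

arr = np.array([5, 1, 5, 3])
remove_these = [1]

by_setdiff = np.setdiff1d(arr, remove_these)  # sorted and de-duplicated
by_mask = arr[~np.isin(arr, remove_these)]    # original order, repeats kept

print(by_setdiff)  # [3 5]
print(by_mask)     # [5 5 3]
```

Both remove by value, not by index.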

Related

How can I build a complementary array in numpy

I have an array of numbers corresponding to indices of another array.
index_array = np.array([2, 3, 5])
What I want to do is to create another array with the numbers 0, 1, 4, 6, 7, 8, 9. What I have thought of is:
index_list = []
for i in range(10):
    if i not in index_array:
        index_list.append(i)
This works but I don't know if there is a more efficient way to do it or even a built-in function for it.
Probably the simplest solution is just to remove unwanted indices from the set:
n = 10
index_array = [2, 3, 5]
complement = np.delete(np.arange(n), index_array)
You can use numpy.setdiff1d to efficiently collect the unique values from a "universal array" that aren't in your index array. Passing assume_unique=True provides a small speed-up.
When assume_unique is True, the result will be sorted as long as the input is sorted.
import numpy as np
# "Universal set" to take complement with respect to.
universe = np.arange(10)
a = np.array([2,3,5])
complement = np.setdiff1d(universe, a, assume_unique=True)
print(complement)
Results in
[0 1 4 6 7 8 9]
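A boolean mask is another option for the same complement. The following is a minimal sketch, assuming the indices are valid positions in range(n); `keep` is just an illustrative name:

```python
import numpy as np

n = 10
index_array = np.array([2, 3, 5])

# Mark every position as kept, then clear the ones listed in index_array.
keep = np.ones(n, dtype=bool)
keep[index_array] = False

complement = np.flatnonzero(keep)  # positions where keep is still True
print(complement)  # [0 1 4 6 7 8 9]
```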

while accessing column i got row in numpy

I am trying to access column values, but I get row values when indexing a 2-D array in NumPy.
# The general format is arr_2d[row][col] or arr_2d[row,col];
# comma notation is recommended for clarity.
arr_2d = np.arange(0,9).reshape((3,3))
# sub array
arr_2d[0:2,1:]
arr_2d[:,0] # returned as a 1-D array, but the data is from the column.
Here I try to access column data, but it gives me row values:
arr_2d[:][0] # gives the first row's data.
What is the difference between comma notation and bracket notation?
arr_2d[row][col] only works as you intended, i.e. like arr_2d[row,col], if you pass an integer as the row index, not a slice.
For example:
>>> arr_2d = np.arange(0,9).reshape((3,3))
>>> arr_2d[1][2]
5
>>> arr_2d[1,2]
5
But:
>>> arr_2d[:][2]
array([6, 7, 8])
This is because arr_2d[:] is essentially a view of the entire original array:
>>> arr_2d[:]
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> arr_2d
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
and:
>>> arr_2d[2]
array([6, 7, 8])
# so no surprises here:
>>> arr_2d[:][2]
array([6, 7, 8])
The notation arr_2d[:,0] translates to "select all items in dimension 0, and the first item in dimension 1", which amounts to the entire first column (item 0 of all rows).
The notation arr_2d[:][0] is chaining two operations:
arr_2d[:] means "select all items in dimension 0", which basically refers to the entire matrix.
[0] simply selects the first item in the matrix returned by the first operation - returning the first row.
In order to select the first row, you can use either arr_2d[0] or more 'verbosely' arr_2d[0, :] (which translates to "all columns of the first row").
You could access the same items using both notations, but in different ways. For example -
In order to select the 3rd item in the 2nd row you could use:
Comma notation - arr_2d[1, 2]
Bracket notation - arr_2d[1][2]
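The difference between the two notations can be checked directly. This is a small sketch using the array from the question; the variable names `first_column` and `first_row` are illustrative:

```python
import numpy as np

arr_2d = np.arange(0, 9).reshape((3, 3))

first_column = arr_2d[:, 0]  # one indexing operation over both axes
first_row = arr_2d[:][0]     # two operations: the whole array, then row 0

print(first_column)  # [0 3 6]
print(first_row)     # [0 1 2]

# arr_2d[:] shares memory with arr_2d, so it is a view rather than a copy.
print(np.shares_memory(arr_2d[:], arr_2d))  # True
```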

Replace multiple elements in numpy array with 1

In a given numpy array X:
X = array([1,2,3,4,5,6,7,8,9,10])
I would like to replace the elements at indices (2, 3) and (7, 8) with a single element -1 each, like:
X = array([1,2,-1,5,6,7,-1,10])
In other words, I replaced the values at indices (2, 3) and (7, 8) of the original array with a single value each.
The question is: is there a numpy-ish way (i.e. without for loops or Python lists) to do this? Thanks.
Note: This is NOT equivalent to replacing a single element in place with another. It's about replacing multiple values with a single value. Thanks.
A solution using numpy.delete, similar to @pault's, but more efficient as it uses pure numpy indexing. However, because of this indexing, you cannot pass jagged arrays as indices.
Setup
a = np.array([1,2,3,4,5,6,7,8,9,10])
idx = np.stack([[2, 3], [7, 8]])
a[idx] = -1
np.delete(a, idx[:, 1:])
array([ 1, 2, -1, 5, 6, 7, -1, 10])
I'm not sure if this can be done in one step, but here's a way using np.delete:
import numpy as np
from operator import itemgetter
X = np.array(range(1,11))
to_replace = [[2,3], [7,8]]
X[list(map(itemgetter(0), to_replace))] = -1
X = np.delete(X, list(map(lambda x: x[1:], to_replace)))
print(X)
#[ 1 2 -1 5 6 7 -1 10]
First we replace the first element of each pair with -1. Then we delete the remaining elements.
Try np.put:
np.put(X, [2,3,7,8], [-1,0]) # `0` can be changed to anything that's not in the array
print(X[X!=0]) # filter out whatever number you used as the placeholder in `put`
So basically, use put to set the values at those indices, then drop the zero values.
Or, as @khan says, you can use a placeholder that is outside the range of the data:
np.put(X, [2,3,7,8], [-1,np.max(X)+1])
print(X[X!=X.max()])
All Output:
[ 1 2 -1 5 6 7 -1 10]
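The replace-then-delete idea from the answers above can be wrapped in a small function. This is only a sketch: `collapse_groups` is a hypothetical name, and it assumes every group has the same length so the indices form a regular 2-D array:

```python
import numpy as np

def collapse_groups(a, groups, fill=-1):
    """Replace each equal-length group of indices with a single fill value."""
    groups = np.asarray(groups)
    out = a.copy()                # leave the input untouched
    out[groups[:, 0]] = fill      # the first index of each group gets the fill value
    return np.delete(out, groups[:, 1:].ravel())  # drop the rest of each group

X = np.arange(1, 11)
result = collapse_groups(X, [[2, 3], [7, 8]])
print(result.tolist())  # [1, 2, -1, 5, 6, 7, -1, 10]
```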

Python: Inplace Merge sort implementation issue

I am implementing an in-place merge sort algorithm in python3. The code takes an input array and calls itself recursively (with the split array as input) if the length of the input array is more than one. After that, it joins the two sorted arrays. Here is the code:
def merge_sort(array):
    """
    Input : list of values
    Note :
        It divides input array in two halves, calls itself for the two halves and then merges the two sorted halves.
    Returns : sorted list of values
    """
    def join_sorted_arrays(array1, array2):
        """
        Input : 2 sorted arrays.
        Returns : New sorted array
        """
        new_array = [] # this array will contain values from both input arrays.
        j = 0 # Index to keep track where we have reached in second array
        n2 = len(array2)
        for i, element in enumerate(array1):
            # We will compare current element in array1 to current element in array2, if element in array2 is smaller, append it
            # to new array and look at next element in array2. Keep doing this until either array2 is exhausted or an element of
            # array2 greater than current element of array1 is found.
            while j < n2 and element > array2[j]:
                new_array.append(array2[j])
                j += 1
            new_array.append(element)
        # If there are any remaining values in array2, that are bigger than last element in array1, then append those to
        # new array.
        for i in range(j,n2):
            new_array.append(array2[i])
        return new_array
    n = len(array)
    if n == 1:
        return array
    else:
        # print('array1 = {0}, array2 = {1}'.format(array[:int(n/2)], array[int(n/2):]))
        array[:int(n/2)] = merge_sort(array[:int(n/2)])
        array[int(n/2):] = merge_sort(array[int(n/2):])
        # print('array before joining : ',array)
        array = join_sorted_arrays(array[:int(n/2)],array[int(n/2):])
        # print('array after joining : ',array)
        return array
Now if the code is tested,
a = [2,1,4,3,1,2,3,4,2,7,8,10,3,4]
merge_sort(a)
print(a)
out : [1, 1, 2, 2, 3, 3, 4, 2, 3, 4, 4, 7, 8, 10]
If you uncomment the print statements in the above function, you will notice that a equals the output shown above just before the last call of join_sorted_arrays. After this function has been called, array 'a' should be sorted. To my surprise, if I do the following, the output is correct.
a = [2,1,4,3,1,2,3,4,2,7,8,10,3,4]
a = merge_sort(a)
print(a)
out : [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 7, 8, 10]
I need some help to understand why this is happening.
I am a beginner, so any other comments about coding practices etc. are also welcome.
When you reassign array as the output of join_sorted_arrays() with
array = join_sorted_arrays(array[:int(n/2)],array[int(n/2):])
you're not updating the value of a anymore.
Since you pass in a as the argument array, it's understandable to expect every assignment to array inside the function to update the original list (aka a). But array = join_sorted_arrays(...) instead rebinds the local name array to a brand-new list that exists only within merge_sort(). Returning array from the function returns that new, sorted list.
The list that a refers to was being modified in place (through the slice assignments) up until that last statement, which is why print(a) shows something different after merge_sort(a). But you'll only get the final, sorted output from the return value of merge_sort().
It might be clearer if you look at:
b = merge_sort(a)
print(a) # [1, 1, 2, 2, 3, 3, 4, 2, 3, 4, 4, 7, 8, 10]
print(b) # [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 7, 8, 10]
Note that Python isn't a pass-by-reference language, and the details of what it actually is can be a little weird to suss out at first. I'm always going back to read on how it works when I get tripped up. There are plenty of SO posts on the topic, which may be of some use to you here.
For example, this one and this one.
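A minimal sketch of the difference between rebinding a name and mutating the list it refers to (the function names `rebind` and `mutate` are illustrative, not part of the question's code):

```python
def rebind(lst):
    lst = sorted(lst)     # binds the local name to a new list; the caller's list is unchanged
    return lst

def mutate(lst):
    lst[:] = sorted(lst)  # slice assignment changes the caller's list in place
    return lst

a = [3, 1, 2]
b = rebind(a)
print(a, b)  # [3, 1, 2] [1, 2, 3]

c = [3, 1, 2]
mutate(c)
print(c)     # [1, 2, 3]
```

By the same reasoning, writing array[:] = join_sorted_arrays(...) in the last step of merge_sort should make merge_sort(a) update a itself.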

Renumbering a 1D mesh in Python

First of all, I couldn't find the answer in other questions.
I have a numpy array of integers called ELEM. Its three columns give the element number, node 1 and node 2. This is a one-dimensional mesh. What I need to do is renumber the nodes. I have the old and new node numbering tables, so the algorithm should replace every value in the ELEM array according to these tables.
The code should look like this
old_num = np.array([2, 1, 3, 6, 5, 9, 8, 4, 7])
new_num = np.arange(1,10)
ELEM = np.array([ [1, 1, 3], [2, 3, 6], [3, 1, 3], [4, 5, 6]])
From here, every integer in the second and third columns of the ELEM array should be replaced by the corresponding integer from the new_num table.
If you're doing a lot of these, it makes sense to encode the renumbering in a dictionary for fast lookup.
lookup_table = dict( zip( old_num, new_num ) ) # create your translation dict
vect_lookup = np.vectorize( lookup_table.get ) # create a function to do the translation
ELEM[:, 1:] = vect_lookup( ELEM[:, 1:] ) # Reassign the elements you want to change
np.vectorize is just there to make things nicer syntactically. All it does is allow us to map over the values of the array with our lookup_table.get function
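np.vectorize is documented as a convenience rather than a performance tool, so for large meshes a lookup array may be worth trying. This is a sketch that assumes the node numbers are small non-negative integers; `lut` is an illustrative name:

```python
import numpy as np

old_num = np.array([2, 1, 3, 6, 5, 9, 8, 4, 7])
new_num = np.arange(1, 10)
ELEM = np.array([[1, 1, 3], [2, 3, 6], [3, 1, 3], [4, 5, 6]])

# lut[old] == new; slot 0 is unused because node numbers start at 1.
lut = np.zeros(old_num.max() + 1, dtype=ELEM.dtype)
lut[old_num] = new_num

# Translate the two node columns with a single fancy-indexing step.
ELEM[:, 1:] = lut[ELEM[:, 1:]]
print(ELEM.tolist())  # [[1, 2, 3], [2, 3, 4], [3, 2, 3], [4, 5, 4]]
```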
I couldn't quite work out what your problem is, but I have tried to help as far as I understood it.
I think you need to replace, for example, 2 with 1, or 7 with 9, right? In that case, you can create a dictionary mapping each old number to its new one, as in the code below. It could also be done with tuples or lists, but dictionaries are better suited to this. Afterwards, just replace each element by looking it up in the dictionary.
The code below is very basic and relatively easy to understand. There are surely more Pythonic ways to do it, but if you are new to Python, it is probably the most approachable.
import numpy as np
# Data you provided
old_num = np.array([2, 1, 3, 6, 5, 9, 8, 4, 7])
new_num = np.arange(1,10)
ELEM = np.array([ [1, 1, 3], [2, 3, 6], [3, 1, 3], [4, 5, 6]])
# Create a dict mapping each old number to its new number
# (named mapping so that the built-in dict is not shadowed)
mapping = {}
for i_num in range(len(old_num)):
    num = old_num[i_num]
    mapping[num] = new_num[i_num]
# Replace the elements
for element in ELEM:
    element[1] = mapping[element[1]]
    element[2] = mapping[element[2]]
print(ELEM)
