How can I build a complementary array in numpy - python

I have an array of numbers corresponding to indices of another array.
index_array = np.array([2, 3, 5])
What I want to do is to create another array with the numbers 0, 1, 4, 6, 7, 8, 9. What I have thought is:
index_list = []
for i in range(10):
    if i not in index_array:
        index_list.append(i)
This works but I don't know if there is a more efficient way to do it or even a built-in function for it.

Probably the simplest solution is just to remove unwanted indices from the set:
n = 10
index_array = [2, 3, 5]
complement = np.delete(np.arange(n), index_array)

You can use numpy.setdiff1d to efficiently collect the unique values from a "universal array" that aren't in your index array. Passing assume_unique=True provides a small speed-up.
When assume_unique is True, the result will be sorted so long as the input is sorted.
import numpy as np
# "Universal set" to take complement with respect to.
universe = np.arange(10)
a = np.array([2,3,5])
complement = np.setdiff1d(universe, a, assume_unique=True)
print(complement)
Results in
[0 1 4 6 7 8 9]
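A boolean mask is another way to take the complement. This is only a minimal sketch, assuming the indices all lie in range(n); the names n, mask and complement are just for illustration.

```python
import numpy as np

n = 10
index_array = np.array([2, 3, 5])

# Start with every position marked as "keep", then switch off the given indices.
mask = np.ones(n, dtype=bool)
mask[index_array] = False

# flatnonzero returns the positions where the mask is still True.
complement = np.flatnonzero(mask)
print(complement)  # [0 1 4 6 7 8 9]
```

The mask itself is often more useful than the index list, since it can be applied directly to the other array (other_array[mask]).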

Related

Is there any way to join automatically lists with same values from the same set?

I have an array of lists and I would like to join all the lists that share values.
For example, if I have the following array a:
import numpy as np
a=np.array([[1,2,3],[4,3,2],[10,8,9],[72,3,6]])
The expected result should be:
result=array([[ 1, 2, 3, 4, 6, 72],
[ 10, 8, 9]])
The thing you are talking about is a data structure known as "Disjoint Set".
While this is not in-built in python, a simple mock-up of this can be easily made for your case.
def merge_sets(arrays):  # arrays could be your np array
    sets = [set(a) for a in arrays]
    merged = True
    while merged:  # repeat until no two sets share an element
        merged = False
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if sets[i] & sets[j]:  # the two sets overlap
                    sets[i] |= sets.pop(j)  # merge j into i
                    merged = True
                    break
            if merged:
                break
    return sets
P.S. This is just a rough implementation to demonstrate how you can do it. It has not been checked for fallacies.
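For reference, the disjoint-set idea can also be sketched as a small union-find structure. This is an illustrative sketch rather than a tuned implementation; merge_overlapping, find and union are hypothetical names chosen for this example.

```python
import numpy as np

def merge_overlapping(rows):
    """Group values so that rows sharing any value end up in the same group."""
    parent = {}

    def find(x):
        # Follow parent links to the representative, shortening the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    for row in rows:
        if len(row) == 0:
            continue
        first = row[0]
        find(first)  # register single-element rows too
        for value in row[1:]:
            union(first, value)

    groups = {}
    for value in parent:
        groups.setdefault(find(value), set()).add(value)
    return [sorted(group) for group in groups.values()]

a = np.array([[1, 2, 3], [4, 3, 2], [10, 8, 9], [72, 3, 6]])
print(merge_overlapping(a.tolist()))  # [[1, 2, 3, 4, 6, 72], [8, 9, 10]]
```

The merged groups have different lengths, so the result is a list of lists and not a rectangular numpy array.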

Replace multiple elements in numpy array with 1

In a given numpy array X:
X = array([1,2,3,4,5,6,7,8,9,10])
I would like to replace indices (2, 3) and (7, 8) with a single element -1 respectively, like:
X = array([1,2,-1,5,6,7,-1,10])
In other words, I replaced values at indices (2, 3) and (7,8) of the original array with a singular value.
Question is: Is there a numpy-ish way (i.e. without for loops and usage of python lists) around it? Thanks.
Note: This is NOT equivalent of replacing a single element in-place with another. Its about replacing multiple values with a "singular" value. Thanks.
A solution using numpy.delete, similar to @pault's, but more efficient as it uses pure numpy indexing. However, because of this efficient indexing, you cannot pass jagged arrays as indices.
Setup
a = np.array([1,2,3,4,5,6,7,8,9,10])
idx = np.stack([[2, 3], [7, 8]])
a[idx] = -1
np.delete(a, idx[:, 1:])
array([ 1, 2, -1, 5, 6, 7, -1, 10])
I'm not sure if this can be done in one step, but here's a way using np.delete:
import numpy as np
from operator import itemgetter
X = np.array(range(1,11))
to_replace = [[2,3], [7,8]]
X[list(map(itemgetter(0), to_replace))] = -1
X = np.delete(X, list(map(lambda x: x[1:], to_replace)))
print(X)
#[ 1 2 -1 5 6 7 -1 10]
First we replace the first element of each pair with -1. Then we delete the remaining elements.
Try np.put:
np.put(X, [2,3,7,8], [-1,0]) # `0` can be changed to anything that's not in the array
print(X[X!=0]) # filter out whatever filler number you passed to `put`
So basically, use put to write the values at the given indices, then drop the zero values.
Or, as @khan says, you can use a filler value that's out of range:
np.put(X, [2,3,7,8], [-1,np.max(X)+1])
print(X[X!=X.max()])
All Output:
[ 1 2 -1 5 6 7 -1 10]
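If the groups have different lengths (the jagged case that pure numpy indexing cannot take), a boolean mask handles it, at the cost of building two small Python lists. A rough sketch, assuming the groups do not overlap; groups, firsts and rest are illustrative names.

```python
import numpy as np

X = np.arange(1, 11)
groups = [[2, 3], [6, 7, 8]]  # index groups of different lengths

firsts = [g[0] for g in groups]            # one slot per group receives -1
rest = [i for g in groups for i in g[1:]]  # the remaining slots are dropped

out = X.copy()
out[firsts] = -1

keep = np.ones(out.size, dtype=bool)
keep[rest] = False
result = out[keep]
print(result)  # [ 1  2 -1  5  6 -1 10]
```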

Splitting a array in python

How do you split an array in Python by number of elements? I'm doing kNN classification and I need to work with the first k elements of a 2D array.
import numpy as np
x = np.array([1, 2, 4, 4, 6, 7])
print(x[range(0, 4)])
You can also split it up by taking the range of elements that you want to work with. You could store x[range(start, stop)] in a variable and work with just those elements of the array. As the output shows, this takes the first four elements:
[1 2 4 4]
In Numpy, there is a method numpy.split.
x = np.arange(9.0)
np.split(x, 3)
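To make the two forms of np.split concrete, here is a short sketch. An integer splits into that many equal parts, while a list of indices gives the cut points. The 2D array data and the value of k are made up for illustration.

```python
import numpy as np

x = np.arange(9.0)

# An integer means "this many equal-sized pieces".
parts = np.split(x, 3)
print(parts)  # three arrays: [0. 1. 2.], [3. 4. 5.], [6. 7. 8.]

# A list of indices means "cut at these positions".
k = 4
first_k, rest = np.split(x, [k])
print(first_k)  # [0. 1. 2. 3.]

# For a 2D array, plain slicing takes the first k rows.
data = np.arange(12).reshape(6, 2)
print(data[:k].shape)  # (4, 2)
```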

Delete some elements from numpy array

One interesting question:
I would like to delete some elements from a numpy array. In the simplified example below, it works as long as I don't delete the last element, but it fails if I try to delete the last element.
Below code works fine:
import numpy as np
values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,4,1]:
    values = np.delete(values,i)
print(values)
The output is:
[0 1 2 3 4 5]
[0 2 4]
If we only change 4 to 5, then it will fail:
import numpy as np
values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,5,1]:
    values = np.delete(values,i)
print(values)
The error message:
IndexError: index 5 is out of bounds for axis 0 with size 5
Why does this error only happen when deleting the last element? What is the correct way to do such tasks?
Keep in mind that np.delete(arr, ind) deletes the element at index ind NOT the one with that value.
This means that as you delete things, the array is getting shorter. So you start with
values = np.array([0,1,2,3,4,5])
values = np.delete(values, 3)
# values is now [0,1,2,4,5]: the element at index 3 is gone, so only 5 elements remain
# the next call tries to delete index 5, but the valid indices now only go from 0 to 4
values = np.delete(values, 5)  # IndexError
One of the ways you can solve the problem is to sort the indices that you want to delete in descending order (if you really want to delete from the array).
inds_to_delete = sorted([3,1,5], reverse=True) # [5,3,1]
# then delete in order of largest to smallest index
for ind in inds_to_delete:
    values = np.delete(values, ind)
Or:
inds_to_keep = np.array([0,2,4])
values = values[inds_to_keep]
A probably faster way (because you don't need to delete every single value but all at once) is using a boolean mask:
values = np.array([0,1,2,3,4,5])
tobedeleted = np.array([False, True, False, True, False, True])
# So index 3, 5 and 1 are True so they will be deleted.
values_deleted = values[~tobedeleted]
#that just gives you what you want.
This is recommended in the NumPy reference for np.delete.
To your question: you delete one element, so the array gets shorter and index 5 is no longer in the array, because the element formerly at index 5 is now at index 4. Delete in descending order if you want to use np.delete in a loop.
If you really want to delete with np.delete use the shorthand:
np.delete(values, [3,5,1])
If you want to delete where the values are (not the index) you have to alter the procedure a bit. If you want to delete all values 5 in your array you can use:
values[values != 5]
or with multiple values to delete:
to_delete = (values == 5) | (values == 3) | (values == 1)
values[~to_delete]
All of these give you the desired result. I'm not sure what your data really looks like, so I can't say for sure which will be the most appropriate.
The problem is that you have deleted items from values, so when you try to delete the item at index 5 there is no longer a value at that index; it's now at index 4.
If you sort the list of indices to delete and iterate over them from large to small, that should work around this issue.
import numpy as np
values = np.array([0,1,2,3,4,5])
print(values)
for i in [5,3,1]: # iterate from largest to smallest index
    values = np.delete(values,i)
print(values)
If you want to remove the elements of indices 3,4,1 , just do np.delete(values,[3,4,1]).
If in the first case you want to delete the fourth item (index 3), then the fifth of the rest, and finally the second of the rest, then because of the order of the operations you actually delete the second, fourth and sixth items of the initial array. It is therefore logical that the second case fails.
You can compute the shifts (in the example, fifth becomes sixth) this way:
def multidelete(values,todelete):
    todelete=np.array(todelete)
    shift=np.triu((todelete>=todelete[:,None]),1).sum(0)
    return np.delete(values,todelete+shift)
Some tests:
In [91]: multidelete([0, 1, 2, 3, 4, 5],[3,4,1])
Out[91]: array([0, 2, 4])
In [92]: multidelete([0, 1, 2, 3, 4, 5],[1,1,1])
Out[92]: array([0, 4, 5])
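N.B. In the NumPy version used here, np.delete doesn't complain and does nothing if the bad indices are in a list: np.delete(values,[8]) returns values unchanged. Newer NumPy releases raise an IndexError in this case.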
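Another way to get the same mapping from "one at a time" indices to original indices is to replay the deletions on a plain list of positions. This is only a sketch for small index lists; sequential_to_original is a hypothetical helper name, not a numpy function.

```python
import numpy as np

def sequential_to_original(n, todelete):
    """Map indices meant for one-at-a-time deletion to indices of the original array."""
    remaining = list(range(n))  # original positions that are still present
    # Popping position i returns the original index that currently sits there.
    return [remaining.pop(i) for i in todelete]

values = np.array([0, 1, 2, 3, 4, 5])
original_inds = sequential_to_original(values.size, [3, 4, 1])
print(original_inds)                     # [3, 5, 1]
print(np.delete(values, original_inds))  # [0 2 4]
```

Because it replays the deletions in order, an index that would be out of bounds at that step raises an IndexError from list.pop, just as the original loop does.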
Passing a boolean index to np.delete is deprecated (it gives a warning in the NumPy version used here). You can use the function np.where() instead, like this:
values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,5,1]:
    values = np.delete(values,np.where(values==i))
    # values = np.delete(values,values==i) # still works with warning
print(values)
I know this question is old, but for further reference (as I found a similar source problem):
Instead of writing a for loop, you can filter the array with NumPy's isin function, like so:
>>> import numpy as np
>>> # np.isin(element, test_elements, assume_unique=False, invert=False)
>>> arr = np.array([1, 4, 7, 10, 5, 10])
>>> ~np.isin(arr, [4, 10])
array([ True, False, True, False, True, False])
>>> arr = arr[ ~np.isin(arr, [4, 10]) ]
>>> arr
array([1, 7, 5])
So for this particular case we can write:
values = np.array([0,1,2,3,4,5])
torem = [3,4,1]
values = values[ ~np.isin(values, torem) ]
which outputs: array([0, 2, 5])
Here's how you can do it without any loop or any indexing, using numpy.setdiff1d:
>>> import numpy as np
>>> array_1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> array_1
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> remove_these = np.array([1,3,5,7,9])
>>> remove_these
array([1, 3, 5, 7, 9])
>>> np.setdiff1d(array_1, remove_these)
array([ 2, 4, 6, 8, 10])
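One difference between the two value-based approaches is worth keeping in mind: np.setdiff1d returns sorted, unique values, while filtering with np.isin keeps the original order and any duplicates. A small sketch with made-up data:

```python
import numpy as np

arr = np.array([7, 1, 7, 10, 5, 4])
remove_these = [4, 10]

# setdiff1d sorts the result and drops duplicates.
by_setdiff = np.setdiff1d(arr, remove_these)
print(by_setdiff)  # [1 5 7]

# An isin mask keeps order and repeated values.
by_isin = arr[~np.isin(arr, remove_these)]
print(by_isin)  # [7 1 7 5]
```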

Acquiring the Minimum array out of Multiple Arrays by order in Python

Say that I have 4 numpy arrays
[1,2,3]
[2,3,1]
[3,2,1]
[1,3,2]
In this case, I've determined [1,2,3] is the "minimum array" for my purposes, as it is one of two arrays with the lowest value at index 0, and of those two arrays it has the lowest value at index 1. If there were more arrays with similar values, I would need to compare the next index values, and so on.
How can I extract the array [1,2,3] in that same order from the pile?
How can I extend that to x arrays of size n?
Thanks
Using the python non-numpy .sort() or sorted() on a list of lists (not numpy arrays) automatically does this e.g.
a = [[1,2,3],[2,3,1],[3,2,1],[1,3,2]]
a.sort()
gives
[[1,2,3],[1,3,2],[2,3,1],[3,2,1]]
The numpy sort seems to only sort the subarrays recursively so it seems the best way would be to convert it to a python list first. Assuming you have an array of arrays you want to pick the minimum of you could get the minimum as
sorted(a.tolist())[0]
As someone pointed out you could also do min(a.tolist()) which uses the same type of comparisons as sort, and would be faster for large arrays (linear vs n log n asymptotic run time).
Here's an idea using numpy:
import numpy
a = numpy.array([[1,2,3],[2,3,1],[3,2,1],[1,3,2]])
col = 0
while a.shape[0] > 1 and col < a.shape[1]:
    # keep only the rows that have the smallest value in the current column
    a = a[a[:, col] == a[:, col].min()]
    col += 1
print(a)
This checks column by column until only one row is left.
numpy's lexsort is close to what you want. It sorts on the last key first, but that's easy to get around:
>>> a = np.array([[1,2,3],[2,3,1],[3,2,1],[1,3,2]])
>>> order = np.lexsort(a[:, ::-1].T)
>>> order
array([0, 3, 1, 2])
>>> a[order]
array([[1, 2, 3],
[1, 3, 2],
[2, 3, 1],
[3, 2, 1]])
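If only the minimum row is needed, the first entry of the lexsort order is enough. A short sketch, cross-checked against the pure Python min on lists:

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 1], [3, 2, 1], [1, 3, 2]])

# lexsort treats the last key as the primary one, so reverse the columns
# to make column 0 the primary key, column 1 the tie-breaker, and so on.
order = np.lexsort(a[:, ::-1].T)
smallest = a[order[0]]
print(smallest)  # [1 2 3]

# Python compares lists lexicographically, so min() agrees.
print(min(a.tolist()))  # [1, 2, 3]
```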
