Replacing values greater than a limit in a numpy array - python

I have an n x m array and a maximum value for each column. What's the best way to replace values greater than the maximum, besides checking each element?
For example:
import numpy as np

def check_limits(bad_array, maxs):
    good_array = np.copy(bad_array)
    for i_line in xrange(bad_array.shape[0]):
        for i_column in xrange(bad_array.shape[1]):
            if good_array[i_line][i_column] >= maxs[i_column]:
                good_array[i_line][i_column] = maxs[i_column] - 1
    return good_array
Is there a faster, more concise way to do this?

Use putmask:
import numpy as np

a = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])
m = np.array([7, 6, 5, 4])

# This is what you need:
np.putmask(a, a >= m, m - 1)

# a is now:
np.array([[0, 1, 2, 3],
          [4, 5, 4, 3],
          [6, 5, 4, 3]])
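Note that putmask repeats its value argument cyclically over the flattened array, so m - 1 lines up with the columns here only because its length equals the row width. A sketch of the same replacement using np.where, where broadcasting makes that column pairing explicit:
import numpy as np

a = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])
m = np.array([7, 6, 5, 4])

# m - 1 broadcasts across the rows, pairing each column with its own limit
good = np.where(a >= m, m - 1, a)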

Another way is to use the clip function, using eumiro's example:
bad_array = np.array([[ 0,  1,  2,  3],
                      [ 4,  5,  6,  7],
                      [ 8,  9, 10, 11]])
maxs = np.array([7, 6, 5, 4])

good_array = bad_array.clip(max=maxs - 1)
or, to write into an existing array:
bad_array.clip(max=maxs - 1, out=good_array)
You can also specify a lower limit by adding the min= argument.
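For example, a hypothetical call that also floors every value at zero while capping at the per-column maximum:
# clamp below at 0 (an assumed lower bound) and above at each column's limit
good_array = bad_array.clip(min=0, max=maxs - 1)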

If we assume nothing about the structure of bad_array, your code is optimal by an adversary argument: any element might exceed its column's limit, so every element has to be examined. If we knew each column were sorted in ascending order, then as soon as we reached a value at or above the limit we would know that every following element in that column is also over it (see the sketch below); with no such assumption we simply have to check every single one.
If you decide to sort each column first, that takes O(m · n log n) time for an n × m array, which is already more than the O(n · m) it takes to check each element.
You could also build good_array by checking and copying one element at a time, instead of copying all of the elements from bad_array and checking them afterwards. That should roughly halve the running time.
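A minimal sketch of the sorted-column case, assuming every column of bad_array is sorted in ascending order (the function name check_limits_sorted is made up here): searchsorted locates the first row that reaches each column's limit in O(log n), and everything from that row down is overwritten in one slice assignment.
import numpy as np

def check_limits_sorted(bad_array, maxs):
    # assumes each column is sorted in ascending order
    good_array = np.copy(bad_array)
    for i_column in range(bad_array.shape[1]):
        # index of the first row whose value is >= the limit; all later rows are too
        start = np.searchsorted(good_array[:, i_column], maxs[i_column])
        good_array[start:, i_column] = maxs[i_column] - 1
    return good_array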

If the number of columns isn't large, one optimization would be:
def check_limits(bad_array, maxs):
    good_array = np.copy(bad_array)
    for i_column in xrange(bad_array.shape[1]):
        to_replace = (good_array[:, i_column] >= maxs[i_column])
        good_array[to_replace, i_column] = maxs[i_column] - 1
    return good_array
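If the number of columns is large as well, the column loop can be dropped too; a sketch of the same replacement with a single broadcast mask:
import numpy as np

def check_limits(bad_array, maxs):
    good_array = np.copy(bad_array)
    mask = good_array >= maxs  # maxs broadcasts across the rows
    # broadcast the per-column replacement values to the full shape,
    # then write them only where the mask is set
    good_array[mask] = np.broadcast_to(maxs - 1, good_array.shape)[mask]
    return good_array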

Related

Python: Calculate total number of comparisons to find a match

I have two arrays, one with 10 elements and one with 2.
I want to check whether the large array contains the exact values of the small array as a contiguous run.
A total of 9 comparisons need to be made.
How do I calculate this value for arrays of different sizes?
I need this value to manipulate control flow.
largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [9, 10]
On the 9th Comparison it will be true.
The brute-force check will take up to len(largeArr) - len(smallArr) + 1 slice comparisons, each of size up to len(smallArr). It will take that many if no match is found; if one is found it might take half of that on average, though that depends on the statistics of the entries. So this is O(n), where n = len(largeArr).
However, if largeArr is sorted, as your example shows, it would be much more efficient to do a binary search for smallArr[0]. That would make checking O(log n).
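A minimal sketch of that binary-search check, assuming largeArr is sorted in ascending order and has no duplicate values (the helper name contains_sorted is made up here):
from bisect import bisect_left

def contains_sorted(largeArr, smallArr):
    # find where smallArr[0] would be inserted, then verify the whole slice
    i = bisect_left(largeArr, smallArr[0])
    return largeArr[i:i + len(smallArr)] == smallArr

print(contains_sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [9, 10]))  # True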
Another approach which would be much faster if you want to check many different smallArr against a given largeArr: generate a hash of each consecutive slice of length n = len(smallArr) taken from largeArr, and put those hashes in a set or dict. Then you can very quickly check if a particular smallArr is present by computing its hash and checking for membership in the pre-computed set or dict.
Here's an example of this latter approach:
largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [9, 10]
n = len(smallArr)
match = set()
for i in range(0, len(largeArr) - n + 1):
    match.add(tuple(largeArr[i:i+n]))
print(tuple(smallArr) in match)
This uses tuples, since they are hashable, unlike the lists that slicing produces. Checking is now close to O(1), or at least as fast as a set can test membership (which can grow slowly with size, depending on the implementation).
Here is another solution. The above solution is perfectly fine; this one just happens to run in constant space and linear time.
That is:
Time: O(N)
Space: O(1)
from typing import List  # for type annotations

# You can solve this in a linear fashion like this...
def mapable(universe: List[int], elements: List[int], from_indx: int) -> bool:
    # tries to address the worst case
    last_mapping_indx: int = from_indx + (len(elements) - 1)
    if last_mapping_indx >= len(universe) or not (elements[-1] == universe[last_mapping_indx]):
        return False
    # why use a loop? a loop is more dynamic, in case elements changes in size
    # tries to match a subset within the set
    for num in elements:
        if from_indx >= len(universe) or not (num == universe[from_indx]):
            return False
        from_indx += 1
    return True

# T = TypeVar('T')
# you could find a creative way to use zip(itr[T], itr[T]) here to achieve the same
def a_in_b(larger: List[int], smaller: List[int]) -> bool:
    for indx, num in enumerate(larger):
        if num == smaller[0]:
            if mapable(larger, smaller, indx):
                return True
    # return indx + len(smaller)  # use this instead if you only care about how many comparisons were made
    return False
This code checks that a list is found in a larger one-dimensional list in linear fashion. If you look at the helper method (mapable), the worst-case scenario would be the following:
larger: [8, 8, 8, 8, 8, 8, 8, 8, 8, 9]
smaller: [8, 9]
where it tries to iterate through smaller n-1 times. That would degrade our complexity from O(N) to O(N * m), where m = len(smaller); hence the if statement at the beginning of mapable.
largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [8, 9, 10]
print(a_in_b(largeArr, smallArr)) # True
As you have the numpy tag, use a numpy approach:
from numpy.lib.stride_tricks import sliding_window_view as swv

largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [9, 10]

out = (swv(largeArr, len(smallArr)) == smallArr).all(1).any()
# True
Intermediate:
swv(largeArr, len(smallArr))
array([[ 1,  2],
       [ 2,  3],
       [ 3,  4],
       [ 4,  5],
       [ 5,  6],
       [ 6,  7],
       [ 7,  8],
       [ 8,  9],
       [ 9, 10]])
repeated comparisons
If many comparisons need to be done:
from numpy.lib.stride_tricks import sliding_window_view as swv
existing = set(map(tuple, swv(largeArr, len(smallArr))))
tuple(smallArr) in existing
# True
tuple([12, 4]) in existing
# False

In Numpy, how to use an array of items as the guide to determine the index of items in a second array?

This is hard to describe with a good title. Here is what I want to do:
I have a numpy array with unique items in it:
unique_arr = np.asarray([1, 4, 12, 5])
...then I have a second array that is very long, and has many occurrences of the items in the first array:
long_arr = np.asarray([12, 4, 4, 1, 12, 5, 5, ... ])
I'd like to make a third array that is the same length as long_arr, but instead of the items long_arr has, it holds the indexes of those items in unique_arr:
long_idxs = something_magic(unique_arr, long_arr)
print(long_idxs)
>>> [2, 1, 1, 0, 2, 3, 3, ...]
Is there an efficient numpy-way of accomplishing this?
You can use searchsorted, but then you need to sort unique_arr first:
unique, idx = np.unique(unique_arr, return_index=True)
a = np.searchsorted(unique, long_arr)
long_idxs = idx[a]
Output:
array([2, 1, 1, 0, 2, 3, 3])
Note that searchsorted doesn't check for an exact match; e.g. if long_arr contained a 3, it would still return 1. You may need to validate the result, as in the sketch below.
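One possible validation sketch: clip the searchsorted result so it cannot index out of bounds, then compare the looked-up values back against long_arr; any False position held a value that is not present in unique_arr at all.
unique, idx = np.unique(unique_arr, return_index=True)
a = np.clip(np.searchsorted(unique, long_arr), 0, len(unique) - 1)
valid = unique[a] == long_arr  # False where long_arr has values absent from unique_arr
long_idxs = idx[a]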

Renumbering a 1D mesh in Python

First of all, I couldn't find the answer in other questions.
I have a numpy array of integers called ELEM; the array has three columns that indicate element number, node 1 and node 2. This is a one-dimensional mesh. What I need to do is renumber the nodes: I have the old and new node numbering tables, so the algorithm should replace every value in the ELEM array according to these tables.
The code should look like this:
old_num = np.array([2, 1, 3, 6, 5, 9, 8, 4, 7])
new_num = np.arange(1,10)
ELEM = np.array([ [1, 1, 3], [2, 3, 6], [3, 1, 3], [4, 5, 6]])
From here, every integer in the second and third columns of the ELEM array should be replaced by the corresponding integer according to the new_num table.
If you're doing a lot of these, it makes sense to encode the renumbering in a dictionary for fast lookup.
lookup_table = dict( zip( old_num, new_num ) ) # create your translation dict
vect_lookup = np.vectorize( lookup_table.get ) # create a function to do the translation
ELEM[:, 1:] = vect_lookup( ELEM[:, 1:] ) # Reassign the elements you want to change
np.vectorize is just there to make things nicer syntactically. All it does is allow us to map over the values of the array with our lookup_table.get function
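Put together with the question's data, the snippet runs like this (result shown as a comment):
import numpy as np

old_num = np.array([2, 1, 3, 6, 5, 9, 8, 4, 7])
new_num = np.arange(1, 10)
ELEM = np.array([[1, 1, 3], [2, 3, 6], [3, 1, 3], [4, 5, 6]])

lookup_table = dict(zip(old_num, new_num))
vect_lookup = np.vectorize(lookup_table.get)
ELEM[:, 1:] = vect_lookup(ELEM[:, 1:])

print(ELEM)
# [[1 2 3]
#  [2 3 4]
#  [3 2 3]
#  [4 5 4]]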
I couldn't quite work out what your problem is exactly, but I have tried to help as far as I understood it...
I think you need to replace, for example, 2 with 1, or 7 with 10, right? In such a case you can create a dictionary for the numbers that are to be replaced; the dictionary below is for that purpose. It could also be done with tuples or lists, but for this kind of lookup a dictionary is the better fit. Afterwards, just replace each element by looking it up in the dictionary.
The code below is very basic and relatively easy to understand. There are surely more Pythonic ways to do this, but if you are new to Python it should be the most approachable.
import numpy as np

# Data you provided
old_num = np.array([2, 1, 3, 6, 5, 9, 8, 4, 7])
new_num = np.arange(1, 10)
ELEM = np.array([[1, 1, 3], [2, 3, 6], [3, 1, 3], [4, 5, 6]])

# Create a dict for the elements to be replaced
# (named replace_dict so it doesn't shadow the built-in dict)
replace_dict = {}
for i_num in range(len(old_num)):
    num = old_num[i_num]
    replace_dict[num] = new_num[i_num]

# Replace the elements
for element in ELEM:
    element[1] = replace_dict[element[1]]
    element[2] = replace_dict[element[2]]

print(ELEM)

How to return all the minimum indices in numpy

I am a little bit confused reading the documentation of the argmin function in numpy.
It looks like it should do the job:
Reading this
Return the indices of the minimum values along an axis.
I might assume that
np.argmin([5, 3, 2, 1, 1, 1, 6, 1])
will return an array of all such indices: [3, 4, 5, 7]
But instead it returns only 3. Where is the catch, and what should I do to get my result?
That documentation makes more sense when you think about multidimensional arrays.
>>> x = numpy.array([[0, 1],
...                  [3, 2]])
>>> x.argmin(axis=0)
array([0, 0])
>>> x.argmin(axis=1)
array([0, 1])
With an axis specified, argmin takes one-dimensional subarrays along the given axis and returns the first index of each subarray's minimum value. It doesn't return all indices of a single minimum value.
To get all indices of the minimum value, you could do
numpy.where(x == x.min())
See the documentation for numpy.argmax (which is referred to by the docs for numpy.argmin):
In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.
The phrasing of the documentation ("indices" instead of "index") refers to the multidimensional case when axis is provided.
So, you can't do it with np.argmin. Instead, this will work:
np.where(arr == arr.min())
I would like to quickly add that, as user grofte mentioned, np.where returns a tuple; the docs state it is shorthand for nonzero, which has a corresponding function flatnonzero that returns an array directly.
So the cleanest version seems to be:
my_list = np.array([5, 3, 2, 1, 1, 1, 6, 1])
np.flatnonzero(my_list == my_list.min())
# => array([3, 4, 5, 7])
Assuming that you want the indices of a list, not a numpy array, try
import numpy as np
my_list = [5, 3, 2, 1, 1, 1, 6, 1]
np.where(np.array(my_list) == min(my_list))[0]
The [0] index is needed because numpy returns a tuple containing one array of indices per dimension; for a 1-D input that tuple has a single element.
The way recommended by the numpy documentation to get all indices of the minimum value is:
x = np.array([5, 3, 2, 1, 1, 1, 6, 1])
a, = np.nonzero(x == x.min())  # a => array([3, 4, 5, 7])
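For a multidimensional array, np.argwhere gives one row of coordinates per occurrence of the minimum, which is often handier than the tuple np.nonzero returns; a small sketch:
import numpy as np

x = np.array([[0, 1],
              [0, 2]])
print(np.argwhere(x == x.min()))
# [[0 0]
#  [1 0]]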

Shifting the elements of an array in python

I am trying to shift the elements of an array cyclically, so every element is replaced by the previous one and the last rotates to the first position, like so: shift(1, [5, 6, 7]) => [7, 5, 6].
The following code only returns [7, 5]. Could someone please tell me what is causing this to happen? I went through the code step by step and simply could not find a solution. I also tried 3 different interpreters.
def shift(key, array):
    counter = range(len(array)-1)
    new = counter
    for i in counter:
        new[i] = array[i-key]
    return new

print shift(1, [5, 6, 7])
range(5) returns [0, 1, 2, 3, 4]. It excludes 5.
Just remove the -1 from range(len(array)-1) and it should work.
You could also use list slicing:
def shift(key, array):
    return array[-key:] + array[:-key]
You need to remove the -1 from your range:
counter = range(len(array))
If you want a faster method, though, you could instead try using a deque:
from collections import deque

def shift(key, array):
    a = deque(array)  # turn list into deque
    a.rotate(key)     # rotate deque by key
    return list(a)    # turn deque back into a list

print(shift(1, [5, 6, 7]))
The answers are good, but they don't work if the key is greater than the length of the array. If you think the key may be larger than the array length, use the following:
def shift(key, array):
    return array[key % len(array):] + array[:key % len(array)]
A positive key will shift left and a negative key will shift right.
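For example, with the question's list:
print(shift(1, [5, 6, 7]))   # [6, 7, 5]  (shift left by one)
print(shift(-1, [5, 6, 7]))  # [7, 5, 6]  (shift right by one)
print(shift(4, [5, 6, 7]))   # [6, 7, 5]  (key wraps: same as key=1)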
The numpy package contains the roll function to perform exactly this task:
import numpy as np

b = [5, 6, 7]
c = np.roll(b, 1).tolist()
>>> c
[7, 5, 6]
A function using this and returning a list is:
def shift(array, key):
    return np.roll(array, key).tolist()
#!/usr/bin/env python
def ashift(key, array):
    newqueue = array[-key:]
    newqueue.extend(array[:-key])
    return newqueue

print ashift(1, [5, 6, 7])
print ashift(2, [5, 6, 7])
Results in:
$ ./shift
[7, 5, 6]
[6, 7, 5]
The only potential penalty is that if the array is sufficiently large, you may encounter memory issues, as this operation copies the list. Using a key with an absolute value greater than the length of the array will not error out, but the result wraps and may not be what you expect (see the example below).
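For instance, a key of 5 on a three-element list comes back unshifted rather than behaving like a shift of 5 mod 3:
print ashift(5, [5, 6, 7])  # [5, 6, 7], whereas ashift(2, [5, 6, 7]) gives [6, 7, 5]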
Good old-fashioned pop and append:
arr = [5, 6, 7]
for _ in range(0, 2):
    shift = arr.pop(0)
    arr.append(shift)

print(arr)
# => [7, 5, 6]
You can use numpy roll
>>> x = np.arange(10)
>>> np.roll(x, 2)
array([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
>>> np.roll(x, -2)
array([2, 3, 4, 5, 6, 7, 8, 9, 0, 1])
