"Unsparsing" numpy Arrays with Given Masks

"Unsparsing" numpy Arrays with Given Masks - python

Suppose there are two arrays, vals contains values, and masks contains booleans indicating whether to use the values in vals, or nans. The goal is to build an array ret of the same length as masks, containing the values from vals in corresponding locations to True entries in masks.
For example, suppose
vals = np.array([1, 2])
masks = [True, False, False, True]
Then ret, the return value, should be
array([1, None, None, 2], dtype=object)
This is obviously very easy to do using loops:
import numpy as np
def unsparse(vals, masks):
vals_i = 0
ret = []
for m in masks:
if m:
ret.append(vals[vals_i])
vals_i += 1
else:
ret.append(None)
return np.array(ret)
>> unsparse(np.array([1, 2]), [True, False, False, True])
array([1, None, None, 2], dtype=object)
Is there a way to do so without loops and more concisely?

You could do something like this -
out = np.empty(masks.shape,dtype=object)
out[masks] = vals[:masks.sum()]
Please note that :masks.sum() selects first N elements from vals, where N is the number of TRUE elements in masks.
If it's guaranteed that the number of TRUE elements is same as number of elements in vals, then you could just simply do -
out[masks] = vals
Sample run -
In [34]: vals = np.array([1, 2, 6, 8, 9])
...: masks = np.array([True, False, False, True, False, True])
...:
In [35]: out = np.empty(masks.shape,dtype=object)
...: out[masks] = vals[:masks.sum()]
...:
In [36]: out
Out[36]: array([1, None, None, 2, None, 6], dtype=object)

Related

arranging a array without loops list comp or recursion

Definition:Arranged array
an arranged array is an array of dim 2 , shape is square matrix (NXN) and for every cell in the matrix : A[I,J] > A[I,J+1] AND A[I,J] > A[I+1,J]
I have an assignment to write a func that:
gets a numpy array and returns
True if - the given array is an arranged array
False - otherwise
note: We CANNOT use loops, list comps OR recursion. the point of the task is to use numpy things.
assumptions: we can assume that the array isn't empty and has no NA's, also all of the cells are numerics
My code isn't very numpy oriented.. :
def is_square_ordered_matrix(A):
# Checking if the dimension is 2
if A.ndim != 2:
return False
# Checking if it is a squared matrix
if A.shape[0] != A.shape[1]:
return False
# Saving the original shape to reshape later
originalDim = A.shape
# Making it a dim of 1 to use it as a list
arrayAsList = list((A.reshape((1,originalDim[0]**2)))[0])
# Keeping original order before sorting
originalArray = arrayAsList[:]
# Using the values of the list as keys to see if there are doubles
valuesDictionary = dict.fromkeys(arrayAsList, 1)
# If len is different, means there are doubles and i should return False
if len(arrayAsList) != len(valuesDictionary):
return False
# If sorted list is equal to original list it means the original is already ordered and i should return True
arrayAsList.sort(reverse=True)
if originalArray == arrayAsList:
return True
else:
return False
True example:
is_square_ordered_matrix(np.arange(8,-1,-1).reshape((3,3)))
False example:
is_square_ordered_matrix(np.arange(9).reshape((3,3)))
is_square_ordered_matrix(np.arange(5,-1,-1).reshape((3,2)))

Simple comparison:
>>> def is_square_ordered_matrix(a):
... return a.shape[0] == a.shape[1] and np.all(a[:-1] > a[1:]) and np.all(a[:, :-1] > a[:, 1:])
...
>>> is_square_ordered_matrix(np.arange(8,-1,-1).reshape((3,3)))
True
>>> is_square_ordered_matrix(np.arange(9).reshape((3,3)))
False
>>> is_square_ordered_matrix(np.arange(5,-1,-1).reshape((3,2)))
False
First, compare a[:-1] with a[1:], which will compare the elements of each row with the elements of the next row, and then use np.all to judge:
>>> a = np.arange(8,-1,-1).reshape((3,3))
>>> a[:-1] # First and second lines
array([[8, 7, 6],
[5, 4, 3]])
>>> a[1:] # Second and third lines
array([[5, 4, 3],
[2, 1, 0]])
>>> a[:-1] > a[1:]
array([[ True, True, True],
[ True, True, True]])
>>> np.all(a[:-1] > a[1:])
True
Then compare a[:,:-1] with a[:, 1:], which will compare the columns:
>>> a[:, :-1] # First and second columns
array([[8, 7],
[5, 4],
[2, 1]])
>>> a[:, 1:] # Second and third columns
array([[7, 6],
[4, 3],
[1, 0]])
>>> a[:, :-1] > a[:, 1:]
array([[ True, True],
[ True, True],
[ True, True]])
The result of row comparison and column comparison is the result you want.

Numpy getting row indices of last two elements of each column in mask

I have a boolean mask shaped (M, N). Each column in the mask may have a different number of True elements, but is guaranteed to have at least two. I want to find the row index of the last two such elements as efficiently as possible.
If I only wanted one element, I could do something like (M - 1) - np.argmax(mask[::-1, :], axis=0). However, that won't help me get the second-to-last index.
I've come up with an iterative solution using np.where or np.nonzero:
M = 4
N = 3
mask = np.array([
[False, True, True],
[True, False, True],
[True, False, True],
[False, True, False]
])
result = np.zeros((2, N), dtype=np.intp)
for col in range(N):
result[:, col] = np.flatnonzero(mask[:, col])[-2:]
This creates the expected result:
array([[1, 0, 1],
[2, 3, 2]], dtype=int64)
I would like to avoid the final loop. Is there a reasonably vectorized form of the above? I am looking for specifically two rows, which are always guaranteed to exist. A general solution for arbitrary element counts is not required.

An argsort does it -
In [9]: np.argsort(mask,axis=0,kind='stable')[-2:]
Out[9]:
array([[1, 0, 1],
[2, 3, 2]])
Another with cumsum -
c = mask.cumsum(0)
out = np.where((mask & (c>=c[-1]-1)).T)[1].reshape(-1,2).T
Specifically for exactly two rows, one way with argmax -
c = mask.copy()
idx = len(c)-c[::-1].argmax(0)-1
c[idx,np.arange(len(idx))] = 0
idx2 = len(c)-c[::-1].argmax(0)-1
out = np.vstack((idx2,idx))

classify np.arrays as duplicates

My goal is to take a list of np.arrays and create an associated list or array that classifies each as having a duplicate or not. Here's what I thought would work:
www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
uniques, counts = np.unique(www, axis = 0, return_counts = True)
counts = [1 if x > 1 else 0 for x in counts]
count_dict = dict(zip(uniques, counts))
[count_dict[i] for i in www]
The desired output for this case would be :
[1, 1, 0]
because the first and second element have another copy within the original list. It seems that the problem is that I cannot use a np.array as a key for a dictionary.
Suggestions?

First convert www to a 2D Numpy array then do the following:
In [18]: (counts[np.where((www[:,None] == uniques).all(2))[1]] > 1).astype(int)
Out[18]: array([1, 1, 0])
here we use broadcasting for check the equality of all www rows with uniques array and then using all() on last axis to find out which of its rows are completely equal to uniques rows.
Here's the elaborated results:
In [20]: (www[:,None] == uniques).all(2)
Out[20]:
array([[ True, False],
[ True, False],
[False, True]])
# Respective indices in `counts` array
In [21]: np.where((www[:,None] == uniques).all(2))[1]
Out[21]: array([0, 0, 1])
In [22]: counts[np.where((www[:,None] == uniques).all(2))[1]] > 1
Out[22]: array([ True, True, False])
In [23]: (counts[np.where((www[:,None] == uniques).all(2))[1]] > 1).astype(int)
Out[23]: array([1, 1, 0])

In Python, lists (and numpy arrays) cannot be hashed, so they can't be used as dictionary keys. But tuples can! So one option would be to convert your original list to a tuple, and to convert uniques to a tuple. The following works for me:
www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
www_tuples = [tuple(l) for l in www] # list of tuples
uniques, counts = np.unique(www, axis = 0, return_counts = True)
counts = [1 if x > 1 else 0 for x in counts]
# convert uniques to tuples
uniques_tuples = [tuple(l) for l in uniques]
count_dict = dict(zip(uniques_tuples, counts))
[count_dict[i] for i in www_tuples]
Just a heads-up: this will double your memory consumption, so it may not be the best solution if www is large.
You can mitigate the extra memory consumption by ingesting your data as tuples instead of numpy arrays if possible.

Elegant way to check co-ordinates of a 2D NumPy array lie within a certain range

So let us say we have a 2D NumPy array (denoting co-ordinates) and I want to check whether all the co-ordinates lie within a certain range. What is the most Pythonic way to do this? For example:
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
#ALL THE COORDINATES WITHIN x-> 0 to 4 AND y-> 0 to 4 SHOULD
BE PUT IN b (x and y ranges might not be equal)
b = #DO SOME OPERATION
>>> b
>>> [[3,4],
[0,0]]

If the range is the same for both directions, x, and y, just compare them and use all:
import numpy as np
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
a[(a >= 0).all(axis=1) & (a <= 4).all(axis=1)]
# array([[3, 4],
# [0, 0]])
If the ranges are not the same, you can also compare to an iterable of the same size as that axis (so two here):
mins = 0, 1 # x_min, y_min
maxs = 4, 10 # x_max, y_max
a[(a >= mins).all(axis=1) & (a <= maxs).all(axis=1)]
# array([[1, 5],
# [3, 4]])
To see what is happening here, let's have a look at the intermediate steps:
The comparison gives a per-element result of the comparison, with the same shape as the original array:
a >= mins
# array([[False, True],
# [ True, True],
# [ True, True],
# [ True, True],
# [ True, True],
# [ True, False],
# [False, False]], dtype=bool)
Using nmpy.ndarray.all, you get if all values are truthy or not, similarly to the built-in function all:
(a >= mins).all()
# False
With the axis argument, you can restrict this to only compare values along one (or multiple) axis of the array:
(a >= mins).all(axis=1)
# array([False, True, True, True, True, False, False], dtype=bool)
(a >= mins).all(axis=0)
# array([False, False], dtype=bool)
Note that the output of this is the same shape as array, except that all dimnsions mentioned with axis have been contracted to a single True/False.
When indexing an array with a sequence of True, False values, it is cast to the right shape if possible. Since we index an array with shape (7, 2) with an (7,) = (7, 1) index, the values are implicitly repeated along the second dimension, so these values are used to select rows of the original array.

What is the '<' doing in this line?: data += dt < b

Input:
dt = [6,7,8,9,10]
data = [1,2,3,4,5]
b = 8.0
b = np.require(b, dtype=np.float)
data += dt < b
data
Output:
array([2, 3, 3, 4, 5])
I tried to input different number but still couldn't figure out what's the "<" doing there....
Also, it seems to work only when b is np.float (hence the conversion).

The < with numpy arrays does an element-wise comparison. That means it returns an array where there is a True where the condition is true and False if not. The np.require line is necessary here so it actually uses NumPy arrays. You could drop the np.require if you converted your data and dt to np.arrays beforehand.
Then the result is added (element-wise) to the numeric array. In this context True is equal to 1 and False to zero.
>>> dt < b # which elements are smaller than b?
array([ True, True, False, False, False])
>>> 0 + (dt < b) # boolean arrays in arithmetic operations with numbers
array([1, 1, 0, 0, 0])
So it adds 1 to every element of data where the element in dt is smaller than 8.

dt is a list:
In [50]: dt = [6,7,8,9,10]
In [51]: dt < 8
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-51-3d06f93227f5> in <module>()
----> 1 dt < 8
TypeError: '<' not supported between instances of 'list' and 'int'
< (.__lt__) is not defined for lists.
But if one element of the comparison is an ndarray, then the numpy definition of __lt__ applies. dt is turned into an array, and it does an element by element comparison.
In [52]: dt < np.array(8)
Out[52]: array([ True, True, False, False, False])
In [53]: np.array(dt) < 8
Out[53]: array([ True, True, False, False, False])
numpy array operations also explain the data += part:
In [54]: data = [1,2,3,4,5] # a list
In [55]: data + (dt < np.array(8)) # list=>array, and boolean array to integer array
Out[55]: array([2, 3, 3, 4, 5])
In [56]: data
Out[56]: [1, 2, 3, 4, 5]
In [57]: data += (dt < np.array(8))
In [58]: data
Out[58]: array([2, 3, 3, 4, 5])
Actually I'm a bit surprised that with the += data has been changed from list to array. It means the data+=... has been implemented as an assignment:
data = data + (dt <np.array(8))
Normally + for a list is a concatenate:
In [61]: data += ['a','b','c']
In [62]: data
Out[62]: [1, 2, 3, 4, 5, 'a', 'b', 'c']
# equivalent of: data.extend(['a','b','c'])
You can often get away with using lists in array contexts, but it's better to make objects arrays, so you do get these implicit, and sometimes unexpected, conversions.

This is just an alias (or shortcut or convenience notation) to the equivalent function: numpy.less()
In [116]: arr1 = np.arange(8)
In [117]: scalar = 6.0
# comparison that generates a boolean mask
In [118]: arr1 < scalar
Out[118]: array([ True, True, True, True, True, True, False, False])
# same operation as above
In [119]: np.less(arr1, scalar)
Out[119]: array([ True, True, True, True, True, True, False, False])
Let's see how this boolean array can be added to a non-boolean array in this case. It is possible due to type coercion
# sample array
In [120]: some_arr = np.array([1, 1, 1, 1, 1, 1, 1, 1])
# addition after type coercion
In [122]: some_arr + (arr1 < scalar)
Out[122]: array([2, 2, 2, 2, 2, 2, 1, 1])
# same output achieved with `numpy.less()`
In [123]: some_arr + np.less(arr1, scalar)
Out[123]: array([2, 2, 2, 2, 2, 2, 1, 1])
So, type coercion happens on the boolean array and then addition is performed.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

"Unsparsing" numpy Arrays with Given Masks - python

Related

arranging a array without loops list comp or recursion

Numpy getting row indices of last two elements of each column in mask

classify np.arrays as duplicates

Elegant way to check co-ordinates of a 2D NumPy array lie within a certain range

What is the '<' doing in this line?: data += dt < b

Categories

Resources