masked_scatter but rowwise?

Assuming a mask as follows:
mask = torch.tensor([
[True, True, False, True, False],
[True, False, True, True, True ],
])
I would like to number the True values with sequential values in each row separately. I don't care what's in the False spots, so 0 for simplicity. Thus the desired result is
tensor([[0, 1, 0, 2, 0], # 0 1 _ 2 _
[0, 0, 1, 2, 3]]) # 0 _ 1 2 3
I hoped this would work:
replacements = torch.arange(mask.size(1)).expand(mask.size())
target = torch.zeros(mask.size(), dtype=int)
target.masked_scatter(mask, replacements)
Unfortunately, masked_scatter ignores the shape of replacements, so this code results in:
tensor([[0, 1, 0, 2, 0], # 0 1 _ 2 _
[3, 0, 4, 0, 1]]) # 3 _ 4 0 1
What would I need to do instead?

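I would try something with torch.cumsum: (torch.cumsum(mask, dim=1) - 1) * mask. The cumulative sum counts the True values seen so far in each row, subtracting 1 makes the numbering start at 0, and multiplying by mask zeroes out the False positions.
The complete example: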
import torch
mask = torch.tensor([
[True, True, False, True, False],
[True, False, True, True, True ],
])
result = (torch.cumsum(mask, dim=1) - 1) * mask
print(result)
That would print:
tensor([[0, 1, 0, 2, 0],
[0, 0, 1, 2, 3]])
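The same cumulative-sum trick carries over to NumPy. The sketch below is only an illustration under that assumption (it swaps torch for NumPy and is not part of the original answer); it splits the expression into its two steps so each one can be inspected.

```python
import numpy as np

mask = np.array([
    [True, True, False, True, False],
    [True, False, True, True, True],
])

# Running count of True values along each row (1-based at every True position).
running = np.cumsum(mask, axis=1)

# Shift to 0-based numbering, then zero out the False positions.
# A row that starts with False yields -1 there, which the mask turns into 0.
result = (running - 1) * mask
print(result)
```

Because every row is processed independently along axis 1, no value leaks from one row into the next, which is the problem the masked_scatter attempt ran into.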

Related

How to efficiently filter/create mask of numpy.array based on list of tuples

I try to create mask of numpy.array based on list of tuples. Here is my solution that produces expected result:
import numpy as np
filter_vals = [(1, 1, 0), (0, 0, 1), (0, 1, 0)]
data = np.array([
[[0, 0, 0], [1, 1, 0], [1, 1, 1]],
[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
[[1, 1, 0], [0, 1, 1], [1, 0, 1]],
])
mask = np.array([], dtype=bool)
for f_val in filter_vals:
    if mask.size == 0:
        mask = (data == f_val).all(-1)
    else:
        mask = mask | (data == f_val).all(-1)
Output/mask:
array([[False, True, False],
[False, True, True],
[ True, False, False]])
The problem is that it gets slower as the data array grows and the number of tuples in filter_vals increases.
Is there any better solution? I tried to use np.isin(data, filter_vals), but it does not give the result I need.

A classical approach using broadcasting would be:
*A, B = data.shape
(data.reshape((-1,B)) == np.array(filter_vals)[:,None]).all(-1).any(0).reshape(A)
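This will, however, be memory expensive: the intermediate boolean array has shape (len(filter_vals), number of rows in the reshaped data, B). So applicability really depends on your use case.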
output:
array([[False, True, False],
[False, True, True],
[ True, False, False]])
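If the memory cost of the broadcast is a concern, one alternative is to encode each vector along the last axis as a single integer and then call np.isin on the codes. This is a rough sketch, not something from the answers above: the helper name mask_from_tuples is made up, and it assumes small non-negative integer entries so that the encoding is unique and does not overflow.

```python
import numpy as np

def mask_from_tuples(data, filter_vals):
    """Mark positions whose last-axis vector equals any tuple in filter_vals.

    Assumes small non-negative integers (hypothetical helper, for illustration).
    """
    data = np.asarray(data)
    filt = np.asarray(filter_vals)
    # Treat each vector as digits of a number in base (largest value + 1).
    base = int(max(data.max(), filt.max())) + 1
    weights = base ** np.arange(data.shape[-1])
    codes = data @ weights          # shape: data.shape[:-1]
    wanted = filt @ weights         # shape: (len(filter_vals),)
    return np.isin(codes, wanted)

filter_vals = [(1, 1, 0), (0, 0, 1), (0, 1, 0)]
data = np.array([
    [[0, 0, 0], [1, 1, 0], [1, 1, 1]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 1, 0], [0, 1, 1], [1, 0, 1]],
])
mask = mask_from_tuples(data, filter_vals)
print(mask)
```

The only intermediates here are the integer codes, one per vector, instead of one boolean per (tuple, vector, component) combination.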

Efficient selection of values in numpy

I'm trying to find elements of one DataFrame (df_other) which match a column in another DataFrame (df). In other words, I'd like to know where the values in df['a'] match the values in df_other['a'] for each row in df['a'].
An example might be easier to explain the expected result:
>>> import pandas as pd
>>> import numpy as np
>>>
>>>
>>> df = pd.DataFrame({'a': ['x', 'y', 'z']})
>>> df
a
0 x
1 y
2 z
>>> df_other = pd.DataFrame({'a': ['x', 'x', 'y', 'z', 'z2'], 'c': [1, 2, 3, 4, 5]})
>>> df_other
a c
0 x 1
1 x 2
2 y 3
3 z 4
4 z2 5
>>>
>>>
>>> u = df_other['c'].unique()
>>> u
array([1, 2, 3, 4, 5])
>>> bm = np.ones((len(df), len(u)), dtype=bool)
>>> bm
array([[ True, True, True, True, True],
[ True, True, True, True, True],
[ True, True, True, True, True]])
should yield a bitmap of
[
[1, 1, 0, 0, 0], # [1, 2] are df_other['c'] where df_other['a'] == df['a']
[0, 0, 1, 0, 0], # [3] matches
[0, 0, 0, 1, 0], # [4] matches
]
I'm looking for a fast numpy implementation that doesn't iterate through all rows (which is my current solution):
>>> df_other['a'] == df.loc[0, 'a']
0 True
1 True
2 False
3 False
4 False
Name: a, dtype: bool
>>>
>>>
>>> df_other['a'] == df.loc[1, 'a']
0 False
1 False
2 True
3 False
4 False
Name: a, dtype: bool
>>> df_other['a'] == df.loc[2, 'a']
0 False
1 False
2 False
3 True
4 False
Name: a, dtype: bool
Note: in the actual production code, there are many more column conditions ((df['a'] == df_other['a']) & (df['b'] == df_other['b']) & ...), but there are generally fewer of them than rows in df, so I wouldn't mind a solution that loops over the conditions (and subsequently sets values in bm to False).
Also, the bitmap should have the shape (len(df), len(df_other['c'].unique())).

numpy broadcasting is so useful here:
bm = df_other.values[:, 0] == df.values
Output:
>>> bm
array([[ True, True, False, False, False],
[False, False, True, False, False],
[False, False, False, True, False]])
If you need it as ints:
>>> bm.astype(int)
array([[1, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0]])
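The question also mentions several column conditions combined with &. A possible way to extend the broadcasting idea is to loop over the columns and AND the pairwise comparisons together. In this sketch the plain arrays stand in for df[col].to_numpy() and df_other[col].to_numpy(), and the column 'b' with its values is made up for illustration.

```python
import numpy as np

# Stand-ins for the DataFrame columns; 'b' is a hypothetical second condition.
left = {"a": np.array(["x", "y", "z"]), "b": np.array([1, 2, 3])}
right = {"a": np.array(["x", "x", "y", "z", "z2"]), "b": np.array([1, 9, 2, 3, 3])}

n_left = len(left["a"])
n_right = len(right["a"])

# Start with everything allowed, then knock out pairs that fail a condition.
bm = np.ones((n_left, n_right), dtype=bool)
for col in ("a", "b"):
    # (n_left, 1) compared with (1, n_right) broadcasts to (n_left, n_right).
    bm &= left[col][:, None] == right[col][None, :]

print(bm.astype(int))
```

The loop runs once per condition rather than once per row, which matches the trade-off the question says is acceptable.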

Another way to do this, using pandas methods, is as follows:
pd.crosstab(df_other['a'], df_other['c']).reindex(df['a']).to_numpy(dtype=int)
Output:
array([[1, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0]])

How to remove rows while iterating in numpy

How to remove rows while iterating in numpy, as Java does:
Iterator<Message> itMsg = messages.iterator();
while (itMsg.hasNext()) {
    Message m = itMsg.next();
    if (m != null) {
        itMsg.remove();
        continue;
    }
}
Here is my pseudocode. It should remove the rows whose entries are all 0 or all 1 while iterating.
#! /usr/bin/env python
import numpy as np
M = np.array(
    [
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],  # remove this row whose entries are all 0
        [1, 1, 1, 1]   # remove this row whose entries are all 1
    ])
it = np.nditer(M, order="K", op_flags=['readwrite'])
while not it.finished:
    row = it.next()  # how to get a row?
    sumRow = np.sum(row)
    if sumRow == 4 or sumRow == 0:  # remove rows whose entries are all 0 or all 1
        # M = np.delete(M, row, axis=0)
        it.remove_axis(i)  # how to get i?

Writing good numpy code requires you to think in a vectorized fashion. Not every problem has a good vectorization, but for those that do, you can write clean and fast code pretty easily. In this case, we can decide on what rows we want to remove/keep and then use that to index into your array:
>>> M
array([[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 0],
[1, 1, 1, 1]])
>>> M[~((M == 0).all(1) | (M == 1).all(1))]
array([[0, 1, 0, 0],
[0, 0, 1, 0]])
Step by step, we can compare M to something to make a boolean array:
>>> M == 0
array([[ True, False, True, True],
[ True, True, False, True],
[ True, True, True, True],
[False, False, False, False]], dtype=bool)
We can use all to see if a row or column is all true:
>>> (M == 0).all(1)
array([False, False, True, False], dtype=bool)
We can use | to do an or operation:
>>> (M == 0).all(1) | (M == 1).all(1)
array([False, False, True, True], dtype=bool)
We can use this to select rows:
>>> M[(M == 0).all(1) | (M == 1).all(1)]
array([[0, 0, 0, 0],
[1, 1, 1, 1]])
But since these are the rows we want to throw away, we can use ~ (NOT) to flip False and True:
>>> M[~((M == 0).all(1) | (M == 1).all(1))]
array([[0, 1, 0, 0],
[0, 0, 1, 0]])
If instead we wanted to keep columns which weren't all 1 or all 0, we simply need to change what axis we're working on:
>>> M
array([[1, 1, 0, 1],
[1, 0, 1, 1],
[1, 0, 0, 1],
[1, 1, 1, 1]])
>>> M[:, ~((M == 0).all(axis=0) | (M == 1).all(axis=0))]
array([[1, 0],
[0, 1],
[0, 0],
[1, 1]])
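The row-sum test from the question's pseudocode can also be vectorized directly. This is a small sketch that assumes M holds only 0s and 1s, since only then does a row sum of 0 or of the row length identify an all-0 or all-1 row.

```python
import numpy as np

M = np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],  # all 0: sum is 0
    [1, 1, 1, 1],  # all 1: sum equals the number of columns
])

row_sums = M.sum(axis=1)
# Keep rows whose sum is neither 0 nor the row length (valid for 0/1 data only).
keep = (row_sums != 0) & (row_sums != M.shape[1])
filtered = M[keep]
print(filtered)
```

For arrays with other values, the comparison-based mask shown above is the safer choice.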

How can you turn an index array into a mask array in Numpy?

Is it possible to convert an array of indices to an array of ones and zeros, given the range?
i.e. [2,3] -> [0, 0, 1, 1, 0], in range of 5
I'm trying to automate something like this:
>>> index_array = np.arange(200,300)
array([200, 201, ... , 299])
>>> mask_array = ??? # some function of index_array and 500
array([0, 0, 0, ..., 1, 1, 1, ... , 0, 0, 0])
>>> train(data[mask_array]) # trains with 200~299
>>> predict(data[~mask_array]) # predicts with 0~199, 300~499

Here's one way:
In [1]: index_array = np.array([3, 4, 7, 9])
In [2]: n = 15
In [3]: mask_array = np.zeros(n, dtype=int)
In [4]: mask_array[index_array] = 1
In [5]: mask_array
Out[5]: array([0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0])
If the mask is always a range, you can eliminate index_array, and assign 1 to a slice:
In [6]: mask_array = np.zeros(n, dtype=int)
In [7]: mask_array[5:10] = 1
In [8]: mask_array
Out[8]: array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
If you want an array of boolean values instead of integers, change the dtype of mask_array when it is created:
In [11]: mask_array = np.zeros(n, dtype=bool)
In [12]: mask_array
Out[12]:
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False], dtype=bool)
In [13]: mask_array[5:10] = True
In [14]: mask_array
Out[14]:
array([False, False, False, False, False, True, True, True, True,
True, False, False, False, False, False], dtype=bool)
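For the train/predict split in the question, the boolean dtype matters: data[~mask_array] only selects the complement when the mask is boolean, because ~ on an integer array is a bitwise NOT and an integer array is used as a list of positions rather than as a mask. A minimal sketch, where the helper name indices_to_mask and the data values are made up for illustration:

```python
import numpy as np

def indices_to_mask(index_array, n):
    """Return a boolean mask of length n that is True at index_array."""
    mask = np.zeros(n, dtype=bool)
    mask[index_array] = True
    return mask

data = np.arange(500) * 10  # made-up data, one value per position
mask_array = indices_to_mask(np.arange(200, 300), len(data))

train_part = data[mask_array]     # positions 200..299
predict_part = data[~mask_array]  # positions 0..199 and 300..499
print(len(train_part), len(predict_part))
```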

For a single dimension, try:
import numpy as np
n = (15,)
index_array = [2, 5, 7]
mask_array = np.zeros(n)
mask_array[index_array] = 1
For more than one dimension, convert your n-dimensional indices into one-dimensional ones, then assign through the flattened view returned by ravel:
n = (15, 15)
index_array = [[1, 4, 6], [10, 11, 2]] # you may need to transpose your indices!
mask_array = np.zeros(n)
flat_index_array = np.ravel_multi_index(index_array, mask_array.shape)
np.ravel(mask_array)[flat_index_array] = 1 # ravel returns a view of this contiguous array, so mask_array is modified
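As a cross-check of the multi-dimensional case, the sketch below compares the flattened-index route with indexing by a tuple of per-axis index lists, which sets the same cells without going through ravel. The variable names mask_direct and mask_flat are illustrative only.

```python
import numpy as np

n = (15, 15)
index_array = [[1, 4, 6], [10, 11, 2]]  # row indices, then column indices

# Route 1: index with a tuple of per-axis index lists.
mask_direct = np.zeros(n, dtype=int)
mask_direct[tuple(index_array)] = 1

# Route 2: convert to flat indices and assign through a flattened view.
mask_flat = np.zeros(n, dtype=int)
flat = np.ravel_multi_index(tuple(index_array), n)
mask_flat.ravel()[flat] = 1

print(np.array_equal(mask_direct, mask_flat))
```

Both routes mark the cells (1, 10), (4, 11) and (6, 2).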

There's a nice trick to do this as a one-liner, too - use the numpy.in1d and numpy.arange functions like this (the final line is the key part):
>>> x = np.linspace(-2, 2, 10)
>>> y = x**2 - 1
>>> idxs = np.where(y<0)
>>> np.in1d(np.arange(len(x)), idxs)
array([False, False, False, True, True, True, True, False, False, False], dtype=bool)
The downside of this approach is that it's ~10-100x slower than the approach Warren Weckesser gave... but it's a one-liner, which may or may not be what you're looking for.
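Newer NumPy releases recommend np.isin over np.in1d, so the same one-liner might be written as below. This is a sketch of that substitution, using np.flatnonzero to get a plain index array instead of the tuple that np.where returns.

```python
import numpy as np

x = np.linspace(-2, 2, 10)
y = x**2 - 1

# Plain 1-D array of the positions where the condition holds.
idxs = np.flatnonzero(y < 0)

# True wherever a position appears in idxs.
mask = np.isin(np.arange(len(x)), idxs)
print(mask)
```

In this toy case y < 0 is already the mask; the np.isin step is useful when only the indices are available.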

As requested, here it is in an answer. The code:
[x in index_array for x in range(500)]
will give you a mask like you asked for, but as a Python list of booleans instead of an array of 0s and 1s.
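The membership test x in index_array scans the whole array for every x. One possible tweak, sketched here with made-up variable names, is to build a set first so that each lookup takes constant time on average, and to wrap the result in an array if it will be used for indexing.

```python
import numpy as np

index_array = np.arange(200, 300)

# Set lookups avoid rescanning index_array for each candidate position.
index_set = set(index_array.tolist())
mask_list = [i in index_set for i in range(500)]

# Convert to a boolean array so it can index other arrays.
mask_array = np.array(mask_list)
print(mask_array.sum())
```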

Select elements of numpy array via boolean mask array

I have a boolean mask array a of length n:
a = np.array([True, True, True, False, False])
I have a 2d array with n columns:
b = np.array([[1,2,3,4,5], [1,2,3,4,5]])
I want a new array which contains only the "True"-values, for example
c = ([[1,2,3], [1,2,3]])
c = a * b does not work because it keeps the False columns and fills them with 0, which I don't want.
c = np.delete(b, a, 1) does not work either.
Any suggestions?

You probably want something like this:
>>> a = np.array([True, True, True, False, False])
>>> b = np.array([[1,2,3,4,5], [1,2,3,4,5]])
>>> b[:,a]
array([[1, 2, 3],
[1, 2, 3]])
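Note that for this kind of indexing the mask should be an ndarray, like you were using, not a list. Older NumPy versions interpret the False and True in a list as the indices 0 and 1 and give you those columns:
>>> b[:,[True, True, True, False, False]]
array([[2, 2, 2, 1, 1],
[2, 2, 2, 1, 1]])
Recent NumPy versions treat a list of booleans as a mask too, but passing an ndarray behaves the same way in both.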
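For comparison, here is a short sketch of three ways to select the True columns. The np.delete variant addresses the attempt from the question by passing the integer positions of the columns to drop; the variable names are illustrative.

```python
import numpy as np

a = np.array([True, True, True, False, False])
b = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])

# Boolean mask on the column axis.
by_index = b[:, a]

# np.compress selects slices along an axis where the condition is True.
by_compress = np.compress(a, b, axis=1)

# np.delete removes columns, so pass the positions where a is False.
by_delete = np.delete(b, np.flatnonzero(~a), axis=1)

print(by_index)
```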

You can use the numpy.ma module and its np.ma.masked_array function to do so.
>>> x = np.array([1, 2, 3, -1, 5])
>>> mx = np.ma.masked_array(x, mask=[0, 0, 0, 1, 0])
>>> mx
masked_array(data=[1, 2, 3, --, 5],
mask=[False, False, False, True, False],
fill_value=999999)

Hope I'm not too late! Here's your array:
X = np.array([[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5]])
Let's create an array of zeros of the same shape as X:
mask = np.zeros_like(X)
# array([[0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0]])
Then, specify the columns that you want to mask out or hide with a 1. In this case, we want the last 2 columns to be masked out.
mask[:, -2:] = 1
# array([[0, 0, 0, 1, 1],
# [0, 0, 0, 1, 1]])
Create a masked array:
X_masked = np.ma.masked_array(X, mask)
# masked_array(data=[[1, 2, 3, --, --],
# [1, 2, 3, --, --]],
# mask=[[False, False, False, True, True],
# [False, False, False, True, True]],
# fill_value=999999)
We can then do whatever we want with X_masked, like taking the sum of each column (along axis=0):
np.sum(X_masked, axis=0)
# masked_array(data=[2, 4, 6, --, --],
# mask=[False, False, False, True, True],
# fill_value=1e+20)
A great thing about this is that X_masked is just a view of X, not a copy:
X_masked.base is X
# True
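A few other operations on the masked array may be useful once the mask is in place. This is a brief sketch reusing the X and mask from above; which of these fits depends on whether you need reductions that skip masked entries, only the visible values, or a regular array with a placeholder.

```python
import numpy as np

X = np.array([[1, 2, 3, 4, 5],
              [1, 2, 3, 4, 5]])
mask = np.zeros_like(X)
mask[:, -2:] = 1  # hide the last two columns
X_masked = np.ma.masked_array(X, mask)

# Reductions skip masked entries: each row sums 1 + 2 + 3.
row_sums = X_masked.sum(axis=1)

# 1-D array containing only the unmasked values.
visible = X_masked.compressed()

# Regular ndarray with masked entries replaced by a chosen value.
filled = X_masked.filled(0)

print(row_sums, visible, filled, sep="\n")
```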
