I have a big array of integers and a second array of arrays. I want to create a boolean mask for the first array based on data from the second array of arrays. Ideally I would use numpy.isin, but its documentation clearly states:
The values against which to test each value of element. This argument is flattened if it is an array or array_like. See notes for behavior with non-array-like parameters.
Do you know of a performant way of doing this other than a list comprehension?
For example, given these arrays:
a = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
b = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
I would like to get a result like:
np.array([
[True, True, False, False, False, False, False, False, False, False],
[False, False, True, True, False, False, False, False, False, False],
[False, False, False, False, True, True, False, False, False, False],
[False, False, False, False, False, False, True, True, False, False],
[False, False, False, False, False, False, False, False, True, True]
])
You can use broadcasting to avoid any loop (this is however more memory expensive):
(a == b[...,None]).any(-2)
Output:
array([[ True, True, False, False, False, False, False, False, False, False],
[False, False, True, True, False, False, False, False, False, False],
[False, False, False, False, True, True, False, False, False, False],
[False, False, False, False, False, False, True, True, False, False],
[False, False, False, False, False, False, False, False, True, True]])
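To see why the broadcasting one-liner works, it can help to look at the intermediate shapes. The following is a small sketch using the arrays from the question; the names `expanded`, `compare`, `mask` and `reference` are introduced here only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])          # shape (1, 10)
b = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])   # shape (5, 2)

expanded = b[..., None]     # shape (5, 2, 1)
compare = a == expanded     # broadcast to shape (5, 2, 10)
mask = compare.any(-2)      # collapse the "values per row of b" axis -> (5, 10)

# Reference result from the list comprehension the question wanted to avoid
reference = np.array([np.isin(a[0], row) for row in b])

print(expanded.shape, compare.shape, mask.shape)
print(np.array_equal(mask, reference))
```

The intermediate `compare` array holds `b.size * a.size` booleans, which is where the extra memory cost comes from.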
Try numpy.apply_along_axis to work with numpy.isin:
np.apply_along_axis(lambda x: np.isin(a, x), axis=1, arr=b)
returns
array([[[ True, True, False, False, False, False, False, False, False, False]],
[[False, False, True, True, False, False, False, False, False, False]],
[[False, False, False, False, True, True, False, False, False, False]],
[[False, False, False, False, False, False, True, True, False, False]],
[[False, False, False, False, False, False, False, False, True, True]]])
I will update with an edit comparing the runtime with a list comp
EDIT:
Whelp, I tested the runtime, and wouldn't you know, listcomp is faster
timeit.timeit("[np.isin(a,x) for x in b]",number=10000, globals=globals())
0.37380070000654086
vs
timeit.timeit("np.apply_along_axis(lambda x: np.isin(a, x), axis=1, arr=b) ",number=10000, globals=globals())
0.6078917000122601
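The other answer to this post, by @mozway, appears much faster, although this run uses number=100 rather than number=10000, so the raw figure is not directly comparable with the two above: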
timeit.timeit("(a == b[...,None]).any(-2)",number=100, globals=globals())
0.007107900004484691
and should probably be accepted.
This is a bit of a cheat, but it is an ultra-fast solution. The cheat is that I sort the second matrix beforehand so that I can use binary search.
import time
import numba as nb
import numpy as np

@nb.njit(parallel=True)
def isin_multi(a, b):
    # b must be sorted along axis 1 so that binary search is valid
    out = np.zeros((b.shape[0], a.shape[0]), dtype=nb.boolean)
    for i in nb.prange(a.shape[0]):
        for j in range(b.shape[0]):
            index = np.searchsorted(b[j], a[i])
            # a[i] is in row j only if the insertion point holds an equal value;
            # no break here, because every row of b has to be checked
            if index < len(b[j]) and b[j][index] == a[i]:
                out[j][i] = True
    return out

a = np.random.randint(200000, size=200000)
b = np.random.randint(200000, size=(50, 5000))
b = np.sort(b, axis=1)  # the "cheat": sort each row up front

start = time.perf_counter()
for _ in range(20):
    isin_multi(a, b)
print(f"isin_multi {time.perf_counter() - start:.3f} seconds")

start = time.perf_counter()
for _ in range(20):
    np.array([np.isin(a, ids) for ids in b])
print(f"comprehension {time.perf_counter() - start:.3f} seconds")
Results:
isin_multi 2.951 seconds.
comprehension 21.093 seconds
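For readers without Numba, the same sorted-rows idea can be sketched in plain NumPy. This is only an illustration under the assumption that every row of `b` is sorted and non-empty; `isin_sorted_rows` is a hypothetical helper name, not part of the answer above, and no timing claim is made for it.

```python
import numpy as np

def isin_sorted_rows(a, b_sorted):
    """For each (ascending, non-empty) row of b_sorted, test membership of every a[i]."""
    out = np.zeros((b_sorted.shape[0], a.shape[0]), dtype=bool)
    for j, row in enumerate(b_sorted):
        idx = np.searchsorted(row, a)                 # insertion points, may equal len(row)
        idx_clipped = np.minimum(idx, len(row) - 1)   # keep indices valid
        out[j] = row[idx_clipped] == a                # hit only if that position holds a[i]
    return out

rng = np.random.default_rng(0)
a = rng.integers(0, 200, size=1000)
b = np.sort(rng.integers(0, 200, size=(5, 50)), axis=1)

expected = np.array([np.isin(a, row) for row in b])
print(np.array_equal(isin_sorted_rows(a, b), expected))
```

Comparing against the list comprehension like this is also a cheap way to validate any faster variant before benchmarking it.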
I'm trying to use a 2d boolean array (ix) to pick elements from a 1d array (c) to create a 2d array (r). The resulting 2d array is also a boolean array. Each column stands for the unique value in c.
Example:
>>> ix
array([[ True, True, False, False, False, False, False],
[False, False, True, False, False, False, True],
[False, False, False, True, False, False, False]])
>>> c
array([1, 2, 3, 4, 8, 2, 4])
Expected result
1, 2, 3, 4, 8
r = [
[ True, True, False, False, False], # c[ix[0][0]] == 1 and c[ix[0][1]] == 2; it doesn't matter that ix[0][5] (pointing to `2` in `c`) is False as ix[0][1] was already True which is sufficient.
[False, False, True, True, False], # [3]
[False, False, False, True, False] # [4] as ix[2][3] is True
]
Can this be done in a vectorised way?
Let us try:
# unique values
uniques = np.unique(c)
# boolean index into each row
vals = np.tile(c, len(ix))[ix.ravel()]
# search within the unique values
idx = np.searchsorted(uniques, vals)
# pre-populate output
out = np.full((len(ix), len(uniques)), False)
# index into the output:
out[np.repeat(np.arange(len(ix)), ix.sum(1)), idx ] = True
Output:
array([[ True, True, False, False, False],
[False, False, True, True, False],
[False, False, False, True, False]])
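As a cross-check, the same result can be obtained with broadcasting instead of `searchsorted`. This is a sketch using the example data from the question; `onehot` is a name introduced here, and the intermediate array has shape `(rows of ix, len(c), number of uniques)`, so it trades memory for brevity.

```python
import numpy as np

ix = np.array([[True, True, False, False, False, False, False],
               [False, False, True, False, False, False, True],
               [False, False, False, True, False, False, False]])
c = np.array([1, 2, 3, 4, 8, 2, 4])

uniques = np.unique(c)                       # sorted unique values of c
onehot = c[:, None] == uniques               # shape (7, 5): which unique value each c[k] is
r = (ix[:, :, None] & onehot).any(axis=1)    # shape (3, 5)
print(r)
```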
X = np.arange(1, 26).reshape(5, 5)
X[:,1:2] % 2 == 0
The condition should only be applied to the second column.
I want a mask for the whole matrix that is True where the condition holds, like:
[array([[False, True, False, False, False],
[ False, False, False, False, False],
[False, True, False, False, False],
[ False, False, False, False, False],
[False, True, False, False, False]])]
It's giving the error
IndexError: boolean index did not match indexed array along dimension 1; dimension is 5 but corresponding boolean dimension is 1
Is this what you want?
import numpy as np
X = np.arange(1, 26).reshape(5, 5)
X=[X[::] % 2 == 0]
print(X)
Output
[array([[False, True, False, True, False],
[ True, False, True, False, True],
[False, True, False, True, False],
[ True, False, True, False, True],
[False, True, False, True, False]])]
If you want to get the whole matrix where the condition is true, you can simply do this:
X % 2 == 0
If you want to test only the second column (index 1), then:
X[:, 1:2] % 2 == 0
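If the goal is a mask with the same shape as X that is True only where the second column is even (which is what the expected output in the question shows), one possible sketch is to start from an all-False mask and fill in a single column. The name `mask` is introduced here for illustration.

```python
import numpy as np

X = np.arange(1, 26).reshape(5, 5)

mask = np.zeros(X.shape, dtype=bool)   # start with all False
mask[:, 1] = X[:, 1] % 2 == 0          # test only the second column (index 1)

print(mask)
print(X[mask])                         # the even values of the second column
```

Because `mask` has the same shape as X, indexing with it no longer raises the IndexError from the question.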
I want to find up/down patterns in a time series. This is what I use for simple up/down:
diff = np.diff(source, n=1)
encoding = np.where(diff > 0, 1, 0)
Is there a way with NumPy to do that for patterns with a given lookback length without a slow loop? For example: up/up/up = 0, down/down/down = 1, up/down/up = 2, up/down/down = 3, ...
Thank you for your help.
I learned yesterday about np.lib.stride_tricks.as_strided from a Stack Overflow answer to a similar question. It is an awesome trick and not as hard to understand as I expected. Once you get it, let's define a function called rolling that lists all the windows to check against the pattern:
def rolling(a, window):
    shape = (a.size - window + 1, window)
    strides = (a.itemsize, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
compare_with = [True, False, True]
bool_arr = np.random.choice([True, False], size=15)
patterns = rolling(bool_arr, len(compare_with))
And after that you can calculate the indexes of pattern matches, as discussed here:
idx = np.where(np.all(patterns == compare_with, axis=1))
Sample run:
bool_arr
array([ True, False, True, False, True, True, False, False, False,
False, False, False, True, True, False])
patterns
array([[ True, False, True],
[False, True, False],
[ True, False, True],
[False, True, True],
[ True, True, False],
[ True, False, False],
[False, False, False],
[False, False, False],
[False, False, False],
[False, False, False],
[False, False, True],
[False, True, True],
[ True, True, False]])
idx
(array([0, 2], dtype=int64),)
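To get one integer code per window, as the question asks, each window of up/down moves can be read as a binary number. The sketch below uses `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later) instead of `as_strided`. The sample `source` values and the binary numbering (up = 1, first move is the most significant bit) are assumptions made for illustration; the question's own numbering is different and would need a lookup table on top of these codes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

source = np.array([10, 20, 15, 17, 16, 14, 19, 23])   # hypothetical sample series
lookback = 3

ups = np.diff(source) > 0                          # True = up, False = down
windows = sliding_window_view(ups, lookback)       # shape (len(ups) - lookback + 1, lookback)
weights = 1 << np.arange(lookback - 1, -1, -1)     # [4, 2, 1] for lookback = 3
codes = windows.astype(np.int64) @ weights         # one integer in [0, 2**lookback) per window
print(codes)
```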
I have a structure similar, to a dictionary of dictionaries:
cont = {
'perm': {'r': False, 'rw': True},
'prig': {'sq': False, 'rot': False, 'rq': True},
'anon': {'100': False, '500': True, '99': False, '400': False}
}
From this structure I need to generate a string from keys, and know if the string is False or True based on the values:
Example:
'perm', 'prig', 'anon' will become: 'r,sq,100' or 'rw,sq,100' or 'r,rq,100'.
I need to generate all permutations of the second level keys.
For each string, I need to associate a Boolean value (True or False) obtained by combining the corresponding values with 'AND'.
For the example above:
False,False,False -> False; True,False,False -> True,False,True, False -> False;
Here is my take on the assignment. It has a lot more code than the answer from @Ajax1234, but I hope it is easier to understand and read for some.
import itertools
data = {
'test': {'a': False, 'b': True},
'perm': {'r': False, 'rw': True},
'prig': {'sq': False, 'rq': True},
'anon': {'100': False, '500': True},
}
# define the order of the keys
key_order = ('perm', 'prig', 'anon', 'test')
# build a list of iterables for the 'product()' function
iterables = [
    sorted(data[k].keys())
    for k in key_order]
print(iterables)
print()
print('{:3s} {:20s} {:30s} {}'.format('i', 'elements', 'value of elements', 'AND'))
for i, elements in enumerate(itertools.product(*iterables)):
    # get the values for all the elements, in the correct order
    values = [
        data[k][sub_k]
        for k, sub_k in zip(key_order, elements)]
    print('{:3d} {:20s} {:30s} {}'.format(
        i+1,
        ','.join(elements),
        str(values),
        all(values)))
This code gives me the following output:
[['r', 'rw'], ['rq', 'sq'], ['100', '500'], ['a', 'b']]
i   elements             value of elements              AND
  1 r,rq,100,a           [False, True, False, False]    False
  2 r,rq,100,b           [False, True, False, True]     False
  3 r,rq,500,a           [False, True, True, False]     False
  4 r,rq,500,b           [False, True, True, True]      False
  5 r,sq,100,a           [False, False, False, False]   False
  6 r,sq,100,b           [False, False, False, True]    False
  7 r,sq,500,a           [False, False, True, False]    False
  8 r,sq,500,b           [False, False, True, True]     False
  9 rw,rq,100,a          [True, True, False, False]     False
 10 rw,rq,100,b          [True, True, False, True]      False
 11 rw,rq,500,a          [True, True, True, False]      False
 12 rw,rq,500,b          [True, True, True, True]       True
 13 rw,sq,100,a          [True, False, False, False]    False
 14 rw,sq,100,b          [True, False, False, True]     False
 15 rw,sq,500,a          [True, False, True, False]     False
 16 rw,sq,500,b          [True, False, True, True]      False
You can use itertools.product:
import itertools
cont = {'perm': {'r': False, 'rw': True}, 'prig': {'sq': False, 'rot': False, 'rq': True}, 'anon': {'100': False, '500': True, '99': False, '400': False}}
keys = {a:list(b) for a, b in cont.items()}
p = list(itertools.product(*keys.values()))
result = [[cont[[a for a, b in keys.items() if c in b][0]][c] for c in i] for i in p]
new_result = [any(i) for i in result]
Output:
#result:
[[False, False, False], [False, False, True], [False, False, False], [False, False, False], [False, False, False], [False, False, True], [False, False, False], [False, False, False], [False, True, False], [False, True, True], [False, True, False], [False, True, False], [True, False, False], [True, False, True], [True, False, False], [True, False, False], [True, False, False], [True, False, True], [True, False, False], [True, False, False], [True, True, False], [True, True, True], [True, True, False], [True, True, False]]
#new_result
[False, True, False, False, False, True, False, False, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]
#list(map(','.join, p))
['r,sq,100', 'r,sq,500', 'r,sq,99', 'r,sq,400', 'r,rot,100', 'r,rot,500', 'r,rot,99', 'r,rot,400', 'r,rq,100', 'r,rq,500', 'r,rq,99', 'r,rq,400', 'rw,sq,100', 'rw,sq,500', 'rw,sq,99', 'rw,sq,400', 'rw,rot,100', 'rw,rot,500', 'rw,rot,99', 'rw,rot,400', 'rw,rq,100', 'rw,rq,500', 'rw,rq,99', 'rw,rq,400']
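Since the question asks for the values to be combined with 'AND', a compact variant that maps each joined string directly to `all(...)` of its values might look like the sketch below. The names `pairs_per_group` and `combined` are introduced here for illustration, and the sketch relies on dictionaries preserving insertion order (Python 3.7+).

```python
import itertools

cont = {
    'perm': {'r': False, 'rw': True},
    'prig': {'sq': False, 'rot': False, 'rq': True},
    'anon': {'100': False, '500': True, '99': False, '400': False},
}

# One list of (key, value) pairs per top-level entry; product() picks one pair from each
pairs_per_group = [list(inner.items()) for inner in cont.values()]

combined = {
    ','.join(key for key, _ in combo): all(value for _, value in combo)
    for combo in itertools.product(*pairs_per_group)
}

print(len(combined))           # 2 * 3 * 4 combinations
print(combined['rw,rq,500'])   # every value in this combination is True
print(combined['r,sq,100'])
```

Swapping `all` for `any` gives the 'OR' behaviour shown in `new_result` above.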
I have a list of True and False answers like this:
[True, True, True, False, False, True, False, False]
[True, True, False, False, True, False, False, True]
[True, False, False, True, False, False, True, True]
[False, False, True, False, False, True, True, True]
[False, True, False, False, True, True, True, False]
[True, False, False, True, True, True, False, False]
[False, False, True, True, True, False, False, True]
[False, True, True, True, False, False, True, False]
I want to give True a value of 1 and False a value of 0 and then convert that overall value to hexadecimal.
How would I go about doing that? Could I look at each value in the list in turn and, if it equals 'True', change that value to a 1 and, if it's 'False', change the value to a 0? Or would there be an easier way to change the entire list straight to hex?
EDIT: Here's the full code on Pastebin: http://pastebin.com/1839NKCx
Thanks
lists = [
[True, True, True, False, False, True, False, False],
[True, True, False, False, True, False, False, True],
[True, False, False, True, False, False, True, True],
[False, False, True, False, False, True, True, True],
[False, True, False, False, True, True, True, False],
[True, False, False, True, True, True, False, False],
[False, False, True, True, True, False, False, True],
[False, True, True, True, False, False, True, False],
]
for l in lists:
    zero_one = map(int, l)  # convert True to 1, False to 0 using `int`
    n = int(''.join(map(str, zero_one)), 2)  # numbers to strings, join them,
                                             # convert to number (base 2)
    print('{:02x}'.format(n))  # format them as hex string using `str.format`
output:
e4
c9
93
27
4e
9c
39
72
If you want to combine a series of boolean values into one value (as a bitfield), you could do something like this (note that the first element ends up as the least significant bit):
x = [True, False, True, False, True, False]
v = sum(a << i for i, a in enumerate(x))
print(hex(v))
No need for a two-step process if you use reduce (assuming the MSB is at the left, as usual):
from functools import reduce  # needed on Python 3
b = [True, True, True, False, False, True, False, False]
val = reduce(lambda byte, bit: byte*2 + bit, b, 0)
print(val)
print(hex(val))
Displaying:
228
0xe4
This should do it:
def bool_list_to_hex(bits):
    # `bits` and `bit` avoid shadowing the built-in names `list` and `bool`
    n = 0
    for bit in bits:
        n *= 2
        n += int(bit)
    return hex(n)
One-liner:
>>> lists = [
[True, True, True, False, False, True, False, False],
[True, True, False, False, True, False, False, True],
[True, False, False, True, False, False, True, True],
[False, False, True, False, False, True, True, True],
[False, True, False, False, True, True, True, False],
[True, False, False, True, True, True, False, False],
[False, False, True, True, True, False, False, True],
[False, True, True, True, False, False, True, False]]
>>> ''.join(hex(int(''.join('1' if boolValue else '0' for boolValue in byteOfBools),2))[2:] for byteOfBools in lists)
'e4c993274e9c3972'
Inner join produces a string of eight zeros and ones.
int(foo,2) turns the string into a number interpreting it as binary.
hex turns it to hex format.
[2:] removes the leading '0x' from the standard hex format
outer join does this to all sublists and, well, joins the results.
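One thing to watch with `hex(...)[2:]` is that leading zeros are dropped, so a byte whose upper four bits are all False yields a single digit. A minimal sketch that pads to a fixed width is shown below; `bools_to_hex` is a hypothetical helper name, and the MSB-first bit order is an assumption that matches the answers above.

```python
def bools_to_hex(bits):
    """Pack a list of booleans (MSB first) into a zero-padded hex string."""
    n = 0
    for bit in bits:
        n = (n << 1) | int(bit)
    width = (len(bits) + 3) // 4          # one hex digit per 4 bits, rounded up
    return format(n, '0{}x'.format(width))

print(bools_to_hex([True, True, True, False, False, True, False, False]))
print(bools_to_hex([False, False, False, False, True, False, True, False]))
```

With padding, every 8-bit row contributes exactly two characters to the joined string.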
All above methods do not work if list of bits exceeds 64.
It could also be discussed whether it is efficient to convert the booleans several times, especially to strings, before the conversion to hexadecimal.
Here is a proposal, with the MSB on the left of bitlist:
from collections import deque
# (lazy) Pad False on the MSB side so that the bitlist length is a multiple of 4.
# Padded length can be zero
msb_padlen = (-len(bitlist))%4
bitlist = deque(bitlist)
bitlist.extendleft([False]*msb_padlen)
# (lazy) Re-pack list of bits into list of 4-bit tuples
pack4s = zip(* [iter(bitlist)]*4)
# Convert each 4-tuple into a hex digit
hexstring = [hex(sum(a<<i for i,a in enumerate(reversed(pack4))))[-1] for pack4 in pack4s ]
# Compact list of hex digits into a string
hexstring = '0x'+''.join(hexstring)
The 4-bit tuple pack4 is (msb,...,lsb) => it has to be reversed while calculating corresponding integer.
Alternative :
hexstring = [hex(sum(a<<3-i for i,a in enumerate(pack4)))[-1] for pack4 in pack4s ]
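The proposal can be wrapped in a function and cross-checked against a plain `int(..., 2)` conversion. This is only a sketch: `bitlist_to_hex` and the 70-bit test input are made up for illustration, and the reference conversion relies on Python integers having arbitrary precision.

```python
from collections import deque

def bitlist_to_hex(bitlist):
    """Nibble-wise conversion, MSB on the left, following the proposal above."""
    msb_padlen = (-len(bitlist)) % 4
    padded = deque(bitlist)
    padded.extendleft([False] * msb_padlen)
    pack4s = zip(*[iter(padded)] * 4)
    digits = [format(sum(bit << (3 - i) for i, bit in enumerate(pack4)), 'x')
              for pack4 in pack4s]
    return '0x' + ''.join(digits)

bits = [True, False, True] * 23 + [True]      # 70 bits, hypothetical test input
reference = hex(int(''.join('1' if bit else '0' for bit in bits), 2))
print(bitlist_to_hex(bits) == reference)
```

The two results only match when the leading hex digit is non-zero, because `hex()` drops leading zeros while the nibble-wise version keeps them.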