Encoding for patterns with Numpy - python

I want to find up/down patterns in a time series. This is what I use for simple up/down:
diff = np.diff(source, n=1)
encoding = np.where(diff > 0, 1, 0)
Is there a way with NumPy to do this for patterns of a given lookback length without a slow loop? For example: up/up/up = 0, down/down/down = 1, up/down/up = 2, up/down/down = 3, ...
Thank you for your help.

Yesterday I learned about np.lib.stride_tricks.as_strided from a StackOverflow answer similar to this one. It is an awesome trick and not as hard to understand as I expected. Once you get it, you can define a function called rolling that lists all the windows (patterns) to check:
import numpy as np

def rolling(a, window):
    # View a 1-D array as overlapping windows without copying (read-only use only)
    shape = (a.size - window + 1, window)
    strides = (a.strides[0], a.strides[0])
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

compare_with = [True, False, True]
bool_arr = np.random.choice([True, False], size=15)
patterns = rolling(bool_arr, len(compare_with))
After that, you can calculate the indices of the pattern matches, as discussed in another answer:
idx = np.where(np.all(patterns == compare_with, axis=1))
Sample run:
bool_arr
array([ True, False, True, False, True, True, False, False, False,
False, False, False, True, True, False])
patterns
array([[ True, False, True],
[False, True, False],
[ True, False, True],
[False, True, True],
[ True, True, False],
[ True, False, False],
[False, False, False],
[False, False, False],
[False, False, False],
[False, False, False],
[False, False, True],
[False, True, True],
[ True, True, False]])
idx
(array([0, 2], dtype=int64),)
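The matching above checks one pattern at a time. For the encoding the question asks for (one integer per lookback window), a minimal sketch is to treat each up move as bit 1 and each down or flat move as bit 0, then weight every window by powers of two. It assumes NumPy 1.20 or newer for sliding_window_view, which is a safer alternative to as_strided; `encode_patterns` is a hypothetical helper name, and the numbering differs from the question's example (up/up/up becomes 7, down/down/down becomes 0).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def encode_patterns(source, lookback):
    """Encode each run of `lookback` consecutive moves as one integer.

    Up is bit 1, down or flat is bit 0, with the oldest move in the most
    significant bit, so every pattern maps to a unique code in [0, 2**lookback).
    """
    ups = (np.diff(source) > 0).astype(np.int64)   # 1 = up, 0 = down/flat
    windows = sliding_window_view(ups, lookback)    # shape (len(ups) - lookback + 1, lookback)
    weights = 2 ** np.arange(lookback - 1, -1, -1)  # e.g. [4, 2, 1] for lookback=3
    return windows @ weights                        # one integer code per window

source = np.array([1, 2, 3, 4, 3, 4, 3, 2])
print(encode_patterns(source, 3))  # [7 6 5 2 4]
```

If you need the question's exact numbering, you can index a lookup array of length 2**lookback with these codes.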

Related

numpy isin for multi-dimensions

I have a big array of integers and a second array of arrays. I want to create a boolean mask for the first array based on data from the second array of arrays. Preferably I would use numpy.isin, but its documentation clearly states:
The values against which to test each value of element. This argument is flattened if it is an array or array_like. See notes for behavior with non-array-like parameters.
Do you know a performant way of doing this instead of a list comprehension?
For example, given these arrays:
a = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
b = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
I would like to have result like:
np.array([
[True, True, False, False, False, False, False, False, False, False],
[False, False, True, True, False, False, False, False, False, False],
[False, False, False, False, True, True, False, False, False, False],
[False, False, False, False, False, False, True, True, False, False],
[False, False, False, False, False, False, False, False, True, True]
])
You can use broadcasting to avoid any loop (this is, however, more memory-expensive, because it builds a temporary boolean array of shape (5, 2, 10) for these inputs):
(a == b[...,None]).any(-2)
Output:
array([[ True, True, False, False, False, False, False, False, False, False],
[False, False, True, True, False, False, False, False, False, False],
[False, False, False, False, True, True, False, False, False, False],
[False, False, False, False, False, False, True, True, False, False],
[False, False, False, False, False, False, False, False, True, True]])
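To see why `(a == b[...,None]).any(-2)` works, here is the same expression with the intermediate shapes spelled out. This sketch uses a 1-D `a`; the 2-D `a` of shape (1, 10) from the question broadcasts to the same result. The temporary array `eq` holds `b.size * a.size` booleans, which is where the extra memory goes.

```python
import numpy as np

a = np.arange(1, 11)                                     # shape (10,)
b = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])  # shape (5, 2)

# b[..., None] has shape (5, 2, 1); comparing it with a broadcasts to (5, 2, 10),
# so eq[j, k, i] is True when a[i] == b[j, k]
eq = a == b[..., None]

# "Is a[i] equal to any value in row j of b?" -> reduce over the k axis (axis -2)
mask = eq.any(axis=-2)

print(eq.shape, mask.shape)  # (5, 2, 10) (5, 10)
```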
Try numpy.apply_along_axis to work with numpy.isin:
np.apply_along_axis(lambda x: np.isin(a, x), axis=1, arr=b)
returns
array([[[ True, True, False, False, False, False, False, False, False, False]],
[[False, False, True, True, False, False, False, False, False, False]],
[[False, False, False, False, True, True, False, False, False, False]],
[[False, False, False, False, False, False, True, True, False, False]],
[[False, False, False, False, False, False, False, False, True, True]]])
I will update with an edit comparing the runtime with a list comp
EDIT:
Whelp, I tested the runtime, and wouldn't you know, the list comprehension is faster:
timeit.timeit("[np.isin(a,x) for x in b]",number=10000, globals=globals())
0.37380070000654086
vs
timeit.timeit("np.apply_along_axis(lambda x: np.isin(a, x), axis=1, arr=b) ",number=10000, globals=globals())
0.6078917000122601
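The other answer to this post, by @mozway, avoids Python-level loops entirely:
timeit.timeit("(a == b[...,None]).any(-2)",number=100, globals=globals())
0.007107900004484691
Note that this was timed with number=100 rather than number=10000, so the totals are not directly comparable: per call it works out to about 71 µs, versus about 37 µs for the list comprehension and about 61 µs for apply_along_axis on this small example.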
This is a bit of a cheat, but it is an ultra-fast solution. The cheat is that I sort the second matrix beforehand so that I can use binary search.
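import time
import numpy as np
import numba as nb

@nb.njit(parallel=True)
def isin_multi(a, b):
    # b must be sorted along axis 1 so that np.searchsorted works on each row
    out = np.zeros((b.shape[0], a.shape[0]), dtype=np.bool_)
    for i in nb.prange(a.shape[0]):    # parallel loop over the elements of a
        for j in range(b.shape[0]):    # numba only parallelizes the outermost prange
            index = np.searchsorted(b[j], a[i])
            # a[i] is in row j iff the insertion point holds an equal value;
            # no break here, because a[i] may also occur in later rows
            if index < b.shape[1] and b[j, index] == a[i]:
                out[j, i] = True
    return out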
a = np.random.randint(200000, size=200000)
b = np.random.randint(200000, size=(50, 5000))
b = np.sort(b, axis=1)  # isin_multi needs every row of b sorted
start = time.perf_counter()
for _ in range(20):
    isin_multi(a, b)
print(f"isin_multi {time.perf_counter() - start:.3f} seconds")
start = time.perf_counter()
for _ in range(20):
    np.array([np.isin(a, ids) for ids in b])
print(f"comprehension {time.perf_counter() - start:.3f} seconds")
Results:
isin_multi 2.951 seconds
comprehension 21.093 seconds
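If numba is not available, the same sort-and-binary-search idea can be pushed into a single np.searchsorted call. The sketch below is a swapped-in pure-NumPy technique, not the answer's code: the hypothetical helper `isin_rows` assumes integer inputs small enough that `span * len(b)` does not overflow, and it shifts row j of the sorted b into its own disjoint value range so the flattened array is globally sorted. It has not been benchmarked against the timings above.

```python
import numpy as np

def isin_rows(a, b):
    """Return out[j, i] == (a[i] in b[j]) for 1-D integer a and 2-D integer b."""
    lo = min(a.min(), b.min())
    span = int(max(a.max(), b.max()) - lo) + 1        # width of the value range
    offsets = np.arange(b.shape[0])[:, None] * span    # shape (n_rows, 1)
    # Sort each row, then move row j into [j*span, (j+1)*span): the flattened
    # array is globally sorted, so one searchsorted call covers every row.
    flat = (np.sort(b - lo, axis=1) + offsets).ravel()
    queries = (a - lo)[None, :] + offsets              # shape (n_rows, len(a))
    idx = np.searchsorted(flat, queries)
    idx = np.minimum(idx, flat.size - 1)               # clamp insertion points past the end
    return flat[idx] == queries

a = np.random.randint(200, size=1000)
b = np.random.randint(200, size=(5, 50))
out = isin_rows(a, b)
```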

2d index to select elements from 1d array

I'm trying to use a 2d boolean array (ix) to pick elements from a 1d array (c) to create a 2d array (r). The resulting 2d array is also a boolean array. Each column stands for one of the unique values in c.
Example:
>>> ix
array([[ True, True, False, False, False, False, False],
[False, False, True, False, False, False, True],
[False, False, False, True, False, False, False]])
>>> c
array([1, 2, 3, 4, 8, 2, 4])
Expected result (the columns correspond to the unique values 1, 2, 3, 4, 8 of c):
r = [
[ True, True, False, False, False], # c[0] == 1 and c[1] == 2 are selected by ix[0][0] and ix[0][1]; it doesn't matter that ix[0][5] (pointing to `2` in `c`) is False, as ix[0][1] was already True, which is sufficient.
[False, False, True, True, False], # [3, 4] as ix[1][2] and ix[1][6] are True
[False, False, False, True, False] # [4] as ix[2][3] is True
]
Can this be done in a vectorised way?
Let us try:
# unique values
uniques = np.unique(c)
# boolean index into each row (one copy of c per row of ix)
vals = np.tile(c, len(ix))[ix.ravel()]
# search within the unique values
idx = np.searchsorted(uniques, vals)
# pre-populate output
out = np.full((len(ix), len(uniques)), False)
# index into the output:
out[np.repeat(np.arange(len(ix)), ix.sum(1)), idx ] = True
Output:
array([[ True, True, False, False, False],
[False, False, True, True, False],
[False, False, False, True, False]])
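A variant of the same idea that avoids np.tile and the separate searchsorted step is sketched below; it is an alternative, not the answer's code. np.unique(..., return_inverse=True) already gives the output column of every element of c, and np.nonzero(ix) gives the (row, position) of every True entry.

```python
import numpy as np

ix = np.array([[True, True, False, False, False, False, False],
               [False, False, True, False, False, False, True],
               [False, False, False, True, False, False, False]])
c = np.array([1, 2, 3, 4, 8, 2, 4])

uniques, inv = np.unique(c, return_inverse=True)  # inv[k] is the output column of c[k]
rows, cols = np.nonzero(ix)                       # row and position of every True in ix

out = np.zeros((ix.shape[0], uniques.size), dtype=bool)
out[rows, inv[cols]] = True  # repeated (row, column) pairs simply set True again
print(out)
```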

numpy: slicing the matrix based on a condition on one column - python

X = np.arange(1, 26).reshape(5, 5)
X[:,1:2] % 2 == 0
The conditions should only be applied to the second column
I want a boolean mask of the whole matrix that is True only where the condition holds, like:
[array([[False, True, False, False, False],
[ False, False, False, False, False],
[False, True, False, False, False],
[ False, False, False, False, False],
[False, True, False, False, False]])]
It's giving the error
IndexError: boolean index did not match indexed array along dimension 1; dimension is 5 but corresponding boolean dimension is 1
Is this what you want?
import numpy as np
X = np.arange(1, 26).reshape(5, 5)
X=[X[::] % 2 == 0]
print(X)
Output
[array([[False, True, False, True, False],
[ True, False, True, False, True],
[False, True, False, True, False],
[ True, False, True, False, True],
[False, True, False, True, False]])]
If you want a mask of the whole matrix showing where the condition is true, you can simply do this:
X % 2 == 0
If you want the condition evaluated on the second column only (index 1), then:
X[:, 1:2] % 2 == 0
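Neither snippet produces the exact mask in the question, where only the second column is tested and every other column is False. Assuming that is the goal, one way is to start from an all-False mask and fill in just that column. The same 1-D condition (shape (5,), not (5, 1)) can also be used as a row index to select the matching rows.

```python
import numpy as np

X = np.arange(1, 26).reshape(5, 5)

# Condition evaluated on the second column only (index 1), shape (5,)
cond = X[:, 1] % 2 == 0

# Full-size mask: False everywhere except where the second column is even
mask = np.zeros(X.shape, dtype=bool)
mask[:, 1] = cond

# A 1-D boolean index of length 5 selects whole rows of X
rows = X[cond]
print(mask)
print(rows)
```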

Generate strings from dictionary keys (combinations) and assign a boolean value based on the values

I have a structure similar to a dictionary of dictionaries:
cont = {
'perm': {'r': False, 'rw': True},
'prig': {'sq': False, 'rot': False, 'rq': True},
'anon': {'100': False, '500': True, '99': False, '400': False},
}
From this structure I need to generate strings from the keys, and know whether each string is True or False based on the values.
Example:
'perm', 'prig', 'anon' will become: 'r,sq,100' or 'rw,sq,100' or 'r,rq,100'.
I need to generate all combinations of the second-level keys (one key from each sub-dictionary).
For each string, I need to associate a Boolean value: the AND of the corresponding values.
For the example above:
False,False,False -> False; True,False,False -> False; False,True,False -> False
Here is my take on the assignment. It has a lot more code than the answer from @Ajax1234, but I hope it is easier to understand and read for some.
import itertools
data = {
'test': {'a': False, 'b': True},
'perm': {'r': False, 'rw': True},
'prig': {'sq': False, 'rq': True},
'anon': {'100': False, '500': True},
}
# define the order of the keys
key_order = ('perm', 'prig', 'anon', 'test')
# build a list of iterables for the 'product()' function
iterables = [
sorted(data[k].keys())
for k in key_order]
print(iterables)
print()
print('{:3s} {:20s} {:30s} {}'.format('i', 'elements', 'value of elements', 'AND'))
for i, elements in enumerate(itertools.product(*iterables)):
    # get the values for all the elements, in the correct order
    values = [
        data[k][sub_k]
        for k, sub_k in zip(key_order, elements)]
    print('{:3d} {:20s} {:30s} {}'.format(
        i+1,
        ','.join(elements),
        str(values),
        all(values)))
This code gives me the following output:
[['r', 'rw'], ['rq', 'sq'], ['100', '500'], ['a', 'b']]
i elements value of elements AND
1 r,rq,100,a [False, True, False, False] False
2 r,rq,100,b [False, True, False, True] False
3 r,rq,500,a [False, True, True, False] False
4 r,rq,500,b [False, True, True, True] False
5 r,sq,100,a [False, False, False, False] False
6 r,sq,100,b [False, False, False, True] False
7 r,sq,500,a [False, False, True, False] False
8 r,sq,500,b [False, False, True, True] False
9 rw,rq,100,a [True, True, False, False] False
10 rw,rq,100,b [True, True, False, True] False
11 rw,rq,500,a [True, True, True, False] False
12 rw,rq,500,b [True, True, True, True] True
13 rw,sq,100,a [True, False, False, False] False
14 rw,sq,100,b [True, False, False, True] False
15 rw,sq,500,a [True, False, True, False] False
16 rw,sq,500,b [True, False, True, True] False
You can use itertools.product:
import itertools
cont = {'perm': {'r': False, 'rw': True}, 'prig': {'sq': False, 'rot': False, 'rq': True}, 'anon': {'100': False, '500': True, '99': False, '400': False}}
keys = {a:list(b) for a, b in cont.items()}
p = list(itertools.product(*keys.values()))
result = [[cont[[a for a, b in keys.items() if c in b][0]][c] for c in i] for i in p]
new_result = [all(i) for i in result]
Output:
#result:
[[False, False, False], [False, False, True], [False, False, False], [False, False, False], [False, False, False], [False, False, True], [False, False, False], [False, False, False], [False, True, False], [False, True, True], [False, True, False], [False, True, False], [True, False, False], [True, False, True], [True, False, False], [True, False, False], [True, False, False], [True, False, True], [True, False, False], [True, False, False], [True, True, False], [True, True, True], [True, True, False], [True, True, False]]
#new_result
[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False]
#list(map(','.join, p))
['r,sq,100', 'r,sq,500', 'r,sq,99', 'r,sq,400', 'r,rot,100', 'r,rot,500', 'r,rot,99', 'r,rot,400', 'r,rq,100', 'r,rq,500', 'r,rq,99', 'r,rq,400', 'rw,sq,100', 'rw,sq,500', 'rw,sq,99', 'rw,sq,400', 'rw,rot,100', 'rw,rot,500', 'rw,rot,99', 'rw,rot,400', 'rw,rq,100', 'rw,rq,500', 'rw,rq,99', 'rw,rq,400']
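A variant that avoids the reverse lookup of which sub-dictionary a key came from is to take the product of each sub-dictionary's (key, value) pairs directly, so every combination carries its own values. This sketch assumes Python 3.7+ dictionary ordering and uses all() for the AND the question asks for; `combos` is just an illustrative name.

```python
import itertools

cont = {
    'perm': {'r': False, 'rw': True},
    'prig': {'sq': False, 'rot': False, 'rq': True},
    'anon': {'100': False, '500': True, '99': False, '400': False},
}

combos = {}
# Each element of the product is one (key, value) pair per sub-dictionary
for pairs in itertools.product(*(sub.items() for sub in cont.values())):
    name = ','.join(key for key, _ in pairs)         # e.g. 'r,sq,100'
    combos[name] = all(value for _, value in pairs)  # AND of the values

print([name for name, ok in combos.items() if ok])  # ['rw,rq,500']
```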

Generate all length-n permutations of True/False?

This problem came up while trying to write code for a truth-table generating function.
How can I generate a list of lists of all length-n permutations of True and False? In other words, given a list of elements [True, False], how can I generate all permutations of all possible length-n combinations of those elements?
For example:
For n=2, the length-2 permutations are:
[[True, True], [True, False], [False, True], [False, False]]
For n=3, the length-3 permutations are:
[[False, False, False],[False,False,True],
[False,True,False],[False,True,True],
[True,False,False],[True,False,True],[True,True,False],[True,True,True]]
I know there are 2^n lists in this list. I have also considered using itertools.product, but that only seems to give permutations of a specific combination. In this case, I think I want to generate permutations of ALL combinations of a length-n list of True/False.
Use itertools.product:
>>> import itertools
>>> l = [False, True]
>>> list(itertools.product(l, repeat=3))
[(False, False, False), (False, False, True), (False, True, False), (False, True, True), (True, False, False), (True, False, True), (True, True, False), (True, True, True)]
>>>
And if you want to change the tuples inside the list to sublists, try a list comprehension:
>>> import itertools
>>> l = [False, True]
>>> [list(i) for i in itertools.product(l, repeat=3)]
[[False, False, False], [False, False, True], [False, True, False], [False, True, True], [True, False, False], [True, False, True], [True, True, False], [True, True, True]]
>>>
It's relatively easy if you consider the values to be bits instead. Like for the n = 3 case, see it as a value containing three bits.
Loop (using integers) from 0 to 2ⁿ - 1 (inclusive) and print all bits in each value (with 0 being False and 1 being True). Then you will have all permutations.
Of course, it's not a very Pythonic solution, but it's generic.
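A minimal sketch of that bit-counting approach, reading each integer's n bits from most to least significant so the order matches the n=3 list in the question; `bool_permutations` is a hypothetical helper name.

```python
def bool_permutations(n):
    """All 2**n lists of n booleans, in counting order."""
    result = []
    for value in range(2 ** n):  # 0 .. 2**n - 1 inclusive
        # Read the n bits of value from most to least significant; bit 1 -> True
        result.append([bool((value >> shift) & 1) for shift in range(n - 1, -1, -1)])
    return result

print(bool_permutations(2))  # [[False, False], [False, True], [True, False], [True, True]]
```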
Try itertools.product with the repeat argument:
In [1]: from itertools import product
In [2]: product([True, False], repeat=2)
Out[2]: <itertools.product at 0x1c7eff51b40>
As you can see above, it returns an iterator, so wrap it in list():
In [3]: list(product([True, False], repeat=2))
Out[3]: [(True, True), (True, False), (False, True), (False, False)]
In [4]: list(product([True, False], repeat=3))
Out[4]:
[(True, True, True),
(True, True, False),
(True, False, True),
(True, False, False),
(False, True, True),
(False, True, False),
(False, False, True),
(False, False, False)]
In [5]: list(product([True, False], repeat=5))
Out[5]:
[(True, True, True, True, True),
(True, True, True, True, False),
(True, True, True, False, True),
(True, True, True, False, False),
(True, True, False, True, True),
...
It also returns a list of tuples instead of a list of lists, but that should be fine for most use cases and can be solved very easily with a list comprehension if lists are really needed:
[list(tup) for tup in mylist]
And if you want a list of lists rather than a list of tuples, start with U9-Forward's answer:
import itertools
l=[False,True]
ll=list(itertools.product(l,repeat=3))
and continue:
lll=[]
for each in ll:
    lll.append([EACH for EACH in each])
lll will be a list of lists, instead of tuples.
A much better way, following the comments:
[list(elem) for elem in ll]
Thanks to Kevin.
It is not an efficient solution, but you can use:
>>> def permuteBool(n, l):
...     if n == 0:
...         return l
...     return [permuteBool(n-1, l+[True])] + [permuteBool(n-1, l+[False])]
...
>>> permuteBool(3, [])
[[[[True, True, True], [True, True, False]], [[True, False, True], [True, False, False]]], [[[False, True, True], [False, True, False]], [[False, False, True], [False, False, False]]]]
EDIT: Looks like I didn't check the output before posting my answer (it is nested rather than flat). It will stay as it is, since the right way would just duplicate the correct answer.
Use this simple code:
>>> import itertools  # library of magic
>>> length = 3  # length of your wanted permutations
>>> result = itertools.combinations(  # combinations based on position
...     [True, False] * length,  # each value repeated `length` times
...     length  # length of the wanted results
... )
>>> # combinations() repeats patterns and keeps input order, so deduplicate and sort
>>> print([list(r) for r in sorted(set(result))])
[[False, False, False], [False, False, True], [False, True, False], [False, True, True], [True, False, False], [True, False, True], [True, True, False], [True, True, True]]
Here's a simple recursive list program:
def list_exponential(n, set1=[]):
    if n == 0:
        print(set1)
    else:
        n -= 1
        list_exponential(n, [False] + set1)
        list_exponential(n, [True] + set1)

list_exponential(5)
Sample output
$ python3 exponential.py 5
[False, False, False, False, False]
[True, False, False, False, False]
[False, True, False, False, False]
[True, True, False, False, False]
[False, False, True, False, False]
[True, False, True, False, False]
[False, True, True, False, False]
[True, True, True, False, False]
[False, False, False, True, False]
[True, False, False, True, False]
[False, True, False, True, False]
[True, True, False, True, False]
[False, False, True, True, False]
[True, False, True, True, False]
[False, True, True, True, False]
[True, True, True, True, False]
[False, False, False, False, True]
[True, False, False, False, True]
[False, True, False, False, True]
[True, True, False, False, True]
[False, False, True, False, True]
[True, False, True, False, True]
[False, True, True, False, True]
[True, True, True, False, True]
[False, False, False, True, True]
[True, False, False, True, True]
[False, True, False, True, True]
[True, True, False, True, True]
[False, False, True, True, True]
[True, False, True, True, True]
[False, True, True, True, True]
[True, True, True, True, True]
