I have a large array that looks something like this:
import numpy as np

np.random.seed(42)
arr = np.random.permutation(np.array([
    (1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4),
    (8, 9, 3, 4, 7, 9, 1, 9, 3, 4, 50000)
]).T)
The array isn't sorted, its rows are unique, and I also know the bounds for the values in both columns: [0, n] for the first and [0, k] for the second. So the maximum possible number of rows is (n+1)*(k+1), but the actual number is closer to the log of that.
I need to search the array by both columns to find the row such that arr[row,:] == (i,j), and to return -1 when (i,j) is absent from the array. The naive implementation of such a function is:
def get(arr, i, j):
    cond = (arr[:,0] == i) & (arr[:,1] == j)
    if np.any(cond):
        return np.where(cond)[0][0]
    else:
        return -1
Unfortunately, arr is very large in my case (>90M rows), so this is very inefficient, especially since I need to call get() many times.
Alternatively I tried translating this to a dict with (i,j) keys, such that
index[(i,j)] = row
that can be accessed by:
def get(index, i, j):
    try:
        return index[(i,j)]
    except KeyError:
        return -1
This works (and is much faster when tested on smaller data than I have), but again, creating the dict on-the-fly by
index = {}
for row in range(arr.shape[0]):
    i, j = arr[row, :]
    index[(i,j)] = row
takes a huge amount of time and eats lots of RAM in my case. I was also thinking of first sorting arr and then using something like np.searchsorted, but this didn't lead me anywhere.
So what I need is a fast function get(arr, i, j) that returns
>>> get(arr, 2, 3)
4
>>> get(arr, 4, 100)
-1
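For what it's worth, the np.searchsorted idea mentioned above can be made to work by encoding both columns into a single sortable key. The following is only a sketch, assuming non-negative integer columns and the known bound k on the second column; build_sorted_index and get_sorted are hypothetical names:
import numpy as np

# Encode (i, j) as one integer key i*(k+1) + j (valid because 0 <= j <= k),
# sort the keys once, then answer each query with a single binary search.
def build_sorted_index(arr, k):
    keys = arr[:, 0].astype(np.int64) * (k + 1) + arr[:, 1]
    order = np.argsort(keys)
    return keys[order], order

def get_sorted(sorted_keys, order, i, j, k):
    key = i * (k + 1) + j
    pos = np.searchsorted(sorted_keys, key)
    if pos < sorted_keys.size and sorted_keys[pos] == key:
        return order[pos]  # row index in the original, unsorted arr
    return -1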
A partial solution would be:
In [36]: arr
Out[36]:
array([[    2,     9],
       [    1,     8],
       [    4,     4],
       [    4, 50000],
       [    2,     3],
       [    1,     9],
       [    4,     3],
       [    2,     7],
       [    3,     9],
       [    2,     4],
       [    3,     1]])
In [37]: (i,j) = (2, 3)
# we can use `assume_unique=True` which can speed up the calculation
In [38]: np.all(np.isin(arr, [i,j], assume_unique=True), axis=1, keepdims=True)
Out[38]:
array([[False],
       [False],
       [False],
       [False],
       [ True],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False]])
# we can use `assume_unique=True` which can speed up the calculation
In [39]: mask = np.all(np.isin(arr, [i,j], assume_unique=True), axis=1, keepdims=True)
In [40]: np.argwhere(mask)
Out[40]: array([[4, 0]])
If you need the final result as a scalar, then don't use the keepdims argument and convert the array to a scalar (note: np.asscalar is deprecated in newer NumPy releases; .item() is the equivalent):
# we can use `assume_unique=True` which can speed up the calculation
In [41]: mask = np.all(np.isin(arr, [i,j], assume_unique=True), axis=1)
In [42]: np.argwhere(mask)
Out[42]: array([[4]])
In [43]: np.asscalar(np.argwhere(mask))
Out[43]: 4
Solution
Python offers a set type to store unique values, but sadly no ordered version of it. You can, however, use the ordered-set package.
Create an OrderedSet from the data. Fortunately, this only needs to be done once:
import ordered_set

o = ordered_set.OrderedSet(map(tuple, arr))

def ordered_get(o, i, j):
    try:
        return o.index((i,j))
    except KeyError:
        return -1
Runtime
Finding the index of a value should be O(1), according to the documentation:
In [46]: %timeit get(arr, 2, 3)
10.6 µs ± 39 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [47]: %timeit ordered_get(o, 2, 3)
1.16 µs ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [48]: %timeit ordered_get(o, 2, 300)
1.05 µs ± 2.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Testing this for a much larger array:
a2 = np.random.randint(10000, size=1000000).reshape(-1, 2)
o2 = ordered_set.OrderedSet()
for t in map(tuple, a2):
    o2.add(t)
In [65]: %timeit get(a2, 2, 3)
1.05 ms ± 2.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [66]: %timeit ordered_get(o2, 2, 3)
1.03 µs ± 2.12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [67]: %timeit ordered_get(o2, 2, 30000)
1.06 µs ± 28.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Looks like it indeed is O(1) runtime.
def get_agn(arr, i, j):
    idx = np.flatnonzero((arr[:,0] == i) & (arr[:,1] == j))
    return -1 if idx.size == 0 else idx[0]
Also, just in case you are thinking about the ordered_set solution, here is a better one (however, in both cases see timing tests below):
d = {(i, j): k for k, (i, j) in enumerate(arr)}

def unordered_get(d, i, j):
    return d.get((i, j), -1)
and its "full" equivalent (which builds the dictionary inside the function):
def unordered_get_full(arr, i, j):
    d = {(i, j): k for k, (i, j) in enumerate(arr)}
    return d.get((i, j), -1)
Timing tests:
First, define @kmario23's function:
def get_kmario23(arr, i, j):
    # fundamentally, kmario23's code re-arranged to return scalars
    # and -1 when (i, j) is not found:
    mask = np.all(np.isin(arr, [i,j], assume_unique=True), axis=1)
    idx = np.argwhere(mask)
    return -1 if idx.size == 0 else np.asscalar(idx[0])
Second, define @ChristophTerasa's functions (the original and the full version):
import ordered_set

o = ordered_set.OrderedSet(map(tuple, arr))

def ordered_get(o, i, j):
    try:
        return o.index((i,j))
    except KeyError:
        return -1

def ordered_get_full(arr, i, j):
    # "full" version that builds the ordered set inside the function
    o = ordered_set.OrderedSet(map(tuple, arr))
    try:
        return o.index((i,j))
    except KeyError:
        return -1
Generate some large data:
arr = np.random.randint(1, 2000, 200000).reshape((-1, 2))
Timing results:
In [55]: %timeit get_agn(arr, *arr[-1])
149 µs ± 3.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [56]: %timeit get_kmario23(arr, *arr[-1])
1.42 ms ± 17.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [57]: %timeit get_kmario23(arr, *arr[0])
1.2 ms ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Ordered set tests:
In [80]: o = ordered_set.OrderedSet(map(tuple, arr))
In [81]: %timeit ordered_get(o, *arr[-1])
1.74 µs ± 32.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [82]: %timeit ordered_get_full(arr, *arr[-1]) # include ordered set creation time
166 ms ± 2.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Unordered dictionary tests:
In [83]: d = { (i, j): k for k, (i, j) in enumerate(arr)}
In [84]: %timeit unordered_get(d, *arr[-1])
1.18 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [85]: %timeit unordered_get_full(arr, *arr[-1])
102 ms ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
So, when taking into account the time needed to create either the ordered set or the unordered dictionary, these methods are quite slow. You would need to run several hundred searches on the same data for them to pay off (roughly 102 ms / 148 µs ≈ 700 lookups just to recoup the dictionary build time compared to get_agn). Even then, there is no need to use the ordered_set package: regular dictionaries are faster.
It seems I was over-thinking this problem; there is an easy solution. I was considering either filtering and subsetting the array or using a dict index[(i,j)] = row. Filtering and subsetting was slow (O(n) per search), while using a dict was fast (O(1) access time), but creating the dict was slow and memory intensive.
The simple solution for this problem is using nested dicts.
index = {}
for row in range(arr.shape[0]):
    i, j = arr[row, :]
    try:
        index[i][j] = row
    except KeyError:
        index[i] = {}
        index[i][j] = row

def get(index, i, j):
    try:
        return index[i][j]
    except KeyError:
        return -1
Alternatively, instead of a plain dict at the top level, I could use index = defaultdict(dict), which would allow assigning index[i][j] = row directly, without the try ... except clauses (see the sketch below). But then the defaultdict(dict) object would create an empty {} whenever get(index, i, j) queried a nonexistent i, so it would expand the index unnecessarily.
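A minimal sketch of that defaultdict variant, for comparison; the lookup uses .get() on the outer dict precisely to avoid the unwanted growth described above:
from collections import defaultdict

index = defaultdict(dict)
for row in range(arr.shape[0]):
    i, j = arr[row, :]
    index[i][j] = row  # no try/except needed while building

def get(index, i, j):
    # dict.get() does not trigger the default factory, so no empty {} is created
    return index.get(i, {}).get(j, -1)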
The access time is O(1) for the outer dict and O(1) for the nested dicts, so overall it is O(1). The upper-level dict has a manageable size (bounded by n+1, which is much smaller than (n+1)*(k+1)), while the nested dicts are small (the nesting order is chosen based on the fact that in my case k << n). Building the nested dict is also very fast, even for >90M rows in the array. Moreover, it can be easily extended to more complicated cases (see the sketch below).
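For instance, here is a sketch of the same idea extended to three key columns, for a hypothetical array arr3 whose rows are (i, j, m) triples; setdefault replaces the nested try/except for brevity:
index = {}
for row in range(arr3.shape[0]):
    i, j, m = arr3[row, :]
    index.setdefault(i, {}).setdefault(j, {})[m] = row

def get3(index, i, j, m):
    try:
        return index[i][j][m]
    except KeyError:
        return -1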
Related
I have two 3D arrays of shape (N, M, D) and I want to perform an efficient row-wise (over N) matrix multiplication such that the resulting array has shape (N, D, D).
An inefficient code sample showing what I try to achieve is given by:
import numpy as np

N = 100
M = 10
D = 50

arr1 = np.random.normal(size=(N, M, D))
arr2 = np.random.normal(size=(N, M, D))

result = []
for i in range(N):
    result.append(arr1[i].T @ arr2[i])
result = np.array(result)
However, this approach is quite slow for large N due to the loop. Is there a more efficient way to achieve this computation without using loops? I already tried to find a solution via tensordot and einsum, to no avail.
The vectorized solution is to swap the last two axes of arr1:
>>> N, M, D = 2, 3, 4
>>> np.random.seed(0)
>>> arr1 = np.random.normal(size=(N, M, D))
>>> arr2 = np.random.normal(size=(N, M, D))
>>> arr1.transpose(0, 2, 1) @ arr2
array([[[ 6.95815626,  0.38299107,  0.40600482,  0.35990016],
        [-0.95421604, -2.83125879, -0.2759683 , -0.38027618],
        [ 3.54989101, -0.31274318,  0.14188485,  0.19860495],
        [ 3.56319723, -6.36209602, -0.42687188, -0.24932248]],

       [[ 0.67081341, -0.08816343,  0.35430089,  0.69962394],
        [ 0.0316968 ,  0.15129449, -0.51592291,  0.07118177],
        [-0.22274906, -0.28955683, -1.78905988,  1.1486345 ],
        [ 1.68432706,  1.93915798,  2.25785798, -2.34404577]]])
A simple benchmark for a much larger N:
In [225]: arr1.shape
Out[225]: (100000, 10, 50)
In [226]: %%timeit
     ...: result = []
     ...: for i in range(N):
     ...:     result.append(arr1[i].T @ arr2[i])
     ...: result = np.array(result)
     ...:
     ...:
12.4 s ± 224 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [227]: %timeit arr1.transpose(0, 2, 1) @ arr2
843 ms ± 7.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
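For reference, since the question mentions trying einsum: the same batched contraction can also be written that way. This sketch should be equivalent to the transpose-and-matmul expression above:
import numpy as np

# result[n, d, e] = sum over m of arr1[n, m, d] * arr2[n, m, e]
result = np.einsum('nmd,nme->nde', arr1, arr2)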
Use pre-allocated lists and do not perform the data conversion after the loop ends. The performance here is not much worse than vectorization, which means that most of the overhead comes from the final data conversion:
In [375]: %%timeit
     ...: result = [None] * N
     ...: for i in range(N):
     ...:     result[i] = arr1[i].T @ arr2[i]
     ...: # result = np.array(result)
     ...:
     ...:
1.22 s ± 16.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Performance of loop solution with data conversion:
In [376]: %%timeit
     ...: result = [None] * N
     ...: for i in range(N):
     ...:     result[i] = arr1[i].T @ arr2[i]
     ...: result = np.array(result)
     ...:
     ...:
11.3 s ± 179 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Another option follows the answer of @9769953 and adds a further optimization. To my surprise, its performance is almost the same as that of the vectorized solution:
In [378]: %%timeit
     ...: result = np.empty_like(arr1, shape=(N, D, D))
     ...: for res, ar1, ar2 in zip(result, arr1.transpose(0, 2, 1), arr2):
     ...:     np.matmul(ar1, ar2, out=res)
     ...:
843 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Out of interest, I wondered about the loop overhead, which I guess is minimal compared to the matrix multiplication; but in particular, the loop overhead is minimal compared to the potential reallocation of the list memory, which with N = 10000 could be significant.
Using a pre-allocated array instead of a list, I compared the loop result and the solution provided by Mechanic Pig, and achieved the following results on my machine:
In [10]: %timeit result1 = arr1.transpose(0, 2, 1) @ arr2
33.7 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
versus
In [14]: %%timeit
    ...: result = np.empty_like(arr1, shape=(N, D, D))
    ...: for i in range(N):
    ...:     result[i, ...] = arr1[i].T @ arr2[i]
    ...:
    ...:
48.5 ms ± 184 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
The pure NumPy solution is still faster, so that's good, but only by a factor of about 1.5. Not too bad. Depending on the needs, the loop may be clearer as to what it intends (and easier to modify, in case there's a need for an if-statement or other shenanigans).
And naturally, a simple comment above the faster solution can easily point out what it actually replaces.
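For example, something along these lines (just a sketch) keeps the intent visible:
# Batched matrix multiplication over the first axis; equivalent to
#     for i in range(N): result[i] = arr1[i].T @ arr2[i]
result = arr1.transpose(0, 2, 1) @ arr2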
Following the comments on this answer by Mechanic Pig, I've added below the timing results of a loop with a preallocated list (but without a preallocated array) and without conversion to a NumPy array, mainly so the results are compared on the same machine:
In [11]: %%timeit
    ...: result = [None] * N
    ...: for i in range(N):
    ...:     result[i] = arr1[i].T @ arr2[i]
    ...:
49.5 ms ± 672 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
So interestingly, this result, without conversion, is (a tiny bit) slower than the one with a pre-allocated array and direct assignment into the array.
Well, I have a simple problem that is giving me a headache. Basically, I have two two-dimensional arrays full of [x, y] coordinates, and I want to compare the first with the second and generate a third array that contains all the elements of the first array that do not appear in the second. It sounds simple, but I couldn't make it work at all. The sizes vary a lot: the first array can have between a thousand and 2 million coordinates, while the second has between 1 and a thousand.
This operation will occur many times, and the larger the first array, the more times it will occur.
sample:
arr1 = np.array([[0, 3], [0, 4], [1, 3], [1, 7], ])
arr2 = np.array([[0, 3], [1, 7]])
result = np.array([[0, 4], [1, 3]])
In depth: basically, I have a binary image of variable resolution, composed of 0s and 1s (255), and I analyze each pixel individually (with an algorithm that is already optimized). On purpose, every time this function is executed it analyzes only a fraction of the pixels, and when it finishes it gives me back the coordinates of those pixels. The problem is that every time it executes, it runs the following code:
ones = np.argwhere(img == 255) # ones = pixels array
This takes about 0.02 seconds and is by far the slowest part of the code. My idea is to create this variable once and then, each time the function ends, remove the parsed pixels and pass the new array as a parameter, continuing until the array is empty (a sketch of this update loop follows below).
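A rough sketch of that update loop, assuming a hypothetical analyze_fraction() that returns the coordinates it processed, and using a plain set of tuples for the difference:
import numpy as np

ones = np.argwhere(img == 255)       # computed only once
while ones.size:
    parsed = analyze_fraction(ones)  # hypothetical: returns the processed [x, y] rows
    done = set(map(tuple, parsed))
    ones = np.array([p for p in ones if tuple(p) not in done])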
Not sure what you intend to do with the extra dimensions, as the set difference, like any filtering, inherently loses the shape information.
Anyway, NumPy does provide np.setdiff1d() to solve this problem elegantly.
EDIT: With the clarifications provided, you seem to be looking for a way to compute the set difference along a given axis, i.e. where the elements of the sets are actually arrays.
There is no built-in specifically for this in NumPy, but it is not too difficult to craft one.
For simplicity, we assume that the operating axis is the first one (so that the elements of the set are arr[i]), that only unique elements appear in the first array, and that the arrays are 2D.
They are all based on the idea that the asymptotically best approach is to build a set() of the second array and then using that to filter out the entries from the first array.
The idiomatic way to build such a set in Python / NumPy is to use:
set(map(tuple, arr))
where the mapping to tuple "freezes" each arr[i], making it hashable and hence usable with set().
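A quick illustration of why the tuple conversion is needed (NumPy rows are not hashable by themselves):
import numpy as np

arr = np.array([[0, 3], [1, 7]])
# The rows of arr are ndarrays, which are unhashable:
# set(arr) raises TypeError: unhashable type: 'numpy.ndarray'
s = set(map(tuple, arr))  # tuples are hashable, so this works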
Unfortunately, since the filtering would produce results of unpredictable size, NumPy arrays are not the ideal container for the result.
To solve this issue, one can use:
an intermediate list
import numpy as np

def setdiff2d_list(arr1, arr2):
    delta = set(map(tuple, arr2))
    return np.array([x for x in arr1 if tuple(x) not in delta])
np.fromiter() followed by np.reshape()
import numpy as np

def setdiff2d_iter(arr1, arr2):
    delta = set(map(tuple, arr2))
    return np.fromiter((x for xs in arr1 if tuple(xs) not in delta for x in xs), dtype=arr1.dtype).reshape(-1, arr1.shape[-1])
NumPy's advanced indexing
def setdiff2d_idx(arr1, arr2):
    delta = set(map(tuple, arr2))
    idx = [tuple(x) not in delta for x in arr1]
    return arr1[idx]
Convert both inputs to set() (will force uniqueness of the output elements and will lose ordering):
import numpy as np

def setdiff2d_set(arr1, arr2):
    set1 = set(map(tuple, arr1))
    set2 = set(map(tuple, arr2))
    return np.array(list(set1 - set2))
Alternatively, the advanced indexing can be built using broadcasting, np.any() and np.all():
def setdiff2d_bc(arr1, arr2):
    idx = (arr1[:, None] != arr2).any(-1).all(1)
    return arr1[idx]
Some form of the above methods was originally suggested in @QuangHoang's answer.
A similar approach could also be implemented in Numba, following the same idea as above but using a hash instead of the actual array view arr[i] (because of the limitations in what is supported inside a set() by Numba) and pre-computing the output size (for speed):
import numpy as np
import numba as nb

@nb.njit
def mul_xor_hash(arr, init=65537, k=37):
    result = init
    for x in arr.view(np.uint64):
        result = (result * k) ^ x
    return result

@nb.njit
def setdiff2d_nb(arr1, arr2):
    # : build `delta` set using hashes
    delta = {mul_xor_hash(arr2[0])}
    for i in range(1, arr2.shape[0]):
        delta.add(mul_xor_hash(arr2[i]))
    # : compute the size of the result
    n = 0
    for i in range(arr1.shape[0]):
        if mul_xor_hash(arr1[i]) not in delta:
            n += 1
    # : build the result
    result = np.empty((n, arr1.shape[-1]), dtype=arr1.dtype)
    j = 0
    for i in range(arr1.shape[0]):
        if mul_xor_hash(arr1[i]) not in delta:
            result[j] = arr1[i]
            j += 1
    return result
While they all give the same result:
funcs = setdiff2d_iter, setdiff2d_list, setdiff2d_idx, setdiff2d_set, setdiff2d_bc, setdiff2d_nb
arr1 = np.array([[0, 3], [0, 4], [1, 3], [1, 7]])
print(arr1)
# [[0 3]
# [0 4]
# [1 3]
# [1 7]]
arr2 = np.array([[0, 3], [1, 7], [4, 0]])
print(arr2)
# [[0 3]
# [1 7]
# [4 0]]
result = funcs[0](arr1, arr2)
print(result)
# [[0 4]
# [1 3]]
for func in funcs:
    print(f'{func.__name__:>24s}', np.all(result == func(arr1, arr2)))
# setdiff2d_iter True
# setdiff2d_list True
# setdiff2d_idx True
# setdiff2d_set False # because of ordering
# setdiff2d_bc True
# setdiff2d_nb True
Their performance, however, varies:
for func in funcs:
    print(f'{func.__name__:>24s}', end=' ')
    %timeit func(arr1, arr2)
# setdiff2d_iter 16.3 µs ± 719 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# setdiff2d_list 14.9 µs ± 528 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# setdiff2d_idx 17.8 µs ± 1.75 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# setdiff2d_set 17.5 µs ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# setdiff2d_bc 9.45 µs ± 405 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# setdiff2d_nb 1.58 µs ± 51.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
The Numba-based approach proposed seems to outperform the other approaches by a fair margin (some 10x using the given input).
Similar timings are observed with larger inputs:
np.random.seed(42)
arr1 = np.random.randint(0, 100, (1000, 2))
arr2 = np.random.randint(0, 100, (1000, 2))
print(setdiff2d_nb(arr1, arr2).shape)
# (736, 2)
for func in funcs:
    print(f'{func.__name__:>24s}', end=' ')
    %timeit func(arr1, arr2)
# setdiff2d_iter 3.51 ms ± 75.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# setdiff2d_list 2.92 ms ± 32.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# setdiff2d_idx 2.61 ms ± 38.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# setdiff2d_set 3.52 ms ± 67.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# setdiff2d_bc 25.6 ms ± 198 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# setdiff2d_nb 192 µs ± 1.66 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
(As a side note, setdiff2d_bc() is the most negatively affected by the size of the second input).
It depends on how large your arrays are. If they are not too large (a few thousand rows), you can:
use broadcasting to compare each point in x to each point in y
use any to check for inequality at the last dimension
use all to check for matching
Code:
idx = (arr1[:,None]!=arr2).any(-1).all(1)
arr1[idx]
Output:
array([[0, 4],
[1, 3]])
Update: for larger data, you can try a set and a loop:
set_y = set(map(tuple, y))
idx = [tuple(point) not in set_y for point in x]
x[idx]
I have a NumPy array and I'd like to check whether it is sorted.
>>> a = np.array([1,2,3,4,5])
array([1, 2, 3, 4, 5])
np.all(a[:-1] <= a[1:])
Examples:
is_sorted = lambda a: np.all(a[:-1] <= a[1:])
>>> a = np.array([1,2,3,4,9])
>>> is_sorted(a)
True
>>> a = np.array([1,2,3,4,3])
>>> is_sorted(a)
False
With NumPy tools:
np.all(np.diff(a) >= 0)
but the NumPy solutions are all O(n).
If you want short code and a very quick answer on unsorted arrays:
import numba

@numba.jit
def is_sorted(a):
    for i in range(a.size-1):
        if a[i+1] < a[i]:
            return False
    return True
which is O(1) on average for random arrays: the probability that the first m elements happen to be sorted falls off as 1/m!, so the loop almost always exits after a few comparisons.
The inefficient but easy-to-type solution:
(a == np.sort(a)).all()
For completeness, an O(log n) iterative solution is given below. (The recursive version is slower and crashes for big vector sizes.) It is still slower than the native NumPy np.all(a[:-1] <= a[1:]), most likely due to modern CPU optimizations. The only cases where the O(log n) approach is faster are the "average" random case and the "almost sorted" case. If you suspect your array is already fully sorted, then np.all will be faster.
def is_sorted(a):
    idx = [(0, a.size - 1)]
    while idx:
        i, j = idx.pop(0)  # Breadth-first will find almost-sorted in O(log N)
        if i >= j:
            continue
        elif a[i] > a[j]:
            return False
        elif i + 1 == j:
            continue
        else:
            mid = (i + j) >> 1  # Division by 2 with floor
            idx.append((i, mid))
            idx.append((mid, j))
    return True
is_sorted2 = lambda a: np.all(a[:-1] <= a[1:])
Here are the results:
# Already sorted array - np.all is super fast
sorted_array = np.sort(np.random.rand(1000000))
%timeit is_sorted(sorted_array)
659 ms ± 3.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit is_sorted2(sorted_array)
431 µs ± 35.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Here I include the random array generation in each command, so we need to subtract its timing
%timeit np.random.rand(1000000)
6.08 ms ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit is_sorted(np.random.rand(1000000))
6.11 ms ± 58.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Without random part, it took 6.11 ms - 6.08 ms = 30µs per loop
%timeit is_sorted2(np.random.rand(1000000))
6.83 ms ± 75.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Without random part, it took 6.83 ms - 6.08 ms = 750µs per loop
Net: an O(n) vectorized solution is better than the O(log n) algorithm unless you are going to run it on arrays of more than ~100 million elements.
I have a three-dimensional array like
A = np.array([[[1,1],
               [1,0]],
              [[1,2],
               [1,0]],
              [[1,0],
               [0,0]]])
Now I would like to obtain an array that has a nonzero value in a given position if only a unique nonzero value (or zero) occurs in that position. It should have zero if only zeros or more than one nonzero value occur in that position. For the example above, I would like
[[1,0],
[1,0]]
since
in A[:,0,0] there are only 1s
in A[:,0,1] there are 0, 1 and 2, so more than one nonzero value
in A[:,1,0] there are 0 and 1, so 1 is retained
in A[:,1,1] there are only 0s
I can find how many nonzero elements there are with np.count_nonzero(A, axis=0), but I would like to keep 1s or 2s even if there are several of them. I looked at np.unique but it doesn't seem to support what I'd like to do.
Ideally, I'd like a function like np.count_unique(A, axis=0) which would return an array in the original shape, e.g. [[1, 3],[2, 1]], so I could check whether 3 or more occur and then ignore that position.
All I could come up with was a list comprehension iterating over the two dimensions of the array I'd like to obtain:
[[len(np.unique(A[:, i, j])) for j in range(A.shape[2])] for i in range(A.shape[1])]
Any other ideas?
You can use np.diff to stay at numpy level for the second task.
def diffcount(A):
    B = A.copy()
    B.sort(axis=0)
    C = np.diff(B, axis=0) > 0
    D = C.sum(axis=0) + 1
    return D
# [[1 3]
# [2 1]]
It seems to be a little faster on big arrays:
In [62]: A=np.random.randint(0,100,(100,100,100))
In [63]: %timeit diffcount(A)
46.8 ms ± 769 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [64]: timeit [[len(np.unique(A[:, i, j])) for j in range(A.shape[2])]\
for i in range(A.shape[1])]
149 ms ± 700 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Fundamentally, counting unique values is simpler than sorting, so a ln(A.shape[0]) factor can be won.
One way to win this factor is to use the set mechanism:
In [81]: %timeit np.apply_along_axis(lambda a: len(set(a)), 0, A)
183 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Unfortunately, this is not faster.
Another way is to do it by hand:
def countunique(A, Amax):
    res = np.empty(A.shape[1:], A.dtype)
    c = np.empty(Amax+1, A.dtype)
    for i in range(A.shape[1]):
        for j in range(A.shape[2]):
            T = A[:, i, j]
            for k in range(c.size):
                c[k] = 0
            for x in T:
                c[x] = 1
            res[i, j] = c.sum()
    return res
At python level:
In [70]: %timeit countunique(A,100)
429 ms ± 18.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This is not so bad for a pure Python approach. Then just shift this code to low level with Numba:
import numba
countunique2=numba.jit(countunique)
In [71]: %timeit countunique2(A,100)
3.63 ms ± 70.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Which will be difficult to improve a lot.
One approach would be to use A as first-axis indices to set entries in a boolean array with the same lengths along the other two axes, and then simply count the non-zeros along its first axis. Two variants are possible: one keeping it 3D, and another reshaping into 2D for some performance benefit, as indexing into 2D is faster. Thus, the two implementations would be -
def nunique_axis0_maskcount_app1(A):
    m, n = A.shape[1:]
    mask = np.zeros((A.max()+1, m, n), dtype=bool)
    mask[A, np.arange(m)[:,None], np.arange(n)] = 1
    return mask.sum(0)

def nunique_axis0_maskcount_app2(A):
    m, n = A.shape[1:]
    A.shape = (-1, m*n)
    maxn = A.max()+1
    N = A.shape[1]
    mask = np.zeros((maxn, N), dtype=bool)
    mask[A, np.arange(N)] = 1
    A.shape = (-1, m, n)
    return mask.sum(0).reshape(m, n)
Runtime test -
In [154]: A = np.random.randint(0,100,(100,100,100))
# @B. M.'s soln
In [155]: %timeit f(A)
10 loops, best of 3: 28.3 ms per loop
# @B. M.'s soln using slicing: (B[1:] != B[:-1]).sum(0)+1
In [156]: %timeit f2(A)
10 loops, best of 3: 26.2 ms per loop
In [157]: %timeit nunique_axis0_maskcount_app1(A)
100 loops, best of 3: 12 ms per loop
In [158]: %timeit nunique_axis0_maskcount_app2(A)
100 loops, best of 3: 9.14 ms per loop
Numba method
Using the same strategy as for nunique_axis0_maskcount_app2, but getting the counts directly at C level with numba, we would have -
from numba import njit

@njit
def nunique_loopy_func(mask, N, A, p, count):
    for j in range(N):
        mask[:] = True
        mask[A[0,j]] = False
        c = 1
        for i in range(1, p):
            if mask[A[i,j]]:
                c += 1
                mask[A[i,j]] = False
        count[j] = c
    return count
def nunique_axis0_numba(A):
    p, m, n = A.shape
    A.shape = (-1, m*n)
    maxn = A.max()+1
    N = A.shape[1]
    mask = np.empty(maxn, dtype=bool)
    count = np.empty(N, dtype=int)
    out = nunique_loopy_func(mask, N, A, p, count).reshape(m, n)
    A.shape = (-1, m, n)
    return out
Runtime test -
In [328]: np.random.seed(0)
In [329]: A = np.random.randint(0,100,(100,100,100))
In [330]: %timeit nunique_axis0_maskcount_app2(A)
100 loops, best of 3: 11.1 ms per loop
# @B.M.'s numba soln
In [331]: %timeit countunique2(A,A.max()+1)
100 loops, best of 3: 3.43 ms per loop
# Numba soln posted in this post
In [332]: %timeit nunique_axis0_numba(A)
100 loops, best of 3: 2.76 ms per loop
Right now I have vector3 values represented as lists. Is there a way to subtract two of these vector3 values, like
[2,2,2] - [1,1,1] = [1,1,1]
Should I use tuples?
If none of them defines these operands on these types, can I define it instead?
If not, should I create a new vector3 class?
If this is something you end up doing frequently, and with different operations, you should probably create a class to handle cases like this, or better use some library like Numpy.
Otherwise, look for list comprehensions used with the zip builtin function:
[a_i - b_i for a_i, b_i in zip(a, b)]
Here's an alternative to list comprehensions. map iterates through the list(s) (the latter arguments) simultaneously and passes their elements as arguments to the function (the first argument). In Python 2 it returns the resulting list; in Python 3 it returns a lazy iterator, so wrap it in list() if you need a list.
import operator
map(operator.sub, a, b)
I prefer this code because it has less syntax (which is more aesthetic to me), and apparently it's about 40% faster for lists of length 5 (see bobince's comment). Still, either solution will work.
If your lists are a and b, you can do:
map(int.__sub__, a, b)
But you probably shouldn't. No one will know what it means.
import numpy as np
a = [2,2,2]
b = [1,1,1]
np.subtract(a,b)
I'd have to recommend NumPy as well
Not only is it faster for doing vector math, but it also has a ton of convenience functions.
If you want something even faster for 1d vectors, try vop
It's similar to MatLab, but free and stuff. Here's an example of what you'd do
from numpy import matrix
a = matrix((2,2,2))
b = matrix((1,1,1))
ret = a - b
print(ret)
>> [[1 1 1]]
Boom.
If you have two lists called 'a' and 'b', you can do: [m - n for m,n in zip(a,b)]
A slightly different Vector class.
class Vector( object ):
def __init__(self, *data):
self.data = data
def __repr__(self):
return repr(self.data)
def __add__(self, other):
return tuple( (a+b for a,b in zip(self.data, other.data) ) )
def __sub__(self, other):
return tuple( (a-b for a,b in zip(self.data, other.data) ) )
Vector(1, 2, 3) - Vector(1, 1, 1)
Many solutions have been suggested.
If speed is of interest, here is a review of the different solutions with respect to speed (from fastest to slowest)
import timeit
import operator
a = [2,2,2]
b = [1,1,1]  # we want to obtain c = [2,2,2] - [1,1,1] = [1,1,1]
%timeit map(operator.sub, a, b)
176 ns ± 7.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit map(int.__sub__, a, b)
179 ns ± 4.95 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit map(lambda x,y: x-y, a,b)
189 ns ± 8.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [a_i - b_i for a_i, b_i in zip(a, b)]
421 ns ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [x - b[i] for i, x in enumerate(a)]
452 ns ± 17.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [a[i] - b[i] for i in range(len(a))]
530 ns ± 16.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit list(map(lambda x, y: x - y, a, b))
546 ns ± 16.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit np.subtract(a,b)
2.68 µs ± 80.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit list(np.array(a) - np.array(b))
2.82 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.matrix(a) - np.matrix(b)
12.3 µs ± 437 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Using map is clearly the fastest here. Surprisingly, numpy is the slowest. (Note that in Python 3, map returns a lazy iterator, so the map timings do not include actually computing the differences; wrap the call in list() for a fair comparison.)
It turns out that the cost of first converting the lists a and b to a NumPy array is a bottleneck that outweighs any efficiency gains from vectorization.
%timeit a = np.array([2,2,2]); b=np.array([1,1,1])
1.55 µs ± 54.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
a = np.array([2,2,2])
b = np.array([1,1,1])
%timeit a - b
417 ns ± 12.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
If you plan on performing more than simple one liners, it would be better to implement your own class and override the appropriate operators as they apply to your case.
Taken from Mathematics in Python:
class Vector:
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        return repr(self.data)

    def __add__(self, other):
        data = []
        for j in range(len(self.data)):
            data.append(self.data[j] + other.data[j])
        return Vector(data)

x = Vector([1, 2, 3])
print(x + x)
For those who code in PyCharm, here is another variant using operator.sub with map:
import operator
Arr1=[1,2,3,45]
Arr2=[3,4,56,78]
print(list(map(operator.sub,Arr1,Arr2)))
The combination of map and lambda functions in Python is a good solution for this kind of problem:
a = [2,2,2]
b = [1,1,1]
map(lambda x,y: x-y, a,b)
The zip function is another good choice, as demonstrated by @UncleZeiv.
This answer shows how to write "normal/easily understandable" pythonic code.
I suggest not using zip as not really everyone knows about it.
The solutions use list comprehensions and common built-in functions.
Alternative 1 (Recommended):
a = [2, 2, 2]
b = [1, 1, 1]
result = [a[i] - b[i] for i in range(len(a))]
Recommended as it only uses the most basic functions in Python
Alternative 2:
a = [2, 2, 2]
b = [1, 1, 1]
result = [x - b[i] for i, x in enumerate(a)]
Alternative 3 (as mentioned by BioCoder):
a = [2, 2, 2]
b = [1, 1, 1]
result = list(map(lambda x, y: x - y, a, b))
arr1 = [1, 2, 3]
arr2 = [2, 1, 3]
ls = [a2 - a1 for a1, a2 in zip(arr1, arr2)]
print(ls)
>> [1, -1, 0]
Very easy
list1 = [1, 2, 3, 4, 5]
list2 = [1, 2]
list3 = []
# print(list1 - list2)
for element in list1:
    if element not in list2:
        list3.append(element)
print(list3)
use a for loop
a = [3, 5, 6]
b = [3, 7, 2]
c = []
for i in range(len(a)):
    c.append(a[i] - b[i])
print(c)
output
[0, -2, 4]
Try this:
from numpy import array
list(array([1, 2, 3]) - 1)