Right now I have vector3 values represented as lists. Is there a way to subtract two of these vector3 values, like
[2,2,2] - [1,1,1] = [1,1,1]
Should I use tuples?
If neither of them defines these operators for these types, can I define them myself?
If not, should I create a new vector3 class?
If this is something you end up doing frequently, and with different operations, you should probably create a class to handle cases like this, or, better, use a library like NumPy.
Otherwise, look for list comprehensions used with the zip builtin function:
[a_i - b_i for a_i, b_i in zip(a, b)]
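For example, with the vectors from the question (a quick sketch):
a = [2, 2, 2]
b = [1, 1, 1]
print([a_i - b_i for a_i, b_i in zip(a, b)])  # [1, 1, 1]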
Here's an alternative to list comprehensions. map iterates through the lists (the latter arguments) simultaneously and passes their elements as arguments to the function (the first argument), collecting the results.
import operator
map(operator.sub, a, b)
I prefer this because it has less syntax (which I find more aesthetic), and apparently it's 40% faster for lists of length 5 (see bobince's comment). Still, either solution will work.
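Note that in Python 3, map returns a lazy iterator rather than a list, so wrap it in list() if you want the actual values (a small sketch):
import operator

a = [2, 2, 2]
b = [1, 1, 1]
print(list(map(operator.sub, a, b)))  # [1, 1, 1]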
If your lists are a and b, you can do:
map(int.__sub__, a, b)
But you probably shouldn't. No one will know what it means.
import numpy as np
a = [2,2,2]
b = [1,1,1]
np.subtract(a,b)
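np.subtract returns a NumPy array rather than a plain list; if you need a list back, one option is .tolist() (a quick sketch):
import numpy as np

c = np.subtract([2, 2, 2], [1, 1, 1])  # array([1, 1, 1])
print(c.tolist())                      # [1, 1, 1]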
I'd have to recommend NumPy as well
Not only is it faster for doing vector math, but it also has a ton of convenience functions.
If you want something even faster for 1d vectors, try vop
It's similar to MatLab, but free and stuff. Here's an example of what you'd do
from numpy import matrix
a = matrix((2,2,2))
b = matrix((1,1,1))
ret = a - b
print(ret)
>> [[1 1 1]]
Boom.
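These days plain NumPy arrays are usually preferred over np.matrix; they give the same element-wise result (a quick sketch):
import numpy as np

a = np.array((2, 2, 2))
b = np.array((1, 1, 1))
print(a - b)  # [1 1 1]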
If you have two lists called 'a' and 'b', you can do: [m - n for m,n in zip(a,b)]
A slightly different Vector class.
class Vector(object):
    def __init__(self, *data):
        self.data = data
    def __repr__(self):
        return repr(self.data)
    def __add__(self, other):
        return tuple(a + b for a, b in zip(self.data, other.data))
    def __sub__(self, other):
        return tuple(a - b for a, b in zip(self.data, other.data))
Vector(1, 2, 3) - Vector(1, 1, 1)
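Note that __add__ and __sub__ above return a plain tuple, so the result of the expression above is (0, 1, 2). If you'd rather get another Vector back (so operations can be chained), a small variant might look like this (a sketch, not part of the original answer):
class ChainableVector(Vector):
    def __add__(self, other):
        return ChainableVector(*(a + b for a, b in zip(self.data, other.data)))
    def __sub__(self, other):
        return ChainableVector(*(a - b for a, b in zip(self.data, other.data)))

print(ChainableVector(1, 2, 3) - ChainableVector(1, 1, 1))  # (0, 1, 2)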
Many solutions have been suggested.
If speed is of interest, here is a review of the different solutions with respect to speed (from fastest to slowest)
import timeit
import operator
a = [2,2,2]
b = [1,1,1] # we want to obtain c = [2,2,2] - [1,1,1] = [1,1,1]
%timeit map(operator.sub, a, b)
176 ns ± 7.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit map(int.__sub__, a, b)
179 ns ± 4.95 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit map(lambda x,y: x-y, a,b)
189 ns ± 8.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [a_i - b_i for a_i, b_i in zip(a, b)]
421 ns ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [x - b[i] for i, x in enumerate(a)]
452 ns ± 17.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [a[i] - b[i] for i in range(len(a))]
530 ns ± 16.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit list(map(lambda x, y: x - y, a, b))
546 ns ± 16.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit np.subtract(a,b)
2.68 µs ± 80.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit list(np.array(a) - np.array(b))
2.82 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.matrix(a) - np.matrix(b)
12.3 µs ± 437 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Using map is clearly the fastest here. Note, though, that in Python 3 a bare map(...) call only builds a lazy iterator; the fairer comparison for a fully materialized result is the list(map(lambda ...)) entry above.
Surprisingly, numpy is the slowest. It turns out that the cost of first converting the lists a and b to a numpy array is a bottleneck that outweighs any efficiency gains from vectorization.
%timeit a = np.array([2,2,2]); b=np.array([1,1,1])
1.55 µs ± 54.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
a = np.array([2,2,2])
b = np.array([1,1,1])
%timeit a - b
417 ns ± 12.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
If you plan on performing more than simple one liners, it would be better to implement your own class and override the appropriate operators as they apply to your case.
Taken from Mathematics in Python:
class Vector:
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return repr(self.data)
    def __add__(self, other):
        data = []
        for j in range(len(self.data)):
            data.append(self.data[j] + other.data[j])
        return Vector(data)
x = Vector([1, 2, 3])
print(x + x)
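The quoted class only defines __add__; for the subtraction asked about in the question, you would add an analogous __sub__ method to the class (a sketch along the same lines):
    def __sub__(self, other):
        data = []
        for j in range(len(self.data)):
            data.append(self.data[j] - other.data[j])
        return Vector(data)
With that method in place, Vector([2, 2, 2]) - Vector([1, 1, 1]) prints as [1, 1, 1].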
For those who are used to coding in PyCharm, this approach works there just as well:
import operator
Arr1 = [1, 2, 3, 45]
Arr2 = [3, 4, 56, 78]
print(list(map(operator.sub, Arr1, Arr2)))
The combination of map and lambda functions in Python is a good solution for this kind of problem:
a = [2,2,2]
b = [1,1,1]
map(lambda x,y: x-y, a,b)
The zip function is another good choice, as demonstrated by @UncleZeiv.
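A runnable sketch of the above (wrapping map in list() so the values are materialized in Python 3):
a = [2, 2, 2]
b = [1, 1, 1]
print(list(map(lambda x, y: x - y, a, b)))  # [1, 1, 1]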
This answer shows how to write "normal/easily understandable" pythonic code.
I suggest not using zip, since not everyone is familiar with it.
The solutions use list comprehensions and common built-in functions.
Alternative 1 (Recommended):
a = [2, 2, 2]
b = [1, 1, 1]
result = [a[i] - b[i] for i in range(len(a))]
Recommended as it only uses the most basic functions in Python
Alternative 2:
a = [2, 2, 2]
b = [1, 1, 1]
result = [x - b[i] for i, x in enumerate(a)]
Alternative 3 (as mentioned by BioCoder):
a = [2, 2, 2]
b = [1, 1, 1]
result = list(map(lambda x, y: x - y, a, b))
arr1 = [1, 2, 3]
arr2 = [2, 1, 3]
ls = [y - x for x, y in zip(arr1, arr2)]
print(ls)
>> [1, -1, 0]
Very easy, but note that this removes from list1 the elements that also appear in list2 (a membership-based difference), rather than doing element-wise subtraction:
list1=[1,2,3,4,5]
list2=[1,2]
list3=[]
# print(list1-list2)
for element in list1:
    if element not in list2:
        list3.append(element)
print(list3)
use a for loop
a = [3,5,6]
b = [3,7,2]
c = []
for i in range(len(a)):
    c.append(a[i] - b[i])
print(c)
output
[0, -2, 4]
Try this:
from numpy import array
list(array([1, 2, 3]) - 1)  # subtracting a scalar broadcasts over the whole array
Related
I want to add all nonzero elements from a numpy array arr to a list out_list. Previous research suggests that for numpy arrays, using np.nonzero is most efficient. (My own benchmark below actually suggests it can be slightly improved using np.delete).
However, in my case I want my output to be a list, because I am combining many arrays for which I don't know the number of nonzero elements (so I can't effectively preallocate a numpy array for them). Hence, I was wondering whether there are some synergies that can be exploited to speed up the process. While my naive list comprehension approach is much slower than the pure numpy approach, I got some promising results combining list comprehension with numba.
Here's what I found so far:
import numpy as np
from numba import njit

n = 60_000  # size of array
nz = 0.3    # fraction of zero elements
arr = (np.random.random_sample(n) - nz).clip(min=0)

# method 1
def add_to_list1(arr, out):
    out.extend(list(arr[np.nonzero(arr)]))

# method 2
def add_to_list2(arr, out):
    out.extend(list(np.delete(arr, arr == 0)))

# method 3
def add_to_list3(arr, out):
    out += [x for x in arr if x != 0]

# method 4 (not sure how to get numba to accept an empty list as argument)
@njit
def add_to_list4(arr):
    return [x for x in arr if x != 0]

out_list = []
%timeit add_to_list1(arr, out_list)
out_list = []
%timeit add_to_list2(arr, out_list)
out_list = []
%timeit add_to_list3(arr, out_list)
_ = add_to_list4(arr)  # call once to compile
out_list = []
%timeit out_list.extend(add_to_list4(arr))
Yielding the following results:
2.51 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.19 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
15.6 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.63 ms ± 158 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Not surprisingly, numba outperforms all other methods. Among the rest, method 2 (using np.delete) is the best. Am I missing any obvious alternative that exploits the fact that I am converting to a list afterwards? Can you think of anything to further speed up the process?
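As an aside on the comment in method 4: one way that might work to hand a list to a numba function is numba's typed list. A sketch, assuming a reasonably recent numba version; typed lists have their own conversion overhead, so this is not necessarily faster:
from numba import njit, types
from numba.typed import List

@njit
def add_to_list4b(arr, out):
    # append nonzero elements directly into a numba typed list
    for x in arr:
        if x != 0:
            out.append(x)

typed_out = List.empty_list(types.float64)
add_to_list4b(arr, typed_out)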
Edit 1:
Performance of .tolist():
# method 5
def add_to_list5(arr, out):
    out += arr[arr != 0].tolist()

# method 6
def add_to_list6(arr, out):
    out += np.delete(arr, arr == 0).tolist()

# method 7
def add_to_list7(arr, out):
    out += arr[arr.astype(bool)].tolist()
Timings are on par with numba:
1.62 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.65 ms ± 104 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.78 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Edit 2:
Here's some benchmarking using Mad Physicist's suggestion to use np.concatenate to construct a numpy array instead.
import time

# construct numpy array using np.concatenate
out_list = []
t = time.perf_counter()
for i in range(100):
    out_list.append(arr[arr != 0])
result = np.concatenate(out_list)
print(f"Time elapsed: {time.perf_counter() - t:.4f}s")

# compare with best list-based method
out_list = []
t = time.perf_counter()
for i in range(100):
    out_list += arr[arr != 0].tolist()
print(f"Time elapsed: {time.perf_counter() - t:.4f}s")
Concatenating NumPy arrays indeed yields another significant speed-up, although it is not directly comparable since the output is a NumPy array instead of a list. Which is best will therefore depend on the precise use case.
Time elapsed: 0.0400s
Time elapsed: 0.1430s
TLDR;
1/ using arr[arr != 0] is the fastest of all the indexing options
2/ using .tolist() instead of list(.) speeds up things by a factor 1.3 - 1.5
3/ with the gains of 1/ and 2/ combined, the speed is on par with numba
4/ if having a numpy array instead of a list is acceptable, then using np.concatenate yields another gain in speed by a factor of ~3.5 compared to the best alternative
I submit that the method of choice, if you are indeed looking for a list output, is:
def f(arr, out_list):
    out_list += arr[arr != 0].tolist()
It seems to beat all the other methods mentioned so far in the OP's question or in other responses (at the time of this writing).
If, however, you are looking for a result as a numpy array, then following @MadPhysicist's version (slightly modified to use arr[arr != 0] instead of np.nonzero()) is almost 6x faster; see the end of this post.
Side note: I would avoid using %timeit out_list.extend(some_list): it keeps adding to out_list during the many loops of timeit. Example:
out_list = []
%timeit out_list.extend([1,2,3])
and now:
>>> len(out_list)
243333333 # yikes
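One way around this is to time a small helper that builds its own fresh list on each call (a sketch):
def run_once(arr):
    out = []
    out += arr[arr != 0].tolist()
    return out

%timeit run_once(arr)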
Timings
On 60K items on my machine, I see:
out_list = []
a = %timeit -o out_list + arr[arr != 0].tolist()
b = %timeit -o out_list + arr[np.nonzero(arr)].tolist()
c = %timeit -o out_list + list(arr[np.nonzero(arr)])
Yields:
1.23 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.53 ms ± 2.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
4.29 ms ± 3.02 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
And:
>>> c.average / a.average
3.476
>>> b.average / a.average
1.244
For a numpy array result instead
Following @MadPhysicist, you can get some extra boost by not turning the arrays into lists, but using np.concatenate() instead:
def all_nonzero(arr_iter):
    """Return the nonzero elements of all arrays as one np.array."""
    return np.concatenate([a[a != 0] for a in arr_iter])

def all_nonzero_list(arr_iter):
    """Return the nonzero elements of all arrays as a list."""
    out_list = []
    for a in arr_iter:
        out_list += a[a != 0].tolist()
    return out_list
from itertools import repeat
ta = %timeit -o all_nonzero(repeat(arr, 100))
tl = %timeit -o all_nonzero_list(repeat(arr, 100))
Yields:
39.7 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
227 ms ± 680 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
and
>>> tl.average / ta.average
5.75
Instead of extending a list by all of the elements of a new array, append the array itself. This will make for much fewer and smaller reallocations. You can also pre-allocate a list of Nones up-front or even use an object array, if you have an upper bound on the number of arrays you will process.
When you're done, call np.concatenate on the list.
So instead of this:
L = []
for i in range(10):
    arr = (np.random.random_sample(n) - nz).clip(min=0)
    L.extend(arr[np.nonzero(arr)])
result = np.array(L)
Try this:
L = []
for i in range(10):
    arr = (np.random.random_sample(n) - nz).clip(min=0)
    L.append(arr[np.nonzero(arr)])
result = np.concatenate(L)
Since you're keeping arrays around, the final concatenation will be a series of buffer copies (which is fast), rather than a bunch of Python-to-NumPy type conversions (which won't be). The exact method you choose for deletion should of course still be guided by your benchmark results.
Also, here's another method to add to your benchmark:
def add_to_list5(arr, out):
    out.extend(list(arr[arr.astype(bool)]))
I don't expect this to be overwhelmingly fast, but it's interesting to see how masking stacks up next to indexing.
I have a 1d array, and I need to remove all trailing blocks of 8 zeros.
[0,1,1,0,1,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0]
->
[0,1,1,0,1,0,0,0]
a.shape[0] % 8 == 0 always, so no worries about that.
Is there a better way to do it?
import numpy as np

P = 8
arr1 = np.random.randint(2, size=np.random.randint(5, 10) * P)
arr2 = np.random.randint(1, size=np.random.randint(5, 10) * P)
arr = np.concatenate((arr1, arr2))

indexes = []
arr = np.flip(arr).reshape(arr.shape[0] // P, P)
for i, f in enumerate(arr):
    if (f == 0).all():
        indexes.append(i)
    else:
        break
arr = np.delete(arr, indexes, axis=0)
arr = np.flip(arr.reshape(arr.shape[0] * P))
You can do it without allocating more space by using views and np.argmax to get the last nonzero element:
index = arr.size - np.argmax(arr[::-1])
Rounding up to the nearest multiple of eight is easy:
index = int(np.ceil(index / 8)) * 8  # cast to int so it can be used as a slice bound
Now chop off the rest:
arr = arr[:index]
Or as a one-liner:
arr = arr[:int(np.ceil((arr.size - np.argmax(arr[::-1])) / 8)) * 8]
This version is O(n) in time and O(1) in space because it reuses the same buffers for everything (including the output).
This has the additional advantage that it will work correctly even if there are no trailing zeros. Using argmax does rely on all the nonzero elements being equal (e.g. an array of zeros and ones), though. If that is not the case, you will need to compute a mask first, e.g. with arr.astype(bool).
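For example, a sketch of that mask-based variant for arrays whose nonzero values are not all equal:
mask = arr.astype(bool)                    # True wherever arr is nonzero
index = arr.size - np.argmax(mask[::-1])   # position just past the last nonzero element
arr = arr[:int(np.ceil(index / 8)) * 8]    # round up to a whole block of 8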
If you want to use your original approach, you could vectorize that too, although there will be a bit more overhead:
view = arr.reshape(-1, 8)
mask = view.any(axis = 1)
index = view.shape[0] - np.argmax(mask[::-1])
arr = arr[:index * 8]
There is a numpy function that does almost what you want, np.trim_zeros. We can use that:
import numpy as np

def trim_mod(a, m=8):
    t = np.trim_zeros(a, 'b')
    return a[:len(a) - (len(a) - len(t)) // m * m]

def test(a, t, m=8):
    assert (len(a) - len(t)) % m == 0
    assert len(t) < m or np.any(t[-m:])
    assert not np.any(a[len(t):])

for _ in range(1000):
    a = (np.random.random(np.random.randint(10, 100000)) < 0.002).astype(int)
    m = np.random.randint(4, 20)
    t = trim_mod(a, m)
    test(a, t, m)
print("Looks correct")
Prints:
Looks correct
It seems to scale linearly in the number of trailing zeros (see the plot produced by the code below), but it feels rather slow in absolute terms (units are ms per trial), so maybe np.trim_zeros is just a Python loop.
Code for the picture:
from timeit import timeit

A = (np.random.random(1000000) < 0.02).astype(int)
m = 8
T = []
for last in range(1, 1000, 9):
    A[-last:] = 0
    A[-last] = 1
    T.append(timeit(lambda: trim_mod(A, m), number=100) * 10)

import pylab
pylab.plot(range(1, 1000, 9), T)
pylab.show()
A low-level approach:
import numba

@numba.njit
def trim8(a):
    n = a.size - 1
    while n >= 0 and a[n] == 0:
        n -= 1
    c = (n // 8 + 1) * 8
    return a[:c]
Some tests:
In [194]: A[-1]=1 # best case
In [196]: %timeit trim_mod(A,8)
5.7 µs ± 323 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [197]: %timeit trim8(A)
714 ns ± 33.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [198]: %timeit A[:(A.size - np.argmax(A[::-1]) // 8) * 8]
4.83 ms ± 479 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [202]: A[:]=0 #worst case
In [203]: %timeit trim_mod(A,8)
2.5 s ± 49.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [204]: %timeit trim8(A)
1.14 ms ± 71.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [205]: %timeit A[:(A.size - np.argmax(A[::-1]) // 8) * 8]
5.5 ms ± 950 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
It has a short circuit mechanism like trim_zeros, but is much faster.
I have a large array, that looks like something below:
np.random.seed(42)
arr = np.random.permutation(np.array([
(1,1,2,2,2,2,3,3,4,4,4),
(8,9,3,4,7,9,1,9,3,4,50000)
]).T)
It isn't sorted, the rows of this array are unique, I also know the bounds for the values in both columns, they are [0, n] and [0, k]. So the maximum possible size of the array is (n+1)*(k+1), but the actual size is closer to log of that.
I need to search the array by both columns to find such row that arr[row,:] = (i,j), and return -1 when (i,j) is absent in the array. The naive implementation for such function is:
def get(arr, i, j):
    cond = (arr[:,0] == i) & (arr[:,1] == j)
    if np.any(cond):
        return np.where(cond)[0][0]
    else:
        return -1
Unfortunately, since in my case arr is very large (>90M rows), this is very inefficient, especially since I would need to call get() multiple times.
Alternatively I tried translating this to a dict with (i,j) keys, such that
index[(i,j)] = row
that can be accessed by:
def get(index, i, j):
    try:
        return index[(i,j)]
    except KeyError:
        return -1
This works (and is much faster when tested on smaller data than I have), but again, creating the dict on-the-fly by
index = {}
for row in range(arr.shape[0]):
    i, j = arr[row, :]
    index[(i,j)] = row
takes huge amount of time and eats lots of RAM in my case. I was also thinking of first sorting arr and then using something like np.searchsorted, but this didn't lead me anywhere.
So what I need is a fast function get(arr, i, j) that returns
>>> get(arr, 2, 3)
4
>>> get(arr, 4, 100)
-1
A partial solution would be:
In [36]: arr
Out[36]:
array([[ 2, 9],
[ 1, 8],
[ 4, 4],
[ 4, 50000],
[ 2, 3],
[ 1, 9],
[ 4, 3],
[ 2, 7],
[ 3, 9],
[ 2, 4],
[ 3, 1]])
In [37]: (i,j) = (2, 3)
# we can use `assume_unique=True` which can speed up the calculation
In [38]: np.all(np.isin(arr, [i,j], assume_unique=True), axis=1, keepdims=True)
Out[38]:
array([[False],
[False],
[False],
[False],
[ True],
[False],
[False],
[False],
[False],
[False],
[False]])
# we can use `assume_unique=True` which can speed up the calculation
In [39]: mask = np.all(np.isin(arr, [i,j], assume_unique=True), axis=1, keepdims=True)
In [40]: np.argwhere(mask)
Out[40]: array([[4, 0]])
If you need the final result as a scalar, then don't use the keepdims argument and convert the resulting array to a scalar:
# we can use `assume_unique=True` which can speed up the calculation
In [41]: mask = np.all(np.isin(arr, [i,j], assume_unique=True), axis=1)
In [42]: np.argwhere(mask)
Out[42]: array([[4]])
In [43]: np.asscalar(np.argwhere(mask))
Out[43]: 4
Solution
Python offers a set type to store unique values, but sadly no ordered version of it. You can, however, use the ordered-set package.
Create an OrderedSet from the data. Fortunately, this only needs to be done once:
import ordered_set
o = ordered_set.OrderedSet(map(tuple, arr))
def ordered_get(o, i, j):
    try:
        return o.index((i,j))
    except KeyError:
        return -1
Runtime
Finding the index of a value should be O(1), according to the documentation:
In [46]: %timeit get(arr, 2, 3)
10.6 µs ± 39 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [47]: %timeit ordered_get(o, 2, 3)
1.16 µs ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [48]: %timeit ordered_get(o, 2, 300)
1.05 µs ± 2.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Testing this for a much larger array:
a2 = np.random.randint(10000, size=1000000).reshape(-1,2)
o2 = ordered_set.OrderedSet()
for t in map(tuple, a2):
    o2.add(t)
In [65]: %timeit get(a2, 2, 3)
1.05 ms ± 2.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [66]: %timeit ordered_get(o2, 2, 3)
1.03 µs ± 2.12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [67]: %timeit ordered_get(o2, 2, 30000)
1.06 µs ± 28.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Looks like it indeed is O(1) runtime.
def get_agn(arr, i, j):
    idx = np.flatnonzero((arr[:,0] == i) & (arr[:,1] == j))
    return -1 if idx.size == 0 else idx[0]
Also, just in case you are thinking about the ordered_set solution, here is a better one (however, in both cases see timing tests below):
d = {(i, j): k for k, (i, j) in enumerate(arr)}

def unordered_get(d, i, j):
    return d.get((i, j), -1)
and it's "full" equivalent (that builds the dictionary inside the function):
def unordered_get_full(arr, i, j):
    d = {(i, j): k for k, (i, j) in enumerate(arr)}
    return d.get((i, j), -1)
Timing tests:
First, define @kmario23's function:
def get_kmario23(arr, i, j):
    # fundamentally, kmario23's code re-arranged to return scalars
    # and -1 when (i, j) is not found:
    mask = np.all(np.isin(arr, [i,j], assume_unique=True), axis=1)
    idx = np.argwhere(mask)
    return -1 if idx.size == 0 else np.asscalar(idx[0])
Second, define @ChristophTerasa's function (original and the full version):
import ordered_set

o = ordered_set.OrderedSet(map(tuple, arr))

def ordered_get(o, i, j):
    try:
        return o.index((i,j))
    except KeyError:
        return -1

def ordered_get_full(arr, i, j):
    # "full" version that builds the ordered set inside the function
    o = ordered_set.OrderedSet(map(tuple, arr))
    try:
        return o.index((i,j))
    except KeyError:
        return -1
Generate some large data:
arr = np.random.randint(1, 2000, 200000).reshape((-1, 2))
Timing results:
In [55]: %timeit get_agn(arr, *arr[-1])
149 µs ± 3.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [56]: %timeit get_kmario23(arr, *arr[-1])
1.42 ms ± 17.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [57]: %timeit get_kmario23(arr, *arr[0])
1.2 ms ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Ordered set tests:
In [80]: o = ordered_set.OrderedSet(map(tuple, arr))
In [81]: %timeit ordered_get(o, *arr[-1])
1.74 µs ± 32.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [82]: %timeit ordered_get_full(arr, *arr[-1]) # include ordered set creation time
166 ms ± 2.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Unordered dictionary tests:
In [83]: d = { (i, j): k for k, (i, j) in enumerate(arr)}
In [84]: %timeit unordered_get(d, *arr[-1])
1.18 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [85]: %timeit unordered_get_full(arr, *arr[-1])
102 ms ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
So, when taking into account the time needed to create either the ordered set or the dictionary, these methods are quite slow. You would have to run several hundred searches on the same data for them to pay off. Even then, there is no need to use the ordered_set package - regular dictionaries are faster.
It seems I was over-thinking this problem; there is an easy solution. I was considering either filtering and subsetting the array or using dict index[(i,j)] = row. Filtering and subsetting was slow (O(n) when searching), while using a dict was fast (O(1) access time), but creating the dict was slow and memory intensive.
The simple solution for this problem is using nested dicts.
index = {}
for row in range(arr.shape[0]):
    i, j = arr[row, :]
    try:
        index[i][j] = row
    except KeyError:
        index[i] = {}
        index[i][j] = row

def get(index, i, j):
    try:
        return index[i][j]
    except KeyError:
        return -1
Alternatively, instead of a plain dict at the top level, I could use index = defaultdict(dict), which would allow assigning index[i][j] = row directly, without the try ... except. But then the defaultdict(dict) object would create an empty {} whenever get(index, i, j) queries a nonexistent i, so the index would grow unnecessarily.
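For example, a sketch of that defaultdict variant that avoids the unwanted expansion by using dict.get for lookups:
from collections import defaultdict

index = defaultdict(dict)
for row in range(arr.shape[0]):
    i, j = arr[row, :]
    index[i][j] = row  # the inner dict is created automatically on first write

def get(index, i, j):
    # dict.get never triggers the default factory, so missing keys don't expand the index
    return index.get(i, {}).get(j, -1)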
The access time is O(1) for the outer dict and O(1) for the nested dicts, so basically it's O(1). The outer dict has a manageable size (bounded by n, which is far smaller than n*k), while the nested dicts are small (the nesting order is chosen based on the fact that in my case k << n). Building the nested dict is also very fast, even for >90M rows in the array. Moreover, it can easily be extended to more complicated cases.
Basically I want to map over each value of a multidimensional numpy array. The output should have the same shape as the input.
This is the way I did it:
import numpy as np

def f(x):
    return x * x

input = np.arange(3*4*5).reshape(3,4,5)
output = np.array(list(map(f, input)))
print(output)
It works, but it feels a bit too complicated (np.array, list, map). Is there a more elegant solution?
Just call your function on the array:
f(input)
Also, better not to use the name input for your variable as it is a builtin:
import numpy as np

def f(x):
    return x * x

arr = np.arange(3*4*5).reshape(3,4,5)
print(np.all(f(arr) == np.array(list(map(f, arr)))))
Output:
True
If the function is more complex:
def f(x):
    return x + 1 if x % 2 else 2 * x
use vectorize:
np.vectorize(f)(arr)
Better, always try to use the vectorized NumPy functions such as np.where:
>>> np.all(np.vectorize(f)(arr) == np.where(arr % 2, arr + 1, arr * 2))
True
The native NumPy version is considerably faster:
%%timeit
np.vectorize(f)(arr)
34 µs ± 996 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
np.where(arr % 2, arr + 1, arr * 2)
5.16 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
This is much more pronounced for larger arrays:
big_arr = np.arange(30 * 40 * 50).reshape(30, 40, 50)
%%timeit
np.vectorize(f)(big_arr)
15.5 ms ± 318 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
np.where(big_arr % 2, big_arr + 1, big_arr * 2)
797 µs ± 11.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
I have a numpy array and I'd like to check whether it is sorted.
>>> a = np.array([1,2,3,4,5])
array([1, 2, 3, 4, 5])
np.all(a[:-1] <= a[1:])
Examples:
is_sorted = lambda a: np.all(a[:-1] <= a[1:])
>>> a = np.array([1,2,3,4,9])
>>> is_sorted(a)
True
>>> a = np.array([1,2,3,4,3])
>>> is_sorted(a)
False
With NumPy tools:
np.all(np.diff(a) >= 0)
but numpy solutions are all O(n).
If you want quick code and a very quick conclusion on unsorted arrays:
import numba

@numba.jit
def is_sorted(a):
    for i in range(a.size - 1):
        if a[i+1] < a[i]:
            return False
    return True
which is O(1) on average for random arrays, since an out-of-order pair is usually found almost immediately.
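A quick usage sketch (the first call includes JIT compilation time):
import numpy as np

print(is_sorted(np.arange(10)))        # True
print(is_sorted(np.array([3, 1, 2])))  # False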
The inefficient but easy-to-type solution:
(a == np.sort(a)).all()
For completeness, the iterative "O(log n)" solution is shown below; it typically finds an out-of-order pair after O(log n) comparisons on random input. The recursive version is slower and crashes for big vector sizes. However, it is still slower than the native NumPy check np.all(a[:-1] <= a[1:]), most likely due to modern CPU optimizations. The only cases where the O(log n) approach wins are the "average" random case and "almost sorted" inputs; if you suspect your array is already fully sorted, then np.all will be faster.
def is_sorted(a):
    idx = [(0, a.size - 1)]
    while idx:
        i, j = idx.pop(0)  # breadth-first will find almost-sorted in O(log N)
        if i >= j:
            continue
        elif a[i] > a[j]:
            return False
        elif i + 1 == j:
            continue
        else:
            mid = (i + j) >> 1  # division by 2 with floor
            idx.append((i, mid))
            idx.append((mid, j))
    return True
is_sorted2 = lambda a: np.all(a[:-1] <= a[1:])
Here are the results:
# Already sorted array - np.all is super fast
sorted_array = np.sort(np.random.rand(1000000))
%timeit is_sorted(sorted_array)
659 ms ± 3.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit is_sorted2(sorted_array)
431 µs ± 35.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Here I included the random in each command, so we need to subtract its timing
%timeit np.random.rand(1000000)
6.08 ms ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit is_sorted(np.random.rand(1000000))
6.11 ms ± 58.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Without random part, it took 6.11 ms - 6.08 ms = 30µs per loop
%timeit is_sorted2(np.random.rand(1000000))
6.83 ms ± 75.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Without random part, it took 6.83 ms - 6.08 ms = 750µs per loop
Net: the O(n) vectorized code is better than the O(log n) early-exit algorithm, unless you are running arrays of more than about 100 million elements.