Stochastically rounding a float to an integer - python

I want a function (using Python 3.6+ if it's relevant) that will stochastically round a floating-point number to an integer in the following manner:
Given a real number, x, let a = floor(x) and let b = ceil(x). Then, write a function s_int() that will return a with a probability of b - x and return b with a probability of x - a.
For example, s_int(14.8) should return 14 20% of the time, and return 15 for the remaining 80% of the time.
Here is my attempt:
import math
from random import random

def s_int(x):
    a = math.floor(x)
    return a + ((x - a) > random())
It appears to work for all cases I can think of:
In [2]: Counter(s_int(14.7) for _ in range(1000000))
Out[2]: Counter({14: 300510, 15: 699490})
In [3]: Counter(s_int(-14.7) for _ in range(1000000))
Out[3]: Counter({-15: 700133, -14: 299867})
In [4]: Counter(s_int(14) for _ in range(1000000))
Out[4]: Counter({14: 1000000})
In [5]: Counter(s_int(-14) for _ in range(1000000))
Out[5]: Counter({-14: 1000000})
In [6]: Counter(s_int(0) for _ in range(1000000))
Out[6]: Counter({0: 1000000})
Here are my questions:
Are there any edge cases I didn't consider that this function doesn't work for?
Are there other simpler or more elegant solutions?
Can this be made to run faster?

I think there are no edge cases
Try using numpy:
import numpy as np

def s_int(x):
    a = np.floor(x)
    b = a + 1
    return np.random.choice([a, b], p=[b - x, x - a])
I think it's still an O(1) operation; it does not run any faster.
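If you need to round many values at once, a vectorized variant along these lines avoids calling np.random.choice per element (a minimal sketch in plain numpy; s_int_array is just an illustrative name, not benchmarked here):
import numpy as np

def s_int_array(x):
    # Stochastically round every element of an array in one shot:
    # each value is rounded up with probability equal to its fractional part.
    x = np.asarray(x, dtype=float)
    a = np.floor(x)
    return (a + ((x - a) > np.random.random(x.shape))).astype(int)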


Trigonometry functions from series expansion

I am trying to write functions which emulate math.sin and math.tan but, instead of using the math library, performing the calculation using a series expansion.
The formulae are from Mathematics SE, How would you calculate the Tangent without a calculator?:
sin(x) = x − x^3/3! + x^5/5! −...
tan(x) = sin(x) / √(1 − sin(x)^2)
This is my attempt, but I could not figure out how to perform the sign flipping + / - / + / ... part of the series expansion for sin:
from math import factorial

res = 0
for i in [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]:
    res += 1**i/factorial(i)
print(res) # 1.1752011936438016
The result is not correct because I have not applied a + / - switch. I could add an if / else clause but this seems messy. Is there a better way?
Note: This question is an embellished version of a now deleted question that was posted yesterday by @Lana.
You can avoid recalculating x**n and the factorial at each step by calculating the next term of the sum using the previous one:
def sin2(x, n=20):
    curr = x
    res = curr
    for i in range(2, n, 2):
        curr *= -x**2 / (i*(i+1))
        res += curr
    return res
Compared to jpp's version, it's about twice as fast:
from math import factorial

def sin(x, n=20):
    return sum(x**j/factorial(j) * (1 if i % 2 == 0 else -1)
               for i, j in enumerate(range(1, n, 2)))
%timeit sin(0.7)
# 100000 loops, best of 3: 8.52 µs per loop
%timeit sin2(0.7)
# 100000 loops, best of 3: 4.54 µs per loop
And it can get a bit faster if we calculate - x**2 once and for all:
def sin3(x, n=20):
    curr = x
    res = 0
    minus_x_squared = -x**2
    for i in range(2, n, 2):
        res += curr
        curr *= minus_x_squared / (i*(i+1))
    return res
%timeit sin2(0.7)
# 100000 loops, best of 3: 4.6 µs per loop
%timeit sin3(0.7)
# 100000 loops, best of 3: 3.54 µs per loop
You are close. Below is one way using sum with enumerate for your series expansion.
enumerate works by taking each value of an iterable and attaching an index, i.e. 0 for the first item, 1 for the second item, etc. Then we only need to test whether the index is even or odd and use a ternary statement.
In addition, you can use range instead of listing the odd numbers required in your expansion.
from math import factorial

def sin(x, n=20):
    return sum(x**j/factorial(j) * (1 if i % 2 == 0 else -1)
               for i, j in enumerate(range(1, n, 2)))

def tan(x):
    return sin(x) / (1 - sin(x)**2)**0.5

print(tan(1.2)) # 2.572151622126318
You can avoid the need for a ternary statement and enumerate altogether:
def sin(x, n=20):
    return sum((-1)**i * x**(2*i+1) / factorial(2*i+1) for i in range(n))
If you write out the first few terms by hand, the equivalence will become clear.
Notes:
The sign of the tan function is only correct for 1st and 4th quadrants. This is consistent with the formulae you have provided. You can perform a trivial transformation to the input to account for this.
You can improve accuracy by increasing parameter n.
You can also calculate factorial without a library, but I'll leave that as an exercise.
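A quick sanity check of the series against the standard library (a minimal sketch; it assumes the sin and tan defined above are in scope, and math.sin / math.tan are used only as references):
import math

for x in (0.3, 0.7, 1.2):
    # agreement improves with larger n
    print(x, sin(x), math.sin(x))
    print(x, tan(x), math.tan(x))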

Randomly generate 1 or -1 (positive or negative integer)

I wanted to generate 1 or -1 in Python as a step toward randomizing between non-negative and non-positive numbers, or toward randomly changing the sign of an existing integer. What would be the best way to generate 1 or -1 in Python? Assuming an even distribution, I know I could use:
import random
#method1
my_number = random.choice((-1, 1))
#method2
my_number = (-1)**random.randrange(2)
#method3
# if I understand correctly random.random() should never return exactly 1
# so I use "<", not "<="
if random.random() < 0.5:
    my_number = 1
else:
    my_number = -1
#method4
my_number = random.randint(0,1)*2-1
Using timeit module I got the following results:
#method1
s = "my_number = random.choice((-1, 1))"
timeit.timeit(stmt = s, setup = "import random")
>2.814896769857569
#method2
s = "my_number = (-1)**random.randrange(2)"
timeit.timeit(stmt = s, setup = "import random")
>3.521280517518562
#method3
s = """
if random.random() < 0.5: my_number = 1
else: my_number = -1"""
timeit.timeit(stmt = s, setup = "import random")
>0.25321546903273884
#method4
s = "random.randint(0,1)*2-1"
timeit.timeit(stmt = s, setup = "import random")
>4.526625442240402
So unexpectedly method 3 is the fastest. My bet was on method 1 being the fastest, as it is also the shortest. Both method 1 (since Python 3.6, I think?) and method 3 also make it possible to introduce uneven distributions. Although method 1 is the shortest (its main advantage), for now I would choose method 3:
def positive_or_negative():
    if random.random() < 0.5:
        return 1
    else:
        return -1
Testing:
s = """
import random
def positive_or_negative():
if random.random() < 0.5:
return 1
else:
return -1
"""
timeit.timeit(stmt = "my_number = positive_or_negative()", setup = s)
>0.3916183138621818
Any better (faster or shorter) method to randomly generate -1 or 1 in Python? Any reason why would you choose method 1 over method 3 or vice versa?
A one liner variation of #3:
return 1 if random.random() < 0.5 else -1
It's fast(er) than the 'math' variants, because it doesn't involve additional arithmetic.
Here's another one-liner that my timings show to be faster than the if/else comparison to 0.5:
[-1,1][random.randrange(2)]
Not sure what your application is exactly, but I needed something similar for a large vectorized array.
Here's a good way to get a sign array:
(2*np.random.randint(0,2,size=(your_size))-1)
The result is an array, for example:
array([-1, -1, -1, 1, 1, 1, -1, -1, 1, 1, 1, -1, -1, 1, -1])
and you can use the reshape command to get the above to the size of your matrix:
(2*np.random.randint(0,2,size=(m*n))-1).reshape(m,n)
Then you can multiply a matrix by the above and get all of the members with random signs.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = A * (2*np.random.randint(0, 2, size=(2*3)) - 1).reshape(2, 3)
Then you get something like:
B = array([[ 1, 2, -3], [ 4, 5, -6]])
Pretty quick, if your data is vectorized.
Maths made simple:
Generate random number: 0 or 1
Multiply it by 2: 0 or 2
Subtract 1: -1 or 1
Adapt that to any programming language. No need for test functions.
print(random.randint(0,1)*2-1)
It also works without randint:
print(int(random.random()*2)*2-1)
The fastest way to generate random numbers if you're going to be doing lots of them is by using numpy:
In [1]: import numpy as np
In [2]: import random
In [3]: %timeit [random.choice([-1,1]) for i in range(100000)]
10 loops, best of 3: 88.9 ms per loop
In [4]: %timeit [(-1)**random.randrange(2) for i in range(100000)]
10 loops, best of 3: 110 ms per loop
In [5]: %timeit [1 if random.random() < 0.5 else -1 for i in range(100000)]
100 loops, best of 3: 18.4 ms per loop
In [6]: %timeit [random.randint(0,1)*2-1 for i in range(100000)]
1 loop, best of 3: 180 ms per loop
In [7]: %timeit np.random.choice([-1,1],size=100000)
1000 loops, best of 3: 1.52 ms per loop
If you need single bits (one per call), you already did your benchmark and other answers provide additional info.
If you need many bits or can pre-calculate bit-arrays for later consumption, numpy's methods might shine.
Here is another demo approach using numpy (which surprisingly does not have a method dedicated exactly to this job):
import numpy as np
import random
def sample_bits(N):
    assert N % 8 == 0  # demo only
    n_bytes = N // 8
    # high is exclusive, so 256 covers all byte values
    rbytes = np.random.randint(0, 256, dtype=np.uint8, size=n_bytes)
    return np.unpackbits(rbytes)

def alt(N):
    return np.random.choice([-1, 1], size=N)

def alt2(N):
    return [1 if random.random() < 0.5 else -1 for i in range(N)]

if __name__ == '__main__':
    import timeit
    print(timeit.timeit("sample_bits(1024)", setup="from __main__ import sample_bits", number=10000))
    print(timeit.timeit("alt(1024)", setup="from __main__ import alt", number=10000))
    print(timeit.timeit("alt2(1024)", setup="from __main__ import alt2", number=10000))
Output:
0.06640421246836543
0.352129537507486
1.5522800431775592
The general idea is:
use numpy to generate many uint8's in one step
(there might be something better using internal functions without the randint-API)
unpack uint8's to 8 bits
uniformity follows from randint's uniformity guarantees
Again, this is only a demo:
for one specific case
not caring about different result-types of these functions
not caring about -1 vs. 0 (might be important in your use-case; see the mapping sketch after this list)
(not even optimal compared to much more low-level approaches; MT used internally can be used as a bit-source, which does not need fp-math, like many other PRNGs!)
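If the -1 vs. 0 distinction does matter, the unpacked bits can be mapped to ±1 with one cheap vectorized expression (a small, hypothetical addition to the demo above):
bits = sample_bits(1024)               # 0/1 values, dtype uint8
signs = 2 * bits.astype(np.int8) - 1   # maps 0 -> -1 and 1 -> +1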
My code is:
from array import array
from random import randint

vals = array("i", [-1, 1])

def my_rnd():
    return vals[randint(0, 7) % 2]

Efficient loop over numpy array

Versions of this question have already been asked but I have not found a satisfactory answer.
Problem: given a large numpy vector, find indices of the vector elements which are duplicated (a variation of that could be comparison with tolerance).
So the problem is ~O(N^2) and memory bound (at least from the current algorithm's point of view). I wonder why, whatever I tried, Python is 100x or more slower than equivalent C code.
import numpy as np

N = 10000
vect = np.arange(float(N))
vect[N // 2] = 1
vect[N // 4] = 1
dupl = []
print("init done")
counter = 0
for i in range(N):
    for j in range(i+1, N):
        if vect[i] == vect[j]:
            dupl.append(j)
            counter += 1
print("counter =", counter)
print(dupl)
# For simplicity, this code ignores repeated indices
# which can be trimmed later. Ref output is
# counter = 3
# [2500, 5000, 5000]
I tried using numpy iterators but they are even worse (~ x4-5)
http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
Using N=10,000 I'm getting 0.1 sec in C, 12 sec in Python (code above), 40 sec in Python using np.nditer, 50 sec in Python using np.ndindex. I pushed it to N=160,000 and the timing scales as N^2 as expected.
Since the answers have stopped coming and none was totally satisfactory, for the record I post my own solution.
It is my understanding that it's the assignment which makes Python slow in this case, not the nested loops as I thought initially. Using a library or compiled code eliminates the need for assignments and performance improves dramatically.
from __future__ import print_function
import numpy as np
from numba import jit

N = 10000
vect = np.arange(N, dtype=np.float32)
vect[N // 2] = 1
vect[N // 4] = 1
dupl = np.zeros(N, dtype=np.int32)
print("init done")

# uncomment to enable compiled function
# @jit
def duplicates(i, counter, dupl, vect):
    eps = 0.01
    ns = len(vect)
    for j in range(i+1, ns):
        # replace the if to use approximate comparison
        # if abs(vect[i] - vect[j]) < eps:
        if vect[i] == vect[j]:
            dupl[counter] = j
            counter += 1
    return counter

counter = 0
for i in range(N):
    counter = duplicates(i, counter, dupl, vect)
print("counter =", counter)
print(dupl[0:counter])
Tests
# no jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]
elapsed 10.135 s
# with jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]
elapsed 0.480 s
The performance of the compiled version (with @jit uncommented) is close to the C code's performance, ~0.1 - 0.2 sec. Perhaps eliminating the last loop could improve the performance even further. The difference in performance is even stronger when using the approximate comparison with eps, while there is very little difference for the compiled version.
# no jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]
elapsed 109.218 s
# with jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]
elapsed 0.506 s
This is a ~200x difference. In the real code, I had to put both loops in the function as well as use a function template with variable types, so it was a bit more complex, but not by much.
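For reference, here is a sketch of what moving both loops into the jitted function might look like (not benchmarked here; assumes numba is installed, and duplicates_all is just an illustrative name):
from numba import jit

@jit(nopython=True)
def duplicates_all(dupl, vect):
    # Both loops are compiled, so the interpreter is entered only once.
    counter = 0
    ns = len(vect)
    for i in range(ns):
        for j in range(i + 1, ns):
            if vect[i] == vect[j]:
                dupl[counter] = j
                counter += 1
    return counter

counter = duplicates_all(dupl, vect)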
Python itself is a highly dynamic, slow language. The idea in numpy is to use vectorization and avoid explicit loops. In this case, you can use np.equal.outer. You can start with
a = np.equal.outer(vect, vect)
Now, for example, to find the sum:
>>> np.sum(a)
10006
To find the indices of elements that have a duplicate, you can do
np.fill_diagonal(a, 0)
>>> np.nonzero(np.any(a, axis=0))[0]
array([ 1, 2500, 5000])
Timing
def find_vec():
    a = np.equal.outer(vect, vect)
    s = np.sum(a)
    np.fill_diagonal(a, 0)
    return np.sum(a), np.nonzero(np.any(a, axis=0))[0]
>>> %timeit find_vec()
1 loops, best of 3: 214 ms per loop
def find_loop():
    dupl = []
    counter = 0
    for i in range(N):
        for j in range(i+1, N):
            if vect[i] == vect[j]:
                dupl.append(j)
                counter += 1
    return dupl
>>> %timeit find_loop()
1 loops, best of 3: 8.51 s per loop
This solution using the numpy_indexed package has complexity n log n and is fully vectorized, so it is likely not terribly different from C performance.
import numpy_indexed as npi
dpl = np.flatnonzero(npi.multiplicity(vect) > 1)
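If you would rather avoid the extra dependency, roughly the same n log n result can be had with plain numpy (a sketch using np.unique on the vect from the question):
import numpy as np

# counts[inverse] gives, for every element, how often its value occurs
_, inverse, counts = np.unique(vect, return_inverse=True, return_counts=True)
dpl = np.flatnonzero(counts[inverse] > 1)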
The obvious question is why you want to do this in this way. NumPy arrays are intended to be opaque data structures; by this I mean NumPy arrays are intended to be created inside the NumPy system, with operations sent to the NumPy subsystem to deliver a result. NumPy should be a black box into which you throw requests and out of which come results.
So given the code above, I am not at all surprised that NumPy performance is worse than dreadful.
The following should be effectively what you want, I believe, but done the NumPy way:
import numpy as np

N = 10000
vect = np.arange(float(N))
vect[N // 2] = 1
vect[N // 4] = 1
print([np.where(a == vect)[0] for a in vect][1])
# Delivers [1, 2500, 5000]
Approach #1
You can simulate that iterator-dependency criterion for a vectorized solution using a triangular matrix. This is based on this post that dealt with multiplication involving iterator dependency. To perform the elementwise equality of each element in vect against all its elements, we can use NumPy broadcasting. Finally, we can use np.count_nonzero to get the count, as it's supposed to be very efficient for summing over boolean arrays.
So, we would have a solution like so -
mask = np.triu(vect[:,None] == vect,1)
counter = np.count_nonzero(mask)
dupl = np.where(mask)[1]
If you only care about the count counter, we could have two more approaches as listed next.
Approach #2
We can avoid the triangular matrix: get the entire count, subtract the contribution from the diagonal elements, and consider just one of the lower or upper triangular regions by halving the remaining count, as the contributions from either one would be identical.
So, we would have a modified solution like so -
counter = (np.count_nonzero(vect[:,None] == vect) - vect.size)//2
Approach #3
Here's an entirely different approach that uses the fact that the count of each unique element contributes, via a cumulative sum, to the final total.
So, with that idea in mind, we would have a third approach like so -
count = np.bincount(vect.astype(int))  # OR np.unique(vect, return_counts=True)[1]
idx = count[count > 1]
id_arr = np.ones(idx.sum(), dtype=int)
id_arr[0] = 0
id_arr[idx[:-1].cumsum()] = -idx[:-1] + 1
counter = np.sum(id_arr.cumsum())
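To see what the cumsum construction computes: a value occurring c times contributes c*(c-1)/2 pairs (i, j) with i < j, so the same counter can also be written directly (an equivalent, less vectorized sketch):
counts = np.unique(vect, return_counts=True)[1]
counter = int(sum(c * (c - 1) // 2 for c in counts if c > 1))
# for the reference data (the value 1 occurs three times): 3*2//2 == 3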
As an alternative to Ami Tavory's answer, you can use a Counter from the collections package to detect duplicates. On my computer it seems to be even faster. See the function below, which can also report which values are duplicated.
import collections
import numpy as np
def find_duplicates_original(x):
    d = []
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] == x[j]:
                d.append(j)
    return d

def find_duplicates_outer(x):
    a = np.equal.outer(x, x)
    np.fill_diagonal(a, 0)
    return np.flatnonzero(np.any(a, axis=0))

def find_duplicates_counter(x):
    counter = collections.Counter(x)
    values = (v for v, c in counter.items() if c > 1)
    return {v: np.flatnonzero(x == v) for v in values}

n = 10000
x = np.arange(float(n))
x[n // 2] = 1
x[n // 4] = 1
>>> find_duplicates_counter(x)
{1.0: array([   1, 2500, 5000], dtype=int64)}
>>> %timeit find_duplicates_original(x)
1 loop, best of 3: 12 s per loop
>>> %timeit find_duplicates_outer(x)
10 loops, best of 3: 84.3 ms per loop
>>> %timeit find_duplicates_counter(x)
1000 loops, best of 3: 1.63 ms per loop
This runs in 8 ms compared to 18 s for your code and doesn't use any strange libraries. It's similar to the approach by @vs0, but I like defaultdict more. It should be approximately O(N).
from collections import defaultdict

dupl = []
counter = 0
indexes = defaultdict(list)
for i, e in enumerate(vect):
    indexes[e].append(i)
    if len(indexes[e]) > 1:
        dupl.append(i)
        counter += 1
I wonder why, whatever I tried, Python is 100x or more slower than equivalent C code.
Because Python programs are usually 100x slower than C programs.
You can either implement critical code paths in C and provide Python-C bindings, or change the algorithm. You can write an O(N) version by using a dict that reverses the array from value to index.
import numpy as np

N = 10000
vect = np.arange(float(N))
vect[N // 2] = 1
vect[N // 4] = 1
dupl = {}
print("init done")
counter = 0
for i in range(N):
    e = dupl.get(vect[i], None)
    if e is None:
        dupl[vect[i]] = [i]
    else:
        e.append(i)
        counter += 1
print("counter =", counter)
print([(k, v) for k, v in dupl.items() if len(v) > 1])
Edit:
If you need to test against an eps with abs(vect[i] - vect[j]) < eps, you can normalize the values by eps:
abs(vect[i] - vect[j]) < eps ->
abs(vect[i] - vect[j]) / eps < (eps / eps) ->
abs(vect[i]/eps - vect[j]/eps) < 1
int(abs(vect[i]/eps - vect[j]/eps)) = 0
Like this:
import numpy as np

N = 10000
vect = np.arange(float(N))
vect[N // 2] = 1
vect[N // 4] = 1
dupl = {}
print("init done")
counter = 0
eps = 0.01
for i in range(N):
    k = int(vect[i] / eps)
    e = dupl.get(k, None)
    if e is None:
        dupl[k] = [i]
    else:
        e.append(i)
        counter += 1
print("counter =", counter)
print([(k, v) for k, v in dupl.items() if len(v) > 1])

Numpy gcd function

Does numpy have a gcd function somewhere in its structure of modules?
I'm aware of fractions.gcd, but thought a numpy equivalent might be quicker and work better with numpy datatypes.
I have been unable to uncover anything on Google other than this link, which seems out of date, and I don't know how I would access the _gcd function it suggests exists.
Naively trying:
np.gcd
np.euclid
hasn't worked for me...
You can write it yourself:
def numpy_gcd(a, b):
    a, b = np.broadcast_arrays(a, b)
    a = a.copy()
    b = b.copy()
    pos = np.nonzero(b)[0]
    while len(pos) > 0:
        b2 = b[pos]
        a[pos], b[pos] = b2, a[pos] % b2
        pos = pos[b[pos] != 0]
    return a
Here is the code to test the result and speed:
In [181]:
n = 2000
a = np.random.randint(100, 1000, n)
b = np.random.randint(1, 100, n)
al = a.tolist()
bl = b.tolist()
cl = zip(al, bl)
from fractions import gcd
g1 = numpy_gcd(a, b)
g2 = [gcd(x, y) for x, y in cl]
print np.all(g1 == g2)
True
In [182]:
%timeit numpy_gcd(a, b)
1000 loops, best of 3: 721 us per loop
In [183]:
%timeit [gcd(x, y) for x, y in cl]
1000 loops, best of 3: 1.64 ms per loop
Public service announcement for anyone using Python 3.5
from math import gcd
gcd(2, 4)
And if you want to write it yourself in a one-liner:
def gcd(a: int, b: int): return gcd(b, a % b) if b else a
It seems there is no gcd function yet in numpy. However, there is a gcd function in the fractions module. If you need to perform gcd on numpy arrays, you could build a ufunc using it:
gcd = numpy.frompyfunc(fractions.gcd, 2, 1)
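The resulting ufunc broadcasts over arrays like any other. A small usage sketch (np_gcd is just an illustrative name; math.gcd is used here because fractions.gcd was removed in Python 3.9):
import numpy as np
from math import gcd

np_gcd = np.frompyfunc(gcd, 2, 1)
print(np_gcd(np.array([6, 8, 21]), np.array([4, 12, 14])))  # [2 4 7]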
The functions gcd (Greatest Common Divisor) and lcm (Lowest Common Multiple) have been added to numpy in version 1.15.
You can use them both "as is" on a pair of scalars
import numpy as np
np.gcd(-5, 10) # yields '5'
or on a list or array using .reduce:
np.gcd.reduce(np.array([-5, 10, 0, 5])) # yields '5'
In case the desired result is not an element-wise gcd but rather the gcd of all numbers in the array, you may use the code below.
import numpy as np
from math import gcd as mathgcd
def numpy_set_gcd(a):
    a = np.unique(a)
    # np.issubdtype avoids the removed np.int alias and accepts any integer dtype
    if not np.issubdtype(a.dtype, np.integer) or a[0] <= 0:
        raise ValueError("Argument must be an array of positive integers.")
    gcd = a[0]
    for i in a[1:]:
        gcd = mathgcd(i, gcd)
        if gcd == 1:
            return 1
    return gcd
Depending on the use case, it can be faster to omit the sorting step a = np.unique(a).
An alternative (maybe more elegant but slower) implementation using ufuncs is
import numpy as np
from math import gcd as mathgcd
npmathgcd = np.frompyfunc(mathgcd, 2, 1)
def numpy_set_gcd2(a):
    a = np.unique(a)
    if not np.issubdtype(a.dtype, np.integer) or a[0] <= 0:
        raise ValueError("Argument must be an array of positive integers.")
    npmathgcd.at(a[1:], np.arange(a.size-1), a[:-1])
    return a[-1]

How can the Euclidean distance be calculated with NumPy?

I have two points in 3D space:
a = (ax, ay, az)
b = (bx, by, bz)
I want to calculate the distance between them:
dist = sqrt((ax-bx)^2 + (ay-by)^2 + (az-bz)^2)
How do I do this with NumPy? I have:
import numpy
a = numpy.array((ax, ay, az))
b = numpy.array((bx, by, bz))
Use numpy.linalg.norm:
dist = numpy.linalg.norm(a-b)
This works because the Euclidean distance is the l2 norm, and the default value of the ord parameter in numpy.linalg.norm is 2.
For more theory, see Introduction to Data Mining.
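For a concrete 3-D example (a minimal usage sketch):
import numpy
a = numpy.array((1.0, 2.0, 3.0))
b = numpy.array((4.0, 5.0, 6.0))
print(numpy.linalg.norm(a - b))  # 5.196152422706632, i.e. sqrt(27)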
Use scipy.spatial.distance.euclidean:
from scipy.spatial import distance
a = (1, 2, 3)
b = (4, 5, 6)
dst = distance.euclidean(a, b)
For anyone interested in computing multiple distances at once, I've done a little comparison using perfplot (a small project of mine).
The first advice is to organize your data such that the arrays have dimension (3, n) (and are C-contiguous obviously). If adding happens in the contiguous first dimension, things are faster, and it doesn't matter too much if you use sqrt-sum with axis=0, linalg.norm with axis=0, or
a_min_b = a - b
numpy.sqrt(numpy.einsum('ij,ij->j', a_min_b, a_min_b))
which is, by a slight margin, the fastest variant. (That actually holds true for just one row as well.)
The variants where you sum up over the second axis, axis=1, are all substantially slower.
Code to reproduce the plot:
import numpy
import perfplot
from scipy.spatial import distance
def linalg_norm(data):
    a, b = data[0]
    return numpy.linalg.norm(a - b, axis=1)

def linalg_norm_T(data):
    a, b = data[1]
    return numpy.linalg.norm(a - b, axis=0)

def sqrt_sum(data):
    a, b = data[0]
    return numpy.sqrt(numpy.sum((a - b) ** 2, axis=1))

def sqrt_sum_T(data):
    a, b = data[1]
    return numpy.sqrt(numpy.sum((a - b) ** 2, axis=0))

def scipy_distance(data):
    a, b = data[0]
    return list(map(distance.euclidean, a, b))

def sqrt_einsum(data):
    a, b = data[0]
    a_min_b = a - b
    return numpy.sqrt(numpy.einsum("ij,ij->i", a_min_b, a_min_b))

def sqrt_einsum_T(data):
    a, b = data[1]
    a_min_b = a - b
    return numpy.sqrt(numpy.einsum("ij,ij->j", a_min_b, a_min_b))

def setup(n):
    a = numpy.random.rand(n, 3)
    b = numpy.random.rand(n, 3)
    out0 = numpy.array([a, b])
    out1 = numpy.array([a.T, b.T])
    return out0, out1
b = perfplot.bench(
    setup=setup,
    n_range=[2 ** k for k in range(22)],
    kernels=[
        linalg_norm,
        linalg_norm_T,
        scipy_distance,
        sqrt_sum,
        sqrt_sum_T,
        sqrt_einsum,
        sqrt_einsum_T,
    ],
    xlabel="len(x), len(y)",
)
b.save("norm.png")
I want to expound on the simple answer with various performance notes. np.linalg.norm will do perhaps more than you need:
dist = numpy.linalg.norm(a-b)
Firstly, this function is designed to work over an array of points and return all of the distances, e.g. to compare the distance from pA to the set of points sP:
sP = np.array(points)
pA = point
distances = np.linalg.norm(sP - pA, ord=2, axis=1)  # 'distances' is an array
Remember several things:
Python function calls are expensive.
[Regular] Python doesn't cache name lookups.
So
def distance(pointA, pointB):
    dist = np.linalg.norm(pointA - pointB)
    return dist
isn't as innocent as it looks.
>>> dis.dis(distance)
2 0 LOAD_GLOBAL 0 (np)
2 LOAD_ATTR 1 (linalg)
4 LOAD_ATTR 2 (norm)
6 LOAD_FAST 0 (pointA)
8 LOAD_FAST 1 (pointB)
10 BINARY_SUBTRACT
12 CALL_FUNCTION 1
14 STORE_FAST 2 (dist)
3 16 LOAD_FAST 2 (dist)
18 RETURN_VALUE
Firstly, every time we call it, we have to do a global lookup for "np", an attribute lookup for "linalg", and an attribute lookup for "norm"; the overhead of merely calling the function can equate to dozens of Python instructions.
Lastly, we wasted two operations storing the result and then reloading it for the return...
First pass at improvement: make the lookup faster, skip the store
def distance(pointA, pointB, _norm=np.linalg.norm):
    return _norm(pointA - pointB)
We get the far more streamlined:
>>> dis.dis(distance)
2 0 LOAD_FAST 2 (_norm)
2 LOAD_FAST 0 (pointA)
4 LOAD_FAST 1 (pointB)
6 BINARY_SUBTRACT
8 CALL_FUNCTION 1
10 RETURN_VALUE
The function call overhead still amounts to some work, though. And you'll want to do benchmarks to determine whether you might be better doing the math yourself:
def distance(pointA, pointB):
    return (
        ((pointA.x - pointB.x) ** 2) +
        ((pointA.y - pointB.y) ** 2) +
        ((pointA.z - pointB.z) ** 2)
    ) ** 0.5  # fast sqrt
On some platforms, **0.5 is faster than math.sqrt. Your mileage may vary.
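If it matters for your workload, a quick way to check this on your own machine is a pair of timeit calls (a sketch; absolute numbers will vary):
import timeit

print(timeit.timeit("x ** 0.5", setup="x = 27.3"))
print(timeit.timeit("sqrt(x)", setup="from math import sqrt; x = 27.3"))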
Advanced performance notes
Why are you calculating distance? If the sole purpose is to display it,
print("The target is %.2fm away" % (distance(a, b)))
move along. But if you're comparing distances, doing range checks, etc., I'd like to add some useful performance observations.
Let’s take two cases: sorting by distance or culling a list to items that meet a range constraint.
# Ultra naive implementations. Hold onto your hat.
def sort_things_by_distance(origin, things):
    return things.sort(key=lambda thing: distance(origin, thing))

def in_range(origin, range, things):
    things_in_range = []
    for thing in things:
        if distance(origin, thing) <= range:
            things_in_range.append(thing)
    return things_in_range
The first thing we need to remember is that we are using Pythagoras to calculate the distance (dist = sqrt(x^2 + y^2 + z^2)) so we're making a lot of sqrt calls. Math 101:
dist = sqrt(x^2 + y^2 + z^2)
therefore
dist^2 = x^2 + y^2 + z^2
and
sq(N) < sq(M) iff M > N
and
sq(N) > sq(M) iff N > M
and
sq(N) = sq(M) iff N == M
In short: until we actually require the distance in a unit of X rather than X^2, we can eliminate the hardest part of the calculations.
# Still naive, but much faster.
def distance_sq(left, right):
    """ Returns the square of the distance between left and right. """
    return (
        ((left.x - right.x) ** 2) +
        ((left.y - right.y) ** 2) +
        ((left.z - right.z) ** 2)
    )

def sort_things_by_distance(origin, things):
    return things.sort(key=lambda thing: distance_sq(origin, thing))

def in_range(origin, range, things):
    things_in_range = []
    # Remember that sqrt(N)**2 == N, so if we square
    # range, we don't need to root the distances.
    range_sq = range**2
    for thing in things:
        if distance_sq(origin, thing) <= range_sq:
            things_in_range.append(thing)
    return things_in_range
Great, both functions no longer do any expensive square roots. That'll be much faster, but before you go further, check yourself: why did sort_things_by_distance need a "naive" disclaimer both times above? Answer at the very bottom (*a1).
We can improve in_range by converting it to a generator:
def in_range(origin, range, things):
    range_sq = range**2
    yield from (thing for thing in things
                if distance_sq(origin, thing) <= range_sq)
This especially has benefits if you are doing something like:
if any(in_range(origin, max_dist, things)):
    ...
But if the very next thing you are going to do requires a distance,
for nearby in in_range(origin, walking_distance, hotdog_stands):
    print("%s %.2fm" % (nearby.name, distance(origin, nearby)))
consider yielding tuples:
def in_range_with_dist_sq(origin, range, things):
    range_sq = range**2
    for thing in things:
        dist_sq = distance_sq(origin, thing)
        if dist_sq <= range_sq:
            yield (thing, dist_sq)
This can be especially useful if you might chain range checks ('find things that are near X and within Nm of Y'), since you don't have to calculate the distance again.
But what about if we're searching a really large list of things and we anticipate a lot of them not being worth consideration?
There is actually a very simple optimization:
def in_range_all_the_things(origin, range, things):
    range_sq = range**2
    for thing in things:
        dist_sq = (origin.x - thing.x) ** 2
        if dist_sq <= range_sq:
            dist_sq += (origin.y - thing.y) ** 2
            if dist_sq <= range_sq:
                dist_sq += (origin.z - thing.z) ** 2
                if dist_sq <= range_sq:
                    yield thing
Whether this is useful will depend on the size of 'things'.
def in_range_all_the_things(origin, range, things):
    range_sq = range**2
    if len(things) >= 4096:
        for thing in things:
            dist_sq = (origin.x - thing.x) ** 2
            if dist_sq <= range_sq:
                dist_sq += (origin.y - thing.y) ** 2
                if dist_sq <= range_sq:
                    dist_sq += (origin.z - thing.z) ** 2
                    if dist_sq <= range_sq:
                        yield thing
    elif len(things) > 32:
        for thing in things:
            dist_sq = (origin.x - thing.x) ** 2
            if dist_sq <= range_sq:
                dist_sq += (origin.y - thing.y) ** 2 + (origin.z - thing.z) ** 2
                if dist_sq <= range_sq:
                    yield thing
    else:
        ...  # just calculate distance and range-check it
And again, consider yielding the dist_sq. Our hotdog example then becomes:
# Chaining generators
info = in_range_with_dist_sq(origin, walking_distance, hotdog_stands)
info = ((stand, dist_sq**0.5) for stand, dist_sq in info)
for stand, dist in info:
    print("%s %.2fm" % (stand, dist))
(*a1: sort_things_by_distance's sort key calls distance_sq for every single item, and that innocent looking key is a lambda, which is a second function that has to be invoked...)
Another instance of this problem solving method:
import numpy

def dist(x, y):
    return numpy.sqrt(numpy.sum((x - y)**2))

a = numpy.array((xa, ya, za))
b = numpy.array((xb, yb, zb))
dist_a_b = dist(a, b)
Starting with Python 3.8, the math module directly provides the dist function, which returns the Euclidean distance between two points (given as tuples or lists of coordinates):
from math import dist
dist((1, 2, 6), (-2, 3, 2)) # 5.0990195135927845
And if you're working with lists:
dist([1, 2, 6], [-2, 3, 2]) # 5.0990195135927845
It can be done like the following. I don't know how fast it is, but it's not using NumPy.
from math import sqrt
a = (1, 2, 3) # Data point 1
b = (4, 5, 6) # Data point 2
print(sqrt(sum((a - b)**2 for a, b in zip(a, b))))
A nice one-liner:
dist = numpy.linalg.norm(a-b)
However, if speed is a concern, I would recommend experimenting on your machine. I've found that using the math library's sqrt with the ** operator for the square is much faster on my machine than the one-liner NumPy solution.
I ran my tests using this simple program:
#!/usr/bin/python
import math
import numpy
from random import uniform

def fastest_calc_dist(p1, p2):
    return math.sqrt((p2[0] - p1[0]) ** 2 +
                     (p2[1] - p1[1]) ** 2 +
                     (p2[2] - p1[2]) ** 2)

def math_calc_dist(p1, p2):
    return math.sqrt(math.pow((p2[0] - p1[0]), 2) +
                     math.pow((p2[1] - p1[1]), 2) +
                     math.pow((p2[2] - p1[2]), 2))

def numpy_calc_dist(p1, p2):
    return numpy.linalg.norm(numpy.array(p1) - numpy.array(p2))

TOTAL_LOCATIONS = 1000

p1 = dict()
p2 = dict()
for i in range(0, TOTAL_LOCATIONS):
    p1[i] = (uniform(0, 1000), uniform(0, 1000), uniform(0, 1000))
    p2[i] = (uniform(0, 1000), uniform(0, 1000), uniform(0, 1000))

total_dist = 0
for i in range(0, TOTAL_LOCATIONS):
    for j in range(0, TOTAL_LOCATIONS):
        dist = fastest_calc_dist(p1[i], p2[j])  # change this line for testing
        total_dist += dist

print total_dist
On my machine, math_calc_dist runs much faster than numpy_calc_dist: 1.5 seconds versus 23.5 seconds.
To get a measurable difference between fastest_calc_dist and math_calc_dist I had to up TOTAL_LOCATIONS to 6000. Then fastest_calc_dist takes ~50 seconds while math_calc_dist takes ~60 seconds.
You can also experiment with numpy.sqrt and numpy.square though both were slower than the math alternatives on my machine.
My tests were run with Python 2.6.6.
I found a 'dist' function in matplotlib.mlab, but I don't think it's handy enough. I'm posting it here just for reference (note that mlab.dist has been removed from recent matplotlib versions).
import numpy as np
from matplotlib import mlab

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

# Distance between a and b
dis = mlab.dist(a, b)
You can just subtract the vectors and then take the inner product.
Following your example,
a = numpy.array((xa, ya, za))
b = numpy.array((xb, yb, zb))
tmp = a - b
sum_squared = numpy.dot(tmp.T, tmp)
result = numpy.sqrt(sum_squared)
I like np.dot (dot product):
a = numpy.array((xa,ya,za))
b = numpy.array((xb,yb,zb))
distance = (np.dot(a-b,a-b))**.5
With Python 3.8, it's very easy.
https://docs.python.org/3/library/math.html#math.dist
math.dist(p, q)
Return the Euclidean distance between two points p and q, each given
as a sequence (or iterable) of coordinates. The two points must have
the same dimension.
Roughly equivalent to:
sqrt(sum((px - qx) ** 2.0 for px, qx in zip(p, q)))
Having a and b as you defined them, you can use also:
distance = np.sqrt(np.sum((a-b)**2))
Since Python 3.8
Since Python 3.8 the math module includes the function math.dist().
See here https://docs.python.org/3.8/library/math.html#math.dist.
math.dist(p1, p2)
Return the Euclidean distance between two points p1 and p2,
each given as a sequence (or iterable) of coordinates.
import math
print( math.dist( (0,0), (1,1) )) # sqrt(2) -> 1.4142
print( math.dist( (0,0,0), (1,1,1) )) # sqrt(3) -> 1.7321
Here's some concise code for Euclidean distance in Python given two points represented as lists in Python.
def distance(v1, v2):
    return sum([(x - y)**2 for (x, y) in zip(v1, v2)]) ** 0.5
import math
dist = math.hypot(math.hypot(xa-xb, ya-yb), za-zb)
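Since Python 3.8, math.hypot accepts any number of coordinates, so the nesting is no longer needed:
import math
dist = math.hypot(xa - xb, ya - yb, za - zb)  # Python 3.8+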
Calculate the Euclidean distance for multidimensional space:
import math
x = [1, 2, 6]
y = [-2, 3, 2]
dist = math.sqrt(sum([(xi - yi)**2 for xi, yi in zip(x, y)]))  # 5.0990195135927845
import numpy as np
from scipy.spatial import distance

input_arr = np.array([[0, 3, 0], [2, 0, 0], [0, 1, 3], [0, 1, 2], [-1, 0, 1], [1, 1, 1]])
test_case = np.array([0, 0, 0])
dst = []
for i in range(0, 6):
    temp = distance.euclidean(test_case, input_arr[i])
    dst.append(temp)
print(dst)
You can easily use the formula
distance = np.sqrt(np.sum(np.square(a-b)))
which actually does nothing more than use Pythagoras' theorem to calculate the distance, by adding the squares of Δx, Δy and Δz and taking the square root of the result.
import numpy as np
# any two Python lists as two points
a = [0, 0]
b = [3, 4]
First convert the lists to numpy arrays and do: print(np.linalg.norm(np.array(a) - np.array(b))). The second method works directly on the Python lists: print(np.linalg.norm(np.subtract(a, b)))
The other answers work for floating-point numbers, but do not correctly compute the distance for integer dtypes, which are subject to overflow and underflow. Note that even scipy.spatial.distance.euclidean has this issue:
>>> a1 = np.array([1], dtype='uint8')
>>> a2 = np.array([2], dtype='uint8')
>>> a1 - a2
array([255], dtype=uint8)
>>> np.linalg.norm(a1 - a2)
255.0
>>> from scipy.spatial import distance
>>> distance.euclidean(a1, a2)
255.0
This is common, since many image libraries represent an image as an ndarray with dtype="uint8". This means that if you have a greyscale image which consists of very dark grey pixels (say all the pixels have color #000001) and you're diffing it against black image (#000000), you can end up with x-y consisting of 255 in all cells, which registers as the two images being very far apart from each other. For unsigned integer types (e.g. uint8), you can safely compute the distance in numpy as:
np.linalg.norm(np.maximum(x, y) - np.minimum(x, y))
For signed integer types, you can cast to a float first:
np.linalg.norm(x.astype("float") - y.astype("float"))
For image data specifically, you can use opencv's norm method:
import cv2
cv2.norm(x, y, cv2.NORM_L2)
Find the difference of the two matrices first. Then apply element-wise multiplication with numpy's multiply command, find the sum of the element-wise multiplied matrix, and finally take the square root of the sum.
def findEuclideanDistance(a, b):
    euclidean_distance = a - b
    euclidean_distance = np.sum(np.multiply(euclidean_distance, euclidean_distance))
    euclidean_distance = np.sqrt(euclidean_distance)
    return euclidean_distance
What's the best way to do this with NumPy, or with Python in general?
Well, the best way would be the safest as well as the fastest. I would suggest using hypot for reliable results, since the chances of underflow and overflow are very small compared to writing your own square-root calculation.
Let's compare math.hypot and np.hypot vs vanilla np.sqrt(np.sum(np.array([i, j, k]) ** 2)):
i, j, k = 1e+200, 1e+200, 1e+200
math.hypot(i, j, k)
# 1.7320508075688773e+200
np.sqrt(np.sum((np.array([i, j, k])) ** 2))
# RuntimeWarning: overflow encountered in square
Speed-wise, math.hypot looks better:
%%timeit
math.hypot(i, j, k)
# 100 ns ± 1.05 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%%timeit
np.sqrt(np.sum((np.array([i, j, k])) ** 2))
# 6.41 µs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Underflow
i, j = 1e-200, 1e-200
np.sqrt(i**2+j**2)
# 0.0
Overflow
i, j = 1e+200, 1e+200
np.sqrt(i**2+j**2)
# inf
No Underflow
i, j = 1e-200, 1e-200
np.hypot(i, j)
# 1.414213562373095e-200
No Overflow
i, j = 1e+200, 1e+200
np.hypot(i, j)
# 1.414213562373095e+200
The fastest solution I could come up with for a large number of distances is using numexpr. On my machine, it is faster than using numpy's einsum:
import numexpr as ne
import numpy as np
np.sqrt(ne.evaluate("sum((a_min_b)**2,axis=1)"))
If you want something more explicit you can easily write the formula like this:
np.sqrt(np.sum((a-b)**2))
Even with arrays of 10_000_000 elements this still runs at 0.1s on my machine.
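For reference, a self-contained version of the numexpr variant (a sketch assuming the same (n, 3) layout as the einsum example above; array sizes are arbitrary):
import numexpr as ne
import numpy as np

a = np.random.rand(1_000_000, 3)
b = np.random.rand(1_000_000, 3)
a_min_b = a - b
dists = np.sqrt(ne.evaluate("sum(a_min_b ** 2, axis=1)"))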
