Normalize a NumPy array to be bounded by 0 and 1? - python

Basically I have an array that may vary between any two numbers, and I want to preserve the distribution while constraining it to the [0,1] space. The function to do this is very very simple. I usually write it as:
def to01(array):
    array -= array.min()
    array /= array.max()
    return array
Of course it can and should be more complex to account for tons of situations, such as all the values being the same (divide by zero) and float vs. integer division (use np.subtract and np.divide instead of operators). But this is the most basic.
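For reference, a defensive sketch handling those two cases might look like this (the name to01_safe and the choice to return zeros for a constant array are my assumptions, not part of the question):

import numpy as np

def to01_safe(array):
    # np.subtract with a float out-dtype avoids integer truncation
    shifted = np.subtract(array, array.min(), dtype=np.float64)
    span = shifted.max()
    if span == 0:
        # all values equal: mapping everything to 0 is one possible convention
        return np.zeros_like(shifted)
    return np.divide(shifted, span)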
The problem is that I do this very frequently throughout my project, and it seems like a fairly standard mathematical operation. Is there a built-in function that does this in NumPy?

I don't know if there's a builtin for that (probably not; it's not really a difficult thing to do as is). You can use vectorize to apply a function to all the elements of the array:
import numpy

def to01(array):
    a = array.min()
    # suppress the RuntimeWarning from the division by zero when max == min
    with numpy.errstate(divide='ignore'):
        b = 1. / (array.max() - array.min())
    if not numpy.isfinite(b):
        b = 0
    return numpy.vectorize(lambda x: b * (x - a))(array)
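That said, since b * (x - a) is element-wise, plain broadcasting gives the same result without np.vectorize's per-element Python call; a minimal sketch (the name to01_broadcast is mine):

import numpy as np

def to01_broadcast(array):
    a = array.min()
    with np.errstate(divide='ignore'):
        b = 1. / (array.max() - a)
    if not np.isfinite(b):
        b = 0
    return b * (array - a)  # broadcasting applies the transform to every element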

Related

Assert array almost equal zero

I'm writing unit tests for my simulation and want to check that for specific parameters the result, a numpy array, is zero. Due to calculation inaccuracies, small values are also accepted (1e-7). What is the best way to assert this array is close to 0 in all places?
np.testing.assert_array_almost_equal(a, np.zeros(a.shape)) and assert_allclose fail, as the relative tolerance is inf (or 1 if you switch the arguments).
I feel like np.testing.assert_array_almost_equal_nulp(a, np.zeros(a.shape)) is not precise enough, as it compares the difference to the spacing; it is therefore always true for nulps >= 1 and false otherwise, but says nothing about the amplitude of a.
np.testing.assert_(np.all(np.absolute(a) < 1e-7)), based on this question, does not give any of the detailed output I am used to from other np.testing methods.
Is there another way to test this? Maybe another testing package?
If you compare a numpy array with all zeros, you can use the absolute tolerance, as the relative tolerance does not make sense here:
import numpy as np
from numpy.testing import assert_allclose

def test_zero_array():
    a = np.array([0, 1e-07, 1e-08])
    assert_allclose(a, 0, atol=1e-07)
The rtol value does not matter in this case, as it is multiplied by 0 when calculating the tolerance:
atol + rtol * abs(desired)
Update: Replaced np.zeros_like(a) with the simpler scalar 0. As pointed out by #hintze, np array comparisons also work against scalars.
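To see the formula in action (values taken from the example above; the commented-out line shows the rtol-only variant that would fail):

import numpy as np
from numpy.testing import assert_allclose

a = np.array([0, 1e-07, 1e-08])
# per-element tolerance is atol + rtol * abs(desired); desired == 0 kills the rtol term
assert_allclose(a, 0, atol=1e-07)            # passes: |a - 0| <= 1e-07 everywhere
# assert_allclose(a, 0, rtol=1e-07, atol=0)  # would raise: tolerance collapses to 0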

Logarithm over x

Since the following expansion for the logarithm holds:
log(1-x)=-x-x^2/2-x^3/3-...
one can calculate the following functions, which have removable singularities at x = 0:
log(1-x)/x=-1-x/2-...
(log(1-x)/x+1)/x=-1/2-x/3-...
((log(1-x)/x+1)/x+1/2)/x=-1/3-x/4-...
I am trying to use NumPy for these calculations, and specifically the log1p function, which is accurate near x=0. However, convergence for the aforementioned functions is still problematic.
Do you have any ideas for existing functions implementing these formulas, or should I write one myself using the previous expansions, even though that will not be as efficient?
The simplest thing to do is something like
from math import log1p

def logf(x, eps=1e-6):
    if abs(x) < eps:
        return -0.5 - x/3.
    else:
        return (1. + log1p(-x)/x)/x
and play a bit with the threshold eps.
If you want a numpy-like, vectorized solution, replace the if with np.where:

>>> np.where(np.abs(x) > eps, (1. + np.log1p(-x)/x)/x, -0.5 - x/3.)
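Note that np.where evaluates both branches, so the exact branch still divides by zero at x = 0 and warns; a sketch that masks this (logf_vec is my name for it):

import numpy as np

def logf_vec(x, eps=1e-6):
    x = np.asarray(x, dtype=float)
    with np.errstate(divide='ignore', invalid='ignore'):
        exact = (1. + np.log1p(-x) / x) / x  # invalid at x == 0, masked below
    return np.where(np.abs(x) > eps, exact, -0.5 - x / 3.)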
Why not successively square the candidate, after initially extracting the exponent component? When the square results in a number greater than 2, divide it by two and set the bit in the mantissa of your result that corresponds to the iteration. This is a much quicker and simpler way of determining log base 2, which can then be transformed to base e or base 10 with a single multiplication.
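A minimal sketch of that bit-by-bit scheme (using math.frexp to extract the exponent and fixing the bit count are my choices, not part of the answer):

import math

def log2_bits(x, bits=32):
    m, e = math.frexp(x)      # x = m * 2**e with m in [0.5, 1)
    m *= 2.0                  # renormalize so m is in [1, 2)
    e -= 1
    result = float(e)
    bit = 0.5
    for _ in range(bits):
        m *= m                # squaring doubles log2(m)
        if m >= 2.0:
            m /= 2.0          # renormalize and record a 1 in this fractional bit
            result += bit
        bit /= 2.0
    return result

print(log2_bits(10.0))        # ~3.3219, i.e. log2(10)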
Some predefined functions don't work at singularity points. One simple-minded solution is to compute the series by adding up its terms directly.
For your example, the loop for log(1-x) would be:

s = 0.0
for k in range(1, n + 1):
    s += x**k / k
s = -s   # s now approximates log(1 - x)

Then you keep adding terms, either a fixed number of them or until the last term falls below a small threshold.
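A runnable sketch with that stopping rule (the function name and the max_terms guard are my additions; it assumes |x| < 1 so the series converges):

def log1m_series(x, tol=1e-12, max_terms=1000000):
    # add x**k / k until a term drops below tol, then negate the sum
    total = 0.0
    power = x
    for k in range(1, max_terms + 1):
        term = power / k
        total += term
        if abs(term) < tol:
            break
        power *= x
    return -total   # approximates log(1 - x)

print(log1m_series(0.5))   # ~ -0.693147, i.e. log(0.5)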

Integer optimization/maximization in numpy

I need to estimate the size of a population, by finding the value of n which maximises scipy.misc.comb(n, a)/n**b where a and b are constants. n, a and b are all integers.
Obviously, I could just have a loop in range(SOME_HUGE_NUMBER), calculate the value for each n and break out of the loop once I reach an inflexion in the curve. But I wondered if there was an elegant way of doing this with (say) numpy/scipy, or is there some other elegant way of doing this just in pure Python (e.g. like an integer equivalent of Newton's method?)
As long as your number n is reasonably small (smaller than approx. 1500), my guess for the fastest way to do this is to actually try all possible values. You can do this quickly by using numpy:
import numpy as np
import scipy.misc as misc
nMax = 1000
a = 77
b = 100
n = np.arange(1, nMax+1, dtype=np.float64)
val = misc.comb(n, a)/n**b
print("Maximized for n={:d}".format(int(n[val.argmax()]+0.5)))
# Maximized for n=181
This is not especially elegant, but rather fast for that range of n. The problem is that for n > 1484 the numerator already gets too large to be stored in a float, so this method fails with overflows. Nor is this only a problem of numpy.ndarray not working with Python integers. Even with exact integers, you could not compute

misc.comb(10000, 1000, exact=True)/10000**1001

as you want a float result from a division of two numbers larger than the maximum float Python can hold (max_exp = 1024 on my system; see sys.float_info). You could not use your range approach in that case either. If you really want to do something like that, you will have to take more care numerically.
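One way to take that care is to work in log space: log is monotone, so the argmax is unchanged, and nothing overflows. A sketch using scipy.special.gammaln (my suggestion, not part of the answer above):

import numpy as np
from scipy.special import gammaln

a, b = 77, 100
n = np.arange(a, 100000, dtype=np.float64)
# log( comb(n, a) / n**b ) = lgamma(n+1) - lgamma(a+1) - lgamma(n-a+1) - b*log(n)
log_val = gammaln(n + 1) - gammaln(a + 1) - gammaln(n - a + 1) - b * np.log(n)
print(int(n[log_val.argmax()]))   # should print 181, matching the brute force above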
You essentially have a nicely smooth function of n that you want to maximise. n is required to be integral but we can consider the function instead to be a function of the reals. In this case, the maximising integral value of n must be close to (next to) the maximising real value.
We could convert comb to a real function by using the gamma function and use numerical optimisation techniques to find the maximum. Another approach is to replace the factorials with Stirling's approximation. This gives a moderately complicated but tractable algebraic expression. This expression is not hard to differentiate and set to zero to find the extrema.
I did this and obtained
n * (b + (n-a) * log((n-a)/n) ) = a * b - a/2
This is not straightforward to solve algebraically but easy enough numerically (e.g. using Newton's method, as you suggest).
I may have made a mistake in the algebra, but I typed the a = 77, b = 100 example into Wolfram Alpha and got 180.58, so the approach seems to work.
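As a numerical cross-check of that equation, a sketch (scipy.optimize.brentq and the bracket are my choices; the bracket starts well above n = a so the log term is well defined and a spurious root near n = a is skipped):

from math import log
from scipy.optimize import brentq

a, b = 77, 100

def g(n):
    # left side minus right side of the equation above
    return n * (b + (n - a) * log((n - a) / n)) - (a * b - a / 2)

n_star = brentq(g, 100, 10000)   # g changes sign once inside this bracket
print(n_star)                    # ~180.6, consistent with the Wolfram Alpha value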

Allowing for deviations in exact values during matrix multiplication, python

I need to solve this:
Check if AT * n * A = n, where A is the test matrix, AT is the transposed test matrix and n = [[1,0,0,0],[0,-1,0,0],[0,0,-1,0],[0,0,0,-1]].
I don't know how to check for equality due to the numerical errors in the float multiplication. How do I go about doing this?
Current code:
import numpy

def trans(A):
    n = numpy.matrix([[1,0,0,0],[0,-1,0,0],[0,0,-1,0],[0,0,0,-1]])
    c = numpy.matrix.transpose(A) * n * numpy.matrix(A)
I have then tried

if c == n:
    return True
I have also tried assigning variables to every element of matrix and then checking that each variable is within certain limits.
Typically, the way that numerical-precision limitations are overcome is by allowing for some epsilon (or error-value) between the actual value and expected value that is still considered 'equal'. For example, I might say that some value a is equal to some value b if they are within plus/minus 0.01. This would be implemented in python as:
def float_equals(a, b, epsilon):
    return abs(a - b) < epsilon
Of course, for matrices entered as lists, this isn't quite so simple. We have to check whether all values are within epsilon of their partner. One example solution would be as follows, assuming your matrices are standard python lists:
from itertools import product  # needed to generate the indexes

def matrix_float_equals(A, B, epsilon):
    return all(abs(A[i][j] - B[i][j]) < epsilon
               for i, j in product(range(len(A)), repeat=2))
all returns True iff all values in an iterable are truthy (an iterable-wise and). product generates the Cartesian product of its input iterables, with the repeat keyword allowing easy duplication of a single range; given a range repeated twice, it therefore produces a tuple for every (i, j) index pair. Of course, this method of index generation assumes square, equally-sized matrices. For non-square matrices you have to get more creative, but the idea is the same.
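For that non-square case, a small variation (a sketch; it simply pairs the row range with the column range instead of repeating one range):

from itertools import product

def matrix_float_equals_rect(A, B, epsilon):
    # iterate over every (row, column) pair of an n x m matrix
    return all(abs(A[i][j] - B[i][j]) < epsilon
               for i, j in product(range(len(A)), range(len(A[0]))))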
However, as is typically the way in python, there are libraries that do this kind of thing for you. Numpy's allclose does exactly this: it compares two numpy arrays for equality element-wise within some tolerance. If you're working with matrices in python for numeric analysis, numpy is really the way to go; I would get familiar with its basic API.
If a and b are numpy arrays or matrices of the same shape, then you can use allclose:
if numpy.allclose(a, b):
    # a is approximately equal to b
    # do something ...
This checks that for all i and all j, |a_ij - b_ij| <= eps_abs + eps_rel * |b_ij|, where the absolute tolerance eps_abs (atol) defaults to 10^-8 and the relative tolerance eps_rel (rtol) defaults to 10^-5. Thus it is safe to use even if your calculations introduce numerical errors.
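Applied to the question above, a minimal sketch (the helper name, the tolerance, and np.diag in place of numpy.matrix are my choices):

import numpy as np

def preserves_metric(A, tol=1e-08):
    # hypothetical helper: does AT * n * A equal n within the tolerance?
    n = np.diag([1., -1., -1., -1.])
    c = A.T @ n @ A
    return np.allclose(c, n, atol=tol)

print(preserves_metric(np.eye(4)))   # True: the identity satisfies AT * n * A = n exactly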

Sum of Square Differences (SSD) in numpy/scipy

I'm trying to use Python and Numpy/Scipy to implement an image processing algorithm. The profiler tells me a lot of time is being spent in the following function (called often), which computes the sum of squared differences between two images:
def ssd(A, B):
    s = 0
    for i in range(3):
        s += sum(pow(A[:,:,i] - B[:,:,i], 2))
    return s
How can I speed this up? Thanks.
Just
s = numpy.sum((A[:,:,0:3]-B[:,:,0:3])**2)
(which I expect is likely just sum((A-B)**2) if the shape is always (,,3))
You can also use the sum method: ((A-B)**2).sum()
Right?
Just to mention that one can also use np.dot:
import numpy as np

def ssd(A, B):
    dif = A.ravel() - B.ravel()
    return np.dot(dif, dif)
This might be a bit faster and possibly more accurate than the alternatives using np.sum and **2, but it doesn't work if you want to compute the SSD along a specified axis. In that case, there might be a magical subscript formula using np.einsum.
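For completeness, a sketch of that einsum idea (the subscript strings are my guess at what is being alluded to):

import numpy as np

A = np.random.rand(4, 5, 3)
B = np.random.rand(4, 5, 3)
d = A - B
per_pixel = np.einsum('ijk,ijk->ij', d, d)  # sum of squares along the last axis only
total = np.einsum('ijk,ijk->', d, d)        # scalar SSD, same as np.dot(d.ravel(), d.ravel())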
I am confused why you are taking i in range(3). Is that supposed to be the whole array, or just part?
Overall, you can replace most of this with operations defined in numpy:
def ssd(A, B):
    squares = (A[:,:,:3] - B[:,:,:3]) ** 2
    return numpy.sum(squares)
This way you can do one operation instead of three, and numpy.sum may be able to optimize the addition better than the builtin sum.
Further to Ritsaert Hornstra's answer that got 2 negative marks (admittedly I didn't see it in its original form...): this is actually true.
For a large number of iterations it can often take twice as long to use the ** operator or the pow(x, y) function as to just manually multiply the pairs together. If necessary, use math.fabs() if it's throwing out NaNs (which it sometimes does, especially when using int16 etc.), and it still takes only approximately half the time of the two functions given.
Not that important to the original question, I know, but definitely worth knowing.
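That timing claim is easy to check on your own machine; a quick sketch (the array size and repeat count are arbitrary):

import timeit

setup = "import numpy as np; a = np.random.rand(512, 512)"
print(timeit.timeit("a ** 2", setup=setup, number=100))  # power operator
print(timeit.timeit("a * a", setup=setup, number=100))   # manual multiplication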
I do not know if the pow() function with power 2 will be fast. Try:
def ssd(A, B):
    s = 0
    for i in range(3):
        s += sum((A[:,:,i] - B[:,:,i]) * (A[:,:,i] - B[:,:,i]))
    return s
You can try this one:
dist_sq = np.sum((A[:, np.newaxis, :] - B[np.newaxis, :, :]) ** 2, axis=-1)
More details can be found here (the 'k-Nearest Neighbors' example):
https://jakevdp.github.io/PythonDataScienceHandbook/02.08-sorting.html
In Ruby, you can achieve this as follows:

def diff_btw_sum_of_squars_and_squar_of_sum(from = 1, to = 100) # defaults cover 1..100
  ((from..to).inject(:+) ** 2) - (from..to).map { |num| num ** 2 }.inject(:+)
end

diff_btw_sum_of_squars_and_squar_of_sum # call the method above
