Numerical Python - how do I make this a ufunc?

new to NumPy and may not be searching properly, so I'll take the lumps if this is a common question...
I'm working on a problem where I need to calculate log(n!) for relatively large numbers - i.e. too large to calculate the factorial first, so I've written the following function:
from math import log

def log_fact(n):
    x = 0
    for i in range(1, n+1):
        x += log(i)
    return x
Now the problem is that I want to use this as part of the function passed to curve_fit:
def logfactfunc(x, a, b, c):
    return a*log_fact(x) + b*x + c
from scipy.optimize import curve_fit
curve_fit(logfactfunc, x, y)
However, this produces the following error:
File "./fit2.py", line 16, in log_fact
for i in range(1,n+1):
TypeError: only length-1 arrays can be converted to Python scalars
A little searching suggested numpy.frompyfunc() to convert this to a ufunc
curve_fit(np.frompyfunc(logfactfunc, 1, 1), data[k].step, data[k].sieve)
TypeError: <ufunc 'logfactfunc (vectorized)'> is not a Python function
Tried this as well:
def logfactfunc(x, a, b, c):
    return a*np.frompyfunc(log_fact, 1, 1)(x) + b*x + c
File "./fit2.py", line 30, in logfactfunc
return a*np.frompyfunc(log_fact, 1, 1)(x) + b*x + c
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'numpy.float64'
Any ideas on how I can get my log_fact() function to be used within the curve_fit() function?
Thanks!

Your log_fact function is closely related to the gammaln function, which is defined as a ufunc in scipy.special. Specifically, log_fact(n) == scipy.special.gammaln(n+1). For even modest values of n, this is significantly faster:
In [15]: %timeit log_fact(19)
10000 loops, best of 3: 24.4 us per loop
In [16]: %timeit scipy.special.gammaln(20)
1000000 loops, best of 3: 1.13 us per loop
Also, the running time of gammaln is independent of n, unlike log_fact.
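Plugging that straight into the fit from the question, a minimal sketch (the sample x and y here are made up to stand in for the question's data):

import numpy as np
from scipy.optimize import curve_fit
from scipy.special import gammaln

def logfactfunc(x, a, b, c):
    # gammaln(x + 1) == log(x!) and broadcasts over arrays, so no loop is needed
    return a*gammaln(x + 1.0) + b*x + c

x = np.arange(1, 50, dtype=float)          # hypothetical data
y = 2.0*gammaln(x + 1.0) + 0.5*x + 3.0
popt, pcov = curve_fit(logfactfunc, x, y)  # popt should come back near (2.0, 0.5, 3.0)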

Your log_fact function is being called with arrays as input parameters, which is what's throwing off your method. A possible way of vectorizing your code would be the following:
def log_fact(n):
    n = np.asarray(n)
    m = np.max(n)
    return np.take(np.cumsum(np.log(np.arange(1, m+1))), n-1)
Taking it for a test ride:
>>> log_fact(3)
1.791759469228055
>>> log_fact([10, 15, 23])
array([ 15.10441257,  27.89927138,  51.60667557])
>>> log_fact([[10, 15, 23], [14, 15, 8]])
array([[ 15.10441257,  27.89927138,  51.60667557],
       [ 25.19122118,  27.89927138,  10.6046029 ]])
The only caveat with this approach is that it stores an array as long as the largest value you call it with. If your n gets into the billions, you'll probably break it. Other than that, it actually avoids repeated calculations if you call it with many values.
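With this vectorized log_fact, the original fitting function works essentially unchanged. A sketch (the sample data is hypothetical; note the integer cast, since curve_fit passes x as a float array and log_fact uses it for indexing):

from scipy.optimize import curve_fit

def logfactfunc(x, a, b, c):
    # log_fact indexes with n, so the float x that curve_fit passes must become int
    return a*log_fact(np.asarray(x, dtype=int)) + b*x + c

x = np.arange(1, 30)                       # hypothetical data
y = 2.0*log_fact(x) + 0.5*x + 3.0
popt, pcov = curve_fit(logfactfunc, x, y)  # popt should come back near (2.0, 0.5, 3.0)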

If n really is large (say larger than about 10 or so) then a much better approach is to use Stirling's approximation. This will be much more efficient, and it is also easy to vectorize.
For the approach you are taking, your log_fact(n) function can be written much more efficiently and compactly as
def log_fact(n):
    return np.sum(np.log(np.arange(1, n+1)))
This does not help with your problem though. We could vectorize this as @Isaac shows, or just use np.vectorize(), which is a convenience wrapper that does basically the same thing. Note that it does not offer speed advantages: you are still running Python loops, which are slow.
That being said, use Stirling's approximation!
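For concreteness, here is a minimal sketch of Stirling's approximation for log(n!); at n = 10 the relative error is already below 0.1%, and it shrinks as n grows:

import numpy as np

def log_fact_stirling(n):
    # Stirling: log(n!) ~= n*log(n) - n + 0.5*log(2*pi*n)
    n = np.asarray(n, dtype=float)
    return n*np.log(n) - n + 0.5*np.log(2*np.pi*n)

print(log_fact_stirling(10))  # 15.0961...; the exact value of log(10!) is 15.1044...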

As far as I can tell, creating a true ufunc is fairly involved and may require writing your function in C. See here for the documentation on creating ufuncs.
You might instead consider just writing a version of your function that takes and returns ndarrays. For instance:
def logfact_arr(a):
    return np.array([log_fact(x) for x in a.flat]).reshape(a.shape)
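A sketch of wiring that into the question's model function (assuming the question's scalar log_fact; the cast is needed because curve_fit passes x as a float array and log_fact uses range()):

def logfactfunc(x, a, b, c):
    return a*logfact_arr(np.asarray(x, dtype=int)) + b*x + c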

The previous answers show efficient ways to solve your problem, but the precise answer to your question, i.e. how to vectorize the log_fact function, is to use np.vectorize:
vlog_fact = np.vectorize(log_fact)

def vlogfactfunc(x, a, b, c):
    return a*vlog_fact(x) + b*x + c
With that, you can call curve_fit(vlogfactfunc, np.array([1, 2, 3]), np.array([-1., 4.465, 11.958]))
As you suggested, you could also use np.frompyfunc, but as you can read in its documentation, it always returns Python objects, so curve_fit complains:
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'
A workaround is to convert the returned array to an array of floats:
ulog_fact = np.frompyfunc(log_fact, 1, 1)

def ulogfactfunc(x, a, b, c):
    return a*ulog_fact(x).astype(float) + b*x + c
So you can also call curve_fit with ulogfactfunc
Hope this helps!

Related

How to make two arrays contiguous so that Numba can speed up np.dot()

I have the following code:
import numpy as np
from numba import jit

Nx = 15
Ny = 1000

v = np.ones((Nx, Ny))
v = np.reshape(v, (Nx*Ny))
A = np.random.rand(Nx*Ny, Nx*Ny, 5)
B = np.random.rand(Nx*Ny, Nx*Ny, 5)
C = np.random.rand(Nx*Ny, 5)

@jit(nopython=True)
def dotplus(B, v, C):
    return np.dot(B, v) + C

k = 2
D = dotplus(B[:, :, k], v, C[:, k])
I get the following warning, which I guess refers to arrays B[:,:,k] and v:
NumbaPerformanceWarning: np.dot() is faster on contiguous arrays, called on (array(float64, 2d, A), array(float64, 1d, C))
return np.dot(B, v0) + C
Is there a way to make the two arrays contiguous, so that Numba can speed up the code?
PS in case you're wondering about the meaning of k, note this is just a MRE. In the actual code, dotplus is called multiple times inside a for loop for different values of k (so, different slices of B and C). The for loop updates the values of v, but B and C don't change.
Flawr is correct: B[..., k] returns a view into B and does not actually copy any data. In memory, two neighbouring elements of the view are B.strides[1] bytes apart, which evaluates to B.shape[-1]*B.itemsize and is greater than B.itemsize. Consequently, your array is not contiguous.
The best optimization is to vectorize the dotplus loop and write
D = np.tensordot(B, v, axes=(1, 0)) + C
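This computes the result for all slices at once: tensordot contracts B's second axis with v, so the per-k loop disappears. A small sketch under the question's shapes:

# B: (Nx*Ny, Nx*Ny, 5), v: (Nx*Ny,), C: (Nx*Ny, 5)
D_all = np.tensordot(B, v, axes=(1, 0)) + C  # shape (Nx*Ny, 5)
# D_all[:, k] equals np.dot(B[:, :, k], v) + C[:, k] for every k
D = D_all[:, 2]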
The second best optimization is to refactor and let the batch dimension be the first dimension of the array. This can be done on top of the above vectorization and is generally advisable. It would look something like
A = np.random.rand(5, Nx*Ny,Nx*Ny)
# rather than
A = np.random.rand(Nx*Ny,Nx*Ny,5)
If you can't refactor the code, you need to start profiling. You can easily swap axes temporarily via
B = np.moveaxis(B, -1, 0)
some_op(B[k, ...], ...)
B = np.moveaxis(B, 0, -1)
Contrary to max9111's comment, this will not net you anything compared to np.ascontiguousarray() because the data has to be copied in both cases. That said, a copy is O(Nx*Ny*k) + buffer allocation. Direct matrix-vector multiplication is O(Nx*Ny), but you have to gather the elements first, which is really expensive. It comes down to your specific architecture and concrete problem, so profiling is the way to go.
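If you do want to try the copy route while profiling, np.ascontiguousarray is the most direct way to silence the warning; a sketch, with no promise that the copy actually pays off for your sizes:

Bk = np.ascontiguousarray(B[:, :, k])  # forces a C-contiguous copy of the 2-D slice
D = dotplus(Bk, v, C[:, k])            # np.dot now sees a contiguous matrix operand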

When a function which has one parameter receives np.array

Suppose there is a function:
def test(x):
    return x**2
When I give a list of ints to the function, an error is raised:
TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'
But given an array of ints instead, the function returns an array of outputs.
How is this possible?
It's important to understand that operators are functions too:
Writing a**b is like writing pow(a, b)
Functions can't guess what the expected behavior is when you give them different inputs, so behind the scenes pow(a, b) has different implementations for different inputs (e.g. for two integers, return the first raised to the power of the second; for an array of integers, return an array where each cell holds the corresponding cell of the input array raised to the power of the second operand).
Whoever implemented the NumPy array created a ** implementation for it, but an ordinary list doesn't have one.
If you want to raise a list to a power, use list comprehension:
[xi ** 2 for xi in x]
You can also write your own class and implement ** for it.
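For illustration, a minimal sketch of such a class: defining the __pow__ magic method is what makes the ** operator work on your own type, the same hook NumPy arrays use:

class ElementwiseList:
    def __init__(self, values):
        self.values = list(values)

    def __pow__(self, exponent):
        # apply ** element by element, mimicking NumPy's behaviour
        return ElementwiseList(v**exponent for v in self.values)

    def __repr__(self):
        return f"ElementwiseList({self.values})"

print(ElementwiseList([1, 2, 3])**2)  # ElementwiseList([1, 4, 9])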
I don't see why it would be impossible. Although at a very high level Python lists and Numpy arrays may appear to be the same (i.e. a sequence of numbers), they are implemented in different ways. Numpy is particularly known for its array operations (where an operation can be applied to each of an array's elements in one go).
Here's another example where you can see their differences in action:
import numpy as np

a = [1, 2, 3, 4, 5]
print(np.array(a) * 5)  # elementwise multiplication: [ 5 10 15 20 25]
print(a * 5)            # list repetition: [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...]
You can use this instead for lists:
x = [k**2 for k in x]
return x
The function you wrote works fine for a NumPy array but not for lists; use the line above to avoid that error.

Is there a way to get every element of a list without using loops?

I found this task in a book by my professor:
def f(x):
    return log(1 + exp(x))

def problem(M: List):
    return np.array([f(x) for x in M])
How do I implement a solution?
Numpy is all about performing operations on entire arrays. Your professor is expecting you to use that functionality.
Start by converting your list M into array z:
z = np.array(M)
Now you can do elementwise operations like exp and log:
e = np.exp(z)
f = 1 + e
g = np.log(f)
The functions np.exp and np.log are applied to each element of an array. If the input is not an array, it will be converted into one.
Operations like 1 + e work on an entire array as well, in this case using the magic of broadcasting. Since 1 is a scalar, it can be unambiguously expanded to the same shape as e, and added as if by np.add.
Normally, the sequence of operations can be compressed into a single line, similar to what you did in your initial attempt. You can reduce the number of operations slightly by using np.log1p:
def f(x):
    return np.log1p(np.exp(x))
Notice that I did not convert x to an array first since np.exp will do that for you.
A fundamental problem with this naive approach is that np.exp will overflow for arguments where we would still expect a reasonable result. This can be solved using the technique in this answer:
def f(x):
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0)
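A quick way to see why the rewrite matters: the naive version overflows for large arguments while the stable one degrades gracefully (a small demo, expected values shown as comments):

import numpy as np

x = np.array([-1000.0, 0.0, 1000.0])
print(np.log1p(np.exp(x)))                              # naive: [0. 0.693... inf] plus an overflow warning
print(np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0))  # stable: [0. 0.693... 1000.]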

NumPy tensordot MemoryError

I have two matrices -- A is 3033x3033, and X is 3033x20. I am running the following lines (as suggested in the answer to another question I asked):
n, d = X.shape
c = X.reshape(n, -1, d) - X.reshape(-1, n, d)
return np.tensordot(A.reshape(n, n, -1) * c, c, axes=[(0,1),(0,1)])
On the final line, Python simply stops and says "MemoryError". How can I get around this, either by changing some setting in Python or performing this operation in a more memory-efficient way?
Here is a function that does the calculation without any for loops and without any large temporary array. See the related question for a longer answer, complete with a test script.
def fbest(A, X):
    KA_best = np.tensordot(A.sum(1)[:, None] * X, X, axes=[(0,), (0,)])
    KA_best += np.tensordot(A.sum(0)[:, None] * X, X, axes=[(0,), (0,)])
    KA_best -= np.tensordot(np.dot(A, X), X, axes=[(0,), (0,)])
    KA_best -= np.tensordot(X, np.dot(A, X), axes=[(0,), (0,)])
    return KA_best
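A quick sanity check of fbest against the direct, memory-hungry expression from the question, on inputs small enough to fit in memory (a sketch with arbitrary small sizes):

import numpy as np

n, d = 50, 4
A = np.random.rand(n, n)
X = np.random.rand(n, d)

c = X.reshape(n, 1, d) - X.reshape(1, n, d)
direct = np.tensordot(A.reshape(n, n, 1)*c, c, axes=[(0, 1), (0, 1)])

print(np.allclose(fbest(A, X), direct))  # expect True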
I profiled the code with arrays of your sizes.
I love sp.einsum by the way. It is a great place to start when speeding up array operations by removing for loops. You can do SOOOO much with one call to sp.einsum.
The advantage of np.tensordot is that it links to whatever fast numerical library you have installed (e.g. MKL). So tensordot will run faster, and in parallel, when you have the right libraries installed.
If you replace the final line with
return np.einsum('ij,ijk,ijl->kl',A,c,c)
you avoid creating the A.reshape(n, n, -1) * c (3033 by 3033 by 20) intermediate array, which I think is your main problem.
My impression is that the version I give is probably slower (for cases where it doesn't run out of memory), but I haven't rigorously timed it.
It's possible you could go further and avoid creating c, but I can't immediately see how to do it. It'd be a case of writing the whole thing out in terms of sums over matrix indices and seeing what it simplifies to.
You can employ two nested loops iterating along the last dimension of X. That last dimension is 20, so hopefully this is still efficient enough and, more importantly, leaves a minimal memory footprint. Here's the implementation -
n, d = X.shape
c = X.reshape(n, -1, d) - X.reshape(-1, n, d)
out = np.empty((d, d))  # d is a small number: 20
for i in range(d):
    for j in range(d):
        out[i, j] = (A*c[:, :, i]*c[:, :, j]).sum()
return out
You can replace the last line with np.einsum -
out[i, j] = np.einsum('ij->', A*c[:, :, i]*c[:, :, j])

python numpy - optimize chisq function by removing explicit python loop?

I'm trying to evaluate a chi squared function, i.e. compare an arbitrary (blackbox) function to a numpy vector array of data. At the moment I'm looping over the array in python but something like this is very slow:
n = len(array)
sigma = 1.0
chisq = 0.0
for i in range(n):
    data = array[i]
    model = f(i, a, b, c)
    chisq += 0.5*((data-model)/sigma)**2.0
return chisq
array is a 1-d numpy array and a,b,c are scalars. Is there a way to speed this up by using numpy.sum() or some sort of lambda function etc.? I can see how to remove one loop (over chisq) like this:
numpy.sum(((array-model_vec)/sigma)**2.0)
but then I still need to explicitly populate the array model_vec, which will presumably be just as slow; how do I do that without an explicit loop like this:
model_vec = numpy.zeros(len(data))
for i in range(n):
    model_vec[i] = f(i, a, b, c)
return numpy.sum(((array-model_vec)/sigma)**2.0)
?
Thanks!
You can use np.vectorize to 'vectorize' your function f if you don't have control over its definition:
g = np.vectorize(f)
But this is not as good as vectorizing the function yourself manually to support arrays, as it doesn't really do much more than internalize the loop, and it might not work well with certain functions. In fact, from the documentation:
Notes The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
You should instead focus on making f accept a vector instead of i:
def f(i, a, b, x):
    return a*x[i] + b

def g(a, b, x):
    x = np.asarray(x)
    return a*x + b
Then, instead of calling f(i, a, b, x), call g(a, b, x)[i] if you only want the i-th value; for operations on the entire array, use g(a, b, x) and it will be much faster:
model_vec = g(a, b, x)
return numpy.sum(((array-model_vec)/sigma)**2.0)
It seems that your code is slow because what is executing in the loop is slow (your model generation). Turning this into a one-liner won't speed things up. If you have access to a modern computer with more than one CPU, you could try to run this loop in parallel, for example using the multiprocessing module:
from multiprocessing import Pool

if __name__ == '__main__':
    # snip set-up code
    pool = Pool(processes=4)  # start 4 worker processes
    inputs = [(i, a, b, c) for i in range(n)]
    # starmap unpacks each (i, a, b, c) tuple into a call f(i, a, b, c)
    model_array = pool.starmap(f, inputs)
    for i in range(n):
        data = array[i]
        model = model_array[i]
        chisq += 0.5*((data-model)/sigma)**2.0
