When I try to call curve_fit like so:
curve_fit(func, xdata=np.arange(50), ydata=some_array)
it calls func with all of xdata at once (the whole array) instead of, e.g., the first element of the xdata (xdata[0]).
What is happening?
Cheers.
For people who happen to come across this: SciPy uses broadcasting, meaning that if you use a math function inside the function you are trying to fit, it also has to work with broadcasting. Broadcasting basically means NumPy loops over input vectors internally in C, which is more efficient than looping in Python. https://numpy.org/doc/stable/user/basics.broadcasting.html
For me the problem was that I used the math library, which does not support this.
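As an illustrative sketch of the fix (the model, parameter values, and data here are made up, not from the original question): write the fit function with np.exp so it broadcasts over the whole xdata array, and curve_fit works as expected.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical model using np.exp, which broadcasts over arrays;
# math.exp here would raise an error because it only accepts scalars.
def func(x, a, b):
    return a * np.exp(-b * x)

xdata = np.arange(50)
ydata = func(xdata, 2.5, 0.1)  # noiseless synthetic data, for illustration only

popt, pcov = curve_fit(func, xdata, ydata, p0=(1.0, 0.05))
print(popt)  # recovers roughly (2.5, 0.1)
```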
I'm working on a piece of code for a game that calculates the distances between all the objects on the screen using their in-game coordinate positions. Originally I was going to use basic Python and lists to do this, but since the number of distances that will need to be calculated grows quadratically with the number of objects, I thought it might be faster to do it with numpy.
I'm not very familiar with numpy, and I've been experimenting on basic bits of code with it. I wrote a little bit of code to time how long it takes for the same function to complete a calculation in numpy and in regular Python, and numpy seems to consistently take a good bit more time than the regular python.
The function is very simple. It starts with 1.1 and then increments 200,000 times, adding 0.1 to the last value and then finding the square root of the new value. It's not what I'll actually be doing in the game code, which will involve finding total distance vectors from position coordinates; it's just a quick test I threw together. I already read here that the initialization of arrays takes more time in NumPy, so I moved the initializations of both the numpy and python arrays outside their functions, but Python is still faster than numpy.
Here is the bit of code:
#!/usr/bin/python3
import numpy
from timeit import timeit
#from time import process_time as timer
import math
thing = numpy.array([1.1, 0.0], dtype='float')
thing2 = [1.1, 0.0]

def NPFunc():
    for x in range(1, 200000):
        thing[0] += 0.1
        thing[1] = numpy.sqrt(thing[0])
    print(thing)
    return None

def PyFunc():
    for x in range(1, 200000):
        thing2[0] += 0.1
        thing2[1] = math.sqrt(thing2[0])
    print(thing2)
    return None

print(timeit(NPFunc, number=1))
print(timeit(PyFunc, number=1))
It gives this result, which indicates normal Python is 3x faster:
[ 20000.99999999 141.42489173]
0.2917748889885843
[20000.99999998944, 141.42489172698504]
0.10341173503547907
Am I doing something wrong, or is this calculation just so simple that it isn't a good test for numpy?
Am I doing something wrong, or is this calculation just so simple that it isn't a good test for NumPy?
It's not really that the calculation is simple, but that you're not taking any advantage of NumPy.
The main benefit of NumPy is vectorization: you can apply an operation to every element of an array in one go, and whatever looping is needed happens inside some tightly-optimized C (or Fortran or C++ or whatever) loop inside NumPy, rather than in a slow generic Python iteration.
But you're only accessing a single value, so there's no looping to be done in C.
On top of that, because the values in an array are stored as "native" values, NumPy functions don't need to unbox them, pulling the raw C double out of a Python float, and then re-box them in a new Python float, the way any Python math functions have to.
But you're not doing that either. In fact, you're doubling that work: you're pulling the value out of the array as a Python float (boxing it), then passing it to a function (which has to unbox it, and then re-box it to return a result), then storing it back in an array (unboxing it again).
And meanwhile, because np.sqrt is designed to work on arrays, it has to first check the type of what you're passing it and decide whether it needs to loop over an array or unbox and rebox a single value or whatever, while math.sqrt just takes a single value. When you call np.sqrt on an array of 200000 elements, the added cost of that type switch is negligible, but when you're doing it every time through the inner loop, that's a different story.
So, it's not an unfair test.
You've demonstrated that using NumPy to pull out values one at a time, act on them one at a time, and store them back in the array one at a time is slower than just not using NumPy.
But, if you compare it to actually taking advantage of NumPy—e.g., by creating an array of 200000 floats and then calling np.sqrt on that array vs. looping over it and calling math.sqrt on each one—you'll demonstrate that using NumPy the way it was intended is faster than not using it.
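A minimal sketch of that fairer comparison (the array contents are chosen to mirror the question's 200,000 incremented values): one vectorized np.sqrt call against a pure-Python loop over the same data.

```python
import math
import numpy as np
from timeit import timeit

# The same 200,000 values the question's loop would produce: 1.1, 1.2, 1.3, ...
values = 1.1 + 0.1 * np.arange(200000)

# NumPy as intended: one call, looping happens in C
t_numpy = timeit(lambda: np.sqrt(values), number=10)

# Pure Python: an explicit loop calling math.sqrt on each element
t_python = timeit(lambda: [math.sqrt(v) for v in values], number=10)

print(f"np.sqrt on whole array: {t_numpy:.4f}s, math.sqrt loop: {t_python:.4f}s")
```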
You're comparing it wrong:
a_list = np.arange(0,20000,0.1)
timeit(lambda:np.sqrt(a_list),number=1)
I've tried searching quite a lot on this one, but being relatively new to python I feel I am missing the required terminology to find what I'm looking for.
I have a function:
def my_function(x, y):
    # code...
    return (a, b, c)
Where x and y are numpy arrays of length 2000 and the return values are integers. I'm looking for a shorthand (one-liner) to loop over this function as such:
Output = [my_function(X[i],Y[i]) for i in range(len(Y))]
Where X and Y are of the shape (135,2000). However, after running this I am currently having to do the following to separate out 'Output' into three numpy arrays.
Output = np.asarray(Output)
a = Output.T[0]
b = Output.T[1]
c = Output.T[2]
Which I feel isn't the best practice. I have tried:
(a,b,c) = [my_function(X[i],Y[i]) for i in range(len(Y))]
But this doesn't seem to work. Does anyone know a quick way around my problem?
my_function(X[i], Y[i]) for i in range(len(Y))
On the verge of crossing the "opinion-based" border, ...Y[i]... for i in range(len(Y)) is usually a big no-no in Python. It is an even bigger no-no when working with numpy arrays. One of the advantages of working with numpy is the 'vectorization' that it provides, thus pushing the for loop down to the C level rather than the (slower) Python level.
So, if you rewrite my_function so it can handle the arrays in a vectorized fashion using the multiple tools and methods that numpy provides, you may not even need that "one-liner" you are looking for.
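That said, if the function truly cannot be vectorized, the tuple-unpacking one-liner the question is reaching for is zip(*...). A sketch, using a made-up stand-in for my_function and small shapes (the original uses (135, 2000)):

```python
import numpy as np

# Hypothetical stand-in for my_function: takes two 1-D arrays, returns three ints
def my_function(x, y):
    return int(x.sum()), int(y.sum()), int((x * y).sum())

X = np.ones((5, 4))        # small illustrative shapes
Y = np.full((5, 4), 2.0)

# zip(*...) transposes the list of 3-tuples into three columns in one pass
a, b, c = (np.array(col) for col in zip(*(my_function(x, y) for x, y in zip(X, Y))))
print(a, b, c)
```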
I'm building a function to calculate the Reliability of a given component/subsystem. For this, I wrote the following in a script:
import math as m
import numpy as np
def Reliability(MTBF, time):
    failure_param = pow(MTBF, -1)
    R = m.exp(-failure_param * time)
    return R
The function works just fine for any time value I call it with. Now I want to call the function to calculate the Reliability for a given array, let's say np.linspace(0,24,25). But then I get errors like "TypeError: only length-1 arrays can be converted to Python scalars".
Anyone that could help me being able to pass arrays/vectors on a Python function like that?
Thank you very much in advance.
The math.exp() function you are using knows nothing about numpy. It expects either a scalar, or an iterable with only one element that it can treat as a scalar. Use numpy.exp() instead, which accepts numpy arrays.
To be able to work with numpy arrays you need to use numpy functions:
import numpy as np
def Reliability(MTBF, time):
    return np.exp(-(MTBF ** -1) * time)
If possible you should always use numpy functions instead of math functions, when working with numpy objects.
Not only do they work directly on numpy objects like arrays and matrices, they are also highly optimized, e.g. using vectorization features of the CPU (like SSE). Most functions like exp/sin/cos/pow are available in the numpy module. Some more advanced functions can be found in scipy.
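A quick usage sketch of the vectorized version above (the MTBF value of 500 hours is an assumed example, not from the question):

```python
import numpy as np

def Reliability(MTBF, time):
    # np.exp broadcasts, so `time` can be a scalar or a whole array
    return np.exp(-(MTBF ** -1) * time)

times = np.linspace(0, 24, 25)
R = Reliability(500.0, times)   # MTBF = 500 h, an assumed example value
print(R[0], R[-1])              # R(0) is exactly 1.0, then decays monotonically
```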
Rather than call Reliability on the vector, use list comprehension to call it on each element:
[Reliability(MTBF, test_time) for test_time in np.linspace(0,24,25)]
Or:
map(Reliability, [MTBF]*25, np.linspace(0,24,25))
The second one produces a generator object which may be better for performance if the size of your list starts getting huge.
Say I have a function foo() that takes in a single float and returns a single float. What's the fastest/most pythonic way to apply this function to every element in a numpy matrix or array?
What I essentially need is a version of this code that doesn't use a loop:
import numpy as np
big_matrix = np.matrix(np.ones((1000, 1000)))
for i in xrange(np.shape(big_matrix)[0]):
    for j in xrange(np.shape(big_matrix)[1]):
        big_matrix[i, j] = foo(big_matrix[i, j])
I was trying to find something in the numpy documentation that will allow me to do this but I haven't found anything.
Edit: As I mentioned in the comments, specifically the function I need to work with is the sigmoid function, f(z) = 1 / (1 + exp(-z)).
If foo is really a black box that takes a scalar, and returns a scalar, then you must use some sort of iteration. People often try np.vectorize and realize that, as documented, it does not speed things up much. It is most valuable as a way of broadcasting several inputs. It uses np.frompyfunc, which is slightly faster, but with a less convenient interface.
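A sketch of those two options, using a made-up scalar foo (not the sigmoid from the question):

```python
import numpy as np

def foo(x):
    # a black-box scalar-in/scalar-out function, as in the question
    return x * x + 1.0

vfoo = np.vectorize(foo)          # convenient, but still a Python-level loop
ufoo = np.frompyfunc(foo, 1, 1)   # slightly faster; note it returns object dtype

arr = np.linspace(0.0, 1.0, 5)
print(vfoo(arr))                        # float64 result
print(ufoo(arr).astype(np.float64))     # cast back from object dtype
```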
The proper numpy way is to change your function so it works with arrays. That shouldn't be hard to do with the function in your comments:
f(z) = 1 / (1 + exp(-z))
There's a np.exp function. The rest is simple math.
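Concretely, a minimal sketch of the vectorized sigmoid applied to the question's 1000x1000 matrix (using a plain ndarray rather than np.matrix, which is deprecated):

```python
import numpy as np

def sigmoid(z):
    # f(z) = 1 / (1 + exp(-z)); np.exp applies elementwise, so no Python loop
    return 1.0 / (1.0 + np.exp(-z))

big_matrix = np.ones((1000, 1000))
result = sigmoid(big_matrix)        # one call handles the whole matrix
print(result.shape, result[0, 0])   # sigmoid(1) is about 0.731
```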
I have a python function that employs the numpy package. It uses numpy.sort and numpy.array functions as shown below:
def function(group):
    pre_data = np.sort(np.array(
        [c["data"] for c in group[1]],
        dtype=np.float64
    ))
How can I re-write the sort and array functions using only Python in such a way that I no longer need the numpy package?
It really depends on the code after this. pre_data will be a numpy.ndarray, which means that it has array methods that will be really hard to replicate without numpy. If those methods are being called later in the code, you're going to have a hard time and I'd advise you to just bite the bullet and install numpy. Its popularity is a testament to its usefulness...
However, if you really just want to sort a list of floats and put it into a sequence-like container:
def function(group):
    pre_data = sorted(float(c['data']) for c in group[1])
should do the trick.
Well, it's not strictly possible, because the return type is an ndarray. If you don't mind using a list instead, try this:
pre_data = sorted(float(c["data"]) for c in group[1])
That's not actually using any useful numpy functions anyway
def function(group):
    pre_data = sorted(float(c["data"]) for c in group[1])