A question from a complete Python novice.
I have a column array where I need to force certain values to zero, depending on a conditional statement applied to another array. I have found two solutions, both of which give the correct answer, but they are quite time consuming for the larger arrays I typically need (>1E6 elements), and I also suspect they are poor programming technique. The two versions are:
from numpy import zeros, abs, multiply, array, reshape

def testA(y, f, FC1, FC2):
    c = zeros((len(f), 1))
    for n in xrange(len(f)):
        if abs(f[n, 0]) >= FC1 and abs(f[n, 0]) <= FC2:
            c[n, 0] = 1.
    w = multiply(c, y)
    return w

def testB(y, f, FC1, FC2):
    z = [(abs(f[n, 0]) >= FC1 and abs(f[n, 0]) <= FC2) for n in xrange(len(f))]
    z = multiply(array(z, dtype=float).reshape(len(f), 1), y)
    return z
The input arrays are column arrays as this matches the post processing to be done. The test can be done like:
>>> from numpy import arange
>>> from numpy.random import normal as randn
>>> fs, N = 1.E3, 2**22
>>> f = fs/N*arange(N).reshape((N,1))
>>> x = randn(size=(N,1))
>>> w1 = testA(x,f,200.,550.)
>>> z1 = testB(x,f,200.,550.)
On my laptop testA takes 18.7 seconds and testB takes 19.3 - both for N=2**22. In testB I also tried to include "z = [None]*len(f)" to preallocate as suggested in another thread but this doesn't really make any difference.
I have two questions, which I hope to have the same answer:
What is the "correct" Python solution to this problem?
Is there anything I can do to get the answer faster?
I have deliberately not spent any time on compiled Python, for example - I wanted to have some working code first, and hopefully something in good Python style. I hope to be able to get the execution time for N=2**22 below two seconds or so. This particular operation will be used many times, so the execution time does matter.
I apologize in advance if the question is stupid - I haven't been able to find an answer in the overwhelming amount of not always easily accessible Python documentation or in another thread.
Use a boolean array to index into the array y:
def testC(y, f, FC1, FC2):
    f2 = abs(f)
    idx = (f2 >= FC1) & (f2 <= FC2)
    y[~idx] = 0
    return y
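Note that testC zeroes y in place. If you need to keep the input intact, here is a minimal non-destructive variant (my sketch, not part of the answer above), using np.where to build a new array instead:

import numpy as np

def testD(y, f, FC1, FC2):
    # Same boolean mask as testC, but return a fresh array:
    # np.where keeps y where the mask is True and puts 0 elsewhere.
    f2 = np.abs(f)
    idx = (f2 >= FC1) & (f2 <= FC2)
    return np.where(idx, y, 0.)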
All of these are slower than HYRY's solution by a large factor:
How about
( x[1] if FC1<=abs(x[0])<=FC2 else 0 for x in itertools.izip(f,x) )
If you need to do random access (very slow)
[ x[1] if FC1<=abs(x[0])<=FC2 else 0 for x in itertools.izip(f,x) ]
or you can also use map
map(lambda x: x[1] if FC1<=abs(x[0])<=FC2 else 0, itertools.izip(f,x))
or using vectorize (faster than A and B but much much slower than C)
b1v = np.vectorize(lambda a,b: a if 200<=abs(b)<=550 else 0)
b1 = b1v(f,x)
Related
In numpy if I want to compare two arrays, say for example I want to test if all elements in A are less than values in B, I use if (A < B).all():. But in practice this requires allocation and evaluation of complete array C = A < B and then calling C.all() on it. This is a bit of waste. Is there any way to 'shortcut' the comparison, i.e. directly evaluate A < B element by element (without allocation and calculation of temporary C) and stop and return False when first invalid element comparison is found?
The plain Python operators and and or use short-circuit evaluation, but numpy does not.
(A < B).all()
uses numpy building blocks: broadcasting, the element-by-element comparison with <, and the all reduction. The < works just like the other binary operations (plus, times, and, or, gt, le, etc.), and all is like the other reduction methods (any, max, sum, mean); it can operate on the whole array, by rows, or by columns.
It is possible to write a function that combines the all and < into one iteration, but it would be difficult to get the generality that I just described.
But if you must implement an iterative solution, with a shortcut action, and do it fast, I'd suggest developing the idea with nditer, and then compile it with cython.
http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html is a good tutorial on using nditer, and it takes you through using it in cython. nditer takes care of broadcasting and iteration, letting you concentrate on the comparison and any shortcutting.
Here's a sketch of an iterator that could be cast into cython:
import numpy as np

a = np.arange(4)[:, None]
b = np.arange(2, 5)[None, :]
c = np.array(True)

it = np.nditer([a, b, c], flags=['reduce_ok'],
               op_flags=[['readonly'], ['readonly'], ['readwrite']])
for x, y, z in it:
    z[...] = x < y
    if not z:
        print('>', x, y)
        break
    else:
        print(x, y)
print(z)
with a sample run:
1420:~/mypy$ python stack34852272.py
(array(0), array(2))
(array(0), array(3))
(array(0), array(4))
(array(1), array(2))
(array(1), array(3))
(array(1), array(4))
('>', array(2), array(2))
False
Start with a default False, and a different break condition and you get a shortcutting any. Generalizing the test to handle <, <=, etc will be more work.
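For reference, here is a minimal sketch of that shortcutting any (my variant of the snippet above, not part of the original answer): start from a default False and break as soon as one comparison succeeds.

import numpy as np

a = np.arange(4)[:, None]
b = np.arange(2, 5)[None, :]
c = np.array(False)              # default: no element has satisfied the test yet

it = np.nditer([a, b, c], flags=['reduce_ok'],
               op_flags=[['readonly'], ['readonly'], ['readwrite']])
for x, y, z in it:
    z[...] = x < y
    if z:                        # first hit: short-circuit out of the iteration
        break
print(bool(z))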
Get something like this working in Python, and then try it in Cython. If you have trouble with that step, come back with a new question. SO has a good base of Cython users.
How large are your arrays? I would imagine they need to be very large, e.g. A.shape = (1000000,) or larger, before performance becomes an issue. Would you consider using numpy views?
Instead of comparing (A < B).all() or (A < B).any() you can try defining a view, such as (A[:10] < B[:10]).all(). Here's a simple loop that might work:
k = 0
while k*10 < len(A) and (A[k*10: (k+1)*10] < B[k*10: (k+1)*10]).all():
    k += 1
Instead of 10 you can use 100 or 10**3, or whatever segment size you wish. Obviously, if your segment size is 1, you are saying:
k = 0
while A[k] < B[k]:
    k += 1
Sometimes, comparing the entire array can become memory intensive. If A and B have length of 10000 and I need to compare each pair of elements, I am going to run out of space.
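Wrapping the chunking idea above into a helper function, here is a rough sketch (mine, assuming 1-D arrays of equal length; the chunk size is arbitrary):

import numpy as np

def chunked_all_less(A, B, chunk=4096):
    # Compare A < B one chunk at a time and bail out at the first chunk
    # containing a failing element, so a full temporary A < B is never built.
    for start in range(0, len(A), chunk):
        stop = start + chunk
        if not (A[start:stop] < B[start:stop]).all():
            return False
    return True

# Example:
# A = np.random.rand(10**6); B = A + 1.0
# chunked_all_less(A, B)   # True, after scanning every chunk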
I am calculating a trend line slope using numpy:
from numpy import matrix, random

xs = []
ys = []
my_x = 0
for i in range(2000):
    my_x += 1
    ys.append(5*my_x + random.rand())
    xs.append(my_x)

A = matrix(xs).T
b = matrix(ys).T
N = A.T*A
U = A.T*b
print N, U
a = (N.I*U)[0,0]
print a
The result I get is a=-8.2053307679 instead of the expected 5. Probably it happens because the number in variable N is too big.
How can I overcome this problem? Any help will be appreciated.
When I run the code, the answer is as you would expect:
[[2668667000]] [[ 1.33443472e+10]]
5.00037927592
It's probably due to the fact that you're on a 32-bit system, and I'm on a 64-bit system. Instead, you can use
A = matrix(xs, dtype='float64').T;
b = matrix(ys, dtype='float64').T;
Just FYI, when using numpy you'll be much more efficient if you work on vectorizing your algorithms. For example, you could replace the first several lines with this:
xs = np.arange(2000)
ys = 5 * xs + np.random.rand(2000)
Edit - one more thing: numerically, it is a bad idea to explicitly invert matrices when doing computations like these. It would be better to use something like a = np.linalg.solve(N, U)[0, 0] in your algorithm. It won't make a big difference here, but if you move to more complicated problems it definitely will! For some discussion of this, take a look at this article.
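Putting those two suggestions together, here is a sketch of the vectorized version with np.linalg.solve in place of the explicit inverse (my rewrite of the snippet above, not the original code):

import numpy as np

xs = np.arange(1, 2001, dtype='float64')   # 1..2000, matching the loop above
ys = 5 * xs + np.random.rand(2000)

A = xs.reshape(-1, 1)                      # column of x values
N = np.dot(A.T, A)                         # normal equations
U = np.dot(A.T, ys.reshape(-1, 1))

a = np.linalg.solve(N, U)[0, 0]            # no explicit matrix inverse
print(a)                                   # should come out close to 5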
:) The problem was solved by using:
A = matrix(xs,float64).T;
b = matrix(ys,float64).T;
So, in my previous question wflynny gave me a really neat solution (Surface where height is a function of two functions, and a sum over the third). I've got that part working for my simple version, but now I'm trying to improve on this.
Consider the following lambda function:
import numpy as np

x = np.arange(0, 100, 0.1)
y = np.sin(x)
f = lambda xx: (xx - y[x==xx])**2
values = f(x)
Now, in this scenario it works. In fact, the [x=xx] is trivial in the example. However, the example can be extended:
x = np.arange(0, 100, 0.1)
y = np.arange(0, 100, 0.1)
z = np.sin(y)
f = lambda xx, yy: ((xx - z[x==xx])**2 + yy**2)**0.5
[xgrid, ygrid] = np.meshgrid(x, y)
values = f(xgrid, ygrid)
In this case, the error ValueError: boolean index array should have 1 dimension is generated. This is because z.shape is different from xgrid.shape, I think.
Note that here, y = np.sin(x) is a simplification. It's not a function but an array of arbitrary values. We really need to go to that array to retrieve them.
I do not know what the proper way to implement this is. I am going to try some things, but I hope that somebody here will give me hints or provide me with the proper way to do this in Python.
EDIT: I originally thought I had solved it by using the following:
retrieve = lambda pp: map(lambda pp: dataArray[pp==phiArray][0], phi)
However, this merely returns the dataArray. Suppose dataArray contains a number of 'maximum' values for the polar radius. Then, you would normally incorporate this by saying something like g = lambda xx, yy: f(xx,yy) * Heaviside( dataArray - radius(xx,yy)). Then g would properly be zero if the radius is too large.
However, this doesn't work. I'm not fully sure but the behaviour seems to be something like taking a single value of dataArray instead of the entire array.
Thanks!
EDIT: Sadly, this stuff has to work and I can't spend more time on making it nice. Therefore, I've opted for the dirty implementation. The actual thing I was interested in would be of the sort as the g = lambda xx, yy written above, so I can implement that directly (dirty) instead of nicely (without nested for loops).
def envelope(xx, yy):
    value = xx * 0.
    for i in range(0, N):      # N is defined somewhere, and xx.shape = (N, N)
        for j in range(0, N):
            if dataArray[x==xx[i,j]][0] > radius(xx[i,j], yy[i,j]):
                value[i,j] = 1.
            else:
                value[i,j] = 0.
    return value
A last resort, but it works. And, sometimes results matter over writing good code, especially when there's a deadline coming up (and you are the only one that cares about good code).
I would still be very much interested in learning how to do this properly, if there is a proper way, and thus increase my fluency in clean Python.
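For what it's worth, here is one vectorized sketch of the same envelope logic (my own attempt, assuming dataArray is aligned with the 1-D grid x that xgrid was meshed from, so the lookup can be done by position instead of by boolean matching):

import numpy as np

def envelope_vectorized(xx, yy, x, dataArray, radius):
    # Map every value in xx back to its position in the 1-D grid x.
    # This works because xx comes from np.meshgrid(x, y), so each xx[i, j]
    # is exactly an element of x.
    idx = np.searchsorted(x, xx)
    limit = dataArray[idx]                    # per-point 'maximum radius'
    return (limit > radius(xx, yy)).astype(float)

It produces the same 0/1 mask as the nested loops, without Python-level iteration.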
I'm working with some very large arrays. An issue that I'm dealing with of course is running out of RAM to work with, but even before that my code is running slowly so that, even if I had infinite RAM, it would still take way too long. I'll give a bit of my code to show what I'm trying to do:
import numpy as np

# samplez is a 3 million element 1-D array
# zfit is a 10,000 x 500 2-D array
b = np.arange(len(zfit))
for x in samplez:
    a = x - zfit
    mask = np.ma.masked_array(a)
    mask[a <= 0] = np.ma.masked
    index = mask.argmin(axis=1)
    # These past 4 lines give me an index array of the smallest positive number
    # in x - zfit
    d = zfit[b, index]
    e = zfit[b, index+1]
    f = (x-d)/(e-d)
    # f is the calculation I am after
    if x == samplez[0]:
        g = f
        index_stack = index
    else:
        g = np.vstack((g, f))
        index_stack = np.vstack((index_stack, index))
I need to use g and index_stack, each of which are 3million x 10,000 2-D arrays, in a further calculation. Each iteration of this loop takes almost 1 second, so 3 million seconds total, which is way too long.
Is there anything I can do so that this calculation will run much faster? I've tried to think how I can do without this for loop, but the only way I can imagine is making 3 million copies of zfit, which is unfeasible.
And is there someway I can work with these arrays by not keeping everything in RAM? I'm a beginner and everything I've searched about this is either irrelevant or something I can't understand. Thanks in advance.
It is good to know that the smallest positive number will never show up at the end of the rows.
In samplez there are 1 million unique values, but in zfit each row can only have 500 unique values at most; the entire zfit can have as many as 50 million unique values. The algorithm can be greatly sped up if the number of 'find the smallest positive number > each_element_in_samplez' calculations can be greatly reduced. Doing all 5e13 comparisons is probably overkill, and careful planning will be able to get rid of a large proportion of them. That will depend heavily on your actual underlying mathematics.
Without knowing it, there are still some small things that can be done. First, there are not that many possible values of (e-d), so that can be taken out of the loop. Second, the loop can be eliminated by map. These two small fixes, on my machine, result in about a 22% speed-up.
def function_map(samplez, zfit):
    diff = zfit[:,:-1] - zfit[:,1:]
    def _fuc1(x):
        a = x - zfit
        mask = np.ma.masked_array(a)
        mask[a <= 0] = np.ma.masked
        index = mask.argmin(axis=1)
        d = zfit[:,index]
        f = (x-d)/diff[:,index]  # constrain: smallest value never at the very end.
        return (index, f)
    result = map(_fuc1, samplez)
    return (np.array([item[1] for item in result]),
            np.array([item[0] for item in result]))
Next: masked_array can be avoided completely (which should bring significant improvement). samplez needs to be sorted as well.
>>> x1 = arange(50)
>>> x2 = random.random(size=(20, 10))*120
>>> x2 = sort(x2, axis=1)  # just to make sure the last elements of each col > largest val in x1
>>> x3 = x2*1
>>> f1 = lambda: function_map2(x1, x3)
>>> f0 = lambda: function_map(x1, x2)
>>> def function_map2(samplez, zfit):
...     _diff = diff(zfit, axis=1)
...     _zfit = zfit*1
...     def _fuc1(x):
...         _zfit[_zfit < x] = +inf
...         index = nanargmin(_zfit, axis=1)
...         d = zfit[:,index]
...         f = (x-d)/_diff[:,index]  # constrain: smallest value never at the very end.
...         return (index, f)
...     result = map(_fuc1, samplez)
...     return (np.array([item[1] for item in result]),
...             np.array([item[0] for item in result]))
>>> import timeit
>>> t1=timeit.Timer('f1()', 'from __main__ import f1')
>>> t0=timeit.Timer('f0()', 'from __main__ import f0')
>>> t0.timeit(5)
0.09083795547485352
>>> t1.timeit(5)
0.05301499366760254
>>> t0.timeit(50)
0.8838210105895996
>>> t1.timeit(50)
0.5063929557800293
>>> t0.timeit(500)
8.900799036026001
>>> t1.timeit(500)
4.614129018783569
So, that is another 50% speed-up.
masked_array is avoided and that saves some RAM. I can't think of anything else to reduce RAM usage; it may be necessary to process samplez in parts. Also, depending on the data and the required precision, if you can use float16 or float32 instead of the default float64, that can save you a lot of RAM.
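A minimal sketch of those two ideas combined, chunked processing plus a smaller dtype (the chunk size and the concatenation of per-chunk results are my assumptions, not part of the answer above):

import numpy as np

def process_in_chunks(samplez, zfit, chunk=10000):
    # Downcast once; float32 halves the memory of the default float64.
    samplez = np.asarray(samplez, dtype=np.float32)
    zfit = np.asarray(zfit, dtype=np.float32)
    g_parts, idx_parts = [], []
    for start in range(0, len(samplez), chunk):
        part = samplez[start:start + chunk]
        g_part, idx_part = function_map2(part, zfit.copy())   # reuse the function above
        g_parts.append(g_part)
        idx_parts.append(idx_part)
    # Each chunk's result could also be written to disk (e.g. np.save)
    # instead of being accumulated in memory.
    return np.concatenate(g_parts), np.concatenate(idx_parts)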
I have a convolution integral of the type
res(t) = integral from 0 to t of J(t - tau) * dF(tau)/dtau dtau
To solve this integral numerically, I would like to use numpy.convolve(). Now, as you can see in the online help, the convolution is formally done from -infinity to +infinity meaning that the arrays are moved along each other completely for evaluation - which is not what I need. I obviously need to be sure to pick the correct part of the convolution - can you confirm that this is the right way to do it or alternatively tell me how to do it right and (maybe even more important) why?
res = np.convolve(J_t, dF, mode="full")[:len(dF)]
J_t is an analytical function and I can evaluate as many points as I need; dF are derivatives of measurement data. For this attempt I chose len(J_t) = len(dF) because, from my understanding, I do not need more.
Thank you for your thoughts, as always, I appreciate your help!
Background information (for those who might be interested)
These type of integrals can be used to evaluate viscoelastic behaviour of bodies (or the response of an electric circuit during change of voltage, if you feel more familiar on this topic). For viscoelasticity, J(t) is the creep compliance function and F(t) can be the deviatoric strains over time, then this integral would yield the deviatoric stresses.
If you now e.g. have a J(t) of the form:
J_t = lambda p, t: p[0] + p[1]*N.exp(-t/p[2])
with p = [J_elastic, J_viscous, tau] this would be the "famous" standard linear solid. The integral limits are the start of the measurement t_0 = 0 and the moment of interest, t.
To get it right, I have chosen the following two functions:
a(t) = t
b(t) = t**2
It is easy to do the math and find that their "convolution", as defined in your case, takes on the values:
c(t) = t**4 / 12
So let's try them out:
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> delta = 0.001
>>> t = np.arange(1000) * delta
>>> a = t
>>> b = t**2
>>> c = np.convolve(a, b) * delta
>>> d = t**4 / 12
>>> plt.plot(np.arange(len(c)) * delta, c)
[<matplotlib.lines.Line2D object at 0x00000000025C37B8>]
>>> plt.plot(t[::50], d[::50], 'o')
[<matplotlib.lines.Line2D object at 0x000000000637AB38>]
>>> plt.show()
So by doing the above, if both your a and b have n elements, you get the right convolution values in the first n elements of c.
Not sure if the following explanation will make any sense, but here it goes. Think of convolution as mirroring one of the functions along the y-axis, then sliding it along the x-axis and computing the integral of the product at each point. Outside of their area of definition, numpy treats the functions as if they were padded with zeros. So you are effectively integrating from 0 to t: the first function is zero below zero, and the second is zero above t, since it originally was zero below zero but has been mirrored and shifted t to the right.
I was tackling this same problem and solved it using a highly inefficient but functionally correct algorithm:
import numpy as np
from scipy.integrate import simps

def Jfunk(inz, t):
    c0 = inz[0]
    c1 = inz[1]
    c2 = inz[2]
    J = c0 - c1*np.exp(-t/c2)
    return J

def SLS_funk(inz, t, dl_dt):
    boltz_int = np.empty(shape=(0,))
    for i, v in enumerate(t, start=1):
        t_int = t[0:i]
        Jarg = v - t[0:i]
        J_int = Jfunk(inz, Jarg)
        dl_dt_int = dl_dt[0:i]
        inter_grand = np.multiply(J_int, dl_dt_int)
        boltz_int = np.append(boltz_int, simps(inter_grand, x=t_int))
    return boltz_int
Thanks to this question and its answers, I was able to implement a much better solution based on the numpy convolution function suggested above. In case the OP was curious I did a time comparison of the two methods.
For an SLS (three parameter J function) with 20,000 time points:
Using Numpy convolution: ~0.1 seconds
Using Brute Force method: ~7.2 seconds
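For completeness, a sketch of what that convolution-based version might look like (a reconstruction from the answers above, not the poster's actual code; it assumes a uniform time step and reuses Jfunk and dl_dt from the snippet above):

import numpy as np

def SLS_conv(inz, t, dl_dt):
    # Boltzmann superposition via np.convolve: sample J on the time grid,
    # convolve with the strain-rate samples, keep the first len(t) values,
    # and scale by the (uniform) time step to approximate the integral.
    dt = t[1] - t[0]
    J_t = Jfunk(inz, t)
    return np.convolve(J_t, dl_dt, mode='full')[:len(t)] * dt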
If it helps to get a feeling for the alignment, try convolving a pair of impulses. With matplotlib (using ipython --pylab):
In [1]: a = numpy.zeros(20)
In [2]: b = numpy.zeros(20)
In [3]: a[0] = 1
In [4]: b[0] = 1
In [5]: c = numpy.convolve(a, b, mode='full')
In [6]: plot(c)
You can see from the resultant plot that the first sample in c corresponds to the first position of overlap. In this case, only the first samples of a and b overlap. All the rest are floating in undefined space. numpy.convolve effectively replaces this undefined space with zeros, which you can see if you set a second non-zero value:
In [9]: b[1] = 1
In [10]: plot(numpy.convolve(a, b, mode='full'))
In this case, the first value of the plot is 1, as before (showing that the second value of b is not contributing at all).
I have been struggling with a similar question for the past 2 days.
The OP may have moved on, but I am still presenting my analysis here.
The following two sources helped me:
Discussion on stackoverflow
These notes
I will consider time-series data A and B defined on the same time grid, starting from time t = 0 with a uniform step dt.
Their (continuous) convolution is
(A * B)(t) = integral from -infinity to +infinity of A(tau) * B(t - tau) dtau
Substituting t = k*dt and tau = i*dt in the above equation, we get (apart from the factor dt) what np.convolve(A,B) returns:
np.convolve(A,B)[k] = sum over all i of A[i] * B[k - i]
What you want is
integral from 0 to t of A(tau) * B(t - tau) dtau
Again making the same substitution, we get approximately
dt * sum from i = 0 to k of A[i] * B[k - i]
which is the same as above because A for negative indices is extrapolated to zero and B[k - i] is zero for i > k.
If you look at the notes cited above, you can figure out that np.convolve(A,B)[0] corresponds to time t = 0 for our time series.
The next value in the list will correspond to t = dt, and so on.
Therefore, the correct answer
integral from 0 to t of A(tau) * B(t - tau) dtau
is equal to dt * np.convolve(A,B)[0:M], where M = len(A) = len(B).
Here keep in mind that M*dt = T, where T is the last element of the time array.
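A quick numerical check of that conclusion, using a pair with a known closed form (the sin/cos example is mine, not from the sources above; the integral of sin(tau)*cos(t - tau) from 0 to t equals 0.5*t*sin(t)):

import numpy as np

dt = 0.001
t = np.arange(0, 10, dt)
A = np.sin(t)
B = np.cos(t)

M = len(t)
approx = dt * np.convolve(A, B)[:M]    # the prescription derived above
exact = 0.5 * t * np.sin(t)            # closed-form result of the integral

print(np.max(np.abs(approx - exact)))  # small, on the order of dt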
Disclaimer: I am not a programmer, mathematician or an engineer. I had to use convolution somewhere and have derived these conclusions from my own struggle with the problem. I will be happy to cite any book which has this analysis if someone can point it out.