python numpy - is there a faster way to convolve?

I have a very large numpy array (1 million integers). I'm using np.convolve to find the "densest" area of that array. By "densest" area I mean the window of a fixed length that has the highest sum. Let me show you in code:
import numpy as np
example = np.array([0,0,0,1,1,1,1,1,1,1,0,1,1,1,1,0,0,0,1,0,0,1,1,0,1,0,0,0,0,0,1,0])
window_size = 10
density = np.convolve(example, np.ones([window_size]), mode='valid')
print(density)
# [7.0, 7.0, 8.0, 9.0, 9.0, 9.0, 8.0, 7.0, 6.0, 6.0, 5.0, 5.0, 5.0, 5.0, 4.0, 4.0, 4.0, 4.0, 4.0, 3.0, 3.0, 4.0, 3.0]
I can then use np.argmax(density) to get the starting index of the highest-density area, which is 3 here.
Anyway, with this example it runs fast, but when convolving over a million-element array with a window size of 10,000 it takes 2 seconds to complete. If I choose a window_size of 500,000 it takes 3 minutes to complete.
Is there a better way to sum over the array with a certain window size to speed this up? If I converted this into a pandas Series, could I perhaps use something there?
Thanks for your help!

Try using scipy.signal.convolve. It has the option to compute the convolution using the fast Fourier transform (FFT), which should be much faster for the array sizes that you mentioned.
Using an array example with length 1000000 and convolving it with an array of length 10000, np.convolve took about 1.45 seconds on my computer, and scipy.signal.convolve took 22.7 milliseconds.
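For example, a minimal sketch of the approach (timings will of course vary by machine; scipy.signal.fftconvolve forces the FFT path, while scipy.signal.convolve chooses a method automatically):
import numpy as np
from scipy import signal

example = np.random.randint(0, 2, size=1_000_000)
window_size = 10_000
# Convolve with a window of ones via the FFT
density = signal.fftconvolve(example, np.ones(window_size), mode='valid')
start = np.argmax(density)  # starting index of the densest window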

A sliding-window sum can also be computed in O(n) with a cumulative sum, independent of the window size:
cumsum = np.cumsum(np.insert(example, 0, 0))  # prepend 0 so the differences align
density2 = cumsum[window_size:] - cumsum[:-window_size]
np.all(density2 == density)
True
(remove the insertion if you can live without the first value...)
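As a usage sketch (reusing example and window_size from the question), the densest window's start index then comes straight from argmax, and the cost stays O(n) no matter how large window_size is:
cumsum = np.cumsum(np.insert(example, 0, 0))
density2 = cumsum[window_size:] - cumsum[:-window_size]
start = np.argmax(density2)  # 3 for the example array above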

This is how you can use NumPy's built-in real FFT functions to convolve in 1 dimension (note: numpy.fft.fftpack_lite is a private module that was removed when NumPy switched its FFT backend to pocketfft, so this only works on older NumPy versions):
import numpy, numpy.fft.fftpack_lite

def fftpack_lite_rfftb(buf, s):
    n = len(buf)
    m = (n - 1) * 2
    temp = numpy.empty(m, buf.dtype)
    numpy.divide(buf, m, temp[:n])  # scale by 1/m; rfftb is unnormalized
    temp[n:m] = 0
    return numpy.fft.fftpack_lite.rfftb(temp[:m], s)

def fftconvolve(x, y):
    xn = x.shape[-1]
    yn = y.shape[-1]
    cn = xn + yn - (xn + yn > 0)  # length of the full convolution (xn + yn - 1 when nonempty)
    m = 1 << cn.bit_length()  # next power of two
    s = numpy.fft.fftpack_lite.rffti(m)  # Initialization; can be factored out for performance
    xpad = numpy.pad(x, [(0, 0)] * (len(x.shape) - 1) + [(0, m - xn)], 'constant')
    a = numpy.fft.fftpack_lite.rfftf(xpad, s)  # Forward transform
    ypad = numpy.pad(y, [(0, 0)] * (len(y.shape) - 1) + [(0, m - yn)], 'constant')
    b = numpy.fft.fftpack_lite.rfftf(ypad, s)  # Forward transform
    numpy.multiply(a, b, b)  # Spectral multiplication
    c = fftpack_lite_rfftb(b, s)  # Backward transform
    return c[:cn]
# Verify convolution is correct
assert (lambda a, b: numpy.allclose(fftconvolve(a, b), numpy.convolve(a, b)))(numpy.random.randn(numpy.random.randint(1, 32)), numpy.random.randn(numpy.random.randint(1, 32)))
Bear in mind that this padding is inefficient for convolving vectors of significantly different sizes (> 100% difference); you'll want to use a technique like overlap-add, which splits the longer input into blocks and performs many smaller convolutions.
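If SciPy is available, scipy.signal.oaconvolve (SciPy >= 1.4) implements overlap-add for you; a minimal sketch with made-up sizes:
import numpy as np
from scipy import signal

x = np.random.randn(1_000_000)
y = np.ones(500_000)
# Overlap-add is generally faster than a single big FFT
# when one input is much larger than the other
c = signal.oaconvolve(x, y, mode='valid')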

Related

Numpy: How can I create a linspace in a 3rd dimension?

As a homework task, I was asked to create an RGB spectrum image using just numpy functions.
This is my current code:
zero = np.dstack([
    np.linspace(0.0, 1.0, self.resolution),
    np.linspace(0.0, 0.0, self.resolution),
    np.linspace(1.0, 0.0, self.resolution)
])
spectrum = np.tile(zero, (self.resolution, 1, 1))
What this produces is a gradient from red to blue. Now, what is left is to linspace the green value along the third dimension.
Does anyone here have some tips on how to do that?
Edit: Let me re-phrase - how can I avoid this loop with numpy?
spectrum = np.tile(zero, (self.resolution, 1, 1))
for i in range(self.resolution):
    spectrum[i, :, 1] = green[i]
Your last for loop is equivalent to:
spectrum[:, :, 1] = np.linspace(0.0, 1.0, resolution)[:, None]
Edit: after playing with your spectrum, this also does the job:
res = np.linspace(0.0, 1.0, resolution)
s = np.meshgrid(res, res)
spectrum = np.stack([s[0], s[1], 1 - s[0]], axis=-1)
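As a quick sanity check of the meshgrid version (a sketch; the resolution value is assumed here for illustration):
import numpy as np

resolution = 256  # assumed value for illustration
res = np.linspace(0.0, 1.0, resolution)
s = np.meshgrid(res, res)
spectrum = np.stack([s[0], s[1], 1 - s[0]], axis=-1)
print(spectrum.shape)    # (256, 256, 3)
print(spectrum[0, 0])    # [0. 0. 1.] -> pure blue corner
print(spectrum[-1, -1])  # [1. 1. 0.] -> red + green corner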

python (reverse) interpolate: assign a tenor-point value to the two closest tenor points

I am looking to do a reverse type of (numpy) interpolation.
Consider the case where I have a 'risk' value of 2.2 that is mapped to a tenor-point value of 1.50.
Consider also that I have a tenor list (or array) = [0.5, 1.0, 2.0, 3.0, 5.0].
Now, I would like to attribute this risk value of 2.2 to the two closest tenor points (in this case 1.0 and 2.0), in the form of a linear interpolation.
In this example, the function should split the risk value of 2.2 (which is mapped to the expiry value of 1.50) as follows:
for the 1.0 tenor point: 2.2 * (1.5 - 1.0)/(2.0 - 1.0)
for the 2.0 tenor point: 2.2 * (2.0 - 1.5)/(2.0 - 1.0)
Is there a numpy/scipy/panda or python code that would do this?
Thanks!
Well, I have attempted a slightly different approach, but maybe this helps you. I interpolate the points for the new grid using interpolate.interp1d, with the option fill_value="extrapolate" to extend the range beyond the given interval. In your first example the new points were always internal; in the comment example they were also external, so I used the more general case. This could still be polished, but it should give an idea:
import numpy as np
from scipy import interpolate
def dist_val(vpt, arr):
    '''returns the indices of the two values in arr closest to vpt'''
    dist = np.abs(arr - np.full_like(arr, vpt))
    i0 = np.argmin(dist)
    dist[i0] = np.max(dist) + 1
    i1 = np.argmin(dist)
    return (i0, i1)

def dstr_lin(ra, tnl, tnh):
    '''returns a risk-array like ra for tnh based on tnl'''
    if len(tnh) < len(tnl) or len(ra) != len(tnl):
        return -1
    rah = []
    for vh in tnh:
        try:
            # exact tenor match: reuse the existing risk value
            rah.append((vh, ra[tnl.index(vh)]))
        except ValueError:
            # no exact match: interpolate (or extrapolate) linearly
            rah.append((vh, float(interpolate.interp1d(tnl, ra, fill_value="extrapolate")(vh))))
    return rah
ra = [0.422, 1.053, 100.423, -99.53]
tn_low = [1.0, 2.0, 5.0, 10.0]
tn_high = [1.0, 2.0, 3.0, 5.0, 7.0, 10.0, 12.0, 15.0]
print(dstr_lin(ra, tn_low, tn_high))
this results in
[(1.0, 0.422), (2.0, 1.053), (3.0, 34.17633333333333), (5.0, 100.423), (7.0, 20.4418), (10.0, -99.53), (12.0, -179.51120000000003), (15.0, -299.483)]
Careful though, I am not sure how "well behaved" your data is; interpolation or extrapolation might swing out of range, so use with care.
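If you do not need extrapolation, np.interp does the same interior interpolation in one call; note that outside the given tenor range it clamps to the endpoint values instead of extrapolating:
import numpy as np

ra = [0.422, 1.053, 100.423, -99.53]
tn_low = [1.0, 2.0, 5.0, 10.0]
tn_high = [1.0, 2.0, 3.0, 5.0, 7.0, 10.0, 12.0, 15.0]
rah = np.interp(tn_high, tn_low, ra)
# internal points match interp1d; 12.0 and 15.0 are clamped to -99.53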

Transformation matrix in Python

I have a scale in a 3D space from -10 to 10 with a step of 2.5 ([-10, -7.5, -5, -2.5, 0, 2.5, 5, 7.5, 10]).
I have a 3D point on this scale, and I want to map it into another 3D space from 0 to 8 with a step of 1 ([0, 1, 2, 3, 4, 5, 6, 7, 8]).
How can I do it?
Thanks for your help
In this Stack Exchange post you can find the mathematical formula for normalization within a range. A Python implementation could be the following:
def normalize(values, curr_bounds, new_bounds):
    return [(x - curr_bounds[0]) * (new_bounds[1] - new_bounds[0]) / (curr_bounds[1] - curr_bounds[0]) + new_bounds[0] for x in values]
Your request is not very clear; if your data needs to be mapped into a discrete range (rounded by step), then you can do this:
def normalize_step(values, curr_bounds, new_bounds, new_step):
    return [round_step(new_step, (x - curr_bounds[0]) * (new_bounds[1] - new_bounds[0]) / (curr_bounds[1] - curr_bounds[0]) + new_bounds[0]) for x in values]
def round_step(step, n):
    return (n // step + 1) * step if n % step >= step / 2 else n // step * step
For example, given the following data:
current_bounds = (-10, 10)
new_bounds = (0, 8)
step = 1
values = [-10, -2.5, 0, 7.5, 2.5]
normalize(values, current_bounds, new_bounds)
# [0.0, 3.0, 4.0, 7.0, 5.0]
normalize_step(values, current_bounds, new_bounds, step)
# [0.0, 3.0, 4.0, 7.0, 5.0]
Note: In this case the result is the same because your step=1; if we set the step=1.5, the results change:
normalize_step(values, current_bounds, new_bounds, 1.5)
# [0.0, 3.0, 4.5, 7.5, 4.5]
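Since the mapping is linear, np.interp gives a compact numpy alternative for the non-rounded case (a sketch using the data above):
import numpy as np

values = [-10, -2.5, 0, 7.5, 2.5]
print(np.interp(values, [-10, 10], [0, 8]))
# [0. 3. 4. 7. 5.]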

The step size problem in the linspace function

import numpy as np
z=np.linspace(10,20,5)
z1=np.linspace(10,20,5,endpoint=False)
print(z)
print(z1)
The first z is printed out:
[ 10. 12.5 15. 17.5 20.]
The Second z1 is printed out:
[ 10. 12. 14. 16. 18.]
My confusion: with z, endpoint=True, the number of equally spaced samples to generate is num=5; that is, five numbers, so there are four steps, and it's easy to calculate that each step is 2.5.
But with z1, endpoint=False, the endpoint 20 is excluded from the sequence, yet the sequence still has 5 numbers and 4 steps. Why is the last number 18, and not 19 or something else?
Your intuition seems to be that if endpoint=False is specified, then the last element of the returned array should be 1 less than the stop value.
Suppose we implement things that way. What does the following linspace call return?
numpy.linspace(0, 0.5, 5, endpoint=False)
Is it going to end at -0.5, counting down? That wouldn't make much sense. (It actually returns [0., 0.1, 0.2, 0.3, 0.4].)
numpy.linspace always divides the interval from start to stop into equally-sized chunks, and it always returns an array of length num. The difference between endpoint=True and endpoint=False is that with endpoint=False, it makes one extra chunk to compensate for leaving out the right endpoint. The step from 16 to 18 is the same size as the step from 18 to the omitted 20.
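One way to see this relationship directly (a small illustrative check):
import numpy as np

a = np.linspace(10, 20, 5, endpoint=False)  # [10. 12. 14. 16. 18.]
b = np.linspace(10, 20, 6)[:-1]             # 6 points including 20, then drop the 20
print(np.array_equal(a, b))                 # True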
It would be helpful to look at the source here. This will allow you to walk through what the function is doing, which will give you a better understanding of the output of linspace.
Two parameters are used to calculate the step size: div and delta. However, the difference between endpoint=True, and endpoint=False, is that div is equal to num-1 if endpoint=True, and num if endpoint=False.
div = (num - 1) if endpoint else num
Here are the other relevant pieces of the source (very slimmed down):
delta = stop - start
y = _nx.arange(0, num, dtype=dt)
# ...
if num > 1:
    step = delta / div
# ...
y = y * step
# ...
y += start
If we walk through each of these the outputs make a lot more sense.
endpoint=True:
div = num - 1 # 4
y = _nx.arange(0, num, dtype=dt) # [0, 1, 2, 3, 4]
delta = 20 - 10 # 10
step = delta / div # 2.5
y = y * step # [0.0, 2.5, 5.0, 7.5, 10.0]
y += start # [10.0, 12.5, 15.0, 17.5, 20.0]
endpoint=False:
div = num # 5
y = _nx.arange(0, num, dtype=dt) # [0, 1, 2, 3, 4]
delta = 20 - 10 # 10
step = delta / div # 2
y = y * step # [0.0, 2.0, 4.0, 6.0, 8.0]
y += start # [10.0, 12.0, 14.0, 16.0, 18.0]
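If you want to see the step linspace actually used, the retstep parameter returns it alongside the array:
import numpy as np

z, step = np.linspace(10, 20, 5, retstep=True)                    # step == 2.5
z1, step1 = np.linspace(10, 20, 5, endpoint=False, retstep=True)  # step1 == 2.0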

python: dividing a number by a numpy array

I have tried the top two solutions from here, as well as the bottom solution since it dealt with numpy, but nothing worked.
I wanted to take 80.0 divided by each element of my array, and name that new array dx.
import numpy
L = 80.0
N = []
for n in range(-4, 10):
    N.append(str(2 ** N))
N = np.array([N])
This is the set up. So what I have tried is:
dx = L / N
dx = map(lambda x: L / x, N)
dx = np.array([dx])
Lastly, I tried keeping N as a list, and also keeping N as a numpy array, and doing
dx = [x / N for x in N]
dx = np.array([dx])
Unfortunately, I haven't been able to find a post that helped or anything in the documentation. What can I do to achieve the desired result?
Your code contains several bugs, and you have a lot of unnecessary casts. But why don't you try this with numpy directly?
Something like this:
import numpy as np
L = 80.0
N = 2 ** np.arange(-4, 10, dtype=np.float64)
dx = L / N
gives you the expected result
array([ 1.28000000e+03, 6.40000000e+02, 3.20000000e+02,
1.60000000e+02, 8.00000000e+01, 4.00000000e+01,
2.00000000e+01, 1.00000000e+01, 5.00000000e+00,
2.50000000e+00, 1.25000000e+00, 6.25000000e-01,
3.12500000e-01, 1.56250000e-01])
Btw. you can also implicitly force the dtype to be float by writing the range limits with dots:
N = 2 ** np.arange(-4., 10.)
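The float dtype matters here: with the default integer dtype, NumPy refuses to raise integers to negative integer powers (a quick illustration):
import numpy as np

2 ** np.arange(-4, 10)    # ValueError: Integers to negative integer powers are not allowed.
2 ** np.arange(-4., 10.)  # float64 array starting at 0.0625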
You could do it in a single line using a list comprehension:
In [8]: N=[80.0/(2**n) for n in range(-4,10)]
In [10]: print(N)
[1280.0, 640.0, 320.0, 160.0, 80.0, 40.0, 20.0, 10.0, 5.0, 2.5, 1.25, 0.625, 0.3125, 0.15625]
You can avoid using Numpy for such tasks.
The for loop equivalent of this would be (without preallocating N):
In [11]: N=[]
In [12]: for n in range(-4,10):
....: N.append(80.0/(2**n))
....:
In [13]: print(N)
[1280.0, 640.0, 320.0, 160.0, 80.0, 40.0, 20.0, 10.0, 5.0, 2.5, 1.25, 0.625, 0.3125, 0.15625]
