I am looking to do a reverse type of (numpy) interpolation.
Consider the case where I have a 'risk' value of 2.2 that is mapped to a tenor-point value of 1.50.
Now consider a tenor list (or array) = [0.5, 1.0, 2.0, 3.0, 5.0].
I would like to attribute this risk value of 2.2 to the closest two tenor points (in this case 1.0 and 2.0) in the form of a linear interpolation.
In this example, the function should split the risk value of 2.2 (which is mapped to the expiry value of 1.50) as:
for the 1.0 tenor point: 2.2 * (1.5 - 1.0) / (2.0 - 1.0)
for the 2.0 tenor point: 2.2 * (2.0 - 1.5) / (2.0 - 1.0)
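Here is a rough sketch of the kind of function I'm imagining (purely illustrative: the helper name and the searchsorted approach are just my guesses, and it assumes the tenor list is sorted with the mapped value strictly inside it):

import numpy as np

def split_risk(risk, expiry, tenors):
    # illustrative: split `risk` between the two tenors bracketing `expiry`,
    # using the weights from the example above
    tenors = np.asarray(tenors)
    hi = np.searchsorted(tenors, expiry)          # first tenor above expiry
    t_lo, t_hi = float(tenors[hi - 1]), float(tenors[hi])
    w = (expiry - t_lo) / (t_hi - t_lo)
    return {t_lo: risk * w, t_hi: risk * (1 - w)}

print(split_risk(2.2, 1.5, [0.5, 1.0, 2.0, 3.0, 5.0]))
# {1.0: 1.1, 2.0: 1.1}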
Is there numpy/scipy/pandas functionality, or more idiomatic Python, that would do this?
Thanks!
Well, I have attempted a bit of a different approach, but maybe it helps you. I interpolate values at the new grid points using interpolate.interp1d, passing fill_value="extrapolate" so the range can be extended beyond the given interval. In your first example the new points were always internal; in the comment example they were also external, so I used the more general case. This could still be polished, but it should give an idea:
import numpy as np
from scipy import interpolate

def dist_val(vpt, arr):
    # indices of the two values in arr closest to vpt (helper; not used below)
    dist = np.abs(arr - np.full_like(arr, vpt))
    i0 = np.argmin(dist)
    dist[i0] = np.max(dist) + 1
    i1 = np.argmin(dist)
    return (i0, i1)

def dstr_lin(ra, tnl, tnh):
    '''returns a risk-array like ra for tnh based on tnl'''
    if len(tnh) < len(tnl) or len(ra) != len(tnl):
        return -1
    rah = []
    for vh in tnh:
        try:
            # tenor already on the low grid: copy its risk value
            rah.append((vh, ra[tnl.index(vh)]))
        except ValueError:
            # new tenor: inter-/extrapolate from the low grid
            rah.append((vh, float(interpolate.interp1d(tnl, ra, fill_value="extrapolate")(vh))))
    return rah
ra = [0.422, 1.053, 100.423, -99.53]
tn_low = [1.0, 2.0, 5.0, 10.0]
tn_high = [1.0, 2.0, 3.0, 5.0, 7.0, 10.0, 12.0, 15.0]
print(dstr_lin(ra, tn_low, tn_high))
this results in
[(1.0, 0.422), (2.0, 1.053), (3.0, 34.17633333333333), (5.0, 100.423), (7.0, 20.4418), (10.0, -99.53), (12.0, -179.51120000000003), (15.0, -299.483)]
Careful though: I am not sure how "well behaved" your data is; interpolation or extrapolation might swing out of range, so use with care.
I need to integrate the area under a curve, but rather than integrating the entire area under the curve at once, I would like to integrate partial areas at a specified interval of 5m. I.e., I would like to know the area under the curve from 0-5m, 5-10m, 10-15m, etc.
However, the spacing between my x values is irregular (i.e., it does not go [1, 2, 3, 4, ...] but could go [1, 1.2, 2, 2.3, 3.1, 4, ...]). So I can't go by index number but rather need to go by values, and I want to create intervals of every 5 meters.
# Here is a sample of the data set (which I do NOT use in the function below, just an example of how irregular the spacing between x values is)
x = [0, 1.0, 2.0, 3.0, 4.3, 5.0, 6.0, 7.0, 8.0, 9.0, 10, 12, 12.5, 12.7, 13, 14.5, 15, 15.5, 16, 16.5]
y = [0, -0.44, -0.83, -0.91, -1.10, -1.16, -1.00, -1.02, -1.05, -1.0, -0.94, -0.89, -1, -1.39, -1.44, -1.88, -1.9, -1.94, -2.03, -1.9]
I've created a function to get the partial area based on one specific interval (5<x<10), but I need to figure out how to do this for the entire dataframe.
from scipy.integrate import simps

def partial_area(y, x):
    x = df.query('5 <= X <= 10')['X']
    y = df.query('5 <= X <= 10')['Z']
    area = simps(y, x)
    return area

area = partial_area(y, x)
I'm stuck on the best way to go about this, as I'm not sure how to create intervals by data values rather than index.
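Here is the rough direction I'm imagining, in case it clarifies what I'm after (a sketch only: it assumes the DataFrame is sorted by 'X' and uses the same 'X'/'Z' columns as above, and the handling of a 5m boundary that falls between two samples is exactly the part I'm unsure about):

import numpy as np
from scipy.integrate import simps  # scipy.integrate.simpson in newer SciPy

def partial_areas(df, width=5.0):
    # bin edges every `width` meters, based on the X values, not the index
    edges = np.arange(df['X'].min(), df['X'].max() + width, width)
    areas = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = df[(df['X'] >= lo) & (df['X'] <= hi)]
        if len(seg) >= 2:  # need at least two points to integrate
            areas[(lo, hi)] = simps(seg['Z'], seg['X'])
    return areas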
Here is what I am trying to do.
Take the list:
list1 = [0,2]
This list has start point 0 and end point 2.
Now, if we were to take the midpoint of this list, the list would become:
list1 = [0,1,2]
Now, if we were to recursively split up the list again (take the midpoints of the midpoints), the list would become:
list1 = [0,.5,1,1.5,2]
I need a function that will generate lists like this, preferably by keeping track of a variable. So, for instance, let's say there is a variable, n, that keeps track of something. When n = 1, the list might be [0,1,2] and when n = 2, the list might be [0,.5,1,1.5,2], and I am going to increment the value of n to keep track of how many times I have divided up the list.
I know you need to use recursion for this, but I'm not sure how to implement it.
Should be something like this:
def recursive(list1, a, b, n):
    """list1 is a list of values, a and b are the start
    and end points of the list, and n is an int representing
    how many times the list needs to be divided"""
    mid = len(list1) // 2
    # stuff
Could someone help me write this function? It's not for homework; it's part of a project I'm working on that involves using mesh analysis to divide up a rectangle into parts.
This is what I have so far:
def recursive(a, b, list1, n):
    w = b - a
    mid = a + w / 2
    left = list1[0:mid]
    right = list1[mid:len(list1) - 1]
    return recursive(a, mid, list1, n) + mid + recursive(mid, b, list1, n)
but I'm not sure how to incorporate n into here.
NOTE: The list1 would initially be [a,b] - I would just manually enter that but I'm sure there's a better way to do it.
You've generated some interesting answers. Here are two more.
My first uses an iterator to avoid slicing the list, and is recursive because that seems like the most natural formulation.
def list_split(orig, n):
    if not n:
        return orig
    else:
        li = iter(orig)
        this = next(li)
        result = [this]
        for nxt in li:
            result.extend([(this + nxt) / 2, nxt])
            this = nxt
        return list_split(result, n - 1)

for i in range(6):
    print(i, list_split([0, 2], i))
This prints
0 [0, 2]
1 [0, 1.0, 2]
2 [0, 0.5, 1.0, 1.5, 2]
3 [0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2]
4 [0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0, 1.125, 1.25, 1.375, 1.5, 1.625, 1.75, 1.875, 2]
5 [0, 0.0625, 0.125, 0.1875, 0.25, 0.3125, 0.375, 0.4375, 0.5, 0.5625, 0.625, 0.6875, 0.75, 0.8125, 0.875, 0.9375, 1.0, 1.0625, 1.125, 1.1875, 1.25, 1.3125, 1.375, 1.4375, 1.5, 1.5625, 1.625, 1.6875, 1.75, 1.8125, 1.875, 1.9375, 2]
My second is based on the observation that recursion isn't necessary if you always start from two elements. Suppose those elements are mn and mx. After N applications of the split operation the list will have 2^N + 1 elements, so the numerical distance between adjacent elements will be (mx - mn) / 2^N.
Given this information it is therefore possible to deterministically compute the elements of the array, or, even easier, to use numpy.linspace like this:
import numpy

def grid(emin, emax, N):
    return numpy.linspace(emin, emax, 2**N + 1)
This appears to give the same answers, and will probably serve you best in the long run.
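For example, the n = 2 case from the question:

>>> grid(0, 2, 2)
array([0. , 0.5, 1. , 1.5, 2. ])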
You can use some arithmetic and slicing to figure out the size of the result, and fill it efficiently with values.
While not required, you can implement a recursive call by wrapping this functionality in a simple helper function, which checks what iteration of splitting you are on, and splits the list further if you are not at your limit.
def expand(a):
    """
    expands a list based on average values between every two values
    """
    o = [0] * ((len(a) * 2) - 1)
    o[::2] = a  # original values land on the even slots
    o[1::2] = [(x + y) / 2 for x, y in zip(a, a[1:])]  # midpoints on the odd slots
    return o

def rec_expand(a, n):
    if n == 0:
        return a
    else:
        return rec_expand(expand(a), n - 1)
In action
>>> rec_expand([0, 2], 2)
[0, 0.5, 1.0, 1.5, 2]
>>> rec_expand([0, 2], 4)
[0,
0.125,
0.25,
0.375,
0.5,
0.625,
0.75,
0.875,
1.0,
1.125,
1.25,
1.375,
1.5,
1.625,
1.75,
1.875,
2]
You could do this with a for loop
import numpy as np

def add_midpoints(orig_list, n):
    for i in range(n):
        new_list = []
        for j in range(len(orig_list) - 1):
            new_list.append(np.mean(orig_list[j:(j + 2)]))
        orig_list = orig_list + new_list
        orig_list.sort()
    return orig_list
add_midpoints([0,2],1)
[0, 1.0, 2]
add_midpoints([0,2],2)
[0, 0.5, 1.0, 1.5, 2]
add_midpoints([0,2],3)
[0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2]
You can also do this totally non-recursively, without repeatedly splitting lists. What we're doing here is just making a binary scale between two numbers, like the markings on most Imperial-system rulers.
def binary_scale(start, stop, level):
    length = stop - start
    scale = 2 ** level
    return [start + i * length / scale for i in range(scale + 1)]
In use:
>>> binary_scale(0, 10, 0)
[0.0, 10.0]
>>> binary_scale(0, 10, 2)
[0.0, 2.5, 5.0, 7.5, 10.0]
>>> binary_scale(10, 0, 1)
[10.0, 5.0, 0.0]
Fun with anti-patterns:
def expand(a, n):
    for _ in range(n):
        a[:-1] = sum(([a[i], (a[i] + a[i + 1]) / 2] for i in range(len(a) - 1)), [])
    return a

print(expand([0, 2], 2))
OUTPUT
% python3 test.py
[0, 0.5, 1.0, 1.5, 2]
%
I have a numpy array that is very large (1 million integers). I'm using np.convolve in order to find the "densest" area of that array. By "densest" area I mean the window of a fixed length that has the highest sum. Let me show you in code:
import numpy as np
example = np.array([0,0,0,1,1,1,1,1,1,1,0,1,1,1,1,0,0,0,1,0,0,1,1,0,1,0,0,0,0,0,1,0])
window_size = 10
density = np.convolve(example, np.ones([window_size]), mode='valid')
print(density)
# [7.0, 7.0, 8.0, 9.0, 9.0, 9.0, 8.0, 7.0, 6.0, 6.0, 5.0, 5.0, 5.0, 5.0, 4.0, 4.0, 4.0, 4.0, 4.0, 3.0, 3.0, 4.0, 3.0]
I can then use np.argmax(density) to get the starting index of the highest-density area, 3 in this case.
Anyway, with this example it runs fast, but when convolving over a million-element array with a window size of 10,000 it takes 2 seconds to complete. If I choose a window_size of 500,000 it takes 3 minutes to complete.
Is there a better way to sum over the array with a certain window size to speed this up? If I converted this into a pandas series instead could I perhaps use something there?
Thanks for your help!
Try using scipy.signal.convolve. It has the option to compute the convolution using the fast Fourier transform (FFT), which should be much faster for the array sizes that you mentioned.
Using an array example with length 1000000 and convolving it with an array of length 10000, np.convolve took about 1.45 seconds on my computer, and scipy.signal.convolve took 22.7 milliseconds.
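A minimal sketch of that approach (fftconvolve always takes the FFT route; scipy.signal.convolve with method='fft' is equivalent here, and method='auto' picks the faster option for you):

import numpy as np
from scipy.signal import fftconvolve

example = np.random.randint(0, 2, 1000000)
window_size = 10000

density = fftconvolve(example, np.ones(window_size), mode='valid')
start = np.argmax(density)  # start index of the densest window

One caveat: the FFT result is floating point, so round it before testing exact equality against a directly computed sum.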
You can also get the same sliding-window sums from a cumulative sum, which is O(n) regardless of the window size:

cumsum = np.cumsum(np.insert(example, 0, 0))
density2 = cumsum[window_size:] - cumsum[:-window_size]
np.all(density2 == density)
True

(remove the insertion if you can live without the first value...)
This is how you can use the built-in NumPy real FFT functions to convolve in 1 dimension:
import numpy, numpy.fft.fftpack_lite

def fftpack_lite_rfftb(buf, s):
    n = len(buf)
    m = (n - 1) * 2
    temp = numpy.empty(m, buf.dtype)
    numpy.divide(buf, m, temp[:n])
    temp[n:m] = 0
    return numpy.fft.fftpack_lite.rfftb(temp[:m], s)

def fftconvolve(x, y):
    xn = x.shape[-1]
    yn = y.shape[-1]
    cn = xn + yn - (xn + yn > 0)
    m = 1 << cn.bit_length()
    s = numpy.fft.fftpack_lite.rffti(m)  # initialization; can be factored out for performance
    xpad = numpy.pad(x, [(0, 0)] * (len(x.shape) - 1) + [(0, m - xn)], 'constant')
    a = numpy.fft.fftpack_lite.rfftf(xpad, s)  # forward transform
    ypad = numpy.pad(y, [(0, 0)] * (len(y.shape) - 1) + [(0, m - yn)], 'constant')
    b = numpy.fft.fftpack_lite.rfftf(ypad, s)  # forward transform
    numpy.multiply(a, b, b)  # spectral multiplication
    c = fftpack_lite_rfftb(b, s)  # backward transform
    return c[:cn]

# Verify the convolution is correct
assert (lambda a, b: numpy.allclose(fftconvolve(a, b), numpy.convolve(a, b)))(
    numpy.random.randn(numpy.random.randint(1, 32)),
    numpy.random.randn(numpy.random.randint(1, 32)))
Bear in mind that this padding is inefficient for convolving vectors of significantly different sizes (> 100%); in that case you'll want a technique like overlap-add, which builds the result from smaller convolutions.
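(As an aside, if overlap-add is what you end up needing, newer SciPy releases already ship an implementation; a quick sketch on hypothetical inputs:)

import numpy as np
from scipy.signal import oaconvolve  # available since SciPy 1.4

x = np.random.randn(1000000)
y = np.random.randn(100)
c = oaconvolve(x, y)  # overlap-add: efficient when the lengths differ a lot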
I'm working in Python and have a list of hourly values for a day. For simplicity let's say there are only 10 hours in a day.
[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
I want to stretch this around the centre-point to 150% to end up with:
[0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
Note this is just an example and I will also need to stretch things by amounts that leave fractional amounts in a given hour. For example stretching to 125% would give:
[0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0, 0.5, 0.0, 0.0]
My first thought for handling the fractional amounts is to multiply the list up by a factor of 10 using np.repeat, apply some method for stretching out the values around the midpoint, then finally split the list into chunks of 10 and take the mean for each hour.
My main issue is the "stretching" part but if the answer also solves the second part so much the better.
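To illustrate just the resample-and-average part of that plan (the stretching step in the middle is the part I'm unsure about):

import numpy as np

hours = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
fine = np.repeat(hours, 10)               # each hour becomes 10 sub-steps
# ... the stretching would happen here, on `fine` ...
back = fine.reshape(-1, 10).mean(axis=1)  # average each 10-chunk back to hours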
I guess you need something like this:
def stretch(xs, coef):
    # compute new distribution
    oldDist = sum(xs[:len(xs) // 2])
    newDist = oldDist * coef

    # generate new list
    def f(x):
        if newDist - x < 0:
            return 0.0
        return min(1.0, newDist - x)

    t = [f(x) for x in range(len(xs) // 2)]
    res = list(reversed(t))
    res.extend(t)
    return res
But be careful with an odd number of hours.
If I look at the expected output, the algorithm goes something like this:
start with a list of numbers, values > 0.0 indicate working hours
sum those hours
compute how many extra hours are requested
divide those extra hours over both ends of the sequence by prepending or appending half of them at each 'end'
So:
hours = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
expansion = 130
extra_hrs = float(sum(hours)) * float(expansion - 100)/100
# find indices of the first and last non-zero hours
# because of floating point can't use "==" for comparison.
hr_idx = [idx for (idx, value) in enumerate(hours) if value>0.001]
# replace the entries before the first and after the last
# with half the extra hours
print "Before expansion:",hours
hours[ hr_idx[0]-1 ] = hours[ hr_idx[-1]+1 ] = extra_hrs/2.0
print "After expansion:",hours
Gives as output:
Before expansion: [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
After expansion: [0.0, 0.0, 0.6, 1.0, 1.0, 1.0, 1.0, 0.6, 0.0, 0.0]
This is what I've ended up doing. It's a little ugly as it needs to handle stretch coefficients less than 100%.
import numpy as np

def stretch(xs, coef, centre):
    """Scale a list by a coefficient around a point in the list.

    Parameters
    ----------
    xs : list
        Input values.
    coef : float
        Coefficient to scale by.
    centre : int
        Position in the list to use as a centre point.

    Returns
    -------
    list
    """
    grain = 100
    # np.repeat needs an integer count; grain = 100 keeps grain * coef
    # whole for coefficients in steps of 0.01
    stretched_array = np.repeat(xs, int(grain * coef))
    if coef < 1:
        # pad start and end
        total_pad_len = grain * len(xs) - len(stretched_array)
        centre_pos = float(centre) / len(xs)
        start_pad_len = centre_pos * total_pad_len
        end_pad_len = (1 - centre_pos) * total_pad_len
        start_pad = [stretched_array[0]] * int(start_pad_len)
        end_pad = [stretched_array[-1]] * int(end_pad_len)
        stretched_array = np.array(start_pad + list(stretched_array) + end_pad)
    else:
        # slice a window of the original length out of the stretched array
        pivot_point = (len(xs) - centre) * grain * coef
        first = int(pivot_point - (len(xs) * grain) / 2)
        last = first + len(xs) * grain
        stretched_array = stretched_array[first:last]
    return [round(chunk.mean(), 2) for chunk in chunks(stretched_array, grain)]
def chunks(iterable, n):
    """
    Yield successive n-sized chunks from iterable.
    Source: http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python#answer-312464
    """
    for i in xrange(0, len(iterable), n):
        yield iterable[i:i + n]
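With the example from the question (centre 5, the midpoint of the 10 slots), this should reproduce both target outputs:

>>> stretch([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0], 1.5, 5)
[0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
>>> stretch([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0], 1.25, 5)
[0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0, 0.5, 0.0, 0.0]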