Modifying array elements based on an absolute difference value - python

I have two arrays of the same length as shown below.
import numpy as np
y1 = [12.1, 6.2, 1.4, 0.8, 5.6, 6.8, 8.5]
y2 = [8.2, 5.6, 2.8, 1.4, 2.5, 4.2, 6.4]
y1_a = np.array(y1)
y2_a = np.array(y2)
for i in range(len(y2_a)):
    y3_a[i] = abs(y2_a[i] - y2_a[i])
I am computing the absolute difference at each index/location between the two arrays. I have to replace 'y1_a' with 'y2_a' whenever the absolute difference exceeds 2.0 at a given index/location and write it to a new array variable 'y3_a'. The starter code is added.

First of all, let numpy do the lifting for you. You can calculate your absolute differences without a manual for loop:
abs_diff = np.abs(y2_a - y1_a) # I assume your original code has a typo
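Now you can replace the values where the absolute difference is more than 2.0. Copy y1_a first; otherwise y3_a is just another name for y1_a and the assignment would modify y1_a as well:
mask = abs_diff > 2.0
y3_a = y1_a.copy()
y3_a[mask] = y2_a[mask]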
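The same replacement can also be written in one step with np.where, which builds a new array and leaves both inputs untouched. A minimal sketch using the arrays and the 2.0 threshold from the question:

```python
import numpy as np

y1_a = np.array([12.1, 6.2, 1.4, 0.8, 5.6, 6.8, 8.5])
y2_a = np.array([8.2, 5.6, 2.8, 1.4, 2.5, 4.2, 6.4])

# Take y2_a where the absolute difference exceeds 2.0, otherwise keep y1_a.
y3_a = np.where(np.abs(y1_a - y2_a) > 2.0, y2_a, y1_a)
print(y3_a)
```

Because np.where returns a fresh array, no explicit copy of y1_a is needed here.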


Index values from 2D to 1D array

I have defined a function which searches for the row and column index of the minimum value of a given 2D array (main_array). In this case, the minimum value of main_array is 1.1, so the index should be [0, 2]. I then must use the row index value 0 to index into another given 1D array, A_array, and similarly the column index value 2 to index into another given 1D array, B_array, which is the part I am struggling with.
The following is my code so far:
import numpy as np
main_array = np.array([[3.1, 2.1, 1.1],
[4.1, 1.6, 2.4],
[2.2, 3.2, 3.6],
[1.5, 2.5, 3.5]])
A_array = np.array([3.7, 4.7, 5.7, 6.7])
B_array = np.array([1.5, 1.8, 2.1])
def min_picks(main_array,A_array,B_array):
    min_index = np.argwhere(main_array == np.min(main_array)) #this gives [[0 2]]
    A_pick = A_array[min_index[0]]
    B_pick = B_array[min_index[-1]]
    return A_pick, B_pick
The function should return an expected answer of A_array[0] which is assigned to A_pick, and B_array[2] assigned to B_pick.
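You can use reduce to flatten min_index and simply access what you need from the flattened result. Note that this only works when the minimum occurs exactly once: with several minima, reduce would add the index rows together.
from functools import reduce
def min_picks(main_array,A_array,B_array):
    # argwhere returns [[0 2]]; reducing over its single row yields [0 2]
    min_index = reduce(lambda z, y :z + y, np.argwhere(main_array == np.min(main_array)))
    A_pick = A_array[min_index[0]]
    B_pick = B_array[min_index[1]]
    return A_pick, B_pick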
print(min_picks(main_array, A_array, B_array))
This will give you:
(3.7, 2.1)
Your array index is not correct. Try the following instead:
main_array = np.array([[3.1, 2.1, 1.1],
[4.1, 1.6, 2.4],
[2.2, 3.2, 3.6],
[1.5, 2.5, 3.5]])
A_array = np.array([3.7, 4.7, 5.7, 6.7])
B_array = np.array([1.5, 1.8, 2.1])
def min_picks(main_array,A_array,B_array):
    min_index = np.argwhere(main_array == np.min(main_array)) #this gives [[0 2]]
    A_pick = A_array[min_index[:,0]][0]
    B_pick = B_array[min_index[:,1]][0]
    return A_pick, B_pick
>>> min_picks(main_array,A_array,B_array)
#(3.7, 2.1)
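Another option, sketched here under the assumption that only the first occurrence of the minimum matters, is to combine np.argmin with np.unravel_index. This avoids the equality comparison and returns plain row and column indices directly:

```python
import numpy as np

main_array = np.array([[3.1, 2.1, 1.1],
                       [4.1, 1.6, 2.4],
                       [2.2, 3.2, 3.6],
                       [1.5, 2.5, 3.5]])
A_array = np.array([3.7, 4.7, 5.7, 6.7])
B_array = np.array([1.5, 1.8, 2.1])

def min_picks(main_array, A_array, B_array):
    # argmin gives the position of the first minimum in the flattened array;
    # unravel_index converts that position back to (row, column).
    row, col = np.unravel_index(np.argmin(main_array), main_array.shape)
    return A_array[row], B_array[col]

A_pick, B_pick = min_picks(main_array, A_array, B_array)
print(A_pick, B_pick)
```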

Python: (reverse) interpolation, assigning a tenor-point value to the two closest tenor points

I am looking to do a reverse type of (numpy) interpolation.
Consider the case where I have a 'risk' value of 2.2, and that is mapped to this tenor-point value of 1.50.
Consider that I have a tenor list (or array) = [0.5, 1.0, 2.0, 3.0, 5.0].
Now, I would like to attribute this risk-value of 2.2 to what it would be, as mapped to the closest two tenor-points (in this case 1.0 and 2.0), in the form of a linear interpolation.
In this example, the function would distribute the risk value of 2.2 (which is mapped to the expiry value of 1.50) as follows:
for the 1.0 tenor point : of 2.2 * (1.5 - 1.0)/(2.0 - 1.0)
for the 2.0 tenor point : of 2.2 * (2.0 - 1.5)/(2.0 - 1.0)
Is there numpy/scipy/pandas or plain Python code that would do this?
Well, I have attempted a bit of a different approach, but maybe this helps you. I interpolate the values at the new grid points using interpolate.interp1d, with the option fill_value="extrapolate" so that points outside the given interval are extrapolated. In your first example the new points were always internal; in the example from your comment some were external, so I used the more general case. This could still be polished, but it should give an idea:
import numpy as np
from scipy import interpolate
# helper that finds the indices of the two entries of arr closest to vpt (not used below)
def dist_val(vpt, arr):
    dist = np.abs(arr-np.full_like(arr, vpt))
    i0 = np.argmin(dist)
    dist[i0] = np.max(dist) + 1
    i1 = np.argmin(dist)
    return (i0, i1)
def dstr_lin(ra, tnl, tnh):
    '''returns a risk-array like ra for tnh based on tnl'''
    if len(tnh) < len(tnl) or len(ra) != len(tnl):
        return -1
    rah = []
    for vh in tnh:
        try:
            # reuse the known value if vh is already a tenor point in tnl
            rah.append((vh, ra[tnl.index(vh)]))
        except ValueError:
            # otherwise interpolate (or extrapolate) linearly
            rah.append((vh, float(interpolate.interp1d(tnl, ra, fill_value="extrapolate")(vh))))
    return rah
ra = [0.422, 1.053, 100.423, -99.53]
tn_low = [1.0, 2.0, 5.0, 10.0]
tn_high = [1.0, 2.0, 3.0, 5.0, 7.0, 10.0, 12.0, 15.0]
print(dstr_lin(ra, tn_low, tn_high))
this results in
[(1.0, 0.422), (2.0, 1.053), (3.0, 34.17633333333333), (5.0, 100.423), (7.0, 20.4418), (10.0, -99.53), (12.0, -179.51120000000003), (15.0, -299.483)]
Careful though: I am not sure how "well behaved" your data is, and interpolation or extrapolation might swing out of range, so use with care.
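For the allocation described in the question itself, a minimal sketch might look like the following. The function name split_risk is hypothetical, and the weighting is an assumption: it uses the usual linear-interpolation convention in which the nearer tenor point receives the larger share (for the 1.5 example both shares are equal, so this matches the question's numbers). Tenors outside the grid are assigned entirely to the nearest end point, which is also an assumption.

```python
import numpy as np

def split_risk(risk, tenor, tenors):
    """Distribute `risk` at `tenor` onto the two neighbouring grid tenors."""
    tenors = np.asarray(tenors, dtype=float)  # assumed sorted ascending
    out = np.zeros_like(tenors)
    if tenor <= tenors[0]:       # left of the grid: all risk to the first point
        out[0] = risk
        return out
    if tenor >= tenors[-1]:      # right of the grid: all risk to the last point
        out[-1] = risk
        return out
    hi = np.searchsorted(tenors, tenor, side="right")  # first grid point > tenor
    lo = hi - 1
    w_hi = (tenor - tenors[lo]) / (tenors[hi] - tenors[lo])
    out[lo] = risk * (1.0 - w_hi)
    out[hi] = risk * w_hi
    return out

allocation = split_risk(2.2, 1.5, [0.5, 1.0, 2.0, 3.0, 5.0])
print(allocation)
```

The two shares always add up to the original risk value, so nothing is lost or created by the mapping.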

computing partial area under a curve at specified intervals

I need to integrate the area under a curve, but rather than integrating the entire area under the curve at once, I would like to integrate partial areas at a specified interval of 5m. I.e, I would like to know the area under the curve from 0-5m, 5 - 10m, 10 - 15m, etc.
However, the spacing between my x values is irregular (i.e., it does not go [1, 2, 3, 4...] but rather could go [1, 1.2, 2, 2.3, 3.1, 4...]). So I can't go by index number but rather need to go by values, and I want to create intervals of every 5 meters.
# Here is a sample of the data set (which I do NOT use in the function below, just an example of how irregular the spacing between x values is)
x = [0, 1.0, 2.0, 3.0, 4.3, 5.0, 6.0, 7.0, 8.0, 9.0, 10, 12, 12.5, 12.7, 13, 14.5, 15, 15.5, 16, 16.5]
y = [0, -0.44, -0.83, -0.91, -1.10, -1.16, -1.00, -1.02, -1.05, -1.0, -0.94, -0.89, -1, -1.39, -1.44, -1.88, -1.9, -1.94, -2.03, -1.9]
I've created a function to get the partial area based on one specific interval (5<x<10), but I need to figure out how to do this for the entire dataframe.
from scipy.integrate import simps
def partial_area(y, x):
    x = df.query('5 <= X <= 10')['X']
    y = df.query('5 <= X <= 10')['Z']
    area = simps(y,x)
    return (area)
area = partial_area(y,x)
I'm stuck on the best way to go about this, as I'm not sure how to create intervals by data values rather than index.
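One possible way to build intervals by value is sketched below. It is only an illustration under stated assumptions: the function name binned_areas is hypothetical, x is assumed to be sorted in increasing order, and the trapezoidal rule is swapped in for Simpson's rule to keep the example dependency-free. The y values at the interval edges are obtained with np.interp, so the partial areas add up to the trapezoidal area of the whole curve even when an edge does not coincide with a sample.

```python
import numpy as np

def binned_areas(x, y, width=5.0):
    """Return (lo, hi, area) for consecutive intervals of `width` along x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Interval edges by value, with the last edge clipped to the end of the data.
    edges = np.unique(np.clip(np.arange(x[0], x[-1] + width, width), None, x[-1]))
    areas = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        inside = (x > lo) & (x < hi)  # samples strictly inside the interval
        xs = np.concatenate(([lo], x[inside], [hi]))
        ys = np.concatenate(([np.interp(lo, x, y)], y[inside], [np.interp(hi, x, y)]))
        # Trapezoidal rule on the (possibly irregular) points of this interval.
        area = float(np.sum(np.diff(xs) * (ys[:-1] + ys[1:]) / 2.0))
        areas.append((float(lo), float(hi), area))
    return areas

x = [0, 1.0, 2.0, 3.0, 4.3, 5.0, 6.0, 7.0, 8.0, 9.0, 10, 12, 12.5, 12.7,
     13, 14.5, 15, 15.5, 16, 16.5]
y = [0, -0.44, -0.83, -0.91, -1.10, -1.16, -1.00, -1.02, -1.05, -1.0, -0.94,
     -0.89, -1, -1.39, -1.44, -1.88, -1.9, -1.94, -2.03, -1.9]

areas = binned_areas(x, y, width=5.0)
for lo, hi, area in areas:
    print(f"{lo:5.1f} to {hi:5.1f} m: {area:.4f}")
```

With a DataFrame, the same function could be called as binned_areas(df['X'].to_numpy(), df['Z'].to_numpy()), assuming the column names from the snippet above.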

Python Scipy spearman correlation for matrix does not match two-array correlation nor pandas.DataFrame.corr()

I was computing Spearman correlations for a matrix. I found that the matrix input and the two-array input gave different results when using scipy.stats.spearmanr. The results are also different from pandas.DataFrame.corr.
from scipy.stats import spearmanr # scipy 1.0.1
import pandas as pd # 0.22.0
import numpy as np
X = pd.DataFrame({"A":[-0.4,1,12,78,84,26,0,0], "B":[-0.4,3.3,54,87,25,np.nan,0,1.2], "C":[np.nan,56,78,0,np.nan,143,11,np.nan], "D":[0,-9.3,23,72,np.nan,-2,-0.3,-0.4], "E":[78,np.nan,np.nan,0,-1,-11,1,323]})
matrix_rho_scipy = spearmanr(X,nan_policy='omit',axis=0)[0]
matrix_rho_pandas = X.corr('spearman')
print(matrix_rho_scipy == matrix_rho_pandas.values) # All False except diagonal
print(spearmanr(X['A'],X['B'],nan_policy='omit',axis=0)[0]) # 0.8839285714285714 from scipy 1.0.1
print(spearmanr(X['A'],X['B'],nan_policy='omit',axis=0)[0]) # 0.8829187134416477 from scipy 1.1.0
print(matrix_rho_scipy[0,1]) # 0.8263621207201486
print(matrix_rho_pandas.values[0,1]) # 0.8829187134416477
Later I found Pandas's rho is the same as R's rho.
X = data.frame(A=c(-0.4,1,12,78,84,26,0,0),
B=c(-0.4,3.3,54,87,25,NaN,0,1.2), C=c(NaN,56,78,0,NaN, 143,11,NaN),
D=c(0,-9.3,23,72,NaN,-2,-0.3,-0.4), E=c(78,NaN,NaN,0,-1,-11,1,323))
cor.test(X$A,X$B,method='spearman', exact = FALSE, na.action="na.omit") # 0.8829187
However, Pandas's corr doesn't work with large tables (my case is 16,000).
Thanks to Warren Weckesser's testing, I found the two-array results from Scipy 1.1.0 (but not 1.0.1) are the same results as Pandas and R.
Please let me know if you have any suggestions or comments. Thank you.
I use Python: 3.6.2 (Anaconda); Mac OS: 10.10.5.
It appears that scipy.stats.spearmanr doesn't handle nan values as expected when the input is an array and an axis is given. Here's a script that compares a few methods of computing pairwise Spearman rank-order correlations:
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
x = np.array([[np.nan, 3.0, 4.0, 5.0, 5.1, 6.0, 9.2],
[5.0, np.nan, 4.1, 4.8, 4.9, 5.0, 4.1],
[0.5, 4.0, 7.1, 3.8, 8.0, 5.1, 7.6]])
r = spearmanr(x, nan_policy='omit', axis=1)[0]
print("spearmanr, array: %11.7f %11.7f %11.7f" % (r[0, 1], r[0, 2], r[1, 2]))
r01 = spearmanr(x[0], x[1], nan_policy='omit')[0]
r02 = spearmanr(x[0], x[2], nan_policy='omit')[0]
r12 = spearmanr(x[1], x[2], nan_policy='omit')[0]
print("spearmanr, individual: %11.7f %11.7f %11.7f" % (r01, r02, r12))
df = pd.DataFrame(x.T)
c = df.corr('spearman')
print("Pandas df.corr('spearman'): %11.7f %11.7f %11.7f" % (c[0][1], c[0][2], c[1][2]))
print("R cor.test: 0.2051957 0.4857143 -0.4707919")
print(' (method="spearman", continuity=FALSE)')
# R code:
> x0 = c(NA, 3, 4, 5, 5.1, 6.0, 9.2)
> x1 = c(5.0, NA, 4.1, 4.8, 4.9, 5.0, 4.1)
> x2 = c(0.5, 4.0, 7.1, 3.8, 8.0, 5.1, 7.6)
> cor.test(x0, x1, method="spearman", continuity=FALSE)
> cor.test(x0, x2, method="spearman", continuity=FALSE)
> cor.test(x1, x2, method="spearman", continuity=FALSE)
Output of the Python script:
spearmanr, array: -0.0727393 -0.0714286 -0.4728054
spearmanr, individual: 0.2051957 0.4857143 -0.4707919
Pandas df.corr('spearman'): 0.2051957 0.4857143 -0.4707919
R cor.test: 0.2051957 0.4857143 -0.4707919
(method="spearman", continuity=FALSE)
My suggestion is to not use scipy.stats.spearmanr in the form spearmanr(x, nan_policy='omit', axis=<whatever>). Use the corr() method of the Pandas DataFrame, or use a loop to compute the values pairwise using spearmanr(x0, x1, nan_policy='omit').
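The pairwise loop suggested above could be sketched as follows. The helper name pairwise_spearman is hypothetical, and the NaN pairs are dropped by hand before calling spearmanr, so the result does not depend on how a particular SciPy version implements nan_policy:

```python
import numpy as np
from scipy.stats import spearmanr

def pairwise_spearman(x):
    """Spearman rho for every pair of rows of x, using pairwise-complete data."""
    n = x.shape[0]
    rho = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            # Keep only the positions where both rows have a value.
            ok = ~np.isnan(x[i]) & ~np.isnan(x[j])
            rho[i, j] = rho[j, i] = spearmanr(x[i][ok], x[j][ok])[0]
    return rho

x = np.array([[np.nan, 3.0, 4.0, 5.0, 5.1, 6.0, 9.2],
              [5.0, np.nan, 4.1, 4.8, 4.9, 5.0, 4.1],
              [0.5, 4.0, 7.1, 3.8, 8.0, 5.1, 7.6]])
rho = pairwise_spearman(x)
print(rho)
```

The double loop is quadratic in the number of variables, so for a very wide table it may be slow; it is meant as a correctness baseline rather than a fast implementation.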

Can I use np.arange with lists as my inputs?

The relevant excerpt of my code is as follows:
import numpy as np
def create_function(duration, start, stop):
    rates = np.linspace(start, stop, duration*1000)
    return rates
def generate_spikes(duration, start, stop):
    rates = [create_function(duration, start, stop)]
    array = [np.arange(0, (duration*1000), 1)]
    start_value = [np.repeat(start, duration*1000)]
    double_array = [np.add(array,array)]
    times = np.arange(np.add(start_value,array), np.add(start_value,double_array), rates)
    return times/1000.
I know this is really inefficient coding (especially the start_value and double_array stuff), but it's all a product of trying to somehow use arange with lists as my inputs.
I keep getting this error:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
Essentially, an example of what I'm trying to do is this:
If I had two arrays a = [1, 2, 3, 4] and b = [0.1, 0.2, 0.3, 0.4], I'd want to use np.arange to generate [1.1, 1.2, 1.3, 2.2, 2.4, 2.6, 3.3, 3.6, 3.9, 4.4, 4.8, 5.2]. (I'd be using a different step size for every element in the array.)
Is this even possible? And if so, would I have to flatten my list?
You can use broadcasting there for efficiency purposes -
(a + (b[:,None] * a)).ravel('F')
Sample run -
In [52]: a
Out[52]: array([1, 2, 3, 4])
In [53]: b
Out[53]: array([ 0.1, 0.2, 0.3, 0.4])
In [54]: (a + (b[:,None] * a)).ravel('F')
array([ 1.1, 1.2, 1.3, 1.4, 2.2, 2.4, 2.6, 2.8, 3.3, 3.6, 3.9,
4.2, 4.4, 4.8, 5.2, 5.6])
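Note that the expression above multiplies b by a, which reproduces the example only because each b value happens to be a tenth of the corresponding a value. If the intent is rather "start at a[i] and take a fixed number of steps of size b[i]", a more direct sketch could look like this (n_steps is a name introduced here for illustration, and three steps per element are assumed from the expected output in the question):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([0.1, 0.2, 0.3, 0.4])
n_steps = 3  # assumed from the expected output in the question

# Step multipliers 1..n_steps, broadcast against one start and one step per row.
k = np.arange(1, n_steps + 1)
result = (a[:, None] + b[:, None] * k).ravel()
print(result)
```

Each row of the intermediate 2D array holds the values generated from one (start, step) pair, so ravel() in the default row-major order keeps them grouped by start value.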
Looking at the expected output, it seems you are using just the first three elements of b for the computation. So, to achieve that target, we just slice the first three elements and do that computation, like so -
In [55]: (a + (b[:3,None] * a)).ravel('F')
array([ 1.1, 1.2, 1.3, 2.2, 2.4, 2.6, 3.3, 3.6, 3.9, 4.4, 4.8,
