I have defined a function which searches for the row and column index of the minimum value of a given 2D array (main_array). In this case, the minimum value of main_array is 1.1, so the index should be [0, 2]. I then need to use the row index value 0 to index into a given 1D array A_array, and similarly the column index value 2 to index into another given 1D array B_array, which is the part I am struggling with.
The following is my code so far:
import numpy as np

main_array = np.array([[3.1, 2.1, 1.1],
                       [4.1, 1.6, 2.4],
                       [2.2, 3.2, 3.6],
                       [1.5, 2.5, 3.5]])
A_array = np.array([3.7, 4.7, 5.7, 6.7])
B_array = np.array([1.5, 1.8, 2.1])

def min_picks(main_array, A_array, B_array):
    min_index = np.argwhere(main_array == np.min(main_array))  # this gives [[0 2]]
    A_pick = A_array[min_index[0]]
    B_pick = B_array[min_index[-1]]
    return A_pick, B_pick
The function is expected to return A_array[0] (assigned to A_pick) and B_array[2] (assigned to B_pick).
You can use reduce to flatten min_index and simply access what you need from that flattened result.
from functools import reduce

def min_picks(main_array, A_array, B_array):
    min_index = reduce(lambda z, y: z + y, np.argwhere(main_array == np.min(main_array)))
    A_pick = A_array[min_index[0]]
    B_pick = B_array[min_index[1]]
    return A_pick, B_pick

print(min_picks(main_array, A_array, B_array))
This will give you:
(3.7, 2.1)
Your array indexing is not correct. Try the following instead:
main_array = np.array([[3.1, 2.1, 1.1],
                       [4.1, 1.6, 2.4],
                       [2.2, 3.2, 3.6],
                       [1.5, 2.5, 3.5]])
A_array = np.array([3.7, 4.7, 5.7, 6.7])
B_array = np.array([1.5, 1.8, 2.1])

def min_picks(main_array, A_array, B_array):
    min_index = np.argwhere(main_array == np.min(main_array))  # this gives [[0 2]]
    A_pick = A_array[min_index[:, 0]][0]
    B_pick = B_array[min_index[:, 1]][0]
    return A_pick, B_pick
>>> min_picks(main_array,A_array,B_array)
#(3.7, 2.1)
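As a side note (not part of either answer above), a common NumPy idiom for this task is np.unravel_index combined with np.argmin, which converts the flat position of the minimum back into row and column indices directly. A minimal sketch, using the arrays from the question:

import numpy as np

main_array = np.array([[3.1, 2.1, 1.1],
                       [4.1, 1.6, 2.4],
                       [2.2, 3.2, 3.6],
                       [1.5, 2.5, 3.5]])
A_array = np.array([3.7, 4.7, 5.7, 6.7])
B_array = np.array([1.5, 1.8, 2.1])

def min_picks(main_array, A_array, B_array):
    # argmin on the flattened array, unravelled back to (row, col)
    row, col = np.unravel_index(np.argmin(main_array), main_array.shape)
    return A_array[row], B_array[col]

print(min_picks(main_array, A_array, B_array))  # -> (3.7, 2.1)

Note this picks a single minimum; np.argwhere, as in the answers above, would list all positions if there are ties.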
I am looking to do a reverse type of (numpy) interpolation.
Consider the case where I have a 'risk' value of 2.2 that is mapped to a tenor-point value of 1.50.
Consider also a tenor list (or array) = [0.5, 1.0, 2.0, 3.0, 5.0].
Now, I would like to attribute this risk value of 2.2 to the closest two tenor points (in this case 1.0 and 2.0) in the form of a linear interpolation.
In this example, the function would split the risk value of 2.2 (which is mapped to the tenor value of 1.50) as follows:
for the 1.0 tenor point: 2.2 * (1.5 - 1.0)/(2.0 - 1.0)
for the 2.0 tenor point: 2.2 * (2.0 - 1.5)/(2.0 - 1.0)
Is there a numpy/scipy/panda or python code that would do this?
Thanks!
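A minimal sketch of the attribution described above, assuming the mapped value always falls strictly inside the tenor range, and using exactly the weights written in the question (the helper name attribute_risk and the dict return format are my own choices):

import numpy as np

def attribute_risk(risk, point, tenors):
    """Split `risk` at `point` between the two bracketing tenor points,
    using the weights given in the question."""
    tenors = np.asarray(tenors, dtype=float)
    hi = np.searchsorted(tenors, point)  # index of the upper bracketing tenor
    lo = hi - 1
    t_lo, t_hi = tenors[lo], tenors[hi]
    span = t_hi - t_lo
    # the question assigns the lower tenor the weight (point - t_lo)/span
    return {float(t_lo): risk * (point - t_lo) / span,
            float(t_hi): risk * (t_hi - point) / span}

print(attribute_risk(2.2, 1.5, [0.5, 1.0, 2.0, 3.0, 5.0]))
# {1.0: 1.1, 2.0: 1.1}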
Well, I have attempted a slightly different approach, but maybe this helps you. I interpolate the points for the new grid points using interpolate.interp1d, with the option fill_value="extrapolate" to extend the range beyond the given interval. In your first example the new points were always internal; in the comment example they were also external, so I used the more general case. This could still be polished, but it should give an idea:
import numpy as np
from scipy import interpolate

def dist_val(vpt, arr):
    # indices of the two array values closest to vpt
    dist = np.abs(arr - np.full_like(arr, vpt))
    i0 = np.argmin(dist)
    dist[i0] = np.max(dist) + 1
    i1 = np.argmin(dist)
    return (i0, i1)

def dstr_lin(ra, tnl, tnh):
    '''returns a risk-array like ra for tnh based on tnl'''
    if len(tnh) < len(tnl) or len(ra) != len(tnl):
        return -1
    rah = []
    for vh in tnh:
        try:
            # reuse the known risk value if the tenor already exists in tnl
            rah.append((vh, ra[tnl.index(vh)]))
        except ValueError:
            # otherwise interpolate (or extrapolate) a new value
            rah.append((vh, float(interpolate.interp1d(tnl, ra, fill_value="extrapolate")(vh))))
    return rah

ra = [0.422, 1.053, 100.423, -99.53]
tn_low = [1.0, 2.0, 5.0, 10.0]
tn_high = [1.0, 2.0, 3.0, 5.0, 7.0, 10.0, 12.0, 15.0]
print(dstr_lin(ra, tn_low, tn_high))
This results in:
[(1.0, 0.422), (2.0, 1.053), (3.0, 34.17633333333333), (5.0, 100.423), (7.0, 20.4418), (10.0, -99.53), (12.0, -179.51120000000003), (15.0, -299.483)]
Careful though: I am not sure how "well behaved" your data is; interpolation or extrapolation might swing out of range, so use with care.
I need to integrate the area under a curve, but rather than integrating the entire area under the curve at once, I would like to integrate partial areas at a specified interval of 5 m. I.e., I would like to know the area under the curve from 0-5 m, 5-10 m, 10-15 m, etc.
However, the spacing between my x values is irregular (i.e., it does not go [1, 2, 3, 4, ...] but rather could go [1, 1.2, 2, 2.3, 3.1, 4, ...]), so I can't go by index number but rather need to go by values, and I want to create intervals of every 5 meters.
# Here is a sample of the data set (which I do NOT use in the function below,
# just an example of how irregular the spacing between x values is)
x = [0, 1.0, 2.0, 3.0, 4.3, 5.0, 6.0, 7.0, 8.0, 9.0, 10, 12, 12.5, 12.7, 13, 14.5, 15, 15.5, 16, 16.5]
y = [0, -0.44, -0.83, -0.91, -1.10, -1.16, -1.00, -1.02, -1.05, -1.0, -0.94, -0.89, -1, -1.39, -1.44, -1.88, -1.9, -1.94, -2.03, -1.9]
I've created a function to get the partial area based on one specific interval (5<x<10), but I need to figure out how to do this for the entire dataframe.
from scipy.integrate import simps

def partial_area(y, x):
    x = df.query('5 <= X <= 10')['X']
    y = df.query('5 <= X <= 10')['Z']
    area = simps(y, x)
    return area

area = partial_area(y, x)
I'm stuck on the best way to go about this, as I'm not sure how to create intervals by data values rather than index.
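One minimal sketch of interval selection by value: build 5 m bin edges with np.arange and select the points in each bin with a boolean mask. This uses the sample x/y lists above rather than the DataFrame, and assumes adjacent bins may share their boundary points:

import numpy as np
from scipy.integrate import simps  # named scipy.integrate.simpson in newer SciPy

x = np.array([0, 1.0, 2.0, 3.0, 4.3, 5.0, 6.0, 7.0, 8.0, 9.0, 10,
              12, 12.5, 12.7, 13, 14.5, 15, 15.5, 16, 16.5])
y = np.array([0, -0.44, -0.83, -0.91, -1.10, -1.16, -1.00, -1.02, -1.05,
              -1.0, -0.94, -0.89, -1, -1.39, -1.44, -1.88, -1.9, -1.94, -2.03, -1.9])

# 5 m bin edges spanning the data, built from values rather than indices
edges = np.arange(0, x.max() + 5, 5)
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (x >= lo) & (x <= hi)   # select points by value, not by position
    if mask.sum() >= 2:            # simps needs at least two samples
        print("%g-%g m: %.3f" % (lo, hi, simps(y[mask], x[mask])))

If a bin edge does not coincide with a sample, the sub-areas will not sum exactly to the total; interpolating a y value at each edge with np.interp before integrating would fix that.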
I was computing Spearman correlations for a matrix. I found that the matrix input and two-array input gave different results when using scipy.stats.spearmanr. The results also differ from pandas.DataFrame.corr.
from scipy.stats import spearmanr  # scipy 1.0.1
import pandas as pd  # 0.22.0
import numpy as np

# Data
X = pd.DataFrame({"A": [-0.4, 1, 12, 78, 84, 26, 0, 0],
                  "B": [-0.4, 3.3, 54, 87, 25, np.nan, 0, 1.2],
                  "C": [np.nan, 56, 78, 0, np.nan, 143, 11, np.nan],
                  "D": [0, -9.3, 23, 72, np.nan, -2, -0.3, -0.4],
                  "E": [78, np.nan, np.nan, 0, -1, -11, 1, 323]})

matrix_rho_scipy = spearmanr(X, nan_policy='omit', axis=0)[0]
matrix_rho_pandas = X.corr('spearman')
print(matrix_rho_scipy == matrix_rho_pandas.values)  # all False except the diagonal

# The same two-array call gives different results under different SciPy versions:
print(spearmanr(X['A'], X['B'], nan_policy='omit', axis=0)[0])  # 0.8839285714285714 with scipy 1.0.1
print(spearmanr(X['A'], X['B'], nan_policy='omit', axis=0)[0])  # 0.8829187134416477 with scipy 1.1.0
print(matrix_rho_scipy[0, 1])          # 0.8263621207201486
print(matrix_rho_pandas.values[0, 1])  # 0.8829187134416477
Later I found Pandas's rho is the same as R's rho.
X = data.frame(A=c(-0.4,1,12,78,84,26,0,0),
               B=c(-0.4,3.3,54,87,25,NaN,0,1.2),
               C=c(NaN,56,78,0,NaN,143,11,NaN),
               D=c(0,-9.3,23,72,NaN,-2,-0.3,-0.4),
               E=c(78,NaN,NaN,0,-1,-11,1,323))
cor.test(X$A, X$B, method='spearman', exact=FALSE, na.action="na.omit")  # 0.8829187
However, Pandas's corr doesn't work well with large tables (my case has 16,000 columns).
Thanks to Warren Weckesser's testing, I found that the two-array results from SciPy 1.1.0 (but not 1.0.1) match those of Pandas and R.
Please let me know if you have any suggestions or comments. Thank you.
I use Python: 3.6.2 (Anaconda); Mac OS: 10.10.5.
It appears that scipy.stats.spearmanr doesn't handle nan values as expected when the input is an array and an axis is given. Here's a script that compares a few methods of computing pairwise Spearman rank-order correlations:
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

x = np.array([[np.nan, 3.0, 4.0, 5.0, 5.1, 6.0, 9.2],
              [5.0, np.nan, 4.1, 4.8, 4.9, 5.0, 4.1],
              [0.5, 4.0, 7.1, 3.8, 8.0, 5.1, 7.6]])

r = spearmanr(x, nan_policy='omit', axis=1)[0]
print("spearmanr, array:           %11.7f %11.7f %11.7f" % (r[0, 1], r[0, 2], r[1, 2]))

r01 = spearmanr(x[0], x[1], nan_policy='omit')[0]
r02 = spearmanr(x[0], x[2], nan_policy='omit')[0]
r12 = spearmanr(x[1], x[2], nan_policy='omit')[0]
print("spearmanr, individual:      %11.7f %11.7f %11.7f" % (r01, r02, r12))

df = pd.DataFrame(x.T)
c = df.corr('spearman')
print("Pandas df.corr('spearman'): %11.7f %11.7f %11.7f" % (c[0][1], c[0][2], c[1][2]))

print("R cor.test:                   0.2051957   0.4857143  -0.4707919")
print('    (method="spearman", continuity=FALSE)')
"""
# R code:
> x0 = c(NA, 3, 4, 5, 5.1, 6.0, 9.2)
> x1 = c(5.0, NA, 4.1, 4.8, 4.9, 5.0, 4.1)
> x2 = c(0.5, 4.0, 7.1, 3.8, 8.0, 5.1, 7.6)
> cor.test(x0, x1, method="spearman", continuity=FALSE)
> cor.test(x0, x2, method="spearman", continuity=FALSE)
> cor.test(x1, x2, method="spearman", continuity=FALSE)
"""
Output:
spearmanr, array:            -0.0727393  -0.0714286  -0.4728054
spearmanr, individual:        0.2051957   0.4857143  -0.4707919
Pandas df.corr('spearman'):   0.2051957   0.4857143  -0.4707919
R cor.test:                   0.2051957   0.4857143  -0.4707919
    (method="spearman", continuity=FALSE)
My suggestion is to not use scipy.stats.spearmanr in the form spearmanr(x, nan_policy='omit', axis=<whatever>). Use the corr() method of the Pandas DataFrame, or use a loop to compute the values pairwise using spearmanr(x0, x1, nan_policy='omit').
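For the loop option, here is a minimal sketch assuming a numeric Pandas DataFrame (the helper name pairwise_spearman is mine, not a library function). Per the comparison above, its output should match df.corr('spearman'):

import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def pairwise_spearman(df):
    """Spearman rho matrix built pair by pair, so each column pair
    drops only its own NaNs."""
    cols = df.columns
    n = len(cols)
    rho = np.eye(n)  # diagonal is 1.0 by definition
    for i in range(n):
        for j in range(i + 1, n):
            r = spearmanr(df[cols[i]], df[cols[j]], nan_policy='omit')[0]
            rho[i, j] = rho[j, i] = r
    return pd.DataFrame(rho, index=cols, columns=cols)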
The relevant excerpt of my code is as follows:
import numpy as np

def create_function(duration, start, stop):
    rates = np.linspace(start, stop, duration*1000)
    return rates

def generate_spikes(duration, start, stop):
    rates = [create_function(duration, start, stop)]
    array = [np.arange(0, (duration*1000), 1)]
    start_value = [np.repeat(start, duration*1000)]
    double_array = [np.add(array, array)]
    times = np.arange(np.add(start_value, array), np.add(start_value, double_array), rates)
    return times/1000.
I know this is really inefficient coding (especially the start_value and double_array stuff), but it's all a product of trying to somehow use arange with lists as my inputs.
I keep getting this error:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
Essentially, an example of what I'm trying to do is this:
If I had two arrays a = [1, 2, 3, 4] and b = [0.1, 0.2, 0.3, 0.4], I'd want to use np.arange to generate [1.1, 1.2, 1.3, 2.2, 2.4, 2.6, 3.3, 3.6, 3.9, 4.4, 4.8, 5.2]. (I'd be using a different step size for every element in the array.)
Is this even possible? And if so, would I have to flatten my list?
You can use broadcasting there for efficiency purposes -
(a + (b[:,None] * a)).ravel('F')
Sample run -
In [52]: a
Out[52]: array([1, 2, 3, 4])
In [53]: b
Out[53]: array([ 0.1, 0.2, 0.3, 0.4])
In [54]: (a + (b[:,None] * a)).ravel('F')
Out[54]:
array([ 1.1, 1.2, 1.3, 1.4, 2.2, 2.4, 2.6, 2.8, 3.3, 3.6, 3.9,
4.2, 4.4, 4.8, 5.2, 5.6])
Looking at the expected output, it seems you are using just the first three elements of b for the computation. So, to achieve that target, we just slice the first three elements and do the computation, like so -
In [55]: (a + (b[:3,None] * a)).ravel('F')
Out[55]:
array([ 1.1, 1.2, 1.3, 2.2, 2.4, 2.6, 3.3, 3.6, 3.9, 4.4, 4.8,
5.2])
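As a usage note, the 'F' (Fortran, column-major) order in ravel is what keeps the three values generated for each element of a contiguous in the output; the default 'C' (row-major) order would interleave them as 1.1, 2.2, 3.3, 4.4, ...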