How to interpolate list containing arrays?

I would like to interpolate between two lists, where the first contains numbers and the second contains arrays.
I tried using interp1d from scipy, but it did not work:
from scipy import interpolate
r = [2,3,4]
t = [5,6,7]
f = [r,t]
q = [10,20]
c = interpolate.interp1d(q, f)
I would like to get an array for a given value, for example 15, whose entries are interpolated between the r and t arrays.
Error message:
ValueError: x and y arrays must be equal in length along interpolation axis.

In the OP's simple example it makes no difference whether one uses 1D or 2D interpolation. With more than two vectors, however, the results differ. Both options are shown here, using NumPy arrays with an explicit float dtype.
from scipy.interpolate import interp1d
# interp2d was deprecated in SciPy 1.10 and removed in SciPy 1.14,
# so interpolate_my_array2D below needs an older SciPy
from scipy.interpolate import interp2d
import numpy as np
r = np.array( [ 1, 1, 2], float )  # np.float was removed in NumPy 1.24; use the builtin float
s = np.array( [ 2, 3, 4], float )
t = np.array( [ 5, 6, 12], float ) # length of r,s,t,etc must be equal
f = np.array( [ r, s, t ] )
q = np.array( [ 0, 10, 20 ], float ) # length of q is length of f
def interpolate_my_array1D( x, xData, myArray ):
    # interpolate every column of myArray independently along xData
    out = myArray[0].copy()
    n = len( out )
    for i in range(n):
        vec = myArray[ : , i ]
        func = interp1d( xData, vec )
        out[ i ] = func( x )
    return out
def interpolate_my_array2D( x, xData, myArray ):
    # treat (xData value, column index) as scattered 2D points and interpolate over both
    out = myArray[0].copy()
    n = len( out )
    xDataLoc = np.concatenate( [ [xx] * n for xx in xData ] )
    yDataLoc = np.array( list( range( n ) ) * len( xData ), float )
    zDataLoc = np.concatenate( myArray )
    func = interp2d( xDataLoc, yDataLoc, zDataLoc )
    # func returns a length-1 array for scalar inputs, hence the [0]
    out = np.fromiter( ( func( x, yy )[0] for yy in range(n) ), float )
    return out
print( interpolate_my_array1D( 15., q, f ) )
print( interpolate_my_array2D( 15., q, f ) )
giving
>> [3.5 4.5 5.5]
>> [2.85135135 4.17567568 6.05405405]
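The column-by-column idea behind the 1D variant can also be sketched with NumPy alone. This is a minimal sketch that assumes linear interpolation is sufficient; interpolate_columns is a hypothetical helper name, and the data are the r, t and q lists from the question.

```python
import numpy as np

def interpolate_columns(x, x_data, rows):
    """Linearly interpolate each column of `rows` at position x.

    rows[k] is the vector that belongs to x_data[k]; x_data must be increasing.
    """
    rows = np.asarray(rows, dtype=float)
    return np.array([np.interp(x, x_data, rows[:, i]) for i in range(rows.shape[1])])

# Data from the question: r belongs to q = 10, t belongs to q = 20
r = [2, 3, 4]
t = [5, 6, 7]
q = [10, 20]

result = interpolate_columns(15, q, [r, t])
print(result)  # [3.5 4.5 5.5]
```

np.interp only handles 1D data, which is why the helper loops over the columns.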

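See the interp1d entry in the SciPy documentation (scipy.interpolate).
From the docs you can see that x must be a 1-D array and that the length of y along the interpolation axis must equal the length of x. The interpolation axis defaults to the last axis of y. In the question, f has shape (2, 3) while q has length 2, so the lengths do not match along the default axis, which is what the ValueError reports.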
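A minimal sketch of one way to make the question's call work, assuming each row of f belongs to one entry of q: passing axis=0 tells interp1d to interpolate along the rows. The variable name value_at_15 is only illustrative.

```python
import numpy as np
from scipy.interpolate import interp1d

r = [2, 3, 4]
t = [5, 6, 7]
f = np.array([r, t], dtype=float)   # shape (2, 3): one row per entry of q
q = np.array([10, 20], dtype=float)

# axis=0: the rows of f, not its columns, correspond to the values in q
c = interp1d(q, f, axis=0)
value_at_15 = c(15)
print(value_at_15)  # [3.5 4.5 5.5]
```

With the default axis=-1, the same call raises the ValueError quoted in the question, because the last axis of f has length 3.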

Related

Numpy to return average of 3D matrix

I have a 3D NumPy array named stocks that has shape (A, P, T), with A corresponding to the number of stock symbols, P to the number of prices for the stock at a given point in time, and T to the time.
stocks = np.array([ [ [1,2,3],[4,5,6],[7,8,9] ], [ [10,11,12],[13,14,15],[16,17,18] ], [ [19,20,21],[22,23,24],[25,26,27] ], [ [28,29,30],[31,32,33],[34,35,36] ] ])
I would like to return a 2D NumPy array of shape (P, T) where each element is the price averaged over all stocks at a given time.
How would I do this? Thanks in advance!

import numpy as np
# shape (A, P, T)
stocks = np.array([ [ [1,2,3],[4,5,6],[7,8,9] ], [ [10,11,12],[13,14,15],[16,17,18] ], [ [19,20,21],[22,23,24],[25,26,27] ], [ [28,29,30],[31,32,33],[34,35,36] ] ])
# this will give you the mean calculated along the first dimension
# the shape will be (P, T)
out_0 = np.mean(stocks, axis=0)
# this will be of the shape (A, T)
out_1 = np.mean(stocks, axis=1)
# this will be of the shape (A, P)
out_2 = np.mean(stocks, axis=2)
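As a quick check of which axis gets collapsed, here is a small sketch using the same values as the question's array (built with np.arange for brevity; the name avg is illustrative).

```python
import numpy as np

# Same values as the question's stocks array, shape (A, P, T) = (4, 3, 3)
stocks = np.arange(1, 37).reshape(4, 3, 3)

avg = stocks.mean(axis=0)   # average over the A stocks, leaving shape (P, T)
print(avg.shape)            # (3, 3)
print(avg[0, 0])            # (1 + 10 + 19 + 28) / 4 = 14.5
```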

Solving Ax=By without inverting matrices

I need to solve an equation of the form Ax=By for x. I know I shouldn't solve it by inverting B, but I couldn't solve B^-1Ax=y with scipy.gmres or linalg.solve, because inverting B with linalg.inv fails with the error message "Singular matrix".
Is there any other way to invert a matrix? Efficiency is not important, since I need to do it just once. I don't want to solve the equation in two steps, first for T=Ax and then for x.

I'd second the suggestion of user hilberts-drinking-problem: do not invert B, but follow a two-step approach. First multiply B and y to obtain the vector By, then solve the system A·x=By. The following illustrates this approach using small arrays as test data:
from numpy import array
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import gmres
A = coo_matrix((3, 3), dtype=float)
A.setdiag( [ 2, 4, -1 ] )
A.setdiag( [ 2, -0.5 ], 1 )
print( "A", A )
B = coo_matrix((3, 2), dtype=float)
B.setdiag( [ 1, -2 ] )
B.setdiag( [ -0.5, 4 ], -1 )
print( "B", B )
y = array( [ 13, 5 ] )
print( "y", y )
By = B * y
print( "By", By )
x, info = gmres( A, By )  # gmres returns (solution, info); info == 0 means it converged
print( "x", x )
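For small dense systems the same two steps can be sketched with NumPy alone. This is an illustration under the assumption that A is square and nonsingular; it uses the dense equivalents of the test matrices above and np.linalg.solve in place of gmres.

```python
import numpy as np

# Dense equivalents of the sparse test matrices above
A = np.array([[2.0, 2.0,  0.0],
              [0.0, 4.0, -0.5],
              [0.0, 0.0, -1.0]])
B = np.array([[ 1.0,  0.0],
              [-0.5, -2.0],
              [ 0.0,  4.0]])
y = np.array([13.0, 5.0])

By = B @ y                   # step 1: form the right-hand side, no inverse needed
x = np.linalg.solve(A, By)   # step 2: solve A x = By directly
print(x)

# The residual check confirms that x satisfies the original equation
assert np.allclose(A @ x, By)
```

Note that B never has to be inverted here, so it may be singular or even non-square.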

Why does the SciPy dendrogram distance axis scale change with the number of variables?

I have a distributional analysis algorithm for words. It generates observation vectors for each target word, and from this table I use stats.spearmanr() to calculate the distances (rescaled from [-1,1] to [0,1]), generating a distance matrix (Y). Then I use hierarchy.average() to obtain the clustering (Z). Finally, a dendrogram is generated and plotted.
The problem is this: the dendrogram scale varies with the number of target words. I assumed that its distance axis would span the [0,1] range obtained (and rescaled) from spearmanr(), as described above. Instead it is [0, 0.5] for, say, 50 words, [0, 1] for 150, and [0, 2] for 1000.
Why does the distance scale take values larger than those in Y?
I'd appreciate any ideas on this issue, because I can't find any hint in the documentation or on the web (which makes me worried that I am asking the wrong questions). I need a fixed scale, or at least a way to know which scale the dendrogram is using, so that I can specify cut levels. Thanks in advance for any help.
The simplified code:
# coding: utf-8
# Statistics and visualization
import numpy as np
import scipy, random
import scipy.stats
# Clustering and dendrogram visualization
import scipy.cluster.hierarchy as hac
import matplotlib.pyplot as plt
def remap(x, in_min, in_max, out_min, out_max):
    return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min
random.seed('7622')
sizes = [50, 250, 500, 1000, 2000]
for n in sizes:
    # Generate observation matrix
    X = []
    for i in range(n):
        vet = []
        for j in range(300):
            # Generate random observations
            vet.append(random.randint(0, 50))
        X.append(vet)
    # X is a matrix where rows are variables (target words) and columns are observations (contexts of occurrence)
    Y = scipy.stats.spearmanr(X, axis=1)
    # Y rescaling
    for i in range(len(Y[0])):
        Y[0][i] = [ remap(v, -1, 1, 0, 1) for v in Y[0][i] ]
    print('Y [', np.matrix(Y[0]).min(), ',', np.matrix(Y[0]).max(), ']')
    # Clustering
    Z = hac.average(Y[0])
    print('n=', n,
          'Z [', min([ el[2] for el in Z ]), ',', max([ el[2] for el in Z ]), ']')
[UPDATE] Results of the above code:
Y [ 0.401120498124 , 1.0 ]
n= 50 Z [ 0.634408300876 , 0.77633631869 ]
Y [ 0.379375733574 , 1.0 ]
n= 250 Z [ 0.775241869849 , 0.969704246048 ]
Y [ 0.37559031365 , 1.0 ]
n= 500 Z [ 0.935671154717 , 1.16505319575 ]
Y [ 0.370600337649 , 1.0 ]
n= 1000 Z [ 1.19646327361 , 1.47897594053 ]
Y [ 0.359010408057 , 1.0 ]
n= 2000 Z [ 1.56890165007 , 1.96898566034 ]
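No answer is recorded on this page. One possible explanation, offered here as an assumption and not as a confirmed diagnosis: SciPy's linkage functions expect either a condensed (1D) distance vector or a 2D array of observation vectors. A square n×n matrix passed to hac.average() would then be read as n observation vectors of dimension n, and the Euclidean distances between those rows grow with n. The rescaled correlation is also 1 on the diagonal, so it behaves like a similarity. The sketch below assumes a distance of (1 - rho) / 2 and converts the square matrix with squareform before clustering; the data are random and the sizes are kept small for speed.

```python
import numpy as np
import scipy.stats
import scipy.cluster.hierarchy as hac
from scipy.spatial.distance import squareform

rng = np.random.default_rng(7622)

for n in (50, 100):
    # n variables (rows) with 300 random observations each
    X = rng.integers(0, 51, size=(n, 300))
    rho, _ = scipy.stats.spearmanr(X, axis=1)   # (n, n) correlation matrix

    # Assumed distance: 0 for rho = 1, 1 for rho = -1
    D = (1.0 - rho) / 2.0
    D = (D + D.T) / 2.0          # remove tiny asymmetries caused by rounding
    np.fill_diagonal(D, 0.0)     # a variable is at distance 0 from itself

    condensed = squareform(D)    # square matrix -> condensed distance vector
    Z = hac.average(condensed)
    print(n, Z[:, 2].min(), Z[:, 2].max())
```

With average linkage on a condensed vector, every merge height is a mean of pairwise distances, so the heights stay within the range of D regardless of n.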

NumPy indexing with varying position

I have an array input_data of shape (A, B, C), and an array ind of shape (B,). I want to loop through the B axis and take the sum of elements C[B[i]] and C[B[i]+1], that is, input_data[:, i, ind[i]] plus input_data[:, i, ind[i]+1]. The desired output is of shape (A, B). I have the following code, which works, but I feel it is inefficient because of the index-based loop over the B axis. Is there a more efficient method?
import numpy as np
input_data = np.random.rand(2, 6, 10)
ind = [ 2, 3, 5, 6, 5, 4 ]
out = np.zeros( ( input_data.shape[0], input_data.shape[1] ) )
for i in range( len(ind) ):
    d = input_data[:, i, ind[i]:ind[i]+2]
    out[:, i] = np.sum(d, axis = 1)
Edited based on Divakar's answer:
import timeit
import numpy as np
N = 1000
input_data = np.random.rand(10, N, 5000)
ind = ( 4999 * np.random.rand(N) ).astype(int)  # np.int was removed in NumPy 1.24
def test_1(): # Old loop-based method
    out = np.zeros( ( input_data.shape[0], input_data.shape[1] ) )
    for i in range( len(ind) ):
        d = input_data[:, i, ind[i]:ind[i]+2]
        out[:, i] = np.sum(d, axis = 1)
    return out
def test_2(): # Vectorized method using linear indexing
    extent = 2 # Comes from 2 in "ind[i]:ind[i]+2"
    m,n,r = input_data.shape
    idx = (np.arange(n)*r + ind)[:,None] + np.arange(extent)
    out1 = input_data.reshape(m,-1)[:,idx].reshape(m,n,-1).sum(2)
    return out1
print( timeit.timeit(stmt = test_1, number = 1000) )
print( timeit.timeit(stmt = test_2, number = 1000) )
print( np.all( test_1() == test_2(), keepdims = True ) )
>> 7.70429363482
>> 0.392034666757
>> [[ True]]

Here's a vectorized approach using linear indexing with some help from broadcasting. We merge the last two axes of the input array, calculate the linear indices corresponding to the last two axes, perform slicing and reshape back to a 3D shape. Finally, we do summation along the last axis to get the desired output. The implementation would look something like this -
extent = 2 # Comes from 2 in "ind[i]:ind[i]+2"
m,n,r = input_data.shape
idx = (np.arange(n)*r + ind)[:,None] + np.arange(extent)
out1 = input_data.reshape(m,-1)[:,idx].reshape(m,n,-1).sum(2)
If the extent is always going to be 2 as stated in the question - "... sum of elements C[B[i]] and C[B[i]+1]", then you could simply do -
m,n,r = input_data.shape
ind_arr = np.array(ind)
axis1_r = np.arange(n)
out2 = input_data[:,axis1_r,ind_arr] + input_data[:,axis1_r,ind_arr+1]
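To make the linear-indexing step more concrete, here is a small sketch that prints the idx array for the question's ind values and checks one entry against ordinary 3D indexing. The array built with np.arange is an illustrative stand-in for input_data.

```python
import numpy as np

n, r = 6, 10                       # sizes of the last two axes, as in the question
ind = np.array([2, 3, 5, 6, 5, 4])
extent = 2

# After merging the last two axes, row i starts at flat offset i*r, so element
# ind[i] of row i sits at i*r + ind[i]; broadcasting adds the offsets 0..extent-1.
idx = (np.arange(n) * r + ind)[:, None] + np.arange(extent)
print(idx)
# [[ 2  3]
#  [13 14]
#  [25 26]
#  [36 37]
#  [45 46]
#  [54 55]]

# Check one entry against ordinary 3D indexing
input_data = np.arange(2 * n * r).reshape(2, n, r)
flat = input_data.reshape(2, -1)
assert flat[0, idx[1, 0]] == input_data[0, 1, ind[1]]
```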

You could also use integer array indexing combined with basic slicing:
import numpy as np
m,n,r = 2, 6, 10
input_data = np.arange(2*6*10).reshape(m, n, r)
ind = np.array([ 2, 3, 5, 6, 5, 4 ])
out = np.zeros( ( input_data.shape[0], input_data.shape[1] ) )
for i in range( len(ind) ):
    d = input_data[:, i, ind[i]:ind[i]+2]
    out[:, i] = np.sum(d, axis = 1)
out2 = input_data[:, np.arange(n)[:,None], np.add.outer(ind,range(2))].sum(axis=-1)
print(out2)
# array([[ 5, 27, 51, 73, 91, 109],
# [125, 147, 171, 193, 211, 229]])
assert np.allclose(out, out2)
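Another option, not taken from the answers above, is np.take_along_axis (available since NumPy 1.15). The sketch below assumes the same shapes as the question and compares the result with the loop-based reference; the names ref, picked and out3 are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
input_data = rng.random((2, 6, 10))          # shape (A, B, C)
ind = np.array([2, 3, 5, 6, 5, 4])           # shape (B,)

# Loop-based reference from the question
ref = np.zeros(input_data.shape[:2])
for i, start in enumerate(ind):
    ref[:, i] = input_data[:, i, start:start + 2].sum(axis=1)

# For each position along B, the two indices to pick along the C axis
idx = ind[:, None] + np.arange(2)            # shape (B, 2)
# idx[None] has shape (1, B, 2) and broadcasts over the A axis
picked = np.take_along_axis(input_data, idx[None, :, :], axis=2)   # shape (A, B, 2)
out3 = picked.sum(axis=2)                    # shape (A, B)

assert np.allclose(ref, out3)
```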

Using Pylab to create a plot of a line and then getting the rasterized data from the line

I am trying to get the rasterized line data from a pylab plot function. My code is like so:
fitfunc = lambda p, x: p[0] + p[1] * sin(2 * pi * x / data[head[0]].size + p[2])
errfunc = lambda p, x, y: fitfunc(p, x) - y
data = np.genfromtxt(dataFileName, dtype=None, delimiter='\t', names=True)
xAxisSeries =linspace(0., data[head[0]].max(), data[head[0]].size)
p0 = [489., 1000., 9000.] # Initial guess for the parameters
p1, success = optimize.leastsq(errfunc, p0[:], args=(xAxisSeries, data[head[1]]))
time = linspace(xAxisSeries.min(), xAxisSeries.max(), 1000)
plotinfo = plot(time, fitfunc(p1, time), 'r-')
I want to get the x and y line data from plotinfo. "type(plotinfo)" reports that plotinfo is a list, but "print plotinfo" shows that it contains a Line2D object.

import numpy as np
import matplotlib.pyplot as plt
N=4
x=np.linspace(0, 10, N)
y=np.cumsum(np.random.random(N) - 0.5)
line=plt.plot(x,y)[0]
path=line.get_path()  # public accessor for the line's Path
These are the original (x,y) data points:
print(path.vertices)
# [[ 0. 0.08426592]
# [ 3.33333333 0.14204252]
# [ 6.66666667 0.41860647]
# [ 10. 0.22516175]]
Here we (linearly) interpolate to find additional points. You can increase the argument to path.interpolated to find more interpolated points between the original points.
path2=path.interpolated(2)
print(path2.vertices)
# [[ 0. 0.08426592]
# [ 1.66666667 0.11315422]
# [ 3.33333333 0.14204252]
# [ 5. 0.2803245 ]
# [ 6.66666667 0.41860647]
# [ 8.33333333 0.32188411]
# [ 10. 0.22516175]]
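The plotted data can also be read back through Line2D's public accessors. The following is a minimal sketch with made-up sample values; it assumes the Agg backend so that it runs without a display, and plotinfo mirrors the variable name from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 4)
y = np.array([0.0, 1.0, 0.5, 2.0])   # made-up sample values

plotinfo = plt.plot(x, y, 'r-')      # plot() returns a list of Line2D objects
line = plotinfo[0]

x_line, y_line = line.get_data()     # the x and y arrays that were plotted
vertices = line.get_path().vertices  # the same points as an (N, 2) array
print(x_line, y_line)
print(vertices.shape)                # (4, 2)
```

get_xdata() and get_ydata() return the two arrays separately if only one of them is needed.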
