I have an np.array of 50 elements. For example:
data = np.array([9.22, 9. , 9.01, ..., 7.98, 6.77, 7.3 ])
For each element of the data array, I have an x and y data pair (both of the same length) that I want to interpolate with. For example:
x = np.array([[ 1, 2, 3, 4, 5 ],
...,
[ 1.01, 2.01, 3.02, 4.03, 5.07 ]])
y = np.array([[0. , 1. , 0.95, ..., 0.07, 0.06, 0.06],
...,
[0. , 0.99 , 0.85, ..., 0.03, 0.05, 0.06]])
I want to interpolate each element of data using its corresponding pair of x and y arrays.
I have the following solution using map():
def cubic_spline(i):
    return scipy.interpolate.splev(x=data[i],
                                   tck=scipy.interpolate.splrep(x[i], y[i], k=3))

list(map(cubic_spline, np.arange(len(data))))
But I'm wondering if there is a way to do it directly with scipy and numpy to optimize the execution time. Something like:
scipy.interpolate.splev(x=data,
                        tck=scipy.interpolate.splrep(x, y, k=3))
Any suggestions will be appreciated. Thanks in advance.
If you have a single x array and multiple y arrays, the newer interpolators (make_interp_spline, PchipInterpolator, etc.) support multidimensional y arrays automatically.
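Here is a minimal sketch of that first case (one shared x grid, many y curves evaluated in a single call; the array names below are made up for illustration):
import numpy as np
from scipy.interpolate import make_interp_spline

x_common = np.linspace(1, 5, 5)            # one shared x grid
y_stack = np.random.rand(5, 50)            # 50 curves as columns, sampled on that grid
spl = make_interp_spline(x_common, y_stack, k=3)   # interpolates along axis 0
new_x = np.linspace(1, 5, 20)
new_y = spl(new_x)                         # shape (20, 50): all 50 curves at once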
If you really have a collection of pairs of 1D arrays, x and y, where the x arrays differ, and you want scipy to loop over these datasets, then no, scipy does not support that. You'd need to loop over them manually.
As the title says, I'm trying to get a Markov Clustering Algorithm to work in Python, namely Python 3.7.
Unfortunately, it's not doing much of anything, and it's driving me up the wall trying to fix it.
EDIT: First, I've made the adjustments to the main code to make each column sum to 100, even if it's not perfectly balanced. I'm going to try to account for that in the final answer.
To be clear, the biggest problem is that the numbers spiral out of control, into such easily-understandable numbers as 5.56268465e-309, and I don't know how to convert that into something understandable.
Here's the code so far:
import numpy as np
import math

## How far you'd like your random-walkers to go (bigger number -> more walking)
EXPANSION_POWER = 2
## How tightly clustered you'd like your final picture to be (bigger number -> more clusters)
INFLATION_POWER = 2
ITERATION_COUNT = 10

def normalize(matrix):
    return matrix / np.sum(matrix, axis=0)

def expand(matrix, power):
    return np.linalg.matrix_power(matrix, power)

def inflate(matrix, power):
    for entry in np.nditer(transition_matrix, op_flags=['readwrite']):
        entry[...] = math.pow(entry, power)
    return matrix

def run(matrix):
    #np.fill_diagonal(matrix, 1)
    #print(matrix)
    matrix = normalize(matrix)
    print(matrix)
    for _ in range(ITERATION_COUNT):
        matrix = normalize(inflate(expand(matrix, EXPANSION_POWER), INFLATION_POWER))
    return matrix
transition_matrix = np.array([[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0.5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0.34,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0.33,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0.33,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0.34,0,0,0,0,0,0,0,0,0,0,0,0,0.125,0],
[0,0,0,0.33,0,0,0.5,0,0,0,0,0,0,0,0,0,0.125,1],
[0,0,0,0.33,0,0,0.5,1,1,0,0,0,0,0,0,0,0.125,0],
[0,0,0,0,0.166,0,0,0,0,0,0,0,0,0,0,0,0.125,0],
[0,0,0,0,0.166,0,0,0,0,0.2,0,0,0,0,0,0,0.125,0],
[0,0,0,0,0.167,0,0,0,0,0.2,0.25,0,0,0,0,0,0.125,0],
[0,0,0,0,0.167,0,0,0,0,0.2,0.25,0.5,0,0,0,0,0,0],
[0,0,0,0,0.167,0,0,0,0,0.2,0.25,0.5,0,1,0,0,0.125,0],
[0,0,0,0,0.167,0,0,0,0,0.2,0.25,0,1,0,1,0,0.125,0],
[0,0,0,0,0,0.34,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0.33,0,0,0,0,0,0,0,0,0,0.5,0,0],
[0,0,0,0,0,0.33,0,0,0,0,0,0,0,0,0,0.5,0,0]])
run(transition_matrix)
print(transition_matrix)
This is part of a uni assignment - I need to do this array both weighted and unweighted (though the weighted part can just wait until I've got the bloody thing working at all). Any tips or suggestions?
Your transition matrix is not valid.
>>> transition_matrix.sum(axis=0)
matrix([[1.  , 1.  , 0.99, 0.99, 0.96, 0.99, 1.  , 1.  , 0.  , 1.  ,
         1.  , 1.  , 1.  , 0.  , 0.  , 1.  , 0.88, 1.  ]])
Not only do some of your columns not sum to 1, some of them sum to 0.
This means when you try to normalize your matrix, you will end up with nan because you are dividing by 0.
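As a quick sanity check, you can print the column sums and, if you want the zero columns not to poison the rest of the matrix, divide by a guarded denominator instead (just a sketch; safe_sums is an illustrative name, not something your code needs to use):
col_sums = transition_matrix.sum(axis=0)
print(col_sums)                                    # any 0 here becomes nan after normalize()
safe_sums = np.where(col_sums == 0, 1, col_sums)   # leave all-zero columns unchanged
normalized = transition_matrix / safe_sums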
Lastly, is there a reason why you are using a NumPy matrix instead of just a NumPy array, which is the recommended container for such data? Using NumPy arrays simplifies some of the operations, such as raising each entry to a power, and there are some differences between NumPy matrix and NumPy array that can result in subtle bugs.
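For example, with a plain ndarray the whole inflate loop collapses into a single vectorized call (a sketch of the idea, not a complete fix for the code above):
def inflate(matrix, power):
    # element-wise power on an ndarray -- no nditer loop, no global variable
    return np.power(matrix, power)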
I have a sparse 3D array of values. I am trying to turn each "point" into a fuzzy "sphere", by applying a Gaussian filter to the array.
I would like the original value at the point (x,y,z) to remain the same. I just want to create falloff values around this point... But applying the Gaussian filter changes the original (x,y,z) value as well.
I am currently doing this:
dataCube = scipy.ndimage.filters.gaussian_filter(dataCube, 3, truncate=8)
Is there a way for me to normalize this, or do something so that my original values are still in this new dataCube? I am not necessarily tied to using a Gaussian filter, if that is not the best approach.
You can do this using a convolution with a kernel that has 1 as its central value, and a width smaller than the spacing between your data points.
1-d example:
import numpy as np
import scipy.signal
data = np.array([0,0,0,0,0,5,0,0,0,0,0])
kernel = np.array([0.5,1,0.5])
scipy.signal.convolve(data, kernel, mode="same")
gives
array([ 0. , 0. , 0. , 0. , 2.5, 5. , 2.5, 0. , 0. , 0. , 0. ])
Note that fftconvolve might be much faster for large arrays. You also have to specify what should happen at the boundaries of your array.
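For instance, the FFT-based routine is a drop-in replacement here (a sketch; the result should match convolve up to floating-point error):
import scipy.signal
smoothed = scipy.signal.fftconvolve(data, kernel, mode="same")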
Update: 3-d example
import numpy as np
from scipy import signal
# first build the smoothing kernel
sigma = 1.0 # width of kernel
x = np.arange(-3,4,1) # coordinate arrays -- make sure they contain 0!
y = np.arange(-3,4,1)
z = np.arange(-3,4,1)
xx, yy, zz = np.meshgrid(x,y,z)
kernel = np.exp(-(xx**2 + yy**2 + zz**2)/(2*sigma**2))
# apply to sample data
data = np.zeros((11,11,11))
data[5,5,5] = 5.
filtered = signal.convolve(data, kernel, mode="same")
# check output
print(filtered[:,5,5])
gives
[ 0. 0. 0.05554498 0.67667642 3.0326533 5. 3.0326533
0.67667642 0.05554498 0. 0. ]
Is there an easy way to generate two time-series with a fixed correlation? For instance 0.5.
Does anyone know a solution in R or Python?
Thanks!
This question is quite general, I think; it is not limited to time series. What you are asking for is a 2D random variable with known covariance. With r = 0.5, std1 = 1 and std2 = 2, the off-diagonal covariance is r*std1*std2 = 1, so the covariance matrix is [[1,1],[1,4]]. Therefore, if we assume the data is multivariate normally distributed, we can generate such a random variable:
In [42]:
import numpy as np
val=np.random.multivariate_normal((0,0),[[1,1],[1,4]],1000)
In [43]:
np.corrcoef(val.T)
Out[43]:
array([[ 1. , 0.488883],
[ 0.488883, 1. ]])
In [44]:
np.cov(val.T)
Out[44]:
array([[ 1.03693888, 0.96490767],
[ 0.96490767, 3.75671707]])
In [45]:
val=np.random.multivariate_normal((0,0),[[1,1],[1,4]],10)
In [46]:
np.corrcoef(val.T)
Out[46]:
array([[ 1. , 0.56807297],
[ 0.56807297, 1. ]])
In [48]:
val[:,0]
Out[48]:
array([-0.77425116, 0.35758601, -1.21668939, -0.95127533, -0.5714381 ,
0.87530824, 0.9594394 , 1.30123373, 1.92511929, 0.98070711])
In [49]:
val[:,1]
Out[49]:
array([-1.75698285, 2.24011423, -3.5129411 , -1.33889305, 2.32720257,
0.53750133, 3.23935645, 2.96819425, -0.72551024, 3.0743096 ])
As shown in this example, if your sample size is small, the resulting random variable may deviate considerably from r = 0.5.
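If you draw more samples, the empirical correlation settles much closer to the target; here is a quick check using the same covariance matrix as above (a sketch, with an arbitrary sample size):
import numpy as np

val = np.random.multivariate_normal((0, 0), [[1, 1], [1, 4]], 100000)
series1, series2 = val[:, 0], val[:, 1]        # treat the two columns as the two series
print(np.corrcoef(series1, series2)[0, 1])     # close to 0.5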