numpy interpolation to increase a vector size - python

Hi, I have to increase the number of points inside a vector so that the vector reaches a fixed size. For example:
for this simple vector
>>> a = np.array([0, 1, 2, 3, 4, 5])
>>> len(a)
# 6
Now I want to get a vector of size 11, taking the vector a as the base; the result would be:
# array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
EDIT 1
What I need is a function that takes the base vector and the number of values the resulting vector must have, and returns a new vector of that size. Something like:
def enlargeVector(vector, size):
    ...
    return newVector
to use like:
>>> a = np.array([0, 1, 2, 3, 4, 5])
>>> b = enlargeVector(a, 200)
>>> len(b)
# 200
and b contains the result of linear, cubic, or whatever other interpolation method.

There are many methods to do this within scipy.interpolate. My favourite is UnivariateSpline, which produces a degree-k spline that is guaranteed to be k-1 times continuously differentiable.
To use it:
import numpy as np
from scipy.interpolate import UnivariateSpline

old_indices = np.arange(0, len(a))
new_length = 11
new_indices = np.linspace(0, len(a) - 1, new_length)
spl = UnivariateSpline(old_indices, a, k=3, s=0)
new_array = spl(new_indices)
The s is a smoothing factor that you should set to 0 in this case (since the data are exact).
Note that for the problem you have specified (since a just increases monotonically by 1) this is overkill: the np.linspace call alone already gives the desired output.
EDIT: clarified that the length is arbitrary
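To package this as the enlargeVector function requested in the question, a minimal sketch might look like the following (the k and s defaults are assumptions carried over from the snippet above):
import numpy as np
from scipy.interpolate import UnivariateSpline

def enlargeVector(vector, size, k=3, s=0):
    # spread `size` new sample points evenly over the original index range
    old_indices = np.arange(len(vector))
    new_indices = np.linspace(0, len(vector) - 1, size)
    spl = UnivariateSpline(old_indices, vector, k=k, s=s)
    return spl(new_indices)

a = np.array([0, 1, 2, 3, 4, 5])
b = enlargeVector(a, 200)
print(len(b))  # 200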

As AGML pointed out, there are tools to do this, but how about a pure NumPy solution:
In [20]: a = np.arange(6)
In [21]: temp = np.dstack((a[:-1], a[:-1] + np.diff(a) / 2.0)).ravel()
In [22]: temp
Out[22]: array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])
In [23]: np.hstack((temp, [a[-1]]))
Out[23]: array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
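If you want a pure-NumPy route that also handles an arbitrary target length (not just doubling), np.interp does 1-D linear interpolation directly; a small sketch:
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5])
new_length = 11

# interpolate a at `new_length` evenly spaced positions over the old index range
b = np.interp(np.linspace(0, len(a) - 1, new_length), np.arange(len(a)), a)
# array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])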

Related

Calculate means of array with specific elements

I'm implementing the Nearest Centroid Classification algorithm and I'm kind of blocked on how to use numpy.mean in my case.
So suppose I have some spherical datasets X:
[[ 0.39151059 3.48203037]
[-0.68677876 1.45377717]
[ 2.30803493 4.19341503]
[ 0.50395297 2.87076658]
[ 0.06677012 3.23265678]
[-0.24135103 3.78044279]
[-0.05660036 2.37695381]
[ 0.74210998 -3.2654815 ]
[ 0.05815341 -2.41905942]
[ 0.72126958 -1.71081388]
[ 1.03581142 -4.09666955]
[ 0.23209714 -1.86675298]
[-0.49136284 -1.55736028]
[ 0.00654881 -2.22505305]]
and the labeled vector Y:
[0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1.]
An example with 100 2D data points gives two groups of points, plotted in blue (class 0) and red (class 1).
The NCC algorithm consists of first calculating the class mean of each class (0 and 1: that's blue and red) and then, for a new data point, finding the nearest class centroid.
This is my current function:
def mean_ncc(X, Y):
    # find unique classes
    m_cids = np.unique(Y)  # [0. 1.]
    # compute class means
    mu = np.zeros((len(m_cids), X.shape[1]))  # [[0. 0.] [0. 0.]] when Y has 2 unique labels (0 and 1)
    for class_idx, class_label in enumerate(m_cids):
        mu[class_idx, :] =  # problem here
    return mu
So here I want an array containing the class means of the '0' (blue) points and the '1' (red) points.
How can I specify the number of elements of X whose mean I want to calculate?
I would like to do something like this:
for class_idx, class_label in enumerate(m_cids):
    mu[class_idx, :] = np.mean(X[only the elements that have the same class_label], axis=0)
Is it possible or is there another way to implement this?
You could use something like this:
import numpy as np
tags = [0, 0, 1, 1, 0, 1]
values = [5, 4, 2, 5, 9, 8]
tags_np = np.array(tags)
values_np = np.array(values)
print(values_np[tags_np == 1].mean())
EDIT: You will also want to look into the axis parameter of the mean function:
import numpy as np
values = [[5, 4],
          [5, 4],
          [4, 3],
          [4, 3]]
values_np = np.array(values)
tags_np = np.array([0, 0, 1, 1])
print(values_np[tags_np == 0].mean(axis=0))
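Applied to the mean_ncc function from the question, the same boolean-mask idea might look like this (a sketch using the question's variable names):
import numpy as np

def mean_ncc(X, Y):
    # find unique classes
    m_cids = np.unique(Y)
    # one row of class means per unique label
    mu = np.zeros((len(m_cids), X.shape[1]))
    for class_idx, class_label in enumerate(m_cids):
        # boolean mask selects the rows of X whose label equals class_label
        mu[class_idx, :] = X[Y == class_label].mean(axis=0)
    return mu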

Min-max scaling along rows in numpy array

I have a numpy array and I want to rescale values along each row to values between 0 and 1 using the following procedure:
If the maximum value along a given row is X_max and the minimum value along that row is X_min, then the rescaled value (X_rescaled) of a given entry (X) in that row should become:
X_rescaled = (X - X_min)/(X_max - X_min)
As an example, let's consider the following array (arr):
arr = np.array([[1.0,2.0,3.0],[0.1, 5.1, 100.1],[0.01, 20.1, 1000.1]])
print arr
array([[ 1.00000000e+00,  2.00000000e+00,  3.00000000e+00],
       [ 1.00000000e-01,  5.10000000e+00,  1.00100000e+02],
       [ 1.00000000e-02,  2.01000000e+01,  1.00010000e+03]])
Presently, I am trying to use MinMaxScaler from scikit-learn in the following way:
from sklearn.preprocessing import MinMaxScaler
result = MinMaxScaler(arr)
But, I keep getting my initial array, i.e. result turns out to be the same as arr in the aforementioned method. What am I doing wrong?
How can I scale the array arr in the manner that I require (min-max scaling along each row)? Thanks in advance.
MinMaxScaler is a bit clunky to use; sklearn.preprocessing.minmax_scale is more convenient. This operates along columns, so use the transpose:
>>> import numpy as np
>>> from sklearn import preprocessing
>>>
>>> a = np.random.random((3,5))
>>> a
array([[0.80161048, 0.99572497, 0.45944366, 0.17338664, 0.07627295],
       [0.54467986, 0.8059851 , 0.72999058, 0.08819178, 0.31421126],
       [0.51774372, 0.6958269 , 0.62931078, 0.58075685, 0.57161181]])
>>> preprocessing.minmax_scale(a.T).T
array([[0.78888024, 1.        , 0.41673812, 0.10562126, 0.        ],
       [0.63596033, 1.        , 0.89412757, 0.        , 0.314881  ],
       [0.        , 1.        , 0.62648851, 0.35384099, 0.30248836]])
>>>
>>> b = np.array([(4, 1, 5, 3), (0, 1.5, 1, 3)])
>>> preprocessing.minmax_scale(b.T).T
array([[0.75      , 0.        , 1.        , 0.5       ],
       [0.        , 0.5       , 0.33333333, 1.        ]])
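If you would rather skip scikit-learn entirely, the formula from the question can be applied row-wise with plain NumPy broadcasting (a sketch; it assumes no row is constant, which would divide by zero):
import numpy as np

arr = np.array([[1.0, 2.0, 3.0], [0.1, 5.1, 100.1], [0.01, 20.1, 1000.1]])

# X_rescaled = (X - X_min) / (X_max - X_min), computed per row via keepdims broadcasting
row_min = arr.min(axis=1, keepdims=True)
row_max = arr.max(axis=1, keepdims=True)
rescaled = (arr - row_min) / (row_max - row_min)
print(rescaled)  # each row now spans exactly [0, 1]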

Vectorized NumPy linspace across multi-dimensional arrays

Say I have 2 numpy 2D arrays, mins and maxs, that will always have the same shape as one another. I'd like to create a third array, results, where each entry is the linspace between the corresponding min and max values. Is there some "numpy"/vectorized way to do this? Example non-vectorized code is below to show the results I would like.
import numpy as np
mins = np.random.rand(2,2)
maxs = np.random.rand(2,2)
# Number of elements in the linspace
x = 3
m, n = mins.shape
results = np.zeros((m, n, x))
for i in range(m):
    for j in range(n):
        min_val = mins[i][j]
        max_val = maxs[i][j]
        results[i][j] = np.linspace(min_val, max_val, num=x)
Here's one vectorized approach, based on this post, that covers generic n-dim cases -
import numpy as np

def create_ranges_nd(start, stop, N, endpoint=True):
    if endpoint:
        divisor = N - 1
    else:
        divisor = N
    steps = (1.0 / divisor) * (stop - start)
    return start[..., None] + steps[..., None] * np.arange(N)
Sample run -
In [536]: mins = np.array([[3,5],[2,4]])
In [537]: maxs = np.array([[13,16],[11,12]])
In [538]: create_ranges_nd(mins, maxs, 6)
Out[538]:
array([[[ 3. ,  5. ,  7. ,  9. , 11. , 13. ],
        [ 5. ,  7.2,  9.4, 11.6, 13.8, 16. ]],

       [[ 2. ,  3.8,  5.6,  7.4,  9.2, 11. ],
        [ 4. ,  5.6,  7.2,  8.8, 10.4, 12. ]]])
As of Numpy version 1.16.0, non-scalar start and stop are now supported.
So, now you can do this:
import numpy as np

assert np.__version__ >= '1.16.0'  # non-scalar start/stop support arrived in 1.16.0
mins = np.random.rand(2,2)
maxs = np.random.rand(2,2)
# Number of elements in the linspace
x = 3
results = np.linspace(mins, maxs, num=x)
# And, if required
results = np.rollaxis(results, 0, 3)

Upsample and Interpolate a NumPy Array

I have an array, something like:
array = np.arange(0,4,1).reshape(2,2)
[[0 1]
 [2 3]]
I want to both upsample this array as well as interpolate the resulting values. I know that a good way to upsample an array is by using:
array = array.repeat(2, axis=0).repeat(2, axis=1)
[[0 0 1 1]
 [0 0 1 1]
 [2 2 3 3]
 [2 2 3 3]]
but I cannot figure out a way to interpolate the values to remove the 'blocky' nature between each 2x2 section of the array.
I want something like this:
[[0   0.4  1    1.1]
 [1   0.8  1    2.1]
 [2   2.3  3    3.1]
 [2.1 2.3  3.1  3.2]]
Something like this (NOTE: these will not be the exact numbers). I understand that it may not be possible to interpolate this particular 2D grid, but using the first grid in my question, an interpolation should be possible during the upsampling process, since you are increasing the number of pixels and can therefore 'fill in the gaps'.
I am not too fussed about the type of interpolation, provided the final output is a smoothed surface! I have tried to use the scipy.interpolate.interp2d method, but to no avail; I would be grateful if someone could share their wisdom!
You can use SciPy interp2d for the interpolation, you can find the documentation here.
I've modified the example from the documentation a bit:
import numpy as np
from scipy import interpolate

x = np.array(range(2))
y = np.array(range(2))
a = np.array([[0, 1], [2, 3]])
f = interpolate.interp2d(x, y, a, kind='linear')
xnew = np.linspace(0, 2, 4)
ynew = np.linspace(0, 2, 4)
znew = f(xnew, ynew)
If you print znew it should look like this:
array([[ 0.        ,  0.66666667,  1.        ,  1.        ],
       [ 1.33333333,  2.        ,  2.33333333,  2.33333333],
       [ 2.        ,  2.66666667,  3.        ,  3.        ],
       [ 2.        ,  2.66666667,  3.        ,  3.        ]])
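Another option worth noting: scipy.ndimage.zoom upsamples and interpolates in a single call, without building coordinate grids (a sketch; order=1 requests linear, i.e. non-blocky, interpolation):
import numpy as np
from scipy.ndimage import zoom

array = np.arange(0, 4, 1).reshape(2, 2)

# zoom factor 2 doubles each dimension; order=1 uses linear interpolation
out = zoom(array.astype(float), 2, order=1)
print(out.shape)  # (4, 4)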
I would use scipy.misc.imresize:
import numpy as np
import scipy.misc

array = np.arange(0, 4, 1).reshape(2, 2)
out = scipy.misc.imresize(array, 2.0)
The 2.0 indicates that I want the output to be twice the dimensions of the input. You could alternatively supply an int or a tuple to specify a percentage of the original dimensions or just the new dimensions themselves.
This is very easy to use, but there is an extra step because imresize rescales everything so that your max value becomes 255 and your min becomes 0. (And it changes the datatype to np.uint8.) You may need to do something like:
out = out.astype(array.dtype) / 255 * (np.max(array) - np.min(array)) + np.min(array)
Let's look at the output:
>>> out.round(2)
array([[0.  , 0.25, 0.75, 1.  ],
       [0.51, 0.75, 1.26, 1.51],
       [1.51, 1.75, 2.26, 2.51],
       [2.  , 2.25, 2.75, 3.  ]])
imresize comes with a deprecation warning and a substitute, though:
DeprecationWarning: imresize is deprecated! imresize is deprecated
in SciPy 1.0.0, and will be removed in 1.2.0. Use
skimage.transform.resize instead.
From the resample method in scipy.signal, you can up-sample your 2D array sequentially along one axis and then the other.
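A minimal sketch of that idea (note that scipy.signal.resample is FFT-based, so it implicitly treats the data as periodic):
import numpy as np
from scipy.signal import resample

array = np.arange(0, 4, 1).reshape(2, 2).astype(float)

# up-sample to 4 points along axis 0, then along axis 1
out = resample(array, 4, axis=0)
out = resample(out, 4, axis=1)
print(out.shape)  # (4, 4)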

Combination of two Binomial Draws Or "Group and sum matrix by value"

Consider two urns, E and U. There are holy grails and crappy grails in each of these. Denote the holy ones with H.
Say we draw out of both urns, xe times out of E, and xu times out of U - how many holy grails are we going to find? This is easily solvable for any pair (xe, xu). But I'd like to do this for grids of draws out of xe and xu.
What is the most efficient way to do this in Python using standard packages?
Here is my approach.
import numpy as np
import scipy.stats as stats
binomial = stats.binom.pmf
# define the grids of E, U to search
numberOfE = np.arange(3)
numberOfHolyE = np.arange(3)
numberOfU = np.arange(5)
numberOfHolyU = np.arange(5)
# mesh it
E, U, EH, UH = np.meshgrid(numberOfE, numberOfU, numberOfHolyE, numberOfHolyU, indexing='ij')
# independent draws from both urns. Probabilities are 0.9 and 0.1
drawsE = binomial(EH, E, 0.9)
drawsU = binomial(UH, U, 0.1)
# joint probability of being at a specific grid point
prob = drawsE * drawsU
totalHigh = EH + UH
This is how far I've come:
In [77]: prob[1,1,:]
Out[77]:
array([[ 0.09,  0.01,  0.  ,  0.  ,  0.  ],
       [ 0.81,  0.09,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ]])
In [78]: totalHigh[1,1,:]
Out[78]:
array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6]])
I think these matrices mean the following:
Take a look at where totalHigh has the value 1: if I draw once from each urn, I have a 0.81 probability of drawing one holy grail from E and zero from U, and 0.01 the other way around. That means the total probability of drawing one holy grail, conditional on drawing once from each urn, is 0.82.
Which brings me to my second question:
Conditional on doing it this way, how do I sum up these probabilities efficiently, grouped by the value of totalHigh, for each combination of the first two dimensions? I effectively want to transform these 4D arrays into 3D arrays.
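One way to do that group-and-sum, continuing from the prob and totalHigh arrays built above (a sketch using np.add.at to accumulate the probabilities of equal totalHigh values for every (E, U) pair):
import numpy as np

n_vals = totalHigh.max() + 1  # possible totals run from 0 to max(EH) + max(UH)
flat_tot = totalHigh.reshape(totalHigh.shape[0], totalHigh.shape[1], -1)
flat_prob = prob.reshape(prob.shape[0], prob.shape[1], -1)

# summed[i, j, t] = sum of prob[i, j, ...] over all cells where totalHigh == t
summed = np.zeros(prob.shape[:2] + (n_vals,))
np.add.at(summed,
          (np.arange(flat_tot.shape[0])[:, None, None],
           np.arange(flat_tot.shape[1])[None, :, None],
           flat_tot),
          flat_prob)

print(summed[1, 1, 1])  # 0.81 + 0.01 = 0.82, matching the reasoning above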
