Scaling new values after fit_transform - python

I have the following features :
array([[290., 50.],
[290., 46.],
[285., 44.],
...,
[295., 46.],
[299., 46.],
[ 0., 0.]])
after transforming it with:
from sklearn.preprocessing import StandardScaler
self.scaler = StandardScaler()
self.scaled_features = self.scaler.fit_transform(self.features)
I have scaled_features:
array([[ 0.27489919, 0.71822864],
[ 0.27489919, 0.26499222],
[ 0.18021955, 0.03837402],
...,
[ 0.36957884, 0.26499222],
[ 0.44532255, 0.26499222],
[-5.2165202 , -4.94722653]])
Now I want to scale a new sample with self.scaler, so I pass in my new feature example t:
t = [299.0, 46.0]
new_data = np.array(t).reshape(-1, 1)
new_data_scaled = self.scaler.transform(t)
I get
non-broadcastable output operand with shape (2,1) doesn't match the broadcast shape (2,2)
What am I doing wrong? Why is new_data not scaled?

There are two problems. First, you are passing the list t to transform, not new_data. Second, new_data has shape (2,1) but should have shape (1,2), because the scaler expects one row per sample and one column per feature. So if you change it to
t = [299.0, 46.0]
new_data = np.array(t).reshape(1, -1)
new_data_scaled = self.scaler.transform(new_data)
you should get scaled data.
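The error message can be reproduced with plain NumPy, which may help explain the shape requirement. The sketch below assumes the scaler standardizes by subtracting a per-feature mean and dividing by a per-feature standard deviation, in place; the names mean, std and standardize are illustrative and not part of scikit-learn.

```python
import numpy as np

# Toy training data: rows are samples, columns are the two features.
features = np.array([[290.0, 50.0], [290.0, 46.0], [285.0, 44.0], [299.0, 46.0]])
mean = features.mean(axis=0)  # shape (2,), one mean per feature
std = features.std(axis=0)    # shape (2,), one standard deviation per feature

def standardize(sample):
    """Standardize a copy of `sample` in place, feature by feature."""
    out = np.array(sample, dtype=float)  # copy, so the caller's data is untouched
    out -= mean   # in-place: the broadcast result must fit the shape of `out`
    out /= std
    return out

t = [299.0, 46.0]

row = np.array(t).reshape(1, -1)     # shape (1, 2): one sample, two features
scaled = standardize(row)            # works, result has shape (1, 2)

column = np.array(t).reshape(-1, 1)  # shape (2, 1): two samples, one feature
try:
    standardize(column)
    error = None
except ValueError as exc:
    error = str(exc)                 # broadcast shape (2, 2) does not fit (2, 1)
```

With the (2, 1) input, subtracting a length-2 mean broadcasts to (2, 2), which cannot be written back into a (2, 1) array, hence the error.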

Related

ValueError: all the input arrays must have same number of dims, but the arr at index 0 has 1 dimension(s) and the arr at index 11 has 2 dimension(s)

I have array as follows
samples_data = [array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)
array([ 0. , 0. , 0. , ..., -0.00020519,
-0.00019427, -0.00107348], dtype=float32)
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-8.9004419e-07, 7.3998461e-07, -6.9706215e-07], dtype=float32)
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)]
And I have a function like this
def generate_segmented_data_1(
    samples_data: np.ndarray, sampling_rate: int = 16000
) -> np.ndarray:
    new_data = []
    for data in samples_data:
        segments = segment_audio(data, sampling_rate=sampling_rate)
        new_data.append(segments)
    new_data = np.array(new_data)
    return np.concatenate(new_data)
It shows error like this
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 11 has 2 dimension(s)
And the array at index 0 is like this
[array([ 0. , 0. , 0. , ..., -0.00022057,
0.00013752, -0.00114789], dtype=float32)
array([-4.3174211e-04, -5.4488028e-04, -1.1238289e-03, ...,
8.4724619e-05, 3.0450989e-05, -3.9514929e-05], dtype=float32)]
then the array at index 11 is like this
[[3.0856067e-05 3.0295929e-05 3.0955063e-05 ... 8.5010566e-03
1.3315652e-02 1.5698154e-02]]
What should I do so that all of the segments I produce are concatenated into a single array of segments?
I'm not quite sure I understand what you are trying to do.
b = np.array([[2]])
b.shape
# (1,1)
b = np.array([2])
b.shape
# (1,)
For the segment part of the question, it is unclear what your data structure is, but the code example is broken, as you are appending to a list that hasn't been created.
How can I get the shape of the array below to be 1D instead of 2D?
b = np.array([[2]])
b_shape = b.shape
This results in (1, 1), but I want (1,) without flattening it.
I suspect the confusion stems from the fact that you chose an example which can be also seen as a scalar, so I'll instead use a different example:
b = np.array([[1,2]])
Now b.shape is (1,2). Removing the leading length-1 dimension in any way (be it b.flatten(), b.squeeze() or b[0]) gives the same result:
assert (b.flatten() == b.squeeze()).all()
assert (b.flatten() == b[0]).all()
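For the literal (1, 1) case from the question, the options differ in one detail that is easy to miss: squeeze() without an axis removes every length-1 dimension, not just the first. A small sketch (the variable names are only illustrative):

```python
import numpy as np

b = np.array([[2]])                # shape (1, 1)

first_row = b[0]                   # shape (1,), a view of the first row
squeezed_axis = b.squeeze(axis=0)  # shape (1,), drops only axis 0
reshaped = b.reshape(-1)           # shape (1,), same as flattening here
squeezed_all = b.squeeze()         # shape (), every length-1 axis is dropped
```

So b[0], b.squeeze(axis=0) and b.reshape(-1) all give shape (1,), while a bare b.squeeze() collapses to a 0-dimensional array.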
Now, for the real problem: it appears you're trying to concatenate "rows" from "segments", but the "segments" (which, judging from your sample data, are lists of np.arrays?) are inconsistently formed.
Your sample data is very chaotic: segments 0-10 seem to be lists of 1D arrays; segments 11, 18 and 19 are either 2D arrays or lists of lists of floats. This, plus the error message, suggests you have an issue in the data processing of the segments.
Now, to actually concatenate both types of data:
new_data = []
for data in samples_data:
    segments = function_a(data) # it appears this doesn't return consistent data
    segments = np.asarray(segments) # force it to always be an array...
    if segments.ndim > 1: # ...and append each row
        for row in segments:
            new_data.append(row)
    elif segments.ndim == 1: # if just one row, append it directly
        new_data.append(segments)
    else:
        # function_a returned an empty list, do nothing
        pass
Given the shown data and code, this should work (but it's neither efficient, nor tested).
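A more compact variant of the same idea is sketched below. It assumes every segment has the same length (otherwise the rows cannot form one 2D array); stack_segments and the toy groups are hypothetical stand-ins for the real segmentation output.

```python
import numpy as np

def stack_segments(segment_groups):
    """Concatenate groups of equal-length segments into one 2D array."""
    rows = []
    for segments in segment_groups:
        # A single 1D segment becomes one row; a list of segments or a
        # 2D array is kept as several rows.
        arr = np.atleast_2d(np.asarray(segments, dtype=np.float32))
        if arr.size:  # skip groups that produced no segments
            rows.append(arr)
    return np.concatenate(rows, axis=0)

groups = [
    [np.zeros(4), np.ones(4)],  # list of 1D arrays
    np.full((1, 4), 2.0),       # already 2D
    np.full(4, 3.0),            # a single 1D segment
    [],                         # no segments at all
]
result = stack_segments(groups)  # one row per segment
```

Here np.atleast_2d does the work of the ndim checks: whatever comes in, each group is turned into a block of rows before concatenation.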

From numpy array within array to single row

I am preprocessing my data to make this work:
model = LogisticRegression()
model.fit(X, Y)
I am struggling to reshape my numpy.ndarray.
At this point, for Y I have:
Y
array([array([[52593.4410802]]), array([[52593.4410802]])], dtype=object)
Y.shape
(2,)
type(Y)
<class 'numpy.ndarray'>
And for X, I have:
X
array([array([[34.07824204],
[33.36032467],
[24.61158084],
...,
[34.62648953],
[34.49591937],
[34.40951467]]),
array([[ 4.50136316],
[ 7.46307729],
[17.07135805],
...,
[57.98715047],
[54.5733181 ],
[50.13691107]])], dtype=object)
X.shape
(2,)
type(X)
<class 'numpy.ndarray'>
I would like to transform X so that each value becomes a column/feature (similar to a transpose). Something like this:
X[0][0]
array([34.07824204])
X[0][1]
array([33.36032467])
# Pseudocode idea:
# X_new = [0][0],[0][1],...
# X_new = append(X_new,[1][0],[1][1]...)
What I have tried:
nsamples, nx, ny = X.shape
d2_train_dataset = X.reshape((nsamples,nx*ny))
Also, I tried to reshape and transpose but it will not give what I need:
X
array([array([[34.07824204],
[33.36032467],
[24.61158084],
...,
[34.62648953],
[34.49591937],
[34.40951467]]),
array([[ 4.50136316],
[ 7.46307729],
[17.07135805],
...,
[57.98715047],
[54.5733181 ],
[50.13691107]])], dtype=object)
X.T
array([array([[34.07824204],
[33.36032467],
[24.61158084],
...,
[34.62648953],
[34.49591937],
[34.40951467]]),
array([[ 4.50136316],
[ 7.46307729],
[17.07135805],
...,
[57.98715047],
[54.5733181 ],
[50.13691107]])], dtype=object)
As suggested in one of the comments, I tried the following, without success (the output is the same as the input):
X.flatten()
array([array([[34.07824204],
[33.36032467],
[24.61158084],
...,
[34.62648953],
[34.49591937],
[34.40951467]]),
array([[ 4.50136316],
[ 7.46307729],
[17.07135805],
...,
[57.98715047],
[54.5733181 ],
[50.13691107]])], dtype=object)
As far as I can tell from Y, your labels are continuous, not discrete. Your data suggest that you need a regression model, but you are trying to fit logistic regression, which is a classifier. For regression you could use linear regression, Support Vector Regression or any other regression model.
Before reshaping, get rid of your arrays in arrays.
You can do this easily with numpy.stack. For instance
import numpy
from numpy import array
Y = array([array([[52593.4410802]]), array([[52593.4410802]])], dtype=object)
Y = numpy.stack(Y)
print(Y.shape)
print(Y)
gives:
(2,1,1)
[[[52593.4410802]]
[[52593.4410802]]]
From this, you can reshape to what you need.
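As a sketch of that last step, assuming the goal is the usual scikit-learn layout (X with one row per sample, y as a 1D vector), the stacked arrays can be reshaped as below. The numbers are shortened stand-ins for the data in the question, and Y is written as a plain list for brevity.

```python
import numpy as np

# A 1D object array holding 2D arrays, like X in the question.
X = np.empty(2, dtype=object)
X[0] = np.array([[34.1], [33.4], [24.6]])
X[1] = np.array([[4.5], [7.5], [17.1]])
Y = [np.array([[52593.4]]), np.array([[52593.4]])]

# Stack into regular float arrays, then flatten everything after axis 0.
X_2d = np.stack(list(X)).reshape(len(X), -1)  # (2, 3, 1) -> (2, 3)
y_1d = np.stack(Y).reshape(-1)                # (2, 1, 1) -> (2,)
```

Each original sub-array becomes one row of X_2d, so every value ends up in its own feature column.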

How to use the "dot" (or "matmul") function for iterative multiplication in Python

I need to obtain a matrix "W" from multiple matrix multiplications (each multiplication results in a column vector).
from numpy import matrix
from numpy import transpose
from numpy import matmul
from numpy import dot
# Iterative matrix multiplication
def iterativeMultiplication(X, Y):
    W = [] # Matrix of matricial products
    X = matrix(X) # same number of rows
    Y = matrix(Y) # same number of rows
    h = 0
    while (h < X.shape[1]):
        W.append([])
        W[h] = dot(transpose(X), Y) # using "dot" function
        h += 1
    return W
But, unexpectedly, I obtain a list of objects with their respective data types.
X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
Y = [[-0.2], [1.1], [5.9], [12.3]] # Edit Y column
iterativeMultiplication( X, Y )
Results in:
[array([[37.5],[73.3],[60.8]]),
array([[37.5],[73.3],[60.8]]),
array([[37.5],[73.3],[60.8]])]
I need a way to obtain only the numerical values so that I can convert the result to a matrix.
W = matrix(W) # Results in error
It is the same using the "matmul" function. Thanks for your time.
If you want to stack multiple matrices, you can use numpy.vstack:
W = numpy.vstack(W)
Edit: There seems to be a discrepancy between your function, X and Y versus the "result" list in your question. But based on your comments below, what you're actually looking for is numpy.hstack (horizontal stack) which will give you the desired 3x3 matrix based on your "result" list.
W = numpy.hstack(W)
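To make the difference between the two stacking functions concrete, here is a minimal sketch using the X and Y from the question with plain arrays instead of numpy.matrix (the names columns and W_tall are illustrative):

```python
import numpy as np

X = np.array([[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]])
Y = np.array([[-0.2], [1.1], [5.9], [12.3]])

# One (3, 1) column vector per iteration, as in the question.
columns = [X.T @ Y for _ in range(X.shape[1])]

W = np.hstack(columns)       # (3, 3): columns placed side by side
W_tall = np.vstack(columns)  # (9, 1): columns placed on top of each other
```

Both results are ordinary numeric arrays, so no further conversion from the list is needed.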
Of course you are going to get a list. You initialize W as a list and append the same calculation to it 3 times.
But your 3 element arrays don't make sense with this data, array([[ 3.36877336],[ 3.97112615],[ 3.8092797 ]]).
If I make Xm=np.matrix(X), etc:
In [162]: Xm
Out[162]:
matrix([[ 0., 0., 1.],
[ 1., 0., 0.],
[ 2., 2., 2.],
[ 2., 5., 4.]])
In [163]: Ym
Out[163]:
matrix([[ 0.1, -0.2],
[ 0.9, 1.1],
[ 6.2, 5.9],
[ 11.9, 12.3]])
In [164]: Xm.T.dot(Ym)
Out[164]:
matrix([[ 37.1, 37.5],
[ 71.9, 73.3],
[ 60.1, 60.8]])
In [165]: Xm.T*Ym # matrix interprets * as .dot
Out[165]:
matrix([[ 37.1, 37.5],
[ 71.9, 73.3],
[ 60.1, 60.8]])
You need to edit the question, to have both valid Python code (missing def and :), and results that match the inputs.
===============
In [173]: Y = [[-0.2], [1.1], [5.9], [12.3]]
In [174]: Ym=np.matrix(Y)
Out[176]:
matrix([[ 37.5],
[ 73.3],
[ 60.8]])
=====================
This iteration is clumsy:
h = 0
while (h < X.shape[1]):
    W.append([])
    W[h] = dot(transpose(X), Y) # using "dot" function
    h += 1
A more Pythonic approach
for h in range(X.shape[1]):
    W.append(np.dot(...))
Or even
W = [np.dot(....) for h in range(X.shape[1])]

reshape list of numpy arrays and then reshape back

I have a list which consists of several numpy arrays with different shapes.
I want to reshape this list of arrays into a numpy vector and then change each element in the vector and then reshape it back to the original list of arrays.
For example:
input
[numpy.zeros((2,2)), numpy.ones((3,3))]
First
To vector
[0,0,0,0,1,1,1,1,1,1,1,1,1]
Second
every time change only one element. for example change the 1st element 0 to 2
[0,2,0,0,1,1,1,1,1,1,1,1,1]
Last
convert it back to
[array([[0,2],[0,0]]),array([[1,1,1],[1,1,1],[1,1,1]])]
Is there any fast implementation? Thanks very much.
It seems like converting to a list and back will be inefficient. Instead, why not figure out which array to index (and where) and then just update that index? e.g.
def change_element(arr1, arr2, ix, value):
    which = ix >= arr1.size
    arr = [arr1, arr2][which]
    ix = ix - arr1.size if which else ix
    arr.ravel()[ix] = value
And here's some example usage:
>>> arr1 = np.zeros((2, 2))
>>> arr2 = np.ones((3, 3))
>>> change_element(arr1, arr2, 1, 2)
>>> change_element(arr1, arr2, 6, 3.14)
>>> arr1
array([[ 0., 2.],
[ 0., 0.]])
>>> arr2
array([[ 1. , 1. , 3.14],
[ 1. , 1. , 1. ],
[ 1. , 1. , 1. ]])
>>> change_element(arr1, arr2, 7, 3.14)
>>> arr1
array([[ 0., 2.],
[ 0., 0.]])
>>> arr2
array([[ 1. , 1. , 3.14],
[ 3.14, 1. , 1. ],
[ 1. , 1. , 1. ]])
A few notes -- This updates the arrays in place. It doesn't create new arrays. If you really need to create new arrays, I suppose you could np.copy them and return. Also, this relies on the arrays sharing memory before and after the ravel. I don't remember the exact circumstances where ravel would return a new array rather than a view into the original array . . .
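On the view-versus-copy question: as far as I know, ravel() returns a view whenever the data can be flattened without copying (for example a C-contiguous array) and a copy otherwise, such as for a transposed array. Assigning through arr.flat writes into the original array in either case. A small sketch, where set_flat is a hypothetical helper name:

```python
import numpy as np

def set_flat(arr, ix, value):
    """Set element `ix` (row-major order) of `arr` in place."""
    arr.flat[ix] = value

base = np.ones((3, 3))
transposed = base.T            # non-contiguous view of `base`

transposed.ravel()[1] = 3.14   # ravel() had to copy here, so nothing changes
unchanged = base.copy()        # snapshot taken before the real update

set_flat(transposed, 1, 3.14)  # writes through to `base`
```

Swapping arr.ravel()[ix] = value for arr.flat[ix] = value in change_element would remove the dependence on memory layout.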
Generalizing to more arrays is actually quite easy. We just need to walk down the list of arrays and see if ix is less than the array size. If it is, we've found our array. If it isn't, we need to subtract the array's size from ix to represent the number of elements we've traversed thus far:
def change_element(arrays, ix, value):
    for arr in arrays:
        if ix < arr.size:
            arr.ravel()[ix] = value
            return
        ix -= arr.size
And you can call this similar to before:
change_element([arr1, arr2], 6, 3.14159)
@mgilson probably has the best answer for you, but if you absolutely have to convert to a flat list first and then go back again (perhaps because you need to do something else with the flat list as well), then you can do this with list comprehensions:
import numpy as np
lst = [np.zeros((2,2)), np.ones((3,3))]
tlist = [e for a in lst for e in a.ravel()]
tlist[1] = 2
i = 0
lst2 = []
dims = [a.shape for a in lst]
for n, m in dims:
    lst2.append(np.array(tlist[i:i+n*m]).reshape(n,m))
    i += n*m
lst2
[array([[ 0., 2.],
[ 0., 0.]]), array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])]
Of course, you lose the information about your array sizes when you flatten, so you need to store them somewhere (here, in dims).
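If the flat vector is needed as a NumPy array rather than a Python list, the same round trip can be sketched with np.concatenate and np.split. The helper names flatten and unflatten are hypothetical, and this version handles arrays of any number of dimensions, not only 2D ones.

```python
import numpy as np

def flatten(arrays):
    """Return one flat vector plus the shapes needed to undo it."""
    shapes = [a.shape for a in arrays]
    flat = np.concatenate([a.ravel() for a in arrays])  # always a new array
    return flat, shapes

def unflatten(flat, shapes):
    """Cut `flat` back into arrays with the given shapes."""
    sizes = [int(np.prod(s)) for s in shapes]
    offsets = np.cumsum(sizes)[:-1]  # split points between the arrays
    chunks = np.split(flat, offsets)
    return [chunk.reshape(s) for chunk, s in zip(chunks, shapes)]

arrays = [np.zeros((2, 2)), np.ones((3, 3))]
flat, shapes = flatten(arrays)
flat[1] = 2                          # change a single element
restored = unflatten(flat, shapes)
```

The original arrays are left untouched because np.concatenate copies; the restored arrays are views into flat, so later changes to flat show up in them.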

Scipy filter with multi-dimensional (or non-scalar) output

Is there a filter similar to ndimage's generic_filter that supports vector output? I did not manage to make scipy.ndimage.filters.generic_filter return more than a scalar. Uncomment the line in the code below to get the error: TypeError: only length-1 arrays can be converted to Python scalars.
I'm looking for a generic filter that processes 2D or 3D arrays and returns a vector at each point. Thus the output would have one added dimension. For the example below I'd expect something like this:
a.shape # (10,10)
res.shape # (10,10,2)
Example Code
import numpy as np
from scipy import ndimage
a = np.ones((10, 10)) * np.arange(10)
footprint = np.array([[1,1,1],
[1,0,1],
[1,1,1]])
def myfunc(x):
    r = sum(x)
    #r = np.array([1,1]) # uncomment this
    return r
res = ndimage.generic_filter(a, myfunc, footprint=footprint)
The generic_filter expects myfunc to return a scalar, never a vector.
However, there is nothing that precludes myfunc from also adding information
to, say, a list which is passed to myfunc as an extra argument.
Instead of using the array returned by generic_filter, we can generate our vector-valued array by reshaping this list.
For example,
import numpy as np
from scipy import ndimage
a = np.ones((10, 10)) * np.arange(10)
footprint = np.array([[1,1,1],
[1,0,1],
[1,1,1]])
ndim = 2
def myfunc(x, out):
    r = np.arange(ndim, dtype='float64')
    out.extend(r)
    return 0
result = []
ndimage.generic_filter(
    a, myfunc, footprint=footprint, extra_arguments=(result,))
result = np.array(result).reshape(a.shape+(ndim,))
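A different route, which does not use generic_filter at all, is to build the neighbourhoods with NumPy's sliding_window_view (available from NumPy 1.20) and apply a vectorized function to them. The sketch below is an assumption-laden alternative, not the SciPy API: vector_filter is a hypothetical helper, it only handles odd-sized footprints, and it pads with np.pad's "symmetric" mode on the assumption that this mirrors edge values the way generic_filter's default boundary mode does.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def vector_filter(a, footprint, func):
    """Apply `func` to the footprint neighbourhood of every element.

    `func` receives an array of shape a.shape + (k,), where k is the
    number of nonzero footprint entries, and may return any trailing shape.
    """
    # Pad so that every element has a full neighbourhood.
    pad = [(s // 2, s // 2) for s in footprint.shape]
    padded = np.pad(a, pad, mode="symmetric")
    windows = sliding_window_view(padded, footprint.shape)
    neighbours = windows[..., footprint.astype(bool)]
    return func(neighbours)

a = np.ones((10, 10)) * np.arange(10)
footprint = np.array([[1, 1, 1],
                      [1, 0, 1],
                      [1, 1, 1]])

# Two values per point: the neighbourhood sum and the neighbourhood maximum.
res = vector_filter(
    a, footprint,
    lambda v: np.stack([v.sum(axis=-1), v.max(axis=-1)], axis=-1))
```

Here res has one extra trailing dimension of length 2, and because func works on all neighbourhoods at once there is no per-element Python callback.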
I think I get what you're asking, but I'm not completely sure how ndimage.generic_filter works internally (the source is quite abstruse).
Here's a simple wrapper function. It takes an array plus all the parameters ndimage.generic_filter needs, and returns an array in which each element of the input array is represented by an array of shape (2,): the first element holds the original value and the second holds the result of the filter function.
def generic_expand_filter(inarr, func, **kwargs):
    shape = inarr.shape
    res = np.empty(( shape+(2,) ))
    temp = ndimage.generic_filter(inarr, func, **kwargs)
    for row in range(shape[0]):
        for val in range(shape[1]):
            res[row][val][0] = inarr[row][val]
            res[row][val][1] = temp[row][val]
    return res
The output is shown below, where res is the result of the plain generic_filter and res2 is the result of generic_expand_filter:
>>> a.shape #same as res.shape
(10, 10)
>>> res2.shape
(10, 10, 2)
>>> a[0]
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>> res[0]
array([ 3., 8., 16., 24., 32., 40., 48., 56., 64., 69.])
>>> print(*res2[0], sep=", ") #this is just to avoid the vertical default output
[ 0. 3.], [ 1. 8.], [ 2. 16.], [ 3. 24.], [ 4. 32.], [ 5. 40.], [ 6. 48.], [ 7. 56.], [ 8. 64.], [ 9. 69.]
>>> a[0][0]
0.0
>>> res[0][0]
3.0
>>> res2[0][0]
array([ 0., 3.])
Of course you probably don't want to keep the old array, but instead want both fields to hold new results. I don't know exactly what you had in mind, but if the two values you want to store are unrelated, just add a temp2 and func2, call generic_filter a second time with the same **kwargs, and store that result as the first value.
However, if you want an actual vector quantity that is calculated from multiple inarr elements, meaning that the two new fields aren't independent, you will have to write that kind of function yourself: one that takes in an array and the indices idx, idy and returns a tuple/list/array value, which you can then unpack and assign to the result.
