ValueError: Found array with dim 3. Estimator expected <= 2

I am trying to fit a decision tree on some train and test data, which are stored in lists named x and y.
My train data x is this:
[array([[19. , 14. , 0.8],
[23. , 24. , 0.8],
[25. , 26. , 0.8],
[22. , 24. , 1. ],
[25. , 29. , 1.4],
[36. , 86. , 1.6],
[28. , 52. , 0.8],
[21. , 20. , 1. ],
[22. , 28. , 0.8],
[24. , 27. , 1. ],
[18. , 8. , 0.6],
[30. , 58. , 1.2],
[24. , 30. , 0.8],
[24. , 28. , 0.8],
[32. , 65. , 1.6],
[28. , 47. , 0.8],
[26. , 41. , 0.8],
[18. , 14. , 0.6],
[32. , 71. , 2.2],
[27. , 45. , 2. ],
[29. , 53. , 2.2],
[18. , 11. , 0.8],
[20. , 23. , 0.8],
[20. , 19. , 0.6],
[20. , 15. , 0.6],
[19. , 18. , 0.4],
[24. , 55. , 1.2],
[24. , 59. , 1. ],
[20. , 17. , 0.6],
[21. , 28. , 0.8]])]
and y:
[array([ 3100., 2750., 7800., 6000., 15000., 15500., 5600., 8000.,
6000., 7500., 4000., 9000., 5850., 5750., 18000., 5600.,
5600., 4500., 22000., 21500., 24000., 4000., 6000., 4000.,
8000., 8000., 14000., 14000., 6000., 4000.])]
When I try to run
dtree= DecisionTreeRegressor(random_state=0, max_depth=1)
dtree.fit(x_train, y_train)
I get the error ValueError: Found array with dim 3. Estimator expected <= 2. I couldn't solve it with reshape, since these are lists. Any suggestions?

First of all, I recommend converting X and Y to numpy arrays. I can't be 100% sure whether your variables already are arrays, since you haven't posted your full code. Secondly, take a look at the shapes of your variables. As the documentation for fit says:
X: {array-like, sparse matrix} of shape (n_samples, n_features)
and
y: array-like of shape (n_samples,) or (n_samples, n_outputs)
fit expects X to be 2D and y to be 1D or 2D, and your X_train is 3D.
So you need to reshape both. One solution is shown below.
ANSWER EDITED AFTER READING THE ASKER'S COMMENTS
There are two reasons why you can't train on your data:
X_train has a bad shape
Y_train has a bad shape
You are passing a 3D array as X_train, and fit only accepts a 2D X. Furthermore, your Y_train has shape (1, 30), which fit reads as a single sample with 30 outputs rather than 30 samples. You need to pass it with shape (30,), as follows:
from sklearn.tree import DecisionTreeRegressor
import numpy as np
X_train = np.array([np.array([[19. , 14. , 0.8],
[23. , 24. , 0.8],
[25. , 26. , 0.8],
[22. , 24. , 1. ],
[25. , 29. , 1.4],
[36. , 86. , 1.6],
[28. , 52. , 0.8],
[21. , 20. , 1. ],
[22. , 28. , 0.8],
[24. , 27. , 1. ],
[18. , 8. , 0.6],
[30. , 58. , 1.2],
[24. , 30. , 0.8],
[24. , 28. , 0.8],
[32. , 65. , 1.6],
[28. , 47. , 0.8],
[26. , 41. , 0.8],
[18. , 14. , 0.6],
[32. , 71. , 2.2],
[27. , 45. , 2. ],
[29. , 53. , 2.2],
[18. , 11. , 0.8],
[20. , 23. , 0.8],
[20. , 19. , 0.6],
[20. , 15. , 0.6],
[19. , 18. , 0.4],
[24. , 55. , 1.2],
[24. , 59. , 1. ],
[20. , 17. , 0.6],
[21. , 28. , 0.8]])])
# X_train has shape (1, 30, 3); merge the first two axes to get (30, 3)
dimX1, dimX2, dimX3 = X_train.shape
X_train = np.reshape(X_train, (dimX1*dimX2, dimX3))
Y_train = np.array([np.array([ 3100., 2750., 7800., 6000., 15000., 15500., 5600., 8000.,
6000., 7500., 4000., 9000., 5850., 5750., 18000., 5600.,
5600., 4500., 22000., 21500., 24000., 4000., 6000., 4000.,
8000., 8000., 14000., 14000., 6000., 4000.])])
# Y_train has shape (1, 30); flatten it to (30,)
dimY1, dimY2 = Y_train.shape
Y_train = np.reshape(Y_train, (dimY1*dimY2, ))
print(X_train.shape, Y_train.shape)
dtree= DecisionTreeRegressor(random_state=0, max_depth=1)
dtree.fit(X_train, Y_train)
Its output is:
>>> (30, 3) (30,)
>>> DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=1,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort='deprecated',
random_state=0, splitter='best')
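A shorter route to the same shapes is sketched below. It assumes, as in the question, that x and y are plain Python lists holding arrays of samples; the placeholder data here is hypothetical and only mimics the question's shapes. np.concatenate joins the arrays in each list along the sample axis, so no manual reshape is needed.

```python
import numpy as np

# Hypothetical stand-ins for the question's lists: one array per list,
# with the same shapes as the question's data (30 samples, 3 features).
x = [np.arange(90, dtype=float).reshape(30, 3)]
y = [np.arange(30, dtype=float)]

# Converting the lists directly adds a leading axis, which is what
# triggers "Found array with dim 3".
print(np.asarray(x).shape)  # (1, 30, 3)
print(np.asarray(y).shape)  # (1, 30)

# Joining the list entries along axis 0 gives the shapes fit expects.
X_train = np.concatenate(x, axis=0)
y_train = np.concatenate(y, axis=0)
print(X_train.shape, y_train.shape)  # (30, 3) (30,)
```

The resulting X_train and y_train can then be passed to dtree.fit. If the lists held several arrays, this would also merge them into one set of samples, which may or may not be what you want.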

Related

axis -1 in numpy array

I am struggling to understand two things about the numpy arrays below:
How can I deduce from np.stack(cart_indexing, axis=1) that there are 5 dimensions? I am struggling to conceptually understand the (5, 2, 5) part. I see it as (rows, column numbers, dimensions).
What does axis = -1 really mean? How should I understand it?
x = np.linspace(start=-10, stop=0, num=5, endpoint=True)
y = np.linspace(start=1, stop=10, num=5)
cart_indexing = np.meshgrid(x, y, indexing="xy") # cartesian indexing
>> [array([[-10. , -7.5, -5. , -2.5, 0. ],
[-10. , -7.5, -5. , -2.5, 0. ],
[-10. , -7.5, -5. , -2.5, 0. ],
[-10. , -7.5, -5. , -2.5, 0. ],
[-10. , -7.5, -5. , -2.5, 0. ]]),
array([[ 1. , 1. , 1. , 1. , 1. ],
[ 3.25, 3.25, 3.25, 3.25, 3.25],
[ 5.5 , 5.5 , 5.5 , 5.5 , 5.5 ],
[ 7.75, 7.75, 7.75, 7.75, 7.75],
[10. , 10. , 10. , 10. , 10. ]])]
np.stack(cart_indexing, axis=0)
>> array([[[-10. , -7.5 , -5. , -2.5 , 0. ],
[-10. , -7.5 , -5. , -2.5 , 0. ],
[-10. , -7.5 , -5. , -2.5 , 0. ],
[-10. , -7.5 , -5. , -2.5 , 0. ],
[-10. , -7.5 , -5. , -2.5 , 0. ]],
[[ 1. , 1. , 1. , 1. , 1. ],
[ 3.25, 3.25, 3.25, 3.25, 3.25],
[ 5.5 , 5.5 , 5.5 , 5.5 , 5.5 ],
[ 7.75, 7.75, 7.75, 7.75, 7.75],
[ 10. , 10. , 10. , 10. , 10. ]]])
np.stack(cart_indexing, axis=1)
>> array([[[-10. , -7.5 , -5. , -2.5 , 0. ],
[ 1. , 1. , 1. , 1. , 1. ]],
[[-10. , -7.5 , -5. , -2.5 , 0. ],
[ 3.25, 3.25, 3.25, 3.25, 3.25]],
[[-10. , -7.5 , -5. , -2.5 , 0. ],
[ 5.5 , 5.5 , 5.5 , 5.5 , 5.5 ]],
[[-10. , -7.5 , -5. , -2.5 , 0. ],
[ 7.75, 7.75, 7.75, 7.75, 7.75]],
[[-10. , -7.5 , -5. , -2.5 , 0. ],
[ 10. , 10. , 10. , 10. , 10. ]]])
np.stack(cart_indexing, axis=1).shape
>> (5, 2, 5)
np.stack(cart_indexing, axis=-1)
>> array([[[-10. , 1. ],
[ -7.5 , 1. ],
[ -5. , 1. ],
[ -2.5 , 1. ],
[ 0. , 1. ]],
[[-10. , 3.25],
[ -7.5 , 3.25],
[ -5. , 3.25],
[ -2.5 , 3.25],
[ 0. , 3.25]],
[[-10. , 5.5 ],
[ -7.5 , 5.5 ],
[ -5. , 5.5 ],
[ -2.5 , 5.5 ],
[ 0. , 5.5 ]],
[[-10. , 7.75],
[ -7.5 , 7.75],
[ -5. , 7.75],
[ -2.5 , 7.75],
[ 0. , 7.75]],
[[-10. , 10. ],
[ -7.5 , 10. ],
[ -5. , 10. ],
[ -2.5 , 10. ],
[ 0. , 10. ]]])
np.stack(cart_indexing, axis=-1).shape
>> (5, 5, 2)
It's not clear what you mean by
there are 5 dimensions
None of your arrays have 5 dimensions. You start with a list of 2 arrays, each with 2 dimensions:
for i in cart_indexing:
print(f"Shape:{i.shape}; Dimensions:{i.ndim}")
Shape:(5, 5); Dimensions:2
Shape:(5, 5); Dimensions:2
Notice the numbers involved here: each array has shape (5, 5), and there are 2 of them. So you have 5 and 5 and 2.
Then, the axis parameter in your stack comes into play:
for i in range(3):
print(f"Stacked on axis {i} my array has {np.stack(cart_indexing, axis=i).ndim} dimensions and a shape of {np.stack(cart_indexing, axis=i).shape}")
Stacked on axis 0 my array has 3 dimensions and a shape of (2, 5, 5) #the 2 is in the (axis=)0th position
Stacked on axis 1 my array has 3 dimensions and a shape of (5, 2, 5) #the 2 is in the (axis=)1st position
Stacked on axis 2 my array has 3 dimensions and a shape of (5, 5, 2) #the 2 is in the (axis=)2nd position
Put another way, stacking adds a new dimension along which the arrays are stacked. The axis parameter determines where in the result's shape that new dimension is placed.
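One way to see the "new dimension" idea is to build the stack by hand. The sketch below uses two hypothetical (5, 5) arrays a and b in place of the meshgrid output: each array first gets a length-1 axis inserted at the chosen position with np.expand_dims, and the results are then joined along that axis with np.concatenate. This is only an illustration of the behaviour, not a claim about how np.stack is implemented internally.

```python
import numpy as np

# Two hypothetical (5, 5) arrays standing in for the meshgrid output.
a = np.arange(25.0).reshape(5, 5)
b = a + 100

for axis in range(3):
    stacked = np.stack([a, b], axis=axis)
    # Manual equivalent: insert a length-1 axis, then join along it.
    manual = np.concatenate(
        [np.expand_dims(a, axis), np.expand_dims(b, axis)], axis=axis
    )
    assert np.array_equal(stacked, manual)
    print(axis, stacked.shape)
```

The printed shapes are (2, 5, 5), (5, 2, 5) and (5, 5, 2): the 2 sits wherever the new axis was inserted.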
What does axis = -1 really mean?
For the same reason that print("Hello world"[-1]) prints "d": negative indices count from the end, so axis=-1 is the last axis of the result.
Or, in other words, if we want to count our dimensions from last to first:
for i in range(-3,0):
print(f"Stacked on axis {i} my array has {np.stack(cart_indexing, axis=i).ndim} dimensions and a shape of {np.stack(cart_indexing, axis=i).shape}")
Stacked on axis -3 my array has 3 dimensions and a shape of (2, 5, 5) #dimension that is third from last
Stacked on axis -2 my array has 3 dimensions and a shape of (5, 2, 5) #dimension that is second from last
Stacked on axis -1 my array has 3 dimensions and a shape of (5, 5, 2) #last dimension
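The correspondence between negative and positive axes can be checked directly. In this sketch the result of stacking 2D arrays has 3 dimensions, so a negative axis is mapped to its positive counterpart with a modulo by 3. The variable names grids, result_ndim and pairs are illustrative only.

```python
import numpy as np

x = np.linspace(start=-10, stop=0, num=5)
y = np.linspace(start=1, stop=10, num=5)
grids = np.meshgrid(x, y, indexing="xy")

# Stacking 2D arrays yields a 3D result, so axes run 0..2 or -3..-1.
result_ndim = grids[0].ndim + 1

pairs = []
for axis in (-3, -2, -1):
    positive = axis % result_ndim  # -3 -> 0, -2 -> 1, -1 -> 2
    negative_result = np.stack(grids, axis=axis)
    positive_result = np.stack(grids, axis=positive)
    assert np.array_equal(negative_result, positive_result)
    pairs.append((axis, positive, negative_result.shape))
    print(axis, positive, negative_result.shape)
```

With axis=-1 the result has shape (5, 5, 2), so indexing it as result[i, j] gives the (x, y) pair at grid position (i, j).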

replacing specific columns values in 2d array numpy

How can I replace the values in the columns at index 4 and 5 of utl with the new_values array, keeping the remaining columns as they are?
utl = np.array([[ 3. , 134.4 , 17. , 135.05 , 22. , 135.25 , 0.04 ],
[ 12. , 134.3 , 17. , 135.05 , 22. , 135.8 , 0.15 ]])
new_values=np.array([[ 27., 135.45],
[ 27., 136.55]])
I tried this, but it does not work:
# utl[:,[4,5]] = new_values
# utl[:,4] = new_values[:,0]
output must be
#values changed
[[ 3. , 134.4 , 17. , 135.05 , | 27. , 135.45 |, 0.04 ],
[ 12. , 134.3 , 17. , 135.05 , | 27. , 136.55 |, 0.15 ]])
This works fine, as expected:
utl[:, [4,5]] = new_values
output:
array([[ 3. , 134.4 , 17. , 135.05, 27. , 135.45, 0.04],
[ 12. , 134.3 , 17. , 135.05, 27. , 136.55, 0.15]])
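For comparison, here is a small check that the list-index assignment and the equivalent slice assignment give the same result. The last part shows one possible reason such an assignment can appear not to work; that the asker hit this case is purely an assumption, since the question does not say what went wrong. If the target array has an integer dtype, the fractional parts of the new values are silently dropped on assignment.

```python
import numpy as np

utl = np.array([[3., 134.4, 17., 135.05, 22., 135.25, 0.04],
                [12., 134.3, 17., 135.05, 22., 135.8, 0.15]])
new_values = np.array([[27., 135.45],
                       [27., 136.55]])

# Work on copies so that both variants start from the same data.
via_list = utl.copy()
via_list[:, [4, 5]] = new_values   # index list, as in the answer

via_slice = utl.copy()
via_slice[:, 4:6] = new_values     # equivalent slice for adjacent columns

print(np.array_equal(via_list, via_slice))  # True

# Possible pitfall (an assumption about what might go wrong):
# an integer array truncates the floats it is assigned.
int_utl = np.zeros((2, 7), dtype=int)
int_utl[:, [4, 5]] = new_values
print(int_utl[:, 4:6])  # 135.45 and 136.55 are stored as 135 and 136
```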

Sort Multidimensional List - Python

I have a 3D list of lists (or numpy array) and I need to sort it by the first item of each innermost list, smallest first.
These are the last two attempts I made at this program. Sorry, I am quite sure it is an easy/silly question, but as a newbie to the programming 'way of thinking', it is kind of hard for me.
First Try:
lstsArray = [[[54,21,31], [1,2,3], [15,25,35]],
[[12,22,32], [3,2,1], [16,26,36]],
[[34,24,38], [0.1,1,1], [17,27,37]]]
val = np.array(lstsArray)
menor = 120e26
for item in val:
for i in item:
if menor >= i[0] and i[0] >= min(i):
menor = i[0]
print(menor)
lstA = list(val)
a = sorted(lstA, key=itemgetter(menor))
print(a)
Second Try
for i in val:
for j in i:
print(sorted((i), key =itemgetter(j[0])))
Desired Output
[[[0.1,1,1],[1,2,3],[3,2,1]],
[[12,22,32],[15,25,35],[16,26,36]],
[[17,27,37],[34,24,38],[54,21,31]]]
Your list, and array made from it. Note the floats in the array:
In [124]: lstsArray = [[[54,21,31], [1,2,3], [15,25,35]],
...: [[12,22,32], [3,2,1], [16,26,36]],
...: [[34,24,38], [0.1,1,1], [17,27,37]]]
In [125]: val=np.array(lstsArray)
In [126]: val
Out[126]:
array([[[54. , 21. , 31. ],
[ 1. , 2. , 3. ],
[15. , 25. , 35. ]],
[[12. , 22. , 32. ],
[ 3. , 2. , 1. ],
[16. , 26. , 36. ]],
[[34. , 24. , 38. ],
[ 0.1, 1. , 1. ],
[17. , 27. , 37. ]]])
This is a (3,3,3) shaped array. But your sorting ignores the initial (3,3) layout, so let's go ahead and reshape it:
In [133]: val = np.array(lstsArray).reshape(-1,3)
In [134]: val
Out[134]:
array([[54. , 21. , 31. ],
[ 1. , 2. , 3. ],
[15. , 25. , 35. ],
[12. , 22. , 32. ],
[ 3. , 2. , 1. ],
[16. , 26. , 36. ],
[34. , 24. , 38. ],
[ 0.1, 1. , 1. ],
[17. , 27. , 37. ]])
Now we can easily sort on the first column value. argsort gives the sort order:
In [135]: idx = np.argsort(val[:,0])
In [136]: idx
Out[136]: array([7, 1, 4, 3, 2, 5, 8, 6, 0])
In [137]: val[idx]
Out[137]:
array([[ 0.1, 1. , 1. ],
[ 1. , 2. , 3. ],
[ 3. , 2. , 1. ],
[12. , 22. , 32. ],
[15. , 25. , 35. ],
[16. , 26. , 36. ],
[17. , 27. , 37. ],
[34. , 24. , 38. ],
[54. , 21. , 31. ]])
and to get it back to 3d:
In [138]: val[idx].reshape(3,3,3)
Out[138]:
array([[[ 0.1, 1. , 1. ],
[ 1. , 2. , 3. ],
[ 3. , 2. , 1. ]],
[[12. , 22. , 32. ],
[15. , 25. , 35. ],
[16. , 26. , 36. ]],
[[17. , 27. , 37. ],
[34. , 24. , 38. ],
[54. , 21. , 31. ]]])
or in list display:
In [139]: val[idx].reshape(3,3,3).tolist()
Out[139]:
[[[0.1, 1.0, 1.0], [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],
[[12.0, 22.0, 32.0], [15.0, 25.0, 35.0], [16.0, 26.0, 36.0]],
[[17.0, 27.0, 37.0], [34.0, 24.0, 38.0], [54.0, 21.0, 31.0]]]
But if the list had just one level of nesting:
In [140]: alist = val.tolist()
In [141]: alist
Out[141]:
[[54.0, 21.0, 31.0],
[1.0, 2.0, 3.0],
[15.0, 25.0, 35.0],
[12.0, 22.0, 32.0],
[3.0, 2.0, 1.0],
[16.0, 26.0, 36.0],
[34.0, 24.0, 38.0],
[0.1, 1.0, 1.0],
[17.0, 27.0, 37.0]]
the python sorted works quite nicely:
In [142]: sorted(alist, key=lambda x:x[0]) # or itemgetter
Out[142]:
[[0.1, 1.0, 1.0],
[1.0, 2.0, 3.0],
[3.0, 2.0, 1.0],
[12.0, 22.0, 32.0],
[15.0, 25.0, 35.0],
[16.0, 26.0, 36.0],
[17.0, 27.0, 37.0],
[34.0, 24.0, 38.0],
[54.0, 21.0, 31.0]]
The fact that you have a double nested list, but want the sort to ignore one layer, complicates the list processing. That's where numpy reshape helps a lot.
For now I won't test the relative speeds of these approaches.
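If you would rather stay with plain lists, the same flatten, sort, regroup idea can be sketched without numpy. This version assumes every group has the same length (3 here), and it keeps the original ints instead of converting everything to float. The names rows, size and result are illustrative.

```python
from itertools import chain

lstsArray = [[[54, 21, 31], [1, 2, 3], [15, 25, 35]],
             [[12, 22, 32], [3, 2, 1], [16, 26, 36]],
             [[34, 24, 38], [0.1, 1, 1], [17, 27, 37]]]

# Flatten one level of nesting, then sort the rows by their first item.
rows = sorted(chain.from_iterable(lstsArray), key=lambda row: row[0])

# Regroup the sorted rows into chunks of the original group size.
size = len(lstsArray[0])
result = [rows[i:i + size] for i in range(0, len(rows), size)]

for group in result:
    print(group)
```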

Changing the structure of np.array in Python

I have the following array structure.
array([[ 0.3, 0.1, 0. , 1. , 0. , 0. , 2.7],
[ 0.5, 0.5, 0. , 0. , 1. , 0. , 6. ],
[ 0.6, 0.4, -1. , 0. , 0. , 1. , 6. ]])
How can I change to the following structure?
array([[ 0.3, 0.1, 0. , 1. , 2.7],
[ 0.5, 0.5, 0. , 0. , 6. ],
[ 0.6, 0.4, -1. , 0. , 6. ]])
Assuming the array is stored in a variable arr, use indexing:
arr[:,[0, 1, 2, 3, 6]]
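An alternative sketch, if it is easier to name the columns to drop than the ones to keep: np.delete with axis=1 removes the columns at index 4 and 5 and returns a new array. The names kept and dropped are illustrative only.

```python
import numpy as np

arr = np.array([[0.3, 0.1, 0., 1., 0., 0., 2.7],
                [0.5, 0.5, 0., 0., 1., 0., 6.],
                [0.6, 0.4, -1., 0., 0., 1., 6.]])

# Keep columns by listing their indices (the approach from the answer).
kept = arr[:, [0, 1, 2, 3, 6]]

# Or drop the unwanted columns; np.delete returns a copy.
dropped = np.delete(arr, [4, 5], axis=1)

print(np.array_equal(kept, dropped))  # True
print(dropped.shape)                  # (3, 5)
```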

numpy only perform function on nonzero parts while preserving structure of array

In numpy:
Foo =
array([[ 3.5, 0. , 2.5, 2. , 0. , 1. , 0. ],
[ 0. , 3. , 2.5, 2. , 0. , 0. , 0.5],
[ 3.5, 0. , 0. , 0. , 1.5, 0. , 0.5]])
I want to perform a function on Foo such that only the nonzero elements are changed, i.e. for f(x) = x(nonzero)+5:
array([[ 8.5, 0. , 7.5, 7. , 0. , 6. , 0. ],
[ 0. , 8. , 7.5, 7. , 0. , 0. , 5.5],
[ 8.5, 0. , 0. , 0. , 6.5, 0. , 5.5]])
Also I want the shape/structure of the array to stay the same, so I don't think Foo[np.nonzero(Foo)] is going to work...
How do I do this in numpy?
thanks!
In [138]: foo = np.array([[ 3.5, 0. , 2.5, 2. , 0. , 1. , 0. ],
[ 0. , 3. , 2.5, 2. , 0. , 0. , 0.5],
[ 3.5, 0. , 0. , 0. , 1.5, 0. , 0.5]])
In [141]: mask = foo != 0
In [142]: foo[mask] = foo[mask]+5
In [143]: foo
Out[143]:
array([[ 8.5, 0. , 7.5, 7. , 0. , 6. , 0. ],
[ 0. , 8. , 7.5, 7. , 0. , 0. , 5.5],
[ 8.5, 0. , 0. , 0. , 6.5, 0. , 5.5]])
You can also write the same in-place update more compactly, using augmented assignment:
>>> import numpy as np
>>> foo = np.array([[ 3.5, 0. , 2.5, 2. , 0. , 1. , 0. ],
... [ 0. , 3. , 2.5, 2. , 0. , 0. , 0.5],
... [ 3.5, 0. , 0. , 0. , 1.5, 0. , 0.5]])
>>> foo[foo!=0] += 5
>>> foo
array([[ 8.5, 0. , 7.5, 7. , 0. , 6. , 0. ],
[ 0. , 8. , 7.5, 7. , 0. , 0. , 5.5],
[ 8.5, 0. , 0. , 0. , 6.5, 0. , 5.5]])
>>>
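Both answers above modify foo itself. If the original array should stay untouched, one option is np.where, which builds a new array of the same shape by picking foo + 5 where the condition holds and foo elsewhere. A minimal sketch, with result as an illustrative name:

```python
import numpy as np

foo = np.array([[3.5, 0., 2.5, 2., 0., 1., 0.],
                [0., 3., 2.5, 2., 0., 0., 0.5],
                [3.5, 0., 0., 0., 1.5, 0., 0.5]])

# New array: foo + 5 where foo is nonzero, the original value elsewhere.
result = np.where(foo != 0, foo + 5, foo)

print(result)
print(foo[0, 0])  # 3.5, the original array is unchanged
```

Note that np.where evaluates foo + 5 for every element before selecting, which is harmless here but matters for functions that fail on zero, such as a division by foo.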
