Sort Multidimensional List - Python

I have a 3D list of lists (or a NumPy array) and I need to sort it by the smallest first item of each inner list.
These are the last two attempts I made at this. Sorry, I am quite sure it is an easy/silly question, but as a newbie to the programming 'way of thinking', it is kind of hard for me.
First Try:
import numpy as np
from operator import itemgetter

lstsArray = [[[54, 21, 31], [1, 2, 3], [15, 25, 35]],
             [[12, 22, 32], [3, 2, 1], [16, 26, 36]],
             [[34, 24, 38], [0.1, 1, 1], [17, 27, 37]]]
val = np.array(lstsArray)
menor = 120e26
for item in val:
    for i in item:
        if menor >= i[0] and i[0] >= min(i):
            menor = i[0]
print(menor)
lstA = list(val)
a = sorted(lstA, key=itemgetter(menor))
print(a)
Second Try
for i in val:
    for j in i:
        print(sorted((i), key=itemgetter(j[0])))
Desired Output
[[[0.1,1,1],[1,2,3],[3,2,1]],
[[12,22,32],[15,25,35],[16,26,36]],
[[17,27,37],[34,24,38],[54,21,31]]]

Your list, and array made from it. Note the floats in the array:
In [124]: lstsArray = [[[54,21,31], [1,2,3], [15,25,35]],
...: [[12,22,32], [3,2,1], [16,26,36]],
...: [[34,24,38], [0.1,1,1], [17,27,37]]]
In [125]: val=np.array(lstsArray)
In [126]: val
Out[126]:
array([[[54. , 21. , 31. ],
[ 1. , 2. , 3. ],
[15. , 25. , 35. ]],
[[12. , 22. , 32. ],
[ 3. , 2. , 1. ],
[16. , 26. , 36. ]],
[[34. , 24. , 38. ],
[ 0.1, 1. , 1. ],
[17. , 27. , 37. ]]])
This is a (3,3,3) shaped array. But your sorting ignores the initial (3,3) layout, so let's go ahead and reshape it:
In [133]: val = np.array(lstsArray).reshape(-1,3)
In [134]: val
Out[134]:
array([[54. , 21. , 31. ],
[ 1. , 2. , 3. ],
[15. , 25. , 35. ],
[12. , 22. , 32. ],
[ 3. , 2. , 1. ],
[16. , 26. , 36. ],
[34. , 24. , 38. ],
[ 0.1, 1. , 1. ],
[17. , 27. , 37. ]])
Now we can easily sort on the first column value. argsort gives the sort order:
In [135]: idx = np.argsort(val[:,0])
In [136]: idx
Out[136]: array([7, 1, 4, 3, 2, 5, 8, 6, 0])
In [137]: val[idx]
Out[137]:
array([[ 0.1, 1. , 1. ],
[ 1. , 2. , 3. ],
[ 3. , 2. , 1. ],
[12. , 22. , 32. ],
[15. , 25. , 35. ],
[16. , 26. , 36. ],
[17. , 27. , 37. ],
[34. , 24. , 38. ],
[54. , 21. , 31. ]])
and to get it back to 3d:
In [138]: val[idx].reshape(3,3,3)
Out[138]:
array([[[ 0.1, 1. , 1. ],
[ 1. , 2. , 3. ],
[ 3. , 2. , 1. ]],
[[12. , 22. , 32. ],
[15. , 25. , 35. ],
[16. , 26. , 36. ]],
[[17. , 27. , 37. ],
[34. , 24. , 38. ],
[54. , 21. , 31. ]]])
or in list display:
In [139]: val[idx].reshape(3,3,3).tolist()
Out[139]:
[[[0.1, 1.0, 1.0], [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],
[[12.0, 22.0, 32.0], [15.0, 25.0, 35.0], [16.0, 26.0, 36.0]],
[[17.0, 27.0, 37.0], [34.0, 24.0, 38.0], [54.0, 21.0, 31.0]]]
But if the list had just one level of nesting:
In [140]: alist = val.tolist()
In [141]: alist
Out[141]:
[[54.0, 21.0, 31.0],
[1.0, 2.0, 3.0],
[15.0, 25.0, 35.0],
[12.0, 22.0, 32.0],
[3.0, 2.0, 1.0],
[16.0, 26.0, 36.0],
[34.0, 24.0, 38.0],
[0.1, 1.0, 1.0],
[17.0, 27.0, 37.0]]
the python sorted works quite nicely:
In [142]: sorted(alist, key=lambda x:x[0]) # or itemgetter
Out[142]:
[[0.1, 1.0, 1.0],
[1.0, 2.0, 3.0],
[3.0, 2.0, 1.0],
[12.0, 22.0, 32.0],
[15.0, 25.0, 35.0],
[16.0, 26.0, 36.0],
[17.0, 27.0, 37.0],
[34.0, 24.0, 38.0],
[54.0, 21.0, 31.0]]
The fact that you have a double nested list, but want the sort to ignore one layer, complicates the list processing. That's where numpy reshape helps a lot.
For now I won't test the relative speeds of these approaches.
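For completeness, here is a minimal pure-Python sketch of the same idea applied directly to the original nested list (flatten one level, sort by the first element, re-chunk into groups of three); it assumes the groups should simply be rebuilt in sorted order, as in the desired output above:
from operator import itemgetter

# flatten one nesting level: 3x3 blocks of rows -> 9 rows
flat = [row for block in lstsArray for row in block]
# sort by the first element of each row
flat_sorted = sorted(flat, key=itemgetter(0))
# re-chunk into sublists of three
rechunked = [flat_sorted[i:i + 3] for i in range(0, len(flat_sorted), 3)]
print(rechunked)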

Related

python: DELETE points out of a very big 2D array and elements are float, like discarding unwanted points in KNN

I have a 2D array and I want to delete a point from it, but suppose it's so big that I can't just look up its index by hand, and the values of the array are floats.
How can I delete this point, both with a loop and without a loop? The following is the 2D array, and I want to delete [ 32.9, 23.]:
[[ 1. , -1.4],
[ -2.9, -1.5],
[ -3.6, -2. ],
[ 1.5, 1. ],
[ 24. , 11. ],
[ -1. , 1.4],
[ 2.9, 1.5],
[ 3.6, 2. ],
[ -1.5, -1. ],
[ -24. , -11. ],
[ 32.9, 23. ],
[-440. , 310. ]]
I tried this, but it doesn't work:
this_point = np.asarray([ 32.9, 23.])
[x for x in y if x == point]
del datapoints[this_point]
np.delete(datapoints, len(datapoints), axis=0)
for this_point in datapoints:
    del this_point
When I do this, this_point is still there after printing all the points. What should I do?
Python can remove a list element by content, but numpy only removes by index. So, use np.where to find the index of the matching row:
import numpy as np
a = np.array([[ 1. , -1.4],
[ -2.9, -1.5],
[ -3.6, -2. ],
[ 1.5, 1. ],
[ 24. , 11. ],
[ -1. , 1.4],
[ 2.9, 1.5],
[ 3.6, 2. ],
[ -1.5, -1. ],
[ -24. , -11. ],
[ 32.9, 23. ],
[-440. , 310. ]])
find = np.array([32.9,23.])
row = np.where( (a == find).all(axis=1))
print( row )
print(np.delete( a, row, axis=0 ) )
Output:
(array([10], dtype=int64),)
[[ 1. -1.4]
[ -2.9 -1.5]
[ -3.6 -2. ]
[ 1.5 1. ]
[ 24. 11. ]
[ -1. 1.4]
[ 2.9 1.5]
[ 3.6 2. ]
[ -1.5 -1. ]
[ -24. -11. ]
[-440. 310. ]]
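As a side note (my addition, not part of the answer above), the same result can be obtained without np.delete by keeping every row that does not match, using a boolean mask built from the same comparison; a minimal sketch with the a and find defined above:
# True for every row that differs from `find` in at least one column
mask = ~(a == find).all(axis=1)
print(a[mask])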

ValueError: Found array with dim 3. Estimator expected <= 2 python

I am trying to fit a decision tree with some train and test data, which are in lists named x and y.
My train data x is this:
[array([[19. , 14. , 0.8],
[23. , 24. , 0.8],
[25. , 26. , 0.8],
[22. , 24. , 1. ],
[25. , 29. , 1.4],
[36. , 86. , 1.6],
[28. , 52. , 0.8],
[21. , 20. , 1. ],
[22. , 28. , 0.8],
[24. , 27. , 1. ],
[18. , 8. , 0.6],
[30. , 58. , 1.2],
[24. , 30. , 0.8],
[24. , 28. , 0.8],
[32. , 65. , 1.6],
[28. , 47. , 0.8],
[26. , 41. , 0.8],
[18. , 14. , 0.6],
[32. , 71. , 2.2],
[27. , 45. , 2. ],
[29. , 53. , 2.2],
[18. , 11. , 0.8],
[20. , 23. , 0.8],
[20. , 19. , 0.6],
[20. , 15. , 0.6],
[19. , 18. , 0.4],
[24. , 55. , 1.2],
[24. , 59. , 1. ],
[20. , 17. , 0.6],
[21. , 28. , 0.8]])]
and y:
[array([ 3100., 2750., 7800., 6000., 15000., 15500., 5600., 8000.,
6000., 7500., 4000., 9000., 5850., 5750., 18000., 5600.,
5600., 4500., 22000., 21500., 24000., 4000., 6000., 4000.,
8000., 8000., 14000., 14000., 6000., 4000.])]
When I try to run
dtree= DecisionTreeRegressor(random_state=0, max_depth=1)
dtree.fit(x_train, y_train)
I get the error ValueError: Found array with dim 3. Estimator expected <= 2, and I couldn't solve it with reshape since these are lists. Any suggestions?
First of all, I recommend converting X and Y to numpy arrays, though I can't be 100% sure whether your variables already are, since you haven't posted your code here. Secondly, take a look at your variables. As the documentation says:
X: {array-like, sparse matrix} of shape (n_samples, n_features)
AND
y: array-like of shape (n_samples,) or (n_samples, n_outputs)
The fit function expects X to be a 2D array and y to be 1D (or 2D for multi-output), but your X_train is 3D.
So you need to reshape these two. One solution can be:
ANSWER EDITED AFTER READING THE COMMENTS
The reason you can't train on your data comes down to two things:
X_train has a bad shape
Y_train has a bad shape
You are passing a 3D array as X_train, and fit only allows 2D. Furthermore, your Y_train has shape (1, 30), which means you are passing all 30 targets as a single row. You need to reshape it to (30,), as follows:
from sklearn.tree import DecisionTreeRegressor
import numpy as np
X_train = np.array([np.array([[19. , 14. , 0.8],
[23. , 24. , 0.8],
[25. , 26. , 0.8],
[22. , 24. , 1. ],
[25. , 29. , 1.4],
[36. , 86. , 1.6],
[28. , 52. , 0.8],
[21. , 20. , 1. ],
[22. , 28. , 0.8],
[24. , 27. , 1. ],
[18. , 8. , 0.6],
[30. , 58. , 1.2],
[24. , 30. , 0.8],
[24. , 28. , 0.8],
[32. , 65. , 1.6],
[28. , 47. , 0.8],
[26. , 41. , 0.8],
[18. , 14. , 0.6],
[32. , 71. , 2.2],
[27. , 45. , 2. ],
[29. , 53. , 2.2],
[18. , 11. , 0.8],
[20. , 23. , 0.8],
[20. , 19. , 0.6],
[20. , 15. , 0.6],
[19. , 18. , 0.4],
[24. , 55. , 1.2],
[24. , 59. , 1. ],
[20. , 17. , 0.6],
[21. , 28. , 0.8]])])
dimX1, dimX2, dimX3 = np.array(X_train).shape
X_train = np.reshape(np.array(X_train), (dimX1*dimX2, dimX3))
Y_train = np.array([np.array([ 3100., 2750., 7800., 6000., 15000., 15500., 5600., 8000.,
6000., 7500., 4000., 9000., 5850., 5750., 18000., 5600.,
5600., 4500., 22000., 21500., 24000., 4000., 6000., 4000.,
8000., 8000., 14000., 14000., 6000., 4000.])])
dimY1, dimY2 = Y_train.shape
Y_train = np.reshape(np.array(Y_train), (dimY2, ))
print(X_train.shape, Y_train.shape)
dtree= DecisionTreeRegressor(random_state=0, max_depth=1)
dtree.fit(X_train, Y_train)
Its output is:
>>> (30, 3) (30,)
>>> DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=1,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort='deprecated',
random_state=0, splitter='best')
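Alternatively (a hedged suggestion of mine, not from the original answer), since the x and y lists in the question each hold a single array, you could simply pull that array out instead of computing the reshape dimensions:
X_train = np.asarray(x[0])           # shape (30, 3)
Y_train = np.asarray(y[0]).ravel()   # shape (30,)
dtree = DecisionTreeRegressor(random_state=0, max_depth=1)
dtree.fit(X_train, Y_train)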

Combination of rows in numpy.ndarray

I have the following numpy.ndarray
S=np.array([[[ -0.6, -0.2, 0. ],
[-60. , 2. , 0. ],
[ 6. , -20. , 0. ]],
[[ -0.4, -0.8, 0. ],
[-40. , 8. , 0. ],
[ 4. , -80. , 0. ]]])
I want to find all possible combinations of element-wise sums of each row of S[0,:,:] with each row of S[1,:,:] (excluding the last column of each row), i.e., my desired result is (order does not matter):
array([[-1, -1],
[-40.6, 7.8],
[3.4, -80.2],
[-60.4, 1.2],
[-100, 10],
[-56, -78],
[5.6, -20.8],
[-34, -12],
[10, -100]])
which is a 9-by-2 array resulting from 9 possible combinations of S[0,:,:] and S[1,:,:]. Although I have used a particular shape of S here, the shape may vary, i.e., for
x,y,z = np.shape(S)
in the above problem, x=2, y=3, and z=3, but these values may vary. Therefore, I am seeking a generalized version.
Your help will be highly appreciated. Thank you for your time!
(Please no for loops if possible. It is pretty trivial then.)
You can use broadcasting like this:
(S[0,:,None, :-1] + S[1,None,:,:-1]).reshape(-1,2)
Output:
array([[ -1. , -1. ],
[ -40.6, 7.8],
[ 3.4, -80.2],
[ -60.4, 1.2],
[-100. , 10. ],
[ -56. , -78. ],
[ 5.6, -20.8],
[ -34. , -12. ],
[ 10. , -100. ]])
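To generalize to other values of y and z (a sketch of mine, following the shapes described in the question), replace the hard-coded 2 with z - 1:
x, y, z = S.shape
# (y, 1, z-1) + (1, y, z-1) broadcasts to (y, y, z-1), then flatten the pairs
result = (S[0, :, None, :-1] + S[1, None, :, :-1]).reshape(-1, z - 1)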

turning a list of numpy.ndarray to a matrix in order to perform multiplication

I have vectors of this form:
test=np.linspace(0,1,10)
I want to stack them horizontally in order to make a matrix.
The problem is that I define them in a loop, so the first stack is between an empty matrix and the first column vector, which gives the following error:
ValueError: all the input arrays must have same number of dimensions
Bottom line: I have a for loop that creates a vector p1 on every iteration, and I want to add it to a final matrix of the form
[p1 p2 p3 p4]
which I could then do matrix operations on, such as multiplying by its transpose, etc.
If you've got a list of 1D arrays that you want horizontally stacked, you could convert them all to columns first, but it's probably easier to just vertically stack them and then transpose:
In [6]: vector_list = [np.linspace(0, 1, 10) for _ in range(3)]
In [7]: np.vstack(vector_list).T
Out[7]:
array([[0. , 0. , 0. ],
[0.11111111, 0.11111111, 0.11111111],
[0.22222222, 0.22222222, 0.22222222],
[0.33333333, 0.33333333, 0.33333333],
[0.44444444, 0.44444444, 0.44444444],
[0.55555556, 0.55555556, 0.55555556],
[0.66666667, 0.66666667, 0.66666667],
[0.77777778, 0.77777778, 0.77777778],
[0.88888889, 0.88888889, 0.88888889],
[1. , 1. , 1. ]])
How did you get this dimension error? What does the empty array have to do with it?
A list of arrays of the same length:
In [610]: alist = [np.linspace(0,1,6), np.linspace(10,11,6)]
In [611]: alist
Out[611]:
[array([0. , 0.2, 0.4, 0.6, 0.8, 1. ]),
array([10. , 10.2, 10.4, 10.6, 10.8, 11. ])]
Several ways of making an array from them:
In [612]: np.array(alist)
Out[612]:
array([[ 0. , 0.2, 0.4, 0.6, 0.8, 1. ],
[10. , 10.2, 10.4, 10.6, 10.8, 11. ]])
In [614]: np.stack(alist)
Out[614]:
array([[ 0. , 0.2, 0.4, 0.6, 0.8, 1. ],
[10. , 10.2, 10.4, 10.6, 10.8, 11. ]])
If you want to join them in columns, you can transpose one of the above, or use:
In [615]: np.stack(alist, axis=1)
Out[615]:
array([[ 0. , 10. ],
[ 0.2, 10.2],
[ 0.4, 10.4],
[ 0.6, 10.6],
[ 0.8, 10.8],
[ 1. , 11. ]])
np.column_stack is also handy.
In newer numpy versions you can do:
In [617]: np.linspace((0,10),(1,11),6)
Out[617]:
array([[ 0. , 10. ],
[ 0.2, 10.2],
[ 0.4, 10.4],
[ 0.6, 10.6],
[ 0.8, 10.8],
[ 1. , 11. ]])
You don't specify how you create the 'empty array' or how you attempt to stack. I can't exactly recreate the error message (a full traceback would have helped). But given that message, did you check the number of dimensions of the inputs? Did they match?
Array stacking in a loop is tricky. You have to pay close attention to the shapes, especially of the initial 'empty' array. There isn't a close analog to the empty list []. np.array([]) is 1d with shape (0,). np.empty((0,6)) is 2d with shape (0,6). Also, all the stacking functions create a new array with each call (none operate in-place), so they are inefficient compared to list append.
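To illustrate the list-append point, here is a minimal sketch of the usual pattern (collect the vectors in a plain Python list inside the loop, then stack once at the end); the loop body is only a placeholder, since the question doesn't show how each p vector is built:
import numpy as np

columns = []                          # plain Python list; appending is cheap
for k in range(4):
    p = np.linspace(0, 1, 10) * k     # placeholder for however p1, p2, ... are computed
    columns.append(p)
result = np.stack(columns, axis=1)    # (10, 4) matrix with the vectors as columns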

Delete columns based on repeat value in one row in numpy array

I'm hoping to delete columns in my arrays that have repeat entries in row 1, as shown below (row 1 has repeats of the values 1 and 2.5, so one of each of those values has been deleted, together with the column each deleted value lies within).
initial_array =
row 0 [[ 1, 1, 1, 1, 1, 1, 1, 1,]
row 1 [0.5, 1, 2.5, 4, 2.5, 2, 1, 3.5,]
row 2 [ 1, 1.5, 3, 4.5, 3, 2.5, 1.5, 4,]
row 3 [228, 314, 173, 452, 168, 351, 300, 396]]
final_array =
row 0 [[ 1, 1, 1, 1, 1, 1,]
row 1 [0.5, 1, 2.5, 4, 2, 3.5,]
row 2 [ 1, 1.5, 3, 4.5, 2.5, 4,]
row 3 [228, 314, 173, 452, 351, 396]]
Ways I was thinking of included using some function that checks for repeats, giving a True response for the second (or later) time a value turns up in the dataset, then using that response to delete the column. That, or possibly using the return_index option of numpy.unique. I just can't quite find a way through it or find the right function, though.
If I could find a way to return an mean value in the row 3 of the retained repeat and the deleted one, that would be even better (see below).
final_array_averaged =
row 0 [[ 1, 1, 1, 1, 1, 1,]
row 1 [0.5, 1, 2.5, 4, 2, 3.5,]
row 2 [ 1, 1.5, 3, 4.5, 2.5, 4,]
row 3 [228, 307, 170.5, 452, 351, 396]]
Thanks in advance for any help you can give to a beginner who is stumped!
You can use the optional arguments of np.unique and then use np.bincount with the last row as weights to get the final averaged output, like so -
_,unqID,tag,C = np.unique(arr[1],return_index=1,return_inverse=1,return_counts=1)
out = arr[:,unqID]
out[-1] = np.bincount(tag,arr[3])/C
Sample run -
In [212]: arr
Out[212]:
array([[ 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. ],
[ 0.5, 1. , 2.5, 4. , 2.5, 2. , 1. , 3.5],
[ 1. , 1.5, 3. , 4.5, 3. , 2.5, 1.5, 4. ],
[ 228. , 314. , 173. , 452. , 168. , 351. , 300. , 396. ]])
In [213]: out
Out[213]:
array([[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ 0.5, 1. , 2. , 2.5, 3.5, 4. ],
[ 1. , 1.5, 2.5, 3. , 4. , 4.5],
[ 228. , 307. , 351. , 170.5, 396. , 452. ]])
As can be seen, the output is now ordered, with the second row sorted. If you are looking to keep the order as it was originally, use np.argsort of unqID, like so -
In [221]: out[:,unqID.argsort()]
Out[221]:
array([[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ 0.5, 1. , 2.5, 4. , 2. , 3.5],
[ 1. , 1.5, 3. , 4.5, 2.5, 4. ],
[ 228. , 307. , 170.5, 452. , 351. , 396. ]])
You can find the indices of the wanted columns using np.unique:
>>> indices = np.sort(np.unique(A[1], return_index=True)[1])
Then use a simple indexing to get the desire columns:
>>> A[:,indices]
array([[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ 0.5, 1. , 2.5, 4. , 2. , 3.5],
[ 1. , 1.5, 3. , 4.5, 2.5, 4. ],
[ 228. , 314. , 173. , 452. , 351. , 396. ]])
This is a typical grouping problem, which can be solved elegantly and efficiently using the numpy_indexed package (disclaimer: I am its author):
import numpy_indexed as npi
unique, final_array = npi.group_by(initial_array[1]).mean(initial_array, axis=1)
Note that there are many reductions other than mean; if you want the original behavior you described, you could replace 'mean' with 'first', for instance.
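For instance, a sketch of that substitution (assuming numpy_indexed exposes first the same way it exposes mean above, which I haven't verified against its docs):
# keep the row-3 value of the first occurrence instead of averaging
unique, final_array = npi.group_by(initial_array[1]).first(initial_array, axis=1)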
