Sorting matrix columns - python
I have a matrix 4*5 and I need to sort it by several columns.
Given these inputs:
sort_columns = [3, 1, 2, 4, 5, 2]
matrix = [[3, 1, 8, 1, 9],
[3, 7, 8, 2, 9],
[2, 7, 7, 1, 2],
[2, 1, 7, 1, 9]]
the matrix should first be sorted by the 3rd column (so the values 8, 8, 7, 7), then the sorted result should again be sorted by column 1 (values 3, 3, 2, 2) and so on.
So, after first sorting by column 3, the matrix would be:
2 7 7 1 2
2 1 7 1 9
3 1 8 1 9
3 7 8 2 9
and sorting on column 1 then has no effect as the values are already in the right order. The next column, 2, then makes the order:
2 1 7 1 9
3 1 8 1 9
2 7 7 1 2
3 7 8 2 9
etc.
After sorting on all the sort_columns numbers, I expect to get the result:
2 1 7 1 9
3 1 8 1 9
2 7 7 1 2
3 7 8 2 9
This is my code to sort the matrix:
def sort_matrix_columns(matrix, n, sort_columns):
    for col in sort_columns:
        column = col - 1
        for i in range(n):
            for j in range(i + 1, n):
                if matrix[i][column] > matrix[j][column]:
                    temp = matrix[i]
                    matrix[i] = matrix[j]
                    matrix[j] = temp
which is called like this:
sort_matrix_columns(matrix, len(matrix), sort_columns)
But when I do I get the following wrong result:
3 1 8 1 9
2 1 7 1 9
2 7 7 1 2
3 7 8 2 9
Why am I getting the wrong order here? Where is my sort implementation failing?
The short answer is that your sort implementation is not stable.
A sort algorithm is stable when two entries in the sorted sequence keep the same (relative) order when their sort key is the same. For example, when sorting only by the first letter, a stable algorithm will always sort the sequence ['foo', 'flub', 'bar'] to be ['bar', 'foo', 'flub'], keeping the 'foo' and 'flub' values in the same relative order. Your algorithm would swap 'foo' and 'bar' (as 'f' > 'b' is true) without touching 'flub', and so you'd end up with ['bar', 'flub', 'foo'].
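The 'foo' / 'flub' / 'bar' example can be checked directly. The sketch below is only an illustration: exchange_sort is a hypothetical helper name that applies the question's swap loop to a flat list, and first_letter is an assumed key function.

```python
def exchange_sort(items, key):
    """The question's swap-based loop applied to a flat list (not stable)."""
    items = list(items)  # work on a copy so the input is left alone
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if key(items[i]) > key(items[j]):
                items[i], items[j] = items[j], items[i]
    return items


def first_letter(word):
    return word[0]


words = ['foo', 'flub', 'bar']

# Python's built-in sort is stable: 'foo' stays ahead of 'flub'.
print(sorted(words, key=first_letter))      # ['bar', 'foo', 'flub']

# The swap-based loop moves 'foo' past 'flub', so the tie order is lost.
print(exchange_sort(words, first_letter))   # ['bar', 'flub', 'foo']
```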
You need a stable sort algorithm when sorting repeatedly, as you do with multiple columns, because each subsequent sort must preserve the order established by the preceding sorts whenever two rows have the same value in the current column.
You can see this when your implementation sorts by column 5, after first sorting on columns 3, 1, 2, 4. After those first 4 sort operations the matrix looks like this:
2 1 7 1 9
3 1 8 1 9
2 7 7 1 2
3 7 8 2 9
Your implementation then sorts by column 5, so by 9, 9, 2, 9. The first row is then swapped with the 3rd row (2 1 7 1 9 and 2 7 7 1 2), leaving the other rows untouched. This changed the relative order of the rows with a 9:
2 7 7 1 2 <- was third
3 1 8 1 9 <- so this row is now re-ordered!
2 1 7 1 9 <- was first
3 7 8 2 9
Sorting the above output by the 2nd column (7, 1, 1, 7) then leads to the wrong output you see.
A stable sort algorithm would have moved the 2 7 7 1 2 row to be the first row without reordering the other rows:
2 7 7 1 2 <- was third
2 1 7 1 9 <- was first
3 1 8 1 9 <- was second, stays *after* the first row
3 7 8 2 9 <- was fourth, stays *after* the second row
and sorting by the second column produces the correct output.
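The column 5 step described above can be reproduced in isolation. This is a minimal sketch: exchange_sort_rows and nines are hypothetical helper names, and the starting matrix is the state after sorting on columns 3, 1, 2, 4 as shown earlier.

```python
def exchange_sort_rows(rows, column):
    """The question's swap-based loop for a single 0-based column, on a copy."""
    rows = list(rows)
    n = len(rows)
    for i in range(n):
        for j in range(i + 1, n):
            if rows[i][column] > rows[j][column]:
                rows[i], rows[j] = rows[j], rows[i]
    return rows


def nines(rows):
    """Rows whose 5th column is 9, in their current order."""
    return [row for row in rows if row[4] == 9]


# Matrix state after sorting on columns 3, 1, 2 and 4.
before = [[2, 1, 7, 1, 9],
          [3, 1, 8, 1, 9],
          [2, 7, 7, 1, 2],
          [3, 7, 8, 2, 9]]

unstable = exchange_sort_rows(before, 4)         # column 5 is index 4
stable = sorted(before, key=lambda row: row[4])  # built-in stable sort

print(nines(stable) == nines(before))    # True: tied rows keep their order
print(nines(unstable) == nines(before))  # False: tied rows were re-ordered
```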
The default Python sort implementation, TimSort (named after its inventor, Tim Peters), is a stable sort function. You could just use that (via the list.sort() method and a sort key function):
def sort_matrix_columns(matrix, sort_columns):
    for col in sort_columns:
        matrix.sort(key=lambda row: row[col - 1])
Heads-up: I removed the n parameter from the function, for simplicity's sake.
Demo:
>>> def pm(m): print(*(' '.join(map(str, r)) for r in m), sep="\n")
...
>>> def sort_matrix_columns(matrix, sort_columns):
...     for col in sort_columns:
...         matrix.sort(key=lambda row: row[col - 1])
...
>>> sort_columns = [3, 1, 2, 4, 5, 2]
>>> matrix = [[3, 1, 8, 1, 9],
...           [3, 7, 8, 2, 9],
...           [2, 7, 7, 1, 2],
...           [2, 1, 7, 1, 9]]
>>> sort_matrix_columns(matrix, sort_columns)
>>> pm(matrix)
2 1 7 1 9
3 1 8 1 9
2 7 7 1 2
3 7 8 2 9
You don't need a loop. If you reverse the sort_columns list and use it to build a single sort key, you can do this with a single call:
def sort_matrix_columns(matrix, sort_columns):
    matrix.sort(key=lambda r: [r[c - 1] for c in sort_columns[::-1]])
This works the same way: the last column in sort_columns is the most significant sort key; only when two rows have the same value there (a tie) does the next-to-last column matter, and so on.
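To see that the two approaches agree on the question's data, here is a small comparison. It is a sketch only: sort_by_passes and sort_by_single_key are hypothetical names, and both return new lists instead of sorting in place.

```python
def sort_by_passes(matrix, sort_columns):
    """One stable sort per column, in the order given (1-based columns)."""
    result = list(matrix)
    for col in sort_columns:
        result.sort(key=lambda row: row[col - 1])
    return result


def sort_by_single_key(matrix, sort_columns):
    """One sort with a composite key built from the reversed column list."""
    return sorted(
        matrix,
        key=lambda row: tuple(row[c - 1] for c in reversed(sort_columns)),
    )


sort_columns = [3, 1, 2, 4, 5, 2]
matrix = [[3, 1, 8, 1, 9],
          [3, 7, 8, 2, 9],
          [2, 7, 7, 1, 2],
          [2, 1, 7, 1, 9]]

print(sort_by_passes(matrix, sort_columns) == sort_by_single_key(matrix, sort_columns))  # True
```

A column that appears twice in sort_columns (column 2 here) is harmless in the composite key: by the time the second occurrence is compared, the rows already tie on that value.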
There are other stable sort algorithms; insertion sort or bubble sort, for example, would work just as well here. Wikipedia has a handy table of comparison sort algorithms that includes a 'stable' column, if you still want to implement the sorting yourself.
E.g. here is a version using insertion sort:
def insertionsort_matrix_columns(matrix, sort_columns):
    for col in sort_columns:
        column = col - 1
        for i in range(1, len(matrix)):
            for j in range(i, 0, -1):
                if matrix[j - 1][column] <= matrix[j][column]:
                    break
                matrix[j - 1], matrix[j] = matrix[j], matrix[j - 1]
I didn't use a temp variable to swap two rows. In Python, you can swap two values simply by using tuple assignments.
Because insertion sort is stable, this produces the expected outcome:
>>> matrix = [[3, 1, 8, 1, 9],
...           [3, 7, 8, 2, 9],
...           [2, 7, 7, 1, 2],
...           [2, 1, 7, 1, 9]]
>>> insertionsort_matrix_columns(matrix, sort_columns)
>>> pm(matrix)
2 1 7 1 9
3 1 8 1 9
2 7 7 1 2
3 7 8 2 9
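Bubble sort, the other stable algorithm mentioned above, could be sketched the same way. This is an illustrative version, not part of the original answer; bubblesort_matrix_columns is a hypothetical name. It stays stable because it only swaps adjacent rows that are strictly out of order.

```python
def bubblesort_matrix_columns(matrix, sort_columns):
    """Sort matrix in place, one stable bubble sort pass per 1-based column."""
    for col in sort_columns:
        column = col - 1
        for end in range(len(matrix) - 1, 0, -1):
            swapped = False
            for j in range(end):
                # Strict comparison: equal values are never swapped,
                # so rows that tie keep their relative order.
                if matrix[j][column] > matrix[j + 1][column]:
                    matrix[j], matrix[j + 1] = matrix[j + 1], matrix[j]
                    swapped = True
            if not swapped:
                break  # already sorted on this column


sort_columns = [3, 1, 2, 4, 5, 2]
matrix = [[3, 1, 8, 1, 9],
          [3, 7, 8, 2, 9],
          [2, 7, 7, 1, 2],
          [2, 1, 7, 1, 9]]
bubblesort_matrix_columns(matrix, sort_columns)
for row in matrix:
    print(' '.join(map(str, row)))
```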
Related
Generating all sums of one element per line in a matrix in Python
Let us say I have a matrix of 4 lines by 3 columns. E.g.:
1 2 3
4 5 6
7 8 9
10 11 12
I would like to generate the list of all possible sums, a sum being computed by taking only 1 element per line, for each line. E.g.:
1 + 4 + 7 + 10 = 22
3 + 5 + 7 + 11 = 26
...
How could I do this in Python?
Perform a Cartesian product, summing the values in each result of that product. You can use itertools, as it provides an iterator over a Cartesian product:
import itertools
m = [[ 1,  2,  3],
     [ 4,  5,  6],
     [ 7,  8,  9],
     [10, 11, 12]]
res = map(sum, itertools.product(*m))
print(list(res))
An efficient way to concatenate rows of a 2-dim array according to a given list of pairs of indexes
Suppose I have a 2 dimensional array with a very large number of rows, and a list of pairs of indexes of that array. I want to create a new 2 dim array, whose rows are concatenations of the rows of the original array, made according to the list of pairs of indexes. For example:
a =
1 2 3
4 5 6
7 8 9
0 0 0
indexes = [[0,0], [0,1], [2,3]]
the returned array should be:
1 2 3 1 2 3
1 2 3 4 5 6
7 8 9 0 0 0
Obviously I can iterate the list of indexes, but my question is whether there is a more efficient way of doing this. I should say that the list of indexes is also very large.
First convert indexes to a NumPy array:
ind = np.array(indexes)
Then generate your result as:
result = np.concatenate([a[ind[:,0]], a[ind[:,1]]], axis=1)
The result is:
array([[1, 2, 3, 1, 2, 3],
       [1, 2, 3, 4, 5, 6],
       [7, 8, 9, 0, 0, 0]])
Another possible formula (with the same result):
result = np.concatenate([a[ind[:,i]] for i in range(ind.shape[1])], axis=1)
You can do this in one line using NumPy as:
import numpy as np
a = np.arange(12).reshape(4, 3)
print(a)
b = [[0, 0], [1, 1], [2, 3]]
b = np.array(b)
print(b)
c = a[b.reshape(-1)].reshape(-1, a.shape[1]*b.shape[1])
print(c)
'''
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
[[0 0]
 [1 1]
 [2 3]]
[[ 0  1  2  0  1  2]
 [ 3  4  5  3  4  5]
 [ 6  7  8  9 10 11]]
'''
You can use horizontal stacking with np.hstack:
c = np.array(indexes)
np.hstack((a[c[:,0]], a[c[:,1]]))
output:
[[1 2 3 1 2 3]
 [1 2 3 4 5 6]
 [7 8 9 0 0 0]]
Python: Find maximum element index in a list
I have a list arr = [0, 1, 45, 2, 40, 3, 70, 4, 45, 5, 6, 7, 8, 9] in which I'm trying to find the position/index of the maximum element among each 3 consecutive elements, using the code below:
for i in range (0, len(arr)-3):
    print(arr.index(max(arr[i : i+3])))
When i reaches position 7, it gives an incorrect result. The result should be:
2 2 2 4 6 6 6 8 8 11 12
But is instead:
2 2 2 4 6 6 6 2 2 11 12
That's because there are two 45s and index returns the first occurrence. You can pass start and end arguments to tell the index method which range of indices to search:
for i in range (0, len(arr)-3):
    print(arr.index(max(arr[i : i+3]), i, i+3))
Or alternatively:
for i in range (0, len(arr)-3):
    sliced_array = arr[i : i+3]
    print(i + sliced_array.index(max(sliced_array)))
Groupwise sorting in pandas
I want to sort an array within the group boundaries defined in another array. The groups are not presorted in any way and need to remain unchanged after the sorting. In numpy terms it would look like this:
import numpy as np

def groupwise_sort(group_idx, a, reverse=False):
    sortidx = np.lexsort((-a if reverse else a, group_idx))
    # Reverse sorting back into grouped order, but preserving groupwise sorting
    revidx = np.argsort(np.argsort(group_idx, kind='mergesort'), kind='mergesort')
    return a[sortidx][revidx]

group_idx = np.array([3, 2, 3, 2, 2, 1, 2, 1, 1])
a = np.array([3, 2, 1, 7, 4, 5, 5, 9, 1])
groupwise_sort(group_idx, a)
# >>> array([1, 2, 3, 4, 5, 1, 7, 5, 9])
groupwise_sort(group_idx, a, reverse=True)
# >>> array([3, 7, 1, 5, 4, 9, 2, 5, 1])
How can I do the same with pandas? I saw df.groupby() and df.sort_values(), though I couldn't find a straightforward way to achieve the same sorting. And a fast one, if possible.
Let us first set the stage:
import pandas as pd
import numpy as np
group_idx = np.array([3, 2, 3, 2, 2, 1, 2, 1, 1])
a = np.array([3, 2, 1, 7, 4, 5, 5, 9, 1])
df = pd.DataFrame({'group': group_idx, 'values': a})
df
#   group  values
#0      3       3
#1      2       2
#2      3       1
#3      2       7
#4      2       4
#5      1       5
#6      2       5
#7      1       9
#8      1       1
To get a dataframe sorted by group and values (within groups):
df.sort_values(["group", "values"])
#   group  values
#8      1       1
#5      1       5
#7      1       9
#1      2       2
#4      2       4
#6      2       5
#3      2       7
#2      3       1
#0      3       3
To sort the values in descending order, use ascending = False. To apply different orders to different columns, you can supply a list:
df.sort_values(["group", "values"], ascending = [True, False])
#   group  values
#7      1       9
#5      1       5
#8      1       1
#3      2       7
#6      2       5
#4      2       4
#1      2       2
#0      3       3
#2      3       1
Here, groups are sorted in ascending order, and the values within each group are sorted in descending order.
To only sort values for contiguous rows belonging to the same group, create a new group indicator: (I keep this in here for reference since it might be helpful for others. I wrote this in an earlier version before the OP clarified his question in the comments.)
df['new_grp'] = (df.group.diff(1) != 0).astype('int').cumsum()
df
#   group  values  new_grp
#0      3       3        1
#1      2       2        2
#2      3       1        3
#3      2       7        4
#4      2       4        4
#5      1       5        5
#6      2       5        6
#7      1       9        7
#8      1       1        7
We can then easily sort with new_grp instead of group, leaving the original order of groups untouched.
Ordering within groups but keeping the group-specific row positions:
To sort the elements of each group but keep the group-specific positions in the dataframe, we need to keep track of the original row numbers. For instance, the following will do the trick:
# First, create an indicator for the original row-number:
df["ind"] = range(len(df))
# Now, sort the dataframe as before
df_sorted = df.sort_values(["group", "values"])
# sort the original row-numbers within each group
newindex = df.groupby("group").apply(lambda x: x.sort_values(["ind"]))["ind"].values
# assign the sorted row-numbers to the sorted dataframe
df_sorted["ind"] = newindex
# Sort based on the row-numbers:
sorted_asc = df_sorted.sort_values("ind")
# compare the resulting order of values with your desired output:
np.array(sorted_asc["values"])
# array([1, 2, 3, 4, 5, 1, 7, 5, 9])
This is easier to test and profile when written up in a function, so let's do that:
def sort_my_frame(frame, groupcol = "group", valcol = "values", asc = True):
    frame["ind"] = range(len(frame))
    frame_sorted = frame.sort_values([groupcol, valcol], ascending = [True, asc])
    ind_sorted = frame.groupby(groupcol).apply(lambda x: x.sort_values(["ind"]))["ind"].values
    frame_sorted["ind"] = ind_sorted
    frame_sorted = frame_sorted.sort_values(["ind"])
    return(frame_sorted.drop(columns = "ind"))

np.array(sort_my_frame(df, "group", "values", asc = True)["values"])
# array([1, 2, 3, 4, 5, 1, 7, 5, 9])
np.array(sort_my_frame(df, "group", "values", asc = False)["values"])
# array([3, 7, 1, 5, 4, 9, 2, 5, 1])
Note that the latter results match your desired outcome.
I am sure this can be written up in a more succinct way. For instance, if the index of your dataframe is already ordered, you can use that one instead of the indicator ind I create (i.e., following @DJK's comment, we can use sort_index instead of sort_values and avoid assigning an additional column). In any case, the above highlights one possible solution and how to approach it. An alternative would be to use your numpy functions and wrap the output in a pd.DataFrame.
Pandas is built on top of NumPy. Assuming a dataframe like so:
df
Out[21]:
   group  values
0      3       3
1      2       2
2      3       1
3      2       7
4      2       4
5      1       5
6      2       5
7      1       9
8      1       1
Call your function:
groupwise_sort(df.group.values, df['values'].values)
Out[22]: array([1, 2, 3, 4, 5, 1, 7, 5, 9])
groupwise_sort(df.group.values, df['values'].values, reverse=True)
Out[23]: array([3, 7, 1, 5, 4, 9, 2, 5, 1])
Rearranging numbers from list in python3
Let's say I have a list of numbers a = [1,2,3,4,5,6,7,8,9,10] and I want to print the output as
1
2 3
4 5 6
7 8 9 10
How can I do it in Python 3? My attempt:
a = [1,2,3,4,5,6,7,8,9,10]
for i in a:
    print(a[i]," ")
    i=i+1
I'm getting IndexError: list index out of range, and I also don't know how to print 1 element in the 1st row, the 2nd and 3rd in the second row, and so on.
One way to do this in Python 3 is to use islice on an iterator:
from itertools import islice
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
it = iter(a)
print('\n'.join([' '.join([str(u) for u in islice(it, i)]) for i in range(1, 5)]))
output
1
2 3
4 5 6
7 8 9 10
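The same islice idea can be unrolled into a loop that does not hard-code the number of rows. This is a sketch under that assumption; triangle_rows is a hypothetical helper name, not part of the answer above.

```python
from itertools import islice


def triangle_rows(values):
    """Split values into rows of length 1, 2, 3, ... (last row may be short)."""
    it = iter(values)
    rows = []
    size = 1
    while True:
        row = list(islice(it, size))  # take the next `size` items, if any
        if not row:
            break
        rows.append(row)
        size += 1
    return rows


a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for row in triangle_rows(a):
    print(' '.join(map(str, row)))
```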