Sort a list by frequency - python

I would like to sort a list by its frequency in descending order. If the frequency for two values is the same, then I also want the descending order for these two values.
For example,
mylist = [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 5, 5, 5, 4, 4, 4, 4, 4, 4]
I would like my result to be
[4,4,4,4,4,4,3,3,3,3,3,5,5,5,2,2,2,1,1].
If I use
sorted(mylist,key = mylist.count,reverse = True)
I would get
[4,4,4,4,4,4,3,3,3,3,3,2,2,2,5,5,5,1,1];
I tried
sorted(mylist,key = lambda x:(mylist.count,-x),reverse = True)
But I think something is wrong, because it only gives me the result:
[1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5].
So my questions are how can I get the result I want and why the result will be
[1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5]
if I use
sorted(mylist,key = lambda x:(mylist.count,-x),reverse = True)

Use a Counter to get the frequencies, then sort by the frequencies it gives:
from collections import Counter
def sorted_by_frequency(arr):
    counts = Counter(arr)
    # secondarily sort by value
    arr2 = sorted(arr, reverse=True)
    # primarily sort by frequency
    return sorted(arr2, key=counts.get, reverse=True)
# Usage:
>>> sorted_by_frequency([1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 5, 5, 5, 4, 4, 4, 4, 4, 4])
[4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 5, 5, 5, 2, 2, 2, 1, 1]

You can try :
from collections import Counter
counts = Counter(mylist)
new_list = sorted(mylist, key=lambda x: (counts[x], x), reverse=True)

Why does
sorted(mylist, key=lambda x: (mylist.count, -x), reverse=True)
go wrong?
It compares the keys, so for example the two values 3 and 1 become the pairs (mylist.count, -3) and (mylist.count, -1) and the comparison would be (mylist.count, -3) < (mylist.count, -1).
So the obvious mistake is that the pairs don't have the frequencies of the numbers as intended. Instead they have that function. And the function is not less than itself.
But I find it interesting to note what exactly happens then. How does that pair comparison work? You might think that (a, b) < (c, d) is equivalent to (a < c) or (a == c and b < d). That is not the case. Because that would evaluate mylist.count < mylist.count, and then you'd crash with a TypeError. The actual way tuples compare with each other is by first finding a difference, and that's done by checking equality. And mylist.count == mylist.count not only doesn't crash but returns True. So the tuple comparison then goes to the next index, where it will find the -3 and -1.
So essentially you're really only doing
sorted(mylist, key=lambda x: -x, reverse=True)
and the negation and the reverse=True cancel each other out, so you get the same as
sorted(mylist, key=lambda x: x)
or just
sorted(mylist)
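The tuple comparison behaviour described above can be checked directly. This is only a small sketch using a shortened list; the name f is introduced here purely for illustration.

```python
mylist = [1, 1, 2, 2, 2, 3]
f = mylist.count

# Two bound methods of the same list compare equal, so the tuple
# comparison moves on to the second item of each pair.
same = (f == mylist.count)
pair_less = (f, -3) < (f, -1)

# Ordering the method against itself directly is not supported.
try:
    f < f
    raised = False
except TypeError:
    raised = True

# The broken key therefore behaves like a plain ascending sort.
broken = sorted(mylist, key=lambda x: (mylist.count, -x), reverse=True)
print(same, pair_less, raised, broken == sorted(mylist))
```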
Now how to get it right? One way is to call the function (and to remove the negation):
result = sorted(mylist, key=lambda x: (mylist.count(x), x), reverse=True)
Or negate both frequency and value, instead of reverse=True:
result = sorted(mylist, key=lambda x: (-mylist.count(x), -x))
Another would be to take advantage of the sort's stability and use two simpler sorts (which might even be faster than the one more elaborate sort):
result = sorted(mylist, reverse=True)
result.sort(key=mylist.count, reverse=True)
Note that here we don't have to call mylist.count ourselves, because as it is the key it will be called for us. Just like your "lambda function" does get called (just not the function inside its result). Also note that I use sorted followed by in-place sort - no point creating yet another list and incur the costs associated with that.
Though in all cases, for long lists it would be more efficient to use a collections.Counter instead of mylist.count, as the latter makes the solution take O(n^2) instead of O(n log n).
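As a rough sketch of that last point, the negated-key variant can be combined with a Counter so each frequency lookup is a dictionary access. The function name sort_by_frequency is made up for this example.

```python
from collections import Counter

def sort_by_frequency(values):
    # Count every value once up front instead of calling list.count per element.
    counts = Counter(values)
    # Negating both parts gives descending frequency, then descending value.
    return sorted(values, key=lambda x: (-counts[x], -x))

mylist = [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 5, 5, 5, 4, 4, 4, 4, 4, 4]
result = sort_by_frequency(mylist)
print(result)
# [4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 5, 5, 5, 2, 2, 2, 1, 1]
```

Negating the value only works for numbers; for other types the two-pass stable sort shown earlier would be the safer choice.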

Related

Sort a list based on the order of occurrence of that value in another list

How to sort values of A based on the order of occurrence in B where values in A may be repetitive and values in B are unique
A=[1, 2, 2, 2, 3, 4, 4, 5]
B=[8, 5, 6, 2, 10, 3, 1, 9, 4]
The expected list is C which should contain
C = [5, 2, 2, 2, 3, 1, 4, 4]
Solution:
Try using sorted:
C = sorted(A, key=B.index)
And now:
print(C)
Output:
[5, 2, 2, 2, 3, 1, 4, 4]
Documentation reference:
As mentioned in the documentation of sorted:
Return a new sorted list from the items in iterable.
Has two optional arguments which must be specified as keyword
arguments.
key specifies a function of one argument that is used to extract a
comparison key from each element in iterable (for example,
key=str.lower). The default value is None (compare the elements
directly).
reverse is a boolean value. If set to True, then the list elements are
sorted as if each comparison were reversed.
You can use the key argument of the sorted function:
A=[1, 2, 2, 2, 3, 4, 4, 5]
B=[8, 5, 6, 2, 10, 3, 1, 9, 4]
C = ((i, B.index(i)) for i in A) # <generator object <genexpr> at 0x000001CE8FFBE0A0>
output = [i[0] for i in sorted(C, key=lambda x: x[1])] #[5, 2, 2, 2, 3, 1, 4, 4]
You can sort it without actually using a sort. The Counter class (from collections) is a special dictionary that maintains counts for a set of keys. In this case, your B list contains all keys that are possible. So you can use it to initialize a Counter object with zero occurrences of each key (this will preserve the order) and then add the A list to that. Finally, get the repeated elements out of the resulting Counter object.
from collections import Counter
A=[1, 2, 2, 2, 3, 4, 4, 5]
B=[8, 5, 6, 2, 10, 3, 1, 9, 4]
C = Counter(dict.fromkeys(B,0)) # initialize order
C.update(A) # 'sort' A
C = list(C.elements()) # get sorted elements
print(C)
[5, 2, 2, 2, 3, 1, 4, 4]
You could also write it in a single line:
C = list((Counter(dict.fromkeys(B,0))+Counter(A)).elements())
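While using sorted(A,key=B.index) is simpler to write, this solution has lower complexity, O(K+N), than a sort keyed on an index lookup, which costs O(N x K) for the lookups (the key is computed once per element) plus O(N log N) for the sort itself.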
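A middle ground, sketched here under the assumption that every value of A appears in B, is to precompute each value's position in B once and sort on that. The name position is introduced only for this example.

```python
A = [1, 2, 2, 2, 3, 4, 4, 5]
B = [8, 5, 6, 2, 10, 3, 1, 9, 4]

# Build the lookup table once: O(K) instead of an O(K) scan per element of A.
position = {value: index for index, value in enumerate(B)}

# The sort itself is still O(N log N), but each key lookup is O(1) on average.
C = sorted(A, key=position.__getitem__)
print(C)  # [5, 2, 2, 2, 3, 1, 4, 4]
```

A value of A that is missing from B raises KeyError here, much like B.index raises ValueError in the simpler version.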

Using lambda function in generator expression

I am trying to count the total number of occurrences of a given val in the list using a lambda function:
def countOccurrence(givenList, val):
    result = sum(1 for i in range(len(givenList)) if lambda i: givenList(i) == val)
    return result
givenList = [3, 4, 5, 8, 0, 3, 8, 5, 0, 3, 1, 5, 2, 3, 4, 2]
print(countOccurrence(givenList, 5))
But the returned result is 16, which is nothing but the length of the list.
If you are trying to count the number of 5's in a list, you should use the builtin
my_list.count(5)
Why use a lambda?
def countOccurrence(givenList, val):
    result = sum(1 for i in range(len(givenList)) if givenList[i] == val)
    return result
givenList = [3, 4, 5, 8, 0, 3, 8, 5, 0, 3, 1, 5, 2, 3, 4, 2]
print(countOccurrence(givenList, 5))
If you are trying to count the number of 5's in a list you should use the Counter. You get the numbers of all other elements as a bonus:
from collections import Counter
cntr = Counter(givenList)
#Counter({3: 4, 5: 3, 4: 2, 8: 2, 0: 2, 2: 2, 1: 1})
cntr[5]
# 3
As is often the case in Python, the best way to do something is to make use of its built-ins as much as possible, because they've frequently been written in C. In this case that would be the count() method of sequence objects, my_list.count(5), as @Joran Beasley suggests in his answer.
Regardless, for future reference the code below shows how to use a lambda function in a generator expression like you were trying to do. Note that the lambda function is defined outside of the expression itself and that its definition needed fixing (what you had wasn't quite correct: it called givenList(i) instead of indexing givenList[i], and the lambda itself was never called).
def countOccurrence(givenList, val):
    check_list = lambda i: givenList[i] == val
    return sum(1 for i in range(len(givenList)) if check_list(i))
givenList = [3, 4, 5, 8, 0, 3, 8, 5, 0, 3, 1, 5, 2, 3, 4, 2]
print(countOccurrence(givenList, 5)) # -> 3
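As for why the original version returned 16: a lambda used directly as a condition is never called, and a function object is always truthy, so every index passed the filter. A short sketch of the difference (the variable names are chosen for this example):

```python
given_list = [3, 4, 5, 8, 0, 3, 8, 5, 0, 3, 1, 5, 2, 3, 4, 2]
val = 5
check = lambda i: given_list[i] == val

# The function object itself is truthy, whatever it would return when called.
print(bool(check))  # True

# Using the lambda as the condition counts every index.
never_called = sum(1 for i in range(len(given_list)) if check)
# Calling it evaluates the comparison for each index.
called = sum(1 for i in range(len(given_list)) if check(i))
print(never_called, called)  # 16 3
```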

Is it possible to sort a list with reduce?

I was given this as an exercise. I could of course sort a list by using sorted() or other ways from Python Standard Library, but I can't in this case. I think I'm only supposed to use reduce().
from functools import reduce
arr = [17, 2, 3, 6, 1, 3, 1, 9, 5, 3]
sorted_arr = reduce(lambda a,b : (b,a) if a > b else (a,b), arr)
The error I get:
TypeError: '>' not supported between instances of 'tuple' and 'int'
Which is expected, because my reduce function inserts a tuple into the int array, instead of 2 separate integers. And then the tuple gets compared to an int...
Is there a way to insert back 2 numbers into the list, and only run the function on every second number in the list? Or a way to swap the numbers with using reduce()?
Documentation says very little about the reduce function, so I am out of ideas right now.
https://docs.python.org/3/library/functools.html?highlight=reduce#functools.reduce
Here is one way to sort the list using reduce:
arr = [17, 2, 3, 6, 1, 3, 1, 9, 5, 3]
sorted_arr = reduce(
    lambda a, b: [x for x in a if x <= b] + [b] + [x for x in a if x > b],
    arr,
    []
)
print(sorted_arr)
#[1, 1, 2, 3, 3, 3, 5, 6, 9, 17]
At each reduce step, build a new output list which concatenates a list of all of the values less than or equal to b, [b], and a list of all of the values greater than b. Use the optional third argument to reduce to initialize the output to an empty list.
I think you're misunderstanding how reduce works here. Reduce corresponds to a left fold in some other languages (e.g. foldl in Haskell). The first argument expects a function which takes two parameters: an accumulator and an element to accumulate.
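To make the order of evaluation concrete, here is a small sketch showing that functools.reduce groups from the left, starting with the initial value. The string-building lambda is purely illustrative.

```python
from functools import reduce

# Build a string that records how the calls nest.
trace = reduce(lambda acc, x: f"({acc}+{x})", [1, 2, 3], "0")
print(trace)  # (((0+1)+2)+3)
```

The accumulator is always the first argument and the next element the second, which is why the original lambda ended up comparing a tuple with an int.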
Let's hack into it:
arr = [17, 2, 3, 6, 1, 3, 1, 9, 5, 3]
reduce(lambda xs, x: [print(xs, x), xs+[x]][1], arr, [])
Here, xs is the accumulator and x is the element to accumulate. Don't worry too much about [print(xs, x), xs+[x]][1] – it's just there to print intermediate values of xs and x. Without the printing, we could simplify the lambda to lambda xs, x: xs + [x], which just appends to the list.
The above outputs:
[] 17
[17] 2
[17, 2] 3
[17, 2, 3] 6
[17, 2, 3, 6] 1
[17, 2, 3, 6, 1] 3
[17, 2, 3, 6, 1, 3] 1
[17, 2, 3, 6, 1, 3, 1] 9
[17, 2, 3, 6, 1, 3, 1, 9] 5
[17, 2, 3, 6, 1, 3, 1, 9, 5] 3
As we can see, reduce passes an accumulated list as the first argument and a new element as the second argument. (If reduce is still boggling you, How does reduce work? contains some nice explanations.)
Our particular lambda inserts a new element into the accumulator on each "iteration". This hints at insertion sort:
def insert(xs, n):
    """
    Finds the first element in `xs` greater than `n` and returns a new list with `n` inserted before it.
    `xs` is assumed to be a sorted list.
    """
    for i, x in enumerate(xs):
        if x > n:
            return xs[:i] + [n] + xs[i:]
    return xs + [n]
sorted_arr = reduce(insert, arr, [])
print(sorted_arr)
This prints the correctly sorted array:
[1, 1, 2, 3, 3, 3, 5, 6, 9, 17]
Note that a third parameter to reduce (i.e. []) was specified, as the sort should be initialised with an empty list.
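If the standard library's bisect module is allowed in the exercise (an assumption; the question only mentions reduce), the linear scan in insert could be replaced by a binary search. A possible sketch, with insert_sorted as a made-up name:

```python
from bisect import bisect_right
from functools import reduce

def insert_sorted(xs, n):
    # bisect_right returns the index after any existing entries equal to n,
    # which is where n belongs in the already sorted list xs.
    i = bisect_right(xs, n)
    return xs[:i] + [n] + xs[i:]

arr = [17, 2, 3, 6, 1, 3, 1, 9, 5, 3]
sorted_arr = reduce(insert_sorted, arr, [])
print(sorted_arr)  # [1, 1, 2, 3, 3, 3, 5, 6, 9, 17]
```

The slicing still copies the list on every step, so this only reduces the number of comparisons, not the overall quadratic cost.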
Ninjad! But yes, it's an insertion sort.
def insert(acc, e):
    for i, x in enumerate(acc):
        if x > e:
            acc.insert(i, e)
            return acc
    acc.append(e)
    return acc
reduce(insert, [1, 2, 6, 4, 7, 3, 0, -1], [])
outputs
[-1, 0, 1, 2, 3, 4, 6, 7]
After some thinking I concluded that it is also possible to do swap-based sort, if you are allowed to use reduce more than once. Namely:
from functools import reduce
arr = [17, 2, 3, 6, 1, 3, 1, 9, 5, 3]
def func(acc,x):
    if not acc:
        return [x]
    if acc[-1]<x:
        return acc+[x]
    else:
        return acc[:-1]+[x]+acc[-1:]
def my_sort(x):
    moresorted = reduce(func,x,[])
    print(moresorted)
    if x==moresorted:
        return moresorted
    else:
        return my_sort(moresorted)
print('arr:',arr)
arr_sorted = my_sort(arr)
print('arr sorted:',arr_sorted)
Output:
arr: [17, 2, 3, 6, 1, 3, 1, 9, 5, 3]
[2, 3, 6, 1, 3, 1, 9, 5, 3, 17]
[2, 3, 1, 3, 1, 6, 5, 3, 9, 17]
[2, 1, 3, 1, 3, 5, 3, 6, 9, 17]
[1, 2, 1, 3, 3, 3, 5, 6, 9, 17]
[1, 1, 2, 3, 3, 3, 5, 6, 9, 17]
[1, 1, 2, 3, 3, 3, 5, 6, 9, 17]
arr sorted: [1, 1, 2, 3, 3, 3, 5, 6, 9, 17]
I placed print(moresorted) inside my_sort for educational purposes; you could remove it if you wish.
Now the explanation: my_sort is a recursive function, and with every run of it the list becomes more and more sorted. func, which is the function passed to reduce, appends the new element and then swaps the last two elements of the list if they are not in ascending order.
This means that in every run of my_sort a number "travels" rightward until it reaches a place where the next number is bigger.
The if not acc check is required for starting: notice that the third argument of reduce is [], meaning that during the first execution of func in each reduce the first argument is [], so evaluating acc[-1]<x would result in an error.
Let's understand this
(1) reduce is basically used to reduce a sequence to a single final value
(2) reduce() stores the intermediate result and only returns the final value
(3) We take the smallest element using reduce, append it to sorted_list and remove it from the original list
(4) reduce then works on the rest of the elements, repeating step 3
(5) while list_of_nums: runs until the list becomes empty
from functools import reduce
list_of_nums = [1,19,5,17,9]
sorted_list=[]
while list_of_nums:
    # the lambda keeps the smaller of each pair, so reduce yields the minimum
    minvalue=reduce(lambda x,y: x if x<y else y,list_of_nums)
    sorted_list.append(minvalue)
    list_of_nums.remove(minvalue)
print(sorted_list)
[1, 5, 9, 17, 19]
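The loop above empties the input list as it goes. A variant that leaves the caller's list untouched could look like the following sketch; the function name selection_sort_with_reduce is invented for this example.

```python
from functools import reduce

def selection_sort_with_reduce(nums):
    remaining = list(nums)  # work on a copy so the input is not modified
    result = []
    while remaining:
        # reduce keeps the smaller of each pair, i.e. finds the minimum
        smallest = reduce(lambda x, y: x if x < y else y, remaining)
        result.append(smallest)
        remaining.remove(smallest)
    return result

original = [1, 19, 5, 17, 9]
ordered = selection_sort_with_reduce(original)
print(ordered)   # [1, 5, 9, 17, 19]
print(original)  # [1, 19, 5, 17, 9]
```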

How to replace numbers with order in (python) list

I have a list containing integers and want to replace them so that the element which previously contained the highest number now contains a 1, the second highest number set to 2, etc etc.
Example:
[5, 6, 34, 1, 9, 3] should yield [4, 3, 1, 6, 2, 5].
I personally only care about the 9 highest numbers, but I thought there might be a simple algorithm or possibly even a python function to take care of this task?
Edit: I don't care how duplicates are handled.
A fast way to do this is to first generate a list of tuples of the element and its position:
sort_data = [(x,i) for i,x in enumerate(data)]
next we sort these elements in reverse:
sort_data = sorted(sort_data,reverse=True)
which generates (for your sample input):
>>> sort_data
[(34, 2), (9, 4), (6, 1), (5, 0), (3, 5), (1, 3)]
and next we need to fill in these elements like:
result = [0]*len(data)
for i,(_,idx) in enumerate(sort_data,1):
    result[idx] = i
Or putting it together:
def obtain_rank(data):
    sort_data = [(x,i) for i,x in enumerate(data)]
    sort_data = sorted(sort_data,reverse=True)
    result = [0]*len(data)
    for i,(_,idx) in enumerate(sort_data,1):
        result[idx] = i
    return result
This approach works in O(n log n) with n the number of elements in data.
A more compact algorithm (in the sense that no tuples are constructed for the sorting) is:
def obtain_rank(data):
    sort_data = sorted(range(len(data)),key=lambda i:data[i],reverse=True)
    result = [0]*len(data)
    for i,idx in enumerate(sort_data,1):
        result[idx] = i
    return result
Another option, you can use rankdata function from scipy, and it provides options to handle duplicates:
from scipy.stats import rankdata
lst = [5, 6, 34, 1, 9, 3]
rankdata(list(map(lambda x: -x, lst)), method='ordinal')
# array([4, 3, 1, 6, 2, 5])
Assuming you do not have any duplicates, the following list comprehension will do:
lst = [5, 6, 34, 1, 9, 3]
tmp_sorted = sorted(lst, reverse=True) # kudos to @Wondercricket
res = [tmp_sorted.index(x) + 1 for x in lst] # [4, 3, 1, 6, 2, 5]
To understand how it works, you can break it up into pieces like so:
lst = [5, 6, 34, 1, 9, 3]
# let's see what the sorted returns
print(sorted(lst, reverse=True)) # [34, 9, 6, 5, 3, 1]
# biggest to smallest. that is handy.
# Since it returns a list, i can index it. Let's try with 6
print(sorted(lst, reverse=True).index(6)) # 2
# oh, python is 0-index, let's add 1
print(sorted(lst, reverse=True).index(6) + 1) # 3
# that's more like it. now the same for all elements of original list
for x in lst:
    print(sorted(lst, reverse=True).index(x) + 1) # 4, 3, 1, 6, 2, 5
# too verbose and not a list yet..
res = [sorted(lst, reverse=True).index(x) + 1 for x in lst]
# but now we are sorting in every iteration... let's store the sorted one instead
tmp_sorted = sorted(lst, reverse=True)
res = [tmp_sorted.index(x) + 1 for x in lst]
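Each tmp_sorted.index(x) call scans the sorted list, so for long lists the lookups add up. Under the same no-duplicates assumption, one possible sketch replaces the scan with a dictionary built once; the name rank_of is introduced here for illustration.

```python
lst = [5, 6, 34, 1, 9, 3]

# Map each value to its 1-based position in the descending order.
rank_of = {value: rank
           for rank, value in enumerate(sorted(lst, reverse=True), start=1)}

res = [rank_of[x] for x in lst]
print(res)  # [4, 3, 1, 6, 2, 5]
```

With duplicates the dictionary would keep the last rank seen for a value, whereas index() returns the first, so the two versions only agree when all values are distinct.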
Using numpy.argsort:
numpy.argsort returns the indices that would sort an array.
>>> xs = [5, 6, 34, 1, 9, 3]
>>> import numpy as np
>>> np.argsort(np.argsort(-np.array(xs))) + 1
array([4, 3, 1, 6, 2, 5])
A short, log-linear solution using pure Python, and no look-up tables.
The idea: store the positions in a list of pairs, then sort the list to reorder the positions.
enum1 = lambda seq: enumerate(seq, start=1) # We want 1-based positions
def replaceWithRank(xs):
    # pos = position in the original list, rank = position in the top-down sorted list.
    vp = sorted([(value, pos) for (pos, value) in enum1(xs)], reverse=True)
    pr = sorted([(pos, rank) for (rank, (_, pos)) in enum1(vp)])
    return [rank for (_, rank) in pr]
assert replaceWithRank([5, 6, 34, 1, 9, 3]) == [4, 3, 1, 6, 2, 5]

How to test membership of sequence in python list? [duplicate]

This question already has answers here:
Best way to determine if a sequence is in another sequence?
(10 answers)
Closed 8 years ago.
I have a dictionary which consists of {str: list}.
What I want to do is find the keys whose value contains a specific sequence.
for example, the content of dictionary is like this:
DOC3187 [1, 2, 3, 6, 7]
DOC4552 [5, 2, 3, 6]
DOC4974 [1, 2, 3, 6]
DOC8365 [1, 2, 3, 5, 6, 7]
DOC3738 [1, 4, 2, 3, 6]
DOC5311 [1, 5, 2, 3, 6, 7]
and I need to find out the keys with sequence of [5,2,3], so desired return should be:
DOC4552, DOC5311
I'm using Python 3.3.2, and the dictionary has about 400 items.
For any sequence 'seq' and a longer sequence 'myseq' from your dictionary, the statement:
any(myseq[a:a+len(seq)] == seq for a in range(len(myseq)))
will evaluate to True if seq occurs as a contiguous run inside myseq, and False otherwise.
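Applied to the dictionary from the question, that check might be used as in the sketch below. The names contains_run, docs and matches are made up for this example.

```python
def contains_run(myseq, seq):
    # Compare every window of len(seq) items in myseq against seq.
    n = len(seq)
    return any(myseq[a:a + n] == seq for a in range(len(myseq) - n + 1))

docs = {
    'DOC3187': [1, 2, 3, 6, 7],
    'DOC4552': [5, 2, 3, 6],
    'DOC4974': [1, 2, 3, 6],
    'DOC8365': [1, 2, 3, 5, 6, 7],
    'DOC3738': [1, 4, 2, 3, 6],
    'DOC5311': [1, 5, 2, 3, 6, 7],
}

matches = sorted(key for key, value in docs.items() if contains_run(value, [5, 2, 3]))
print(matches)  # ['DOC4552', 'DOC5311']
```

Because whole list items are compared rather than their string forms, a value such as 15 is not mistaken for a 5.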
NOTE: I realized that the string-matching approach below will actually fail if your list contains [15, 2, 36], whose string form does contain '5, 2, 3', so it is just for special cases.
Since you have a dictionary, maybe list comprehension on the keys and string matching? It is actually the same speed as walking through the elements, according to timeit...
s_list = [5,2,3] # sequence to search for
# Setting up your dictionary
MyD = {'DOC3187' : [1, 2, 3, 6, 7],
'DOC4552' : [5, 2, 3, 6],
'DOC4974' : [1, 2, 3, 6],
'DOC8365' : [1, 2, 3, 5, 6, 7],
'DOC3738' : [1, 4, 2, 3, 6],
'DOC5311' : [1, 5, 2, 3, 6, 7]}
query = str(s_list)[1:-1] # make a string of '5, 2, 3'
Matches = [ k for k in MyD if query in str(MyD[k]) ]
Result:
['DOC5311', 'DOC4552']
You can use this function:
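from functools import reduce  # reduce is not a builtin in Python 3
def find_key_in_dict(d, t):
    """ d is dict for searching, t is target list.
    -> return matching key list.
    """
    b_str = reduce(lambda x, y: str(x) + str(y), t)
    # list(...) is needed in Python 3, where map returns a lazy iterator
    return list(map(lambda x: x[0], filter(lambda i: b_str in reduce(lambda x, y: str(x) + str(y), i[1]), d.items())))
To search the values, use the reduce() function to turn each dict value (an integer list) and the target list (also an integer list) into strings, then use 'in' to check whether the dict value contains the target. Like any string-matching approach, this can give false positives when the lists contain multi-digit numbers.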
