How to sort like values in Python - python

I was wondering how to sort like values in a list, and then break like values into a sub-list.
For example: I would want a function that probably does something like
def sort_by_like_values(list):
    # python magic
>>>list=[2,2,3,4,4,10]
>>>[[2,2],[3],[4,4],[10]]
OR
>>>[2,2],[3],[4,4],[10]
I read up on the sorted api and it works well for sorting things within their own list, but doesn't break lists up into sub-lists. What module would help me out here?

Use groupby from the itertools module.
from itertools import groupby
L = [2, 2, 3, 4, 4, 10]
L.sort()
for key, iterator in groupby(L):
    print(key, list(iterator))
Result:
2 [2, 2]
3 [3]
4 [4, 4]
10 [10]
A couple of things to be aware of: groupby only groups consecutive items with equal keys, so the data must already be sorted by the same key you wish to group by; otherwise equal values that are not adjacent end up in separate groups. Also, each group iterator must be consumed before you advance to the next group, so store list(iterator) somewhere if you need it later. One-liner giving you the result you want:
>>> [list(it) for key, it in groupby(sorted(L))]
[[2, 2], [3], [4, 4], [10]]
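To see why the sort matters, here is a small sketch comparing groupby on unsorted and sorted input. The function name group_like_values and the sample data are hypothetical, chosen only for illustration.

```python
from itertools import groupby

def group_like_values(values):
    """Sort the values, then collect each run of equal items into its own list."""
    return [list(group) for _, group in groupby(sorted(values))]

unsorted_values = [4, 2, 4, 3, 2, 10]

# Without sorting, groupby only merges neighbours that are equal,
# so every item in this sample ends up in its own group.
runs_without_sort = [list(group) for _, group in groupby(unsorted_values)]

grouped = group_like_values(unsorted_values)

print(runs_without_sort)  # [[4], [2], [4], [3], [2], [10]]
print(grouped)            # [[2, 2], [3], [4, 4], [10]]
```

Sorting inside the function keeps the caller's list untouched, since sorted returns a new list.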

Check the itertools module, it has the useful groupby function:
import itertools as i
for k, g in i.groupby(sorted([2, 2, 3, 4, 4, 10])):
    print(list(g))
Result:
[2, 2]
[3]
[4, 4]
[10]
You should be able to modify this to get the values in a list.

Everyone else has suggested itertools.groupby (which would be my first choice), but it's also possible with collections.Counter: obtain each key and its frequency, sort by the key, then expand each key back out freq times.
from itertools import repeat
from collections import Counter
grouped = [list(repeat(key, freq)) for key, freq in sorted(Counter(L).items())]

itertools.groupby() with a list comprehension works fine.
In [20]: a = [1, 1, 2, 3, 3, 4, 5, 5, 5, 6]
In [21]: [ list(subgroup) for key, subgroup in itertools.groupby(sorted(a)) ]
Out[21]: [[1, 1], [2], [3, 3], [4], [5, 5, 5], [6]]
Note that groupby() returns an iterator of (key, group) pairs, where each group is itself an iterator, and you have to consume these groups in order. As per the docs:
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list:
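A minimal sketch of that behaviour, using only the standard library. The variable names are illustrative, and exactly which items go missing from the late reads can differ between Python versions, so no output is shown for them.

```python
from itertools import groupby

data = [1, 1, 2, 3, 3]

# Keeping the group iterators and reading them later loses data:
# advancing groupby() moves past each earlier group.
lazy_groups = [group for _, group in groupby(data)]
late_read = [list(group) for group in lazy_groups]

# Converting each group to a list while it is still current keeps the data.
eager_groups = [list(group) for _, group in groupby(data)]

print(late_read)     # items are missing here
print(eager_groups)  # [[1, 1], [2], [3, 3]]
```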

If you do not wish to use itertools and can wrap your head around list comprehensions, this should also do the trick:
def group(a):
    a = sorted(a)
    # d holds the start index of each run of equal values, plus len(a) as the end marker
    d = [0] + [x+1 for x in range(len(a)-1) if a[x]!=a[x+1]] + [len(a)]
    return [a[(d[x]):(d[x+1])] for x in range(len(d)-1)]
where a is your list.

Related

How to print non repeating elements with original list

Given a list of integers nums, return a list of all the elements, but no repeated number should appear more than twice.
Example:
input: nums = [1,1,2,3,3,4,4,4,5]
output: [1,1,2,3,3,4,4,5]
A more flexible implementation using itertools:
from itertools import islice, groupby, chain
nums = [1,1,2,3,3,4,4,4,5]
output = (islice(g, 2) for _, g in groupby(nums))
output = list(chain.from_iterable(output))
print(output) # [1, 1, 2, 3, 3, 4, 4, 5]
You can replace 2 in islice(g, 2) to tune the max repeats you want.
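As a sketch, the same idea can be wrapped in a function with the cap as a parameter. The name cap_repeats and its max_repeats argument are hypothetical. Like the snippet above, it assumes equal values sit next to each other; sort the input first if they do not.

```python
from itertools import chain, groupby, islice

def cap_repeats(nums, max_repeats=2):
    """Keep at most max_repeats items from each run of equal neighbours."""
    # Each islice is fully consumed by chain before groupby advances,
    # so the shared underlying iterator is read in the right order.
    capped_runs = (islice(group, max_repeats) for _, group in groupby(nums))
    return list(chain.from_iterable(capped_runs))

print(cap_repeats([1, 1, 2, 3, 3, 4, 4, 4, 5]))  # [1, 1, 2, 3, 3, 4, 4, 5]
print(cap_repeats([4, 4, 4, 4], max_repeats=1))  # [4]
```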
The easiest and, I guess, most straightforward way to get a collection of unique values is with a set:
list(set(nums)) -> [1, 2, 3, 4, 5]
The downside of this approach is that sets are unordered, so we cannot depend on the order of the list after the conversion.
If order is important in your case you can do this:
list(dict.fromkeys(nums))
[1, 2, 3, 4, 5]
dicts preserve insertion order since Python 3.7 (in CPython 3.6 this was an implementation detail), and their keys are unique. So with this small trick we get a list of the unique keys of a dictionary, but still maintain the original order!

Groupby for nested lists

I have a list [[1,20],[1,30],[2,30],[2,50],[3,60], [1,20]]. If the first element of a nested list is the same as that of the previous one, I should remove that nested list (this is not the same as removing all duplicates from the list). In this case, I should get [[1,20],[2,30],[3,60],[1,20]].
I am using itemgetter and groupby and have written this:
[x[0] for x in groupby(testlist, itemgetter(0))]
The above code outputs only the first elements, [1, 2, 3, 1]. I want the result in the same nested list format, [[1,20],[2,30],[3,60],[1,20]]. How can I do this in-place on the same list, 'testlist'? Is there a better way to achieve this than simply iterating over the whole list again?
You're only showing the keys. itertools.groupby yields (key, group) pairs. You need the first item in each group:
print([next(g) for k, g in groupby(testlist, itemgetter(0))])
# [[1, 20], [2, 30], [3, 60], [1, 20]]
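The question also asks for an in-place update. One possible sketch, assuming slice assignment is acceptable for that purpose, writes the result back into the same list object; same_object is a hypothetical second reference used only to show that the original list changed.

```python
from itertools import groupby
from operator import itemgetter

testlist = [[1, 20], [1, 30], [2, 30], [2, 50], [3, 60], [1, 20]]
same_object = testlist  # second reference, to show the list is updated in place

# The right-hand side is built completely first; slice assignment then
# replaces the contents of the existing list instead of rebinding the name.
testlist[:] = [next(group) for _, group in groupby(testlist, itemgetter(0))]

print(testlist)                 # [[1, 20], [2, 30], [3, 60], [1, 20]]
print(same_object is testlist)  # True
```

This still walks the list once; groupby does not avoid the iteration, it only expresses it more compactly.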

Python: Sort a list of lists where the inner lists have different lengths?

I found the question about how to sort a list of lists, but my problem is that my inner lists have different lengths and I want to sort them based on the last item of each inner list.
For example, I have a list:
[ [1, 2, 3],
[2, 4] ]
And I want to sort them based on the last item in my inner list, i.e. "3" and "4".
So, is there a good way to do this?
Thanks for the reply.
Take a look at Python's built-in sorted function.
>>> sorted(a, key=lambda x: x[-1])
[[1, 2, 3], [2, 4]]
>>> sorted(a, key=lambda x: x[-1], reverse=True) # reversed version
[[2, 4], [1, 2, 3]]
Look into the key parameter of sort, and then look into either using len(...) to calculate the index of the last item, or how negative indexes work.
Good luck!
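Both hints can be sketched side by side. The sample data in rows is made up for illustration, and the sketch assumes no inner list is empty (an empty one would raise IndexError in either key function).

```python
rows = [[5, 1, 9], [2, 4], [7]]

# Negative index: x[-1] is the last item whatever the length of x.
by_last = sorted(rows, key=lambda x: x[-1])

# Equivalent key that computes the last position with len().
by_last_len = sorted(rows, key=lambda x: x[len(x) - 1])

print(by_last)      # [[2, 4], [7], [5, 1, 9]]
print(by_last_len)  # [[2, 4], [7], [5, 1, 9]]

# list.sort() takes the same key argument and sorts the list in place.
rows.sort(key=lambda x: x[-1], reverse=True)
print(rows)         # [[5, 1, 9], [7], [2, 4]]
```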

How to find 2 items in a list that are the same in Python

I have a list populated with ~100 names. The names in this list either occur once or twice and I would like to go through the list to find names that occur twice and names that only occur once. I will also need the position of the reoccurring names in the list and the positions of the names that only appear once.
I'm not sure how I would go about doing this because all the methods I can think of are inefficient as they would go through the whole list even if they have already found a match. Other methods that I can think of would return two duplicate positions. The names that occur twice will not necessarily be adjacent to each other.
For example, if this was the list:
mylist = [ 1, 2, 3, 1, 4, 4, 5, 6]
I would need something that outputs (something like):
[[0,3],[1],[2],[4,5],[6],[7]]
With those numbers being the positions of the duplicate names and the position of the names that occur once.
I am by no means an expert so any help would be appreciated.
You can use enumerate to get pairs containing the index of each element and the element itself, then loop over them and store the items as keys and their indices as values, using a collections.OrderedDict (to preserve the order) and the dict.setdefault method:
>>> from collections import OrderedDict
>>> d=OrderedDict()
>>> for i,j in enumerate(mylist):
...     d.setdefault(j,[]).append(i)
...
>>> list(d.values())
[[0, 3], [1], [2], [4, 5], [6], [7]]
I would use a dictionary:
mylist = [1,2,3,1,4,4,5,6]
dic = {}
for i in range(0,len(mylist)):
    if mylist[i] in dic:
        dic[mylist[i]].append(i)
    else:
        dic[mylist[i]] = [i]
# plain dicts keep insertion order in Python 3.7+
print(list(dic.values()))
# prints [[0, 3], [1], [2], [4, 5], [6], [7]]
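The question also wants the names that occur twice separated from those that occur once. A possible sketch builds on the same position mapping, here with collections.defaultdict; the names positions, twice, and once are hypothetical.

```python
from collections import defaultdict

mylist = [1, 2, 3, 1, 4, 4, 5, 6]

# Map each name to the list of positions where it appears (single pass).
positions = defaultdict(list)
for index, name in enumerate(mylist):
    positions[name].append(index)

# Split the mapping by how many positions each name has.
twice = {name: idx for name, idx in positions.items() if len(idx) > 1}
once = {name: idx for name, idx in positions.items() if len(idx) == 1}

print(twice)  # {1: [0, 3], 4: [4, 5]}
print(once)   # {2: [1], 3: [2], 5: [6], 6: [7]}
```

The list is traversed only once, which addresses the efficiency concern in the question.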

Append feature frequency to existing list

I am looking for a fairly efficient way to append the frequency of a feature in a list to each item in that list.
For example, given this list:
[['syme', 4, 2], ['said', 4, 2], ['the', 3, 5]]
I would like to append to each item the frequency with which its last two values occur in the list. For the list above, this would look something like this:
[['syme', 4, 2, 2], ['said', 4, 2, 2], ['the', 3, 5, 1]]
Here the appended number represents how often the pair of numbers occurs as the last two values of the items (for example, [4, 2] appears twice and [3, 5] appears once, so the first two lists get a 2 appended and the third list gets a 1).
The actual list may have several hundred thousand items, so both efficiency and readable code are valued here, and I would like to maintain the current order of the list.
Thanks in advance!
Probably the most performant method is to use collections.Counter to get the counts based on pairs:
from collections import Counter
counts = Counter(tuple(item[1:]) for item in lst)
then update the list accordingly:
for item in lst:
    item.append(counts[tuple(item[1:])])
If the order of the two items doesn't matter, wrap item[1:] with sorted(...) when creating counts and updating lst.
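A sketch of that order-insensitive variant follows. The helper pair_key is hypothetical, and the sample data is altered so that one pair appears reversed; the slice item[1:3] is used so the key ignores the count once it has been appended.

```python
from collections import Counter

lst = [['syme', 4, 2], ['said', 2, 4], ['the', 3, 5]]

def pair_key(item):
    """Order-insensitive key built from the two numbers after the word."""
    return tuple(sorted(item[1:3]))

# Count the pairs first, then append each item's count in a second pass.
counts = Counter(pair_key(item) for item in lst)
for item in lst:
    item.append(counts[pair_key(item)])

print(lst)  # [['syme', 4, 2, 2], ['said', 2, 4, 2], ['the', 3, 5, 1]]
```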
You can use the collections.Counter class:
from collections import Counter
my_list = [['syme', 4, 2], ['said', 4, 2], ['the', 3, 5]]
counts = Counter([(x[1],x[2],) for x in my_list])
for sub_list in my_list:
    sub_list.append(counts[(sub_list[1], sub_list[2])])
If order doesn't matter:
from collections import Counter
a_list = [['syme', 4, 2], ['said', 4, 2], ['the', 3, 5]]
# frozenset takes a single iterable; it makes (4, 2) and (2, 4) count as the same pair
counts = Counter(frozenset((x[1], x[2])) for x in a_list)
for l in a_list:
    l.append(counts[frozenset((l[1], l[2]))])
