Groupby for nested lists - Python

I have a list [[1,20],[1,30],[2,30],[2,50],[3,60],[1,20]]. If the first element of a nested list is the same as that of the previous one, I should remove it (this is not the same as removing all duplicates from the list). In this case, I should get [[1,20],[2,30],[3,60],[1,20]].
I am using itemgetter and groupby and have written this:
[x[0] for x in groupby(testlist, itemgetter(0))]
The above code outputs only the first elements: [1, 2, 3, 1]. I want the result in the same nested list format, [[1,20],[2,30],[3,60],[1,20]]. How can I do this in place on the same list, testlist? Is there a better way to achieve this than simply iterating over the whole list again?

You're only showing the keys. itertools.groupby yields (key, group) pairs, where each group is an iterator over the consecutive items that share that key. You need the first item of each group:
from itertools import groupby
from operator import itemgetter
print([next(g) for k, g in groupby(testlist, itemgetter(0))])
# [[1, 20], [2, 30], [3, 60], [1, 20]]
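The question also asks for an in-place update. Below is a minimal sketch, assuming "in place" means that the existing list object (and any other reference to it) ends up with the new contents. It assigns to the full slice testlist[:]. groupby still makes a single O(n) pass, and a temporary list is built before the assignment, so only the list object is reused. The name alias is hypothetical and is there only to show that other references see the change.

```python
from itertools import groupby
from operator import itemgetter

testlist = [[1, 20], [1, 30], [2, 30], [2, 50], [3, 60], [1, 20]]
alias = testlist  # hypothetical second reference to the same list object

# The right-hand side is fully built first; slice assignment then replaces
# the contents of the existing list object instead of rebinding the name.
testlist[:] = [next(g) for _, g in groupby(testlist, itemgetter(0))]

print(testlist)           # [[1, 20], [2, 30], [3, 60], [1, 20]]
print(alias is testlist)  # True
```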

Related

Python: Sort a list of lists whose inner lists have different lengths?

I found a question about how to sort a list of lists, but my inner lists have different lengths and I want to sort them based on the last item of each inner list.
For example, I have a list:
a = [[1, 2, 3],
     [2, 4]]
And I want to sort them based on the last item in my inner list, i.e. "3" and "4".
So, is there a good way to do this?
Thanks for the reply.
Take a look at Python's built-in sorted function.
>>> sorted(a, key=lambda x: x[-1])
[[1, 2, 3], [2, 4]]
>>> sorted(a, key=lambda x: x[-1], reverse=True)  # reverse version
[[2, 4], [1, 2, 3]]
Look into the key parameter of sort, and then either use len(...) to compute the index of the last item, or look into how negative indexes work.
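Both hints lead to the same key function. The sketch below shows the negative-index and len-based keys next to operator.itemgetter(-1), which is equivalent. It also shows list.sort, which sorts the original list in place instead of returning a new one. It assumes every inner list is non-empty, because an empty one would raise IndexError.

```python
from operator import itemgetter

a = [[1, 2, 3], [2, 4]]

# Three equivalent keys for "the last element of the inner list".
by_negative_index = sorted(a, key=lambda x: x[-1])
by_len = sorted(a, key=lambda x: x[len(x) - 1])
by_itemgetter = sorted(a, key=itemgetter(-1))

# list.sort() with the same key reorders `a` itself and returns None.
a.sort(key=itemgetter(-1), reverse=True)
print(a)  # [[2, 4], [1, 2, 3]]
```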
Good luck!

Compare elements inside list of lists in Python

I'm trying to create a new list of lists by removing the rows with a duplicated value from an existing list of lists.
fir = [['a35',1],['a35',2],['3r',6],['3r',8],[5,9]]
sec = []
for row in fir:
    if sec is None:
        sec.append(row)
    elif row[0] not in sec:
        sec.append(row)
print(sec)
Expected output:
[['a35', 1], ['3r', 6], [5, 9]]
Actual output:
[['a35', 1], ['a35', 2], ['3r', 6], ['3r', 8], [5, 9]]
I want to create a list of lists in which the values of row[0] are unique and not duplicated (e.g. the row with 'a35' should be included only once).
How can I achieve this?
You can simply record the unique values (the first element of each inner list) in a separate list. Your code goes wrong because it compares that first element against whole rows (comparing 'a35' to ['a35',1]):
fir = [['a35',1],['a35',2],['3r',6],['3r',8],[5,9]]
sec = []
index = []  # first elements already kept
for f in fir:
    if f[0] not in index:
        index.append(f[0])
        sec.append(f)
print(sec)
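A small variation on the same idea uses a set for the keys already seen. Membership tests on a set take O(1) time on average, while a list has to be scanned each time. This assumes the first elements are hashable, which holds for the strings and ints here.

```python
fir = [['a35', 1], ['a35', 2], ['3r', 6], ['3r', 8], [5, 9]]

sec = []
seen = set()  # first elements already kept
for row in fir:
    if row[0] not in seen:
        seen.add(row[0])
        sec.append(row)

print(sec)  # [['a35', 1], ['3r', 6], [5, 9]]
```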
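Your current code fails because after the first iteration sec looks like this: [['a35',1]]. On the second iteration row is ['a35',2], and row[0] not in sec checks whether the string 'a35' equals any whole row in sec. It never does, so ['a35',2] gets appended as well. (Also note that sec is None is never true, since sec starts out as [] rather than None.)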
You could use groupby to group the inner lists based on their first element. groupby returns an iterable of (key, it) tuples, where key is the value returned by the key function (the second parameter) and it is an iterator over the elements within the group:
>>> from itertools import groupby
>>> fir = [['a35',1],['a35',2],['3r',6],['3r',8],[5,9]]
>>> [next(g) for _, g in groupby(fir, lambda x: x[0])]
[['a35', 1], ['3r', 6], [5, 9]]
Note that the above assumes that lists with the same first element are next to each other in fir. If that's not the case, you could sort fir before passing it to groupby, but that only works if the first elements can be compared with one another. With your data that's not the case, since it mixes strings and ints, which can't be compared in Python 3. You could collect the items into an OrderedDict instead:
from collections import OrderedDict
fir = [['a35',1],['a35',2],['3r',6],['3r',8],[5,9],['a35',7]]
d = OrderedDict()
for x in fir:
    d.setdefault(*x)  # d.setdefault(key, value): only the first value per key is stored
print([list(x) for x in d.items()])
Output:
[['a35', 1], ['3r', 6], [5, 9]]
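You can also use a list comprehension:
sec = [row for idx, row in enumerate(fir) if row[0] not in [r[0] for r in fir[:idx]]]
This keeps each item of fir whose first element does not appear among the first elements of the items before it. Using enumerate rather than fir.index(row) keeps it correct when the same inner list occurs more than once, and it runs on Python 3 (no xrange). Note that it is quadratic in the length of fir.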
As your inner lists have only two items and you don't want duplicate keys, a dictionary would have been the perfect data structure for your case.
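A minimal sketch of the dictionary approach follows. Plain dicts preserve insertion order (guaranteed since Python 3.7), so OrderedDict isn't needed on modern Python. dict.setdefault stores only the first value seen for each key. The trailing ['a35', 7] row is there to show that a later, non-adjacent duplicate is also dropped.

```python
fir = [['a35', 1], ['a35', 2], ['3r', 6], ['3r', 8], [5, 9], ['a35', 7]]

d = {}
for key, value in fir:
    d.setdefault(key, value)  # ignored if key is already present

print(d)                               # {'a35': 1, '3r': 6, 5: 9}
print([[k, v] for k, v in d.items()])  # [['a35', 1], ['3r', 6], [5, 9]]
```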
I think that when you loop over fir, you should keep a separate list recording which keys you have already put in sec.

PYTHON: Create sublists of same values from nested list into one list

I have a nested list with lists that have two values, a word and a number (sorted by first value - the word):
data=[["apple",2],["cake",5],["cake",8],["chocolate",3],["chocolate",9],["chocolate",10],["grapes",6]]
How can I make it so that it groups items with the same word together as efficiently as possible?
So to make the list look like so:
data=[ [["apple",2]], [["cake",5],["cake",8]], [["chocolate",3],["chocolate",9],["chocolate",10]],[["grapes",6]] ]
The "apple" and the "grapes" are a list of their own, as they only appear once in the original list.
How could this be done? Thanks :)
That's what itertools.groupby is for:
>>> from operator import itemgetter
>>> from itertools import groupby
>>> data=[["apple",2],["cake",5],["cake",8],["chocolate",3],["chocolate",9],["chocolate",10],["grapes",6]]
>>> [list(g) for _,g in groupby(sorted(data,key=itemgetter(0)),itemgetter(0))]
[[['apple', 2]], [['cake', 5], ['cake', 8]], [['chocolate', 3], ['chocolate', 9], ['chocolate', 10]], [['grapes', 6]]]
>>>
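You can use operator.itemgetter(0) as the key for both sorted and groupby. Since your data is already sorted by the word, the sorted call only matters if that ordering isn't guaranteed.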
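If the input might not be sorted, one alternative is to bucket the rows in a single pass with collections.defaultdict(list) and skip sorting entirely. This is a swapped-in technique, not groupby. Groups come out in first-seen order, because defaultdict is a dict subclass and keeps insertion order on Python 3.7+. For this already-sorted data, that order is alphabetical.

```python
from collections import defaultdict

data = [["apple", 2], ["cake", 5], ["cake", 8], ["chocolate", 3],
        ["chocolate", 9], ["chocolate", 10], ["grapes", 6]]

groups = defaultdict(list)
for row in data:
    groups[row[0]].append(row)  # bucket each row under its word

result = list(groups.values())
print(result)
```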

How to sort like values in Python

I was wondering how to sort like values in a list, and then break like values into a sub-list.
For example: I would want a function that probably does something like
def sort_by_like_values(list):
    # python magic
>>>list=[2,2,3,4,4,10]
>>>[[2,2],[3],[4,4],[10]]
OR
>>>[2,2],[3],[4,4],[10]
I read up on the sorted API, and it works well for sorting items within a list, but it doesn't break lists up into sub-lists. What module would help me out here?
Use groupby from the itertools module.
from itertools import groupby
L = [2, 2, 3, 4, 4, 10]
L.sort()
for key, iterator in groupby(L):
    print(key, list(iterator))
Result:
2 [2, 2]
3 [3]
4 [4, 4]
10 [10]
A couple of things to be aware of: groupby only merges consecutive equal items, so the data needs to be sorted by the same key you wish to group by; otherwise equal items that aren't adjacent end up in separate groups. Also, each group's iterator needs to be consumed before you advance to the next group, so store list(iterator) in another list if you need it later. A one-liner giving you the result you want:
>>> [list(it) for key, it in groupby(sorted(L))]
[[2, 2], [3], [4, 4], [10]]
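To illustrate the point about sorting, here is groupby on a small hypothetical unsorted list. It is compared with the same call on the sorted version of that list.

```python
from itertools import groupby

L = [2, 4, 2, 3, 4]  # hypothetical unsorted input

# Without sorting, only *consecutive* equal items are merged.
unsorted_groups = [list(it) for _, it in groupby(L)]
print(unsorted_groups)  # [[2], [4], [2], [3], [4]]

sorted_groups = [list(it) for _, it in groupby(sorted(L))]
print(sorted_groups)    # [[2, 2], [3], [4, 4]]
```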
Check the itertools module, it has the useful groupby function:
import itertools as i
for k, g in i.groupby(sorted([2,2,3,4,4,10])):
    print(list(g))
[2, 2]
[3]
[4, 4]
[10]
You should be able to modify this to get the values in a list.
Everyone else has suggested itertools.groupby (which would be my first choice), but it's also possible to use collections.Counter to obtain each key and its frequency, sort by key, and then expand each key back out freq times.
from itertools import repeat
from collections import Counter
grouped = [list(repeat(key, freq)) for key, freq in sorted(Counter(L).items())]
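The same Counter idea can also be written with list multiplication instead of itertools.repeat. This is a small stylistic variant, sketched below. [key] * freq builds a list holding freq references to the same key object, which is fine for plain values such as ints.

```python
from collections import Counter

L = [2, 2, 3, 4, 4, 10]

# Counter maps each value to its frequency; [key] * freq rebuilds each run.
grouped = [[key] * freq for key, freq in sorted(Counter(L).items())]
print(grouped)  # [[2, 2], [3], [4, 4], [10]]
```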
itertools.groupby() with a list comprehension works fine.
In [20]: a = [1, 1, 2, 3, 3, 4, 5, 5, 5, 6]
In [21]: [ list(subgroup) for key, subgroup in itertools.groupby(sorted(a)) ]
Out[21]: [[1, 1], [2], [3, 3], [4], [5, 5, 5], [6]]
Note that groupby() returns an iterator of (key, group) pairs, and each group is itself an iterator; you have to consume these group iterators in order. As per the docs:
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list.
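A short sketch of what that warning means in practice follows. The first comprehension keeps the group iterators and reads them only after groupby has moved on, so they no longer yield their original items (in recent CPython versions they typically come back empty). The second calls list() on each group while it is still the current one.

```python
from itertools import groupby

L = [1, 1, 2, 3, 3]

# Pitfall: the group iterators are read only after groupby has advanced
# past them, so their items are no longer available.
stale = [g for _, g in groupby(L)]
print([list(g) for g in stale])

# Correct: materialise each group while it is still current.
fresh = [list(g) for _, g in groupby(L)]
print(fresh)  # [[1, 1], [2], [3, 3]]
```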
If you do not wish to use itertools and can wrap your head around list comprehensions, this should also do the trick:
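def group(a):
    a = sorted(a)
    # boundaries: 0, every index where a new value starts, and len(a)
    d = [0] + [x+1 for x in range(len(a)-1) if a[x]!=a[x+1]] + [len(a)]
    # slice the sorted list between consecutive boundaries
    return [a[(d[x]):(d[x+1])] for x in range(len(d)-1)]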
where a is your list.
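Another itertools-free option is a plain loop over the sorted values that starts a new sub-list whenever the value changes. The function name group_runs below is hypothetical. Unlike the index-based version, this sketch returns [] for an empty input rather than [[]].

```python
def group_runs(a):
    """Hypothetical helper: group equal values of `a` into sub-lists."""
    result = []
    for x in sorted(a):
        if result and result[-1][0] == x:
            result[-1].append(x)  # same value as the current run: extend it
        else:
            result.append([x])    # new value: start a new run
    return result

print(group_runs([2, 2, 3, 4, 4, 10]))  # [[2, 2], [3], [4, 4], [10]]
print(group_runs([]))                   # []
```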

Sorting list of lists by a third list of specified non-sorted order

I have a list:
[['18411971', 'kinase_2', 36], ['75910712', 'unnamed...', 160], ...
It's about 60 entries long, and each entry is a list with three values.
I want to sort this bigger list by the first value in an order specified by another list which has them in the desired order.
The usual idiom is to sort using a key:
>>> a = [[1,2],[2,10,10],[3,4,'fred']]
>>> b = [2,1,3]
>>> sorted(a,key=lambda x: b.index(x[0]))
[[2, 10, 10], [1, 2], [3, 4, 'fred']]
This can have performance issues, though: b.index scans b linearly for every element being sorted. If the keys are hashable, this will probably be faster for long lists:
>>> order_dict = dict(zip(b, range(len(b))))
>>> sorted(a,key=lambda x: order_dict[x[0]])
[[2, 10, 10], [1, 2], [3, 4, 'fred']]
How about:
inputlist = [['18411971', 'kinase_2', 36], ['75910712', 'unnamed...', 160], ... # obviously not valid syntax
auxinput = aux = ['75910712', '18411971', ...] # ditto
keyed = { sublist[0]:sublist for sublist in inputlist }
result = [keyed[item] for item in auxinput]
There is no need to use sorting here. For large lists this would be faster, because it's O(n) rather than O(n * log n).
In case the keys aren't unique, you can build the keyed representation with a dict that maps each key to a list of sublists (e.g. defaultdict(list), as per Niklas B's suggestion).
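A minimal sketch of that defaultdict(list) variant follows. The sample rows and keys are made up for illustration, with the key 'A' repeated. Rows that share a key stay in their input order. A key that appears only in auxinput contributes nothing, and a key missing from auxinput is dropped.

```python
from collections import defaultdict

# Hypothetical sample data; 'A' occurs twice as a first element.
inputlist = [['A', 'first', 1], ['B', 'second', 2], ['A', 'third', 3]]
auxinput = ['B', 'A']

keyed = defaultdict(list)
for sublist in inputlist:
    keyed[sublist[0]].append(sublist)  # keep every row for a repeated key

# Still O(n): one pass to build `keyed`, one pass over auxinput.
result = [row for item in auxinput for row in keyed[item]]
print(result)  # [['B', 'second', 2], ['A', 'first', 1], ['A', 'third', 3]]
```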
So, if I understand you right, you have your sample input list:
a = [['18411971', 'kinase_2', 36], ['75910712', 'unnamed...', 160], ...
and you want to sort this using an extra list that mentions the order in which the first elements of the sub-lists are to occur in the output:
aux = ['75910712', '18411971', ...]
If that's right, I think the result can be achieved using something like:
sorted(a, key = lambda x: aux.index(x[0]))
