Append feature frequency to existing list - python

I am looking for a fairly efficient way to append, to each item in a list, the frequency of a feature in that list.
For example, given this list:
[['syme', 4, 2], ['said', 4, 2], ['the', 3, 5]]
I would like to append to it the frequency with which the second two items occur in the list. In the list above, this would look something like this:
[['syme', 4, 2, 2], ['said', 4, 2, 2], ['the', 3, 5, 1]]
Here the last number in each sublist represents how often its second and third items occur together in those positions across the list. (For example, [4, 2] appears twice and [3, 5] appears once, so the first two lists get a 2 appended and the third list gets a 1.)
The actual list may have several hundred thousand items, so both efficiency and readable code are valued here, and I would like to maintain the current order of the list.
Thanks in advance!

Probably the most performant method is to use collections.Counter to get the counts based on pairs:
from collections import Counter
counts = Counter(tuple(item[1:]) for item in lst)
then update the list accordingly:
for item in lst:
    item.append(counts[tuple(item[1:])])
If the order of the two items doesn't matter, wrap item[1:] with sorted(...) before converting it to a tuple, both when creating counts and when updating lst.
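The order-insensitive variant mentioned above can be sketched as follows. This is only a minimal sketch: the helper name `append_pair_frequency` and its `ordered` flag are hypothetical, and it assumes each inner list holds a label followed by the values to count.

```python
from collections import Counter


def append_pair_frequency(rows, ordered=True):
    """Append to each row how often its trailing values occur across rows.

    Hypothetical helper for illustration; not part of the original answer.
    """
    def key(row):
        values = row[1:]
        # Sorting makes [4, 2] and [2, 4] count as the same pair.
        return tuple(values) if ordered else tuple(sorted(values))

    # First pass: count every key once. Second pass: append the count.
    counts = Counter(key(row) for row in rows)
    for row in rows:
        row.append(counts[key(row)])
    return rows


lst = [['syme', 4, 2], ['said', 2, 4], ['the', 3, 5]]
append_pair_frequency(lst, ordered=False)
print(lst)  # [['syme', 4, 2, 2], ['said', 2, 4, 2], ['the', 3, 5, 1]]
```

Both passes are linear in the length of the list, and the list is updated in place, so the original order is kept.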

You can use the collections.Counter class:
from collections import Counter
my_list = [['syme', 4, 2], ['said', 4, 2], ['the', 3, 5]]
counts = Counter([(x[1],x[2],) for x in my_list])
for sub_list in my_list:
    sub_list.append(counts[(sub_list[1], sub_list[2])])

If order doesn't matter, use a frozenset of the pair as the key:
from collections import Counter
a_list = [['syme', 4, 2], ['said', 4, 2], ['the', 3, 5]]
# frozenset takes a single iterable, so the pair is passed as a tuple
counts = Counter(frozenset((row[1], row[2])) for row in a_list)
for row in a_list:
    row.append(counts[frozenset((row[1], row[2]))])

Related

How to print non repeating elements with original list

Given a list of integers nums, return a list of all the elements, but no repeated number should appear more than twice.
Example:
input: nums = [1,1,2,3,3,4,4,4,5]
output: [1,1,2,3,3,4,4,5]
A more flexible implementation using itertools:
from itertools import islice, groupby, chain
nums = [1,1,2,3,3,4,4,4,5]
output = (islice(g, 2) for _, g in groupby(nums))
output = list(chain.from_iterable(output))
print(output) # [1, 1, 2, 3, 3, 4, 4, 5]
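Change the 2 in islice(g, 2) to set the maximum number of repeats you want. Note that groupby only groups consecutive equal elements, so this assumes duplicates are adjacent, as in the example.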
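If equal values are not guaranteed to sit next to each other, a running count is one alternative. The following is a rough sketch using collections.Counter rather than the itertools approach; the function name `cap_occurrences` is made up for illustration.

```python
from collections import Counter


def cap_occurrences(nums, max_repeats=2):
    """Keep at most `max_repeats` copies of each value, preserving order.

    Hypothetical helper; unlike groupby, it does not need equal values
    to be adjacent.
    """
    seen = Counter()
    result = []
    for n in nums:
        if seen[n] < max_repeats:
            result.append(n)
            seen[n] += 1
    return result


print(cap_occurrences([1, 1, 2, 3, 3, 4, 4, 4, 5]))  # [1, 1, 2, 3, 3, 4, 4, 5]
print(cap_occurrences([4, 1, 4, 2, 4]))              # [4, 1, 4, 2]
```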
The easiest and, I guess, most straightforward way to get unique elements is with a set:
list(set(nums)) -> [1, 2, 3, 4, 5]
The downside of this approach is that sets are unordered, so we cannot depend on the order of the list after the conversion.
If order is important in your case, you can do this:
list(dict.fromkeys(nums))
[1, 2, 3, 4, 5]
Dicts preserve insertion order as of Python 3.7, and their keys are unique. So with this small trick we get a list of the unique keys of a dictionary while still maintaining the original order!

How to add two lists together, avoid repetitions, and order elements?

I have two lists filled with integers. I wish to add them together such that:
the output list has no duplicate elements,
is in order, and
contains the union of both lists.
Is there any way to do so without creating my own custom function? If not, what would a neat and tidy procedure look like?
For instance:
list1 = [1, 10, 2]
list2 = [3, 4, 10]
Output:
outputlist = [1, 2, 3, 4, 10]
Try this:
combined = [list1, list2]
union = list(set().union(*combined))
This takes advantage of the built-in .union() method of set, which is what you need here.
combined can contain as many lists as you like, since the asterisk in *combined unpacks them all into a single union call.
Also, I converted the result with list(), but you could leave it as a set.
As @glibdud states in the comments, this might happen to produce a sorted list, but that is not guaranteed, so use sorted() to ensure it is ordered (like this: union = sorted(set().union(*combined))).
l1 = [1, 10, 2]
l2 = [3, 4, 10]
sorted(list(set(l1 + l2)))
>>> [1, 2, 3, 4, 10]
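Both answers boil down to the same idea, which can be wrapped up for any number of lists. A small sketch follows; the function name `sorted_union` is hypothetical.

```python
def sorted_union(*lists):
    """Return the sorted, duplicate-free union of any number of lists."""
    # set().union(...) accepts any iterables; sorted() returns a new list.
    return sorted(set().union(*lists))


list1 = [1, 10, 2]
list2 = [3, 4, 10]
print(sorted_union(list1, list2))  # [1, 2, 3, 4, 10]
```

Because sorted() already returns a list, no separate list() call is needed.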

Python: Sort a list of list where inner list with different length?

I found the question about how to sort a list of lists, but my problem is that my inner lists have different lengths, and I want to sort them based on the last item of each inner list.
For example, I have a list:
[ [1, 2, 3],
[2, 4] ]
And I want to sort them based on the last item in my inner list, i.e. "3" and "4".
So, is there a good way to do this?
Thanks for the reply.
Take a look at Python's built-in sorted function.
>>> a = [[1, 2, 3], [2, 4]]
>>> sorted(a, key=lambda x: x[-1])
[[1, 2, 3], [2, 4]]
>>> sorted(a, key=lambda x: x[-1], reverse=True) # reverse version
[[2, 4], [1, 2, 3]]
Look into the key parameter to sort, and then look into either using len(...) to calculate the index of the last item, or how negative indexes work.
Good luck!
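To make the negative-index hint concrete, here is a short sketch using operator.itemgetter(-1) as the key instead of a lambda. The sample data is invented for illustration, and every inner list is assumed to be non-empty (an empty one would raise IndexError).

```python
from operator import itemgetter

# Inner lists of different lengths; index -1 always refers to the last item.
a = [[2, 4], [1, 2, 3], [9, 0]]

# sorted() returns a new list and leaves `a` untouched.
by_last = sorted(a, key=itemgetter(-1))
print(by_last)  # [[9, 0], [1, 2, 3], [2, 4]]

# list.sort() sorts in place; reverse=True gives descending order.
a.sort(key=itemgetter(-1), reverse=True)
print(a)  # [[2, 4], [1, 2, 3], [9, 0]]
```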

How to find 2 items in a list that are the same in Python

I have a list populated with ~100 names. The names in this list either occur once or twice and I would like to go through the list to find names that occur twice and names that only occur once. I will also need the position of the reoccurring names in the list and the positions of the names that only appear once.
I'm not sure how I would go about doing this because all the methods I can think of are inefficient as they would go through the whole list even if they have already found a match. Other methods that I can think of would return two duplicate positions. The names that occur twice will not necessarily be adjacent to each other.
For example, if this was the list:
mylist = [ 1, 2, 3, 1, 4, 4, 5, 6]
I would need something that outputs (something like):
[[0,3],[1],[2],[4,5],[6],[7]]
With those numbers being the positions of the duplicate names and the position of the names that occur once.
I am by no means an expert so any help would be appreciated.
You can use enumerate to get pairs of each element's index and the element itself, then loop over them and store the elements as keys and their indices as values, using a collections.OrderedDict (to preserve the order) and the dict.setdefault method:
>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> for i, j in enumerate(mylist):
...     d.setdefault(j, []).append(i)
...
>>> list(d.values())
[[0, 3], [1], [2], [4, 5], [6], [7]]
I would use a dictionary:
mylist = [1, 2, 3, 1, 4, 4, 5, 6]
dic = {}
for i in range(len(mylist)):
    if mylist[i] in dic:
        dic[mylist[i]].append(i)
    else:
        dic[mylist[i]] = [i]
print(list(dic.values()))
# prints [[0, 3], [1], [2], [4, 5], [6], [7]]
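Since the question also asks to tell repeated names apart from names that occur once, the position lists can be split in a second step. This is a sketch under the assumption of Python 3.7+ (where plain dicts keep insertion order); the names `positions_by_value`, `repeated`, and `single` are illustrative only.

```python
from collections import defaultdict


def positions_by_value(items):
    """Map each value to the list of indices where it occurs."""
    positions = defaultdict(list)
    for index, item in enumerate(items):
        positions[item].append(index)
    return dict(positions)


mylist = [1, 2, 3, 1, 4, 4, 5, 6]
positions = positions_by_value(mylist)

# Split into values seen more than once and values seen exactly once.
repeated = {k: v for k, v in positions.items() if len(v) > 1}
single = {k: v for k, v in positions.items() if len(v) == 1}

print(list(positions.values()))  # [[0, 3], [1], [2], [4, 5], [6], [7]]
print(repeated)                  # {1: [0, 3], 4: [4, 5]}
print(single)                    # {2: [1], 3: [2], 5: [6], 6: [7]}
```

The list is traversed only once, which addresses the concern about scanning it repeatedly.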

How to sort like values in Python

I was wondering how to sort like values in a list, and then break like values into a sub-list.
For example: I would want a function that probably does something like
def sort_by_like_values(list):
    # python magic
>>> list = [2, 2, 3, 4, 4, 10]
>>> [[2, 2], [3], [4, 4], [10]]
OR
>>>[2,2],[3],[4,4],[10]
I read up on the sorted api and it works well for sorting things within their own list, but doesn't break lists up into sub-lists. What module would help me out here?
Use groupby from the itertools module.
from itertools import groupby
L = [2, 2, 3, 4, 4, 10]
L.sort()
for key, iterator in groupby(L):
    print(key, list(iterator))
Result:
2 [2, 2]
3 [3]
4 [4, 4]
10 [10]
A couple of things to be aware of: groupby needs the data it works on to be sorted by the same key you wish to group by, or it won't work. Also, the iterator needs to be consumed before continuing to the next group, so make sure you store list(iterator) to another list or something. One-liner giving you the result you want:
>>> [list(it) for key, it in groupby(sorted(L))]
[[2, 2], [3], [4, 4], [10]]
Check the itertools module, it has the useful groupby function:
import itertools as i
for k, g in i.groupby(sorted([2, 2, 3, 4, 4, 10])):
    print(list(g))
Output:
[2, 2]
[3]
[4, 4]
[10]
You should be able to modify this to get the values in a list.
Everyone else has suggested itertools.groupby (which would be my first choice), but it's also possible with collections.Counter: obtain each key and its frequency, sort by key, then expand each key back out freq times.
from itertools import repeat
from collections import Counter
grouped = [list(repeat(key, freq)) for key, freq in sorted(Counter(L).items())]
itertools.groupby() with a list comprehension works fine.
In [20]: a = [1, 1, 2, 3, 3, 4, 5, 5, 5, 6]
In [21]: [ list(subgroup) for key, subgroup in itertools.groupby(sorted(a)) ]
Out[21]: [[1, 1], [2], [3, 3], [4], [5, 5, 5], [6]]
Note that groupby() returns an iterator of (key, group) pairs, where each group is itself an iterator, and you have to consume these groups in order. As per the docs:
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list.
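The effect described in the docs can be seen in a small experiment. This is only a sketch with made-up variable names (`stored`, `lazy`, `late`), comparing groups that are turned into lists immediately with groups that are consumed only after groupby() has moved on.

```python
from itertools import groupby

data = [2, 2, 3, 4, 4, 10]

# Materialise each group while iterating: every group keeps its items.
stored = [(key, list(group)) for key, group in groupby(data)]
print(stored)  # [(2, [2, 2]), (3, [3]), (4, [4, 4]), (10, [10])]

# Collect the group iterators first and read them afterwards: by then
# groupby() has advanced past the earlier groups, so they come back empty.
lazy = [(key, group) for key, group in groupby(data)]
late = [(key, list(group)) for key, group in lazy]
print(late[0])  # (2, [])
```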
If you do not wish to use itertools and can wrap your head around list comprehensions, this should also do the trick:
def group(a):
    a = sorted(a)
    # d holds the start index of each run of equal values, plus len(a)
    d = [0] + [x+1 for x in range(len(a)-1) if a[x]!=a[x+1]] + [len(a)]
    return [a[(d[x]):(d[x+1])] for x in range(len(d)-1)]
where a is your list.
