Grouping tuples using libraries in Python

Using Python, is there an easier way to count the values like this without writing a bunch of loops? Perhaps with a library such as itertools.groupby?
#original tuple array
[("A","field1"),("A","field1"),("B","field1")]
#output array
[("A","field1", 2), ("B", "field1",1)]

You can use a Counter to group and count the tuples, then append each count to its tuple:
from collections import Counter
l = [("A","field1"),("A","field1"),("B","field1")]
print([k+(v,) for k,v in Counter(l).items()])
If you want the output ordered by the first time you encounter a tuple, you can use an OrderedDict to do the counting:
from collections import OrderedDict
d = OrderedDict()
for t in l:
    d.setdefault(t, 0)
    d[t] += 1
print([k+(v,) for k,v in d.items()])
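On Python 3.7 and later, a plain Counter already remembers the order in which keys were first inserted (it is a dict subclass), so the OrderedDict step mainly matters on older versions. A minimal sketch, assuming Python 3.7+; the helper name count_tuples is hypothetical:

```python
from collections import Counter

def count_tuples(pairs):
    """Hypothetical helper: append each tuple's occurrence count to it,
    in first-seen order (relies on Counter keeping insertion order, Python 3.7+)."""
    return [key + (n,) for key, n in Counter(pairs).items()]

l = [("A", "field1"), ("A", "field1"), ("B", "field1"), ("A", "field2")]
print(count_tuples(l))
# [('A', 'field1', 2), ('B', 'field1', 1), ('A', 'field2', 1)]
```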

I guess you just want to count the number of occurrences of each tuple, right? If so, you can use Counter.

You can use itertools.groupby. If you omit the key parameter, it groups equal elements (provided they are consecutive); then append the number of elements in each group.
>>> import itertools
>>> lst = [("A","field1"),("A","field1"),("B","field1")]
>>> [(k + (len(list(g)),)) for k, g in itertools.groupby(lst)]
[('A', 'field1', 2), ('B', 'field1', 1)]
If the elements are not consecutive, this will not work, and in any case the solution using collections.Counter seems a better fit for the problem.
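If you still want groupby for input where equal tuples are not adjacent, sorting first makes them consecutive. A minimal sketch (the helper name count_with_groupby is hypothetical); note the result comes out in sorted order rather than first-seen order, and sorting costs O(n log n):

```python
from itertools import groupby

def count_with_groupby(pairs):
    """Hypothetical helper: sort so equal tuples become adjacent, then
    let groupby (with no key) collect each run and count its length."""
    return [key + (sum(1 for _ in group),) for key, group in groupby(sorted(pairs))]

lst = [("B", "field1"), ("A", "field1"), ("B", "field1"), ("A", "field1"), ("A", "field1")]
print(count_with_groupby(lst))
# [('A', 'field1', 3), ('B', 'field1', 2)]
```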

Related

Displays in order the most common items in a list with the index name

I would like to build something like a frequency vector: display the most common keys in a list, how many times each one occurs, and rank them in a top list.
I have something like this so far, but I don't know how to do the counting:
from collections import Counter
class AttributesCount:
    @staticmethod
    def count():
        attributes = [
            {"width": "32"},
            {"color": "red"},
            {"color:": "red"},
            {"color": "black"},
            {"color": "orange"},
            {"color": "red"}]
        for i in attributes:
            pass  # not sure how to count here
print(AttributesCount.count())
I've seen all kinds of counting functions and collections, but I want exactly what I described above: a ranked top list.
How can I do this?
I tried using Counter (from collections import Counter), but it does not work correctly.
Your intuition to use a Counter is correct.
Given a list of dictionaries with the variable name attribute_list, we first populate the Counter.
We can loop over the keys of each dictionary in the list and add 1 to each count, but we can be more succinct using Counter's update method.
Then we can call Counter's most_common method to return a list of tuples in descending frequency:
>>> from collections import Counter
>>> frequencies = Counter()
>>> for attribute in attribute_list:
...     frequencies.update(attribute.keys())
...
>>> frequencies
Counter({'color': 4, 'width': 1, 'color:': 1})
>>> frequencies.most_common()
[('color', 4), ('width', 1), ('color:', 1)]
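To turn the counts into a ranked top list, most_common(n) returns only the n most frequent entries (ties keep first-encountered order), and enumerate can number them. A sketch, assuming the dictionaries from the question are stored in attribute_list; the helper name top and the "rank. key: count" line format are illustrative choices:

```python
from collections import Counter

attribute_list = [
    {"width": "32"},
    {"color": "red"},
    {"color:": "red"},
    {"color": "black"},
    {"color": "orange"},
    {"color": "red"},
]

# Count how often each key appears across all the dictionaries.
frequencies = Counter()
for attribute in attribute_list:
    frequencies.update(attribute.keys())

def top(counter, n):
    """Hypothetical helper: numbered 'top n' lines from the most frequent keys."""
    return [f"{rank}. {key}: {count}"
            for rank, (key, count) in enumerate(counter.most_common(n), start=1)]

for line in top(frequencies, 2):
    print(line)
# 1. color: 4
# 2. width: 1
```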

python list of sets find symmetric difference in all elements

Consider this list of sets
my_input_list = [
    {1, 2, 3, 4, 5},
    {2, 3, 7, 4, 5},
    set(),
    {1, 2, 3, 4, 5, 6},
    set(),
]
I want to get only the exclusive elements, 6 and 7, as the result, either as a list or a set (set preferred).
I tried
from functools import reduce
print(reduce(set.symmetric_difference, my_input_list))
but that gives
{2, 3, 4, 5, 6, 7}
I also tried sorting the list by length: smallest first raises an error because of the two empty sets, and largest first gives the same result as unsorted.
Any help or ideas please?
Thanks :)
Looks like the most straightforward solution is to count everything and return the elements that only appear once.
This solution uses chain.from_iterable (to flatten your sets) + Counter (to count things). Finally, use a set comprehension to filter elements with count == 1.
from itertools import chain
from collections import Counter
c = Counter(chain.from_iterable(my_input_list))
print({k for k in c if c[k] == 1})
{6, 7}
A quick note: the empty literal {} creates an empty dict, not a set. For an empty set, use set().
You could use itertools.chain and collections.Counter:
from itertools import chain
from collections import Counter
r = {k for k,v in Counter(chain.from_iterable(my_input_list)).items() if v==1}
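If you would rather avoid Counter, the same "appears in exactly one set" rule can be expressed with two plain sets: everything seen so far, and everything seen more than once. This is a different technique from the answers above, sketched under the assumption that every input is a set (so an element never repeats within a single input); exclusive_elements is a hypothetical name:

```python
def exclusive_elements(sets):
    """Return the elements that occur in exactly one of the given sets."""
    seen = set()       # every element encountered so far
    repeated = set()   # elements found in two or more sets
    for s in sets:
        repeated |= seen & s   # already seen once, so it now repeats
        seen |= s
    return seen - repeated

my_input_list = [
    {1, 2, 3, 4, 5},
    {2, 3, 7, 4, 5},
    set(),
    {1, 2, 3, 4, 5, 6},
    set(),
]
print(exclusive_elements(my_input_list))  # {6, 7}
```

Unlike reduce(set.symmetric_difference, ...), which keeps elements that appear an odd number of times, this keeps only elements that appear exactly once.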

python OrderedDict get a key index of

Can an OrderedDict return the position of a key, the way list.index() does for a list?
test = ['a', 'b', 'c', 'd', 'e']
test.index('b') # return 1
It only takes one line, for example:
print(list(your_ordered_dict).index('your_key'))
Or you can wrap it in a lambda:
f = lambda ordered_dict, key: list(ordered_dict).index(key)
Good luck.
Keep it simple.
from collections import OrderedDict
x = OrderedDict(test1='a', test2='b')
print(list(x.keys()).index('test1'))
You can write this in two ways:
list(x).index('b')
next(i for i, k in enumerate(x) if k=='b')
The first one will be a little faster for small dicts, but a lot slower, and waste a lot of space, for huge ones. (Of course most of the time, OrderedDicts are pretty small.)
Both versions will work for any iterable; there's nothing special about OrderedDict here.
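Both one-liners fail when the key is absent: list.index raises ValueError, and next() without a default raises StopIteration. Passing a second argument to next() returns a fallback instead. A sketch; key_position is a hypothetical helper, and None as the "not found" value is an assumption:

```python
from collections import OrderedDict

def key_position(mapping, key, default=None):
    """Return the position of key in mapping's iteration order, or default if absent."""
    # next() with a second argument returns it instead of raising StopIteration
    return next((i for i, k in enumerate(mapping) if k == key), default)

x = OrderedDict(a=1, b=2, c=3)
print(key_position(x, 'b'))   # 1
print(key_position(x, 'z'))   # None
```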
If you take the keys as a list, you can then index like:
Code:
list(x).index('b')
Test Code:
from collections import OrderedDict
x = OrderedDict(a=1, b=2)
print(list(x).index('b'))
Results:
1
The accepted answer list(x).index('b') will be O(N) every time you're searching for the position.
Instead, you can create a mapping key -> position which will be O(1) once the mapping is constructed.
from collections import OrderedDict
ordered_dict = OrderedDict(a='', b='')
key_to_pos = {k: pos for pos, k in enumerate(ordered_dict)}
assert key_to_pos['b'] == 1

Comparing a 3-tuple to a list of 3-tuples using only the first two parts of the tuple

I have a list of 3-tuples in a Python program that I'm building while looking through a file (so one at a time), with the following setup:
(feature,combination,durationOfTheCombination),
such that if a unique combination of feature and combination is found, it will be added to the list. The list itself holds a similar setup, but the durationOfTheCombination is the sum of all durations that share the unique combination of (feature,combination). Therefore, when deciding whether a tuple should be added to the list, I need to compare only the first two parts of the tuple, and if a match is found, the duration is added to the corresponding list item.
Here's an example for clarity. If the input is
(ABC,123,10);(ABC,123,10);(DEF,123,5);(ABC,123,30);(EFG,456,30)
The output will be (ABC,123,50);(DEF,123,5);(EFG,456,30).
Is there any way to do this comparison?
You can do this with Counter:
In [42]: from collections import Counter
In [43]: lst = [('ABC',123,10),('ABC',123,10),('DEF',123,5)]
In [44]: [(i[0],i[1],i[2]*j) for i,j in Counter(lst).items()]
Out[44]: [('DEF', 123, 5), ('ABC', 123, 20)]
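As the OP pointed out, the durations for a repeated pair can differ, and the Counter version above assumes they are identical. In that case, sort by the first two fields and use groupby:
In [25]: from itertools import groupby
In [26]: lst = [('ABC',123,10),('ABC',123,10),('ABC',123,25),('DEF',123,5)]
In [27]: [tuple(list(n)+[sum([i[2] for i in g])]) for n,g in groupby(sorted(lst,key = lambda x:x[:2]), key = lambda x:x[:2])]
Out[27]: [('ABC', 123, 45), ('DEF', 123, 5)]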
If you don't want to use Counter, you can use a dict instead.
setOf3Tuples = dict()
def add3TupleToSet(a):
    key = a[0:2]
    if key in setOf3Tuples:
        setOf3Tuples[key] += a[2]
    else:
        setOf3Tuples[key] = a[2]
def getRaw3Tuple():
    for k in setOf3Tuples:
        yield k + (setOf3Tuples[k],)
if __name__ == "__main__":
    add3TupleToSet(("ABC",123,10))
    add3TupleToSet(("ABC",123,10))
    add3TupleToSet(("DEF",123,5))
    print([i for i in getRaw3Tuple()])
A dict seems better suited than a list here, with the first two fields as the key. To avoid checking each time whether the key is already present, you can use a defaultdict:
from collections import defaultdict
d = defaultdict(int)
for t in your_list:
    d[t[:2]] += t[-1]
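A runnable version of the defaultdict approach, assuming the records are (feature, combination, duration) tuples like the example input in the question. Because the loop body handles one tuple at a time, it also fits reading the file line by line; the final comprehension turns the dict back into 3-tuples:

```python
from collections import defaultdict

records = [("ABC", 123, 10), ("ABC", 123, 10), ("DEF", 123, 5), ("ABC", 123, 30), ("EFG", 456, 30)]

# Key on (feature, combination); a missing key starts at int() == 0.
totals = defaultdict(int)
for t in records:
    totals[t[:2]] += t[-1]

# Rebuild (feature, combination, total_duration) tuples in first-seen order.
merged = [key + (total,) for key, total in totals.items()]
print(merged)
# [('ABC', 123, 50), ('DEF', 123, 5), ('EFG', 456, 30)]
```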
Assuming your input is collected in a list as below, you can use pandas groupby to accomplish this quickly:
import pandas as pd
records = [('ABC',123,10),('ABC',123,10),('DEF',123,5),('ABC',123,30),('EFG',456,30)]
output = [tuple(x) for x in pd.DataFrame(records).groupby([0,1])[2].sum().reset_index().values]

Python: reduce list but keep details

Say I have a list of items, some of which are similar up to a point
but then differ by a number after a dot:
['abc.1',
'abc.2',
'abc.3',
'abc.7',
'xyz.1',
'xyz.3',
'xyz.11',
'ghj.1',
'thj.1']
I want to produce from this list a new list that collapses the duplicates but preserves some of their data, namely the number suffixes,
so the list above should produce this new list:
[('abc', ('1', '2', '3', '7')),
 ('xyz', ('1', '3', '11')),
 ('ghj', ('1',)),
 ('thj', ('1',))]
What I have thought of is that the first list can be split on the dot into pairs,
but then how do I group the pairs by the first part without losing the second?
I'm sorry if this question is noobish, and thanks in advance.
...
Wow, I didn't expect so many great answers so fast, thanks!
from collections import defaultdict
d = defaultdict(list)
for el in elements:
    key, nr = el.split(".")
    d[key].append(nr)
# convert the dict back to a list of (key, [suffixes]) pairs
newlist = list(d.items())
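A sketch that produces exactly the (prefix, tuple of suffixes) shape asked for in the question, keeping the suffixes as strings. It assumes the suffix is whatever follows the last dot (str.rpartition), which only matters if a prefix itself contains dots; collapse_suffixes is a hypothetical name. Unlike groupby, a dict-based grouping also works when equal prefixes are not adjacent:

```python
from collections import defaultdict

def collapse_suffixes(items):
    """Group 'prefix.suffix' strings by prefix, keeping suffixes as strings."""
    groups = defaultdict(list)
    for item in items:
        # rpartition splits on the last dot: 'xyz.11' -> ('xyz', '.', '11')
        prefix, _, suffix = item.rpartition(".")
        groups[prefix].append(suffix)
    # Convert to the requested list of (prefix, tuple_of_suffixes) pairs.
    return [(prefix, tuple(suffixes)) for prefix, suffixes in groups.items()]

elements = ['abc.1', 'abc.2', 'abc.3', 'abc.7', 'xyz.1', 'xyz.3', 'xyz.11', 'ghj.1', 'thj.1']
print(collapse_suffixes(elements))
```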
Map the list with a separator function, use itertools.groupby with a key that takes the first element, and collect the second element into the result.
from itertools import groupby
list1 = ["abc.1", "abc.2", "abc.3", "abc.7", "xyz.1", "xyz.3", "xyz.11", "ghj.1", "thj.1"]
def break_up(s):
    a, b = s.split(".")
    return a, int(b)
def prefix(broken_up): return broken_up[0]
def suffix(broken_up): return broken_up[1]
result = []
# groupby only merges consecutive items that share a prefix
for key, sub in groupby(map(break_up, list1), prefix):
    result.append((key, tuple(map(suffix, sub))))
print(result)
Output:
[('abc', (1, 2, 3, 7)), ('xyz', (1, 3, 11)), ('ghj', (1,)), ('thj', (1,))]
