Computing mean of all tuple values where 1st number is similar

Consider a list of tuples:
[(7751, 0.9407466053962708), (6631, 0.03942129), (7751, 0.1235432)]
How can I compute, in a Pythonic way, the mean of the second values of all tuples whose first number is the same? For example, the answer should be:
[(7751, 0.5321449026981354), (6631, 0.03942129)]

One way is to use collections.defaultdict:
from collections import defaultdict
lst = [(7751, 0.9407466053962708), (6631, 0.03942129), (7751, 0.1235432)]
d_dict = defaultdict(list)
for k, v in lst:
    d_dict[k].append(v)  # collect all values that share the same first number
[(k, sum(v)/len(v)) for k, v in d_dict.items()]
#[(7751, 0.5321449026981354), (6631, 0.03942129)]
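If only the mean is needed, the per-key lists are not strictly necessary. A minimal sketch (a variation on the answer above, not part of it) keeps a running [sum, count] pair per key in a single pass:

```python
from collections import defaultdict

lst = [(7751, 0.9407466053962708), (6631, 0.03942129), (7751, 0.1235432)]

# For each key, accumulate [running_sum, count] instead of storing every value.
totals = defaultdict(lambda: [0.0, 0])
for k, v in lst:
    totals[k][0] += v
    totals[k][1] += 1

# Divide each running sum by its count; dicts keep first-seen key order (Python 3.7+).
result = [(k, s / n) for k, (s, n) in totals.items()]
print(result)
```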

You can do it with groupby:
from itertools import groupby
result = []
for i, g in groupby(sorted(lst), key=lambda x: x[0]):
    grp = list(g)
    result.append((i, sum(x[1] for x in grp)/len(grp)))
Or, using a list comprehension:
def get_avg(g):
    grp = list(g)
    return sum(x[1] for x in grp)/len(grp)
result = [(i, get_avg(g)) for i, g in groupby(sorted(lst), key=lambda x: x[0])]
Result:
[(6631, 0.03942129), (7751, 0.5321449026981354)]
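Both versions sort lst first because groupby only merges consecutive items with equal keys. A small sketch showing the difference with and without the sort:

```python
from itertools import groupby

lst = [(7751, 0.9407466053962708), (6631, 0.03942129), (7751, 0.1235432)]

# Without sorting, the two 7751 tuples are not adjacent, so they form separate groups.
unsorted_keys = [k for k, _ in groupby(lst, key=lambda x: x[0])]
print(unsorted_keys)  # [7751, 6631, 7751]

# After sorting by the key, equal keys are adjacent and collapse into one group each.
sorted_keys = [k for k, _ in groupby(sorted(lst, key=lambda x: x[0]), key=lambda x: x[0])]
print(sorted_keys)    # [6631, 7751]
```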

groupby from itertools is your friend:
>>> l=[(7751, 0.9407466053962708), (6631, 0.03942129), (7751, 0.1235432)]
>>> #importing libs:
>>> from itertools import groupby
>>> from statistics import mean #(only python >= 3.4)
>>> # mean=lambda l: sum(l) / float(len(l)) #(for python < 3.4) (*1)
>>> # key function used for both sorting and grouping
>>> key = lambda x: x[0]
>>> data = sorted(l, key=key)
>>> # here it is, the Pythonic way:
>>> [(k, mean([m[1] for m in g])) for k, g in groupby(data, key)]
Results:
[(6631, 0.03942129), (7751, 0.5321449026981354)]
EDITED (*1): Thanks to Elmex80s for pointing me to statistics.mean.
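On Python 3.8 and later, statistics.fmean is a faster, float-only alternative to mean. A small sketch of the same sort-then-group expression using it (assuming Python 3.8+):

```python
from itertools import groupby
from statistics import fmean  # Python 3.8+

l = [(7751, 0.9407466053962708), (6631, 0.03942129), (7751, 0.1235432)]
key = lambda x: x[0]

# Same pattern as above; fmean converts the values to floats and averages them.
result = [(k, fmean(m[1] for m in g)) for k, g in groupby(sorted(l, key=key), key)]
print(result)
```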

Related

list method to group elements in Python

Thanks for all your answers, but I have edited my question because it was not clear to everyone.
I have the following list of tuples:
[("ok",1),("yes",1),("no",0),("why",1),("some",1),("eat",0),("give",0),("about",0),("tell",1),("ask",0),("be",0)]
I would like to have :
[("ok yes","no"),("why some","eat give about"),("tell","ask be")]
Thank you!
So I want to group together all consecutive words tagged 1; when a 0 appears, I add that word (together with any following 0-tagged words) to the current element, and then start a new element for the next values.

You can use itertools.groupby:
from itertools import groupby
d = [("ok",1),("yes",1),("no",0),("why",1),("some",1),("eat",0),("give",0),("about",0),("tell",1),("ask",0),("be",0)]
new_d = [' '.join(j for j, _ in b) for _, b in groupby(d, key=lambda x:x[-1])]
result = [(new_d[i], new_d[i+1]) for i in range(0, len(new_d), 2)]
Output:
[('ok yes', 'no'), ('why some', 'eat give about'), ('tell', 'ask be')]
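The pairing step assumes the list starts with 1-tagged words and that every run of 1s is followed by a run of 0s. If the list can end with 1-tagged words (an odd number of groups), new_d[i+1] raises IndexError. A hedged variant pairs the groups with itertools.zip_longest and fills a missing second element with an empty string (the '' fill value is an assumption; pick whatever suits your data):

```python
from itertools import groupby, zip_longest

d = [("ok", 1), ("yes", 1), ("no", 0), ("why", 1), ("some", 1)]

# Join each run of consecutive words that share the same tag.
new_d = [' '.join(word for word, _ in run) for _, run in groupby(d, key=lambda x: x[-1])]

# Pair consecutive groups (1-run, 0-run); pad a missing trailing 0-run with ''.
it = iter(new_d)
result = list(zip_longest(it, it, fillvalue=''))
print(result)  # [('ok yes', 'no'), ('why some', '')]
```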

As I understand it, the following code should work for your question:
list_tuples = [("ok",1),("yes",1),("no",0),("why",1),("some",1),("eat",0)]
tups=[]
updated_list=[]
for elem in list_tuples:
    if elem[1] == 0:
        # a 0 closes the current group: pair the joined 1-words with this word
        updated_list.append((' '.join(tups), elem[0]))
        tups = []
    else:
        tups.append(elem[0])
print(updated_list)

One possible solution using itertools.groupby:
from operator import itemgetter
from itertools import groupby
lst = [("ok",1), ("yes",1), ("no",0), ("why",1), ("some",1), ("eat",0)]
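def generate(lst):
    rv = []
    for v, g in groupby(lst, itemgetter(1)):
        if v:
            # a run of 1-tagged words: join them into one string
            rv.append(' '.join(map(itemgetter(0), g)))
        else:
            # a run of 0-tagged words: add each word, then close the current tuple
            for i in g:
                rv.append(i[0])
            yield tuple(rv)
            rv = []
    # yield the last item if the list ended with 1-tagged words:
    if v:
        yield tuple(rv)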
print([*generate(lst)])
Prints:
[('ok yes', 'no'), ('why some', 'eat')]

Create sublists with duplicate list elements

I'm new to Python and I'm trying to create sublists for list elements sharing the same base:
listRaw = ['AKS/STB', 'SBHS/AME', 'SBJ/OAK', 'SBJ/ALS', 'AKS/OSMX', 'SBHS/ABNX', 'AKS/AKX']
desiredOutput = [['AKS/STB', 'AKS/OSMX', 'AKS/AKX'], ['SBHS/AME', 'SBHS/ABNX'], ['SBJ/OAK', 'SBJ/ALS']]
I've tried to first isolate the base from each list element using:
def commonNumerator(self):
    checkPosition = self.find('/')
    commonNumerator = self[:checkPosition]
    return commonNumerator
listRawModified = [commonNumerator(x) for x in listRaw]
print(listRawModified)
which gets me:
['AKS', 'SBHS', 'SBJ', 'SBJ', 'AKS', 'SBHS', 'AKS']
but from there I don't know how to proceed to get to the desired output.
Can someone explain to me how to do it?

A typical use case for itertools.groupby():
from itertools import groupby
listRaw = ['AKS/STB', 'SBHS/AME', 'SBJ/OAK', 'SBJ/ALS', 'AKS/OSMX', 'SBHS/ABNX', 'AKS/AKX']
def key(s):
    return s.split('/')[0]
[list(g) for k, g in groupby(sorted(listRaw, key=key), key=key)]
# [['AKS/STB', 'AKS/OSMX', 'AKS/AKX'], ['SBHS/AME', 'SBHS/ABNX'], ['SBJ/OAK', 'SBJ/ALS']]
The key() function helps in extracting the sorting/grouping key: key('AKS/STB') == 'AKS'.

Another way to do this would be to split each element and create a dictionary and then construct your desired output from that dictionary, e.g.:
In []:
d = {}
for i in listRaw:
    k, v = i.split('/')
    d.setdefault(k, []).append(v)
[['/'.join([k, v]) for v in d[k]] for k in d]
Out[]:
[['AKS/STB', 'AKS/OSMX', 'AKS/AKX'], ['SBHS/AME', 'SBHS/ABNX'], ['SBJ/OAK', 'SBJ/ALS']]

This is a typical use case for itertools, but you could also consider storing the values in a dictionary:
from collections import defaultdict
d = defaultdict(list)
listRaw = ['AKS/STB', 'SBHS/AME', 'SBJ/OAK', 'SBJ/ALS', 'AKS/OSMX', 'SBHS/ABNX', 'AKS/AKX']
for item in listRaw:
    i, y = item.split('/')
    d[i].append(y)
print(dict(d))
# {'AKS': ['STB', 'OSMX', 'AKX'], 'SBHS': ['AME', 'ABNX'], 'SBJ': ['OAK', 'ALS']}
You can then access the values for AKS with a simple lookup:
d['AKS'] # ['STB', 'OSMX', 'AKX']

Find duplicate values in list of tuples in Python

How do I find duplicate values in the following list of tuples?
[(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)]
I want to get a list like:
[4081, 4082, 4086, 4090]
I have tried using itemgetter and then groupby, but it didn't work.
How can one do this?

Use an ordered dictionary with the first items as keys and lists of the second items as values (built with dict.setdefault()), then pick out the lists that have more than one element:
>>> from itertools import chain
>>> from collections import OrderedDict
>>> lst = [(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)]
>>> d = OrderedDict()
>>> for i, j in lst:
...     d.setdefault(i, []).append(j)
...
>>> list(chain.from_iterable([j for i, j in d.items() if len(j) > 1]))
[4081, 4082, 4086, 4090]
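A similar result can be sketched with collections.Counter (a swapped-in alternative, not one of the answers here): count how often each first element occurs, then keep the second element of every tuple whose first element occurs more than once. Unlike groupby, this does not require equal first elements to be adjacent:

```python
from collections import Counter

lst = [(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)]

# Count occurrences of each first element.
counts = Counter(first for first, _ in lst)

# Keep second elements whose first element appears more than once, in original order.
dups = [second for first, second in lst if counts[first] > 1]
print(dups)  # [4081, 4082, 4086, 4090]
```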

As an alternative, if you want to use groupby, here is a way to do it:
In [1]: from itertools import groupby
In [2]: ts = [(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)]
In [3]: dups = []
In [4]: for _, g in groupby(ts, lambda x: x[0]):
...:     grouped = list(g)
...:     if len(grouped) > 1:
...:         dups.extend([dup[1] for dup in grouped])
...:
In [5]: print(dups)
[4081, 4082, 4086, 4090]
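groupby groups the tuples by their first element; whenever a group contains more than one tuple, the second elements of those tuples are added to dups. Note that groupby only groups adjacent items, so this relies on the input already being ordered by the first element, as it is here.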

Yet another approach (without any imports):
In [896]: lot = [(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)]
In [897]: d = dict()
In [898]: for key, value in lot:
...:     d[key] = d.get(key, []) + [value]
...:
...:
In [899]: d
Out[899]: {1622: [4081, 4082], 1624: [4083], 1626: [4085], 1650: [4086, 4090]}
In [900]: [d[key] for key in d if len(d[key]) > 1]
Out[900]: [[4086, 4090], [4081, 4082]]
In [901]: sorted(num for nums in d.values() if len(nums) > 1 for num in nums)
Out[901]: [4081, 4082, 4086, 4090]

Haven't tested this... (edit: yup, it works)
l = [(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)]
dup = []
for i, t1 in enumerate(l):
    for t2 in l[i+1:]:
        if t1[0] == t2[0]:
            dup.extend([t1[1], t2[1]])
print(dup)
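One caveat with the pairwise loop: it adds values once per matching pair, so a first element that occurs three times yields repeated values. A quick sketch using a hypothetical third 1622 tuple (4099 is a made-up value for illustration), with dict.fromkeys used to drop repeats while keeping order:

```python
# 4099 is a hypothetical value added only to show a key that occurs three times.
l = [(1622, 4081), (1622, 4082), (1622, 4099)]

dup = []
for i, t1 in enumerate(l):
    for t2 in l[i+1:]:
        if t1[0] == t2[0]:
            dup.extend([t1[1], t2[1]])
print(dup)  # [4081, 4082, 4081, 4099, 4082, 4099]

# Remove repeats while preserving first-seen order.
unique_dup = list(dict.fromkeys(dup))
print(unique_dup)  # [4081, 4082, 4099]
```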

Combine like lists within a nested list

I have a large nested list, and within each nested list are two values: a company name and an amount. I am wondering if there is a way to combine the nested lists that have the same name and then add their values. For example, here is a section of the list:
[['Acer', 481242.74], ['Beko', 966071.86], ['Cemex', 187242.16], ['Datsun', 748502.91], ['Equifax', 146517.59], ['Gerdau', 898579.89], ['Haribo', 265333.85], ['Gerdau', 13019.63676], ['Gerdau', 34107.12062], ['Acer', 52153.02848]]
I would expect an outcome that looks like the one below:
[['Acer',(481242.74+52153.02848)],['Beko', 966071.86],['Cemex', 187242.16],['Datsun', 748502.91],['Equifax', 146517.59],['Gerdau',(898579.89+13019.63676+34107.12062)],['Haribo', 265333.85]]
So essentially I'm trying to write code that goes through a nested list and returns a list made by finding all the sublists with the same [0] element and summing their [1] elements.

from collections import defaultdict
d = defaultdict(float)
for name, amt in a:
    d[name] += amt
What this does is create a dict where the amount is zero (float()) by default, and then sum the amounts using the names as keys.
If you really need the result to be a list, you can get it this way:
>>> print(list(d.items()))
[('Acer', 533395.76848), ('Beko', 966071.86), ('Cemex', 187242.16), ('Datsun', 748502.91), ('Equifax', 146517.59), ('Gerdau', 945706.64738), ('Haribo', 265333.85)]

defaultdict is probably a good way to go, but you can do this with a normal dictionary:
>>> data = [['Acer', 481242.74], ['Beko', 966071.86], ['Cemex', 187242.16], ...]
>>> result = {}
>>> for k, v in data:
...     result[k] = result.get(k, 0) + v
...
>>> result
{'Acer': 533395.76848, 'Beko': 966071.86, 'Cemex': 187242.16, ... }
>>> list(result.items())
[('Acer', 533395.76848), ('Beko', 966071.86), ('Cemex', 187242.16), ...]

from collections import defaultdict
d = defaultdict(list)
l=[['Acer', 481242.74], ['Beko', 966071.86], ['Cemex', 187242.16], ['Datsun', 748502.91], ['Equifax', 146517.59], ['Gerdau', 898579.89], ['Haribo', 265333.85], ['Gerdau', 13019.63676], ['Gerdau', 34107.12062], ['Acer', 52153.02848]]
for k,v in l:
    d[k].append(v)
w=[]
for x,y in d.items():
    w.append([x, sum(y)])
print(w)
Prints:
[['Acer', 533395.76848], ['Beko', 966071.86], ['Cemex', 187242.16], ['Datsun', 748502.91], ['Equifax', 146517.59], ['Gerdau', 945706.64738], ['Haribo', 265333.85]]
If you want tuples instead:
list(map(tuple, w))
Output:
[('Acer', 533395.76848), ('Beko', 966071.86), ('Cemex', 187242.16), ('Datsun', 748502.91), ('Equifax', 146517.59), ('Gerdau', 945706.64738), ('Haribo', 265333.85)]
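To get exactly the list-of-lists shape asked for in the question, one sketch (assuming alphabetical order by company name is acceptable) sums with a plain dict and sorts the items:

```python
data = [['Acer', 481242.74], ['Beko', 966071.86], ['Cemex', 187242.16],
        ['Datsun', 748502.91], ['Equifax', 146517.59], ['Gerdau', 898579.89],
        ['Haribo', 265333.85], ['Gerdau', 13019.63676], ['Gerdau', 34107.12062],
        ['Acer', 52153.02848]]

# Sum the amounts per company name.
totals = {}
for name, amount in data:
    totals[name] = totals.get(name, 0.0) + amount

# Build [name, total] pairs sorted by name, matching the nested-list shape in the question.
result = [[name, total] for name, total in sorted(totals.items())]
print(result)
```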

Finding index of values in a list dynamically

I have two lists as follows:
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
I would like to split list_2 based on the values at the corresponding indexes in list_1. For instance:
list_a1 = ['iPad','iPod','iPhone']
list_a2 = ['Windows','X-box']
list_a3 = ['Kindle']
I know about the index method, but it requires passing the value to be matched. In this case, I would like to dynamically find the indexes of the elements in list_1 that share the same value. Is this possible? Any tips or hints would be deeply appreciated.
Thanks.

There are a few ways to do this.
I'd do it by using zip and groupby.
First:
>>> list(zip(list_1, list_2))
[('A-1', 'iPad'),
('A-1', 'iPod'),
('A-1', 'iPhone'),
('A-2', 'Windows'),
('A-2', 'X-box'),
('A-3', 'Kindle')]
Now:
>>> import itertools, operator
>>> [(key, list(group)) for key, group in
... itertools.groupby(zip(list_1, list_2), operator.itemgetter(0))]
[('A-1', [('A-1', 'iPad'), ('A-1', 'iPod'), ('A-1', 'iPhone')]),
('A-2', [('A-2', 'Windows'), ('A-2', 'X-box')]),
('A-3', [('A-3', 'Kindle')])]
So, you just want each group, ignoring the key, and you only want the second element of each element in the group. You can get the second element of each group with another comprehension, or just by unzipping:
>>> [list(zip(*group))[1] for key, group in
... itertools.groupby(zip(list_1, list_2), operator.itemgetter(0))]
[('iPad', 'iPod', 'iPhone'), ('Windows', 'X-box'), ('Kindle',)]
I would personally find this more readable as a sequence of separate iterator transformations than as one long expression. Taken to the extreme:
>>> ziplists = zip(list_1, list_2)
>>> pairs = itertools.groupby(ziplists, operator.itemgetter(0))
>>> groups = (group for key, group in pairs)
>>> values = (list(zip(*group))[1] for group in groups)
>>> [list(value) for value in values]
[['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
… but a happy medium of maybe 2 or 3 lines is usually better than either extreme.
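For instance, a middle-ground version along those lines might look like this (one possible split, not the answerer's own code):

```python
import itertools
import operator

list_1 = ['A-1', 'A-1', 'A-1', 'A-2', 'A-2', 'A-3']
list_2 = ['iPad', 'iPod', 'iPhone', 'Windows', 'X-box', 'Kindle']

# Group the (code, product) pairs by code; groupby relies on equal codes being adjacent.
groups = itertools.groupby(zip(list_1, list_2), operator.itemgetter(0))
# Keep only the product names from each group.
result = [[product for _, product in group] for _, group in groups]
print(result)  # [['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
```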

Usually I'm the one rushing to a groupby solution ;^) but here I'll go the other way and manually insert into an OrderedDict:
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
from collections import OrderedDict
d = OrderedDict()
for code, product in zip(list_1, list_2):
    d.setdefault(code, []).append(product)
produces a d that looks like
>>> d
OrderedDict([('A-1', ['iPad', 'iPod', 'iPhone']),
('A-2', ['Windows', 'X-box']), ('A-3', ['Kindle'])])
with easy access:
>>> d["A-2"]
['Windows', 'X-box']
and we can get the list-of-lists in list_1 order using .values():
>>> list(d.values())
[['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
If you've noticed that no one is telling you how to make a bunch of independent lists with names like list_a1 and so on, that's because it's a bad idea. You want to keep the data together in something you can (at a minimum) iterate over easily, and both dictionaries and lists of lists qualify.
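Since Python 3.7, a plain dict also preserves insertion order, so the same setdefault idea can be sketched without OrderedDict. Unlike the groupby answers, it also works when equal codes are not adjacent in list_1:

```python
list_1 = ['A-1', 'A-1', 'A-1', 'A-2', 'A-2', 'A-3']
list_2 = ['iPad', 'iPod', 'iPhone', 'Windows', 'X-box', 'Kindle']

# Plain dict (insertion-ordered in Python 3.7+) mapping each code to its products.
d = {}
for code, product in zip(list_1, list_2):
    d.setdefault(code, []).append(product)

print(d['A-2'])          # ['Windows', 'X-box']
print(list(d.values()))  # [['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
```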

Maybe something like this?
#!/usr/local/cpython-3.3/bin/python
import pprint
import collections
def main():
    list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
    list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
    result = collections.defaultdict(list)
    for list_1_element, list_2_element in zip(list_1, list_2):
        result[list_1_element].append(list_2_element)
    pprint.pprint(result)
main()

Using itertools.zip_longest and itertools.groupby (in Python 2, zip_longest is called izip_longest).
First, group the items of list_1 and find the starting index of each group:
>>> from itertools import groupby, zip_longest
>>> inds = [next(g)[0] for k, g in groupby(enumerate(list_1), key=lambda x: x[1])]
>>> inds
[0, 3, 5]
Now use slicing with zip_longest, since we need the slices list_2[0:3], list_2[3:5] and list_2[5:] (zip_longest pads the last end index with None):
>>> [list_2[x:y] for x, y in zip_longest(inds, inds[1:])]
[['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
To get a dict instead, you can do something like:
>>> inds = [next(g) for k, g in groupby(enumerate(list_1), key=lambda x: x[1])]
>>> {k: list_2[ind1:ind2[0]] for (ind1, k), ind2 in
...     zip_longest(inds, inds[1:], fillvalue=[None])}
{'A-1': ['iPad', 'iPod', 'iPhone'], 'A-2': ['Windows', 'X-box'], 'A-3': ['Kindle']}

You could do this if you want simple code; it's not pretty, but it gets the job done.
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
list_1a = []
list_1b = []
list_1c = []
place = 0
for i in list_1[::1]:
    if list_1[place] == 'A-1':
        list_1a.append(list_2[place])
    elif list_1[place] == 'A-2':
        list_1b.append(list_2[place])
    else:
        list_1c.append(list_2[place])
    place += 1
