Combine like lists within a nested list - python

I have a large nested list, and within each nested list are two values, a company name and an amount. Is there a way to combine the nested lists that have the same name and add their values together? For example, here is a section of the list:
[['Acer', 481242.74], ['Beko', 966071.86], ['Cemex', 187242.16], ['Datsun', 748502.91], ['Equifax', 146517.59], ['Gerdau', 898579.89], ['Haribo', 265333.85], ['Gerdau', 13019.63676], ['Gerdau', 34107.12062], ['Acer', 52153.02848]]
I would expect an outcome that looks like the one below:
[['Acer',(481242.74+52153.02848)],['Beko', 966071.86],['Cemex', 187242.16],['Datsun', 748502.91],['Equifax', 146517.59],['Gerdau',(898579.89+13019.63676+34107.12062)],['Haribo', 265333.85]]
So essentially I'm trying to write code that goes through a nested list and returns a list made by finding all the lists with the same [0] element and combining their [1] elements.

from collections import defaultdict
d = defaultdict(float)
# a is the nested list of [name, amount] pairs from the question
for name, amt in a:
    d[name] += amt
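This creates a dict in which a missing name starts at zero (the value of float()), and then sums the amounts using the names as keys.
If you really need the result to be a list, you can get it this way (the order shown is from Python 2, where dict order is arbitrary; Python 3.7+ keeps insertion order):
>>> print(list(d.items()))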
[('Equifax', 146517.59), ('Haribo', 265333.85), ('Gerdau', 945706.64738), ('Cemex', 187242.16), ('Datsun', 748502.91), ('Beko', 966071.86), ('Acer', 533395.76848)]
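For reference, here is the same idea wrapped in a function that returns the list-of-lists shape the question asks for. This is only a minimal sketch: the name sum_by_name and the small sample data are made up for illustration, and it assumes Python 3.7+, where dicts keep insertion order, so names come out in first-seen order.

```python
from collections import defaultdict

def sum_by_name(pairs):
    """Sum the amounts of all [name, amount] pairs that share a name."""
    totals = defaultdict(float)  # a missing name starts at 0.0
    for name, amount in pairs:
        totals[name] += amount
    # Convert back to the nested-list shape used in the question.
    return [[name, total] for name, total in totals.items()]

# Hypothetical sample data, chosen so the float sums are exact.
data = [['Acer', 1.5], ['Beko', 2.0], ['Acer', 2.5], ['Beko', 0.25]]
print(sum_by_name(data))  # [['Acer', 4.0], ['Beko', 2.25]]
```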

defaultdict is probably a good way to go but you can do this with a normal dictionary:
>>> data = [['Acer', 481242.74], ['Beko', 966071.86], ['Cemex', 187242.16], ...]
>>> result = {}
>>> for k, v in data:
...     result[k] = result.get(k, 0) + v
>>> result
{'Acer': 533395.76848, 'Beko': 966071.86, 'Cemex': 187242.16, ... }
>>> list(result.items())
[('Acer', 533395.76848), ('Beko', 966071.86), ('Cemex', 187242.16), ...]

from collections import defaultdict
d = defaultdict(list)
l = [['Acer', 481242.74], ['Beko', 966071.86], ['Cemex', 187242.16], ['Datsun', 748502.91], ['Equifax', 146517.59], ['Gerdau', 898579.89], ['Haribo', 265333.85], ['Gerdau', 13019.63676], ['Gerdau', 34107.12062], ['Acer', 52153.02848]]
# collect every amount under its company name
for k, v in l:
    d[k].append(v)
# sum the amounts collected for each company
w = []
for x, y in d.items():
    w.append([x, sum(y)])
print(w)
This prints (the item order varies by Python version):
[['Equifax', 146517.59], ['Haribo', 265333.85], ['Gerdau', 945706.64738], ['Cemex', 187242.16], ['Datsun', 748502.91], ['Beko', 966071.86], ['Acer', 533395.76848]]
If you want tuples instead of lists:
list(map(tuple, w))
The output is:
[('Equifax', 146517.59), ('Haribo', 265333.85), ('Gerdau', 945706.64738), ('Cemex', 187242.16), ('Datsun', 748502.91), ('Beko', 966071.86), ('Acer', 533395.76848)]

Related

Create sublists with duplicate list elements

I'm new to Python and I'm trying to create sublists for list elements sharing the same base:
listRaw = ['AKS/STB', 'SBHS/AME', 'SBJ/OAK', 'SBJ/ALS', 'AKS/OSMX', 'SBHS/ABNX', 'AKS/AKX']
desiredOutput = [['AKS/STB', 'AKS/OSMX', 'AKS/AKX'], ['SBHS/AME', 'SBHS/ABNX'], ['SBJ/OAK', 'SBJ/ALS']]
I've tried to first isolate the base from each list element using:
def commonNumerator(self):
    checkPosition = self.find('/')
    commonNumerator = self[:checkPosition]
    return commonNumerator
listRawModified = [commonNumerator(x) for x in listRaw]
print(listRawModified)
which gets me:
['AKS', 'SBHS', 'SBJ', 'SBJ', 'AKS', 'SBHS', 'AKS']
but from there I don't know how to proceed to get to the desired output.
Can someone explain to me how to do it?
This is a typical use case for itertools.groupby():
from itertools import groupby
listRaw = ['AKS/STB', 'SBHS/AME', 'SBJ/OAK', 'SBJ/ALS', 'AKS/OSMX', 'SBHS/ABNX', 'AKS/AKX']
def key(s):
    return s.split('/')[0]
[list(g) for k, g in groupby(sorted(listRaw, key=key), key=key)]
# [['AKS/STB', 'AKS/OSMX', 'AKS/AKX'], ['SBHS/AME', 'SBHS/ABNX'], ['SBJ/OAK', 'SBJ/ALS']]
The key() function helps in extracting the sorting/grouping key: key('AKS/STB') == 'AKS'.
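The sorted() call in that answer matters because groupby() only merges consecutive items with equal keys. The following sketch illustrates the difference on a shortened list; the helper name base and the three-item sample are assumptions made for this example.

```python
from itertools import groupby

def base(s):
    # 'AKS/STB' -> 'AKS'
    return s.split('/')[0]

items = ['AKS/STB', 'SBJ/OAK', 'AKS/OSMX']

# Without sorting, the two 'AKS' items are not adjacent,
# so groupby puts them into separate groups.
unsorted_groups = [list(g) for _, g in groupby(items, key=base)]
print(unsorted_groups)  # [['AKS/STB'], ['SBJ/OAK'], ['AKS/OSMX']]

# Sorting by the same key first makes equal keys adjacent.
sorted_groups = [list(g) for _, g in groupby(sorted(items, key=base), key=base)]
print(sorted_groups)  # [['AKS/STB', 'AKS/OSMX'], ['SBJ/OAK']]
```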
Another way to do this would be to split each element and create a dictionary and then construct your desired output from that dictionary, e.g.:
In []:
d = {}
for i in listRaw:
    k, v = i.split('/')
    d.setdefault(k, []).append(v)
[['/'.join([k, v]) for v in d[k]] for k in d]
Out[]:
[['AKS/STB', 'AKS/OSMX', 'AKS/AKX'], ['SBHS/AME', 'SBHS/ABNX'], ['SBJ/OAK', 'SBJ/ALS']]
This is a typical usecase for itertools. But you could also consider storing the values in a dictionary:
from collections import defaultdict
d = defaultdict(list)
listRaw = ['AKS/STB', 'SBHS/AME', 'SBJ/OAK', 'SBJ/ALS', 'AKS/OSMX', 'SBHS/ABNX', 'AKS/AKX']
for item in listRaw:
    i, y = item.split('/')
    d[i].append(y)
print(dict(d))
# {'AKS': ['STB', 'OSMX', 'AKX'], 'SBHS': ['AME', 'ABNX'], 'SBJ': ['OAK', 'ALS']}
You can then get the values for AKS with a simple lookup:
d['AKS'] # ['STB', 'OSMX', 'AKX']

Append list based on another element in list and remove lists that contained the items

Let's say I have a nested list like this:
list_all = [[['some_item'],'Robert'] ,[['another_item'],'Robert'],[['itemx'],'Adam'],[['item2','item3'],'Maurice']]
I want to combine the items by their holder (e.g. 'Robert') only when they are in separate lists. That is, in the end list_all should contain:
list_all = [[['some_item','another_item'],'Robert'],[['itemx'],'Adam'],[['item2','item3'],'Maurice']]
What is a fast and effective way of doing it?
I've tried in different ways, but I'm looking for something simpler and more elegant.
Thank you
Here is one solution. It is often better to store your data in a more structured form, e.g. a dictionary, rather than manipulate from one list format to another.
from collections import defaultdict
list_all = [[['some_item'],'Robert'],
            [['another_item'],'Robert'],
            [['itemx'],'Adam'],
            [['item2','item3'],'Maurice']]
d = defaultdict(list)
for i in list_all:
    d[i[1]].extend(i[0])
# defaultdict(list,
# {'Adam': ['itemx'],
# 'Maurice': ['item2', 'item3'],
# 'Robert': ['some_item', 'another_item']})
d2 = [[v, k] for k, v in d.items()]
# [[['some_item', 'another_item'], 'Robert'],
# [['itemx'], 'Adam'],
# [['item2', 'item3'], 'Maurice']]
You can try this. It is quite similar to the answer above, but it does not need any imports.
list_all = [[['some_item'], 'Robert'], [['another_item'], 'Robert'], [['itemx'], 'Adam'], [['item2', 'item3'], 'Maurice']]
x = {}  # dictionary mapping each name to its combined items
for i in list_all:
    try:
        x[i[1]].extend(i[0])
    except KeyError:
        x[i[1]] = list(i[0])  # copy, so the lists inside list_all are not modified later
list2 = [[j, i] for i, j in x.items()]
Another option keeps the [items, name] pair as the dictionary value:
list_all = [[['some_item'],'Robert'] ,[['another_item'],'Robert'],[['itemx'],'Adam'],[['item2','item3'],'Maurice']]
dict_value = {}
for val in list_all:
    list_, name = val
    if name in dict_value:
        dict_value[name][0].extend(list_)
    else:
        # copy the items so the lists inside list_all are not modified later
        dict_value[name] = [list(list_), name]
print(list(dict_value.values()))
# [[['some_item', 'another_item'], 'Robert'],
#  [['itemx'], 'Adam'],
#  [['item2', 'item3'], 'Maurice']]
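One point to watch with dictionary-based merging: if the dictionary stores one of the original inner lists and later calls extend() on it, the input data changes as a side effect. The sketch below avoids that by always extending a fresh list created by setdefault(). The function name merge_by_holder is hypothetical, and the output order assumes Python 3.7+ (insertion-ordered dicts).

```python
def merge_by_holder(rows):
    """Merge [[items, holder], ...] rows that share a holder."""
    merged = {}
    for items, holder in rows:
        # setdefault creates a new list per holder, so the input
        # lists are only read, never extended in place.
        merged.setdefault(holder, []).extend(items)
    return [[items, holder] for holder, items in merged.items()]

list_all = [[['some_item'], 'Robert'], [['another_item'], 'Robert'], [['itemx'], 'Adam']]
result = merge_by_holder(list_all)
print(result)       # [[['some_item', 'another_item'], 'Robert'], [['itemx'], 'Adam']]
print(list_all[0])  # [['some_item'], 'Robert'] (the input is unchanged)
```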

Computing mean of all tuple values where 1st number is similar

Consider a list of tuples:
[(7751, 0.9407466053962708), (6631, 0.03942129), (7751, 0.1235432)]
How can I compute, in a Pythonic way, the mean of all tuple values whose first number is the same? For example, the answer has to be:
[(7751, 0.532144902698135), (6631, 0.03942129)]
One way is using collections.defaultdict
from collections import defaultdict
lst = [(7751, 0.9407466053962708), (6631, 0.03942129), (7751, 0.1235432)]
d_dict = defaultdict(list)
for k, v in lst:
    d_dict[k].append(v)
[(k,sum(v)/len(v)) for k,v in d_dict.items()]
#[(7751, 0.5321449026981354), (6631, 0.03942129)]
You can also do it with groupby:
from itertools import groupby
result = []
for key, g in groupby(sorted(lst), key=lambda x: x[0]):
    grp = list(g)
    result.append((key, sum(item[1] for item in grp) / len(grp)))
Or, using a list comprehension:
def get_avg(g):
    grp = list(g)
    return sum(item[1] for item in grp) / len(grp)
result = [(key, get_avg(g)) for key, g in groupby(sorted(lst), key=lambda x: x[0])]
Result:
[(6631, 0.03942129), (7751, 0.5321449026981354)]
groupby from itertools is your friend:
>>> l=[(7751, 0.9407466053962708), (6631, 0.03942129), (7751, 0.1235432)]
>>> #importing libs:
>>> from itertools import groupby
>>> from statistics import mean #(only python >= 3.4)
>>> # mean=lambda l: sum(l) / float(len(l)) #(for python < 3.4) (*1)
>>> # set the key used for both sorting and grouping, then sort
>>> k=lambda x: x[0]
>>> data = sorted(l, key=k)
>>> #here it is, pythonic way:
>>> [ (k, mean([m[1] for m in g ])) for k, g in groupby(data, k) ]
Results:
[(6631, 0.03942129), (7751, 0.5321449026981354)]
EDITED (*1): Thanks to Elmex80s for pointing me to mean.
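The answers above differ in output order: the groupby versions return keys in sorted order, because the input has to be sorted first. If first-seen order is preferred, a single pass that keeps a running sum and count per key is one possibility. This is a sketch only: mean_by_key and the sample data are made up for illustration, and the ordering relies on Python 3.7+ dicts.

```python
def mean_by_key(pairs):
    """Average the second element of all pairs that share a first element."""
    totals = {}  # key -> [running sum, count]
    for key, value in pairs:
        acc = totals.setdefault(key, [0.0, 0])
        acc[0] += value
        acc[1] += 1
    return [(key, s / n) for key, (s, n) in totals.items()]

# Hypothetical sample data with exactly representable values.
lst = [(1, 2.0), (2, 5.0), (1, 4.0)]
print(mean_by_key(lst))  # [(1, 3.0), (2, 5.0)]
```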

Python saving to a file all keys to a value on the same line

OK, so I am trying to save all the keys that have the same value on the same line.
lista = {'Cop': '911', 'Police chief': '911'}
spara = lista
fil = open("test" + ".txt", "w")
print("saving to file")
for keys, values in spara.items():
    spara_content = spara[keys] + ";" + keys
    fil.write(spara_content)
    fil.write(";")
    fil.write("\n")
fil.close()
print(lista)
The code saves like this right now
911;Cop;
911;Police chief;
But I need the output to look like this when several keys have the same value:
911;Cop;Police chief;
Sort the items of the lista dictionary by value (like 911), iterate over each group of items that share a value, and join the keys of each group, with the shared value prepended:
>>> from operator import itemgetter
>>> from itertools import groupby
>>> lista = {'Cop': '911', 'Police chief': '911'}
>>> [";".join([k]+[v[0] for v in vs]) for k,vs in groupby(sorted(lista.items(), key=itemgetter(1)), itemgetter(1))]
['911;Cop;Police chief']
You can try this:
lista = {'Cop': '911', 'Police chief': '911'}
from collections import defaultdict
d = defaultdict(list)
fil = open("test.txt","w")
for a, b in lista.items():
    d[b].append(a)
for a, b in d.items():
    fil.write(a+';'+';'.join(b)+"\n")
fil.close()
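A variant of the defaultdict approach that also writes the trailing ";" shown in the question and closes the file through a context manager might look like the sketch below. The helper name format_lines, the extra 'Operator' entry, and the temporary-directory path are assumptions for illustration; the line order relies on Python 3.7+ dicts.

```python
import os
import tempfile
from collections import defaultdict

def format_lines(mapping):
    """Build one 'value;key1;key2;' line per distinct value."""
    grouped = defaultdict(list)
    for key, value in mapping.items():
        grouped[value].append(key)
    return [value + ";" + ";".join(keys) + ";" for value, keys in grouped.items()]

# 'Operator' is a made-up extra entry to show a second output line.
lista = {'Cop': '911', 'Police chief': '911', 'Operator': '0'}
lines = format_lines(lista)
print(lines)  # ['911;Cop;Police chief;', '0;Operator;']

# Written to the temp directory here; use any path you like.
path = os.path.join(tempfile.gettempdir(), "test.txt")
with open(path, "w") as fil:  # closed automatically, even on errors
    for line in lines:
        fil.write(line + "\n")
```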

Finding index of values in a list dynamically

I have two lists as follows:
list_1
['A-1','A-1','A-1','A-2','A-2','A-3']
list_2
['iPad','iPod','iPhone','Windows','X-box','Kindle']
I would like to split the list_2 based on the index values in list_1. For instance,
list_a1
['iPad','iPod','iPhone']
list_a2
['Windows','X-box']
list_a3
['Kindle']
I know index method, but it needs the value to be matched to be passed along with. In this case, I would like to dynamically find the indexes of the values in list_1 with the same value. Is this possible? Any tips/hints would be deeply appreciated.
Thanks.
There are a few ways to do this.
I'd do it by using zip and groupby.
First:
>>> list(zip(list_1, list_2))
[('A-1', 'iPad'),
('A-1', 'iPod'),
('A-1', 'iPhone'),
('A-2', 'Windows'),
('A-2', 'X-box'),
('A-3', 'Kindle')]
Now:
>>> import itertools, operator
>>> [(key, list(group)) for key, group in
... itertools.groupby(zip(list_1, list_2), operator.itemgetter(0))]
[('A-1', [('A-1', 'iPad'), ('A-1', 'iPod'), ('A-1', 'iPhone')]),
('A-2', [('A-2', 'Windows'), ('A-2', 'X-box')]),
('A-3', [('A-3', 'Kindle')])]
So, you just want each group, ignoring the key, and you only want the second element of each element in the group. You can get the second element of each group with another comprehension, or just by unzipping:
>>> [list(zip(*group))[1] for key, group in
... itertools.groupby(zip(list_1, list_2), operator.itemgetter(0))]
[('iPad', 'iPod', 'iPhone'), ('Windows', 'X-box'), ('Kindle',)]
I would personally find this more readable as a sequence of separate iterator transformations than as one long expression. Taken to the extreme:
>>> ziplists = zip(list_1, list_2)
>>> pairs = itertools.groupby(ziplists, operator.itemgetter(0))
>>> groups = (group for key, group in pairs)
>>> values = (list(zip(*group))[1] for group in groups)
>>> [list(value) for value in values]
… but a happy medium of maybe 2 or 3 lines is usually better than either extreme.
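As one possible reading of that "happy medium", the grouping can be written in two steps. This sketch assumes, as in the question, that equal codes in list_1 are already adjacent; otherwise the zipped pairs would need sorting first.

```python
from itertools import groupby
from operator import itemgetter

list_1 = ['A-1', 'A-1', 'A-1', 'A-2', 'A-2', 'A-3']
list_2 = ['iPad', 'iPod', 'iPhone', 'Windows', 'X-box', 'Kindle']

# Step 1: pair each code with its product and group consecutive pairs by code.
grouped = groupby(zip(list_1, list_2), key=itemgetter(0))
# Step 2: keep only the product from every pair in each group.
result = [[product for _, product in group] for _, group in grouped]
print(result)  # [['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
```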
Usually I'm the one rushing to a groupby solution ;^) but here I'll go the other way and manually insert into an OrderedDict:
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
from collections import OrderedDict
d = OrderedDict()
for code, product in zip(list_1, list_2):
    d.setdefault(code, []).append(product)
produces a d looking like
>>> d
OrderedDict([('A-1', ['iPad', 'iPod', 'iPhone']),
('A-2', ['Windows', 'X-box']), ('A-3', ['Kindle'])])
with easy access:
>>> d["A-2"]
['Windows', 'X-box']
and we can get the list-of-lists in list_1 order using .values() (wrapped in list() on Python 3, where .values() returns a view):
>>> list(d.values())
[['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
If you've noticed that no one is telling you how to make a bunch of independent lists with names like list_a1 and so on-- that's because that's a bad idea. You want to keep the data together in something which you can (at a minimum) iterate over easily, and both dictionaries and list of lists qualify.
Maybe something like this?
#!/usr/local/cpython-3.3/bin/python
import pprint
import collections
def main():
    list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
    list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
    result = collections.defaultdict(list)
    for list_1_element, list_2_element in zip(list_1, list_2):
        result[list_1_element].append(list_2_element)
    pprint.pprint(result)
main()
Using itertools.zip_longest (named izip_longest on Python 2) and itertools.groupby:
>>> from itertools import groupby, zip_longest
First group the items of list_1 and find the starting index of each group:
>>> inds = [next(g)[0] for k, g in groupby(enumerate(list_1), key=lambda x:x[1])]
>>> inds
[0, 3, 5]
Now use slicing and zip_longest, as we need the slices list_2[0:3], list_2[3:5], list_2[5:]:
>>> [list_2[x:y] for x, y in zip_longest(inds, inds[1:])]
[['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
To get a dict of lists you can do something like:
>>> inds = [next(g) for k, g in groupby(enumerate(list_1), key=lambda x:x[1])]
>>> {k: list_2[ind1: ind2[0]] for (ind1, k), ind2 in
...  zip_longest(inds, inds[1:], fillvalue=[None])}
{'A-1': ['iPad', 'iPod', 'iPhone'], 'A-3': ['Kindle'], 'A-2': ['Windows', 'X-box']}
You could do this if you want simple code. It's not pretty, but it gets the job done.
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
list_1a = []
list_1b = []
list_1c = []
place = 0  # index of the current element in both lists
for i in list_1:
    if list_1[place] == 'A-1':
        list_1a.append(list_2[place])
    elif list_1[place] == 'A-2':
        list_1b.append(list_2[place])
    else:
        list_1c.append(list_2[place])
    place += 1
