I am a new Python user, and I need help combining list elements under a condition.
I have a list like this:
x = [['a', 10, 20], ['b', 10, 20], ['a', 20, 100]]
I would like to combine the sublists that start with the same letter by summing up their other elements. For example, I'd like to obtain this list for x:
x = [['a', 30, 120], ['b', 10, 20]]
How can I achieve this?
A one-liner using itertools.groupby():
In [44]: from itertools import groupby; from operator import itemgetter
In [45]: lis=[['a', 10, 20], ['b', 10, 20], ['a', 20, 100]]
In [46]: lis.sort(key=itemgetter(0)) #sort the list first
In [47]: lis
Out[47]: [['a', 10, 20], ['a', 20, 100], ['b', 10, 20]]
In [49]: [[k]+list(map(sum,zip(*[x[1:] for x in g]))) for k,g in groupby(lis,key=itemgetter(0))] #list() is needed on Python 3, where map() is lazy
Out[49]: [['a', 30, 120], ['b', 10, 20]]
A simple solution:
In [22]: from operator import itemgetter
In [23]: lis=[['a', 10, 20], ['b', 10, 20], ['a', 20, 100]]
In [24]: ans=[]
In [25]: lis.sort(key=itemgetter(0)) #sort the list according to the first elem
In [26]: lis
Out[26]: [['a', 10, 20], ['a', 20, 100], ['b', 10, 20]]
In [27]: for x in lis:
   ....:     if ans:
   ....:         if x[0]==ans[-1][0]: #if the value of the first elem of last element in ans is same as x[0]
   ....:             ans[-1][1]+=x[1]
   ....:             ans[-1][2]+=x[2]
   ....:         else:
   ....:             ans.append(x)
   ....:     else:
   ....:         ans.append(x)
   ....:
In [28]: ans
Out[28]: [['a', 30, 120], ['b', 10, 20]]
Without sorting the list, using defaultdict():
In [68]: from collections import defaultdict
In [69]: dic=defaultdict(list)
In [70]: for x in lis:
   ....:     dic[x[0]].append(x[1:])
   ....:
In [71]: dic
Out[71]: defaultdict(<class 'list'>, {'a': [[10, 20], [20, 100]], 'b': [[10, 20]]})
In [72]: [[k]+list(map(sum,zip(*i))) for k,i in dic.items()]
Out[72]: [['a', 30, 120], ['b', 10, 20]]
Another approach using dict and map:
>>> x = [['a', 10, 20], ['b', 10, 20], ['a', 20, 100]]
>>> d = {}
>>> from operator import add
>>> for k, v1, v2 in x:
...     d[k] = list(map(add, d.get(k, [0, 0]), [v1, v2]))  # list() is needed on Python 3
...
>>> d
{'a': [30, 120], 'b': [10, 20]}
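The dictionary above maps each key to its running sums, but the question asks for a list of lists. A minimal sketch of one way to convert it back, assuming Python 3.7+ (where dicts keep insertion order) and using `result` as an illustrative variable name that is not part of the original answer:

```python
from operator import add

x = [['a', 10, 20], ['b', 10, 20], ['a', 20, 100]]

d = {}
for k, v1, v2 in x:
    # list() materialises the sums; on Python 3 map() returns a lazy iterator
    d[k] = list(map(add, d.get(k, [0, 0]), [v1, v2]))

# Rebuild [key, sum1, sum2] rows from the accumulated dictionary
result = [[k] + sums for k, sums in d.items()]
print(result)  # [['a', 30, 120], ['b', 10, 20]]
```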
I'm going to use the answer code on a huge data set that includes millions of elements. I'd like to reduce the list elements this way.
In such a case you probably don't want to sort the data or build a full copy of it as you iterate over it.
The following solution does neither. It can also handle sublists of any length (as long as all lengths are the same):
from functools import reduce  # builtin on Python 2; this import is required on Python 3
def add(d, l):
    k = l[0]  # extract the key
    p = d.get(k, None)  # see if we already have a partial sum for this key
    if p:
        d[k] = [x+y for x,y in zip(p, l[1:])]  # add to the previous sum
    else:
        d[k] = l[1:]  # create a new sum
    return d
x = [['a', 10, 20], ['b', 10, 20], ['a', 20, 100]]
result = [[k] + v for k,v in reduce(add, x, {}).items()]
print(result)
Alternatively,
import collections, operator
x = [['a', 10, 20], ['b', 10, 20], ['a', 20, 100]]
d = collections.defaultdict(lambda: [0] * (len(x[0]) - 1))
for el in x:
    d[el[0]] = list(map(operator.add, d[el[0]], el[1:]))  # list() so [k] + v works on Python 3
result = [[k] + v for k,v in d.items()]
print(result)
This works exactly the same as the first version, but uses defaultdict and explicit iteration.
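For very large inputs, the same idea can be fed from any iterable (for example a generator reading rows from a file) so the full input never has to sit in memory at once. The following is only a sketch under that assumption; `sum_by_key` is a hypothetical helper name, it requires Python 3 for the starred unpacking, and it assumes all rows sharing a key have the same length:

```python
def sum_by_key(rows):
    """Accumulate per-key column sums in a single pass over an iterable of rows."""
    totals = {}
    for key, *values in rows:
        acc = totals.get(key)
        if acc is None:
            totals[key] = values      # starred unpacking already made a fresh list
        else:
            for i, v in enumerate(values):
                acc[i] += v           # update the running sums in place
    return [[k] + v for k, v in totals.items()]

# A generator stands in for a large data source here
rows = (row for row in [['a', 10, 20], ['b', 10, 20], ['a', 20, 100]])
print(sum_by_key(rows))  # [['a', 30, 120], ['b', 10, 20]]
```

Only one small list per distinct key is kept, so memory use grows with the number of keys rather than the number of rows.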
Related
I have a list of lists with multi columns:
column = [id, date,col1, col2...coln]
list_OfRows = [[1,date1, 10,20 ...23],
[1,date1, 1,10 ...33],
[2,date2, 3,7...8],
[2,date2, 21,9...23],
[2,date3, 10,56 ...20],
[2,date4, 10,20 ...42]]
I want to group by id and date and sum the remaining columns, without using pandas.
RESULT = [[1,date1, 11,30 ...56],
[2,date2, 24,16...31],
[2,date3, 10,56 ...20],
[2,date4, 10,20 ...42]]
You can do it like this:
from itertools import groupby
list_OfRows.sort(key=lambda x: x[:2])
res = []
for k, g in groupby(list_OfRows, key=lambda x: x[:2]):
    res.append(k + list(map(sum, zip(*[c[2:] for c in g]))))
which produces:
[[1, 'date1', 11, 30, 56],
[2, 'date2', 24, 16, 31],
[2, 'date3', 10, 56, 20],
[2, 'date4', 10, 20, 42]]
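If sorting the rows first is not desirable, a dictionary keyed on the (id, date) pair gives the same grouping in one pass. This is a different technique from the groupby answer above, shown only as a sketch; `group_sum` and `n_keys` are hypothetical names, and the three value columns in the sample rows are an assumption made to fill in the elided columns for illustration:

```python
def group_sum(rows, n_keys=2):
    """Sum the trailing columns of rows that share their first n_keys columns."""
    totals = {}
    for row in rows:
        key = tuple(row[:n_keys])      # tuples are hashable, lists are not
        values = row[n_keys:]
        if key in totals:
            totals[key] = [a + b for a, b in zip(totals[key], values)]
        else:
            totals[key] = list(values)
    return [list(k) + v for k, v in totals.items()]

rows = [[1, 'date1', 10, 20, 23],
        [1, 'date1', 1, 10, 33],
        [2, 'date2', 3, 7, 8],
        [2, 'date2', 21, 9, 23],
        [2, 'date3', 10, 56, 20],
        [2, 'date4', 10, 20, 42]]

for row in group_sum(rows):
    print(row)
# [1, 'date1', 11, 30, 56]
# [2, 'date2', 24, 16, 31]
# [2, 'date3', 10, 56, 20]
# [2, 'date4', 10, 20, 42]
```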
I have two lists like below,
l1=['a', 'b', 'c', 'c', 'a','a','d','b']
l2=[2, 4, 6, 8, 10, 12, 14, 16]
Now I want to create a dictionary from the lists above, where the keys are the unique values of l1 and the values from l2 are summed per key,
so the final dictionary would look like:
d={'a':24, 'b':20, 'c': 14, 'd':14}
I could do this using a for loop, but the execution time would be longer; I am looking for Python shortcuts to do this most efficiently.
You can use collections.defaultdict for this, with zip to iterate over both lists in parallel:
from collections import defaultdict
l1 = ['a', 'b', 'c', 'c', 'a','a','d','b']
l2 = [2, 4, 6, 8, 10, 12, 14, 16]
d = defaultdict(int)
for k, v in zip(l1, l2):
    d[k] += v
print(d)
# {'a': 24, 'b': 20, 'c': 14, 'd': 14}
With a dict comprehension:
from more_itertools import unique_everseen
d = {i: sum([l2[x] for x in [y for y,val in enumerate(l1) if val==i]]) for i in list(unique_everseen(l1))}
Output:
{'a':24, 'b':20, 'c': 14, 'd':14}
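If installing more_itertools is not an option, a similar comprehension can be sketched with the standard library only. This variant assumes Python 3.7+, where `dict.fromkeys(l1)` keeps the keys in first-seen order; it is an illustration, not part of the original answer:

```python
l1 = ['a', 'b', 'c', 'c', 'a', 'a', 'd', 'b']
l2 = [2, 4, 6, 8, 10, 12, 14, 16]

# dict.fromkeys(l1) yields each distinct key once, in first-seen order
d = {k: sum(v for key, v in zip(l1, l2) if key == k) for k in dict.fromkeys(l1)}
print(d)  # {'a': 24, 'b': 20, 'c': 14, 'd': 14}
```

Note that this rescans both lists once per distinct key, so for long lists with many keys the single-pass loop versions are likely to be faster.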
l1 = ['a', 'b', 'c', 'c', 'a','a','d','b']
l2 = [2, 4, 6, 8, 10, 12, 14, 16]
idx = 0
d = {}
for v in l1:
    d[v] = d.get(v, 0) + l2[idx]
    idx += 1
print(d)
# {'a': 24, 'b': 20, 'c': 14, 'd': 14}
You can use the zip() function to iterate over the two lists together. Each key j comes from l1 and each value i comes from l2. If the key is already in the dictionary, i is added to its existing value; otherwise a new entry is created with i as its value.
l1=['a', 'b', 'c', 'c', 'a','a','d','b']
l2=[2, 4, 6, 8, 10, 12, 14, 16]
output = {}
for j, i in zip(l1, l2):
    if j in output:  # membership test on the dict itself; .keys() is not needed
        output[j] = output[j] + i
    else:
        output[j] = i
print(output)
I am using Counter from collections to count the occurrence of some numbers. I am trying to put the numbers into one list and the count into another list.
Counter(array) returns data that looks like Counter({30: 2, 26: 2, 18: 2}). I would like to have two arrays, say A and B, where A would be [30, 26, 18] and B would be [2, 2, 2].
How would I go about doing this?
You could just zip the items from the dict that Counter returns, like this:
>>> vals
[26, 26, 18, 18, 30, 30]
>>> import collections
>>> collections.Counter(vals)
Counter({26: 2, 18: 2, 30: 2})
>>> list(zip(*collections.Counter(vals).items()))  # list() is needed on Python 3, where zip() is lazy
[(26, 18, 30), (2, 2, 2)]
>>> a, b = zip(*collections.Counter(vals).items())
>>> a
(26, 18, 30)
>>> b
(2, 2, 2)
Counter is a subclass of dict, so you can use the normal dictionary methods:
from collections import Counter
array = [1, 2, 3, 3, 4, 4, 4]
counter = Counter(array)
items = list(counter.keys())
counts = list(counter.values())
isinstance(counter, dict) # True
Use .items() to collect the keys and values:
from collections import Counter
d=['a', 'b', 'b', 'c']
l = Counter(d)
A=[k for k, v in l.items()]
print(A)
Result: ['a', 'b', 'c']
Counter is a specialized dict. It has keys() and values() as well as items():
from collections import Counter
c = Counter({30: 2, 26: 2, 18: 77})
a = list(c.keys()) # make a list from the keys view
b = list(c.values()) # make a list from the values
# or decompose the list of (key, value) tuples:
A, B = map(list, zip(*c.items()))
print(a,b,A,B,sep="\n")
Output:
[30, 26, 18] # a
[2, 2, 77] # b
[30, 26, 18] # A
[2, 2, 77] # B
Documentation: zip(), map()
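If the two lists should come out ordered by count rather than by insertion order, `Counter.most_common()` can be used instead of `items()`. A small sketch, reusing the counter from the answer above (the variable name `pairs` is only illustrative):

```python
from collections import Counter

c = Counter({30: 2, 26: 2, 18: 77})

# most_common() returns (key, count) pairs sorted by count, highest first
pairs = c.most_common()
A = [key for key, _ in pairs]
B = [count for _, count in pairs]
print(A)
print(B)  # [77, 2, 2]
```

Here the key 18 comes first because it has the highest count.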
I have two lists, each containing several dictionaries, and each dictionary has lists as values. These are my lists:
list1 = [{'a':[12,22,61],'b':[21,12,50]},{'c':[10,11,47],'d':[13,20,45],'e':[11,24,42]},{'a':[12,22,61],'b':[21,12,50]}]
list2 = [{'f':[21,23,51],'g':[11,12,44]},{'h':[22,26,68],'i':[12,9,65],'j':[10,12,50]},{'f':[21,23,51],'g':[11,12,44]}]
In my case, I need to merge these lists according to these rules:
1. A dictionary from the first list (list1) can only be merged with the dictionary from the second list (list2) at the same index.
2. After the two lists are merged, each dictionary has to be sorted by the third number of its values, in descending order.
This is the expected result based on the two rules above:
result = [
{'a':[12,22,61],'f':[21,23,51],'b':[21,12,50],'g':[11,12,44]},
{'h':[22,26,68],'i':[12,9,65],'j':[10,12,50],'c':[10,11,47],'d':[13,20,45],'e':[11,24,42]},
{'a':[12,22,61],'f':[21,23,51],'b':[21,12,50],'g':[11,12,44]}
]
How can I do that? Is it possible to do this in Python with inline looping?
Try:
[dict(a, **b) for a,b in zip(list1, list2)]
In one line (if you do not count the import):
from collections import OrderedDict
[OrderedDict(sorted(dict(d1.items() + d2.items()).items(), key=lambda x: x[1][-1],
reverse=True)) for d1, d2 in zip(list1, list2)]
[OrderedDict([('a', [12, 22, 61]),
('f', [21, 23, 51]),
('b', [21, 12, 50]),
('g', [11, 12, 44])]),
OrderedDict([('h', [22, 26, 68]),
('i', [12, 9, 65]),
('j', [10, 12, 50]),
('c', [10, 11, 47]),
('d', [13, 20, 45]),
('e', [11, 24, 42])]),
OrderedDict([('a', [12, 22, 61]),
('f', [21, 23, 51]),
('b', [21, 12, 50]),
('g', [11, 12, 44])])]
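This works in Python 2.7. In Python 3, dict views cannot be concatenated with +, so each items() call would need to be wrapped in list().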
Dictionaries are not sorted by nature, so if you don't need them sorted you can merge them in a simple one-liner.
result = [ {**d1, **d2} for d1, d2 in zip(list1, list2) ] # python 3.5+
If you are using an earlier version, define a merge function:
def merge(d1, d2):
    result = d1.copy()
    result.update(d2)
    return result
And then have
result = [ merge(d1, d2) for d1, d2 in zip(list1, list2) ]
If you do need them sorted, you can use an OrderedDict (plain dicts only guarantee insertion order from Python 3.7 onward):
from collections import OrderedDict
def merge(d1, d2):
    tempD = d1.copy()
    tempD.update(d2)
    return OrderedDict(sorted(tempD.items(), key = lambda t: t[1][2], reverse = True))
result = [ merge(d1, d2) for d1, d2 in zip(list1, list2) ]
Or, even shorter for Python 3.5+:
result = [ OrderedDict(sorted(({**d1, **d2}).items(), key = lambda t: t[1][2], reverse = True)) for d1, d2 in zip(list1, list2) ]
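On Python 3.7 and later, plain dicts preserve insertion order as a language guarantee, so the same merge-and-sort can be sketched without OrderedDict. This is a variation on the answer above rather than a replacement for it; the sort key (third number of each value, descending) is taken from the question's expected result:

```python
list1 = [{'a': [12, 22, 61], 'b': [21, 12, 50]},
         {'c': [10, 11, 47], 'd': [13, 20, 45], 'e': [11, 24, 42]},
         {'a': [12, 22, 61], 'b': [21, 12, 50]}]
list2 = [{'f': [21, 23, 51], 'g': [11, 12, 44]},
         {'h': [22, 26, 68], 'i': [12, 9, 65], 'j': [10, 12, 50]},
         {'f': [21, 23, 51], 'g': [11, 12, 44]}]

result = [
    # merge the pair, then rebuild the dict in descending order of value[2]
    dict(sorted({**d1, **d2}.items(), key=lambda kv: kv[1][2], reverse=True))
    for d1, d2 in zip(list1, list2)
]

for d in result:
    print(list(d))
# ['a', 'f', 'b', 'g']
# ['h', 'i', 'j', 'c', 'd', 'e']
# ['a', 'f', 'b', 'g']
```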
You can do it like this:
r = list(map(lambda x, y: dict(list(x.items()) + list(y.items())), list1, list2))  # list() calls keep this working on Python 3
Result (key order may vary with the Python version):
[{'a': [12, 22, 61], 'b': [21, 12, 50], 'g': [11, 12, 44], 'f': [21, 23, 51]},
{'c': [10, 11, 47], 'e': [11, 24, 42], 'd': [13, 20, 45], 'i': [12, 9, 65], 'h': [22, 26, 68], 'j': [10, 12, 50]},
{'a': [12, 22, 61], 'b': [21, 12, 50], 'g': [11, 12, 44], 'f': [21, 23, 51]}]
I have a list of lists. If there are sublists that have the first three elements in common, I want to merge them into one list and add up all the fourth elements.
The problem is best explained in code and the required output.
a_list = [['apple', 50, 60, 7],
['orange', 70, 50, 8],
['apple', 50, 60, 12]]
# output:
# [['apple', 50, 60, 19], ['orange', 70, 50, 8]]
I already have code for a similar problem (given to me by another user on Stack Overflow some time ago), but I don't understand it completely, so I'm unable to modify it accordingly. This code checks whether the 0th and 2nd elements are the same; if they are, it merges the sublists, adding up the 1st and 3rd elements:
from collections import defaultdict
data = [['42x120x1800', 50, '50x90x800', 60],
['42x120x1800', 8, '50x90x800', 10],
['2x10x800', 5, '5x9x80', 6]]
d = defaultdict(lambda :[0, 0])
for sub_list in data:
    key = (sub_list[0], sub_list[2])
    d[key][0] += sub_list[1]
    d[key][1] += sub_list[3]
new_data = [[key[0], val[0], key[1], val[1]] for key, val in d.iteritems()]
# [['2x10x800', 5, '5x9x80', 6], ['42x120x1800', 58, '50x90x800', 70]]
How should the code be modified to fit my new problem? I'd really appreciate it if you could also take the time to explain the code thoroughly.
You can use the same principle, by using the first three elements as a key, and using int as the default value factory for the defaultdict (so you get 0 as the initial value):
from collections import defaultdict
a_list = [['apple', 50, 60, 7],
['orange', 70, 50, 8],
['apple', 50, 60, 12]]
d = defaultdict(int)
for sub_list in a_list:
    key = tuple(sub_list[:3])
    d[key] += sub_list[-1]
new_data = [list(k) + [v] for k, v in d.iteritems()]
If you are using Python 3, you can simplify this to:
d = defaultdict(int)
for *key, v in a_list:
    d[tuple(key)] += v
new_data = [list(k) + [v] for k, v in d.items()]
because you can use a starred target to take all 'remaining' values from a list, so all but the last value of each sublist are assigned to key and the last value is assigned to v, making the loop just that little bit simpler (and there is no .iteritems() method on a dict in Python 3, because .items() already returns a lightweight view rather than a list).
So, we use a defaultdict that uses 0 as the default value, then for each key generated from the first 3 values (as a tuple so you can use it as a dictionary key) sum the last value.
So for the first item ['apple', 50, 60, 7] we create a key ('apple', 50, 60), look that up in d (where it doesn't exist, but defaultdict will then use int() to create a new value of 0), and add the 7 from that first item.
Do the same for the ('orange', 70, 50) key and value 8.
For the 3rd item we get the ('apple', 50, 60) key again and add 12 to the pre-existing 7 in d[('apple', 50, 60)], for a total of 19.
Then we turn the (key, value) pairs back into lists and you are done. This results in:
>>> new_data
[['apple', 50, 60, 19], ['orange', 70, 50, 8]]
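To see the walkthrough above in action, the loop can be instrumented with a print per step. This is only an illustrative trace (Python 3, with a hypothetical `before` variable added for display), not a change to the solution itself:

```python
from collections import defaultdict

a_list = [['apple', 50, 60, 7],
          ['orange', 70, 50, 8],
          ['apple', 50, 60, 12]]

d = defaultdict(int)
for *key, v in a_list:
    key = tuple(key)         # lists are not hashable, so convert to a tuple
    before = d[key]          # a missing key is created with int(), i.e. 0
    d[key] = before + v
    print(key, before, '->', d[key])
# ('apple', 50, 60) 0 -> 7
# ('orange', 70, 50) 0 -> 8
# ('apple', 50, 60) 7 -> 19

new_data = [list(k) + [v] for k, v in d.items()]
print(new_data)  # [['apple', 50, 60, 19], ['orange', 70, 50, 8]]
```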
An alternative implementation that requires sorting the data uses itertools.groupby:
from itertools import groupby
from operator import itemgetter
a_list = [['apple', 50, 60, 7],
['orange', 70, 50, 8],
['apple', 50, 60, 12]]
newlist = [list(key) + [sum(i[-1] for i in sublists)]
for key, sublists in groupby(sorted(a_list), key=itemgetter(0, 1, 2))]
This produces the same output. It is going to be slower if your data isn't already sorted, but it's good to know of different approaches.
I'd do something like this:
>>> a_list = [['apple', 50, 60, 7],
... ['orange', 70, 50, 8],
... ['apple', 50, 60, 12]]
>>>
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> from operator import itemgetter
>>> getter = itemgetter(0,1,2)
>>> for lst in a_list:
... d[getter(lst)].extend(lst[3:])
...
>>> d
defaultdict(<class 'list'>, {('apple', 50, 60): [7, 12], ('orange', 70, 50): [8]})
>>> print([list(k)+v for k,v in d.items()])
[['apple', 50, 60, 7, 12], ['orange', 70, 50, 8]]
This doesn't give the sum, however. It can easily be fixed by doing:
print([list(k)+[sum(v)] for k,v in d.items()])
There isn't much reason to prefer this over the slightly more elegant solution by Martijn, other than that it allows the input list to have more than 4 items per sublist (with the trailing elements being summed as expected). In other words, this would handle the list:
a_list = [['apple', 50, 60, 7, 12],
['orange', 70, 50, 8]]
as well.
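As a quick check of that claim, here is the summing variant run on the longer input as a standalone Python 3 sketch (the variable name `result` is only illustrative):

```python
from collections import defaultdict
from operator import itemgetter

a_list = [['apple', 50, 60, 7, 12],
          ['orange', 70, 50, 8]]

getter = itemgetter(0, 1, 2)   # builds the (name, n1, n2) key tuple
d = defaultdict(list)
for lst in a_list:
    d[getter(lst)].extend(lst[3:])   # collect every trailing value under the key

result = [list(k) + [sum(v)] for k, v in d.items()]
print(result)  # [['apple', 50, 60, 19], ['orange', 70, 50, 8]]
```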
Form the key from [:3] so that you get the first 3 elements.