How to reduce on a list of tuples in python

I have an array and I want to count the occurrences of each item in the array.
I have managed to use a map function to produce a list of tuples:
def mapper(a):
    return (a, 1)
r = list(map(mapper, arr))
# output example:
# (11817685, 1), (2014036792, 1), (2014047115, 1), (11817685, 1)
I expect the reduce function to help me group the counts by the first number (id) in each tuple. For example:
(11817685, 2), (2014036792, 1), (2014047115, 1)
I tried
cnt = reduce(lambda a, b: a + b, r)
and some other ways, but none of them do the trick.
NOTE
Thanks for all the advice on other ways to solve the problem, but I'm just learning Python and how to implement map-reduce here. I have simplified my real business problem a lot to make it easy to understand, so please show me a correct way of doing map-reduce.

You could use Counter:
from collections import Counter
arr = [11817685, 2014036792, 2014047115, 11817685]
counter = Counter(arr)
print(list(zip(counter.keys(), counter.values())))
EDIT:
As pointed out by @ShadowRanger, Counter has an items() method:
from collections import Counter
arr = [11817685, 2014036792, 2014047115, 11817685]
print(list(Counter(arr).items()))

Instead of importing a module, you can use some simple logic and do it without any module:
track = {}
for intr in arr:
    if intr not in track:
        track[intr] = 1
    else:
        track[intr] += 1
For these types of list problems there is a pattern. Suppose you have a list:
a = [(2006,1),(2007,4),(2008,9),(2006,5)]
and you want to convert it to a dict with the first element of each tuple as the key and the second element as the value, something like:
{2008: [9], 2006: [5], 2007: [4]}
But there is a catch: some tuples have the same key but different values, like (2006,1) and (2006,5). You want those values collected under the one key, so the expected output is:
{2008: [9], 2006: [1, 5], 2007: [4]}
For this type of problem, first create a new dict, then follow this pattern:
if item[0] not in new_dict:
    new_dict[item[0]] = [item[1]]
else:
    new_dict[item[0]].append(item[1])
That is, we first check whether the key is already in the new dict, and if it is, we append the duplicate key's value to its list.
Full code:
a = [(2006,1),(2007,4),(2008,9),(2006,5)]
new_dict = {}
for item in a:
    if item[0] not in new_dict:
        new_dict[item[0]] = [item[1]]
    else:
        new_dict[item[0]].append(item[1])
print(new_dict)
Output (Python 3.7+, where dicts keep insertion order):
{2006: [1, 5], 2007: [4], 2008: [9]}
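The check-then-insert pattern above is common enough that the standard library covers it. As a sketch on the same data, dict.setdefault and collections.defaultdict(list) both replace the if/else; the printed order assumes Python 3.7+ insertion-ordered dicts.

```python
from collections import defaultdict

a = [(2006, 1), (2007, 4), (2008, 9), (2006, 5)]

# dict.setdefault inserts [] only when the key is missing, then returns the list
grouped = {}
for key, value in a:
    grouped.setdefault(key, []).append(value)

# defaultdict(list) performs the same "missing key -> new empty list" step implicitly
grouped_dd = defaultdict(list)
for key, value in a:
    grouped_dd[key].append(value)

print(grouped)           # {2006: [1, 5], 2007: [4], 2008: [9]}
print(dict(grouped_dd))  # {2006: [1, 5], 2007: [4], 2008: [9]}
```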

After writing my answer to a different question, I remembered this post and thought it would be helpful to write a similar answer here.
Here is a way to use reduce on your list to get the desired output (in Python 3, reduce has to be imported from functools):
from functools import reduce
arr = [11817685, 2014036792, 2014047115, 11817685]
def mapper(a):
    return (a, 1)
def reducer(x, y):
    if isinstance(x, dict):
        ykey, yval = y
        if ykey not in x:
            x[ykey] = yval
        else:
            x[ykey] += yval
        return x
    else:
        xkey, xval = x
        ykey, yval = y
        a = {xkey: xval}
        if ykey in a:
            a[ykey] += yval
        else:
            a[ykey] = yval
        return a
mapred = reduce(reducer, map(mapper, arr))
print(list(mapred.items()))
Which prints (Python 3.7+, where dicts keep insertion order):
[(11817685, 2), (2014036792, 1), (2014047115, 1)]
Please see the linked answer for a more detailed explanation.
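The reducer above has to special-case its first call because reduce() is called without an initial value, so the first x is a (key, 1) tuple rather than a dict (and a one-element arr would come back as a bare tuple, with no dict at all). Passing an empty dict as reduce()'s third argument, the initializer, removes that branch. A minimal sketch of the same map-reduce under that change, assuming Python 3:

```python
from functools import reduce

arr = [11817685, 2014036792, 2014047115, 11817685]

def mapper(a):
    # map step: emit a (key, 1) pair for each item
    return (a, 1)

def reducer(acc, pair):
    # reduce step: fold one (key, count) pair into the accumulator dict
    key, count = pair
    acc[key] = acc.get(key, 0) + count
    return acc

# The third argument is the initial accumulator, so reducer always
# receives a dict as its first argument, even for a 0- or 1-item list.
counts = reduce(reducer, map(mapper, arr), {})
print(list(counts.items()))  # [(11817685, 2), (2014036792, 1), (2014047115, 1)]
```

Mutating and returning the accumulator is safe here because the `{}` literal is evaluated afresh each time the reduce() call runs.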

If all you need is cnt, then a dict would probably be better than a list of tuples here (if you need that format, just use dict.items()).
The collections module has a useful data structure for this, defaultdict:
from collections import defaultdict
cnt = defaultdict(int)  # a dict whose default value for a missing key is int(), i.e. 0
for key in arr:
    cnt[key] += 1  # a missing key is first set to the default 0, then incremented
# cnt_list = list(cnt.items())

Related

Handle dictionary collision in python3

I currently have the code below, which works fine.
Can someone help me handle the collision created when two keys have the same value in the dictionary?
I tried multiple approaches (not listed here) to create a list to handle it, but they were all unsuccessful.
I am using Python 3.7.
def find_key(dic1, n):
    '''
    Return the key for the value n
    (e.g. 3) from the dict below.
    '''
    d = {}
    for x, y in dic1.items():
        # swap keys and values
        # and update the result to 'd'
        d[y] = x
    try:
        if n in d:
            return d[n]
    except Exception as e:
        return (e)
dic1 = {'james':2,'david':3}
# Test case that produces a 'collision':
# comment out 'dic1' above and replace it with
# the dic1 below to create a 'collision'
# dic1 = {'james':2,'david':3, 'sandra':3}
n = 3
print(find_key(dic1,n))
Any help would be much appreciated.

You know there should be multiple returns, so plan for that in advance:
def find_keys_for_value(d, value):
    for k, v in d.items():
        if v == value:
            yield k
data = {'james': 2, 'david': 3, 'sandra': 3}
for result in find_keys_for_value(data, 3):
    print(result)

You can use a defaultdict:
from collections import defaultdict
def find_key(dct, n):
    dd = defaultdict(list)
    for x, y in dct.items():
        dd[y].append(x)
    return dd[n]
dic1 = {'james':2, 'david':3, 'sandra':3}
print(find_key(dic1, 3))
print(find_key(dic1, 2))
print(find_key(dic1, 1))
Output:
['david', 'sandra']
['james']
[]
Building a defaultdict from all keys and values is only justified if you will repeatedly search for keys of the same dict given different values, though. Otherwise, Kenny Ostrom's approach is preferable. In any case, the function above makes little sense as it stands, because it rebuilds the whole defaultdict on every call.
If you are not at ease with generators and yield, here is Kenny Ostrom's approach translated to lists (less memory-efficient than a generator, but better than the above for one-shot searches):
def find_key(dct, n):
    return [x for x, y in dct.items() if y == n]
The output is the same as above.
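To illustrate the point about repeated searches, here is a minimal sketch that builds the value-to-keys index once and reuses it for every lookup; invert and by_value are hypothetical names. Lookups use .get() so that a missing value does not insert an empty list into the defaultdict, which plain indexing would do.

```python
from collections import defaultdict

def invert(dct):
    """Map each value to the list of keys that have it (one O(len(dct)) pass)."""
    inverse = defaultdict(list)
    for key, value in dct.items():
        inverse[value].append(key)
    return inverse

dic1 = {'james': 2, 'david': 3, 'sandra': 3}
by_value = invert(dic1)       # pay the linear cost once...
print(by_value.get(3, []))    # ...then each lookup is a dict access: ['david', 'sandra']
print(by_value.get(2, []))    # ['james']
print(by_value.get(1, []))    # []
```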

Python: Removing list duplicates based on first 2 inner list values

Question:
I have a list in the following format:
x = [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]
The algorithm:
Combine all inner lists with the same first 2 values; the third value doesn't have to be the same to combine them.
e.g. ["hello",0,5] is combined with ["hello",0,8]
but not combined with ["hello",1,1]
The 3rd value becomes the average of the third values: sum(all 3rd vals) / len(all 3rd vals)
Note: by all 3rd vals I am referring to the 3rd value of each inner list of duplicates
e.g. ["hello",0,5] and ["hello",0,8] become ["hello",0,6.5]
Desired output: (Order of list doesn't matter)
x = [["hello",0,6.5], ["hi",0,6], ["hello",1,1]]
Question:
How can I implement this algorithm in Python?
Ideally it would be efficient as this will be used on very large lists.
If anything is unclear let me know and I will explain.
Edit: I have tried converting the list to a set to remove duplicates; however, this doesn't account for the third value in the inner lists and therefore doesn't work.
Solution Performance:
Thanks to everyone who has provided a solution to this problem! Here are the results based on a speed test of all the functions:

Update using running sum and count
I figured out how to improve my previous code (see original below). You can keep running totals and counts, then compute the averages at the end, which avoids recording all the individual numbers.
from collections import defaultdict
class RunningAverage:
    def __init__(self):
        self.total = 0
        self.count = 0
    def add(self, value):
        self.total += value
        self.count += 1
    def calculate(self):
        return self.total / self.count
def func(lst):
    thirds = defaultdict(RunningAverage)
    for sub in lst:
        k = tuple(sub[:2])
        thirds[k].add(sub[2])
    lst_out = [[*k, v.calculate()] for k, v in thirds.items()]
    return lst_out
print(func(x))  # -> [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
Original answer
This probably won't be very efficient since it has to accumulate all the values to average them. I think you could get around that by having a running average with a weighting factored in, but I'm not quite sure how to do that.
from collections import defaultdict
def avg(nums):
    return sum(nums) / len(nums)
def func(lst):
    thirds = defaultdict(list)
    for sub in lst:
        k = tuple(sub[:2])
        thirds[k].append(sub[2])
    lst_out = [[*k, avg(v)] for k, v in thirds.items()]
    return lst_out
print(func(x))  # -> [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
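The original answer wonders how a running average "with a weighting factored in" would work. One standard option is the incremental mean update, mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n, which gives each new value a weight of 1/n and keeps only the current mean and count per key. A minimal sketch, assuming every inner list has exactly three elements; func_incremental is a hypothetical name. It stores the same amount of state as the RunningAverage version, so this is an alternative formulation rather than a speed-up.

```python
from collections import defaultdict

x = [["hello", 0, 5], ["hi", 0, 6], ["hello", 0, 8], ["hello", 1, 1]]

def func_incremental(lst):
    # key -> [running mean, count]; only two numbers are stored per key
    stats = defaultdict(lambda: [0.0, 0])
    for name, num, value in lst:
        s = stats[(name, num)]
        s[1] += 1
        # mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n
        s[0] += (value - s[0]) / s[1]
    return [[*k, mean] for k, (mean, _) in stats.items()]

print(func_incremental(x))  # [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
```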

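You can sort the list on the first two values and then use itertools.groupby:
from itertools import groupby
from operator import itemgetter
m = [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]
first_two = itemgetter(0, 1)  # group on the (string, int) pair, not a concatenated string
m.sort(key=first_two)
for i, j in groupby(m, first_two):
    ss = 0
    c = 0.0
    for k in j:
        ss += k[2]
        c += 1.0
    print([k[0], k[1], ss/c])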

This should be O(N), someone correct me if I'm wrong:
def my_algorithm(input_list):
    """
    :param input_list: list of lists in format [string, int, int]
    :return: list
    """
    # Dict in format (string, int): [int, count_int]
    # Our list is in this format, for example:
    # [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]
    # so for our dict we make the keys a tuple of the first 2 values of each sublist (since that needs to be unique),
    # while the values are a list of the running total of third elements + a counter (which counts every time we see
    # a duplicate key, so we can divide by it and get the average).
    my_dict = {}
    for element in input_list:
        # key is a tuple of the first 2 values of each sublist
        key = (element[0], element[1])
        if key not in my_dict:
            # If the key does not exist, add it.
            # The value is the third element of our sublist + a counter. Since this is the first value, set the counter to 1
            my_dict[key] = [element[2], 1]
        else:
            # If the key does exist, add to the running total and increment the counter by 1
            my_dict[key][0] += element[2]
            my_dict[key][1] += 1
    # we have a dict, so we need to convert it to a list (calculating the averages along the way)
    return _convert_my_dict_to_list(my_dict)
def _convert_my_dict_to_list(my_dict):
    """
    :param my_dict: dict, keys are tuples (string, int) and values are lists [int, int_counter]
    :return: list
    """
    my_list = []
    for key, value in my_dict.items():
        sublist = [key[0], key[1], value[0]/value[1]]
        my_list.append(sublist)
    return my_list
my_algorithm(x)
This will return:
[['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
While your expected return is:
[["hello", 0, 6.5], ["hi", 0, 6], ["hello", 1, 1]]
If you really need ints then you can modify _convert_my_dict_to_list function.

Here's my variation on this theme: a groupby sans the expensive sort. I also changed the problem to make the input and output a list of tuples as these are fixed-size records:
from itertools import groupby
from operator import itemgetter
from collections import defaultdict
data = [("hello", 0, 5), ("hi", 0, 6), ("hello", 0, 8), ("hello", 1, 1)]
dictionary = defaultdict(complex)
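for key, group in groupby(data, itemgetter(slice(2))):
    # real part accumulates the total, imaginary part counts the records;
    # each record adds 1j, so a run of consecutive equal keys is counted correctly
    dictionary[key] += sum(value + 1j for (string, number, value) in group)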
array = [(*key, value.real / value.imag) for key, value in dictionary.items()]
print(array)
OUTPUT
> python3 test.py
[('hello', 0, 6.5), ('hi', 0, 6.0), ('hello', 1, 1.0)]
>
Thanks to @wjandrea for the itemgetter replacement for the lambda. (And yes, I am using complex numbers in passing to track the total and the count for the average.)

Keeping the order of an OrderedDict

I have an OrderedDict that I'm passing to a function. Somewhere in the function the ordering changes, though I'm not sure why and am trying to debug it. Here is the function, an example call, and the output:
def unnest_data(data):
    path_prefix = ''
    UNNESTED = OrderedDict()
    list_of_subdata = [(data, ''),]  # (data, prefix)
    while list_of_subdata:
        for subdata, path_prefix in list_of_subdata:
            for key, value in subdata.items():
                path = (path_prefix + '.' + key).lstrip('.').replace('.[', '[')
                if not (isinstance(value, (list, dict))):
                    UNNESTED[path] = value
                elif isinstance(value, dict):
                    list_of_subdata.append((value, path))
                elif isinstance(value, list):
                    list_of_subdata.extend([(_, path) for _ in value])
            list_of_subdata.remove((subdata, path_prefix))
        if not list_of_subdata: break
    return UNNESTED
Then, if I call it:
from collections import OrderedDict
data = OrderedDict([('Item', OrderedDict([('[#ID]', '288917'), ('Main', OrderedDict([('Platform', 'iTunes'), ('PlatformID', '353736518')])), ('Genres', OrderedDict([('Genre', [OrderedDict([('[#FacebookID]', '6003161475030'), ('Value', 'Comedy')]), OrderedDict([('[#FacebookID]', '6003172932634'), ('Value', 'TV-Show')])])]))]))])
unnest_data(data)
I get an OrderedDict that doesn't match the ordering of my original one:
OrderedDict([('Item[#ID]', '288917'), ('Item.Genres.Genre[#FacebookID]', ['6003172932634', '6003161475030']), ('Item.Genres.Genre.Value', ['TV-Show', 'Comedy']), ('Item.Main.Platform', 'iTunes'), ('Item.Main.PlatformID', '353736518')])
Notice how it has "Genre" before "PlatformID", which is not the order of the original dict. What is my error here, and how would I fix it?

It’s hard to say exactly what’s wrong without a complete working example. But based on the code you’ve shown, I suspect your problem isn’t with OrderedDict at all, but rather that you’re modifying list_of_subdata while iterating through it, which will result in items being unexpectedly skipped.
>>> a = [1, 2, 3, 4, 5, 6, 7]
>>> for x in a:
...     print(x)
...     a.remove(x)
...
1
3
5
7
Given your use, consider a deque instead of a list.
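A minimal sketch of the deque suggestion, assuming the goal is simply to visit every nested dict and list exactly once: instead of removing entries from list_of_subdata while iterating over it, pop each pending item from the front of a collections.deque. This traverses breadth-first, so leaves at different nesting depths come out level by level; for the sample data this matches the original order. As in the question's code, a repeated path overwrites the earlier value.

```python
from collections import OrderedDict, deque

def unnest_data(data):
    """Flatten nested dicts/lists into dotted paths, handling each pending
    sub-structure exactly once in first-in, first-out order."""
    unnested = OrderedDict()
    pending = deque([(data, '')])  # (sub-structure, path prefix)
    while pending:
        # take from the front; the container is never mutated while being iterated
        subdata, prefix = pending.popleft()
        for key, value in subdata.items():
            path = (prefix + '.' + key).lstrip('.').replace('.[', '[')
            if isinstance(value, dict):
                pending.append((value, path))
            elif isinstance(value, list):
                pending.extend((item, path) for item in value)
            else:
                unnested[path] = value
    return unnested

data = OrderedDict([('Item', OrderedDict([('[#ID]', '288917'), ('Main', OrderedDict([('Platform', 'iTunes'), ('PlatformID', '353736518')])), ('Genres', OrderedDict([('Genre', [OrderedDict([('[#FacebookID]', '6003161475030'), ('Value', 'Comedy')]), OrderedDict([('[#FacebookID]', '6003172932634'), ('Value', 'TV-Show')])])]))]))])
print(list(unnest_data(data)))
# ['Item[#ID]', 'Item.Main.Platform', 'Item.Main.PlatformID', 'Item.Genres.Genre[#FacebookID]', 'Item.Genres.Genre.Value']
```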

I want to write a function that takes a list and returns a count of total number of duplicate elements in the list

I have tried the code below. For some reason, when it prints h it prints None, so I thought that if I counted the number of Nones printed and divided by 2, it would give the number of duplicates, but I can't use the count function here.
a= [1,4,"hii",2,4,"hello","hii"]
def duplicate(L):
    li=[]
    lii=[]
    h=""
    for i in L:
        y= L.count(i)
        if y>1:
            h=y
            print h
    print h.count(None)
duplicate(a)

Use the Counter container:
from collections import Counter
c = Counter(['a', 'b', 'a'])
c is now a dict subclass holding the counts: Counter({'a': 2, 'b': 1})
If you want a list of all duplicated elements (without repetition), you can do the following:
duplicates = [k for k in c if c[k] > 1]
If you only want to count the duplicates, you can then just use
duplicates_len = len(duplicates)

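You can use a set to get the count of unique elements and then compare the sizes, something like this (note that it counts every extra occurrence, so [1, 1, 1] gives 2, and it requires hashable elements):
def duplicates(l):
    uniques = set(l)
    return len(l) - len(uniques)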
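"Number of duplicates" can mean two things: how many distinct values occur more than once (len(duplicates) in the Counter answer), or how many extra occurrences there are (len(l) - len(set(l)) in the set answer). A small sketch computing both with collections.Counter; duplicate_counts is a hypothetical name:

```python
from collections import Counter

def duplicate_counts(items):
    """Return (distinct values that repeat, extra occurrences beyond the first)."""
    counts = Counter(items)  # value -> number of occurrences
    distinct_dups = sum(1 for n in counts.values() if n > 1)
    extra = sum(n - 1 for n in counts.values())  # equals len(items) - len(set(items))
    return distinct_dups, extra

a = [1, 4, "hii", 2, 4, "hello", "hii"]
print(duplicate_counts(a))          # (2, 2)
print(duplicate_counts([1, 1, 1]))  # (1, 2)
```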

I found an answer, which is (assuming each duplicated element appears exactly twice):
a = [1,4,"hii",2,4,"hello",7,"hii"]
def duplicate(L):
    li = []
    for i in L:
        y = L.count(i)
        if y > 1:
            li.append(i)
    print(len(li) // 2)
duplicate(a)

The answer by egualo is much better, but here is another way using a dictionary:
def find_duplicates(arr):
    duplicates = {}
    duplicate_elements = []
    for element in arr:
        if element not in duplicates:
            duplicates[element] = False
        else:
            if not duplicates[element]:
                duplicate_elements.append(element)
                duplicates[element] = True
    return duplicate_elements
It's pretty simple and makes only a single pass over the list, which is kind of nice.
>>> test = [1,2,3,1,1,2,2,4]
>>> find_duplicates(test)
[1, 2]

In Python, How can I get the next and previous key:value of a particular key in a dictionary?

Okay, so this is a little hard to explain, but here goes:
I have a dictionary, which I'm adding content to. The content is a hashed username (key) with an IP address (value).
I was putting the hashes in order by interpreting them as base-16 numbers and then using collections.OrderedDict.
So, the dictionary looked a little like this:
d = {'1234': '8.8.8.8', '2345':'0.0.0.0', '3213':'4.4.4.4', '4523':'1.1.1.1', '7654':'1.3.3.7', '9999':'127.0.0.1'}
What I needed was a mechanism that would allow me to pick one of those keys, and get the key/value item one higher and one lower. So, for example, If I were to pick 2345, the code would return the key:value combinations '1234:8.8.8.8' and '3213:4.4.4.4'
So, something like:
for i in d:
    while i < len(d):
        if i == '2345':
            print i.nextItem
            print i.previousItem
            break

Edit: The OP now states that they are using OrderedDicts, but the use case still requires this sort of approach.
Plain dicts have no notion of a "next" or "previous" key (and before Python 3.7 they were not even guaranteed to keep insertion order), so you cannot do this directly. From your example, you are trying to reference the items as you would in a linked list.
A quick solution is to extract the keys, sort them, and then iterate over that list:
keyList = sorted(d.keys())
for i, v in enumerate(keyList):
    if v == 'eeee':
        if i + 1 < len(keyList):  # there is a next key
            print(d[keyList[i + 1]])
        if i - 1 >= 0:  # there is a previous key
            print(d[keyList[i - 1]])
The keyList holds the order of your items, and you have to go back to it to find the next/previous key in order to get the next/previous value. Note the bounds checks: i + 1 must stay below the list length, and i - 1 must not drop below 0 (keyList[-1] would silently wrap around to the last key).
You can use an OrderedDict similarly, but I believe you still have to do the above with a separate list, as OrderedDict doesn't have next/prev methods.

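As the OrderedDict source code shows, if you have a key and want to find the next and previous keys in O(1), here's how you can do it. Note that this relies on a private attribute of the Python 2 OrderedDict implementation (hence the Python 2 print statement below); it is not a public API and does not work on Python 3, where OrderedDict is implemented in C.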
>>> from collections import OrderedDict
>>> d = OrderedDict([('aaaa', 'a',), ('bbbb', 'b'), ('cccc', 'c'), ('dddd', 'd'), ('eeee', 'e'), ('ffff', 'f')])
>>> i = 'eeee'
>>> link_prev, link_next, key = d._OrderedDict__map['eeee']
>>> print 'nextKey: ', link_next[2], 'prevKey: ', link_prev[2]
nextKey: ffff prevKey: dddd
This will give you next and prev by insertion order. If you add items in random order then just keep track of your items in sorted order.

You could also use the list.index() method.
This function is more generic (you can check positions +n and -n), it catches attempts to search for a key that's not in the dict, and it prints None if there's nothing before or after the key:
def keyshift(dictionary, key, diff):
    if key in dictionary:
        token = object()
        keys = [token]*(diff*-1) + sorted(dictionary) + [token]*diff
        newkey = keys[keys.index(key)+diff]
        if newkey is token:
            print(None)
        else:
            print({newkey: dictionary[newkey]})
    else:
        print('Key not found')
keyshift(d, 'bbbb', -1)
keyshift(d, 'eeee', +1)

Try:
d = {'aaaa': 'a', 'bbbb':'b', 'cccc':'c', 'dddd':'d', 'eeee':'e', 'ffff':'f'}
pos = 0  # index of the current key i
for i in d:
    if i == 'eeee':
        listForm = list(d.values())
        print(listForm[pos - 1])
        print(listForm[pos + 1])
    pos += 1
As in @AdamKerz's answer, enumerate seems more Pythonic, but if you are a beginner this code might help you understand the idea in an easy way. It relies on the dict keeping insertion order, which is guaranteed from Python 3.7.
And I think it's faster and shorter than sorting the keys, building a list, and then enumerating it.

You could use a generic function, based on iterators, to get a moving window (taken from this question):
import itertools
def window(iterable, n=3):
    it = iter(iterable)
    result = tuple(itertools.islice(it, n))
    if len(result) == n:
        yield result
    for element in it:
        result = result[1:] + (element,)
        yield result
l = range(8)
for i in window(l, 3):
    print(i)
Using the above function with OrderedDict.items() will give you three (key, value) pairs, in order:
import collections
d = collections.OrderedDict(...)
for p_item, item, n_item in window(d.items()):
    p_key, p_value = p_item
    key, value = item
    # Or, if you don't care about the next value:
    n_key, _ = n_item
Of course, with this function the first and last values will never be in the middle position (although this should not be difficult to fix with some adaptation).
I think the biggest advantage is that it does not require lookups of the previous and next keys, and also that it is generic and works with any iterable.
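One way to do the adaptation mentioned above is to pad the sequence with a sentinel at both ends before windowing, so the first and last items also land in the middle position. A sketch, with the window function repeated so the block is self-contained; neighbours is a hypothetical name, and it assumes None is never a real item (safe for the (key, value) pairs from d.items()).

```python
import itertools
from collections import OrderedDict

def window(iterable, n=3):
    it = iter(iterable)
    result = tuple(itertools.islice(it, n))
    if len(result) == n:
        yield result
    for element in it:
        result = result[1:] + (element,)
        yield result

def neighbours(items):
    """Yield (previous, current, next) for every item; None marks a missing neighbour."""
    return window(itertools.chain([None], items, [None]), 3)

d = OrderedDict([('aaaa', 'a'), ('bbbb', 'b'), ('cccc', 'c')])
for prev, cur, nxt in neighbours(d.items()):
    print(prev, cur, nxt)
```

The first line printed has prev None and the last has nxt None, so callers only need to check for None rather than special-case the ends.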

Maybe it is overkill, but you can keep track of the inserted keys with a helper class, and use that list to retrieve the previous or next key. Just don't forget to check for border conditions, if the object is already the first or last element. This way, you will not need to re-sort the ordered list every time.
from collections import OrderedDict
class Helper(object):
    """Helper class for keeping track of insertion order"""
    dictContainer = dict()
    ordering = list()
    @staticmethod
    def addItem(dictItem):
        for key, value in dictItem.items():
            print(key, value)
            Helper.ordering.append(key)
            Helper.dictContainer[key] = value
    @staticmethod
    def getPrevious(key):
        index = Helper.ordering.index(key) - 1
        if index < 0:  # border condition: key is the first element
            return None
        return Helper.dictContainer[Helper.ordering[index]]
# Your unordered dictionary
d = {'aaaa': 'a', 'bbbb':'b', 'cccc':'c', 'dddd':'d', 'eeee':'e', 'ffff':'f'}
# Create an order over the keys
ordered = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
# Push your ordered dict to your Helper class
Helper.addItem(ordered)
# Get the value previous to 'eeee'
print(Helper.getPrevious('eeee'))

You can store the keys and values in temporary lists beforehand, and then access the previous and next key:value pairs by index.
It is pretty dynamic and will work for any key you query. Please check this code:
d = {'1234': '8.8.8.8', '2345':'0.0.0.0', '3213':'4.4.4.4', '4523':'1.1.1.1', '7654':'1.3.3.7', '9999':'127.0.0.1'}
ch = input('Please enter your choice: ')
keys = list(d.keys())
values = list(d.values())
if ch in d:
    ind = keys.index(ch)
    if ind > 0:  # there is a previous pair
        print(keys[ind-1], ':', values[ind-1])
    if ind + 1 < len(keys):  # there is a next pair
        print(keys[ind+1], ':', values[ind+1])

I think this is a nice Pythonic way of resolving your problem using a lambda and list comprehension, although it may not be optimal in execution time:
import collections
x = collections.OrderedDict([('a','v1'),('b','v2'),('c','v3'),('d','v4')])
previousItem = lambda currentKey, thisOrderedDict : [
    list( thisOrderedDict.items() )[ z - 1 ] if (z != 0) else None
    for z in range( len( thisOrderedDict.items() ) )
    if (list( thisOrderedDict.keys() )[ z ] == currentKey) ][ 0 ]
nextItem = lambda currentKey, thisOrderedDict : [
    list( thisOrderedDict.items() )[ z + 1 ] if (z != (len( thisOrderedDict.items() ) - 1)) else None
    for z in range( len( thisOrderedDict.items() ) )
    if (list( thisOrderedDict.keys() )[ z ] == currentKey) ][ 0 ]
assert previousItem('c', x) == ('b', 'v2')
assert nextItem('c', x) == ('d', 'v4')
assert previousItem('a', x) is None
assert nextItem('d',x) is None

Another way that seems simple and straightforward: this function returns the key that is offset positions away from k, or None if there is no such key:
from typing import Optional
def get_shifted_key(d: dict, k: str, offset: int) -> Optional[str]:
    l = list(d.keys())
    if k in l:
        i = l.index(k) + offset
        if 0 <= i < len(l):
            return l[i]
    return None

I know how to get the next key:value after a particular key in a dictionary:
flag = 0
for k, v in dic.items():
    if flag == 0:
        code...
        flag += 1
        continue
    code...{next key and value in for}
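A concrete version of the flag idea, assuming the goal is to return the (key, value) pair that comes right after a given key in iteration order; next_item is a hypothetical name, and it returns None when the key is last or missing:

```python
def next_item(d, target):
    found = False  # the "flag": set once the target key has been seen
    for k, v in d.items():
        if found:
            return k, v  # first item after the target
        if k == target:
            found = True
    return None  # target was the last key, or not present

print(next_item({'a': 1, 'b': 2, 'c': 3}, 'b'))  # ('c', 3)
```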

If I understand correctly:
d = {"a": 1, "b": 2, "c": 3}
l = list(d.keys())  # make a list of the keys
k = "b"             # the current key
i = l.index(k)      # get the index of the current key
For the next item:
j = i + 1 if i + 1 < len(l) else 0  # select the next index, or wrap around to 0
n = l[j]
d[n]
For the previous item:
j = i - 1 if i - 1 >= 0 else len(l) - 1  # select the previous index, or wrap around to the end
p = l[j]
d[p]
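The wrap-around conditionals above can also be written with the modulo operator, since Python's % always returns a non-negative result for a positive divisor. A minimal sketch with the same wrap-around behaviour; cyclic_neighbours is a hypothetical name, and it raises ValueError if the key is not in the dict.

```python
def cyclic_neighbours(d, k):
    """Return ((prev_key, prev_value), (next_key, next_value)) for key k,
    wrapping around at both ends of the dict's insertion order."""
    keys = list(d)                         # keys in insertion order (Python 3.7+)
    i = keys.index(k)
    prev_key = keys[(i - 1) % len(keys)]   # (0 - 1) % n == n - 1, the last key
    next_key = keys[(i + 1) % len(keys)]   # (n - 1 + 1) % n == 0, the first key
    return (prev_key, d[prev_key]), (next_key, d[next_key])

d = {"a": 1, "b": 2, "c": 3}
print(cyclic_neighbours(d, "b"))  # (('a', 1), ('c', 3))
```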
