Trying to find majority element in a list - python

I'm writing a function to find a majority in a Python list.
I'm thinking that if I can write a hash function that maps every element to a single slot in a new array, or to a unique identifier (perhaps a dictionary key), that would be the best approach, and it should be undoable. I am not sure how to proceed. My hash function is obviously useless. Any tips on what I can or should do, or on whether this is even a reasonable approach?
def find_majority(k):
    def hash_it(q):
        return q
    map_of = [0]*len(k)
    for i in k:
        mapped_to = hash_it(i) #hash function
        map_of[mapped_to]+=1
find_majority([1,2,3,4,3,3,2,4,5,6,1,2,3,4,5,1,2,3,4,6,5])

Python has a built-in class called Counter that will do this for you.
>>> from collections import Counter
>>> c = Counter([1,2,3,4,3,3,2,4,5,6,1,2,3,4,5,1,2,3,4,6,5])
>>> c.most_common()
[(3, 5), (2, 4), (4, 4), (1, 3), (5, 3), (6, 2)]
>>> value, count = c.most_common(1)[0]
>>> print(value)
3
See the docs.
http://docs.python.org/2/library/collections.html#collections.Counter

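There is also a simple one-liner for this (note that l.count rescans the whole list for every distinct element, so it is slower than Counter on large lists):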
l = [1,2,3,4,3,3,2,4,5,6,1,2,3,4,5,1,2,3,4,6,5]
print(max(set(l), key = l.count)) # 3

I think your approach is to use another array as big as k as your "hash map". If k is huge but the number of unique elements is not so huge, you would be wasting a lot of space. Furthermore, to find the majority, you would have to loop through your map_of hashmap/array to find the max.
On the other hand, a dictionary/set (where hashing is not your concern, and the underlying array structure will probably be more compact for average cases) seems a little more appropriate. Needless to say, with the occurring elements as keys and their occurrences as values, you can find what you want in one single iteration.
So, something like:
def find_majority(k):
    myMap = {}
    maximum = ( '', 0 ) # (occurring element, occurrences)
    for n in k:
        if n in myMap: myMap[n] += 1
        else: myMap[n] = 1
        # Keep track of maximum on the go
        if myMap[n] > maximum[1]: maximum = (n,myMap[n])
    return maximum
And as expected, we get what we want.
>>> find_majority([1,2,3,4,3,3,2,4,5,6,1,2,3,4,5,1,2,3,4,6,5])
(3, 5)
Of course, Counters and other cool modules will let you do what you want in finer syntax.
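As a rough sketch of that finer syntax, here is the same function written with Counter. It assumes that "majority" means the most frequent element, as in the answers above; the helper name is_strict_majority is made up for illustration and covers the stricter reading (more than half of the elements).

```python
from collections import Counter

def find_majority(items):
    """Return (element, count) for the most frequent element, or None if items is empty."""
    counts = Counter(items)
    if not counts:
        return None
    return counts.most_common(1)[0]

def is_strict_majority(items):
    """Hypothetical helper: True if one element makes up more than half of items."""
    result = find_majority(items)
    return result is not None and result[1] > len(items) / 2

data = [1,2,3,4,3,3,2,4,5,6,1,2,3,4,5,1,2,3,4,6,5]
print(find_majority(data))       # (3, 5)
print(is_strict_majority(data))  # False: 5 occurrences out of 21 is not more than half
```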

Related

Python - how to take the max length from a value of dictionary? Without lambda

Let us say I have this function:
def frequent_base(self):
    dict = {}
    for i in range(len(self.items)):
        if self.items[i].base not in dict:
            dict[self.items[i].base] = [(self.items[i].value)]
        else:
            dict[self.items[i].base] += [(self.items[i].base)]
    return max(len(dict[self.items]), key=len(d))
Now, I could make it complicated and build a function that returns the index and so on,
but that is bad coding, a bad habit, and takes a long time (especially in a test).
How do I get the length?
Let us say I have:
key1 with 3 values
key2 with 4 values
key3 with 2 values
How do I get not the key itself and not the value itself, but the length of a key's values, which is 4 in this case?
Or how do I get the key itself and then ask for the length of that key's value? I want to use the max function, and I need to understand how to use it properly with the key argument.
I will write and make myself super clear:
dict[1] = [1,2,3]
dict[2] = [1,2,3,4,5]
dict[3] = [1,2,3,7,8,9,10]
dict = {1: [1,2,3], 2:[1,2,3,4,5], 3:[1,2,3,7,8,9,10]}
I wish to return not dict[3], not 3, and not the list in dict[3] itself.
I wish to return the length of dict[3], which is 7.
def frequent_base(self):
    dict = {}
    for i in range(len(self.items)):
        if self.items[i].base not in dict:
            dict[self.items[i].base] = [(self.items[i].value)]
        else:
            dict[self.items[i].base] += [(self.items[i].base)]
    def key_for_len(dictionary):
        return dictionary[1]
    return max(dict.items(), key= key_for_len)
I am getting an error.
The only thing you seem to need is the maximum length among the values of your dictionary. You can get all the values with d.values() (d.items() would give you (key, value) tuples, which are harder to compare). The length of each value can then be computed with a generator expression (very much like a list comprehension, which would also work): len(v) for v in d.values(). That gives an iterable of all the lengths, so it is just a matter of finding the maximum:
max(len(v) for v in d.values())
Should you need the key or value with the maximum length, we have to take a slightly different approach: max accepts a key= argument that specifies how to decide which element of the iterable is largest. The choice is obvious when we take the maximum of a few numbers, but not when we have to decide whether (1, 3) is bigger than (2, 2). In such a case we need a function that maps our items to something easily comparable, like a number. Here we have (key, value) tuples and we compare by the length of the value, so our function would be:
def lenOfValue(kv):
    return len(kv[1]) # kv[1] - 2nd element of a (key, value) tuple
code(1)
Then we pass that to max:
print(max(d.items(), key = lenOfValue))
And we get the (key, value) pair with the longest value, for example (2, [3, 4, 5, 6]) when key 2 holds the longest list.
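To tie both variants to the question, here is a small self-contained sketch that runs them on the dictionary the asker posted. The name len_of_value is just an illustrative, lambda-free key function equivalent to the one above.

```python
def len_of_value(kv):
    """Key function for max(): kv is a (key, value) pair, compared by len(value)."""
    return len(kv[1])

# The dictionary from the question.
d = {1: [1, 2, 3], 2: [1, 2, 3, 4, 5], 3: [1, 2, 3, 7, 8, 9, 10]}

# Only the maximum length is needed: compare the lengths directly.
longest_length = max(len(v) for v in d.values())

# The key and value are needed too: compare (key, value) pairs by value length.
longest_item = max(d.items(), key=len_of_value)

print(longest_length)  # 7
print(longest_item)    # (3, [1, 2, 3, 7, 8, 9, 10])
```

The first form answers the question as asked (the length only); the second is useful when you also need to know which key owns that longest list.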
Bonus: lambda
Lambdas can be used here; they are really just a shorthand that lets us skip defining a whole separate function that we will probably never use again.
This code would be functionally exactly the same.
print(max(d.items(), key = lambda kv: len(kv[1])))
code(2)
Lambdas are nothing very complicated: they are just a notation for creating simple, one-line functions without all the bother of a def block, return, etc.
Because Python's functions are objects like nearly anything else, this piece of code:
lenOfValue = lambda kv: len(kv[1])
really is in no way different than our previous, lengthier definition:
def lenOfValue(kv):
    return len(kv[1])
It saves us a few words and shows the middle step between code(1) and code(2).
a_dict = {'some_key':[67,30,10], 'another_key':[87]}
# Put the length first so that max() compares by length rather than by key
max((len(v), k) for k, v in a_dict.items())
(3, 'some_key')

Python sorting function

I feel confused about this code. I am really new to coding and wonder if anyone can help:
def takeSecond(elem):# If I pass the argument as a list, why don't I need to specify it here?
    return elem[1] # what return elem[1] means,does it return the y in (x,y)? but why!?
random = [(2, 2), (3, 4), (4, 1), (1, 3)]
random.sort(key=takeSecond) # key = takeSecond what is this even mean lol?..
print('Sorted list:', random)
Let's say you have a list of items:
items = ['a', 'b', 'c']
If you use the following piece of code:
def func():
    return items[1]
you'd get 'b' as output. Index 1 is not the first element but the second, because indexing in Python starts at 0. So if you want to access 'a', you have to use:
def func():
    return items[0]
Because index 0 is 'a', 1 is 'b', 2 is 'c', and so on. In your code, each elem is a tuple (x, y), so elem[1] is y; if you change return elem[1] to return elem[0], you get x instead.
Here you're trying to sort the list by the second value of each element.
The takeSecond function always returns the second value of the element it is given.
When you call random.sort() with takeSecond as the key parameter, sort() calls takeSecond on each element and orders the elements by the returned values. In our case that is the second value of each tuple, so the list is sorted by second value.
Finally, the sorted list is printed.
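A short sketch may make the key mechanism more concrete. It shows that sorting with a key function gives the same order as sorting with operator.itemgetter(1) from the standard library, and that the key function is simply called once per element. The variable names are illustrative.

```python
from operator import itemgetter

def take_second(elem):
    """Return the second item of a pair; sort() calls this once per element."""
    return elem[1]

pairs = [(2, 2), (3, 4), (4, 1), (1, 3)]

# The values sort() actually compares: one key per element.
keys = [take_second(p) for p in pairs]            # [2, 4, 1, 3]

by_function = sorted(pairs, key=take_second)
by_itemgetter = sorted(pairs, key=itemgetter(1))  # same effect, no def needed

print(keys)         # [2, 4, 1, 3]
print(by_function)  # [(4, 1), (2, 2), (1, 3), (3, 4)]
```

You never pass the argument to take_second yourself: you hand the function object to sort(), and sort() supplies each element in turn.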

iterate over list of tuples in two notations

I'm iterating over a list of tuples, and was just wondering if there is a smaller notation to do the following:
for tuple in list:
    (a,b,c,d,e) = tuple
or the equivalent
for (a,b,c,d,e) in list:
    tuple = (a,b,c,d,e)
Both of these snippets allow me to access the tuple item by item as well as a whole. But is there a notation that somehow combines the two lines into the for statement? It seems like such a Pythonesque feature that I figured it might exist in some shape or form.
The Pythonic way is the first option you mentioned:
for tup in list:
    a,b,c,d,e = tup
This might be a hack that you could use. There might be a better way, but that's why it's a hack. Your examples are all fine and that's how I would certainly do it.
>>> list1 = [(1, 2, 3, 4, 5)]
>>> for (a, b, c, d, e), tup in zip(list1, list1):
...     print(a, b, c, d, e)
...     print(tup)
...
1 2 3 4 5
(1, 2, 3, 4, 5)
Also, please don't use tuple as a variable name.
There isn't anything really built into Python that lets you do this, because the vast majority of the time, you only need to access the tuple one way or the other: either as a tuple or as separate elements. In any case, something like
for t in the_list:
    a,b,c,d,e = t
seems pretty clean, and I can't imagine there'd be any good reason to want it more condensed than that. That's what I do on the rare occasions that I need this sort of access.
If you just need to get at one or two elements of the tuple, say perhaps c and e only, and you don't need to use them repeatedly, you can access them as t[2] and t[4]. That reduces the number of variables in your code, which might make it a bit more readable.
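As a small illustration of both access styles, the sketch below keeps the whole tuple as t, unpacks it when all five elements are needed, and uses indexing when only c and e matter. The sample data and result lists are made up for the example.

```python
the_list = [(1, 2, 3, 4, 5), (6, 7, 8, 9, 10)]

whole_and_parts = []
c_and_e = []
for t in the_list:
    # Unpack when every element is needed; t still refers to the whole tuple.
    a, b, c, d, e = t
    whole_and_parts.append((t, a + e))
    # Index directly when only a couple of elements are needed.
    c_and_e.append((t[2], t[4]))

print(whole_and_parts)  # [((1, 2, 3, 4, 5), 6), ((6, 7, 8, 9, 10), 16)]
print(c_and_e)          # [(3, 5), (8, 10)]
```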

Data structure that sorts elements by value on insert in Python

I need a queue structure that sorts elements (id, value) by value on insert. Also, I need to be able to remove the element with the highest value. I don't need this structure to be thread-safe. In Java, this would, I guess, correspond to PriorityQueue.
What structure should I use in Python? Could you provide a single toy example?
Python has something similar (which is really a thread-safe wrapper for heapq):
from queue import PriorityQueue  # the module is named Queue in Python 2
q = PriorityQueue()
q.put((-1, 'foo'))
q.put((-3, 'bar'))
q.put((-2, 'baz'))
q.get() returns the lowest entry rather than the largest, which is why the priorities are negated:
>>> q.get()
(-3, 'bar')
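If you don't like negatives, you can override the _get method so that it removes and returns the largest entry (note that this makes get() an O(n) operation):
class PositivePriorityQueue(PriorityQueue):
    def _get(self):
        item = max(self.queue)
        self.queue.remove(item)
        return item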
You can use the heapq module.
From docs:
This module provides an implementation of the heap queue algorithm,
also known as the priority queue algorithm.
heapq implements a priority queue, but as a min-heap, so you will need to negate the value. You will also need to put the id second, since tuples are compared from left to right.
>>> import heapq
>>> queue = []
>>> heapq.heappush(queue, (-1, 'a'))
>>> heapq.heappush(queue, (-2, 'a'))
>>> heapq.heappop(queue)
(-2, 'a')
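For the (id, value) elements in the question, the negation can be hidden behind a small wrapper. This is only a sketch: the class name MaxValueQueue and its methods are made up for illustration, and it assumes the ids are mutually comparable, because ties on value fall back to comparing ids.

```python
import heapq

class MaxValueQueue:
    """Hypothetical helper: pops the (id, value) pair with the highest value."""

    def __init__(self):
        self._heap = []

    def push(self, item_id, value):
        # heapq is a min-heap, so store the negated value first.
        heapq.heappush(self._heap, (-value, item_id))

    def pop_max(self):
        # Undo the negation before handing the pair back.
        neg_value, item_id = heapq.heappop(self._heap)
        return item_id, -neg_value

    def __len__(self):
        return len(self._heap)

q = MaxValueQueue()
q.push("a", 1)
q.push("b", 3)
q.push("c", 2)
print(q.pop_max())  # ('b', 3)
print(len(q))       # 2
```

Both push and pop_max run in O(log n), which is the main reason to prefer a heap over re-sorting a list on every insert.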
I think what you're looking for can be found in the heapq library. From http://docs.python.org/2/library/heapq.html :
Heap elements can be tuples. This is useful for assigning comparison values (such as task priorities) alongside the main record being tracked:
>>> from heapq import heappush, heappop
>>>
>>> h = []
>>> heappush(h, (5, 'write code'))
>>> heappush(h, (7, 'release product'))
>>> heappush(h, (1, 'write spec'))
>>> heappush(h, (3, 'create tests'))
>>> heappop(h)
(1, 'write spec')
Is this the desired behavior?

transitive closure python tuples

Does anyone know if there's a python builtin for computing transitive closure of tuples?
I have tuples of the form (1,2),(2,3),(3,4) and I'm trying to get (1,2),(2,3),(3,4),(1,3),(2,4)
Thanks.
There's no builtin for transitive closures.
They're quite simple to implement though.
Here's my take on it:
def transitive_closure(a):
    closure = set(a)
    while True:
        new_relations = set((x,w) for x,y in closure for q,w in closure if q == y)
        closure_until_now = closure | new_relations
        if closure_until_now == closure:
            break
        closure = closure_until_now
    return closure
call:
transitive_closure([(1,2),(2,3),(3,4)])
result:
{(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (2, 4)}
call:
transitive_closure([(1,2),(2,1)])
result:
{(1, 2), (1, 1), (2, 1), (2, 2)}
Just a quick attempt:
def transitive_closure(elements):
    elements = set([(x,y) if x < y else (y,x) for x,y in elements])
    relations = {}
    for x,y in elements:
        if x not in relations:
            relations[x] = []
        relations[x].append(y)
    closure = set()
    def build_closure(n):
        def f(k):
            for y in relations.get(k, []):
                closure.add((n, y))
                f(y)
        f(n)
    for k in relations.keys():
        build_closure(k)
    return closure
Executing it, we'll get
In [3]: transitive_closure([(1,2),(2,3),(3,4)])
Out[3]: {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}
We can perform the "closure" operation from a given "start node" by repeatedly taking a union of "graph edges" from the current "endpoints" until no new endpoints are found. We need to do this at most (number of nodes - 1) times, since this is the maximum length of a path. (Doing things this way avoids getting stuck in infinite recursion if there is a cycle; it will waste iterations in the general case, but avoids the work of checking whether we are done i.e. that no changes were made in a given iteration.)
from collections import defaultdict
def transitive_closure(elements):
    edges = defaultdict(set)
    # map from first element of input tuples to "reachable" second elements
    for x, y in elements:
        edges[x].add(y)
    for _ in range(len(elements) - 1):
        # .get() avoids inserting new keys into edges while it is being iterated,
        # which raises a RuntimeError in Python 3
        edges = defaultdict(set, (
            (k, v.union(*(edges.get(i, ()) for i in v)))
            for (k, v) in edges.items()
        ))
    return set((k, i) for (k, v) in edges.items() for i in v)
(I actually tested it for once ;) )
Suboptimal, but conceptually simple solution:
def transitive_closure(a):
    closure = set()
    for x, _ in a:
        closure |= set((x, y) for y in dfs(x, a))
    return closure
def dfs(x, a):
    """Yields single elements from a in depth-first order, starting from x"""
    for y in [y for w, y in a if w == x]:
        yield y
        for z in dfs(y, a):
            yield z
This won't terminate when there is a cycle in the relation (for example, a reflexive pair such as (1, 1)), because dfs then recurses forever.
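One way around the cycle problem is to remember which nodes have already been reached. The following is a minimal sketch of that idea, not the answerer's code: the names reachable_from and transitive_closure_safe are made up, and it swaps the recursive generator for an explicit stack plus a seen set.

```python
def reachable_from(start, pairs):
    """Return every node reachable from start by following one or more pairs."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for src, dst in pairs:
            # Each node is added to seen at most once, so cycles cannot loop forever.
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def transitive_closure_safe(pairs):
    pairs = list(pairs)
    return {(x, y) for x, _ in pairs for y in reachable_from(x, pairs)}

print(sorted(transitive_closure_safe([(1, 2), (2, 3), (3, 4)])))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(sorted(transitive_closure_safe([(1, 2), (2, 1)])))
# [(1, 1), (1, 2), (2, 1), (2, 2)]
```

Like the original, this rescans the whole pair list for every visited node, so it stays conceptually simple rather than fast.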
Here's one essentially the same as the one from @soulcheck that works on adjacency lists rather than edge lists:
def inplace_transitive_closure(g):
    """g is an adjacency list graph implemented as a dict of sets"""
    done = False
    while not done:
        done = True
        for v0, v1s in g.items():
            old_len = len(v1s)
            for v2s in [g[v1] for v1 in v1s]:
                v1s |= v2s
            done = done and len(v1s) == old_len
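To use this with the tuples from the question, the edge list first has to be turned into a dict of sets. The sketch below shows one way to do that; edges_to_adjacency is a made-up helper name, and it assumes that every node, including nodes with no outgoing edges, needs its own key so that the g[v1] lookups do not raise KeyError. The closure function is repeated only so the example runs on its own.

```python
def edges_to_adjacency(pairs):
    """Build a dict of sets in which every node, including sinks, is a key."""
    g = {}
    for x, y in pairs:
        g.setdefault(x, set()).add(y)
        g.setdefault(y, set())  # sinks get an empty set so g[v1] always works
    return g

def inplace_transitive_closure(g):
    """Same loop as in the answer above, repeated so this sketch is self-contained."""
    done = False
    while not done:
        done = True
        for v0, v1s in g.items():
            old_len = len(v1s)
            for v2s in [g[v1] for v1 in v1s]:
                v1s |= v2s
            done = done and len(v1s) == old_len

g = edges_to_adjacency([(1, 2), (2, 3), (3, 4)])
inplace_transitive_closure(g)

# Flatten the adjacency dict back into a set of tuples.
closure = {(x, y) for x, ys in g.items() for y in ys}
print(sorted(closure))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```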
If you have a lot of tuples (more than 5000), you might want to consider using the scipy code for matrix powers (see also http://www.ics.uci.edu/~irani/w15-6B/BoardNotes/MatrixMultiplication.pdf)
from scipy.sparse import csr_matrix as csr
def get_closure(tups, n):
    # n is the maximum path length of your relation
    index2id = list(set([tup[0] for tup in tups]) | set([tup[1] for tup in tups]))
    id2index = {index2id[i]: i for i in range(len(index2id))}
    # Unfortunately you have to make the relation reflexive first - you could also add the diagonal to M
    tups_re = tups + [(index2id[i], index2id[i],) for i in range(len(index2id))]
    M = csr(([True for tup in tups_re], ([id2index[tup[0]] for tup in tups_re], [id2index[tup[1]] for tup in tups_re])), shape=(len(index2id), len(index2id)), dtype=bool)
    M_ = M**n
    temp = M_.nonzero()
    #TODO: You might want to remove the added reflexivity tuples again
    return [(index2id[temp[0][i]], index2id[temp[1][i]],) for i in range(len(temp[0]))]
In the best case, you can choose n wisely if you know a bit about your relation/graph, namely how long the longest path can be. Otherwise you have to choose M.shape[0], which might blow up in your face.
This detour also has its limits. In particular, you should be sure that the closure does not get too large (that the connectivity is not too strong), but you would have the same problem in the Python implementation.
You can create a graph from those tuples and then run a connected components algorithm on it. NetworkX is a library that supports connected components algorithms.
