duplicates function, adding

duplicates function, adding - python

Hello I am trying to develop a function to find duplicates in a list. Below is the code that I have obtained thus far. I am cannot seem to figure out how to get the code to correctly add the number of duplicated numbers.
import collections
myList = [5, 9, 14, 5, 2, 5, 1]
def find_duplicates(aList, target):
if target not in aList:
print (target, "occurred 0 times")
else:
n=0
print (target, "occurred",n+1,"times")
the output of the code shows:
find_duplicates(myList, 5)
5 occurred 1 times
Obviously I am missing something for the program to properly track how many times the value occurs? Can someone please help?
I am not allowed to use the count() or sort() built in functions.

To only count the number of duplicates, just iterate over the list, comparing each value. If you find a match, increment a counter, than report the counter. To make this better, I would return the count then print to console outside of the def.
import collections
def find_duplicates(aList, target):
n = 0
for obj in aList:
if obj is target:
n += 1
return n
myList = [5, 9, 14, 5, 2, 5, 1]
target = 5
num_dup = find_duplicates(myList, target)
print (target, "occurred", num_dup, "times")
This should echo out:
5 occurred 3 times
Or do this (with list.count(x)):
myList = [5, 9, 14, 5, 2, 5, 1]
target = 5
num_dup = myList.count(target)
print (target, "occurred", num_dup, "times")
This should echo out:
5 occurred 3 times

You forgot to increment n in your code, so it always print 1. I think your code should look like:
import collections
myList = [5, 9, 14, 5, 2, 5, 1]
def find_duplicates(aList, target):
if target not in aList:
print (target, "occurred 0 times")
else:
n= aList.count(5)
print (target, "occurred",n,"times")
without using count and reading the target from shell:
import collections
myList = [5, 9, 14, 5, 2, 5, 2]
def find_duplicates(aList, target):
result = 0
for item in aList:
if item == target:
result += 1
return result
try:
target = int(raw_input("Choose a number to find duplicates: ")) # for python 3.X use input instead of raw_input
res = find_duplicates(myList, target)
print (target, " occurred ", res, " times")
except:
print("Write a number, not anything else")
This works for integers, if you want to use floats, just change int(...) for float(...)

it's a simple case of using a dictionary. Check the following code:
def frequency(l):
counter = {}
for x in l:
counter[x] = counter.get(x, 0) + 1
return counter
It will iterate over the list, saving each element as a key to the counter dictionary. Note the special form counter.get(x, 0), it will return the value of counter[x] if x is already on the dict, else it will return zero.
Checking the results is a matter of using:
print(frequency(myList))
>>> {9: 1, 2: 1, 5: 3, 14: 1, 1: 1}
You can get the number of appearances of any member by inspecting the dictionary:
frq = frequency(myList)
print(frq[14])
>>> 1
print(frq[1])
>>> 1
Of course it's possible to write a wrapper:
def target_frequencty(target, my_list):
frq = frequencty(my_list)
return frq.get(target, 0)
Enjoy.

Related

Loops and Dictionary

How to loop in a list while using dictionaries and return the value that repeats the most, and if the values are repeated the same amount return that which is greater?
Here some context with code unfinished
def most_frequent(lst):
dict = {}
count, itm = 0, ''
for item in lst:
dict[item] = dict.get(item, 0) + 1
if dict[item] >= count:
count, itm = dict[item], item
return itm
#lst = ["a","b","b","c","a","c"]
lst = [2, 3, 2, 2, 1, 3, 3,1,1,1,1] #this should return 1
lst2 = [2, 3, 2, 2, 1, 3, 3] # should return 3
print(most_frequent(lst))

Here is a different way to go about it:
def most_frequent(lst):
# Simple check to ensure lst has something.
if not lst:
return -1
# Organize your data as: {number: count, ...}
dct = {}
for i in lst:
dct[i] = dct[i] + 1 if i in dct else 1
# Iterate through your data and create a list of all large elements.
large_list, large_count = [], 0
for num, count in dct.items():
if count > large_count:
large_count = count
large_list = [num]
elif count == large_count:
large_list.append(num)
# Return the largest element in the large_list list.
return max(large_list)
There are many other ways to solve this problem, including using filter and other built-ins, but this is intended to give you a working solution so that you can start thinking on how to possibly optimize it better.
Things to take out of this; always think:
How can I break this problem down into smaller parts?
How can I organize my data so that it is more useful and easier to manipulate?
What shortcuts can I use along the way to make this function easier/better/faster?

Your code produces the result as you describe in your question, i.e. 1. However, your question states that you want to consider the case where two list elements are co-equals in maximum occurrence and return the largest. Therefore, tracking and returning a single element doesn't satisfy this requirement. You need to compile the dict and then evaluate the result.
def most_frequent(lst):
dict = {}
for item in lst:
dict[item] = dict.get(item, 0) + 1
itm = sorted(dict.items(), key = lambda kv:(-kv[1], -kv[0]))
return itm[0]
#lst = ["a","b","b","c","a","c"]
lst = [2, 3, 2, 2, 2, 2, 1, 3, 3,1,1,1,1] #this should return 1
lst2 = [2, 3, 2, 2, 1, 3, 3] # should return 3
print(most_frequent(lst))
I edited the list 'lst' so that '1' and '2' both occur 5 times. The result returned is a tuple:
(2,5)

I reuse your idea which is quite neat, and I just modified your program a bit.
def get_most_frequent(lst):
counts = dict()
most_frequent = (None, 0) # (item, count)
ITEM_IDX = 0
COUNT_IDX = 1
for item in lst:
counts[item] = counts.get(item, 0) + 1
if most_frequent[ITEM_IDX] is None:
# first loop, most_frequent is "None"
most_frequent = (item, counts[item])
elif counts[item] > most_frequent[COUNT_IDX]:
# if current item's "counter" is bigger than the most_frequent's counter
most_frequent = (item, counts[item])
elif counts[item] == most_frequent[COUNT_IDX] and item > most_frequent[ITEM_IDX]:
# if the current item's "counter" is the same as the most_frequent's counter
most_frequent = (item, counts[item])
else:
pass # do nothing
return most_frequent
lst1 = [2, 3, 2, 2, 1, 3, 3,1,1,1,1, 2] # 1: 5 times
lst2 = [2, 3, 1, 3, 3, 2, 2] # 3: 3 times
lst3 = [1]
lst4 = []
print(get_most_frequent(lst1))
print(get_most_frequent(lst2))
print(get_most_frequent(lst3))
print(get_most_frequent(lst4))

Return the only element in the hash table in Python

I am working on a problem as:
In a non-empty array of integers, every number appears twice except for one, find that single number.
I tried to work it out by the hash table:
class Solution:
def singleNumber(self, array):
hash = {}
for i in array:
if i not in hash:
hash[i] = 0
hash[i] += 1
if hash[i] == 2:
del hash[i]
return hash.keys()
def main():
print(Solution().singleNumber([1, 4, 2, 1, 3, 2, 3]))
print(Solution().singleNumber([7, 9, 7]))
main()
returning the result as:
dict_keys([4])
dict_keys([9])
Process finished with exit code 0
I am not sure if there is any way that I can return only the number, e.g. 4 and 9. Thanks for your help.

Instead of return hash.keys() do return hash.popitem()[0] or return list(hash.keys())[0].
Of course this assumes that there is at least one pair in the hashmap. You can check for this using len(hash) > 0 before accessing the first element:
class Solution:
def singleNumber(self, array):
hash = {}
for i in array:
if i not in hash:
hash[i] = 0
hash[i] += 1
if hash[i] == 2:
del hash[i]
return hash.popitem()[0] if len(hash) > 0 else -1 # or throw an error

One solution that might be simpler is to use the .count method.
myList = [1, 4, 2, 1, 3, 2, 3]
non_repeating_numbers = []
for n in myList:
if myList.count(n) < 2:
non_repeating_numbers.append(n)
Applied to your code it could look something like this:
class Solution:
def singleNumber(self, array):
for n in array:
if array.count(n) < 2:
return n
def main():
print(Solution().singleNumber([1, 4, 2, 1, 3, 2, 3]))
print(Solution().singleNumber([7, 9, 7]))

'int' object is not subscriptable?

I'm doing an algorithm challenge on www.edabit.com, where you have a list of dice rolls, and:
if the number is 6, the next number on the list is amplified by a factor of 2
if the number is 1, the next number on the list is 0
this is my code:
def rolls(lst):
out = 0
iterate = 0
if lst[iterate] == 1:
out+=lst[iterate]
lst[iterate+1] = 0
iterate+=1
rolls(lst[iterate])
elif lst[iterate] == 6:
out+=lst[iterate]
lst[iterate+1] = lst[iterate+1]*2
iterate+=1
rolls(lst[iterate])
else:
out+=lst[iterate]
iterate+=1
The console gives me "TypeError: 'int' object is not subscriptable"
Any ideas? Also any other errors you spot would be useful.
I tried on other IDE's, but it gives the same output.
for a series like "1, 6, 2, 3, 2, 4, 5, 6, 2" I expect 27

As stated in the comments, for this kind of problem I wouldn't use recursion. A loop will be enough:
l = [1, 6, 2, 3, 2, 4, 5, 6, 2]
from itertools import accumulate
print(sum(accumulate([0] + l, lambda a, b: b*{1:0, 6:2}.get(a, 1))))
Prints:
27

If you have to use recursion for this problem, then you will need to pass two arguments first lst and second iterate. Note that lst[iterate] is a single element which you are passing to the function when calling it recursively.
Thus modify the function to take two arguments lst and iterate. And initially pass arguments as full list for lst and a 0 for iterate. rolls(lst, 0) should be your initial function call.
I suppose you want out variable to contain sum of all entries in lst when you visit them, so that also needs to be passed as an argument, making your initial call rolls(lst, 0, 0). I have edited the function to return the sum calculated in out accordingly.
def rolls(lst, iterate, out):
if iterate == len(lst):
return out
if lst[iterate] == 1:
out += lst[iterate]
if iterate + 1 < len(lst): #In order to avoid index out of bounds exception
lst[iterate + 1] = 0
rolls(lst, iterate + 1, out)
elif lst[iterate] == 6:
out += lst[iterate]
if iterate + 1 < len(lst): #In order to avoid index out of bounds exception
lst[iterate + 1] = lst[iterate + 1] * 2
rolls(lst, iterate + 1, out)
else:
out += lst[iterate]
rolls(lst, iterate+1, out)

Instead of looking to the next item, you can look at the previous item:
from itertools import islice
def rolls(lst):
if not lst:
return 0
total = prev = lst[0]
for x in islice(lst, 1, None):
if prev == 1:
x = 0
elif prev == 6:
x *= 2
prev = x
total += x
return total
For example:
>>> rolls([1, 6, 2, 3, 2, 4, 5, 6, 2])
27
>>> rolls([])
0
>>> rolls([1])
1
>>> rolls([2])
2
>>> rolls([3])
3
>>> rolls([4])
4
>>> rolls([6,1])
8
>>> rolls([6,2])
10
>>> rolls([6,1,5])
13

Python finding repeating sequence in list of integers?

I have a list of lists and each list has a repeating sequence. I'm trying to count the length of repeated sequence of integers in the list:
list_a = [111,0,3,1,111,0,3,1,111,0,3,1]
list_b = [67,4,67,4,67,4,67,4,2,9,0]
list_c = [1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,23,18,10]
Which would return:
list_a count = 4 (for [111,0,3,1])
list_b count = 2 (for [67,4])
list_c count = 10 (for [1,2,3,4,5,6,7,8,9,0])
Any advice or tips would be welcome. I'm trying to work it out with re.compile right now but, its not quite right.

Guess the sequence length by iterating through guesses between 2 and half the sequence length. If no pattern is discovered, return 1 by default.
def guess_seq_len(seq):
guess = 1
max_len = len(seq) / 2
for x in range(2, max_len):
if seq[0:x] == seq[x:2*x] :
return x
return guess
list_a = [111,0,3,1,111,0,3,1,111,0,3,1]
list_b = [67,4,67,4,67,4,67,4,2,9,0]
list_c = [1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,23,18,10]
print guess_seq_len(list_a)
print guess_seq_len(list_b)
print guess_seq_len(list_c)
print guess_seq_len(range(500)) # test of no repetition
This gives (as expected):
4
2
10
1
As requested, this alternative gives longest repeated sequence. Hence it will return 4 for list_b. The only change is guess = x instead of return x
def guess_seq_len(seq):
guess = 1
max_len = len(seq) / 2
for x in range(2, max_len):
if seq[0:x] == seq[x:2*x] :
guess = x
return guess

I took Maria's faster and more stackoverflow-compliant answer and made it find the largest sequence first:
def guess_seq_len(seq, verbose=False):
seq_len = 1
initial_item = seq[0]
butfirst_items = seq[1:]
if initial_item in butfirst_items:
first_match_idx = butfirst_items.index(initial_item)
if verbose:
print(f'"{initial_item}" was found at index 0 and index {first_match_idx}')
max_seq_len = min(len(seq) - first_match_idx, first_match_idx)
for seq_len in range(max_seq_len, 0, -1):
if seq[:seq_len] == seq[first_match_idx:first_match_idx+seq_len]:
if verbose:
print(f'A sequence length of {seq_len} was found at index {first_match_idx}')
break
return seq_len

This worked for me.
def repeated(L):
'''Reduce the input list to a list of all repeated integers in the list.'''
return [item for item in list(set(L)) if L.count(item) > 1]
def print_result(L, name):
'''Print the output for one list.'''
output = repeated(L)
print '%s count = %i (for %s)' % (name, len(output), output)
list_a = [111, 0, 3, 1, 111, 0, 3, 1, 111, 0, 3, 1]
list_b = [67, 4, 67, 4, 67, 4, 67, 4, 2, 9, 0]
list_c = [
1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2,
3, 4, 5, 6, 7, 8, 9, 0, 23, 18, 10
]
print_result(list_a, 'list_a')
print_result(list_b, 'list_b')
print_result(list_c, 'list_c')
Python's set() function will transform a list to a set, a datatype that can only contain one of any given value, much like a set in algebra. I converted the input list to a set, and then back to a list, reducing the list to only its unique values. I then tested the original list for each of these values to see if it contained that value more than once. I returned a list of all of the duplicates. The rest of the code is just for demonstration purposes, to show that it works.
Edit: Syntax highlighting didn't like the apostrophe in my docstring.

How to optimize this Python code?

def maxVote(nLabels):
count = {}
maxList = []
maxCount = 0
for nLabel in nLabels:
if nLabel in count:
count[nLabel] += 1
else:
count[nLabel] = 1
#Check if the count is max
if count[nLabel] > maxCount:
maxCount = count[nLabel]
maxList = [nLabel,]
elif count[nLabel]==maxCount:
maxList.append(nLabel)
return random.choice(maxList)
nLabels contains a list of integers.
The above function returns the integer with highest frequency, if more than one have same frequency then a randomly selected integer from them is returned.
E.g. maxVote([1,3,4,5,5,5,3,12,11]) is 5

import random
import collections
def maxvote(nlabels):
cnt = collections.defaultdict(int)
for i in nlabels:
cnt[i] += 1
maxv = max(cnt.itervalues())
return random.choice([k for k,v in cnt.iteritems() if v == maxv])
print maxvote([1,3,4,5,5,5,3,3,11])

In Python 3.1 or future 2.7 you'd be able to use Counter:
>>> from collections import Counter
>>> Counter([1,3,4,5,5,5,3,12,11]).most_common(1)
[(5, 3)]
If you don't have access to those versions of Python you could do:
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i in nLabels:
d[i] += 1
>>> max(d, key=lambda x: d[x])
5

It appears to run in O(n) time. However there may be a bottleneck in checking if nLabel in count since this operation could also potentially run O(n) time as well, making the total efficiency O(n^2).
Using a dictionary instead of a list in this case is the only major efficiency boost I can spot.

I'm not sure what exactly you want to optimize, but this should work:
from collections import defaultdict
def maxVote(nLabels):
count = defaultdict(int)
for nLabel in nLabels:
count[nLabel] += 1
maxCount = max(count.itervalues())
maxList = [k for k in count if count[k] == maxCount]
return random.choice(maxList)

Idea 1
Does the return really need to be random, or can you just return a maximum? If you just need to nondeterministically return a max frequency, you could just store a single label and remove the list logic, including
elif count[nLabel]==maxCount:
maxList.append(nLabel)
Idea 2
If this method is called frequently, would it be possible to only work on new data, as opposed to the entire data set? You could cache your count map and then only process new data. Assuming your data set is large and the calculations are done online, this could net huge improvements.

Complete example:
#!/usr/bin/env python
def max_vote(l):
"""
Return the element with the (or a) maximum frequency in ``l``.
"""
unsorted = [(a, l.count(a)) for a in set(l)]
return sorted(unsorted, key=lambda x: x[1]).pop()[0]
if __name__ == '__main__':
votes = [1, 3, 4, 5, 5, 5, 3, 12, 11]
print max_vote(votes)
# => 5
Benchmarks:
#!/usr/bin/env python
import random
import collections
def max_vote_2(l):
"""
Return the element with the (or a) maximum frequency in ``l``.
"""
unsorted = [(a, l.count(a)) for a in set(l)]
return sorted(unsorted, key=lambda x: x[1]).pop()[0]
def max_vote_1(nlabels):
cnt = collections.defaultdict(int)
for i in nlabels:
cnt[i] += 1
maxv = max(cnt.itervalues())
return random.choice([k for k,v in cnt.iteritems() if v == maxv])
if __name__ == '__main__':
from timeit import Timer
votes = [1, 3, 4, 5, 5, 5, 3, 12, 11]
print max_vote_1(votes)
print max_vote_2(votes)
t = Timer("votes = [1, 3, 4, 5, 5, 5, 3, 12, 11]; max_vote_2(votes)", \
"from __main__ import max_vote_2")
print 'max_vote_2', t.timeit(number=100000)
t = Timer("votes = [1, 3, 4, 5, 5, 5, 3, 12, 11]; max_vote_1(votes)", \
"from __main__ import max_vote_1")
print 'max_vote_1', t.timeit(number=100000)
Yields:
5
5
max_vote_2 1.79455208778
max_vote_1 2.31705093384

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

duplicates function, adding - python

Related

Loops and Dictionary

Return the only element in the hash table in Python

'int' object is not subscriptable?

Python finding repeating sequence in list of integers?

How to optimize this Python code?

Categories

Resources