Count the duplicates in a list of tuples

Count the duplicates in a list of tuples - python

I have a list of tuples: a = [(1,2),(1,4),(1,2),(6,7),(2,9)] I want to check if one of the individual elements of each tuple matches the same position/element in another tuple, and how many times this occurs.
For example: If only the 1st element in some tuples has a duplicate, return the tuple and how many times it's duplicated.
I can do that with the following code:
a = [(1,2), (1,4), (1,2), (6,7), (2,9)]
coll_list = []
for t in a:
coll_cnt = 0
for b in a:
if b[0] == t[0]:
coll_cnt = coll_cnt + 1
print "%s,%d" %(t,coll_cnt)
coll_list.append((t,coll_cnt))
print coll_list
I want to know if there is a more effective way to do this?

You can use a Counter
from collections import Counter
a = [(1,2),(1,4),(1,2),(6,7),(2,9)]
counter=Counter(a)
print counter
This will output:
Counter({(1, 2): 2, (6, 7): 1, (2, 9): 1, (1, 4): 1})
It is a dictionary like object with the item (tuples in this case) as the key and a value containing the number of times that key was seen. Your (1,2) tuple is seen twice, while all others are only seen once.
>>> counter[(1,2)]
2
If you are interested in each individual portion of the tuple, you can utilize the same logic for each element in the tuple.
first_element = Counter([x for (x,y) in a])
second_element = Counter([y for (x,y) in a])
first_element and second_element now contain a Counter of the number of times values are seen per element in the tuple
>>> first_element
Counter({1: 3, 2: 1, 6: 1})
>>> second_element
Counter({2: 2, 9: 1, 4: 1, 7: 1})
Again, these are dictionary like objects, so you can check how frequent a specific value appeared directly:
>>> first_element[2]
1
In the first element of your list of tuples, the value 2 appeared 1 time.

use collections library. In the following code val_1, val_2 give you duplicates of each first elements and second elements of the tuples respectively.
import collections
val_1=collections.Counter([x for (x,y) in a])
val_2=collections.Counter([y for (x,y) in a])
>>> print val_1
<<< Counter({1: 3, 2: 1, 6: 1})
This is the number of occurrences of the first element of each tuple
>>> print val_2
<<< Counter({2: 2, 9: 1, 4: 1, 7: 1})
This is the number of occurrences of the second element of each tuple

You can make count_map, and store the count of each tuple as the value.
>>> count_map = {}
>>> for t in a:
... count_map[t] = count_map.get(t, 0) +1
...
>>> count_map
{(1, 2): 2, (6, 7): 1, (2, 9): 1, (1, 4): 1}

Using pandas this is simple and very fast:
import pandas
print(pandas.Series(data=[(1,2),(1,4),(1,2),(6,7),(2,9)]).value_counts())
(1, 2) 2
(1, 4) 1
(6, 7) 1
(2, 9) 1
dtype: int64

Maybe Dictionary can work better. Because in your code, you are traveling the list for twice. And this makes the complexity of your code O(n^2). And this is not a good thing :)
Best way is the travelling for once and to use 1 or 2 conditions for each traverse. Here is the my first solution for such kind of problem.
a = [(1,2),(1,4),(1,2),(6,7),(2,9)]
dict = {}
for (i,j) in a:
if dict.has_key(i):
dict[i] += 1
else:
dict[i] = 1
print dict
For this code, this will give the output:
{1: 3, 2: 1, 6: 1}
I hope it will be helpful.

Related

What's a more Pythonic way of grabbing N items from a dictionary?

In Python, suppose I want to grab N arbitrary items from a dictionary—say, to print them, to inspect a few items. I don't care which items I get. I don't want to turn the dictionary into a list (as does some code I have seen); that seems like a waste. I can do it with the following code (where N = 5), but it seems like there has to be a more Pythonic way:
count = 0
for item in my_dict.items():
if count >= 5:
break
print(item)
count += 1
Thanks in advance!

You can use itertools.islice to slice any iterable (not only lists):
>>> import itertools
>>> my_dict = {i: i for i in range(10)}
>>> list(itertools.islice(my_dict.items(), 5))
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]

I might use zip and range:
>>> my_dict = {i: i for i in range(10)}
>>> for _, item in zip(range(5), my_dict.items()):
... print(item)
...
(0, 0)
(1, 1)
(2, 2)
(3, 3)
(4, 4)
The only purpose of the range here is to give an iterable that will cause zip to stop after 5 iterations.

You can modify what you have slightly:
for count, item in enumerate(dict.items()):
if count >= 5:
break
print(item)
Note: in this case when you're looping through .items(), you're getting a key/value pair, which can be unpacked as you iterate:
for count, (key, value) in enumerate(dict.items()):
if count >= 5:
break
print(f"{key=} {value=})
If you want just the keys, you can just iterate over the dict.
for count, key in enumerate(dict):
if count >= 5:
break
print(f"{key=})
If you want just the values:
for count, value in enumerate(dict.values()):
if count >= 5:
break
print(f"{value=})
And last note: using dict as a variable name overwrites the built in dict and makes it unavailable in your code.

Typically, I would like to use slice notation to do this, but dict.items() returns an iterator, which is not slicable.
You have two main options:
Make it something that slice notation works on:
x = {'a':1, 'b':2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}
for item, index in list(x.items())[:5]:
print(item)
Use something that works on iterators. In this case, the built-in (and exceedingly popular) itertools package:
import itertools
x = {'a':1, 'b':2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}
for item in itertools.islice(x.items(), 5):
print(item)

Given a list of numbers, how many different ways can you add them together to get a sum S?

Given a list of numbers, how many different ways can you add them together to get a sum S?
Example:
list = [1, 2]
S = 5
1) 1+1+1+1+1 = 5
2) 1+1+1+2 = 5
3) 1+2+2 = 5
4) 2+1+1+1 = 5
5) 2+2+1 = 5
6) 1+2+1+1 = 5
7) 1+1+2+1 = 5
8) 2+1+2 = 5
Answer = 8
This is what I've tried, but it only outputs 3 as the answer
lst = [1, 2]
i = 1
result = 0
while i <= 5:
s_list = [sum(comb) for comb in combinations_with_replacement(lst, i)]
for val in s_list:
if val == 5:
result += 1
i+= 1
print(result)
However, this outputs three. I believe it outputs three because it doesn't account for the different order you can add the numbers in. Any ideas on how to solve this.
The problem should work for much larger data: however, I give this simple example to give the general idea.

Using both itertools.combinations_with_replacement and permutations:
import itertools
l = [1,2]
s = 5
res = []
for i in range(1, s+1):
for tup in itertools.combinations_with_replacement(l, i):
if sum(tup) == s:
res.extend(list(itertools.permutations(tup, i)))
res = list(set(res))
print(res)
[(1, 2, 2),
(2, 2, 1),
(1, 1, 2, 1),
(1, 2, 1, 1),
(2, 1, 1, 1),
(1, 1, 1, 2),
(2, 1, 2),
(1, 1, 1, 1, 1)]
print(len(res))
# 8

How about using dynamic programming? I believe it's more easy to understand and can be implemented easily.
def cal(target, choices, record):
min_choice = min(choices)
if min_choice > target:
return False
for i in range(0, target+1):
if i == 0:
record.append(1)
elif i < min_choice:
record.append(0)
elif i == min_choice:
record.append(1)
else:
num_solution = 0
j = 0
while j < len(choices) and i-choices[j] >= 0:
num_solution += record[i-choices[j]]
j += 1
record.append(num_solution)
choices = [1, 2]
record = []
cal(5, choices, record)
print(record)
print(f"Answer:{record[-1]}")
The core idea here is using an extra record array to record how many ways can be found to get current num, e.g. record[2] = 2 means we can use to ways to get a sum of 2 (1+1 or 2).
And we have record[target] = sum(record[target-choices[i]]) where i iterates over choices. Try to think, the way of getting sum=5 must be related with the way of getting sum=4 and so on.

Use Dynamic Programming.
We suppose that your list consists of [1,2,5] so we have this recursive function :
f(n,[1,2,5]) = f(n-1,[1,2,5]) + f(n-2,[1,2,5]) + f(n-5,[1,2,5])
Because if the first number in sum is 1 then you have f(n-1,[1,2,5]) options for the rest and if it is 2 you have f(n-2,[1,2,5]) option for the rest and so on ...
so start from f(1) and work your way up with Dynamic programming. this solution in the worst case is O(n^2) and this happens when your list has O(n) items.
Solution would be something like this:
answers = []
lst = [1,2]
number = 5
def f(target):
val = 0
for i in lst: #O(lst.count())
current = target - i
if current > 0:
val += answers[current-1]
if lst.__contains__(target): #O(lst.count())
val += 1
answers.insert(target,val)
j = 1;
while j<=number: #O(n) for while loop
f(j)
j+=1
print(answers[number-1])
here is a working version.

You'd want to use recursion to traverse through each possibility for each stage of addition, and pass back the numbers used once you've reached a number that is equal to the expected.
def find_addend_combinations(sum_value, addend_choices, base=0, history=None):
if history is None: history = []
if base == sum_value:
return tuple(history)
elif base > sum_value:
return None
else:
results = []
for v in addend_choices:
r = find_addend_combinations(sum_value, addend_choices, base + v,
history + [v])
if isinstance(r, tuple):
results.append(r)
elif isinstance(r, list):
results.extend(r)
return results
You could write the last part a list comprehension but I think this way is clearer.

Combinations with the elements in a different order are considered to equivalent. For example, #3 and #5 from your list of summations are considered equivalent if you are only talking about combinations.
In contrast, permutations consider the two collections unique if they are comprised of the same elements in a different order.
To get the answer you are looking for you need to combine both concepts.
First, use your technique to find combinations that meet your criteria
Next, permute the collection of number from the combination
Finally, collect the generated permutations in a set to remove duplicates.
[ins] In [01]: def combination_generator(numbers, k, target):
...: assert k > 0, "Must be a positive number; 'k = {}".format(k)
...: assert len(numbers) > 0, "List of numbers must have at least one element"
...:
...: for candidate in (
...: {'numbers': combination, 'sum': sum(combination)}
...: for num_elements in range(1, k + 1)
...: for combination in itertools.combinations_with_replacement(numbers, num_elements)
...: ):
...: if candidate['sum'] != target:
...: continue
...: for permuted_candidate in itertools.permutations(candidate['numbers']):
...: yield permuted_candidate
...:
[ins] In [02]: {candidate for candidate in combination_generator([1, 2], 5, 5)}
Out[02]:
{(1, 1, 1, 1, 1),
(1, 1, 1, 2),
(1, 1, 2, 1),
(1, 2, 1, 1),
(1, 2, 2),
(2, 1, 1, 1),
(2, 1, 2),
(2, 2, 1)}

How to isolate specific rows of a Cartesian Product - Python

I've created a cartesian product and printed from the following code:
A = [0, 3, 5]
B = [5, 10, 15]
C = product(A, B)
for n in C:
print(n)
And we have a result of
(0, 5)
(0, 10)
(0, 15)
(3, 5)
(3, 10)
(3, 15)
(5, 5)
(5, 10)
(5, 15)
Is it possible to check the sum of each set within the cartesian product to a target value of 10? Can we then return or set aside the sets that match?
Result should be:
(0, 10)
(5, 5)
Finally, how can I count the frequency of numbers in my resulting subset to show that 0 appeared once, 10 appeared once, and 5 appeared twice? I would appreciate any feedback or guidance.

You can use a list comprehension like this:
C = [p for p in product(A, B) if sum(p) == 10]
And to count the frequency of the numbers in the tuples, as #Amadan suggests, you can use collections.Counter after using itertools.chain.from_iterable to chain the sub-lists into one sequence:
from collections import Counter
from itertools import chain
Counter(chain.from_iterable(C))
which returns, given your sample input:
Counter({5: 2, 0: 1, 10: 1})
which is an instance of a dict subclass, so you can iterate over its items for output:
for n, freq in Counter({5: 2, 0: 1, 10: 1}).items():
print('{}: {}'.format(n, freq))

Python: is index() buggy at all?

I'm working through this thing on pyschools and it has me mystified.
Here's the code:
def convertVector(numbers):
totes = []
for i in numbers:
if i!= 0:
totes.append((numbers.index(i),i))
return dict((totes))
Its supposed to take a 'sparse vector' as input (ex: [1, 0, 1 , 0, 2, 0, 1, 0, 0, 1, 0])
and return a dict mapping non-zero entries to their index.
so a dict with 0:1, 2:1, etc where x is the non zero item in the list and y is its index.
So for the example number it wants this: {0: 1, 9: 1, 2: 1, 4: 2, 6: 1}
but instead gives me this: {0: 1, 4: 2} (before its turned to a dict it looks like this:
[(0, 1), (0, 1), (4, 2), (0, 1), (0, 1)]
My plan is for i to iterate through numbers, create a tuple of that number and its index, and then turn that into a dict. The code seems straightforward, I'm at a loss.
It just looks to me like numbers.index(i) is not returning the index, but instead returning some other, unsuspected number.
Is my understanding of index() defective? Are there known index issues?
Any ideas?

index() only returns the first:
>>> a = [1,2,3,3]
>>> help(a.index)
Help on built-in function index:
index(...)
L.index(value, [start, [stop]]) -> integer -- return first index of value.
Raises ValueError if the value is not present.
If you want both the number and the index, you can take advantage of enumerate:
>>> for i, n in enumerate([10,5,30]):
... print i,n
...
0 10
1 5
2 30
and modify your code appropriately:
def convertVector(numbers):
totes = []
for i, number in enumerate(numbers):
if number != 0:
totes.append((i, number))
return dict((totes))
which produces
>>> convertVector([1, 0, 1 , 0, 2, 0, 1, 0, 0, 1, 0])
{0: 1, 9: 1, 2: 1, 4: 2, 6: 1}
[Although, as someone pointed out though I can't find it now, it'd be easier to write totes = {} and assign to it directly using totes[i] = number than go via a list.]

What you're trying to do, it could be done in one line:
>>> dict((index,num) for index,num in enumerate(numbers) if num != 0)
{0: 1, 2: 1, 4: 2, 6: 1, 9: 1}

Yes your understanding of list.index is incorrect. It finds the position of the first item in the list which compares equal with the argument.
To get the index of the current item, you want to iterate over with enumerate:
for index, item in enumerate(iterable):
# blah blah

The problem is that .index() looks for the first occurence of a certain argument. So for your example it always returns 0 if you run it with argument 1.
You could make use of the built in enumerate function like this:
for index, value in enumerate(numbers):
if value != 0:
totes.append((index, value))

Check the documentation for index:
Return the index in the list of the first item whose value is x. It is
an error if there is no such item.
According to this definition, the following code appends, for each value in numbers a tuple made of the value and the first position of this value in the whole list.
totes = []
for i in numbers:
if i!= 0:
totes.append((numbers.index(i),i))
The result in the totes list is correct: [(0, 1), (0, 1), (4, 2), (0, 1), (0, 1)].
When turning it into again, again, the result is correct, since for each possible value, you get the position of its first occurrence in the original list.
You would get the result you want using i as the index instead:
result = {}
for i in range(len(numbers)):
if numbers[i] != 0:
result[i] = numbers[i]

index() returns the index of the first occurrence of the item in the list. Your list has duplicates which is the cause of your confusion. So index(1) will always return 0. You can't expect it to know which of the many instances of 1 you are looking for.
I would write it like this:
totes = {}
for i, num in enumerate(numbers):
if num != 0:
totes[i] = num
and avoid the intermediate list altogether.

Riffing on #DSM:
def convertVector(numbers):
return dict((i, number) for i, number in enumerate(numbers) if number)
Or, on re-reading, as #Rik Poggi actually suggests.

Is there a way in Python to index a list of containers (tuples, lists, dictionaries) by an element of a container?

I have been poking around for a recipe / example to index a list of tuples without taking a modification of the decorate, sort, undecorate approach.
For example:
l=[(a,b,c),(x,c,b),(z,c,b),(z,c,d),(a,d,d),(x,d,c) . . .]
The approach I have been using is to build a dictionary using defaultdict of the second element
from collections import defaultdict
tdict=defaultdict(int)
for myTuple in l:
tdict[myTuple[1]]+=1
Then I have to build a list consisting of only the second item in the tuple for each item in the list. While there are a number of ways to get there a simple approach is to:
tempList=[myTuple[1] for myTuple in l]
and then generate an index of each item in tdict
indexDict=defaultdict(dict)
for key in tdict:
indexDict[key]['index']=tempList.index(key)
Clearly this does not seem very Pythonic. I have been trying to find examples or insights thinking that I should be able to use something magical to get the index directly. No such luck so far.
Note, I understand that I can take my approach a little more directly and not generating tdict.
output could be a dictionary with the index
indexDict={'b':{'index':0},'c':{'index':1},'d':{'index':4},. . .}
After learning a lot from Nadia's responses I think the answer is no.
While her response works I think it is more complicated than needed. I would simply
def build_index(someList):
indexDict={}
for item in enumerate(someList):
if item[1][1] not in indexDict:
indexDict[item[1][1]]=item[0]
return indexDict

This will generate the result you want
dict((myTuple[1], index) for index, myTuple in enumerate(l))
>>> l = [(1, 2, 3), (4, 5, 6), (1, 4, 6)]
>>> dict((myTuple[1], index) for index, myTuple in enumerate(l))
{2: 0, 4: 2, 5: 1}
And if you insist on using a dictionary to represent the index:
dict((myTuple[1], {'index': index}) for index, myTuple in enumerate(l))
The result will be:
{2: {'index': 0}, 4: {'index': 2}, 5: {'index': 1}}
EDIT
If you want to handle key collision then you'll have to extend the solution like this:
def build_index(l):
indexes = [(myTuple[1], index) for index, myTuple in enumerate(l)]
d = {}
for e, index in indexes:
d[e] = min(index, d.get(e, index))
return d
>>> l = [(1, 2, 3), (4, 5, 6), (1, 4, 6), (2, 4, 6)]
>>> build_index(l)
{2: 0, 4: 2, 5: 1}
EDIT 2
And a more generalized and compact solution (in a similar definition to sorted)
def index(l, key):
d = {}
for index, myTuple in enumerate(l):
d[key(myTuple)] = min(index, d.get(key(myTuple), index))
return d
>>> index(l, lambda a: a[1])
{2: 0, 4: 2, 5: 1}
So the answer to your question is yes: There is a way in Python to index a list of containers (tuples, lists, dictionaries) by an element of a container without preprocessing. But your request of storing the result in a dictionary makes it impossible to be a one liner. But there is no preprocessing here. The list is iterated only once.

If i think this is what you're asking...
l = ['asd', 'asdxzc']
d = {}
for i, x in enumerate(l):
d[x] = {'index': i}

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Count the duplicates in a list of tuples - python

You can make count_map, and store the count of each tuple as the value. >>> count_map = {} >>> for t in a: ... count_map[t] = count_map.get(t, 0) +1 ... >>> count_map {(1, 2): 2, (6, 7): 1, (2, 9): 1, (1, 4): 1}

Using pandas this is simple and very fast: import pandas print(pandas.Series(data=[(1,2),(1,4),(1,2),(6,7),(2,9)]).value_counts()) (1, 2) 2 (1, 4) 1 (6, 7) 1 (2, 9) 1 dtype: int64

Related

What's a more Pythonic way of grabbing N items from a dictionary?

Given a list of numbers, how many different ways can you add them together to get a sum S?

How to isolate specific rows of a Cartesian Product - Python

Python: is index() buggy at all?

Is there a way in Python to index a list of containers (tuples, lists, dictionaries) by an element of a container?

Categories

Resources