Python: Compare two lists and create a dictionary

In Python, I have a list of pairs (A) and a list of integers (B). A and B always have the same length. I want to know of a fast way of finding all the elements (pairs) of A that correspond to the same value in B (by comparison of indices of A and B) and then store the values in a dictionary (C) (the keys of the dictionary would correspond to elements of B). As an example, if
A = [(0, 0), (0, 1), (0, 3), (0, 6), (0, 7), (1, 3), (1, 7)]
B = [ 2, 5, 5, 1, 5, 4, 1 ]
then
C = {1: [(0,6),(1,7)], 2: [(0,0)], 4: [(1,3)], 5: [(0,1), (0,3), (0,7)]}
Presently, I am trying this approach:
C = {}
for a, b in zip(A, B):
    C.setdefault(b, [])
    C[b].append(a)
While this approach gives me the desired result, I would like an approach that is much faster, since I need to work with big datasets. I would be thankful if anyone could suggest a fast way to build the dictionary C from the lists A and B.

I would have suggested
C2 = {}
for i in range(len(B)):
    C2.setdefault(B[i], [])
    C2[B[i]].append(A[i])
This would avoid the zip(A, B) step.

import collections
C = collections.defaultdict(list)
for ind, key in enumerate(B):
    C[key].append(A[ind])
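As a minimal sketch, the defaultdict idea can also be combined with the zip() loop from the question and wrapped in a function. The name group_by_label is only illustrative and does not come from the answers; whether this is faster than the other variants on a given dataset is something to measure (for example with timeit) rather than assume.

```python
from collections import defaultdict


def group_by_label(pairs, labels):
    """Group each item of `pairs` under the label found at the same index."""
    groups = defaultdict(list)
    for pair, label in zip(pairs, labels):
        groups[label].append(pair)
    # Convert back to a plain dict so missing keys raise KeyError again.
    return dict(groups)


A = [(0, 0), (0, 1), (0, 3), (0, 6), (0, 7), (1, 3), (1, 7)]
B = [2, 5, 5, 1, 5, 4, 1]
C = group_by_label(A, B)
print(C)
```

All of these variants do a single pass over the data, so any speed difference between them comes from constant factors only.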

Related

Using enumerate() to enumerate items with letters rather than numbers

I'm trying to use the built-in function enumerate() to label some points or vertices, where each point is represented by its coordinates in a list (or set) of tuples that essentially looks like {(4,5), (6,8), (1,2)}.
I want to assign a letter, starting from "a" and in ascending order, to each tuple in this set. enumerate() does almost exactly this, but it returns the index of each item, which is a number starting from 0.
Is there any way to do this other than writing my own enumerate()?
Check this out:
import string
tup = {(4,5), (6,8), (1,2)}
dic = {i: j for i, j in zip(string.ascii_lowercase, tup)}
This returns (the pairing can vary, because a set has no guaranteed iteration order):
{'a': (4, 5), 'b': (6, 8), 'c': (1, 2)}
This is enumerate's signature.
enumerate(iterable, start=0)
Use 65 as start for 'A' or 97 for 'a', and convert each index back to a letter with chr().
lst=[(1,2),(2,3),(3,4),...]
for idx,val in enumerate(lst,65):
    print(chr(idx),val)
A (1, 2)
B (2, 3)
C (3, 4)
Maybe this is a way to get what you want, using chr():
L = [(4,5), (6,8), (1,2)]
for k, v in enumerate(L):
    print(chr(65 + k), v)
Output :
A (4, 5)
B (6, 8)
C (1, 2)
The enumerate function is defined as follows:
enumerate(iterable, start=0)
I think you just have to write your own enumerate or a wrapper around enumerate.
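One possible shape for such a wrapper is sketched below. The name enumerate_letters and its letters parameter are hypothetical, not part of the standard library. It relies on zip() stopping at the shorter input, so with the default alphabet it labels at most 26 items.

```python
import string


def enumerate_letters(iterable, letters=string.ascii_lowercase):
    """Yield (letter, item) pairs, like enumerate() but with letters.

    zip() stops at the shorter input, so items beyond the last
    available letter are silently dropped.
    """
    return zip(letters, iterable)


points = [(4, 5), (6, 8), (1, 2)]
labelled = list(enumerate_letters(points))
for letter, point in labelled:
    print(letter, point)
```

Passing string.ascii_uppercase as letters gives the 'A', 'B', 'C' labels from the chr(65 + k) examples.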

Find overlapping elements in a list of tuples?

From my understanding of the intersection function, it finds complete overlap between elements in a list. For example:
tup_1 = [(1,2,3),(4,5,6)]
tup_2 = [(4,5,6)]
ol_tup = set(tup_1).intersection(tup_2)
print(ol_tup)
would yield:
{(4, 5, 6)}
However, suppose my lists of tuples are set up like this:
tup_1 = [(1,2,3),(4,5,5)]
tup_2 = [(4,5,6)]
Here there is an overlap in 2 elements between the 2nd tuple in tup_1 and the 1st tuple in tup_2. If I want Python to return these 2 tuples, (4,5,5) and (4,5,6), is there an easier way than the nested for loop below?
for single_tuple_1 in tup_1:
    for single_tuple_2 in tup_2:
        if single_tuple_1[0] == single_tuple_2[0] and single_tuple_1[1] == single_tuple_2[1]:
            print(single_tuple_1, single_tuple_2)
EDIT:
For this case, suppose order matters and suppose the tuples contain 5 elements:
tup_1 = [(1,2,3,4,5),(4,5,6,7,8),(11,12,13,14,15)]
tup_2 = [(1,2,3,4,8),(4,5,1,7,8),(11,12,13,14,-5)]
And I would like to find the tuples that intersect with each other in their respective first 4 elements. So the result should be:
[(1,2,3,4,5),(1,2,3,4,8),(11,12,13,14,15),(11,12,13,14,-5)]
How would the code change to accommodate this?
If you want to return all the pairs of "overlapping" tuples there's no way around comparing all the pairs, i.e. a quadratic algorithm. But you could make the code a bit more elegant using a list comprehension, product for the combinations and zip and sum for the comparison:
>>> import itertools
>>> tup_1 = [(1,2,3),(4,5,5),(7,8,9)]
>>> tup_2 = [(4,5,6),(0,5,5),(9,8,7)]
>>> [(a, b) for (a, b) in itertools.product(tup_1, tup_2)
...  if sum(1 for ai, bi in zip(a, b) if ai == bi) >= 2]
[((4, 5, 5), (4, 5, 6)), ((4, 5, 5), (0, 5, 5))]
Note: This checks whether two tuples have the same element in at least two positions, i.e. order matters. If order should not matter, you can convert a and b to set instead and check the size of their intersection, but that might fail for repeated numbers, i.e. the intersection of (1,1,2) and (1,1,3) would just be 1 instead of 2.
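For the order-independent case, one way around the repeated-number problem is to compare multisets instead of sets. The sketch below uses collections.Counter, whose & operator keeps the minimum count of each element; shared_count is an illustrative name, not something from the answer.

```python
from collections import Counter


def shared_count(a, b):
    """Count the elements two tuples share, ignoring position but
    respecting repeats (multiset intersection)."""
    return sum((Counter(a) & Counter(b)).values())


print(shared_count((1, 1, 2), (1, 1, 3)))      # multiset overlap
print(len(set((1, 1, 2)) & set((1, 1, 3))))    # plain set overlap
```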
If you only want to match the first two elements, or either the first two or the last two, you can compare slices of the tuples and combine the comparisons with or:
>>> [(a, b) for (a, b) in itertools.product(tup_1, tup_2)
...  if a[:2] == b[:2]]
[((4, 5, 5), (4, 5, 6))]
>>> [(a, b) for (a, b) in itertools.product(tup_1, tup_2)
...  if a[:2] == b[:2] or a[-2:] == b[-2:]]
[((4, 5, 5), (4, 5, 6)), ((4, 5, 5), (0, 5, 5))]
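When the match is on a fixed prefix, as in the five-element case from the question's edit, the slice can also serve as a dictionary key, so each tuple is only compared against tuples with the same prefix. This is a sketch under that assumption; match_on_prefix is a hypothetical helper name, and the approach does not cover the "any two positions" test above.

```python
from collections import defaultdict


def match_on_prefix(tup_1, tup_2, n):
    """Return (a, b) pairs whose first n elements are equal."""
    # Index the second list by prefix so lookups are by hash.
    by_prefix = defaultdict(list)
    for b in tup_2:
        by_prefix[b[:n]].append(b)
    pairs = []
    for a in tup_1:
        # .get() avoids creating empty entries for unmatched prefixes.
        for b in by_prefix.get(a[:n], []):
            pairs.append((a, b))
    return pairs


tup_1 = [(1, 2, 3, 4, 5), (4, 5, 6, 7, 8), (11, 12, 13, 14, 15)]
tup_2 = [(1, 2, 3, 4, 8), (4, 5, 1, 7, 8), (11, 12, 13, 14, -5)]
pairs = match_on_prefix(tup_1, tup_2, 4)
print(pairs)
```

If many tuples share the same prefix, the number of pairs (and so the running time) can still grow quadratically.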
This is one way using a list comprehension. The logic as written checks for an overlap of at least 2 elements.
Note that if there is no overlap you will be left with the one element of tup_2, but that can be trivially identified.
from itertools import chain
tup_1 = [(1,2,3),(4,5,5)]
tup_2 = [(4,5,6)]
y = sorted(tup_2[0])
# compare each tuple, sorted, element by element against the sorted reference tuple
res = [t for t in chain(tup_1, tup_2) if
       sum(a == b for a, b in zip(sorted(t), y)) > 1]
print(res)
[(4, 5, 5), (4, 5, 6)]

All combinations of list elements in certain order

I have a list as follows:
((0,n1,n2,...,nX),(0,n1,n2,...,nY),(1,n1,n2,...,nZ),(2,n1,n2,...,nR),(2,n1,n2,...,nS))
I would like to return all possible combinations of list elements in such way:
(0,n1,n2,...,nX),(1,n1,n2,...,nZ),(2,n1,n2,...,nR)
(0,n1,n2,...,nY),(1,n1,n2,...,nZ),(2,n1,n2,...,nR)
(0,n1,n2,...,nX),(1,n1,n2,...,nZ),(2,n1,n2,...,nS)
(0,n1,n2,...,nY),(1,n1,n2,...,nZ),(2,n1,n2,...,nS)
I have worked out that I need to iterate through the elements and group them by their first item.
That would let me use a for loop and build all the combinations manually.
However, I wonder if there is a better approach.
I need to keep in mind that the elements must be in ascending order by their first item: 0, 1, 2.
EDIT:
This is my list in other words:
((0,A), (0,B), (1,C), (2,D),(2,E))
How can I return the following?
(0,A),(1,C),(2,D)
(0,B),(1,C),(2,D)
(0,A),(1,C),(2,E)
(0,B),(1,C),(2,E)
The problem becomes easier if you change your data structures a bit. More specifically, just group all elements with the same "ID" in the same list.
For your example you have 3 lists:
a = [(0,n1,n2,...,nX),(0,n1,n2,...,nY)]
b = [(1,n1,n2,...,nZ)]
c = [(2,n1,n2,...,nR),(2,n1,n2,...,nS)]
Let me know if you have trouble separating the lists out like this, and I'll amend my answer.
Then you can use the itertools.product function to get all the combinations that you want.
import itertools
for i in itertools.product(a, b, c):
    print(i)
Or if you want to see all the combinations as a list you can simply do:
list(itertools.product(a, b, c))
Similarly, you can use tuple() or set() if you want to see all the combinations as a tuple or a set.
EDIT:
If you do not have your elements already grouped together, and instead you have a flattened list (or tuple) of tuples, you can create a list that groups the tuples according to their "ID" (i.e., the first value of each tuple). Here's a function to do it. I assume there is no order in how the tuples are initially given (otherwise we can probably make this grouping more efficient).
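def groupList(flatlist):
    tempdict = {}
    for element in flatlist:
        key = element[0]  # the "ID" is the first value of each tuple
        if key in tempdict:
            tempdict[key].append(element)
        else:
            tempdict[key] = [element]
    # return the groups in ascending order of their ID
    return [tempdict[key] for key in sorted(tempdict)]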
Now you can use this "grouped" list to get all the combinations. Let's assume that your initial list is l; then you can do:
list(itertools.product(*groupList(l)))
Notice the * when passing the argument. This tells Python to unpack the list and pass its elements as separate arguments to the function.
Example Input:
l = ((0, 10), (0, 20), (1, 30), (2, 40), (2, 50))
Example Output:
[((0, 10), (1, 30), (2, 40)), ((0, 10), (1, 30), (2, 50)), ((0, 20),
(1, 30), (2, 40)), ((0, 20), (1, 30), (2, 50))]
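As an alternative sketch, the grouping step can be done with itertools.groupby on a sorted copy of the input, which also guarantees that the groups come out in ascending order of their first item. The function name ordered_combinations is illustrative only.

```python
from itertools import groupby, product
from operator import itemgetter


def ordered_combinations(flat):
    """Pick one tuple per ID (first item), with IDs in ascending order."""
    first = itemgetter(0)
    # groupby only merges adjacent items, so sort by ID first.
    # sorted() is stable, so items keep their relative order per ID.
    ordered = sorted(flat, key=first)
    groups = [list(group) for _, group in groupby(ordered, key=first)]
    return list(product(*groups))


l = ((2, 40), (0, 10), (1, 30), (0, 20), (2, 50))
combos = ordered_combinations(l)
for combo in combos:
    print(combo)
```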

Return a sequence of a variable length whose summation is equal to a given integer

In the form f(x,y,z) where x is a given integer sum, y is the minimum length of the sequence, and z is the maximum length of the sequence. But for now let's pretend we're dealing with a sequence of a fixed length, because it will take me a long time to write the question otherwise.
So our function is f(x,r) where x is a given integer sum and r is the length of a sequence in the list of possible sequences.
For x = 10, and r = 2, these are the possible combinations:
1 + 9
2 + 8
3 + 7
4 + 6
5 + 5
Let's store that in Python as a list of pairs:
[(1,9), (2,8), (3,7), (4,6), (5,5)]
So usage looks like:
>>> f(10,2)
[(1,9), (2,8), (3,7), (4,6), (5,5)]
Back to the original question, where a list of sequences is returned for each length in the range (y, z). In the form f(x,y,z) defined earlier, and leaving out sequences of length 1 (where y-z == 0), this would look like:
>>> f(10,1,3)
[{1: [(1,9), (2,8), (3,7), (4,6), (5,5)],
2: [(1,1,8), (1,2,7), (1,3,6) ... (2,4,4) ...],
3: [(1,1,1,7) ...]}]
So the output is a list of dictionaries where the value is a list of pairs. Not exactly optimal.
So my questions are:
Is there a library that handles this already?
If not, can someone help me write both of the functions I mentioned? (fixed sequence length first)?
Because of the huge gaps in my knowledge of fairly trivial math, could you ignore my approach to integer storage and use whatever structure makes the most sense?
Sorry about all of these arithmetic questions today. Thanks!
The itertools module will definitely be helpful as we're dealing with permutations. However, this looks suspiciously like a homework task...
Edit: Looks like fun though, so I'll do an attempt.
Edit 2: This what you want?
from itertools import combinations_with_replacement
from pprint import pprint
f = lambda target_sum, length: [sequence for sequence in combinations_with_replacement(range(1, target_sum+1), length) if sum(sequence) == target_sum]
def f2(target_sum, min_length, max_length):
    sequences = {}
    for length in range(min_length, max_length + 1):
        sequence = f(target_sum, length)
        if len(sequence):
            sequences[length] = sequence
    return sequences
if __name__ == "__main__":
    print("f(10,2):")
    print(f(10,2))
    print()
    print("f(10,1,3)")
    pprint(f2(10,1,3))
Output:
f(10,2):
[(1, 9), (2, 8), (3, 7), (4, 6), (5, 5)]
f(10,1,3)
{1: [(10,)],
 2: [(1, 9), (2, 8), (3, 7), (4, 6), (5, 5)],
 3: [(1, 1, 8),
     (1, 2, 7),
     (1, 3, 6),
     (1, 4, 5),
     (2, 2, 6),
     (2, 3, 5),
     (2, 4, 4),
     (3, 3, 4)]}
The problem is known as Integer Partitions, and has been widely studied.
Here you can find a paper comparing the performance of several algorithms (and proposing a particular one), but there are a lot of references all over the Net.
I just wrote a recursive generator function, you should figure out how to get a list out of it yourself...
def f(x,y):
    if y == 1:
        yield (x, )
    elif y > 1:
        for head in range(1, x-y+2):
            for tail in f(x-head, y-1):
                yield tuple([head] + list(tail))
def f2(x,y,z):
    for u in range(y, z+1):
        for v in f(x, u):
            yield v
EDIT: I just noticed this is not exactly what you wanted: my version also generates duplicates where only the ordering differs. But you can filter them out by sorting each result and discarding duplicate tuples.
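A possible way to avoid the duplicates in the first place, rather than filtering them afterwards, is to only generate non-decreasing sequences by passing the smallest allowed value down the recursion. This is a sketch of that idea, not the answerer's code; the names partitions and smallest are made up for illustration.

```python
def partitions(total, length, smallest=1):
    """Yield non-decreasing tuples of `length` positive integers
    that sum to `total`, each element being at least `smallest`."""
    if length == 1:
        if total >= smallest:
            yield (total,)
        return
    # The first element can be at most total // length, otherwise
    # the remaining (larger or equal) elements would overshoot.
    for head in range(smallest, total // length + 1):
        for tail in partitions(total - head, length - 1, head):
            yield (head,) + tail


pairs = list(partitions(10, 2))
triples = list(partitions(10, 3))
print(pairs)
print(triples)
```

Because every tuple is built in non-decreasing order, each combination appears exactly once and no post-processing is needed.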

construct graph from python set type

The short question: is there an off-the-shelf function to make a graph from a collection of Python sets?
The longer question: I have several Python sets. Some overlap, and some are subsets of others. I would like to make a graph (as in nodes and edges) where the nodes are the elements in the sets and the edges are the intersections of the sets, weighted by the number of elements in each intersection. There are several graph packages for Python (NetworkX, igraph, ...), but I am not familiar with any of them. Will any of them make a graph directly from a list of sets, i.e. MakeGraphfromSets(alistofsets)?
If not, do you know of an example of how to take the list of sets and define the edges? It actually looks like it might be straightforward, but an example is always good to have.
It's not too hard to code yourself:
def intersection_graph(sets):
    adjacency_list = {}
    for i, s1 in enumerate(sets):
        for j, s2 in enumerate(sets):
            if j == i:
                continue
            try:
                lst = adjacency_list[i]
            except KeyError:
                adjacency_list[i] = lst = []
            weight = len(s1.intersection(s2))
            lst.append( (j, weight) )
    return adjacency_list
This function numbers each set with its index within sets. We do this because dict keys must be hashable, which is true of integers but not of (mutable) sets.
Here's an example of how to use this function, and its output:
>>> sets = [set([1,2,3]), set([2,3,4]), set([4,2])]
>>> intersection_graph(sets)
{0: [(1, 2), (2, 1)], 1: [(0, 2), (2, 2)], 2: [(0, 1), (1, 2)]}
def MakeGraphfromSets(sets):
    egs = []
    l = len(sets)
    for i in range(l):
        for j in range(i,l):
            w = sets[i].intersection(sets[j])
            egs.append((i,j,len(w)))
    return egs
# (source set index,destination set index,length of intersection)
sets = [set([1,2,3]), set([2,3,4]), set([4,2])]
edges = MakeGraphfromSets(sets)
for e in edges:
    print(e)
OUTPUT:
(0, 0, 3)
(0, 1, 2)
(0, 2, 1)
(1, 1, 3)
(1, 2, 2)
(2, 2, 2)
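If self-loops and zero-weight edges are not wanted, a variant along the same lines can use itertools.combinations to visit each unordered pair of sets once. This is only a sketch; weighted_edges is an illustrative name, and the resulting (i, j, weight) triples would still need to be handed to a graph package separately.

```python
from itertools import combinations


def weighted_edges(sets):
    """Return (i, j, weight) for every pair of distinct sets that overlap,
    where weight is the size of their intersection."""
    edges = []
    for (i, s1), (j, s2) in combinations(enumerate(sets), 2):
        weight = len(s1 & s2)
        if weight:  # skip pairs with an empty intersection
            edges.append((i, j, weight))
    return edges


sets = [{1, 2, 3}, {2, 3, 4}, {4, 2}]
edges = weighted_edges(sets)
for e in edges:
    print(e)
```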
