Efficiently combine two uneven lists into dictionary based on condition - python

I have two lists of tuples and I want to map elements in e to elements in s based on a condition. The condition is that element 1 of a tuple in e must be >= element 1 of the tuple in s, and the sum of elements 1 and 2 in e must be <= the sum of elements 1 and 2 in s (indices are zero-based, as in the code below). In each s tuple, element 1 is a start position and element 2 is a length. I can do it as follows:
e = [('e1',3,3),('e2',6,2),('e3',330,3)]
s = [('s1',0,10),('s2',11,24),('s3',35,35),('s4',320,29)]
for i in e:
    d = {i: j for j in s if i[1] >= j[1] and i[1] + i[2] <= j[1] + j[2]}
    print(d)
Output (what I want):
{('e1', 3, 3): ('s1', 0, 10)}
{('e2', 6, 2): ('s1', 0, 10)}
{('e3', 330, 3): ('s4', 320, 29)}
Is there a more efficient way to get to this result, ideally without the for loop (or at least without a double loop)? I have tried some things with zip, as well as something along the lines of
list(map(lambda i,j: {i:j} if i[1]>=j[1] and i[1]+i[2]<=j[1]+j[2] else None, e, s))
but it is not giving me quite what I am looking for.
The elements in s will never overlap. For example, you wouldn't have ('s1',0,10) and ('s2', 5,15). In other words, the range (0, 0+10) will never overlap with (5,5+15) or that of any other tuple. Additionally, all the tuples in e will be unique.

The constraint that the tuples in s can't overlap is pretty strong. In particular, it implies that each value in e can only match with at most one value in s (I think the easiest way to show that is to assume two distinct, non-overlapping tuples in s match with a single element in e and derive a contradiction).
Because of that property, any s-tuple s1 matching an e-tuple e1 has the property that among all tuples sx in s with sx[1]<=e1[1], it is the one with the greatest sum(sx[1:]), since if it weren't then the s-tuple with a small enough 1st coordinate and greater sum would also be a match (which we already know is impossible).
That observation lends itself to a fairly simple algorithm where we linearly walk through both e and s (sorted), keeping track of the s-tuple with the biggest sum. If that sum is big enough compared to the e-tuple we're looking at, add the pair to our result set.
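def pairs(e, s):
    # Might be able to skip or replace with .sort() depending
    # on the rest of your program
    e = sorted(e, key=lambda t: t[1])
    s = sorted(s, key=lambda t: t[1])
    i, max_seen, max_sum, results = 0, None, None, {}
    for ex in e:
        # Advance past every s-tuple that starts at or before ex, remembering
        # the one whose end (start + length) is largest (:= needs Python 3.8+)
        while i < len(s) and (sx := s[i])[1] <= ex[1]:
            if max_seen is None or sum(sx[1:]) > max_sum:
                max_seen, max_sum = sx, sum(sx[1:])
            i += 1
        # ex matches if it ends at or before the end of that s-tuple
        if max_seen is not None and max_sum >= sum(ex[1:]):
            results[ex] = max_seen
    return results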
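Because s can be sorted once and its ranges never overlap, the linear walk can also be replaced by a binary search per e-tuple. The sketch below is a swapped-in variant using the standard-library bisect module, not the answer's own method; `match_intervals` is a hypothetical name, and it relies on the same no-overlap guarantee stated in the question.

```python
from bisect import bisect_right


def match_intervals(e, s):
    """Map each e-tuple to the s-tuple whose [start, start + length] range contains it."""
    # Sort s by start; because the ranges never overlap, their ends are sorted too.
    s_sorted = sorted(s, key=lambda t: t[1])
    starts = [t[1] for t in s_sorted]
    result = {}
    for ex in e:
        # Index of the last s-tuple whose start is <= ex's start (-1 if none).
        k = bisect_right(starts, ex[1]) - 1
        if k >= 0:
            cand = s_sorted[k]
            # ex must also end at or before the end of that s-tuple.
            if ex[1] + ex[2] <= cand[1] + cand[2]:
                result[ex] = cand
    return result


e = [('e1', 3, 3), ('e2', 6, 2), ('e3', 330, 3)]
s = [('s1', 0, 10), ('s2', 11, 24), ('s3', 35, 35), ('s4', 320, 29)]
print(match_intervals(e, s))
# {('e1', 3, 3): ('s1', 0, 10), ('e2', 6, 2): ('s1', 0, 10), ('e3', 330, 3): ('s4', 320, 29)}
```

Sorting costs O(n log n) once and each lookup is O(log n), so the total is O((m + n) log n) for m e-tuples and n s-tuples.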

Related

Using tuples as indexes to compare items across a list

I have a list of three tuples and a list of three strings:
pairs = [(0, 1), (0, 2), (1, 2)]
values = ['aac', 'ccc', 'caa']
I would like to use the elements of the pairs as indexes to compare the strings in the following way:
The first pair of indexes, (0, 1), operates across the first letter of each string: a from the first, c from the second, and c from the third. That is, it compares the values at indexes 0 and 1 of the sequence a, c, c. Since a is lexically less than c, this comparison should give 'smaller'.
The second pair is (0, 2) and operates across the second letter of each string: a, c, a. Since they're both a, the result should be 'equal'.
Finally, (1, 2) is checked on c, c, a, resulting in 'bigger'.
So the total expected output is the following list:
['smaller', 'bigger', 'equal']
I have tried the following code:
n = 0
for x, y in pairs:
    if ord(values[x][n]) > ord(values[y][n]):
        print('bigger')
        n += 1
    elif ord(values[x][n]) < ord(values[y][n]):
        print('smaller')
        n += 1
    else:
        print('equal')
        n += 1
However, not only does it print the results instead of building a list, it also gives incorrect results (smaller, equal, bigger). How do I achieve my intended result?
You could use a list comprehension with the zip function to combine the two lists:
pairs = [(0, 1), (0, 2), (1, 2)]
values = ['aac', 'ccc', 'caa']
result = [("smaller", "equal", "bigger")[(v[x] > v[y]) + (v[x] >= v[y])]
          for v, (x, y) in zip(zip(*values), pairs)]
print(result)
['smaller', 'equal', 'bigger']
zip(*values) will create tuples with the nth character of each string: ('a','c','c'), ('a','c','a'), ('c','c','a')
zip(zip(*values),pairs) combines those character tuples with each corresponding pair: (('a','c','c'),(0,1)), (('a','c','a'),(0,2)), (('c','c','a'),(1,2))
These become v (the nth character of each value) and x, y (the nth index pair).
The appropriate keyword is then chosen from ("smaller", "equal", "bigger") using index 0, 1 or 2.
Python treats True as 1 and False as 0 when adding booleans (comparison results), so the index is 1+1 = 2 if v[x] is greater than v[y], 0+1 = 1 if v[x] equals v[y], and 0 otherwise. By the way, you don't need ord() to compare characters.
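If the tuple-indexing trick feels too clever, the same boolean arithmetic can be written as (a > b) - (a < b), the usual Python 3 stand-in for the removed cmp() function, which yields -1, 0 or 1. A sketch, with compare_columns as a hypothetical helper name:

```python
def compare_columns(pairs, values):
    """For the n-th (x, y) pair, compare the n-th characters of values[x] and values[y]."""
    labels = {-1: 'smaller', 0: 'equal', 1: 'bigger'}
    result = []
    # zip(*values) yields the n-th character of every string as one tuple
    for column, (x, y) in zip(zip(*values), pairs):
        a, b = column[x], column[y]
        # (a > b) - (a < b) is -1, 0 or 1, like the cmp() function from Python 2
        result.append(labels[(a > b) - (a < b)])
    return result


pairs = [(0, 1), (0, 2), (1, 2)]
values = ['aac', 'ccc', 'caa']
print(compare_columns(pairs, values))  # ['smaller', 'equal', 'bigger']
```

A dict keyed by -1/0/1 makes the mapping from comparison result to label explicit instead of relying on the position inside a tuple.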
Your code is good, there are just a few things to improve.
No need to compare ord() values; you can compare characters directly.
Instead of printing, save each step in a list.
You can use enumerate to get the position n alongside each pair:
out = []
for n, (x, y) in enumerate(pairs):
    if values[x][n] > values[y][n]:
        out.append('bigger')
    elif values[x][n] < values[y][n]:
        out.append('smaller')
    else:
        out.append('equal')
out
Output:
>>> out
['smaller', 'equal', 'bigger']
NB. I am not commenting on the overall logic, as what you ultimately want to do was not made explicit.

Check if any keys in a dictionary match any values

I'm trying to solve this problem:
Let d(n) be defined as the sum of proper divisors of n (numbers less than n which divide evenly into n).
If d(a) = b and d(b) = a, where a ≠ b, then a and b are an amicable pair and each of a and b are called amicable numbers.
For example, the proper divisors of 220 are 1, 2, 4, 5, 10, 11, 20, 22, 44, 55 and 110; therefore d(220) = 284. The proper divisors of 284 are 1, 2, 4, 71 and 142; so d(284) = 220.
Evaluate the sum of all the amicable numbers under 10000.
I came up with a dictionary that holds x:d(x) for all numbers 0 to 9999 like so:
sums = {x:sum(alecproduct.find_factors(x))-x for x,y in enumerate(range(10**4))}
Where alecproduct.find_factors is a function from my own module that returns a list of all the factors of a number.
I'm not sure where to go from here, though. I've tried iterating over the dictionary and creating tuples out of each k-v pair like so:
for k, v in sums.items():
    dict_tups.append((k, v))
But I don't think this helps me. Any advice on how I can detect if any of the dictionary keys match any of the dictionary values?
Edit - My solution based on 6502's answer:
sums, ap = {x: sum(find_factors(x)) - x for x, y in enumerate(range(10**4))}, []
for x in sums:
    y = sums[x]
    if sums.get(y) == x and x != y:
        ap.append(x)
print(ap)
print('\nSum: ', sum(ap))
Your problem is almost solved already... just get all the pairs out:
for x in my_dict:
    y = my_dict[x]
    if my_dict.get(y) == x:
        # x/y is an amicable pair
        ...
Note that every pair will be extracted twice (both x/y and y/x), and perfect numbers (numbers equal to the sum of their proper divisors, such as 6) once, as x/x. Since the problem requires a ≠ b, pairs like 6/6 are not amicable and need an extra x != y check.
This code should give you a list of all keys that are also values.
my_test = [key for key in my_dict.keys() if key in my_dict.values()]
You don't need .keys(), because iterating over a dictionary yields its keys by default; however, I wanted to be explicit for this example.
Alternatively, a for loop example can be seen below.
all_values = set(my_dict.values())  # a set makes each membership test O(1)
for key in my_dict:
    if key in all_values:
        print(key)  # or do stuff
Iterating through the keys and values of your sums dictionary to build a new list of all the amicable numbers solves the problem. Here is the code snippet:
amicable_list = []
for i in sums.keys():
    if i in sums.values():
        if (sums.get(sums.get(i, -1), -1) == i) and (i != sums[i]):
            amicable_list.append(i)
You could use sets:
x_dx = {(x, sum(alecproduct.find_factors(x)) - x) for x in range(10 ** 4)}
x_dx = {t for t in x_dx if t[0] != t[1]}
dx_x = {(t[1], t[0]) for t in x_dx}
amicable_pairs = x_dx & dx_x
As in 6502's answer, all amicable pairs are extracted twice.
A way to remove these 'duplicates' could be (although it's certainly a mouthful):
amicable_pairs_sorted = {tuple(sorted(t)) for t in amicable_pairs}
amicable_pairs_ascending = sorted(list(amicable_pairs_sorted))
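For reference, here is a self-contained sketch of the whole computation that does not depend on the alecproduct module. The helper d() is a hypothetical trial-division stand-in for sum(find_factors(x)) - x, and the loop applies the same sums.get(y) == x and x != y test as the edited solution in the question, counting each amicable number once.

```python
def d(n):
    """Sum of the proper divisors of n (divisors strictly less than n)."""
    if n < 2:
        return 0
    total = 1  # 1 divides every n >= 2
    i = 2
    while i * i <= n:
        if n % i == 0:
            total += i
            other = n // i
            if other != i:  # don't count a square root twice
                total += other
        i += 1
    return total


def amicable_sum(limit):
    sums = {x: d(x) for x in range(limit)}
    total = 0
    for a, b in sums.items():
        # Look d(b) up in the table, or compute it if b falls outside the table
        d_b = sums[b] if b in sums else d(b)
        if a != b and d_b == a:  # a != b excludes perfect numbers such as 6
            total += a
    return total


print(amicable_sum(10000))  # 31626
```

Each amicable number appears once as a key, so nothing is double-counted and no de-duplication step is needed.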

How to form an unique collection with one element taken from each array?

Say I have 3 integer arrays: {1,2,3}, {2,3}, {1}
I must take exactly one element from each array, to form a new array where all numbers are unique. In this example, the correct answers are: {2,3,1} and {3,2,1}. (Since I must take one element from the 3rd array, and I want all numbers to be unique, I must never take the number 1 from the first array.)
What I have done:
for a in array1:
    for b in array2:
        for c in array3:
            if a != b and a != c and b != c:
                AddAnswer(a, b, c)
This is brute force, which works, but it doesn't scale well. What if we are now dealing with 20 arrays instead of just 3? I don't think it's good to write 20 nested for loops. Is there a clever way to do this?
What about:
import itertools
arrs = [[1, 2, 3], [2, 3], [1]]
for x in itertools.product(*arrs):
    if len(set(x)) < len(arrs):
        continue
    AddAnswer(x)
AddAnswer(x) is called twice, with the tuples:
(2, 3, 1)
(3, 2, 1)
You can think of this as finding a matching in a bipartite graph.
You are trying to select one element from each set, but are not allowed to select the same element twice, so you are trying to match sets to numbers.
You can use the matching function in the graph library NetworkX to do this efficiently.
Python example code:
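import networkx as nx

A = [[1, 2, 3], [2, 3], [1]]
numbers = set()
for s in A:
    for n in s:
        numbers.add(n)
B = nx.Graph()
for n in numbers:
    B.add_node(n, bipartite=1)  # same (int) node ids as used by the edges below
set_names = []
for i, s in enumerate(A):
    set_name = 's%d' % i
    set_names.append(set_name)
    B.add_node(set_name, bipartite=0)
    for n in s:
        B.add_edge(set_name, n)
# maximum_matching (Hopcroft-Karp) finds a largest matching; maximal_matching is
# only greedy and can miss a complete matching even when one exists
matching = nx.bipartite.maximum_matching(B, top_nodes=set_names)
if not all(name in matching for name in set_names):
    print('No complete matching')
else:
    for set_name in set_names:
        print('choose', matching[set_name], 'from', set_name)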
This is a simple, efficient method for finding a single matching.
If you want to enumerate all matchings, you may wish to read "Algorithms for Enumerating All Perfect, Maximum and Maximal Matchings in Bipartite Graphs" by Takeaki Uno, which gives O(V) complexity per matching.
A recursive solution (not tested):
def max_sets(list_of_sets, excluded=()):
    if not list_of_sets:
        return [set()]
    else:
        res = []
        for x in list_of_sets[0]:
            if x not in excluded:
                # recurse on the remaining sets, forbidding the values chosen so far
                for candidate in max_sets(list_of_sets[1:], excluded + (x,)):
                    candidate.add(x)
                    res.append(candidate)
        return res
(You could probably dispense with the set but it's not clear if it was in the question or not...)
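Building on the recursive idea, a generator can keep the order of the choices (so you get tuples like the itertools.product answer) while pruning a branch as soon as a value repeats, instead of generating every combination and filtering afterwards. distinct_choices is a hypothetical name, and the worst case is still exponential; it just prunes earlier.

```python
def distinct_choices(arrays, used=frozenset()):
    """Yield tuples that take one element from each array, with no value repeated."""
    if not arrays:
        yield ()  # exactly one way to choose from zero arrays: the empty tuple
        return
    for x in arrays[0]:
        if x not in used:
            # Only recurse while x is still free; this prunes whole subtrees early.
            for rest in distinct_choices(arrays[1:], used | {x}):
                yield (x,) + rest


print(list(distinct_choices([[1, 2, 3], [2, 3], [1]])))  # [(2, 3, 1), (3, 2, 1)]
```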

Sort list of lists by unique reversed absolute condition

Context - developing algorithm to determine loop flows in a power flow network.
Issue:
I have a list of lists, each list represents a loop within the network determined via my algorithm. Unfortunately, the algorithm will also pick up the reversed duplicates.
e.g.
L1 = [a, b, c, -d, -a]
L2 = [a, d, c, -b, -a]
(Please note that c should not be negative, it is correct as written due to the structure of the network and defined flows)
Now these two loops are equivalent, simply following the reverse structure throughout the network.
I wish to retain L1, whilst discarding L2 from the list of lists.
Thus, if I have a list of 6 loops, of which 3 are reversed duplicates, I wish to retain the other three.
Additionally, the loop does not have to follow the format specified above. It can be shorter or longer, and the sign structure (e.g. pos pos pos neg neg) will not be the same in all instances.
I have been attempting to sort this by reversing the list and comparing the absolute values.
I am completely stumped and any assistance would be appreciated.
Based upon some of the code provided by mgibson I was able to create the following.
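def Check_Dup(Loops):
    Act = []
    while Loops:
        L = Loops.pop(0)  # take from the front so the first of each pair (L1) is kept
        Act.append(L)
        Loops = Popper(Loops, L)
    return Act

def Popper(Loops, L):
    # Build a new list rather than calling Loops.remove() while iterating
    # over Loops, which would skip elements
    kept = []
    for loop in Loops:
        Rev = loop[::-1]
        is_dup = len(loop) == len(L) and all(abs(x) == abs(y) for x, y in zip(L, Rev))
        if not is_dup:
            kept.append(loop)
    return kept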
This code runs until there are no loops left, discarding the duplicates each time. I'm accepting mgibson's answer as it provided the necessary keys to create the solution.
I'm not sure I get your question, but reversing a list is easy:
a = [1,2]
a_rev = a[::-1] #new list -- if you just want an iterator, reversed(a) also works.
To compare the absolute values of a and a_rev:
all( abs(x) == abs(y) for x,y in zip(a,a_rev) )
which can be simplified to:
all( abs(x) == abs(y) for x,y in zip(a,reversed(a)) )
Now, in order to make this as efficient as possible, I would first sort the lists by a key that is the same for a loop and its reversal, e.g. the smaller of its absolute values read forwards and backwards:
def abs_key(x):
    fwd = [abs(v) for v in x]
    return min(fwd, fwd[::-1])
your_list_of_lists.sort(key=abs_key)
Now duplicates end up next to each other in the sorted list (unless several different loops share the same absolute values), and you can just pull them out using enumerate:
def cmp_list(x, y):
    return x == y or all(abs(a) == abs(b) for a, b in zip(x, reversed(y)))
duplicate_idx = [idx for idx, val in enumerate(your_list_of_lists[1:])
                 if cmp_list(val, your_list_of_lists[idx])]
# now remove duplicates:
for idx in reversed(duplicate_idx):
    _ = your_list_of_lists.pop(idx)
If your (sub) lists are either strictly increasing or strictly decreasing, this becomes MUCH simpler.
lists = list(set( tuple(sorted(x)) for x in your_list_of_lists ) )
I don't see how they can be equivalent if you have c in both directions - one of them must be -c
>>> a,b,c,d = range(1,5)
>>> L1 = [a, b, c, -d, -a]
>>> L2 = [a, d, -c, -b, -a]
>>> L1 == [-x for x in reversed(L2)]
True
Now you can write a function to collapse those two loops into a single value:
>>> def normalise(loop):
...     return min(loop, [-x for x in reversed(loop)])
...
>>> normalise(L1)
[1, 2, 3, -4, -1]
>>> normalise(L2)
[1, 2, 3, -4, -1]
A good way to eliminate duplicates is to use a set; we just need to convert the lists to tuples:
>>> L=[L1, L2]
>>> set(tuple(normalise(loop)) for loop in L)
set([(1, 2, 3, -4, -1)])
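A set loses the original order and does not say which of the two loops survives, while the question wants to keep L1 and drop L2. A minimal sketch that keeps the first loop of each class in input order; dedupe_loops and abs_key are hypothetical names, normalise follows the reversed-and-negated rule used above, and abs_key is an alternative key matching the question's own absolute-value test (for the version where c stays positive in both loops).

```python
def normalise(loop):
    """Canonical form: the smaller of the loop and its reversed, negated copy."""
    return min(tuple(loop), tuple(-x for x in reversed(loop)))


def abs_key(loop):
    """Alternative key: absolute values read forwards or backwards (the question's test)."""
    fwd = tuple(abs(x) for x in loop)
    return min(fwd, fwd[::-1])


def dedupe_loops(loops, key=normalise):
    """Keep the first loop of each equivalence class, preserving input order."""
    seen = set()
    kept = []
    for loop in loops:
        k = key(loop)
        if k not in seen:
            seen.add(k)
            kept.append(loop)
    return kept


a, b, c, d = range(1, 5)
L1 = [a, b, c, -d, -a]
L2 = [a, d, -c, -b, -a]
print(dedupe_loops([L1, L2]))  # [[1, 2, 3, -4, -1]]
```

Like normalise above, this assumes every loop starts at the same node; loops returned from arbitrary starting points also need rotation handling.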
[pair[0] for pair in frozenset(tuple(sorted((c, negReversed(c)))) for c in cycles)]
Where:
def negReversed(cycle):
    return tuple(-x for x in cycle[::-1])
and where cycles must be tuples.
This takes each cycle, computes its duplicate, and sorts the two into a pair (so both canonically equivalent cycles produce the same pair). The frozenset(...) removes any duplicate pairs. Then you extract the canonical element (in this case I arbitrarily chose it to be pair[0]).
Keep in mind that your algorithm might be returning cycles starting in arbitrary places. If this is the case (i.e. your algorithm might return either [1,2,-3] or [-3,1,2]), then you need to consider these as equivalent necklaces.
There are many ways to canonicalize necklaces. The following way is less efficient because it doesn't canonicalize the necklace directly: it treats the entire equivalence class as the canonical element, by turning each cycle (a,b,c,d,e) into {(a,b,c,d,e), (e,a,b,c,d), (d,e,a,b,c), (c,d,e,a,b), (b,c,d,e,a)}. In your case, since you consider a reversed, negated loop to be equivalent, you would also add the rotations of (-e,-d,-c,-b,-a), turning each cycle into {(a,b,c,d,e), (e,a,b,c,d), (d,e,a,b,c), (c,d,e,a,b), (b,c,d,e,a), (-e,-d,-c,-b,-a), (-a,-e,-d,-c,-b), (-b,-a,-e,-d,-c), (-c,-b,-a,-e,-d), (-d,-c,-b,-a,-e)}. Make sure to use frozenset, since a plain set is not hashable and cannot be put inside another set:
[min(cls) for cls in {frozenset(eqClass(c)) for c in cycles}]
where min picks one deterministic representative from each class, and:
def eqClass(cycle):
    for rotation in rotations(cycle):
        yield rotation
        yield tuple(-x for x in reversed(rotation))
where rotations is something like the answers to "Efficient way to shift a list in python", but yielding tuples, e.g.:
def rotations(cycle):
    cycle = tuple(cycle)
    for i in range(len(cycle)):
        yield cycle[i:] + cycle[:i]
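The direct alternative is to compute one canonical representative per cycle instead of building the whole equivalence class. A minimal sketch, assuming (as in the negReversed answer) that a cycle is equivalent to its rotations and to the rotations of its reversed, negated copy; rotations and canonical_necklace are hypothetical helper names.

```python
def rotations(cycle):
    """Yield every rotation of a tuple, e.g. (1, 2, 3) -> (1, 2, 3), (2, 3, 1), (3, 1, 2)."""
    for i in range(len(cycle)):
        yield cycle[i:] + cycle[:i]


def canonical_necklace(cycle):
    """Smallest rotation of the cycle or of its reversed, negated copy."""
    cycle = tuple(cycle)
    neg_rev = tuple(-x for x in reversed(cycle))
    return min(min(rotations(cycle)), min(rotations(neg_rev)))


cycles = [(1, 2, -3), (-3, 1, 2), (3, -2, -1)]
print({canonical_necklace(c) for c in cycles})  # {(-3, 1, 2)}
```

This costs O(n^2) per cycle of length n because it compares all rotations; linear-time least-rotation algorithms such as Booth's exist if the loops get long.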

find value of forloop at which event occurred Python

Hey guys, this is very confusing...
I am trying to find the minimum of an array by:
for xpre in range(100):  # used pre because I am using vapor pressures with some x molarity
    xvalue = xarray[xpre]
    for ppre in range(100):  # same as xpre but vapor pressures for pure water, p
        pvalue = parray[ppre]
        d = math.fabs(xvalue - pvalue)  # d represents the difference (due to vapor pressure lowering, a phenomenon in chemistry)
        darray.append(d)  # darray stores the differences
    mini = min(darray)  # mini is the minimum value in darray
    darray = []  # this is to make way for a new set of floats
All the arrays (xarray, parray, darray) are already defined and whatnot; they have 100 floats each.
So my question is: how would I find the p index and the x index at which min(darray) is found?
Edit: I have changed some variable names and added variable descriptions, sorry guys.
A couple things:
Try enumerate
Instead of darr being a list, use a dict and store the dvp values as keys, with the xindex and pindex variables as values
Here's the code
import math

for xindex, xvalue in enumerate(xarr):
    darr = {}
    for pindex, pvalue in enumerate(parr):
        dvp = math.fabs(xvalue - pvalue)
        darr[dvp] = {'xindex': xindex, 'pindex': pindex}
    mini = min(darr.keys())
    minix = darr[mini]['xindex']
    minip = darr[mini]['pindex']
    minindex = list(darr).index(mini)  # position of the minimum among the keys
    print("minimum_index> {0}, is the difference of xarr[{1}] and parr[{2}]".format(minindex, minix, minip))
    darr.clear()
Explanation
The enumerate function allows you to iterate over a list and also receive the index of each item. It is an alternative to your range(100). Notice that I don't have the line where I get the value at index xpre or ppre; this is because enumerate gives me both the index and the value as a tuple.
The most important change, however, is that instead of your darr being a list like this:
[130, 18, 43, 37 ...]
It is now a dictionary like this:
{
    130: {'xindex': 1, 'pindex': 4},
    18: {'xindex': 1, 'pindex': 6},
    43: {'xindex': 1, 'pindex': 9},
    ...
}
So now, instead of just storing the dvp values alone, I am also storing the indices into x and p that generated those dvp values. Now, if I want to know, say, which x and p values produce the dvp value of 43, I would do this:
xindex = darr[43]['xindex']
pindex = darr[43]['pindex']
x = xarr[xindex]
p = parr[pindex]
Now x and p are the values in question.
Note: I personally would store the values that produced a particular dvp, not the indices of those values. But you asked for the indices, so I gave you that answer. I'm going to assume that you have a reason for wanting to handle indices like this, but when programming in a Pythonic manner you generally do not find yourself handling indices in this way. This is a very C way of doing things.
Edit: This doesn't answer the OP's question:
min_diff, min_idx = min((math.fabs(a - b), i) for i, (a, b) in enumerate(zip(xarr, parr)))
Reading it from the inside out:
zip takes xarr and parr and makes tuples of their 1st, 2nd, ... elements, like so:
[ (xarr[0],parr[0]) , (xarr[1],parr[1]) , ... ]
enumerate adds the index by counting upwards from 0:
[ (0 , (xarr[0],parr[0]) ) , (1 , (xarr[1],parr[1]) ) , ... ]
This unpacks each nested tuple:
for i, (a, b) in ...
i is the index generated by enumerate; a and b are the elements of xarr and parr.
This builds a tuple consisting of a difference and the index:
(math.fabs(a - b), i)
The whole thing inside min(...) is a generator expression. min then finds the smallest of these tuples (tuples compare by their first element first, so this is the one with the smallest difference), and the assignment unpacks it:
min_diff, min_idx = min(...)
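The same min-over-a-generator idea extends to every (x, p) combination, which is what the nested loops in the question walk through. A sketch using itertools.product; closest_pair is a hypothetical name, and it returns the single global minimum rather than one minimum per x.

```python
import math
from itertools import product


def closest_pair(xarr, parr):
    """Return (difference, xindex, pindex) for the x/p pair with the smallest |x - p|."""
    # product() yields every ((xindex, x), (pindex, p)) combination,
    # just like the two nested loops in the question
    return min(
        (math.fabs(x - p), xi, pi)
        for (xi, x), (pi, p) in product(enumerate(xarr), enumerate(parr))
    )


diff, xi, pi = closest_pair([3, 10, 8], [1, 7, 20])
print(diff, xi, pi)  # 1.0 2 1
```

On ties, tuple comparison picks the smallest xindex, then the smallest pindex. For the per-x minimum that the question computes inside its outer loop, call min over enumerate(parr) once per x instead of using product.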
