python itertools skipping ahead

I have a list of lists. Using itertools, I am basically doing
for result in product([A,B],[C,D],[E,F,G]):
    # test each result
and the result is the desired product, with each result containing one element from each of the lists. My code tests each of the results element-by-element, looking for the first (and best) 'good' one. There can be a very very large number to test.
Let's say I'm testing the first result 'ACE'. Let's say when I test the second element 'C' I find that 'ACE' is a bad result. There is no need to test 'ACF' or 'ACG'. I would want to skip from the failed ACE directly to trying ADE. Is there any way to do this without just throwing the unwanted results on the floor?
If I was implementing this with nested for loops, I would be trying to manipulate the for loop indexes inside the loop and that would not be very nice ... but I do want to skip testing a lot of results. Can I skip ahead efficiently in itertools?

itertools is not the best way to go with the concern you have.
If you just have 3 sets to combine, just loop over them and break out of the loop when a test fails. (If your code is complex, set a variable and break right outside.)
for i1 in [A, B]:
    for i2 in [C, D]:
        for i3 in [E, F, G]:
            if not test(i1, i2, i3):
                # skip the remaining items of the innermost list
                break
However, if the number of sets that you have is variable, then use a recursive function (backtrack):
inp_sets = ([A,B],[C,D],[E,F,G])
max_col = len(inp_sets)
def generate(col_index, current_set):
    if col_index == max_col:
        # A full combination has been built, so test it
        if test(current_set):
            return current_set
        else:
            return None
    else:
        for item in inp_sets[col_index]:
            res = generate(col_index+1, current_set + [item])
            if res:
                return res
            elif (col_index == max_col - 1):
                # Here we are skipping the rest of the checks for last column
                # Change the condition if you want to skip for more columns
                return None
result = generate(0, [])
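The question asks to test element by element and jump past every combination that shares a failed prefix. A minimal sketch of that idea follows. It assumes the check can be made per element, and the names first_good and element_ok are hypothetical, not part of itertools or of the answer above.

```python
def first_good(pools, element_ok):
    """Return the first combination in which every element passes, else None."""
    def search(col, prefix):
        if col == len(pools):
            return prefix
        for item in pools[col]:
            if not element_ok(col, item, prefix):
                # Prune: no combination below this prefix is ever generated
                continue
            found = search(col + 1, prefix + [item])
            if found is not None:
                return found
        return None
    return search(0, [])


tested = []

def element_ok(col, item, prefix):
    # Hypothetical test: record what was examined, reject "C" and "F"
    tested.append(item)
    return item not in {"C", "F"}

result = first_good([["A", "B"], ["C", "D"], ["E", "F", "G"]], element_ok)
print(result)   # ['A', 'D', 'E']
print(tested)   # ['A', 'C', 'D', 'E']
```

When 'C' fails, the search moves straight to 'D' without ever building 'ACE', 'ACF' or 'ACG'.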

Related

Built in (remove) function not working with function variable

Have a good day everyone. Pardon my lack of understanding, but I can't figure out why a Python built-in function does not work when it is called on another function's variable; it just doesn't do what I want at all. Here is the code:
def ignoreten(h):
    ignoring = False
    for i in range (1,len(h)-2):
        if ignoring == True and h[i]==10:
            h.remove(10)
        if ignoring == False and h[i] ==10:
            ignoring = True
The basic idea is to detect the first 10 in a list, keep it, continue iterating until another 10 is encountered, then remove that 10 to avoid duplication. I searched around but can't seem to find a solution, which is why I'm bringing it up here. Thank you.

The code you listed
def ignoreten(h):
    ignoring = False
    for i in range (1,len(h)-2):
        if ignoring == True and h[i]==10:
            h.remove(10)
        if ignoring == False and h[i] ==10:
            ignoring = True
Will actually do almost the exact opposite of what you want. It'll iterate over h (sort of, see [1]), and if it finds 10 twice, it'll remove the first occurrence from the list. (And, if it finds 10 three times, it'll remove the first two occurrences from the list.)
Note that list.remove will:
Remove the first item from the list whose value is equal to x. It
raises a ValueError if there is no such item.
Also note that you're mutating the list you're iterating over, so there's some additional weirdness here which may be confusing you, depending on your input.
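To make the behaviour concrete, here is a small sketch that runs the function from the question on a sample list. The name ignoreten_original and the sample values are only for illustration.

```python
def ignoreten_original(h):
    # The function from the question, unchanged apart from its name
    ignoring = False
    for i in range(1, len(h) - 2):
        if ignoring == True and h[i] == 10:
            h.remove(10)
        if ignoring == False and h[i] == 10:
            ignoring = True

h = [0, 10, 1, 10, 2, 3, 4]
ignoreten_original(h)
print(h)  # [0, 1, 10, 2, 3, 4]: the first 10 is gone, the second one stays

# list.remove raises ValueError when the value is absent
try:
    [1, 2].remove(10)
    raised = False
except ValueError:
    raised = True
```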
From your follow-up comment to my question, it looks like you want to remove only the second occurrence of 10, not the first and not any subsequent occurrences.
Here are a few ways:
Iterate, store index, use del
def ignoreten(h):
    index = None
    found_first = False
    for i,v in enumerate(h):
        if v == 10:
            if not found_first:
                found_first = True
            else:
                index = i
                break
    if index is not None:
        del h[index]
A little more verbose than necessary, but explicit, safe, and modifiable without much fear.
Alternatively, you could delete inside the loop but you want to make sure you immediately break:
def ignoreten(h):
    found_first = False
    for i,v in enumerate(h):
        if v == 10:
            if not found_first:
                found_first = True
            else:
                del h[i]
                break
Collect indices of 10s, remove second
def ignoreten(h):
    indices = [i for (i,v) in enumerate(h) if v == 10]
    if len(indices) > 1:
        del h[indices[1]] # The second index of 10 is at indices[1]
Clean, but will unnecessarily iterate past the second 10 and collect as many indices of 10s as there are. Not likely a huge issue, but worth pointing out.
Collect indices of 10s, remove second (v2, from comments)
def ignoreten(h):
    indices = [i for (i,v) in enumerate(h) if v == 10]
    for i in reversed(indices[1:]):
        del h[i]
From your comment asking about removing all non-initial occurrences of 10, if you're looking for in-place modification of h, then you almost definitely want something like this.
The first line collects all the indices of 10 into a list.
The second line is a bit tricky, but working inside-out it:
[1:] "throws out" the first element of that list (since you want to keep the first occurrence of 10)
reversed iterates over that list backwards
del h[i] removes the values at those indices.
The reason we iterate backwards is because doing so won't invalidate the rest of our indices that we've yet to delete.
In other words, if the list h was [1, 10, 2, 10, 3, 10], our indices list would be [1, 3, 5].
In both cases we skip 1, fine.
But if we iterate forwards, once we delete 3, and our list shrinks to 5 elements, when we go to delete 5 we get an IndexError.
Even if we didn't go out of bounds to cause an IndexError, our elements would shift and we'd be deleting incorrect values.
So instead, we iterate backwards over our indices, delete 5, the list shrinks to 5 elements, and index 3 is still valid (and still 10).
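The walkthrough above can be checked with a short sketch. The helper delete_at is hypothetical and works on a copy so that both orders can be tried on the same input.

```python
def delete_at(values, indices):
    """Delete the given positions from a copy of values, in the order given."""
    result = list(values)
    for i in indices:
        del result[i]
    return result

h = [1, 10, 2, 10, 3, 10]
extra_tens = [i for i, v in enumerate(h) if v == 10][1:]   # [3, 5]

# Backwards: delete index 5 first, then index 3
backwards = delete_at(h, reversed(extra_tens))
print(backwards)  # [1, 10, 2, 3]

# Forwards: after deleting index 3 only five elements remain, so index 5 is gone
try:
    delete_at(h, extra_tens)
    forwards_error = None
except IndexError as exc:
    forwards_error = exc
```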
With list.index
def ignoreten(h):
    try:
        second_ten = h.index(10, h.index(10)+1)
        del h[second_ten]
    except ValueError:
        pass
The inner .index call finds the first occurrence, the second uses the optional start parameter to begin searching after that. Wrapped in try/except in case there are fewer than two occurrences.
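As a quick illustration of the two-argument form of list.index (the sample lists are made up):

```python
h = [1, 10, 2, 10, 3]

first = h.index(10)               # search from the beginning
second = h.index(10, first + 1)   # search only from position first + 1 onwards
print(first, second)              # 1 3

# With a single 10, the second search finds nothing and raises ValueError
only_one = [1, 10, 2]
try:
    only_one.index(10, only_one.index(10) + 1)
    raised = False
except ValueError:
    raised = True
```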
⇒ Personally, I'd prefer these in the opposite order of how they're listed.
[1] You're iterating over a weird subset of the list with your arguments to range. You're skipping (not applying your "is 10" logic to) the first element and the last two elements this way.
Bonus: Walrus abuse
(don't do this)
def ignoreten(h):
    x = 0
    # x counts the 10s seen so far; only the second one (x == 2) is dropped
    return [v for v in h if v != 10 or (x := x + 1) != 2]
(unlike the previous versions that operated on h in-place, this creates a new list without the second occurrence of 10)
But the walrus operator is contentious enough already, please don't let this code out in the wild. Really.

Trying to write a recursive function to get just the unique numbers in list

I'm trying to do it where the list is split in two and the function picks the unique numbers from the first half and then calls itself again with the second half of the list. I'm just a little stuck at this point, as this only gives me the first half numbers.
def function(Lst):
    s = set()
    mid = len(Lst)//2
    for item in Lst[:mid]:
Thank you for the direction.
I was able to use that for loop to add to the set and then recursively redo it for the right half.

Something like this should do the work although it's not optimal:
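def unique(Lst):
    s = set()
    # Base case: an empty or one-element list is already unique
    # (without the empty case, unique([]) would recurse forever)
    if len(Lst) <= 1:
        return set(Lst)
    mid = len(Lst)//2
    for item in Lst[:mid]:
        if item not in s:
            s.add(item)
    # Add the unique items of the right half to the set
    s |= unique(Lst[mid:])
    return s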
P.S. if you want uniqueness, you can just do this: set([1,2,3,3,3,4,5,6,7,7,8,8])
EDIT:
your problem was that you forgot to add results from recursive call to the set, which already contains your items (s)

One recursive solution, for example, would be:
def unique(Lst):
    # Base case: zero or one element (the empty case avoids endless recursion)
    if len(Lst) <= 1:
        return set(Lst)
    m = len(Lst) // 2
    left_values = unique(Lst[:m])
    right_values = unique(Lst[m:])
    merge_step = set.union(left_values, right_values)
    return merge_step
The divide & conquer solution does pretty much this:
If the subproblem size is 1, return a set containing the only element
The merge step merges the left and right subproblems using set union. The union of two sets is the set of all unique elements contained in at least one of those sets.
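A compact sketch of the same divide and conquer idea, with a call on the sample list from the earlier answer. It assumes an empty list should give an empty set, which the question does not specify.

```python
def unique(lst):
    # Base case: zero or one element is already unique
    if len(lst) <= 1:
        return set(lst)
    m = len(lst) // 2
    # Merge step: union of the unique values of both halves
    return unique(lst[:m]) | unique(lst[m:])

values = [1, 2, 3, 3, 3, 4, 5, 6, 7, 7, 8, 8]
result = unique(values)
print(result == set(values))  # True
```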

Rephrasing nested for loops in Python

The following code has multiple loops and I want to reduce it to optimise the time complexity as well.
for a in file1:
    if a[0] in [i[1] for i in file2]:
        for b in file2:
            if a[0] == b[1]:
                c.append(int(b[0]))
                continue
    else:
        # do stuff
I tried the following to make it more efficient. Although, I couldn't find an alternative to the if statement.
for a, b in zip(file1, file2):
    if a[0] in [i[1] for i in file2]:
        if a[0] == b[1]:
            c.append(int(b[0]))
            continue
    else:
        # do stuff
Also, the outputs for both the operations are different. The first piece of code does show a correct result.

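Your second solution does not do the same thing. zip pairs the two files positionally and yields only min(N, M) pairs, which explains the different output. To compare every a with every b you would need something like itertools.product, which produces NxM pairs; since the list comprehension is rebuilt for each pair, that would be O(NxMxM), slower than the first version, which should be O(Nx2M). I'm not sure what your continue statement does, that seems pointless.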
My tip is to precalculate some of your values, and to use sets/dictionaries. [i[1] for i in file2] will be the same every loop, so take that out.
Also, since you are aligning b with a by value, let's instead create a reverse lookup dictionary.
# build reverse lookup dictionary
reverse = dict()
for b in file2:
    if b[1] not in reverse:
        reverse[b[1]] = [b]
    else:
        reverse[b[1]].append(b)
# check to see if a[0] matches any b[1], if it does append all matching b[0] to c
for a in file1:
    if a[0] in reverse:
        b_valid = reverse[a[0]]
        for b in b_valid:
            c.append(int(b[0]))
    else:
        pass  # do stuff
This brings it down somewhere along the lines of O(N+M) (potentially worse given poor dictionary creation times and lookup times).
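The same reverse-lookup idea can be sketched with collections.defaultdict on made-up data. The contents of file1 and file2 and the unmatched list are assumptions for illustration, since the question does not show its input.

```python
from collections import defaultdict

# Hypothetical rows: file1 rows carry a key at [0], file2 rows a value at [0] and a key at [1]
file1 = [("x",), ("y",), ("z",)]
file2 = [("1", "x"), ("2", "y"), ("3", "x")]

# One pass over file2 builds the lookup: key -> all rows with that key
reverse = defaultdict(list)
for b in file2:
    reverse[b[1]].append(b)

c = []
unmatched = []
for a in file1:
    if a[0] in reverse:            # a membership test does not create a key
        c.extend(int(b[0]) for b in reverse[a[0]])
    else:
        unmatched.append(a[0])     # stands in for "do stuff"

print(c)          # [1, 3, 2]
print(unmatched)  # ['z']
```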

Try:
next((x for x in file2 if a[0] == x[1]), None)
That will give you what fits, and you should be able to append if it is not None.
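A short sketch of how that expression behaves, on made-up rows. Note that next returns only the first matching row, so it fits when at most one match per key is expected.

```python
file2 = [("1", "x"), ("2", "y"), ("3", "x")]   # hypothetical rows

a = ("x",)
match = next((x for x in file2 if a[0] == x[1]), None)
print(match)   # ('1', 'x'), the first match only

a = ("q",)
missing = next((x for x in file2 if a[0] == x[1]), None)
print(missing)  # None, the default when nothing matches
```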

Algorithm for finding the possible palindromic strings in a list containing a list of possible subsequences

I have "n" strings as input, which I split into their possible subsequences, stored in a list as shown below.
If the input is: aa, b, aa
I create a list like the one below (each inner list holds the subsequences of one string):
aList = [['a', 'a', 'aa'], ['b'], ['a', 'a', 'aa']]
I would like to find the combinations of palindromes across the lists in aList.
For example, the possible palindromes for this would be 5 - aba, aba, aba, aba, aabaa
This could be achieved by brute force algorithm using the below code:
import itertools
d = []
count = 0
def isPalindrome(x):
    if x == x[::-1]: return True
    else: return False
for I in itertools.product(*aList):
    a = (''.join(I))
    if isPalindrome(a):
        if a not in d:
            d.append(a)
            count += 1
But this approach is resulting in a timeout when the number of strings and the length of the string are bigger.
Is there a better approach to the problem ?

Second version
This version uses a set called seen, to avoid testing combinations more than once.
Note that your function isPalindrome() can be simplified to a single expression, so I removed it and just did the test in-line to avoid the overhead of an unnecessary function call.
import itertools
aList = [['a', 'a', 'aa'], ['b'], ['a', 'a', 'aa']]
d = []
seen = set()
for I in itertools.product(*aList):
    if I not in seen:
        seen.add(I)
        a = ''.join(I)
        if a == a[::-1]:
            d.append(a)
print('d: {}'.format(d))

The current approach has the disadvantage that most of the generated combinations are thrown away once they are checked and found not to be palindromes.
One idea is that once you pick a string from one side, you can immediately check whether there is a corresponding string in the last group.
For example, let's say that your space is this:
[["a","b","c"], ... , ["b","c","d"]]
We can see that if you pick "a" first, there is no "a" in the last group, and this excludes all the combinations starting with "a" that would otherwise be tried.
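A minimal sketch of that first-versus-last check. The function name viable_first_picks is hypothetical, and the check only compares the outermost characters, so it is a necessary condition rather than a full palindrome test.

```python
def viable_first_picks(groups):
    """Keep strings from the first group whose first character can be
    mirrored by the last character of some string in the last group."""
    last_chars = {s[-1] for s in groups[-1] if s}
    return [s for s in groups[0] if s and s[0] in last_chars]

groups = [["a", "b", "c"], ["x", "y"], ["b", "c", "d"]]
kept = viable_first_picks(groups)
print(kept)  # ['b', 'c']: every combination starting with "a" is skipped
```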

For larger input you could probably get some time gain by grabbing words from the first array, and compare them with the words of the last array to check that these pairs still allow for a palindrome to be formed, or that such a combination can never lead to one by inserting arrays from the remaining words in between.
This way you probably cancel out a lot of possibilities, and this method can be repeated recursively, once you have decided that a pair is still in the running. You would then save the common part of the two words (when the second word is reversed of course), and keep the remaining letters separate for use in the recursive part.
Depending on which of the two words was longer, you would compare the remaining letters with words from the array that is next from the left or from the right.
This should bring a lot of early pruning in the search tree. You would thus not perform the full Cartesian product of combinations.
I have also written the function to get all substrings from a given word, which you probably already had:
def allsubstr(str):
    return [str[i:j+1] for i in range(len(str)) for j in range(i, len(str))]
def getpalindromes_trincot(aList):
    def collectLeft(common, needle, i, j):
        if i > j:
            return [common + needle + common[::-1]] if needle == needle[::-1] else []
        results = []
        for seq in aRevList[j]:
            if seq.startswith(needle):
                results += collectRight(common+needle, seq[len(needle):], i, j-1)
            elif needle.startswith(seq):
                results += collectLeft(common+seq, needle[len(seq):], i, j-1)
        return results
    def collectRight(common, needle, i, j):
        if i > j:
            return [common + needle + common[::-1]] if needle == needle[::-1] else []
        results = []
        for seq in aList[i]:
            if seq.startswith(needle):
                results += collectLeft(common+needle, seq[len(needle):], i+1, j)
            elif needle.startswith(seq):
                results += collectRight(common+seq, needle[len(seq):], i+1, j)
        return results
    aRevList = [[seq[::-1] for seq in seqs] for seqs in aList]
    return collectRight('', '', 0, len(aList)-1)
# sample input and call:
input = ['already', 'days', 'every', 'year', 'later']
aList = [allsubstr(word) for word in input]
result = getpalindromes_trincot(aList)
I did a timing comparison with the solution that martineau posted. For the sample data I have used, this solution is about 100 times faster:
See it run on repl.it
Another Optimisation
Some gain could also be found in not repeating the search when the first array has several entries with the same string, like the 'a' in your example data. The results that include the second 'a' will obviously be the same as for the first. I did not code this optimisation, but it might be an idea to improve the performance even more.
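One way this de-duplication might look, as a sketch on the sample data from the question. It assumes only distinct palindromes are wanted, which matches the `a not in d` check in the question's code; if repeats must be counted, the multiplicities would have to be tracked separately.

```python
import itertools

aList = [['a', 'a', 'aa'], ['b'], ['a', 'a', 'aa']]

# dict.fromkeys drops repeated strings while keeping their first-seen order
deduped = [list(dict.fromkeys(group)) for group in aList]
print(deduped)  # [['a', 'aa'], ['b'], ['a', 'aa']]

combos = [''.join(parts) for parts in itertools.product(*deduped)]
palindromes = sorted({s for s in combos if s == s[::-1]})
print(len(combos), palindromes)  # 4 ['aabaa', 'aba']
```

Here the product shrinks from 9 combinations to 4 before any palindrome test runs.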

Assign variable name to list in if statement python

I apologize for the poor title. Wasn't sure exactly how to word my question.
I have code below which uses a list of tuples called propadd. The if statement tests the tuples for matching conditions. If the match matches to only 1 tuple from the list of tuples, it executes the exact same code as that in the if statement so as to assign this matching tuple to variable v in order to update the cursor rows with values from this matching tuple. I would like to know if there's a way to get rid of the assignment of the exact same code to v after the if statement. Is it possible to assign the list to v in the if statement while checking for the length of the matches? This is part of a larger amount of code that follows this methodology. I believe that doing this will make my code faster.
if len([item for item in propadd if item[0]==row1[8] and harversine(custx,custy,item[2],item[3])<1500]) == 1:
    v=[item for item in propadd if item[0]==row1[8] and harversine(custx,custy,item[2],item[3])<1500]
    row1[1]=v[0][1]
    row1[2]=v[0][2]
elif len([item for item in custadd if item[0]==row1[4]]) == 1:
    k=[item for item in custadd if item[0]==row1[4]]
    row1[1]=k[0][1]
    row1[2]=k[0][2]
elif len([item for item in numlist if re.search(r"^[0-9]+(?=\s)",row1[0]) is not None and item[0]==re.search(r"^[0-9]+(?=\s)",row1[0]).group()]) == 1:
    m=[item for item in numlist if re.search(r"^[0-9]+(?=\s)",row1[0]) is not None and item[0]==re.search(r"^[0-9]+(?=\s)",row1[0]).group()]
    row1[1]=m[0][1]
    row1[2]=m[0][2]

It will make your code slightly faster, and what is much more important, more readable and less error prone. Whether the list you create passes the test len(...) == 1 or not, it is computed. So why not just compute it once? Of course you will have to replace elif with else-if:
# Compute v
v = [item for item in propadd if item[0]==row1[8] and harversine(custx,custy,item[2],item[3])<1500]
if len(v) == 1:
    row1[1]=v[0][1]
    row1[2]=v[0][2]
else:
    # If v fails, compute k
    k = [item for item in custadd if item[0]==row1[4]]
    if len(k) == 1:
        row1[1]=k[0][1]
        row1[2]=k[0][2]
    else:
        # If k fails, compute m
        m = [item for item in numlist if re.search(r"^[0-9]+(?=\s)",row1[0]) is not None and item[0]==re.search(r"^[0-9]+(?=\s)",row1[0]).group()]
        if len(m) == 1:
            row1[1]=m[0][1]
            row1[2]=m[0][2]
Coming from C, this is much more cumbersome than if(v = (....)) { } else .... However, there is another way to do this. You can write each list comprehension as a generator expression instead:
v = (item for item in propadd if item[0]==row1[8] and harversine(custx,custy,item[2],item[3])<1500)
k = (item for item in custadd if item[0]==row1[4])
m = (item for item in numlist if re.search(r"^[0-9]+(?=\s)",row1[0]) is not None and item[0]==re.search(r"^[0-9]+(?=\s)",row1[0]).group())
for gen in (v, k, m):
    l = list(gen)
    if len(l) == 1:
        row1[1] = l[0][1]
        row1[2] = l[0][2]
        break
In this case, the expressions v, k, m are generators, which are lazily evaluated iterables. They are not actually computing the list. You can go through each one and assign the one that matches when it is found, ignoring the others. The list is not computed until the statement l = list(gen). I think the second approach is much more Pythonic because it uses a single for loop no matter how many conditions you have, instead of a sequence of else statements marching off the page.
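On Python 3.8 or later, an assignment expression (the walrus operator) is another option that keeps the original if/elif shape and comes closest to the C idiom mentioned above. The sketch below uses simplified, made-up data and conditions rather than the real propadd, custadd and row1 from the question.

```python
# Hypothetical stand-ins for the lists of tuples and the cursor row
propadd = [("p1", "Main St", 10), ("p2", "Oak Ave", 20)]
custadd = [("c1", "Elm Rd", 30)]
row = ["p9", None, None, "c1"]

# Each list is built once, bound to a name, and tested in the same condition
if len(v := [item for item in propadd if item[0] == row[0]]) == 1:
    row[1], row[2] = v[0][1], v[0][2]
elif len(k := [item for item in custadd if item[0] == row[3]]) == 1:
    row[1], row[2] = k[0][1], k[0][2]

print(row)  # ['p9', 'Elm Rd', 30, 'c1']
```

As with the if/elif chain in the question, the second list is only built when the first condition fails.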
