Rephrasing nested for loops in Python - python

The following code has multiple loops and I want to reduce it to optimise the time complexity as well.
for a in file1:
    if a[0] in [i[1] for i in file2]:
        for b in file2:
            if a[0] == b[1]:
                c.append(int(b[0]))
                continue
    else:
        # do stuff
        pass
I tried the following to make it more efficient, although I couldn't find an alternative to the if statement.
for a, b in zip(file1, file2):
    if a[0] in [i[1] for i in file2]:
        if a[0] == b[1]:
            c.append(int(b[0]))
            continue
    else:
        # do stuff
        pass
However, the two versions produce different outputs, and only the first one gives the correct result.

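Your second solution is actually wrong, and fixing it would make it slower. zip pairs elements up position by position, so it only produces min(N, M) pairs and never compares a[0] with a b[1] at a different index; that is why your outputs differ. To compare every a with every b you would need something like itertools.product, which produces NxM pairs, and with the O(M) list comprehension inside, the whole solution becomes O(NxMxM), whereas the first version is O(Nx2M). The continue statement does nothing useful either, since it is the last statement in the loop body.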
My tip is to precalculate some of your values and to use sets or dictionaries. [i[1] for i in file2] is the same on every iteration, so compute it once, outside the loop.
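A minimal sketch of that first step, assuming file1 and file2 are lists of sequences as in the question; unmatched is a hypothetical stand-in for the "# do stuff" branch. Hoisting the b[1] values into a set makes each membership test O(1) on average, although the inner scan over file2 still costs O(M) for every matching a.

```python
def collect_matches(file1, file2):
    # Build the set of b[1] values once, instead of a new list on every iteration.
    keys = {b[1] for b in file2}
    c = []
    unmatched = []  # hypothetical stand-in for the "# do stuff" branch
    for a in file1:
        if a[0] in keys:
            # Still scans file2 to collect every matching b[0].
            for b in file2:
                if a[0] == b[1]:
                    c.append(int(b[0]))
        else:
            unmatched.append(a)
    return c, unmatched


file1 = [["x"], ["y"], ["z"]]
file2 = [["1", "x"], ["2", "y"], ["3", "x"]]
print(collect_matches(file1, file2))  # ([1, 3, 2], [['z']])
```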
Also, since you are aligning b with a by value, let's instead create a reverse lookup dictionary.
# build reverse lookup dictionary
reverse = dict()
for b in file2:
    if b[1] not in reverse:
        reverse[b[1]] = [b]
    else:
        reverse[b[1]].append(b)
# check to see if a[0] matches any b[1], if it does append all matching b[0] to c
for a in file1:
    if a[0] in reverse:
        b_valid = reverse[a[0]]
        for b in b_valid:
            c.append(int(b[0]))
    else:
        # do stuff
        pass
Dictionary inserts and lookups are O(1) on average, so this brings it down to roughly O(N+M), plus the number of matches appended to c (it could be worse with pathological hash collisions).
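The same reverse lookup can be written a little more compactly with collections.defaultdict. This is a sketch under the same assumptions as the question (rows are sequences, b[0] holds an integer string); collect_matches_dict and unmatched are illustrative names, not part of the original code. Converting b[0] to int while building the dictionary means each value is converted only once.

```python
from collections import defaultdict


def collect_matches_dict(file1, file2):
    # Map each b[1] value to the list of its b[0] values, converted to int once.
    reverse = defaultdict(list)
    for b in file2:
        reverse[b[1]].append(int(b[0]))

    c = []
    unmatched = []  # hypothetical stand-in for the "# do stuff" branch
    for a in file1:
        # "in" does not create missing keys, unlike reverse[a[0]] on a defaultdict.
        if a[0] in reverse:
            c.extend(reverse[a[0]])
        else:
            unmatched.append(a)
    return c, unmatched


file1 = [["x"], ["y"], ["z"]]
file2 = [["1", "x"], ["2", "y"], ["3", "x"]]
print(collect_matches_dict(file1, file2))  # ([1, 3, 2], [['z']])
```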

Try:
next((x for x in file2 if a[0] == x[1]), None)
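This returns the first element of file2 whose second item equals a[0], or None if there is none, so you can append int(x[0]) when the result is not None. Note that it only finds the first match, and it still scans file2 for every a.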
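A small sketch of how that next() call might be used, with illustrative data and a hypothetical helper name first_match. It shows the behavioural difference from the nested-loop version: only the first matching row contributes to c.

```python
def first_match(key, file2):
    # Return the first row whose second item equals key, or None; still an O(M) scan.
    return next((x for x in file2 if x[1] == key), None)


file2 = [["1", "x"], ["2", "y"], ["3", "x"]]
c = []
for key in ["x", "z"]:
    match = first_match(key, file2)
    if match is not None:
        c.append(int(match[0]))
print(c)  # [1]: the second "x" row (b[0] == "3") is never used
```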

Related

Which one will run faster

lst is a very large list of integers
a, b and c are numbers
code 1
if a in lst or b in lst or c in lst:
    print("found in lst")
code 2
if a in lst:
    print("found in lst")
elif b in lst:
    print("found in lst")
elif c in lst:
    print("found in lst")
Is there any speed difference between code 1 and code 2?
Strictly speaking, both of your code samples do the same amount of work, since the or operator short-circuits. That means Python will not evaluate a later condition unless the one before it evaluated to False. For example, in A or B or C, Python evaluates C only if both A and B are False.
The same thing occurs with your if statements: C will only be evaluated if both A and B are False.
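Short-circuiting is easy to observe by wrapping each membership test in a function that records when it runs. The in_list helper and the calls list below are illustrative only.

```python
calls = []


def in_list(name, value, lst):
    # Record which membership tests actually run.
    calls.append(name)
    return value in lst


lst = [1, 2, 3]
a, b, c = 9, 2, 3

if in_list("a", a, lst) or in_list("b", b, lst) or in_list("c", c, lst):
    print("found in lst")

print(calls)  # ['a', 'b']: b is found, so the test for c never runs
```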
However, on large lists you shouldn't be doing this kind of query at all. If you don't need to index the elements of the list, a hash table (a set or a dict) is almost always better, since a membership test takes O(1) time on average instead of scanning the whole list.
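A sketch of the set-based version, with a generated list standing in for the real data. Building the set is itself an O(n) pass, so it pays off when you do many lookups against the same data; set.isdisjoint checks all three values in one call.

```python
lst = list(range(100_000))  # stand-in for "a very large list of integers"
a, b, c = -1, 50_000, 7

lookup = set(lst)  # one O(n) pass; afterwards each "in" is O(1) on average

if a in lookup or b in lookup or c in lookup:
    print("found in lst")

# Equivalent check on all three values at once.
found = not lookup.isdisjoint((a, b, c))
print(found)  # True
```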

python function changing list values

I'm trying to call a function recursively, passing parts of a list to it.
def op3(c,i):
    d = []
    for l in range(0,len(c),+1):
        d.append(c[l])
    a, b = [], []
    if len(d)>=3:
        for l in range(1,len(d)-1,+1):
            a.append(d[l])
        for l in range(2,len(d),+1):
            b.append(d[l])
        A,B = [],[]
        for j in range(0,len(a),+1):
            a[j][i] = a[j][i]+a[j][i+1]
            a[j].pop(i+1)
            insertf(a,final)
            A.append(a[j])
        op3(A,i+1)
        for k in range(0,len(b),+1):
            b[k][i+1] = b[k][i+1]+b[k][i+2]
            b[k].pop(i+2)
            insertf(b,final)
            B.append(b[k])
        op3(B,i+1)
But after the first nested for loop runs, the values in list b (and in the original list) have already been changed to the new values of d.
I'm fairly new to Python. I have read that this is just how lists work in Python. Is there a way around this?
All the modified C-style for loops make my head hurt. Trying to parse...
def op3(c,i):
    d = c[:]
    if len(d)>=3:
        a=d[1:-1]
        b=d[2:]
        #A,B=[],[]
        for item in a:
            item[i] += item.pop(i+1)
            insertf(a,final) # Totally unknown behaviour, does this modify a?
            #A.append(item) # Seems pointless, A ends up a copy of a, and op3
            #               # does not modify c (only its items)
        op3(a,i+1)
        for item in b:
            item[i+1] += item.pop(i+2)
            insertf(b,final)
        op3(b,i+1)
So from what the code does, it expects a list of lists, and it modifies the inner lists. It also calls itself recursively with no explicit stop condition: the recursion does end once the slices drop below three items, but it will raise an IndexError first if any inner list runs out of elements, i.e. when any(len(ci) <= i+2 for ci in c).
On the whole, I'd say we can't provide a good answer because this code fragment just doesn't express what you want done. It seems likely the key point here is that lists are not two-dimensional; every list a[j] or b[k] is an independent object (though a[j] is b[j-1] since you extract them from the same list c), and has no knowledge of the manipulations you do on a,b,c,d,A,B.
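To make the aliasing concrete: slicing copies only the outer list, so the slice still holds the very same inner list objects. The sketch below uses made-up data; if independent inner lists are what op3 needs (an assumption about the intent), copy.deepcopy(c) or [row[:] for row in c] would give each call its own copies.

```python
import copy

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Slicing makes a new outer list, but it holds the same inner list objects.
shallow = rows[1:]
print(shallow[0] is rows[1])  # True

# Deep-copying the slice copies the inner lists too.
deep = copy.deepcopy(rows[1:])
deep[0][0] = 99
print(rows[1])  # [4, 5, 6]: the original is untouched

shallow[0][0] = 42
print(rows[1])  # [42, 5, 6]: mutating through the shallow slice changes rows
```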
Could you please describe what data structures you have and expect, and what sort of processing you're trying to do? It feels a bit like it would fit in about one expression of numpy.

for-in loop's upper limit changing in each loop

How can I update the upper limit of a loop on each iteration? In the following code, List is shortened on each pass. However, the lenList in the for ... in loop is not updated, even though I declared lenList as global. Any ideas how to solve this? (I'm using Python 2.x.)
Thanks!
def similarity(List):
    import difflib
    lenList = len(List)
    for i in range(1,lenList):
        import numpy as np
        global lenList
        a = List[i]
        idx = [difflib.SequenceMatcher(None, a, x).ratio() for x in List]
        z = idx > .9
        del List[z]
        lenList = len(List)
X = ['jim','jimmy','luke','john','jake','matt','steve','tj','pat','chad','don']
similarity(X)
Looping over indices is generally unidiomatic in Python, but you may be able to accomplish what you want like this (edited in response to comments):
def similarity(alist):
    position = 0
    while position < len(alist):
        item = alist[position]
        position += 1
        # code here that modifies alist
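Because the while condition re-reads len(alist) on every pass, the loop always sees the list's current length, so you can consume a list that grows or shrinks while you manipulate its items. (A list also evaluates as False when it is empty and True otherwise, so while alist: works if you consume the list by popping items off it.)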
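A runnable sketch of that pattern; the "append item + 1 while item < 4" rule is a made-up modification, chosen only to show that items appended during the loop are still visited.

```python
def expand(alist):
    position = 0
    # len(alist) is re-read on every pass, so appended items are visited too.
    while position < len(alist):
        item = alist[position]
        position += 1
        if item < 4:
            alist.append(item + 1)  # hypothetical modification: grow the list
    return alist


print(expand([1]))  # [1, 2, 3, 4]
```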
Additionally, if you absolutely have to have indices, you can get those as well:
for idx, item in enumerate(alist):
    # code here, where items are actual list entries, and
    # idx is the 0-based index of the item in the list.
    pass
Since Python 2.6, enumerate also accepts an optional start argument that sets the starting value of idx.
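For example, with a few names from the question's list: start only shifts the numbering, and the items are still visited from the first one.

```python
names = ["jim", "jimmy", "luke"]

# start=1 only changes the numbers; iteration still begins at the first item.
for idx, item in enumerate(names, start=1):
    print(idx, item)

pairs = list(enumerate(names, start=1))
print(pairs)  # [(1, 'jim'), (2, 'jimmy'), (3, 'luke')]
```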
The issue here is that range() is evaluated only once, at the start of the loop, and produces a range object (a list in 2.x) at that time. You can't change that range afterwards. Besides, numbers are immutable, so assigning a new value to lenList just rebinds the name; it doesn't affect the range that was already built from its old value.
The best solution is to change the way your algorithm works not to rely on this behaviour.
The range is an object which is constructed before the first iteration of your loop, so you are iterating over the values in that object. You would instead need to use a while loop, although as Lattyware and g.d.d.c point out, it would not be very Pythonic.
What you are effectively looping over in the above code is a list that was generated once, before the first iteration.
You could just as well have written the above as
li = range(1,lenList)
for i in li:
    ... your code ...
Changing lenList after li has been created has no effect on li.
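A quick way to see this: in the sketch below (illustrative values only), rebinding the bound inside a for loop changes nothing, while a while loop re-checks its condition on every pass.

```python
lenList = 5
li = range(1, lenList)  # built once, from the value 5
visited = []
for i in li:
    visited.append(i)
    lenList = 2  # rebinds the name only; li is unchanged
print(visited)  # [1, 2, 3, 4]

# A while loop re-evaluates its condition before every pass instead.
limit = 5
i = 1
seen = []
while i < limit:
    seen.append(i)
    limit = 3  # takes effect at the next condition check
    i += 1
print(seen)  # [1, 2]
```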
This problem will become quite a lot easier with one small modification to how your function works: instead of removing similar items from the existing list, create and return a new one with those items omitted.
For the specific case of just removing similarities to the first item, this simplifies down quite a bit, and removes the need to involve NumPy's fancy indexing (which you weren't actually using anyway, because of a missing call to np.array):
import difflib
def similarity(lst):
    a = lst[0]
    # keep a, plus only the items that are NOT too similar to it
    return [a] + \
        [x for x in lst[1:] if difflib.SequenceMatcher(None, a, x).ratio() <= .9]
From this basis, repeating it for every item in the list can be done recursively: pass the list comprehension at the end back into similarity, and handle receiving an empty list:
import difflib
def similarity(lst):
    if not lst:
        return []
    a = lst[0]
    # keep a, drop the items too similar to it, then repeat on what is left
    return [a] + similarity(
        [x for x in lst[1:] if difflib.SequenceMatcher(None, a, x).ratio() <= .9])
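For long lists, the recursive version can hit Python's recursion limit (usually 1000), since it recurses once per kept item. A loop-based sketch that keeps an item only if its ratio to every earlier kept item is at most 0.9, mirroring the question's rule of deleting items above .9; threshold and similarity_iterative are illustrative names. It compares with the kept item as the first argument, the same order the recursive version uses.

```python
import difflib


def similarity_iterative(lst, threshold=0.9):
    # Keep an item only if it is not too similar to any item already kept.
    kept = []
    for x in lst:
        if all(difflib.SequenceMatcher(None, k, x).ratio() <= threshold for k in kept):
            kept.append(x)
    return kept


print(similarity_iterative(["abc", "abc", "xyz"]))  # ['abc', 'xyz']
```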
Also note that importing inside a function, and naming a variable after a built-in type (List is easy to confuse with list, and naming it list would shadow the built-in), are both practices worth avoiding, since they make your code harder to follow.

python itertools skipping ahead

I have a list of lists. Using itertools, I am basically doing
for result in product([A,B],[C,D],[E,F,G]):
    # test each result
    pass
and the result is the desired product, with each result containing one element from each of the lists. My code tests each result element by element, looking for the first (and best) 'good' one. There can be a very large number of results to test.
Let's say I'm testing the first result 'ACE', and when I test the second element 'C' I find that 'ACE' is a bad result. Then there is no need to test 'ACF' or 'ACG'; I would want to skip from the failed ACE directly to trying ADE. Is there any way to do this without just throwing the unwanted results on the floor?
If I were implementing this with nested for loops, I would be trying to manipulate the for loop indexes inside the loop, and that would not be very nice ... but I do want to skip testing a lot of results. Can I skip ahead efficiently in itertools?
itertools is not the best tool for this.
If you just have 3 sets to combine, just loop over them and break out of the inner loop when a test fails. (If your code is complex, set a flag variable and break right outside the loop.)
for i1 in [A, B]:
    for i2 in [C, D]:
        for i3 in [E, F, G]:
            if not test(i1, i2, i3):
                break
However, if the number of sets that you have is variable, then use a recursive function (backtrack):
inp_sets = ([A,B],[C,D],[E,F,G])
max_col = len(inp_sets)
def generate(col_index, current_set):
    if col_index == max_col:
        if test(current_set):
            return current_set
        else:
            return None
    else:
        for item in inp_sets[col_index]:
            res = generate(col_index+1, current_set + [item])
            if res:
                return res
            elif (col_index == max_col - 1):
                # Here we are skipping the rest of the checks for last column
                # Change the condition if you want to skip for more columns
                return None
result = generate(0, [])
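Since the question tests each result element by element, a further variation on the same backtracking idea is to test partial prefixes and stop extending a prefix as soon as it fails, so 'AC' is rejected once and ACE, ACF and ACG are never built. This is a sketch, not the answer's code: first_good and prefix_ok are hypothetical names, and prefix_ok must accept a prefix of any length (including the empty one).

```python
def first_good(columns, prefix_ok):
    # Depth-first search over one item per column; a rejected prefix is never extended.
    def search(prefix):
        if not prefix_ok(prefix):
            return None  # prune: skips every combination starting with prefix
        if len(prefix) == len(columns):
            return prefix
        for item in columns[len(prefix)]:
            found = search(prefix + [item])
            if found is not None:
                return found
        return None

    return search([])


checked = []


def prefix_ok(prefix):
    # Hypothetical test: "C" is bad in the second slot, and only "F" is good last.
    checked.append("".join(prefix))
    if len(prefix) >= 2 and prefix[1] == "C":
        return False
    if len(prefix) == 3 and prefix[2] != "F":
        return False
    return True


result = first_good([["A", "B"], ["C", "D"], ["E", "F", "G"]], prefix_ok)
print(result)   # ['A', 'D', 'F']
print(checked)  # ['', 'A', 'AC', 'AD', 'ADE', 'ADF']
```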

Remove items from a list while iterating without using extra memory in Python

My problem is simple: I have a long list of elements that I want to iterate through and check every element against a condition. Depending on the outcome of the condition I would like to delete the current element of the list, and continue iterating over it as usual.
I have read a few other threads on this matter. Two solutions seem to be proposed: either make a dictionary out of the list (which implies making a copy of all the data that is already filling all the RAM in my case), or walk the list in reverse (which breaks the concept of the algorithm I want to implement).
Is there any better or more elegant way to do it than this?
def walk_list(list_of_g):
    g_index = 0
    while g_index < len(list_of_g):
        g_current = list_of_g[g_index]
        if subtle_condition(g_current):
            list_of_g.pop(g_index)
        else:
            g_index = g_index + 1
li = [x for x in li if condition(x)]
and also (on Python 3, where filter returns a lazy iterator, wrap it in list())
li = list(filter(condition, li))
Thanks to Dave Kirby
Here is an alternative in case you absolutely have to remove the items from the original list and do not have enough memory to make a copy. Move the items down the list yourself:
def walk_list(list_of_g):
    to_idx = 0
    for g_current in list_of_g:
        if not subtle_condition(g_current):
            list_of_g[to_idx] = g_current
            to_idx += 1
    del list_of_g[to_idx:]
This will move each kept item (actually a reference to it) at most once, so it is O(N). The del statement at the end of the function removes the leftover items at the end of the list, and I think Python is intelligent enough to shrink the list in place without allocating memory for a new copy of it.
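A self-contained version of that in-place compaction, with the condition passed in as a parameter (should_remove is an illustrative name standing in for subtle_condition). Writing to items[to_idx] during the for loop is safe because to_idx never gets ahead of the position being read.

```python
def compact_in_place(items, should_remove):
    # Copy each kept reference down to the next free slot, then truncate the tail.
    to_idx = 0
    for item in items:
        if not should_remove(item):
            items[to_idx] = item
            to_idx += 1
    del items[to_idx:]


data = list(range(10))
alias = data  # another reference to the same list object
compact_in_place(data, lambda x: x % 3 == 0)
print(data)   # [1, 2, 4, 5, 7, 8]
print(alias)  # [1, 2, 4, 5, 7, 8]: every reference sees the change
```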
Removing items from a list is expensive, since Python has to copy all the items above g_index down one place. If the number of items you want to remove is proportional to the length of the list N, then your algorithm is going to be O(N**2). If the list is long enough to fill your RAM, you will be waiting a very long time for it to complete.
It is more efficient to create a filtered copy of the list, either using a list comprehension as Marcelo showed, or using the filter or itertools.ifilter functions:
g_list = list(filter(not_subtle_condition, g_list))
If you do not need to keep the new list and only want to iterate over it once, it is better to use ifilter, since that will not create a second list (on Python 3, itertools.ifilter no longer exists, and the built-in filter is already lazy):
for g_current in itertools.ifilter(not_subtle_condition, g_list):
    # do stuff with g_current
    pass
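On Python 3, itertools.filterfalse (the counterpart of Python 2's itertools.ifilterfalse) lets you pass subtle_condition directly instead of writing a negated not_subtle_condition, and it is lazy, so no second list is built. The predicate below is a made-up example, and the kept list exists only so the output can be shown.

```python
import itertools


def subtle_condition(g):
    # Hypothetical predicate: True for items that should be skipped.
    return g % 2 == 0


g_list = list(range(10))
kept = []  # only collected here to show which items were visited
# filterfalse lazily yields the items for which the predicate is false.
for g_current in itertools.filterfalse(subtle_condition, g_list):
    kept.append(g_current)  # stand-in for "do stuff with g_current"
print(kept)  # [1, 3, 5, 7, 9]
```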
The built-in filter function is made just to do this (wrap it in list() on Python 3, where filter returns an iterator):
list_of_g = list(filter(lambda x: not subtle_condition(x), list_of_g))
How about this?
[x for x in list_of_g if not subtle_condition(x)]
It returns a new list containing only the elements for which subtle_condition is false.
For simplicity, use a list comprehension:
def walk_list(list_of_g):
    return [g for g in list_of_g if not subtle_condition(g)]
Of course, this doesn't alter the original list, so the calling code would have to be different.
If you really want to mutate the list (rarely the best choice), walking backwards is simpler:
def walk_list(list_of_g):
    # start at the last valid index, len - 1, and count down to 0
    for i in range(len(list_of_g) - 1, -1, -1):
        if subtle_condition(list_of_g[i]):
            del list_of_g[i]
Sounds like a really good use case for the filter function.
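def should_be_removed(element):
    return element > 5
a = list(range(10))
# filter keeps the elements for which the function is true, so negate it
a = list(filter(lambda x: not should_be_removed(x), a))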
This, however, does not delete elements from the list while iterating (nor do I recommend doing so). If you really need to, for memory-space (or other performance) reasons, you can do the following:
i = 0
while i < len(a):
    if should_be_removed(a[i]):
        del a[i]  # delete by index; a.remove(a[i]) would search for the value again
    else:
        i += 1
print(a)
If you perform a reverse iteration, you can remove elements on the fly without affecting the next indices you'll visit:
numbers = list(range(20))
# remove all numbers that are multiples of 3
l = len(numbers)
for i, n in enumerate(reversed(numbers)):
    if n % 3 == 0:
        del numbers[l - i - 1]
print(numbers)
The enumerate(reversed(numbers)) is just a stylistic choice. You may use a range if that's more legible to you:
l = len(numbers)
for i in range(l-1, -1, -1):
    n = numbers[i]
    if n % 3 == 0:
        del numbers[i]
If you need to visit the elements in their original order, you can reverse the list in place with .reverse() before the backwards iteration and again after it. This won't duplicate your list either.
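A sketch of that reverse-then-walk-backwards trick; remove_in_order and should_remove are illustrative names, and visited is collected only to show the visiting order. Walking backwards over the reversed list yields the original order, and deleting at index i only shifts items that were already visited.

```python
def remove_in_order(items, should_remove):
    visited = []
    items.reverse()  # in place: no copy of the list is made
    # Walking backwards over the reversed list visits items in their original order.
    for i in range(len(items) - 1, -1, -1):
        visited.append(items[i])
        if should_remove(items[i]):
            del items[i]  # shifts only items that were already visited
    items.reverse()  # restore the original order of the survivors
    return visited


numbers = list(range(10))
order = remove_in_order(numbers, lambda n: n % 3 == 0)
print(order)    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(numbers)  # [1, 2, 4, 5, 7, 8]
```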
