Python: Weird for loop behavior - python

I'm trying to solve this: CodeEval.
The problem requires me to go through a list of candidate points in XY coordinates. If a point fulfills the requirements, I add it to a "confirmed" list and then add the surrounding points to a "tosearch" list. However, this does not behave at all the way I expect.
Example code:
# Starting point
tosearch=[[0,0]]
for point in tosearch:
    if conditions filled:
        confirmed.append(point)
        # Basically I'm trying to add (x,y-1) etc. to the tosearch list
        tosearch.append([point[0],point[1]-1]) #1
        tosearch.append([point[0]+1,point[1]]) #2
        tosearch.append([point[0]-1,point[1]-1])#3
        tosearch.append([point[0],point[1]+1]) #4
        tosearch.remove(point)
    else:
        tosearch.remove(point)
This seems to result in half of the appends always being ignored. In this case #1 and #3 are ignored. If I leave only #1 and #2, then only #2 executes. I don't get it...
Maybe the problem is elsewhere, so here is the whole code:
Pastebin

You're modifying the collection while iterating over it.
2 options:
copy the list, iterate the copy, and alter the original.
keep track of what changes need to be made, and make them all after iterating.
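A minimal sketch of the first option, using made-up data rather than the asker's points. The first loop reproduces the skipping; the second iterates over a copy, so removals from the original do not disturb the iteration.

```python
# Removing from the list being iterated: the loop's internal index keeps
# advancing while the remaining items shift left, so every other item is skipped.
items = ["a", "b", "c", "d"]
visited = []
for item in items:
    visited.append(item)
    items.remove(item)
print(visited)  # ['a', 'c']
print(items)    # ['b', 'd']

# Option 1: iterate over a copy and alter the original.
items = ["a", "b", "c", "d"]
visited_all = []
for item in list(items):
    visited_all.append(item)
    items.remove(item)
print(visited_all)  # ['a', 'b', 'c', 'd']
print(items)        # []
```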

The problem is you are modifying tosearch in the body of the loop that iterates tosearch. Since tosearch is changing, it can't be iterated reliably.
You probably don't need to iterate at all. Just use a while loop:
searched = set() # if you need to keep track of searched items
tosearch = [(0,0)] # use tuples so you can put them in a set
confirmed = []
while tosearch:
    point = tosearch.pop()
    searched.add(point) # if you need to keep track
    if CONDITIONS_MET:
        confirmed.append(point)
        # tosearch.append() ....
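For illustration, here is a runnable version of that loop. The condition is a hypothetical stand-in (points within a small square around the origin), not the actual CodeEval rule, and the neighbour offsets are assumed to be the four axis-aligned ones. Checking `searched` before processing keeps a point from being handled twice.

```python
def flood_search(start, is_valid):
    """Collect every point reachable from start for which is_valid(point) holds."""
    searched = set()
    confirmed = []
    tosearch = [start]
    while tosearch:
        point = tosearch.pop()
        if point in searched:
            continue  # already handled via another neighbour
        searched.add(point)
        if is_valid(point):
            confirmed.append(point)
            x, y = point
            # Queue the four neighbours; invalid ones are never expanded further.
            tosearch.extend([(x, y - 1), (x + 1, y), (x - 1, y), (x, y + 1)])
    return confirmed


# Hypothetical condition: stay inside the square -2..2 on both axes.
def inside_square(point):
    return abs(point[0]) <= 2 and abs(point[1]) <= 2


found = flood_search((0, 0), inside_square)
print(len(found))  # 25
```

Because the work list is only ever consumed with `pop()` and extended with new tuples, nothing is removed from a list that a `for` loop is walking.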

Related

Efficient reverse order comparison of huge growing list in Python

In Python, my goal is to maintain a unique list of points (complex scalars, rounded), while steadily creating new ones with a function, like in this pseudo code
list_of_points = []
while True:
    # generate new point according to some rule
    z = generate()
    # check whether this point is already there
    if z not in list_of_points:
        list_of_points.append(z)
    if some_condition:
        break
Now list_of_points can become potentially huge (like 10 million entries or even more) during the process and duplicates are quite frequent. In fact about 50% of the time, a newly created point is already somewhere in the list. However, what I know is that oftentimes the already existing point is near the end of the list. Sometimes it is in the "bulk" and only very occasionally it can be found near the beginning.
This brought me to the idea of doing the search in reverse order. But how would I do this most efficiently (in terms of raw speed), given my potentially large list which grows during the process? Is the list container even the best way here?
I managed to gain some performance by doing this
list_of_points = []
while True:
    # generate new point according to some rule
    z = generate()
    # check very end of list
    if z in list_of_points[-10:]:
        continue
    # check deeper into the list
    if z in list_of_points[-100:-10]:
        continue
    # check the rest
    if z not in list_of_points[:-100]:
        list_of_points.append(z)
    if some_condition:
        break
Obviously, this is not very elegant. Using a second, FIFO-type container (collections.deque) instead gives about the same speed-up.
Your best bet might be to use a set instead of a list. Python sets use hashing to store items, so insertion and membership tests are very fast. You can also skip the step of checking whether an item is already present by simply trying to add it: if it is already in the set it won't be added, since duplicates are not allowed.
Stealing your pseudocode example:
set_of_points = set()  # note: {} would create an empty dict, not a set
while True:
    # get size of set
    a = len(set_of_points)
    # generate new point according to some rule
    z = generate()
    # try to add z to the set
    set_of_points.add(z)
    b = len(set_of_points)
    # if a == b it was not added, thus already existed in the set
    if some_condition:
        break
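A self-contained sketch of the set-based approach. The `generate` function here is a hypothetical stand-in that produces rounded complex numbers from a seeded random generator, and the stopping condition is simply a fixed iteration count. A list is kept alongside the set only to preserve insertion order; drop it if order does not matter.

```python
import random

rng = random.Random(0)  # seeded so the run is repeatable


def generate():
    """Hypothetical generator: a complex point rounded to one decimal place."""
    return complex(round(rng.uniform(0, 1), 1), round(rng.uniform(0, 1), 1))


seen = set()          # hash-based lookup: average O(1) membership test
list_of_points = []   # unique points in order of first appearance
for _ in range(10_000):
    z = generate()
    if z not in seen:
        seen.add(z)
        list_of_points.append(z)

print(len(list_of_points) == len(seen))  # True
```

The membership test no longer depends on where in the sequence the duplicate sits, so the reverse-order search becomes unnecessary.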
Use a set. This is what sets are for. Ah, you already have an answer saying that. So my other comment: this part of your code appears to be incorrect:
# check the rest
if z not in list_of_points[100:]:
    list_of_points.append(z)
In context, I believe you meant to write list_of_points[:-100] there instead. You already checked the last 100, but, as is, you're skipping checking the first 100 instead.
But even better, use plain list_of_points. As the list grows longer, the cost of possibly doing 100 redundant comparisons becomes trivial compared to the cost of copying len(list_of_points) - 100 elements.

Python: Easy way to loop through dictionary parameters from a list of evaluated strings?

I have a dictionary created from a json file. This dictionary has a nested structure and every few weeks additional parameters are added.
I use a script to generate additional copies of the existing parameters when I want multiple "legs" added. So I first add the additional legs. So say I start with 1 leg as my template and I want 10 legs, I will just clone that leg 9 more times and add it to the list.
Then I loop through each of the parameters (called attributes) and have to clone certain elements for each leg that was added so that it has a 1:1 match. I don't care about the content so cloning the first leg value is fine.
So I do the following:
while len(data['attributes']['groupA']['params']['weights']) < legCount:
    data['attributes']['groupA']['params']['weights'].append(data['attributes']['groupA']['params']['weights'][0])
while len(data['attributes']['groupB']['paramsGroup']['factors']) < legCount:
    data['attributes']['groupB']['paramsGroup']['factors'].append(data['attributes']['groupB']['paramsGroup']['factors'][0])
while len(data['attributes']['groupC']['items']['delta']) < legCount:
    data['attributes']['groupC']['items']['delta'].append(data['attributes']['groupC']['items']['delta'][0])
What I'd like to do is make these attributes all strings and just loop through them dynamically so that when I need to add additional ones, I can just paste one string into my list and it works without having another while loop.
So I converted it to this:
attribs = [
    "data['attributes']['groupA']['params']['weights']",
    "data['attributes']['groupB']['paramsGroup']['factors']",
    "data['attributes']['groupC']['items']['delta']",
    "data['attributes']['groupD']['xxxx']['yyyy']"
]
for attrib in attribs:
    while len(eval(attrib)) < legCount:
        eval(attrib).append(eval(attrib)[0])
In this case eval is safe because there is no user input, just a defined list of entries. Though I wouldn't mind finding an alternative to eval either.
It works up until the last line. I don't think the .append is working on the eval() result. It's not throwing an error, it's just not appending to the element.
Any ideas on the best way to handle this?
Not 100% sure this will fix it, but I do notice one thing.
In your above code in your while condition you are accessing:
data['attributes']['groupA']['params']['weights']
then you are appending to
data['attributes']['groupA']['params']['legs']
In your below code it looks like you are appending to 'weights' on the first iteration. However, this doesn't explain the other attributes you are evaluating... just one red flag I noticed.
Actually my code was working. I was just checking the wrong variable. Thanks Me! :)
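On the question's side note about avoiding eval: one common alternative is to store each attribute as a tuple of keys and walk the nested dictionary with functools.reduce and operator.getitem. The sketch below uses a small made-up `data` dictionary with key names taken from the question; `get_by_path` and `pad_to_length` are hypothetical helper names.

```python
from functools import reduce
from operator import getitem


def get_by_path(root, path):
    """Follow a sequence of keys into a nested dict and return the value found."""
    return reduce(getitem, path, root)


def pad_to_length(values, length):
    """Append the first element repeatedly until the list has `length` items."""
    while len(values) < length:
        # This appends the same object as values[0], as the original code does;
        # copy it first if the elements are mutable and must be independent.
        values.append(values[0])


# Made-up sample data using key names from the question.
data = {
    "attributes": {
        "groupA": {"params": {"weights": [0.5]}},
        "groupB": {"paramsGroup": {"factors": [2]}},
    }
}

attrib_paths = [
    ("attributes", "groupA", "params", "weights"),
    ("attributes", "groupB", "paramsGroup", "factors"),
]

leg_count = 3
for path in attrib_paths:
    pad_to_length(get_by_path(data, path), leg_count)

print(data["attributes"]["groupA"]["params"]["weights"])  # [0.5, 0.5, 0.5]
```

Adding another attribute then means adding one more tuple to `attrib_paths`, with no string evaluation involved.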

Duplicate element being added through for loop

NOTE: I do not want to use del
I am trying to understand algorithms better which is why I want to avoid the built-in del statement.
I am populating a list of 10 randomly generated numbers. I then am trying to remove an item from the list by index, using a for loop:
if remove_index < lst_size:
    for value in range(remove_index, lst_size-1):
        lst[value] = lst[value+1]
    lst_size -= 1
Everything works fine, except that the loop is adding the last item twice. Meaning, if the 8th item has the value 4, it will add a 9th item also valued 4. I am not sure why it is doing this. I still am able to move the value at the selected index (while moving everything up), but it adds on the duplicate.
Nothing is being added to your list. It starts out with lst_size elements, and, since you don't delete any, it retains the same number by the time you're done.
If, once you've copied all the items from remove_index onwards to the previous index in the list, you want to remove the last item, then you can do so either using del or lst.pop().
At the risk of sounding flippant, this is a general rule: if you want to do something, you have to do it. Saying "I don't want to use del" won't alter that fact.
Merely decrementing lst_size will have no effect on the list - while you may be using it to store the size of your list, they are not connected, and changing one has no effect on the other.
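Putting the answer together: shifting the items left overwrites the removed value but leaves a stale copy of the last item at the end, which then has to be dropped explicitly. A minimal sketch using lst.pop(), with `remove_at` as a hypothetical helper name:

```python
def remove_at(lst, remove_index):
    """Remove the item at remove_index by shifting later items left, then popping."""
    if 0 <= remove_index < len(lst):
        for i in range(remove_index, len(lst) - 1):
            lst[i] = lst[i + 1]
        lst.pop()  # drop the now-duplicated last item, so the list really shrinks
    return lst


numbers = [7, 3, 9, 4, 1]
remove_at(numbers, 1)
print(numbers)  # [7, 9, 4, 1]
```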

Even positioned numbers are escaped unfortunately python3

I have a function:
def fun(l):
    for i in l:
        if len(i)==10:
            l.append('+91 {} {}'.format(i[:5],i[5:]))
            l.remove(i)
        if len(i)==11:
            j=list(''.join(i))
            j.remove(i[0])
            l.append('+91 {} {}'.format(''.join(j[:5]),''.join(j[5:])))
            l.remove(i)
        if len(i)==12:
            j=list(''.join(i))
            j.remove(i[0])
            j.remove(i[1])
            l.append('+91 {} {}'.format(''.join(j[:5]),''.join(j[5:])))
            l.remove(i)
        if len(i)==13:
            j=list(''.join(i))
            j.remove(i[0])
            j.remove(i[1])
            j.remove(i[2])
            l.append('+91 {} {}'.format(''.join(j[:5]),''.join(j[5:])))
            l.remove(i)
    return l
Say l=['9195969878','07895462130','919875641230']
I am getting the output
['+91 91959 69878','7895462130','+91 98756 41230']
But I am supposed to get:
['+91 91959 69878','+91 78954 62130','+91 98756 41230']
The function seems to skip every item at an even position in the list l. Kindly suggest a fix.
The first problem is that you're mutating the list while iterating over it. In this particular case, this caused the loop to skip some items, because removing an item shifts the later ones down one position while the loop's internal index moves on. In other Python versions it might trigger an error. But you're returning your result, so I don't see why you're mutating the list at all.
Secondly your code does some roundabout things, in particular ''.join(i) which is absolutely redundant (it literally rebuilds the same string), and series of remove() calls which almost certainly don't do what you expect. If you remove the first item from [1,2,3], the list becomes [2,3], and if you follow that by removing the second item (index 1) you end up with [2]. This is the same sort of issue your for loop has with the other remove.
I would also restructure the code a bit to avoid code duplication. I get something like:
def fun(l):
    return ['+91 {} {}'.format(i[-10:-5],i[-5:])
            for i in l]
This never alters l, makes one single pass, and joins all the different length behaviours by observing that we're using parts at a fixed distance from the end. There is one caveat: other lengths aren't handled separately. I don't know if those occur, or how you actually want them handled (the old code would leave them as is). We can easily enough specify other behaviour:
def fun(l):
    return ['+91 {} {}'.format(i[-10:-5],i[-5:]) if 10<=len(i)<=13
            else i
            for i in l]
This still doesn't reproduce the behaviour that reformatted numbers were appended at the end, but I'm not sure you really wanted that. It made little sense for the loop to process its own output in the first place.
You are modifying the list l as you go - I would suggest to create a new list and add things to this list. Is there a reason you want to mutate in place?
If you are intent on mutating in place, why not just do something like this?
l[index] = '+91 {} {}'.format(i[:5],i[5:])
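A sketch of what that in-place variant could look like, assuming `index` comes from enumerate and reusing the slice-from-the-end idea from the other answer so that prefixed numbers are handled too. Assigning to an existing index does not change the list's length, so it is safe during iteration. `format_in_place` is a hypothetical name.

```python
def format_in_place(numbers):
    """Rewrite each 10- to 13-character number in the list as '+91 XXXXX XXXXX'."""
    for index, number in enumerate(numbers):
        if 10 <= len(number) <= 13:
            # Overwrite the slot instead of appending and removing.
            numbers[index] = '+91 {} {}'.format(number[-10:-5], number[-5:])
    return numbers


l = ['9195969878', '07895462130', '919875641230']
format_in_place(l)
print(l)  # ['+91 91959 69878', '+91 78954 62130', '+91 98756 41230']
```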
Also, here is the first google result for "python phone number library": https://github.com/daviddrysdale/python-phonenumbers as it may be of use to you. (Never used it, am not the maintainer.)

Python list.remove items present in second list

I've searched around and most of the errors I see are when people are trying to iterate over a list and modify it at the same time. In my case, I am trying to take one list, and remove items from that list that are present in a second list.
import pymysql
schemaOnly = ["table1", "table2", "table6", "table9"]
db = pymysql.connect(my connection stuff)
tables = db.cursor()
tables.execute("SHOW TABLES")
tablesTuple = tables.fetchall()
tablesList = []
# I do this because there is no way to remove items from a tuple
# which is what I get back from tables.fetchall
for item in tablesTuple:
    tablesList.append(item)
for schemaTable in schemaOnly:
    tablesList.remove(schemaTable)
When I put various print statements in the code, everything looks proper and like it is going to work. But when it gets to the actual tablesList.remove(schemaTable), I get the dreaded ValueError: list.remove(x): x not in list.
If there is a better way to do this I am open to ideas. It just seemed logical to me to iterate through the list and remove items.
Thanks in advance!
** Edit **
Everyone in the comments and the first answer is correct. The reason this is failing is because the conversion from a Tuple to a list is creating a very badly formatted list. Hence there is nothing that matches when trying to remove items in the next loop. The solution to this issue was to take the first item from each Tuple and put those into a list like so: tablesList = [x[0] for x in tablesTuple] . Once I did this the second loop worked and the table names were correctly removed.
Thanks for pointing me in the right direction!
I assume that fetchall returns tuples, one for each database row matched.
Now the problem is that the elements in tablesList are tuples, whereas each schemaTable is a string. Python does not consider these to be equal.
Thus when you attempt to call remove on tablesList with a string from schemaOnly, Python cannot find any such value.
You need to inspect the values in tablesList and find a way to convert them to strings. I suspect it would be by simply taking the first element out of each tuple, but I do not have a MySQL database at hand so I cannot test that.
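The mismatch can be reproduced without a database. The rows below are made-up stand-ins for what fetchall() returns for SHOW TABLES, assuming one single-element tuple per table:

```python
schema_only = ["table1", "table2"]

# Stand-in for tables.fetchall(): each row is a 1-tuple.
rows = (("table1",), ("table2",), ("table3",))

print(("table1",) == "table1")  # False: a tuple never equals a string

tables_list = list(rows)
remove_failed = False
try:
    tables_list.remove("table1")
except ValueError as exc:
    remove_failed = True
    print(exc)  # the ValueError from the question

# Taking the first element of each row yields plain strings, so remove() works.
table_names = [row[0] for row in rows]
for name in schema_only:
    table_names.remove(name)
print(table_names)  # ['table3']
```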
Regarding your question, if there is a better way to do this: Yes.
Instead of adding items to the list, and then removing them, you can append only the items that you want. For example:
for item in tablesTuple:
    # each row is a 1-tuple, so compare its first element with the names
    if item[0] not in schemaOnly:
        tablesList.append(item[0])
Also, schemaOnly can be written as a set, to improve search complexity from O(n) to O(1):
schemaOnly = {"table1", "table2", "table6", "table9"}
This will only be meaningful with big lists, but in my experience it's useful semantically.
And finally, you can write the whole thing in one list comprehension:
tablesList = [item[0] for item in tablesTuple if item[0] not in schemaOnly]
And if you don't need to keep repetitions (or if there aren't any in the first place), you can also do this:
tablesSet = {item[0] for item in tablesTuple} - schemaOnly
This also has the best big-O complexity of all these variations.
