Removing specific index from list while iterating / improving nested loops - python

I am in a situation where I have 3 nested loops. Every x iterations, I want to restart the 2nd for loop.
If an element in the 3rd for loop meets a certain condition, I want to remove that element from the list.
I'm not sure how to implement this, and based on the similar questions I've read, using a list comprehension or creating a new list wouldn't really work.
Example pseudocode:
items_of_interest = ["apple", "pear"]
while True:  # restart every 10,000 iterations (the API key only lasts 10,000 requests)
    api_key = generate_new_api_key()
    for i in range(10000):
        html = requests.get(f"http://example.com/{api_key}/items").text
        for item in items_of_interest:
            if item in html:
                items_of_interest.remove(item)
The original code is a lot bigger with a lot of checks, constantly parsing an API for something, and it's a bit messy to organize as you can tell. I'm not sure how to reduce the complexity.

Without knowing the full picture, it's hard to say which approach is optimal. In any case, here's one approach using set operations:
items_of_interest = ["apple", "pear"]
while True:  # restart every 10,000 iterations (the API key only lasts 10,000 requests)
    api_key = generate_new_api_key()
    for i in range(10000):
        html = requests.get(f"http://example.com/{api_key}/items").text
        # Split your text blob into separate strings in a set
        haystack = set(html.split(' '))
        # Exclude the found items!
        items_of_interest = list(set(items_of_interest).difference(haystack))
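Removing items from a list while a for loop is iterating over that same list shifts the remaining items left, so the loop silently skips the item right after each removal. A minimal sketch of the pitfall and two common fixes (loop over a copy, or build a new list); the list and the html string are made-up sample data. Unlike the set-based version, both fixes keep the original order and use substring matching (item in html), as the question's pseudocode does:

```python
fruits = ["apple", "pear", "plum"]
html = "<li>apple</li><li>pear</li>"  # made-up page content

# Pitfall: removing from the list you are looping over
buggy = fruits[:]
for item in buggy:
    if item in html:
        buggy.remove(item)
print(buggy)  # ['pear', 'plum']: "pear" was skipped, not removed

# Fix 1: loop over a copy (safe[:]) and remove from the original
safe = fruits[:]
for item in safe[:]:
    if item in html:
        safe.remove(item)
print(safe)  # ['plum']

# Fix 2: build a new list containing only the items that were not found
kept = [item for item in fruits if item not in html]
print(kept)  # ['plum']
```

Rebinding items_of_interest to a new list like this inside the loop is fine, because the next pass of the outer loops simply uses the new list.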

It works much like you suggest. The relevant keyword is del, e.g.:
>>> x = list(range(5))
>>> for i in ['a','b','c']:
...     print('i:' + str(i))
...     for j in x:
...         print('j:' + str(j))
...         if j == 3:
...             del x[j]
...
i:a
j:0
j:1
j:2
j:3
i:b
j:0
j:1
j:2
j:4
i:c
j:0
j:1
j:2
j:4
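3 has been removed from the list x for the later passes. Note that 4 is never printed in the first pass: deleting an element from the list the loop is iterating over shifts the remaining items left, so the loop skips the item right after the deleted one. Also, del x[j] deletes by index, not by value; it removes the 3 here only because each element of x equals its own index.
See also the Python documentation https://docs.python.org/3.7/tutorial/datastructures.html and SO answers like "Difference between del, remove and pop on lists".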
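Since del x[j] removes by position, it helps to see the three list-removal tools side by side. A small sketch with made-up numbers: del deletes by index, remove() deletes the first element equal to a value, and pop() deletes by index and returns the removed element:

```python
nums = [10, 20, 30, 40, 30]

del nums[1]          # delete by index; returns nothing
print(nums)          # [10, 30, 40, 30]

nums.remove(30)      # delete the first element equal to 30
print(nums)          # [10, 40, 30]

last = nums.pop()    # delete by index (default: the last one) and return it
print(last, nums)    # 30 [10, 40]

first = nums.pop(0)
print(first, nums)   # 10 [40]
```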

Related

How to assign a new value to the list in running loop

My data-cutting loop seems to run OK inside the loop, but when I print the result outside the loop, the contents are unchanged. I presume it's buggy because I'm trying to assign to what the for loop is running through, but I don't know.
For reference, it's a small web review scraper project I'm working on. To get it formatted to CSV with pandas, I think all the data needs to be the same length, so I'm cutting any lists that are longer than the shortest. The values cust_stars_result, rev_result and cust_res are all lists with basic strings stored inside, in this case with lengths 16, 12 and 15. I try to slice everything down to 12 in the end, but the results are overwritten. What is the right/best way to go about this?
star_len = len(cust_stars_result)
rev_len = len(rev_result)
custname_len = len(cust_res)
print('customer name length: ' + str(custname_len) + ' -- review length: ' + str(rev_len) + ' -- star length: ' + str(star_len))
datalen = [star_len, rev_len, custname_len]
print(min(datalen))
datapack = [cust_stars_result, rev_result, cust_res]
# LOOPER FOR CULLING
for data in datapack:
    if len(data) != min(datalen):
        print("operating culler to make data even length")
        print(len(data))
        data = data[: min(datalen)]
        print(len(data))  # this comes out OK
    else:
        print("equal length, skipping culler")
        pass
print(datapack)  # prints the original values
Inside your loop you update the data variable, but that just rebinds the name data to a new (sliced) list; the list stored in datapack is untouched. You want to do something like:
for i, data in enumerate(datapack):
    ...
    datapack[i] = data[: min(datalen)]
This will update the datapack element.
While "trying to assign to what the for loop is running through" is a real issue, in this case the problem is rather that your code is not assigning anything to datapack when you change data. Instead, what it does is assign each item in datapack to data, so when you change data, datapack remains unchanged.
Instead, try either adding each item to a new list and then assigning that new list to datapack:
temp = []
for data in datapack:
    ...
    temp.append(data[:min(datalen)])
datapack = temp
Or try using a range or enumerate loop:
for i, data in enumerate(datapack):
    ...
    datapack[i] = data[:min(datalen)]
There are fancier (but less readable and harder to debug) ways to accomplish what you're doing here (slicing off the end of each list), such as the one below, which uses a list comprehension and map:
mindatalen = min(map(len, datapack))
datapack = [data[:mindatalen] for data in datapack]
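If you also want the original lists (cust_stars_result and so on) to be shortened, another option is to truncate each list in place. Unlike data = data[:n], which only rebinds the loop variable, del data[n:] modifies the list object that datapack (and the original name) refers to. A sketch with made-up sample data of the lengths mentioned in the question:

```python
# made-up sample data with the lengths from the question (16, 12, 15)
cust_stars_result = [f"star{n}" for n in range(16)]
rev_result = [f"review{n}" for n in range(12)]
cust_res = [f"name{n}" for n in range(15)]

datapack = [cust_stars_result, rev_result, cust_res]
mindatalen = min(map(len, datapack))

for data in datapack:
    # del on a slice mutates the list itself, so the change is visible
    # through datapack and through the original variable names
    del data[mindatalen:]

print([len(data) for data in datapack])  # [12, 12, 12]
print(len(cust_stars_result))            # 12
```

Choose this only if nothing else still needs the full-length lists; the approaches above leave the originals intact.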

Extract words from random strings

Below I have some strings in a list:
some_list = ['a','l','p','p','l','l','i','i','r','i','r','a','a']
Now I want to extract the word april from this list. The letters only make up two april, so I want to take those two april out of this list and append them to another list, extract.
So the extract list should look something like this:
extract = ['aprilapril']
or
extract = ['a','p','r','i','l','a','p','r','i','l']
I've tried many times to get everything in extract in order, but I still can't seem to get it.
I know I could just do this:
a_count = some_list.count('a')
p_count = some_list.count('p')
r_count = some_list.count('r')
i_count = some_list.count('i')
l_count = some_list.count('l')
total_count = [a_count,p_count,r_count,i_count,l_count]
smallest_count = min(total_count)
extract = ['april' * smallest_count]
But I can't just use the code above (otherwise I wouldn't be here), because I made some rules for this problem:
Each of the characters (a, p, r, i and l) is a magical code element. These code elements can't be created out of thin air; they are unique, each with a unique identifier, like a secret number associated with it. So you don't know how to create these magical code elements; the only way to get them is to extract them to a list.
Each of the characters (a, p, r, i and l) must be in order. Imagine they are some kind of chain: they only work if they are together. This means p must come right after a, and l must come last.
These important code elements are top secret, so if you want to get them, the only way is to extract them to a list.
Below are some examples of incorrect ways to do this (they break the rules):
import re
word = 'april'
some_list = ['aaaaaaappppppprrrrrriiiiiilll']
text = some_list[0]
regex = "".join(f"({c}+)" for c in word)
match = re.match(regex, text)
if match:
    lowest_amount = min(len(g) for g in match.groups())
    print(word * lowest_amount)
else:
    print("no match")
from collections import Counter
def count_recurrence(kernel, string):
    # we need to count both strings
    kernel_counter = Counter(kernel)
    string_counter = Counter(string)
    effective_counter = {
        k: int(string_counter.get(k, 0)/v)
        for k, v in kernel_counter.items()
    }
    min_recurring_count = min(effective_counter.values())
    return kernel * min_recurring_count
This might sound really stupid, but this is actually a hard problem (well, for me). I originally designed this problem for myself to practice Python, but it turned out to be way harder than I thought. I just want to see how other people solve it.
If anyone out there knows how to solve this ridiculous problem, please help me out; I am just a fourteen-year-old learning Python. Thank you very much.
I'm not sure what you mean by "cannot copy nor delete the magical codes"; if you want to put them in your output list, you will need to "copy" them somehow.
And by the way, your example code (a_count = some_list.count('a') etc.) won't work for some_list = ['aaaaaaappppppprrrrrriiiiiilll'], since count will always return zero: that list has a single element, the whole string.
That said, a possible solution is:
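some_list = ['aaaaaaappppppprrrrrriiiiiilll']
magics = 'april'  # hypothetical name for the letters to extract, in order
worklist = [c for c in some_list[0]]
extract = []
fail = False
while not fail:
    lastpos = -1
    tempextract = []
    for magic in magics:
        # only look after the previous letter, so index() below can't fail
        if magic in worklist[lastpos+1:]:
            pos = worklist.index(magic, lastpos+1)
            tempextract.append(worklist.pop(pos))
            lastpos = pos-1  # the list shrank by one, so the next search starts at pos
        else:
            fail = True
            break
    else:
        extract.append(tempextract)
print(extract)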
Alternatively, if you don't want to pop the elements when you find them, you could compute the positions of all the occurrences of the first element (the "a"), and set lastpos to each of those positions at the beginning of each iteration.
This may not be the most efficient way, but the code works and makes the program logic explicit:
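some_list = ['aaaaaaappppppprrrrrriiiiiilll']
word = 'april'
extract = []
string = some_list[0]
for x in range(len(some_list[0])//len(word)): #maximum number of times `word` can appear in `some_list[0]`
    found = []   # letters of the current copy of `word`
    remove = []  # their positions in `string`
    pointer = i = 0
    while i<len(word):
        j=0
        while j<(len(string)-pointer):
            if string[pointer+j] == word[i]:
                found.append(word[i])
                remove.append(pointer+j)
                i+=1
                pointer += j+1  # continue searching after the matched letter
                break
            j+=1
        else:
            break  # ran out of characters: `word` can't be completed any more
    if i<len(word):
        break  # no complete copy left, so stop looking
    extract.extend(found)
    # remove the used letters; each earlier removal shifts later positions left by one
    for r_i,r in enumerate(remove):
        string = string[:r-r_i] + string[r-r_i+1:]
print(extract,string)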
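Both answers expect the letters to already appear in order inside a single string. If "in order" refers only to the output (as with the question's list ['a','l','p',...]), a different approach works: for each letter of the word, pop that element out of a working copy of the list, wherever it is. A minimal sketch, assuming that reading of the rules; extract_word is a hypothetical helper name. Elements are moved (popped), never re-created, and a copy that can't be completed has its partial picks put back:

```python
def extract_word(elements, word):
    """Move whole copies of `word` out of `elements`, in the order of `word`.

    Returns (extracted, leftover). The caller's list is not modified.
    """
    remaining = list(elements)  # work on a copy of the list
    extracted = []
    while True:
        picked = []
        for letter in word:
            if letter not in remaining:
                # this copy can't be finished: put the partial picks back and stop
                remaining.extend(picked)
                return extracted, remaining
            picked.append(remaining.pop(remaining.index(letter)))
        extracted.extend(picked)


some_list = ['a', 'l', 'p', 'p', 'l', 'l', 'i', 'i', 'r', 'i', 'r', 'a', 'a']
extract, leftover = extract_word(some_list, 'april')
print(extract)   # ['a', 'p', 'r', 'i', 'l', 'a', 'p', 'r', 'i', 'l']
print(leftover)  # ['l', 'i', 'a']
```

Use ''.join(extract) if you want the ['aprilapril'] form instead of single letters.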

Extracting multiple data from a single list

I'm working on a text file that contains several kinds of information. I converted it into a list in Python, and now I'm trying to separate the different data into different lists. The data is laid out as follows:
CODE/ DESCRIPTION/ Unity/ Value1/ Value2/ Value3/ Value4, and then the pattern repeats. An example would be:
P03133 Auxiliar helper un 203.02 417.54 437.22 675.80
My approach so far has been:
Creating lists to store each piece of information:
codes = []
description = []
unity = []
cost = []
Looping to find each code, based on the code's structure, and using the code's index as the base for finding the remaining values.
Finding a code is easy; it's distinct from the other data.
For the remaining values, I made a loop to find the next numeric value after a code. That way I can delimit the rest of the indexes:
The unity would be at the code's index + index until isnumeric - 1, since it's the item just before the first numeric value in each line.
The cost would be at the code's index + index until isnumeric + 2; the third value is the only one I need to store.
The description is a little harder, as the number of elements that compose it varies across the list. So I used slicing starting at the code's index + 1 and ending at index until isnumeric - 2.
for i, carc in enumerate(txtl):
    if carc[0] == "P" and carc[1].isnumeric():
        codes.append(carc)
        j = 0
        while not txtl[i+j].isnumeric():
            j = j + 1
        description.append(" ".join(txtl[i+1:i+j-2]))
        unity.append(txtl[i+j-1])
        cost.append(txtl[i+j])
I'm facing some problems with this approach: although there will always be more elements in the list after a code, I'm getting this error:
while not txtl[i+j].isnumeric():
txtl[i+j] list index out of range.
I'd welcome any help debugging my code, or entirely new solutions to the problem.
OBS: I'm also going to have to do this for a very similar data source, but there the code is just a sequence of 7 digits, which makes it harder to find amongst the other data. Any solution that also handles this case is appreciated!
A slight addition to your code should resolve this:
while i+j < len(txtl) and not txtl[i+j].isnumeric():
    j += 1
The first condition fails when out of bounds, so the second one doesn't get checked.
Also, please use a list of dict items instead of 4 different lists, e.g.:
thelist = []
thelist.append({'codes': 69, 'description': 'random text', 'unity': 'whatever', 'cost': 'your life'})
In this way you always have the correct values together in the list, and you don't need to keep track of where you are with indexes or other black magic...
EDIT after comment interactions:
Ok, so in this case you split the line you are processing on the space character, and then process the words in the line.
from pprint import pprint # just for pretty printing
textl = 'P03133 Auxiliar helper un 203.02 417.54 437.22 675.80'
the_list = []
def handle_line(textl: str):
    description = ''
    unity = None
    values = []
    for word in textl.split()[1:]:
        # it splits on whitespace by default
        # you can ignore the first item in the list, as this will always be the code
        # str.isnumeric() doesn't work with floats, only integers. See https://stackoverflow.com/a/23639915/9267296
        if not word.replace(',', '').replace('.', '').isnumeric():
            if len(description) == 0:
                description = word
            else:
                description = f'{description} {word}' # I like f-strings
        elif unity is None:
            # first number: the word just before it is the unity, not part of the description
            description, _, unity = description.rpartition(' ')
            values.append(word)
        else:
            values.append(word)
    return {'code': textl.split()[0], 'description': description, 'unity': unity, 'values': values}
the_list.append(handle_line(textl))
pprint(the_list)
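Note that str.isnumeric() doesn't work with floats, only integers (see https://stackoverflow.com/a/23639915/9267296): '203.02'.isnumeric() is False. That is likely why the while loop in the question never finds a "numeric" item after a code and runs past the end of the list.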
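For the flat token list from the question (txtl), including the OBS about 7-digit codes, one option is to recognise codes and numbers with regular expressions and group the tokens between two codes. A sketch under stated assumptions: codes are either "P" followed by digits or exactly 7 digits, numbers are digits with optional "." or "," separators, and no value is ever exactly 7 digits long (it would be mistaken for a code). CODE_RE, NUMBER_RE and parse_tokens are hypothetical names:

```python
import re

# Assumed code formats: "P" + digits (first file) or exactly 7 digits (second file)
CODE_RE = re.compile(r"P\d+|\d{7}")
# Assumed number format: digits with optional "." or "," separators, e.g. "203.02"
NUMBER_RE = re.compile(r"\d[\d.,]*")


def parse_tokens(tokens):
    """Group a flat list of tokens into one dict per code."""
    records = []
    for token in tokens:
        if CODE_RE.fullmatch(token):
            records.append({"code": token, "words": [], "values": []})
        elif not records:
            continue  # ignore anything before the first code
        elif NUMBER_RE.fullmatch(token):
            records[-1]["values"].append(token)
        else:
            records[-1]["words"].append(token)
    for record in records:
        words = record.pop("words")
        # the unity is the last word before the numbers; the rest is the description
        record["description"] = " ".join(words[:-1])
        record["unity"] = words[-1] if words else None
    return records


txtl = "P03133 Auxiliar helper un 203.02 417.54 437.22 675.80".split()
for record in parse_tokens(txtl):
    print(record)
```

Because each record is a dict, the code, description, unity and values stay together, as suggested above; the cost is then record["values"][2] if it's the third value you need.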

Transposition Cipher in Python

I'm currently trying to code a transposition cipher in Python. However, I have reached a point where I'm stuck.
My code:
key = "german"
length = len(key)
plaintext = "if your happy and you know it clap your hands, clap your hands"
formatted = "".join(plaintext.split()).replace(",","")
def split_text(formatted,length):
    return [formatted[i:i + length] for i in range(0, len(formatted), length)]
split = split_text(formatted,length)
def encrypt():
    pass  # this is where I'm stuck
I use that to get the length of the key string, and then use the length to determine how many columns to create. So it would create this:
GERMAN
IFYOUR
HAPPYA
NDYOUK
NOWITC
LAPYOU
RHANDS
CLAPYO
URHAND
S
This is now where I'm stuck: I want the program to create a string by combining the columns together, so it would combine each column to create:
IHNNLRCUSFADOAHLRYPYWPAAH .....
I know I would need a loop of some sort, but I'm unsure how to tell the program to create such a string.
Thanks.
You can use slices of the string with a step of 6 (length) to get every sixth letter, starting from a given index:
print(formatted[0::length])
#output:
ihnnlrcus
Then just loop through all the possible start indices in range(length) and join them all together:
def encrypt(formatted,length):
    return "".join([formatted[i::length] for i in range(length)])
Note that this doesn't actually use split_text; it takes formatted directly:
print(encrypt(formatted,length))
The problem with using split_text is that you then can't make use of tools like zip, since zip stops as soon as the shortest iterator is exhausted (because the last group has only one character, zip(*split) gives you only one group):
for i in zip("stuff that is important","a"):
    print(i)
#output:
('s', 'a')
#nothing else, since one of the iterators finished.
To use something like that, you would have to write your own version of zip that lets some of the iterators finish and keeps going until all of them are done:
def myzip(*iterators):
    iterators = tuple(iter(it) for it in iterators)
    while True: #broken when none of iterators still have items in them
        group = []
        for it in iterators:
            try:
                group.append(next(it))
            except StopIteration:
                pass
        if group:
            yield group
        else:
            return #none of the iterators still had items in them
Then you can use it to process the split-up data like this:
encrypted_data = ''.join(''.join(x) for x in myzip(*split))
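The standard library already offers this behaviour: itertools.zip_longest keeps going until the longest iterable is exhausted and fills the gaps with fillvalue. A sketch that rebuilds formatted and split as in the question and reads the columns with zip_longest instead of a hand-written myzip:

```python
from itertools import zip_longest

plaintext = "if your happy and you know it clap your hands, clap your hands"
formatted = "".join(plaintext.split()).replace(",", "")
length = len("german")

# rows of `length` characters; the last row may be shorter
split = [formatted[i:i + length] for i in range(0, len(formatted), length)]

# zip_longest pads exhausted rows with "" instead of stopping early,
# so each tuple it yields is one full column of the grid
encrypted = "".join("".join(column) for column in zip_longest(*split, fillvalue=""))
print(encrypted.upper())
```

This gives the same string as the slicing version, formatted[i::length] for each i in range(length).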

Python, I need the following code to finish quicker

I need the following code to finish quicker without using threads or multiprocessing. Any tricks would be greatly appreciated: maybe for i in enumerate(), or converting the list to a string before processing? I'm not sure.
For the example below, I have attempted to recreate the variables using random sequences. However, this has rendered some of the conditions inside the loop useless, which is OK for this example; it just means the real application of the code will take slightly longer.
Currently on my i7, the example below (which mostly bypasses some of its conditions) completes in 1 second. I would like to get this down as much as possible.
import random
import time
import collections
import cProfile
def random_string(length=7):
    """Return a random string of given length"""
    return "".join([chr(random.randint(65, 90)) for i in range(length)])
LIST_LEN = 18400
original = [[random_string() for i in range(LIST_LEN)] for j in range(6)]
LIST_LEN = 5
SufxList = [random_string() for i in range(LIST_LEN)]
LIST_LEN = 28
TerminateHook = [random_string() for i in range(LIST_LEN)]
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Exclude above from benchmark
ListVar = original[:]
for b in range(len(ListVar)):
    for c in range(len(ListVar[b])):
        #If its an int ... remove
        try:
            int(ListVar[b][c].replace(' ', ''))
            ListVar[b][c] = ''
        except: pass
        #if any second sufxList delete
        for d in range(len(SufxList)):
            if ListVar[b][c].find(SufxList[d]) != -1: ListVar[b][c] = ''
        for d in range(len(TerminateHook)):
            if ListVar[b][c].find(TerminateHook[d]) != -1: ListVar[b][c] = ''
    #remove all '' from list
    while '' in ListVar[b]: ListVar[b].remove('')
print(ListVar[b])
ListVar = original[:]
That makes a shallow copy of original, so your changes to the inner (second-level) lists are going to affect original as well. Are you sure that is what you want? It would be much better to build the new, modified list from scratch.
for b in range(len(ListVar)):
    for c in range(len(ListVar[b])):
Yuck: whenever possible iterate directly over lists.
#If its an int ... remove
try:
    int(ListVar[b][c].replace(' ', ''))
    ListVar[b][c] = ''
except: pass
You want to ignore spaces in the middle of numbers? That doesn't sound right. If the numbers can be negative you may want to use the try..except but if they are only positive just use .isdigit().
#if any second sufxList delete
for d in range(len(SufxList)):
    if ListVar[b][c].find(SufxList[d]) != -1: ListVar[b][c] = ''
Is that just bad naming? SufxList implies you are looking for suffixes; if so, just use .endswith() (and note that you can pass it a tuple to avoid the loop). If you really do want to find the suffix anywhere in the string, use the in operator.
for d in range(len(TerminateHook)):
    if ListVar[b][c].find(TerminateHook[d]) != -1: ListVar[b][c] = ''
Again use the in operator. Also any() is useful here.
#remove all '' from list
while '' in ListVar[b]: ListVar[b].remove('')
And that while loop is O(n^2), i.e. it will be slow. You could use a list comprehension instead to strip out the blanks, but it's better just to build clean lists to begin with.
print(ListVar[b])
I think maybe your indentation was wrong on that print.
Putting these suggestions together gives something like:
suffixes = tuple(SufxList)
newListVar = []
for row in original:
    newRow = []
    newListVar.append(newRow)
    for value in row:
        if (not value.isdigit() and
                not value.endswith(suffixes) and
                not any(th in value for th in TerminateHook)):
            newRow.append(value)
    print(newRow)
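The same filtering can also be written as a function built from nested list comprehensions, which never modifies original and is easy to time with timeit. This is a sketch that reuses the question's random test data; clean_rows is a hypothetical name, and I haven't measured how it compares on your real data, so treat any speedup as something to check on your machine. Like the loop above, it uses isdigit(), so it assumes the numbers are non-negative integers without spaces:

```python
import random
import timeit


def random_string(length=7):
    """Return a random string of uppercase letters (same idea as the question)."""
    return "".join(chr(random.randint(65, 90)) for _ in range(length))


original = [[random_string() for _ in range(18400)] for _ in range(6)]
SufxList = [random_string() for _ in range(5)]
TerminateHook = [random_string() for _ in range(28)]


def clean_rows(rows, suffixes, hooks):
    """Return filtered copies of each row; the input lists are not modified."""
    suffixes = tuple(suffixes)  # str.endswith accepts a tuple of suffixes
    return [
        [value for value in row
         if not value.isdigit()
         and not value.endswith(suffixes)
         and not any(hook in value for hook in hooks)]
        for row in rows
    ]


cleaned = clean_rows(original, SufxList, TerminateHook)
# time one run on this random data; the result depends on the machine
print(timeit.timeit(lambda: clean_rows(original, SufxList, TerminateHook), number=1))
```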
