Removing item in list during loop - python

I have the code below. I'm trying to remove two strings from the lists predict_strings and test_strings if one of them has been found in the other. The catch is that I have to split each string up and check whether a "portion" of one string appears inside the other. If it does, I count that as a match and delete both strings from their lists so they are no longer iterated over.
ValueError: list.remove(x): x not in list
I get the above error though and I am assuming this is because I can't delete the string from test_strings since it is being iterated over? Is there a way around this?
Thanks
for test_string in test_strings[:]:
    for predict_string in predict_strings[:]:
        split_string = predict_string.split('/')
        for string in split_string:
            if (split_string in test_string):
                no_matches = no_matches + 1
                # Found match so remove both
                test_strings.remove(test_string)
                predict_strings.remove(predict_string)
Example input:
test_strings = ['hello/there', 'what/is/up', 'yo/do/di/doodle', 'ding/dong/darn']
predict_strings =['hello/there/mister', 'interesting/what/that/is']
so I want there to be a match between hello/there and hello/there/mister and for them to be removed from the list when doing the next comparison.
After one iteration I expect it to be:
test_strings == ['what/is/up', 'yo/do/di/doodle', 'ding/dong/darn']
predict_strings == ['interesting/what/that/is']
After the second iteration I expect it to be:
test_strings == ['yo/do/di/doodle', 'ding/dong/darn']
predict_strings == []

You should never try to modify an iterable while you're iterating over it, which is still effectively what you're trying to do. Make a set to keep track of your matches, then remove those elements at the end.
Also, your line for string in split_string: isn't really doing anything. You're not using the variable string. Either remove that loop, or change your code so that you're using string.
You can use augmented assignment to increase the value of no_matches.
no_matches = 0
found_in_test = set()
found_in_predict = set()
for test_string in test_strings:
    test_set = set(test_string.split("/"))
    for predict_string in predict_strings:
        split_strings = set(predict_string.split("/"))
        if not split_strings.isdisjoint(test_set):
            no_matches += 1
            found_in_test.add(test_string)
            found_in_predict.add(predict_string)
for element in found_in_test:
    test_strings.remove(element)
for element in found_in_predict:
    predict_strings.remove(element)
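If each test string should consume at most one predict string, as in the expected output in the question, one option is to build new lists instead of removing from the originals. The following is only a sketch under that assumption; the function name remove_matched_pairs and its return shape are hypothetical and not part of the original code.

```python
def remove_matched_pairs(test_strings, predict_strings):
    """Return (unmatched_tests, unmatched_predicts, match_count).

    Hypothetical helper: a test string matches a predict string when
    they share at least one '/'-separated token.
    """
    remaining_predicts = list(predict_strings)  # work on a copy
    unmatched_tests = []
    match_count = 0
    for test_string in test_strings:
        test_tokens = set(test_string.split("/"))
        for i, predict_string in enumerate(remaining_predicts):
            if not test_tokens.isdisjoint(predict_string.split("/")):
                # Safe to delete here because we leave the loop immediately
                del remaining_predicts[i]
                match_count += 1
                break
        else:
            # No predict string matched, so the test string is kept
            unmatched_tests.append(test_string)
    return unmatched_tests, remaining_predicts, match_count


test_strings = ['hello/there', 'what/is/up', 'yo/do/di/doodle', 'ding/dong/darn']
predict_strings = ['hello/there/mister', 'interesting/what/that/is']
tests_left, predicts_left, no_matches = remove_matched_pairs(test_strings, predict_strings)
print(tests_left, predicts_left, no_matches)
```

Because the inputs are never mutated, there is no risk of removing the same element twice.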

From your code it seems likely that two split_strings match the same test_string. The first time through the loop removes test_string, the second time tries to do so but can't, since it's already removed!
You can try breaking out of the inner for loop if it finds a match, or use any instead.
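import itertools

for test_string, predict_string in itertools.product(test_strings[:], predict_strings[:]):
    # Skip pairs whose members were already removed by an earlier match
    if test_string not in test_strings or predict_string not in predict_strings:
        continue
    if any(s in test_string for s in predict_string.split('/')):
        no_matches += 1 # isn't this counter-intuitive?
        test_strings.remove(test_string)
        predict_strings.remove(predict_string)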
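The error from the question can be reproduced in isolation, which may make the cause easier to see. This is a minimal sketch with made-up data; the variable names are illustrative only.

```python
matches = ["what/is/up"]
matches.remove("what/is/up")       # first removal succeeds
try:
    matches.remove("what/is/up")   # second removal: the item is already gone
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Any loop that can reach remove() twice for the same element will fail the same way.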

Related

Stop a loop when a value is found and then add values to a list

How can I make a loop that eliminates zeroes from a list of strings that looks something like the following?
List
GR0030
GR00000000013
GR093
I'd like to eliminate the zeroes between the GR and the first number different than zero. I've thought I could solve this problem with something like this:
entry = ""
for x in list:
    if x.isalpha():
        entry = entry + x
    else:
        if x == 0:
            entry = entry
        else:
            entry = entry + x[(list.index(x)):-1]
            break
list1.append(entry) # the answer list
But, it does not work. I'm just getting a list full of GR in each row. What am I doing wrong?
A regular expression will do here. The expression matches the first group of zeroes, and replaces them with an empty string. To prevent us from reading past the first group, we set count=1.
Your approach could work, but you'd have to keep track of whether or not you've seen a zero before. You also should try to avoid repeated concatenation of strings, as it isn't very efficient.
import re
def strip_intermediate_zeroes(s):
    return re.sub('0+', '', s, count=1)
items = ['GR0030', 'GR00000000013', 'GR093']
print(list(map(strip_intermediate_zeroes, items)))
The above code snippet assumes that there's at least one zero after "GR". If such an assumption cannot be made, you can explicitly check for that assumption as a quick fix:
def strip_intermediate_zeroes(s):
    if s.startswith('GR0'):
        return re.sub('0+', '', s, count=1)
    return s
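If a regular expression feels heavier than needed, roughly the same result can be sketched with plain string methods. This assumes every entry of interest starts with a fixed prefix such as "GR"; the function name and the prefix parameter are hypothetical.

```python
def strip_zeroes_after_prefix(s, prefix="GR"):
    """Remove the zeroes that directly follow the prefix, if present."""
    if not s.startswith(prefix):
        return s  # leave unrelated strings untouched
    # lstrip("0") drops only the leading zeroes of the remainder
    return prefix + s[len(prefix):].lstrip("0")


items = ['GR0030', 'GR00000000013', 'GR093']
cleaned = [strip_zeroes_after_prefix(item) for item in items]
print(cleaned)
```

Note that an entry consisting only of the prefix and zeroes, such as 'GR000', would be reduced to 'GR' by this sketch.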
This seems like a natural fit for a regex combined with re.sub(). (?<=^GR)0* means 0 or more zeros that follow 'GR' at the beginning of a string.
import re
l = [
    'GR0030',
    'GR00000000013',
    'GR093',
]
rx = re.compile(r'(?<=^GR)0*')
[rx.sub('', s) for s in l]
# ['GR30', 'GR13', 'GR93']
This is very specific in that it won't change strings like 'SP0091', 'ADGR0000400013', or '000ab'.

Python: How to move the position of an output variable using the split() method

This is my first SO post, so go easy! I have a script that counts how many matches occur in a string named postIdent for the substring ff. Based on this it then iterates over postIdent and extracts all of the data following it, like so:
substring = 'ff'
global occurences
occurences = postIdent.count(substring)
x = 0
while x <= occurences:
    for i in postIdent.split("ff"):
        rawData = i
        required_Id = rawData[-8:]
        x += 1
To explain further, if we take the string "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff", it is clear there are 3 instances of ff. I need to get the 8 preceding characters at every instance of the substring ff, so for the first instance this would be 909a9090.
With the rawData, I essentially need to offset the variable required_Id by -1 when I get the data out of the split() method, as I am currently getting the last 8 characters of the current string, not the string I have just split. Another way of doing it could be to pass the current required_Id to the next iteration, but I've not been able to do this.
The split method gets everything after the matching string ff.
Using the partition method can get me the data I need, but does not allow me to iterate over the string in the same way.
Get the last 8 characters of each split using a slice operation in a list comprehension:
s = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"
print([x[-8:] for x in s.split('ff') if x])
# ['909a9090', '90434390', 'sdfs9000']
Not a difficult problem, but tricky for a beginner.
If you split the string on 'ff' then you appear to want the eight characters at the end of every substring but the last. The last eight characters of string s can be obtained using s[-8:]. All but the last element of a sequence x can similarly be obtained with the expression x[:-1].
Putting both those together, we get
subject = '090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff'
for x in subject.split('ff')[:-1]:
    print(x[-8:])
This should print
909a9090
90434390
sdfs9000
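Dropping the last piece and filtering out empty pieces give the same result for the sample string, but they differ when the input does not end in ff. The sketch below wraps the slicing approach in a function so both cases can be tried; the name preceding_chars and its parameters are hypothetical.

```python
def preceding_chars(text, marker="ff", width=8):
    """Return up to `width` characters that precede each occurrence of marker."""
    pieces = text.split(marker)
    # The final piece is whatever follows the last marker, so it is skipped
    return [piece[-width:] for piece in pieces[:-1]]


sample = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"
print(preceding_chars(sample))
print(preceding_chars("12345678ffabc"))
```

In the second call the trailing "abc" is not preceded by ff, so it is left out, whereas a filter on non-empty pieces would have included it.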
I wouldn't do this with split myself, I'd use str.find. This code isn't fancy but it's pretty easy to understand:
fullstr = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"
search = "ff"
found = None # offset of the next match
last = 0 # where the next search starts
l = 8 # number of preceding characters to extract
print(fullstr)
while True:
    found = fullstr.find(search, last)
    if found == -1:
        break
    # max() guards against a negative start index when a match is near the beginning
    preceding = fullstr[max(0, found - l):found]
    print("At position {} found preceding characters '{}'".format(found, preceding))
    last = found + len(search)
Overall I like Austin's answer more; it's a lot more elegant.

How to make python check EACH value

I am working on this function and I want it to return a list of the elements of L that end with the specified token, in the order they appear in the original list.
def has_last_token(s,word):
    """ (list of str, str) -> list of str
    Return a list of the elements of L that end with the specified token in the order they appear in the original list.
    >>> has_last_token(['one,fat,black,cat', 'one,tiny,red,fish', 'two,thin,blue,fish'], 'fish')
    ['one,tiny,red,fish', 'two,thin,blue,fish']
    """
    for ch in s:
        ch = ch.replace(',' , ' ')
        if word in ch:
            return ch
So I know that when I run the code and test out the example I provided, it checks through
'one,fat,black,cat'
and sees that the word is not in it and then continues to check the next value which is
'one,tiny,red,fish'
Here it recognizes the word fish and outputs it. But the code doesn't check the last input, which is also valid. How can I make it check all values rather than stop at the first valid one?
expected output
>>> has_last_token(['one,fat,black,cat', 'one,tiny,red,fish', 'two,thin,blue,fish'], 'fish')
>>> ['one,tiny,red,fish', 'two,thin,blue,fish']
I'll try to answer your question while changing your code and logic as little as possible, in case that makes the answer easier to follow.
If you return ch, you'll immediately terminate the function.
One way to accomplish what you want is to simply declare a list before your loop and then append the items you want to that list accordingly. The return value would be that list, like this:
def has_last_token(s, word):
    result = []
    for ch in s:
        if ch.endswith(word): # this will check only the string's tail
            result.append(ch)
    return result
PS: That ch.replace() is unnecessary according to the function's docstring
You are returning the first match and this exits the function. You want to either yield from the loop (creating a generator) or build a list and return that. I would just use endswith in a list comprehension. I'd also rename things to make it clear what's what.
def has_last_token(words_list, token):
    return [words for words in words_list if words.endswith(token)]
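The generator option mentioned above could look roughly like this. It is a sketch only; the name iter_last_token is hypothetical, and the caller has to wrap the result in list() if an actual list is needed.

```python
def iter_last_token(words_list, token):
    """Yield each element that ends with token, in original order."""
    for words in words_list:
        if words.endswith(token):
            # yield hands back one match and then continues the loop,
            # unlike return, which would end the function here
            yield words


rows = ['one,fat,black,cat', 'one,tiny,red,fish', 'two,thin,blue,fish']
fish_rows = list(iter_last_token(rows, 'fish'))
print(fish_rows)
```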
Another way is to use rsplit to split the last token from the rest of the string. If you pass the second argument as 1 (could use named argument maxsplit in py3 but py2 doesn't like it) it stops after one split, which is all we need here.
You can then use filter rather than an explicit loop to check each string has word as its final token and return a list of only those strings which do have word as their final token.
def has_last_token(L, word):
    # list() makes this return a list in Python 3, where filter returns an iterator
    return list(filter(lambda s: s.rsplit(',', 1)[-1] == word, L))
result = has_last_token(['one,fat,black,cat',
                         'one,tiny,red,fish',
                         'two,thin,blue,fish',
                         'two,thin,bluefish',
                         'nocommas'], 'fish')
for res in result:
    print(res)
Output:
one,tiny,red,fish
two,thin,blue,fish
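The output above leaves out 'two,thin,bluefish' because the comparison is against the whole last token rather than the end of the string. The two checks can be put side by side as a small sketch; the helper names below are hypothetical.

```python
def ends_with_text(s, token):
    """Suffix check: true whenever the string ends with the given text."""
    return s.endswith(token)


def last_token_is(s, token, sep=","):
    """Token check: true only when the final sep-separated field equals token."""
    return s.rsplit(sep, 1)[-1] == token


rows = ['one,tiny,red,fish', 'two,thin,bluefish', 'nocommas']
by_suffix = [s for s in rows if ends_with_text(s, 'fish')]
by_token = [s for s in rows if last_token_is(s, 'fish')]
print(by_suffix)
print(by_token)
```

Which behaviour is wanted depends on whether "token" in the docstring means a full comma-separated field.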

How to replace a list of words with a string and keep the formatting in python?

I have a list containing the lines of a file.
list1[0]="this is the first line"
list1[1]="this is the second line"
I also have a string.
example="TTTTTTTaaaaaaaaaabcccddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeefffff"
I want to replace list1[0] with the string (example). However, I want to keep the word lengths. For example the new list1[0] should be "TTTT TT TTa aaaaa aaaa". The only solution I could come up with was to turn the string example into a list and use a for loop to read letter by letter from the string list into the original list.
for line in open(input, 'r'):
    list1[i] = listString[i]
    i=i+1
However this does not work from what I understand because Python strings are immutable? What's a good way for a beginner to approach this problem?
I'd probably do something like:
orig = "this is the first line"
repl = "TTTTTTTaaaaaaaaaabcccddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeefffff"
def replace(orig, repl):
    r = iter(repl)
    result = ''.join([' ' if ch.isspace() else next(r) for ch in orig])
    return result
If repl could be shorter than orig, consider r = itertools.cycle(repl)
This works by creating an iterator out of the replacement string, then iterating over the original string, keeping the spaces, but using the next character from the replacement string instead of any non-space characters.
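The itertools.cycle variant mentioned above might be sketched as follows. It assumes repl is non-empty, since cycling an empty string yields nothing; the function name replace_cycling is hypothetical.

```python
import itertools


def replace_cycling(orig, repl):
    """Replace non-space characters of orig with characters from repl,
    starting over at the beginning of repl whenever it runs out.
    Assumes repl is not empty."""
    r = itertools.cycle(repl)
    # Whitespace is kept as-is so word lengths are preserved
    return ''.join(ch if ch.isspace() else next(r) for ch in orig)


short_result = replace_cycling("ab cde", "xy")
print(short_result)
```

This variant keeps the original whitespace character instead of always writing a single space, which is a small design choice that also preserves tabs.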
The other approach you could take would be to note the indexes of the spaces in one pass through orig, then insert them at those indexes in a pass of repl and return a slice of the result
def replace(orig, repl):
    spaces = [idx for idx,ch in enumerate(orig) if ch.isspace()]
    repl = list(repl)
    for idx in spaces:
        repl.insert(idx, " ")
        # add a space before that index
    return ''.join(repl[:len(orig)])
However, I can't imagine the second approach being any faster, it is certain to be less memory-efficient, and I don't find it easier to read (in fact I find it HARDER to read!). It also doesn't have a simple workaround if repl is shorter than orig (I guess you could do repl *= 2, but that's uglier than sin and still doesn't guarantee it'll work).

How to check if an element of a list contains some substring

The code below does not work as intended. It looks as if the search runs against the complete list instead of each element separately, and the condition is always true.
The intent is to search for the substring in one element of the list per iteration and get true or false for that element, but it appears to be looking at the complete list.
In the code below, the print statement prints the complete list inside <<>> if I use find() or the in operator, but prints only one word if I use the == operator.
The problem code:
def myfunc(mylist):
    for i in range(len(mylist)):
        count = 0
        for word in mylist:
            print('<<{}>>'.format(word))
            if str(word).casefold().find('abc') or 'def' in str(word).casefold():
                count += 1
                abcdefwordlist.append(str(word))
                break
This code searches for 'abc' or 'def' in mylist instead of in the word.
If I use str(word).casefold() == 'abc' or str(word).casefold() == 'def', then it compares against the word only.
How can I check whether word contains either 'abc' or 'def' in such a loop?
You have several problems here.
abcdefwordlist is not defined (at least not in the code you showed us).
You're looping over the length of the list and then over the list of words itself, which means that too many elements will be added to your resulting list.
This function doesn't return anything, unless you meant for it to just update abcdefwordlist from outside of it.
You had the right idea with 'def' in str(word), but you have to use it for both substrings. To sum up, a function that does what you want would look like this:
def myfunc(mylist):
    abcdefwordlist = [] # unless it already exists elsewhere
    for word in mylist:
        if 'abc' in str(word).lower() or 'def' in str(word).lower():
            abcdefwordlist.append(word)
    return abcdefwordlist
This can also be shortened to a one-liner using a list comprehension:
def myfunc(mylist):
    return [word for word in mylist if 'abc' in str(word).lower() or 'def' in str(word).lower()]
BTW I used lower() instead of casefold() because the substrings I'm searching for are definitely lowercase.
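One more detail that may explain why the original condition was almost always true: str.find returns an index rather than a boolean, with -1 meaning "not found", and -1 is truthy while 0 (a match at the very start) is falsy. The sketch below shows this together with an any()-based check; the helper name contains_any and its default substrings are assumptions for illustration.

```python
def contains_any(word, substrings=("abc", "def")):
    """Return True if any of the substrings occurs in word, ignoring case."""
    text = str(word).casefold()
    return any(sub in text for sub in substrings)


# str.find returns an index, not a boolean
missing = "xyz".find("abc")       # -1, which is truthy
at_start = "abcxyz".find("abc")   # 0, which is falsy

print(bool(missing), bool(at_start))
print(contains_any("xxABCxx"), contains_any("xyz"))
```

Using the in operator (or comparing find() against -1) avoids relying on the truthiness of an index.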
