Want to remove elements based on first character - Python

This program lists all the substrings of a string except the ones that start with a vowel.
However, I don't understand why the startswith() function doesn't work as I expected. It is not removing all the substrings that start with the letter 'A'.
Here is my code:
ban = 'BANANA'
cur_pos=0
sub = []
#Finding the substrings
for i in range(len(ban)):
    limit=1
    for j in range(len(ban)):
        a = ban[cur_pos:limit]
        sub.append(a)
        limit+=1
    cur_pos+=1
#removing the substrings that start with vowels
for i in sub:
    if (i.startswith(('A','E','I','O','U'))):
        sub.remove(i)
print(sub)

Why this doesn't work...
To answer your question: the mantra for this issue is "delete array elements in reverse order", which I occasionally forget and then wonder what has gone wrong.
Explanation
The problem isn't with startswith() but with calling remove() inside this specific type of for loop, which walks the list with an iterator rather than over a range of indices:
for i in sub:
Here is why it fails in this code.
ban = 'BANANA'
cur_pos=0
sub = []
#Finding the substrings
for i in range(len(ban)):
    limit=1
    for j in range(len(ban)):
        a = ban[cur_pos:limit]
        sub.append(a)
        limit+=1
    cur_pos+=1
print(sub)
#removing the substrings that start with vowels
for i in sub:
    if (i.startswith(('A','E','I','O','U'))):
        sub.remove(i)
        print(sub)
print(sub)
I've added some print statements to assist debugging.
Initially the array is:
['B', 'BA', 'BAN', 'BANA', 'BANAN', 'BANANA', '', 'A', 'AN', 'ANA', 'ANAN', 'ANANA', '', '', 'N', 'NA', 'NAN', 'NANA', '', '', '', 'A', 'AN', 'ANA', '', '', '', '', 'N', 'NA', '', '', '', '', '', 'A']
...then we eventually get to remove the first 'A', which seems to be removed fine...
['B', 'BA', 'BAN', 'BANA', 'BANAN', 'BANANA', '', 'AN', 'ANA', 'ANAN', ...etc...
...but there is some nastiness happening behind the scenes that shows up when we reach the next vowel...
['B', 'BA', 'BAN', 'BANA', 'BANAN', 'BANANA', '', 'AN', 'ANAN', ...etc...
Notice that 'ANA' was removed, not the expected 'AN'!
Why?
Because remove() modified the list and shifted all the later elements back by one position, but the for loop's internal index does not know about this. The index advances to the next position, where it expects to find 'AN', but because every element has moved by one position it is actually pointing at 'ANA'. 'AN' is skipped and never tested.
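The skipping can be reproduced on a much shorter list. The sketch below is only an illustration (the names items and visited are made up for this example): it records which elements the loop actually visits while remove() shrinks the list underneath it.

```python
# Hypothetical mini-example, not the original program.
items = ['A', 'AN', 'ANA', 'B']
visited = []

for s in items:
    visited.append(s)      # record what the loop actually sees
    if s.startswith('A'):
        items.remove(s)    # shifts every later element one slot to the left

print(visited)  # ['A', 'ANA']: 'AN' and 'B' were never visited
print(items)    # ['AN', 'B']: 'AN' survives although it starts with 'A'
```

The loop ends early as well: after two removals the list has only two elements left, so the iterator's third position is already past the end.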
Fixing the problem
One way is to append the substrings that do not start with a vowel to a new, empty list:
ban = 'BANANA'
cur_pos=0
sub = []
add = []
#Finding the substrings
for i in range(len(ban)):
    limit=1
    for j in range(len(ban)):
        a = ban[cur_pos:limit]
        sub.append(a)
        limit+=1
    cur_pos+=1
#adding the substrings that don't start with vowels
for i in sub:
    if (not i.startswith(('A','E','I','O','U'))):
        add.append(i)
print(add)
Another way
There is, however, a simple way to modify the original array, as you wanted, and that's to iterate through the array in reverse order using an index-based for loop.
The important part here is that you never modify the part of the array you have yet to process, only the part you are finished with, so that when you remove an element the array index won't point to the wrong element. This is common and acceptable practice, so long as you understand and make clear what you're doing.
ban = 'BANANA'
cur_pos=0
sub = []
#Finding the substrings
for i in range(len(ban)):
    limit=1
    for j in range(len(ban)):
        a = ban[cur_pos:limit]
        sub.append(a)
        limit+=1
    cur_pos+=1
#removing the substrings that start with vowels, in reverse index order
start = len(sub)-1 # last element index (zero-based array indexing)
stopAt = -1 # first element index, less one (range() excludes the stop value)
step = -1 # step backwards
for index in range(start,stopAt,step): # count backwards from last element to the first
    i = sub[index]
    if (i.startswith(('A','E','I','O','U'))):
        print('#'+str(index)+' = '+i)
        del sub[index]
print(sub)
For more details, see the official documentation on the for statement:
https://docs.python.org/3/reference/compound_stmts.html#index-6
Aside: This is my favourite array problem.
Edit: I just got bitten by this in Javascript, while removing DOM nodes.

It is not good practice to remove items from a list while iterating over it. I suggest you change it to this:
sub2=list()
#keeping the substrings that do not start with vowels
for i in sub:
    if not (i.startswith(('A','E','I','O','U'))):
        sub2.append(i)
print(sub2)
So if the substring does not start with a vowel, it is added to another list, sub2.

As mentioned in the comments, in Python you shouldn't remove items from a list while iterating over its elements, since you mutate the original list before the loop ends. If you want to do that, you'll either have to build another list and then assign it to your old name, or do it directly using a list comprehension like so:
sub = [i for i in sub if not i.startswith(('A','E','I','O','U'))]
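If other variables refer to the same list object and the change has to be visible through them, a slice assignment replaces the contents without creating a new list object. The following is a small sketch rather than the original program: the names VOWELS and alias are assumptions for the example, and the substrings are generated with a comprehension that, unlike the nested loops in the question, produces no empty strings.

```python
VOWELS = ('A', 'E', 'I', 'O', 'U')   # assumed: upper-case input only

ban = 'BANANA'
# All non-empty substrings, generated start position by start position
sub = [ban[i:j] for i in range(len(ban)) for j in range(i + 1, len(ban) + 1)]
alias = sub                           # second name for the same list object

# Slice assignment swaps the contents in place instead of rebinding the name
sub[:] = [s for s in sub if not s.startswith(VOWELS)]

print(sub)
# ['B', 'BA', 'BAN', 'BANA', 'BANAN', 'BANANA', 'N', 'NA', 'NAN', 'NANA', 'N', 'NA']
print(alias is sub)   # True: both names still refer to the filtered list
```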

Related

Remove extra space when punctuation appears in the string

I have a tokenised sentence stored as a list, for example:
text = ['Selegiline',
'-',
'induced',
'postural',
'hypotension',
'in',
'Parkinson',
"'",
's',
'disease',
':',
'a',
'longitudinal',
'study',
'on',
'the',
'effects',
'of',
'drug',
'withdrawal',
'.']
I want to convert this list into a string, but when punctuation such as - or : appears, I want to remove the extra space, so the final output would look something like this:
Selegiline-induced postural hypotension in Parkinson's disease: a longitudinal study on the effects of drug withdrawal
I tried splitting the list into equal chunks and checking whether both items in a pair are words, joining them with a single space if so and with no space otherwise:
def chunks(xs, n):
    n = max(1, n)
    return (xs[i:i+n] for i in range(0, len(xs), n))
data_first = list(chunks(text, 2))
def check(data):
    second_order = []
    for words in data:
        if all(c.isalpha() for c in words[0]) and all(c.isalpha() for c in words[1]):
            second_order.append(" ".join(words))
        else:
            second_order.append("".join(words))
    return second_order
check(data_first)
But I have to repeat this until the last word is reached (a recursive solution). Is there a better way to do this?
One option might be creating a dictionary of punctuation and the replacement string since each punctuation seems to follow different rules (a colon should retain the space after itself, where a dash should not).
Something like:
punctdict={' - ':'-',' : ':': '," ' ":"'"}
sentence=' '.join(text)
for k,v in punctdict.items():
    sentence = sentence.replace(k, v)
text = ['Selegiline',
'-',
'induced',
'postural',
'hypotension',
'in',
'Parkinson',
"'",
's',
'disease',
':',
'a',
'longitudinal',
'study',
'on',
'the',
'effects',
'of',
'drug',
'withdrawal',
'.']
def txt_join(txt):
    ans=""
    for s in txt:
        if(s==".") or (s==":"):
            ans=ans.strip()+s+" "
        elif s=="'" or (s=="-"):
            ans=ans.strip()+s
        else:
            ans=ans+s+" "
    return ans
print(txt_join(text))
As I understand it, this will give you the expected result. The algorithm simply loops through the text list and adds spaces according to the punctuation it meets. (To support more punctuation marks, add further if/elif/else conditions.)
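The same idea can be written without repeated strip() calls by deciding, before each token, whether a space belongs in front of it. This is only a sketch under assumed spacing rules: the two sets and the name join_tokens are made up for this example and would need extending for other punctuation.

```python
# Assumed rules, chosen to match the example sentence only.
NO_SPACE_BEFORE = {".", ":", ",", "'", "-"}   # tokens that attach to the previous token
NO_SPACE_AFTER = {"'", "-"}                   # tokens that attach to the next token

def join_tokens(tokens):
    parts = []
    for tok in tokens:
        # parts[-1] is always the previous token, never a space
        if parts and tok not in NO_SPACE_BEFORE and parts[-1] not in NO_SPACE_AFTER:
            parts.append(" ")
        parts.append(tok)
    return "".join(parts)

text = ['Selegiline', '-', 'induced', 'postural', 'hypotension', 'in',
        'Parkinson', "'", 's', 'disease', ':', 'a', 'longitudinal', 'study',
        'on', 'the', 'effects', 'of', 'drug', 'withdrawal', '.']

print(join_tokens(text))
# Selegiline-induced postural hypotension in Parkinson's disease: a longitudinal study on the effects of drug withdrawal.
```

Keeping the rules in two sets means a new punctuation mark is a one-line change rather than another branch in the loop.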
What you're looking for is a list comprehension. You can read more about it here
You could build the pieces with a list comprehension and then use the str.replace method to replace a space with no space, much as you've done with append in your solution. You may find this solution useful. It uses .strip instead of replace. I would generally prefer a list comprehension over an explicit for loop on a list, as it is more compact and often faster. Also, this is my first answer, so sorry if it's a bit confusing.

remove elements from list of strings while traversing [duplicate]

This question already has answers here:
Modifying list while iterating [duplicate]
How to remove items from a list while iterating?
How can I remove elements from a list of strings while traversing it?
I have a list:
list1 = ['', '$', '32,324', '$', '32', '$', '(35', ')', '$', '32,321']
I want to remove $ from the list and, if a ), )% or % comes up, append it to the previous element of the list.
The expected output is:
['', '32,324', '32', '(35)', '32,321']
What I have tried is:
for j,element in enumerate(list1):
    if element == '%' or element == ")%" or element ==')':
        list1[j-1] = list1[j-1] + element
        list1.pop(j)
    elif element == '$':
        list1.pop(j)
but the output I am getting is
['', '32,324', '32', '(35)', '$', '32,321']
which is not the expected output. Please help.
This question is different from the suggested duplicates because here I have to concatenate the current element onto the previous one if it is ), )% or %.
What Green Cloak Guy said is mostly correct. Changing the size of the list (by calling .pop()) leaves j out of step with the elements that remain. To me, the easiest way to fix this problem while keeping your existing code is to simply not mutate your list, and build up a new one instead:
new_list = []
for element in list1:
    if element == '%' or element == ")%" or element == ')':
        new_list[-1] += element # add at the end of the previous element
    elif element != '$':
        new_list.append(element)
However, I would encourage you to think about your edge cases here. What happens when a ')' is followed by another ')' in your list? This may be a special case in your if statement. Hope this helped!
Instead of attempting to remove and merge elements dynamically while iterating on the list, it will be much easier to make a new list based on the conditions here.
list1 = ['', '$', '32,324', '$', '32', '$', '(35', ')', '$', '32,321']
out = []
for element in list1:
    if element == "$":
        continue #skip if $ present
    elif element in ("%", ")", ")%"):
        out[-1] = out[-1] + element #merge with last element of out so far.
    else:
        out.append(element)
print(out)
#Output:
['', '32,324', '32', '(35)', '32,321']
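One edge case the loop above leaves open is a suffix token arriving before anything has been kept, in which case out[-1] raises IndexError. A possible guard is sketched below; keeping the stray suffix as its own element is an assumption made for the example, not something the question specifies, and the names SUFFIXES and merge_tokens are hypothetical.

```python
SUFFIXES = {"%", ")", ")%"}   # tokens that attach to the previous kept element

def merge_tokens(tokens):
    out = []
    for tok in tokens:
        if tok == "$":
            continue                 # drop currency markers
        if tok in SUFFIXES and out:
            out[-1] += tok           # merge into the previous kept element
        else:
            out.append(tok)          # normal token, or a suffix with nothing before it
    return out

list1 = ['', '$', '32,324', '$', '32', '$', '(35', ')', '$', '32,321']
print(merge_tokens(list1))             # ['', '32,324', '32', '(35)', '32,321']
print(merge_tokens([')', '5', '%']))   # [')', '5%']
```

Consecutive suffixes need no special handling in this version: each one is appended to the same previous element in turn.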
I think this list comprehension works (haven't seen an example of how % is handled):
[ (a+b if b in (')',')%','%') else a) for a,b in zip(list1,list1[1:]+['']) if a not in ('$',')',')%','%')]
The idea is to:
make a list of pairings of elements and their successors
filter out elements that should be removed
add the successor as appropriate to those that we keep

Comparing strings in a list and appending those that have the same first and last character to a new list

I'm in an Intro to Python class and was given this assignment:
Given a list of strings, return a new list containing all the strings from the original list that begin and end with the same character. Matching is not case-sensitive, meaning 'a' should match with 'A'. Do not alter the original list in any way.
I was running into problems with slicing and comparing the strings because the possible lists given include '' (empty string). I'm pretty stumped and any help would be appreciated.
def first_last(strings):
    match=[]
    x=''
    count=0
    while count<len(strings):
        if x[0] == x[-1]:
            match.append(x)
        x+=x
        count+=1
So, when given:
['aba', 'dcn', 'z', 'zz', '']
or
['121', 'NbA', '898', '']
I get this:
string index out of range
When I should be seeing:
['aba', 'z', 'zz']
and
['121', '898']
Your list contains an empty string (''). Thus, you will have to check the length of each element that you are currently iterating over. Also, x is never assigned an element of the list, so the elements themselves are never compared:
def first_last(strings):
    match=[]
    count=0
    while count<len(strings):
        if strings[count]:
            if strings[count][0].lower() == strings[count][-1].lower():
                match.append(strings[count])
        count += 1
    return match
Note, however, that you can also use list comprehension:
s = ['aba', 'dcn', 'z', 'zz', '']
final_strings = [i for i in s if i and i[0].lower() == i[-1].lower()]
def first_last(strings):
    match=[]
    for x in strings:
        if x == '':
            continue
        if x.lower()[0] == x.lower()[-1]:
            match.append(x)
    return match
Test that the list element is not empty first (an empty string is falsy):
def first_last(strings):
    match = []
    for element in strings:
        if element and element[0].lower() == element[-1].lower():
            match.append(element)
    return match
or with list comp:
match = [element for element in strings if element and element[0].lower() == element[-1].lower()]

Sort text based on last 3rd character

I am using the sorted() function to sort the text based on the last character,
which works perfectly:
def sort_by_last_letter(strings):
    def last_letter(s):
        return s[-1]
    return sorted(strings,key=last_letter)
print(sort_by_last_letter(["hello","from","last","letter","a"]))
Output
['a', 'from', 'hello', 'letter', 'last']
My requirement is to sort on the third-to-last character. The problem is that a few of the words have fewer than 3 characters; in that case the sort should fall back to the nearest available character (the second-to-last if present, otherwise the last). I am looking for a Pythonic way to do it.
Presently I am getting
IndexError: string index out of range
def sort_by_last_letter(strings):
    def last_letter(s):
        return s[-3]
    return sorted(strings,key=last_letter)
print(sort_by_last_letter(["hello","from","last","letter","a"]))
You can use:
return sorted(strings,key=lambda x: x[max(0,len(x)-3)])
Here we first calculate the length of the string, len(x), and subtract 3 from it. If the string is shorter than three characters this gives a negative index, but max(0, ...) clamps it to 0, so we take the first character instead: the second-to-last character of a two-character string, or the only character of a one-character string.
This works provided every string has at least one character. It produces:
>>> sorted(["hello","from","last","letter","a"],key=lambda x: x[max(0,len(x)-3)])
['last', 'a', 'hello', 'from', 'letter']
If you do not care about tie-breakers (in other words, if 'a' and 'abc' can be reordered), you can use a more elegant approach:
from operator import itemgetter
return sorted(strings,key=itemgetter(slice(-3,None)))
What we do here is generate a slice with the last three characters and then compare these substrings. This gives:
>>> sorted(strings,key=itemgetter(slice(-3,None)))
['a', 'last', 'hello', 'from', 'letter']
Since we compare with:
['a', 'last', 'hello', 'from', 'letter']
# ['a', 'ast', 'llo', 'rom', 'ter'] (comparison key)
You can simply use the minimum of the string length and 3:
def sort_by_last_letter(strings):
    def last_letter(s):
        return s[-min(len(s), 3)]
    return sorted(strings,key=last_letter)
print(sort_by_last_letter(["hello","from","last","letter","a"]))
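Both index-based keys still raise IndexError on an empty string. One possible workaround, sketched here under the assumption that empty strings should simply sort first, is to slice instead of index, because slicing never raises on short strings. The helper names third_from_last and sort_by_third_from_last are made up for this example.

```python
def third_from_last(s):
    # s[-3:] keeps at most the last three characters;
    # [:1] then takes the first of those ('' for an empty string).
    return s[-3:][:1]

def sort_by_third_from_last(strings):
    return sorted(strings, key=third_from_last)

print(sort_by_third_from_last(["hello", "from", "last", "letter", "a", ""]))
# ['', 'last', 'a', 'hello', 'from', 'letter']
```

Because sorted() is stable, 'last' and 'a' (both with key 'a') keep their original relative order.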

How to force process identical elements individually in a list of lists?

I am writing a program that tags parts of speech, producing a list of lists. Here is an example function from the program:
phrase = [['he',''],['is', ''],['believed', ''],['to',''],['have',''],['believed','']]
def parts_tagger(input_list):
    parts = []
    for [x,y] in input_list:
        prior_word = input_list[input_list.index([x,y]) - 1][0]
        if x.startswith('be') and y == '' and prior_word == 'is':
            parts.append([x,'passive'])
        else:
            parts.append([x,y])
    return parts
print (parts_tagger(phrase))
When you run this piece of code, Python finds the first word to which the condition applies (the first "believed") and tags it correctly:
[['he', ''], ['is', ''], ['believed', 'passive'], ['to', ''], ['have', ''], ['believed', 'passive']]
But then it somehow applies the same tag to other identical words in the list (the second "believed") to which the condition does not apply. What am I doing wrong? How can I fix this and force Python to treat each item in the list individually?
The problem is with this line
prior_word = input_list[input_list.index([x,y]) - 1][0]
list.index returns the index of the first match.
Return the index in the list of the first item whose value is x. It is an error if there is no such item.
You can use enumerate to solve your problem. Change the loop header and the line after it to:
for ind,[x,y] in enumerate(input_list):
    prior_word = input_list[ind - 1][0]
The output will be as expected
[['he', ''], ['is', ''], ['believed', 'passive'], ['to', ''], ['have', ''], ['believed', '']]
As Shawn pointed out below (in a now deleted comment), I think you would need to start at the second element and fill in the value for the first element manually. This is because the first element has no previous value: with ind - 1 equal to -1, Python does not raise an error but silently reads the last element of the list instead. There are two workarounds for this.
Start with the second element:
for ind,[x,y] in enumerate(input_list[1:],start=1):
Or add a condition in the loop body:
for ind,[x,y] in enumerate(input_list):
    prior_index = ind - 1
    if prior_index<0:
        # Do something for the first element, then move on to the next one
        continue
    prior_word = input_list[prior_index][0]
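Put together, the function might look like the sketch below. Treating the first word as having no prior word (None) is an assumption made for this example; the original answer leaves that choice open.

```python
def parts_tagger(input_list):
    parts = []
    for ind, (word, tag) in enumerate(input_list):
        # Assumption: the first word has no predecessor, so use None
        # rather than letting index -1 wrap around to the last element.
        prior_word = input_list[ind - 1][0] if ind > 0 else None
        if word.startswith('be') and tag == '' and prior_word == 'is':
            parts.append([word, 'passive'])
        else:
            parts.append([word, tag])
    return parts

phrase = [['he', ''], ['is', ''], ['believed', ''], ['to', ''], ['have', ''], ['believed', '']]
print(parts_tagger(phrase))
# [['he', ''], ['is', ''], ['believed', 'passive'], ['to', ''], ['have', ''], ['believed', '']]
```

Using the position from enumerate means each occurrence of "believed" is checked against its own neighbour, which is exactly what list.index could not do.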
