Remove items in a sequence from a string Python - python

Okay so I'm trying to make a function that will take a string and a sequence of items (in the form of either a list, a tuple or a string) and remove all items from that list from the string.
So far my attempt looks like this:
def eliminate(s, bad_characters):
    for item in bad_characters:
        s = s.strip(item)
    return s
However, for some reason when I try this or variations of this, it only returns either the original string or a version with only the first item in bad_characters removed.
>>> eliminate("foobar",["o","b"])
'foobar'
Is there a way to remove all items in bad_characters from the given string?

Your solution doesn't work because str.strip() only removes characters from the outsides of the string, i.e. characters at the leftmost or rightmost end of the string. So, in the case of 'foobar', str.strip() with a single character argument would only work if you wanted to remove the character 'f' or 'r'.
You could eliminate more of the inner characters with strip, but you would need to include one of the outer characters as well.
>>> 'foobar'.strip('of')
'bar'
>>> 'foobar'.strip('o')
'foobar'
Here's how to do it by string-joining a generator expression:
def eliminate(s, bad_characters):
    bc = set(bad_characters)
    return ''.join(c for c in s if c not in bc)
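As a quick check, the set-based version above can be run against the three input types mentioned in the question. This is only a sketch and assumes every item in bad_characters is a single character; the last call shows what happens when that assumption does not hold.

```python
def eliminate(s, bad_characters):
    bc = set(bad_characters)  # a set gives fast membership tests
    return ''.join(c for c in s if c not in bc)

print(eliminate("foobar", ["o", "b"]))  # far
print(eliminate("foobar", ("o", "b")))  # far
print(eliminate("foobar", "ob"))        # far

# A multi-character item never equals a single character, so it is ignored:
print(eliminate("foobar", ["oo"]))      # foobar
```

If multi-character items need to be removed as well, the replace-based answers are the better fit.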

Try replacing the bad characters with empty strings:
def eliminate(s, bad_characters):
    for item in bad_characters:
        s = s.replace(item, '')
    return s
strip() doesn't work because it only removes characters from the beginning and end of the original string.

strip is not the right choice for this task, as it only removes characters from the leading and trailing ends of the string. Instead, you can use the str.translate method (this two-argument form works only in Python 2):
>>> s,l="foobar",["o","b"]
>>> s.translate(None,''.join(l))
'far'
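In Python 3, str.translate takes a single table argument, so the call above raises a TypeError there. A rough Python 3 equivalent, assuming all items are single characters, builds the table with str.maketrans, whose third argument lists the characters to delete:

```python
def eliminate(s, bad_characters):
    # ''.join accepts a list, tuple, or string of characters;
    # multi-character items would be flattened into individual characters.
    table = str.maketrans('', '', ''.join(bad_characters))
    return s.translate(table)

print(eliminate("foobar", ["o", "b"]))  # far
```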

Try this; it uses recursion, so it may be slow:
def eliminate(s, seq):
    if seq:
        # seq.pop() requires a list and empties the caller's list as a side effect
        return eliminate(s.replace(seq.pop(), ""), seq)
    return s
>>> eliminate("foobar", ["o", "b"])
'far'

Related

How can I search through a string and extract all specific characters?

So say that I have a string which is something along the lines of "One2three4". Is it possible for me to look through the string and take the integers and put them in their own string, so my final result will be "24". Thanks
Using str.join() and str.isdigit():
>>> s = "One2three4"
>>> ''.join(c for c in s if c.isdigit())
'24'
This method looks through the string once and checks if each character is a digit or not; the characters that satisfy this are joined into a new string. In complexity terms, this is O(n), and as we need to check every character in the string, this is the best we can do.

Dot notation string manipulation

Is there a way to manipulate a string in Python using the following ways?
For any string that is stored in dot notation, for example:
s = "classes.students.grades"
Is there a way to change the string to the following:
"classes.students"
Basically, remove the last period and everything after it. So "restaurants.spanish.food.salty" would become "restaurants.spanish.food".
Additionally, is there any way to identify what comes after the last period? The reason I want to do this is that I want to use isdigit().
So, if it was classes.students.grades.0 could I grab the 0 somehow, so I could use an if statement with isdigit, and say if the part of the string after the last period (so 0 in this case) is a digit, remove it, otherwise, leave it.
You can use split and join together:
s = "classes.students.grades"
print('.'.join(s.split('.')[:-1]))
Splitting the string on . gives you a list of strings; joining then puts the list elements back into a single string, separated by .
[:-1] picks all the elements of the list except the last one.
To check what comes after the last .:
s.split('.')[-1]
Another way is to use rsplit. It works the same way as split but if you provide maxsplit parameter it'll split the string starting from the end:
>>> rest, last = s.rsplit('.', 1)
>>> rest
'classes.students'
>>> last
'grades'
You can also use re.sub to substitute the part after the last . with an empty string:
re.sub(r'\.[^.]+$', '', s)
For the last part of your question, wrapping each word in [], I would recommend using format with a generator expression:
''.join("[{}]".format(e) for e in s.split('.'))
It'll give you the desired output:
[classes][students][grades]
The best way to do this is to use the rsplit method and pass in the maxsplit argument.
>>> s = "classes.students.grades"
>>> before, after = s.rsplit('.', maxsplit=1) # keyword form needs Python 3; rsplit('.', 1) also works in Python 2
>>> before
'classes.students'
>>> after
'grades'
You can also use the rfind() method with normal slice operation.
To get everything before last .:
>>> s = "classes.students.grades"
>>> last_index = s.rfind('.')
>>> s[:last_index]
'classes.students'
Then everything after last .
>>> s[last_index + 1:]
'grades'
s.rpartition('.') finds the last dot in s and returns a tuple (before_last_dot, dot, after_last_dot),
so if '.' is in s, the first element is everything before the last dot:
s = "classes.students.grades"
s.rpartition('.')[0]
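A short sketch of what rpartition returns, including the case where the string contains no dot at all (the sample strings are only illustrative):

```python
s = "classes.students.grades"
before, dot, after = s.rpartition('.')
print(before)  # classes.students
print(after)   # grades

# With no dot, the whole string ends up in the last slot
# and the first two elements are empty strings.
print("classes".rpartition('.'))  # ('', '', 'classes')
```

This is why the '.' in s check matters: without a dot, element [0] is an empty string rather than the original string.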
If your goal is to get rid of a final component that's just a single digit, start and end with re.sub():
s = re.sub(r"\.\d$", "", s)
This will do the job, and leave other strings alone. No need to mess with anything else.
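Note that the pattern above matches exactly one trailing digit. As a sketch, the helper below (drop_numeric_suffix is a hypothetical name, not from the original answer) uses \d+ instead, on the assumption that a multi-digit final component should be removed too:

```python
import re

def drop_numeric_suffix(s):
    # \d+ removes a final component made of one or more digits
    return re.sub(r"\.\d+$", "", s)

print(re.sub(r"\.\d$", "", "classes.students.grades.0"))   # classes.students.grades
print(re.sub(r"\.\d$", "", "classes.students.grades.42"))  # classes.students.grades.42
print(drop_numeric_suffix("classes.students.grades.42"))   # classes.students.grades
print(drop_numeric_suffix("classes.students.grades"))      # classes.students.grades
```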
If you do want to know about the general case (separate out the last component, no matter what it is), then use rsplit to split your string once:
>>> "hel.lo.there".rsplit(".", 1)
['hel.lo', 'there']
If there's no dot in the string, you'll just get a list with one element, the entire string.
You can do it very simply with rsplit (str.rsplit([sep[, maxsplit]])), which returns a list by breaking the string along the given separator.
You can also specify how many splits should be performed:
>>> s = "res.spa.f.sal.786423"
>>> s.rsplit('.',1)
['res.spa.f.sal', '786423']
So the final function that you describe is:
def dimimak_cool_function(s):
    if '.' not in s:
        return s
    start, end = s.rsplit('.', 1)
    return start if end.isdigit() else s
>>> dimimak_cool_function("res.spa.f.sal.786423")
'res.spa.f.sal'
>>> dimimak_cool_function("res.spa.f.sal")
'res.spa.f.sal'

From list of strings, extract only characters within brackets

I have a list of strings that have variable construction but have a character sequence enclosed in square brackets. I want to extract only the sequence enclosed by the square brackets. There is only one instance of square brackets per string, which simplifies the process.
I am struggling to do so in an elegant manner, and this is clearly a simple problem with Python's large string library.
What is a simple expression to do this?
Have a look at regular expressions (the re module).
Something like this should do the trick:
import re
s = "hello_from_adele[this_is_the_string_i_am_looking_for]this_is_not_it"
match = re.search(r"\[([A-Za-z0-9_]+)\]", s)
print(match.group(1))
If you provide an example, we can be more specific.
You don't even need re to do this:
In [11]: strng = "This is some text [that has brackets] followed by more text"
In [12]: strng[strng.index("[")+1:strng.index("]")]
Out[12]: 'that has brackets'
This uses string slicing to return the characters inside the brackets. index() returns the 0-based position of its argument. Since we don't want to include the [ at the beginning, we add 1. The second argument of the slice is the stop position, but it is not included in the returned substring, so we don't need to add anything to it.
If you prefer not to use regex for whatever reason, it should be easy to do with string splitting since you're guaranteed to have one and only one instance of [ and ].
s = "some[string]to check"
_, midright = s.split("[")
target, _ = midright.split("]")
or
target = s.split("[")[1].split("]")[0] # ewww
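Since the question starts from a list of strings, here is a minimal sketch of applying the regex idea to a whole list. The function name bracketed and the sample data are made up for illustration, and returning None for a string without brackets is an assumption about the desired behaviour:

```python
import re

def bracketed(strings):
    """Return the text inside the first [...] of each string, or None if absent."""
    pattern = re.compile(r"\[([^\]]*)\]")
    results = []
    for s in strings:
        match = pattern.search(s)
        results.append(match.group(1) if match else None)
    return results

print(bracketed(["hello[abc]there", "x[that has brackets]y", "no brackets"]))
# ['abc', 'that has brackets', None]
```

Unlike the index() and split() versions, this does not raise an exception when a string lacks brackets.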

Remove substrings of variable length from string

I have a list of strings where all of the strings roughly follow the format 'foo\tbar\tfoo\n' in that there are three segments of variable length that are separated by two tabs (\t) and with a newline indicator at the end (\n).
I want to remove everything except for the text before the first \t, so that it would return 'foo'. Given that the first segment is of variable length, I'm not sure how I can do that.
Use str.split():
>>> string = 'foo\tbar\tfoo\n'
>>> string.split('\t', 1)[0]
'foo'
This splits the string by the first occurrence of the '\t' tab character, which returns a list with two elements. The [0] selects the first element in the list, which is the part of the string before the first '\t' occurrence.
Just search for the first \t character, and get everything before it. Slicing makes this easy.
newstr = oldstr[:oldstr.find("\t")]
Try with:
t = 'foo\tbar\tfoo\n'
t[:t.index("\t")]
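One caveat with the two slicing answers: if a string contains no tab, find() returns -1 (so the slice silently drops the last character) and index() raises ValueError. A sketch that sidesteps both, using str.partition (first_field is a hypothetical helper name):

```python
def first_field(line):
    # partition returns (head, sep, tail); head is the whole
    # string when the separator is not present
    head, _, _ = line.partition('\t')
    return head

print(first_field('foo\tbar\tfoo\n'))  # foo
print(first_field('foo'))              # foo

# For comparison, slicing with find() on a tab-less string:
print('foo'[:'foo'.find('\t')])        # fo
```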

Splice a string based on certain characters

I'm looking for a way to examine only certain characters within a string. For example:
#Given the string
s= '((hello+world))'
s[1:')'] #This obviously doesn't work because you can only splice a string using ints
Basically I want the program to start at the second occurrence of ( and then from there splice until it hits the first occurrence of ). So then maybe from there I can return it to another function or whatever. Any solutions?
You can do it as follows (assuming you want the innermost parentheses):
s[s.rfind("("):s.find(")")+1]  # if you want "(hello+world)"
s[s.rfind("(")+1:s.find(")")]  # if you want "hello+world"
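The rfind/find slicing can be wrapped in a small function that also handles strings without a matching pair. This is only a sketch: innermost is a hypothetical name, and it assumes a single nested group like the one in the question rather than several groups side by side.

```python
def innermost(s):
    start = s.rfind("(")  # position of the last opening parenthesis
    end = s.find(")")     # position of the first closing parenthesis
    if start == -1 or end == -1 or end < start:
        return None       # no usable pair of parentheses
    return s[start + 1:end]

print(innermost('((hello+world))'))  # hello+world
print(innermost('no parens'))        # None
```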
You can strip the parentheses (if, in your case, they always appear at the beginning and the end of the string):
>>> s= '((hello+world))'
>>> s.strip('()')
'hello+world'
Another option is to use a regular expression to extract what is inside the double parentheses:
>>> import re
>>> re.match(r'\(\((.*?)\)\)', s).group(1)
'hello+world'
