Removing part of a string (up to but not including) in Python

I'm trying to strip off part of a string.
e.g. Strip:-
a = 'xyz-abc'
to leave:-
a = '-abc'
I would usually use lstrip e.g.
a.lstrip('xyz')
but in this case I don't know what xyz is going to be, so I need a way to just strip everything to the left of '-'.
Is it possible to set that option with lstrip or do I have to go about it a different way?
Thanks.

If there's only one - character, this will work:
'xyz-abc'.split('-')[1]
If you want the '-' in there, you have to reattach it:
>>> '-' + 'xyz-abc'.split('-')[1]
'-abc'
There's also a maxsplit parameter that lets you split only at the first - character:
>>> '-' + 'xyz-ab-c'.split('-', 1)[1]
'-ab-c'
partition is also potentially useful:
>>> 'xyz-abc'.partition('-')
('xyz', '-', 'abc')
It splits at the first occurrence of the separator:
>>> ''.join('xyz-ab-c'.partition('-')[1:])
'-ab-c'
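The partition approach can be wrapped in a small helper. This is only a minimal sketch: the name strip_before is made up for illustration, and it assumes you want the string returned unchanged when the separator is missing (split('-')[1] would raise an IndexError in that case).

```python
def strip_before(text, sep="-"):
    """Drop everything left of the first `sep`, keeping `sep` itself."""
    head, found, tail = text.partition(sep)
    # partition returns (text, '', '') when sep is absent, so `found` is empty
    return found + tail if found else text


print(strip_before("xyz-abc"))   # -abc
print(strip_before("xyz-ab-c"))  # -ab-c
print(strip_before("no dash"))   # no dash
```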

>>> a = 'xyz-abc'
>>> a.find('-') # return the index of the first instance of '-'
3
>>> a[a.find('-'):] # return the string of everything past that index
'-abc'
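You could use a combination of .find and slicing. Note that .find returns -1 when '-' is absent, in which case a[a.find('-'):] yields only the last character, so check for -1 first if the dash might be missing.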

If there is no guarantee that the text to the left of - doesn't contain dashes of its own, the reversed version of find called rfind is even more useful:
>>> s = "xyv-er-hdgcfh-abc"
>>> print(s[s.rfind("-"):])
-abc
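A guarded version of the rfind idea might look like the sketch below. The helper name keep_from_last is hypothetical, and returning the input unchanged when there is no dash is an assumption about the desired behaviour.

```python
def keep_from_last(text, sep="-"):
    """Return `text` from the last `sep` onward, or `text` unchanged if `sep` is absent."""
    index = text.rfind(sep)  # -1 when sep does not occur
    return text if index == -1 else text[index:]


print(keep_from_last("xyv-er-hdgcfh-abc"))  # -abc
print(keep_from_last("abc"))                # abc
```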

Related

How to remove everything before certain character in Python

I'm new to python and struggle with a certain task:
I have a String that could have anything in it, but it always "ends" the same.
It can be just a Filename, a complete path, or just a random string, ending with a Version Number.
Example:
C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1
string-anotherstring-15.1R7-S8.1
string-anotherstring.andanother-15.1R7-S8.1
What always is the same (looking from the end) is that if you reach the second dot and go 2 characters in front of it, you always match the part that I'm interested in.
Cutting everything after a certain string was "easy," and I solved it myself - that's why the string ends with the version now :)
Is there a way to tell Python, "look for the second dot from the end of the string, go 2 characters in front of it, and delete everything before that", so that I get the version as a string?
Happy for any pointers in the right direction.
Thanks
If you want the version number, can you use the hyphen (-) to split the string? Or do you need to depend on the dots only?
The use of rsplit and join below may help.
>>> a = 'string-anotherstring.andanother-15.1R7-S8.1'
>>> a.rsplit('-')
['string', 'anotherstring.andanother', '15.1R7', 'S8.1']
>>> a.rsplit('-')[-2:] #Get everything from second last to the end
['15.1R7', 'S8.1']
>>> '-'.join(a.rsplit('-')[-2:]) #Get everything from second last to the end, and join them with a hyphen
'15.1R7-S8.1'
>>>
To use the dots instead, the same approach works:
>>> a
'string-anotherstring.andanother-15.1R7-S8.1'
>>> data = a.rsplit('.')
>>> [data[-3][-2:]]
['15']
>>> [data[-3][-2:]] + data[-2:]
['15', '1R7-S8', '1']
>>> '.'.join([data[-3][-2:]] + data[-2:])
'15.1R7-S8.1'
>>>
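For the hyphen-based variant, passing maxsplit to rsplit limits the work to the two rightmost hyphens, which is where rsplit actually differs from split. A small sketch, assuming the version always consists of the last two hyphen-separated fields (version_suffix is an illustrative name):

```python
def version_suffix(text):
    """Return the last two hyphen-separated fields, rejoined with a hyphen."""
    # maxsplit=2 splits only at the two rightmost hyphens
    parts = text.rsplit("-", 2)
    return "-".join(parts[-2:])


print(version_suffix(r"C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1"))  # 15.1R7-S8.1
print(version_suffix("string-anotherstring.andanother-15.1R7-S8.1"))             # 15.1R7-S8.1
```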
You can build a regex from the end mark of a line using the anchor $.
Using your own description, use the regex:
(\d\d\.[^.]*)\.[^.]*$
If you want the last characters from the end included, just move the capturing parenthesis:
(\d\d\.[^.]*\.[^.]*)$
Explanation:
(\d\d\.[^.]*\.[^.]*)$
 ^ ^                   # digits
     ^                 # a literal '.'
       ^               # anything OTHER THAN a '.'
            ^          # literal '.'
              ^        # anything OTHER THAN a '.'
                    ^  # end of line
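Applied from Python, the second regex could be used roughly as follows. This is a sketch, not a definitive implementation: extract_version and VERSION_RE are illustrative names, and returning None when nothing matches is an assumption.

```python
import re

# Two digits, a dot, non-dot characters, a dot, non-dot characters, end of string
VERSION_RE = re.compile(r"(\d\d\.[^.]*\.[^.]*)$")


def extract_version(text):
    """Return the trailing version string, or None if the pattern is not found."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


print(extract_version(r"C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1"))  # 15.1R7-S8.1
print(extract_version("string-anotherstring.andanother-15.1R7-S8.1"))             # 15.1R7-S8.1
```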
Assuming I understand the question correctly, two ways come to mind. I'm including both for completeness, since I might have misunderstood. I think the split/parts solution is cleaner, particularly when the 'certain character' is a dot.
>>> import re
>>> msg = r'C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1'
>>> re.search(r'.*(..\..*)', msg).group(1)
'S8.1'
>>> parts = msg.split('.')
>>> ".".join((parts[-2][-2:], parts[-1]))
'S8.1'
For your example, you can split the string on the separator '-' and then join the last two elements with the same separator. Like so:
txt = "string-anotherstring-15.1R7-S8.1"
x = txt.split("-")
y = "-".join(x[-2:])
print(y) # outputs 15.1R7-S8.1

Python string regular expression

I need to do a string compare to see if 2 strings are equal, like:
>>> x = 'a1h3c'
>>> x == 'a__c'
>>> True
independent of the 3 characters in middle of the string.
You need to use anchors.
>>> import re
>>> x = 'a1h3c'
>>> pattern = re.compile(r'^a.*c$')
>>> pattern.match(x) is not None
True
This checks that the first and last characters are a and c, and it doesn't care which characters appear in the middle.
If you want to check for exactly three characters in the middle, you could use this:
>>> pattern = re.compile(r'^a...c$')
>>> pattern.match(x) is not None
True
Note that the end-of-line anchor $ is important: without it, a...c would also match afoocbarbuz.
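As an alternative to writing the anchors by hand, re.fullmatch only succeeds when the entire string matches the pattern. A minimal sketch, with the helper name is_a_three_c chosen purely for illustration:

```python
import re


def is_a_three_c(text):
    """True when text is 'a', then exactly three characters, then 'c'."""
    # fullmatch requires the whole string to match, so no ^ or $ is needed
    return re.fullmatch(r"a...c", text) is not None


print(is_a_three_c("a1h3c"))        # True
print(is_a_three_c("afoocbarbuz"))  # False
print(is_a_three_c("ac"))           # False
```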
Your problem could be solved with string indexing, but if you want an intro to regex, here you go.
import re
your_match_object = re.match(pattern, string)
The pattern in your case would be
pattern = re.compile("a...c$") # each dot matches any character except a newline; $ stops longer strings from matching
From here, you can check whether your string fits this pattern with
print(pattern.match("a1h3c") is not None)
https://docs.python.org/2/howto/regex.html
https://docs.python.org/2/library/re.html#search-vs-match
if str1[0] == str2[0]:
    pass  # do something
You can repeat this check for as many positions as you like.
This is indexing: [0] gets the first character. To get the last character, use [-1].
I'll also mention that with indexing, the string can be of any length, as long as you know the position relative to the beginning or the end of the string.
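Put together, a regex-free comparison for the question's example could look like this sketch. The function name and its default arguments are assumptions made for illustration; startswith and endswith are used instead of [0] and [-1] so that an empty string does not raise an IndexError.

```python
def ends_match(text, first="a", last="c", length=5):
    """True when text has the given length and the given first and last characters."""
    return len(text) == length and text.startswith(first) and text.endswith(last)


print(ends_match("a1h3c"))  # True
print(ends_match("a1h3d"))  # False
print(ends_match("a1c"))    # False
```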

Strip all matching characters from string

Given any of the following strings:
'test'
'test='
'test=='
'test==='
I'd like to run a function on it that will remove any/all '=' characters from the end. Now, I could write something like this in two seconds (in fact, here is one), and I can imagine a dozen alternative approaches:
def cleanup():
    p = passwd()
    while True:
        new_p = p.rstrip('=')
        if len(new_p) == len(p):
            return new_p
        p = new_p
But I was wondering if anything like that already exists as part of the Python Standard Library?
str.rstrip() already removes all matching characters:
>>> 'test===='.rstrip('=')
'test'
There is no need to loop.
All you need is str.rstrip:
>>> 'test'.rstrip('=')
'test'
>>> 'test='.rstrip('=')
'test'
>>> 'test=='.rstrip('=')
'test'
>>> 'test==='.rstrip('=')
'test'
>>>
From the docs:
str.rstrip([chars])
Return a copy of the string with trailing characters removed.
It should be noted however that str.rstrip only removes characters from the right end of the string. You need to use str.lstrip to remove characters from the left end and str.strip to remove characters from both ends.
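One detail worth seeing in action: the argument to rstrip is treated as a set of characters, not as a suffix, and only trailing characters are affected. A short illustration (the sample strings beyond the question's own are made up):

```python
samples = ["test", "test=", "test==", "test==="]

# rstrip removes every trailing '=' in a single call
print([s.rstrip("=") for s in samples])  # ['test', 'test', 'test', 'test']

# The argument is a set of characters, so any mix of '=' and '-' is stripped
print("test=-=-".rstrip("=-"))           # test

# Characters that are not at the right end are left alone
print("a==b==".rstrip("="))              # a==b
```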

Python regex to find only second quotes of paired quotes

I'm wondering if there is some way to find only the second quote of each pair in a string that has paired quotes.
So if I have a string like '"aaaaa"' or just '""', I want to find only the last '"' in it. If I have '"aaaa""aaaaa"aaaa""', I want only the second, fourth and sixth '"'s. But if I have something like '"aaaaaaaa' or 'aaa"aaa', I don't want to find anything, since there are no paired quotes. If I have '"aaa"aaa"', I want to find only the second '"', since the third '"' has no pair.
I've tried to implement lookbehind, but it doesn't work with quantifiers, so my bad attempt was '(?<=\"a*)\"'.
You don't really need regex for this. You can do:
[i for i, c in enumerate(s) if c == '"'][1::2]
To get the index of every other '"'. Example usage:
>>> for s in ['"aaaaa"', '"aaaa""aaaaa"aaaa""', 'aaa"aaa', '"aaa"aaa"']:
...     print(s, [i for i, c in enumerate(s) if c == '"'][1::2])
...
"aaaaa" [6]
"aaaa""aaaaa"aaaa"" [5, 12, 18]
aaa"aaa []
"aaa"aaa" [4]
import re
reg = re.compile(r'(?:\").*?(\")')
then
for match in reg.findall('"this is", "my test"'):
    print(match)
gives
"
"
If you need to change the second quote, you can also match everything up to and including it and put the part before the second quote into a capture group. Replacing the match with the first capture group plus the substitution string then achieves this.
For example, this regex matches everything before the second quote and puts it into a group:
(\"[^"]*)\"
If you replace the whole match (which includes the second quote) with only the value of the capture group (which does not include the second quote), you simply cut the second quote off.
For example:
import re
p = re.compile(r'(\"[^"]*)\"')
test_str = "\"test1\"test2\"test3\""
subst = r"\1"
result = re.sub(p, subst, test_str)
print(result) # result -> "test1test2"test3
Please read my answer about why you don't want to use regular expressions for such a problem, even though you can do that kind of non-regular job with it.
Ok then you probably want one of the solutions I give in the linked answer, where you'll want to use a recursive regex to match all the matching pairs.
Edit: the following has been written before the update to the question, which was asking only for second double quotes.
Though if you want to find only second double quotes in a string, you do not need regexps:
>>> s1='aoeu"aoeu'
>>> s2='aoeu"aoeu"aoeu'
>>> s3='aoeu"aoeu"aoeu"aoeu'
>>> def find_second_quote(s):
...     pos_quote_1 = s.find('"')
...     if pos_quote_1 == -1:
...         return -1
...     pos_quote_2 = s[pos_quote_1+1:].find('"')
...     if pos_quote_2 == -1:
...         return -1
...     return pos_quote_1+1+pos_quote_2
...
>>> find_second_quote(s1)
-1
>>> find_second_quote(s2)
9
>>> find_second_quote(s3)
9
>>>
Here it returns -1 if there's no second quote, or the position of the second quote if there is one.
A parser is probably better, but depending on what you want to get out of it, there are other ways. If you need the data between the quotes:
import re
re.findall(r'".*?"', '"aaaa""aaaaa"aaaa""')
['"aaaa"',
'"aaaaa"',
'""']
If you need the indices, you could do it as a generator or other equivalent like this:
def count_quotes(mystr):
    count = 0
    for i, x in enumerate(mystr):
        if x == '"':
            count += 1
            if count % 2 == 0:
                yield i
list(count_quotes('"aaaa""aaaaa"aaaa""'))
[5, 12, 18]
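If a regex is still preferred, the positions of the closing quotes can be read from a capture group instead of relying on a lookbehind. This is a sketch under the assumption that quotes pair up strictly left to right; PAIR_RE and closing_quote_positions are illustrative names.

```python
import re

# An opening quote, any non-quote characters, then the closing quote (group 1)
PAIR_RE = re.compile(r'"[^"]*(")')


def closing_quote_positions(text):
    """Return the index of the second quote of every complete pair."""
    return [m.start(1) for m in PAIR_RE.finditer(text)]


print(closing_quote_positions('"aaaa""aaaaa"aaaa""'))  # [5, 12, 18]
print(closing_quote_positions('"aaa"aaa"'))            # [4]
print(closing_quote_positions('aaa"aaa'))              # []
```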

Split a string in python

>>> a = "aaaa#b:c:"
>>> for i in a.split(":"):
...     print(i)
...     if "#" in i:  # i == "aaaa#b"
...         pass  # here I want to print only b
Inside the if block, when i is aaaa#b, how do I get the value after the hash? Should we use rsplit to get the value?
The following can replace your if statement.
for i in a.split(':'):
    print(i.partition('#')[2])
>>> a="aaaa#b:c:"
>>> a.split(":",2)[0].split("#")[-1]
'b'
a = "aaaa#b:c:"
print(a.split(":")[0].split("#")[1])
I'd suggest, from the Python docs:
str.rsplit([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter
string. If maxsplit is given, at most maxsplit splits are done, the
rightmost ones. If sep is not specified or None, any whitespace string
is a separator. Except for splitting from the right, rsplit() behaves
like split() which is described in detail below.
So to answer your question: yes.
EDIT:
It also depends on how you wish to index the result. rsplit splits from the right, so if the value you want is always the rightmost one, you can take index -1 every time (Python indexes from 0, and negative indices count from the end) rather than having to check the size of the returned list.
Do you really need to use split? split creates a list, so it isn't very efficient...
What about something like this (note that it returns only the single character after '#'):
>>> a = "aaaa#b:c:"
>>> a[a.find('#') + 1]
'b'
Or, if you need a particular occurrence, use a regex instead...
split would do the job nicely. Use rsplit only if you need to split from the last '#'.
>>> a = "aaaa#b:c:"
>>> for i in a.split(":"):
...     print(i)
...     b = i.split('#', 1)
...     if len(b) == 2:
...         print(b[1])
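The same idea can be collected into a function that returns the values instead of printing them. A minimal sketch using partition; the name values_after_hash and its parameters are hypothetical, and fields without a '#' are simply skipped.

```python
def values_after_hash(text, field_sep=":", marker="#"):
    """Return the part after `marker` for every field that contains it."""
    values = []
    for field in text.split(field_sep):
        _, found, value = field.partition(marker)
        if found:  # empty string when marker is not in this field
            values.append(value)
    return values


print(values_after_hash("aaaa#b:c:"))  # ['b']
```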
