I'm trying to remove the last 3 characters from a string in Python. I don't know what these characters are, so I can't use rstrip. I also need to remove any white space and convert to upper-case.
An example would be:
foo = "Bs12 3ab"
foo.replace(" ", "").rstrip(foo[-3:]).upper()
This works and gives me "BS12", which is what I want. However, if the 4th and 3rd characters from the end are the same I lose both, e.g. if foo = "BS11 1AA" I just get "BS".
Examples of foo could be:
BS1 1AB
bs11ab
BS111ab
The string could be 6 or 7 characters and I need to drop the last 3 (assuming no white space).
Removing any and all whitespace:
foo = ''.join(foo.split())
Removing last three characters:
foo = foo[:-3]
Converting to capital letters:
foo = foo.upper()
All of that code in one line:
foo = ''.join(foo.split())[:-3].upper()
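For reuse, the three steps above can be wrapped in a small function. This is only a sketch: the name drop_last_three is made up for illustration, and it assumes the input always has at least three non-whitespace characters.

```python
def drop_last_three(text):
    """Remove all whitespace, drop the last three characters, upper-case the rest."""
    compact = "".join(text.split())  # split() with no argument splits on any whitespace
    return compact[:-3].upper()


# The sample inputs from the question, plus the one that broke rstrip.
for sample in ("BS1 1AB", "bs11ab", "BS111ab", "BS11 1AA"):
    print(sample, "->", drop_last_three(sample))
```

The last sample is the one that collapsed to "BS" with rstrip; with slicing it should come out as "BS11".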
It doesn't work as you expect because strip is character based. You need to do this instead:
foo = foo.replace(' ', '')[:-3].upper()
>>> foo = "Bs12 3ab"
>>> foo[:-3]
'Bs12 '
>>> foo[:-3].strip()
'Bs12'
>>> foo[:-3].strip().replace(" ","")
'Bs12'
>>> foo[:-3].strip().replace(" ","").upper()
'BS12'
You might have misunderstood rstrip slightly: it strips not a string, but any of the characters in the string you specify.
Like this:
>>> text = "xxxxcbaabc"
>>> text.rstrip("abc")
'xxxx'
So instead, just use
text = text[:-3]
(after replacing whitespace with nothing)
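To see why the original attempt loses extra characters, here is a small side-by-side sketch using the "BS11 1AA" input from the question. The variable names are only illustrative.

```python
code = "BS11 1AA".replace(" ", "")  # 'BS111AA'

# rstrip treats its argument as a set of characters, here {'1', 'A'}.
# It keeps removing trailing characters for as long as they belong to that set.
stripped = code.rstrip(code[-3:])

# Slicing removes exactly three characters, whatever they are.
sliced = code[:-3]

print(stripped)  # BS
print(sliced)    # BS11
```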
>>> foo = 'BS1 1AB'
>>> foo.replace(" ", "").rstrip()[:-3].upper()
'BS1'
I try to avoid regular expressions, but this appears to work:
import re
string = re.sub(r"\s", "", string.upper())[:-3]
split
slice
concatenate
This is a good exercise for beginners, and it's easy to do.
A more general approach is a function like this:
def trim(s, part):
    return s[part]  # part is a slice object, e.g. slice(None, -3)
And for this question, you just want to remove the last three characters, so you can write it like this:
def trim(s):
    return s[:-3]
I think you are worrying too much about what those three characters are, and that is where you went wrong. You just want to remove the last three, whatever they are!
If you only want to remove them in specific cases, you can add some if checks:
def trim(s):
    # Example condition (an assumption): only trim strings longer than three characters.
    if isinstance(s, str) and len(s) > 3:
        return s[:-3]
    return s
What's wrong with this?
foo.replace(" ", "")[:-3].upper()
Aren't you performing the operations in the wrong order? Your requirement seems to be foo[:-3].replace(" ", "").upper()
It somewhat depends on your definition of whitespace. I would generally take whitespace to mean spaces, tabs, line breaks and carriage returns. If this is your definition, you want to use a regex with \s to replace all whitespace characters:
import re
def myCleaner(foo):
    print('dirty: ' + foo)
    foo = re.sub(r'\s', '', foo)  # remove every whitespace character
    foo = foo[:-3]                # drop the last three characters
    foo = foo.upper()
    print('clean: ' + foo)
    print('')
myCleaner("BS1 1AB")
myCleaner("bs11ab")
myCleaner("BS111ab")
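The difference between the whitespace-removal approaches only shows up when the input contains something other than plain spaces. A small sketch, using a made-up input with a tab and a trailing newline:

```python
import re

text = "BS1\t1AB\n"  # a tab and a trailing newline instead of a plain space

only_spaces = text.replace(" ", "")  # leaves the tab and the newline in place
via_split = "".join(text.split())    # removes every run of whitespace
via_regex = re.sub(r"\s", "", text)  # same result using a regex

print(repr(only_spaces))  # 'BS1\t1AB\n'
print(repr(via_split))    # 'BS11AB'
print(repr(via_regex))    # 'BS11AB'
```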
I'm new to python and struggle with a certain task:
I have a String that could have anything in it, but it always "ends" the same.
It can be just a Filename, a complete path, or just a random string, ending with a Version Number.
Example:
C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1
string-anotherstring-15.1R7-S8.1
string-anotherstring.andanother-15.1R7-S8.1
What always is the same (looking from the end) is that if you reach the second dot and go 2 characters in front of it, you always match the part that I'm interested in.
Cutting everything after a certain string was "easy," and I solved it myself - that's why the string ends with the version now :)
Is there a way to tell python, "look for the second dot from the end of the string, go 2 characters in front of it, and delete everything before that", so that I get the version as a string?
Happy for any pointers in the right direction.
Thanks
If you want the version number, can you use the hyphen (-) to split the string? Or do you need to depend on the dots only?
Please see below use of rsplit and join which can help you.
>>> a = 'string-anotherstring.andanother-15.1R7-S8.1'
>>> a.rsplit('-')
['string', 'anotherstring.andanother', '15.1R7', 'S8.1']
>>> a.rsplit('-')[-2:] #Get everything from second last to the end
['15.1R7', 'S8.1']
>>> '-'.join(a.rsplit('-')[-2:]) #Get everything from second last to the end, and join them with a hyphen
'15.1R7-S8.1'
>>>
To use dots instead, do it the same way:
>>> a
'string-anotherstring.andanother-15.1R7-S8.1'
>>> data = a.rsplit('.')
>>> [data[-3][-2:]]
['15']
>>> [data[-3][-2:]] + data[-2:]
['15', '1R7-S8', '1']
>>> '.'.join([data[-3][-2:]] + data[-2:])
'15.1R7-S8.1'
>>>
You can build a regex from the end mark of a line using the anchor $.
Using your own description, use the regex:
(\d\d\.[^.]*)\.[^.]*$
If you want the last characters from the end included, just move the capturing parenthesis:
(\d\d\.[^.]*\.[^.]*)$
Explanation:
(\d\d\.[^.]*\.[^.]*)$
 ^ ^                   # digits
     ^                 # a literal '.'
       ^               # anything OTHER THAN a '.'
            ^          # literal '.'
              ^        # anything OTHER THAN a '.'
                    ^  # end of line
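In Python, the second pattern could be applied like this. It is a sketch only: VERSION_RE and the loop are illustrative names, and the samples are the three strings from the question.

```python
import re

VERSION_RE = re.compile(r"(\d\d\.[^.]*\.[^.]*)$")

samples = [
    r"C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1",
    "string-anotherstring-15.1R7-S8.1",
    "string-anotherstring.andanother-15.1R7-S8.1",
]

versions = []
for sample in samples:
    match = VERSION_RE.search(sample)
    # search() returns None when the pattern is not found.
    versions.append(match.group(1) if match else None)

print(versions)  # ['15.1R7-S8.1', '15.1R7-S8.1', '15.1R7-S8.1']
```

Note the raw string for the Windows path, so the backslashes are not treated as escapes.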
Assuming I understand this correctly, two ways to do this come to mind.
I'm including both for completeness, since I might not have understood the question correctly. I think the split/parts solution is cleaner, particularly when the 'certain character' is a dot.
>>> msg = r'C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1'
>>> re.search(r'.*(..\..*)', msg).group(1)
'S8.1'
>>> parts = msg.split('.')
>>> ".".join((parts[-2][-2:], parts[-1]))
'S8.1'
For your example, you can split the string by the separator '-', and then join the last two indices. Like so:
txt = "string-anotherstring-15.1R7-S8.1"
x = txt.split("-")
y = "-".join(x[-2:])
print(y) # outputs 15.1R7-S8.1
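The asker's own description (find the second dot from the end, then go two characters back) can also be followed literally with str.rfind. A minimal sketch; the function name version_from_end and the choice to return None on unexpected input are assumptions made for illustration.

```python
def version_from_end(text):
    """Return the tail starting two characters before the second-to-last dot."""
    last_dot = text.rfind(".")
    if last_dot == -1:
        return None  # no dot at all
    second_dot = text.rfind(".", 0, last_dot)  # search only left of the last dot
    if second_dot < 2:
        return None  # no second dot, or fewer than two characters before it
    return text[second_dot - 2:]


print(version_from_end("string-anotherstring.andanother-15.1R7-S8.1"))  # 15.1R7-S8.1
```

Unlike splitting on '-', this does not care how many hyphens or dots appear earlier in the string.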
The following code:
s = "www.wired.com"
print s
s = s.lstrip('www.')
print s
outputs:
www.wired.com
ired.com
Note the missing w on the second line. I'm not sure I understand the behavior. I would expect:
www.wired.com
wired.com
EDIT:
Following the first two answers, I now understand the behavior. My question is now: how do I strip the leading www. without touching the rest?
The argument to string.lstrip is a list of characters:
>>> help(string.lstrip)
Help on function lstrip in module string:
lstrip(s, chars=None)
lstrip(s [,chars]) -> string
Return a copy of the string s with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
>>>
It removes ALL occurrences of those leading characters.
print s.lstrip('w.') # does the same!
[EDIT]:
If you wanted to strip the initial www., but only if the string started with that, you could use a regular expression or something like:
s = s[4:] if s.startswith('www.') else s
According to the documentation:
The chars argument is a string specifying the set of characters to be removed...The chars argument is not a prefix; rather, all combinations of its values are stripped
You would achieve the same result by just saying:
'www.wired.com'.lstrip('w.')
If you wanted something more general, I would do something like this:
i = s.find('www.')
if i >= 0:
    s = s[:i] + s[i+4:]
To remove the leading www.
>>> import re
>>> s = "www.wired.com"
>>> re.sub(r'^www\.', '', s)
'wired.com'
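The startswith approach generalises to any prefix. A minimal sketch, where strip_prefix is a hypothetical helper name and not something from the standard library:

```python
def strip_prefix(text, prefix):
    """Remove prefix once from the start of text, if it is there."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


print(strip_prefix("www.wired.com", "www."))  # wired.com
print(strip_prefix("wired.com", "www."))      # wired.com
print("www.wired.com".lstrip("www."))         # ired.com  (the original problem)
```

On Python 3.9 and later, the built-in str.removeprefix does the same thing.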
I want to check whether given words contain a special character or not.
Below is my Python code.
The literal 'a#bcd' has '#', so it is matched, which is OK.
But 'a1bcd' has no special character, and it was matched too!
import re
regexp = re.compile('[~`!##$%^&*()-_=+\[\]{}\\|;:\'\",.<>/?]+')
if regexp.search('a#bcd'):
    print('matched!! nich catch!!')
if regexp.search('a1bcd'):
    print('something is wrong here!!!')
result :
python ../special_char.py
matched!! nich catch!!
something is wrong here!!!
I have no idea why it works like above..someone help me..T_T;;;
thanks~
Move the dash in your regular expression to the start of the [] group, like this:
regexp = re.compile('[-~`!##$%^&*()_=+\[\]{}\\|;:\'\",.<>/?]+')
Where you had the dash, it was read with the surrounding characters as )-_ and since it is inside [] it is interpreted as asking to match a range from ) to _. If you move the dash to just after the [ it has no special meaning and instead matches itself.
Here's an interactive session showing the specific problem there was in your regular expression:
>>> import re
>>> print re.search('[)-_]', 'abcd')
None
>>> print re.search('[)-_]', 'a1b')
<_sre.SRE_Match object at 0x7f71082247e8>
>>> print re.search('[)-_]', 'a1b').group(0)
1
After fixing it:
>>> print re.search('[-)_]', 'a1b')
None
Unless there's some reason not visible in your question, I'd also say that the final + is not needed.
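To make the range behaviour concrete, this short sketch lists what ")-_" covers inside a character class. The variable name in_range is only for illustration.

```python
import re

# Inside [...], ")-_" means every character from ")" (code 41) to "_" (code 95).
in_range = [chr(code) for code in range(ord(")"), ord("_") + 1)]

print("1" in in_range)  # True: digits fall inside the range
print("A" in in_range)  # True: so do upper-case letters
print("a" in in_range)  # False: lower-case letters come after "_"

# With the dash first, the class contains just the three literal characters.
print(re.search("[-)_]", "a1b"))  # None
```

That is why 'a1bcd' matched in the question: the digit 1 sits inside the accidental range.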
re will be relatively slow for this
I'd suggest trying
specialchars = '''-~`!##$%^&*()_=+[]{}\\|;:'",.<>/?'''
len(word) != len(word.translate(None, specialchars))
or
set(word) & set(specialchars)
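Note that word.translate(None, specialchars) is the Python 2 form; in Python 3, str.translate takes a single translation table instead. The set-based idea works in both. Here is a sketch of it, assuming that "special" simply means ASCII punctuation (string.punctuation); swap in your own character list if that assumption does not fit.

```python
import string

SPECIAL = set(string.punctuation)  # assumption: "special" means ASCII punctuation


def has_special(word):
    """Return True if word contains at least one character from SPECIAL."""
    return not SPECIAL.isdisjoint(word)


print(has_special("a#bcd"))  # True
print(has_special("a1bcd"))  # False
```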
I'm trying to strip off part of a string.
e.g. Strip:-
a = xyz-abc
to leave:-
a = -abc
I would usually use lstrip e.g.
a.lstrip('xyz')
but in this case I don't know what xyz is going to be, so I need a way to just strip everything to the left of '-'.
Is it possible to set that option with lstrip or do I have to go about it a different way?
Thanks.
If there's only one - character, this will work:
'xyz-abc'.split('-')[1]
If you want the '-' in there, you have to reattach it:
>>> '-' + 'xyz-abc'.split('-')[1]
'-abc'
There's also a count parameter (maxsplit) that allows you to split only at the first - character.
>>> '-' + 'xyz-ab-c'.split('-', 1)[1]
'-ab-c'
partition is also potentially useful:
>>> 'xyz-abc'.partition('-')
('xyz', '-', 'abc')
It splits at the first occurrence of the separator:
>>> ''.join('xyz-ab-c'.partition('-')[1:])
'-ab-c'
>>> a = 'xyz-abc'
>>> a.find('-') # return the index of the first instance of '-'
3
>>> a[a.find('-'):] # return the string of everything past that index
'-abc'
You could use a combination of .find and slicing.
If there is no guarantee that the text to the left of - doesn't contain dashes of its own, the reversed version of find called rfind is even more useful:
>>> s = "xyv-er-hdgcfh-abc"
>>> print s[s.rfind("-"):]
-abc
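One thing to watch with the find-based versions: find returns -1 when there is no '-', and slicing from -1 silently returns the last character. A partition-based sketch avoids that. The name from_first_dash and the choice to return the string unchanged when there is no dash are assumptions for illustration.

```python
def from_first_dash(text):
    """Return text from the first '-' onward, or text unchanged if there is none."""
    head, sep, tail = text.partition("-")
    return sep + tail if sep else text


print(from_first_dash("xyz-abc"))   # -abc
print(from_first_dash("xyz-ab-c"))  # -ab-c
print(from_first_dash("xyz"))       # xyz
print("xyz"["xyz".find("-"):])      # z  (the find pitfall)
```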
I am close but I am not sure what to do with the resulting match object. If I do
p = re.search('[/#.* /]', str)
I'll get any words that start with # and end with a space. This is what I want. However, this returns a Match object that I don't know what to do with. What's the most computationally efficient way of finding and returning a string which is prefixed with a #?
For example,
"Hi there #guy"
After doing the proper calculations, I would be returned
guy
The following regular expression does what you need:
import re
s = "Hi there #guy"
p = re.search(r'#(\w+)', s)
print p.group(1)
It will also work for the following string formats:
s = "Hi there #guy " # notice the trailing space
s = "Hi there #guy," # notice the trailing comma
s = "Hi there #guy and" # notice the next word
s = "Hi there #guy22" # notice the trailing numbers
s = "Hi there #22guy" # notice the leading numbers
That regex does not do what you think it does.
s = "Hi there #guy"
p = re.search(r'#([^ ]+)', s) # this is the regex you described
print p.group(1) # first thing matched inside of ( .. )
But as usual with regex, there are tons of inputs that break this. For example, if the text is s = "Hi there #guy, what's with the comma?" the result would be guy,.
So you really need to think about every possible thing you want and don't want to match. r'#([a-zA-Z]+)' might be a good starting point, it literally only matches letters (a .. z, no unicode etc).
p.group(0) returns the text matched by the whole pattern; with a capturing pattern such as #(\w+), p.group(1) returns guy. If you want to find out what methods an object has, you can call dir(p). This will return a list of the attributes and methods that are available for that object instance.
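A short sketch of how a match object is typically handled, assuming the #(\w+) pattern from the earlier answer. first_tag is a hypothetical helper name; the important part is the None check, since re.search returns None when nothing matches.

```python
import re


def first_tag(text):
    """Return the first word prefixed with '#', without the '#', or None."""
    match = re.search(r"#(\w+)", text)
    if match is None:
        return None  # calling .group() on None would raise AttributeError
    return match.group(1)


match = re.search(r"#(\w+)", "Hi there #guy")
print(match.group(0))  # #guy  (the whole match)
print(match.group(1))  # guy   (the first capturing group)
print(first_tag("Hi there"))  # None
```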
As is evident from the answers so far, regex is the most efficient solution for your problem. The answers differ slightly regarding what they allow to follow the #:
[^ ] anything but space
\w in python-2.x is equivalent to [A-Za-z0-9_] by default; in py3k it matches Unicode word characters for str patterns unless re.ASCII is given
If you have a better idea of what characters might be included in the user name, you might adjust your regex to reflect that, e.g., only lower case ascii letters would be:
[a-z]
NB: I skipped quantifiers for simplicity.
(?<=#)\w+
will match a word if it's preceded by a # (without adding it to the match, a so-called positive lookbehind). This will match "words" that are composed of letters, numbers, and/or underscore; if you don't want those, use (?<=#)[^\W\d_]+
In Python:
>>> strg = "Hi there #guy!"
>>> p = re.search(r'(?<=#)\w+', strg)
>>> p.group()
'guy'
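If the text can contain several tags, re.findall with the same lookbehind returns all of them at once. A small sketch with a made-up input string:

```python
import re

text = "Hi there #guy, meet #girl22 and #22guy!"

# Letters, digits and underscores after a '#'.
tags = re.findall(r"(?<=#)\w+", text)
print(tags)  # ['guy', 'girl22', '22guy']

# Letters only: stops at the first digit, and skips tags that start with one.
letters_only = re.findall(r"(?<=#)[^\W\d_]+", text)
print(letters_only)  # ['guy', 'girl']
```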
You say: "If I do p = re.search('[/#.* /]', str) I'll get any words that start with # and end up with a space." But this is incorrect: that pattern is a character class which will match ONE character in the set #/.* and space. Note: there's a redundant second / in the pattern.
For example:
>>> re.findall('[/#.* /]', 'xxx#foo x/x.x*x xxxx')
['#', ' ', '/', '.', '*', ' ']
>>>
You say that you want "guy" returned from "Hi there #guy" but that conflicts with "and end up with a space".
Please edit your question to include what you really want/need to match.