I'm writing server-side code in Python.
I noticed that the client sent me one of the parameters like this:
"↵ tryit1.tar↵ "
I want to get rid of the spaces (and for that I use the replace method), but I also want to get rid of the special character "↵".
How can I get rid of this character (and any other odd characters apart from letters, digits and -, _, *, .) using Python?
A regex would be good here:
import re
re.sub('[^a-zA-Z0-9-_*.]', '', my_string)
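The same regex can be wrapped in a small reusable helper with the pattern compiled once. This is a minimal sketch: the name clean_param is hypothetical, and the hyphen is moved to the end of the character class, which is a common way to make it unambiguously literal.

```python
import re

# Keep ASCII letters, digits and the characters _ * . - ; drop everything else.
# Placing '-' last in the class makes it a literal hyphen rather than part of a range.
_DISALLOWED = re.compile(r'[^A-Za-z0-9_*.-]')


def clean_param(value):
    """Hypothetical helper: remove every character outside the allowed set."""
    return _DISALLOWED.sub('', value)


print(clean_param("↵ tryit1.tar↵ "))  # tryit1.tar
print(clean_param("a@b=c-d"))         # abc-d
```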
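Or with filter (in Python 3, filter returns an iterator, so join the result back into a string):
>>> import string
>>> my_string = "↵ tryit1.tar↵ "
>>> acceptable_characters = string.ascii_letters + string.digits + "-_*."
>>> "".join(filter(lambda c: c in acceptable_characters, my_string))
'tryit1.tar'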
I would use a regex like this (note that [^\w.] also removes - and *, and that \w keeps underscores):
import re
my_string = "↵ tryit1.tar↵ "
print(re.sub(r'[^\w.]', '', my_string))  # tryit1.tar
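One caveat, assuming Python 3: with str patterns, \w matches Unicode word characters, so accented letters survive. A small sketch comparing the default behaviour with the re.ASCII flag (the sample string is made up for illustration):

```python
import re

sample = "café↵ report.tar"

# Default (Unicode) mode: 'é' counts as a word character and is kept.
unicode_result = re.sub(r'[^\w.]', '', sample)

# re.ASCII restricts \w to [A-Za-z0-9_], so 'é' is removed as well.
ascii_result = re.sub(r'[^\w.]', '', sample, flags=re.ASCII)

print(unicode_result)  # caféreport.tar
print(ascii_result)    # cafreport.tar
```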
I am having a hard time doing data analysis on a large text that has lots of non-alphanumeric characters. I tried using
string = filter(str.isalnum, string)
but I also have "#" in my text that I want to keep. How do I make an exception for a character like "#"?
It is easier to use regular expressions:
import re
string = re.sub("[^A-Za-z0-9#]", "", string)
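You can use re.sub (\d is already covered by \w, so it can be dropped; note that \w also keeps underscores):
re.sub(r'[^\w\s#]', '', string)
Example:
>>> import re
>>> re.sub(r'[^\w\s#]', '', 'This is # string 123 *$^%')
'This is # string 123 '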
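To see how the explicit-class regex and the \w-based regex differ, here is a small comparison on a made-up sample. In Python 3, \w also keeps underscores and non-ASCII letters, while [^A-Za-z0-9#] removes them (and removes spaces too, since \s is not in that class):

```python
import re

text = "snake_case café #1!"

# Explicit ASCII class: only A-Z, a-z, 0-9 and '#' survive; spaces are removed too.
explicit = re.sub(r"[^A-Za-z0-9#]", "", text)

# \w is Unicode-aware in Python 3 and includes '_'; \s keeps whitespace.
word_based = re.sub(r"[^\w\s#]", "", text)

print(explicit)    # snakecasecaf#1
print(word_based)  # snake_case café #1
```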
One way to do this would be to create a function that returns True or False depending on whether an input character is valid.
import string
valid_characters = string.ascii_letters + string.digits + '#'
def is_valid_character(character):
    return character in valid_characters
# Instead of using `filter`, we `join` all characters in the input string
# for which `is_valid_character` returns `True`.
def get_valid_characters(text):
    return "".join(char for char in text if is_valid_character(char))
Some example output:
>>> print(valid_characters)
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789#
>>> get_valid_characters("!Hello_#world?")
'Hello#world'
>>> get_valid_characters("user#example")
'user#example'
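A simpler way to write it would be using a regex. With an explicit character class this accomplishes the same thing (a class like [^\w#] would also keep underscores and non-ASCII letters):
import re
def get_valid_characters(text):
    return re.sub(r"[^A-Za-z0-9#]", "", text)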
You could use a lambda function to specify your allowed characters. But also note that in Python 3 filter returns a <filter object>, which is an iterator over the kept values, so you will have to stitch it back into a string:
string = "?filter_#->me3!"
extra_chars = "#!"
filtered_object = filter(lambda c: c.isalnum() or c in extra_chars, string)
string = "".join(filtered_object)
print(string)
Gives:
filter#me3!
How can I replace/delete part of a string, like this:
string = '{DDBF1F} this is my string {DEBC1F}'
# {DDBF1F}: the code between the braces is random; I only know it is made up of 6 characters
The output should be:
this is my string
I tried this (I know it doesn't work, but I tried :3):
string = '{DDBF1F} Hello {DEBC1F}'
string.replace(f'{%s%s%s%s%s%s}', 'abc')
print(string)
Use the re library to perform a regex replace, like this (the pattern assumes the codes are uppercase hexadecimal, as in your example):
import re
text = '{DDBF1F} Hello {DEBC1F}'
result = re.sub(r"(\s?\{[A-F0-9]{6}\}\s?)", "", text)
print(result)
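If the codes are not guaranteed to be hexadecimal (the question only says they are 6 characters long), a looser sketch is to match any six characters other than braces and then trim the leftover spaces. This assumes the text between the codes contains no braces of its own:

```python
import re

text = '{DDBF1F} this is my string {DEBC1F}'

# \{[^{}]{6}\} matches a brace pair around exactly six non-brace characters.
result = re.sub(r'\{[^{}]{6}\}', '', text).strip()

print(result)  # this is my string
```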
If the length of the strings within the brackets is fixed, you can use slicing to get the inner substring:
>>> string = '{DDBF1F} this is my string {DEBC1F}'
>>> string[8:-8]
' this is my string '
(string[9:-9] if you want to remove the surrounding spaces)
If hardcoding the indexes feels bad, they can be derived using str.index (if you can be certain that the string will not contain an embedded '}'):
>>> start = string.index('}')
>>> start
7
>>> end = string.index('{', start)
>>> end
27
>>> string[start+1:end]
' this is my string '
This split-based approach also works (note that the result starts with a space):
string = '{DDBF1F} this is my string {DEBC1F}'
st = string.split(' ')
new_str = ''
for i in st:
    if i.startswith('{') and i.endswith('}'):
        pass
    else:
        new_str = new_str + " " + i
print(new_str)
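A more compact sketch of the same split-based idea uses str.join, which also avoids the leading space. Note that split() without arguments collapses runs of whitespace, so the original spacing between words is not preserved:

```python
text = '{DDBF1F} this is my string {DEBC1F}'

# Keep every whitespace-separated token that is not wrapped in braces.
words = [w for w in text.split() if not (w.startswith('{') and w.endswith('}'))]
result = " ".join(words)

print(result)  # this is my string
```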
So I have strings with a date somewhere in the middle, like 111_Joe_Smith_2010_Assessment and I want to truncate them such that they become something like 111_Joe_Smith_2010. The code that I thought would work is
reverseString = currentString[::-1]
stripper = re.search('\d', reverseString)
But for some reason this doesn't always give me the right result. Most of the time it does, but every now and then it outputs a string that looks like 111_Joe_Smith_2010_A.
If anyone knows what's wrong with this, it would be super helpful!
You can use re.sub and $ to match and remove the alphabetical characters and underscores at the end of the string:
import re
d = ['111_Joe_Smith_2010_Assessment', '111_Bob_Smith_2010_Test_assessment']
new_s = [re.sub('[a-zA-Z_]+$', '', i) for i in d]
print(new_s)
Output:
['111_Joe_Smith_2010', '111_Bob_Smith_2010']
You could strip non-digit characters from the end of the string using re.sub like this:
>>> import re
>>> re.sub(r'\D+$', '', '111_Joe_Smith_2010_Assessment')
'111_Joe_Smith_2010'
For your input format you could also do it with a simple loop:
>>> s = '111_Joe_Smith_2010_Assessment'
>>> i = len(s) - 1
>>> while not s[i].isdigit():
... i -= 1
...
>>> s[:i+1]
'111_Joe_Smith_2010'
You can use the following approach:
def clean_names():
    names = ['111_Joe_Smith_2010_Assessment', '111_Bob_Smith_2010_Test_assessment']
    for name in names:
        # drop characters from the end until the last one is a digit
        while not name[-1].isdigit():
            name = name[:-1]
        print(name)
clean_names()
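A variant of this loop that returns the cleaned names instead of printing them, and guards against names with no digit at all (there, name shrinks to an empty string and name[-1] raises IndexError). The function name is hypothetical:

```python
def strip_trailing_non_digits(names):
    """Hypothetical helper: cut each name back to its last digit."""
    cleaned = []
    for name in names:
        # Stop when the string is empty, so names without digits do not crash.
        while name and not name[-1].isdigit():
            name = name[:-1]
        cleaned.append(name)
    return cleaned


print(strip_trailing_non_digits(['111_Joe_Smith_2010_Assessment',
                                 '111_Bob_Smith_2010_Test_assessment',
                                 'no_digits_here']))
# ['111_Joe_Smith_2010', '111_Bob_Smith_2010', '']
```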
Here is another solution using rstrip() to remove trailing letters and underscores, which I consider a pretty smart alternative to re.sub() as used in other answers:
import string
s = '111_Joe_Smith_2010_Assessment'
new_s = s.rstrip(f'{string.ascii_letters}_')  # f-string version, Python 3.6+
new_s = s.rstrip(string.ascii_letters + '_')  # works on any Python version
print(new_s) # 111_Joe_Smith_2010
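For completeness, a sketch of how the reversed-string idea from the question could be finished: search the reversed string for the first digit, then turn that position back into an index in the original string. The function name is hypothetical, and returning names without any digit unchanged is an assumption:

```python
import re


def truncate_after_last_digit(s):
    """Hypothetical helper: keep everything up to and including the last digit."""
    match = re.search(r'\d', s[::-1])
    if match is None:
        return s  # no digit at all: leave the string unchanged
    # match.start() counts characters from the end of the original string.
    return s[:len(s) - match.start()]


print(truncate_after_last_digit('111_Joe_Smith_2010_Assessment'))  # 111_Joe_Smith_2010
```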
I'm writing a small crawler using Scrapy.
One of the XPaths contains a price followed by "zł" (the Polish currency sign); the problem is that it's obfuscated by newline characters, spaces and non-breaking spaces.
So when I do:
sel.xpath("div/div/span/span/text()[normalize-space(.)]").extract()
I get:
[u'\n 1\xa0740,00 z\u0142\n \n \n ']
Which I want to change to
[u'1740,00']
or simply into float variable.
What is the best/simplest/fastest way to do this?
You can use re.findall to extract the digits and commas from the string:
>>> import re
>>> s = u'\n 1\xa0740,00 z\u0142\n \n \n '
>>> L = re.findall(r'[\d,]', s)
>>> "".join(L)
'1740,00'
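Since the question also asks for a float, here is a sketch of that last step. It assumes the comma is always the decimal separator and the \xa0 is only a thousands separator, so 1\xa0740,00 means 1740.00:

```python
import re

s = u'\n 1\xa0740,00 z\u0142\n \n \n '

# Keep only digits and commas, then swap the decimal comma for a dot.
digits = "".join(re.findall(r'[\d,]', s))
price = float(digits.replace(',', '.'))

print(digits)  # 1740,00
print(price)   # 1740.0
```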
If you are interested only in ASCII digits, then the fastest method is to use bytes.translate():
import string
unicode_string = u'\n 1\xa0740,00 z\u0142\n \n \n '  # the scraped value from the question
keep = string.digits.encode() + b','  # characters to keep
delete = bytearray(set(range(0x100)) - set(bytearray(keep)))  # characters to delete
result = unicode_string.encode('ascii', 'ignore').translate(None, delete).decode()
You could write it more succinctly using Unicode .translate():
import string
import sys
keep = set(map(ord, string.digits + ',')) # characters to keep
table = dict.fromkeys(i for i in range(sys.maxunicode + 1) if i not in keep)
result = unicode_string.translate(table)
The result is the same, but before Python 3.5 this version is always dog-slow (the situation is better in Python 3.5 for the ASCII-only case).
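One way to avoid building a table with more than a million entries is a dict subclass whose __missing__ decides per code point. str.translate() deletes a character when the lookup returns None and leaves it unchanged when the lookup raises LookupError. The class name KeepOnly is hypothetical, and this sketch makes no claims about its speed relative to the versions above:

```python
import string


class KeepOnly(dict):
    """Hypothetical translate() table: delete every code point not in `keep`."""

    def __init__(self, keep):
        super().__init__()
        self.keep = set(map(ord, keep))

    def __missing__(self, codepoint):
        if codepoint in self.keep:
            # LookupError tells str.translate to map the character to itself.
            raise LookupError(codepoint)
        # Cache the decision; None tells str.translate to delete the character.
        self[codepoint] = None
        return None


unicode_string = u'\n 1\xa0740,00 z\u0142\n \n \n '
table = KeepOnly(string.digits + ',')
print(unicode_string.translate(table))  # 1740,00
```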
I am looking for a way to prefix strings in Python with a single backslash, e.g. "]" -> "\]". Since "\" is not a valid string literal in Python, the simple
mystring = '\' + mystring
won't work. What I am currently doing is something like this:
mystring = r'\###' + mystring
mystring = mystring.replace('###', '')
While this works most of the time, it is not elegant, and it can also cause problems for strings containing "###" or whatever the "filler" is set to. Is there a better way of doing this?
You need to escape the backslash with a second one, to make it a literal backslash:
mystring = "\\" + mystring
Otherwise Python thinks you're trying to escape the ", which in turn means there is no quote left to terminate the string.
Ordinarily you can use raw string notation (r'string'), but that won't work when the backslash is the last character, because a raw string literal cannot end with a single backslash.
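A quick sketch to confirm what the escaped literal contains: '\\' is a single backslash character, which can also be built with chr(92), since 92 is the code point of the backslash:

```python
mystring = "]"

prefixed = "\\" + mystring

print(len("\\"))                  # 1
print(prefixed)                   # \]
print(prefixed == chr(92) + "]")  # True
print(prefixed == r"\]")          # True
```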
The difference between print(a) and just a (the interactive prompt shows the repr, which displays the backslash escaped):
>>> a = 'hello'
>>> a = '\\' + a
>>> a
'\\hello'
>>> print(a)
\hello
Python strings have a feature called escape characters. These let you do special things inside a string, such as including a quote (" or ') without closing the string you're typing.
See the table of escape sequences in the Python language reference.
So when you typed
mystring = '\' + mystring
the \' is an escaped apostrophe: your string now contains an apostrophe and is never actually closed, which you can see because the rest of that line is coloured as part of the string.
To type a backslash, you must escape it, which is done like this:
>>> aBackSlash = '\\'
>>> print(aBackSlash)
\
You should escape the backslash as follows:
mystring = "\\" + mystring
This is because if you write '\', the backslash ends up escaping the closing quotation mark. Therefore, to treat the backslash literally, you must escape it.
Examples
>>> s = 'hello'
>>> s = '\\' + s
>>> print(s)
\hello
Your case
>>> mystring = 'it actually does work'
>>> mystring = '\\' + mystring
>>> print(mystring)
\it actually does work
As a different way of approaching the problem, have you considered string formatting?
r'\%s' % mystring
or:
r'\{}'.format(mystring)
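On Python 3.6 or later, an f-string is another option. Backslashes are allowed in the literal part of an f-string (only inside the braces were they disallowed before Python 3.12), so escaping the backslash works as usual:

```python
mystring = "]"

# "\\" in the literal part of the f-string is one backslash;
# {mystring} is the replacement field.
prefixed = f"\\{mystring}"

print(prefixed)                             # \]
print(prefixed == r'\{}'.format(mystring))  # True
```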