How to remove specific symbols from a text using python? - python

I have a string like this:
string = 'This is my text of 2013-02-11, & it contained characters like this! (Exceptional)'
These are the symbols I want to remove from my String.
!, @, #, %, ^, &, *, (, ), _, +, =, `, /
What I have tried is:
listofsymbols = ['!', '@', '#', '%', '^', '&', '*', '(', ')', '_', '+', '=', '`', '/']
exceptionals = set(chr(e) for e in listofsymbols)
string.translate(None,exceptionals)
The error is:
an integer is required
Please help me do this!

Try this (Python 2, where str.translate accepts a second argument listing the characters to delete):
>>> my_str = 'This is my text of 2013-02-11, & it contained characters like this! (Exceptional)'
>>> my_str.translate(None, '!@#%^&*()_+=`/')
'This is my text of 2013-02-11,  it contained characters like this Exceptional'
Also, please refrain from giving variables names that are already used by built-ins or by standard library modules, such as string.
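On Python 3, str.translate takes a single mapping argument, so the two-argument call shown above raises a TypeError. The following is a minimal sketch of an equivalent, assuming the symbol set from the question; the names SYMBOLS, DELETE_TABLE and remove_symbols are illustrative and not part of the original answer.

```python
# Characters to delete (illustrative constant; adjust as needed).
SYMBOLS = "!@#%^&*()_+=`/"

# With three arguments, str.maketrans maps every character of the
# third string to None, and str.translate drops those characters.
DELETE_TABLE = str.maketrans("", "", SYMBOLS)

def remove_symbols(text):
    """Return text with every character in SYMBOLS removed."""
    return text.translate(DELETE_TABLE)

my_str = 'This is my text of 2013-02-11, & it contained characters like this! (Exceptional)'
print(remove_symbols(my_str))
```

Building the table once and reusing it keeps repeated calls cheap.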

How about this? I've also renamed string to s so that it does not get mixed up with the standard library module string.
>>> s = 'This is my text of 2013-02-11, & it contained characters like this! (Exceptional)'
>>> listofsymbols = ['!', '@', '#', '%', '^', '&', '*', '(', ')', '_', '+', '=', '`', '/']
>>> print ''.join([i for i in s if i not in listofsymbols])
This is my text of 2013-02-11,  it contained characters like this Exceptional

Another proposal, easily expandable to more complex filter criteria or other input data type:
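from itertools import ifilter  # Python 2 only; Python 3's built-in filter() is already lazy
def isValid(c):
    return c not in "!@#%^&*()_+=`/"
my_string = 'This is my text of 2013-02-11, & it contained characters like this! (Exceptional)'
print "".join(ifilter(isValid, my_string))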
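For Python 3, where itertools.ifilter no longer exists, the same predicate-based idea can be sketched with the built-in filter(). The helper name make_filter is hypothetical; it only illustrates how the filter criterion can be swapped out.

```python
def make_filter(forbidden):
    """Build a predicate that rejects any character in forbidden."""
    forbidden = frozenset(forbidden)  # constant-time membership test
    return lambda c: c not in forbidden

is_valid = make_filter("!@#%^&*()_+=`/")
text = 'This is my text of 2013-02-11, & it contained characters like this! (Exceptional)'

# filter() yields only the characters for which is_valid returns True.
cleaned = "".join(filter(is_valid, text))
print(cleaned)
```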

Related

Is it possible to convert an ASCII integer to a character without access to builtins?

In Python 3.7, assuming we have no access to functions such as str() and must rely entirely on methods such as int.__str__(), is it possible to get the character represented by an ASCII integer? For example:
>>> a = 97
>>> b = a.__magicfunction__()
>>> b
'a'
def magicfunction(i):
    # Lookup table indexed by ASCII code; codes 0-31 are control characters.
    ascii = [None] * 32
    ascii += [' ', '!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/']
    ascii += list('0123456789:;<=>?@')
    ascii += list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
    ascii += list('[\\]^_`')
    ascii += list('abcdefghijklmnopqrstuvwxyz')
    return ascii[i]
print(magicfunction(97))
produces:
a
You never said we should not use bytes:
bytes([97]).decode()
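Another possibility, offered only as a sketch, is the %c format code, which can be reached through the str.__mod__ method instead of a builtin function. The name to_char is illustrative.

```python
def to_char(code):
    """Convert an integer code point to a one-character string."""
    # str.__mod__ is the method behind the % operator, so neither
    # chr() nor str() is called here.
    return '%c'.__mod__(code)

print(to_char(97))
```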

Partitioning a string with multiple delimiters

I know partition() exists, but it only takes one separator, and I'm trying to partition around several different values.
For example, say I wanted to partition a string around its symbols:
input: "function():"
output: ["function", "(", ")", ":"]
I can't seem to find an efficient way to handle variable amounts of partitioning.
You can use re.findall with an alternation pattern that matches either a word or a non-space character:
re.findall(r'\w+|\S', s)
so that given s = 'function():', this returns:
['function', '(', ')', ':']
You could re.split by \W and use (...) to keep the delimiters, then remove empty parts.
>>> import re
>>> s = "function(): return foo + 3"
>>> [t for t in re.split(r"(\W)", s) if t.strip()]
['function', '(', ')', ':', 'return', 'foo', '+', '3']
Note that this will split after every special character; if you want to keep certain groups of special characters together, e.g. == or <=, list those alternatives first, separated by |, so that they are tried before \W.
>>> s = "function(): return foo + 3 == 42"
>>> [t for t in re.split(r"(\W)", s) if t.strip()]
['function', '(', ')', ':', 'return', 'foo', '+', '3', '=', '=', '42']
>>> [t for t in re.split(r"(==|!=|<=|\W)", s) if t.strip()]
['function', '(', ')', ':', 'return', 'foo', '+', '3', '==', '42']
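The two ideas above (re.findall, and listing multi-character operators first) can be combined. The following is a rough sketch, assuming Python 3; the operator list and the names TOKEN_RE and tokenize are illustrative choices, not taken from the answers.

```python
import re

# Multi-character operators come first so they win over single characters.
# This operator list is only an example; extend it as needed.
TOKEN_RE = re.compile(r"==|!=|<=|>=|\w+|\S")

def tokenize(source):
    """Split source into words, multi-character operators and single symbols."""
    return TOKEN_RE.findall(source)

print(tokenize("function(): return foo + 3 == 42"))
```

Because the pattern never matches whitespace, no empty or blank tokens need to be filtered out afterwards.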

How to query python list with multiple characters in each one

I know the title is a little messy and if someone wants to fix it, they are more than welcome to.
Anyway, I'm having trouble querying a Python list with multiple values. I have looked at other Stack Overflow questions, and none of them seem to match what I'm looking for.
So, this is the code I have so far. It's supposed to use a for loop to go through each character and then use if ... in statements to check whether a character in the user input matches anything in the list.
In my example it only uses symbols, but hopefully that shouldn't be much of a problem.
Anyway, here is the code:
string = input("What symbol character would you like to check")
symbols=[' ', '!', '#', '$', '%', '&', '"', '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`', '{', '|', '}', '~',"'"]
def symbol():
    for string in symbols:
        if string in symbols:
            return True
        elif string not in symbols:
            return False
if symbol():
    print('ok')
if not symbol():
    print('What Happened?')
Update: I also need the solution to accept letters and numbers as well as symbols.
For example, if the user enters !a, it should still detect the '!' and evaluate to True.
If you loop over the input string, you should be able to get the solution you're looking for. How about something like this?
input_string = raw_input("What symbol character would you like to check? ")
symbols = [' ', '!', '#', '$', '%', '&', '"', '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`', '{', '|', '}', '~',"'"]
def symbol(input_str):
    for char in input_str:
        if char in symbols:
            return True
    return False
if symbol(input_string):
    print('ok')
else:
    print('What Happened?')
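Using raw_input() avoids some trouble: in Python 2, input() evaluates whatever is typed as a Python expression, so entering a bare symbol gave me an "unexpected EOF while parsing" error before I changed it. (In Python 3, input() already returns the raw string.) I also renamed your variables to make the code clearer and avoid potential conflicts.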
By moving the return False line outside the for loop, it lets the loop check every character in the input string first. If it checks every one, and nothing matches, then it will default to returning False.
Also, you have two calls to symbol() in your question, which I don't think is necessary. One if can check for a True return value. Lacking that, we move to the else statement and can safely know that the function returned False.
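The loop-and-return pattern described above can also be written with any(), which stops at the first match. This is a small sketch for Python 3; contains_symbol and SYMBOLS are illustrative names, and using string.punctuation plus the space character is an assumption about which symbols should count.

```python
import string

# Characters treated as symbols: ASCII punctuation plus the space.
SYMBOLS = set(string.punctuation) | {' '}

def contains_symbol(text):
    """Return True if at least one character of text is in SYMBOLS."""
    return any(char in SYMBOLS for char in text)

print(contains_symbol('!a'))
print(contains_symbol('abc123'))
```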

Split a string in python with spaces and punctuations mark , unicode characters , etc.

I want to split string like this:
string = '[[he (∇((comesΦf→chem,'
based on spaces, punctuation marks, and also Unicode characters. The output I expect looks like this:
out= ['[', '[', 'he',' ', '(','∇' , '(', '(', 'comes','Φ', 'f','→', 'chem',',']
I am using
re.findall(r"[\w\s\]+|[^\w\s]",String,re.unicode)
for this case, but it returned following output:
output=['[', '[', 'he',' ', '(', '\xe2', '\x88', '\x87', '(', '(', 'comes\xce', '\xa6', 'f\xe2', '\x86', '\x92', 'chem',',']
Please tell me how I can solve this problem.
Without using regexes and assuming words only contain ascii characters:
from string import ascii_letters
from itertools import groupby

LETTERS = frozenset(ascii_letters)

def is_alpha(char):
    return char in LETTERS

def split_string(text):
    for key, tokens in groupby(text, key=is_alpha):
        if key:  # Found letters, join them and yield a word
            yield ''.join(tokens)
        else:  # not letters, just yield the single tokens
            yield from tokens
Example result:
In [2]: list(split_string('[[he (∇((comesΦf→chem,'))
Out[2]: ['[', '[', 'he', ' ', '(', '∇', '(', '(', 'comes', 'Φ', 'f', '→', 'chem', ',']
If you are using a python version less than 3.3 you can replace yield from tokens with:
for token in tokens: yield token
If you are on python2 keep in mind that split_string accepts a unicode string.
Note that by modifying the is_alpha function you can define different kinds of grouping. For example, if you wanted to consider all Unicode letters as letters, you could use is_alpha = str.isalpha (or unicode.isalpha in Python 2):
In [3]: is_alpha = str.isalpha
In [4]: list(split_string('[[he (∇((comesΦf→chem,'))
Out[4]: ['[', '[', 'he', ' ', '(', '∇', '(', '(', 'comesΦf', '→', 'chem', ',']
Note that 'comesΦf', which was previously split, is now kept as a single token.
Hope this helps.
In [33]: string = '[[he (∇((comesΦf→chem,'
In [34]: re.split('\W+', string)
Out[34]: ['', 'he', 'comes', 'f', 'chem', '']
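On Python 3, where str is Unicode by default, a regular expression can also produce the output the question asks for. This is a sketch under the assumption that words consist of ASCII letters only; split_tokens is an illustrative name.

```python
import re

def split_tokens(text):
    """Return runs of ASCII letters as words and every other character on its own."""
    return re.findall(r"[A-Za-z]+|[^A-Za-z]", text)

print(split_tokens('[[he (∇((comesΦf→chem,'))
```

Replacing [A-Za-z] with \w would instead keep 'comesΦf' together, because \w matches Unicode letters in Python 3.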

python regular expression

I am a newbie to Python. I have an array of words, and each word has to be checked to see whether it contains any special characters or digits. If it does, I have to skip that word. How should I do this?
Does it have to be a regular expression? If not, you can use the isalpha() string method.
My reading of the problem is that you want to discard any words that contain non-alphabetical characters. Try the following:
>>> array = ['hello', 'hello2', '?hello', '?hello2']
>>> filtered = filter(str.isalpha, array)
>>> print filtered
['hello']
You could also write it as a list comprehension:
>>> filtered = [word for word in array if word.isalpha()]
>>> print filtered
['hello']
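If a regular expression is required after all, something along these lines should work on Python 3.4 or later, where re.fullmatch is available. The sketch assumes a valid word consists of ASCII letters only; ONLY_LETTERS and keep_clean_words are illustrative names.

```python
import re

# One or more ASCII letters and nothing else (assumed definition of a valid word).
ONLY_LETTERS = re.compile(r"[A-Za-z]+")

def keep_clean_words(words):
    """Keep only the words that consist entirely of ASCII letters."""
    return [w for w in words if ONLY_LETTERS.fullmatch(w)]

array = ['hello', 'hello2', '?hello', '?hello2']
print(keep_clean_words(array))
```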
If there are only a few characters you want to exclude then use a blacklist, otherwise use a white list.
import string
abadword="""aaaa
bbbbb"""
words=["oneGoodWord", "a,bc",abadword, "xx\n",'123',"gone", "tab\ttab", "theEnd.","anotherGoodWord"]
bad=list(string.punctuation) #string.punctuation='!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
bad+=['\n','\t','1'] #add some more characters you don't want
bad+=['one'] #this is redundant as in function skip set(word) becomes a set of word's characters. 'one' cannot match a character.
print bad #bad = ['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`', '{', '|', '}', '~', '\n', '\t', '1', 'one']
bad=set(bad)
def skip(word):
    return len(set(word) & bad)==0 #word has no characters in common with bad
print "good words:"
print filter(skip,words) #prints ['oneGoodWord', 'gone', 'anotherGoodWord']
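A Python 3 variant of the same blacklist idea can use set.isdisjoint, which avoids building an intermediate set for every word. The choice of blacklisted characters and the names BAD and is_clean are assumptions made for illustration.

```python
import string

# Blacklist: ASCII punctuation, whitespace and digits (an assumed choice).
BAD = set(string.punctuation) | set(string.whitespace) | set(string.digits)

def is_clean(word):
    """Return True when word shares no character with BAD."""
    return BAD.isdisjoint(word)

words = ["oneGoodWord", "a,bc", "xx\n", "123", "gone", "tab\ttab", "theEnd.", "anotherGoodWord"]
good_words = [w for w in words if is_clean(w)]
print(good_words)
```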
