Check whether a string contains only special characters in Python

What I want to do is check whether a string contains only special characters.
An example should make it clear:
Hello -> Valid
Hello?? -> Valid
?? -> Not Valid
The same applies to all special characters, including ".".

You can use this regex with anchors to check if your input contains only non-word (special) characters:
^\W+$
If the underscore should also be treated as a special character, use:
^[\W_]+$
Code:
>>> import re
>>> def spec(s):
...     if not re.match(r'^[_\W]+$', s):
...         print('Valid')
...     else:
...         print('Invalid')
...
>>> spec('Hello')
Valid
>>> spec('Hello??')
Valid
>>> spec('??')
Invalid
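If you need a boolean instead of printed output, here is a minimal sketch that assumes the same definition of "special" as ^[\W_]+$ (anything that is not a letter or digit). It uses re.fullmatch, which must cover the whole string, so the ^ and $ anchors are no longer needed. The name only_special is hypothetical.

```python
import re

# Hypothetical helper: one or more characters, each a non-word character or "_".
_SPECIAL_ONLY = re.compile(r'[\W_]+')

def only_special(s: str) -> bool:
    # fullmatch must consume the entire string; because of "+",
    # an empty string is not reported as "only special".
    return _SPECIAL_ONLY.fullmatch(s) is not None

for word in ['Hello', 'Hello??', '??', '._.']:
    print(word, 'Invalid' if only_special(word) else 'Valid')
```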

You can use a custom Python function:
>>> import string
>>> def check(s):
...     return all(i in string.punctuation for i in s)
...
string.punctuation contains all ASCII punctuation characters, and all() checks whether every character of the string is one of them.
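Two details of that approach are easy to miss: all() returns True for an empty string, and string.punctuation holds only ASCII punctuation, so a space or a character such as '¿' is not counted as special. The sketch below adds a non-empty guard; the name only_punctuation and the decision to reject empty input are assumptions about what you want.

```python
import string

PUNCT = set(string.punctuation)  # ASCII punctuation only

def only_punctuation(s: str) -> bool:
    # Require at least one character: all() over an empty iterable is True.
    return bool(s) and all(ch in PUNCT for ch in s)

print(only_punctuation('??'))       # True
print(only_punctuation('Hello??'))  # False
print(only_punctuation(''))         # False (a bare all() would give True)
print(only_punctuation('? ?'))      # False: space is not in string.punctuation
```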

Here's working code; it returns True as soon as the string contains an ASCII letter:
import string

def checkString(your_string):
    for let in your_string.lower():
        if let in string.ascii_lowercase:
            return True
    return False
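checkString only looks for ASCII letters, so a digits-only string such as "123" is treated as having no valid characters. If digits and non-ASCII letters should also count, one option (a sketch; has_alnum is a hypothetical name) is str.isalnum() applied per character:

```python
def has_alnum(s: str) -> bool:
    # True if any character is a letter or digit (Unicode-aware via str.isalnum),
    # i.e. the string is NOT made up only of special characters.
    return any(ch.isalnum() for ch in s)

print(has_alnum('Hello??'))  # True
print(has_alnum('123'))      # True (checkString would return False here)
print(has_alnum('??'))       # False
print(has_alnum('héllo'))    # True
```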

import string

s = input("Enter a string: ")
if all(i in string.punctuation for i in s):
    print("Only special characters")
else:
    print("Valid")
Use the result of that check to set a boolean flag and act on it accordingly.

Related

split string on any special character using python

Currently my strings can use many different separators, like:
new_123_12313131
new$123$12313131
new#123#12313131
and so on. I just want to check whether there is a special character in the string and, if so, get the value after the last separator; in these examples, just 12313131.
This is a good use case for isdigit():
l = [
    'new_123_12313131',
    'new$123$12313131',
    'new#123#12313131',
]
output = []
for s in l:
    temp = ''
    # Collect every digit in the string (digits from all segments are concatenated).
    for char in s:
        if char.isdigit():
            temp += char
    output.append(temp)
print(output)
Result: ['12312313131', '12312313131', '12312313131']
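Note that the loop above joins every digit in the string ('123' + '12313131'), so it does not isolate the value after the last separator. If you want to stay with isdigit(), a sketch using itertools.groupby (last_digit_run is a hypothetical name) groups consecutive digits and keeps the last group. This equals "the value after the last separator" only when that last segment is numeric.

```python
from itertools import groupby

def last_digit_run(s: str) -> str:
    # groupby splits the string into alternating runs of digits and non-digits.
    runs = [''.join(group) for is_digit, group in groupby(s, key=str.isdigit) if is_digit]
    # Return the last run of digits, or '' if the string has none.
    return runs[-1] if runs else ''

samples = ['new_123_12313131', 'new$123$12313131', 'new#123#12313131']
print([last_digit_run(s) for s in samples])
# ['12313131', '12313131', '12313131']
```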
Assuming you define "special character" as anything that's not alphanumeric, you can use str.isalnum() to find the first special character, scanning from the end, and use it like this:
def split_non_special(text) -> str:
    """
    Find the first special character starting from the end and return the last piece.
    """
    for i in reversed(text):
        if not i.isalnum():
            return text.split(i)[-1]  # return as soon as a separator is found
    return ''  # no separator found

# inputs = ['new_123_12313131', 'new$123$12313131', 'new#123#12313131', 'eefwfwrfwfwf3243']
# outputs = [split_non_special(text) for text in inputs]
# ['12313131', '12313131', '12313131', '']  # outputs
To "just get the value after the last separator", the more obvious way is to use re.findall:
from re import findall
findall(r'\d+$',text) # ['12313131']
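findall returns a list; if you would rather get the number itself (or None when the string does not end in digits), re.search with the same pattern is a small variation. A sketch; trailing_number is a hypothetical name:

```python
import re

def trailing_number(text: str):
    # \d+$ anchors the run of digits to the end of the string.
    m = re.search(r'\d+$', text)
    # The matched digits as a string, or None if there is no trailing number.
    return m.group() if m else None

print(trailing_number('new#123#12313131'))  # 12313131
print(trailing_number('new#123#abc'))       # None
```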
Python's string module provides what you seem to consider "special" characters as string.punctuation, which contains these characters:
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
Using that in conjunction with the re module, you can do this:
from string import punctuation
import re
re.split(f"[{re.escape(punctuation)}]", my_string)
my_string is the string you want to split. re.escape() makes characters such as ], \, ^ and - literal inside the character class.
Results for your examples
['new', '123', '12313131']
To get just the groups of digits, you can use:
re.findall(r"\d+", my_string)
Results:
['123', '12313131']
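Putting the split together as a reusable function: a sketch that escapes string.punctuation before placing it in a character class and drops empty pieces (the filtering is an assumption, useful when separators are repeated or sit at the ends). split_on_punctuation is a hypothetical name.

```python
import re
from string import punctuation

# Escape the punctuation so ], \, ^ and - are taken literally inside [...].
PUNCT_CLASS = re.compile(f"[{re.escape(punctuation)}]")

def split_on_punctuation(text):
    # Drop empty pieces produced by leading, trailing or repeated separators.
    return [piece for piece in PUNCT_CLASS.split(text) if piece]

print(split_on_punctuation('new$123$12313131'))      # ['new', '123', '12313131']
print(split_on_punctuation('new$123$12313131')[-1])  # 12313131
```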

Python Removing non-alphabetical characters with exceptions

I am having a hard time doing data analysis on a large text that has lots of non-alphabetical characters. I tried using
string = filter(str.isalnum, string)
but I also have "#" in my text that I want to keep. How do I make an exception for a character like "#"?
It is easier to use regular expressions:
import re
string = re.sub(r"[^A-Za-z0-9#]", "", string)
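For a large text processed in many chunks, you can compile the pattern once and reuse it. The re module already caches recently used patterns, so the gain is mostly readability. A sketch with the same whitelist; clean and NOT_ALLOWED are hypothetical names. Note that this pattern removes spaces as well.

```python
import re

# Everything that is not an ASCII letter, digit or "#" is removed (spaces too).
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9#]")

def clean(text: str) -> str:
    return NOT_ALLOWED.sub("", text)

print(clean("Hello, #world! 42"))  # Hello#world42
```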
You can use re.sub:
re.sub(r'[^\w\s\d#]', '', string)
This keeps word characters (letters, digits, underscore), whitespace and "#". Example:
>>> import re
>>> re.sub(r'[^\w\s\d#]', '', 'This is # string 123 *$^%')
'This is # string 123 '
One way to do this would be to create a function that returns True or False depending on whether an input character is valid.
import string

valid_characters = string.ascii_letters + string.digits + '#'

def is_valid_character(character):
    return character in valid_characters

# Instead of using `filter`, we `join` all characters in the input string
# if `is_valid_character` is `True`.
def get_valid_characters(text):
    return "".join(char for char in text if is_valid_character(char))

Some example output:
>>> print(valid_characters)
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789#
>>> get_valid_characters("!Hello_#world?")
'Hello#world'
>>> get_valid_characters("user#example")
'user#example'
A shorter way to write it uses a regex. This does nearly the same thing, except that \w also keeps the underscore and non-ASCII letters:
import re

def get_valid_characters(text):
    return re.sub(r"[^\w\d#]", "", text)
You could use a lambda function to specify your allowed characters. Note also that filter returns a <filter object>, which is an iterator over the kept values, so you will have to stitch it back into a string:
string = "?filter_#->me3!"
extra_chars = "#!"
filtered_object = filter(lambda c: c.isalnum() or c in extra_chars, string)
string = "".join(filtered_object)
print(string)
Gives:
filter#me3!

Is there a way to check against all special characters?

Is there any way to test for all special characters in Python other than listing them manually, perhaps something similar to the .isalnum or .isalpha methods? I'm relatively new to coding, so I have no idea.
Assuming that any non-alphanumeric character counts as special, you can put not in front of isalnum(); the expression is then True when there's any special character:
test = "1$%a"
print(not test.isalnum())
# prints True
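One edge case: "".isalnum() is False, so not "".isalnum() is True even though an empty string contains no special characters. A per-character check avoids that. A sketch; has_special is a hypothetical name:

```python
def has_special(s: str) -> bool:
    # True if at least one character is not a letter or digit.
    return any(not ch.isalnum() for ch in s)

test = "1$%a"
print(not test.isalnum())  # True
print(has_special(test))   # True
print(not "".isalnum())    # True, although "" has no special characters
print(has_special(""))     # False
```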
You could define your own is_alphanumeric function:
def is_alphanumeric(mystring):
    """Return True if all characters of `mystring` are letters or digits (any script):
    >>> is_alphanumeric('hellowörld')
    True
    >>> is_alphanumeric('Hello World!')
    False
    """
    return all(character.isalnum() for character in mystring)
If you want to restrict it to ASCII:
from string import digits, ascii_letters

ASCII_ALNUM = set(digits + ascii_letters)

def is_alphanumeric_ascii(mystring):
    """Return True if all characters of `mystring` are ASCII letters or digits:
    >>> is_alphanumeric_ascii('hellowörld')
    False
    >>> is_alphanumeric_ascii('HelloWorld')
    True
    """
    return all(character in ASCII_ALNUM for character in mystring)
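On Python 3.7 and later, the ASCII-only check can also be written with the built-in str.isascii() and str.isalnum() methods, with no imports. A sketch; is_ascii_alnum is a hypothetical name, and unlike the all()-based versions above it returns False for an empty string:

```python
def is_ascii_alnum(mystring: str) -> bool:
    # isascii() rules out non-ASCII letters such as "ö" (Python 3.7+);
    # isalnum() then requires at least one character, all letters or digits.
    return mystring.isascii() and mystring.isalnum()

print(is_ascii_alnum('HelloWorld'))   # True
print(is_ascii_alnum('hellowörld'))   # False
print(is_ascii_alnum('Hello World'))  # False (space)
```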

How to check if a given character is considered as 'special' by the Python regex engine?

Is there an easy way to verify that the given character has a special regex function?
Of course I can collect regex characters in a list like ['.', "[", "]", etc.] to check that, but I guess there is a more elegant way.
You could use re.escape. For example:
>>> re.escape("a") == "a"
True
>>> re.escape("[") == "["
False
The idea is that if a character is a special one, then re.escape returns the character with a backslash in front of it. Otherwise, it returns the character itself.
You can use re.escape within the all() function as follows:
>>> import re
>>> def checker(st):
...     return all(re.escape(i) == i for i in st)
...
>>> checker('aab]')
False
>>> checker('aab')
True
>>> checker('aa.b3')
False
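Which characters re.escape() actually changes depends on the Python version (3.7 narrowed it to characters that can have special meaning in a pattern), so instead of hard-coding a list you can ask the running interpreter. A small sketch; the printed output varies by version, so none is shown:

```python
import re
import string

# Split string.punctuation by whether re.escape() backslashes each character here.
escaped = [ch for ch in string.punctuation if re.escape(ch) != ch]
unchanged = [ch for ch in string.punctuation if re.escape(ch) == ch]

print('escaped:  ', ''.join(escaped))
print('unchanged:', ''.join(unchanged))
```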
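Per the documentation (the wording used before Python 3.7, which narrowed escaping to characters that can have special meaning in a regular expression), re.escape will: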
Return string with all non-alphanumerics backslashed; this is useful
if you want to match an arbitrary literal string that may have regular
expression metacharacters in it.
So it tells you whether a character could be a meaningful one, not whether it is. For example:
>>> re.escape('&') == '&'
False
This is useful for processing arbitrary strings, as it ensures that all metacharacters are escaped, but not for telling you which ones actually needed to be. The simplest approach, in my view, is the one dismissed in the question:
char in set(r'.^$*+?{}[]\|()')
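Wrapped as a function with the set built once, that check might look like this. It is a sketch: the list is written by hand and covers metacharacters outside a character class in the default mode; with re.VERBOSE, whitespace and "#" also matter. REGEX_METACHARACTERS and is_regex_special are hypothetical names.

```python
# Hand-written list of characters with special meaning in a Python regex
# (outside a character class, without re.VERBOSE).
REGEX_METACHARACTERS = frozenset(r'.^$*+?{}[]\|()')

def is_regex_special(char: str) -> bool:
    return char in REGEX_METACHARACTERS

print([c for c in 'a.b(c)#' if is_regex_special(c)])  # ['.', '(', ')']
```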
Elegance lies in the eye of the beholder; however, IMHO the function below is the most generic, "timeproof" way of checking whether a character is considered special by the Python regex engine:
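import re

def isFalsePositive(char):
    # A literal character only matches itself; "." (for example) also
    # matches a different character, so test against one that differs.
    other = 'b' if char == 'a' else 'a'
    m = re.match(char, other)
    return m is not None and m.end() == 1

def isSpecial(char):
    try:
        m = re.match(char, char)
    except re.error:
        # Not even a valid pattern on its own, e.g. "*", "(" or "\\".
        return True
    if m is not None and m.end() == 1:
        # It matched itself: special only if it also matches something else.
        return isFalsePositive(char)
    # No match, or a zero-width match such as "^" or "|".
    return True
P.S. isFalsePositive() may be overkill just to check the special case of '.' (dot). :-)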

How to check if a string only contains letters?

I'm trying to check if a string only contains letters, not digits or symbols.
For example:
>>> only_letters("hello")
True
>>> only_letters("he7lo")
False
Simple:
if string.isalpha():
print("It's all letters")
str.isalpha() is only true if all characters in the string are letters:
Return true if all characters in the string are alphabetic and there is at least one character, false otherwise.
Demo:
>>> 'hello'.isalpha()
True
>>> '42hello'.isalpha()
False
>>> 'hel lo'.isalpha()
False
The str.isalpha() method works, e.g.:
if my_string.isalpha():
print('it is letters')
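Two properties of str.isalpha() worth checking against your requirements: in Python 3 it already accepts non-ASCII letters, and it returns False for an empty string. A quick demonstration:

```python
# str.isalpha() is Unicode-aware and requires at least one character.
for s in ['hello', 'he7lo', 'Bzdrężyło', '', 'hel lo']:
    print(repr(s), s.isalpha())
```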
For people finding this question via Google who might want to know if a string contains only a subset of all letters, I recommend using regexes:
import re
def only_letters(tested_string):
match = re.match("^[ABCDEFGHJKLM]*$", tested_string)
return match is not None
You can leverage regular expressions.
>>> import re
>>> pattern = re.compile("^[a-zA-Z]+$")
>>> pattern.match("hello")
<_sre.SRE_Match object; span=(0, 5), match='hello'>
>>> pattern.match("hel7lo")
>>>
The match() method will return a Match object if a match is found. Otherwise it will return None.
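One subtlety with match() and a pattern ending in $: $ also matches just before a trailing newline, so "hello\n" is accepted. re.fullmatch() (Python 3.4+) requires the whole string to match and avoids that. A short sketch:

```python
import re

pattern = re.compile("^[a-zA-Z]+$")
print(pattern.match("hello\n") is not None)     # True: $ matches before the final "\n"

strict = re.compile("[a-zA-Z]+")
print(strict.fullmatch("hello\n") is not None)  # False
print(strict.fullmatch("hello") is not None)    # True
```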
An easier approach is to use the .isalpha() method
>>> "Hello".isalpha()
True
>>> "Hel7lo".isalpha()
False
isalpha() returns True if there is at least one character in the string and all the characters in the string are alphabetic.
Actually, we're now in the globalized world of the 21st century, and people no longer communicate using ASCII only, so when answering a question about "is it letters only" you need to take letters from non-ASCII alphabets into account as well. Python has a pretty cool unicodedata module which, among other things, allows categorization of Unicode characters:
>>> import unicodedata
>>> unicodedata.category('陳')
'Lo'
>>> unicodedata.category('A')
'Lu'
>>> unicodedata.category('1')
'Nd'
>>> unicodedata.category('a')
'Ll'
The categories and their abbreviations are defined in the Unicode standard. From here you can quite easily come up with a function like this:
import unicodedata

def only_letters(s):
    for c in s:
        cat = unicodedata.category(c)
        if cat not in ('Ll', 'Lu', 'Lo'):
            return False
    return True
And then:
>>> only_letters('Bzdrężyło')
True
>>> only_letters('He7lo')
False
As you can see the whitelisted categories can be quite easily controlled by the tuple inside the function. See this article for a more detailed discussion.
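The tuple above leaves out two letter categories, Lt (titlecase) and Lm (modifier). If you want every Unicode letter, a compact variant checks that the category starts with "L". This is a sketch; for non-empty strings it should agree with str.isalpha(), which Python documents as accepting the categories Lm, Lt, Lu, Ll and Lo. Like any all()-based check, it returns True for an empty string.

```python
import unicodedata

def only_letters(s: str) -> bool:
    # Every Unicode "Letter" category starts with "L": Lu, Ll, Lt, Lm, Lo.
    return all(unicodedata.category(c).startswith('L') for c in s)

print(only_letters('Bzdrężyło'))  # True
print(only_letters('He7lo'))      # False
print(only_letters('ǅ'))          # True (titlecase letter, category Lt)
```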
The str.isalpha() method will work for you.
See http://www.tutorialspoint.com/python/string_isalpha.htm
Looks like people are saying to use str.isalpha.
This is the one line function to check if all characters are letters.
def only_letters(string):
    return all(letter.isalpha() for letter in string)
all accepts an iterable of booleans, and returns True iff all of the booleans are True.
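More generally, all returns True if the objects in your iterable would be considered True. These would be considered False:
0
None
Empty data structures (i.e. len(list) == 0)
False (duh)
Note that, unlike str.isalpha(), only_letters('') returns True, because all() of an empty iterable is True.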
(1) Use str.isalpha() to check the string.
(2) Please see the program below for reference:
s = "this"  # no space or digit in this string
print(s.isalpha())  # prints True
s = "this is 2"
print(s.isalpha())  # prints False
Note: I checked the above example on Ubuntu.
A pretty simple solution I came up with (Python 3):
def only_letters(tested_string):
    # Only lowercase ASCII letters are accepted, so "Hello" returns False.
    for letter in tested_string:
        if letter not in "abcdefghijklmnopqrstuvwxyz":
            return False
    return True
You can add a space to the string you are checking against if you want spaces to be allowed.
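The loop above only accepts lowercase ASCII letters. A sketch of the same idea that accepts both cases via string.ascii_letters and takes the optional extra characters (such as a space) as a parameter; the extra parameter is a hypothetical addition:

```python
import string

def only_letters(tested_string: str, extra: str = "") -> bool:
    # string.ascii_letters covers a-z and A-Z; "extra" lists any
    # additional characters you want to allow (e.g. " " for spaces).
    allowed = set(string.ascii_letters + extra)
    return all(letter in allowed for letter in tested_string)

print(only_letters("Hello"))             # True
print(only_letters("Hello world"))       # False
print(only_letters("Hello world", " "))  # True
```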
