How to remove special characters from a string - python

I've been looking into how to create a program that removes any whitespace/special characters from user input. I want to be left with just a string of numbers, but I've not been able to work out quite how to do this. Can anyone help?
x = (input("Enter a debit card number: "))
x.translate(None, '!.;,')
print(x)
The code I have created is possibly too basic, and it also doesn't work. Can anyone please help? :) I'm using Python 3.

The way str.translate works is different in Py 3.x - it requires mapping a dictionary of ordinals to values, so instead you use:
x = input("Enter a debit card number: ")
result = x.translate(str.maketrans({ord(ch):None for ch in '!.;,'}))
Although you're better off just removing all non digits:
import re
result = re.sub('[^0-9]', '', x)
Or using builtins:
result = ''.join(ch for ch in x if ch.isdigit())
It's important to note that strings are immutable and their methods return a new string - be sure to either assign back to the object or some other object to retain the result.
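To make the point about immutability concrete, here is a minimal sketch. The helper name digits_only and the sample input are made up for illustration and are not part of the original question.

```python
def digits_only(text):
    """Return a new string containing only the digit characters of text."""
    return ''.join(ch for ch in text if ch.isdigit())

raw = "1234 5678-9012.3456"  # hypothetical sample input

# Calling a string method without keeping the result changes nothing:
raw.translate(str.maketrans('', '', ' -.'))  # return value is discarded

# Assign the result to a name to keep it:
cleaned = digits_only(raw)

print(raw)      # 1234 5678-9012.3456
print(cleaned)  # 1234567890123456
```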

You don't need translate for this. Instead, you can use a regex:
import re
x = input("Enter a debit card number: ")
x = re.sub(r'[\s!.;,]*','',x)
[\s!.;,]* matches a single character from the list below, repeated:
Quantifier *: between zero and unlimited times, as many times as possible, giving back as needed (greedy).
\s matches any whitespace character ([\r\n\t\f ]).
!.;, matches a single character in the list !.;, literally.
re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping
occurrences of pattern in string by the replacement repl.
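As a rough sketch of how that character class behaves, the snippet below wraps the substitution in a function. The name strip_listed and the sample strings are assumptions for illustration. It uses + rather than * so the pattern never matches the empty string; the output is the same either way.

```python
import re

def strip_listed(text):
    """Remove whitespace and the characters ! . ; , from text."""
    return re.sub(r'[\s!.;,]+', '', text)

print(strip_listed("1234 5678, 9012; 3456!"))  # 1234567890123456
print(strip_listed("12ab 34"))                 # 12ab34
```

Note that this only removes the listed characters, so letters and other symbols such as - survive. If only digits should remain, a negated class like [^0-9] is the safer choice.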

Assuming that you want only numbers (credit card number):
import re  # or 'from re import sub'
s = re.sub('[^0-9]+', "", my_str)
I've used this as an input:
my_str = "663388191712-483498-39434347!2848484;290850 2332049832048 23042 2 2"
What I've got is only numbers (because you mentioned that you want to be left with only numbers):
66338819171248349839434347284848429085023320498320482304222

Related

Can't wrap my head around how to remove a list of characters from another list

I've been able to isolate the list (or string) of characters I want excluded from a user entered string. But I don't see how to then remove all these unwanted characters. After I do this, I think I can try joining the user string so it all becomes one alphabet input like the instructions say.
Instructions:
Remove all non-alpha characters
Write a program that removes all non-alpha characters from the given input.
For example, if the input is:
-Hello, 1 world$!
the output should be:
Helloworld
My code:
userEntered = input()
makeList = userEntered.split()
def split(userEntered):
    return list(userEntered)
if userEntered.isalnum() == False:
    for i in userEntered:
        if i.isalpha() == False:
            #answer = userEntered[slice(userEntered.index(i))]
            reference = split(userEntered)
            excludeThis = i
            print(excludeThis)
When I print excludeThis, I get this as my output:
-
,
1
$
!
So I think I might be on the right track. I need to figure out how to get these characters out of the user input. Any help is appreciated.
Loop over the input string. If the character is alphabetic, add it to the result string.
userEntered = input()
result = ''
for char in userEntered:
    if char.isalpha():
        result += char
print(result)
This can also be done with a regular expression:
import re
userEntered = input()
result = re.sub(r'[^a-z]', '', userEntered, flags=re.I)
The regexp [^a-z] matches anything except an alphabetic character. The re.I flag makes it case-insensitive. These are all replaced with an empty string, which removes them.
There are basically two main parts to this: distinguish alpha from non-alpha, and get a string with only the former. If isalpha() is satisfactory for the former, then that leaves the latter. My understanding is that the solution that is considered most Pythonic would be to join a comprehension. This would look like this:
''.join(char for char in userEntered if char.isalpha())
BTW, there are several places in the code where you are making it more complicated than it needs to be. In Python, you can iterate over strings, so there's no need to convert userEntered to a list. isalnum() checks whether the string is all alphanumeric, so it's rather irrelevant (alphanumeric includes digits). You shouldn't ever compare a boolean to True or False, just use the boolean. So, for instance, if i.isalpha() == False: can be simplified to just if not i.isalpha():.
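For comparison, here is a small sketch that puts the join-based approach and the regex approach side by side. The function names only_alpha and only_a_to_z are invented for this example. The two agree on the sample from the instructions but differ on accented letters, because isalpha() accepts any Unicode letter while [^a-z] keeps only the basic Latin alphabet.

```python
import re

def only_alpha(text):
    """Keep every character that str.isalpha() considers a letter."""
    return ''.join(char for char in text if char.isalpha())

def only_a_to_z(text):
    """Keep only the letters a-z, ignoring case."""
    return re.sub(r'[^a-z]', '', text, flags=re.I)

print(only_alpha("-Hello, 1 world$!"))   # Helloworld
print(only_a_to_z("-Hello, 1 world$!"))  # Helloworld
print(only_alpha("café 42"))             # café
print(only_a_to_z("café 42"))            # caf
```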

Splitting a string into smaller strings using split()

I'm trying to write a D&D dice roller program in Python. I'd like to have the input be typed in the form "xdy+z" (e.g. 4d6+12, meaning roll four 6-sided dice and add 12 to the result), and have the program "roll the dice", add the modifier, and output the result. I'm trying to figure out how to split the string into numbers so I can have the program do the math.
I know of the split() function, and I'm trying to use it. When I input the example above, I get the string [4 6 12], but when I have the program print string[1], I get a white space because it's still one full string. I'd either like to figure out how to get the program to identify the individual numbers in the string, or a way to split the full string into smaller strings (like string1 = [4], string2 = [6], string3 = [12]).
Yes, I tried Google and searching this site, but I'm not sure what the terminology for this type of process is, so it's been hard to find help.
Here's the relevant code:
separators = ["d", "+", "-"]
for sep in separators:
    inputText = inputText.replace(sep, ' ')
Solution without regular expressions, using find, slicing and split. First, find the position of + or - to get the last number. Then, split the rest at d. Also, convert the partial strings to int:
s = "4d6+12"
if "-" in s:
    pos = s.find("-")
    add = int(s[pos:])
elif "+" in s:
    pos = s.find("+")
    add = int(s[pos:])
else:
    add = 0
    pos = len(s)
rolls, sides = map(int, s[:pos].split("d"))
print(f"{rolls=} {sides=} {add=}")
You can use a regular expression for this instead of splitting the string up. Just capture each numeric position with a group. Check out the docs on the re package for details.
Here's a sample:
import re
pattern = r"(?P<rolls>\d+)d(?P<sides>\d+)\+(?P<add>\d+)"
match = re.match(pattern, input_text)
rolls = int(match["rolls"])
sides = int(match["sides"])
add = int(match["add"])
With input_text as "4d6+12" as in your example, the resulting values are:
print(rolls) # 4
print(sides) # 6
print(add) # 12
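The pattern above only accepts a + modifier. As a possible extension, the sketch below makes the modifier optional, allows - as well, and then actually rolls the dice. The names DICE, parse_dice and roll are hypothetical, and treating a missing modifier as 0 is an assumption about the desired behaviour.

```python
import random
import re

# The sign is captured together with the number so int() can handle it.
DICE = re.compile(r"(?P<rolls>\d+)d(?P<sides>\d+)(?P<add>[+-]\d+)?")

def parse_dice(text):
    """Split an 'xdy+z' expression into (rolls, sides, add)."""
    match = DICE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a dice expression: {text!r}")
    rolls = int(match["rolls"])
    sides = int(match["sides"])
    add = int(match["add"] or 0)  # the group is None when no modifier is given
    return rolls, sides, add

def roll(text):
    """Roll the dice described by text and add the modifier."""
    rolls, sides, add = parse_dice(text)
    return sum(random.randint(1, sides) for _ in range(rolls)) + add

print(parse_dice("4d6+12"))  # (4, 6, 12)
print(parse_dice("2d8-1"))   # (2, 8, -1)
print(parse_dice("1d20"))    # (1, 20, 0)
print(roll("4d6+12"))        # a random total between 16 and 36
```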

Return a string of country codes from an argument that is a string of prices

So here's the question:
Write a function that will return a string of country codes from an argument that is a string of prices (containing dollar amounts following the country codes). Your function will take as an argument a string of prices like the following: "US$40, AU$89, JP$200". In this example, the function would return the string "US, AU, JP".
Hint: You may want to break the original string into a list, manipulate the individual elements, then make it into a string again.
Example:
> testEqual(get_country_codes("NZ$300, KR$1200, DK$5")
> "NZ, KR, DK"
As of now, I'm clueless as to how to separate the $ and the numbers. I'm very lost.
I would advise looking up and using regular expressions:
https://docs.python.org/2/library/re.html
If you use re.findall, it will return a list of all matching strings, and you can use a regular expression like [A-Z]{2}(?=\$) to find all the two-letter uppercase codes that are followed by a $ in the string.
After that you can just create a string from the resulting list.
Let me know if that is not clear
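A minimal sketch of the re.findall idea, assuming every country code is exactly two uppercase letters directly followed by a dollar sign. The $ has to be escaped because an unescaped $ means "end of string" in a regex; here it sits inside a lookahead so it is checked but not included in the match.

```python
import re

def get_country_codes(prices):
    """Return the two-letter codes that precede each '$' as one string."""
    codes = re.findall(r"[A-Z]{2}(?=\$)", prices)
    return ", ".join(codes)

print(get_country_codes("US$40, AU$89, JP$200"))   # US, AU, JP
print(get_country_codes("NZ$300, KR$1200, DK$5"))  # NZ, KR, DK
```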
def test(string):
    return ", ".join([item.split("$")[0] for item in string.split(", ")])
string = "NZ$300, KR$1200, DK$5"
print(test(string))
Use a regular expression pattern and join the matches into a string. (\w{2})\$ matches exactly 2 word characters followed by a $.
import re
def get_country_codes(string):
    matches = re.findall(r"(\w{2})\$", string)
    return ", ".join(matches)

re.compile search returns True for strings that contain disallowed symbols

I'm trying to use regex to check a variable for accepted letters and numbers. This is my def:
def special_match(strg, search=re.compile(r'[a-z0-9]').search):
    if bool(search(strg)) is True:
        print ('Yes: ' + strg)
    elif:
        print ('nej: ')
while 1:
    variabel = raw_input('Enter something: ')
    special_match(variabel)
sys.exit()
And it seems that it accepts disallowed symbols when they are combined with allowed symbols:
Enter something: qwerty
Yes: qwerty
Enter something: 1234
Yes: 1234
Enter something: !!!!
nej!
Enter something: 1234qwer!!!!
Yes: 1234qwer!!!!
Enter something:
The last one should not be accepted. What am I doing wrong?
All your regular expression search is doing is checking to see if at least one of the characters is present.
If you want to require that the entire string contains nothing but those characters, then you can use:
r'^[a-z0-9]*$'
That anchors the pattern at both the start and end of the string, and only matches if all of the characters in between are in the specified set of characters.
Note that this will also match the empty string. If you wish to require at least one character, then you can change the * to +.
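Here is a short sketch of the anchored pattern run against the inputs from the question. The names ALLOWED and is_allowed are made up for this example, and it uses the + variant so that the empty string is rejected.

```python
import re

ALLOWED = re.compile(r'^[a-z0-9]+$')

def is_allowed(text):
    """True if text consists only of lowercase letters and digits."""
    return ALLOWED.search(text) is not None

print(is_allowed("qwerty"))        # True
print(is_allowed("1234"))          # True
print(is_allowed("!!!!"))          # False
print(is_allowed("1234qwer!!!!"))  # False
print(is_allowed(""))              # False
```

One caveat: $ also matches just before a trailing newline, so "abc\n" passes this check. If that matters, re.fullmatch(r'[a-z0-9]+', text) is a stricter alternative.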
The search method looks for the regex you gave anywhere in the string and returns a match object if it finds it. Here 1234qwer!!!! contains characters from [a-z0-9], but !!!! doesn't.
Try a!!!; that will also be accepted.
You could try doing
re.search(r"[^a-z0-9]",word)
and if this returns a match object (rather than None), your word contains something other than lowercase letters and digits and should be rejected.
NOTE: ^ at the start of a character class means "not".
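The inverted check can be sketched as follows. The helper name has_disallowed is an assumption for illustration. re.search returns a match object or None, so the result is compared against None to get a plain boolean.

```python
import re

def has_disallowed(word):
    """True if word contains anything other than a-z or 0-9."""
    return re.search(r"[^a-z0-9]", word) is not None

print(has_disallowed("1234qwer"))      # False
print(has_disallowed("1234qwer!!!!"))  # True
print(has_disallowed("a!!!"))          # True
```

With this approach an empty string counts as having nothing disallowed, so check for emptiness separately if empty input should be rejected.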
The only thing that regex does is check that there is a number or a letter in your string. If you want to check that it only has numbers and letters, you need to anchor your pattern at the start and end, and add a repeat: r'^[a-z0-9]+$'
Note though that there may be no need to use regex for this: the string isalnum() method does nearly the same thing (it also accepts uppercase and non-ASCII letters and digits).
There are a couple of other odd things in your code; you should definitely not be compiling a regex in the function signature and passing on the resulting search method; also you should not be converting the result to bool explicitly, and you should not compare bools with is True. A more Pythonic version, assuming you wanted to stick to the regex, would be:
def special_match(strg, search=None):
    if not search:
        search = re.compile(r'^[a-z0-9]+$').search
    if search(strg):
        print('Yes: ' + strg)
    else:
        print('nej: ')
Also note elif is a syntax error on its own.

Retrieve part of string, variable length

I'm trying to learn how to use Regular Expressions with Python. I want to retrieve an ID number (in parentheses) in the end from a string that looks like this:
"This is a string of variable length (561401)"
The ID number (561401 in this example) can be of variable length, as can the text.
"This is another string of variable length (99521199)"
My code fails:
import re
import selenium
# [Code omitted here, I use selenium to navigate a web page]
result = driver.find_element_by_class_name("class_name")
print result.text # [This correctly prints the whole string "This is a text of variable length (561401)"]
id = re.findall("??????", result.text) # [Not sure what to do here]
print id
This should work for your example:
(?<=\()[0-9]*
(?<=...) is a lookbehind: it matches something preceding the text you are looking for but doesn't consume it. In this case, I used \(. ( is a special character, so it has to be escaped with \. [0-9] matches any digit. The * means match any number of repetitions of the directly preceding rule, so [0-9]* means match as many digits as there are.
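A minimal sketch of the lookbehind pattern in use. The function name extract_id is hypothetical, and the pattern uses + instead of * so that an empty pair of parentheses produces no match rather than an empty string.

```python
import re

def extract_id(text):
    """Return the digits that directly follow the first '(' with digits after it, or None."""
    match = re.search(r"(?<=\()[0-9]+", text)
    return match.group() if match else None

print(extract_id("This is a string of variable length (561401)"))          # 561401
print(extract_id("This is another string of variable length (99521199)"))  # 99521199
print(extract_id("No ID in this one"))                                     # None
```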
Solved this thanks to Kaz's link, very useful:
http://regex101.com/
id = re.findall(r"(\d+)", result.text)
print id[0]
You can use this simple solution:
>>> originString = "This is a string of variable length (561401)"
>>> str1 = originString.replace("(", " ")
>>> str1
'This is a string of variable length  561401)'
>>> str2 = str1.replace(")", " ")
>>> str2
'This is a string of variable length  561401 '
>>> [int(s) for s in str2.split() if s.isdigit()]
[561401]
First, I replace the parentheses with spaces, and then I search the new string for integers.
No need to really use regular expressions here: if the ID is always at the end and always in parentheses, you can split, take the last element, and remove the parentheses by slicing ([1:-1]). Regexes are relatively expensive in time.
line = "This is another string of variable length (99521199)"
print(line.split()[-1][1:-1])
If you did want to use regular expressions I would do this:
import re
line = "This is another string of variable length (99521199)"
id_match = re.match(r'.*\((\d+)\)', line)
if id_match:
    print(id_match.group(1))
