Remove extra characters in the string in Python - python

I have a couple of strings (each string is a set of words) that contain special characters. I know that with the strip() function we can remove all occurrences of only one specific character from a string. Now I would like to remove a whole set of special characters (including !##%&*()[]{}/?<> etc.).
What is the best way to remove these unwanted characters from the strings?
in_str = "#John, It's a fantastic #week-end%, How about () you"
out_str = "John, It's a fantastic week-end, How about you"

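s = "#John, It's a fantastic #week-end%, How about () you"
for c in "!##%&*()[]{}/?<>":
    # str.replace removes every occurrence of c (the string.replace() function no longer exists in Python 3)
    s = s.replace(c, "")
print(s)
prints "John, It's a fantastic week-end, How about  you" (a double space is left where "()" was removed)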

The strip function removes only leading and trailing characters.
For your purpose I would store the unwanted characters in a Python set, iterate over the input string, and build a new string from the characters not present in the set. According to another Stack Overflow answer this should be efficient. At the end, collapse the double spaces with the " ".join(output_string.split()) idiom.
char_set = set("!##%&*()[]{}/?<>")
input_string = "#John, It's a fantastic #week-end%, How about () you"
output_string = ""
for ch in input_string:
    # keep only the characters that are not in the unwanted set
    if ch not in char_set:
        output_string += ch
# split() without arguments splits on runs of whitespace, so joining restores single spaces
output_string = " ".join(output_string.split())
print(output_string)
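To make the difference between strip() and a real removal concrete, here is a small sketch. The names `unwanted` and `remove_chars` are illustrative and not part of the answers above; the approach swaps the explicit loop for str.translate with a deletion table, which is one of several ways to do this.

```python
# Characters to delete; the set is taken from the question.
unwanted = "!##%&*()[]{}/?<>"

def remove_chars(text, chars=unwanted):
    """Delete every character in `chars` from `text`, then collapse whitespace."""
    # The third argument of str.maketrans lists the characters to delete.
    table = str.maketrans("", "", chars)
    return " ".join(text.translate(table).split())

s = "#John, It's a fantastic #week-end%, How about () you"

# strip() only removes matching characters from both ends of the string.
print(s.strip(unwanted))   # John, It's a fantastic #week-end%, How about () you
# translate() removes them wherever they occur.
print(remove_chars(s))     # John, It's a fantastic week-end, How about you
```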

Try this:
import re
foo = 'a..!b...c???d;;'
chars = [',', '!', '.', ';', '?']
# build a character class such as [,!.;?] and delete every match
print(re.sub('[%s]' % ''.join(chars), '', foo))
I presume this is what you wanted.
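If the list may contain characters that are special inside a character class (such as ], ^, - or a backslash), one option is to escape them first. This is a minimal sketch assuming Python 3; the helper name `strip_chars` is made up for illustration.

```python
import re

def strip_chars(text, chars):
    """Remove every character listed in `chars` from `text` using a regex."""
    # re.escape makes characters like ']' or '^' safe inside the [...] class.
    pattern = "[" + re.escape("".join(chars)) + "]"
    return re.sub(pattern, "", text)

print(strip_chars("a..!b...c???d;;", [",", "!", ".", ";", "?"]))  # abcd
print(strip_chars("x[1]^2-3", ["[", "]", "^", "-"]))              # x123
```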

Try:
s = "#John, It's a fantastic #week-end%, How about () you"
chars = "!##%&*()[]{}/?<>"
# drop the unwanted characters
s_no_chars = "".join([k for k in s if k not in chars])
# additionally drop the empty pieces left behind by repeated spaces
s_no_chars_spaces = " ".join([d for d in s_no_chars.split(" ") if d])

Related

Python String adjust

Hello, I want to use a method like .isupper() in a loop, or string[i+1], to find the characters to lowercase, but I don't know how to do that.
input in function -> "ThisIsMyChar"
expected -> "This is my char"
I've done it with regex. It could be done with less code, but my intention was readability.
import re
def split_by_upper(input_string):
    # each match is one capitalized word, e.g. "This", "Is", "My", "Char"
    pattern = r'[A-Z][a-z]*'
    matches = re.findall(pattern, input_string)
    if matches:
        output = matches[0]
        for word in matches[1:]:
            output += ' ' + word[0].lower() + word[1:]
        return output
    else:
        return input_string
print(split_by_upper("ThisIsMyChar"))
>> split_by_upper() -> "This is my char"
You could use re.findall and str.lower:
>>> import re
>>> s = 'ThisIsMyChar'
>>> ' '.join(w.lower() if i >= 1 else w for i, w in enumerate(re.findall('.[^A-Z]*', s)))
'This is my char'
You should first try it yourself. If you can't get it to work, you can do something like this:
# parse the input string
def parse(text):
    result = text[0]
    for i in range(1, len(text)):
        ch = text[i]
        # insert a space before every uppercase letter after the first character
        if ch.isupper():
            result += " "
        result += ch.lower()
    return result
# input string
text = "ThisIsMyChar"
print(parse(text))
First, loop over the string and check for uppercase letters. When you find one, add a space in front of it, lowercase it, and append it to your new string. The rest is explained in the comments in the code itself.
def AddSpaceInTitleCaseString(string):
    NewStr = ""
    # Check the input string for uppercase letters char-by-char.
    for i in string:
        # If one is found, add it to the NewStr variable with a leading space, lowering its case.
        if i.isupper():
            NewStr += f" {i.lower()}"
        # Else just add it as usual.
        else:
            NewStr += i
    # Before returning NewStr, remove all the leading and trailing spaces from it.
    # As shown in your question, I'm assuming that you want the first letter of your new sentence
    # to be in uppercase, so just use the 'capitalize' method for it.
    return NewStr.strip().capitalize()
# Test.
MyStr = AddSpaceInTitleCaseString("ThisIsMyChar")
print(MyStr)
# Output: "This is my char"
Hope it helped :)
Here is a concise regex solution:
import re
capital_letter_pattern = re.compile(r'(?!^)[A-Z]')
def add_spaces(string):
    return capital_letter_pattern.sub(lambda match: ' ' + match[0].lower(), string)
if __name__ == '__main__':
    print(add_spaces('ThisIsMyChar'))
The pattern searches for capital letters ([A-Z]), and the (?!^) is a negative lookahead that excludes the first character of the input ((?!foo) means "don't match foo", ^ is "start of line", so (?!^) is "don't match at the start of the line").
The .sub(...) method of a pattern is usually used like pattern.sub('new text', 'my input string that I want changed'). You can also use a function in place of 'new text', in which case the function is called with the match object as an argument, and the value returned by the function is used as the replacement string.
The expression capital_letter_pattern.sub(lambda match: ' ' + match[0].lower(), string) replaces all matches (all capital letters except at the start of the line) using a lambda function to add a space before and make the letter lowercase. match[0] means "the entirety of the matched text", which in this case is the capital letter.
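As a small illustration of the two forms of sub described above, the sketch below uses a named function instead of a lambda and contrasts it with a plain replacement string. `lower_with_space` is a hypothetical helper name; match.group(0) is equivalent to match[0].

```python
import re

def lower_with_space(match):
    """Called once per match; its return value replaces the matched text."""
    return " " + match.group(0).lower()

# (?!^) skips a capital letter at the very start of the string.
pattern = re.compile(r"(?!^)[A-Z]")

# Callable replacement: each capital becomes a space plus its lowercase form.
print(pattern.sub(lower_with_space, "ThisIsMyChar"))  # This is my char
# String replacement: each capital is replaced by the same fixed text.
print(pattern.sub("_", "ThisIsMyChar"))               # This_s_y_har
```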
You can split it via Regex using r"(?<!^)(?=[A-Z])" pattern:
import re
txt = 'ThisIsMyChar'
c = re.compile(r"(?<!^)(?=[A-Z])")
first, *rest = map(str.lower, c.split(txt))
print(f'{first.title()} {" ".join(rest)}')
Pattern explanation:
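(?<!^) checks that the position is not at the beginning of the string.
(?=[A-Z]) checks that a capital letter follows.
Note: these are zero-width lookaround assertions, not groups; they match a position without consuming any characters. Splitting on such empty matches requires Python 3.7 or later.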
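To see what the two assertions contribute, the following sketch (assuming Python 3.7 or later, where re.split accepts patterns that match the empty string) compares the split with and without the lookbehind. The variable names are illustrative only.

```python
import re

splitter = re.compile(r"(?<!^)(?=[A-Z])")

# The lookarounds match an empty position, so no letters are lost in the split.
parts = splitter.split("ThisIsMyChar")
print(parts)  # ['This', 'Is', 'My', 'Char']

# Without (?<!^) the position before the first letter also matches,
# which yields an empty leading piece.
without_lookbehind = re.split(r"(?=[A-Z])", "ThisIsMyChar")
print(without_lookbehind)  # ['', 'This', 'Is', 'My', 'Char']
```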

How to insert space by punctuation?

I have strings like so: "hey what is up!", "what did you say?", "he said 'well'", etc., and a regular expression like so: [!%&'\(\)$#\"\/\\*+,-.:;<=>?#\[\]^_´{|}~]´. These are my delimiters, and a space should be inserted around them in the strings shown, like so: "hey what is up !", "what did you say ?", "he said ' well '". So if one of the delimiters comes before another character sequence, add a space, and if it comes after one, add a space as well.
How can I achieve this? I do not want to split by these delimiters.
Here's my solution, but I would be curious how to solve it with regex.
space = set("[!%&'()$#\"/\\*+,-.:;<=>?#[]^_´`{|}~]")
for sent in self.sentences:
    sent = list(sent)
    for i, char in enumerate(sent):
        # Make sure to respect length of string when indexing
        if i != 0:
            # insert space in front if char is punctuation
            if sent[i] in space and sent[i - 1] != " ":
                sent.insert(i, " ")
        if i != len(sent)-1:
            # insert space after if char is punctuation
            if sent[i] in space and sent[i + 1] != " ":
                sent.insert(i + 1, " ")
You could expand your pattern to catch optional spaces and then replace each match with the captured punctuation plus a space before and after (the loop is only for the demo, not necessary):
import re
strings = ["hey what is up!", "what did you say?", "he said 'well'"]
# the group captures only the punctuation; optional spaces around it are consumed and replaced
pattern = r'\s?([!%&\'\(\)$#\"\/\\*+,-.:;<=>?#\[\]^_´{|}~])\s?'
for string in strings:
    # strip() removes the space added after trailing punctuation
    print(re.sub(pattern, r' \1 ', string).strip())
This will give this output:
hey what is up !
what did you say ?
he said ' well '
Without the aid of the re module you could simply do this:
punctuation = "!%&'()$#\"/\\*+,-.:;<=>?#[]^_´{|}~"
mystring = "Well hello! How are you?"
mylist = list(mystring)
i = 0
for c in mystring:
    if c in punctuation:
        mylist.insert(i, ' ')
        i += 2
    else:
        i += 1
print(''.join(mylist))
You can loop through your string and, when you find a punctuation character, use slicing to cut the string in two at that position and concatenate the halves with a space in between.
For example:
for i, ch in enumerate(yourString):
    if ch == '!':
        # start the second slice at i so the '!' itself is kept
        newString = yourString[:i] + " " + yourString[i:]
It only checks for "!" (and handles a single occurrence), but you could replace the comparison with a membership test against a set of punctuation characters.
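A fuller version of this idea, without the re module, might look like the sketch below. Instead of slicing, it builds the result piece by piece, which handles several delimiters in one pass. The delimiter set `PUNCTUATION` and the function name `space_around` are assumptions made for illustration, and the final split/join also collapses any other runs of whitespace in the input.

```python
# Assumed delimiter set for illustration; extend it as needed.
PUNCTUATION = set("!?'\",.:;")

def space_around(text, delimiters=PUNCTUATION):
    """Put a single space before and after each delimiter character."""
    pieces = []
    for ch in text:
        # Surround delimiters with spaces; copy other characters unchanged.
        pieces.append(" " + ch + " " if ch in delimiters else ch)
    # split()/join collapses doubled spaces and trims the ends.
    return " ".join("".join(pieces).split())

print(space_around("hey what is up!"))    # hey what is up !
print(space_around("what did you say?"))  # what did you say ?
print(space_around("he said 'well'"))     # he said ' well '
```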

printing . before every character in a string

I have the string "java is fun for sure" and I want to:
delete every vowel (aeiou)
print a "." before every remaining character
so the outcome would be ".j.v. .s. .f.n. .f.r. .s.r"
I have tried this:
s = str(input())
s.translate({ord(i): None for i in 'aeiou '})
The outcome is "jvsfnfrsr", but I don't know how to print "." before the letters.
Some help would be awesome! I'm sure this is a very simple issue, but for some reason I cannot come up with it!
Thanks in advance! :)
import re
s = str(input())
s = s.translate({ord(i): None for i in 'aeiouAEIOU'})
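# prefix every non-whitespace character with a dot
print(re.sub(r'(\S)', r'.\1', s))
Input: "java is fun for sure"
Output: ".j.v .s .f.n .f.r .s.r"
This is a solution using regex; note that spaces are not prefixed with a dot.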
You can use the three-argument version of str.maketrans to create the needed translation table. Use the sep parameter of print(..) to place the dots:
s = "java is fun for sure"
s1 = s.translate(str.maketrans("", "", "aeiou")) # AEIOU are kept as is
print("", *s1, sep=".")
or in short:
print("",*"java is fun for sure".translate(str.maketrans("", "", "aeiou")),sep=".")
The * before the string-var decomposes the string into its letters:
print(*"abc", sep = "#") # == print("a","b","c", sep = "#")
Output:
.j.v. .s. .f.n. .f.r. .s.r
If you need the resulting string you can use str.join():
s2 = '.' + '.'.join(s1)
What about this? It also handles capitalised vowels.
str1 = "java is fun for sure"
vowels = 'aeiouAEIOU'
tmp = ''.join([char for char in str1 if char not in vowels])
final = ''.join(['.'+char for char in tmp])
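The two steps from the answers above (delete the vowels, then prefix each remaining character) can be wrapped in one function if the resulting string is needed more than once. This is only a sketch; the name `dot_prefix` and its default argument are hypothetical.

```python
def dot_prefix(text, vowels="aeiouAEIOU"):
    """Drop vowels, then put a '.' in front of every remaining character."""
    # Deleting via a translation table avoids a manual filtering loop.
    kept = text.translate(str.maketrans("", "", vowels))
    return "".join("." + ch for ch in kept)

print(dot_prefix("java is fun for sure"))  # .j.v. .s. .f.n. .f.r. .s.r
print(dot_prefix("Apple"))                 # .p.p.l
```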

What would be the easiest way to search through a list?

It's actually a string, but I converted it to a list because the answer is supposed to be returned as a list. I've been looking at this problem for hours now and cannot get it. I'm supposed to take a string, for example "Mary had a little lamb", and another string, for example "ab", and search through string1 to see how often each of the letters from string2 occurs. So, done correctly with the two examples, it would return
["a=4","b=1"]
I have this so far:
def problem3(myString, charString):
    myList = list(myString)
    charList = list(charString)
    count = 0
    newList = []
    newString = ""
    for i in range(0,len(myList)):
        for j in range(0,len(charList)):
            if charList[j] == myList[i]:
                count = count + 1
                newString = charList[j] + "=" + str(count)
    newList.append(newString)
    return newList
Which returns [a=5]. I know it has something to do with newList.append(string) and where it should be placed. Does anyone have any suggestions?
You can do this very easily with a loop (or a list comprehension) and the count method that strings (and lists!) have:
Split the search string into a list of chars.
For each character in the search string, count how often it occurs in the input string (via count).
Example:
string = 'Mary had a little lamb'
search_string = 'ab'
search_string_chars = [char for char in search_string]
result = []
for char in search_string_chars:
    result.append('%s=%d' % (char, string.count(char)))
Result:
['a=4', 'b=1']
Note that you don't need to split the search_string ('ab') into a list of characters, as strings are already iterable sequences of characters; the above was done that way to illustrate the concept. Hence, a reduced version of the above, which yields the same result, could be:
string = 'Mary had a little lamb'
search_string = 'ab'
result = []
for char in search_string:
    result.append('%s=%d' % (char, string.count(char)))
Here's a possible solution using Counter, as mentioned by coder:
from collections import Counter
s = "Mary had a little lambzzz"
cntr = Counter(s)
test_str = "abxyzzz"
results = []
for letter in test_str:
    if letter in s:
        occurrences = letter + "=" + str(cntr.get(letter))
    else:
        occurrences = letter + "=" + "0"
    # skip duplicates such as the repeated "z" in test_str
    if occurrences not in results:
        results.append(occurrences)
print(results)
Output:
['a=4', 'b=1', 'x=0', 'y=1', 'z=3']
import collections
def count_chars(s, chars):
    counter = collections.Counter(s)
    return ['{}={}'.format(char, counter[char]) for char in set(chars)]
That's all. Let Counter do the work of actually counting the characters in the string. Then use a list comprehension to build the formatted strings for the characters in chars. (chars is converted to a set rather than iterated directly so that duplicate characters in chars appear only once in the output; note that a set does not preserve the original order.)
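If the output should follow the order of the characters in chars, one possible variation is to deduplicate with dict.fromkeys instead of set. This is a sketch assuming Python 3.7 or later, where dicts keep insertion order; `count_chars_ordered` is a made-up name.

```python
from collections import Counter

def count_chars_ordered(s, chars):
    """Return 'char=count' strings in the order characters first appear in `chars`."""
    counter = Counter(s)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique_chars = dict.fromkeys(chars)
    # Counter returns 0 for characters that never occur, so no special case is needed.
    return ["{}={}".format(char, counter[char]) for char in unique_chars]

print(count_chars_ordered("Mary had a little lamb", "ab"))
# ['a=4', 'b=1']
print(count_chars_ordered("Mary had a little lambzzz", "abxyzzz"))
# ['a=4', 'b=1', 'x=0', 'y=1', 'z=3']
```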

Small issue with whitespace/punctuation in Python?

I have this function that will convert text language into English:
def translate(string):
    textDict={'y':'why', 'r':'are', "l8":'late', 'u':'you', 'gtg':'got to go',
              'lol': 'laugh out loud', 'ur': 'your',}
    translatestring = ''
    for word in string.split(' '):
        if word in textDict:
            translatestring = translatestring + textDict[word]
        else:
            translatestring = translatestring + word
    return translatestring
However, if I want to translate "y u l8?" it will return "whyyoul8?". How would I go about separating the words when I return them, and how do I handle punctuation? Any help appreciated!
One-liner comprehension:
''.join(textDict.get(word, word) for word in re.findall(r'\w+|\W+', string))
[Edit] Fixed regex.
You're adding words to a string without spaces. If you're going to do things this way (instead of the way suggested to you in your previous question on this topic), you'll need to manually re-add the spaces since you split on them.
"y u l8" split on " ", gives ["y", "u", "l8"]. After substitution, you get ["why", "you", "late"] - and you're concatenating these without adding spaces, so you get "whyyoulate". Both forks of the if should be inserting a space.
You can just add a + ' ' + to add a space. However, I think what you're trying to do is this:
import re
def translate_string(text):
    textDict={'y':'why', 'r':'are', "l8":'late', 'u':'you', 'gtg':'got to go', 'lol': 'laugh out loud', 'ur': 'your',}
    translatestring = ''
    # the capturing group keeps each non-word separator (space, "?", ...) in the result list
    for word in re.split(r'(\W)', text):
        if word in textDict:
            translatestring += textDict[word]
        else:
            translatestring += word
    return translatestring
print(translate_string('y u l8?'))
This will print:
why you late?
This code handles stuff like question marks a bit more gracefully and preserves spaces and other characters from your input string, while retaining your original intent.
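For comparison, the simpler fix described above (re-adding the spaces that split(' ') removed) can be sketched as follows. The names `text_dict` and `translate_words` are illustrative; the second call shows that punctuation attached to a word is still not handled by this variant.

```python
text_dict = {'y': 'why', 'r': 'are', 'l8': 'late', 'u': 'you',
             'gtg': 'got to go', 'lol': 'laugh out loud', 'ur': 'your'}

def translate_words(message):
    """Translate each space-separated word and put the spaces back."""
    words = message.split(' ')
    translated = [text_dict.get(word, word) for word in words]
    # Joining with ' ' restores exactly the separators that split(' ') removed.
    return ' '.join(translated)

print(translate_words('y u l8'))   # why you late
print(translate_words('y u l8?'))  # why you l8?
```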
I'd like to suggest a replacement for this loop:
for word in string.split(' '):
    if word in textDict:
        translatestring = translatestring + textDict[word]
    else:
        translatestring = translatestring + word
It can be shortened to:
for word in string.split(' '):
    translatestring += textDict.get(word, word)
dict.get(foo, default) looks up foo in the dictionary and returns default if foo isn't present.
(Time to run, short notes now: When splitting, you could split based on punctuation as well as whitespace, save the punctuation or whitespace, and re-introduce it when joining the output string. It's a bit more work, but it'll get the job done.)
