Remove everything but certain characters from a string | Python [duplicate]

This question already has answers here:
Keeping only certain characters in a string using Python?
(3 answers)
Closed 10 months ago.
How would I remove everything from a string except certain characters, such as +, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0? For example:
math = "tesfsgfs9r543+54"
output = "9543+54"  # desired result

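You can use regular expressions. The negated character class [^+0-9] matches any character that is not a plus sign or a digit, and re.sub replaces each match with an empty string: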
import re
output = re.sub("[^+0-9]", "", math)
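The same idea can be wrapped in a reusable function that accepts any set of allowed characters. This is a minimal sketch; the name keep_only is hypothetical. It uses re.escape so that characters such as ^, - or ] in the allowed set are treated literally inside the character class.

```python
import re

def keep_only(text, allowed):
    """Remove every character of text that is not in allowed (hypothetical helper)."""
    if not allowed:
        # "[^]" would be an invalid pattern, and keeping nothing means an empty result
        return ""
    # re.escape protects '^', '-', ']' and '\\', which would otherwise
    # change the meaning of the character class
    pattern = "[^" + re.escape(allowed) + "]"
    return re.sub(pattern, "", text)

print(keep_only("tesfsgfs9r543+54", "+0123456789"))  # 9543+54
```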
A generator expression passed to str.join also works, though it is probably slower (not recommended):
output = ''.join(ch for ch in math if ch in "+1234567890")
Or use a plain for loop:
def keep_characters(string, char_collection):
    result = ""
    for ch in string:
        if ch in char_collection:
            result += ch
    return result
output = keep_characters(math, "+1234567890")
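Another option is str.translate, which deletes every character mapped to None. A sketch, assuming the deletion table is built from the characters that actually occur in the text (the name keep_via_translate is hypothetical; math_expr is used instead of math to avoid shadowing the standard math module):

```python
def keep_via_translate(text, allowed):
    # Map every character that occurs in text but is not allowed to None,
    # which tells str.translate to delete it
    table = {ord(c): None for c in set(text) - set(allowed)}
    return text.translate(table)

math_expr = "tesfsgfs9r543+54"
print(keep_via_translate(math_expr, "+1234567890"))  # 9543+54
```

Because the table is rebuilt from each input string, this suits one-off calls; for many strings with the same allowed set, the regex version is simpler.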

Related

how to only keep certain characters in a string [duplicate]

This question already has answers here:
Keeping only certain characters in a string using Python?
(3 answers)
Closed 1 year ago.
I'm just a beginner, so this might be a stupid question, but I'm trying to remove every character from a string except the ones in a list.
For example, you have the string H][e,l}l.o1;4. and I want only letters and numbers in the output.
It should look like this:
Hello14
Does anyone know what needs to come after str1 =, or is there another way to do this?
This is what I tried so far:
def stringCleaner(s):
    # initialize an empty string
    str1 = " "
    chars = set('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')
    for x in range(len(s)):
        if any((c in chars) for c in s):
            str1 = ...  # what goes here?
    return str1
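A minimal correction of this attempt that keeps its index-based loop might look like the sketch below. Two things change: str1 starts as an empty string rather than a single space " ", and the test looks only at the current character s[x], whereas any(...) checks the whole string on every iteration.

```python
def stringCleaner(s):
    # start from an empty string, not " "
    str1 = ""
    chars = set('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')
    for x in range(len(s)):
        # test only the current character, then append it if allowed
        if s[x] in chars:
            str1 += s[x]
    return str1

print(stringCleaner('H][e,l}l.o1;4.'))  # Hello14
```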
It is better to simply iterate over your input string, collect the allowed characters in a list, and join() the list back into a string at the end.
def stringCleaner(s):
    # initialize an empty list
    str1 = []
    chars = set('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')
    for c in s:
        if c in chars:
            str1.append(c)
    return ''.join(str1)
From there it is only a few short steps to even better code, such as the answer @user1740577 posted.
For each character in s, check whether it is in chars, and .join() the ones that are.
Try this:
def stringCleaner(s):
    chars = set('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')
    return ''.join(c for c in s if c in chars)
stringCleaner('H][e,l}l.o1;4.')
# 'Hello14'
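Instead of typing out the allowed characters, the set can be built from the standard library's string constants. A sketch with two hypothetical helpers: clean_ascii keeps only ASCII letters and digits, while clean_alnum uses str.isalnum, which also accepts non-ASCII letters and digits such as é.

```python
import string

# ASCII letters and digits, built from the standard-library constants
ALLOWED = set(string.ascii_letters + string.digits)

def clean_ascii(s):
    """Keep only ASCII letters and digits."""
    return ''.join(c for c in s if c in ALLOWED)

def clean_alnum(s):
    """Keep every character for which str.isalnum() is True (includes non-ASCII)."""
    return ''.join(c for c in s if c.isalnum())

print(clean_ascii('H][e,l}l.o1;4.'))             # Hello14
print(clean_alnum('H][e,l}l.o1;4.'))             # Hello14
print(clean_ascii('café'), clean_alnum('café'))  # caf café
```

Which one to use depends on whether non-ASCII input should survive.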

How to delete certain words from string without spaces? [duplicate]

This question already has answers here:
How to remove substring from string in Python 3
(2 answers)
Closed 2 years ago.
Is there a way to delete words from a string in Python if it doesn't have spaces? For example, given the string "WUBHELLOWUB", I want to remove "WUB". I tried:
s = 'WUBHELLOWUB'
while 'WUB' in s:
    ind = s.find('WUB')
    s = s[:ind] + s[ind+1:]
print(s)
but it did not work: it prints UBHELLOUB, because s[ind+1:] skips only the first character of each match.
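You can use a regex whose pattern is just the word (a pattern such as \S*WUB\S* would also delete every non-space character around WUB, which here is the whole string):
import re
data = r"WUB"
re.sub(data, '', 'WUBHELLOWUB')  # 'HELLO'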
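For a fixed word, str.replace is enough, and the loop from the question works once it skips len(word) characters instead of one. A sketch comparing the three; remove_word is a hypothetical name.

```python
import re

s = 'WUBHELLOWUB'

# 1. Plain substring removal: str.replace deletes every non-overlapping occurrence
print(s.replace('WUB', ''))             # HELLO

# 2. The loop from the question, fixed to cut out the whole word
def remove_word(text, word):
    while word in text:
        ind = text.find(word)
        text = text[:ind] + text[ind + len(word):]
    return text

print(remove_word(s, 'WUB'))            # HELLO

# 3. Regex with a literal pattern; re.escape guards against special characters
print(re.sub(re.escape('WUB'), '', s))  # HELLO
```

The approaches differ when a removal creates a new occurrence: 'WWUBUB'.replace('WUB', '') makes a single pass and returns 'WUB', while the loop keeps going and returns ''.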

Spliting ones and zeros in binary with regex [duplicate]

This question already has answers here:
Split binary number into groups of zeros and ones
(3 answers)
How to split a binary string into groups that containt only ones or zeros with Java regular expressions? [duplicate]
(5 answers)
Closed 2 years ago.
I need to split any binary representation into its runs of ones and zeros, like this:
code = "10001100"
output_list = ["1", "000", "11", "00"]
I couldn't find the right pattern. I am using Python 3.x.
You don't really need a regex for this problem. You can use groupby from itertools to do this:
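import itertools
code = "10001100"
# groupby yields one (key, group) pair per run of equal characters
gs = ["".join(g) for _, g in itertools.groupby(code)]
# gs == ['1', '000', '11', '00']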
If you want to use regex, then:
import re
code = r'10001100'
output_list = re.findall(r'(0+|1+)', code)
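The pattern 0+|1+ can be generalised to runs of any repeated character with a backreference. A sketch: (.) captures one character and \1* repeats that same character. re.finditer is used because re.findall would return only the captured group (one character per run) instead of the whole match.

```python
import re

code = "10001100"

# each match is a maximal run of identical characters
runs = [m.group(0) for m in re.finditer(r"(.)\1*", code)]
print(runs)  # ['1', '000', '11', '00']
```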
Regex is not required. Here is another way to do it, inserting a comma after each run and then splitting on the commas:
code = '10001100'
# append ',' after a character whenever the next character differs
interim_list = [code[i] + ',' if i != len(code)-1 and code[i] != code[i+1] else code[i] for i in range(len(code))]
output_list = ''.join(interim_list).split(',')
print(output_list)
Output:
['1', '000', '11', '00']

Python re wrong output [duplicate]

This question already has answers here:
get index of character in python list
(4 answers)
Regular expression to match a dot
(7 answers)
Closed 3 years ago.
I want to find the position of '.', but when I run the code below:
import re
text = 'Hello world.'
pattern = '.'
search = re.search(pattern, text)
print(search.start())
print(search.end())
Output is:
0
1
But the '.' is not at positions 0 to 1.
So why is it giving the wrong output?
You can use the str.find method for this task. It returns the index of the first occurrence, or -1 if the substring is not found:
my_string = "test"
s_position = my_string.find('s')
print(s_position)
Output
2
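If you really want to use a regex, be sure to escape the dot, or it will be interpreted as a special character.
An unescaped dot matches any character except a newline, which is why your search matched the 'H' at position 0. Use a raw string so that \. is passed to the regex engine without Python treating it as a string escape:
import re
text = 'Hello world.'
pattern = r'\.'
search = re.search(pattern, text)
print(search.start())  # 11
print(search.end())    # 12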
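When the character to search for comes from a variable, re.escape builds the escaped pattern for you, and re.finditer finds every occurrence rather than just the first. A short sketch on the question's string:

```python
import re

text = 'Hello world.'

# str.find: index of the first occurrence, or -1 if absent
print(text.find('.'))                 # 11

# re.escape('.') produces a pattern that matches a literal dot
match = re.search(re.escape('.'), text)
print(match.start(), match.end())     # 11 12

# all positions of the dot in a longer string
print([m.start() for m in re.finditer(r'\.', 'a.b.c')])  # [1, 3]
```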

Match content between curly braces that can also contain curly braces [duplicate]

This question already has answers here:
Matching Nested Structures With Regular Expressions in Python
(6 answers)
Closed 8 years ago.
If I have a string:
s = "aaa{bbb}ccc{ddd{eee}fff}ggg"
is it possible to find all matches delimited by the outermost curly braces?
m = re.findall(r'\{.+?\}', s, re.DOTALL)
returns
['{bbb}', '{ddd{eee}']
but I need:
['{bbb}', '{ddd{eee}fff}']
Is it possible with python regex?
If you want it to work at any nesting depth and don't necessarily need a regex, you can implement a simple stack-based automaton (here a depth counter is enough):
s = "aaa{bbb}ccc{ddd{eee}fff}ggg"
def getStuffInBraces(text):
    stuff = ""
    count = 0
    for char in text:
        if char == "{":
            count += 1
        if count > 0:
            stuff += char
        if char == "}":
            count -= 1
            if count == 0 and stuff != "":
                yield stuff
                stuff = ""
getStuffInBraces is a generator function, so if you want a list of results, use print(list(getStuffInBraces(s))).
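The same idea can use an actual stack of open-brace positions and slice the string once per outer group, instead of building it character by character. A sketch; outer_brace_groups is a hypothetical name, and a stray '}' with no matching '{' is simply ignored.

```python
def outer_brace_groups(text):
    """Return every outermost {...} group, tracking open-brace positions on a stack."""
    stack = []   # indices of currently unmatched '{'
    groups = []
    for i, ch in enumerate(text):
        if ch == '{':
            stack.append(i)
        elif ch == '}' and stack:
            start = stack.pop()
            if not stack:  # this '}' closed an outermost brace
                groups.append(text[start:i + 1])
    return groups

s = "aaa{bbb}ccc{ddd{eee}fff}ggg"
print(outer_brace_groups(s))              # ['{bbb}', '{ddd{eee}fff}']
print(outer_brace_groups("x{a{b{c}}}y"))  # ['{a{b{c}}}']
```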
{(?:[^{}]*{[^{]*})*[^{}]*}
Try this (see the demo):
https://regex101.com/r/fA6wE2/28
P.S. It only works if the braces are nested no more than one level deep.
You could also use this regex:
\{(?:{[^{}]*}|[^{}])*}
>>> import re
>>> s = 'aaa{bbb}ccc{ddd{eee}fff}ggg'
>>> re.findall(r'\{(?:{[^{}]*}|[^{}])*}', s)
['{bbb}', '{ddd{eee}fff}']
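The one-level limit of this pattern is easy to see with a deeper string. In the sketch below, the outer group cannot match because its contents contain a two-level group, so re.findall returns only the inner one-level group:

```python
import re

one_level = r'\{(?:{[^{}]*}|[^{}])*}'

print(re.findall(one_level, 'aaa{bbb}ccc{ddd{eee}fff}ggg'))  # ['{bbb}', '{ddd{eee}fff}']
# two levels of nesting: the outermost group is missed
print(re.findall(one_level, 'x{a{b{c}}}y'))                  # ['{b{c}}']
```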
For arbitrary nesting depth, use a recursive regex:
\{(?:(?R)|[^{}])*}
Code:
>>> import regex
>>> regex.findall(r'\{(?:(?R)|[^{}])*}', s)
['{bbb}', '{ddd{eee}fff}']
Note that (?R) is supported only by the third-party regex module (pip install regex), not by the built-in re module.
