This question already has answers here:
Matching Nested Structures With Regular Expressions in Python
(6 answers)
Closed 8 years ago.
If I have a string:
s = "aaa{bbb}ccc{ddd{eee}fff}ggg"
is it possible to find all matches delimited by the outermost curly braces?
m = re.findall(r'\{.+?\}', s, re.DOTALL)
returns
['{bbb}', '{ddd{eee}']
but I need:
['{bbb}', '{ddd{eee}fff}']
Is it possible with python regex?
If you want it to work at any depth and don't necessarily need a regex, you can implement a simple stack-based automaton:
s = "aaa{bbb}ccc{ddd{eee}fff}ggg"

def getStuffInBraces(text):
    stuff = ""
    count = 0
    for char in text:
        if char == "{":
            count += 1
        if count > 0:
            stuff += char
        if char == "}":
            count -= 1
            if count == 0 and stuff != "":
                yield stuff
                stuff = ""
getStuffInBraces is a generator, so to get a list of results, wrap it in list(), e.g. print(list(getStuffInBraces(s))).
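A possible variant of the same depth-counter idea, sketched here as an illustration rather than as the answerer's code: instead of building the result one character at a time, it remembers where the outermost "{" started and slices the text when the matching "}" arrives. The name outer_brace_groups is hypothetical, and ignoring stray closing braces is an assumption about how unbalanced input should be handled.

```python
def outer_brace_groups(text):
    """Yield each outermost {...} group in text (hypothetical helper)."""
    depth = 0      # current nesting depth
    start = None   # index of the "{" that opened the current outermost group
    for i, char in enumerate(text):
        if char == "{":
            if depth == 0:
                start = i
            depth += 1
        elif char == "}" and depth > 0:  # assumption: stray "}" at depth 0 is ignored
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]


print(list(outer_brace_groups("aaa{bbb}ccc{ddd{eee}fff}ggg")))  # ['{bbb}', '{ddd{eee}fff}']
print(list(outer_brace_groups("x{a{b{c}}}y")))                  # ['{a{b{c}}}']
```

Slicing avoids repeated string concatenation, and because only a counter is kept, the nesting depth is unlimited.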
{(?:[^{}]*{[^{]*})*[^{}]*}
Try this. See demo:
https://regex101.com/r/fA6wE2/28
P.S. It only works if the {} are nested no more than one level deep.
You could also use this regex:
\{(?:{[^{}]*}|[^{}])*}
>>> import re
>>> s = 'aaa{bbb}ccc{ddd{eee}fff}ggg'
>>> re.findall(r'\{(?:{[^{}]*}|[^{}])*}', s)
['{bbb}', '{ddd{eee}fff}']
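To make the one-level limitation concrete, here is a small check using the same pattern on a made-up string with two levels of nesting. The inner alternative {[^{}]*} cannot contain another brace, so the outermost group is never matched; only the innermost complete one-level group is found.

```python
import re

# Pattern from the answer above: a nested group may only contain non-brace characters
ONE_LEVEL = r'\{(?:{[^{}]*}|[^{}])*}'

print(re.findall(ONE_LEVEL, 'aaa{bbb}ccc{ddd{eee}fff}ggg'))  # ['{bbb}', '{ddd{eee}fff}']
print(re.findall(ONE_LEVEL, 'x{a{b{c}}}y'))                  # ['{b{c}}'], not '{a{b{c}}}'
```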
Use a recursive regex to handle nesting of any depth:
\{(?:(?R)|[^{}])*}
Code:
>>> import regex
>>> regex.findall(r'\{(?:(?R)|[^{}])*}', s)
['{bbb}', '{ddd{eee}fff}']
Note that (?R) is only supported by the third-party regex module, not by the built-in re module.
This question already has answers here:
Keeping only certain characters in a string using Python?
(3 answers)
Closed 10 months ago.
How would I remove everything but certain characters, such as (+,1,2,3,4,5,6,7,8,9,0), from a string?
math = ("tesfsgfs9r543+54")
output = ("9543+54")
You can use regular expressions.
import re
output = re.sub("[^+0-9]", "", math)
Using a generator expression is also possible, but it is probably slower (not recommended):
output = ''.join(ch for ch in math if ch in "+1234567890")
Using a for loop:
def keep_characters(string, char_collection):
    result = ""
    for ch in string:
        if ch in char_collection:
            result += ch
    return result
output = keep_characters(math, "+1234567890")
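As a quick sanity check, the three approaches above can be run side by side on the example input. This sketch wraps each one in a small function (the names keep_with_regex, keep_with_join and keep_with_loop are hypothetical) and uses text instead of math so the standard math module is not shadowed.

```python
import re

ALLOWED = "+1234567890"


def keep_with_regex(text):
    # Delete every character that is not '+' or an ASCII digit
    return re.sub(r"[^+0-9]", "", text)


def keep_with_join(text):
    # Generator expression filtered by membership in ALLOWED
    return "".join(ch for ch in text if ch in ALLOWED)


def keep_with_loop(text, char_collection=ALLOWED):
    # Explicit loop, same logic as keep_characters above
    result = ""
    for ch in text:
        if ch in char_collection:
            result += ch
    return result


sample = "tesfsgfs9r543+54"
print({f(sample) for f in (keep_with_regex, keep_with_join, keep_with_loop)})  # {'9543+54'}
```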
This question already has answers here:
How do I reverse a string in Python?
(19 answers)
Closed 2 years ago.
I am trying to create a program in which the user inputs a statement containing two '!' surrounding a string (example: hello all! this is a test! bye.). I am to grab the string between the two exclamation points and print it in reverse, letter by letter. I have been able to find the start and end points that delimit the statement; however, I am having difficulty creating an index that would cycle through my variable userstring in reverse and print it.
test = input('Enter a string with two "!" surrounding portion of the string:')
expoint = test.find('!')
#print (expoint)
twoexpoint = test.find('!', expoint+1)
#print (twoexpoint)
userstring = test[expoint+1 : twoexpoint]
#print(userstring)
number = 0
while number < len(userstring) :
    letter = [twoexpoint - 1]
    print (letter)
    number += 1
twoexpoint - 1 is the index, in the input string, of the last character you need. So you should start from that index and count down. In your while loop:
letter = test[twoexpoint- number - 1]
On each iteration you increase number, which decreases the index, so the string is walked through in reverse.
But this way you don't actually use the userstring you already found (except for its length). Instead of managing indexes, just reverse the userstring:
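Putting the corrected index expression back into the question's while loop gives something like the sketch below. It is wrapped in a hypothetical function reverse_between_bangs that returns the letters instead of printing them inside the loop, so the result is easy to check, and it assumes the input really contains two '!' characters.

```python
def reverse_between_bangs(test):
    """Return the characters between the first two '!' in reverse order (hypothetical helper)."""
    expoint = test.find('!')
    twoexpoint = test.find('!', expoint + 1)   # assumes a second '!' exists
    userstring = test[expoint + 1:twoexpoint]
    letters = []
    number = 0
    while number < len(userstring):
        # twoexpoint - 1 is the last character before the second '!';
        # subtracting number walks one character further back each time
        letters.append(test[twoexpoint - number - 1])
        number += 1
    return letters


for letter in reverse_between_bangs('hello all! this is a test! bye.'):
    print(letter)
```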
for letter in userstring[::-1]:
    print(letter)
Explanation: we use a regex to find the pattern, then we loop over every occurrence and print it reversed. You can reverse a string in Python with mystring[::-1] (this works for lists too).
The Python re documentation is very useful, and you will need it all the time down the coder road :). Happy coding!
import re  # I recommend using regex

def reverse_string(a):
    matches = re.findall(r'\!(.*?)\!', a)
    for match in matches:
        print("Match found", match)
        print("Match reversed", match[::-1])
        for i in match[::-1]:
            print(i)
In [3]: reverse_string('test test !test! !123asd!')
Match found test
Match reversed tset
t
s
e
t
Match found 123asd
Match reversed dsa321
d
s
a
3
2
1
You're overcomplicating it. Don't bother with an index, simply use reversed() on userstring to cycle through the characters themselves:
userstring = test[expoint+1:twoexpoint]
for letter in reversed(userstring):
    print(letter)
Or use a reversed slice:
userstring = test[twoexpoint-1:expoint:-1]
for letter in userstring:
    print(letter)
This question already has answers here:
How to use a variable inside a regular expression?
(12 answers)
Closed 3 years ago.
We just learned about using regular expressions in my first Python course (extremely new to programming), and one of the homework problems that I am struggling with requires us to use a regular expression to find all the words of length n or longer, and then use that regular expression to find the longest word used in a text file.
I have no problem when I want to test out a specific length, but it returns an empty list when I use an arbitrary variable n:
import re
with open('shakespeare.txt') as file:
    shakespeare = file.read()
n = 10 #if I take this out and put an actual number in the curly bracket below, it works just fine.
words = re.findall('^[A-Za-z\'\-]{n,}', shakespeare, re.M)
print(words)
len(words)
I'm not sure what I did wrong and how to resolve this. Any help is greatly appreciated!
For more context...
To find the longest word, I used:
#for word with special characters such as '-' and '''
longest_word = max(re.findall('\S+', shakespeare, re.M), key = len)
#for word without special characters:
longest_pure_word = max(re.findall('[A-Za-z]+ ', shakespeare, re.M), key = len)
output1(special char): tragical-comical-historical-pastoral
output2(pure word): honorificabilitudinitatibus
I didn't use n because I couldn't get the first part of the question to work.
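Inside an ordinary string, {n,} is not replaced by the value of the variable n. The regex engine sees the literal text {n,}, which is not a valid quantifier and is therefore matched as plain characters, so nothing matches. Build the pattern with the number inserted instead: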
import re
with open('shakespeare.txt') as file:
    shakespeare = file.read()
n = 10
words = re.findall('^[A-Za-z\'\-]{'+str(n)+',}', shakespeare, re.M)
print(words)
len(words)
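An alternative way to insert n, sketched here on a short made-up sample rather than on shakespeare.txt: a raw f-string, where doubled braces {{ and }} produce literal braces and {n} is replaced by the number. The helper name words_at_least is hypothetical, and it drops the question's ^ anchor (an assumption) so that long words anywhere in a line are found, not just at line starts.

```python
import re


def words_at_least(text, n):
    # In the f-string, {{ and }} become literal braces and {n} becomes the value of n,
    # so for n=10 the pattern is [A-Za-z'\-]{10,}
    pattern = rf"[A-Za-z'\-]{{{n},}}"
    return re.findall(pattern, text)


sample = "To be or not to be: honorificabilitudinitatibus and tragical-comical-historical-pastoral"
long_words = words_at_least(sample, 10)
print(long_words)                  # ['honorificabilitudinitatibus', 'tragical-comical-historical-pastoral']
print(max(long_words, key=len))    # tragical-comical-historical-pastoral
```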
This question already has answers here:
Python replace string pattern with output of function
(4 answers)
Closed 5 years ago.
Say I have the following string:
mystr = "6374696f6e20????28??????2c??2c????29"
And I want to replace every sequence of "??" with its length divided by 2. So for the example above, I want to get the following result:
mystr = "6374696f6e2022832c12c229"
Meaning:
???? replaced with 2
?????? replaced with 3
?? replaced with 1
???? replaced with 2
I tried the following, but I'm not sure it's the right approach, and in any case it doesn't work:
regex = re.compile('(\?+)')
matches = regex.findall(mystr)
if matches:
    for match in matches:
        match_length = len(match)/2
        if (match_length > 0):
            mystr = regex.sub(match_length, mystr)
You can pass a callback function as the replacement argument of Python's re.sub. (FYI, lambda expressions are a shorthand for creating anonymous functions.)
import re
mystr = "6374696f6e20????28??????2c??2c????29"
regex = re.compile(r"\?+")
print(re.sub(regex, lambda m: str(int(len(m.group())/2)), mystr))
There seems to be uncertainty about what should happen in the case of ???. The above code turns it into 1, since it converts the result to int. Without the int conversion the result would be 1.5 (and even lengths would also give floats, such as 2.0). If you want ??? to become 1? (only complete pairs replaced), you can use the pattern (?:\?{2})+ instead.
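A sketch of that pair-only variant, using integer division // in place of int(.../2). The function name halve_question_runs is hypothetical. Because (?:\?{2})+ only consumes complete pairs, an odd leftover ? stays in the output.

```python
import re


def halve_question_runs(s):
    # (?:\?{2})+ matches one or more complete "??" pairs;
    # each run is replaced by (number of '?') // 2
    return re.sub(r"(?:\?{2})+", lambda m: str(len(m.group()) // 2), s)


print(halve_question_runs("6374696f6e20????28??????2c??2c????29"))  # 6374696f6e2022832c12c229
print(halve_question_runs("ab???cd"))                                # ab1?cd
```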
This question already has answers here:
How can I find the number of overlapping sequences in a String with Python? [duplicate]
(4 answers)
Closed 9 years ago.
I wanted to count the number of times that a string like 'aa' appears in 'aaa' (or 'aaaa').
The most obvious code gives the wrong (or at least, not the intuitive) answer:
'aaa'.count('aa')
1 # should be 2
'aaaa'.count('aa')
2 # should be 3
Does anyone have a simple way to fix this?
From str.count() documentation:
Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
So, no. You are getting the expected result.
If you want to count number of overlapping matches, use regex:
>>> import re
>>>
>>> len(re.findall(r'(a)(?=\1)', 'aaa'))
2
This finds every a that is followed by another a. The second a is not consumed, because we've used a look-ahead, which is a zero-width assertion.
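The back-reference pattern above is specific to a doubled character. The same zero-width look-ahead idea can be generalised to any substring by placing the escaped needle inside (?=...); this is a sketch with a hypothetical helper name, count_overlapping, and re.escape is used so that needles containing regex metacharacters are matched literally.

```python
import re


def count_overlapping(haystack, needle):
    # (?=...) tests for needle at every position without consuming characters,
    # so consecutive matches are allowed to overlap
    pattern = "(?=" + re.escape(needle) + ")"
    return sum(1 for _ in re.finditer(pattern, haystack))


print(count_overlapping("aaa", "aa"))    # 2
print(count_overlapping("aaaa", "aa"))   # 3
print(count_overlapping("aaaa", "aaa"))  # 2
```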
haystack = "aaaa"
needle = "aa"
matches = sum(haystack[i:i+len(needle)] == needle
              for i in range(len(haystack)-len(needle)+1))
# in Python 2, use xrange instead of range
The str.count() solution does not take overlapping occurrences into consideration.
Try this:
big_string = "aaaa"
substring = "aaa"
count = 0
for char in range(len(big_string)):
    count += big_string[char: char + len(substring)] == substring
print(count)
You have to be careful, because you seem to be looking for overlapping substrings. To handle this, I'd do:
len([s.start() for s in re.finditer('(?=aa)', 'aaa')])
And if you don't care about the position where the substring starts you can do:
len([_ for s in re.finditer('(?=aa)', 'aaa')])
Although someone smarter than myself might be able to show whether there are performance differences :)