I have a string s containing:
Hello {full_name} this is my special address named {address1}_{address2}.
I am trying to match every substring that is enclosed in curly brackets.
My attempt:
matches = re.findall(r'{.*}', s)
gives me
['{full_name}', '{address1}_{address2}']
but what I am actually trying to retrieve is
['{full_name}', '{address1}', '{address2}']
How can I do that?
>>> import re
>>> text = 'Hello {full_name} this is my special address named {address1}_{address2}.'
>>> re.findall(r'{[^{}]*}', text)
['{full_name}', '{address1}', '{address2}']
Try a non-greedy match:
matches = re.findall(r'{.*?}', s)
You need a non-greedy quantifier:
matches = re.findall(r'{.*?}', s)
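Note the question mark ? after .*: it makes the quantifier non-greedy, so each match stops at the first } instead of the last one.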
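A minimal sketch comparing the two working patterns from the answers above on the question's string. Both stop at the first } after each {. The last pattern, which wraps the negated class in a capturing group so findall returns only the names without braces, is an extra illustration rather than part of either answer.

```python
import re

s = 'Hello {full_name} this is my special address named {address1}_{address2}.'

# Non-greedy: .*? expands one character at a time, so each match
# ends at the first "}" after its "{"
lazy = re.findall(r'{.*?}', s)

# Negated class: [^{}]* cannot contain a brace, so a match can never
# run past the next "}"
negated = re.findall(r'{[^{}]*}', s)

# With a single capturing group, findall returns only the group,
# i.e. the names without the surrounding braces
names = re.findall(r'{([^{}]*)}', s)

print(lazy)     # ['{full_name}', '{address1}', '{address2}']
print(negated)  # ['{full_name}', '{address1}', '{address2}']
print(names)    # ['full_name', 'address1', 'address2']
```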
I have the regex https://regex101.com/r/2H5ew6/1:
(\!|\#)(1)
applied to the input
Hello!1 World
I want to keep the first mark (! or #) and change the number 1 to another number, 2.
I tried the replacements
{\1}2_
\1\\2_
but they add extra text, and I just want to change the number.
I expect the result to be
Hello!2_World
and, when using #, to be
Hello#2_World
Match and capture either ! or # in a named capture group, here called char, if it is followed by one or more digits and a whitespace character:
(?P<char>[!#])\d+\s
Substitute with the named capture, \g<char> followed by 2_:
\g<char>2_
If you only want the substitution if there's a 1 following either ! or #, replace \d+ with 1.
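In Python, the named-group pattern and replacement above can be passed to re.sub. A minimal sketch; the helper name replace_number is hypothetical and only used for illustration.

```python
import re

# Named group "char" captures the mark; \d+\s also consumes the digits
# and the following whitespace, so the space disappears in the output
pattern = re.compile(r'(?P<char>[!#])\d+\s')

def replace_number(text):
    # \g<char> re-inserts the captured mark, then "2_" is appended
    return pattern.sub(r'\g<char>2_', text)

print(replace_number('Hello!1 World'))  # Hello!2_World
print(replace_number('Hello#1 World'))  # Hello#2_World
```

Because the pattern uses \d+, a multi-digit number such as !12 is replaced as well; use 1 instead of \d+ to restrict it, as noted above.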
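In your substitution, drop the braces around \1. Because \12 would be read as a reference to group 12, write the group as \g<1>, and add \s to the pattern so the space is removed too:
import re
string = "Hello!1 World"
# Group 1 keeps the mark; \s consumes the trailing space
pattern = r"(\!|\#)(1)\s"
# \g<1> re-inserts the mark; "\12_" would refer to a nonexistent group 12
replacement = r"\g<1>2_"
result = re.sub(pattern, replacement, string)  # 'Hello!2_World'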
Why not use plain string methods: string.replace('!1 ', '!2_').replace('#1 ', '#2_')?
>>> string = "Hello!1 World"
>>> repl = lambda s: s.replace('!1 ', '!2_').replace('#1 ', '#2_')
>>> string2 = repl(string)
>>> string2
'Hello!2_World'
>>> string = "Hello!12 World"
>>> string2 = repl(string)
>>> string2
'Hello!12 World'
The replacement for your pattern should be \g<1>2_.
You could also shorten your pattern to a single capture group, using the character class [!#] followed by a literal 1, and use the same replacement as above:
([!#])1
Or use a lookbehind assertion, with no groups at all, and replace with just 2_:
(?<=[!#])1
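A minimal sketch of the group and lookbehind variants in Python. Neither pattern matches the space after the 1, so it survives in the output; the last line appends a space to the lookbehind pattern (an addition not in the answer) to get exactly Hello!2_World.

```python
import re

text = 'Hello!1 World'

# Single capture group: \g<1> puts the mark back; writing \12_ instead
# would be read as a reference to group 12
with_group = re.sub(r'([!#])1', r'\g<1>2_', text)

# Lookbehind: the mark is only checked, not consumed, so the
# replacement is just 2_
with_lookbehind = re.sub(r'(?<=[!#])1', '2_', text)

# Including the space in the pattern removes it as well
no_space = re.sub(r'(?<=[!#])1 ', '2_', text)

print(with_group)       # Hello!2_ World
print(with_lookbehind)  # Hello!2_ World
print(no_space)         # Hello!2_World
```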
Is it possible to return the contents that match a wildcard (like .*) in a regex pattern in Python?
For example, a match like:
re.search('stack.*flow','stackoverflow')
would return the string 'over'.
Use a capturing group:
>>> import re
>>> re.search('stack(.*)flow', 'stackoverflow').group(1)
'over'
Yes, you can capture it: just wrap the part you want in parentheses ().
matchobj = re.search('stack(.*)flow','stackoverflow')
print(matchobj.group(1)) # => over
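If the pattern might not match, re.search returns None and calling .group(1) on it raises AttributeError. A minimal sketch of a guard, wrapped in a hypothetical helper between() that is not part of the re module; re.escape is used on the assumption that the delimiters should be treated as literal text.

```python
import re

def between(text, start, end):
    # re.escape treats start/end as literal text; (.*) captures
    # whatever lies between them
    match = re.search(re.escape(start) + '(.*)' + re.escape(end), text)
    # re.search returns None when nothing matches, so check first
    return match.group(1) if match else None

print(between('stackoverflow', 'stack', 'flow'))  # over
print(between('stackexchange', 'stack', 'flow'))  # None
```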
I have a string "Name(something)" and I am trying to extract the portion of the string within the parentheses!
I've tried the following, but I'm not getting the results I'm looking for:
n.split('()')
name, something = n.split('()')
You can use a simple regex to catch everything between the parentheses:
>>> import re
>>> s = 'Name(something)'
>>> re.search(r'\(([^)]+)', s).group(1)
'something'
The regex matches the first "(", then it matches everything that's not a ")":
\( matches the character "(" literally
the capturing group ([^)]+) greedily matches anything that's not a ")"
As an improvement on Maroun Maroun's answer:
re.findall(r'\(([^)]+)', s)
It finds the text inside every pair of parentheses, not just the first.
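A minimal sketch of findall on a string with several pairs of parentheses (the extra names are invented for the example). Because [^)]+ needs at least one character, an empty pair is skipped; switching to [^)]* returns an empty string for it instead.

```python
import re

s = 'Name(something) Other(else) Empty()'

# + requires at least one non-")" character, so "()" is skipped
at_least_one = re.findall(r'\(([^)]+)', s)
# * also accepts zero characters, so "()" yields ''
zero_or_more = re.findall(r'\(([^)]*)', s)

print(at_least_one)  # ['something', 'else']
print(zero_or_more)  # ['something', 'else', '']
```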
You can use split as in your example, but this way:
val = s.split('(', 1)[1].split(')')[0]
or use a regex, as in the other answers.
You can use re.match:
>>> import re
>>> s = "name(something)"
>>> na, so = re.match(r"(.*)\((.*)\)", s).groups()
>>> na, so
('name', 'something')
This pattern has two (.*) groups, each matching anything; the second one sits between the escaped parentheses \( and \).
You can look for ( and ) (they need to be escaped with a backslash in a regex) and then match every character between them using .*, capturing it in a group.
Example:
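import re
s = "name(something)"
regex = r'\((.*)\)'
# re.search scans the whole string; re.match only tries position 0,
# where there is no "(", so it would return None here
text_inside_paranthesis = re.search(regex, s).group(1)
print(text_inside_paranthesis)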
Outputs:
something
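The \((.*)\) pattern assumes a single pair of parentheses. A minimal sketch of what happens with two pairs (the second pair is an invented example), and how a non-greedy .*? avoids it:

```python
import re

s = 'name(something) and other(thing)'

# Greedy .* runs to the last ")", swallowing the text in between
greedy = re.search(r'\((.*)\)', s).group(1)
# Non-greedy .*? stops at the first ")"
lazy = re.search(r'\((.*?)\)', s).group(1)

print(greedy)  # something) and other(thing
print(lazy)    # something
```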
Without regex you can do the following:
text_inside_paranthesis = s[s.find('(')+1:s.find(')')]
Outputs:
something
I want to replace the string
ID12345678_S3_MPRAGE_ADNI_32Ch_2_98_clone_transform_clone_reg_N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg
with
ID12345678
How can I replace this via regex?
I tried this, but it didn't work:
import re
re.sub(r'_\w+_\d_\d+_\w+','')
Thank you
You can use re.sub with the pattern ([^_]*).*: the group ([^_]*) captures the leading substring that contains no _, and .* consumes the rest, so replacing the whole match with \1 keeps only the first part:
>>> s="ID12345678_S3_MPRAGE_ADNI_32Ch_2_98_clone_transform_clone_reg_N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg"
>>> import re
>>> re.sub(r'([^_]*).*',r'\1',s)
'ID12345678'
But if the ID could appear anywhere in your string, you can use re.search as follows:
>>> re.search(r'ID\d+',s).group(0)
'ID12345678'
>>> s="_S3_MPRAGE_ADNI_ID12345678_32Ch_2_98_clone_transform_clone_reg_N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg"
>>> re.search(r'ID\d+',s).group(0)
'ID12345678'
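Without a regex, if the ID always comes first (as in your original string), you can simply use split():
>>> s = "ID12345678_S3_MPRAGE_ADNI_32Ch_2_98_clone_transform_clone_reg_N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg"
>>> s.split('_',1)[0]
'ID12345678'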
Assuming the first part varies, you can delete everything from the first underscore onward:
import re
s = "ID12345678_S3_MPRAGE_ADNI_32Ch_2_98_clone_transform_clone_reg_N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg"
print(re.sub(r'_.*$', '', s))
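A quick sketch checking that the regex and split approaches agree on the original string; str.partition is an extra standard-library alternative not mentioned in the answers.

```python
import re

s = ('ID12345678_S3_MPRAGE_ADNI_32Ch_2_98_clone_transform_clone_reg_'
     'N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg')

# Delete everything from the first underscore to the end
via_sub = re.sub(r'_.*$', '', s)
# Split once on "_" and keep the first piece
via_split = s.split('_', 1)[0]
# partition returns (head, separator, tail); keep the head
via_partition = s.partition('_')[0]

print(via_sub, via_split, via_partition)  # ID12345678 ID12345678 ID12345678
```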
I have a string:
This is #lame
Here I want to extract lame. The issue is that the string can also be
This is lame
Here I don't extract anything. The string can also be:
This is #lame but that is #not
Here I extract lame and not.
So the output I expect in each case is:
[lame]
[]
[lame,not]
How do I extract these in a robust way in Python?
Use re.findall() to find all matches; in this case, anything that is preceded by # and consists of word characters:
re.findall(r'(?<=#)\w+', inputtext)
The (?<=..) construct is a positive lookbehind assertion; it only matches if the current position is preceded by a # character. So the above pattern matches one or more word characters (the \w character class) only if they are preceded by a # symbol.
Demo:
>>> import re
>>> re.findall(r'(?<=#)\w+', 'This is #lame')
['lame']
>>> re.findall(r'(?<=#)\w+', 'This is lame')
[]
>>> re.findall(r'(?<=#)\w+', 'This is #lame but that is #not')
['lame', 'not']
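For comparison, a minimal sketch showing that a capturing group gives the same result here: findall returns only group 1 when the pattern has one group, so the # is matched but not returned. The lookbehind is what the answer uses; the group version is an alternative, not part of it.

```python
import re

text = 'This is #lame but that is #not'

# Lookbehind: the "#" is checked but is not part of the match
via_lookbehind = re.findall(r'(?<=#)\w+', text)
# Capturing group: the "#" is matched, but findall returns only group 1
via_group = re.findall(r'#(\w+)', text)

print(via_lookbehind)  # ['lame', 'not']
print(via_group)       # ['lame', 'not']
```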
If you plan on reusing the pattern, do compile the expression first, then use the .findall() method on the compiled regular expression object:
at_words = re.compile(r'(?<=#)\w+')
at_words.findall(inputtext)
This saves you a cache lookup every time you call .findall().
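You can use the re module; here is an example:
import re
test_case = "This is #lame but that is #not"
# Raw string so \w reaches the regex engine unchanged; note that the
# matches include the leading "#"
regular = re.compile(r"#[\w]*")
lst = regular.findall(test_case)
print(lst)  # ['#lame', '#not']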
This will give the output you requested:
import re
regex = re.compile(r'(?<=#)\w+')
print(regex.findall('This is #lame'))                   # ['lame']
print(regex.findall('This is lame'))                    # []
print(regex.findall('This is #lame but that is #not'))  # ['lame', 'not']