Can't we give a string in the regex? For example, re.compile('((.*)?=<Bangalore>)'). In the code below I have mentioned <Bangalore>, but it's not displaying.
I want to extract the text before Bangalore.
import re
regex = re.compile('((.*)?=<>)')
line = ("Kathick Kumar, Bangalore who was a great person and lived from 29th March 1980 - 21 Dec 2014")
result = regex.search(line)
print(result)
Desired output: Kathick Kumar, Bangalore
Something like this?
import re
regex = re.compile('(.*Bangalore)')
result = regex.search(line)
>>> print(result.groups())
('Kathick Kumar, Bangalore',)
Use (.*)(?:Bangalore) pattern
>>> line = ("Kathick Kumar, Bangalore who was a great person and lived from 29th March 1980 - 21 Dec 2014")
>>> import re
>>> regex = re.compile('(.*)(?:Bangalore)')
>>> result = regex.search(line)
>>> print(result.group(0))
Kathick Kumar, Bangalore
>>>
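On the original question of whether a plain string can go into the pattern: here is a minimal sketch, assuming the keyword arrives in a variable. The helper name text_through is hypothetical (it is not from the answers above), and re.escape is used so that any regex metacharacters in the keyword are treated literally.

```python
import re

def text_through(keyword, line):
    """Return the text from the start of line through keyword, or None if absent."""
    # re.escape makes the keyword safe to embed in a pattern as literal text.
    pattern = re.compile('(.*)' + re.escape(keyword))
    match = pattern.search(line)
    return match.group(0) if match else None

line = ("Kathick Kumar, Bangalore who was a great person and lived "
        "from 29th March 1980 - 21 Dec 2014")
print(text_through("Bangalore", line))  # prints: Kathick Kumar, Bangalore
```

Because .* is greedy, this runs through the last occurrence of the keyword if it appears more than once; match.group(1) would give only the part before it.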
I have Python strings that follow one of two formats:
"#gianvitorossi/ FALL 2012 #highheels ..."
OR:
"#gianvitorossi FALL 2012 #highheels ..."
I want to extract just the #gianvitorossi portion.
I'm trying the following:
...
company = p['edge_media_to_caption']['edges'][0]['node']['text']
company = company.replace('/','')
company = company.replace('\t','')
company = company.replace('\n','')
c = company.split(' ')
company = c[0]
This works for some of the names. However, in the example below:
My code is returning #gianvitorossi FALL rather than just #gianvitorossi as expected.
You should split on the '/' character:
company = "mystring"
c = company.split('/')
company = c[0]
Well, it worked on my machine. For trailing characters such as a slash, you can use rstrip(your_symbols).
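A small sketch of the rstrip idea combined with a whitespace split, since splitting on '/' alone leaves the second format untouched. The helper name first_token and the tab-separated third caption are assumptions added for illustration; they are not from the question.

```python
def first_token(caption, trailing="/"):
    """Return the first whitespace-separated token with trailing symbols removed."""
    # split() with no argument splits on any run of whitespace (spaces, tabs, newlines).
    tokens = caption.split()
    return tokens[0].rstrip(trailing) if tokens else ""

captions = [
    "#gianvitorossi/ FALL 2012 #highheels ...",
    "#gianvitorossi FALL 2012 #highheels ...",
    "#gianvitorossi\tFALL 2012 #highheels ...",  # assumed tab-separated variant
]
for caption in captions:
    print(first_token(caption))  # prints #gianvitorossi each time
```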
You could do that with a regular expression. Here is what you could do:
import re
text1 = "#gianvitorossi/ FALL 2012 #highheels ..."
text2 = "#gianvitorossi FALL 2012 #highheels ..."
patt = "#[A-Za-z]+"
print(re.findall(patt, text1))
If your text might include numbers, you could modify the code as follows:
import re
text1 = "#gianvitorossi/ FALL 2012 #highheels ..."
text2 = "#gianvitorossi FALL 2012 #highheels ..."
patt = "#[A-Za-z0-9]+"
print(re.findall(patt, text1))
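Note that re.findall returns every hashtag in the text, not just the leading one. If only the first is wanted, a sketch using re.match (which anchors at the start of the string) could look like this. The name leading_hashtag is hypothetical, and \w is an assumption about which characters are allowed: it also accepts digits and underscores.

```python
import re

HASHTAG_AT_START = re.compile(r"#\w+")

def leading_hashtag(text):
    """Return the hashtag at the very start of text, or None if there is none."""
    match = HASHTAG_AT_START.match(text)  # match() only tries position 0
    return match.group(0) if match else None

text1 = "#gianvitorossi/ FALL 2012 #highheels ..."
text2 = "#gianvitorossi FALL 2012 #highheels ..."

print(re.findall(r"#[A-Za-z0-9]+", text1))  # ['#gianvitorossi', '#highheels']
print(leading_hashtag(text1))               # prints #gianvitorossi
print(leading_hashtag(text2))               # prints #gianvitorossi
```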
You can get it using split and replace, which should be enough if the requirements above are exhaustive:
s.split(' ')[0].replace('/','')
An example:
s = ["#gianvitorossi/ FALL 2012 #highheels ...","#gianvitorossi FALL 2012 #highheels ..."]
for i in s:
    print(i.split(' ')[0].replace('/',''))
#gianvitorossi
#gianvitorossi
If you don't want to use regular expressions, you could use this:
original = "#gianvitorossi/ FALL 2012 #highheels ..."
extract = original.split(' ')[0]
if extract[-1] == "/":
    extract = extract[:-1]
I have a string:
"abc mysql 23 rufos kanso engineer"
I want the regex to output the text before the word "engineer", going back until it sees a number.
That is, the regex should output:
23 rufos kanso
Another example:
String:
"def grusol defno 1635 minos kalopo, ruso engineer okas puno"
I want the regex to output the text before the word "engineer", going back until it sees a number.
That is, the regex should output:
1635 minos kalopo, ruso
I am able to achieve this with a series of regexes.
Can I do this in one shot?
Thanks
The pattern I'd use: ((\d+)(?!.*\d).*)engineer. It looks for the last number in the string and captures from there up to engineer.
Something similar to (\d.*)engineer would also work, but only if there is just one number in the string.
>>> import re
>>> string = '123 abc mysql 23 rufos kanso engineer'
>>> pattern = r'((\d+)(?!.*\d).*)engineer'
>>> re.search(pattern, string).group(1)
'23 rufos kanso '
>>>
Edit
In case there are digits after the 'engineer' part, the pattern mentioned above does not work, as you have pointed out in the comment. I tried to solve it, but honestly I couldn't come up with a new pattern (sorry).
The workaround I could suggest is, assuming 'engineer' is still the 'key' word, splitting your initial string by said word.
Here is the illustration of what I mean:
>>> string = '123 abc mysql 23 rufos kanso engineer 1234 b65 de'
>>> string.split('engineer')
['123 abc mysql 23 rufos kanso ', ' 1234 b65 de']
>>> string.split('engineer')[0]
'123 abc mysql 23 rufos kanso '
# hence, there would be no unexpected digits
>>> s = string.split('engineer')[0]
>>> pattern = r'((\d+)(?!.*\d).*)'
>>> re.search(pattern, s).group(1)
'23 rufos kanso '
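For what it's worth, here is a hedged sketch of a single pattern that seems to cover the case with digits after 'engineer' as well. It is not the pattern from this answer: it relies on the question's own rule that the wanted text runs from a number up to 'engineer' with no further digits in between, so \D*? (as few non-digits as possible) can stand in for the middle part. The name number_to_engineer is hypothetical.

```python
import re

# \d+   a number
# \D*?  as few non-digit characters as possible
# \s*   trailing whitespace, kept out of the captured group
BEFORE_ENGINEER = re.compile(r"(\d+\D*?)\s*engineer")

def number_to_engineer(text):
    """Return the text from the last number before 'engineer' up to that word."""
    match = BEFORE_ENGINEER.search(text)
    return match.group(1) if match else None

print(number_to_engineer("123 abc mysql 23 rufos kanso engineer 1234 b65 de"))
# prints: 23 rufos kanso
print(number_to_engineer("def grusol defno 1635 minos kalopo, ruso engineer okas puno"))
# prints: 1635 minos kalopo, ruso
```

A number that is followed by another digit before 'engineer' can never start a match, which is why the search skips 123 and settles on 23.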
Use positive lookaheads to match from a digit up to the word engineer.
The regex: (?=\d)(.+)(?=engineer)
Just to get an idea:
import re
pattern = r"(?=\d)(.+)(?=engineer)"
input = [ "\"def grusol defno 1635 minos kalopo, ruso engineer okas puno\"", "\"abc mysql 23 rufos kanso engineer\"" ]
matches = []
for item in input:
    matches.append(re.findall(pattern, item))
Outputting:
[['1635 minos kalopo, ruso '], ['23 rufos kanso ']]
Have a look at this site. It is great for playing around with regex, and it explains every step.
Here is a solution to your problem: link
I would like to return a list that contains the span and values of all matches that exists in a sentence, but I don't know the exact python syntax.
import re
sens = 'SCOTT GORRAN 08:09:17 thx 08:19:33 where do you have eur m17->m27? 6s. imm rolls. KATHERINE DOUGLAS 08:19:47 sec BARRY CLARK 08:20:16 84 SCOTT GORRAN 08:21:25 offer usd 25k/bp pls'
pattern = r'([\+-]?((\d+(\,\d{3})+)(\.\d+)?)|\d+(\.\d+)?|(\.\d+))'
compiled = re.compile(pattern, re.IGNORECASE)
list_of_matches =[]
for match in compiled.match(sens):
    match_tuple = (match.span(), match.value())
    list_of_matches.append(match_tuple)
output should be something like this:
[((start,end),value) ... ]
[((13,15),08) , ((16,18),09), ((26,28), 08), ((29,31), 19) ....]
Anyone know how I can accomplish this?
Thanks
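One possible way to get that list, as a sketch rather than a definitive answer: re.finditer yields a match object for every match (compiled.match only tries the start of the string and returns a single match or None), and match.group() gives the matched text (match objects have no value() method). The sentence is shortened here for readability; the pattern is the one from the question, and the values come back as strings.

```python
import re

# Shortened version of the sentence from the question.
sens = 'SCOTT GORRAN 08:09:17 thx 08:19:33'
pattern = r'([\+-]?((\d+(\,\d{3})+)(\.\d+)?)|\d+(\.\d+)?|(\.\d+))'
compiled = re.compile(pattern)

# One ((start, end), value) tuple per match, in order of appearance.
list_of_matches = [(m.span(), m.group()) for m in compiled.finditer(sens)]
print(list_of_matches)
# [((13, 15), '08'), ((16, 18), '09'), ((19, 21), '17'),
#  ((26, 28), '08'), ((29, 31), '19'), ((32, 34), '33')]
```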
I am trying to get all the digits from the following string after the word classes (or its variations)
Accepted for all the goods and services in classes 16 and 41.
expected output:
16
41
I have multiple strings which follow this pattern, and some others such as:
classes 5 et 30 # expected output 5, 30
class(es) 32,33 # expected output 32, 33
class 16 # expected output 16
Here is what I have tried so far: https://regex101.com/r/eU7dF6/3
(class[\(es\)]*)([and|et|,|\s]*(\d{1,}))+
But I am able to get only the last matched digit i.e. 41 in the above example.
I suggest grabbing all the substrings with numbers after class or classes/class(es), and then getting all the numbers from those:
import re
p = re.compile(r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*\d+)+')
test_str = "Accepted for all the goods and services in classes 16 and 41."
results = [re.findall(r"\d+", x) for x in p.findall(test_str)]
print([x for l in results for x in l])
# => ['16', '41']
See IDEONE demo
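The same two-step approach can be wrapped in a function and checked against the other variations listed in the question. This is only a sketch; the name class_numbers is hypothetical.

```python
import re

# Step 1 pattern: "class", "classes" or "class(es)" followed by a run of numbers
# separated by "and", "et", commas or whitespace.
CLASS_RUN = re.compile(r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*\d+)+')

def class_numbers(text):
    """Return every number that follows a class/classes/class(es) keyword."""
    # Step 2: pull the individual numbers out of each matched run.
    return [num for run in CLASS_RUN.findall(text) for num in re.findall(r'\d+', run)]

samples = [
    "Accepted for all the goods and services in classes 16 and 41.",
    "classes 5 et 30",
    "class(es) 32,33",
    "class 16",
]
for sample in samples:
    print(sample, '->', class_numbers(sample))
```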
As the \G construct is not supported by the Python re module, and you cannot access the capture stack with it either, your approach cannot work there.
However, you can do it the way you intended with the PyPI regex module.
>>> import regex
>>> test_str = "Accepted for all the goods and services in classes 16 and 41."
>>> rx = r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*(?P<num>\d+))+'
>>> res = []
>>> for x in regex.finditer(rx, test_str):
...     res.extend(x.captures("num"))
...
>>> print(res)
['16', '41']
You can do it in 2 steps. The regex engine remembers only the last capture of a repeated group.
import ast
import re
x = """Accepted for all the goods and services in classes 16 and 41."""
print(re.findall(r"\d+", re.findall(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)", x)[0]))
Output:['16', '41']
If you don't want strings, use
print(list(map(ast.literal_eval, re.findall(r"\d+", re.findall(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)", x)[0]))))
Output:[16, 41]
If you have to do it in one regex, use the regex module:
import ast
import regex
x = """Accepted for all the goods and services in classes 16 and 41."""
print([ast.literal_eval(i) for i in regex.findall(r"class[\(es\)]*|\G(?:and|et|,|\s)*(\d+)", x, regex.VERSION1) if i])
Output:[16, 41]
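The point about a repeated group keeping only its last capture can be seen directly with the standard re module. A minimal sketch, using a simplified pattern that is an assumption for illustration and not the one from the question:

```python
import re

text = "classes 16 and 41"
# The capturing group (\d+) sits inside a repeated non-capturing group.
match = re.search(r'class(?:es)?(?:\s*(?:and|et|,)?\s*(\d+))+', text)

print(match.group(0))  # prints: classes 16 and 41  (the whole match covers both numbers)
print(match.group(1))  # prints: 41  (only the last repetition is kept)
```

The whole match spans both numbers, but group 1 is overwritten on every repetition, which is why a second pass with re.findall(r"\d+", ...) is needed with re.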
I am trying to replace occurrences of the word 'brunch' with 'BRUNCH'. I am using a regex which correctly identifies the occurrence, but when I try to use re.sub it is replacing more text than identified with re.findall. The regex that I am using is:
reg = re.compile(r'(?:^|\.)(?![^.]*saturday)(?![^.]*sunday)(?![^.]*weekend)[^.]*(brunch)',re.IGNORECASE)
The string is
str = 'Valid only for dine-in January 2 - March 31, 2015. Excludes brunch, happy hour, holidays, and February 13 - 15, 2015.'
I want it to produce:
'Valid only for dine-in January 2 - March 31, 2015. Excludes BRUNCH, happy hour, holidays, and February 13 - 15, 2015.'
The steps:
>>> reg.findall(str)
['brunch']
>>> reg.sub('BRUNCH',str)
'Valid only for dine-in January 2 - March 31, 2015BRUNCH, happy hour, holidays, and February 13 - 15, 2015.'
Edit:
The final solution that I used was:
reg = re.compile(r'((?:^|\.))(?![^.]*saturday)(?![^.]*sunday)(?![^.]*weekend)([^.]*)(brunch)',re.IGNORECASE)
reg.sub(r'\g<1>\g<2>BRUNCH',str)
For re.sub, use
(^|\.)(?![^.]*saturday)(?![^.]*sunday)(?![^.]*weekend)([^.]*)(brunch)
Replace with \1\2BRUNCH. See the demo:
https://regex101.com/r/eZ0yP4/16
Through regex:
(^|\.)(?![^.]*saturday)(?![^.]*sunday)(?![^.]*weekend)([^.]*)brunch
Replace the matched characters with \1\2BRUNCH.
Why does it match more than brunch?
Because your regex actually does match more than brunch.
See link on how the regex matches.
Why doesn't it show in findall?
Because you have wrapped only brunch in parentheses:
>>> reg = re.compile(r'(?:^|\.)(?![^.]*saturday)(?![^.]*sunday)(?![^.]*weekend)[^.]*(brunch)',re.IGNORECASE)
>>> reg.findall(str)
['brunch']
After wrapping the entire ([^.]*brunch) in parentheses:
>>> reg = re.compile(r'(?:^|\.)(?![^.]*saturday)(?![^.]*sunday)(?![^.]*weekend)([^.]*brunch)',re.IGNORECASE)
>>> reg.findall(str)
[' Excludes brunch']
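When the pattern contains capturing groups, re.findall returns only the captured text and ignores the parts of the match that are not captured.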
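Putting the pieces together, here is a runnable sketch of the difference between what findall reports and what sub replaces. It captures everything before brunch in a single group (a slight simplification of the three-group patterns above, so treat it as one possible variant), and the helper name upper_brunch is hypothetical.

```python
import re

text = ('Valid only for dine-in January 2 - March 31, 2015. '
        'Excludes brunch, happy hour, holidays, and February 13 - 15, 2015.')

# Group 1 holds the sentence start and everything before "brunch",
# so it can be written back unchanged in the replacement.
reg = re.compile(
    r'((?:^|\.)(?![^.]*saturday)(?![^.]*sunday)(?![^.]*weekend)[^.]*)brunch',
    re.IGNORECASE)

def upper_brunch(s):
    """Upper-case 'brunch' in sentences that do not mention saturday/sunday/weekend."""
    return reg.sub(r'\g<1>BRUNCH', s)

print(reg.findall(text))   # ['. Excludes ']  (findall shows only the captured group)
print(upper_brunch(text))  # the second sentence now reads "Excludes BRUNCH, ..."
```

re.sub always replaces the whole match, so whatever precedes brunch has to be captured and put back in the replacement string.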