extracting python .finditer() span=( ) results [duplicate] - python

This question already has answers here:
Python Regex - How to Get Positions and Values of Matches
(4 answers)
Closed 4 years ago.
After using .finditer() on a string, I want to extract the indices from each match object's span=(xx, xx) and use them in a print(search_text[xx:xx]) statement.
How would I extract the locations of the search results?
matches = search_pattern.finditer(search_text)
print(search_text[xx:xx]) # need to find a way to get the slice indexes

You can use the span() method of each match object, which returns a (start, end) tuple:
matches = search_pattern.finditer(search_text)
print([m.span() for m in matches])
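A minimal sketch of turning each span into a slice. The question does not show its own text or pattern, so the sample input here is an assumption borrowed from the example in the next answer. Note that finditer returns a one-shot iterator: after the list comprehension above has consumed matches, iterating over matches again yields nothing, so call finditer again for a second pass.

```python
import re

# Sample input, assumed for illustration (same data as the next answer's example)
search_text = "a1b2c3d4"
search_pattern = re.compile(r"[a-z]")

spans = []
slices = []
# finditer returns a one-shot iterator, so call it again for each new pass
for m in search_pattern.finditer(search_text):
    start, end = m.span()  # equivalent to (m.start(), m.end())
    spans.append((start, end))
    slices.append(search_text[start:end])  # same text as m.group()
    print(search_text[start:end])

print(spans)   # [(0, 1), (2, 3), (4, 5), (6, 7)]
print(slices)  # ['a', 'b', 'c', 'd']
```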

I think your question might have already been answered. Look here: Python Regex - How to Get Positions and Values of Matches
Peter Hoffmann gave this answer (which I linked above):
import re
p = re.compile("[a-z]")
for m in p.finditer('a1b2c3d4'):
    print(m.start(), m.group())
Please let me know if this does not help.

Related

why python regex is not finding numbers? [duplicate]

This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 2 years ago.
I'm trying to find numbers in a string.
import re
text = "42 ttt 1,234 uuu 6,789,001"
finder = re.compile(r'\d{1,3}(,\d{3})*')
print(re.findall(finder, text))
It returns this:
['', ',234', ',001']
What's wrong with my regex?
How can I get ['42', '1,234', '6,789,001']?
Note: I get the correct result at https://regexr.com.
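Parentheses (...) define capturing groups, and when a pattern contains capturing groups, re.findall returns what the groups captured rather than the whole match.
In your case, the only group is (,\d{3}), and a repeated group keeps only its last repetition, so findall gives '' for 42 (the group never matched), ',234' for 1,234 and ',001' for 6,789,001. Instead, you can capture the whole number by putting a group around everything and making the parentheses you need for * non-capturing with a leading ?:, like so:
finder = re.compile(r'(\d{1,3}(?:,\d{3})*)')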
This gives the correct result:
>>> print(re.findall(finder, text))
['42', '1,234', '6,789,001']
Alternatively, you can change your finder to a pattern with no capturing groups, so that findall returns the whole matches:
finder = re.compile(r'\d+\,?\d+,?\d*')
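A short sketch comparing the approaches on the question's text. A pattern with only non-capturing groups makes findall return whole matches; alternatively, finditer lets you keep the original capturing pattern and read m.group(0). The last line uses an assumed extra input, "7 and 42", to show that \d+\,?\d+,?\d* needs at least two digits and therefore skips a single-digit number.

```python
import re

text = "42 ttt 1,234 uuu 6,789,001"

# No capturing groups at all: findall returns the whole match
no_groups = re.findall(r"\d{1,3}(?:,\d{3})*", text)
print(no_groups)  # ['42', '1,234', '6,789,001']

# Original capturing pattern, but reading group(0) (the whole match) from finditer
whole = [m.group(0) for m in re.finditer(r"\d{1,3}(,\d{3})*", text)]
print(whole)  # ['42', '1,234', '6,789,001']

# \d+\,?\d+,?\d* requires two or more digits, so the lone "7" is not found
two_digit_only = re.findall(r"\d+\,?\d+,?\d*", "7 and 42")
print(two_digit_only)  # ['42']
```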

python regex - extract value from string [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I would like to know how I can use regular expressions to get all the characters after the last comma in a string. In the example below, I would like to get the value "CA 0.810" into a variable:
prue ="VA=-0.850,0.800;CA=-0.863,0.800;SP=-0.860,0.810;MO=-0.860,0.810;SUN=MO -0.850,CA 0.810"
So far, I have the below code:
test = re.findall('([0-9]+)$',prue)
print(test)
However, I only get this output:
['810']
Could you please advise how I can get "CA 0.810" into the test variable?
You can do this with the str.split method. From the docs, it will:
Return a list of the words in the string, using sep as the delimiter string.
So if you take your string:
prue = "VA=-0.850,0.800;CA=-0.863,0.800;SP=-0.860,0.810;MO=-0.860,0.810;SUN=MO -0.850,CA 0.810"
you can do:
prue.split(",")
which will return a list of the strings split by the commas:
['VA=-0.850', '0.800;CA=-0.863', '0.800;SP=-0.860', '0.810;MO=-0.860', '0.810;SUN=MO -0.850', 'CA 0.810']
So if you just want the last item ('CA 0.810') in a variable named test, you can take the last element of the list by indexing with -1:
test = prue.split(",")[-1]
test is now: 'CA 0.810'
Hope this helps!
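Since the question asked for a regex: the original pattern returned ['810'] because [0-9]+ matches digits only, so it stops at the '.' in 0.810. A sketch of a regex that matches the run of non-comma characters at the end of the string, plus two non-regex equivalents that only split at the last comma, assuming the prue string from the question:

```python
import re

prue = "VA=-0.850,0.800;CA=-0.863,0.800;SP=-0.860,0.810;MO=-0.860,0.810;SUN=MO -0.850,CA 0.810"

# [^,]*$ = the non-comma characters that end the string; it always matches
# (at worst the empty string), so calling .group() directly is safe here
test = re.search(r"[^,]*$", prue).group()
print(test)  # CA 0.810

# Non-regex equivalents: split once from the right, or partition on the last comma
last_rsplit = prue.rsplit(",", 1)[-1]
last_rpartition = prue.rpartition(",")[2]
print(last_rsplit, last_rpartition)
```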

REGex in python does not extract right [duplicate]

This question already has answers here:
Python - re.findall returns unwanted result
(4 answers)
Closed 6 years ago.
test = """1d48bac (TAIL, ticket: TAG-AB123-6, origin/master) Took example of 123
6f2c5f9 (ticket: TAG-CD456) Took example of 456
9aa5436 (ticket: TAG-EF567-3) Took example of 6789"""
I want to write a regex in Python that extracts just the tag, i.e. the output should be
['TAG-AB123-6', 'TAG-CD456', 'TAG-EF567-3']
I tried this regex:
print(re.findall(r"TAG-[A-Z]{0,9}\d{0,5}(-\d{0,2})?", test))
but it gives me
['-6', '', '-3']
What am I doing wrong?
Your optional capturing group needs to be made a non-capturing one:
>>> print(re.findall(r"TAG-[A-Z]{0,9}\d{0,5}(?:-\d{0,2})?", test))
['TAG-AB123-6', 'TAG-CD456', 'TAG-EF567-3']
When the pattern has capturing groups, findall returns what the groups captured (a tuple per match if there is more than one group); only when there are no capturing groups does it return the whole matches.
In addition, note that you can take advantage of this behaviour (re.findall returning the captured groups, if any, instead of the whole match). This lets you describe the context around the target substring and extract only the part you want:
>>> re.findall(r'ticket: ([^,)]*)', test)
['TAG-AB123-6', 'TAG-CD456', 'TAG-EF567-3']
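A sketch contrasting findall and finditer on the question's test string. finditer yields match objects, and m.group(0) is always the whole match however many groups the pattern has, so the original pattern works unchanged. The two-group pattern at the end is an illustrative assumption, there only to show the tuple result.

```python
import re

test = """1d48bac (TAIL, ticket: TAG-AB123-6, origin/master) Took example of 123
6f2c5f9 (ticket: TAG-CD456) Took example of 456
9aa5436 (ticket: TAG-EF567-3) Took example of 6789"""

# The question's pattern, with its capturing group left in place
pattern = r"TAG-[A-Z]{0,9}\d{0,5}(-\d{0,2})?"

# One group: findall returns only that group's text ('' when it did not participate)
groups_only = re.findall(pattern, test)
print(groups_only)  # ['-6', '', '-3']

# finditer yields match objects; group(0) is the entire match
whole = [m.group(0) for m in re.finditer(pattern, test)]
print(whole)  # ['TAG-AB123-6', 'TAG-CD456', 'TAG-EF567-3']

# Two groups: findall returns one tuple of groups per match
pairs = re.findall(r"TAG-([A-Z]+)(\d+)", test)
print(pairs)  # [('AB', '123'), ('CD', '456'), ('EF', '567')]
```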

Splitting two concatenated terms in python [duplicate]

This question already has answers here:
Split a string at uppercase letters
(22 answers)
Closed 6 years ago.
In general, I have a string, say
temp = "ProgramField"
Now I want to split strings like these into two terms (I can identify the two terms by their uppercase characters):
term1 = "Program"
term2 = "Field"
How can I achieve this in Python?
I tried regular expressions and splitting the terms, but nothing gave me the result I expected.
Python code:
re.split("[A-Z][a-z]*", "ProgramField")
Any suggestions?
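You have to include a capturing group, so that re.split keeps the matched words in its result (you then need to drop the empty strings around them):
re.split('([A-Z][a-z]*)', 'ProgramField')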
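A sketch of why the group matters, assuming the "ProgramField" example: without a group, the words themselves are the delimiters and are discarded; with a group they are kept, interleaved with empty strings. re.findall with the same pattern is shown as an alternative that returns just the words.

```python
import re

temp = "ProgramField"

# No group: the matched words are treated as separators and thrown away
print(re.split("[A-Z][a-z]*", temp))  # ['', '', '']

# Capturing group: the separators are kept, surrounded by empty strings
parts = re.split("([A-Z][a-z]*)", temp)
print(parts)  # ['', 'Program', '', 'Field', '']

# Drop the empty strings to get the two terms
term1, term2 = [p for p in parts if p]

# Alternative: findall returns the matching words directly
words = re.findall("[A-Z][a-z]*", temp)
print(words)  # ['Program', 'Field']
```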

Python regex findall works but match does not [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 8 years ago.
I'm testing this in IPython. The variable t is set from text in a dictionary and contains:
u'http://www.amazon.com/dp/B003T0G9GM/ref=wl_it_dp_v_nS_ttl/177-5611794-0982247?_encoding=UTF8&colid=SBGZJRGMR8TA&coliid=I205LCXDIRSLL3'
Using this code:
r = r'amazon\.com/dp/(\w{10})'
m = re.findall(r, t)
the pattern matches correctly and m is [u'B003T0G9GM'].
But using this code:
p = re.compile(r)
m = p.match(t)
m is None.
This appears correct to me after reading this documentation:
https://docs.python.org/2/howto/regex.html#grouping
I also tested the regex here before trying it in IPython:
http://regex101.com/r/gG8eQ2/1
What am I missing?
You should be using search, not match. This is what you should have:
p = re.compile(r)
m = p.search(t)
if m: print(m.group(1)) # gives: B003T0G9GM
match only looks for a match at the beginning of the string; search scans the whole string.
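A sketch of the difference on the question's URL. Pattern.match anchors at the start position (0 by default), and t begins with "http://", so it fails; search tries every position. As a further illustration, Pattern.match also accepts a pos argument, so it succeeds when told to start where the pattern begins (computing that start with str.index is an assumption made only for this demo).

```python
import re

t = ("http://www.amazon.com/dp/B003T0G9GM/ref=wl_it_dp_v_nS_ttl/"
     "177-5611794-0982247?_encoding=UTF8&colid=SBGZJRGMR8TA&coliid=I205LCXDIRSLL3")
p = re.compile(r"amazon\.com/dp/(\w{10})")

# match() only tries position 0, where the string starts with "http://"
no_match = p.match(t)
print(no_match)  # None

# search() scans forward until the pattern matches somewhere
m = p.search(t)
asin = m.group(1) if m else None
print(asin)  # B003T0G9GM

# match() with pos: anchored at the given index instead of at 0
from_pos = p.match(t, t.index("amazon")).group(1)
print(from_pos)  # B003T0G9GM
```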
