When using "re.search", how do I search for the second instance rather than the first? - python

re.search looks for the first instance of something. In the following code, "\t" appears twice. Is there a way to make it skip forward to the second instance?
code = ['69.22\t82.62\t134.549\n']
list = []
text = code
m = re.search('\t(.+?)\n', text)
if m:
found = m.group(1)
list.append(found)
result:
list = ['82.62\t134.549']
expected:
list = ['134.549']

This modified version of your expression does return the desired output:
import re
code = '69.22\t82.62\t134.549\n'
print(re.findall(r'.*\t(.+?)\n', code))
Output
['134.549']
I'm though guessing that maybe you'd like to design an expression, somewhat similar to:
(?<=[\t])(.+?)(?=[\n])
DEMO

There is only one solution for greater than the "second" tab.
You can do it like this :
^(?:[^\t]*\t){2}(.*?)\n
Explained
^ # BOS
(?: # Cluster
[^\t]* # Many not tab characters
\t # A tab
){2} # End cluster, do 2 times
( .*? ) # (1), anything up to
\n # first newline
Python code
>>> import re
>>> text = '69.22\t82.62\t134.549\n'
>>> m = re.search('^(?:[^\t]*\t){2}(.*?)\n', text)
>>> if m:
>>> print( m.group(1) )
...
134.549
>>>

Related

Why the regex to match a project tag is not working?

Can anyone provide guidance on why the following regex is not working?
project = 'AirPortFamily'
line = 'AirPortFamily-1425.9:'
if re.findall('%s-(\d+):'%project,line):
print line
EXPECTED OUTPUT:-
AirPortFamily-1425.9
You should match the optional groups of digits preceded by a dot:
if re.findall(r'(%s-\d+(?:\.\d+)*):'%project,line):
Answer by Brandon looks good.
But in case there is a condition like "For a tag to be valid, it must end with a colon(:)"
In order to cover that condition, modifying Brandon's answer a little
project = 'AirPortFamily'
line = 'AirPortFamily-1425.9:'
matches = re.findall('%s-\d+\.+\d+\.*\d+:$'%project,line)
if matches:
for elem in matches:
print elem.split(':')[0]
Here is its working
#Matching lines with colon(:) at the end
>>> import re
>>> project = 'AirPortFamily'
>>> line = 'AirPortFamily-1425.9:'
>>> matches = re.findall('%s-\d+\.+\d+\.*\d+:$'%project,line)
>>> if matches:
... for elem in matches:
... print elem.split(':')[0]
...
AirPortFamily-1425.9 #Look, the output is the way you want.
#Below snippet with same regex and different line content (without :) doesn't match it
>>> line = 'AirPortFamily-1425.9'
>>> matches = re.findall('%s-\d+\.+\d+\.*\d+:$'%project,line)
>>> if matches:
... for elem in matches:
... print elem.split(':')[0]
...
>>> #Here, no output means no match
Your regex is missing the . after the first set of numbers. Here's a working example:
project = 'AirPortFamily'
line = 'AirPortFamily-1425.9:'
matches = re.findall('%s-\d+\.\d+'%project,line)
if matches:
print matches
On the possibilities with 1.1.1.1.1 I'm getting the : so just stripping it from the results
if re.findall(r'%s-[\d+\.\d+]+:'%project,line):
print(line.strip(':'))
(xenial)vash#localhost:~/python/stack_overflow$ python3.7 formats.py
AirPortFamily-1425.9.1.1.1
This is my regex,my test on regex101 edit again
import re
project = 'AirPortFamily'
line = 'AirPortFamily-1425.9:AirPortfamily-14.5.9AirPortFamily-14.2.5.9:'
result = re.findall('%s[0-9.-]+:'%project,line) #this dose not cantain ':' [s[:-1] for s in re.findall('%s[0-9.-]+:'%project,line)]
if result:
for each in result:
print (each)

Python - remove parts of a string

I have many fill-in-the-blank sentences in strings,
e.g. "6d) We took no [pains] to hide it ."
How can I efficiently parse this string (in Python) to be
"We took no to hide it"?
I also would like to be able to store the word in brackets (e.g. "pains") in a list for use later. I think the regex module could be better than Python string operations like split().
This will give you all the words inside the brackets.
import re
s="6d) We took no [pains] to hide it ."
matches = re.findall('\[(.*?)\]', s)
Then you can run this to remove all bracketed words.
re.sub('\[(.*?)\]', '', s)
just for fun (to do the gather and substitution in one iteration)
matches = []
def subber(m):
matches.append(m.groups()[0])
return ""
new_text = re.sub("\[(.*?)\]",subber,s)
print new_text
print matches
import re
s = 'this is [test] string'
m = re.search(r"\[([A-Za-z0-9_]+)\]", s)
print m.group(1)
Output
'test'
For your example you could use this regex:
(.*\))(.+)\[(.+)\](.+)
You will get four groups that you can use to create your resulting string and save the 3. group for later use:
6d)
We took no
pains
to hide it .
I used .+ here because I don't know if your strings always look like your example. You can change the .+ to alphanumeric or sth. more special to your case.
import re
s = '6d) We took no [pains] to hide it .'
m = re.search(r"(.*\))(.+)\[(.+)\](.+)", s)
print(m.group(2) + m.group(4)) # "We took no to hide it ."
print(m.group(3)) # pains
import re
m = re.search(".*\) (.*)\[.*\] (.*)","6d) We took no [pains] to hide it .")
if m:
g = m.groups()
print g[0] + g[1]
Output :
We took no to hide it .

Using regex to extract information from string

I am trying to write a regex in Python to extract some information from a string.
Given:
"Only in Api_git/Api/folder A: new.txt"
I would like to print:
Folder Path: Api_git/Api/folder A
Filename: new.txt
After having a look at some examples on the re manual page, I'm still a bit stuck.
This is what I've tried so far
m = re.match(r"(Only in ?P<folder_path>\w+:?P<filename>\w+)","Only in Api_git/Api/folder A: new.txt")
print m.group('folder_path')
print m.group('filename')
Can anybody point me in the right direction??
Get the matched group from index 1 and 2 using capturing groups.
^Only in ([^:]*): (.*)$
Here is demo
sample code:
import re
p = re.compile(ur'^Only in ([^:]*): (.*)$')
test_str = u"Only in Api_git/Api/folder A: new.txt"
re.findall(p, test_str)
If you want to print in the below format then try with substitution.
Folder Path: Api_git/Api/folder A
Filename: new.txt
DEMO
sample code:
import re
p = re.compile(ur'^Only in ([^:]*): (.*)$')
test_str = u"Only in Api_git/Api/folder A: new.txt"
subst = u"Folder Path: $1\nFilename: $2"
result = re.sub(p, subst, test_str)
Your pattern: (Only in ?P<folder_path>\w+:?P<filename>\w+) has a few flaws in it.
The ?P construct is only valid as the first bit inside a parenthesized expression,
so we need this.
(Only in (?P<folder_path>\w+):(?P<filename>\w+))
The \w character class is only for letters and underscores. It won't match / or ., for example. We need to use a different character class that more closely aligns with requirements. In fact, we can just use ., the class of nearly all characters:
(Only in (?P<folder_path>.+):(?P<filename>.+))
The colon has a space after it in your example text. We need to match it:
(Only in (?P<folder_path>.+): (?P<filename>.+))
The outermost parentheses are not needed. They aren't wrong, just not needed:
Only in (?P<folder_path>.+): (?P<filename>.+)
It is often convenient to provide the regular expression separate from the call to the regular expression engine. This is easily accomplished by creating a new variable, for example:
regex = r'Only in (?P<folder_path>.+): (?P<filename>.+)'
... # several lines later
m = re.match(regex, "Only in Api_git/Api/folder A: new.txt")
The above is purely for the convenience of the programmer: it neither saves nor squanders time or memory space. There is, however, a technique that can save some of the time involved in regular expressions: compiling.
Consider this code segment:
regex = r'Only in (?P<folder_path>.+): (?P<filename>.+)'
for line in input_file:
m = re.match(regex, line)
...
For each iteration of the loop, the regular expression engine must interpret the regular expression and apply it to the line variable. The re module allows us to separate the interpretation from the application; we can interpret once but apply several times:
regex = re.compile(r'Only in (?P<folder_path>.+): (?P<filename>.+)')
for line in input_file:
m = re.match(regex, line)
...
Now, your original program should look like this:
regex = re.compile(r'Only in (?P<folder_path>.+): (?P<filename>.+)')
m = re.match(regex, "Only in Api_git/Api/folder A: new.txt")
print m.group('folder_path')
print m.group('filename')
However, I'm a fan of using comments to explain regular expressions. My version, including some general cleanup, looks like this:
import re
regex = re.compile(r'''(?x) # Verbose
Only\ in\ # Literal match
(?P<folder_path>.+) # match longest sequence of anything, and put in 'folder_path'
:\ # Literal match
(?P<filename>.+) # match longest sequence of anything and put in 'filename'
''')
with open('diff.out') as input_file:
for line in input_file:
m = re.match(regex, line)
if m:
print m.group('folder_path')
print m.group('filename')
It really depends on the limitation of the input, if this is the only input this will do the trick.
^Only in (?P<folder_path>[a-zA-Z_/ ]*): (?P<filename>[a-z]*.txt)$

Changing only one letter when there are a lot simular in string

Suppose I have the following string:
I.like.football
sky.is.blue
I need to make a loop that changes the last '.' to '_' so it looks this way
I.like_football
sky.is_blue
They are all simular style(3 words, 3 dots).
How to do that in a loop?
str='I.like.football'
str=str.rsplit('.',1) #this split from right but only first '.'
print '_'.join(str) # then join it
#output I.like_football
in single line
str='_'.join(str.rsplit('.',1))
str.replace lets you specify the number of replacements. Unfortunately there is no str.rreplace, so you'd need to reverse the string before and after :) eg.
>>> def f(s):
... return s[::-1].replace(".", "_", 1)[::-1]
...
>>> f('I.like.football')
'I.like_football'
>>> f('sky.is.blue')
'sky.is_blue'
Alternatively you could use one of str.rpartition, str.rsplit, str.rfind
This doesn't even need to run in a loop:
import re
p = re.compile(ur'\.(?=[^\.]+$)', re.IGNORECASE | re.MULTILINE)
test_str = u"I.like.football\nsky.is.blue"
subst = u"_"
result = re.sub(p, subst, test_str)

How can I get part of regex match as a variable in python?

In Perl it is possible to do something like this (I hope the syntax is right...):
$string =~ m/lalala(I want this part)lalala/;
$whatIWant = $1;
I want to do the same in Python and get the text inside the parenthesis in a string like $1.
If you want to get parts by name you can also do this:
>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcom Reynolds")
>>> m.groupdict()
{'first_name': 'Malcom', 'last_name': 'Reynolds'}
The example was taken from the re docs
See: Python regex match objects
>>> import re
>>> p = re.compile("lalala(I want this part)lalala")
>>> p.match("lalalaI want this partlalala").group(1)
'I want this part'
import re
astr = 'lalalabeeplalala'
match = re.search('lalala(.*)lalala', astr)
whatIWant = match.group(1) if match else None
print(whatIWant)
A small note: in Perl, when you write
$string =~ m/lalala(.*)lalala/;
the regexp can match anywhere in the string. The equivalent is accomplished with the re.search() function, not the re.match() function, which requires that the pattern match starting at the beginning of the string.
import re
data = "some input data"
m = re.search("some (input) data", data)
if m: # "if match was successful" / "if matched"
print m.group(1)
Check the docs for more.
there's no need for regex. think simple.
>>> "lalala(I want this part)lalala".split("lalala")
['', '(I want this part)', '']
>>> "lalala(I want this part)lalala".split("lalala")[1]
'(I want this part)'
>>>
import re
match = re.match('lalala(I want this part)lalala', 'lalalaI want this partlalala')
print match.group(1)
import re
string_to_check = "other_text...lalalaI want this partlalala...other_text"
p = re.compile("lalala(I want this part)lalala") # regex pattern
m = p.search(string_to_check) # use p.match if what you want is always at beginning of string
if m:
print m.group(1)
In trying to convert a Perl program to Python that parses function names out of modules, I ran into this problem, I received an error saying "group" was undefined. I soon realized that the exception was being thrown because p.match / p.search returns 0 if there is not a matching string.
Thus, the group operator cannot function on it. So, to avoid an exception, check if a match has been stored and then apply the group operator.
import re
filename = './file_to_parse.py'
p = re.compile('def (\w*)') # \w* greedily matches [a-zA-Z0-9_] character set
for each_line in open(filename,'r'):
m = p.match(each_line) # tries to match regex rule in p
if m:
m = m.group(1)
print m

Categories