Issue when using array slicing - python

I am an intermediate Python programmer. In my experiment, I run a Linux command that outputs something like this:
OFPST_TABLE reply (xid=0x2):
table 0 ("classifier"):
active=1, lookup=41, matched=4
max_entries=1000000
matching:
in_port: exact match or wildcard
eth_src: exact match or wildcard
eth_dst: exact match or wildcard
eth_type: exact match or wildcard
vlan_vid: exact match or wildcard
vlan_pcp: exact match or wildcard
ip_src: exact match or wildcard
ip_dst: exact match or wildcard
nw_proto: exact match or wildcard
nw_tos: exact match or wildcard
tcp_src: exact match or wildcard
tcp_dst: exact match or wildcard
My goal is to collect the value of the active= parameter, which varies over time (in this case it is just 1). I use the following slicing, but it does not work:
string = sw.cmd('ovs-ofctl dump-tables ' + sw.name) # trigger the sh command
count = count + int(string[string.rfind("=") + 1:])
I think I am using slicing incorrectly here. I have tried many ways but still get nothing. Can someone help me extract the value of the active= parameter from this string?
Thank you very much :)

How about regex?
import re
count += int(re.search(r'active\s*=\s*(\d+)', string).group(1))

1) Use regular expressions:
import re
m = re.search(r'active=(\d+)', ' active=1, lookup=41, matched=4')
print(m.group(1))
2) str.rfind returns the highest index at which the substring is found, so it finds the rightmost = (the one in matched=4 on that line) rather than the one after active. That is not what you want.
3) Simple slicing won't help on its own, because you would also need to know the length of the active value. Overall it is not the best tool for this task.
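To make the regex suggestion concrete, here is a minimal sketch that runs against an abbreviated copy of the output from the question. The helper name active_count and the variable output are made up for illustration; in the real script the text would come from sw.cmd(...).

```python
import re

# Abbreviated copy of the sample output shown in the question.
output = '''OFPST_TABLE reply (xid=0x2):
table 0 ("classifier"):
active=1, lookup=41, matched=4
max_entries=1000000
'''

def active_count(text):
    """Return the integer after 'active=', or None if it is absent."""
    match = re.search(r'active=(\d+)', text)
    return int(match.group(1)) if match else None

count = active_count(output)
print(count)
```

Returning None when nothing matches avoids the AttributeError you would get from calling .group() on a failed search.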

Python Regex Search Returns Positive Non-Integer Number <1 As Empty String

I am using the Python re module to search a string. An example of a string of interest is "*MBps 2.57".
I am using the following code:
temp_string = re.search(r'MBps, \d*\.?\d*', line)
if temp_string is not None:
    temp_number = re.split(' ', temp_string.group(), 1)
I want to find instances where MBps is > 0, then take that number and process it.
The code works fine as long as the number after MBps is > 1. For example, if it's 'MBps 182.57', the match object, when converted to a string, shows 'MBps, 182.57'.
However, when the number after MBps is < 1, for example 'MBps 0.31', the returned match object shows 'MBps' but no number. It's just an empty string following the first match.
I have tried different matching functions (re.match, re.findall), but none seemed to work correctly. On the regex101 testing site the regular expression works, but I can't get Python's re module to match that behavior.
Any ideas on why it's happening and how to correct it?
Thanks
I would use re.findall here:
import re
inp = "The first speed is 3.14 MBps and the second is 5.43 MBps"
matches = re.findall(r'\b(\d+(?:\.\d+)?) MBps\b', inp)
print(matches)
This prints:
['3.14', '5.43']
OK, I found a way to make this work.
I changed the code to:
temp_string = re.search(r'MBps, [0-9\.]+', line)
if temp_string is not None:
    temp_number = re.split(' ', temp_string.group(), 1)
That worked to capture all the numbers. I think being explicit in Regex matching rather than just \d+ or \d* makes this work better.
Thanks
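One thing worth noting about the original pattern: every piece of \d*\.?\d* is optional, so it can succeed without matching any digits at all. Whether that explains the behaviour seen here is not clear from the post, but requiring at least one digit and capturing the number in a group removes the ambiguity and the need for re.split. The sketch below assumes the line contains the label, an optional comma, whitespace, and then the number; the names MBPS and mbps_value are made up for illustration.

```python
import re

# Assumed format: 'MBps', an optional comma, optional whitespace, then a number.
MBPS = re.compile(r'MBps,?\s*(\d+(?:\.\d+)?)')

def mbps_value(line):
    """Return the number after 'MBps' as a float, or None if there is none."""
    match = MBPS.search(line)
    return float(match.group(1)) if match else None

for line in ['*MBps 2.57', 'MBps 0.31', 'MBps, 182.57', 'no rate here']:
    print(line, '->', mbps_value(line))
```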

Search for any number of unknown substrings in place of * in a list of string

First of all, sorry if the title isn't very explicit, it's hard for me to formulate it properly. That's also why I haven't found if the question has already been asked, if it has.
So, I have a list of strings, and I want to perform a "procedural" search, replacing every * in my target substring with any possible substring.
Here is an example:
strList = ['obj_1_mesh',
'obj_2_mesh',
'obj_TMP',
'mesh_1_TMP',
'mesh_2_TMP',
'meshTMP']
searchFor('mesh_*')
# should return: ['mesh_1_TMP', 'mesh_2_TMP']
In this case, where there is just one *, I split the search string on * and use startswith() and/or endswith(), so that's OK.
But I don't know how to do the same thing if there are multiple * in the search string.
So my question is: how do I search for any number of unknown substrings in place of * in a list of strings?
For example:
strList = ['obj_1_mesh',
'obj_2_mesh',
'obj_TMP',
'mesh_1_TMP',
'mesh_2_TMP',
'meshTMP']
searchFor('*_1_*')
# should return: ['obj_1_mesh', 'mesh_1_TMP']
Hope everything is clear enough. Thanks.
Consider using fnmatch, which provides Unix-like file pattern matching. More info here: http://docs.python.org/2/library/fnmatch.html
from fnmatch import fnmatch
strList = ['obj_1_mesh',
'obj_2_mesh',
'obj_TMP',
'mesh_1_TMP',
'mesh_2_TMP',
'meshTMP']
searchFor = '*_1_*'
resultSubList = [x for x in strList if fnmatch(x, searchFor)]
This should do the trick.
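As a small variation on the same idea, the sketch below wraps the filter in a function and uses fnmatch.fnmatchcase, which always compares case-sensitively regardless of platform. The function name search_for mirrors the pseudo-call in the question and is otherwise an assumption.

```python
import fnmatch

str_list = ['obj_1_mesh',
            'obj_2_mesh',
            'obj_TMP',
            'mesh_1_TMP',
            'mesh_2_TMP',
            'meshTMP']

def search_for(pattern, names):
    """Return the names matching a shell-style pattern, case-sensitively."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]

print(search_for('*_1_*', str_list))
print(search_for('mesh_*', str_list))
```

Keep in mind that fnmatch patterns also give ? and [...] a special meaning, so a search string containing those characters will not be treated literally.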
I would use the regular expression package for this if I were you. You'll have to learn a little bit of regex to make correct search queries, but it's not too bad. '.+' is pretty similar to '*' in this case.
import re
def search_strings(str_list, search_query):
    regex = re.compile(search_query)
    result = []
    for string in str_list:
        # match() anchors at the start of the string only
        match = regex.match(string)
        if match is not None:
            result.append(match.group())
    return result
strList= ['obj_1_mesh',
'obj_2_mesh',
'obj_TMP',
'mesh_1_TMP',
'mesh_2_TMP',
'meshTMP']
print(search_strings(strList, '.+_1_.+'))
This should return ['obj_1_mesh', 'mesh_1_TMP']. I tried to replicate the '*_1_*' case. For 'mesh_*' you could make the search_query 'mesh_.+'. Here is the link to the python regex api: https://docs.python.org/2/library/re.html
The simplest way to do this is to use fnmatch, as shown in ma3oun's answer. But here's a way to do it using Regular Expressions, aka regex.
First we transform your searchFor pattern so it uses '.+?' as the "wildcard" instead of '*'. Then we compile the result into a regex pattern object so we can efficiently use it for multiple tests.
For an explanation of regex syntax, please see the docs. But briefly, the dot means any character (on this line), the + means look for one or more of them, and the ? means do non-greedy matching, i.e., match the smallest string that conforms to the pattern rather than the longest (which is what greedy matching does).
import re
strList = ['obj_1_mesh',
'obj_2_mesh',
'obj_TMP',
'mesh_1_TMP',
'mesh_2_TMP',
'meshTMP']
searchFor = '*_1_*'
pat = re.compile(searchFor.replace('*', '.+?'))
result = [s for s in strList if pat.match(s)]
print(result)
output
['obj_1_mesh', 'mesh_1_TMP']
If we use searchFor = 'mesh_*' the result is
['mesh_1_TMP', 'mesh_2_TMP']
Please note that this solution is not robust. If searchFor contains other characters that have special meaning in a regex they need to be escaped. Actually, rather than doing that searchFor.replace transformation, it would be cleaner to just write the pattern using regex syntax in the first place.
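One way to address the robustness concern is sketched below: escape each literal piece with re.escape before joining the pieces back together. This is only an illustration, and it makes two choices that differ from the code above: each * matches zero or more characters (as in fnmatch) rather than one or more, and fullmatch is used so the pattern must cover the whole string. The names compile_wildcard and search_for are made up.

```python
import re

def compile_wildcard(pattern):
    """Turn a '*' wildcard pattern into a compiled regex, escaping the rest."""
    # Escape each literal piece, then put '.*' where each '*' used to be.
    pieces = [re.escape(piece) for piece in pattern.split('*')]
    return re.compile('.*'.join(pieces))

def search_for(pattern, names):
    regex = compile_wildcard(pattern)
    # fullmatch requires the whole name to match, not just a prefix.
    return [name for name in names if regex.fullmatch(name)]

str_list = ['obj_1_mesh', 'obj_2_mesh', 'obj_TMP',
            'mesh_1_TMP', 'mesh_2_TMP', 'meshTMP']

print(search_for('*_1_*', str_list))
print(search_for('mesh_*', str_list))
```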
If the pattern you are looking for always looks like *string* (a single substring with a wildcard on each side), you can just search for the substring with the find function. You'll get something like:
for s in strList:
    if s.find(searchFor) != -1:
        do_something()
If you have more than one substring to look for (like abc*123*test), you will need to look for each one in turn: find the first, then search for the second in the same string starting at the index where you found the first plus its length, and so on.
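The step-by-step search described above could look roughly like this. It is a sketch, not a drop-in replacement for fnmatch: it assumes * matches zero or more characters, that a pattern without a leading * must match from the start of the string, and that one without a trailing * must match to the end. The name wildcard_match is made up.

```python
def wildcard_match(text, pattern):
    """Match text against a pattern in which '*' stands for any substring."""
    parts = pattern.split('*')
    if len(parts) == 1:
        # No wildcard at all: only an exact match counts.
        return text == pattern
    first, middle, last = parts[0], parts[1:-1], parts[-1]
    if not text.startswith(first):
        return False
    pos = len(first)
    for part in middle:
        # Search for each piece after the end of the previous one.
        idx = text.find(part, pos)
        if idx == -1:
            return False
        pos = idx + len(part)
    # The final piece must end the string without overlapping earlier pieces.
    return len(text) - len(last) >= pos and text.endswith(last)

str_list = ['obj_1_mesh', 'obj_2_mesh', 'obj_TMP',
            'mesh_1_TMP', 'mesh_2_TMP', 'meshTMP']
print([s for s in str_list if wildcard_match(s, '*_1_*')])
```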

Python - how to substitute a substring using regex with n occurrences

I have a string with many occurrences of a single pattern, like
a = 'eresQQQutnohnQQQjkhjhnmQQQlkj'
and I have another string like
b = 'rerTTTytu'
I want to substitute the entire second string into the first, using 'QQQ' in a and 'TTT' in b as the reference points, once for each occurrence of 'QQQ'. In this case I expect 3 different results:
'ererTTTytuohnQQQjkhjhnmQQQlkj'
'eresQQQutnrerTTTytujhnmQQQlkj'
'eresQQQutnohnQQQjkhjrerTTTytu'
I've tried using re.sub
re.sub(r'\w{3}QQQ\w{3}', b, a)
but I obtain only the first one, and I don't know how to get the other two solutions.
Edit: As you requested, the two characters surrounding 'QQQ' will be replaced as well now.
I don't know if this is the most elegant or simplest solution for the problem, but it works:
import re
a = 'eresQQQutnohnQQQjkhjhnmQQQlkj'
b = 'rerTTTytu'
# Find all occurrences of ??QQQ?? in a, where ? is any non-whitespace character
matches = [x.start() for x in re.finditer(r'\S{2}QQQ\S{2}', a)]
# Replace each ??QQQ?? with b, one occurrence per result
results = [a[:idx] + re.sub(r'\S{2}QQQ\S{2}', b, a[idx:], 1) for idx in matches]
print(results)
Output
['errerTTTytunohnQQQjkhjhnmQQQlkj',
'eresQQQutnorerTTTytuhjhnmQQQlkj',
'eresQQQutnohnQQQjkhjhrerTTTytuj']
Since you didn't specify the output format, I just put it in a list.
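For comparison, here is a sketch that reproduces the three results listed in the question by replacing three characters on each side of 'QQQ' (the question's own \w{3}QQQ\w{3} pattern) and slicing around each match instead of calling re.sub again. It assumes the occurrences are far enough apart that the matches do not overlap, since finditer only yields non-overlapping matches.

```python
import re

a = 'eresQQQutnohnQQQjkhjhnmQQQlkj'
b = 'rerTTTytu'

pattern = re.compile(r'\w{3}QQQ\w{3}')

# One result per match: keep everything before and after it, put b in between.
results = [a[:m.start()] + b + a[m.end():] for m in pattern.finditer(a)]

for r in results:
    print(r)
```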

Matching optional numbers in regex

This one is probably a simple one, but I could not find an example that's simple enough to understand (sorry, I'm new with RegEx).
I'm writing some Python code to search for any string that matches any of the following examples:
float[20]
float[7532]
float[]
So this is what I have so far:
import re
p = re.compile(r'float\[[0-9]+\]')
print(p.match("float[20]"))
print(p.match("float[7532]"))
print(p.match("float[]"))
The code works great for the first and second scenarios, but not the third (no numbers between brackets). What's the best way to add that condition?
Thanks a lot!
p = re.compile(r'float\[[0-9]*\]')
Putting a * after the character class means 0 or more matches of the character class.
Try
float\[\d*\]
\d is a shortcut for [0-9].
The asterisk matches 0..n (any number) of characters of the character class.
The + operator requires at least one instance of whatever it's applying to, which your third option doesn't have. You want the * operator which is 0 or more. So:
p = re.compile(r'float\[[0-9]*\]')
Try:
import re
p = re.compile(r'float\[[0-9]*\]')
print(p.match("float[20]"))
print(p.match("float[7532]"))
print(p.match("float[]"))
+ is for one or more elements and * is for zero or more elements.
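The difference between the two quantifiers can be checked side by side. This is a small sketch using the three sample strings from the question; the variable names are made up, and fullmatch is used so that trailing text would not be accepted.

```python
import re

plus = re.compile(r'float\[[0-9]+\]')   # at least one digit required
star = re.compile(r'float\[\d*\]')      # digits are optional

samples = ['float[20]', 'float[7532]', 'float[]']

plus_hits = [s for s in samples if plus.fullmatch(s)]
star_hits = [s for s in samples if star.fullmatch(s)]

print(plus_hits)
print(star_hits)
```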

How to extract longest of overlapping groups?

How can I extract the longest of several groups that start the same way?
For example, from a given string, I want to extract the longest match to either CS or CSI.
I tried "(CS|CSI).*", and it returns CS rather than CSI even if CSI is available.
If I use "(CSI|CS).*" then I do get CSI if it's a match, so I guess the solution is to always place the shorter of the overlapping groups after the longer one.
Is there a clearer way to express this with regular expressions? It feels confusing that the result depends on the order in which you list the groups.
No, that's just how it works, at least in Perl-derived regex flavors like Python, JavaScript, .NET, etc.
http://www.regular-expressions.info/alternation.html
As Alan says, the patterns will be matched in the order you specified them.
If you want to match on the longest of overlapping literal strings, you need the longest one to appear first. But you can organize your strings longest-to-shortest automatically, if you like:
>>> '|'.join(sorted('cs csi miami vice'.split(), key=len, reverse=True))
'miami|vice|csi|cs'
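Put together, that ordering trick might look like the sketch below. The helper name longest_first is made up, and re.escape is added on the assumption that the alternatives are meant as literal strings.

```python
import re

def longest_first(words):
    """Compile an alternation that tries longer literals before shorter ones."""
    ordered = sorted(words, key=len, reverse=True)
    return re.compile('|'.join(re.escape(w) for w in ordered))

text = 'AUHDASOHDCSIAAOSLINDASOI'

naive = re.search('CS|CSI', text).group()                    # order as written
fixed = longest_first(['CS', 'CSI']).search(text).group()    # longest first

print(naive, fixed)
```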
I'm intrigued to know the right way of doing this. If it helps at all, you can always build up your regex like this:
import re
string_to_look_in = "AUHDASOHDCSIAAOSLINDASOI"
string_to_match = "CSIABC"
re_to_use = "(" + "|".join([string_to_match[0:i] for i in range(len(string_to_match),0,-1)]) + ")"
re_result = re.search(re_to_use,string_to_look_in)
print(string_to_look_in[re_result.start():re_result.end()])
Similar functionality is present in the Vim editor ("sequence of optionally matched atoms"), where e.g. col\%[umn] matches col in color, colum in columbus and the full column.
I am not aware of similar functionality in Python's re module, but
you can use nested non-capturing groups, each one followed by the ? quantifier, for that:
>>> import re
>>> words = ['color', 'columbus', 'column']
>>> rex = re.compile(r'col(?:u(?:m(?:n)?)?)?')
>>> for w in words: print(rex.findall(w))
...
['col']
['colum']
['column']
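Writing the nested groups by hand gets tedious for longer words, so they can be generated. The sketch below builds the same pattern from a required prefix and an optional tail; the function name optional_tail is made up for illustration.

```python
import re

def optional_tail(prefix, tail):
    """Build prefix followed by nested optional groups, one per tail character."""
    pattern = ''
    # Wrap from the innermost (last) character outwards.
    for ch in reversed(tail):
        pattern = '(?:' + re.escape(ch) + pattern + ')?'
    return re.compile(re.escape(prefix) + pattern)

rex = optional_tail('col', 'umn')
print(rex.pattern)
for w in ['color', 'columbus', 'column']:
    print(rex.findall(w))
```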
