I'm writing a script to test latency between our switches and public IPs using Pexpect and regex.
Here is a sample:
# Connect to a Cisco system just beforehand and go into enable mode
for key in nodes:
    ipaddr_node = nodes[key]["IP Address"]
    print('[|] Ping of %s in progress ...' % ipaddr_node)
    p.sendline("ping %s repeat 20" % ipaddr_node)  # ping the IP 20 times on the Cisco device
    p.expect('#')
    ping = p.before  # get the output before '#'
    print('[+] Ping of %s succeeded' % ipaddr_node)
    place = ping.find('min')  # get the position of 'min' in the output
    regex = ping.replace(ping[:place], "")  # keep only the text from 'min' onwards
    output = re.search(r'\s=\s(?P<min>\d{1,4}.\d{0,3})\/(?P<avg>\d{1,4}.\d{0,3})\/(?P<max>\d{1,4}.\d{0,3})', regex)  # regex to get min, avg and max
    print(output)  # print the match object
    avg = output.group('avg')  # get the value of the 'avg' group
    print('[+] Average time : ' + avg)  # print it
Here is an output example:
('min/avg/max = 33/44/51 ms\r\nRTR-LAB-GRE', '<= string for regex to work on')
(<_sre.SRE_Match object at 0x7f2d68ea11f8>, '<= Regex object')
[+] Average time : 44
('min/avg/max = 41/46/59 ms\r\nRTR-LAB-GRE', '<= string for regex to work on')
(<_sre.SRE_Match object at 0x7f2d68ea1290>, '<= Regex object')
[+] Average time : 46
('min/avg/max = 41/41/42 ms\r\nRTR-LAB-GRE', '<= string for regex to work on')
(<_sre.SRE_Match object at 0x7f2d68ea11f8>, '<= Regex object')
[+] Average time : 41
('min/avg/max = 1/3/9 ms\r\nRTR-LAB-GRE', '<= string for regex to work on')
(None, '<= Regex object')
Traceback (most recent call last):
File "EssaiPexpect.py", line 95, in <module>
avg = output.group('avg')
AttributeError: 'NoneType' object has no attribute 'group'
The dict of IPs to test contains 4 entries.
My nodes variable is a dict containing the IP and other information, but that part definitely works.
Also, my regex variable looks like this every time, even in the failing iteration: min/avg/max = 1/3/9 ms
I'm sure this is something simple, but I can't put my finger on it.
Solution found!
This was a simple mistake in my regex search.
This was the old one: the Regex101 output of the old regex.
This is the new one: the Regex101 output of the new regex.
In short, my first pattern couldn't match the last line because my . wasn't escaped properly.
I just added a proper escape plus an alternation between both possibilities.
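Roughly like this (a sketch of the fixed pattern, keeping the original group names; each value may now be a decimal or a plain integer):
import re

fixed = re.compile(
    r'\s=\s(?P<min>\d{1,4}\.\d{1,3}|\d{1,4})/'
    r'(?P<avg>\d{1,4}\.\d{1,3}|\d{1,4})/'
    r'(?P<max>\d{1,4}\.\d{1,3}|\d{1,4})'
)

m = fixed.search('min/avg/max = 1/3/9 ms')
print(m.group('avg'))  # prints 3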
Thank you for your help.
I have a Cisco ASA with a VPN tunnel configured. I call a CLI command via the API and it returns this multiline string:
\nSession Type: LAN-to-LAN\n\nConnection : 192.168.1.10\nIndex : 11701 IP Addr : 192.168.1.10\nProtocol : IKEv2 IPsecOverNatT\nEncryption : IKEv2: (1)AES256 IPsecOverNatT: (1)AES256\nHashing : IKEv2: (1)SHA256 IPsecOverNatT: (1)SHA256\nBytes Tx : 0 Bytes Rx : 0\nLogin Time : 23:14:43 EST Fri Dec 3 2021\nDuration : 0h:11m:50s\n\n
I can't figure out how to get only the "Bytes Rx" value, i.e. the number beside it. I've tried searching for it like this, but it returns "Bytes Tx":
import re
regex_parse = re.compile(r'[a-zA-Z]+\s[a-zA-Z][a-zA-Z]\s+:\s[0-9]+')
multilinestring = webhook_api_call()
for item in multilinestring:
    a = regex_parse.search(item)
    print(a.group(0))
Output:
Bytes Tx : 0
I only want to get "Bytes Rx" and the number beside it.
Looks like you are trying to parse the result of sh vpn-sessiondb l2l from a Cisco ASA. The output is pretty standard, so I would skip the regex and do the following:
multilinestring = webhook_api_call()
lines = multilinestring.split("\n")
for l in lines:
    if l.find("Bytes Tx") != -1:
        print("Bytes Rx" + l.partition("Bytes Rx")[2])
Output:
Bytes Rx : 0
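If you would rather stick with a regex, anchoring on the literal label works too (a sketch, reusing the multilinestring variable from above):
import re

# Capture only the number that follows the literal "Bytes Rx" label.
rx = re.search(r'Bytes Rx\s*:\s*(\d+)', multilinestring)
if rx:
    print("Bytes Rx : " + rx.group(1))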
Good luck with your code!
I want to process every line in my log file and extract the IP address if the line matches one of my patterns. There are several different types of messages; in the example below I am using p1 and p2.
I could read the file line by line and match each line against each pattern in turn. But since there can be many more patterns, I would like to do this as efficiently as possible, so I was hoping to compile those patterns into one object and do the match only once for each line:
import re
import sys

IP = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
p1 = 'Registration from ' + IP + ' - Wrong password'
p2 = 'Call from ' + IP + ' rejected because extension not found'
c = re.compile(r'(?:' + p1 + '|' + p2 + ')')

for line in sys.stdin:
    match = re.search(c, line)
    if match:
        print(match['ip'])
But the above code does not work; it complains that the group name ip is used twice.
What is the most elegant way to achieve my goal?
EDIT:
I have modified my code based on the answer from @Dev Khadka.
But I am still struggling with how to properly handle the multiple ip groups. The code below prints the IPs that matched p1:
for line in sys.stdin:
    match = c.search(line)
    if match:
        print(match['ip1'])
But some lines don't match p1; they match p2. I.e., I get:
1.2.3.4
None
2.3.4.5
...
How do I print the matching IP when I don't know whether it was p1, p2, ...? All I want is the IP. I don't care which pattern it matched.
You can consider installing the excellent regex module, which supports many advanced regex features, including branch reset groups, designed to solve exactly the problem you outlined in this question. Branch reset groups are denoted by (?|...). Capture groups at the same positions or with the same names in different alternatives within a branch reset group share the same capture group in the output.
Notice that in the example below, whichever alternative matches fills the same named capture group, so you don't need to iterate over multiple groups searching for a non-empty one:
import regex
import sys

ip_pattern = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
patterns = [
    'Registration from {ip} - Wrong password',
    'Call from {ip} rejected because extension not found'
]
pattern = regex.compile('(?|%s)' % '|'.join(patterns).format(ip=ip_pattern))

for line in sys.stdin:
    match = regex.search(pattern, line)
    if match:
        print(match['ip'])
Demo: https://repl.it/#blhsing/RegularEmbellishedBugs
Why don't you check which regex matched? A match object doesn't support in directly, so go through groupdict():
groups = match.groupdict()
if groups.get('ip1') is not None:
    print(groups['ip1'])
if groups.get('ip2') is not None:
    print(groups['ip2'])
or something like:
names = ['ip1', 'ip2', 'ip3']
groups = match.groupdict()
for n in names:
    if groups.get(n) is not None:
        print(groups[n])
or even:
num = 1000  # can easily handle millions of patterns =)
groups = match.groupdict()
for i in range(num):
    name = 'ip%d' % i
    if groups.get(name) is not None:
        print(groups[name])
That's because you are using the same group name for two groups.
Try this; it will give the group names ip1 and ip2:
import re

IP = r'(?P<ip%d>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
p1 = 'Registration from ' + IP % 1 + ' - Wrong password'
p2 = 'Call from ' + IP % 2 + ' rejected because extension not found'
c = re.compile(r'(?:' + p1 + '|' + p2 + ')')
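To then print whichever group matched (which is what the EDIT above asks about), something like this sketch would do:
for line in sys.stdin:
    match = c.search(line)
    if match:
        # exactly one of the two groups is non-None
        print(match.group('ip1') or match.group('ip2'))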
Named capture groups must have distinct names. Since all of your capture groups are meant to capture the same pattern, it's better not to use named groups here; instead, use regular capture groups and iterate through the groups of the match object to print the first one that is not empty:
import re
import sys

ip_pattern = r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
patterns = [
    'Registration from {ip} - Wrong password',
    'Call from {ip} rejected because extension not found'
]
pattern = re.compile('|'.join(patterns).format(ip=ip_pattern))

for line in sys.stdin:
    match = re.search(pattern, line)
    if match:
        print(next(filter(None, match.groups())))
Demo: https://repl.it/#blhsing/UnevenCheerfulLight
Adding IP address validity checking to the already accepted answer.
Although import ipaddress and import socket would be the ideal ways (an ipaddress-based sketch follows the output below), this code parses the host bytes manually:
import regex as re
from io import StringIO

def valid_ip(address):
    try:
        host_bytes = address.split('.')
        valid = [int(b) for b in host_bytes]
        valid = [b for b in valid if b >= 0 and b <= 255]
        return len(host_bytes) == 4 and len(valid) == 4
    except ValueError:
        return False

ip_pattern = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
patterns = [
    'Registration from {ip} - Wrong password',
    'Call from {ip} rejected because extension not found'
]

file = StringIO('''
Registration from 259.1.1.1 - Wrong password,
Call from 1.1.2.2 rejected because extension not found
''')

pattern = re.compile('(?|%s)' % '|'.join(patterns).format(ip=ip_pattern))

list1 = []
list2 = []
for line in file:
    match = re.search(pattern, line)
    if match:
        list1.append(match['ip'])           # list of IP addresses
        list2.append(valid_ip(match['ip']))  # boolean results of valid_ip

for i in range(len(list1)):
    if list2[i] == False:
        print(f'{list1[i]} is invalid IP')
    else:
        print(list1[i])
259.1.1.1 is invalid IP
1.1.2.2
[Program finished]
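For reference, a sketch of the same validity check done with the standard-library ipaddress module instead of manual parsing:
import ipaddress

def valid_ip(address):
    # ipaddress raises ValueError for anything that is not a valid IP address
    try:
        ipaddress.ip_address(address)
        return True
    except ValueError:
        return False

print(valid_ip('259.1.1.1'))  # False
print(valid_ip('1.1.2.2'))    # True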
Error:
pymysql.err.InternalError: (1366, "Incorrect string value: '\\xEF\\xBF\\xBD 20...' for column 'history' at row 1")
I've received a few variations of this as I've tried to tweak my dictionary, always in the history column; the only variation is the characters it tells me are the issue.
I can't post the dictionary because it's got sensitive information, but here is the gist:
I started with 200 addresses (including state, zip, etc.) that needed to be validated, normalized and standardized for DB insertion.
I spent a lot of time on Google Maps validating and standardizing.
I decided to get fancy and put in all the crazy accented letters in the addresses of these world addresses (often copied from Google because I don't know how to type an A with an o over it, lol), Singapore to Brazil, everywhere.
I ended up with 120 unique addresses in my dictionary after processing.
Everything works 100% perfectly when INSERTING the data into SQLite and OUTPUTTING to a CSV. The issue is exclusively with MySQL and some sneaky un-viewable characters.
Note: I used this to remove the accents after 7 hours of copy/pasting to Notepad, encoding it with Notepad++ and just trying to process the data in a way that made it all the correct encoding. I think I did lose the version with the accents and only have this tool's output now.
I do not see "\xEF\xBF\xBD 20..." in my dictionary; I only see text. Currently I don't even see "20"... those two characters helped me find the previous issues.
Code I can show:
def insert_tables(cursor, assets_final, ips_final):
    # Insert asset data into the asset table
    field_names_dict = get_asset_field_names(assets_final)
    sql_field_names = ",".join(field_names_dict.keys())
    for key, row in assets_final.items():
        insert_sql = 'INSERT INTO asset(' + sql_field_names + ') VALUES ("' + '","'.join(field_value.replace('"', "'") for field_value in list(row.values())) + '")'
        print(insert_sql)
        cursor.execute(insert_sql)
    # Insert IP data into the ip table
    field_names_dict = get_ip_field_names(ips_final)
    sql_field_names = ",".join(field_names_dict.keys())
    for hostname_key, ip_dict in ips_final.items():
        for ip_key, ip_row in ip_dict.items():
            insert_sql = 'INSERT INTO ip(' + sql_field_names + ') VALUES ("' + '","'.join(field_value.replace('"', "'") for field_value in list(ip_row.values())) + '")'
            print(insert_sql)
            cursor.execute(insert_sql)

def output_sqlite_db(sqlite_file, assets_final, ips_final):
    conn = sqlite3.connect(sqlite_file)
    cursor = conn.cursor()
    insert_tables(cursor, assets_final, ips_final)
    conn.commit()
    conn.close()

def output_mysql_db(assets_final, ips_final):
    conn = mysql.connect(host=config.mysql_ip, port=config.mysql_port, user=config.mysql_user, password=config.mysql_password, charset="utf8mb4", use_unicode=True)
    cursor = conn.cursor()
    cursor.execute('USE ' + config.mysql_DB)
    insert_tables(cursor, assets_final, ips_final)
    conn.commit()
    conn.close()
EDIT: Could this have something to do with the fact I'm using Cygwin as my terminal? HA! I added this line and got a different message (now using the accented version again):
cursor.execute('SET NAMES utf8')
Error:
pymysql.err.InternalError: (1366, "Incorrect string value: '\\xC5\\x81A II...' for column 'history' at row 1")
I can shine a bit of light on the messages that you have supplied:
Case 1:
>>> import unicodedata as ucd
>>> s1 = b"\xEF\xBF\xBD"
>>> s1
b'\xef\xbf\xbd'
>>> u1 = s1.decode('utf8')
>>> u1
'\ufffd'
>>> ucd.name(u1)
'REPLACEMENT CHARACTER'
>>>
Looks like you have obtained some bytes encoded in an encoding other than utf8 (e.g. cp1252), then tried bytes.decode(encoding='utf8', errors='strict'). This detected some errors. You then decoded again with errors="replace". This raised no exceptions; however, your data has had the error bytes replaced by the replacement character (U+FFFD). Then you encoded your data using str.encode so that you could write it to a file or database. Each replacement character turns up as the 3 hex bytes EF BF BD.
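A compact sketch of that round trip (cp1252 chosen purely for illustration):
raw = 'café'.encode('cp1252')                 # b'caf\xe9' - not valid UTF-8
text = raw.decode('utf8', errors='replace')   # 'caf\ufffd' - U+FFFD replaces the bad byte
print(text.encode('utf8'))                    # b'caf\xef\xbf\xbd' - the EF BF BD bytes from the error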
... more to come
Case 2:
>>> s2 = b"\xC5\x81A II"
>>> s2
b'\xc5\x81A II'
>>> u2 = s2.decode('utf8')
>>> u2
'\u0141A II'
>>> ucd.name(u2[0])
'LATIN CAPITAL LETTER L WITH STROKE'
>>>
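So those bytes are simply valid UTF-8 for "ŁA II"; encoding the text confirms it:
>>> 'ŁA II'.encode('utf8')
b'\xc5\x81A II'
The data itself looks fine here; with valid UTF-8 input, error 1366 usually points at a column or connection character set that can't represent the character (e.g. latin1) rather than at the bytes themselves.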
I'm sure this is a basic question, but I have spent about an hour on it already and can't quite figure it out. I'm parsing smartctl output, and here is a sample of the data I'm working with:
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.32-39-pve] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: TOSHIBA MD04ACA500
Serial Number: Y9MYK6M4BS9K
LU WWN Device Id: 5 000039 5ebe01bc8
Firmware Version: FP2A
User Capacity: 5,000,981,078,016 bytes [5.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Thu Jul 2 11:24:08 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
What I'm trying to achieve is pulling out the device model (for some devices it's just one string; for others, such as this one, it's two words), serial number, time, and a couple of other fields. I assume it would be easiest to capture all data after the colon, but how do I eliminate the variable amount of spaces?
Here is the relevant code I've come up with so far:
deviceModel = ""
serialNumber = ""
lines = infoMessage.split("\n")
for line in lines:
parts = line.split()
if str(parts):
if parts[0] == "Device Model: ":
deviceModel = parts[1]
elif parts[0] == "Serial Number: ":
serialNumber = parts[1]
vprint(3, "Device model: %s" %deviceModel)
vprint(3, "Serial number: %s" %serialNumber)
The error I keep getting is:
File "./tester.py", line 152, in parseOutput
if parts[0] == "Device Model: ":
IndexError: list index out of range
I get what the error is saying (kinda), but I'm not sure what else the range could be, or if I'm even attempting this in the right way. Looking for guidance to get me going in the right direction. Any help is greatly appreciated.
Thanks!
The IndexError occurs when the split returns an empty (or too short) list and you then index into it. This happens when there is nothing to split (an empty line).
No need for regular expressions:
deviceModel = ""
serialNumber = ""
lines = infoMessage.split("\n")
for line in lines:
if line.startswith("Device Model:"):
deviceModel = line.split(":")[1].strip()
elif line.startswith("Serial Number:"):
serialNumber = line.split(":")[1].strip()
print("Device model: %s" %deviceModel)
print("Serial number: %s" %serialNumber)
I guess your problem is the empty line in the middle, because:
>>> '\n'.split()
[]
You can do something like,
>>> f = open('a.txt')
>>> lines = f.readlines()
>>> deviceModel = [line for line in lines if 'Device Model' in line][0].split(':')[1].strip()
# 'TOSHIBA MD04ACA500'
>>> serialNumber = [line for line in lines if 'Serial Number' in line][0].split(':')[1].strip()
# 'Y9MYK6M4BS9K'
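If a field might be missing entirely, a sketch with next() and a default avoids the IndexError on [0]:
deviceModel = next((line.split(':', 1)[1].strip() for line in lines if 'Device Model' in line), "")
serialNumber = next((line.split(':', 1)[1].strip() for line in lines if 'Serial Number' in line), "")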
Try using regular expressions:
import re
r = re.compile(r"^[^:]*:\s+(.*)$")
m = r.match("Device Model: TOSHIBA MD04ACA500")
print(m.group(1))  # Prints "TOSHIBA MD04ACA500"
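Applied to the whole block, a sketch that collects every "Label: value" line into a dict (assuming infoMessage holds the smartctl output as in the question):
import re

fields = {}
for line in infoMessage.split("\n"):
    m = re.match(r"^([^:]+?):\s+(.*)$", line)
    if m:
        fields[m.group(1).strip()] = m.group(2).strip()

print(fields.get("Device Model"))   # TOSHIBA MD04ACA500
print(fields.get("Serial Number"))  # Y9MYK6M4BS9K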
Not sure what version you're running, but on 2.7, line.split() splits the line by word, so:
>>> parts = line.split()
>>> parts
['Device', 'Model:', 'TOSHIBA', 'MD04ACA500']
You can also try line.startswith() to find the lines you want: https://docs.python.org/2/library/stdtypes.html#str.startswith
The way I would debug this is by printing out parts at every iteration. Try that and show us what the list is when it fails.
Edit: Your problem is most likely what @jonrsharpe said. parts is probably an empty list when it gets to an empty line, and str(parts) will just return '[]', which is True. Try to test that.
I think it would be far easier to use regular expressions here.
import re

for line in lines:
    # Splits the string into at most two parts
    # at the first colon which is followed by one or more spaces
    parts = re.split(r':\s+', line, 1)
    if parts:
        if parts[0] == "Device Model":
            deviceModel = parts[1]
        elif parts[0] == "Serial Number":
            serialNumber = parts[1]
Mind you, if you only care about the two fields, startswith might be better.
When you split the blank line, parts is an empty list.
You try to accommodate that by checking for an empty list, but you turn the empty list into a string, which makes your conditional True:
>>> s = []
>>> bool(s)
False
>>> str(s)
'[]'
>>> bool(str(s))
True
>>>
Change if str(parts): to if parts:.
Many would say that using a try/except block would be the idiomatic approach in Python:
for line in lines:
    parts = line.split()
    try:
        if parts[0] == "Device Model: ":
            deviceModel = parts[1]
        elif parts[0] == "Serial Number: ":
            serialNumber = parts[1]
    except IndexError:
        pass
I have a script of about 300 lines (part of which is pasted below) with a lot of print commands. I am trying to clean up the output it produces. If I leave it the way it is, all the print commands print bytes with \r\n onto the console.
I figured that if I append .decode('utf-8') to the variable I need to print, then the output is what I expect (a Unicode string). For example, compare the print (data1) and print (data3) commands below. What I want to do is go through all of the code and append .decode() to every print statement.
All the print commands are in this format: print (dataxxxx)
import telnetlib
import time
import sys
import random
from xlwt import Workbook

shelfIp = "10.10.10.10"
shelf = "33"

print("Shelf IP is: " + str(shelfIp))
print("Shelf number is: " + str(shelf))

def addCard():
    tn = telnetlib.Telnet(shelfIp)
    ### Telnet session
    tn.read_until(b"<", 5)
    cmd = "ACT-USER::ADMIN:ONE::ADMIN;"
    tn.write(bytes(cmd, encoding="UTF-8"))
    data1 = tn.read_until(b"ONE COMPLD", 5)
    print(data1.decode('utf-8'))
    ### Entering second network element
    cmd = "ENT-CARD::CARD" + shelf + "-" + shelf + ":TWO:xyz:;"
    tn.write(bytes(cmd, encoding="UTF-8"))
    data3 = tn.read_until(b"TWO COMPLD", 5)
    print(data3)
    ### Entering third network element
    cmd = "ENT-CARD::CARD-%s-%s:ADM:ABC:;" % (shelf, shelf)
    tn.write(bytes(cmd, encoding="UTF-8"))
    dataAmp = tn.read_until(b"ADM COMPLD", 5)
    print(dataAmp)
    tn.close()

addCard()
If you are looking into doing some sort of find-replace on the code, you can try this:
import re

f = open('script.py', 'r')
script = f.read()
f.close()

newscript = re.sub(r"(print\s*\(.*)\)", r"\g<1>.decode('utf-8'))", script)

f = open('script.py', 'w')
f.write(newscript)
f.close()
What I did in the regular expression:
Catch text that contains print(...) (with or without a space before the parenthesis) and save the print(... part into group 1.
Replace the closing ) at the end with .decode('utf-8')), using the syntax \g<1>, which takes saved group number 1 and puts it back as the prefix of the replacement text.
Appending .decode() after print(...) will fail because print() returns None, which has no .decode() method.
>>> x=u"testing"
>>> print(x).decode('utf-8')
testing
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'decode'
You must apply .decode('utf-8') to the variables you wish to decode, which is not easily accomplished using regex-based tools.
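One way around editing every print call is to decode the data at the point where it is read, so everything downstream is already a str. A sketch, assuming the same telnetlib session as in the question (the read_text helper name is made up here):
def read_text(tn, marker, timeout=5):
    # Read up to the marker and return str instead of bytes,
    # so plain print() calls no longer show b'...' and \r\n escapes.
    return tn.read_until(marker.encode("utf-8"), timeout).decode("utf-8")

data3 = read_text(tn, "TWO COMPLD")
print(data3)  # already decoded, no .decode() needed at the print site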