Dhcpd lease matching by using regex in python? - python

I want to match all dhcp leases that have given mac address.
I wrote this code
fh = open(leaseFile)
lines = fh.read()
fh.close()
regex = r"lease\s*[0-9\.]+\s*\{[^\{\}]*%s[^\{\}]*?\}" % mac #mac comes as parameter
m = re.findall(regex,lines,re.DOTALL)
This worked well if a lease don't contain '}' character. But if it does, my regex failed.
For example:
lease 10.14.53.253 {
starts 3 2012/10/17 09:27:20;
ends 4 2012/10/18 09:27:20;
tstp 4 2012/10/18 09:27:20;
binding state free;
hardware ethernet 00:23:18:62:31:5b;
uid "\001\000\013OW}k";
}
I couldn't figure out how I handle this exception. Thanks for any advice...
EDIT
After research, I decided to use this regex with MULTILINE mode. It worked for all leases that I tried.
fh = open(leaseFile)
lines = fh.read()
fh.close()
regex = r"lease\s*[0-9\.]+\s*\{[^\{\}]*%s[\s\S]*?^\}" % mac #mac comes as parameter
m = re.findall(regex,lines,re.MULTILINE)

regex = r'(lease\s*[0-9\.]+\s*\{[^\{\}]*%s[^\{\}]*(.*"[^\{\}]*\}|\}))' % mac #mac comes as parameter
m = re.findall(regex,lines)
This should do the trick.

Related

python3: extract IP address from compiled pattern

I want to process every line in my log file, and extract IP address if line matches my pattern. There are several different types of messages, in example below I am using p1andp2`.
I could read the file line by line, and for each line match to each pattern. But
Since there can be many more patterns, I would like to do it as efficiently as possible. I was hoping to compile thos patterns into one object, and do the match only once for each line:
import re
IP = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
p1 = 'Registration from' + IP + '- Wrong password'
p2 = 'Call from' + IP + 'rejected because extension not found'
c = re.compile(r'(?:' + p1 + '|' + p2 + ')')
for line in sys.stdin:
match = re.search(c, line)
if match:
print(match['ip'])
but the above code does not work, it complains that ip is used twice.
What is the most elegant way to achieve my goal ?
EDIT:
I have modified my code based on answer from #Dev Khadka.
But I am still struggling with how to properly handle the multiple ip matches. The code below prints all IPs that matched p1:
for line in sys.stdin:
match = c.search(line)
if match:
print(match['ip1'])
But some lines don't match p1. They match p2. ie, I get:
1.2.3.4
None
2.3.4.5
...
How do I print the matching ip, when I don't know wheter it was p1, p2, ... ? All I want is the IP. I don't care which pattern it matched.
You can consider installing the excellent regex module, which supports many advanced regex features, including branch reset groups, designed to solve exactly the problem you outlined in this question. Branch reset groups are denoted by (?|...). All capture groups of the same positions or names in different alternative patterns within a branch reset grouop share the same capture groups for output.
Notice that in the example below the matching capture group becomes the named capture group, so that you don't need to iterate over multiple groups searching for a non-empty group:
import regex
ip_pattern = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
patterns = [
'Registration from {ip} - Wrong password',
'Call from {ip} rejected because extension not found'
]
pattern = regex.compile('(?|%s)' % '|'.join(patterns).format(ip=ip_pattern))
for line in sys.stdin:
match = regex.search(pattern, line)
if match:
print(match['ip'])
Demo: https://repl.it/#blhsing/RegularEmbellishedBugs
why don't you check which regex matched?
if 'ip1' in match :
print match['ip1']
if 'ip2' in match :
print match['ip2']
or something like:
names = [ 'ip1', 'ip2', 'ip3' ]
for n in names :
if n in match :
print match[n]
or even
num = 1000 # can easily handle millions of patterns =)
for i in range(num) :
name = 'ip%d' % i
if name in match :
print match[name]
thats because you are using same group name for two group
try this, this will give group names ip1 and ip2
import re
IP = r'(?P<ip%d>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
p1 = 'Registration from' + IP%1 + '- Wrong password'
p2 = 'Call from' + IP%2 + 'rejected because extension not found'
c = re.compile(r'(?:' + p1 + '|' + p2 + ')')
Named capture groups must have distinct names, but since all of your capture groups are meant to capture the same pattern, it's better not to use named capture groups in this case but instead simply use regular capture groups and iterate through the groups from the match object to print the first group that is not empty:
ip_pattern = r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
patterns = [
'Registration from {ip} - Wrong password',
'Call from {ip} rejected because extension not found'
]
pattern = re.compile('|'.join(patterns).format(ip=ip_pattern))
for line in sys.stdin:
match = re.search(pattern, line)
if match:
print(next(filter(None, match.groups())))
Demo: https://repl.it/#blhsing/UnevenCheerfulLight
Adding ip address validity to already accepted answer.
Altho import ipaddress & import socket should be ideal ways, this code will parse-the-host,
import regex as re
from io import StringIO
def valid_ip(address):
try:
host_bytes = address.split('.')
valid = [int(b) for b in host_bytes]
valid = [b for b in valid if b >= 0 and b<=255]
return len(host_bytes) == 4 and len(valid) == 4
except:
return False
ip_pattern = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
patterns = patterns = [
'Registration from {ip} - Wrong password',
'Call from {ip} rejected because extension not found'
]
file = StringIO('''
Registration from 259.1.1.1 - Wrong password,
Call from 1.1.2.2 rejected because extension not found
''')
pattern = re.compile('(?|%s)' % '|'.join(patterns).format(ip=ip_pattern))
list1 = []
list2 = []
for line in file:
match = re.search(pattern, line)
if match:
list1.append(match['ip']) # List of ip address
list2.append(valid_ip(match['ip'])) # Boolean results of valid_ip
for i in range(len(list1)):
if list2[i] == False:
print(f'{list1[i]} is invalid IP')
else:
print(list1[i])
259.1.1.1 is invalid IP
1.1.2.2
[Program finished]

Python - line split with spaces?

I'm sure this is a basic question, but I have spent about an hour on it already and can't quite figure it out. I'm parsing smartctl output, and here is the a sample of the data I'm working with:
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.32-39-pve] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: TOSHIBA MD04ACA500
Serial Number: Y9MYK6M4BS9K
LU WWN Device Id: 5 000039 5ebe01bc8
Firmware Version: FP2A
User Capacity: 5,000,981,078,016 bytes [5.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Thu Jul 2 11:24:08 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
What I'm trying to achieve is pulling out the device model (some devices it's just one string, other devices, such as this one, it's two words), serial number, time, and a couple other fields. I assume it would be easiest to capture all data after the colon, but how to eliminate the variable amounts of spaces?
Here is the relevant code I currently came up with:
deviceModel = ""
serialNumber = ""
lines = infoMessage.split("\n")
for line in lines:
parts = line.split()
if str(parts):
if parts[0] == "Device Model: ":
deviceModel = parts[1]
elif parts[0] == "Serial Number: ":
serialNumber = parts[1]
vprint(3, "Device model: %s" %deviceModel)
vprint(3, "Serial number: %s" %serialNumber)
The error I keep getting is:
File "./tester.py", line 152, in parseOutput
if parts[0] == "Device Model: ":
IndexError: list index out of range
I get what the error is saying (kinda), but I'm not sure what else the range could be, or if I'm even attempting this in the right way. Looking for guidance to get me going in the right direction. Any help is greatly appreciated.
Thanks!
The IndexError occurs when the split returns a list of length one or zero and you access the second element. This happens when it isn't finding anything to split (empty line).
No need for regular expressions:
deviceModel = ""
serialNumber = ""
lines = infoMessage.split("\n")
for line in lines:
if line.startswith("Device Model:"):
deviceModel = line.split(":")[1].strip()
elif line.startswith("Serial Number:"):
serialNumber = line.split(":")[1].strip()
print("Device model: %s" %deviceModel)
print("Serial number: %s" %serialNumber)
I guess your problem is the empty line in the middle. Because,
>>> '\n'.split()
[]
You can do something like,
>>> f = open('a.txt')
>>> lines = f.readlines()
>>> deviceModel = [line for line in lines if 'Device Model' in line][0].split(':')[1].strip()
# 'TOSHIBA MD04ACA500'
>>> serialNumber = [line for line in lines if 'Serial Number' in line][0].split(':')[1].strip()
# 'Y9MYK6M4BS9K'
Try using regular expressions:
import re
r = re.compile("^[^:]*:\s+(.*)$")
m = r.match("Device Model: TOSHIBA MD04ACA500")
print m.group(1) # Prints "TOSHIBA MD04ACA500"
Not sure what version you're running, but on 2.7, line.split() is splitting the line by word, so
>>> parts = line.split()
parts = ['Device', 'Model:', 'TOSHIBA', 'MD04ACA500']
You can also try line.startswith() to find the lines you want https://docs.python.org/2/library/stdtypes.html#str.startswith
The way I would debug this is by printing out parts at every iteration. Try that and show us what the list is when it fails.
Edit: Your problem is most likely what #jonrsharpe said. parts is probably an empty list when it gets to an empty line and str(parts) will just return '[]' which is True. Try to test that.
I think it would be far easier to use regular expressions here.
import re
for line in lines:
# Splits the string into at most two parts
# at the first colon which is followed by one or more spaces
parts = re.split(':\s+', line, 1)
if parts:
if parts[0] == "Device Model":
deviceModel = parts[1]
elif parts[0] == "Serial Number":
serialNumber = parts[1]
Mind you, if you only care about the two fields, startswith might be better.
When you split the blank line, parts is an empty list.
You try to accommodate that by checking for an empty list, But you turn the empty list to a string which causes your conditional statement to be True.
>>> s = []
>>> bool(s)
False
>>> str(s)
'[]'
>>> bool(str(s))
True
>>>
Change if str(parts): to if parts:.
Many would say that using a try/except block would be idiomatic for Python
for line in lines:
parts = line.split()
try:
if parts[0] == "Device Model: ":
deviceModel = parts[1]
elif parts[0] == "Serial Number: ":
serialNumber = parts[1]
except IndexError:
pass

Parsing a text file for pattern and writing found pattern back to another file python 3.4

I am trying to open a text file. Parse the text file for specific regex patterns then when if I find that pattern I write the regex returned pattern to another text file.
Specifically a list of IP Addresses which I want to parse specific ones out of.
So the file may have
10.10.10.10
9.9.9.9
5.5.5.5
6.10.10.10
And say I want just the IPs that end in 10 (the regex I think I am good with) My example looks for the 10.180.42, o4 41.XX IP hosts. But I will adjust as needed.
I've tried several method and fail miserably at them all. It's days like this I know why I just never mastered any language. But I'm committed to Python so here goes.
import re
textfile = open("SymantecServers.txt", 'r')
matches = re.findall('^10.180\.4[3,1].\d\d',str(textfile))
print(matches)
This gives me empty backets. I had to encase the textfile in the str function or it just puked. I don't know if this is right.
This just failed all over the place no matter how I fine tuned it.
f = open("SymantecServers.txt","r")
o = open("JustIP.txt",'w', newline="\r\n")
for line in f:
pattern = re.compile("^10.180\.4[3,1].\d\d")
print(pattern)
#o.write(pattern)
#o.close()
f.close()
I did get one working but it just returned the entire line (including netmask and other test like hostname which are all on the same line in the text file. I just want IP)
Any help on how to read a text file and if it has a pattern of IP grab the full IP and write that into another text file so I end up with a text file with a list of just the IPs I want. I am 3 hours into it and behind on work so going to do the first file by hand...
I am just at a loss what I am missing. Sorry for being a newbie
here is it working:
>>> s = """10.10.10.10
... 9.9.9.9
... 5.5.5.5
... 10.180.43.99
... 6.10.10.10"""
>>> re.findall(r'10\.180\.4[31]\.\d\d', s)
['10.180.43.99']
you do not really need to add line boundaries, as you're matching a very specific IP address, if your file does not have weird things like '123.23.234.10.180.43.99.21354' that you don't want to match, it should be ok!
your syntax of [3,1] is matching either 3, 1 or , and you don't want to match against a comma ;-)
about your function:
r = re.compile(r'10\.180\.4[31]\.\d\d')
with open("SymantecServers.txt","r") as f:
with open("JustIP.txt",'w', newline="\r\n") as o:
for line in f:
matches = r.findall(line)
for match in matches:
o.write(match)
though if I were you, I'd extract IPs using:
r = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
with open("SymantecServers.txt","r") as f:
with open("JustIP.txt",'w', newline="\r\n") as o:
for line in f:
matches = r.findall(line)
for match in matches:
a, b, c, d = match.split('.')
if int(a) < 255 and int(b) < 255 and int(c) in (43, 41) and int(d) < 100:
o.write(match)
or another way to do it:
r = re.compile(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})')
with open("SymantecServers.txt","r") as f:
with open("JustIP.txt",'w', newline="\r\n") as o:
for line in f:
m = r.match(line)
if m:
a, b, c, d = m.groups()
if int(a) < 255 and int(b) < 255 and int(c) in (43, 41) and int(d) < 100:
o.write(match)
which uses the regex to split the IP address into groups.
What you're missing is that you're doing a re.compile() which creates a Regular Expression object in Python. You're forgetting to match.
You could try:
# This isn't the best way to match IP's, but if it fits for your use-case keep it for now.
pattern = re.compile("^10.180\.4[13].\d\d")
f = open("SymantecServers.txt",'r')
o = open("JustIP.txt",'w')
for line in f:
m = pattern.match(line)
if m is not None:
print "Match: %s" %(m.group(0))
o.write(m.group(0) + "\n")
f.close()
o.close()
Which is compiling the Python object, attempting to match the line against the compiled object, and then printing out that current match. I can avoid having to split my matches, but I have to pay attention to matching groups - therefore group(0)
You can also look at re.search() which you can do, but if you're running search enough times with the same regular expression, it becomes more worthwhile to use compile.
Also note that I moved the f.close() to the outside of the for loop.

Regex not working locally though working in every online regex tester [duplicate]

This question already has an answer here:
Regular expression works on regex101.com, but not on prod
(1 answer)
Closed 1 year ago.
I am doing a small script, which should read my worked time from my email and save how much time I already worked. It is doing this through regex.
Now here is my Script:
import imaplib
import re
from pprint import pprint
mail = imaplib.IMAP4_SSL('imap.gmail.com',993)
mail.login('*************', '**************')
# Out: list of "folders" aka labels in gmail.
mail.select("inbox") # connect to inbox.
typ, data = mail.search(None, 'SUBJECT', 'Zeiterfassung')
worked_time_pattern = re.compile(r'"(?P<time>\d+(,\d)?)"[^>]*?selected[^>]*>=?(\r?\n?)(?P=time)<')
# old version: worked_time_pattern = re.compile(r'\"(?P<time>[0-9]+(?:[,][0-9])?)\"(?: disabled)? selected(?: disabled)? style=3D"">[=]?[\n]?(?P=time)<\/option>')
date_pattern = re.compile('.*Date: [a-zA-Z]{1,4}[,] (?P<date>[0-9]{1,2} [a-zA-Z]{1,4} [0-9]{4}).*', re.DOTALL)
count = 0
countFail = 0
if 'OK' == typ:
for num in data[0].split():
typ, data = mail.fetch(num, '(RFC822)')
mailbody = "".join(data[0][1].split("=\r\n"))
mailbody = "".join(mailbody.split("\r"))
mailbody = "".join(mailbody.split("\n"))
worked_time = worked_time_pattern.search(data[0][1])
date = date_pattern.match(data[0][1])
if worked_time != None:
print worked_time.group('time')
count = count + 1
else:
print mailbody
countFail = countFail + 1
print worked_time
print "You worked on %s\n" % ( date.group('date'))
#print 'Message %s\n%s\n' % (num, data[0][1])
print count
print countFail
mail.close()
mail.logout()
the problem is, it returns None for worked_time for some of my strings (not all, more than a half works [23 works, 8 not]), which means that the pattern is not matched. I tested it with most online regex testers, and they all told me, that the pattern matches and everything fine..
here a few example strings that weren't accepted but are by online tools, e.g. http://regex101.com
pasted them, because it they are big and ugly:
http://pastebin.com/4Z2BdmXk
http://pastebin.com/dMxcRqQu
btw the regex for date works fine on all (but not on the pasted string I had to cut away the upper part because of a lot of private information)
worked_time_pattern should search for something like: "1,5" disabled selected style=3D"">1,5</option> (and get the 1,5 out of it, exaclty as it does on half of the cases...)
Anybody any idea?
If you think it is inserting =\r\n into your data then keep removing that, but also remove all \rs and \ns.
mailbody = "".join(data[0][1].split("=\r\n"))
mailbody = "".join(data[0][1].split("\r"))
mailbody = "".join(data[0][1].split("\n"))
Then try using the regex I suggested in the comments - although your original expression would likely work fine too.
(?P<time>\d+(,\d)?)"[^>]*?selected[^>]*>=?(\r?\n?)(?P=time)<
As Quirliom suggests in the comments, this is a perfect example of why regex shouldn't be used for parsing HTML - although if the line breaks are present mid-word then this isn't valid HTML either.

Removing a lease that has giving mac from dhcpd.leases with python?

I am trying to remove a lease from dhcpd.lease with python according to its mac address.
This is a dhcpd.lease example
lease 10.14.53.253 {
starts 3 2012/10/17 09:27:20;
ends 4 2012/10/18 09:27:20;
tstp 4 2012/10/18 09:27:20;
binding state free;
hardware ethernet 00:23:18:62:31:5b;
}
lease 10.14.53.252 {
starts 3 2012/10/17 10:15:17;
ends 4 2012/10/18 10:15:17;
tstp 4 2012/10/18 10:15:17;
binding state free;
hardware ethernet 70:71:bc:c8:46:3c;
uid "\001pq\274\310F<";
}
Assume I am given '00:23:18:62:31:5b'. Then I should remove all line belong to this lease. After deletion, file should look like
lease 10.14.53.252 {
starts 3 2012/10/17 10:15:17;
ends 4 2012/10/18 10:15:17;
tstp 4 2012/10/18 10:15:17;
binding state free;
hardware ethernet 70:71:bc:c8:46:3c;
uid "\001pq\274\310F<";
}
I am simple reading a file and put it a string but I have no idea what to do after that. I tried this regex but didn't work. It checked only first line of the file.
fh = open(DHCPFILE)
lines = fh.read()
fh.close()
m = re.match(r"(.*lease.*%s.*})" % mac ,lines)
This problem is not shaped like a regular expression nail, so please put that hammer down.
The correct tool would be to parse the contents into a python structure, filtering out the items you don't want, then writing out the remaining entries again.
pyparsing would make the parsing job easy; the following is based on an existing example:
from pyparsing import *
LBRACE,RBRACE,SEMI,QUOTE = map(Suppress,'{};"')
ipAddress = Combine(Word(nums) + ('.' + Word(nums))*3)
hexint = Word(hexnums,exact=2)
macAddress = Combine(hexint + (':'+hexint)*5)
hdwType = Word(alphanums)
yyyymmdd = Combine((Word(nums,exact=4)|Word(nums,exact=2))+
('/'+Word(nums,exact=2))*2)
hhmmss = Combine(Word(nums,exact=2)+(':'+Word(nums,exact=2))*2)
dateRef = oneOf(list("0123456"))("weekday") + yyyymmdd("date") + \
hhmmss("time")
startsStmt = "starts" + dateRef + SEMI
endsStmt = "ends" + (dateRef | "never") + SEMI
tstpStmt = "tstp" + dateRef + SEMI
tsfpStmt = "tsfp" + dateRef + SEMI
hdwStmt = "hardware" + hdwType("type") + macAddress("mac") + SEMI
uidStmt = "uid" + QuotedString('"')("uid") + SEMI
bindingStmt = "binding" + Word(alphanums) + Word(alphanums) + SEMI
leaseStatement = startsStmt | endsStmt | tstpStmt | tsfpStmt | hdwStmt | \
uidStmt | bindingStmt
leaseDef = "lease" + ipAddress("ipaddress") + LBRACE + \
Dict(ZeroOrMore(Group(leaseStatement))) + RBRACE
input = open(DHCPLEASEFILE).read()
with open(OUTPUTFILE, 'w') as output:
for lease, start, stop in leaseDef.scanString(input):
if lease.hardware.mac != mac:
output.write(input[start:stop])
The above code tersely defines the grammar of a dhcp.leases file, then uses scanString() to parse out each lease in the file. scanString() returns a sequence of matches, each consisting of a parse result and the start and end positions in the original string.
The parse result has a .hardware.mac attribute (you may want to catch AttributeError exceptions on that, in case no hardware statement was present in the input), making it easy to test for your MAC address to remove. If the MAC address doesn't match, we write the whole lease back to an output file, using the start and stop positions to get the original text for that lease (much easier than formatting the lease from the parsed information).

Categories