I am trying to remove a lease from dhcpd.lease with Python, based on its MAC address.
This is an example dhcpd.lease file:
lease 10.14.53.253 {
starts 3 2012/10/17 09:27:20;
ends 4 2012/10/18 09:27:20;
tstp 4 2012/10/18 09:27:20;
binding state free;
hardware ethernet 00:23:18:62:31:5b;
}
lease 10.14.53.252 {
starts 3 2012/10/17 10:15:17;
ends 4 2012/10/18 10:15:17;
tstp 4 2012/10/18 10:15:17;
binding state free;
hardware ethernet 70:71:bc:c8:46:3c;
uid "\001pq\274\310F<";
}
Assume I am given '00:23:18:62:31:5b'. Then I should remove all lines belonging to this lease. After deletion, the file should look like this:
lease 10.14.53.252 {
starts 3 2012/10/17 10:15:17;
ends 4 2012/10/18 10:15:17;
tstp 4 2012/10/18 10:15:17;
binding state free;
hardware ethernet 70:71:bc:c8:46:3c;
uid "\001pq\274\310F<";
}
I am simply reading the file into a string, but I have no idea what to do after that. I tried this regex, but it didn't work: it only checked the first line of the file.
fh = open(DHCPFILE)
lines = fh.read()
fh.close()
m = re.match(r"(.*lease.*%s.*})" % mac ,lines)
This problem is not shaped like a regular expression nail, so please put that hammer down.
The correct tool would be to parse the contents into a python structure, filtering out the items you don't want, then writing out the remaining entries again.
pyparsing would make the parsing job easy; the following is based on an existing example:
from pyparsing import *
LBRACE,RBRACE,SEMI,QUOTE = map(Suppress,'{};"')
ipAddress = Combine(Word(nums) + ('.' + Word(nums))*3)
hexint = Word(hexnums,exact=2)
macAddress = Combine(hexint + (':'+hexint)*5)
hdwType = Word(alphanums)
yyyymmdd = Combine((Word(nums,exact=4)|Word(nums,exact=2))+
('/'+Word(nums,exact=2))*2)
hhmmss = Combine(Word(nums,exact=2)+(':'+Word(nums,exact=2))*2)
dateRef = oneOf(list("0123456"))("weekday") + yyyymmdd("date") + \
hhmmss("time")
startsStmt = "starts" + dateRef + SEMI
endsStmt = "ends" + (dateRef | "never") + SEMI
tstpStmt = "tstp" + dateRef + SEMI
tsfpStmt = "tsfp" + dateRef + SEMI
hdwStmt = "hardware" + hdwType("type") + macAddress("mac") + SEMI
uidStmt = "uid" + QuotedString('"')("uid") + SEMI
bindingStmt = "binding" + Word(alphanums) + Word(alphanums) + SEMI
leaseStatement = startsStmt | endsStmt | tstpStmt | tsfpStmt | hdwStmt | \
uidStmt | bindingStmt
leaseDef = "lease" + ipAddress("ipaddress") + LBRACE + \
Dict(ZeroOrMore(Group(leaseStatement))) + RBRACE
input = open(DHCPLEASEFILE).read()
with open(OUTPUTFILE, 'w') as output:
    for lease, start, stop in leaseDef.scanString(input):
        if lease.hardware.mac != mac:
            output.write(input[start:stop])
The above code tersely defines the grammar of a dhcp.leases file, then uses scanString() to parse out each lease in the file. scanString() returns a sequence of matches, each consisting of a parse result and the start and end positions in the original string.
The parse result has a .hardware.mac attribute (you may want to catch AttributeError exceptions on that, in case no hardware statement was present in the input), making it easy to test for your MAC address to remove. If the MAC address doesn't match, we write the whole lease back to an output file, using the start and stop positions to get the original text for that lease (much easier than formatting the lease from the parsed information).
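If installing pyparsing is not an option, the same parse, filter, and write-back idea can be sketched with the standard library alone. This is not the pyparsing approach above but a simpler line-based stand-in, and it assumes that every lease block ends with a line containing only `}`. The function names `split_leases` and `remove_lease` are made up for this illustration.

```python
def split_leases(text):
    """Split lease-file text into a list of lease blocks (as strings)."""
    blocks, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        # Assumption: a lease block closes with a line that is just "}".
        if line.strip() == "}":
            blocks.append("".join(current))
            current = []
    if current:
        # Keep any trailing text that is not part of a complete block.
        blocks.append("".join(current))
    return blocks


def remove_lease(text, mac):
    """Return text without the lease blocks whose hardware line names mac."""
    needle = "hardware ethernet %s;" % mac.lower()
    return "".join(b for b in split_leases(text) if needle not in b.lower())
```

Because the check is made per line, a `}` inside a quoted uid value does not end the block early. The result of `remove_lease()` can then be written to the output file in one call.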
The error I am getting is: write() takes exactly one argument (5 given). I was able to get the write to work by using a separate write statement for each value, but that caused each of the inputs to be written on a new line. What I am trying to do is have the records written in a format similar to the table created for the temp file. I am not sure how to implement the logic to make that happen.
import os
def main ():
    temp_file = open('temp.txt', 'a')
    temp_file.write('Product Code | Description | Price' + '\n'
                    'TBL100 | Oak Table | 799.99' + '\n'
                    'CH23| Cherry Captains Chair | 199.99' + '\n'
                    'TBL103| WalnutTable |1999.00' + '\n'
                    'CA5| Chest Five Drawer| 639' + '\n')
    another = 'y'
    # Add records to the file.
    while another == 'y' or another == 'Y':
        # Get the coffee record data.
        print('Enter the following furniture data:')
        code = input('Product code: ')
        descr = input('Description: ')
        price = float(input('Price: '))
        # Append the data to the file.
        temp_file.write(code, print('|'), descr, print('|'), str(price) + '\n')
        # Determine whether the user wants to add
        # another record to the file.
        print('Do you want to add another record?')
        another = input('Y = yes, anything else = no: ')
    # Close the file.
    temp_file.close()
    print('Data appended to temp_file.')
write() accepts a single string, so build the whole line first and pass it as one argument:
temp_file.write(f'{code} | {descr} | {price}\n')
In your code, just replace this line
temp_file.write(code, print('|'), descr, print('|'), str(price) + '\n')
by this line
temp_file.write(code + '|' + descr + '|' + str(price) + '\n')
Explanations:
The write method takes one argument, but your code passes five; that is the cause of the error. You just have to concatenate your variables into a single string and pass that to the method.
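To make the point concrete, here is a minimal sketch that builds each row as one string before writing it. The helper name `format_record` is hypothetical, and an in-memory `io.StringIO` stands in for the real file so the snippet runs on its own.

```python
import io


def format_record(code, descr, price):
    """Build one table row as a single string ending in a newline."""
    return f"{code} | {descr} | {price:.2f}\n"


# io.StringIO behaves like a text file opened for writing.
buffer = io.StringIO()
# write() receives exactly one argument: the finished row.
buffer.write(format_record("TBL100", "Oak Table", 799.99))
```

With a real file the call is the same: `temp_file.write(format_record(code, descr, price))`.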
I want to read and print the 3rd from last line in date.log. This is specified in the line_from_bottom_line variable.
The log can have any number of lines at any time.
Here is an example of what the log looks like:
192.168.80.231 May 8 2018 18:45:00
192.168.80.231 July 30 2018 09:46:48
192.168.80.231 July 2 2018 14:37:14
If there are only 3 lines in the log, the else line will be printed:
line_number[x1] = [(index,log_time)]
The output will be:
(1, '18:45:00')
Which is not what I want.
If there are 4 or more lines, the line printed will be in the format of:
2018:8:18:45:00
This is what I want.
I think my code below is moving to the bottom line and subtracting 3. So if 4 lines don't exist, it doesn't know what to print. How can I change it so that the 3rd from bottom line is printed even if there are not 4 or more lines in the log?
old_time = (line_number[x1][-(line_from_bottom_line)].__str__())
from datetime import datetime, date, time
# this would be the third from last line
line_from_bottom_line = 3
date_order = 2
time_order = 4
year_order = 3
present = datetime.now()
def main():
    logfile = open('krinkov.log', 'r+')
    line_number = dict()
    for index,line in enumerate(logfile,1): # scan lines
        if line in ['\n', '\r\n']: # Error Checking: if not enough lines in var .log
            print("Not enough lines.")
            return
        if line:
            x1 = line.split()[0] # if line, get IP address
            log_day = line.split()[date_order]
            log_time = line.split()[time_order] # This will already be in the format of hh:mm:ss
            log_year = line.split()[year_order]
        if x1 in line_number : # if ip address on line
            line_number[x1].append((log_year + ":" + log_day + ":" + log_time))
        else:
            line_number[x1] = [(index,log_time)]
    if x1 in line_number and len(line_number.get(x1,None)) > 1:
        # Below is where I am having issues.
        # If there are not 4 or more lines in the log, an error occurs.
        old_time = (line_number[x1][-line_from_bottom_line])
        print(old_time)
        # ** get last line number. Print that line number. then subtract 2 from it
        # old_time = that new number
    else:
        print('Nothing')
main()
Problem: Look at where you add new elements to your line_number dictionary:
if x1 in line_number : # if ip address on line
    line_number[x1].append((log_year + ":" + log_day + ":" + log_time))
else:
    line_number[x1] = [(index,log_time)]
If the dictionary does not contain the IP address yet (i.e. the else part gets executed), you create the entry for that IP as a list containing the single element (index,log_time), which is a tuple with two elements.
After that, if the IP address is already present (the if part gets executed), you only append (log_year + ":" + log_day + ":" + log_time), i.e. the plain string log_year + ":" + log_day + ":" + log_time. That's because parentheses around a single expression only group it: (elem) is just elem. If you want to create a tuple containing a single element, you have to write (elem,).
Considering this, it seems like every value in your line_number dictionary will look something like this (check this!):
[(1, '18:45:00'), "2018:8:18:45:00", "2018:8:18:45:00", "2018:8:18:45:00" ... ]
Fix: changing [(index,log_time)] in the above excerpt to [(log_year + ":" + log_day + ":" + log_time)] should fix your problem. It's bad coding style though because you're writing the same thing twice. A better solution would be to replace the above code with the following line:
line_number[x1] = line_number.get(x1, []) + [f"{log_year}:{log_day}:{log_time}"]
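As a rough sketch of the same fix, the snippet below stores only strings per IP address and guards the lookup from the end of the list. It assumes the log format shown in the question (IP, month, day, year, time); the function names `timestamps_by_ip` and `nth_from_last` are invented for this example.

```python
from collections import defaultdict


def timestamps_by_ip(lines):
    """Map each IP address to its list of 'year:day:time' strings."""
    seen = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        ip, _month, day, year, clock = fields[:5]
        # Every element has the same shape, including the first one.
        seen[ip].append(f"{year}:{day}:{clock}")
    return seen


def nth_from_last(items, n):
    """Return the n-th item from the end, or None if there are fewer than n."""
    return items[-n] if len(items) >= n else None
```

With the three sample lines from the question, `nth_from_last(times, 3)` gives the entry for the first line in the wanted format, because the list no longer starts with an (index, time) tuple.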
I'm trying to use pyparsing to parse key:value pairs from the comments in a document. A key starts at the beginning of a line, and a value follows. Values may be continued on multiple lines that begin with whitespace.
import pyparsing as pp
instring = """
-- This is (a) #%^& comment
/*
name1: val
name2: val2 with $*&##) junk
name3: val3: with #)(*% multi-
    line: content
*/
"""
comment1 = pp.Literal("--") + pp.originalTextFor(pp.SkipTo(pp.LineEnd())).setDebug()
identifier = pp.Word(pp.alphanums + "_").setDebug()
meta1 = pp.LineStart() + identifier + pp.Literal(":") + pp.SkipTo(pp.LineEnd())
meta2 = pp.LineStart() + pp.White() + pp.SkipTo(pp.LineEnd())
metaval = meta1 + pp.ZeroOrMore(meta2)
metalist = pp.ZeroOrMore(comment1) + pp.Literal("/*") + pp.OneOrMore(metaval) + pp.Literal("*/")
if __name__ == "__main__":
    p = metalist.parseString(instring)
    print(p)
Fails with:
Matched {Empty SkipTo:(LineEnd) Empty} -> ['This is (a) #%^& comment']
File "C:\Users\user\py3\lib\site-packages\pyparsing.py", line 2305, in parseImpl
raise ParseException(instring, loc, self.errmsg, self)
pyparsing.ParseException: Expected start of line (at char 32), (line:4, col:1)
The answer to pyparsing whitespace match issues says
LineStart has always been difficult to work with, but ...
If the parser is at line 4 column 1 (the first key:value pair), then why is it not finding a start of line? What is the correct pyparsing syntax to recognize lines beginning with no whitespace and lines beginning with whitespace?
I think the confusion I have with LineStart is that, for LineEnd, I can look for a '\n' character, but there is no separate character for LineStart. So in LineStart I look to see if the current parser location is positioned just after a '\n'; or if it is currently on a '\n', move past it and still continue. Unfortunately, I implemented this in a place that messes up the reporting location, so you get those weird errors that read like "failed to find a start of line on line X col 1," which really does sound like it should be a successfully matched start of a line. Also, I think I need to revisit this implicit newline-skipping, or for that matter, all whitespace-skipping in general for LineStart.
For now, I've gotten your code to work by expanding your line-starting expression slightly, as:
LS = pp.Optional(pp.LineEnd()) + pp.LineStart()
and replaced the LineStart references in meta1 and meta2 with LS:
comment1 = pp.Literal("--") + pp.originalTextFor(pp.SkipTo(pp.LineEnd())).setDebug()
identifier = pp.Word(pp.alphanums + "_").setDebug()
meta1 = LS + identifier + pp.Literal(":") + pp.SkipTo(pp.LineEnd())
meta2 = LS + pp.White() + pp.SkipTo(pp.LineEnd())
metaval = meta1 + pp.ZeroOrMore(meta2)
metalist = pp.ZeroOrMore(comment1) + pp.Literal("/*") + pp.OneOrMore(metaval) + pp.Literal("*/")
If this situation with LineStart leaves you uncomfortable, here is another tactic you can try: using a parse-time condition to only accept identifiers that start in column 1:
comment1 = pp.Literal("--") + pp.originalTextFor(pp.SkipTo(pp.LineEnd())).setDebug()
identifier = pp.Word(pp.alphanums + "_").setName("identifier")
identifier.addCondition(lambda instring,loc,toks: pp.col(loc,instring) == 1)
meta1 = identifier + pp.Literal(":") + pp.SkipTo(pp.LineEnd()).setDebug()
meta2 = pp.White().setDebug() + pp.SkipTo(pp.LineEnd()).setDebug()
metaval = meta1 + pp.ZeroOrMore(meta2, stopOn=pp.Literal('*/'))
metalist = pp.ZeroOrMore(comment1) + pp.Literal("/*") + pp.LineEnd() + pp.OneOrMore(metaval) + pp.Literal("*/")
This code does away with LineStart completely, while I figure out just what I want this particular token to do. I also had to modify the ZeroOrMore repetition in metaval so that */ would not be accidentally processed as continued comment content.
Thanks for your patience with this - I am not keen to quickly put out a patched LineStart change and then find that I have overlooked other compatibility or other edge cases that just put me back in the current less-than-great state on this class. But I'll put some effort into clarifying this behavior before putting out 2.1.10.
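For comparison, and as a way to cross-check what the grammar should produce, the rule "keys start in column 1, continuation lines start with whitespace" can be sketched in plain Python. This is not pyparsing and not a replacement for the grammar above; it is a line-based illustration that assumes one key per line and that lines without a colon (such as the `--` comment) can be ignored. `parse_meta` is a name made up for this sketch.

```python
def parse_meta(block):
    """Return {key: value} for 'key: value' lines inside a /* ... */ block."""
    result, key = {}, None
    for line in block.splitlines():
        if not line.strip() or line.strip() in ("/*", "*/"):
            continue  # skip blank lines and the comment delimiters
        if line[0].isspace():
            # Continuation line: append to the value of the most recent key.
            if key is not None:
                result[key] += " " + line.strip()
        else:
            # A key starts in column 1 and ends at the first colon.
            name, sep, value = line.partition(":")
            if sep:
                key = name.strip()
                result[key] = value.strip()
    return result
```

Only the first colon on a line separates key from value, so a value such as "val3: with ..." keeps its own colons.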
I have a quite large configuration file that consists of blocks delimited by
#start <some-name> ... #end <some-name>, where some-name has to be the same for the block. A block can appear multiple times but is never contained within itself. Only certain blocks may appear inside certain other blocks. I'm not interested in these contained blocks, only in the blocks on the second level.
In the real file the names do not start with blockX but are very different from each other.
An example:
#start block1
#start block2
/* string but no more name2 or name1 in here */
#end block2
#start block3
/* configuration data */
#end block3
#end block1
This is being parsed with regex and is, when run without a debugger attached, quite fast: 0.23s for a 2.7MB file with simple rules like:
blocks2 = re.findall(r'#start block2\s+(.*?)#end block2', contents, re.DOTALL)
I tried parsing this with pyparsing, but it is VERY slow even without a debugger attached: it took 16s for the same file.
My approach was to write pyparsing code that mimics the simple regex parsing, so that I can reuse some of the other code for now and avoid having to parse every block. The full grammar is quite extensive.
Here is what I tried
block = [Group(Keyword(x) + SkipTo(Keyword('#end') + Keyword(x)) + Keyword('#end') - x )(x + '*') for x in ['block3', 'block4', 'block5', 'block6', 'block7', 'block8']]
blocks = Keyword('#start') + block
x = OneOrMore(blocks).searchString(contents) # I also tried parseString() but the results were similar.
What am I doing wrong? How can I optimize this to come anywhere close to the speed achieved by the regex implementation?
Edit: The previous example was way too easy compared to the real data, so I created a proper one now:
/* all comments are C comments */
VERSION 1 0
#start PROJECT project_name "what is it about"
/* why not another comment here too! */
#start SECTION where_the_wild_things_are "explain this section"
/* I need all sections at this level */
/* In the real data there are about 10k of such blocks.
There are around 10 different names (types) of blocks */
#start INTERFACE_SPEC
There can be anything in the section. Not Really but i want to skip anything until the matching (hash)end.
/* can also have comments */
#end INTERFACE_SPEC
#start some_other_section
name 'section name'
#start with_inner_section
number_of_points 3 /* can have comments anywhere */
#end with_inner_section
#end some_other_section /* basically comments can be anywhere */
#start some_other_section
name 'section name'
other_section_attribute X
ref_to_section another_section
#end some_other_section
#start another_section
degrees
#start section_i_do_not_care_about_at_the_moment
ref_to some_other_section
/* of course can have comments */
#end section_i_do_not_care_about_at_the_moment
#end another_section
#end SECTION
#end PROJECT
For this I had to expand your original suggestion. I hard-coded the two outer blocks (PROJECT and SECTION) because they MUST exist.
With this version the time is still at ~16s:
def test_parse(f):
    import pyparsing as pp
    import io
    comment = pp.cStyleComment
    start = pp.Literal("#start")
    end = pp.Literal("#end")
    ident = pp.Word(pp.alphas + "_", pp.printables)
    inner_ident = ident.copy()
    inner_start = start + inner_ident
    inner_end = end + pp.matchPreviousLiteral(inner_ident)
    inner_block = pp.Group(inner_start + pp.SkipTo(inner_end) + inner_end)
    version = pp.Literal('VERSION') - pp.Word(pp.nums)('major_version') - pp.Word(pp.nums)('minor_version')
    project = pp.Keyword('#start') - pp.Keyword('PROJECT') - pp.Word(pp.alphas + "_", pp.printables)(
        'project_name') - pp.dblQuotedString + pp.ZeroOrMore(comment) - \
        pp.Keyword('#start') - pp.Keyword('SECTION') - pp.Word(pp.alphas, pp.printables)(
        'section_name') - pp.dblQuotedString + pp.ZeroOrMore(comment) - \
        pp.OneOrMore(inner_block) + \
        pp.Keyword('#end') - pp.Keyword('SECTION') + \
        pp.ZeroOrMore(comment) - pp.Keyword('#end') - pp.Keyword('PROJECT')
    grammar = pp.ZeroOrMore(comment) - version.ignore(comment) - project.ignore(comment)
    with io.open(f) as ff:
        return grammar.parseString(ff.read())
EDIT: Typo. I said the file was 2k, but it is actually a 2.7MB file.
First of all, this code as posted doesn't work for me:
blocks = Keyword('#start') + block
Changing to this:
blocks = Keyword('#start') + MatchFirst(block)
at least runs against your sample text.
Rather than hard-code all the keywords, you can try using one of pyparsing's adaptive expressions, matchPreviousLiteral:
(EDITED)
def grammar():
    import pyparsing as pp
    comment = pp.cStyleComment
    start = pp.Keyword("#start")
    end = pp.Keyword('#end')
    ident = pp.Word(pp.alphas + "_", pp.printables)
    integer = pp.Word(pp.nums)
    inner_ident = ident.copy()
    inner_start = start + inner_ident
    inner_end = end + pp.matchPreviousLiteral(inner_ident)
    inner_block = pp.Group(inner_start + pp.SkipTo(inner_end) + inner_end)
    VERSION, PROJECT, SECTION = map(pp.Keyword, "VERSION PROJECT SECTION".split())
    version = VERSION - pp.Group(integer('major_version') + integer('minor_version'))
    project = (start - PROJECT + ident('project_name') + pp.dblQuotedString
               + start + SECTION + ident('section_name') + pp.dblQuotedString
               + pp.OneOrMore(inner_block)('blocks')
               + end + SECTION
               + end + PROJECT)
    grammar = version + project
    grammar.ignore(comment)
    return grammar
It is only necessary to call ignore() on the topmost expression in your grammar - it will propagate down to all internal expressions. Also, it should be unnecessary to sprinkle ZeroOrMore(comment)s in your grammar, if you have already called ignore().
I parsed a 2MB input string (containing 10,000 inner blocks) in about 16 seconds, so a 2K file should only take about 1/1000th as long.
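If the regex route stays attractive for speed, the "end name must repeat the start name" idea behind matchPreviousLiteral has a rough regex analogue: a named backreference. The sketch below is only an illustration under two assumptions: `#start` and `#end` appear at the beginning of a line (leading blanks allowed), and no comment contains a line that looks like a matching `#end`. The names `BLOCK` and `top_level_blocks` are made up here.

```python
import re

BLOCK = re.compile(
    r"^[ \t]*#start[ \t]+(?P<name>\w+)\b[^\n]*\n"  # opening line, capture the block name
    r"(?P<body>.*?)"                                # shortest possible body
    r"^[ \t]*#end[ \t]+(?P=name)\b",                # closing line must repeat the same name
    re.DOTALL | re.MULTILINE,
)


def top_level_blocks(text):
    """Return (name, body) pairs for the outermost blocks found in text."""
    return [(m.group("name"), m.group("body")) for m in BLOCK.finditer(text)]
```

Since a block never contains a block of the same name, the first matching `#end` closes it, and nested blocks with other names simply end up inside the body. Applied to the text between the SECTION lines, this yields the second-level blocks.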
I want to match all dhcp leases that have given mac address.
I wrote this code
fh = open(leaseFile)
lines = fh.read()
fh.close()
regex = r"lease\s*[0-9\.]+\s*\{[^\{\}]*%s[^\{\}]*?\}" % mac #mac comes as parameter
m = re.findall(regex,lines,re.DOTALL)
This worked well as long as a lease doesn't contain a '}' character. But if it does, my regex fails.
For example:
lease 10.14.53.253 {
starts 3 2012/10/17 09:27:20;
ends 4 2012/10/18 09:27:20;
tstp 4 2012/10/18 09:27:20;
binding state free;
hardware ethernet 00:23:18:62:31:5b;
uid "\001\000\013OW}k";
}
I couldn't figure out how to handle this case. Thanks for any advice...
EDIT
After research, I decided to use this regex with MULTILINE mode. It worked for all leases that I tried.
fh = open(leaseFile)
lines = fh.read()
fh.close()
regex = r"lease\s*[0-9\.]+\s*\{[^\{\}]*%s[\s\S]*?^\}" % mac #mac comes as parameter
m = re.findall(regex,lines,re.MULTILINE)
regex = r'(lease\s*[0-9\.]+\s*\{[^\{\}]*%s[^\{\}]*(.*"[^\{\}]*\}|\}))' % mac #mac comes as parameter
m = re.findall(regex,lines)
This should do the trick.
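Another way to sidestep the '}' problem, sketched here under the assumption that the closing brace of each lease sits at the start of its own line: first cut the file into whole lease blocks, then test each block for the MAC address. `re.escape` keeps the MAC from being interpreted as a pattern. `LEASE` and `leases_for_mac` are names invented for this example.

```python
import re

# A lease runs from "lease <ip> {" to the first "}" at the start of a line.
LEASE = re.compile(r"^lease\s+\S+\s*\{.*?^\}", re.DOTALL | re.MULTILINE)


def leases_for_mac(text, mac):
    """Return every lease block whose hardware line names the given MAC."""
    wanted = re.compile(
        r"hardware\s+ethernet\s+%s\s*;" % re.escape(mac), re.IGNORECASE
    )
    return [block for block in LEASE.findall(text) if wanted.search(block)]
```

A '}' inside a quoted uid value is never at the start of a line, so it does not end the block early.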