How to extract the key from the log in Python

I wrote some Python code to extract a key from a log. With the same log it works well on one machine, but when I run it in Hadoop it fails. I guess there is a bug in how I use the regex. Can anyone give me some comments? Does Hadoop not support regex?
The code is meant to extract qry and rc, sum the values in rc, and print the result as qry query_count rc_count. When run in Hadoop, it reports
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1.
From searching Google, this seems to mean there is a bug in the mapper code. How can I fix it?
The log format looks like this:
NOTICE: 01-03 23:57:23: [a.cpp][b][222] show_ver=11 sid=ae1d esid=6WVj uid=D1 a=20 qry=cars qid0=293 loc_src=4 phn=0 mid=0 wvar=c op=0 qry_src=0 op_type=1 src=110|120|111 at=60942 rc=3|1|1 discount=20 indv_type=0 rep_query=
My Python code is:
import sys
import re
for line in sys.stdin:
    count_result = 0
    line = line.strip()
    match=re.search('.*qry=(.*?)qid0.*rc=(.*?)discount',line).groups()
    if (len(match)<2):
        continue
    counts_tmp = match[1].strip()
    counts=counts_tmp.split('|')
    for count in counts:
        if count.isdigit():
            count_result += int(count)
    key_tmp = match[0].strip()
    if key_tmp.strip():
        key = key_tmp.split('\t')
        key = ' '.join(key)
        print '%s\t%s\t%s' %(key,1,count_result)

Most likely your regular expression catches more than you expect. I would suggest splitting it into simpler parts, like:
(?<= qry=).*(?= qid0)
and
(?<= rc=).*(?= discount)
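A minimal sketch of that suggestion, written for Python 3 and assuming the fields are spelled qid0 and discount as in the sample log. The helper name parse_line is hypothetical, and the patterns are made non-greedy here. The sketch also guards against lines that do not match: in the mapper from the question, re.search returns None for such a line, and calling .groups() on None raises AttributeError, which is one plausible cause of the subprocess exiting with a non-zero code.

```python
import re

# Two small patterns instead of one long one; each stops at the next field.
QRY_RE = re.compile(r"(?<= qry=).*?(?= qid0=)")
RC_RE = re.compile(r"(?<= rc=).*?(?= discount=)")

def parse_line(line):
    """Return (qry, rc_sum), or None if the line lacks either field."""
    qry = QRY_RE.search(line)
    rc = RC_RE.search(line)
    if qry is None or rc is None or not qry.group().strip():
        return None  # skip the line instead of crashing the mapper
    total = sum(int(part) for part in rc.group().split("|") if part.isdigit())
    return qry.group().strip(), total

sample = ("NOTICE: 01-03 23:57:23: [a.cpp][b][222] a=20 qry=cars qid0=293 "
          "src=110|120|111 at=60942 rc=3|1|1 discount=20 rep_query=")
print(parse_line(sample))          # ('cars', 5)
print(parse_line("garbage line"))  # None
```

In a streaming mapper, the same function could be called for each line of sys.stdin, printing a tab-separated record only when the result is not None.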

Making a lot of assumptions and hazarding an educated guess, you might be able to parse your log like this:
from collections import defaultdict
input = """NOTICE: 01-03 23:57:23: [a.cpp][b][222] show_ver=11 sid=ae1d esid=6WVj uid=D1 a=20 qry=cars qid0=293 loc_src=4 phn=0 mid=0 wvar=c op=0 qry_src=0 op_type=1 src=110|120|111 at=60942 rc=3|1|1 discount=20 indv_type=0 rep_query=
NOTICE: 01-03 23:57:23: [a.cpp][b][222] show_ver=11 sid=ae1d esid=6WVj uid=D1 a=20 qry=boats qid0=293 loc_src=4 phn=0 mid=0 wvar=c op=0 qry_src=0 op_type=1 src=110|120|111 at=60942 rc=3|5|2 discount=20 indv_type=0 rep_query=
NOTICE: 01-03 23:57:23: [a.cpp][b][222] show_ver=11 sid=ae1d esid=6WVj uid=D1 a=20 qry=cars qid0=293 loc_src=4 phn=0 mid=0 wvar=c op=0 qry_src=0 op_type=1 src=110|120|111 at=60942 rc=3|somestring|12 discount=20 indv_type=0 rep_query="""
d = defaultdict (lambda: 0)
for line in input.split ("\n"):
    tokens = line.split (" ")
    count = 0
    qry = None
    for token in tokens:
        pair = token.split ("=")
        if len (pair) != 2: continue
        key, value = pair
        if key == "qry":
            qry = value
        if key == "rc":
            values = value.split ("|")
            for value in values:
                try: count += int (value)
                except: pass
    if qry: d [qry] += count
print (d)
This assumes that (a) key-value pairs are separated by spaces, and (b) there are no spaces inside either keys or values.

Related

Command not executed on if statement being true

So I'm writing a simple program in Python that checks active window ID and in this program I have an if statement inside a while True loop that checks if the active window ID matches the expected window ID.
Here's the code:
import os, subprocess
def get_window_id():
    output = subprocess.check_output("xprop -root | grep \"window id\"", shell=True)
    get_window_id.result = {}
    for row in output.split(b"\n"):
        if b": " in row:
            key, value = row.split(b": ")
            get_window_id.result[key.strip(b"window id # ")] = value.strip()
while True:
    get_window_id()
    print(get_window_id.result[b"_NET_ACTIVE_WINDOW(WINDOW)"].strip(b"b'window id # "))
    if get_window_id.result[b"_NET_ACTIVE_WINDOW(WINDOW)"].strip(b"b'window id # ") != "b'0x3000003'":
        os.system("kill -STOP $(pgrep mpv)")
    else:
        os.system("kill -CONT $(pgrep mpv)")
When the active window ID doesn't match the expected one, the block of code that corresponds to that condition executes, but when the active window ID matches the expected one, the block of code that corresponds to that condition doesn't execute. I obviously want that block of code to execute.
As you can see, I'm not the best at Python and asking for help but I hope that this is enough for somebody to help. If not I can always provide more information :)
And sorry if the title and/or body of the question are not clear. It's currently 2AM and I've been trying to do this all day.
Thanks in advance.

YES! I did it! I just needed some sleep and a little more searching. Basically, I needed to get rid of the b prefixes on the strings: I added encoding="utf8" to the subprocess.check_output call, deleted the b prefixes, and it worked!
Code after the change:
import os, subprocess
def get_window_id():
    output = subprocess.check_output("xprop -root | grep \"window id\"", shell=True, encoding="utf8")
    get_window_id.result = {}
    for row in output.split("\n"):
        if ": " in row:
            key, value = row.split(": ")
            get_window_id.result[key.strip("window id # ")] = value.strip()
while True:
    get_window_id()
    print(get_window_id.result["_NET_ACTIVE_WINDOW(WINDOW)"].strip("'window id # "))
    if get_window_id.result["_NET_ACTIVE_WINDOW(WINDOW)"].strip("'window id # ") != "0x3000003":
        os.system("kill -STOP $(pgrep mpv)")
    else:
        os.system("kill -CONT $(pgrep mpv)")
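The root cause can be shown in isolation: in Python 3 a bytes object never compares equal to a str, so the original != test was always true. The sketch below also pulls the id out with a regular expression rather than strip(), since strip() removes a set of characters from both ends rather than a fixed prefix. The helper name active_window_id is hypothetical, and the sample line is an assumption about what xprop prints, based on the keys used in the code above.

```python
import re

# bytes and str never compare equal in Python 3, even with the same characters.
print(b"0x3000003" == "0x3000003")                 # False
print(b"0x3000003".decode("utf8") == "0x3000003")  # True

def active_window_id(xprop_output):
    """Return the hex id from the _NET_ACTIVE_WINDOW line, or None."""
    match = re.search(
        r"_NET_ACTIVE_WINDOW\(WINDOW\): window id # (0x[0-9a-fA-F]+)",
        xprop_output,
    )
    return match.group(1) if match else None

# Assumed shape of one line of `xprop -root` output.
sample_output = "_NET_ACTIVE_WINDOW(WINDOW): window id # 0x3000003\n"
print(active_window_id(sample_output))  # 0x3000003
```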

How to check the windows path matches with partial Linux path string

I am trying to check which files in my full_list_files are also present in required_list.
The entries are not exactly equal to one another, but they match on the filename and the last subdirectory.
Example:
'C:\Users\Documents\Updated\Build\Output\M\Application_1.bin' matches "M/Application_1.bin", except that the slashes are different.
So I am trying to make both uniform by using the function convert_fslash_2_bslash.
But as the output below shows, none of the files are matched.
full_list_files = set(['C:\\Users\\Documents\\Updated\\Build\\Output\\O\\Report.tar.gz', 'C:\\Users\\Documents\\Updated\\Build\\Output\\N\\Application_2.bin', 'C:\\Users\\Documents\\Updated\\Build\\Output\\O\\Testing.txt', 'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\masking.tar.gz', 'C:\\Users\\Documents\\Updated\\Build\\Output\\N\\Application_1.bin', 'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\Application_1.bin', 'C:\\Users\\Documents\\Updated\\Build\\Output\\O\\History.zip', 'C:\\Users\\Documents\\Updated\\Build\\Output\\O\\Challenge.tar.gz', 'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\Application_2.bin', 'C:\\Users\\Documents\\Updated\\Build\\Output\\N\\porting.tar.gz', 'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\Booting.tar.gz'])
original required_list = set(['N/Application_2.bin', 'M/masking.tar.gz', 'N/Application_1.bin', 'O/Challenge.tar.gz', 'M/Application_1.bin', 'O/Testing.txt', 'M/rooting.tar.gz', 'M/Application_2.bin', 'O/History.zip', 'N/porting.tar.gz', 'O/Report.tar.gz'])
modified required_list = ['N\\Application_2.bin', 'M\\masking.tar.gz', 'N\\Application_1.bin', 'O\\Challenge.tar.gz', 'M\\Application_1.bin', 'O\\Testing.txt', 'M\\rooting.tar.gz', 'M\\Application_2.bin', 'O\\History.zip', 'N\\porting.tar.gz', 'O\\Report.tar.gz']
'C:\\Users\\Documents\\Updated\\Build\\Output\\O\\Report.tar.gz' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\N\\Application_2.bin' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\O\\Testing.txt' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\masking.tar.gz' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\N\\Application_1.bin' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\Application_1.bin' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\O\\History.zip' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\O\\Challenge.tar.gz' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\Application_2.bin' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\N\\porting.tar.gz' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\Booting.tar.gz' not present
How can I get it working correctly?
import os
import sys
import re
full_list_files = {
#These are actually real paths parsed from listdir
#Just for convenience used as strings
'C:\Users\Documents\Updated\Build\Output\M\Application_1.bin',
'C:\Users\Documents\Updated\Build\Output\M\Application_2.bin',
'C:\Users\Documents\Updated\Build\Output\M\masking.tar.gz',
'C:\Users\Documents\Updated\Build\Output\M\Booting.tar.gz',
'C:\Users\Documents\Updated\Build\Output\N\Application_1.bin',
'C:\Users\Documents\Updated\Build\Output\N\Application_2.bin',
'C:\Users\Documents\Updated\Build\Output\N\porting.tar.gz',
'C:\Users\Documents\Updated\Build\Output\O\Challenge.tar.gz',
'C:\Users\Documents\Updated\Build\Output\O\History.zip',
'C:\Users\Documents\Updated\Build\Output\O\Testing.txt',
'C:\Users\Documents\Updated\Build\Output\O\Report.tar.gz'
}
required_list = {
"M/Application_1.bin",
"M/Application_2.bin",
"M/masking.tar.gz",
"M/rooting.tar.gz",
"N/Application_1.bin",
"N/Application_2.bin",
"N/porting.tar.gz",
"O/Challenge.tar.gz",
"O/History.zip",
"O/Testing.txt",
"O/Report.tar.gz"
}
def convert_fslash_2_bslash(required_file_list):
    required_config_file_list = []
    i = 0
    for entry in required_file_list:
        entry = entry.strip()
        entry = entry.replace('"',"")
        entry = entry.replace('/','\\')
        required_config_file_list.insert(i, entry)
        i = i + 1
    return required_config_file_list
if __name__ == "__main__":
    print
    print "full_list_files = ", full_list_files
    print
    print "original required_list = ", required_list
    print
    required_config_file_list = convert_fslash_2_bslash(required_list)
    print "modified required_list = ", required_config_file_list
    print
    for f_entry in full_list_files:
        f_entry = repr(f_entry)
        #for r_entry in required_config_file_list:
            #if ( f_entry.find(r_entry) != -1):
        if f_entry in required_config_file_list:
            print f_entry ," present"
        else:
            print f_entry ," not present"

Here is the logic you need at the bottom:
for f_entry in full_list_files:
    for r_entry in required_config_file_list:
        if f_entry.endswith(r_entry):
            print f_entry, " present"
You need to loop over both collections, then check whether the longer path ends with the shorter path. One of your mistakes was calling repr(), which wraps the string in quotes and escapes every backslash, so the result can never equal an entry in the list.
I'll leave it up to you to decide how you'll handle printing paths that are not present at all.
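One way to cover both the "present" and "not present" cases, sketched for Python 3. The helper name find_present is hypothetical. Instead of converting the required entries to backslashes, this version normalizes the full paths to forward slashes, and it prepends a separator to each required entry so that only whole directory names match.

```python
def find_present(full_paths, required):
    """Map each full path to True/False: does it end with a required entry?"""
    # str.endswith accepts a tuple of candidate suffixes.
    wanted = tuple("/" + entry for entry in required)
    result = {}
    for path in full_paths:
        normalized = path.replace("\\", "/")  # compare with one separator style
        result[path] = normalized.endswith(wanted)
    return result

full_list_files = [
    r"C:\Users\Documents\Updated\Build\Output\M\Application_1.bin",
    r"C:\Users\Documents\Updated\Build\Output\M\Booting.tar.gz",
]
required_list = ["M/Application_1.bin", "M/rooting.tar.gz"]

for path, present in find_present(full_list_files, required_list).items():
    print(path, "present" if present else "not present")
```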

python: how to delete registry key (and subkeys) from HKLM (getting error 5)

I'm trying to delete certain registry keys via a Python script.
I have no problems reading and deleting keys from "HKEY_CURRENT_USER", but trying to do the same from "HKEY_LOCAL_MACHINE" gives me the dreaded WindowsError: [Error 5] Access is denied.
I'm running the script via the IDLE IDE, with admin privileges.
Here's the code:
from _winreg import *
ConnectRegistry(None,HKEY_LOCAL_MACHINE)
OpenKey(HKEY_LOCAL_MACHINE,r'software\wow6432node\App',0,KEY_ALL_ACCESS)
DeleteKey(OpenKey(HKEY_LOCAL_MACHINE,r'software\wow6432node'),'App')

You need to remove all subkeys before you can delete the key.
def deleteSubkey(key0, key1, key2=""):
    import _winreg
    if key2=="":
        currentkey = key1
    else:
        currentkey = key1+ "\\" +key2
    open_key = _winreg.OpenKey(key0, currentkey ,0,_winreg.KEY_ALL_ACCESS)
    infokey = _winreg.QueryInfoKey(open_key)
    for x in range(0, infokey[0]):
        #NOTE:: This code is to delete the key and all subkeys.
        # If you just want to walk through them, then
        # you should pass x to EnumKey. subkey = _winreg.EnumKey(open_key, x)
        # Deleting the subkey will change the SubKey count used by EnumKey.
        # We must always pass 0 to EnumKey so we
        # always get back the new first SubKey.
        subkey = _winreg.EnumKey(open_key, 0)
        try:
            _winreg.DeleteKey(open_key, subkey)
            print "Removed %s\\%s " % ( currentkey, subkey)
        except:
            deleteSubkey( key0, currentkey, subkey )
            # no extra delete here since each call
            #to deleteSubkey will try to delete itself when its empty.
    _winreg.DeleteKey(open_key,"")
    open_key.Close()
    print "Removed %s" % (currentkey)
    return
Here is how you run it:
deleteSubkey(_winreg.HKEY_CURRENT_USER, "software\\wow6432node", "App")
deleteSubkey(_winreg.HKEY_CURRENT_USER, "software\\wow6432node\\App")
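The comments in the function explain why EnumKey is always called with index 0 while deleting. A small simulation with a plain Python list (no registry access; both function names are hypothetical) illustrates the effect: removing the entry at a moving index skips every other entry, while always removing the first entry empties the list.

```python
def drain_by_moving_index(names):
    """Delete names[x] for x = 0, 1, 2, ... (the pattern to avoid)."""
    names = list(names)
    deleted = []
    for x in range(len(names)):
        if x >= len(names):  # a real EnumKey call would fail at this point
            break
        deleted.append(names.pop(x))
    return deleted, names

def drain_from_front(names):
    """Always delete the entry at index 0, as the answer does."""
    names = list(names)
    deleted = []
    for _ in range(len(names)):
        deleted.append(names.pop(0))
    return deleted, names

subkeys = ["A", "B", "C", "D"]
print(drain_by_moving_index(subkeys))  # (['A', 'C'], ['B', 'D'])
print(drain_from_front(subkeys))       # (['A', 'B', 'C', 'D'], [])
```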

Just my two cents on the topic, but I recurse to the lowest subkey and delete on unravel:
import winreg
def delete_sub_key(root, sub):
    try:
        open_key = winreg.OpenKey(root, sub, 0, winreg.KEY_ALL_ACCESS)
        num, _, _ = winreg.QueryInfoKey(open_key)
        for i in range(num):
            child = winreg.EnumKey(open_key, 0)
            delete_sub_key(open_key, child)
        try:
            winreg.DeleteKey(open_key, '')
        except Exception:
            # log deletion failure
            pass
        finally:
            winreg.CloseKey(open_key)
    except Exception:
        # log opening/closure failure
        pass
The difference from the other posts is that I do not try to delete a key while num is > 0, because it will fail (as stated in the docs). So I don't waste time trying when there are subkeys.

[EDIT]
I have created a pip package that handles registry keys.
Install with: pip install windows_tools.registry
Usage:
from windows_tools.registry import delete_sub_key, KEY_WOW64_32KEY, KEY_WOW64_64KEY
keys = [r'SOFTWARE\MyInstalledApp', r'SOFTWARE\SomeKey\SomeOtherKey']
for key in keys:
    delete_sub_key(key, arch=KEY_WOW64_32KEY | KEY_WOW64_64KEY)
[/EDIT]
Unburying this old question, here's an updated version of ChrisHiebert's recursive function that:
Handles Python 3 (tested with Python 3.7.1)
Handles multiple registry architectures (eg Wow64 for Python 32 on Windows 64)
Is PEP-8 compliant
The following example shows function usage to delete two keys in all registry architectures (standard and redirected WOW6432Node) by using architecture key masks.
Hopefully this will help someone:
import winreg
def delete_sub_key(key0, current_key, arch_key=0):
    open_key = winreg.OpenKey(key0, current_key, 0, winreg.KEY_ALL_ACCESS | arch_key)
    info_key = winreg.QueryInfoKey(open_key)
    for x in range(0, info_key[0]):
        # NOTE:: This code is to delete the key and all sub_keys.
        # If you just want to walk through them, then
        # you should pass x to EnumKey. sub_key = winreg.EnumKey(open_key, x)
        # Deleting the sub_key will change the sub_key count used by EnumKey.
        # We must always pass 0 to EnumKey so we
        # always get back the new first sub_key.
        sub_key = winreg.EnumKey(open_key, 0)
        try:
            winreg.DeleteKey(open_key, sub_key)
            print("Removed %s\\%s " % (current_key, sub_key))
        except OSError:
            delete_sub_key(key0, "\\".join([current_key,sub_key]), arch_key)
            # No extra delete here since each call
            # to delete_sub_key will try to delete itself when its empty.
    winreg.DeleteKey(open_key, "")
    open_key.Close()
    print("Removed %s" % current_key)
    return
# Allows to specify if operating in redirected 32 bit mode or 64 bit, set arch_keys to 0 to disable
arch_keys = [winreg.KEY_WOW64_32KEY, winreg.KEY_WOW64_64KEY]
# Base key
root = winreg.HKEY_LOCAL_MACHINE
# List of keys to delete (raw strings, so the backslashes are taken literally)
keys = [r'SOFTWARE\MyInstalledApp', r'SOFTWARE\SomeKey\SomeOtherKey']
for key in keys:
    for arch_key in arch_keys:
        try:
            delete_sub_key(root, key, arch_key)
        except OSError as e:
            print(e)

Figured it out!
It turns out the registry key wasn't empty and contained multiple subkeys.
I had to enumerate and delete the subkeys first, and only then was I able to delete the main key from HKLM.
(I also added "try...except" so that it wouldn't break the whole script in case there were problems.)

This is my solution. I like to use with statements in order to not have to close the key manually. First I check for sub keys and delete them before I delete the key itself. EnumKey raises an OSError if no sub key exists. I use this to break out of the loop.
from typing import Union
from winreg import *
def delete_key(key: Union[HKEYType, int], sub_key_name: str):
    with OpenKey(key, sub_key_name) as sub_key:
        while True:
            try:
                sub_sub_key_name = EnumKey(sub_key, 0)
                delete_key(sub_key, sub_sub_key_name)
            except OSError:
                break
    DeleteKey(key, sub_key_name)

Unpack ValueError in Python

I was making a site component scanner with Python. Unfortunately, something went wrong when I added another value to my script. This is my script:
#!/usr/bin/python
import sys
import urllib2
import re
import time
import httplib
import random
# Color Console
W = '\033[0m' # white (default)
R = '\033[31m' # red
G = '\033[1;32m' # green bold
O = '\033[33m' # orange
B = '\033[34m' # blue
P = '\033[35m' # purple
C = '\033[36m' # cyan
GR = '\033[37m' # gray
#Bad HTTP Responses
BAD_RESP = [400,401,404]
def main(path):
    print "[+] Testing:",host.split("/",1)[1]+path
    try:
        h = httplib.HTTP(host.split("/",1)[0])
        h.putrequest("HEAD", "/"+host.split("/",1)[1]+path)
        h.putheader("Host", host.split("/",1)[0])
        h.endheaders()
        resp, reason, headers = h.getreply()
        return resp, reason, headers.get("Server")
    except(), msg:
        print "Error Occurred:",msg
        pass
def timer():
    now = time.localtime(time.time())
    return time.asctime(now)
def slowprint(s):
    for c in s + '\n':
        sys.stdout.write(c)
        sys.stdout.flush() # defeat buffering
        time.sleep(8./90)
print G+"\n\t Whats My Site Component Scanner"
coms = { "index.php?option=com_artforms" : "com_artforms" + "link1","index.php?option=com_fabrik" : "com_fabrik" + "ink"}
if len(sys.argv) != 2:
    print "\nUsage: python jx.py <site>"
    print "Example: python jx.py www.site.com/\n"
    sys.exit(1)
host = sys.argv[1].replace("http://","").rsplit("/",1)[0]
if host[-1] != "/":
    host = host+"/"
print "\n[+] Site:",host
print "[+] Loaded:",len(coms)
print "\n[+] Scanning Components\n"
for com,nme,expl in coms.items():
    resp,reason,server = main(com)
    if resp not in BAD_RESP:
        print ""
        print G+"\t[+] Result:",resp, reason
        print G+"\t[+] Com:",nme
        print G+"\t[+] Link:",expl
        print W
    else:
        print ""
        print R+"\t[-] Result:",resp, reason
        print W
print "\n[-] Done\n"
And this is the error message that comes up:
Traceback (most recent call last):
File "jscan.py", line 69, in <module>
for com,nme,expl in xpls.items():
ValueError: need more than 2 values to unpack
I already tried changing the 2 value into 3 or 1, but it doesn't seem to work.

xpls.items() yields (key, value) tuples of two items each, and you're trying to unpack each one into three names. You initialize the dict yourself with two key: value pairs:
coms = { "index.php?option=com_artforms" : "com_artforms" + "link1","index.php?option=com_fabrik" : "com_fabrik" + "ink"}
Besides, the traceback seems to be from another script: the dict is called xpls there, and coms in the code you posted.

You can try
for (xpl, poc) in xpls.items():
    ...
    ...
because dict.items() gives you tuples with 2 values.

You have all the information you need. As with any bug, the best place to start is the traceback. Let's look at it:
for com,poc,expl in xpls.items():
ValueError: need more than 2 values to unpack
Python raises ValueError when a given object is of the correct type but has an incorrect value. In this case, this tells us that each element of xpls.items() is an iterable and thus can be unpacked, but the attempt failed.
The description of the exception narrows down the problem: each element has 2 values, but more were required. By looking at the quoted line, we can see that "more" is 3.
In short: each element of xpls.items() was supposed to have 3 values, but has 2.
Note that I never read the rest of the code. Debugging this was possible using only those 2 lines.
Learning to read tracebacks is vital. When you encounter an error such as this one again, devote at least 10 minutes to working with this information. You'll be repaid tenfold for your effort.

As already mentioned, dict.items() yields tuples with two values. If you use a list of strings as each dictionary value instead of a single string (which would have to be split afterwards anyway), you can use this syntax:
coms = { "index.php?option=com_artforms" : ["com_artforms", "link1"],
         "index.php?option=com_fabrik" : ["com_fabrik", "ink"]}
for com, (name, expl) in coms.items():
    print com, name, expl
>>> index.php?option=com_artforms com_artforms link1
>>> index.php?option=com_fabrik com_fabrik ink

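The same point as a self-contained Python 3 sketch (the answers above use Python 2 print statements). It first reproduces the failure, then uses nested unpacking. The tuple values are just the example data from the answer above.

```python
coms = {
    "index.php?option=com_artforms": ("com_artforms", "link1"),
    "index.php?option=com_fabrik": ("com_fabrik", "ink"),
}

# Each item is a (key, value) pair, so three flat names cannot be filled.
try:
    for com, name, expl in coms.items():
        pass
except ValueError as error:
    print("ValueError:", error)

# Nested unpacking matches the shape (key, (name, link)).
rows = [(com, name, expl) for com, (name, expl) in coms.items()]
print(rows[0])  # ('index.php?option=com_artforms', 'com_artforms', 'link1')
```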
How to parse nagios status.dat file?

I'd like to parse status.dat file for nagios3 and output as xml with a python script.
The xml part is the easy one but how do I go about parsing the file? Use multi line regex?
It's possible the file will be large as many hosts and services are monitored, will loading the whole file in memory be wise?
I only need to extract services that have critical state and host they belong to.
Any help and pointing in the right direction will be highly appreciated.
Later edit: here's how the file looks:
########################################
# NAGIOS STATUS FILE
#
# THIS FILE IS AUTOMATICALLY GENERATED
# BY NAGIOS. DO NOT MODIFY THIS FILE!
########################################
info {
created=1233491098
version=2.11
}
program {
modified_host_attributes=0
modified_service_attributes=0
nagios_pid=15015
daemon_mode=1
program_start=1233490393
last_command_check=0
last_log_rotation=0
enable_notifications=1
active_service_checks_enabled=1
passive_service_checks_enabled=1
active_host_checks_enabled=1
passive_host_checks_enabled=1
enable_event_handlers=1
obsess_over_services=0
obsess_over_hosts=0
check_service_freshness=1
check_host_freshness=0
enable_flap_detection=0
enable_failure_prediction=1
process_performance_data=0
global_host_event_handler=
global_service_event_handler=
total_external_command_buffer_slots=4096
used_external_command_buffer_slots=0
high_external_command_buffer_slots=0
total_check_result_buffer_slots=4096
used_check_result_buffer_slots=0
high_check_result_buffer_slots=2
}
host {
host_name=localhost
modified_attributes=0
check_command=check-host-alive
event_handler=
has_been_checked=1
should_be_scheduled=0
check_execution_time=0.019
check_latency=0.000
check_type=0
current_state=0
last_hard_state=0
plugin_output=PING OK - Packet loss = 0%, RTA = 3.57 ms
performance_data=
last_check=1233490883
next_check=0
current_attempt=1
max_attempts=10
state_type=1
last_state_change=1233489475
last_hard_state_change=1233489475
last_time_up=1233490883
last_time_down=0
last_time_unreachable=0
last_notification=0
next_notification=0
no_more_notifications=0
current_notification_number=0
notifications_enabled=1
problem_has_been_acknowledged=0
acknowledgement_type=0
active_checks_enabled=1
passive_checks_enabled=1
event_handler_enabled=1
flap_detection_enabled=1
failure_prediction_enabled=1
process_performance_data=1
obsess_over_host=1
last_update=1233491098
is_flapping=0
percent_state_change=0.00
scheduled_downtime_depth=0
}
service {
host_name=gateway
service_description=PING
modified_attributes=0
check_command=check_ping!100.0,20%!500.0,60%
event_handler=
has_been_checked=1
should_be_scheduled=1
check_execution_time=4.017
check_latency=0.210
check_type=0
current_state=0
last_hard_state=0
current_attempt=1
max_attempts=4
state_type=1
last_state_change=1233489432
last_hard_state_change=1233489432
last_time_ok=1233491078
last_time_warning=0
last_time_unknown=0
last_time_critical=0
plugin_output=PING OK - Packet loss = 0%, RTA = 2.98 ms
performance_data=
last_check=1233491078
next_check=1233491378
current_notification_number=0
last_notification=0
next_notification=0
no_more_notifications=0
notifications_enabled=1
active_checks_enabled=1
passive_checks_enabled=1
event_handler_enabled=1
problem_has_been_acknowledged=0
acknowledgement_type=0
flap_detection_enabled=1
failure_prediction_enabled=1
process_performance_data=1
obsess_over_service=1
last_update=1233491098
is_flapping=0
percent_state_change=0.00
scheduled_downtime_depth=0
}
It can have any number of hosts and a host can have any number of services.

Pfft, get yerself mk_livestatus. http://mathias-kettner.de/checkmk_livestatus.html

Nagiosity does exactly what you want:
http://code.google.com/p/nagiosity/

Having shamelessly stolen from the above examples,
here's a version built for Python 2.4 that returns a dict containing arrays of nagios sections.
import re
def parseConf(source):
    conf = {}
    cur = None  # section currently being read, if any
    patID=re.compile(r"(?:\s*define)?\s*(\w+)\s+{")
    patAttr=re.compile(r"\s*(\w+)(?:=|\s+)(.*)")
    patEndID=re.compile(r"\s*}")
    for line in source.splitlines():
        line=line.strip()
        matchID = patID.match(line)
        matchAttr = patAttr.match(line)
        matchEndID = patEndID.match( line)
        if len(line) == 0 or line[0]=='#':
            pass
        elif matchID:
            identifier = matchID.group(1)
            cur = [identifier, {}]
        elif matchAttr:
            attribute = matchAttr.group(1)
            value = matchAttr.group(2).strip()
            cur[1][attribute] = value
        elif matchEndID and cur:
            conf.setdefault(cur[0],[]).append(cur[1])
            cur = None
    return conf
To get the names of all hosts whose contact groups begin with 'devops':
nagcfg=parseConf(stringcontaingcompleteconfig)
hostlist=[host['host_name'] for host in nagcfg['host']
          if host['contact_groups'].startswith('devops')]

Don't know nagios and its config file, but the structure seems pretty simple:
# comment
identifier {
attribute=
attribute=value
}
which can simply be translated to
<identifier>
<attribute name="attribute-name">attribute-value</attribute>
</identifier>
all contained inside a root-level <nagios> tag.
I don't see line breaks in the values. Does nagios have multi-line values?
You need to take care of equal signs within attribute values, so set your regex to non-greedy.
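As an alternative to a non-greedy regex, a plain string method handles the embedded equal signs too. The sketch below uses the plugin_output line from the sample file in the question and splits it at the first "=" only.

```python
line = "plugin_output=PING OK - Packet loss = 0%, RTA = 3.57 ms"

# partition() splits at the first "=" only, so later "=" stay in the value.
name, _, value = line.partition("=")
print(name)   # plugin_output
print(value)  # PING OK - Packet loss = 0%, RTA = 3.57 ms

# split("=") with no limit would break the value into pieces instead.
print(len(line.split("=")))  # 4
```

An attribute with an empty value, such as performance_data=, simply yields an empty string as the value.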

You can do something like this:
import re
def parseConf(filename):
    conf = []
    with open(filename, 'r') as f:
        for i in f.readlines():
            if i[0] == '#': continue
            matchID = re.search(r"([\w]+) {", i)
            matchAttr = re.search(r"[ ]*([\w]+)=([\w\d]*)", i)
            matchEndID = re.search(r"[ ]*}", i)
            if matchID:
                identifier = matchID.group(1)
                cur = [identifier, {}]
            elif matchAttr:
                attribute = matchAttr.group(1)
                value = matchAttr.group(2)
                cur[1][attribute] = value
            elif matchEndID:
                conf.append(cur)
    return conf
def conf2xml(filename):
    conf = parseConf(filename)
    xml = ''
    for ID in conf:
        xml += '<%s>\n' % ID[0]
        for attr in ID[1]:
            xml += '\t<attribute name="%s">%s</attribute>\n' % \
                (attr, ID[1][attr])
        xml += '</%s>\n' % ID[0]
    return xml
Then try to do:
print conf2xml('conf.dat')

If you slightly tweak Andrea's solution, you can use that code to parse both the status.dat and the objects.cache:
import re
def parseConf(source):
    conf = []
    cur = None  # section currently being read, if any
    for line in source.splitlines():
        line=line.strip()
        matchID = re.match(r"(?:\s*define)?\s*(\w+)\s+{", line)
        matchAttr = re.match(r"\s*(\w+)(?:=|\s+)(.*)", line)
        matchEndID = re.match(r"\s*}", line)
        if len(line) == 0 or line[0]=='#':
            pass
        elif matchID:
            identifier = matchID.group(1)
            cur = [identifier, {}]
        elif matchAttr:
            attribute = matchAttr.group(1)
            value = matchAttr.group(2).strip()
            cur[1][attribute] = value
        elif matchEndID and cur:
            conf.append(cur)
            cur = None
    return conf
It is a little puzzling why nagios chose to use two different formats for these files, but once you've parsed them both into some usable Python objects you can do quite a bit of magic through the external command file.
If anybody has a solution for getting this into a real XML DOM, that'd be awesome.
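One possible way to get from the file to an XML tree, sketched for Python 3 with the standard library's xml.etree.ElementTree. The function names are hypothetical. Two assumptions are baked in: service blocks are named service, as in the sample file in the question (other Nagios versions may use a different section name), and current_state=2 marks a critical service. The parser reads one line at a time, so the whole file never has to be held in memory.

```python
import io
import xml.etree.ElementTree as ET

def iter_sections(lines):
    """Yield (section_name, attributes) for each 'name { ... }' block."""
    name, attrs = None, None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            name, attrs = line[:-1].strip(), {}
        elif line == "}":
            if name is not None:
                yield name, attrs
            name, attrs = None, None
        elif attrs is not None:
            key, _, value = line.partition("=")  # split at the first "=" only
            attrs[key] = value

def critical_services_xml(lines):
    """Build a <nagios> element holding services whose current_state is 2."""
    root = ET.Element("nagios")
    for name, attrs in iter_sections(lines):
        # Assumption: current_state=2 means CRITICAL for a service.
        if name == "service" and attrs.get("current_state") == "2":
            ET.SubElement(root, "service",
                          host=attrs.get("host_name", ""),
                          description=attrs.get("service_description", ""))
    return root

SAMPLE = """\
service {
host_name=gateway
service_description=PING
current_state=2
plugin_output=PING CRITICAL - Packet loss = 100%
}
service {
host_name=gateway
service_description=SSH
current_state=0
}
"""
root = critical_services_xml(io.StringIO(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

For a real file, an open file object can be passed in place of the io.StringIO wrapper, since both are iterated line by line.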

Over the last several months I've written and released a tool that parses the Nagios status.dat and objects.cache and builds a model that allows for some really useful manipulation of Nagios data. We use it to drive an internal operations dashboard that is a simplified 'mini' Nagios. It's under continual development, and I've neglected testing and documentation, but the code isn't too crazy and, I feel, fairly easy to follow.
Let me know what you think...
https://github.com/zebpalmer/NagParser
