I want to parse a log4j configuration in order to know how to parse a given log.
Requirements: Python 2.6+, no custom C modules (unless absolutely required).
For example:
%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p{length=5} [%t] %c:%L %message%n
or
%d{ISO8601} %-5p{length=5} ((%t) %c:%L) %message%n
As a reference, the pattern layout is described here:
Pattern Layouts for log4j
Initially, I was going to customize it for each log pattern, as an example using re:
log1 = re.compile(r'([\d-]{10}) ([\d:.]{12}) {1}([A-Z]{0,}) \[(catalina-exec-[0-9]{2})\]{0,} (.*)\n')
Note: I realize that this is not a very comprehensive use of re, nor is it an optimized regular expression. It was testing only.
I initially started using parsimonious like so (very early stage):
from parsimonious.grammar import Grammar

grammar = Grammar(
    r"""
    category = "%c"
    category_precise = category optional_open number optional_close
    timedate = "%d"
    timedate_absolute = timedate optional_open timedate_abstext optional_close
    timedate_iso = timedate optional_open timedate_isotext optional_close
    timedate_date = timedate optional_open timedate_datetext optional_close
    timedate_era = "G"
    timedate_year_two_digit = ~"y{2}"
    timedate_year_number = ~"(?:y{1}|y{3,})"
    timedate_month = "MM"
    timedate_minute = "mm"

    # stubs for the rules referenced above
    timedate_abstext = "ABSOLUTE"
    timedate_isotext = "ISO8601"
    timedate_datetext = "DATE"
    optional_open = "{"
    optional_close = "}"
    number = ~"[0-9]+"
    """
)
Effectively, I am wondering whether I am going about this the wrong way. It almost seems like I am using a PEG parser in the wrong way; in fact, the more I look at it, the more I think I am.
I don't need full code, just a good concept, a start, an idea, or a good place to start reading.
In the end, I want to be able to review a log format and, for lack of a better phrase, "convert the log4j2 pattern into a regular expression".
Any help would be appreciated.
I would suggest Plex 2.0. I have found it easy to write the code that would identify tokens such as ISO8601, %d, %t, etc, from the configuration file. Then, as you will discern from the documentation, I expect that you will be able to write regex code to be returned by Plex that parses the log file itself.
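If you want to see the shape of the idea without Plex, here is a rough sketch using only re: keep a table mapping each conversion character to a regex fragment, scan the layout for % specifiers, and escape the literal text in between. The fragment table and the TOKEN pattern below are my own guesses, tuned only to the examples above, not a full implementation of the pattern-layout spec.
import re

# Guessed regex fragments for a handful of log4j conversion characters.
SPECIFIERS = {
    'd': r'(?P<timestamp>[\d,.:\- ]+)',  # %d - date
    'p': r'(?P<level>[A-Z]+)\s*',        # %p - level (allow padding)
    't': r'(?P<thread>[^\]]+)',          # %t - thread name
    'c': r'(?P<logger>[\w.]+)',          # %c - logger category
    'L': r'(?P<line>\d+)',               # %L - line number
    'm': r'(?P<message>.*)',             # %m / %message
    'n': r'\n?',                         # %n - line separator
}

# %, an optional format modifier like -5, a conversion name, optional {options}
TOKEN = re.compile(r'%(-?\d+(?:\.\d+)?)?([a-zA-Z]+)(\{[^}]*\})?')

def layout_to_regex(layout):
    parts = []
    pos = 0
    for m in TOKEN.finditer(layout):
        parts.append(re.escape(layout[pos:m.start()]))  # literal text between specifiers
        # long names like %message share a first letter with the short form %m
        parts.append(SPECIFIERS.get(m.group(2)[0], '.*?'))
        pos = m.end()
    parts.append(re.escape(layout[pos:]))
    return re.compile(''.join(parts))

rx = layout_to_regex('%d{ISO8601} %-5p{length=5} [%t] %c:%L %message%n')
For the example layouts this yields a regex with named groups (timestamp, level, thread, ...), so rx.match(line).groupdict() gives you the parsed fields.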
I'm about to roll my own property file parser. I've got a somewhat odd requirement where I need to be able to store metadata in an existing field of a GUI. The data needs to be easily parseable and human-readable, preferably with some flexibility in defining the data (no YAML, for example).
I was thinking I could do something like this:
this is random text that is truly a description
.metadata.
owner.first: rick
owner.second: bob
property: blue
pets.mammals.dog: rufus
pets.mammals.cat: ludmilla
I was thinking I could use something like '.metadata.' to denote that anything below that line is metadata to be parsed. Then, I would treat the properties almost like java properties where I would read each line in and build a map (or object) to hold the metadata, which would then be outputted and searchable via a simple web app.
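Roughly, the parsing I have in mind would be something like this (an untested sketch; parse_metadata is just a name I made up):
def parse_metadata(text, marker='.metadata.'):
    # everything after the marker is treated as dotted key: value pairs
    _, _, meta = text.partition(marker)
    data = {}
    for line in meta.splitlines():
        if ':' not in line:
            continue  # skip blanks and free-form text
        key, _, value = line.partition(':')
        node = data
        parts = key.strip().split('.')
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # build nested dicts as we go
        node[parts[-1]] = value.strip()
    return data
So parse_metadata(field_text)['pets']['mammals']['dog'] would give 'rufus'.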
My real question, before I roll this on my own: can anyone suggest a better method for solving this problem? A specific data format or library that would fit this use case? I would normally use something like YAML, but there's no good way for me to validate that the data is indeed in YAML format when it is saved.
You have 3 problems:
How to fit two different things into one box.
If you are mixing free form text and something that is more tightly defined, you are always going to end up with stuff that you can't parse. Then you will have a never ending battle of trying to deal with the rubbish that gets put in. Is there really no other way?
How to define a simple format for metadata that is robust enough for simple use.
This is a hard problem - all attempts to do so seem to expand until they become quite complicated (e.g. YAML). You will probably have custom requirements for your domain, so what you've proposed may be best.
How to parse that format.
For this I would recommend parsy.
It would be quite simple to split the text on .metadata. and then parse what remains.
Here is an example using parsy:
from parsy import *
attribute = letter.at_least(1).concat()
name = attribute.sep_by(string("."))
value = regex(r"[^\n]+")
definition = seq(name << string(":") << string(" ").many(), value)
metadata = definition.sep_by(string("\n"))
Example usage:
>>> metadata.parse_partial("""owner.first: rick
owner.second: bob
property: blue
pets.mammals.dog: rufus
pets.mammals.cat: ludmilla""")
([[['owner', 'first'], 'rick'],
[['owner', 'second'], 'bob'],
[['property'], 'blue'],
[['pets', 'mammals', 'dog'], 'rufus'],
[['pets', 'mammals', 'cat'], 'ludmilla']],
'')
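Combining that with the split on the marker might look like this (description.txt is a stand-in for wherever the GUI field's contents come from):
field_text = open('description.txt').read()  # hypothetical source
prose, _, meta = field_text.partition('.metadata.')
parsed, leftover = metadata.parse_partial(meta.strip())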
YAML is a simple and nice solution. There is a YAML library in Python:
import yaml
output = {'a': 1, 'b': {'c': [2, 3, 4]}}
print yaml.dump(output,default_flow_style=False)
Giving as a result:
a: 1
b:
  c:
  - 2
  - 3
  - 4
You can also parse from a string and so on. Just explore it and check if it fits your requirements.
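Parsing goes the other way; for example (safe_load is the variant that avoids constructing arbitrary Python objects):
import yaml
data = yaml.safe_load("a: 1\nb:\n  c: [2, 3, 4]")
print data['b']['c']  # prints [2, 3, 4]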
Good luck!
What advantages and/or disadvantages are there to using a "snippets" plugin, e.g. snipmate, ultisnips, for VIM as opposed to simply using the builtin "abbreviations" functionality?
Are there specific use-cases where declaring iabbr, cabbr, etc. lack some major features that the snippets plugins provide? I've been unsuccessful in finding a thorough comparison between these two "features" and their respective implementations.
As @peter-rincker pointed out in a comment:
It should be noted that abbreviations can execute code as well. Often via <c-r>= or via an expression abbreviation (<expr>). Example which expands ## to the current file's path: :iabbrev ## <c-r>=expand('%:p')<cr>
As an example for python, let's compare a snipmate snippet and an abbrev in Vim for inserting lines for class declaration.
Snipmate
# New Class
snippet cl
	class ${1:ClassName}(${2:object}):
		"""${3:docstring for $1}"""
		def __init__(self, ${4:arg}):
			${5:super($1, self).__init__()}
			self.$4 = $4
		${6}
Vimscript
au FileType python :iabbr cl class ClassName(object):<CR><Tab>"""docstring for ClassName"""<CR>def __init__(self, arg):<CR><Tab>super(ClassName, self).__init__()<CR>self.arg = arg
Am I missing some fundamental functionality of "snippets", or am I correct in assuming they are overkill for the most part, when Vim's abbr and templates (:help template) are able to do most of the stuff snippets do?
I assume it's easier to implement snippets, and they provide additional aesthetic/visual features. For instance, if I use abbr in Vim and other plugins for running/testing python code inside vim--e.g. syntastic, pytest, ropevim, pep8, etc--am I missing out on some key features that snippets provide?
Everything that can be done with snippets can be done with abbreviations and vice-versa. You can have (mirrored or not) placeholders with abbreviations, you can have context-sensitive snippets.
There are two important differences:
Abbreviations are triggered when the abbreviation text has been typed and a non-word character (or Esc) is hit. Snippets are triggered on demand, and shortcuts are possible (no need to type while + Tab; w + Tab may be enough).
It's much easier to define new snippets (or to maintain old ones) than to define abbreviations. With abbreviations, a lot of boilerplate code is required when we want to do neat things.
There are a few other differences. For instance, abbreviations are always triggered everywhere, and seeing for expanded into for(placeholder) {\n} within a comment or a string context is certainly not what the end-user expects. With snippets, this is not a problem any more: we can expect the end-user to know what he's doing when he asks to expand a snippet. Still, we can propose context-aware snippets that expand throw into @throw {domain::exception} {explanation} within a comment, or into throw domain::exception({message}); elsewhere.
Snippets
Rough superset of Vim's native abbreviations. Here are the highlights:
Only trigger on key press
Uses placeholders which a user can jump between
Exist only for insert mode
Dynamic expansions
Abbreviations
Great for common typos and small snippets.
Native to Vim so no need for plugins
Typically expand on whitespace or <c-]>
Some special rules on trigger text (See :h abbreviations)
Can be used in command mode via :cabbrev (often used to create command aliases)
No placeholders
Dynamic expansions
Conclusion
For the most part snippets are more powerful and provide many features that other editors enjoy, but you can use both, and many people do. Abbreviations enjoy the benefit of being native, which can be useful for remote environments. Abbreviations also enjoy another clear advantage: they can be used in command mode.
Snippets are more powerful.
Depending on the implementation, snippets can let you change (or accept defaults for) multiple placeholders and can even execute code when the snippet is expanded.
For example, with ultisnips you can have it execute shell commands and Vimscript, but also Python code.
An (ultisnips) example:
snippet hdr "General file header" b
# file: `!v expand('%:t')`
# vim:fileencoding=utf-8:ft=`!v &filetype`
# ${1}
#
# Author: ${2:J. Doe} ${3:<jdoe#gmail.com>}
# Created: `!v strftime("%F %T %z")`
# Last modified: `!v strftime("%F %T %z")`
endsnippet
This presents you with three placeholders to fill in (it gives default values for two of them), and sets the filename, filetype and current date and time.
After the word "snippet", the start line contains three items:
the trigger string,
a description and
options for the snippet.
Personally, I mostly use the b option, where the snippet is expanded at the beginning of a line, and the w option, which expands the snippet if the trigger string starts at the beginning of a word.
Note that you have to type the trigger string and then input a key or key combination that actually triggers the expansion. So a snippet is not expanded unless you want it to.
Additionally, snippets can be specialized by filetype. Suppose you want to define four levels of headings, h1 .. h4. You can have the same name expand differently between e.g. an HTML, markdown, LaTeX or restructuredtext file.
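For example, an h1 in a markdown snippets file versus a LaTeX one might look like this (hypothetical snippet definitions):
# in markdown.snippets
snippet h1 "Top-level heading" b
# ${1:Heading}
endsnippet

# in tex.snippets
snippet h1 "Top-level heading" b
\section{${1:Heading}}
endsnippet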
Snippets are like the built-in :abbreviate on steroids, usually with:
parameter insertions: You can insert (type or select) text fragments in various places inside the snippet. An abbreviation just expands once.
mirroring: Parameters may be repeated (maybe even in transformed fashion) elsewhere in the snippet, usually updated as you type.
multiple stops inside: You can jump from one point to another within the snippet, sometimes even recursively expand snippets within one.
There are three things to evaluate in a snippet plugin: First, the features of the snippet engine itself, second, the quality and breadth of snippets provided by the author or others; third, how easy it is to add new snippets.
jenkins-job-builder is a nice tool to help me maintain jobs in YAML files; see the example in its configuration chapter.
Now I have lots of old Jenkins jobs, and it would be nice to have a Python script xml2yaml to convert the existing Jenkins job config.xml files to the YAML file format.
Do you have any suggestions for a quick solution in Python?
I don't need it to be usable in jenkins-job-builder directly; I just want to convert the jobs into YAML for reference.
For the conversion, some parts can be ignored, like the namespace.
A config.xml segment looks like:
<project>
  <logRotator class="hudson.tasks.LogRotator">
    <daysToKeep>-1</daysToKeep>
    <numToKeep>20</numToKeep>
    <artifactDaysToKeep>-1</artifactDaysToKeep>
    <artifactNumToKeep>-1</artifactNumToKeep>
  </logRotator>
  ...
</project>
The YAML output could be:
- project:
    logrotate:
      daysToKeep: -1
      numToKeep: 20
      artifactDaysToKeep: -1
      artifactNumToKeep: -1
If you are not familiar with config.xml in jenkins, you can check infra_backend-merge-all-repo job in https://ci.jenkins-ci.org
I'm writing a program that does this conversion from XML to YAML. It can dynamically query a Jenkins server and translate all the jobs to YAML.
https://github.com/ktdreyer/jenkins-job-wrecker
Right now it works for very simple jobs. I've taken a safe/pessimistic approach and the program will bail if it encounters XML that it cannot yet translate.
It's hard to tell from your question exactly what you're looking for here, but assuming you're looking for the basic structure:
Python has good support on most platforms for XML parsing. Chances are you'll want to use something simple and easy to use like minidom. See the XML Processing Modules in the Python docs for your version of Python.
Once you've opened the file, looking for project and then parsing down from there and using a simple mapping should work pretty well given the simplicity of the yaml format.
from xml.dom.minidom import parse

def getText(nodelist):
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

def getTextForTag(nodelist, tag):
    elements = nodelist.getElementsByTagName(tag)
    if elements.length > 0:
        return getText(elements[0].childNodes)
    return ''

def printValueForTag(parent, indent, tag, valueName=''):
    value = getTextForTag(parent, tag)
    if len(value) > 0:
        if valueName == '':
            valueName = tag
        print indent + valueName + ": " + value

def emitLogRotate(indent, rotator):
    print indent + "logrotate:"
    indent += '  '
    printValueForTag(rotator, indent, 'daysToKeep')
    printValueForTag(rotator, indent, 'numToKeep')

def emitProject(project):
    print "- project:"
    # all projects have log rotators, so no need to check
    emitLogRotate("    ", project.getElementsByTagName('logRotator')[0])
    # next section...

dom = parse('config.xml')
emitProject(dom)
This snippet will print just a few lines of the eventual configuration file, but it puts you in the right direction for a simple translator. Based on what I've seen, there's not much room for an automatic translation scheme due to naming differences. You could streamline the code as you iterate over more options and make it table-driven, but that's "just a matter of programming"; this will at least get you started with the DOM parsers in Python.
I suggest querying and accessing the XML with XPath expressions, using xmlstarlet on the command line and in shell scripts. No trouble with low-level programmatic access to XML. XMLStarlet is an XPath Swiss-army knife for the command line.
"xmlstarlet el" shows you the element structure of the entire XML as XPath expressions.
"xmlstarlet sel -t -c XPath-expression" will extract exactly what you want.
Maybe you want to spend an hour (or two) brushing up your XPath know-how in advance.
You will shed a couple of tears once you realize how much time you spent programming XML access before you used XMLStarlet.
I'm trying to build a blog system, so I need to do things like transforming '\n' into <br /> and transforming http://example.com into <a href='http://example.com'>http://example.com</a>.
The former is easy: just use the string replace() method.
The latter is more difficult, but I found a solution here: Find Hyperlinks in Text using Python (twitter related)
But now I need to implement an "Edit Article" function, so I have to reverse the transformation.
So, how can I transform <a href='http://example.com'>http://example.com</a> back into http://example.com?
Thanks! And I'm sorry for my poor English.
Sounds like the wrong approach. Making round-trips work correctly is always challenging. Instead, store the source text only, and only format it as HTML when you need to display it. That way, alternate output formats / views (RSS, summaries, etc) are easier to create, too.
Separately, we wonder whether this particular wheel needs to be reinvented again ...
Since you are using the answer from that other question, your links will always be in the same format. So it should be pretty easy using regex. I don't know Python, but going by the answer from the last question, something like this (the pattern mirrors the anchor format that answer produces):
import re
myString = "This is my tweet check it out <a href='http://tinyurl.com/blah'>http://tinyurl.com/blah</a>"
# replace each anchor tag with the bare URL it links to
r = re.compile(r"<a href='([^']*)'>[^<]*</a>")
print r.sub(r'\1', myString)
Should work.
I am writing a game in python and have decided to create a DSL for the map data files. I know I could write my own parser with regex, but I am wondering if there are existing python tools which can do this more easily, like re2c which is used in the PHP engine.
Some extra info:
Yes, I do need a DSL, and even if I didn't I still want the experience of building and using one in a project.
The DSL contains only data (declarative?); it doesn't get "executed". Most lines look like:
SOMETHING: !abc #123 #xyz/123
I just need to read the tree of data.
I've always been impressed by pyparsing. The author, Paul McGuire, is active on the python list/comp.lang.python and has always been very helpful with any queries concerning it.
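To give a flavour of it, a pyparsing grammar for the sample line might look roughly like this (the rule names and token shapes are guesses based on the one example in the question):
from pyparsing import Word, alphas, alphanums, Group, OneOrMore, Suppress

# one line of the form: SOMETHING: !abc #123 #xyz/123
key = Word(alphas.upper()) + Suppress(':')
name = Suppress('!') + Word(alphanums)
ref = Suppress('#') + Word(alphanums + '/')
line = key('key') + name('name') + Group(OneOrMore(ref))('refs')

print(line.parseString('SOMETHING: !abc #123 #xyz/123'))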
Here's an approach that works really well.
abc= ONETHING( ... )
xyz= ANOTHERTHING( ... )
pqr= SOMETHING( this=abc, that=123, more=(xyz,123) )
Declarative. Easy-to-parse.
And...
It's actually Python. A few class declarations and the work is done. The DSL is actually class declarations.
What's important is that a DSL merely creates objects. When you define a DSL, first you have to start with an object model. Later, you put some syntax around that object model. You don't start with syntax, you start with the model.
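A minimal sketch of what those class declarations could be (ONETHING and friends come from the example above; the base class is my own assumption):
class Thing(object):
    # keyword arguments from the DSL become attributes on the object
    def __init__(self, *args, **kwargs):
        self.args = args
        for name, value in kwargs.items():
            setattr(self, name, value)

class ONETHING(Thing): pass
class ANOTHERTHING(Thing): pass
class SOMETHING(Thing): pass
With that in place, the map file is just imported (or exec'd), and pqr.this, pqr.that, and pqr.more are ordinary Python objects.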
Yes, there are many -- too many -- parsing tools, but none in the standard library.
From what I saw, PLY and SPARK are popular. PLY is like yacc, but you do everything in Python because you write your grammar in docstrings.
Personally, I like the concept of parser combinators (taken from functional programming), and I quite like pyparsing: you write your grammar and actions directly in python and it is easy to start with. I ended up producing my own tree node types with actions though, instead of using their default ParserElement type.
Otherwise, you can also use an existing declarative language like YAML.
I have written something like this at work to read in SNMP notification definitions and automatically generate Java classes and SNMP MIB files from them. Using this little DSL, I could write 20 lines of my specification and it would generate roughly 80 lines of Java code and a 100-line MIB file.
To implement this, I actually just used straight Python string handling (split(), slicing, etc.) to parse the file. I find Python's string capabilities to be adequate for most of my (simple) parsing needs.
Besides the libraries mentioned by others, if I were writing something more complex and needed proper parsing capabilities, I would probably use ANTLR, which supports Python (and other languages).
For "small languages" as the one you are describing, I use a simple split, shlex (mind that the # defines a comment) or regular expressions.
>>> line = 'SOMETHING: !abc #123 #xyz/123'
>>> line.split()
['SOMETHING:', '!abc', '#123', '#xyz/123']
>>> import shlex
>>> list(shlex.shlex(line))
['SOMETHING', ':', '!', 'abc']
The following is an example, as I do not know exactly what you are looking for.
>>> import re
>>> result = re.match(r'([A-Z]*): !([a-z]*) #([0-9]*) #([a-z0-9/]*)', line)
>>> result.groups()
('SOMETHING', 'abc', '123', 'xyz/123')
DSLs are a good thing, so you don't need to defend yourself :-)
However, have you considered an internal DSL ? These have so many pros versus external (parsed) DSLs that they're at least worth consideration. Mixing a DSL with the power of the native language really solves lots of the problems for you, and Python is not really bad at internal DSLs, with the with statement handy.
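For instance, a tiny internal DSL for map data could lean on context managers for nesting; every name below is invented for illustration:
from contextlib import contextmanager

class GameMap(object):
    def __init__(self):
        self.rooms = []

@contextmanager
def room(game_map, name):
    # open a nested scope for one room's data
    r = {'name': name, 'items': []}
    game_map.rooms.append(r)
    yield r

m = GameMap()
with room(m, 'cellar') as r:
    r['items'].append('rusty key')
print(m.rooms)  # [{'name': 'cellar', 'items': ['rusty key']}]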
Along the lines of declarative Python, I wrote a helper module called 'bpyml' which lets you declare data in Python in a more XML-structured way without the verbose tags; it can be converted to/from XML too, but is valid Python.
https://svn.blender.org/svnroot/bf-blender/trunk/blender/release/scripts/modules/bpyml.py
Example Use
http://wiki.blender.org/index.php/User:Ideasman42#Declarative_UI_In_Blender
Here is a simpler approach to solving it.
What if I could extend Python syntax with new operators to introduce new functionality to the language? For example, a new operator <=> for swapping the values of two variables.
How can I implement such behavior? Here comes the ast module.
The ast module is a handy tool for handling abstract syntax trees. What's cool about this module is that it allows me to write Python code that generates a tree and then compiles it to Python code.
Let's say we want to compile a superset language (or Python-like language) to Python:
from :
a <=> b
to:
a , b = b , a
I need to convert my 'Python-like' source code into a list of tokens, so I need a tokenizer: a lexical scanner for Python source code, which is what the tokenize module provides.
I can use the same meta-language to define both the grammar of the new 'Python-like' language and the structure of the abstract syntax tree (AST).
Why use AST?
An AST is a much safer choice when evaluating untrusted code.
It lets you manipulate the tree before executing the code.
from tokenize import untokenize, tokenize, NUMBER, STRING, NAME, OP, COMMA
import io
import ast

s = b"a <=> b\n"  # I may read it from a file
b = io.BytesIO(s)
g = tokenize(b.readline)
result = []
for token_num, token_val, _, _, _ in g:
    # naive simple approach to compile a <=> b to a, b = b, a
    if token_num == OP and token_val == '<=' and next(g).string == '>':
        first = result.pop()
        next_token = next(g)
        second = (NAME, next_token.string)
        result.extend([
            first,
            (COMMA, ','),
            second,
            (OP, '='),
            second,
            (COMMA, ','),
            first,
        ])
    else:
        result.append((token_num, token_val))

src = untokenize(result).decode('utf-8')
exp = ast.parse(src)
code = compile(exp, filename='', mode='exec')

def my_swap(a, b):
    global code
    env = {
        "a": a,
        "b": b
    }
    exec(code, env)
    return env['a'], env['b']

print(my_swap(1, 10))
Other modules using AST, whose source code may be a useful reference:
textX-LS: A DSL used to describe a collection of shapes and draw it for us.
pony orm: you can write database queries using Python generators and lambdas, which it translates to SQL query strings; pony orm uses AST under the hood.
osso: a Role-Based Access Control framework that handles permissions.