Parsing a custom format (curly braces separated) text configuration with Pyparsing - python

I need to parse a load balancer configuration section. It's seemingly simple (at least for a human).
The config consists of several objects, with their content in curly braces, like so:
ltm rule ssl-header-insert {
when HTTP_REQUEST {
HTTP::header insert "X-SSL-Connection" "yes"
}
}
ltm rule some_redirect {
priority 1
when HTTP_REQUEST {
if { (not [class match [IP::remote_addr] equals addresses_group ]) }
{
HTTP::redirect "http://some.page.example.com"
TCP::close
event disable all
}
}
The contents of each section/object is TCL code, so there will be nested curly braces. What I want to achieve is to parse this in pairs: the object identifier (after the ltm rule keywords) and its contents (the TCL code within the braces), kept as-is.
I've looked at some examples and experimented a lot, but it's really giving me a hard time. I did some debugging inside pyparsing (which is a bit confusing to me too) and I think I'm failing to detect the closing braces somehow, but I can't figure out why.
What I came up with so far:
from pyparsing import *
import json
list_sample = """ltm rule ssl-header-insert {
when HTTP_REQUEST {
HTTP::header insert "X-SSL-Connection" "yes"
}
}
ltm rule some_redirect {
priority 1
when HTTP_REQUEST {
if { (not [class match [IP::remote_addr] equals addresses_group ]) }
{
HTTP::redirect "http://some.page.example.com"
TCP::close
event disable all
}
}
}
ltm rule http_header_replace {
when HTTP_REQUEST {
HTTP::header replace Host some.host.example.com
}
}"""
ParserElement.setDefaultWhitespaceChars(" \t")
NL = LineEnd()
END = StringEnd()
LBRACE, RBRACE = map(Suppress, '{}')
ANY_HEADER = Suppress("ltm rule ") + Word(alphas, alphanums + "_-")
END_MARK = Literal("ltm rule")
CONTENT_LINE = (~ANY_HEADER + (NotAny(RBRACE + FollowedBy(END_MARK)) + ~END + restOfLine) | (~ANY_HEADER + NotAny(RBRACE + FollowedBy(END)) + ~END + restOfLine)) | (~RBRACE + ~END + restOfLine)
ANY_HEADER.setName("HEADER").setDebug()
LBRACE.setName("LBRACE").setDebug()
RBRACE.setName("RBRACE").setDebug()
CONTENT_LINE.setName("LINE").setDebug()
template_defn = ZeroOrMore((ANY_HEADER + LBRACE +
                            Group(ZeroOrMore(CONTENT_LINE)) +
                            RBRACE))
template_defn.ignore(NL)
results = template_defn.parseString(list_sample).asList()
print("Raw print:")
print(results)
print("----------------------------------------------")
print("JSON pretty dump:")
print(json.dumps(results, indent=2))
I see in the debug that some of the matches work but in the end it fails with an empty list as a result.
On a side note, my CONTENT_LINE part of the grammar is probably overly complicated in general, but I haven't found any simpler way to cover it so far.
The next thing would be to figure out how to preserve newlines and tabs in the content part, since I need that to be unchanged in the output. But it looks like I have to use the ignore() function - which skips newlines - to parse the multiline text in the first place, so that's another challenge.
I'd be grateful if someone could help me find out what the issues are. Or maybe I should take some other approach?

I think nestedExpr('{', '}') will help. That will take care of the nested '{}'s, and wrapping in originalTextFor will preserve newlines and spaces.
import pyparsing as pp
LTM, RULE = map(pp.Keyword, "ltm rule".split())
ident = pp.Word(pp.alphas, pp.alphanums+'-_')
ltm_rule_expr = pp.Group(LTM + RULE
                         + ident('name')
                         + pp.originalTextFor(pp.nestedExpr('{', '}'))('body'))
Using your sample string (after adding the missing trailing '}'):
for rule, _, _ in ltm_rule_expr.scanString(sample):
    print(rule[0].name, rule[0].body.splitlines()[0:2])
gives
ssl-header-insert ['{', ' when HTTP_REQUEST {']
some_redirect ['{', ' priority 1']
dump() is also a good way to list out the contents of a returned ParseResults:
for rule, _, _ in ltm_rule_expr.scanString(sample):
    print(rule[0].dump())
    print()
['ltm', 'rule', 'ssl-header-insert', '{\n when HTTP_REQUEST {\n HTTP::header insert "X-SSL-Connection" "yes"\n}\n}']
- body: '{\n when HTTP_REQUEST {\n HTTP::header insert "X-SSL-Connection" "yes"\n}\n}'
- name: 'ssl-header-insert'
['ltm', 'rule', 'some_redirect', '{\n priority 1\n\nwhen HTTP_REQUEST {\n\n if { (not [class match [IP::remote_addr] equals addresses_group ]) }\n {\n HTTP::redirect "http://some.page.example.com"\n TCP::close\n event disable all\n }\n}}']
- body: '{\n priority 1\n\nwhen HTTP_REQUEST {\n\n if { (not [class match [IP::remote_addr] equals addresses_group ]) }\n {\n HTTP::redirect "http://some.page.example.com"\n TCP::close\n event disable all\n }\n}}'
- name: 'some_redirect'
Note that I broke up 'ltm' and 'rule' into separate keyword expressions. This guards against the case where a developer may have written valid code as ltm rule blah, with > 1 space between "ltm" and "rule". This kind of thing happens all the time, you never know where whitespace will crop up.
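To get the name/body pairs the question asks for, the scanString results can be collected into a dict. A minimal sketch building on ltm_rule_expr above; the rules name is mine, and list_sample is the question's sample string (with the missing trailing '}' added):
# Collect each rule name together with its TCL body, exactly as written.
rules = {}
for rule, _, _ in ltm_rule_expr.scanString(list_sample):
    rules[rule[0].name] = rule[0].body

for name, body in rules.items():
    print(name)
    print(body)
    print()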

Passing value to nested JSON in Python?

I'm trying to pass a variable to nested JSON in a Python script.
The script is as below:
import requests
from flask import request  # assumption: request.form suggests a Flask request object
group = request.form['grp']
zon = request.form['zone']
load = { "extra_vars": {
"g_name": "' +str(group)+ '",
"z_name": "' +str(zon)+ '"
}
}
However, when I post the value to the API, it seems I post the literal words '+str(group)+' and '+str(zon)+' instead of the actual values assigned to the declared variables.
Since I'm very new to Python programming, is passing a value to nested JSON allowed in Python?
Try the following:
group = request.form['grp']
zon = request.form['zone']
load = { "extra_vars": {
"g_name": f"{group}",
"z_name": f"{zon}"
}
}
You can pass variables into a string using f-strings and curly braces around your variable (note {group}):
>>> group = "my_group"
>>> {"g_name": f"'{group}'"}
{'g_name': "'my_group'"}
Or use simple string concatenation, which is what you almost did in your code (you just did not properly close the ' character using "'"):
>>> "'" + str(group) + "'"
"'my_group'"
All in all here's your code adapted:
load = { "extra_vars": {
"g_name": f"'{group}'",
"z_name": f"'{zon}'"
}
}
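If the payload is then posted with requests, passing the dict via json= lets requests handle the serialization, so the interpolated values (not the literal '+str(group)+' text) are what gets sent. A small sketch, with the endpoint URL as a placeholder assumption:
import requests

# Hypothetical endpoint; substitute the real API URL.
api_url = "https://api.example.com/launch"

# json= serializes `load` to a JSON body containing the actual variable values.
response = requests.post(api_url, json=load)
print(response.status_code, response.text)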

Replace double quotes with single quotes inside a certain textfield in a file. Python

So I have some JSON files which contain, in certain sentences, this:
"message": "Merge branch " master " of example-week-18"
And because there are double quotes inside the message items, the JSON gets broken.
So basically I want to use the string replace() method, but only replace the double quotes that sit inside the outer double quotes of the message item with single quotes. I guess I have to use a regular expression + replace()?
A desired outcome would be this:
INPUT:
"message": "Merge branch " master " of example-week-18"
"message": "Don"t do it"
OUTPUT:
"message": "Merge branch ' master ' of example-week-18"
"message": "Don't do it"
You're right. You can combine a regular expression and the replace method.
Here I use the re module to find all the message content (the block after "message":). Then I replace the double quotes with single quotes. Finally, I rebuild the original row.
Here is the code:
# Import module
import re
# Your text
message = """
"message": "Merge branch " master " of example-week-18"
"message": "Don"t do it"
"""
new_text = ""
# Select all the data after: "message":
list_message = re.findall(r"\"message\"\s*:\s*?(\".*)", message)
# Replace the " by ' in the message content + rebuild the original row
for message in list_message:
    new_text += '"message": "' + message[1:-1].replace('"', "'") + '"\n'
print(new_text)
# "message": "Merge branch ' master ' of example-week-18"
# "message": "Don't do it"
Hope that helps!

Python Requests Body Automatically Adds Single Quotes

In my Python script, I'm trying to loop through a text file containing domain names and fill them into my JSON request body. The correct format required for the API call is:
payload = {
    "threatInfo": {
        "threatEntries": [
            {"url": "http://malware.wicar.org/"}, {"url": "http://urltocheck2.org"},
        ]
    }
}
The variable I'm using to replicate this is called mystring
domain_list_formatted = []
for item in domain_list:
    domain_list_formatted.append("""{"url": """ + '"{}"'.format(item) + "},")
domain_list_formatted_tuple = tuple(domain_list_formatted)
mystring = ' '.join(map(str, (domain_list_formatted_tuple)))
Printing mystring gets me the results I need to pass to the payload variable
{"url": "http://malware.wicar.org/"},
{"url": "http://www.urltocheck2.org/"},
However, I want to loop this, so I add the following loop
for item in domain_list_formatted_tuple:
    printcorrectly = ' '.join(map(str, (domain_list_formatted_tuple)))
    payload["threatInfo"]["threatEntries"] = [printcorrectly]
And this is the result:
['{"url": "http://malware.wicar.org/"}, {"url": "http://www.urltocheck2.org/"}']
The single quotes on the outside of the bracket completely throw it off. How is the for loop modifying or encoding the payload in a way that's creating this issue? Your help would be greatly appreciated.
Your code:
for item in domain_list_formatted_tuple:
    printcorrectly = ' '.join(map(str, (domain_list_formatted_tuple)))
    payload["threatInfo"]["threatEntries"] = [printcorrectly]
should probably be:
for item in domain_list_formatted_tuple:
    printcorrectly = ' '.join(map(str, (domain_list_formatted_tuple)))
    payload["threatInfo"]["threatEntries"] = printcorrectly
without the brackets around printcorrectly.
If you have:
a = ['xxx']
print(a)
You will get output ['xxx'] with brackets and quotation marks.
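As an aside, one way to sidestep the quoting problem entirely is to build threatEntries as a list of dicts and let requests serialize the whole payload. A hedged sketch, with the endpoint URL as a placeholder:
import requests

# Build the entries as real Python dicts instead of formatted strings.
payload = {
    "threatInfo": {
        "threatEntries": [{"url": item} for item in domain_list]
    }
}

# Hypothetical endpoint; substitute the real API URL.
response = requests.post("https://api.example.com/check", json=payload)
print(response.status_code)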

Extracting BIND parameters to build a JSON query

I have a file which was exported from BIND containing TSIG values for about 500 domain names. I need to repurpose the data into JSON for a REST API query. The BIND data is formatted like so:
// secondary-example.com.
key "2000000000000.key." {
    algorithm hmac-md5;
    secret "ahashedvalue=";
};
zone "secondary-example.com." {
    type slave;
    file "sec/secondary-example.com.";
    allow-transfer { 1.1.1.1;
        1.1.2.2;
    };
    also-notify { 1.1.1.1;
        2.2.2.2;
    };
    masters {
        1.2.3.4 key 2000000000000.key.;
    };
};
From this I need to extract the key, zone and secret. Here's an example API request.
{
    "properties":{
        "name":"secondary-example.com.",
        "accountName":"example",
        "type":"SECONDARY"
    },
    "secondaryCreateInfo":{
        "primaryNameServers":{
            "nameServerIpList":{
                "nameServerIp1":{
                    "ip":"1.2.3.4",
                    "tsigKey":"2000000000000.key.",
                    "tsigKeyValue":"ahashedvalue="
                }
            }
        }
    }
}
I'm having difficulty crafting a regular expression appropriate for the scenario. I'm looking to construct the JSON in a Python script and send the request through Postman.
I spent a couple of days reading up on regex and figured out a solution. Each of those "zones" began with a comment, e.g. "secondary-example.com", and each set of BIND info was exactly 17 lines long. This solution is hacky and always assumes the data is correct, but it managed to work.
Separate the zones into chunks of text.
import re  # needed for the re.findall calls below

zones = []
cur_zone = ''
f = open(bind_file).readlines()
for line in f:
    if line[0:2] == '//':
        zones.append(cur_zone)
        cur_zone = ''
    else:
        cur_zone = cur_zone + line
zones.pop(0)  # Drop the first list item, it's empty
Iterate through those chunks and match the needed parameters.
for z in zones:
    z_lines = z.splitlines()
    # Regex patterns to match the required parameters
    key = re.findall(r'"(.*)"', z_lines[0])[0]
    secret = re.findall(r'"(.*)"', z_lines[2])[0]
    name = re.findall(r'"(.*)"', z_lines[5])[0]
    master = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', z_lines[15])[0]
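With key, secret, name and master extracted, the request body from the example in the question can be assembled as a plain dict and serialized. A minimal sketch that would sit inside the for z in zones: loop above, with accountName left as the placeholder value from the example:
import json

# Build the API request body using the field layout shown in the question;
# this belongs inside the `for z in zones:` loop, after the re.findall calls.
request_body = {
    "properties": {
        "name": name,
        "accountName": "example",  # placeholder from the example request
        "type": "SECONDARY"
    },
    "secondaryCreateInfo": {
        "primaryNameServers": {
            "nameServerIpList": {
                "nameServerIp1": {
                    "ip": master,
                    "tsigKey": key,
                    "tsigKeyValue": secret
                }
            }
        }
    }
}
print(json.dumps(request_body, indent=2))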

Newbie python RESULTS BINDINGS question

I'm using Python with SPARQLWrapper and it has worked until now -- I'm not able to add a new SPARQL object to my results.
Here is my working snippet:
else:
    for result in results["results"]["bindings"]:
        project = result["project"]["value"].encode('utf-8')
        filename = result["filename"]["value"].encode('utf-8')
        keywords = result["keywords"]["value"].encode('utf-8')
        url = result["url"]["value"].encode('utf-8')
        url = result["url"]["value"].encode('utf-8')
        print "<p class=\"results\"><span>Project</span>: %s</p><p class=\"indent\"><span>Filename</span>: %s</p><p class=\"indent\"><span>URL</span>:%s</p><p class=\"indent-bottom\"><span>Keywords</span>: %s</p> " % \
            (project,filename,url,url,keywords)
I'm trying to add more results. I've tested the SPARQL query as added to the script; I add the object of the query ("parameter") as a RESULTS and BINDINGS pair, add a %s to the print statement, and add the result name to the parentheses below the print command (not sure what to call that area). So after doing what I did before to add these results, I get the white screen of death -- only the page header is written out, and the Apache error log gives me a KeyError on project = result["project"]["value"].encode('utf-8').
Here is an example of an added element that breaks the script:
else:
    print "<h1>ASDC RDF Search Results</h1>"
    print "<p class=\"newsearch\">new search | <a href=\"http://localhost/asdc.html\">About this project</p><div style=\"clear:both;\"</div>"
    for result in results["results"]["bindings"]:
        project = result["project"]["value"].encode('utf-8')
        filename = result["filename"]["value"].encode('utf-8')
        url = result["url"]["value"].encode('utf-8')
        url = result["url"]["value"].encode('utf-8')
        keywords = result["keywords"]["value"].encode('utf-8')
        parameter = result["parameter"]["value"].encode('utf-8')
        print "<p class=\"results\"><span>Project</span>: %s</p><p class=\"indent\"><span>Filename</span>: %s</p><p class=\"indent\"><span>URL</span>:%s</p><p class=\"indent\"><span>Keywords</span>: %s</p><p class=\"indent-bottom\"><span>Parameter</span>: %s</p> " % \
            (project,filename,url,url,keywords,parameter)
So two questions: Is the error obvious? Am I screwing up the formatting in the keys somehow when I add the new line? Also, does python write errors to a log or can I enable that? Thanks...
Edit: Here's the query including parameter (it works, tested directly in the Fuseki UI)
PREFIX e1: <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/>
SELECT ?url ?filename ?keywords ?project ?parameter
WHERE {
?s <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/url> ?url.
?s <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/filename> ?filename.
OPTIONAL {
?s <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/keywords> ?keywords.
?s <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/project> ?project.
?s <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/parameter> ?parameter.
}
FILTER (regex(?keywords, "FILTER-STRING", "i") || regex(?url, "FILTER-STRING", "i") || regex(?filename, "FILTER-STRING", "i")) .
}
The first query is similar, minus the ?parameter. FILTER-STRING comes from my CGI form.
Either your result dict has no key "project", or the result["project"] dict has no key "value".
So insert
print result.keys()
print result["project"]
print result["project"].keys()
print result["project"]["value"]
immediately after for result in ... and you will see what is going wrong.
My issue turned out to be caused by NULL values in my results from the OPTIONAL clause in my query. SPARQLWrapper developers suggested using if-else, as below with an example query using OPTIONAL (and it worked for me):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?party
WHERE { ?person <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Asturias>
    OPTIONAL { ?person <http://dbpedia.org/property/party> ?party }
}

for result in results["results"]["bindings"]:
    if result.has_key("party"):
        print "* " + result["person"]["value"] + " ** " + result["party"]["value"]
    else:
        print result["person"]["value"]
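Applied back to the original loop, the same presence check guards every field that sits inside the OPTIONAL block. A sketch, where the "N/A" fallback is my own arbitrary choice:
for result in results["results"]["bindings"]:
    # "parameter" (like "project" and "keywords") is bound inside OPTIONAL,
    # so it may be absent from a given result; guard before indexing.
    if "parameter" in result:
        parameter = result["parameter"]["value"].encode('utf-8')
    else:
        parameter = "N/A"  # arbitrary fallback for missing OPTIONAL values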
