I'm using Python with SPARQLWrapper and it has worked until now, but I'm not able to add a new SPARQL object to my results.
Here is my working snippet:
else:
    for result in results["results"]["bindings"]:
        project = result["project"]["value"].encode('utf-8')
        filename = result["filename"]["value"].encode('utf-8')
        keywords = result["keywords"]["value"].encode('utf-8')
        url = result["url"]["value"].encode('utf-8')
        print "<p class=\"results\"><span>Project</span>: %s</p><p class=\"indent\"><span>Filename</span>: %s</p><p class=\"indent\"><span>URL</span>: %s</p><p class=\"indent-bottom\"><span>Keywords</span>: %s</p>" % \
            (project, filename, url, keywords)
I'm trying to add more results. I've tested the SPARQL query that I'm adding to the script. To add a result I add the new query variable ("parameter") as another results/bindings lookup, add a %s to the print string, and add the variable name to the tuple after the print statement (not sure what to call that area). After doing exactly what I did before to add results, I get the white screen of death: only the page header is written out, and the Apache error log gives me a KeyError on project = result["project"]["value"].encode('utf-8').
Here is an example of an added element that breaks the script:
else:
    print "<h1>ASDC RDF Search Results</h1>"
    print "<p class=\"newsearch\">new search | <a href=\"http://localhost/asdc.html\">About this project</a></p><div style=\"clear:both;\"></div>"
    for result in results["results"]["bindings"]:
        project = result["project"]["value"].encode('utf-8')
        filename = result["filename"]["value"].encode('utf-8')
        url = result["url"]["value"].encode('utf-8')
        keywords = result["keywords"]["value"].encode('utf-8')
        parameter = result["parameter"]["value"].encode('utf-8')
        print "<p class=\"results\"><span>Project</span>: %s</p><p class=\"indent\"><span>Filename</span>: %s</p><p class=\"indent\"><span>URL</span>: %s</p><p class=\"indent\"><span>Keywords</span>: %s</p><p class=\"indent-bottom\"><span>Parameter</span>: %s</p>" % \
            (project, filename, url, keywords, parameter)
So two questions: Is the error obvious? Am I screwing up the formatting of the keys somehow when I add the new line? Also, does Python write errors to a log, or can I enable that? Thanks...
Edit: Here's the query including parameter (it works, tested directly in the Fuseki UI)
PREFIX e1: <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/>
SELECT ?url ?filename ?keywords ?project ?parameter
WHERE {
    ?s <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/url> ?url .
    ?s <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/filename> ?filename .
    OPTIONAL {
        ?s <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/keywords> ?keywords .
        ?s <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/project> ?project .
        ?s <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/parameter> ?parameter .
    }
    FILTER (regex(?keywords, "FILTER-STRING", "i") || regex(?url, "FILTER-STRING", "i") || regex(?filename, "FILTER-STRING", "i")) .
}
The first query is similar, minus ?parameter. FILTER-STRING comes from my CGI form.
Either your result dict has no key "project", or the result["project"] dict has no key "value".
So insert
print result.keys()
print result["project"]
print result["project"].keys()
print result["project"]["value"]
immediately after the for result in ... line, and you will see what is going wrong.
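Dropped into the loop from the question, a minimal version of that debugging (Python 2 print statements, matching the rest of the code) might look like this:
for result in results["results"]["bindings"]:
    # temporary debugging: show which keys this particular binding actually has
    print result.keys()
    print result.get("project")    # prints None when the binding has no "project" key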
My issue turned out to be caused by missing (NULL) values in my results, coming from the OPTIONAL clause in my query. The SPARQLWrapper developers suggested using an if/else check, as below, with an example query using OPTIONAL (and it worked for me):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?party
WHERE {
    ?person <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Asturias>
    OPTIONAL { ?person <http://dbpedia.org/property/party> ?party }
}
for result in results["results"]["bindings"]:
    if result.has_key("party"):
        print "* " + result["person"]["value"] + " ** " + result["party"]["value"]
    else:
        print result["person"]["value"]
Related
I have a JSON object in S3 which follows this structure:
<code> : {
<client>: <value>
}
For example,
{
"code_abc": {
"client_1": 1,
"client_2": 10
},
"code_def": {
"client_2": 40,
"client_3": 50,
"client_5": 100
},
...
}
I am trying to retrieve the numerical value with an S3 Select query, where the "code" and the "client" are populated dynamically with each query.
So far I have tried:
sql_exp = f"SELECT * from s3object[*][*] s where s.{proc}.{client_name} IS NOT NULL"
sql_exp = f"SELECT * from s3object s where s.{proc}[*].{client_name}[*] IS NOT NULL"
as well as without the asterisks inside the square brackets, but nothing works. I get: ClientError: An error occurred (ParseUnexpectedToken) when calling the SelectObjectContent operation: Unexpected token found LITERAL:UNKNOWN at line 1, column X (where X depends on the length of the query string).
Within the function defining the object, I have:
resp = s3.select_object_content(
    Bucket=<bucket>,
    Key=<filename>,
    ExpressionType="SQL",
    Expression=sql_exp,
    InputSerialization={'JSON': {"Type": "Document"}},
    OutputSerialization={"JSON": {}},
)
Is there something off in the way I define the object serialization? How can I fix the query so I can retrieve the desired numerical value on the fly when I provide "code" and "client"?
I did some tinkering based on the documentation, and it works!
I need to access the single event in the EventStream (resp) as follows:
event_stream = resp['Payload']

# unpack successful query response
for event in event_stream:
    if "Records" in event:
        output_str = event["Records"]["Payload"].decode("utf-8")  # bytes to string
        output_dict = json.loads(output_str)  # string to dict
Now the correct SQL expression is:
sql_exp= f"SELECT s['{code}']['{client}'] FROM S3Object s"
where I have gotten (dynamically) my values for code and client beforehand.
For example, based on the dummy JSON structure above, if code = "code_abc" and client = "client_2", I want this S3 Select query to return the value 10.
The f-string resolves to sql_exp = "SELECT s['code_abc']['client_2'] FROM S3Object s", and when we read resp, we retrieve output_dict = {'client_2': 10}. (Not sure if there is a clean way to get the value by itself without the client key; this is how it looks in the documentation as well.)
So, the final step is to retrieve value = output_dict['client_2'], which in our case is equal to 10.
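Putting the pieces together, a minimal end-to-end sketch (the bucket and key names below are placeholders) might look like this:
import json
import boto3

s3 = boto3.client("s3")

def get_value(bucket, key, code, client):
    # code and client are supplied dynamically, as in the question
    sql_exp = f"SELECT s['{code}']['{client}'] FROM S3Object s"
    resp = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=sql_exp,
        InputSerialization={"JSON": {"Type": "Document"}},
        OutputSerialization={"JSON": {}},
    )
    # unpack the EventStream and pull the value out of the returned record
    for event in resp["Payload"]:
        if "Records" in event:
            output_str = event["Records"]["Payload"].decode("utf-8")
            output_dict = json.loads(output_str)
            return output_dict[client]

# e.g. returns 10 for the dummy JSON above
value = get_value("my-bucket", "my-file.json", "code_abc", "client_2")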
I'm trying to pass a variable into nested JSON in a Python script.
The script is as below:
import requests
from flask import request  # assuming Flask here, since request.form is used below
group = request.form['grp']
zon = request.form['zone']
load = { "extra_vars": {
"g_name": "' +str(group)+ '",
"z_name": "' +str(zon)+ '"
}
}
----
--
-
However, when I post the value to the API, it seems I post the literal words '+str(group)+' and '+str(zon)+' instead of the actual values assigned to the declared variables.
Since I'm very new to Python programming: is passing values into nested JSON like this allowed in Python?
Try the following:
group = request.form['grp']
zon = request.form['zone']
load = { "extra_vars": {
"g_name": f"{group}",
"z_name": f"{zon}"
}
}
You can pass variables into a string using f-strings and curly braces around your variable (note {group}):
>>> group = "my_group"
>>> {"g_name": f"'{group}'"}
{'g_name': "'my_group'"}
Or use simple string concatenation, which is almost what you did in your code (you just did not properly close the ' character with "'"):
>>> "'" + str(group) + "'"
"'my_group'"
All in all here's your code adapted:
load = { "extra_vars": {
"g_name": f"'{group}'",
"z_name": f"'{zon}'"
}
}
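If the next step is to POST that payload to the API with requests, a minimal sketch (the endpoint URL is a placeholder, and Flask's request object is assumed as above) would be:
import requests
from flask import request  # assuming Flask, since request.form is used

group = request.form['grp']
zon = request.form['zone']

load = {
    "extra_vars": {
        "g_name": group,  # plain values; add the surrounding quotes only if the API expects them
        "z_name": zon
    }
}

# hypothetical endpoint URL
resp = requests.post("https://api.example.com/launch", json=load)
print(resp.status_code)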
I need to parse a load balancer configuration section. It's seemingly simple (at least for a human).
The config consists of several objects with their content in curly braces, like so:
ltm rule ssl-header-insert {
when HTTP_REQUEST {
HTTP::header insert "X-SSL-Connection" "yes"
}
}
ltm rule some_redirect {
priority 1
when HTTP_REQUEST {
if { (not [class match [IP::remote_addr] equals addresses_group ]) }
{
HTTP::redirect "http://some.page.example.com"
TCP::close
event disable all
}
}
The contents of each section/object is TCL code, so there will be nested curly braces. What I want to achieve is to parse this into pairs: the object identifier (after the ltm rule keywords) and its contents (the TCL code within the braces), unchanged.
I've looked at some examples and experimented a lot, but it's really giving me a hard time. I did some debugging within pyparsing (which is a bit confusing to me too), and I think I'm failing to detect closing braces somehow, but I can't figure that out.
What I came up with so far:
from pyparsing import *
import json
list_sample = """ltm rule ssl-header-insert {
when HTTP_REQUEST {
HTTP::header insert "X-SSL-Connection" "yes"
}
}
ltm rule some_redirect {
priority 1
when HTTP_REQUEST {
if { (not [class match [IP::remote_addr] equals addresses_group ]) }
{
HTTP::redirect "http://some.page.example.com"
TCP::close
event disable all
}
}
}
ltm rule http_header_replace {
when HTTP_REQUEST {
HTTP::header replace Host some.host.example.com
}
}"""
ParserElement.defaultWhitespaceChars=(" \t")
NL = LineEnd()
END = StringEnd()
LBRACE, RBRACE = map(Suppress, '{}')
ANY_HEADER = Suppress("ltm rule ") + Word(alphas, alphanums + "_-")
END_MARK = Literal("ltm rule")
CONTENT_LINE = (~ANY_HEADER + (NotAny(RBRACE + FollowedBy(END_MARK)) + ~END + restOfLine) | (~ANY_HEADER + NotAny(RBRACE + FollowedBy(END)) + ~END + restOfLine)) | (~RBRACE + ~END + restOfLine)
ANY_HEADER.setName("HEADER").setDebug()
LBRACE.setName("LBRACE").setDebug()
RBRACE.setName("RBRACE").setDebug()
CONTENT_LINE.setName("LINE").setDebug()
template_defn = ZeroOrMore((ANY_HEADER + LBRACE +
                            Group(ZeroOrMore(CONTENT_LINE)) +
                            RBRACE))
template_defn.ignore(NL)
results = template_defn.parseString(list_sample).asList()
print("Raw print:")
print(results)
print("----------------------------------------------")
print("JSON pretty dump:")
print json.dumps(results, indent=2)
I can see in the debug output that some of the matches work, but in the end it fails with an empty list as the result.
On a side note, my CONTENT_LINE part of the grammar is probably overly complicated in general, but I haven't found any simpler way to cover it so far.
The next thing would be to figure out how to preserve newlines and tabs in the content part, since I need that to be unchanged in the output. But it looks like I have to use the ignore() function (which skips newlines) to parse the multiline text in the first place, so that's another challenge.
I'd be grateful if someone could help me find out what the issues are. Or maybe I should take some other approach?
I think nestedExpr('{', '}') will help. That will take care of the nested '{}'s, and wrapping in originalTextFor will preserve newlines and spaces.
import pyparsing as pp
LTM, RULE = map(pp.Keyword, "ltm rule".split())
ident = pp.Word(pp.alphas, pp.alphanums+'-_')
ltm_rule_expr = pp.Group(LTM + RULE
                         + ident('name')
                         + pp.originalTextFor(pp.nestedExpr('{', '}'))('body'))
Using your sample string (after adding the missing trailing '}'):
for rule, _, _ in ltm_rule_expr.scanString(sample):
    print(rule[0].name, rule[0].body.splitlines()[0:2])
gives
ssl-header-insert ['{', ' when HTTP_REQUEST {']
some_redirect ['{', ' priority 1']
dump() is also a good way to list out the contents of a returned ParseResults:
for rule, _, _ in ltm_rule_expr.scanString(sample):
    print(rule[0].dump())
    print()
['ltm', 'rule', 'ssl-header-insert', '{\n when HTTP_REQUEST {\n HTTP::header insert "X-SSL-Connection" "yes"\n}\n}']
- body: '{\n when HTTP_REQUEST {\n HTTP::header insert "X-SSL-Connection" "yes"\n}\n}'
- name: 'ssl-header-insert'
['ltm', 'rule', 'some_redirect', '{\n priority 1\n\nwhen HTTP_REQUEST {\n\n if { (not [class match [IP::remote_addr] equals addresses_group ]) }\n {\n HTTP::redirect "http://some.page.example.com"\n TCP::close\n event disable all\n }\n}}']
- body: '{\n priority 1\n\nwhen HTTP_REQUEST {\n\n if { (not [class match [IP::remote_addr] equals addresses_group ]) }\n {\n HTTP::redirect "http://some.page.example.com"\n TCP::close\n event disable all\n }\n}}'
- name: 'some_redirect'
Note that I broke up 'ltm' and 'rule' into separate keyword expressions. This guards against the case where a developer may have written valid code as ltm rule blah, with more than one space between "ltm" and "rule". This kind of thing happens all the time; you never know where whitespace will crop up.
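To get the identifier/contents pairs the question asks for, a small follow-on sketch (reusing list_sample from the question and the expressions above) could collect them into a dict:
rules = {}
for tokens, _, _ in ltm_rule_expr.scanString(list_sample):
    rule = tokens[0]
    rules[rule.name] = rule.body  # TCL body kept verbatim, outer braces, newlines and all

for name, body in rules.items():
    print(name)
    print(body)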
I'm trying to obtain result bindings with this SPARQL query, through this SPARQL endpoint: http://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS_03_2014/query
I get the error: "TypeError: query() takes at least 2 arguments (1 given)".
Thank you!!!
@app.route('/caricaArgomento/<type>', methods=['GET'])
def getArgomento(type):
    sparql = SPARQLUpdateStore("http://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS_03_2014/query")
    sparql.setQuery("""
        PREFIX dc:<http://purl.org/dc/elements/1.1/>
        PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
        PREFIX nsogi:<http://prefix.cc/nsogi>
        PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
        PREFIX dcterms:<http://purl.org/dc/terms/>
        SELECT ?risultato
        WHERE {
            ?item a skos:Concept .
            ?item skos:prefLabel ?risultato .
            filter regex(?risultato, """ + type + """, "i")
        } ORDER BY ?risultato
        """)
    #FILTER regex(str(?aConcept), "http://thes.bncf.firenze.sbn.it/", "i").}
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    results = results['results']['bindings']
    results = json.dumps(results)
    return results
Looking at the relevant documentation, sparql.query(...) requires a query argument.
sparql=SPARQLUpdateStore("http://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS_03_2014/query")
sparql.setReturnFormat(JSON)
query = """
PREFIX dc:<http://purl.org/dc/elements/1.1/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX nsogi:<http://prefix.cc/nsogi>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms:<http://purl.org/dc/terms/>
SELECT ?risultato
WHERE {
?item a skos:Concept .
?item skos:prefLabel ?risultato .
filter regex(?risultato, """+type+""", "i")
} ORDER BY ?risultato
"""
results = sparql.query(query).convert()
I can't work out what the purpose of sparql.setQuery(...) is, but clearly it's not what you want.
Edit:
Having looked at the source, setQuery(...) is really for internal use (for people writing subclasses, for example), not for regular API users. It pulls out the query form and records the query text; query(...) calls it internally.
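For comparison, if the intent was actually the SPARQLWrapper client (which is where the setQuery / setReturnFormat / query().convert() pattern comes from), a sketch under that assumption would be:
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS_03_2014/query")
sparql.setQuery(query)        # same query string as above
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
bindings = results["results"]["bindings"]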
I am building a search engine for the list of articles I have. I was advised by a lot of people to use Elasticsearch for full-text search. I wrote the following code. It works, but I have a few issues.
1) If the same article is added twice (that is, indexdoc is run twice for the same article), it accepts it and adds the article twice. Is there a way to have a "unique key" in the search index?
2) How can I change the scoring/ranking function? I want to give more importance to the title.
3) Is this the correct way to do it anyway?
4) How do I show related results if there is a spelling mistake?
from elasticsearch import Elasticsearch
from crsq.models import ArticleInfo

es = Elasticsearch()

def indexdoc(articledict):
    doc = {
        'text': articledict['articlecontent'],
        'title': articledict['articletitle'],
        'url': articledict['url']
    }
    res = es.index(index="article-index", doc_type='article', body=doc)

def searchdoc(keywordstr):
    res = es.search(index="article-index", body={"query": {"query_string": {"query": keywordstr}}})
    print("Got %d Hits:" % res['hits']['total'])
    for hit in res['hits']['hits']:
        print("%(url)s: %(text)s" % hit["_source"])

def indexurl(url):
    articledict = ArticleInfo.objects.filter(url=url).values()
    if len(articledict):
        indexdoc(articledict[0])  # .values() returns a sequence of dicts; pass the first one
    return
1) You have to specify an id for your document, by adding the id parameter when you index:
res = es.index(index="article-index", doc_type='article', body=doc, id="some_unique_id")
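For example, a minimal sketch that reuses the article URL as the document id (an assumption; any stable unique key will do), so indexing the same article twice updates rather than duplicates:
def indexdoc(articledict):
    doc = {
        'text': articledict['articlecontent'],
        'title': articledict['articletitle'],
        'url': articledict['url']
    }
    # Re-indexing with the same id overwrites the existing document instead of adding a new one.
    es.index(index="article-index", doc_type='article', id=articledict['url'], body=doc)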
2) There is more than one way to do this, but, for example, you can boost the title by changing your query a bit:
{"query": {"query_string": {"query": keywordstr, "fields": ["text", "title^2"]}}}
With this change the title field will have double the importance of the text field.
3) As a proof of concept it's not bad.
4) This is a big topic; I think you should check the documentation on suggesters.
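In the meantime, one common way to tolerate small spelling mistakes is a fuzzy match query; a rough sketch against the fields indexed above (not the only approach, and tuning may be needed):
def searchdoc_fuzzy(keywordstr):
    # "fuzziness": "AUTO" lets Elasticsearch match terms within a small edit distance
    res = es.search(index="article-index", body={
        "query": {
            "match": {
                "title": {"query": keywordstr, "fuzziness": "AUTO"}
            }
        }
    })
    for hit in res['hits']['hits']:
        print("%(url)s: %(title)s" % hit["_source"])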