I try to obtain results bindings by this Sparql query.
Through this Sparql entry point: http://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS_03_2014/query
I have the error: "TypeError: query() takes at least 2 arguments (1 given)
Thank you!!!
#app.route('/caricaArgomento/<type>', methods=['GET'])
def getArgomento(type):
#sparql = SPARQLUpdateStore("http://digitale.bncf.firenze.sbn.it/openrdf- workbench/repositories/NS_03_2014/query")
sparql=SPARQLUpdateStore("http://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS_03_2014/query")
sparql.setQuery("""
PREFIX dc:<http://purl.org/dc/elements/1.1/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX nsogi:<http://prefix.cc/nsogi>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms:<http://purl.org/dc/terms/>
SELECT ?risultato
WHERE {
?item a skos:Concept .
?item skos:prefLabel ?risultato .
filter regex(?risultato, """+type+""", "i")
} ORDER BY ?risultato
""")
#FILTER regex(str(?aConcept), "http://thes.bncf.firenze.sbn.it/", "i").}
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
results = results['results']['bindings']
results = json.dumps(results)
return results
digitale.bncf.firenze.sbn.it
digitale.bncf.firenze.sbn.it
Looking at the relevant documentation sparql.query()... requires a query argument.
sparql=SPARQLUpdateStore("http://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS_03_2014/query")
sparql.setReturnFormat(JSON)
query = """
PREFIX dc:<http://purl.org/dc/elements/1.1/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX nsogi:<http://prefix.cc/nsogi>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms:<http://purl.org/dc/terms/>
SELECT ?risultato
WHERE {
?item a skos:Concept .
?item skos:prefLabel ?risultato .
filter regex(?risultato, """+type+""", "i")
} ORDER BY ?risultato
"""
results = sparql.query(query).convert()
I can't work out what the purpose of sparql.setQuery(...) is, but clearly it's not what you want.
Edit:
Having looked at the source setQuery(...) is really for internal use (people writing subclasses, for example), and not for regular api users. It pulls out the query form and records the query text. query(...) calls it internally.
Related
I have a JSON object in S3 which follows this structure:
<code> : {
<client>: <value>
}
For example,
{
"code_abc": {
"client_1": 1,
"client_2": 10
},
"code_def": {
"client_2": 40,
"client_3": 50,
"client_5": 100
},
...
}
I am trying to retrieve the numerical value with an S3 Select query, where the "code" and the "client" are populated dynamically with each query.
So far I have tried:
sql_exp = f"SELECT * from s3object[*][*] s where s.{proc}.{client_name} IS NOT NULL"
sql_exp = f"SELECT * from s3object s where s.{proc}[*].{client_name}[*] IS NOT NULL"
as well as without the asterisk inside the square brackets, but nothing works, I get ClientError: An error occurred (ParseUnexpectedToken) when calling the SelectObjectContent operation: Unexpected token found LITERAL:UNKNOWN at line 1, column X (depending on the length of the query string)
Within the function defining the object, I have:
resp = s3.select_object_content(
Bucket=<bucket>,
Key=<filename>,
ExpressionType="SQL",
Expression=sql_exp,
InputSerialization={'JSON': {"Type": "Document"}},
OutputSerialization={"JSON": {}},
)
Is there something off in the way I define the object serialization? How can I fix the query so I can retrieve the desired numerical value on the fly when I provide ”code” and “client”?
I did some tinkering based on the documentation, and it works!
I need to access the single event in the EventStream (resp) as follows:
event_stream = resp['Payload']
# unpack successful query response
for event in event_stream:
if "Records" in event:
output_str = event["Records"]["Payload"].decode("utf-8") # bytes to string
output_dict = json.loads(output_str) # string to dict
Now the correct SQL expression is:
sql_exp= f"SELECT s['{code}']['{client}'] FROM S3Object s"
where I have gotten (dynamically) my values for code and client beforehand.
For example, based on the dummy JSON structure above, if code = "code_abc" and client = "client_2", I want this S3 Select query to return the value 10.
The f-string resolves to sql_exp = "SELECT s['code_abc']['client_2'] FROM S3Object s", and when we call resp, we retrieve output_dict = {'client_2': 10} (Not sure if there is a clear way to get the value by itself without the client key, this is how it looks like in the documentation as well).
So, the final step is to retrieve value = output_dict['client_2'], which in our case is equal to 10.
In PHPs Slim I can do this:
$app->get('/table/{table}', function (Request $request, Response $response, $args) {
$table = $args['table'];
$mapper = new TableMapper($this, $table);
$res = $mapper->getTable();
return $response->withJson($res);
}
$app->get('/table/{table}/{id}', function (Request $request, Response $response, $args) {
$table = $args['table'];
$id = (int)$args['id'];
$mapper = new TableMapper($this, $table);
$res = $mapper->getTableById($id);
return $response->withJson($res);
}
Now I'm trying with web.py. I can do this:
urls = (
'/table/(.+)', 'table',
)
class table:
def GET( self, table ):
rows = db.select( table )
web.header('Content-Type', 'application/json')
return json.dumps( [dict(row) for row in rows], default=decimal_default )
but if I try to extend this by doing, e.g.:
urls = (
'/table/(.+)', 'table',
'/table/(.+)/(\d+)', 'table_by_id'
)
Processing never arrive at the second of the urls. Instead the code does a db.select on a table name which is "table/id", which of course errors.
How can I develop this to parse a url with id added?
web.py matches in order listed in urls, so switching the order is one way to solve your issue:
urls = (
'/table/(.+)/(\d+)', 'table_by_id',
'/table/(.+)', 'table'
)
Another piece of advice: Tighten up your regex, so you match more closely exactly what you're looking for. You'll find bugs sooner.
For example, you'll note your /table/(.+) will indeed match "/table/foo/1", because the regex .+ also matches /, so you might consider a pattern like ([^/]+) to match "everything" except a slash.
Finally, no need for a leading '^' or trailing '$' in your URLs, web.py always looks to match the full pattern. (Internally, it adds '^' and '$').
Try this one:
urls = (
'^/table/(.+)/$', 'table',
'^/table/(.+)/(\d+)/$', 'table_by_id'
)
I am using named parameters in Bigquery SQL and want to write the results to a permanent table. I have two functions 1 for using named query parameters and 1 for writing query results to table. How do I combine the two to get query results written to table; the query having named parameters.
This is the function using parameterized queries :
def sync_query_named_params(column_name,min_word_count,value):
query = """with lsq_results as
(select "%s" = #min_word_count)
replace (%s AS %s)
from lsq.lsq_results
""" % (min_word_count,value,column_name)
client = bigquery.Client()
query_results = client.run_sync_query(query
,
query_parameters=(
bigquery.ScalarQueryParameter('column_name', 'STRING', column_name),
bigquery.ScalarQueryParameter(
'min_word_count',
'STRING',
min_word_count),
bigquery.ScalarQueryParameter('value','INT64',value)
))
query_results.use_legacy_sql = False
query_results.run()
Function to write to permanent table
class BigQueryClient(object):
def __init__(self, bq_service, project_id, swallow_results=True):
self.bigquery = bq_service
self.project_id = project_id
self.swallow_results = swallow_results
self.cache = {}
def write_to_table(
self,
query,
dataset=None,
table=None,
external_udf_uris=None,
allow_large_results=None,
use_query_cache=None,
priority=None,
create_disposition=None,
write_disposition=None,
use_legacy_sql=None,
maximum_billing_tier=None,
flatten=None):
configuration = {
"query": query,
}
if dataset and table:
configuration['destinationTable'] = {
"projectId": self.project_id,
"tableId": table,
"datasetId": dataset
}
if allow_large_results is not None:
configuration['allowLargeResults'] = allow_large_results
if flatten is not None:
configuration['flattenResults'] = flatten
if maximum_billing_tier is not None:
configuration['maximumBillingTier'] = maximum_billing_tier
if use_query_cache is not None:
configuration['useQueryCache'] = use_query_cache
if use_legacy_sql is not None:
configuration['useLegacySql'] = use_legacy_sql
if priority:
configuration['priority'] = priority
if create_disposition:
configuration['createDisposition'] = create_disposition
if write_disposition:
configuration['writeDisposition'] = write_disposition
if external_udf_uris:
configuration['userDefinedFunctionResources'] = \
[ {'resourceUri': u} for u in external_udf_uris ]
body = {
"configuration": {
'query': configuration
}
}
logger.info("Creating write to table job %s" % body)
job_resource = self._insert_job(body)
self._raise_insert_exception_if_error(job_resource)
return job_resource
How do I combine the 2 functions to write a parameterized query and write the results to a permanent table?Or if there is another simpler way. Please suggest.
You appear to be using two different client libraries.
Your first code sample uses a beta version of the BigQuery client library, but for the time being I would recommend against using it, since it needs substantial revision before it is considered generally available. (And if you do use it, I would recommend using run_async_query() to create a job using all available parameters, and then call results() to get the QueryResults object.)
Your second code sample is creating a job resource directly, which is a lower-level interface. When using this approach, you can specify the configuration.query.queryParameters field on your query configuration directly. This is the approach I'd recommend right now.
I am using rdflib, to query series of rdf files in which I know that the names have a language tag (when I query in a sparql endpoint), this is a snapshot of my data:
but when I parse the rdf file in python and extract the 'name' value I only get "Josephus" without the language tag.
I can get the tag separately with something like:
bind(lang(?name) as ?lan)
but that doesn't. I need to have the name and its tag when I serialize my graph.
Any suggestions what might cause this lose of information and how I can have my names with their language tags?
It's quite a simple query. Here is my full script:
import unicodedata
from rdflib import Namespace, URIRef, Graph , Literal , OWL, RDFS , RDF
from SPARQLWrapper import SPARQLWrapper2, XML , JSON , TURTLE
import os
sparql = SPARQLWrapper2("http://dbpedia.org/sparql")
os.chdir('...\Desktop')
jl = Namespace("http://data.judaicalink.org/ontology/")
foaf = Namespace("http://xmlns.com/foaf/0.1/")
graph = Graph()
spar= ("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
SELECT ?occ ?x ?name ?same
where {
<http://dbpedia.org/resource/Category:Jewish_historians> rdfs:label ?occ.
?x dct:subject <http://dbpedia.org/resource/Category:Jewish_historians>.
?x rdfs:label ?name.
?x owl:sameAs ?same.
}
""")
sparql.setQuery(spar)
sparql.setReturnFormat(TURTLE)
results = sparql.query().convert()
graph.bind('jl', jl)
graph.bind('foaf',foaf)
if (u"x",u"name",u"occ",u"same") in results:
bindings = results[u"x",u"name",u"occ",u"same"]
for b in bindings:
graph.add( (URIRef(b[u"x"].value), RDF.type , foaf.Person ) )
graph.add( (URIRef(b[u"x"].value), jl.hasLabel, Literal(b[u"name"].value) ) )
graph.serialize(destination= 'output.rdf' , format="turtle")
I'm using python with SPARQLWrapper and it has worked until now -- I'm not able to add a new SPARQL object to my results.
Here is my working snippet:
else:
for result in results["results"]["bindings"]:
project = result["project"]["value"].encode('utf-8')
filename = result["filename"]["value"].encode('utf-8')
keywords = result["keywords"]["value"].encode('utf-8')
url = result["url"]["value"].encode('utf-8')
url = result["url"]["value"].encode('utf-8')
print "<p class=\"results\"><span>Project</span>: %s</p><p class=\"indent\"><span>Filename</span>: %s</p><p class=\"indent\"><span>URL</span>:%s</p><p class=\"indent-bottom\"><span>Keywords</span>: %s</p> " % \
(project,filename,url,url,keywords)
I'm trying to add more results. I've tested the SPARQL query as added to the script, I add the object of the query ("parameter") as a RESULTS and BINDINGS pair, I add the %s to print and add the result name to the parens below the print command (not sure what to call that area). So after doing what I did before to add these results, I get the white screen of death -- the header of the page only is written out and the apache error log gives me a KeyError, project = result["project"]["value"].encode('utf-8').
Here is an example of an added element that breaks the script:
else:
print "<h1>ASDC RDF Search Results</h1>"
print "<p class=\"newsearch\">new search | <a href=\"http://localhost/asdc.html\">About this project</p><div style=\"clear:both;\"</div>"
for result in results["results"]["bindings"]:
project = result["project"]["value"].encode('utf-8')
filename = result["filename"]["value"].encode('utf-8')
url = result["url"]["value"].encode('utf-8')
url = result["url"]["value"].encode('utf-8')
keywords = result["keywords"]["value"].encode('utf-8')
parameter = result["parameter"]["value"].encode('utf-8')
print "<p class=\"results\"><span>Project</span>: %s</p><p class=\"indent\"><span>Filename</span>: %s</p><p class=\"indent\"><span>URL</span>:%s</p><p class=\"indent\"><span>Keywords</span>: %s</p><p class=\"indent-bottom\"><span>Parameter</span>: %s</p> " % \
(project,filename,url,url,keywords,parameter)
So two questions: Is the error obvious? Am I screwing up the formatting in the keys somehow when I add the new line? Also, does python write errors to a log or can I enable that? Thanks...
Edit: Here's the query including parameter (it works, tested directly in the Fuseki UI)
PREFIX e1: <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/>
SELECT ?url ?filename ?keywords ?project ?parameter
WHERE {
?s <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/url> ?url.
?s <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/filename> ?filename.
OPTIONAL {
?s <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/keywords> ?keywords.
?s <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/project> ?project.
?s <http://data.gov/source/work/dataset/gov/vocab/enhancement/1/parameter> ?parameter.
}
FILTER (regex(?keywords, "FILTER-STRING", "i") || regex(?url, "FILTER-STRING", "i") || regex(?filename, "FILTER-STRING", "i")) .
}
First query is similar minus the ?parameter. FILTER-STRING comes from my cgi form.
Either your result dict has no key "project", or the result["project"] dict has no key "value".
So insert
print result.keys()
print result["project"]
print result["project"].keys()
print result["project"]["value"]
imediatly after for result in ... and you see what is going wrong.
My issue turned out to be caused by NULL values in my results from the OPTIONAL clause in my query. SPARQLWrapper developers suggested using if-else, as below with example query using OPTIONAL (& it worked for me):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?party
WHERE { ?person <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Asturias>
OPTIONAL { ?person <http://dbpedia.org/property/party> ?party }
for result in results["results"]["bindings"]:
if result.has_key("party"):
print "* " + result["person"]["value"] + " ** " + result["party"]["value"]
else:
print result["person"]["value"]