SPARQLWrapper: problem querying an ontology in a local file - Python

I'm working with SPARQLWrapper and I'm following the documentation. Here is my code:
queryString = "SELECT * WHERE { ?s ?p ?o. }"
sparql = SPARQLWrapper("http://example.org/sparql")# I replaced this line with
sparql = SPARQLWrapper("file:///thelocation of my file in my computer")
sparql.setQuery(queryString)
try :
ret = sparql.query()
# ret is a stream with the results in XML, see <http://www.w3.org/TR/rdf-sparql-XMLres/>
except :
deal_with_the_exception()
I'm getting these two errors:
1. The system cannot find the path specified
2. NameError: name 'deal_with_the_exception' is not defined

You need a SPARQL endpoint to make it work; SPARQLWrapper queries an HTTP endpoint, not a plain file path. Consider setting up Apache Fuseki on your local computer. See https://jena.apache.org/documentation/fuseki2/jena
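Once a local Fuseki server is running, the original snippet works if pointed at the HTTP endpoint instead of a file path. A minimal sketch, assuming a Fuseki dataset named ds (Fuseki's default port is 3030), with the undefined deal_with_the_exception() replaced by a plain traceback print:

import traceback
from SPARQLWrapper import SPARQLWrapper, JSON

queryString = "SELECT * WHERE { ?s ?p ?o. }"
sparql = SPARQLWrapper("http://localhost:3030/ds/sparql")  # 'ds' is an assumed dataset name
sparql.setQuery(queryString)
sparql.setReturnFormat(JSON)
try:
    results = sparql.query().convert()
    for binding in results["results"]["bindings"]:
        print(binding)
except Exception:
    traceback.print_exc()  # replaces the undefined deal_with_the_exception()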

Related

Push a Turtle file in bytes form to a Stardog database using pystardog

def add_graph(file, file_name):
    file.seek(0)
    file_content = file.read()
    if 'snomed' in file_name:
        conn.add(stardog.content.Raw(file_content, content_type='bytes', content_encoding='utf-8'), graph_uri='sct:900000000000207008')
Here I'm facing issues pushing this file, which I downloaded from an S3 bucket and which is in bytes form. Pushing the data to the Stardog database throws stardog.Exception 500.
I also tried pushing the bytes directly, as shown below, but that didn't help either.
conn.add(content.File(file), graph_uri='<http://www.example.com/ontology#concept>')
Can someone help me push a Turtle file that is in bytes form to a Stardog database using the pystardog Python library?
I believe this is what you are looking for:
import stardog

conn_details = {
    'endpoint': 'http://localhost:5820',
    'username': 'admin',
    'password': 'admin'
}
conn = stardog.Connection('myDb', **conn_details)  # assuming you have this since you already have 'conn'; here it connects to a DB named 'myDb'

file = open('snomed.ttl', 'rb')  # just opening a file as a binary object to mimic your input
file_name = 'snomed.ttl'  # added to keep your function as it is

def add_graph(file, file_name):
    file.seek(0)
    file_content = file.read()  # this will be of type bytes
    if 'snomed' in file_name:
        conn.begin()  # added to begin a transaction, though I do not think it is required
        conn.add(stardog.content.Raw(file_content, content_type='text/turtle'), graph_uri='sct:900000000000207008')
        conn.commit()  # added to commit the added data

add_graph(file, file_name)  # I just ran this directly in the Python file for the example
Take note of the conn.add line, where I used text/turtle as the content type. I added some more context so it can be a running example.
Here is the sample file as well snomed.ttl:
<http://api.stardog.com/id=1> a :person ;
    <http://api.stardog.com#first_name> "John" ;
    <http://api.stardog.com#id> "1" ;
    <http://api.stardog.com#dob> "1995-01-05" ;
    <http://api.stardog.com#email> "john.doe#example.com" ;
    <http://api.stardog.com#last_name> "Doe" .
EDIT - Query Test
If it runs successfully and there are no errors in stardog.log, you should be able to see results using the query below. Note that you have to specify the named graph, since the data was added there; if you query without specifying it, you will see no results.
SELECT * {
    GRAPH <sct:900000000000207008> {
        ?s ?p ?o
    }
}
You can run that query in Stardog Studio, but if you want it in Python, this will print the JSON result:
print(conn.select('SELECT * { GRAPH <sct:900000000000207008> { ?s ?p ?o } }'))

rdflib's parseQuery decodes the query string, which causes an invalid URI

I have the following ttl file:
@prefix : <https://www.example.co/reserved/language#> .

<https://www.example.co/reserved/root> :_id "01G39WKRH76BGY5D3SKDHJP2SX" ;
    :transcript%20data [ :_id "01G39WKRH7JYRX78X7FG4RCNYF" ;
        :_key "transcript%20data" ;
        :value "value" ;
        :value_id "01G39WKRH7PVK1DXQHWT08DZA8" ] .
And I have the following query:
q = """
PREFIX : <https://www.example.co/reserved/language#>
SELECT ?o
WHERE { ?s :transcript%20data/:value ?o . }
"""
While trying to query the graph I got from the ttl file I got the following error:
https://www.example.co/reserved/language#transcript data does not look like a valid URI, trying to serialize this will break.
As you can see, parseQuery has decoded the "%20" to a space " ", which produces an invalid URI; _is_valid_uri therefore returns False for it.
I've tested the query on different SPARQL engines, and it is valid and works as expected.
So, what do you advise to make the query valid and get the required results?
I am using rdflib Version: 6.1.1 on macOS Monterey 12.4
It was a bug in rdflib's SPARQL parser, and it has been fixed in a PR.
It seems the internal SPARQL parser function _hexExpand inappropriately expanded percent-encoded reserved characters. The fix adds an exclusionary regexp to disable this behaviour, plus a parameterized test which checks the SPARQL parser's processing of the set of percent-encoded reserved chars.
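For reference, here is a minimal reproduction, assuming the Turtle above is saved as data.ttl; on rdflib 6.1.1 it prints the invalid-URI warning, while a version containing the fix returns the expected binding:

import rdflib

g = rdflib.Graph()
g.parse("data.ttl", format="turtle")  # the Turtle snippet from the question

q = """
PREFIX : <https://www.example.co/reserved/language#>
SELECT ?o
WHERE { ?s :transcript%20data/:value ?o . }
"""
for row in g.query(q):
    print(row.o)  # expected output: "value"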

Why are the results different in SPARQLWrapper and the Wikidata query editor in SPARQL?

I am using the following SPARQL query in the Wikidata query editor:
SELECT ?s ?p WHERE {?s ?p wd:Q22673982 .}
Link to the query editor: https://w.wiki/5E7
I am getting 40 records for the above query.
However, when I try to do the same in Python using SPARQLWrapper, I get 0 records. My code is as follows.
import pandas as pd
from SPARQLWrapper import SPARQLWrapper, JSON
sparqlwd = SPARQLWrapper("https://query.wikidata.org/sparql")
myid = "wd:Q22673982"
sparqlwd.setQuery(f"SELECT ?s ?p WHERE {{?s ?p \"{myid}\" .}}")
sparqlwd.setReturnFormat(JSON)
results = sparqlwd.query().convert()
print(results)
results_df = pd.io.json.json_normalize(results['results']['bindings'])
print(results_df)
I am just wondering why this mismatch happens. Is there a way to resolve this issue?
I am happy to provide more details if needed.
My code using SPARQLWrapper was almost right. However, I made a typo when preparing the query with an f-string: the escaped quotes around {myid} turned wd:Q22673982 into a string literal instead of a URI, so the pattern matched nothing.
The corrected code is as follows, which solved my issue.
import pandas as pd
from SPARQLWrapper import SPARQLWrapper, JSON
sparqlwd = SPARQLWrapper("https://query.wikidata.org/sparql")
myid = "wd:Q22673982"
sparqlwd.setQuery(f"SELECT ?s ?p WHERE {{?s ?p {myid} .}}")
sparqlwd.setReturnFormat(JSON)
results = sparqlwd.query().convert()
print(results)
results_df = pd.io.json.json_normalize(results['results']['bindings'])
print(results_df)
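To see the difference concretely, this is what the two f-strings render to; the first sends a plain string literal, which matches no triples:

# buggy version: the escaped quotes make the object a string literal
f"SELECT ?s ?p WHERE {{?s ?p \"{myid}\" .}}"
# -> SELECT ?s ?p WHERE {?s ?p "wd:Q22673982" .}

# fixed version: the object is the prefixed URI wd:Q22673982
f"SELECT ?s ?p WHERE {{?s ?p {myid} .}}"
# -> SELECT ?s ?p WHERE {?s ?p wd:Q22673982 .}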

Python SPARQLWrapper returns only 10000 results

I use the SPARQLWrapper module to launch a query against a Virtuoso endpoint and get the result.
The query always returns a maximum of 10000 results.
Here is the Python script:
from SPARQLWrapper import SPARQLWrapper, JSON

queryString = """
SELECT DISTINCT ?s
WHERE {
    ?s ?p ?o .
}
"""

sparql = SPARQLWrapper("http://localhost:8890/sparql")
sparql.setQuery(queryString)
sparql.setReturnFormat(JSON)
res = sparql.query().convert()

# Parse result
parsed = []
for entry in res['results']['bindings']:
    for sparql_variable in entry.keys():
        parsed.append({sparql_variable: entry[sparql_variable]['value']})

print('Query returned ' + str(len(parsed)) + ' results')
When I launch the query with
SELECT count(*) AS ?count
I get the right number of triples: 917051.
Why does the SPARQLWrapper module limit the number of results to 10000?
How do I get all the results?
The answer is to adjust the Virtuoso configuration file, as documented. Specifically for this case, you need to increase ResultSetMaxRows in the [SPARQL] stanza.
The limit is not in SPARQLWrapper. You would see the same limit if you ran the full SELECT (instead of the COUNT, which delivers only one row) through the SPARQL endpoint, Conductor, or any other interface.
The 10000-row limit is set by the data owner via ResultSetMaxRows in virtuoso.ini, to protect the data.
Otherwise, anyone could use a simple SPARQL query like select * where {?s ?p ?o} to fetch all the data, which could cost the data owner a lot of time and money.
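If you cannot change virtuoso.ini, a common client-side workaround is to page through the results with ORDER BY, LIMIT and OFFSET, keeping each page at or below the server's cap. A minimal sketch, assuming the same local endpoint and a page size of 10000:

from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = "http://localhost:8890/sparql"  # assumed local Virtuoso endpoint
page_size = 10000  # must stay at or below the server's ResultSetMaxRows

sparql = SPARQLWrapper(endpoint)
sparql.setReturnFormat(JSON)

results = []
offset = 0
while True:
    sparql.setQuery(f"""
        SELECT DISTINCT ?s
        WHERE {{ ?s ?p ?o . }}
        ORDER BY ?s
        LIMIT {page_size} OFFSET {offset}
    """)
    bindings = sparql.query().convert()["results"]["bindings"]
    results.extend(b["s"]["value"] for b in bindings)
    if len(bindings) < page_size:
        break  # last page reached
    offset += page_size

print(f"Fetched {len(results)} results")

The ORDER BY makes the paging deterministic; without it, the endpoint is free to return rows in a different order on each request.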

How can I properly serialize answers to Wikidata SPARQL queries?

I have the following example of querying Wikidata via Python's SPARQLWrapper:
import rdflib, urllib
from SPARQLWrapper import SPARQLWrapper, JSON, XML, TURTLE, RDF, N3
from rdflib import Graph, Namespace, URIRef, RDF#, RDFS, Literal

def graph_full(uri, f):
    sparql = SPARQLWrapper('https://query.wikidata.org/sparql')
    sparql.setQuery('''
        PREFIX entity: <http://www.wikidata.org/entity/>
        SELECT ?predicate ?object WHERE {
            <'''+urllib.unquote(uri).encode("utf8")+'''> ?predicate ?object .
        } LIMIT 100
    ''')
    sparql.setReturnFormat(N3)
    results = sparql.query().convert()
    #print results.serialize()
    print type(results)
    g = Graph()
    g.parse(results)
    print g
    #g.serialize(f, format="n3")

if __name__ == '__main__':
    graph_full("entity:Q76", "wikidata/output.nt")
I want to serialize the result of the SPARQL query and save it to a file. This seems to always throw the following error:
Exception: Unexpected type '<type 'instance'>' for source '<xml.dom.minidom.Document instance at 0x7fa11e3715a8>'
Using similar code against DBpedia SPARQL endpoints throws no errors.
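The error arises because a SELECT query returns a result set (here, SPARQL Results XML parsed into an xml.dom.minidom Document), not RDF, so Graph.parse cannot consume it. One way to get a graph you can serialize is a DESCRIBE (or CONSTRUCT) query, since SPARQLWrapper converts those into an rdflib Graph when the return format is XML. A minimal sketch in Python 3, assuming the goal is dumping the triples for entity Q76 (the output path is an assumption):

from SPARQLWrapper import SPARQLWrapper, XML

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setQuery("DESCRIBE <http://www.wikidata.org/entity/Q76>")
sparql.setReturnFormat(XML)  # for DESCRIBE/CONSTRUCT, convert() yields an rdflib.Graph
g = sparql.query().convert()
print(len(g))  # number of triples returned
g.serialize(destination="output.nt", format="nt")  # assumed output path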
