Python SPARQLWrapper returns only 10000 results - python

I use the SPARQLWrapper module to send a query to a Virtuoso endpoint and get the result.
The query always returns a maximum of 10000 results.
Here is the Python script:
from SPARQLWrapper import SPARQLWrapper, JSON
queryString = """
SELECT DISTINCT ?s
WHERE {
?s ?p ?o .
}
"""
sparql = SPARQLWrapper("http://localhost:8890/sparql")
sparql.setQuery(queryString)
sparql.setReturnFormat(JSON)
res = sparql.query().convert()
# Parse result
parsed = []
for entry in res['results']['bindings']:
    for sparql_variable in entry.keys():
        parsed.append({sparql_variable: entry[sparql_variable]['value']})
print('Query return ' + str(len(parsed)) + ' results')
When I launch the query with
SELECT count(*) AS ?count
I get the right number of triples: 917051.
Why does the SPARQLWrapper module limit the number of results to 10000?
How do I get all the results?

The answer is to adjust the Virtuoso configuration file (virtuoso.ini), as documented. Specifically for this case, you need to increase the ResultSetMaxRows setting in the [SPARQL] stanza.
The limit is not in SPARQLWrapper. You would see the same limit if you did the full SELECT (instead of the COUNT, which only delivers 1 row) through the SPARQL endpoint, Conductor, or any other interface.

The 10000-result limit is set by the data owner via the ResultSetMaxRows item in virtuoso.ini, to protect the data.
Otherwise, anyone could use a simple SPARQL query such as select * where {?s ?p ?o} to fetch all the data, which could cost the data owner a lot of time and money.
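If you cannot edit virtuoso.ini (for example, on an endpoint you do not own), a common client-side workaround is to page through the results with LIMIT and OFFSET. The following is a minimal sketch, assuming the endpoint honours OFFSET on an ordered query. fetch_all and run_on_endpoint are hypothetical helper names, and run_on_endpoint reuses the endpoint URL from the question.

```python
from typing import Callable, Iterator, List


def fetch_all(run_query: Callable[[str], List[dict]],
              base_query: str,
              page_size: int = 10000) -> Iterator[dict]:
    """Yield every binding by requesting successive LIMIT/OFFSET pages."""
    offset = 0
    while True:
        paged_query = f"{base_query}\nLIMIT {page_size} OFFSET {offset}"
        rows = run_query(paged_query)
        yield from rows
        # A short (or empty) page means there is nothing left to fetch.
        if len(rows) < page_size:
            break
        offset += page_size


def run_on_endpoint(query: str) -> List[dict]:
    """Hypothetical helper: run one query and return its JSON bindings."""
    # Imported lazily so fetch_all can be used without SPARQLWrapper installed.
    from SPARQLWrapper import SPARQLWrapper, JSON
    sparql = SPARQLWrapper("http://localhost:8890/sparql")
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()["results"]["bindings"]


# Usage (requires a running endpoint):
# query = "SELECT DISTINCT ?s WHERE { ?s ?p ?o . } ORDER BY ?s"
# subjects = [row["s"]["value"] for row in fetch_all(run_on_endpoint, query)]
```

The ORDER BY keeps pages stable between requests; without it, rows may repeat or go missing. The server may also limit how many rows it is willing to sort, so this is not guaranteed to work on every endpoint.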

Related

rdflib's parseQuery decodes the query string, which causes an invalid URI

I have the following ttl file:
@prefix : <https://www.example.co/reserved/language#> .
<https://www.example.co/reserved/root> :_id "01G39WKRH76BGY5D3SKDHJP2SX" ;
:transcript%20data [ :_id "01G39WKRH7JYRX78X7FG4RCNYF" ;
:_key "transcript%20data" ;
:value "value" ;
:value_id "01G39WKRH7PVK1DXQHWT08DZA8" ] .
And I have the following query:
q = """
PREFIX : <https://www.example.co/reserved/language#>
SELECT ?o
WHERE { ?s :transcript%20data/:value ?o . }
"""
When I query the graph parsed from the ttl file, I get the following error:
https://www.example.co/reserved/language#transcript data does not look like a valid URI, trying to serialize this will break.
As you can see, parseQuery has decoded the "%20" to a space " ", which causes an invalid URI, so the _is_valid_uri function returns False when the URI is passed to it.
I've tested the query on different SPARQL engines and it is valid and works as expected.
So, what do you advise to make the query valid and get the required results?
I am using rdflib version 6.1.1 on macOS Monterey 12.4.
It was a bug in rdflib's SPARQL parser, and it is fixed in this PR:
It seems that the internal SPARQL parser function _hexExpand inappropriately expands percent-encoded reserved characters. The PR adds an exclusionary regexp to disable this behaviour and a parameterized test that checks how the SPARQL parser processes the set of percent-encoded reserved characters.
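To see why the decoded form is rejected, compare the two strings. The sketch below uses only the standard library. The character set and the function looks_like_valid_uri are illustrative stand-ins, not rdflib's actual implementation.

```python
from urllib.parse import unquote

# Hypothetical stand-in for a URI validity check; not rdflib's actual code.
# These characters may not appear unescaped inside a URI.
INVALID_URI_CHARS = set('<>" {}|\\^`')


def looks_like_valid_uri(uri: str) -> bool:
    """Return True if the string contains none of the disallowed characters."""
    return not any(char in INVALID_URI_CHARS for char in uri)


encoded = "https://www.example.co/reserved/language#transcript%20data"
decoded = unquote(encoded)  # turns "%20" into a literal space

print(looks_like_valid_uri(encoded))
print(looks_like_valid_uri(decoded))
```

The percent-encoded form passes the check, while the decoded form fails because of the space, which matches the warning quoted in the question.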

How to get the result of a SPARQL query in text format in Python

In this Python code, I want to fetch the results of my SPARQL query in text format, not in JSON, XML, etc. Actually, I just need the value of the object as a string/text.
from SPARQLWrapper import SPARQLWrapper, XML
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
SELECT ?p ?o
WHERE { <http://dbpedia.org/page/Eurobike> ?p ?o .
filter langMatches(lang(?o),"en")
}
""")
sparql.setReturnFormat(XML)
results = sparql.query().convert()
print(results.toxml())
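I think you want something like CSV or TSV. Import the constant from SPARQLWrapper and pass it to setReturnFormat:
from SPARQLWrapper import CSV
sparql.setReturnFormat(CSV)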
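With CSV as the return format, convert() appears to hand back the raw response body as bytes rather than a parsed object (an assumption worth checking against your SPARQLWrapper version). Here is a minimal sketch of pulling out just the ?o column with the standard csv module. The sample bytes and the helper name object_values are made up for illustration.

```python
import csv
import io


def object_values(csv_bytes: bytes, column: str = "o") -> list:
    """Decode a CSV result body and return one column as plain strings."""
    text = csv_bytes.decode("utf-8")
    # newline="" lets the csv module handle line endings itself.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [row[column] for row in reader]


# Made-up sample standing in for sparql.query().convert()
sample = (
    b'"p","o"\r\n'
    b'"http://www.w3.org/2000/01/rdf-schema#label","Eurobike"\r\n'
)

for value in object_values(sample):
    print(value)
```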

SPARQLWrapper: problem querying an ontology in a local file

I'm working with SPARQLWrapper and I'm following the documentation. Here is my code:
queryString = "SELECT * WHERE { ?s ?p ?o. }"
sparql = SPARQLWrapper("http://example.org/sparql")# I replaced this line with
sparql = SPARQLWrapper("file:///thelocation of my file in my computer")
sparql.setQuery(queryString)
try:
    ret = sparql.query()
    # ret is a stream with the results in XML, see <http://www.w3.org/TR/rdf-sparql-XMLres/>
except:
    deal_with_the_exception()
I'm getting these 2 errors:
1- The system cannot find the path specified
2- NameError: name 'deal_with_the_exception' is not defined
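You need a SPARQL endpoint to make it work; SPARQLWrapper sends queries to a server and cannot read a local file. Consider setting up Apache Fuseki on your local computer. See https://jena.apache.org/documentation/fuseki2/jena
The NameError occurs because deal_with_the_exception() is only a placeholder in the documentation example; you have to define your own error handling.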

Why are the results different in SPARQLWrapper and the Wikidata query editor in SPARQL?

I am using the following sparql query in wikidata query editor:
SELECT ?s ?p WHERE {?s ?p wd:Q22673982 .}
Link to the query editor: https://w.wiki/5E7
I am getting 40 records for the above query.
However, when I try to do the same in python using SPARQLWrapper I get 0 records. My code is as follows.
import pandas as pd
from SPARQLWrapper import SPARQLWrapper, JSON
sparqlwd = SPARQLWrapper("https://query.wikidata.org/sparql")
myid = "wd:Q22673982"
sparqlwd.setQuery(f"SELECT ?s ?p WHERE {{?s ?p \"{myid}\" .}}")
sparqlwd.setReturnFormat(JSON)
results = sparqlwd.query().convert()
print(results)
results_df = pd.io.json.json_normalize(results['results']['bindings'])
print(results_df)
I am just wondering why this mismatch happens. Is there a way to resolve this issue?
I am happy to provide more details if needed.
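My code using SPARQLWrapper was almost right. However, I made a typo when preparing the query with f-strings: the escaped quotes around {myid} turned wd:Q22673982 into a plain string literal instead of an entity reference, so nothing matched.
The corrected code is as follows, which solved my issue.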
import pandas as pd
from SPARQLWrapper import SPARQLWrapper, JSON
sparqlwd = SPARQLWrapper("https://query.wikidata.org/sparql")
myid = "wd:Q22673982"
sparqlwd.setQuery(f"SELECT ?s ?p WHERE {{?s ?p {myid} .}}")
sparqlwd.setReturnFormat(JSON)
results = sparqlwd.query().convert()
print(results)
# pd.json_normalize is the current name; pd.io.json.json_normalize is deprecated
results_df = pd.json_normalize(results['results']['bindings'])
print(results_df)
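The difference between the two queries is easy to miss, so here is a small sketch that builds the query string and prints both variants side by side. The helper build_query and the QID_PATTERN check are illustrative additions, not part of the original code. The check simply refuses anything that is not a plain wd: item identifier before it is spliced into the query.

```python
import re

# Illustrative guard: accept only identifiers of the form wd:Q<number>.
QID_PATTERN = re.compile(r"^wd:Q[1-9][0-9]*$")


def build_query(entity: str) -> str:
    """Insert a wd: item into the triple pattern as an entity, without quotes."""
    if not QID_PATTERN.match(entity):
        raise ValueError(f"not a wd: item identifier: {entity!r}")
    # Doubled braces are literal braces in an f-string.
    return f"SELECT ?s ?p WHERE {{ ?s ?p {entity} . }}"


unquoted = build_query("wd:Q22673982")
# What the original code sent: the object is the string "wd:Q22673982".
quoted = 'SELECT ?s ?p WHERE { ?s ?p "wd:Q22673982" . }'

print(unquoted)
print(quoted)
```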

How can I properly serialize the answers to Wikidata SPARQL queries?

I have the following example of querying wikidata via Python's SPARQLWrapper:
import rdflib, urllib
from SPARQLWrapper import SPARQLWrapper, JSON, XML, TURTLE, RDF, N3
from rdflib import Graph, Namespace, URIRef, RDF#, RDFS, Literal
def graph_full(uri, f):
    sparql = SPARQLWrapper('https://query.wikidata.org/sparql')
    sparql.setQuery('''
PREFIX entity: <http://www.wikidata.org/entity/>
SELECT ?predicate ?object WHERE {
<'''+urllib.unquote(uri).encode("utf8")+'''> ?predicate ?object .
} LIMIT 100
''')
    sparql.setReturnFormat(N3)
    results = sparql.query().convert()
    #print results.serialize()
    print type(results)
    g = Graph()
    g.parse(results)
    print g
    #g.serialize(f, format="n3")

if __name__ == '__main__':
    graph_full("entity:Q76", "wikidata/output.nt")
I want to serialize the result of the SPARQL query and save it to a file. This seems to always throw the following error:
Exception: Unexpected type '<type 'instance'>' for source '<xml.dom.minidom.Document instance at 0x7fa11e3715a8>'
Using similar code against DBpedia SPARQL endpoints throws no errors.
