SPARQL Join ttl to dbpedia in Python

I know that to run SPARQL statements against a local ttl file I use rdflib, and to run SPARQL statements against DBpedia I use SPARQLWrapper. But how do I do both? That is, suppose I have a local ttl file and I want to leverage some of the online resources available.
So, suppose I have the following local ttl:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://www.learningsparql.com/ns/demo#i93234>
foaf:nick "Dick" ;
foaf:givenname "Richard" ;
foaf:mbox "richard49@hotmail.com" ;
foaf:surname "Mutt" ;
foaf:workplaceHomepage <http://www.philamuseum.org/> ;
foaf:aimChatID "bridesbachelor" .
Then I create the following Python program to execute a SPARQL query and print more human-readable versions of the properties:
filename = "C:/DataStuff/SemanticOntology/LearningSPARQLExamples/ex050.ttl"
import rdflib
g = rdflib.Graph()
result = g.parse(filename, format='ttl')
print(result)
query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?propertyLabel ?value
WHERE
{
?s ?property ?value .
?property rdfs:label ?propertyLabel .
}
"""
results = g.query(query)
print('Results!')
for row in results:
    print(row)
This returns nothing because it isn't accessing DBpedia and therefore doesn't know what rdfs:label is. I get that. But how do I tell it to?

Related

"KeyError: rdflib.term.BNode" Error appeared when executing SPARQL query

I'm trying to retrieve all intersection members for a specific class from an .owl ontology using SPARQL. I executed the following SPARQL query:
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
select ?class ?i ?uri ?label where {
?class owl:equivalentClass/
owl:intersectionOf/
rdf:rest*/rdf:first ?i.
?uri rdfs:label ?label.
FILTER (?uri IN (<http://purl.obolibrary.org/obo/AGRO_00000002>) )
}
When I executed it, I got this error:
"KeyError: rdflib.term.BNode('510')"
I'm using Python in PyCharm. To execute the SPARQL query, I used both rdflib and owlready2. Can you please help me solve the error above?

rdflib's parseQuery decodes the query string, which causes an invalid URI

I have the following ttl file:
@prefix : <https://www.example.co/reserved/language#> .
<https://www.example.co/reserved/root> :_id "01G39WKRH76BGY5D3SKDHJP2SX" ;
:transcript%20data [ :_id "01G39WKRH7JYRX78X7FG4RCNYF" ;
:_key "transcript%20data" ;
:value "value" ;
:value_id "01G39WKRH7PVK1DXQHWT08DZA8" ] .
And I have the following query:
q = """
PREFIX : <https://www.example.co/reserved/language#>
SELECT ?o
WHERE { ?s :transcript%20data/:value ?o . }
"""
When I queried the graph parsed from the ttl file, I got the following error:
https://www.example.co/reserved/language#transcript data does not look like a valid URI, trying to serialize this will break.
As you can see, parseQuery has decoded the "%20" to a space " ", which causes an invalid URI, and the _is_valid_uri function returns False when it is passed this value.
I've tested the query on different SPARQL engines, and it is valid and works as expected.
So, what do you advise to make the query valid and get the required results?
I am using rdflib Version: 6.1.1 on macOS Monterey 12.4
It was a bug in rdflib's SPARQL parser, and it is fixed in this PR:
It seems the internal SPARQL parser function _hexExpand inappropriately expands percent-encoded reserved characters. The PR adds an exclusionary regexp to disable this behaviour and a parameterized test that checks how the SPARQL parser processes the set of percent-encoded reserved chars.
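To illustrate the general idea behind that fix, here is a minimal sketch using only Python's `re` module. It is not rdflib's actual code: the function name `expand_unreserved_percent` is hypothetical, and using the RFC 3986 "unreserved" set to decide which escapes are safe to decode is an assumption made for this example.

```python
import re

# RFC 3986 "unreserved" characters: decoding these never changes a URI's meaning.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

PERCENT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def expand_unreserved_percent(text):
    """Decode %XX escapes only when they stand for an unreserved character.

    Hypothetical helper for illustration; not part of rdflib.
    """

    def replace(match):
        char = chr(int(match.group(1), 16))
        # Keep the escape as written for spaces, reserved characters, etc.
        return char if char in UNRESERVED else match.group(0)

    return PERCENT_ESCAPE.sub(replace, text)


local_name = expand_unreserved_percent("transcript%20data")
```

With this approach the `%20` in `transcript%20data` is left untouched, so the resulting URI never contains a raw space.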

Selecting literal values from Wikidata federated query service using RDFLib

I'm trying to get external identifiers for an entity in Wikidata. Using the following query, I can get the literal values (_value) and optionally formatted URLs (value) for Q2409 on the Wikidata Query Service site.
SELECT ?property ?_value ?value
WHERE {
?property wikibase:propertyType wikibase:ExternalId .
?property wikibase:directClaim ?propertyclaim .
OPTIONAL { ?property wdt:P1630 ?formatterURL . }
wd:Q2409 ?propertyclaim ?_value .
BIND(IF(BOUND(?formatterURL), IRI(REPLACE(?formatterURL, "\\$", ?_value)) , ?_value) AS ?value)
}
Using RDFLib, I'm writing the same query, but with a federated service.
from rdflib import Graph
from rdflib.plugins.sparql import prepareQuery
g = Graph()
q = prepareQuery(r"""
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT ?property ?_value ?value
WHERE {
SERVICE <https://query.wikidata.org/sparql> {
?property wikibase:propertyType wikibase:ExternalId .
?property wikibase:directClaim ?propertyclaim .
OPTIONAL { ?property wdt:P1630 ?formatterURL . }
wd:Q2409 ?propertyclaim ?_value .
BIND(IF(BOUND(?formatterURL), IRI(REPLACE(?formatterURL, "\\$", ?_value)) , ?_value) AS ?value)
}
}
""")
for row in g.query(q, DEBUG=True):
    print(row)
With this, I'm getting the URLs as URIRef objects. But, instead of Literal for the literal values, I'm getting None.
First 6 lines of output:
(rdflib.term.URIRef('http://www.wikidata.org/entity/P232'), None, None)
(rdflib.term.URIRef('http://www.wikidata.org/entity/P657'), None, None)
(rdflib.term.URIRef('http://www.wikidata.org/entity/P6366'), None, None)
(rdflib.term.URIRef('http://www.wikidata.org/entity/P1296'), None, rdflib.term.URIRef('https://www.enciclopedia.cat/EC-GEC-01407541.xml'))
(rdflib.term.URIRef('http://www.wikidata.org/entity/P486'), None, rdflib.term.URIRef('https://id.nlm.nih.gov/mesh/D0068511.html'))
(rdflib.term.URIRef('http://www.wikidata.org/entity/P7033'), None, rdflib.term.URIRef('http://vocabulary.curriculum.edu.au/scot/5001.html'))
What am I missing for the literal values? I'm having trouble figuring out why I'm getting None instead of the values.
I'm not sure if all of the features of SERVICE calls are fully implemented in RDFLib.
I would first get this working with a 'normal' call to the Wikidata SPARQL endpoint, using either RDFLib's SPARQLWrapper library or one of the general-purpose Python web request libraries, requests or httpx. If that all works, you could then try again with the SERVICE request, but you likely won't need it.
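A 'normal' call of the kind suggested above might look like the following sketch. It uses only the standard library (`urllib`) rather than SPARQLWrapper, requests, or httpx, so it runs without extra installs. The helper names `fetch_bindings` and `simplify_bindings` and the `User-Agent` string are hypothetical, and the sample data is hand-written in the standard SPARQL JSON results layout, not real Wikidata output.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def fetch_bindings(query, endpoint=ENDPOINT):
    """Send a SELECT query over HTTP GET and return the raw JSON bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical identifier; replace with your own.
            "User-Agent": "external-id-example/0.1",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def simplify_bindings(bindings, variables):
    """Turn each binding into a tuple, using None for unbound variables."""
    return [
        tuple(row[var]["value"] if var in row else None for var in variables)
        for row in bindings
    ]


# Hand-written sample in the SPARQL JSON results layout (not real output).
sample = [
    {
        "property": {"type": "uri", "value": "http://www.wikidata.org/entity/P0"},
        "_value": {"type": "literal", "value": "abc"},
    }
]
rows = simplify_bindings(sample, ["property", "_value", "value"])
```

Passing the SELECT query (with its PREFIX lines) to `fetch_bindings` and then to `simplify_bindings` would be the next step; that part needs network access and is not exercised here.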

SPARQLWrapper : problem in querying an ontology in a local file

I'm working with SPARQLWrapper and I'm following the documentation. Here is my code:
from SPARQLWrapper import SPARQLWrapper

queryString = "SELECT * WHERE { ?s ?p ?o. }"
sparql = SPARQLWrapper("http://example.org/sparql")  # I replaced this line with
sparql = SPARQLWrapper("file:///thelocation of my file in my computer")
sparql.setQuery(queryString)
try:
    ret = sparql.query()
    # ret is a stream with the results in XML, see <http://www.w3.org/TR/rdf-sparql-XMLres/>
except:
    deal_with_the_exception()
I'm getting these 2 errors:
1- The system cannot find the path specified
2- NameError: name 'deal_with_the_exception' is not defined
You need a SPARQL endpoint to make it work: SPARQLWrapper sends queries to an endpoint over HTTP and cannot read a local file directly. Consider setting up Apache Fuseki on your local computer. See https://jena.apache.org/documentation/fuseki2/jena

python SPARQLWrapper return only 10000 results

I use the SPARQLWrapper module to send a query to a Virtuoso endpoint and get the result.
The query always returns a maximum of 10000 results.
Here is the Python script:
from SPARQLWrapper import SPARQLWrapper, JSON
queryString = """
SELECT DISTINCT ?s
WHERE {
?s ?p ?o .
}
"""
sparql = SPARQLWrapper("http://localhost:8890/sparql")
sparql.setQuery(queryString)
sparql.setReturnFormat(JSON)
res = sparql.query().convert()
# Parse result
parsed = []
for entry in res['results']['bindings']:
    for sparql_variable in entry.keys():
        parsed.append({sparql_variable: entry[sparql_variable]['value']})
print('Query return ' + str(len(parsed)) + ' results')
When I launch the query with
SELECT count(*) AS ?count
I get the right number of triples: 917051.
Why does the SPARQLWrapper module limit the number of results to 10000?
How do I get all the results?
The answer is to adjust the Virtuoso configuration file, as documented. Specifically for this case, you need to increase the ResultSetMaxRows in the [SPARQL] stanza.
The limit is not in SPARQLWrapper. You would see the same limit if you did the full SELECT (instead of the COUNT, which only delivers 1 row) through the SPARQL endpoint, Conductor, or any other interface.
The 10000-row limit is set by the data owner via the ResultSetMaxRows item in virtuoso.ini, to protect the data.
Without it, anyone could use a simple SPARQL query such as select * where {?s ?p ?o} to fetch all the data, which may cost the data owner a lot of time and money.
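If you cannot change the server configuration, one common workaround is to page through the results with LIMIT and OFFSET. The sketch below is an assumption-laden illustration, not part of the answers above: `fetch_all` and `fake_endpoint` are hypothetical names, `run_query` stands for whatever function sends a query and returns a list of rows, and the ORDER BY in the base query is there because paging is only reliable over a stable ordering.

```python
def fetch_all(run_query, base_query, page_size=10000):
    """Collect every row by requesting successive LIMIT/OFFSET pages."""
    rows = []
    offset = 0
    while True:
        page = run_query(f"{base_query}\nLIMIT {page_size} OFFSET {offset}")
        rows.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            return rows
        offset += page_size


# Stand-in for a real endpoint call, so the sketch runs without a server.
def fake_endpoint(query, data=tuple(range(25))):
    words = query.split()
    limit = int(words[words.index("LIMIT") + 1])
    offset = int(words[words.index("OFFSET") + 1])
    return list(data[offset:offset + limit])


all_rows = fetch_all(
    fake_endpoint,
    "SELECT DISTINCT ?s WHERE { ?s ?p ?o . } ORDER BY ?s",
    page_size=10,
)
```

With SPARQLWrapper, `run_query` could be a small function that calls `setQuery`, then `query().convert()`, and returns `res['results']['bindings']`, with `page_size` kept at or below the server's row limit.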
