Selecting literal values from Wikidata federated query service using RDFLib - python

I'm trying to get external identifiers for an entity in Wikidata. Using the following query, I can get the literal values (_value) and optionally formatted URLs (value) for Q2409 on the Wikidata Query Service site.
SELECT ?property ?_value ?value
WHERE {
?property wikibase:propertyType wikibase:ExternalId .
?property wikibase:directClaim ?propertyclaim .
OPTIONAL { ?property wdt:P1630 ?formatterURL . }
wd:Q2409 ?propertyclaim ?_value .
BIND(IF(BOUND(?formatterURL), IRI(REPLACE(?formatterURL, "\\$", ?_value)) , ?_value) AS ?value)
}
Using RDFLib, I'm writing the same query, but with a federated service.
from rdflib import Graph
from rdflib.plugins.sparql import prepareQuery
g = Graph()
q = prepareQuery(r"""
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT ?property ?_value ?value
WHERE {
SERVICE <https://query.wikidata.org/sparql> {
?property wikibase:propertyType wikibase:ExternalId .
?property wikibase:directClaim ?propertyclaim .
OPTIONAL { ?property wdt:P1630 ?formatterURL . }
wd:Q2409 ?propertyclaim ?_value .
BIND(IF(BOUND(?formatterURL), IRI(REPLACE(?formatterURL, "\\$", ?_value)) , ?_value) AS ?value)
}
}
""")
for row in g.query(q, DEBUG=True):
    print(row)
With this, I'm getting the URLs as URIRef objects, but instead of Literal objects for the literal values, I'm getting None.
First 6 lines of output:
(rdflib.term.URIRef('http://www.wikidata.org/entity/P232'), None, None)
(rdflib.term.URIRef('http://www.wikidata.org/entity/P657'), None, None)
(rdflib.term.URIRef('http://www.wikidata.org/entity/P6366'), None, None)
(rdflib.term.URIRef('http://www.wikidata.org/entity/P1296'), None, rdflib.term.URIRef('https://www.enciclopedia.cat/EC-GEC-01407541.xml'))
(rdflib.term.URIRef('http://www.wikidata.org/entity/P486'), None, rdflib.term.URIRef('https://id.nlm.nih.gov/mesh/D0068511.html'))
(rdflib.term.URIRef('http://www.wikidata.org/entity/P7033'), None, rdflib.term.URIRef('http://vocabulary.curriculum.edu.au/scot/5001.html'))
What am I missing for the literal values? I'm having trouble figuring out why I'm getting None instead of the values.

I'm not sure whether all features of SERVICE calls are fully implemented in RDFLib.
I would first get this working with a 'normal' call to the Wikidata SPARQL endpoint, using either RDFLib's SPARQLWrapper library or a general-purpose HTTP library for Python such as requests or httpx. If that works, you could then try again with the SERVICE request, but you likely won't need it.
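Here is a minimal sketch of that 'normal' call. To stay dependency-free it uses Python's standard library (urllib) instead of SPARQLWrapper, requests or httpx; the same pattern applies to those libraries. The User-Agent string is a hypothetical placeholder (Wikimedia asks clients to send a descriptive one). One change to the query: REPLACE uses "\\$1" rather than "\\$", because Wikidata formatter URLs contain a $1 placeholder. Replacing only the $ leaves the 1 behind, which likely explains the stray trailing 1 in URLs such as .../D0068511.html in the output above.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The question's query sent directly to WDQS (no SERVICE clause).
# "\\$1" matches the literal "$1" placeholder in formatter URLs.
QUERY = r"""
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT ?property ?_value ?value
WHERE {
  ?property wikibase:propertyType wikibase:ExternalId .
  ?property wikibase:directClaim ?propertyclaim .
  OPTIONAL { ?property wdt:P1630 ?formatterURL . }
  wd:Q2409 ?propertyclaim ?_value .
  BIND(IF(BOUND(?formatterURL), IRI(REPLACE(?formatterURL, "\\$1", ?_value)), ?_value) AS ?value)
}
"""


def run_query(query, endpoint=WDQS_ENDPOINT,
              user_agent="example-script/0.1 (hypothetical contact)"):
    """Send a SELECT query via HTTP GET and return the parsed SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def rows(results):
    """Yield one tuple per solution, using None for unbound variables."""
    variables = results["head"]["vars"]
    for binding in results["results"]["bindings"]:
        yield tuple(binding[v]["value"] if v in binding else None for v in variables)


if __name__ == "__main__":
    for row in rows(run_query(QUERY)):
        print(row)
```

If ?_value comes back populated this way, that would point at how RDFLib handles the SERVICE results rather than at the query itself.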

Related

"KeyError: rdflib.term.BNode" Error appeared when executing SPARQL query

I'm trying to retrieve all intersection members for a specific class from a .owl ontology using SPARQL. I executed the following SPARQL query:
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
select ?class ?i ?uri ?label where {
?class owl:equivalentClass/
owl:intersectionOf/
rdf:rest*/rdf:first ?i.
?uri rdfs:label ?label.
FILTER (?uri IN (<http://purl.obolibrary.org/obo/AGRO_00000002>) )
}
When I executed it, I got an error:
"KeyError: rdflib.term.BNode('510')"
I'm using Python in PyCharm. To execute the SPARQL query, I used both rdflib and owlready2. Can you please help me solve this error?

rdflib's parseQuery decodes the query string, which causes an invalid URI

I have the following ttl file:
@prefix : <https://www.example.co/reserved/language#> .
<https://www.example.co/reserved/root> :_id "01G39WKRH76BGY5D3SKDHJP2SX" ;
:transcript%20data [ :_id "01G39WKRH7JYRX78X7FG4RCNYF" ;
:_key "transcript%20data" ;
:value "value" ;
:value_id "01G39WKRH7PVK1DXQHWT08DZA8" ] .
And I have the following query:
q = """
PREFIX : <https://www.example.co/reserved/language#>
SELECT ?o
WHERE { ?s :transcript%20data/:value ?o . }
"""
When querying the graph parsed from the ttl file, I got the following error:
https://www.example.co/reserved/language#transcript data does not look like a valid URI, trying to serialize this will break.
As you can see, parseQuery has decoded "%20" to a space " ", which produces an invalid URI: passing it to the _is_valid_uri function returns False.
I've tested the query on different SPARQL engines, and it is valid and works as expected.
What do you advise to make the query valid and get the required results?
I am using rdflib Version: 6.1.1 on macOS Monterey 12.4
It was a bug in rdflib's SPARQL parser, and it is fixed in this PR:
It seems the internal SPARQL parser function _hexExpand inappropriately expands percent-encoded reserved characters. The PR adds an exclusionary regexp to disable this behaviour, plus a parameterized test that checks how the SPARQL parser processes the set of percent-encoded reserved characters.

Is there any way I can run a Cypher command starting with :PARAM using py2neo 2021.0.0?

In Neo4j, I am trying to run this simple command to save the sparql parameter for use in a later query, but graph.run does not allow it; it raises a syntax error:
graph.run(":PARAM sparql: 'PREFIX sch: <http://schema.org/> CONSTRUCT{?item a sch:item; sch:legalIdentity ?legalIdentity} WHERE { {?item p:P31/ps:P31 wd:Q783794 optional { ?item wdt:P1278 ?legalIdentity} } UNION {?item p:P31/ps:P31 wd:Q4830453 optional { ?item wdt:P1278 ?legalIdentity}} UNION {?item p:P31/ps:P31 wd:Q43229 optional { ?item wdt:P1278 ?legalIdentity}} UNION {?item p:P31/ps:P31 wd:Q6881511 optional { ?item wdt:P1278 ?legalIdentity}}}'")
And the following line is the Cypher query that uses the sparql parameter:
graph.run('CALL n10s.rdf.import.fetch("https://query.wikidata.org/sparql?query=" + apoc.text.urlencode($sparql), "RDF/XML", { headerParams: { Accept: "application/rdf+xml"} });')
The :PARAM command is a client-side built-in of Neo4j Browser and cypher-shell; it does not exist in Cypher itself. As mentioned by @fbiville, you need to pass a dict of parameters instead.
You can pass a dictionary of parameters to the run method, as documented here.
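A minimal sketch of that approach, assuming py2neo's Graph.run(cypher, parameters=None, **kwparameters) signature: the SPARQL text goes into the parameters dict under the key sparql, and the Cypher refers to it as $sparql, so no :PARAM step is needed. The function name import_from_wikidata is hypothetical, the trailing semicolon from the question's Cypher is omitted, and the connection URI and credentials in the comment are placeholders.

```python
# Cypher from the question; $sparql is bound from the parameters dict.
IMPORT_CYPHER = (
    'CALL n10s.rdf.import.fetch('
    '"https://query.wikidata.org/sparql?query=" + apoc.text.urlencode($sparql), '
    '"RDF/XML", { headerParams: { Accept: "application/rdf+xml" } })'
)


def import_from_wikidata(graph, sparql):
    """Run the n10s import with `sparql` passed as the $sparql parameter.

    `graph` is assumed to be a connected py2neo Graph, e.g.
    Graph("bolt://localhost:7687", auth=("neo4j", "password"))  # placeholder values
    """
    return graph.run(IMPORT_CYPHER, parameters={"sparql": sparql})
```

Usage would be import_from_wikidata(graph, sparql), where sparql holds the CONSTRUCT query string from the :PARAM line above.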

SPARQL Join ttl to dbpedia in Python

I know that to run SPARQL queries against a local ttl file I use rdflib, and to run SPARQL queries against DBpedia I use SPARQLWrapper. But how do I do both? That is, suppose I have a local ttl file and I want to leverage some of the online resources available.
Suppose I have the following local ttl:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://www.learningsparql.com/ns/demo#i93234>
    foaf:nick "Dick" ;
    foaf:givenname "Richard" ;
    foaf:mbox "richard49@hotmail.com" ;
    foaf:surname "Mutt" ;
    foaf:workplaceHomepage <http://www.philamuseum.org/> ;
    foaf:aimChatID "bridesbachelor" .
Then I wrote the following Python program to execute a SPARQL query and print more human-readable versions of the properties:
import rdflib

filename = "C:/DataStuff/SemanticOntology/LearningSPARQLExamples/ex050.ttl"
g = rdflib.Graph()
result = g.parse(filename, format='ttl')
print(result)
query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?propertyLabel ?value
WHERE
{
?s ?property ?value .
?property rdfs:label ?propertyLabel .
}
"""
results = g.query(query)
print('Results!')
for row in results:
    print(row)
This returns nothing because it isn't accessing DBpedia, and therefore doesn't know the rdfs:label values for these properties. I get that. But how do I tell it to access DBpedia?

return full wikipedia page for a query using dbpedia

I am using the following code to retrieve disambiguation pages for a given query.
from SPARQLWrapper import JSON

# disambiguation function
def disambiguation(name, sparql):
    query = "SELECT DISTINCT ?syn WHERE { { ?disPage dbpedia-owl:wikiPageDisambiguates <http://dbpedia.org/resource/"+name+"> . ?disPage dbpedia-owl:wikiPageDisambiguates ?syn . } UNION {<http://dbpedia.org/resource/"+name+"> dbpedia-owl:wikiPageDisambiguates ?syn . } }"
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results_list = sparql.query().convert()
    return results_list
Question:
Is it possible to return the full wikipedia page for every element in the results_list?
Simplifying your query
SELECT DISTINCT ?syn WHERE {
{ ?disPage dbpedia-owl:wikiPageDisambiguates <http://dbpedia.org/resource/"+name+"> .
?disPage dbpedia-owl:wikiPageDisambiguates ?syn . }
UNION
{ <http://dbpedia.org/resource/"+name+"> dbpedia-owl:wikiPageDisambiguates ?syn . }
}
This query can be more cleanly written as
select distinct ?syn where {
?syn (dbpedia-owl:wikiPageDisambiguates|^dbpedia-owl:wikiPageDisambiguates)* dbpedia:name
}
This query says to find everything that's connected to dbpedia:name by a path of dbpedia-owl:wikiPageDisambiguates properties in any direction.
Getting the Wikipedia article URL
I actually wanted to retrieve the whole wikipedia page. For example: When I find a name in a different language I want to go to the corresponding wikipedia page and retrieve its corresponding page
If you actually want to retrieve the page (using some other library, or whatever you have), then you just need to get the Wikipedia article URL. That's the value of the foaf:isPrimaryTopicOf property. E.g., if you look at the property values for Johnny Cash, you'll see:
http://dbpedia.org/resource/Johnny_Cash foaf:isPrimaryTopicOf http://en.wikipedia.org/wiki/Johnny_Cash
Based on that, it sounds like you'd want a query more like:
select distinct ?page where {
?syn (dbpedia-owl:wikiPageDisambiguates|^dbpedia-owl:wikiPageDisambiguates)* dbpedia:name ;
foaf:isPrimaryTopicOf ?page
}
Then each value of ?page should be a Wikipedia article URL that you can download.
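Here is a minimal sketch of wiring that query into a function shaped like the question's disambiguation(name, sparql). It assumes sparql is a SPARQLWrapper instance pointed at DBpedia. The prefixes dbo: (the ontology namespace behind dbpedia-owl:) and foaf: are declared explicitly rather than relying on prefixes the endpoint may predefine. Spaces in name become underscores to form the resource IRI; names containing characters such as > or " would need further escaping, which is not handled here. The function name wikipedia_pages is hypothetical.

```python
# %s is filled with the DBpedia resource name (e.g. "Johnny_Cash").
WIKIPEDIA_PAGES_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?page WHERE {
  ?syn (dbo:wikiPageDisambiguates|^dbo:wikiPageDisambiguates)* <http://dbpedia.org/resource/%s> ;
       foaf:isPrimaryTopicOf ?page .
}
"""


def wikipedia_pages(name, sparql):
    """Return Wikipedia article URLs for `name` and everything linked to it
    by disambiguation links in either direction."""
    resource = name.replace(" ", "_")
    sparql.setQuery(WIKIPEDIA_PAGES_QUERY % resource)
    sparql.setReturnFormat("json")  # "json" is the value of SPARQLWrapper.JSON
    results = sparql.query().convert()
    return [b["page"]["value"] for b in results["results"]["bindings"]]
```

For example, wikipedia_pages("Johnny Cash", SPARQLWrapper("https://dbpedia.org/sparql")) would return a list of article URLs, each of which can then be downloaded with whatever HTTP library you prefer.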
