Fetching triples using SPARQL query from turtle file - python

I am new to SPARQL and currently struggling to fetch triples from a Turtle file.
### https://ontology/1001
<https://ontology/1001> rdf:type owl:Class ;
rdfs:subClassOf <https://ontology/748>;
<http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "Injury, neuronal" ,
"Neurotrauma" ;
rdfs:label "Nervous system injury" .
### https://ontology/10021
<https://ontology/10021> rdf:type owl:Class ;
rdfs:subClassOf <https://ontology/2034> ;
rdfs:label "C3 glomerulopathy" .
I am trying to extract all classes with their superclasses, labels and synonyms. The query I am running is below:
query_id = """
prefix oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
prefix obo: <http://purl.obolibrary.org/obo/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT distinct ?cid ?label ?class ?synonyms
WHERE {
?cid rdfs:label ?label .
?cid rdfs:subClassOf ?class .
?cid oboInOwl:hasExactSynonym ?synonyms .
}
"""
However, this query filters out the classes that have no hasExactSynonym triple, such as 10021.
Following is the output:
cid   | label                 | class | synonyms
1001  | Nervous system injury | 748   | Injury, neuronal; Neurotrauma
The expected output is:
cid   | label                 | class | synonyms
1001  | Nervous system injury | 748   | Injury, neuronal; Neurotrauma
10021 | C3 glomerulopathy     | 2034  |

You can use OPTIONAL to make the synonym pattern optional. Classes without a synonym are then still returned, with ?synonyms left unbound:
WHERE {
?cid rdfs:label ?label .
?cid rdfs:subClassOf ?class .
OPTIONAL { ?cid oboInOwl:hasExactSynonym ?synonyms . }
}
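With OPTIONAL, the query returns one row per synonym, so 1001 appears twice. It returns a single row with no synonym for 10021. The sketch below shows one way to collapse those rows into the one-line-per-class layout of the expected output. It is only an illustration under some assumptions:

- The helper name `group_synonyms` is hypothetical.
- The rows are written as plain tuples with short ids rather than full IRIs, for readability.
- An unbound ?synonyms arrives as None, which is how rdflib reports unbound variables in a result row.

```python
def group_synonyms(rows, sep="; "):
    """Collapse (cid, label, superclass, synonym) rows into one row per class.

    synonym is None when the OPTIONAL pattern did not match (assumed to be
    how the unbound ?synonyms variable arrives from the query results).
    """
    grouped = {}  # dicts keep insertion order (Python 3.7+)
    for cid, label, superclass, synonym in rows:
        synonyms = grouped.setdefault((cid, label, superclass), [])
        if synonym is not None:
            synonyms.append(str(synonym))
    # Append the joined synonyms as a fourth column; "" when there were none.
    return [key + (sep.join(syns),) for key, syns in grouped.items()]


# Rows as the OPTIONAL query would return them for the sample data:
# one row per synonym, and one row without a synonym for 10021.
rows = [
    ("1001", "Nervous system injury", "748", "Injury, neuronal"),
    ("1001", "Nervous system injury", "748", "Neurotrauma"),
    ("10021", "C3 glomerulopathy", "2034", None),
]

for row in group_synonyms(rows):
    print(" | ".join(row))
```

The separator defaults to "; " rather than "," because one synonym ("Injury, neuronal") itself contains a comma. The same collapsing can also be done inside SPARQL 1.1 with GROUP BY and GROUP_CONCAT.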

Related

Inserting python variable in SPARQL

I have a string variable I want to pass in my SPARQL query and I can't get it to work.
title = 'Good Will Hunting'
[str(s) for s, in graph.query('''
PREFIX ddis: <http://ddis.ch/atai/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
SELECT ?lbl WHERE {
?movie rdfs:label $title@en .
?movie wdt:P57 ?director .
?director rdfs:label ?lbl .
}
''')]
It doesn't work and I get an error. The query itself is correct: it works if I manually type the name in place of $title.
In Python you can interpolate a string variable into the query with the % operator and a %s placeholder:
title = 'Good Will Hunting'
[str(s) for s, in graph.query('''
PREFIX ddis: <http://ddis.ch/atai/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
SELECT ?lbl WHERE {
?movie rdfs:label "%s"@en .
?movie wdt:P57 ?director .
?director rdfs:label ?lbl .
}
''' % title)]
Note that I also added quotes around the placeholder ("%s"). They are necessary because a string literal in SPARQL must be quoted.
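Plain %s substitution works for this title. However, it produces an invalid query as soon as the value contains a double quote or a backslash, and it pastes arbitrary text into the query.

Below is a minimal sketch of an escaping helper that turns a Python string into a valid, optionally language-tagged SPARQL string literal. The name `sparql_literal` is hypothetical and is not part of rdflib. The escapes follow what the SPARQL grammar allows inside double-quoted strings.

```python
def sparql_literal(value, lang=None):
    """Return value as a double-quoted SPARQL string literal (hypothetical helper)."""
    escaped = (
        value.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')         # then the closing-quote character
        .replace("\n", "\\n")        # raw newlines are not allowed in "..." strings
        .replace("\r", "\\r")
    )
    literal = '"' + escaped + '"'
    if lang:
        literal += "@" + lang  # language tag, e.g. "Good Will Hunting"@en
    return literal


title = 'Good Will Hunting'
query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?lbl WHERE {
  ?movie rdfs:label %s .
  ?movie wdt:P57 ?director .
  ?director rdfs:label ?lbl .
}
""" % sparql_literal(title, lang="en")
print(query)
```

With rdflib specifically, you can also avoid building the string at all. Write ?title in the query and pass `initBindings={"title": Literal(title, lang="en")}` to `graph.query`.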

Get entity name/label from wikidata in python

I have some SPARQL queries to run on wikidata in python and I need to get the name/label of the entity returned instead of URI. For example, given the python snippet below:
from qwikidata.sparql import return_sparql_query_results
query_string = """
select ?ent where { ?ent wdt:P31 wd:Q2637056 . ?ent wdt:P2244 ?obj } ORDER BY DESC(?obj) LIMIT 5
"""
res = return_sparql_query_results(query_string)
for row in res["results"]["bindings"]:
    print(row["ent"]["value"])
The queries in the original form return URIs, but I need to get the entity label/name. How can I do that in python?
The current output of the query:
http://www.wikidata.org/entity/Q841796
http://www.wikidata.org/entity/Q780047
NOTE: I don't have direct access to the queries, so I can't rewrite them.
My comment was too long, so I am posting an answer.
You'll need to rewrite the queries. Below is an example of how to get labels without using the label service.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?country ?countryLabel
WHERE
{
# instance of country
?country wdt:P31 wd:Q3624078.
OPTIONAL {
?country rdfs:label ?countryLabel filter (lang(?countryLabel) = "en").
}
}
ORDER BY ?countryLabel
Adapted for your Soyuz-T example:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?ent ?entLabel
WHERE
{
# instance of Soyuz-T https://www.wikidata.org/wiki/Q2637056
?ent wdt:P31 wd:Q2637056 .
# https://www.wikidata.org/wiki/Property:P2244 periapsis
?ent wdt:P2244 ?obj
OPTIONAL {
?ent rdfs:label ?entLabel filter (lang(?entLabel) = "en").
}
} ORDER BY DESC(?obj) LIMIT 5
Result:
ent        | entLabel
wd:Q841796 | Soyuz T-15
wd:Q780047 | Soyuz T-8
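On the Python side, the label comes back as another variable in the same result. Because the label is matched inside OPTIONAL, some rows may not have one. In the SPARQL JSON results format, an unbound variable is simply absent from that row's binding.

A minimal sketch that reads (Q-id, label) pairs and tolerates missing labels is shown below. The function name `entities_with_labels` is hypothetical. The `res` dict is hand-written to mirror the result table above rather than fetched. In a live run it would come from `return_sparql_query_results(...)` as in the question.

```python
def entities_with_labels(res, ent_var="ent", label_var="entLabel"):
    """Return (Q-id, label) pairs from a SPARQL JSON result (hypothetical helper).

    A label that the OPTIONAL pattern did not find is absent from the
    binding, so it is returned as None.
    """
    pairs = []
    for row in res["results"]["bindings"]:
        uri = row[ent_var]["value"]
        label = row.get(label_var, {}).get("value")
        qid = uri.rsplit("/", 1)[-1]  # ".../entity/Q841796" -> "Q841796"
        pairs.append((qid, label))
    return pairs


# Shape of the JSON result for the adapted query (values from the table above).
res = {
    "head": {"vars": ["ent", "entLabel"]},
    "results": {"bindings": [
        {"ent": {"type": "uri", "value": "http://www.wikidata.org/entity/Q841796"},
         "entLabel": {"type": "literal", "xml:lang": "en", "value": "Soyuz T-15"}},
        {"ent": {"type": "uri", "value": "http://www.wikidata.org/entity/Q780047"},
         "entLabel": {"type": "literal", "xml:lang": "en", "value": "Soyuz T-8"}},
    ]},
}

for qid, label in entities_with_labels(res):
    print(qid, label if label is not None else "(no English label)")
```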

Find information for many people in wikidata

I have a list of names (hundreds of them) that are already transformed to Q-numbers in wikidata using python. For each Q-number (person) I want to get some basic information such as place_of_birth, nationality, etc.
SELECT DISTINCT ?name ?nameLabel ?genderLabel ?placeofbirth ?nationality (year(?birthdate) as ?birthyear) (year(?deathdate) as ?deathyear)
WHERE
{
?name wdt:P106/wdt:P279* wd:Q1028181 # painter
FILTER (?name IN (wd:Q2674488)) # James Seymour
OPTIONAL { ?name wdt:P569 ?birthdate. }
OPTIONAL { ?name wdt:P27 ?nationality. }
OPTIONAL { ?name wdt:P21 ?gender. }
OPTIONAL { ?name wdt:P19 ?placeofbirth. }
OPTIONAL { ?name wdt:P570 ?deathdate. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
Using SPARQL, I can search two or three people at a time by adding Q-numbers into "FILTER", but how can I loop through all Q-numbers in a python list? Thanks a lot!
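Instead of editing FILTER by hand, one option is to generate a VALUES clause from the Python list. VALUES binds ?name directly to each listed entity. The sketch below makes several assumptions:

- The helper name `build_queries` is hypothetical.
- The chunk size of 100 is an arbitrary choice to keep each request small.
- The death-date OPTIONAL binds ?deathdate, which is what the SELECT expression reads.
- Sending each string to the endpoint is left to whichever client you already use.

```python
import re

# Only strings like "Q2674488" are accepted, so a malformed entry cannot
# break (or inject text into) the generated query.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")

QUERY_TEMPLATE = """
SELECT DISTINCT ?name ?nameLabel ?genderLabel ?placeofbirth ?nationality
       (YEAR(?birthdate) AS ?birthyear) (YEAR(?deathdate) AS ?deathyear)
WHERE
{
  VALUES ?name { %s }
  ?name wdt:P106/wdt:P279* wd:Q1028181 .  # painter, as in the original query
  OPTIONAL { ?name wdt:P569 ?birthdate. }
  OPTIONAL { ?name wdt:P27 ?nationality. }
  OPTIONAL { ?name wdt:P21 ?gender. }
  OPTIONAL { ?name wdt:P19 ?placeofbirth. }
  OPTIONAL { ?name wdt:P570 ?deathdate. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
"""


def build_queries(qids, chunk_size=100):
    """Return one query string per chunk of at most chunk_size Q-ids."""
    for qid in qids:
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError("not a Wikidata Q-id: %r" % qid)
    queries = []
    for start in range(0, len(qids), chunk_size):
        chunk = qids[start:start + chunk_size]
        values = " ".join("wd:" + qid for qid in chunk)
        queries.append(QUERY_TEMPLATE % values)
    return queries


for query in build_queries(["Q2674488"]):  # James Seymour
    print(query)
```

For hundreds of names, pass the whole list and run each returned query in turn.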

SPARQL query with Turtle file (Public data source)

I am new to Turtle files and to querying them with SPARQL, so I have many questions. I hope you can help me!
I have a file called equipamentsCURT3.ttl and contains the following:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix v: <http://www.w3.org/2006/vcard/ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://opendata.gencat.cat/recursos/equipaments/30883> a v:VCard ;
    v:adr [ a v:Work ;
            v:country-name "Spain" ;
            v:locality "Sabadell" ;
            v:postal-code "08202" ;
            v:region "Vallès Occidental" ;
            v:street-address " c. Sant Josep" ] ;
    v:category "2. Parvulari"@ca,
        "3. Educació primària"@ca,
        "4. Educació secundària obligatòria"@ca,
        "Educació. Formació"@ca,
        "Ensenyaments de règim general"@ca ;
    v:fn "Escolàpies Sabadell"@ca ;
    v:geo [ v:latitude 4.154826e+01 ;
            v:longitude 2.111243e+00 ] ;
    v:nickname "Escolàpies Sabadell"@ca ;
    v:tel [ a v:Pref,
                v:Tel,
                v:Work ;
            rdf:value "937255348" ] .

<http://opendata.gencat.cat/recursos/equipaments/31264> a v:VCard ;
    v:adr [ a v:Work ;
            v:country-name "Spain" ;
            v:locality "Molins de Rei" ;
            v:postal-code "08750" ;
            v:region "Baix Llobregat" ;
            v:street-address " c. Ntra. Sra. de Lourdes" ] ;
    v:category "4. Educació secundària obligatòria"@ca,
        "7. Batxillerat"@ca,
        "8. Cicles formatius d'FP de grau mitjà (CFPM)"@ca,
        "9. Cicles formatius d'FP de grau superior (CFPS)"@ca,
        "Educació. Formació"@ca,
        "Ensenyaments de règim general"@ca ;
    v:fn "Institut Bernat el Ferrer"@ca ;
    v:geo [ v:latitude 4.14105e+01 ;
            v:longitude 2.02704e+00 ] ;
    v:nickname "Institut Bernat el Ferrer"@ca ;
    v:tel [ a v:Pref,
                v:Tel,
                v:Work ;
            rdf:value "936683762" ] .

<http://opendata.gencat.cat/recursos/equipaments/31265> a v:VCard ;
    v:adr [ a v:Work ;
            v:country-name "Spain" ;
            v:locality "Castellar del Vallès" ;
            v:postal-code "08211" ;
            v:region "Vallès Occidental" ;
            v:street-address " NC Bonavista" ] ;
    v:category "2. Parvulari"@ca,
        "3. Educació primària"@ca,
        "Educació. Formació"@ca,
        "Ensenyaments de règim general"@ca ;
    v:fn "Escola Bonavista"@ca ;
    v:geo [ v:latitude 4.161903e+01 ;
            v:longitude 2.091745e+00 ] ;
    v:nickname "Escola Bonavista"@ca ;
    v:tel [ a v:Pref,
                v:Tel,
                v:Work ;
            rdf:value "937144195" ] .
I am using Python 3.5 and a library called RDFLib (https://github.com/RDFLib/rdflib). I need to read from a file called equipamentsCURT.rdf, serialize it into equipamentsCURT3.ttl, and then retrieve all information related to an equipment. For example, for equipment 30883 (http://opendata.gencat.cat/recursos/equipaments/30883), I want v:adr, v:category, v:fn, v:geo and v:tel. I use SPARQL to obtain this data, but I don't know why the query doesn't work, and I'm confused about how to query the information.
Here is my code:
import rdflib, pprint
from rdflib import URIRef, Graph
from rdflib.plugins import sparql
g = Graph()
g.parse("equipamentsCURT3.ttl", format='turtle')
queryTest = 'prefix v: <http://www.w3.org/2006/vcard/ns#> ' \
            'select ?y where {?x a <http://opendata.gencat.cat/recursos/equipaments 30883>; ?y v:VCard .}'
qresult = g.query(queryTest)
for st in qresult:
    # each result row is a tuple of the selected variables (here only ?y)
    print(st[0])
The whole query doesn't make any sense nor does it match the data.
I'd suggest reading a SPARQL tutorial first. The whole query looks like copy-paste from something else + some random stuff from your side.
The URI http://opendata.gencat.cat/recursos/equipaments 30883 contains a space, which is not allowed in a URI.
http://opendata.gencat.cat/recursos/equipaments/30883 is not a class. Thus the triple pattern ?x a <http://opendata.gencat.cat/recursos/equipaments/30883>, which matches all resources that belong to the class http://opendata.gencat.cat/recursos/equipaments/30883, doesn't match your data.
The second triple pattern is ?x ?y v:VCard, and you're selecting the predicate ?y as the final result of your query. But you want the objects for a given subject and a given set of predicates. The syntax of a triple (or triple pattern) is subject-predicate-object, so for v:category, for example, it should be
PREFIX v: <http://www.w3.org/2006/vcard/ns#>
SELECT ?o WHERE {
<http://opendata.gencat.cat/recursos/equipaments/30883> v:category ?o
}
For the other properties it is more complicated, because their values are blank nodes that carry multiple values through additional properties. For v:adr, for example, it would be
PREFIX v: <http://www.w3.org/2006/vcard/ns#>
SELECT ?p ?o WHERE {
<http://opendata.gencat.cat/recursos/equipaments/30883> v:adr ?adr .
?adr ?p ?o
}
Update
If you want the properties rather than their values, it is correct to put the variable in the predicate position. But it's wrong to restrict it to properties that occur in triples with the object v:VCard, because there is no such property besides rdf:type (a is just a synonym for it). In that case it should be
PREFIX v: <http://www.w3.org/2006/vcard/ns#>
SELECT DISTINCT ?p WHERE {
<http://opendata.gencat.cat/recursos/equipaments/30883> ?p ?o
}
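The per-property queries above can also be combined into one query. VALUES lists the wanted properties, and an OPTIONAL block expands the blank-node values (address, geo, telephone) into their inner property/value pairs.

A minimal sketch that builds such a query string is shown below. It assumes the following:

- The helper name `equipment_query` is hypothetical.
- The base IRI is taken from the data above.
- The string would be run with `g.query(...)` on the parsed graph, as in the question.

The result has one row per property value. For a blank node, there is one row per inner pair.

```python
VCARD_PREFIX = "PREFIX v: <http://www.w3.org/2006/vcard/ns#>\n"
BASE = "http://opendata.gencat.cat/recursos/equipaments/"


def equipment_query(equipment_id,
                    properties=("v:adr", "v:category", "v:fn", "v:geo", "v:tel")):
    """Build one query for the requested properties of an equipment (hypothetical helper)."""
    iri = BASE + str(equipment_id)
    # Guard against the original bug: an IRI must not contain whitespace.
    if any(ch.isspace() for ch in iri):
        raise ValueError("IRIs may not contain whitespace: %r" % iri)
    # ?o holds the direct value; when ?o is a blank node, ?subp/?subo hold
    # its inner pairs (e.g. v:locality "Sabadell"), otherwise they stay unbound.
    return VCARD_PREFIX + """SELECT ?p ?o ?subp ?subo WHERE {
  VALUES ?p { %s }
  <%s> ?p ?o .
  OPTIONAL { ?o ?subp ?subo . FILTER(isBlank(?o)) }
}""" % (" ".join(properties), iri)


print(equipment_query(30883))
```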

Same sparql not returning same results

I'm running the same SPARQL query in two different clients, but they don't return the same results. The OWL file is in RDF syntax and can be accessed here.
This is the sparql statement:
PREFIX wo:<http://purl.org/ontology/wo/> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> select ?individual where { ?individual rdf:type wo:Class }
I'm running it in TopBraid and in the following Python program:
>>> import rdflib
>>> import rdfextras
>>> rdfextras.registerplugins()
>>> g=rdflib.Graph()
>>> g.parse("index.owl")
<Graph identifier=N39ccd52985014f15b2fea90c3ffaedca (<class 'rdflib.graph.Graph'>)>
>>> PREFIX = "PREFIX wo:<http://purl.org/ontology/wo/> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> "
>>> query = "select ?individual where { ?individual rdf:type wo:Class }"
>>> query = PREFIX + query
>>> result_set = g.query(query)
>>> len(result_set)
0
Which is returning 0
This query constructs a graph containing all the triples in which wo:Class is used as a subject, predicate, or object:
PREFIX wo: <http://purl.org/ontology/wo/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
construct { ?s ?p ?o }
where {
{ ?s ?p wo:Class . bind( wo:Class as ?o ) } union
{ ?s wo:Class ?o . bind( wo:Class as ?p ) } union
{ wo:Class ?p ?o . bind( wo:Class as ?s ) }
}
I made a local copy of your data and the results I get are (in Turtle):
@prefix vs: <http://www.w3.org/2003/06/sw-vocab-status/ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wo: <http://purl.org/ontology/wo/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

wo:Class a owl:Class ;
    rdfs:comment "A class is a scientific way to group related organisms together, some examples of classes being jellyfish, reptiles and sea urchins. Classes are big groups and contain within them smaller groupings called orders, families, genera and species."@en ;
    rdfs:label "Class"@en ;
    rdfs:seeAlso <http://www.bbc.co.uk/nature/class> , <http://en.wikipedia.org/wiki/Class_%28biology%29> ;
    rdfs:subClassOf wo:TaxonRank ;
    vs:term_status "testing" .

wo:class rdfs:range wo:Class .
There are no individuals of type wo:Class in your data. The result set ought to be empty.
