Get entity name/label from wikidata in python - python

I have some SPARQL queries to run on wikidata in python and I need to get the name/label of the entity returned instead of URI. For example, given the python snippet below:
from qwikidata.sparql import return_sparql_query_results
query_string = """
select ?ent where { ?ent wdt:P31 wd:Q2637056 . ?ent wdt:P2244 ?obj } ORDER BY DESC(?obj)LIMIT 5
"""
res = return_sparql_query_results(query_string)
for row in res["results"]["bindings"]:
    print(row["ent"]["value"])
The queries in the original form return URIs, but I need to get the entity label/name. How can I do that in python?
The current output of the query:
http://www.wikidata.org/entity/Q841796
http://www.wikidata.org/entity/Q780047
NOTE: I don't have real access to the queries, therefore I can't rewrite the queries.

My comment was too long, so I am posting it as an answer.
You'll need to rewrite the queries. Below is an example of how to get labels without using the label service.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?country ?countryLabel
WHERE
{
# instance of country
?country wdt:P31 wd:Q3624078.
OPTIONAL {
?country rdfs:label ?countryLabel filter (lang(?countryLabel) = "en").
}
}
ORDER BY ?countryLabel
Adapted for your Soyuz-T example:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?ent ?entLabel
WHERE
{
# instance of Soyuz-T https://www.wikidata.org/wiki/Q2637056
?ent wdt:P31 wd:Q2637056 .
# https://www.wikidata.org/wiki/Property:P2244 periapsis
?ent wdt:P2244 ?obj
OPTIONAL {
?ent rdfs:label ?entLabel filter (lang(?entLabel) = "en").
}
} ORDER BY DESC(?obj) LIMIT 5
Result:
ent entLabel
wd:Q841796 Soyuz T-15
wd:Q780047 Soyuz T-8
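If the original queries really cannot be changed, one possible workaround is to run a second query that looks up labels for the URIs the first query returned. The sketch below only builds that second query string, reusing the VALUES and rdfs:label patterns shown in this thread. The helper names qid_from_uri and build_label_query are hypothetical, and the two URIs are the ones from the question's output.

```python
def qid_from_uri(uri: str) -> str:
    """Return the trailing Q-ID of a Wikidata entity URI."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def build_label_query(uris, lang="en"):
    """Build a SPARQL query that fetches rdfs:label for the given entity URIs."""
    values = " ".join(f"wd:{qid_from_uri(u)}" for u in uris)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "SELECT ?ent ?entLabel WHERE {\n"
        f"  VALUES ?ent {{ {values} }}\n"
        f'  ?ent rdfs:label ?entLabel FILTER (lang(?entLabel) = "{lang}")\n'
        "}"
    )


# URIs as printed by the unmodified query in the question
uris = [
    "http://www.wikidata.org/entity/Q841796",
    "http://www.wikidata.org/entity/Q780047",
]
label_query = build_label_query(uris)
print(label_query)
```

The resulting string could then be passed to return_sparql_query_results like any other query. The second query does not preserve the ORDER BY of the first one, so the labels would need to be matched back to the original row order in Python.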

Related

Inserting python variable in SPARQL

I have a string variable I want to pass in my SPARQL query and I can't get it to work.
title = 'Good Will Hunting'
[str(s) for s, in graph.query('''
PREFIX ddis: <http://ddis.ch/atai/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
SELECT ?lbl WHERE {
?movie rdfs:label $title@en .
?movie wdt:P57 ?director .
?director rdfs:label ?lbl .
}
''')]
It doesn't work and I get an error. The query itself is correct: it works if I manually type the title in place of $title.
String interpolation in Python can be done with the %s placeholder (for string values):
title = 'Good Will Hunting'
[str(s) for s, in graph.query('''
PREFIX ddis: <http://ddis.ch/atai/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
SELECT ?lbl WHERE {
?movie rdfs:label "%s"@en .
?movie wdt:P57 ?director .
?director rdfs:label ?lbl .
}
''' % title)]
Note that I also added quotes ("%s"), which are necessary to write a string literal in SPARQL.
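Plain %s interpolation breaks as soon as the title itself contains a double quote or a backslash. A minimal sketch of one way to guard against that is to escape the value before inserting it. The helper sparql_string_literal is hypothetical (not part of rdflib), and the title with embedded quotes is made up purely to exercise the escaping.

```python
def sparql_string_literal(text: str, lang: str = "") -> str:
    """Render text as a double-quoted SPARQL literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    literal = f'"{escaped}"'
    return f"{literal}@{lang}" if lang else literal


# Hypothetical title containing quotes, to show the escaping
title = 'Good "Will" Hunting'
query = """
SELECT ?lbl WHERE {
    ?movie rdfs:label %s .
    ?movie wdt:P57 ?director .
    ?director rdfs:label ?lbl .
}
""" % sparql_string_literal(title, lang="en")
print(query)
```

Here the helper adds the surrounding quotes and the language tag itself, so the template contains a bare %s rather than "%s".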

Search for Q-ID (from Wikidata) with Twitter Username ID. (Python)

I have a list of verified Twitter User-IDs.
data['screen_name'] = [MOFAJapan_en, serenawilliams, JeffBezos ....]
data['twitter_ids'] = [303735625, 26589987, 15506669 ....]
and I want to get their respective Q-IDs from Wikidata. For the above Twitter-username-IDs, it will look sort of like this:
q_id_list = [Q222241, Q11459, Q312556 ....]
I ran into a slight complication here: if you search for MOFAJapan_en or MOFA of Japan, Wikidata API cannot recognize it. However, MOFAJapan has a wikidata page.
I know that the Property # for Twitter username is P2002, but how do I query for this without knowing the Q-ID?
Thank you in advance.
Given a list of Twitter names (inside VALUES), this SPARQL query will find the persons:
SELECT ?twitterName ?person
WHERE {
VALUES ?twitterName {
"MOFAJapan_en"
"serenawilliams"
"JeffBezos"
}
?person wdt:P2002 ?twitterName .
}
It won’t find anything for MOFAJapan_en, as the correct value seems to be MofaJapan_en. To ignore case, you can use a FILTER with LCASE, but this will increase the query's runtime:
SELECT ?twitterName ?person
WHERE {
VALUES ?twitterName_anyCase {
"MOFAJapan_en"
"serenawilliams"
"JeffBezos"
}
FILTER( LCASE(?twitterName_anyCase) = LCASE(?twitterName) ) .
?person wdt:P2002 ?twitterName .
}
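Since the usernames live in a Python list, the VALUES block can be generated rather than typed by hand. The following is a rough sketch that builds the case-insensitive query above from a list. The function name build_twitter_query is hypothetical, and the check that usernames contain only letters, digits and underscores is an assumption made here to keep quoting safe.

```python
import re


def build_twitter_query(names):
    """Build the case-insensitive P2002 lookup for a list of Twitter usernames."""
    for name in names:
        # Assumption: usernames are plain ASCII word characters, so no escaping is needed
        if not re.fullmatch(r"[A-Za-z0-9_]+", name):
            raise ValueError(f"unexpected character in username: {name!r}")
    values = "\n".join(f'    "{name}"' for name in names)
    return (
        "SELECT ?twitterName ?person\n"
        "WHERE {\n"
        "  VALUES ?twitterName_anyCase {\n"
        f"{values}\n"
        "  }\n"
        "  ?person wdt:P2002 ?twitterName .\n"
        "  FILTER( LCASE(?twitterName_anyCase) = LCASE(?twitterName) ) .\n"
        "}"
    )


screen_names = ["MOFAJapan_en", "serenawilliams", "JeffBezos"]
twitter_query = build_twitter_query(screen_names)
print(twitter_query)
```

The Q-ID is the last path segment of each ?person URI in the results.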

SPARQL query in python - invalid replace string due to escapes not working

I am trying to run a SPARQL query in Python. However, when I use BIND and REPLACE, escaping the " does not work and the query fails with an error.
I looked for information on Unicode escaping, but could not work out how to apply it to this query.
Does anyone have a solution for the problem?
TEXT:
from rdflib import Graph
from rdflib.namespace import RDF, SKOS
g = Graph()
# Raw string so the backslashes in the Windows path are not treated as escapes
g.parse(r'\Python\Molenakker.orox.ttl')
print(len(g))
gmls = g.query('''
PREFIX gwsw: <http://data.gwsw.nl/1.5/totaal/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sparql: <http://sparql.gwsw.nl/bim/juinen#>
SELECT ?XYZ
WHERE {
{
?uri rdf:type gwsw:GemengdRiool;
gwsw:hasAspect ?ori.
?ori gwsw:hasAspect ?lijn.
?lijn gwsw:hasValue ?GML.
BIND(REPLACE(STR(?GML),"<gml:LineString xmlns:gml=\"http://www.opengis.net/gml\"><gml:posList srsDimension=\"3\">","") AS ?gml)
BIND(REPLACE(STR(?gml),"</gml:posList></gml:LineString>","")AS ?XYZ)
}
UNION
{
?uri rdf:type gwsw:Hemelwaterriool;
gwsw:hasAspect ?ori.
?ori gwsw:hasAspect ?lijn.
?lijn gwsw:hasValue ?GML.
BIND(REPLACE(STR(?GML),"<gml:LineString xmlns:gml=\"http://www.opengis.net/gml\"><gml:posList srsDimension=\"3\">","") AS ?gml)
BIND(REPLACE(STR(?gml),"</gml:posList></gml:LineString>","")AS ?XYZ)
}
UNION
{
?uri rdf:type gwsw:Vuilwaterriool;
gwsw:hasAspect ?ori.
?ori gwsw:hasAspect ?lijn.
?lijn gwsw:hasValue ?GML.
BIND(REPLACE(STR(?GML),"<gml:LineString xmlns:gml=\"http://www.opengis.net/gml\"><gml:posList srsDimension=\"3\">","") AS ?gml)
BIND(REPLACE(STR(?gml),"</gml:posList></gml:LineString>","")AS ?XYZ)
}
}''')
for gml in gmls:
    print(f"{gml.XYZ}")
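Because the query is an ordinary (non-raw) Python string literal, Python itself interprets each backslash-doublequote as an escape sequence and turns it into a plain doublequote before the SPARQL parser ever sees it. Using triple quotes does not change this.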
'''
"<gml:LineString xmlns:gml=\"http://www.opengis.net/gml\"><gml:posList srsDimension=\"3\">"
'''
becomes
'''
"<gml:LineString xmlns:gml="http://www.opengis.net/gml"><gml:posList srsDimension="3">"
'''
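which is probably not what you want, because the inner double quotes now end the SPARQL string early.
Try delimiting the SPARQL string with single quotes instead: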
'''
'<gml:LineString xmlns:gml="http://www.opengis.net/gml"><gml:posList srsDimension="3">'
'''
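The effect is easy to check in plain Python without rdflib. The sketch below compares three ways of writing the same fragment: a normal triple-quoted string, a raw triple-quoted string, and the single-quoted SPARQL literal suggested above. The shortened pattern srsDimension="3" is used only for illustration.

```python
# Normal string: Python consumes the backslashes, leaving bare inner double quotes
escaped = '''REPLACE(STR(?GML), "srsDimension=\"3\"", "")'''

# Raw string: the backslashes survive and reach the SPARQL parser as \" escapes
raw = r'''REPLACE(STR(?GML), "srsDimension=\"3\"", "")'''

# Single-quoted SPARQL literal: no escaping of the inner double quotes is needed
single = '''REPLACE(STR(?GML), 'srsDimension="3"', "")'''

print(escaped)
print(raw)
print(single)
```

Either the raw string or the single-quoted literal keeps the SPARQL string intact; only the first form loses the escapes.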

Printing the broader and narrower concepts against the captured URI REF

I am having difficulty printing the SKOS broader and narrower concepts for my URIRef (i.e. the output of the SPARQL query). I want to print the broader and narrower concepts for the captured URIRef (i.e. Biomass). The file I am parsing does not contain the broader and narrower concepts, and I don't know whether I need to add them to the file manually before I run queries on it.
I have already looked at similar questions, such as "skos broader and narrow inverse not working", but couldn't find a solution.
import rdflib
g = rdflib.Graph()
result = g.parse("C://Users/support/Documents/Re.txt", format=("turtle"))
qres = g.query(
"""
prefix skos: <http://www.w3.org/2004/02/skos/core#>
SELECT *
WHERE { ?s skos:prefLabel "Biomass"}
""")
for row in qres: print(row)
The output of the query is
for row in qres: print(row)
(rdflib.term.URIRef('http://aims.fao.org/aos/agrovoc/c_926'),)
I tried nesting the SELECT queries, but it does not work.
My query:
qres = g.query(
"""
prefix skos: <http://www.w3.org/2004/02/skos/core#>
SELECT *
WHERE { ?s skos:broader ?o . {
SELECT ?s
WHERE { ?s skos:prefLabel "Biomass" .}
}
""")
for row in qres: print(row)
If you're just struggling with the query, then I think you're overcomplicating it. This should work:
qres = g.query(
"""
prefix skos: <http://www.w3.org/2004/02/skos/core#>
SELECT * WHERE {
?s skos:broader ?o ; skos:prefLabel "Biomass" . }
""")

Same sparql not returning same results

I'm running the same SPARQL statement in two different clients, but they do not return the same results. The OWL file is in RDF syntax and can be accessed here.
This is the sparql statement:
PREFIX wo:<http://purl.org/ontology/wo/> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> select ?individual where { ?individual rdf:type wo:Class }
I'm running it in TopBraid and in the following Python program:
>>> import rdflib
>>> import rdfextras
>>> rdfextras.registerplugins()
>>> g=rdflib.Graph()
>>> g.parse("index.owl")
<Graph identifier=N39ccd52985014f15b2fea90c3ffaedca (<class 'rdflib.graph.Graph'>)>
>>> PREFIX = "PREFIX wo:<http://purl.org/ontology/wo/> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> "
>>> query = "select ?individual where { ?individual rdf:type wo:Class }"
>>> query = PREFIX + query
>>> result_set = g.query(query)
>>> len(result_set)
0
The Python program returns 0 results.
This query constructs a graph containing all the triples in which wo:Class is used as a subject, predicate, or object:
PREFIX wo: <http://purl.org/ontology/wo/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
construct { ?s ?p ?o }
where {
{ ?s ?p wo:Class . bind( wo:Class as ?o ) } union
{ ?s wo:Class ?o . bind( wo:Class as ?p ) } union
{ wo:Class ?p ?o . bind( wo:Class as ?s ) }
}
I made a local copy of your data and the results I get are (in Turtle):
@prefix vs: <http://www.w3.org/2003/06/sw-vocab-status/ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wo: <http://purl.org/ontology/wo/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
wo:Class a owl:Class ;
rdfs:comment "A class is a scientific way to group related organisms together, some examples of classes being jellyfish, reptiles and sea urchins. Classes are big groups and contain within them smaller groupings called orders, families, genera and species."@en ;
rdfs:label "Class"@en ;
rdfs:seeAlso <http://www.bbc.co.uk/nature/class> , <http://en.wikipedia.org/wiki/Class_%28biology%29> ;
rdfs:subClassOf wo:TaxonRank ;
vs:term_status "testing" .
wo:class rdfs:range wo:Class .
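There are no individuals of type wo:Class in your data: wo:Class is itself declared as an owl:Class and is the range of wo:class, but no resource has rdf:type wo:Class. The result set ought to be empty.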
