Find information for many people in wikidata - python

I have a list of names (hundreds of them) that I have already mapped to Wikidata Q-numbers using Python. For each Q-number (person) I want to get some basic information, such as place of birth, nationality, etc.
SELECT DISTINCT ?name ?nameLabel ?genderLabel ?placeofbirth ?nationality (year(?birthdate) as ?birthyear) (year(?deathdate) as ?deathyear)
WHERE
{
?name wdt:P106/wdt:P279* wd:Q1028181 # painter
FILTER (?name IN (wd:Q2674488)) # James Seymour
OPTIONAL { ?name wdt:P569 ?birthdate. }
OPTIONAL { ?name wdt:P27 ?nationality. }
OPTIONAL { ?name wdt:P21 ?gender. }
OPTIONAL { ?name wdt:P19 ?placeofbirth. }
OPTIONAL { ?name wdt:P570 ?deathdate. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
Using SPARQL, I can look up two or three people at a time by adding their Q-numbers to the FILTER, but how can I loop through all the Q-numbers in a Python list? Thanks a lot!
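One possible approach, sketched here under my own assumptions: instead of FILTER, put each batch of Q-numbers into a VALUES clause and loop over the batches from Python. The helper names, the batch size of 50 and the User-Agent string are illustrative placeholders. The painter constraint is dropped because the Q-numbers are already known, labels are requested for place of birth and nationality, and only the standard library is used.

```python
import json
import re
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# __VALUES__ is replaced by a space-separated list such as "wd:Q2674488 wd:Q42".
QUERY_TEMPLATE = """
SELECT DISTINCT ?name ?nameLabel ?genderLabel ?placeofbirthLabel ?nationalityLabel
       (YEAR(?birthdate) AS ?birthyear) (YEAR(?deathdate) AS ?deathyear)
WHERE {
  VALUES ?name { __VALUES__ }
  OPTIONAL { ?name wdt:P569 ?birthdate. }
  OPTIONAL { ?name wdt:P27 ?nationality. }
  OPTIONAL { ?name wdt:P21 ?gender. }
  OPTIONAL { ?name wdt:P19 ?placeofbirth. }
  OPTIONAL { ?name wdt:P570 ?deathdate. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_query(qids):
    """Insert the Q-numbers into the VALUES clause after a basic sanity check."""
    for qid in qids:
        if not re.fullmatch(r"Q\d+", qid):
            raise ValueError(f"not a Q-number: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    return QUERY_TEMPLATE.replace("__VALUES__", values)


def run_query(query):
    """Send one SPARQL query to the endpoint and return the result bindings."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Placeholder: Wikimedia's User-Agent policy asks clients to identify themselves.
            "User-Agent": "painter-lookup-example/0.1 (you@example.org)",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def fetch_people(qids, batch_size=50):
    """Run one query per batch of Q-numbers and collect all result rows."""
    rows = []
    for batch in batches(qids, batch_size):
        rows.extend(run_query(build_query(batch)))
    return rows
```

Calling, for example, fetch_people(["Q2674488"]) with your full list would return the raw result bindings for every person. Note that a person with several nationalities or several recorded birth dates appears in several rows.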

Related

Fetching triples using SPARQL query from turtle file

I am new to SPARQL and currently struggling to fetch triples from a Turtle file.
### https://ontology/1001
<https://ontology/1001> rdf:type owl:Class ;
rdfs:subClassOf <https://ontology/748>;
<http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "Injury, neuronal" ,
"Neurotrauma" ;
rdfs:label "Nervous system injury" .
### https://ontology/10021
<https://ontology/10021> rdf:type owl:Class ;
rdfs:subClassOf <https://ontology/2034> ;
rdfs:label "C3 glomerulopathy" .
I am trying to extract all classes with their superclasses, labels and synonyms. The query I am running is below.
query_id = """
prefix oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
prefix obo: <http://purl.obolibrary.org/obo/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT distinct ?cid ?label ?class ?synonyms
WHERE {
?cid rdfs:label ?label .
?cid rdfs:subClassOf ?class .
?cid oboInOwl:hasExactSynonym ?synonyms .
}
"""
However, this query filters out every class that has no hasExactSynonym triple.
Following is the output:
cid label class synonyms
1001 Nervous system injury 748 Injury, neuronal , Neurotrauma
The expected output is:
cid label class synonyms
1001 Nervous system injury 748 Injury, neuronal , Neurotrauma
10021 C3 glomerulopathy 2034
You can use OPTIONAL to make the synonyms optional:
WHERE {
?cid rdfs:label ?label .
?cid rdfs:subClassOf ?class .
OPTIONAL { ?cid oboInOwl:hasExactSynonym ?synonyms . }
}
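To see why the synonym triple drops class 10021, here is a toy pure-Python model of the two patterns: a required triple behaves like an inner join, while OPTIONAL behaves like a left join that leaves the variable unbound (None here). This only illustrates the semantics; it is not how a SPARQL engine works internally, and the triples are the two classes from the question with shortened IRIs.

```python
from collections import defaultdict

LABEL = "rdfs:label"
SUBCLASS_OF = "rdfs:subClassOf"
SYNONYM = "oboInOwl:hasExactSynonym"

# The two classes from the question, written as (subject, predicate, object).
TRIPLES = [
    ("1001", SUBCLASS_OF, "748"),
    ("1001", SYNONYM, "Injury, neuronal"),
    ("1001", SYNONYM, "Neurotrauma"),
    ("1001", LABEL, "Nervous system injury"),
    ("10021", SUBCLASS_OF, "2034"),
    ("10021", LABEL, "C3 glomerulopathy"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in TRIPLES."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def solve(synonyms_optional):
    """Evaluate the label + subClassOf + synonym pattern.

    synonyms_optional=False: the synonym triple is required (inner join).
    synonyms_optional=True: it behaves like OPTIONAL (left join, unbound -> None).
    """
    rows = []
    for cid, predicate, label in TRIPLES:
        if predicate != LABEL:
            continue
        for cls in objects(cid, SUBCLASS_OF):
            synonyms = objects(cid, SYNONYM)
            if synonyms:
                rows.extend((cid, label, cls, syn) for syn in synonyms)
            elif synonyms_optional:
                rows.append((cid, label, cls, None))
    return rows


def concat_synonyms(rows):
    """Collapse one-row-per-synonym into one row per class, like GROUP_CONCAT."""
    grouped = defaultdict(list)
    for cid, label, cls, syn in rows:
        synonyms = grouped[(cid, label, cls)]  # creates an empty list if needed
        if syn is not None:
            synonyms.append(syn)
    return {key: ", ".join(syns) for key, syns in grouped.items()}
```

Note that the OPTIONAL version still yields one row per synonym for class 1001. To get the single-row shape of the expected output in SPARQL itself, you can aggregate with (GROUP_CONCAT(?synonyms; separator=", ") AS ?allSynonyms) and GROUP BY ?cid ?label ?class, which is what concat_synonyms imitates.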

Get second level domain with regexp in elasticsearch

I want to search an index in my Elasticsearch database for domains that contain a given second-level domain (SLD), but the search returns None.
Here is what I've done so far:
sld = "smth"
query = client.search(
index = "x",
body = {
"query": {
"regexp": {
"site_domain.keyword": fr"*\.{sld}\.*"
}
}
}
)
EDIT:
I think the problem is with the regex I wrote.
Any help would be appreciated.
TL;DR
GET /so_regex_url/_search
{
"query": {
"regexp": {
"site_domain": ".*api\\.[a-z]+\\.[a-z]+"
}
}
}
This regex will match api.google.com but not google.com.
Watch out for reserved characters such as the dot (.), which must be escaped properly.
To understand
First, let's talk about the pattern you are looking for.
You want to match every URL that has a given subdomain.
1. Check that the subdomain string exists in the URL
Something like .*<subdomain>.* will work; .* means any character, any number of times.
2. Check it is a subdomain
A subdomain in a url looks like <subdomain>.<domain>.<top level domain>
You need to make sure there is a . between the domain and the top-level domain.
Something like .*<subdomain>.*\.[a-z]+\.[a-z]+ will work. [a-z]+ means at least one character from a to z, and because . has a special meaning, you need to escape it with \.
This will match https://<subdomain>.google.com, but not https://<subdomain>.com.
/!\ This is a naive implementation.
https://<subdomain>.1234.com won't match, because digits are not in [a-z].
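Lucene regexp queries must match the whole term, so the pattern behaves roughly like Python's re.fullmatch. For a pattern this simple (only ., *, + and a character class) the two syntaxes agree, which allows a quick local check. This is an approximation, not Elasticsearch itself; sld_pattern and matches are hypothetical helpers, and the SLD is assumed to contain only letters and digits.

```python
import re


def sld_pattern(sld):
    """Pattern for '<anything><sld>.<domain>.<tld>' (sld assumed alphanumeric)."""
    return rf".*{sld}\.[a-z]+\.[a-z]+"


def matches(pattern, value):
    # Lucene regexp is anchored to the whole term, which re.fullmatch mimics.
    return re.fullmatch(pattern, value) is not None


pattern = sld_pattern("api")
for domain in ["api.google.com", "https://api.google.com", "google.com", "api.1234.com"]:
    print(f"{domain:25} {matches(pattern, domain)}")
```

The last domain fails for the reason given above: after "api." the pattern requires letters, not digits.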
3. Build the Elasticsearch query DSL
I run the request against the text field rather than the keyword field; this keeps my example leaner but works the same way.
GET /so_regex_url/_search
{
"query": {
"regexp": {
"site_domain": ".*api\\.[a-z]+\\.[a-z]+"
}
}
}
You may have noticed the \\: as explained in the thread, the payload travels as JSON, so the backslash itself also has to be escaped.
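A small sketch of where the doubled backslash comes from: the regex itself contains one backslash per dot, and JSON encoding turns each of them into \\. When the body is built as a Python dict, the serializer does this step for you (assuming the client uses ordinary JSON serialization, as json.dumps does here).

```python
import json

# The regex as Elasticsearch should receive it: one backslash before each dot.
pattern = r".*api\.[a-z]+\.[a-z]+"

body = {"query": {"regexp": {"site_domain": pattern}}}
payload = json.dumps(body)
print(payload)

# Decoding the JSON gives back the single-backslash regex.
assert json.loads(payload)["query"]["regexp"]["site_domain"] == pattern
```

The printed payload is the same text as the console request above, which is why the Dev Tools version needs \\ while Python code needs only a single backslash in a raw string.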
4. Python implementation
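I imagine it would look something like this (client being your existing Elasticsearch client):
sld = "smth"
# A raw f-string needs only one backslash per dot: the client serializes
# the body to JSON and adds the second backslash itself.
query = client.search(
    index="x",
    body={
        "query": {
            "regexp": {
                "site_domain.keyword": rf".*{sld}\.[a-z]+\.[a-z]+"
            }
        }
    }
)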

Get entity name/label from wikidata in python

I have some SPARQL queries to run on Wikidata from Python, and I need to get the name/label of each returned entity instead of its URI. For example, given the Python snippet below:
from qwikidata.sparql import return_sparql_query_results
query_string = """
select ?ent where { ?ent wdt:P31 wd:Q2637056 . ?ent wdt:P2244 ?obj } ORDER BY DESC(?obj) LIMIT 5
"""
res = return_sparql_query_results(query_string)
for row in res["results"]["bindings"]:
    print(row["ent"]["value"])
The queries in their original form return URIs, but I need the entity labels/names. How can I do that in Python?
The current output of the query:
http://www.wikidata.org/entity/Q841796
http://www.wikidata.org/entity/Q780047
NOTE: I don't have access to the queries themselves, so I can't rewrite them.
My comment was too long, so I am posting an answer.
You'll need to rewrite the queries. Below is an example of how to get labels without using the label service.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?country ?countryLabel
WHERE
{
# instance of country
?country wdt:P31 wd:Q3624078.
OPTIONAL {
?country rdfs:label ?countryLabel filter (lang(?countryLabel) = "en").
}
}
ORDER BY ?countryLabel
Adapted for your Soyuz-T example:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?ent ?entLabel
WHERE
{
# instance of Soyuz-T https://www.wikidata.org/wiki/Q2637056
?ent wdt:P31 wd:Q2637056 .
# https://www.wikidata.org/wiki/Property:P2244 periapsis
?ent wdt:P2244 ?obj
OPTIONAL {
?ent rdfs:label ?entLabel filter (lang(?entLabel) = "en").
}
} ORDER BY DESC(?obj) LIMIT 5
Result:
ent entLabel
wd:Q841796 Soyuz T-15
wd:Q780047 Soyuz T-8
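If the Python side stays as in the question, only the query string has to change. A minimal sketch, assuming the result dictionary has the standard SPARQL JSON shape already used in the question (res["results"]["bindings"]); entities_with_labels is a hypothetical helper, and the label may be missing from a row because it comes from an OPTIONAL.

```python
# The adapted query from the answer; only the SPARQL changes, not the Python.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?ent ?entLabel
WHERE {
  ?ent wdt:P31 wd:Q2637056 .
  ?ent wdt:P2244 ?obj .
  OPTIONAL { ?ent rdfs:label ?entLabel FILTER (lang(?entLabel) = "en") }
}
ORDER BY DESC(?obj) LIMIT 5
"""


def entities_with_labels(res):
    """Turn SPARQL JSON results into (Q-id, English label or None) pairs."""
    pairs = []
    for row in res["results"]["bindings"]:
        qid = row["ent"]["value"].rsplit("/", 1)[-1]
        # ?entLabel is bound inside an OPTIONAL, so the key may be absent.
        label = row.get("entLabel", {}).get("value")
        pairs.append((qid, label))
    return pairs
```

With qwikidata this could be called as entities_with_labels(return_sparql_query_results(QUERY)); for the results above it should yield pairs such as ("Q841796", "Soyuz T-15").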

How can I retrieve the wikipageID if the page is not in english using sparql and dbpedia

I want to retrieve the wikiPageID of the same resource name in different languages. For example:
select * where { <http://dbpedia.org/resource/Mike_Quigley_(footballer)> dbpedia-owl:wikiPageID ?wikiID }
====>Mike_Quigley_(footballer) 17237449 en
select * where { <http://dbpedia.org/resource/Theodore_Roberts> dbpedia-owl:wikiPageID ?wikiID }
====>Theodore_Roberts 6831454 en
select * where { <http://de.dbpedia.org/resource/Theodore_Roberts> dbpedia-owl:wikiPageID ?wikiID }
====>Theodore_Roberts de
select * where { <http://fr.dbpedia.org/resource/Theodore_Roberts> dbpedia-owl:wikiPageID ?wikiID }
====>Theodore_Roberts fr
select * where { <http://it.dbpedia.org/resource/Theodore_Roberts> dbpedia-owl:wikiPageID ?wikiID }
====>Theodore_Roberts it
select * where { <http://ja.dbpedia.org/resource/セオドア・ロバーツ> dbpedia-owl:wikiPageID ?wikiID }
====>セオドア・ロバーツ ja
In the first query, for Mike_Quigley_(footballer), which is in English, I was able to retrieve its ID (17237449), but as you can see, when the language changes I cannot retrieve the wikiPageIDs.
How can I retrieve the IDs of these pages?
To make it more complicated, the German Theodore_Roberts page uses a wikiPageID property that is not in the dbpedia-owl namespace.
Do you have any idea how to solve it?
Interestingly, when I try this query
SELECT ?uri ?id
WHERE {
?uri <http://dbpedia.org/ontology/wikiPageID> ?id.
FILTER (?uri = <http://dbpedia.org/resource/Lyon>)
}
I get the following result:
http://dbpedia.org/resource/Lyon 863863
But when I change the URI to <http://it.dbpedia.org/resource/Lyon>, I get nothing.
If I understand correctly, you want to get the Italian version of your URI, but you are looking for the wrong URI. According to the English DBpedia (which I assume is what you are querying), the Italian version of the Lyon URI is http://it.dbpedia.org/resource/Lione. I found this out with:
SELECT *
WHERE {
?uri owl:sameAs ?b.
FILTER (?uri = <http://dbpedia.org/resource/Lyon>)
}
However, I couldn't get the Italian page ID from the English DBpedia.
When I tried it on the Italian DBpedia, it worked:
SELECT ?uri ?id
WHERE {
?uri <http://dbpedia.org/ontology/wikiPageID> ?id.
FILTER (?uri = <http://it.dbpedia.org/resource/Lyon>)
}
However, if you look at the Italian page http://it.dbpedia.org/resource/Lyon, you can see that it has a dbpedia-owl:wikiPageRedirects property pointing to http://it.dbpedia.org/resource/Lione, the same URI that owl:sameAs gives you. Maybe you can work your way from there.
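One way to automate this is to send each query to the endpoint of the same language chapter as the URI and to allow one optional redirect hop with the ? property-path modifier. This is a sketch under assumptions: that each chapter serves SPARQL at <host>/sparql and uses http://dbpedia.org/ontology/wikiPageID (the question notes German may use a different property, so that constant may need changing). The helper names are hypothetical, and the code only builds the request URL.

```python
import urllib.parse

DBO_WIKI_PAGE_ID = "http://dbpedia.org/ontology/wikiPageID"
DBO_REDIRECTS = "http://dbpedia.org/ontology/wikiPageRedirects"


def endpoint_for(resource_uri):
    """Assumed endpoint: http://it.dbpedia.org/resource/Lyon -> http://it.dbpedia.org/sparql."""
    parts = urllib.parse.urlsplit(resource_uri)
    return f"{parts.scheme}://{parts.netloc}/sparql"


def page_id_query(resource_uri):
    """Page ID of the resource itself and, if it redirects, of the target.

    The '?' path modifier means zero or one wikiPageRedirects step.
    """
    return (
        "SELECT ?page ?id WHERE {\n"
        f"  <{resource_uri}> <{DBO_REDIRECTS}>? ?page .\n"
        f"  ?page <{DBO_WIKI_PAGE_ID}> ?id .\n"
        "}"
    )


def request_url(resource_uri):
    """GET URL for the query; urlencode percent-encodes non-ASCII IRIs as UTF-8."""
    query = page_id_query(resource_uri)
    return endpoint_for(resource_uri) + "?" + urllib.parse.urlencode({"query": query})
```

For http://it.dbpedia.org/resource/Lyon this should return rows for both Lyon (the redirect page) and Lione, if the Italian endpoint behaves as described above.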

How to make a sparql query with unicode letters?

I am querying the french dbpedia (http://fr.dbpedia.org/) with SPARQL.
I am using Python and SPARQLWrapper if it makes any difference.
This first query works fine.
PREFIX dbpp:<http://dbpedia.org/property/>
PREFIX dbpo:<http://dbpedia.org/ontology/>
PREFIX dbpr:<http://dbpedia.org/resource/>
SELECT ?wt ?summary ?source_url
WHERE {
?wt rdfs:label "Concerto"@fr .
OPTIONAL { ?wt dbpedia-owl:abstract ?summary . }
OPTIONAL { ?wt foaf:isPrimaryTopicOf ?source_url . }
filter (lang(?summary) = "fr" )
}
This second query doesn't work.
PREFIX dbpp:<http://dbpedia.org/property/>
PREFIX dbpo:<http://dbpedia.org/ontology/>
PREFIX dbpr:<http://dbpedia.org/resource/>
SELECT ?wt ?summary ?source_url
WHERE {
?wt rdfs:label "Opéra"@fr .
OPTIONAL { ?wt dbpedia-owl:abstract ?summary . }
OPTIONAL { ?wt foaf:isPrimaryTopicOf ?source_url . }
filter (lang(?summary) = "fr" )
}
The only difference is the value of the label. The page http://fr.dbpedia.org/page/Opéra exists in DBpedia, and its rdfs:label is "Opéra".
I think the query doesn't work because it contains the French letter é. I've tried several escapings (Op%C3%A9re, Op\u0233ra, Op\xe9ra) without any success.
Any idea?
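The problem is not the é but that the FILTER is outside the OPTIONAL. <http://fr.dbpedia.org/resource/Opéra> has no dbpedia-owl:abstract, so ?summary stays unbound, lang(?summary) evaluates to an error, and the FILTER rejects the whole solution. Moving the filter inside the OPTIONAL block fixes it: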
PREFIX dbpp: <http://dbpedia.org/property/>
PREFIX dbpo: <http://dbpedia.org/ontology/>
PREFIX dbpr: <http://dbpedia.org/resource/>
SELECT ?wt ?summary ?source_url
WHERE {
?wt rdfs:label "Opéra"@fr .
OPTIONAL { ?wt dbpedia-owl:abstract ?summary .
filter (lang(?summary) = "fr" )
}
OPTIONAL { ?wt foaf:isPrimaryTopicOf ?source_url . }
}
This version works (and also returns <http://fr.dbpedia.org/resource/Catégorie:Opéra>).
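To confirm that the é needs no manual escaping, here is a sketch that builds the HTTP request URL for the corrected query with the standard library: urlencode percent-encodes the query as UTF-8, so the literal can be written as plain "Opéra". The endpoint URL is an assumption, and I expect SPARQLWrapper to perform an equivalent encoding when it sends the query, though I have not checked its internals.

```python
import urllib.parse

ENDPOINT = "http://fr.dbpedia.org/sparql"  # assumed French DBpedia endpoint

# The literal is written with a plain "é"; no manual escaping is needed.
QUERY = """
SELECT ?wt ?summary ?source_url
WHERE {
  ?wt rdfs:label "Opéra"@fr .
  OPTIONAL { ?wt dbpedia-owl:abstract ?summary .
             FILTER (lang(?summary) = "fr") }
  OPTIONAL { ?wt foaf:isPrimaryTopicOf ?source_url . }
}
"""

# urlencode encodes the text as UTF-8 before percent-encoding (é -> %C3%A9).
url = ENDPOINT + "?" + urllib.parse.urlencode({"query": QUERY})
print(url)
```

Decoding the query parameter gives back exactly the original text, so the server receives the same "Opéra" literal that appears in the page's rdfs:label.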
