How can I properly serialize Wikidata SPARQL query answers? - python

I have the following example of querying wikidata via Python's SPARQLWrapper:
import rdflib, urllib
from SPARQLWrapper import SPARQLWrapper, JSON, XML, TURTLE, RDF, N3
from rdflib import Graph, Namespace, URIRef  # (rdflib's RDF would shadow SPARQLWrapper's RDF format constant)

def graph_full(uri, f):
    sparql = SPARQLWrapper('https://query.wikidata.org/sparql')
    sparql.setQuery('''
        PREFIX entity: <http://www.wikidata.org/entity/>
        SELECT ?predicate ?object WHERE {
            <''' + urllib.unquote(uri).encode("utf8") + '''> ?predicate ?object .
        } LIMIT 100
    ''')
    sparql.setReturnFormat(N3)
    results = sparql.query().convert()
    #print results.serialize()
    print type(results)
    g = Graph()
    g.parse(results)
    print g
    #g.serialize(f, format="n3")

if __name__ == '__main__':
    graph_full("entity:Q76", "wikidata/output.nt")
I want to serialize the result of the SPARQL query and save it to a file, but this always throws the following error:
Exception: Unexpected type '<type 'instance'>' for source '<xml.dom.minidom.Document instance at 0x7fa11e3715a8>'
Using similar code against DBpedia SPARQL endpoints throws no errors.
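A likely cause: a SELECT query returns a result table rather than triples, so asking Wikidata for N3 hands back an xml.dom.minidom.Document, which rdflib's Graph.parse cannot use as a source. A minimal sketch of one workaround, assuming a CONSTRUCT query (which does return triples) and a full entity URI; the output path is a placeholder:
from SPARQLWrapper import SPARQLWrapper, N3
from rdflib import Graph

def graph_full(uri, f):
    sparql = SPARQLWrapper('https://query.wikidata.org/sparql')
    sparql.setQuery('''
        CONSTRUCT { <%s> ?predicate ?object . }
        WHERE     { <%s> ?predicate ?object . }
        LIMIT 100
    ''' % (uri, uri))
    sparql.setReturnFormat(N3)
    results = sparql.query().convert()        # N3 data as a byte string
    g = Graph()
    g.parse(data=results, format="n3")        # parse from data, not a file path
    g.serialize(destination=f, format="n3")   # write the graph to disk

if __name__ == '__main__':
    graph_full("http://www.wikidata.org/entity/Q76", "wikidata/output.n3")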

Related

How to compare schema of 2 databases with sqlalchemy-diff - Could not parse rfc1738 URL from string

I have Dev and Prod versions of an Azure SQL (or SQL Server) database, and I would like to compare their schemas as part of testing.
I read that sqlalchemy-diff could be a useful tool: https://pypi.org/project/sqlalchemy-diff/
However, I'm getting an error about the URL. What needs to be done?
CODE:
from pprint import pprint
from sqlalchemydiff import compare

DBURI1 = "Server=my-sql-server.database.windows.net,1433;Database=my-dev-sql-db;UID=myuser;PWD=mypassword;Trusted_Connection=No;"
DBURI2 = "Server=my2-sql-server.database.windows.net,1433;Database=my2-dev-sql-db;UID=myuser;PWD=mypassword;Trusted_Connection=No;"

result = compare(DBURI1, DBURI2)

if result.is_match:
    print('Databases are identical')
else:
    print('Databases are different')
    pprint(result.errors)
ERROR:
sqlalchemy.exc.ArgumentError: Could not parse rfc1738 URL from string 'Server=my-sql-server.database.windows.net,1433;Database=my-dev-sql-db;UID=myuser;PWD=mypassword;Trusted_Connection=No;'
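The error names the fix: sqlalchemy-diff hands these strings to SQLAlchemy, which expects RFC 1738-style URLs, not ODBC connection strings. A minimal sketch, assuming the mssql+pyodbc dialect with ODBC Driver 17 installed; servers, credentials, and driver version are placeholders:
from pprint import pprint
from sqlalchemydiff import compare

# RFC 1738 SQLAlchemy URL: dialect+driver://user:password@host:port/database
DBURI1 = ("mssql+pyodbc://myuser:mypassword@my-sql-server.database.windows.net:1433/"
          "my-dev-sql-db?driver=ODBC+Driver+17+for+SQL+Server")
DBURI2 = ("mssql+pyodbc://myuser:mypassword@my2-sql-server.database.windows.net:1433/"
          "my2-dev-sql-db?driver=ODBC+Driver+17+for+SQL+Server")

result = compare(DBURI1, DBURI2)

if result.is_match:
    print('Databases are identical')
else:
    print('Databases are different')
    pprint(result.errors)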

SPARQLWrapper: problem querying an ontology in a local file

I'm working with SPARQLWrapper and I'm following the documentation. Here is my code:
queryString = "SELECT * WHERE { ?s ?p ?o. }"
sparql = SPARQLWrapper("http://example.org/sparql")  # I replaced this line with
sparql = SPARQLWrapper("file:///thelocation of my file in my computer")
sparql.setQuery(queryString)

try:
    ret = sparql.query()
    # ret is a stream with the results in XML, see <http://www.w3.org/TR/rdf-sparql-XMLres/>
except:
    deal_with_the_exception()
I'm getting these two errors:
1. The system cannot find the path specified
2. NameError: name 'deal_with_the_exception' is not defined
You need a SPARQL endpoint to make this work; SPARQLWrapper talks to an HTTP endpoint, not to a file on disk. Consider setting up Apache Fuseki on your local machine. See https://jena.apache.org/documentation/fuseki2/jena
(The second error simply means deal_with_the_exception is not defined anywhere; replace it with real error handling.)
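Alternatively, if the goal is just to run SPARQL over a local RDF file, rdflib can do that directly, with no HTTP endpoint at all. A minimal sketch; the file path and format are placeholders:
from rdflib import Graph

g = Graph()
g.parse("path/to/ontology.ttl", format="turtle")  # load the local file

for row in g.query("SELECT * WHERE { ?s ?p ?o . }"):
    print(row.s, row.p, row.o)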

How to add special categories in sparqlwrapper in python

I am running the following SPARQL query with SPARQLWrapper:
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://live.dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
my_category = 'dbc:Meteorological_concepts'
sparql.setQuery(f" ASK {{ {my_category} skos:broader{{1,3}} dbc:Medicine }} ")
results = sparql.query().convert()
print(results['boolean'])
As mentioned above, it works fine with categories that do not have brackets (e.g., dbc:Meteorological_concepts). However, when I enter a category with brackets (e.g., my_category = 'dbc:Elasticity_(physics)'), I get the following error.
b"Virtuoso 37000 Error SP030: SPARQL compiler, line 4: syntax error at 'physics' before ')'\n\nSPARQL query:\ndefine sql:big-data-const 0 \n#output-format:application/sparql-results+json\n\n ASK { dbc:Elasticity_(physics) skos:broader{1,3} dbc:Medicine }\n"
CRITICAL: Exiting due to uncaught exception <class 'SPARQLWrapper.SPARQLExceptions.QueryBadFormed'>
Is there a way to resolve this issue? I am happy to provide more details if needed.
To restate what @StanislavKralin mentioned in the comment above: always use the full URI in SPARQL code, particularly when the query contains special characters.
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://live.dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
my_category = '<http://dbpedia.org/resource/Category:Elasticity_(physics)>'
sparql.setQuery(f" ASK {{ {my_category} skos:broader{{1,3}} dbc:Medicine }} ")
results = sparql.query().convert()
print(results['boolean'])
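If the category names arrive as bare labels, the wrapping can be done programmatically. A small sketch; category_uri is a hypothetical helper, not part of SPARQLWrapper:
def category_uri(label):
    # Hypothetical helper: wrap a DBpedia category label in a full,
    # absolute URI so brackets cannot break the query syntax.
    return '<http://dbpedia.org/resource/Category:%s>' % label

my_category = category_uri('Elasticity_(physics)')
# -> '<http://dbpedia.org/resource/Category:Elasticity_(physics)>'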

python SPARQLWrapper returns only 10000 results

I use the SPARQLWrapper module to send a query to a Virtuoso endpoint and get the result.
The query always returns a maximum of 10000 results.
Here is the Python script:
from SPARQLWrapper import SPARQLWrapper, JSON

queryString = """
SELECT DISTINCT ?s
WHERE {
    ?s ?p ?o .
}
"""

sparql = SPARQLWrapper("http://localhost:8890/sparql")
sparql.setQuery(queryString)
sparql.setReturnFormat(JSON)
res = sparql.query().convert()

# Parse result
parsed = []
for entry in res['results']['bindings']:
    for sparql_variable in entry.keys():
        parsed.append({sparql_variable: entry[sparql_variable]['value']})

print('Query returned ' + str(len(parsed)) + ' results')
When I launch the query with
SELECT COUNT(*) AS ?count
I get the right number of triples: 917051.
Why does the SPARQLWrapper module limit the number of results to 10000? How do I get all the results?
The answer is to adjust the Virtuoso configuration file, as documented. Specifically for this case, you need to increase ResultSetMaxRows in the [SPARQL] stanza.
The limit is not in SPARQLWrapper. You would see the same limit if you ran the full SELECT (instead of the COUNT, which delivers only one row) through the SPARQL endpoint, Conductor, or any other interface.
The 10000-row cap is set by the data owner via the ResultSetMaxRows item in virtuoso.ini, to protect the data. Otherwise, anyone could run a simple SPARQL query like SELECT * WHERE { ?s ?p ?o } to dump the entire dataset, which could cost the data owner a lot of time and money.
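If you cannot change virtuoso.ini, a common workaround is to page through the results with ORDER BY plus LIMIT/OFFSET. A minimal sketch, assuming the page size matches the endpoint's ResultSetMaxRows:
from SPARQLWrapper import SPARQLWrapper, JSON

PAGE = 10000  # assumed to equal the endpoint's ResultSetMaxRows
sparql = SPARQLWrapper("http://localhost:8890/sparql")
sparql.setReturnFormat(JSON)

results, offset = [], 0
while True:
    sparql.setQuery("""
        SELECT DISTINCT ?s
        WHERE { ?s ?p ?o . }
        ORDER BY ?s
        LIMIT %d OFFSET %d
    """ % (PAGE, offset))
    bindings = sparql.query().convert()['results']['bindings']
    results.extend(b['s']['value'] for b in bindings)
    if len(bindings) < PAGE:  # last (partial) page reached
        break
    offset += PAGE

print('Query returned ' + str(len(results)) + ' results')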

Error when I retrieve data from dbpedia

I am trying to retrieve data from DBpedia, but I get an error every time I run the code.
The code in Python is:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?subject
    WHERE { <http://dbpedia.org/resource/Musée_du_Louvre> dcterms:subject ?subject }
""")

# JSON example
print '\n\n*** JSON Example'
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for result in results["results"]["bindings"]:
    print result["subject"]["value"]
I believe that I must use a different character for "é" in "Musée_du_Louvre", but I can't figure out which. Thanks!
The first problem is that SPARQLWrapper seems to expect its query to be unicode, but you're passing it a UTF-8-encoded string - that's why you get a UnicodeDecodeError. Instead you should pass it a unicode object, either by decoding your UTF-8 string
unicode_obj = some_utf8_string.decode('utf-8')
or by using a unicode literal:
unicode_obj = u'Hello World'
Passing a unicode object avoids the UnicodeDecodeError, but doesn't yield any results. So it looks like the DBpedia API expects URLs containing non-ASCII characters to be percent-encoded. Therefore you need to encode the URL beforehand using urllib.quote_plus:
from urllib import quote_plus
encoded_url = quote_plus(url, safe='/:')
With these two changes your code could look like this:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from SPARQLWrapper import SPARQLWrapper, JSON
from urllib import quote_plus

url = 'http://dbpedia.org/resource/Musée_du_Louvre'
encoded_url = quote_plus(url, safe='/:')

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
query = u"""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?subject
    WHERE { <%s> dcterms:subject ?subject }
""" % encoded_url
sparql.setQuery(query)

# JSON example
print '\n\n*** JSON Example'
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for result in results["results"]["bindings"]:
    print result["subject"]["value"]
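On Python 3 the same approach applies, except that strings are unicode by default and the quoting helper lives in urllib.parse. A minimal sketch of just the encoding step:
from urllib.parse import quote_plus

url = 'http://dbpedia.org/resource/Musée_du_Louvre'
encoded_url = quote_plus(url, safe='/:')  # percent-encodes the "é"
print(encoded_url)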
