Loading DBpedia data and ontologies into Virtuoso locally - Python

I am trying to set up a DBpedia SPARQL endpoint locally with the Virtuoso triple store.
I followed two links:
Loading data with the help of folders
Loading data with the help of symbolic links
Following the configuration from the second link, I tried to load only the "en" folder, with the dbpedia-owl.owl file placed inside that same "en" folder. I loaded the folder with the following command in isql:
ld_dir_all('/media/D8849AB0849A911C/datasets/en','*','http://dbpedia.org');
I did the further processing to commit the data, then checked it on the local endpoint "localhost:8890/sparql". But the prefix "dbpedia-owl" seems to be missing; I also checked the list of "namespace prefixes" and "dbpedia-owl" is not there. What did I do wrong while loading the data? I also tried adding the dbpedia-owl.gz file, but "dbpedia-owl" still does not work on the endpoint.
When I tried to run this query:
select ?type {
?type a owl:Class .
} LIMIT 5
I got this result:
type
http://www.w3.org/2002/07/owl#Thing
http://www.w3.org/2002/07/owl#Nothing
http://dbpedia.org/ontology/Abbey
http://dbpedia.org/ontology/Abbey
http://dbpedia.org/ontology/AcademicJournal
So the result shows data from the ontology file, but the "dbpedia-owl" prefix is not getting linked to it. Help is appreciated.

This is a very late answer, but I stumbled across this question...
As far as I know, you loaded the ontology into Virtuoso (so the class and property definitions are available in the DB), but that is different from defining a prefix and associating it with a URL.
If you want to do the latter programmatically, just use:
DB.DBA.XML_SET_NS_DECL ('dbpedia-owl', 'http://dbpedia.org/ontology/', 2);
This just tells Virtuoso that, locally, the dbpedia-owl prefix will be used to denote the DBpedia ontology URL. There is no such thing as a universal prefix, so you may also use any other prefix, like dbpo or whatever you see fit, on your local Virtuoso server.
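Either way, the most portable fix is to declare the prefix inside each query rather than relying on the server-side namespace list. A minimal sketch using SPARQLWrapper (not something the original question used, just an illustration), assuming a default local Virtuoso install at localhost:8890; dbpedia-owl:AcademicJournal is one of the classes shown in the result above:
from SPARQLWrapper import SPARQLWrapper, JSON

# Declare the prefix inside the query itself, so it resolves no matter what
# the server-side namespace declarations contain.
sparql = SPARQLWrapper("http://localhost:8890/sparql")
sparql.setQuery("""
    PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
    SELECT ?journal
    WHERE { ?journal a dbpedia-owl:AcademicJournal . }
    LIMIT 5
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["journal"]["value"])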


How can I create a downloadable vCard on my Flask web page?

I would like to create a .vcf file on my website that users can download and import into the contacts app on their mobile phones.
So far I have made this:
Download
When I click the link it downloads a .vcf file. When I open it, it redirects me to my contacts app and throws this error: "No importable cards were found." That's because I haven't set any information in the vCard. I would like to know how I can create a vCard with the information I have in my SQLAlchemy database (name, email, phone number, website, etc.). Thanks in advance.
I had to solve this problem recently for work. Here is how I did it!
The broad strokes: I created a Jinja2 template for the vCard output based on my team's needs, a data model to lay over the template, a service to render the template from the database query, and finally used io.BytesIO and flask.send_file to transmit the rendered data to the user as a file.
The gist linked above doesn't include the more contextual parts of the implementation, but it does show how to wire up Flask to do this.
Edit: I evaluated the vobject library that I found recommended, but I honestly didn't find it intuitive to use or very Pythonic, so it wasn't something I wanted to depend on in my code base. However, maybe it will work better for you (or others).
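For reference, here is a minimal sketch of that approach under stated assumptions: the route, template fields, and hard-coded contact stand in for the answerer's actual code and database model.
import io

from flask import Flask, send_file
from jinja2 import Template

app = Flask(__name__)

# Hypothetical vCard 3.0 template; the fields are placeholders, not the
# answerer's actual template. vCard lines must end with CRLF.
VCARD_TEMPLATE = Template(
    "BEGIN:VCARD\r\n"
    "VERSION:3.0\r\n"
    "N:{{ last_name }};{{ first_name }};;;\r\n"
    "FN:{{ first_name }} {{ last_name }}\r\n"
    "EMAIL;TYPE=INTERNET:{{ email }}\r\n"
    "TEL;TYPE=CELL:{{ phone }}\r\n"
    "URL:{{ website }}\r\n"
    "END:VCARD\r\n"
)

@app.route("/contacts/<int:contact_id>/vcard")
def download_vcard(contact_id):
    # In a real app these values would come from an SQLAlchemy query, e.g.
    # contact = Contact.query.get_or_404(contact_id); hard-coded here.
    contact = {
        "first_name": "Jane",
        "last_name": "Doe",
        "email": "jane@example.com",
        "phone": "+1-555-0100",
        "website": "https://example.com",
    }
    buffer = io.BytesIO(VCARD_TEMPLATE.render(**contact).encode("utf-8"))
    return send_file(
        buffer,
        mimetype="text/vcard",
        as_attachment=True,
        download_name="contact.vcf",  # use attachment_filename= on Flask < 2.0
    )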

SPARQL query on multiple RDF files

I have some programming basics, but I am completely new to RDF and SPARQL, so I hope what follows is clear.
I am trying to download some data available at http://data.camera.it/data/en/datasets/; all the data are provided as RDF/XML, organized according to an ontology.
I noticed this website has an online SPARQL query editor (http://dati.camera.it/sparql), and using some of their examples I was able to retrieve and convert some of the data I need with Python. I used the following code and query, using SPARQLWrapper:
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://dati.camera.it/sparql")
sparql.setQuery(
'''
SELECT distinct ?deputatoId ?cognome ?nome ?data ?argomento ?titoloSeduta ?testo
WHERE {
?dibattito a ocd:dibattito; ocd:rif_leg <http://dati.camera.it/ocd/legislatura.rdf/repubblica_17>.
?dibattito ocd:rif_discussione ?discussione.
?discussione ocd:rif_seduta ?seduta.
?seduta dc:date ?data; dc:title ?titoloSeduta.
?seduta ocd:rif_assemblea ?assemblea.
?discussione rdfs:label ?argomento.
?discussione ocd:rif_intervento ?intervento.
?intervento ocd:rif_deputato ?deputatoId; dc:relation ?testo.
?deputatoId foaf:firstName ?nome; foaf:surname ?cognome .
}
ORDER BY ?data ?cognome ?nome
LIMIT 100
'''
)
sparql.setReturnFormat(JSON)
results_raw = sparql.query().convert()
However, I have a problem: the endpoint only returns up to 10,000 results. As far as I understand, this limit cannot be changed.
Therefore I decided to download the datasets to my computer. I tried to work with all these RDF files, but I don't know how to do it since, as far as I know, SPARQLWrapper does not work with local files.
So my questions are:
How do I create a dataset containing all the RDF files so that I can work on them as if they were a single object?
How do I query on such an object to retrieve the information I need? Is that possible?
Is this way of reasoning the right approach?
Any suggestion on how to tackle the problem is appreciated.
Thank you!
Download all the RDF/XML files from their download area and load them into a local instance of Virtuoso (which happens to be the engine they use for their public SPARQL endpoint). You will have the advantage of running a much more recent version (v7.2.5.1 or later, whether Open Source or Enterprise Edition) than the one they've got (Open Source v7.1.0, from March 2014!).
Use your new local SPARQL endpoint, found at http://localhost:8890/sparql by default. You can configure it to have no limits on result set size, query runtime, or otherwise.
Seems likely.
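Once the files are bulk-loaded, the Python side barely changes: point SPARQLWrapper at the local endpoint instead of the public one. A minimal sketch, assuming a default local Virtuoso installation; the count query is just a quick sanity check that the data landed:
from SPARQLWrapper import SPARQLWrapper, JSON

# Same client code as before, but against the local Virtuoso endpoint,
# which has no enforced 10,000-row cap once configured accordingly.
sparql = SPARQLWrapper("http://localhost:8890/sparql")
sparql.setQuery("""
    SELECT (COUNT(*) AS ?triples)
    WHERE { ?s ?p ?o }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
print(results["results"]["bindings"][0]["triples"]["value"])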
(P.S. You might encourage the folks at dati.camera.it (assistenza-dati#camera.it) to upgrade their Virtuoso instance. There are substantial performance and feature enhancements awaiting!)

Loading data into a BigQuery partitioned table through Google Dataflow/Beam with write_truncate

Our existing setup used to create a new table for each day, which worked fine with the "WRITE_TRUNCATE" option. However, when we updated our code to use a partitioned table through our Dataflow job, it wouldn't work with write_truncate.
It works perfectly fine with the write disposition set to "WRITE_APPEND". From what I understood, with WRITE_TRUNCATE Beam may try to delete the table and then recreate it, and since I'm supplying the table decorator it fails to create a new table.
Sample Python snippet:
beam.io.Write(
    'Write({})'.format(date),
    beam.io.BigQuerySink(
        output_table_name + '$' + date,
        create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))
which gives the error:
Table IDs must be alphanumeric
since it tried to recreate the table, and we supply the partition decorator in the argument.
Here are some of the things that I've tried:
Setting the write_disposition to WRITE_APPEND; although it works, it defeats the purpose, since running again for the same date would duplicate data.
Using
bq --apilog /tmp/log.txt load --replace --source_format=NEWLINE_DELIMITED_JSON 'table.$20160101' sample_json.json
command to see whether I could observe in the logs how truncate actually works, based on a link that I found.
I tried some other links, but those also use WRITE_APPEND.
Is there a way to write to a partitioned table from a Dataflow job using WRITE_TRUNCATE?
Let me know if any additional details are required.
Thanks
It seems this is not supported at this time. Credit goes to @Pablo for finding out from the IO dev.
According to the Beam documentation on the GitHub page, their JIRA page would be the appropriate place to request such a feature. I'd recommend filing a feature request there and posting a link in a comment here so that others in the community can follow along and show their support.
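Until such a feature lands, one possible stopgap (my own suggestion, not something from the Beam docs or the answer above) is to clear the target partition before the pipeline runs and keep WRITE_APPEND in the sink, so reruns for the same date do not duplicate data:
import subprocess

# Hypothetical names; adjust to your dataset/table and the date being reloaded.
table = "my_dataset.my_table"
date = "20160101"

# 'bq rm' accepts the $YYYYMMDD partition decorator, so this removes only the
# partition for that day (-f skips the confirmation prompt, -t marks a table).
subprocess.check_call(["bq", "rm", "-f", "-t", "{}${}".format(table, date)])

# ...then run the Dataflow job with the sink left on WRITE_APPEND:
#   beam.io.BigQuerySink(
#       output_table_name + '$' + date,
#       create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
#       write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)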

The Python script configuration for a BigQuery job requires a sourceUri value, but there is no sourceUri

I'm attempting to write a Python script for a Google BigQuery job, following the configuration guidelines found in the job configuration properties. The reference indicates that the parameter configuration.query.tableDefinitions.(key).sourceUris[] is required, described as "The fully-qualified URIs that point to your data in Google Cloud Storage." However, the query I'm submitting runs on a dataset within BigQuery, not on data in Cloud Storage. I've tried leaving the format parameter empty or pointing it to a storage location where I have other tables, but the script still throws an error. Can anyone tell me the proper way to handle this?
The configuration.query.tableDefinitions parameter should be optional. If you are querying only data stored in BigQuery tables, then you should be able to omit the entire tableDefinitions parameter. The sourceUris parameter should only be required if a tableDefinitions object is present.
https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.query.tableDefinitions
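For illustration, a query job against data that already lives in BigQuery can omit tableDefinitions entirely. A minimal sketch using the google-api-python-client against the same v2 REST API the question references (project, dataset, and table names are placeholders, and credential setup is left out):
from googleapiclient.discovery import build

# Build the BigQuery v2 client; authentication setup is omitted here.
bigquery = build("bigquery", "v2")

job_body = {
    "configuration": {
        "query": {
            # Query data already stored in BigQuery: no tableDefinitions and
            # therefore no sourceUris are needed.
            "query": "SELECT name FROM [my_dataset.my_table] LIMIT 10",
        }
    }
}

response = bigquery.jobs().insert(projectId="my-project", body=job_body).execute()
print(response["jobReference"]["jobId"])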

Error retrieving Unicode table data using Azure/Python

I'm using Azure and the python SDK.
I'm using Azure's table service API for DB interaction.
I've created a table which contains Unicode data (Hebrew, for example). Creating tables and setting Unicode data seems to work fine; I'm able to view the data in the database using Azure Storage Explorer and it is correct.
The problem is when retrieving the data. Whenever I retrieve a specific row, retrieval works fine even for Unicode data:
table_service.get_entity("some_table", "partition_key", "row_key")
However, when trying to get a number of records using a filter, an encoding exception is thrown for any row that has non-ASCII characters in it:
tasks = table_service.query_entities('some_table', "PartitionKey eq 'partition_key'")
Is this a bug in the Azure Python SDK? Is there a way to set the encoding beforehand so that it won't crash? (Azure doesn't give access to sys.setdefaultencoding, and using DEFAULT_CHARSET in settings.py doesn't work either.)
I'm using https://www.windowsazure.com/en-us/develop/python/how-to-guides/table-service/ as a reference for the table service API.
Any idea would be greatly appreciated.
This looks like a bug in the Python library to me. I whipped up a quick fix and submitted a pull request on GitHub: https://github.com/WindowsAzure/azure-sdk-for-python/pull/59.
As a workaround for now, feel free to clone my repo (remembering to checkout the dev branch) and install it via pip install <path-to-repo>/src.
Caveat: I haven't tested my fix very thoroughly, so you may want to wait for the Microsoft folks to take a look at it.
