CKAN - Different datasets - python

I am starting to get involved with CKAN. So far I have worked through some of the tutorials, and I am currently installing some of the available extensions.
Does anybody know whether there is an extension for customizing dataset metadata fields according to differences between data sources?
For example:
When uploading text files or documents such as PDFs, I want only 5 specific metadata fields to be requested.
When uploading CSV files with coordinate fields (georeferenced data), I want 10 metadata fields to be requested, and these could be different fields from the PDF ones.
In fact, I would like to add a new page where the user first specifies the type of the data source, and the application then asks only for the fields that are necessary for that type.
I have seen in the tutorial how to customize a schema with some extra metadata fields, but I don't know how to work with several different metadata schemas. This extension could also be useful for customizing dataset fields.
Does anyone have an idea how to use different schemas depending on the type of a dataset?
Thanks for helping me :)
Jordi.

I think the ckanext-scheming extension gives you everything you want.
As you can see in its documentation, you can specify different schemas according to your needs, for example:
the "Camel Photos" example schema
the standard dataset schema
Feel free to create your own customized schema with exactly the fields that you need.
Once you have your schema (in fact you will want to create two different ones, one for the text files and one for the georeferenced CSVs), you can simply use the generated form to enter those specific types of datasets.
The important bit here is that you specify a new type of dataset in the schema, e.g. "dataset_type": "my-custom-text-dataset". If everything is configured as it should be, you can find and add your datasets at http://my-ckan-instance.com/my-custom-text-dataset
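For illustration, here is a minimal sketch of such a schema, written as a small Python script that writes the JSON file ckanext-scheming reads. The presets used are the ones shipped with the extension; the extra field names, the output file name and the config line are only examples, not anything taken from the question above.

import json

# Minimal ckanext-scheming schema sketch for the text/PDF case; the extra
# field names below are illustrative.
text_dataset_schema = {
    "scheming_version": 1,
    "dataset_type": "my-custom-text-dataset",
    "about": "Text/PDF uploads with a reduced set of metadata fields",
    "dataset_fields": [
        {"field_name": "title", "label": "Title", "preset": "title"},
        {"field_name": "name", "label": "URL", "preset": "dataset_slug"},
        {"field_name": "notes", "label": "Description", "form_snippet": "markdown.html"},
        {"field_name": "author", "label": "Author"},
        {"field_name": "document_language", "label": "Document language"},
    ],
    "resource_fields": [
        {"field_name": "url", "label": "File", "preset": "resource_url_upload"},
        {"field_name": "format", "label": "Format", "preset": "resource_format_autocomplete"},
    ],
}

with open("my_custom_text_dataset.json", "w") as f:
    json.dump(text_dataset_schema, f, indent=2)

# Then point CKAN at this schema (plus a second one for the georeferenced CSVs):
# scheming.dataset_schemas = ckanext.myext:my_custom_text_dataset.json ckanext.myext:georef_csv.json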

Related

How to inherit DBT source documentation in the models

I have multiple dbt models that read from the same source database, and despite having written documentation (descriptions, tests, etc.) for that source, none of the column descriptions carry over to the models.
I'd like the descriptions I wrote for each of the source columns to be copied to their respective columns in each model.
Is this possible?
Serving the docs with descriptions written for the columns in the source table produced a properly documented source, but none of that documentation appeared on the models based on that source.
I was hoping I wouldn't have to duplicate the descriptions for columns reused between multiple models.

Django data quality monitoring: best approach to 'construct' queries based on a settings file?

EDIT 28/04/2021
No luck with my question so far ;) Does no one deal with this problem?
I want to develop an app for automatic data quality monitoring (DQM) based on different kinds of queries:
missing data
outliers
inconsistent data within the same model
inconsistent data between models
missing forms
Users should be able to 'customize' their own queries in a structured settings file (either an Excel file loaded into a Django model, or directly in a Django model through a UI).
The DQM app should read this settings file and run the queries, with the results stored in a model.
Users can then review the list of query results and make corrections in the database in order to resolve issues and improve data quality.
I looked for a Django package that already does this but could not find one, so I would appreciate some help with the design.
I have designed the settings file as below:
I've read about data quality with pandas, but found nothing that covers all the data quality queries mentioned above.
Nevertheless, pandas could be used to read the Excel settings file into a dataframe.
Then I need to 'construct' the queries based on the settings file. I thought about two options:
use raw SQL: concatenate SQL statements with the data read from the dataframe and pass the resulting SQL to raw()
use Django querysets with dictionary unpacking to provide keyword arguments: qs = mymodel.objects.filter(**kwargs)
Is there a cleaner way to achieve this data quality monitoring?
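A minimal sketch of the second option, assuming each row of the Excel settings file provides a model label, a field lookup, a comparison value and a rule name; the file name, the column names and the Anomaly results model are illustrative, not part of any existing package:

import pandas as pd
from django.apps import apps

from dqm.models import Anomaly  # hypothetical model used to store query results

# Assumed columns in the settings file: model, lookup, value, rule_name
settings_df = pd.read_excel("dqm_settings.xlsx")

for row in settings_df.itertuples():
    model = apps.get_model(row.model)   # e.g. "patients.Visit"
    kwargs = {row.lookup: row.value}    # e.g. {"age__gt": 120}
    for obj in model.objects.filter(**kwargs):
        Anomaly.objects.get_or_create(
            rule_name=row.rule_name,
            model_label=row.model,
            object_pk=obj.pk,
        )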

How to initialize django objects automatically for the first time?

This is my first Django application and I have looked all over the place to find an answer, to no avail.
I created my models and now need to initialize the values of one of the classes. I could do it through the admin page, one by one, but I want anyone using my application to be able to load it for the first time and have all the correct objects (and the associated records in the database) created automatically.
Please help.
If you want to populate the database, check the Django documentation on providing initial data. You can use JSON, XML or YAML (with PyYAML installed). I think this is what you are looking for, although your question is not entirely clear.
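Another route covered by that initial data documentation (in recent Django versions) is a data migration, which creates the rows automatically the first time migrate runs. A minimal sketch with an illustrative app and model name, starting from an empty migration generated with manage.py makemigrations --empty myapp:

from django.db import migrations


def create_initial_categories(apps, schema_editor):
    # Use the historical model rather than a direct import, as the docs recommend.
    Category = apps.get_model("myapp", "Category")
    for name in ("Books", "Music", "Films"):
        Category.objects.get_or_create(name=name)


class Migration(migrations.Migration):

    dependencies = [
        ("myapp", "0001_initial"),
    ]

    operations = [
        migrations.RunPython(create_initial_categories, migrations.RunPython.noop),
    ]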

Where to put large configuration data for a general purpose Django app

Tl;dr: where do Django users normally put fairly bulky configuration data that is completely site-specific (so there is no possibility of sensible default values)?
Longer version: I'm most of the way through writing a Django app for loading or updating models in bulk from data in CSV files. It is general, in that it can load or update any set of fields of any model (provided the configuration permits it!), with the user able to indicate which columns of the CSV correspond to which fields. At present it is accessed through what might become a large number of entries in urls.py, of the form
url(r'^loadcsv/customers/',
    SpreadsheetLoadView(config={
        "model": 'customer.Customer',
        "fields": ('name', 'country', ...),
        "otherstuff": ...,
    }).view()),
In passing, if you are wondering about 'customer.Customer', look up django.apps and apps.get_model.
Anyway, this is less than ideal, because it requires one to know the right URL for any particular upload configuration, or alternatively to maintain an HTML file containing links to the entire set of currently available operations. I can see myself making a mistake late at night, invoking "customerload" when I meant "customerxxxx", which does something different but also operates on Customer objects. One is also advised to keep logic out of URLconfs, although whether configuration data like this counts as "logic" is debatable. It would be much nicer to have a single URL pointing at a view that handles all currently available CSV operations via a configuration structure like
config = {
    "customerload": {
        "verbose_name": 'load new Customer definitions',
        "model": 'customer.Customer',
        "fields": ('name', 'country', ...),
        "otherstuff": ...,
    },
    "someotherload": {
        ...
    },
    ...
}
Then the first screen would involve selecting a configuration and a CSV file, and the next screen would specify the mapping of columns to fields.
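As a sketch of that single-URL idea, assuming the config dict above lives in spreadsheetload/config.py and using made-up template names:

from django.conf.urls import url
from django.shortcuts import render

from spreadsheetload.config import config


def choose_operation(request):
    # First screen: pick one of the configured CSV operations and a file.
    return render(request, "spreadsheetload/choose.html",
                  {"operations": sorted(config)})


def run_operation(request, operation):
    # Second screen: the selected configuration drives the column-to-field mapping.
    return render(request, "spreadsheetload/map_columns.html",
                  {"config": config[operation]})


urlpatterns = [
    url(r'^loadcsv/$', choose_operation),
    url(r'^loadcsv/(?P<operation>[\w-]+)/$', run_operation),
]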
Where is the best place to put this? There are no sensible default values for this stuff; it is very site-specific and will doubtless be subject to fairly frequent reconfiguration. Should it be stored in spreadsheetload/config.py and imported into spreadsheetload/SpreadsheetLoadView.py? Or is there some other place and/or form conventionally used for bulky site-specific configuration?

Export PDF with pre-filled and unfilled editable fields

I am trying to print out custom commercial invoices based on a known set of data and allowing for an unknown set of data. My known data includes addresses, general contact information, etc.
What I want is to be able to use a PDF template of a "commercial invoice" and have the known data auto-populated into the form where available. Then the user can download the (incomplete) PDF and fill in the empty or optional form fields with their own information - things like tax ID, recipient care-of names, internal tracking IDs, etc.
How can I use JSON/XML + Python + HTML + a PDF template to auto-fill some fields and leave others empty, on an editable PDF form?
Thanks!
You essentially want server-side filling of the form.
There are several possible approaches.
An industrial-strength approach would be to use a dedicated application that can be called via the command line (FDFMerge by Appligent comes to mind); it is very easy to integrate, as all you'd have to do is assemble the FDF data and then the command string.
Another approach is to use one of the PDF libraries out there (iText, pdflib or Adobe's PDF Library come to mind here). In this case you have considerably more programming effort, but may get somewhat more flexibility.
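Since the question mentions Python, here is a minimal sketch of the library approach using pypdf (not one of the libraries named above); the template path and the field names are made up and must match the AcroForm field names in your own template:

from pypdf import PdfReader, PdfWriter

reader = PdfReader("commercial_invoice_template.pdf")
writer = PdfWriter()
writer.append(reader)  # copy all pages, keeping the form fields

# Known data only; any field left out (e.g. a tax ID) stays empty and editable.
known_data = {
    "shipper_name": "Acme Corp",
    "shipper_address": "1 Example Street, Example City",
    "contact_email": "shipping@example.com",
}

for page in writer.pages:
    writer.update_page_form_field_values(page, known_data)

with open("invoice_prefilled.pdf", "wb") as f:
    writer.write(f)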
