Similar to Is there a query language for JSON? and the more specific How can I filter a YAML dataset with an attribute value? - I would like to:
hand-edit small amounts of data in YAML files
perform arbitrary queries on the complete dataset (probably in Python, open to other ideas)
work with the resulting subset in Python
It doesn't appear that PyYAML has a feature like this, and today I can't find the link I had to the YQuery language, which wasn't a mature project anyway (or maybe I dreamt it).
Is there a (Python) library that offers YAML queries? If not, is there a Pythonic way to "query" a set of objects other than just iterating over them?
I don't think there is a direct way to do it. But PyYAML reads YAML files into a dict representing everything in the file. Afterwards you can perform all dict-related operations on it. The question python query keys in a dictionary based on values mentions some Pythonic "query" styles.
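For example, a minimal sketch (the file name and keys are made up) of loading a YAML file with PyYAML and "querying" it with a list comprehension:

import yaml

# load the YAML file into plain Python objects (usually a list of dicts)
with open("people.yaml") as f:
    records = yaml.safe_load(f)

# "query" the parsed data with an ordinary list comprehension
adults = [r for r in records if r.get("age", 0) >= 18]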
bootalchemy provides a means to do this via SQLAlchemy. First, define your schema in a SQLAlchemy model. Then load your YAML into a SQLAlchemy session using bootalchemy. Finally, perform queries on that session. (You don't have to commit the session to an actual database.)
Example from the PyPI page (assume model is already defined):
from bootalchemy.loader import Loader

# (this simulates the data that would be loaded from YAML)
data = [
    {'Genre': [
        {'name': 'action',
         'description': 'Car chases, guns and violence.'}
    ]}
]

# load the YAML data into the session using the pre-defined model
loader = Loader(model)
loader.from_list(session, data)

# query the SQLAlchemy session
genres = session.query(Genre).all()

# print the results
print [(genre.name, genre.description) for genre in genres]
Output:
[('action', 'Car chases, guns and violence.')]
You could try jsonpath. Yes, it's meant for JSON, not YAML, but as long as you have JSON-compatible data structures it should work, because you're operating on the parsed data, not on the JSON or YAML representation. (It seems to work with the Python libraries jsonpath and jsonpath-rw.)
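A rough sketch with jsonpath-rw, assuming a made-up books.yaml file with a store/book structure:

import yaml
from jsonpath_rw import parse

with open("books.yaml") as f:
    data = yaml.safe_load(f)

# run a JSONPath expression against the parsed (JSON-compatible) data
matches = parse("$.store.book[*].author").find(data)
authors = [m.value for m in matches]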
You can check the following tools:
yq for CLI queries, like with jq,
yaml-query, another CLI query tool written in Python.
I'm looking for the most effective way of loading Google Analytics data, which is represented as JSON files with a nested object structure, into a relational database in parallel, in order to collect and analyze these statistics later.
I've found pandas.io.json.json_normalize, which can flatten nested data into a flat structure; there is also a pyspark solution that converts JSON to a dataframe as described here, but I'm not sure about performance issues.
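For context, a minimal sketch of that flattening step (the field names are invented, not the real Google Analytics schema):

from pandas.io.json import json_normalize

records = [
    {"id": 1, "totals": {"sessions": 10, "pageviews": 42}, "geo": {"country": "US"}},
    {"id": 2, "totals": {"sessions": 3, "pageviews": 7}, "geo": {"country": "DE"}},
]

flat = json_normalize(records)    # columns like "totals.sessions", "geo.country"
# flat.to_sql("ga_stats", engine)  # then bulk-load the flat frame into the RDBMS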
Can you describe the best ways of loading data from the Google Analytics API into an RDBMS?
I think this question can best be answered when we have more context about what data you want to consume and how you'll be consuming it. For example, if you will be consuming only a few of all the available fields, then it makes sense to store only those; or if you'll be using some specific field as an index, then maybe we can index that field as well.
One thing that I can recall off the top of my head is the JSON type in Postgres, as it's built in and has several helper functions for operating on the data later on.
References:
https://www.postgresql.org/docs/9.3/datatype-json.html
https://www.postgresql.org/docs/9.3/functions-json.html
If you can update here with what decision you take, it would be great to know.
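As a rough illustration of that Postgres JSON approach (the connection string, table and column names below are placeholders, using psycopg2):

import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=analytics")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS ga_raw (id serial PRIMARY KEY, payload json)")

record = {"totals": {"sessions": 10}, "geo": {"country": "US"}}
cur.execute("INSERT INTO ga_raw (payload) VALUES (%s)", [Json(record)])

# query inside the stored JSON with Postgres' -> / ->> operators
cur.execute("SELECT payload->'totals'->>'sessions' FROM ga_raw")
print(cur.fetchall())
conn.commit()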
I already have an OWL ontology which contains classes, instances and object properties. How can I map them to a relational database such as MySQL using Python (I prefer Python)?
For example, an ontology can contain the classes "Country" and "City" and instances like "United States" and "NYC".
So I need to manage to store them in relational database tables. I would like to know if there are some Python libraries to do so.
If I understand you well, I think you could use SQLite with Python. SQLite is great because you just have to import the library with:
import sqlite3
And then, there is no need for a server. Things are stored in a file, generally ending with .db
Have a look at the docs, the examples are helpful: https://docs.python.org/2/library/sqlite3.html
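For example, one possible (hand-rolled, not standard) mapping of classes and instances to SQLite tables could look like this:

import sqlite3

conn = sqlite3.connect("ontology.db")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS classes (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE IF NOT EXISTS instances ("
            "id INTEGER PRIMARY KEY, name TEXT, "
            "class_id INTEGER REFERENCES classes(id))")

cur.execute("INSERT INTO classes (name) VALUES (?)", ("City",))
cur.execute("INSERT INTO instances (name, class_id) VALUES (?, ?)", ("NYC", cur.lastrowid))
conn.commit()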
EDIT: To review or create your database and tables, I advise you to use sqlitebrowser, which is light and easy to use: http://sqlitebrowser.org/
Use the right tool for the job. You're using RDF; the fact that it contains OWL axioms is immaterial, and you want to store and query it. Use an RDF database. They're optimized for storing and querying RDF. It's a waste of your time to home-grow storage and querying in MySQL when other folks have already figured out how best to do this.
As an aside, there is a way to map RDF to a relational database. There's a formal specification for this called R2RML.
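Even without a dedicated triple store, a small sketch with rdflib shows the idea of querying the ontology directly; the file name and class IRI below are made up:

from rdflib import Graph

g = Graph()
g.parse("ontology.owl", format="xml")   # rdflib parses RDF/XML, Turtle, etc.

# SPARQL query against the loaded graph
results = g.query("""
    SELECT ?city WHERE { ?city a <http://example.org/City> }
""")
for row in results:
    print(row.city)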
I would like to give my users the possibility to store unstructured data in JSON format, alongside the structured data, via an API generated with Ramses.
Since the data is made available via Elasticsearch, I'm trying to make sure this data is indexed and searchable, too.
I can't find any mention of this in the docs or by searching.
Would this be possible and how would one do it?
Cheers /Carsten
I put an answer here because I needed to give several docs links and this is a new SO account limited to a couple: https://gitter.im/ramses-tech/ramses?at=56bc0c7a4dfe1fa71ffc0b61
This is Chris's answer, copied from gitter.im:
You can use the dict field type for "unstructured data", as it takes arbitrary json. If the db engine is postgres, it uses jsonfield under the hood, and if the db engine is mongo, it's converted to a bson document as usual. Either way it should index automatically as expected in ES and will be queryable through the Ramses API.
The following ES queries are supported on documents/fields: nefertari.readthedocs.org/en/stable/making_requests.html#query-syntax-for-elasticsearch
See the docs for field types here, start at the high level (ramses) and it should "just work", but you can see what the code is mapped to at each level below down to the db if desired:
ramses: ramses.readthedocs.org/en/stable/fields.html
nefertari (underlying web framework): nefertari.readthedocs.org/en/stable/models.html#wrapper-api
nefertari-sqla (postgres-specific engine): nefertari-sqla.readthedocs.org/en/stable/fields.html
nefertari-mongodb (mongo-specific engine): nefertari-mongodb.readthedocs.org/en/stable/fields.html
Let us know how that works out, sounds like it could be a useful thing. So far we've just used that field type to hold data like user settings that the frontend wants to persist but with which the API isn't concerned.
I know in JavaScript there is the stringify command, but is there something like this in Python for Pyramid applications? Right now I have a view callable that takes an uploaded STL file and parses it into a format like this: data = [[[x1,x2,x3],...],[[v1,v2,v3],...]]. How can I convert this into a JSON string so that it can be stored in an SQLite database? Can I insert the JavaScript stringify command into my views.py file? Is there an easier way to do this?
You can use the json module to do this:
import json
data_str = json.dumps(data)
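When you later read the string back out of the database, json.loads restores the nested lists:

restored = json.loads(data_str)   # back to [[[x1, x2, x3], ...], [[v1, v2, v3], ...]]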
There are other array representations that can be stored in a database as well (see pickle).
However, if you're actually constructing a database, you should know that it's considered a violation of basic database principles (first normal form) to store multiple data in a single value in a relational database. What you should do is decompose the array into rows (and possibly separate tables) and store a single value in each "cell". That will allow you to query and analyze the data using SQL.
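A rough sketch of that decomposition, assuming the outer list holds groups of (x, y, z) triples and using made-up table and column names:

import sqlite3

data = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
        [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]

conn = sqlite3.connect("model.db")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS triples ("
            "group_idx INTEGER, row_idx INTEGER, x REAL, y REAL, z REAL)")
for g, group in enumerate(data):
    for r, (x, y, z) in enumerate(group):
        cur.execute("INSERT INTO triples VALUES (?, ?, ?, ?, ?)", (g, r, x, y, z))
conn.commit()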
If you're not trying to build an actual database (if the array is completely opaque to your application and you'll never want to search, sort, aggregate, or report by the values inside the array) you don't need to worry so much about normal form but you may also find that you don't need the overhead of an SQL database.
You can also use cjson; it is faster than the json library.
import cjson
json_str = cjson.encode(data)   # encodes a Python object to a JSON string
I developed a web platform in PHP a year ago, and I was kinda proud of the data access layer I wrote for it. Since then, I started re-using the same concept over and over. But now I'm thinking of taking it to the next level: instead of re-writing the whole database access code, I'd like to create a tool that will parse my SQL schema and generate the DAL classes by itself.
The information needed from the SQL schema in order to generate the code is:
* Tables
* Fields
* Fields types
* Foreign keys
Indeed, I looked up some SQL parsers and found some options, but I ended up deciding to do this differently. Instead of generating the code from the SQL schema itself, I'd generate it from metadata that I'd create according to the database's real schema.
I thought of something like:
TableName[
FieldA : Type;
FieldB: Type;
]
TableName2[
FieldA : Type, FK(TableName.FieldA);
FieldZ: Type;
]
This is not a spec at all, it's just a quick thinking result that says what kind of stuff I'd like to achieve.
The question now is:
Does Python have some built-in API, or maybe some 3rd-party library I could use, to parse a format that would let me define my schema as stated above?
I don't want to reinvent the wheel, and I'm not interested at all in writing my own parser; all I want is a basic, working tool ASAP.
Thanks
The immediate thought would be to simply use regular Python syntax to define your tables:
{
    'TableName': {'FieldA': ['Type', FK(..)], 'FieldB': ['type']}
}
and so on.
You could, however, have a look at how Django does it: you define a class and add properties to that class, which then represents your model. This model can then be used to generate the SQL statements, and is also valid - and easily extendable - Python code.
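A very small sketch of that class-based style (the Field helper here is invented for illustration, not Django's actual API):

class Field:
    def __init__(self, sql_type, fk=None):
        self.sql_type = sql_type
        self.fk = fk                 # e.g. "TableName.FieldA"

class TableName:
    FieldA = Field("INT")
    FieldB = Field("VARCHAR(255)")

class TableName2:
    FieldA = Field("INT", fk="TableName.FieldA")
    FieldZ = Field("TEXT")

# the generator can then introspect the classes, e.g.:
for name, field in vars(TableName2).items():
    if isinstance(field, Field):
        print(name, field.sql_type, field.fk)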
Another suggestion would be to use a JSON structure to represent your data, and then write some code to parse that. This would be similar to using the existing Python syntax, but would be easier to parse in other languages (the example given above is almost valid JSON syntax out of the box; replace ' with ").
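For completeness, a minimal sketch of the JSON variant (the schema content mirrors the example above and is purely illustrative):

import json

schema = json.loads('''
{
  "TableName":  {"FieldA": ["Type"], "FieldB": ["Type"]},
  "TableName2": {"FieldA": ["Type", "FK(TableName.FieldA)"], "FieldZ": ["Type"]}
}
''')

for table, fields in schema.items():
    print(table, fields)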