We have our infrastructure up in AWS, which includes a database.
We transfer data in Python using the SQLAlchemy ORM, which we use to mirror the database schema. At this point the schema is very simple, so it's no big deal.
But if the schema changes or grows, a manual change has to be made in the code as well each time.
I was wondering: what is the proper way to centralize the schema of the database, so that there is one source of truth for it?
Check out the Glue Schema Registry - this is pretty much what it's made for.
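For illustration, a minimal sketch of registering a schema version with boto3 (the registry name, schema name, and Avro definition below are all placeholders, not anything from your setup):

    import json
    import boto3

    # Assumes AWS credentials and a default region are already configured.
    glue = boto3.client("glue")

    # Example Avro definition; a real one would describe your own records.
    schema_definition = json.dumps({
        "type": "record",
        "name": "Item",
        "fields": [
            {"name": "id", "type": "long"},
            {"name": "name", "type": "string"},
        ],
    })

    # Register a new version of an existing schema in the registry.
    glue.register_schema_version(
        SchemaId={
            "RegistryName": "my-registry",  # placeholder
            "SchemaName": "item-schema",    # placeholder
        },
        SchemaDefinition=schema_definition,
    )

Producers and consumers can then fetch the latest version from the registry instead of hard-coding the schema in each codebase.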
I have a BigQuery table with about 200 rows, and I need to insert, delete and update values in it through a web interface (the table cannot be migrated to any other relational or non-relational database).
The web application will be deployed on Google Cloud App Engine. The user who has admin/owner privileges on BigQuery will be able to create and delete records, while the other users, who have view permissions on the BigQuery dataset, will be able to view records only.
I am planning to use Python as the scripting language,
with a server framework (Django, Flask, or any other) -> not sure which one is better.
The web application should be displayed as a data grid, with create, delete and view buttons whose visibility depends on the user's role.
I have not done anything like this with Python, BigQuery and Django. I am already familiar with calling BigQuery from the Python client, but calling it from a web interface and in a transactional way is totally new to me.
I am only seeing examples of Django with its built-in models, not with BigQuery.
Can anyone please help me and clarify whether this is possible to implement, and how?
I was able to achieve all of CRUD on BigQuery with the help of SQLAlchemy, though I had to make a lot of concessions. If I use a SQLAlchemy mapped class, I need to use a fake primary key, as BigQuery does not use primary keys, and for storing sessions in Django I needed to use file-based sessions. For updates and creates, SQLAlchemy does not work without a primary key, so I used the raw SQL part of SQLAlchemy. Thanks to @mhawke who provided the hint for me to carry out this exercise.
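For reference, a rough sketch of what that ends up looking like, assuming the sqlalchemy-bigquery dialect and made-up project, dataset, and table names:

    from sqlalchemy import Column, Integer, String, create_engine, text
    from sqlalchemy.orm import declarative_base, sessionmaker

    # "my-project" and "my_dataset" are placeholders for the GCP project and dataset.
    engine = create_engine("bigquery://my-project/my_dataset")
    Session = sessionmaker(bind=engine)
    Base = declarative_base()

    class Record(Base):
        __tablename__ = "records"
        # BigQuery has no primary keys; this column exists only to satisfy SQLAlchemy.
        id = Column(Integer, primary_key=True)
        name = Column(String)

    with Session() as session:
        # Reads work through the ORM as usual.
        rows = session.query(Record).all()

        # Creates/updates go through raw SQL, since the fake key can't be relied on.
        session.execute(
            text("UPDATE my_dataset.records SET name = :name WHERE id = :id"),
            {"name": "new name", "id": 1},
        )
        session.commit()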
No, at most you could achieve the "R" of "CRUD." BigQuery isn't a transactional database, it's for querying vast amounts of data and preparing the results as an immutable view.
It doesn't provide a method to modify the source data directly, and even if you did modify it you'd need to run the query again. Also important to note is that queries are asynchronous and take much longer to run than in traditional databases.
The only reasonable solution would be to export the table data to GCS and then import it into a normal database for querying. Alternatively, since you can't use another database and you said there are only a couple of hundred rows, you could perform your CRUD actions directly on that exported CSV.
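A minimal sketch of that export step with the official BigQuery client (the table ID and bucket below are placeholders):

    from google.cloud import bigquery

    client = bigquery.Client()

    # Placeholders: replace with your own project, dataset, table, and bucket.
    table_id = "my-project.my_dataset.my_table"
    destination_uri = "gs://my-bucket/my_table.csv"

    # Export the table to a CSV file in Cloud Storage and wait for the job to finish.
    extract_job = client.extract_table(table_id, destination_uri)
    extract_job.result()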
I've been working on a Flask app for a while, using SQLAlchemy to access a MySQL database. I've finally started looking into writing tests for this (I'm a strong believer in testing, but am new to Flask, SQLAlchemy, and Python for that matter, so I delayed this), and am having a problem getting my structure set up.
My production database isn't using any unusual MySQL features, and in other languages/frameworks I've been able to set up a test framework using an in-memory SQLite database. (For example, I have a Perl app using DBIx::Class to run a SQL Server db but with a test suite built on SQLite.) However, with SQLAlchemy I've needed to declare a few specific MySQL things in my model, and I'm not sure how to get around this. In particular, I use TINYINT and CHAR types for a few columns, and I seem to have to import these from sqlalchemy.dialects.mysql, since these aren't generic types in SQLA. Thus I'll have a class declaration like:
    from sqlalchemy.dialects.mysql import TINYINT

    class Item(db.Model):
        ...
        size = db.Column(TINYINT, db.ForeignKey('size.size_id'), nullable=False)
So even though, if I were writing raw SQL, I could use TINYINT with SQLite or MySQL and it would work fine, here it's coming from the MySQL dialect module.
I don't want to override my entire model class in order to cover seemingly trivial things like this. Is there some other solution? I've read what I could about using different databases for testing and production, but this issue hasn't been mentioned. It would be a lot easier to use an in-memory SQLite db for testing, instead of having to have a MySQL test database available for everything.
I'm updating from an ancient language to Django. I want to carry the data from the old project over into the new one.
But the old project uses MySQL, and I'm currently using SQLite3 in development, though I've read that PostgreSQL is the most capable. So the first question is: is it better to set up PostgreSQL already during development, or is the transition from SQLite3 to PostgreSQL easy?
As for the data in the old project: I am reworking the table structure from the old MySQL structure, since it has many related tables, and these relations are handled internally with ForeignKey and ManyToMany fields in SQLite3 (same in PostgreSQL, I guess).
So I'm thinking about how to transfer the data. It's not really much data, maybe 3,000-5,000 rows.
The problem is that I don't want to keep the same table structure, so a plain import would be a terrible idea. I want the sweet functionality provided by SQLite3/PostgreSQL.
One idea I had was to join all the data and create a nested JSON document for each post, and then define which table each part goes into so the relations are kept.
But this is just my guessing. So I'm asking you if there is a proper way to do this?
Thanks!
Better to create the Postgres database first, then write a Python script which takes the data from the MySQL database and imports it into the Postgres database.
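A rough sketch of such a script, assuming a standalone Django setup; the settings module, model, table, and column names below are all placeholders:

    import os

    import django
    import mysql.connector  # from the mysql-connector-python package

    # Point at the Django project's settings so the ORM (and Postgres) is available.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")  # placeholder
    django.setup()

    from myapp.models import Post  # placeholder model backed by PostgreSQL

    # Read rows from the old MySQL database (connection details are placeholders).
    conn = mysql.connector.connect(
        host="localhost", user="olduser", password="secret", database="olddb"
    )
    cursor = conn.cursor(dictionary=True)
    cursor.execute("SELECT id, title, body FROM old_posts")

    # Re-create each row through the Django ORM, mapping old columns to new fields.
    for row in cursor.fetchall():
        Post.objects.create(title=row["title"], body=row["body"])

    conn.close()

Foreign-key and many-to-many relations can be rebuilt the same way, by looking up or creating the related objects before assigning them.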
I'm completely new to managing data using databases, so I hope my question is not too stupid, but I did not find anything related using the title keywords...
I want to set up a SQL database to store computation results; these are performed using a Python library. My idea was to use a Python ORM like SQLAlchemy or peewee to store the results in a database.
However, the computations are done by several people on many different machines, including some that are not directly connected to the internet: it is therefore impossible to simply use one common database.
What would be useful to me would be a way of saving the data in the ORM's format to be able to read it again directly once I transfer the data to a machine where the main database can be accessed.
To summarize, I want to do:
On the 1st machine: Python data -> ORM object -> ORM.fileformat
After transfer on a connected machine: ORM.fileformat -> ORM object -> SQL database
Would anyone know if existing ORMs offer that kind of feature?
Is there a reason why some of the machines cannot be connected to the internet?
If you really can't connect them, what I would do is set up a database and the Python app on each machine where data is collected/generated. Have each machine use the app to store data into its own local database, and then later you can create a dump of each machine's database and import those results into one database.
Not the ideal solution but it will work.
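A minimal sketch of that approach with SQLAlchemy (the Result model and connection URLs are placeholders): each offline machine appends to a local SQLite file, and a connected machine later reads that file and copies the rows into the central database.

    from sqlalchemy import Column, Float, Integer, String, create_engine
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()

    class Result(Base):
        __tablename__ = "results"
        id = Column(Integer, primary_key=True)
        name = Column(String)
        value = Column(Float)

    # On the offline machine: store results in a local SQLite file.
    local_engine = create_engine("sqlite:///results.db")
    Base.metadata.create_all(local_engine)
    LocalSession = sessionmaker(bind=local_engine)
    with LocalSession() as session:
        session.add(Result(name="run-42", value=3.14))
        session.commit()

    # Later, on a connected machine: read the file and copy rows into the main DB.
    main_engine = create_engine("postgresql://user:password@dbhost/maindb")  # placeholder
    Base.metadata.create_all(main_engine)
    MainSession = sessionmaker(bind=main_engine)
    with LocalSession() as src, MainSession() as dst:
        for r in src.query(Result).all():
            dst.add(Result(name=r.name, value=r.value))
        dst.commit()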
OK, thanks to MAhsan's and Padraic's answers I was able to figure out how this can be done: the CSV format is indeed easy to use for import/export from a database.
Here are examples for SQLAlchemy (import 1, import 2, and export) and peewee
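For anyone landing here later, a small sketch of the CSV round trip with SQLAlchemy and the standard csv module (the model, file name, and database URL are placeholders):

    import csv

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()

    class Measurement(Base):
        __tablename__ = "measurements"
        id = Column(Integer, primary_key=True)
        label = Column(String)

    engine = create_engine("sqlite:///data.db")  # placeholder database URL
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)

    # Export: write every row of the table to a CSV file.
    with Session() as session, open("measurements.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])
        for m in session.query(Measurement):
            writer.writerow([m.id, m.label])

    # Import: read the CSV back and merge the rows into the target database.
    with Session() as session, open("measurements.csv", newline="") as f:
        for row in csv.DictReader(f):
            session.merge(Measurement(id=int(row["id"]), label=row["label"]))
        session.commit()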
Are there database testing tools for Python (like SQLUnit)? I want to test the DAL that is built using SQLAlchemy.
Follow the design pattern that Django uses.
Create a disposable copy of the database. Use SQLite3 in-memory, for example.
Create the database using the SQLAlchemy table and index definitions. This should be a fairly trivial exercise.
Load the test data fixture into the database.
Run your unit test case in a database with a known, defined state.
Dispose of the database.
If you use SQLite3 in-memory, this procedure can be reasonably fast.
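A minimal sketch of that procedure with SQLAlchemy and unittest (the Item model and fixture rows are placeholders):

    import unittest

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()

    class Item(Base):
        __tablename__ = "items"
        id = Column(Integer, primary_key=True)
        name = Column(String)

    class ItemDALTest(unittest.TestCase):
        def setUp(self):
            # Disposable in-memory database with a known, defined state.
            self.engine = create_engine("sqlite:///:memory:")
            Base.metadata.create_all(self.engine)
            self.Session = sessionmaker(bind=self.engine)
            with self.Session() as session:
                # Load the test data fixture.
                session.add_all([Item(name="widget"), Item(name="gadget")])
                session.commit()

        def tearDown(self):
            # Dispose of the database.
            self.engine.dispose()

        def test_item_count(self):
            with self.Session() as session:
                self.assertEqual(session.query(Item).count(), 2)

    if __name__ == "__main__":
        unittest.main()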