I am working on a new project with Flask that will heavily read from a Redshift database, particularly from materialized views. I'm fairly new to Flask/SQLAlchemy/ORMs. I want to abstract the database layer with an ORM by using Flask-SQLAlchemy. While reading the documentation, I noticed that SQLAlchemy requires the underlying database source to have a primary key. However, I am worried that having materialized views without any primary key will cause a problem.
I found out that there are workarounds to designate some columns as the primary key even when they are not, but I'm not sure whether that will cause an issue when I perform a join on materialized views. I'm sure there is a workaround for that as well, but I'm wondering whether using an ORM with workarounds is actually a good idea when most of my operations will be heavy reads from materialized views. So I have two questions:
1) Is it possible to use SQLAlchemy with Redshift materialized views? (I wasn't able to find many resources on this.)
2) If it is possible, is it a good idea to use SQLAlchemy, or should I stick to raw SQL queries with my own PostgreSQL connection-pooling logic?
Thank you.
P.S.: I have no primary keys in Redshift, but I do have dist/sort keys.
References/Links I used:
How to define a table without primary key with SQLAlchemy?
sqlalchemy materialized relationships
I noticed that SQLAlchemy requires the underlying database source to have a primary key.
This is not true. You can use synthetic primary keys. I am using them with TimescaleDB hypertables that do not have single-column primary keys.
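For example, with Flask-SQLAlchemy you can map a materialized view as an ordinary model and declare whichever column combination uniquely identifies a row as the primary key yourself. A minimal sketch, assuming Flask-SQLAlchemy; the view and column names are hypothetical:

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class DailySales(db.Model):
    __tablename__ = "mv_daily_sales"  # the materialized view in Redshift

    # No primary key exists in Redshift; declaring one here only tells the
    # ORM which columns uniquely identify a row, nothing is created in the DB.
    sale_date = db.Column(db.Date, primary_key=True)
    store_id = db.Column(db.Integer, primary_key=True)
    total_amount = db.Column(db.Numeric)

Joins and filters then work as with any mapped table, e.g. DailySales.query.filter_by(store_id=42).all().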
Is it possible to use SQLAlchemy with Redshift materialized views? (I wasn't able to find many resources on this.)
SQLAlchemy does not care about the underlying database, as long as the SQL wire protocol and its flavour are compatible (PostgreSQL, MySQL, etc.).
If it is possible, is it a good idea to use SQLAlchemy, or should I stick to raw SQL queries with my own PostgreSQL connection-pooling logic?
Using SQLAlchemy improves the readability of your code and thus reduces maintenance costs in the long term.
Related
We have our infrastructure up in AWS, which includes a database.
Our transfer of data occurs in Python using SQLAlchemy ORM, which we use to mimic the database schema. At this point it's very simple so it's no big deal.
But if the schema changes/grows, then a manual change needs to be done in the code as well each time.
I was wondering: what is the proper way to centralize the schema of the database, so that there is one source of truth for it?
Check out the AWS Glue Schema Registry - this is pretty much what it's made for.
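A rough sketch of what that could look like with boto3, so application code and infrastructure both read the schema from one place; the registry and schema names are made up for illustration, and the exact parameters are worth checking against the boto3 docs:

import json
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Canonical Avro definition of one record type (hypothetical).
avro_schema = json.dumps({
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
    ],
})

# Create the schema once in the registry...
glue.create_schema(
    RegistryId={"RegistryName": "my-registry"},
    SchemaName="customer",
    DataFormat="AVRO",
    Compatibility="BACKWARD",
    SchemaDefinition=avro_schema,
)

# ...and register new versions as the schema evolves.
glue.register_schema_version(
    SchemaId={"RegistryName": "my-registry", "SchemaName": "customer"},
    SchemaDefinition=avro_schema,
)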
I have a BigQuery table with about 200 rows. I need to insert, delete, and update values in it through a web interface (the table cannot be migrated to any other relational or non-relational database).
The web application will be deployed on Google Cloud App Engine. The user who acts as admin, with owner privileges on BigQuery, will be able to create and delete records, while the other users, who have view permissions on the BigQuery dataset, will only be able to view records.
I am planning to use Python as the scripting language, with Django, Flask, or some other server framework - I'm not sure which one is better.
The web application should have a data-grid-like appearance, with create, delete, and view buttons whose visibility depends on the user's role.
I have not done anything like this with Python, BigQuery, and Django. I am already familiar with calling BigQuery from the Python client, but calling it from a web interface and in a transactional way is totally new to me.
I am only seeing examples of Django with its built-in models, not with BigQuery.
Can anyone please help me and clarify whether this is possible to implement and how?
I was able to achieve all of CRUD on BigQuery with the help of SQLAlchemy, though I had to make a lot of concessions: when using a SQLAlchemy class I needed a false primary key, since BigQuery does not use primary keys, and for storing sessions in Django I needed to use file-based sessions. For updates and creates, SQLAlchemy does not allow them without a primary key, so I used the raw SQL part of SQLAlchemy. Thanks to @mhawke, who provided the hint for me to carry out this exercise.
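A minimal sketch of that approach, assuming the sqlalchemy-bigquery (formerly pybigquery) dialect is installed; the project, dataset, and table names are hypothetical:

from sqlalchemy import Column, Integer, String, create_engine, text
from sqlalchemy.orm import declarative_base, sessionmaker

# Requires the BigQuery dialect, e.g. pip install sqlalchemy-bigquery
engine = create_engine("bigquery://my-project/my_dataset")
Base = declarative_base()
Session = sessionmaker(bind=engine)

class Record(Base):
    __tablename__ = "records"
    # BigQuery has no primary keys; a "false" one lets the ORM map the table.
    id = Column(Integer, primary_key=True)
    name = Column(String)

# Reads go through the ORM...
with Session() as session:
    rows = session.query(Record).all()

# ...while creates/updates/deletes fall back to textual SQL.
with engine.begin() as conn:
    conn.execute(
        text("UPDATE my_dataset.records SET name = :name WHERE id = :id"),
        {"name": "updated", "id": 1},
    )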
No, at most you could achieve the "R" of "CRUD." BigQuery isn't a transactional database, it's for querying vast amounts of data and preparing the results as an immutable view.
It doesn't provide a method to modify the source data directly, and even if you did, you'd need to run the query again. It's also important to note that queries are asynchronous and take much longer to perform than in traditional databases.
The only reasonable solution would be to export the table data to GCS and then import it into a normal database for querying. Alternatively, if you can't use another database, then since you said there are only 1,000 rows, you could perform your CRUD actions directly on that exported CSV.
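A minimal sketch of the export step with the google-cloud-bigquery client; the project, dataset, table, and bucket names are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

# Export the table to a CSV object in GCS, then load it into a normal
# database (or edit the small file directly).
extract_job = client.extract_table(
    "my-project.my_dataset.records",
    "gs://my-bucket/records-export.csv",
)
extract_job.result()  # block until the export job finishes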
What are the pros and cons of manually creating an ORM for an existing database vs using database reflection?
I'm writing some code using SQLAlchemy to access a pre-existing database. I know I can use sqlalchemy.ext.automap to automagically reflect the schema and create the mappings.
However, I'm wondering if there is any significant benefit to manually creating the mapping classes vs. letting automap do its magic.
If there is a significant benefit, can SQLAlchemy auto-generate the Python mapping classes like Django's inspectdb does? That would make creating all of the declarative base mappings much faster, as I'd only have to verify and tweak rather than write from scratch.
Edit:
As @iuridiniz says below, there are a few solutions that mimic Django's inspectdb. See Is there a Django's inspectdb equivalent for SQLAlchemy?. The answers in that thread are not Python 3 compatible, so look into sqlacodegen or flask-sqlacodegen if you're looking for something that's actually maintained.
I see a lot of tables that were created with CREATE TABLE suppliers AS (SELECT * FROM companies WHERE 1 = 2); (a poor man's table copy), which will have no primary keys. If existing tables don't have primary keys, you'll have to constantly catch exceptions and feed Column objects into the mapper. If you've got Column objects handy, you're already halfway to writing your own ORM layer. If you just complete the ORM, you won't have to worry about whether tables have primary keys set.
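A minimal sketch of automap with one explicit override for a table that lacks a primary key; the connection URL, table, and column names are hypothetical, and this uses the 1.x-style reflect=True (newer releases take autoload_with instead):

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

Base = automap_base()
engine = create_engine("postgresql://user:pass@localhost/mydb")

class Supplier(Base):
    __tablename__ = "suppliers"
    # The real table has no primary key; declaring one here lets automap
    # build a mapper for it, while the remaining columns are still reflected.
    supplier_id = Column(Integer, primary_key=True)

# Generate classes for all other tables from the reflected schema.
Base.prepare(engine, reflect=True)

session = Session(engine)
print(session.query(Supplier).count())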
I've been learning Python by building a web app on Google App Engine over the past five or six months. I also just finished taking a databases class this semester, where I learned about views and their benefits.
Is there an equivalent with the GAE datastore using python?
Read-only views (the most common type) are basically queries against one or more tables to present the illusion of new tables. If you took a college-level database course, you probably learned about relational databases, and I'm guessing you're looking for something like relational views.
The short answer is No.
The GAE datastore is non-relational. It doesn't have tables. It's essentially a very large distributed hash table that uses composite keys to present the (very useful) illusion of Entities, which are easy at first glance to mistake for rows in a relational database.
The longer answer depends on what you'd do if you had a view.
First of all, to answer your question: with normal GAE, i.e. the non-relational GAE datastore, you won't have such things as views.
Since you are probably starting with relational SQL in school, I would suggest switching to the relational-SQL-based GAE at http://code.google.com/apis/sql/ and http://code.google.com/apis/sql/docs/before_you_begin.html#enroll (I am not sure if it's available right away or whether you need to wait for approval to use an instance, but register right away).
Web-based applications are increasingly using non-relational DBs, and you would benefit from studying them as well; that way you could also understand the non-relational GAE datastore better. As a starting point, look at http://en.wikipedia.org/wiki/NoSQL, and then there are many more to explore, the most famous ones being MongoDB, Amazon SimpleDB, etc.
I am not very familiar with databases, and so I do not know how to partition a table using SQLAlchemy.
Your help would be greatly appreciated.
There are two kinds of partitioning: Vertical Partitioning and Horizontal Partitioning.
From the docs:
Vertical Partitioning
Vertical partitioning places different kinds of objects, or different tables, across multiple databases:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# one engine per database (note the modern "postgresql://" scheme)
engine1 = create_engine('postgresql://db1')
engine2 = create_engine('postgresql://db2')

Session = sessionmaker(twophase=True)

# bind User operations to engine 1, Account operations to engine 2
# (User and Account are mapped classes defined elsewhere)
Session.configure(binds={User: engine1, Account: engine2})

session = Session()
Horizontal Partitioning
Horizontal partitioning partitions the rows of a single table (or a set of tables) across multiple databases. See the "sharding" example in attribute_shard.py.
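A rough sketch of what that sharding example sets up, using the horizontal_shard extension; the shard names, URLs, and chooser rules below are hypothetical, and the hook names vary by version (query_chooser became execute_chooser in newer SQLAlchemy releases):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.horizontal_shard import ShardedSession

# Two shards, each backed by its own database (placeholder URLs).
engines = {
    "shard_a": create_engine("postgresql://host_a/app"),
    "shard_b": create_engine("postgresql://host_b/app"),
}

def shard_chooser(mapper, instance, clause=None):
    # Decide which shard a given object belongs to (hypothetical rule).
    return "shard_a" if instance.id % 2 == 0 else "shard_b"

def id_chooser(query, ident):
    # Given a primary key, list the shards it might live on.
    return ["shard_a", "shard_b"]

def query_chooser(query):
    # Shards to search for an arbitrary query; here, all of them.
    return ["shard_a", "shard_b"]

Session = sessionmaker(class_=ShardedSession)
Session.configure(
    shards=engines,
    shard_chooser=shard_chooser,
    id_chooser=id_chooser,
    query_chooser=query_chooser,
)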
Just ask if you need more information on those, preferably providing more information about what you want to do.
It's quite an advanced subject for somebody not familiar with databases, but try Essential SQLAlchemy (you can read the key parts on Google Book Search -- p 122 to 124; the example on p. 125-126 is not freely readable online, so you'd have to purchase the book or read it on commercial services such as O'Reilly's Safari -- maybe on a free trial -- if you want to read the example).
Perhaps you can get better answers if you mention whether you're talking about vertical or horizontal partitioning, why you need partitioning, and what underlying database engines you are considering for the purpose.
Automatic partitioning is a very database-engine-specific concept, and SQLAlchemy doesn't provide any generic tools to manage partitioning, mostly because they wouldn't provide anything really useful while being another API to learn. If you want to do database-level partitioning, issue the CREATE TABLE statements using custom Oracle DDL (see the Oracle documentation for how to create partitioned tables and migrate data to them). You can use a partitioned table in SQLAlchemy just like you would use a normal table; you just need the table declaration so that SQLAlchemy knows what to query. You can reflect the definition from the database, or just duplicate the table declaration in SQLAlchemy code.
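A minimal sketch of the reflection route, assuming a recent SQLAlchemy; the table name and connection URL are placeholders, and the partitioning itself lives entirely in the Oracle DDL and is invisible to SQLAlchemy:

from sqlalchemy import MetaData, Table, create_engine

engine = create_engine("oracle://user:pass@host/orcl")
metadata = MetaData()

# Load the column definitions of the already-partitioned table from the
# database; queries against it look exactly like queries on a plain table.
sales = Table("sales", metadata, autoload_with=engine)

with engine.connect() as conn:
    rows = conn.execute(sales.select().limit(10)).fetchall()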
Very large datasets are usually time-based, with older data becoming read-only or read-mostly and queries usually only look at data from a time interval. If that describes your data, you should probably partition your data using the date field.
There's also application level partitioning, or sharding, where you use your application to split data across different database instances. This isn't all that popular in the Oracle world due to the exorbitant pricing models. If you do want to use sharding, then look at SQLAlchemy documentation and examples for that, for how SQLAlchemy can support you in that, but be aware that application level sharding will affect how you need to build your application code.