I have good experience working with Perl's DBI module. The DBI module acts as a single API for multiple databases like Oracle, Postgres, etc.
I have recently started working with Python, and I noticed that there is a separate API for each database in Python.
Following are my questions:
1. Isn't there a single DB API in Python?
2. If not, isn't this a disadvantage in Python?
There is no Python equivalent to Perl's DBI-centric ecosystem. Instead:
The DBAPI (PEP 249) defines a common low-level interface that relational database drivers are expected to provide.
Some projects like SQLAlchemy Core abstract over multiple drivers, using the common DBAPI interface.
Python's lack of a proper DBI equivalent is less of a disadvantage than it would be in Perl because of the different module system. Assuming you restrict yourself to a common SQL subset and to the DBAPI instead of driver-specific extensions, switching to a different driver can be as simple as changing an import and updating the connection information:
- import somedatabase as db
+ import differentdriver as db
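For instance, because both drivers expose the same PEP 249 interface, the surrounding code can stay largely the same. A minimal sketch using sqlite3 from the standard library (the table and values are made up):

    import sqlite3 as db  # swap for another PEP 249 driver, e.g. psycopg2

    conn = db.connect("example.db")  # connection info is driver-specific
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER, name TEXT)")
    # Caveat: the placeholder style (? here, %s in psycopg2) is itself
    # driver-specific, which already limits how clean the switch is.
    cur.execute("INSERT INTO users VALUES (?, ?)", (1, "alice"))
    conn.commit()
    cur.execute("SELECT name FROM users WHERE id = ?", (1,))
    print(cur.fetchone())
    conn.close()

The driver-specific placeholder style is one reason the "just change the import" claim only goes so far in practice.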
In practice, neither Python's DBAPI nor Perl's DBI will enable you to switch databases at a whim. However, Perl's DBI makes it much easier to write software that works with multiple databases.
Related
Is it possible to have a database driver written in pure Python that doesn't need an underlying system library/shared object to connect to a database?
Apologies for the necro-bump, but this still comes up in a Google search for pure Python drivers. So:
Implementing a database driver in pure Python is conceptually quite straightforward, but only if the wire protocol it uses is documented. Then you (just) write a handler for each type of message to and from the database server in byte format and away you go. The devil is in the details, of course, and that's why you need the protocol documented, unless you are patient enough to reverse engineer it (and handle undocumented changes!).
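As a toy illustration of that "handler per message" idea, here is a sketch that frames a PostgreSQL startup message by hand, following the documented wire protocol (a real driver implements this for every message type, plus parsing of the server's replies):

    import struct

    def startup_message(user, database):
        # Protocol version 3.0 is encoded as the int32 196608 (3 << 16),
        # followed by NUL-terminated parameter name/value pairs and a
        # final NUL terminator.
        params = (b"user\x00" + user.encode() + b"\x00"
                  + b"database\x00" + database.encode() + b"\x00\x00")
        body = struct.pack("!i", 196608) + params
        # Each message is prefixed with its own total length (including
        # the 4 length bytes themselves) as a big-endian int32.
        return struct.pack("!i", len(body) + 4) + body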
There is a pure Python driver for MSSQL (called python-tds), and there has been for a long time (v1.0 in Jan 2013). There are also pure Python drivers for PostgreSQL (pg8000) and MySQL (I can't remember the name). I haven't done an exhaustive search for other databases, as I don't generally use them.
Pure Python drivers are excellent for cross-platform development, for using alternative Python implementations, and for simplifying packaging. I especially like them for putting a Python program onto Android: you don't need to worry about how to cross-compile DB client libraries.
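For illustration, connecting with pg8000 looks like any other DB-API driver, with no compiled client library involved (the connection parameters are placeholders, and this assumes a recent pg8000 release that exposes the DB-API interface as pg8000.dbapi):

    import pg8000.dbapi

    # Pure Python all the way down: no libpq or other shared object needed.
    conn = pg8000.dbapi.connect(user="myuser", password="secret",
                                host="localhost", database="mydb")
    cur = conn.cursor()
    cur.execute("SELECT version()")
    print(cur.fetchone())
    conn.close()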
Yes, it is possible to implement the Python database API as stated in PEP 249.
Even more: such database API implementations exist.
E.g. nuodb-python
I want to execute PostgreSQL queries and return the results through Python APIs.
Basically, I want to do Python and PostgreSQL integration/connectivity.
So, for specific Python API calls, I want to execute the queries and return the results.
I also want to achieve abstraction of the PostgreSQL DB.
Thanks.
To add to klin's comment:
psycopg2 -
This is the most popular PostgreSQL adapter for Python. It was built to address heavy concurrency in PostgreSQL database usage. Several extensions are available that add functionality on top of the DB API.
asyncpg -
A more recent PostgreSQL adapter that seeks to address shortfalls in functionality and performance that exist with psycopg2. It doubles the speed of psycopg2's text-based data-exchange protocol by using binary I/O (which also adds generic support for container types). A major plus is that it has zero dependencies. I have no personal experience with this adapter, but I will test it soon.
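A minimal psycopg2 sketch of the connect/query/fetch cycle (the connection parameters and the users table are placeholders):

    import psycopg2

    conn = psycopg2.connect(dbname="mydb", user="myuser",
                            password="secret", host="localhost")
    try:
        with conn.cursor() as cur:
            # Pass values as parameters; never build SQL with string formatting.
            cur.execute("SELECT id, name FROM users WHERE id = %s", (42,))
            print(cur.fetchone())
        conn.commit()
    finally:
        conn.close()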
I have a Pylons application using SQLAlchemy with SQLite as the backend. I would like to know whether every read operation going to SQLite will always lead to a hard disk read (which is very slow compared to RAM), or whether some caching mechanism is already involved.
Does SQLite maintain a subset of the database in RAM for faster access?
Can the OS (Linux) do that automatically?
How much speedup could I expect by using a production database (MySQL or PostgreSQL) instead of SQLite?
Yes, SQLite has its own memory cache. Check PRAGMA cache_size, for instance. Also, if you're looking for speedups, check PRAGMA temp_store. There is also an API for implementing your own cache.
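For example, with the standard sqlite3 module (the cache size here is an arbitrary choice):

    import sqlite3

    conn = sqlite3.connect("app.db")
    # A negative cache_size is interpreted in KiB; -64000 asks for ~64 MB
    # of page cache instead of the small default.
    conn.execute("PRAGMA cache_size = -64000")
    # Keep temporary tables and indices in memory rather than on disk.
    conn.execute("PRAGMA temp_store = MEMORY")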
The SQLite database is just a file to the OS. Nothing is 'automatically' done for it. To ensure caching does happen, there are sqlite.h defines and runtime pragma settings.
It depends; there are a lot of cases where you'll get a slowdown instead.
How much speedup could I expect by using a production database (MySQL or PostgreSQL) instead of SQLite?
Are you using SQLite in a production server environment? You probably shouldn't be.
From Appropriate Uses For SQLite:
SQLite will normally work fine as the database backend to a website. But if your website is so busy that you are thinking of splitting the database component off onto a separate machine, then you should definitely consider using an enterprise-class client/server database engine instead of SQLite.
SQLite was not designed to scale well, and was never intended to; it trades performance for convenience. If performance is a concern, you should consider another DBMS.
There's an API for Twisted apps to talk to a database in a scalable way: twisted.enterprise.adbapi
The confusing thing is, which database to pick?
The database will have a Twisted app that mostly makes inserts and updates, with relatively few selects; other strictly read-only clients will access the database directly, making selects.
(The read-only users are not necessarily selecting the data that the Twisted app is inserting; it's not as though the database is being used as a message queue.)
My understanding - which I'd like corrected or advised on - is that:
Postgres is a great DB, but almost all the Python bindings - and there is a confusing maze of them - are abandonware
There is psycopg2 for Postgres, but it makes a lot of noise about doing its own connection pooling and such; does this coexist gracefully/usefully/transparently with Twisted's async database connection pooling?
SQLite is a great database for little things, but if used in a multi-user way it does whole-database locking, so performance would suck in the usage pattern I envisage; it also has a different mechanism for typing column values?
MySQL - after the Oracle takeover, who'd want to adopt it now or adopt a fork?
Is there anything else out there?
Scalability
twisted.enterprise.adbapi isn't necessarily an interface for talking to databases in a scalable way. Scalability is a problem you get to solve separately. The only thing twisted.enterprise.adbapi really claims to do is let you use DB-API 2.0 modules without the blocking that normally implies.
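A minimal sketch of that non-blocking usage, with the standard sqlite3 module standing in for any DB-API 2.0 driver (the database file name is a placeholder):

    from twisted.enterprise import adbapi
    from twisted.internet import reactor

    # runQuery runs the blocking DB-API call in a thread pool and returns
    # a Deferred; sqlite3 needs check_same_thread=False because the pool
    # may service the same connection from different threads.
    pool = adbapi.ConnectionPool("sqlite3", "app.db", check_same_thread=False)

    def done(rows):
        print(rows)
        reactor.stop()

    pool.runQuery("SELECT 1").addCallback(done)
    reactor.run()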
Postgres
Yes. This is the correct answer. I don't think all of the Python bindings are abandonware - psycopg2, for example, seems to be actively maintained. In fact, they just added some new bindings for async access, for which Twisted might eventually offer an interface.
SQLite3 is pretty cool too. You might want to make it possible to use either Postgres or SQLite3 in your app; your unit tests will definitely be happier running against SQLite3, for example, even if you want to deploy against Postgres.
Other?
It's hard to know if another database entirely (something non-relational, perhaps) would fit your application better than Postgres. That depends a lot on the specific data you're going to be storing and the queries you need to run against it. If there are interesting relationships in your database, Postgres does seem like a pretty good answer. If all your queries look like "SELECT foo, bar FROM baz" though, there might be a simpler, higher performance option.
There is the txpostgres library, which is a drop-in replacement for twisted.enterprise.adbapi; instead of a thread pool and blocking DB I/O, it is fully asynchronous, leveraging the built-in async capabilities of psycopg2.
We are using it in production in a big corporation and it's been serving us very well so far. Also, it's actively developed; a bug we reported recently was fixed very quickly.
http://pypi.python.org/pypi/txpostgres
https://github.com/wulczer/txpostgres
You could look at NoSQL databases like MongoDB or CouchDB with Twisted.
Scaling out can be rather easier with NoSQL databases than with MySQL or Postgres.
Is there any python library that can keep a client-side SQLite database in sync with a server-side PostgreSQL database?
There are solutions for Java, such as Daffodil or SymmetricDS. Is there something similar for python?
SymmetricDS is a server-side solution for synchronization that is triggered regardless of which language is used to access the database. You should still be able to use it to synchronize the databases while using Python libraries to actually query them. I would recommend SQLAlchemy as a good database-independent query layer for Python.
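For example, with SQLAlchemy the same query code can run against both the client-side SQLite file and the server-side PostgreSQL database, changing only the connection URL (the URLs and the items table are placeholders):

    from sqlalchemy import create_engine, text

    engine = create_engine("sqlite:///local.db")
    # engine = create_engine("postgresql://user:password@host/dbname")

    with engine.connect() as conn:
        # The query text stays the same; only the engine URL differs.
        for row in conn.execute(text("SELECT * FROM items")):
            print(row)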