How should I unit test MySQL queries? - python

I'm building some unit tests for my Python module which interfaces with a MySQL database via SQLAlchemy. From reading around I gather the best way to do this is to create a test database that I can query as if it was the real thing. I've done this however how should I test the existing queries in the module as they currently all point at the live database?
The only idea I'd come up with was to do something like the following:
def run_query(engine, db_name='live_db')
engine.execute(f'SELECT * FROM {db_name}.<table_name>')
I could then pass in test_db when I run the function from unittest. Is there a better way?

For a scalable testing approach, I would suggest having an intermediate DAL Layer than should decide to which DB the query should be routed.
Testing with a Test Database

Related

How to mock redshift queries integration test in Python and Pytest

I have Python code that looks like this:
db = de_core.db.redshift.get_connection()
...
query = get_query(f"export_user_{user_component}").render()
result = util.execute_query(db, query, user_id=user_id)
And it actually executes sql. I want to write an integration test that tests this sql. The sql is Redshift flavored sql... so like postgresql but not really. What's the best way to test this? Moto doesn't seem to support this kind of test. Are there any libraries that support this kind of integration test where I can mock out the real redshift connection with one that behaves like it?
I want to be able to setup tables in the test, create records, have sql execute against this mock, and return results. Is there anything like this?
In general, mocking a database requires the database engine to run somewhere to execute your SQL.
That is why things like test-containers or embedded-postgres exist. Traditional apps would use these for their integration testing.
But as you noted Redshift is Posgtres-flavored, so if your code is Redshift-specific, then you might actually need Redshift to run your tests, with a test-dedicated database.
Some inspiration to create a wrapper class: https://codereview.stackexchange.com/questions/123143/unit-tests-for-a-redshift-wrapper-class

how to do database mocking or make sqlite run on localhost?

Hi I am trying to write python functional tests for our application. It involves several external components and we are mocking them all out.. We have got a better framework for mocking a service, but not for mocking a database yet.
sqlite is very lite and thought of using them but its a serverless, is there a way I can write some python wrapper to make it a server or I should look at other options like HSQL DB?
I don't understand your problem. Why do you care that it's serverless?
My standard technique for this is:
use SQLAlchemy
in tests, configure it with sqlite:/// or sqlite:///:memory:

How to interface with another database effectively using python

I have an application that needs to interface with another app's database. I have read access but not write.
Currently I'm using sql statements via pyodbc to grab the rows and using python manipulate the data. Since I don't cache anything this can be quite costly.
I'm thinking of using an ORM to solve my problem. The question is if I use an ORM like "sql alchemy" would it be smart enough to pick up changes in the other database?
E.g. sql alchemy accesses a table and retrieves a row. If that row got modified outside of sql alchemy would it be smart enough to pick it up?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Edit: To be more clear
I have one application that is simply a reporting tool lets call App A.
I have another application that handles various financial transactions called App B.
A has access to B's database to retrieve the transactions and generates various reports. There's hundreds of thousands of transactions. We're currently caching this info manually in python, if we need an updated report we refresh the cache. If we get rid of the cache, the sql queries combined with the calculations becomes unscalable.
I don't think an ORM is the solution to your problem of performance. By default ORMs tend to be less efficient than row SQL because they might fetch data that you're not going to use (eg. doing a SELECT * when you need only one field), although SQLAlchemy allows fine-grained control over the SQL generated.
Now to implement a caching mechanism, depending on your application, you could use a simple dictionary in memory or a specialized system such as memcached or Redis.
To keep your cached data relatively fresh, you can poll the source at regular intervals, which might be OK if your application can tolerate a little delay. Otherwise you'll need the application that has write access to the db to notify your application or your cache system when an update occurs.
Edit: since you seem to have control over app B, and you've already got a cache system in app A, the simplest way to solve your problem is probably to create a callback in app A that app B can call to expire cached items. Both apps need to agree on a convention to identify cached items.

Database for Python Twisted

There's an API for Twisted apps to talk to a database in a scalable way: twisted.enterprise.dbapi
The confusing thing is, which database to pick?
The database will have a Twisted app that is mostly making inserts and updates and relatively few selects, and then other strictly-read-only clients that are accessing the database directly making selects.
(The read-only users are not necessarily selecting the data that the Twisted app is inserting; its not as though the database is being used as a message-queue)
My understanding - which I'd like corrected/adviced - is that:
Postgres is a great DB, but almost all the Python bindings - and there is a confusing maze of them - are abandonware
There is psycopg2 for postgres, but that makes a lot of noise about doing its own connection-pooling and things; does this co-exist gracefully/usefully/transparently with the Twisted async database connection pooling and such?
SQLLite is a great database for little things but if used in a multi-user way it does whole-database locking, so performance would suck in the usage pattern I envisage; it also has different mechanisms for typing column values?
MySQL - after the Oracle takeover, who'd want to adopt it now or adopt a fork?
Is there anything else out there?
Scalability
twisted.enterprise.adbapi isn't necessarily an interface for talking to databases in a scalable way. Scalability is a problem you get to solve separately. The only thing twisted.enterprise.adbapi really claims to do is let you use DB-API 2.0 modules without the blocking that normally implies.
Postgres
Yes. This is the correct answer. I don't think all of the Python bindings are abandonware - psycopg2, for example, seems to be actively maintained. In fact, they just added some new bindings for async access which Twisted might eventually offer an interface.
SQLite3 is pretty cool too. You might want to make it possible to use either Postgres or SQLite3 in your app; your unit tests will definitely be happier running against SQLite3, for example, even if you want to deploy against Postgres.
Other?
It's hard to know if another database entirely (something non-relational, perhaps) would fit your application better than Postgres. That depends a lot on the specific data you're going to be storing and the queries you need to run against it. If there are interesting relationships in your database, Postgres does seem like a pretty good answer. If all your queries look like "SELECT foo, bar FROM baz" though, there might be a simpler, higher performance option.
There is the txpostgres library which is a drop in replacement for twisted.enterprise.dbapi, —instead of a thread pool and blocking DB IO, it is fully asynchronous, leveraging the built in async capabilities of psycopg2.
We are using it in production in a big corporation and it's been serving us very well so far. Also, it's actively developed—a bug we reported recently was solved very quickly.
http://pypi.python.org/pypi/txpostgres
https://github.com/wulczer/txpostgres
You could look at nosql databases like mongodb or couchdb with twisted.
Scaling out could be rather easier with nosql based databases than with mysql or postgres.

testing python applications that use mysql

I want to write some unittests for an application that uses MySQL. However, I do not want to connect to a real mysql database, but rather to a temporary one that doesn't require any SQL server at all.
Any library (I could not find anything on google)? Any design pattern? Note that DIP doesn't work since I will still have to test the injected class.
There isn't a good way to do that. You want to run your queries against a real MySQL server, otherwise you don't know if they will work or not.
However, that doesn't mean you have to run them against a production server. We have scripts that create a Unit Test database, and then tear it down once the unit tests have run. That way we don't have to maintain a static test database, but we still get to test against the real server.
I've used python-mock and mox for such purposes (extremely lightweight tests that absolutely cannot require ANY SQL server), but for more extensive/in-depth testing, starting and populating a local instance of MySQL isn't too bad either.
Unfortunately SQLite's SQL dialect and MySQL's differ enough that trying to use the former for tests is somewhat impractical, unless you're using some ORM (Django, SQLObject, SQLAlchemy, ...) that can hide the dialect differences.

Categories