How to mock Redshift queries for an integration test in Python and pytest

I have Python code that looks like this:
db = de_core.db.redshift.get_connection()
...
query = get_query(f"export_user_{user_component}").render()
result = util.execute_query(db, query, user_id=user_id)
And it actually executes SQL. I want to write an integration test that exercises this SQL. The SQL is Redshift-flavored... so like PostgreSQL, but not quite. What's the best way to test this? Moto doesn't seem to support this kind of test. Are there any libraries that support this kind of integration test, where I can swap the real Redshift connection for something that behaves like it?
I want to be able to set up tables in the test, create records, have the SQL execute against this mock, and get results back. Is there anything like this?

In general, mocking a database requires the database engine to run somewhere to execute your SQL.
That is why things like test-containers or embedded-postgres exist. Traditional apps would use these for their integration testing.
But as you noted, Redshift is Postgres-flavored rather than plain Postgres, so if your code is Redshift-specific you may actually need a real Redshift cluster, with a test-dedicated database, to run your tests against.
Some inspiration for creating a wrapper class: https://codereview.stackexchange.com/questions/123143/unit-tests-for-a-redshift-wrapper-class
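If an approximation is good enough, a common compromise is to spin up a throwaway Postgres in Docker and point the code under test at it. Below is a minimal pytest sketch, assuming the testcontainers and psycopg2 packages are installed, that de_core.db.redshift.get_connection (from your snippet) can be monkeypatched, and that util.execute_query works with a plain DB-API connection; note that vanilla Postgres will reject Redshift-only syntax (DISTKEY, SORTKEY, COPY from S3, ...).

import psycopg2
import pytest
from testcontainers.postgres import PostgresContainer


@pytest.fixture(scope="session")
def pg_container():
    # One disposable Postgres container for the whole test session.
    with PostgresContainer("postgres:15") as pg:
        yield pg


@pytest.fixture
def db(pg_container, monkeypatch):
    # testcontainers returns a SQLAlchemy-style URL; strip the driver suffix
    # so psycopg2 can use it as a libpq connection URI.
    dsn = pg_container.get_connection_url().replace("+psycopg2", "")
    conn = psycopg2.connect(dsn)
    # Make the code under test pick up the container-backed connection instead of Redshift.
    monkeypatch.setattr("de_core.db.redshift.get_connection", lambda: conn)
    yield conn
    conn.close()


def test_export_user_query(db):
    # Set up tables and records, then run the SQL under test against them.
    with db.cursor() as cur:
        cur.execute("create table users (user_id int, name varchar(64))")
        cur.execute("insert into users values (1, 'alice')")
        cur.execute("select name from users where user_id = 1")
        assert cur.fetchone() == ("alice",)

If the queries lean on Redshift-only features, the same fixture shape works with a connection to a small test-dedicated Redshift database instead of the container.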

Related

Python SQL cursor/fetch: When does the database query complete running?

I have a question regarding the Python IBM_DB package (but I think it applies to any package that uses the connection/cursor pattern, e.g. pyodbc).
When cursor.execute() is called, it executes the SQL query on the database. However, to access the data you need to call fetchall() or one of the other fetch methods. I want to time the hit on the database.
Does the query completely finish running at the execute step, with the results sitting in memory just for Python to fetch? Or does the fetch method keep going back to the database? I have scoured the documentation and am unable to find anything definitive on this subject.
Most or all of the Db2 open source drivers are based on the Call Level Interface (CLI). The CLI functions and details are part of the overall Db2 documentation. A Fetch() on a result set retrieves one more row.
AFAIK the result set can be served from a client-side cache or require another round trip to the engine. It makes sense to bring over a few (dozen) rows at a time, but not several million.
You would need insights and understanding of how drivers and database query processing work in order to measure something useful and interpret it correctly.
BTW: There is some form of CLI tracing available.
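If the goal is just to see where the time goes, a rough but practical approach is to time the two phases separately. A minimal sketch that should work with any DB-API 2.0 driver (ibm_db_dbi, pyodbc, ...); keep in mind that the split you see still depends on how much the driver buffers or streams:

import time

def timed_query(conn, sql):
    cur = conn.cursor()

    t0 = time.perf_counter()
    cur.execute(sql)          # the statement is sent to the database here
    t1 = time.perf_counter()

    rows = cur.fetchall()     # rows are pulled across to the client here
    t2 = time.perf_counter()

    print(f"execute: {t1 - t0:.3f}s, fetchall: {t2 - t1:.3f}s, rows: {len(rows)}")
    cur.close()
    return rows

Comparing fetchone() in a loop against fetchall() on the same query is another cheap way to see whether fetching keeps the engine busy or only drains a client-side buffer.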

How should I unit test MySQL queries?

I'm building some unit tests for my Python module, which interfaces with a MySQL database via SQLAlchemy. From reading around, I gather the best way to do this is to create a test database that I can query as if it were the real thing. I've done this, but how should I test the existing queries in the module, since they currently all point at the live database?
The only idea I'd come up with was to do something like the following:
def run_query(engine, db_name='live_db'):
    return engine.execute(f'SELECT * FROM {db_name}.<table_name>')
I could then pass in test_db when I run the function from unittest. Is there a better way?
For a scalable testing approach, I would suggest adding an intermediate DAL (data access layer) that decides which database a query is routed to; see the sketch below.
Testing with a Test Database
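A minimal sketch of that idea with SQLAlchemy (the URLs and table name are placeholders): production code constructs the DAL with the live URL, tests construct it with the test database's URL, and none of the queries need to embed a database name.

from sqlalchemy import create_engine, text

class UserDAL:
    def __init__(self, db_url):
        # The engine (and therefore the target database) is decided here, once.
        self.engine = create_engine(db_url)

    def get_users(self):
        with self.engine.connect() as conn:
            return conn.execute(text("SELECT * FROM users")).fetchall()

# in production code
dal = UserDAL("mysql+pymysql://app:secret@db-host/live_db")

# in a test
test_dal = UserDAL("mysql+pymysql://app:secret@db-host/test_db")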

How to generate SQL query for postgres (Redshift) without connection?

I want to build a SQL query to pass into the spark-redshift reader's "query" option. I'm trying to use psycopg2, so I do something like this:
from datetime import datetime
from psycopg2 import sql

query = sql.SQL(
    "select * from {} where event_timestamp < {}"
).format(
    sql.Identifier("events"),
    sql.Literal(datetime.now())
).as_string()
But it tells me that I need to pass a context (a connection or cursor) to as_string(). I'm not able to, because I don't have any connection.
Should I use plain string formatting in this case, with some escaping?
Or is there any way to pass some mock context there? And why does it need a connection to build the query string? Does the SQL query change depending on the connection?
I'm not familiar with Spark, but I'd be surprised if it didn't have some kind of SQL-building support. Another alternative is a lightweight package like sqlbuilder.
If you really want to use psycopg2, I suggest looking at how they unit test with mocks -- psycopg2's ConnectingTestCase:
class ConnectingTestCase(unittest.TestCase):
    """A test case providing connections for tests.

    A connection for the test is always available as `self.conn`. Others can be
    created with `self.connect()`. All are closed on tearDown.

    Subclasses needing to customize setUp and tearDown should remember to call
    the base class implementations.
    """

how to do database mocking or make sqlite run on localhost?

Hi, I am trying to write Python functional tests for our application. It involves several external components and we are mocking them all out. We have a decent framework for mocking a service, but not for mocking a database yet.
SQLite is very lightweight and I thought of using it, but it's serverless. Is there a way I can write a Python wrapper to make it run as a server, or should I look at other options like HSQLDB?
I don't understand your problem. Why do you care that it's serverless?
My standard technique for this is:
use SQLAlchemy
in tests, configure it with sqlite:/// or sqlite:///:memory:
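A minimal sketch of that setup: the application builds its engine from configuration, and tests just pass an in-memory SQLite URL, so no server has to run anywhere (the production URL below is a placeholder):

from sqlalchemy import create_engine, text

def make_engine(url="postgresql://app:secret@db-host/prod"):  # placeholder production URL
    return create_engine(url)

def test_insert_and_select():
    # In-memory SQLite: created fresh for the test, gone when the engine goes away.
    engine = make_engine("sqlite:///:memory:")
    with engine.begin() as conn:
        conn.execute(text("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"))
        conn.execute(text("INSERT INTO users (name) VALUES ('alice')"))
        rows = conn.execute(text("SELECT name FROM users")).fetchall()
    assert rows == [("alice",)]

The caveat from the answers below applies here too: SQLite's dialect is not MySQL's or Postgres's, so this works best when an ORM or plain ANSI SQL hides the differences.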

testing python applications that use mysql

I want to write some unit tests for an application that uses MySQL. However, I do not want to connect to a real MySQL database, but rather to a temporary one that doesn't require any SQL server at all.
Is there any library for this (I could not find anything on Google)? Any design pattern? Note that DIP doesn't help, since I will still have to test the injected class.
There isn't a good way to do that. You want to run your queries against a real MySQL server, otherwise you don't know if they will work or not.
However, that doesn't mean you have to run them against a production server. We have scripts that create a Unit Test database, and then tear it down once the unit tests have run. That way we don't have to maintain a static test database, but we still get to test against the real server.
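As a sketch of that create-and-tear-down approach with pytest (host, credentials, and schema are placeholders, and it assumes a local MySQL server plus the PyMySQL driver):

import pymysql
import pytest

@pytest.fixture(scope="session")
def test_db():
    # Create a throwaway database on a real (local) MySQL server for the test run.
    admin = pymysql.connect(host="localhost", user="root", password="secret")
    with admin.cursor() as cur:
        cur.execute("CREATE DATABASE unit_test_db")
        cur.execute("CREATE TABLE unit_test_db.users (id INT PRIMARY KEY, name VARCHAR(50))")
    admin.commit()

    conn = pymysql.connect(host="localhost", user="root",
                           password="secret", database="unit_test_db")
    yield conn

    # Tear it down so no static test database has to be maintained.
    conn.close()
    with admin.cursor() as cur:
        cur.execute("DROP DATABASE unit_test_db")
    admin.close()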
I've used python-mock and mox for such purposes (extremely lightweight tests that absolutely cannot require ANY SQL server), but for more extensive/in-depth testing, starting and populating a local instance of MySQL isn't too bad either.
Unfortunately SQLite's SQL dialect and MySQL's differ enough that trying to use the former for tests is somewhat impractical, unless you're using some ORM (Django, SQLObject, SQLAlchemy, ...) that can hide the dialect differences.
