I've just added pytest to an existing Django project - all unit tests are using Django's unittest subclasses, etc. We use an SQLite in-memory database for tests.
manage.py test takes roughly 80 seconds on our test suite
py.test takes the same
py.test -n1 (or -n4, or anything like that) takes roughly 1280 seconds.
I would expect some overhead to support the distribution, but with -n4 a large-ish test suite should obviously be roughly 3-4 times faster, not 16 times slower.
Findings...
I've so far traced the issue down to database access. The tests run quickly until they first hit the database, but at the first .save() call on a Django model, that test will be incredibly slow.
After some profiling on the workers, it looks like they are spending a lot of time waiting on locks, but I have no idea if that's a reliable finding or not.
I wondered if there was some sort of locking on the database. It was suggested to me that the in-memory SQLite database might be a memory-mapped file, and that the workers might be contending for locks on it, but apparently each call to open an in-memory database with SQLite returns a completely separate instance.
As it stands, I've probably spent 5+ hours on this so far, and spoken at length to colleagues and others about this, and not yet found the issue. I have not been able to reproduce on a separate codebase.
What kind of things might cause this?
What more could I do to further track down the issue?
Thanks in advance for any ideas!
Related
I have a Flask application that allows users to query a smallish database (~2.4M rows) using SQL. It's similar to HackerRank but more limited in scope. It's deployed on Heroku.
I've noticed during testing that I can predictably hit an R14 error (memory quota exceeded) or R15 (memory quota greatly exceeded) by running large queries. The queries that typically cause this are outside what a normal user might do, such as SELECT * FROM some_huge_table. That said, I am concerned that these errors will become a regular occurrence for even small queries when 5, 10, 100 users are querying at the same time.
I'm looking for some advice on how to manage memory quotas for this type of interactive site. Here's what I've explored so far:
Changing the # of gunicorn workers. This has had some effect but I still hit R14 and R15 errors consistently.
Forced limits on user queries, based on either text or the EXPLAIN output. This does work to reduce memory usage, but I'm afraid it won't scale to even a very modest # of users.
Moving to a higher Heroku tier. The plan I use currently provides ~512MB RAM. The largest plan is around 14GB. Again, this would help but won't even moderately scale, to say nothing of the associated costs.
Reducing the size of the database significantly. I would like to avoid this if possible. Doing the napkin math, shrinking a 1.9M-row table to 10k or 50k rows would greatly reduce memory needs and scale better, but the application would still have some moderate maximum usage limit.
As you can see, I'm a novice at best when it comes to memory management. I'm looking for some strategies/ideas on how to solve this general problem, and if it's the case that I need to either drastically cut the data size or throw tons of $ at this, that's OK too.
Thanks
Coming from my personal experience, I see two approaches:
1. plan for it
For your example, this means calculating the maximum memory a single request could use, multiplying it by the number of gunicorn workers, and using dynos big enough to cover that.
For a different application this could be a valid approach; I don't think it is for yours.
2. reduce memory usage, solution 1
The fact that so much application memory is used makes me think that your code is likely loading the whole result set into memory (probably even multiple times, in multiple formats) before returning it to the client.
In the end, your application is only getting the data from the database and converting it to some output format (JSON/CSV?).
What you are probably searching for is streaming responses.
Your Flask view then works on a record-by-record basis: it reads a single record, converts it to your output format, and sends it to the client before reading the next one.
Both your database client library and Flask support this (on the database side this is usually done with server-side cursors/iterators).
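For illustration, here is a minimal sketch of that idea, assuming psycopg2 against a Heroku Postgres database (DATABASE_URL is Heroku's usual env var); the table name and route are placeholders, not your actual code:

    import csv
    import io
    import os

    import psycopg2
    from flask import Flask, Response

    app = Flask(__name__)

    @app.route("/query")
    def run_query():
        def generate():
            conn = psycopg2.connect(os.environ["DATABASE_URL"])
            try:
                # A named cursor makes psycopg2 use a server-side cursor,
                # so rows are fetched in batches instead of all at once.
                with conn.cursor(name="streaming_cursor") as cur:
                    cur.itersize = 1000
                    cur.execute("SELECT * FROM some_table")  # placeholder query
                    for row in cur:
                        buf = io.StringIO()
                        csv.writer(buf).writerow(row)
                        yield buf.getvalue()
            finally:
                conn.close()

        # Passing a generator to Response makes Flask stream the body
        # chunk by chunk instead of building it all in memory first.
        return Response(generate(), mimetype="text/csv")

With this shape, peak memory per request depends on the batch size (itersize) rather than on the size of the result set.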
2. reduce memory usage, solution 2
Other services often go for simple pagination or limiting result sets to manage server-side memory.
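A minimal sketch of the limiting idea, assuming Postgres-style SQL and an arbitrary cap of 1000 rows: wrap whatever the user submits in a subquery with a hard LIMIT. (This does nothing for the security concerns in the sidenote below.)

    def cap_query(user_sql, max_rows=1000):
        # Strip a trailing semicolon so the user's statement can be nested,
        # then bound the number of rows it can ever return.
        inner = user_sql.rstrip().rstrip(";")
        return "SELECT * FROM ({}) AS capped LIMIT {}".format(inner, max_rows)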
security sidenote
It sounds like users can define the SQL statement in their API requests. This is a security and application risk: apart from running INSERT, UPDATE, or DELETE statements, a user could craft a SQL statement that will not only blow up your application memory but also break your database.
I've been struggling with "sqlite3.OperationalError database is locked" all day....
Searching around for answers to what seems to be a well-known problem, I've found that it is most often explained by the fact that SQLite does not handle multithreading very well: a thread can time out after waiting more than 5 seconds (the default timeout) to write to the db because another thread holds the db lock.
So, having several threads working with the db, one of them using transactions and writing frequently, I began measuring the time it takes for transactions to complete. I found that no transaction takes more than 300 ms, which makes the above explanation implausible, unless the transaction-using thread performs ~17 (5000 ms / 300 ms) consecutive transactions while every other thread wanting to write is ignored the whole time.
So what other hypothesis could potentially explain this behavior ?
I have had a lot of these problems with SQLite before. Basically, don't have multiple threads that could, potentially, write to the db. If this is not acceptable, you should switch to Postgres or something else that is better at concurrency.
Sqlite has a very simple implementation that relies on the file system for locking. Most file systems are not built for low-latency operations like this. This is especially true for network-mounted filesystems and the virtual filesystems used by some VPS solutions (that last one got me BTW).
Additionally, you also have the Django layer on top of all this, adding complexity. You don't know when Django releases connections (although I am pretty sure someone here can give that answer in detail :) ). But again, if you have multiple concurrent writers, you need a database layer that can do concurrency. Period.
I solved this issue by switching to postgres. Django makes this very simple for you, even migrating the data is a no-brainer with very little downtime.
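For reference, the switch itself is mostly a settings change. A sketch of the DATABASES entry with placeholder credentials (on older Django versions the engine name is django.db.backends.postgresql_psycopg2):

    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": "myproject",     # placeholder database name
            "USER": "myuser",        # placeholder credentials
            "PASSWORD": "secret",
            "HOST": "localhost",
            "PORT": "5432",
        }
    }

One common route for moving the data is manage.py dumpdata against the SQLite settings and manage.py loaddata against the Postgres settings.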
In case anyone else might find this question via Google, here's my take on this.
SQLite is a database engine that implements the "serializable" isolation level (see here). By default, it implements this isolation level with a locking strategy (although it seems to be possible to change this to a more MVCC-like strategy by enabling the WAL mode described in that link).
But even with its fairly coarse-grained locking, the fact that SQLite has separate read and write locks, and uses deferred transactions (meaning it doesn't take the locks until necessary), means that deadlocks might still occur. It seems SQLite can detect such deadlocks and fail the transaction almost immediately.
Since SQLite does not support "select for update", the best way to grab the write lock early, and therefore avoid deadlocks, would be to start transactions with "BEGIN IMMEDIATE" or "BEGIN EXCLUSIVE" instead of just "BEGIN", but Django currently only uses "BEGIN" (when told to use transactions) and does not currently have a mechanism for telling it to use anything else. Therefore, locking failures become almost unavoidable with the combination of Django, SQLite, transactions, and concurrency (unless you issue the "BEGIN IMMEDIATE" manually, but that's pretty ugly and SQLite-specific).
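For completeness, a sketch of what issuing it manually can look like, assuming Django is in its default autocommit mode; the table name is a placeholder, and this deliberately bypasses Django's transaction handling, which is exactly why it's ugly:

    from django.db import connection

    def increment_counter(pk):
        cursor = connection.cursor()
        cursor.execute("BEGIN IMMEDIATE")   # take the write (reserved) lock up front
        try:
            cursor.execute(
                "UPDATE app_counter SET value = value + 1 WHERE id = %s", [pk]
            )
            cursor.execute("COMMIT")
        except Exception:
            cursor.execute("ROLLBACK")
            raise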
But anyone familiar with databases knows that when you're using the "serializable" isolation level with many common database systems, then transactions can typically fail with a serialization error anyway. That happens in exactly the kind of situation this deadlock represents, and when a serialization error occurs, then the failing transaction must simply be retried. And, in fact, that works fine for me.
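What "simply retried" can look like in practice, sketched with a recent Django's transaction.atomic; the attempt count and back-off delay are arbitrary assumptions:

    import time

    from django.db import OperationalError, transaction

    def run_with_retry(func, attempts=5, delay=0.1):
        for attempt in range(attempts):
            try:
                with transaction.atomic():
                    return func()
            except OperationalError as exc:
                # Retry only on SQLite's lock error, and give up eventually.
                if "database is locked" not in str(exc) or attempt == attempts - 1:
                    raise
                time.sleep(delay)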
(Of course, in the end, you should probably use a less "lite" kind of database engine anyway if you need a lot of concurrency.)
I am using py.test (version 2.4, on Windows 7) with xdist to run a number of numerical regression and interface tests for a C++ library that provides a Python interface through a C module.
The number of tests has grown to ~2,000 over time, but we are running into some memory issues now. Whether using xdist or not, the memory usage of the python process running the tests seems to be ever increasing.
In single-process mode we have even seen a few issues of bad allocation errors, whereas with xdist total memory usage may bring down the OS (8 processes, each using >1GB towards the end).
Is this expected behaviour? Or did somebody else experience the same issue when using py.test for a large number of tests? Is there something I can do in tearDown(Class) to reduce the memory usage over time?
At the moment I cannot exclude the possibility of the problem lying somewhere inside the C/C++ code, but when running some long-running program using that code through the Python interface outside of py.test, I do see relatively constant memory usage over time. I also do not see any excessive memory usage when using nose instead of py.test (we are using py.test as we need junit-xml reporting to work with multiple processes)
py.test's memory usage will grow with the number of tests. Every test is collected before any are executed, and for each test run a test report is stored in memory (much larger for failures), so that all the information can be reported at the end. So to some extent this is expected and normal.
However, I have no hard numbers and have never closely investigated this. We did run out of memory on some CI hosts ourselves before, but just gave them more memory to solve it instead of investigating. Currently our CI hosts have 2G of memory and run about 3500 tests in one test run; it would probably work with half of that but might involve more swapping. PyPy is also a project that manages to run a huge test suite with py.test, so this should certainly be possible.
If you suspect the C code to leak memory I recommend building a (small) test script which just tests the extension module API (with or without py.test) and invoke that in an infinite loop while gathering memory stats after every loop. After a few loops the memory should never increase anymore.
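A sketch of that loop, using psutil (an assumption; any way of reading the process RSS works) and a placeholder mymodule.do_work standing in for the extension calls under suspicion:

    import os

    import psutil
    import mymodule  # placeholder for the C extension's Python interface

    proc = psutil.Process(os.getpid())
    previous = proc.memory_info().rss
    for i in range(1000):
        mymodule.do_work()                 # the call(s) you suspect of leaking
        current = proc.memory_info().rss
        if current > previous:
            print("pass %d: RSS grew by %d bytes to %d" % (i, current - previous, current))
            previous = current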
Try using --tb=no which should prevent pytest from accumulating stacks on every failure.
I have found that it's better to have your test runner run smaller instances of pytest in multiple processes, rather than one big pytest run, because pytest accumulates every error in memory.
pytest should probably accumulate test results on-disk, rather than in ram.
We also experience similar problems. In our case we run ~4600 test cases.
We use pytest fixtures extensively, and we managed to save a few MB by scoping the fixtures slightly differently (changing several from "session" to "class" or "function" scope). However, test performance dropped.
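To illustrate the trade-off (the dataset builder is a placeholder): a session-scoped fixture is built once and kept in memory for the whole run, while a class- or function-scoped one is rebuilt more often but released sooner.

    import pytest

    def build_large_dataset():
        # Placeholder for whatever expensive object your fixtures create.
        return list(range(1_000_000))

    @pytest.fixture(scope="session")   # built once, held for the whole test run
    def dataset_session():
        return build_large_dataset()

    @pytest.fixture(scope="class")     # rebuilt per test class, freed sooner
    def dataset_class():
        return build_large_dataset()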
The project I'm working on is a business logic software wrapped up as a Python package. The idea is that various script or application will import it, initialize it, then use it.
It currently has a top level init() method that does the initialization and sets up various things, a good example is that it sets up SQLAlchemy with a db connection and stores the SA session for later access. It is being stored in a subpackage of my project (namely myproj.model.Session, so other code could get a working SA session after import'ing the model).
Long story short, this makes my package a stateful one. I'm writing unit tests for the project and this stateful behaviour poses some problems:
tests should be isolated, but the internal state of my package breaks this isolation
I cannot test the main init() method since its behavior depends on the state
future tests will need to be run against the (not yet written) controller part with a well known model state (eg. a pre-populated sqlite in-memory db)
Should I somehow refactor my package because the current structure is not the Best (possible) Practice(tm)? :)
Should I leave it at that and setup/teardown the whole thing every time? If I'm going to achieve complete isolation that'd mean fully erasing and re-populating the db at every single test, isn't that overkill?
This question is really on the overall code & tests structure, but for what it's worth I'm using nose-1.0 for my tests. I know the Isolate plugin could probably help me but I'd like to get the code right before doing strange things in the test suite.
You have a few options:
Mock the database
There are a few trade offs to be aware of.
Your tests will become more complex, as you will have to do the setup, teardown, and mocking of the connection. You may also want to verify the SQL/commands sent. It also tends to create an odd sort of tight coupling, which may cause you to spend additional time maintaining/updating tests when the schema or SQL changes.
This is usually the purest form of test isolation, because it removes a potentially large dependency from testing. It also tends to make tests faster and reduces the overhead of automating the test suite in, say, a continuous integration environment.
Recreate the DB with each Test
Trade offs to be aware of.
This can make your tests very slow, depending on how much time it actually takes to recreate your database. If the dev database server is a shared resource, there will have to be additional initial investment in making sure each dev has their own db on the server. The server may become impacted depending on how often tests get run. There is additional overhead to running your test suite in a continuous integration environment, because it will need at least one db, possibly more (depending on how many branches are being built simultaneously).
The benefit has to do with actually running through the same code paths and similar resources that will be used in production. This usually helps to reveal bugs earlier which is always a very good thing.
ORM DB swap
If you're using an ORM like SQLAlchemy, there is a possibility that you can swap the underlying database for a potentially faster in-memory database. This allows you to mitigate some of the negatives of both the previous options.
It's not quite the same database as will be used in production, but the ORM should help mitigate the risk that the difference obscures a bug. Typically, setting up an in-memory database is much faster than a file-backed one. It also has the benefit of being isolated to the current test run, so you don't have to worry about shared resource management or final teardown/cleanup.
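A minimal sketch of that swap with SQLAlchemy, assuming declarative models hanging off a shared Base in myproj.model (the package named in the question):

    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    from myproj.model import Base   # assumption: the declarative base lives here

    def make_test_session():
        # Fresh, isolated in-memory database for this test run.
        engine = create_engine("sqlite:///:memory:")
        Base.metadata.create_all(engine)   # build the schema from the models
        return sessionmaker(bind=engine)()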
Working on a project with a relatively expensive setup (IPython), I've seen an approach used where we call a get_ipython function, which sets up and returns an instance, while replacing itself with a function which returns a reference to the existing instance. Then every test can call the same function, but it only does the setup for the first one.
That saves doing a long setup procedure for every test, but occasionally it creates odd cases where a test fails or passes depending on what tests were run before. We have ways of dealing with that - a lot of the tests should do the same thing regardless of the state, and we can try to reset the object's state before certain tests. You might find a similar trade-off works for you.
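A stripped-down sketch of that pattern (ExpensiveThing is a stand-in for whatever is costly to construct):

    class ExpensiveThing:
        def __init__(self):
            print("doing the expensive setup once")

    def get_shared_instance():
        # First call: build the instance, then rebind this module-level name
        # to a cheap accessor that just returns the existing object.
        global get_shared_instance
        instance = ExpensiveThing()
        get_shared_instance = lambda: instance
        return instance

Every test calls get_shared_instance(); only the first call pays for the setup, which is also why test outcomes can start to depend on which tests ran first.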
Mock is a simple and powerful tool for achieving some isolation. There is a nice video from PyCon 2011 which shows how to use it. I recommend using it together with py.test, which reduces the amount of code required to define tests and is still very, very powerful.
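A minimal sketch, assuming a hypothetical save_user() that looks up myproj.model.Session (from the question above) at call time; patching that name means the test never touches a real database:

    from unittest import mock

    from myproj.users import save_user   # hypothetical code under test

    def test_save_user_adds_to_session():
        with mock.patch("myproj.model.Session") as fake_session:
            save_user("alice")
            fake_session.add.assert_called_once()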
I have a script with a main for loop that repeats about 15k times. In this loop it queries a local MySQL database and does a SVN update on a local repository. I placed the SVN repository in a RAMdisk as before most of the time seemed to be spent reading/writing to disk.
Now I have a script that runs at basically the same speed but CPU utilization for that script never goes over 10%.
Process Explorer shows that mysqld is also taking almost no CPU time and not reading/writing much to disk.
What steps would you take to figure out where the bottleneck is?
Doing SQL queries in a for loop 15k times is a bottleneck in every language.
Is there any reason you query again every time? If you do a single query before the for loop and then loop over the result set, doing the SVN part for each row, you will see a dramatic increase in speed.
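A sketch of the reshaped loop with MySQLdb; the connection details, table, and the update_working_copy step are placeholders:

    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="mydb")
    cur = conn.cursor()

    # One round trip instead of ~15k.
    cur.execute("SELECT path, revision FROM items")
    rows = cur.fetchall()

    for path, revision in rows:
        update_working_copy(path, revision)   # placeholder for the SVN update step

    conn.close()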
But I doubt that you will get a higher CPU usage. The reason is that you are not doing calculations, but mostly IO.
Btw, you can't see this in mysqld's CPU usage, because the cost is not in the complexity of the queries but in their count and in the latency of the server's responses. So you will only see very short, inexpensive queries, which nevertheless add up over time.
Profile your Python code. That will show you how long each function/method call takes. If that's the method call querying the MySQL database, you'll have a clue where to look. But it also may be something else. In any case, profiling is the usual approach to solve such problems.
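For example, with the standard-library profiler (assuming the script has a main() entry point), or simply python -m cProfile -s cumulative script.py:

    import cProfile
    import pstats

    cProfile.run("main()", "loop.prof")    # main() is the script's entry point (assumption)
    stats = pstats.Stats("loop.prof")
    stats.sort_stats("cumulative").print_stats(20)   # top 20 calls by cumulative time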
It is "well known", so to speak, that svn update waits up to a whole second after it has finished running, so that file modification timestamps get "in the past" (since many filesystems don't have a timestamp granularity finer than one second). You can find more information about it by Googling for "svn sleep_for_timestamps".
I don't have any obvious solution to suggest. If this is really performance critical you could either: 1) not update as often as you are doing, or 2) try to use a lower-level Subversion API (good luck).