Spyder's memory management is driving me crazy lately. I need to load large datasets (up to 10 GB) and preprocess them. Later on I perform some calculations/modelling (using sklearn) and plotting on them. Spyder's cell functionality is perfect for this, since it allows me to run the same calculations several times (using different parameters) without repeating the time-consuming preprocessing steps. However, I'm running into several memory issues when repeatedly running cells:
Re-running the same cell several times increases memory consumption. I don't understand this, since I'm not introducing new variables and previous (global) variables should simply be overwritten.
Encapsulating variables in a function helps, but not to the degree one would expect. I have a strong feeling that the memory of local variables is often not released correctly (this also holds when returning values with .copy() to avoid references to local variables).
Large matplotlib objects do not get recycled properly. Running gc.collect() sometimes helps, but does not always clear all the memory used by plots.
When using the 'Clear all variables' button in the IPython console, often only part of the memory is released and several GB may remain occupied.
Running %reset from the IPython console works better, but does not always clear the memory completely either (even when running import gc; gc.collect() afterwards).
The only thing that helps for sure is restarting the kernel. However, I don't like doing this since it removes all the output from the console.
Advice on any of the above points is appreciated, as well as some elaboration on the memory management of Spyder. I'm using multi-threading in my code and have a suspicion that some of the issues are related to that, even though I was not able to pinpoint the problem.
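For reference, here is a minimal sketch of the cleanup pattern described above (the preprocessing and modelling steps are toy placeholders, not the actual code): dropping references to large objects, closing all figures, and forcing a garbage-collection pass at the end of a cell.

import gc
import numpy as np
import matplotlib.pyplot as plt

def preprocess():
    # placeholder for the time-consuming preprocessing step
    return np.random.rand(1_000_000)

data = preprocess()

# --- cell that gets re-run with different parameters ---
result = data * 2.0                    # placeholder for the sklearn modelling step
plt.figure()
plt.plot(result[:1000])
plt.show()

# cleanup before the next run: drop references, close figures, collect garbage
del result
plt.close('all')
gc.collect()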
Related
I have a Python program that gets data from a measurement instrument and plots the data using matplotlib (I am on Debian Linux). The plotting is done in a separate thread, which updates the data plots at fixed time intervals. At every update, the existing lines are removed from the plot, and then the lines are re-created with the new data (yes, there might be more efficient ways, but it's not possible to just add the new data to existing lines in my situation).
After a while, the program will take up huge amounts of memory (gigabytes). This does not happen if I modify the code to skip the plotting/matplotlib part, so the use of humongous amounts of memory is clearly related to matplotlib. If I put some pressure on the system by running another application that consumes a lot of memory, my Python program will at some point start to release the excess memory used by matplotlib (ending up at about 50 MB or so). Releasing the memory does not seem to have any negative effects on the operation of my program. This tells me that the large chunk of memory used by matplotlib is not vital (if not useless) in my application.
How can I prevent matplotlib from using so much memory?
Not sure if this will help, but have you tried using line collections (matplotlib's LineCollection)?
It should be more efficient for plotting a huge number of lines in one go.
Aside from that, some lines of code showing what you're doing might help to identify the problem.
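As an illustration of the line-collections suggestion, here is a minimal sketch (the update loop and data are made up, since the original code is not shown): instead of removing and re-creating many individual lines, a single LineCollection is replaced on each update.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(-2, 2)
collection = None

def update(segments):
    # segments: sequence of (N, 2) arrays, one per line
    global collection
    if collection is not None:
        collection.remove()            # drop all previous lines in one call
    collection = LineCollection(segments, linewidths=1)
    ax.add_collection(collection)
    fig.canvas.draw_idle()

# example update with 100 lines of fresh data
x = np.linspace(0, 10, 50)
update([np.column_stack([x, np.sin(x + p)]) for p in np.linspace(0, 2, 100)])
plt.show()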
I have a parquet file of around 10+ GB whose columns are mainly strings. When loading it into memory, the memory usage can peak at 110 GB, while after loading finishes it drops back to around 40 GB.
I'm working on a high-performance computing cluster with allocated memory, so I do have access to large amounts of memory. However, it seems a waste to me that I have to request 128 GB of memory just for loading the data, when 64 GB is sufficient afterwards. Also, the 128 GB memory nodes are more often out of order.
My naive conjecture is that the Python interpreter mistakes the 512 GB of physical memory on the HPC node for the total available memory, so it does not do garbage collection as often as actually needed. For example, when I load the data with 64 GB of memory, it never throws a MemoryError; the kernel is simply killed and restarted.
I was wondering whether this excessively high memory usage during loading is normal behavior for pyarrow, or whether it is due to the particular setup of my environment. If the latter, is it possible to somehow limit the available memory during loading?
We fixed a memory use bug that's present in 0.14.0/0.14.1 (which is probably what you're using right now).
https://issues.apache.org/jira/browse/ARROW-6060
We are also introducing an option to read string columns as categorical (aka DictionaryArray in Arrow parlance), which will also reduce memory usage. See https://issues.apache.org/jira/browse/ARROW-3325 and the discussion in
https://ursalabs.org/blog/2019-06-07-monthly-report/
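As an illustration, here is a sketch of reading string columns as dictionary-encoded (categorical) data; the file path and column names are made up, and it assumes a pyarrow version where read_dictionary is available.

import pyarrow.parquet as pq

# read the heavy string columns as DictionaryArray to cut memory use
table = pq.read_table(
    'data.parquet',
    read_dictionary=['user_id', 'description'],   # hypothetical string columns
)
df = table.to_pandas()   # dictionary columns become pandas Categorical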
I ran k_clique_communities on my network data and received a memory error after a long response time. The code works perfectly fine for my other data, but not this one.
from networkx.algorithms.community import k_clique_communities
c = list(k_clique_communities(G_fb, 3))   # communities built from 3-cliques
list(c[0])
Here is a snapshot of the error traceback
I tried running your code on my system with 16 GB RAM and an i7-7700HQ, and the kernel died after returning a memory error. I think this is because computing k-cliques of size 3 takes a lot of computational power/memory, since you have quite a large number of nodes and edges. I think you need to look into other ways to find k-cliques, like GraphX (Apache Spark).
You probably have a 32-bit installation of Python. 32-bit installations are limited to 2 gigabytes of memory regardless of how much system memory you have. This is inconvenient, but it works. Uninstall Python and install a 64-bit installation. Again, it's inconvenient, but it's the only solution that I've ever seen.
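A quick way to check which build is installed (a small sketch, independent of networkx):

import struct
import sys

print(struct.calcsize('P') * 8, 'bit interpreter')        # prints 32 or 64
print('64-bit' if sys.maxsize > 2**32 else '32-bit')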
As stated in the networkx documentation:
To obtain a list of all maximal cliques, use list(find_cliques(G)). However, be aware that in the worst-case, the length of this list can be exponential in the number of nodes in the graph.
(this refers to find_cliques, but it also applies to your problem).
That means that if there is a huge number of cliques, the strings you create with list will be so many that you'll probably run out of memory: strings need more space than int values to store.
I don't know if there is a workaround, though.
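One possible way to soften the memory pressure, if only a few communities are needed, is to avoid materializing the full list. A minimal sketch (using a placeholder graph in place of the G_fb from the question):

from itertools import islice
import networkx as nx
from networkx.algorithms.community import k_clique_communities

G_fb = nx.karate_club_graph()   # placeholder; substitute your own graph

# k_clique_communities returns a generator, so take only what you need
# instead of converting everything to a list at once
first_two = [list(c) for c in islice(k_clique_communities(G_fb, 3), 2)]
print(first_two)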
I'm trying to execute the logistic_sgd.py code on an Amazon cluster running the ami-b141a2f5 (Theano - CUDA 7) image.
Instead of the included MNIST database I am using the SD19 database, which requires changing a few dimensional constants, but otherwise no code has been touched. The code runs fine locally on my CPU, but once I copy the code and data to the Amazon cluster over SSH and run it there, I get this output:
It looks to me like it is running out of VRAM, but it was my understanding that the code should run on a GPU already, without any tinkering on my part necessary. After following the suggestion from the error message, the error persists.
There's nothing especially strange here. The error message is almost certainly accurate: there really isn't enough VRAM. Often, a script will run fine on CPU but then fail like this on GPU simply because there is usually much more system memory available than GPU memory, especially since the system memory is virtualized (and can page out to disk if required) while the GPU memory isn't.
For this script, there needs to be enough memory to store the training, validation, and testing data sets, the model parameters, and enough working space to store intermediate results of the computation. There are two options available:
Reduce the amount of memory needed for one or more of these three components. Reducing the amount of training data is usually easiest; reducing the size of the model comes next. Unfortunately, both of those options will often impair the quality of the result being sought. Reducing the amount of memory needed for intermediate results is usually beyond the developer's control -- it is managed by Theano -- but there is sometimes scope for altering the computation to achieve this goal once a good understanding of Theano's internals is gained.
If the model parameters and working memory can fit in GPU memory, then the most common solution is to change the code so that the data is no longer stored in GPU memory (i.e. just store it as numpy arrays, not as Theano shared variables) and then pass each batch of data in as inputs instead of givens. The LSTM sample code is an example of this approach.
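A condensed sketch of the inputs-instead-of-givens pattern (a toy logistic regression, not the actual logistic_sgd.py code): the data stays in host memory as numpy arrays and only one minibatch at a time is transferred to the GPU.

import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')
y = T.ivector('y')
W = theano.shared(np.zeros((784, 10), dtype=theano.config.floatX), name='W')
b = theano.shared(np.zeros(10, dtype=theano.config.floatX), name='b')

p_y = T.nnet.softmax(T.dot(x, W) + b)
cost = -T.mean(T.log(p_y)[T.arange(y.shape[0]), y])
grads = T.grad(cost, [W, b])
updates = [(p, p - 0.1 * g) for p, g in zip([W, b], grads)]

# the minibatch is passed in as an input (not a given indexing a shared variable),
# so the full data set never has to live in GPU memory
train = theano.function(inputs=[x, y], outputs=cost, updates=updates)

train_x = np.random.rand(10000, 784).astype(theano.config.floatX)
train_y = np.random.randint(0, 10, size=10000).astype('int32')

batch_size = 500
for i in range(0, len(train_x), batch_size):
    train(train_x[i:i + batch_size], train_y[i:i + batch_size])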
Anyone following CUDA will probably have seen a few of my queries regarding a project I'm involved in, but for those who haven't I'll summarize. (Sorry for the long question in advance)
Three kernels: one generates a data set based on some input variables (it deals with bit combinations, so the data can grow exponentially), another solves the generated linear systems, and a final reduction kernel extracts the result. These three kernels are run over and over again as part of an optimisation algorithm for a particular system.
On my dev machine (GeForce 9800GT, running under CUDA 4.0) this works perfectly, all the time, no matter what I throw at it (up to a computational limit based on the stated exponential nature). But on a test machine (4x Tesla S1070, only one used, under CUDA 3.1), the exact same code (Python base, PyCUDA interface to the CUDA kernels) produces correct results for 'small' cases, but in mid-range cases the solving stage fails on random iterations.
Previous problems I've had with this code have been to do with the numeric instability of the problem, and have been deterministic in nature (i.e fails at exactly the same stage every time), but this one is frankly pissing me off, as it will fail whenever it wants to.
As such, I don't have a reliable way of breaking the CUDA code out of the Python framework and doing proper debugging, and PyCUDA's debugger support is questionable to say the least.
I've checked the usual things, like checking free memory on the device before each kernel invocation, and the occupancy calculations say that the grid and block allocations are fine. I'm not doing any crazy 4.0-specific stuff, I'm freeing everything I allocate on the device at each iteration, and I've fixed all the data types as floats.
TL;DR: has anyone come across any gotchas regarding CUDA 3.1 that I haven't seen in the release notes, or any issues with PyCUDA's autoinit memory management environment that would cause intermittent launch failures on repeated invocations?
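For context, a minimal sketch of the per-iteration pattern described above (pre-launch free-memory check plus explicit frees); the kernel is a trivial placeholder, not the solver from the question.

import numpy as np
import pycuda.autoinit                 # same autoinit setup as in the question
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule('''
__global__ void scale(float *a, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= factor;
}
''')
scale = mod.get_function('scale')

n = 1 << 20
host = np.random.rand(n).astype(np.float32)

for it in range(10):
    free, total = drv.mem_get_info()   # free device memory before each launch
    assert free > host.nbytes, 'not enough free device memory'

    dev = drv.mem_alloc(host.nbytes)
    drv.memcpy_htod(dev, host)
    scale(dev, np.float32(2.0), np.int32(n),
          block=(256, 1, 1), grid=((n + 255) // 256, 1))
    drv.memcpy_dtoh(host, dev)
    dev.free()                         # release everything allocated this iteration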
Have you tried:
cuda-memcheck python yourapp.py
You likely have an out-of-bounds memory access.
You can use the NVIDIA CUDA profiler to see what gets executed before the failure.