Memory leak using fipy with trilinos - python

I am currently trying to simulate a suspension flowing around a cylindrical obstacle using fipy. Because I'm using a fine mesh and my equations are quite complicated, the simulations take quite a long time to converge, which is why I want to run them in parallel. However, when I do that the program keeps using more and more memory, until Linux eventually kills it (after around 3 hours when I use 4 processors).
What is more, Trilinos increases memory usage even if I only use one processor. For example, when I run this example (after changing the number of sweeps from 300 to 5,000):
python stokesCavity.py --trilinos -> memory usage goes from 638M to 958M in 10 minutes
python stokesCavity.py --pysparse -> memory usage goes from 616M to 635M in 10 minutes
I saw here that somebody had reported a similar problem before, but I could not find the solution. Any help would be appreciated.
Some info: I am using Trilinos 12.12.1 (compiled against swig 3.0) and fipy 3.2.
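For anyone trying to reproduce the growth, here is a minimal sketch of how the per-sweep memory trend could be logged on Linux. The mesh and equation are simple placeholders standing in for the actual suspension model, not the code from this question; run it with --trilinos or --pysparse as with the example above.

import resource

import fipy as fp

# Placeholder problem: a small 2D grid and a plain diffusion equation,
# only meant to exercise the solver repeatedly.
mesh = fp.Grid2D(nx=50, ny=50, dx=1.0, dy=1.0)
var = fp.CellVariable(mesh=mesh, value=0.0, hasOld=True)
eq = fp.TransientTerm() == fp.DiffusionTerm(coeff=1.0)

for sweep in range(5000):
    var.updateOld()
    eq.solve(var=var, dt=0.1)
    if sweep % 100 == 0:
        # ru_maxrss is reported in kilobytes on Linux
        rss_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0
        print("sweep %d: max RSS %.1f MB" % (sweep, rss_mb))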

This is an issue we have reported against PyTrilinos

Related

HDBSCAN won't utilize all available CPUs. Processes just sleep

For the past few weeks I've been attempting to perform a fairly large clustering analysis using the HDBSCAN algorithm in Python 3.7. The data in question is roughly 4 million rows by 40 columns, at around 1.5 GB in CSV format. It's a mixture of ints, bools, and floats of up to 9 digits.
During this period, each time I've been able to get the data to cluster it has taken 3-plus days, which seems odd given that HDBSCAN is revered for its speed and I'm running this on a Google Cloud Compute instance with 96 CPUs. I've spent days trying to get it to use the cloud instance's processing power, but to no avail.
Using the auto algorithm detection in HDBSCAN, it selects boruvka_kdtree as the best algorithm to use. I've tried passing all sorts of values to the core_dist_n_jobs parameter: -2, -1, 1, 96, multiprocessing.cpu_count(), and 300. All seem to have a similar effect: 4 main Python processes each use a full core while spawning many more sleeping processes.
I refuse to believe I'm doing this right and this is truly how long this takes on this hardware. I'm convinced I must be missing something like an issue where using JupyterHub on the same machine causes some sort of GIL lock, or I'm missing some parameter for HDBSCAN.
Here is my current call to HDBSCAN:
hdbscan.HDBSCAN(min_cluster_size=100000, min_samples=500, algorithm='best', alpha=1.0, memory=mem, core_dist_n_jobs=multiprocessing.cpu_count())
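For context, a self-contained version of that call might look roughly like the following; the CSV file name and the cache directory are placeholders (the original post passes a pre-built memory object called mem), so this is a sketch rather than the poster's actual script.

import multiprocessing

import pandas as pd
import hdbscan

# Placeholder input standing in for the ~4-million-row dataset
data = pd.read_csv("features.csv")

clusterer = hdbscan.HDBSCAN(
    min_cluster_size=100000,
    min_samples=500,
    algorithm='best',
    alpha=1.0,
    memory='./hdbscan_cache',  # joblib-style cache directory for intermediate results
    core_dist_n_jobs=multiprocessing.cpu_count(),
)
labels = clusterer.fit_predict(data)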
I've followed all existing issues and posts related to this problem that I could find, and nothing has worked so far, but I'm always down to try even radical ideas, because this isn't even the full data I want to cluster, and at this rate it would take 4 years to cluster the full data!
According to the author
Only the core distance computation can use all the cores, sadly that is apparently the first few seconds. The rest of the computation is quite challenging to parallelise unfortunately and will run on a single thread.
You can read the related issues at the links below:
Not using all available CPUs?
core_dist_n_jobs =1 or -1 -> no difference at all and computation time extremely high

Python/ Pycharm memory and CPU allocation for faster runtime?

I am trying to run a very resource-intensive Python program which processes text with NLP methods for conducting different classification tasks.
The program takes several days to run, so I am trying to allocate more capacity to it. However, I don't really understand if I did the right thing, because with my new allocation the Python code is not significantly faster.
Here is some information about my notebook:
I have a notebook running Windows 10 with an Intel Core i7 with 4 cores (8 logical processors) @ 2.5 GHz and 32 GB of physical memory.
What I did:
I changed some parameters in the vmoptions file, so that it looks like this now:
-Xms30g
-Xmx30g
-Xmn30g
-Xss128k
-XX:MaxPermSize=30g
-XX:ParallelGCThreads=20
-XX:ReservedCodeCacheSize=500m
-XX:+UseConcMarkSweepGC
-XX:SoftRefLRUPolicyMSPerMB=50
-ea
-Dsun.io.useCanonCaches=false
-Djava.net.preferIPv4Stack=true
-XX:+HeapDumpOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow
My problem:
However, as I said, my code is not running significantly faster. On top of that, if I open the Task Manager I can see that PyCharm uses nearly 80% of the memory but 0% CPU, while Python uses 20% of the CPU and 0% memory.
My question:
What do I need to do so that my Python code runs faster?
Is it possible that I need to allocate more CPU to PyCharm or Python?
What is the connection between the memory allocated to PyCharm and the runtime of the Python interpreter?
Thank you very much =)
You cannot increase CPU usage manually. Try one of these solutions:
Try to rewrite your algorithm to be multi-threaded; then you can use more of your CPU. Note that not all programs can profit from multiple cores: a calculation done in steps, where each step depends on the results of the previous step, will not be faster with more cores. Problems that can be vectorized (applying the same calculation to large arrays of data) can relatively easily be made to use multiple cores, because the individual calculations are independent; see the sketch below.
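As a concrete illustration of that point, and noting that for CPU-bound pure-Python work the usual route to multiple cores is multiple processes rather than threads (because of the GIL), here is a minimal sketch; the classify function is a stand-in, not anything from the question:

from multiprocessing import Pool

def classify(document):
    # Stand-in for one independent, CPU-bound classification task
    return len(document.split())

if __name__ == "__main__":
    documents = ["first text", "second text", "third text"] * 1000
    with Pool() as pool:  # defaults to one worker process per core
        results = pool.map(classify, documents)
    print(len(results))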
Use numpy. It is an extension written in C that can use optimized linear algebra libraries like ATLAS. It can speed up numerical calculations significantly compared to standard Python.
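For example, a plain Python loop over a large array versus the equivalent vectorized NumPy operation:

import numpy as np

values = np.random.rand(1_000_000)

# Pure-Python loop: one interpreted operation per element
total_slow = 0.0
for v in values:
    total_slow += v * v

# Vectorized: the same calculation done in optimized C code
total_fast = np.dot(values, values)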
You can adjust the number of CPU cores to be used by the IDE when running the active tasks (for example, indexing header files, updating symbols, and so on) in order to keep the performance properly balanced between AppCode and other applications running on your machine.
Use this link.

Time needed for simulations in a loop (C++ app, but also Python) increases over time

I have a Xeon workstation with 16 cores and 32 threads. On that machine I run a large set of simulations in parallel, split across 32 processes that each run 68,750 simulations (2.2 million in total) in a loop.
Data is written to an SSD drive, and a Python process (a while loop with a waiting time) continuously gathers the output, consolidates it, and stores it away on another regular hard disk.
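For reference, the gathering process described here might look roughly like the following; the paths and the waiting time are placeholders, since the original code is not shown:

import shutil
import time
from pathlib import Path

SSD_OUT = Path("D:/sim_output")    # placeholder: directory the simulations write to
ARCHIVE = Path("E:/consolidated")  # placeholder: the regular hard disk

ARCHIVE.mkdir(parents=True, exist_ok=True)
while True:
    for result in SSD_OUT.glob("*.dat"):
        # Move finished result files off the SSD to the archive disk
        shutil.move(result, ARCHIVE / result.name)
    time.sleep(60)  # placeholder waiting time between passes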
When I start everything, and for about the first day, a single simulation takes only a few seconds (all simulations have very similar complexity/load). But then the time a single simulation takes gets longer and longer, up to a hundred-fold after about a week. The CPU temperatures are fine, the disk temperatures are fine, etc.
However, whereas at the beginning Windows uses all CPU power at 100%, it throttles the power a lot when things get slow (I have a tool running that shows the load of all 32 threads). I tried hibernating and restarting, just to check whether it is something with the hardware, but this does not change it, or not very much.
Does anybody have an explanation? Is there a tweak to apply to Windows to change this behaviour? I can of course work around it, for example by splitting the simulations further apart and restarting the machine completely every now and then before starting a new set of simulations. But I am interested in experience with, and hopefully solutions to, this problem instead.
On comments 1 and 2: Yes, there are no leaks (I tested the C++ code with valgrind) and the usage is stable. The system has 192 GB of RAM available and only a small proportion of it is used (the 32 simulations take about 190 MB each), and in the C++ code everything is closed. For Python I guess the same (leaving scope there closes handles, if I am not mistaken). However, closing the Python program doesn't change a thing.
Thanks!
Frederik

Python - sudden slowdown in Pygame

I am writing a 3D engine in Python using Pygame. For the first 300 or so frames, it takes from about 0.004 to 0.006 seconds to render 140 polygons. But after that, it suddenly takes an average of about 0.020 seconds to do the same task. This is concerning to me because this is a small-scale test, and even though 50 FPS is decent, it cannot be sustained at 1000 polygons, for instance.
I have already done a lot of streamlining to my code. I have also done some slightly deeper profiling, and it appears that the increased time is more or less proportionately distributed, which suggests that the problem is not specific to a single piece of code.
I assume that the problem has something to do with memory usage, but I do not know exactly why it is happening. What specific issue is causing this, how can I optimize my code to fix it, and what more general practices should I follow? Since my code is very long, it is posted here.
Although I can't answer your question exactly, I would use a task manager and watch the "python" (or "pygame", depending on your OS) process and view its memory consumption. If that turns out to be the issue, you could check which variables you don't need after a certain time, and then clear those variables.
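A rough way to watch this from inside the game loop itself, using only the standard library, might look like the sketch below; the loop structure is hypothetical and not taken from the posted code:

import tracemalloc

import pygame

pygame.init()
screen = pygame.display.set_mode((640, 480))
clock = pygame.time.Clock()
tracemalloc.start()

frame = 0
running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
    # ... rendering of the polygons would happen here ...
    pygame.display.flip()
    clock.tick(60)
    frame += 1
    if frame % 300 == 0:
        current, peak = tracemalloc.get_traced_memory()
        print("frame %d: %.1f MB allocated (peak %.1f MB)"
              % (frame, current / 1e6, peak / 1e6))
pygame.quit()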
Edit: Some CPUs have data loss prevention systems. What I mean by this is this:
If Application X takes up around 40% of the CPU (it doesn't have to be that high), then after a certain amount of time the CPU will throttle the amount of CPU that Application X is allowed to use. This can cause slowdowns like this one. This doesn't happen with (most) games because they're set up to tell the CPU to expect that amount of strain.

PyPy memory usage increasing over time

I've noticed some oddities in the memory usage of my program when running under PyPy versus CPython. Under PyPy the program not only uses a substantially larger initial amount of memory than under CPython, but this memory usage also increases over time quite dramatically. At the end of the program it's using around 170 MB under PyPy, compared to 14 MB when run under CPython.
I found a user with the exact same problem, albeit on a smaller scale, but the solutions which worked for him provided only minor help for my program: pypy memory usage grows forever?
The two things I tried were setting the environment variables PYPY_GC_MAX to 100MB and PYPY_GC_GROWTH to 1.1, and manually calling gc.collect() at each generation.
I'm determining the memory usage with
resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1000
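Put together, the per-generation collection and the measurement above might be combined like this; the evolution step is a placeholder for the actual GP code, and PYPY_GC_MAX / PYPY_GC_GROWTH would be set in the shell before launching the script, not from inside it:

import gc
import resource

def evolve_one_generation(population):
    # Placeholder for the actual selection/crossover/mutation step
    return population

population = list(range(200))
for generation in range(100):
    population = evolve_one_generation(population)
    gc.collect()  # manual collection after each generation
    rss_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000
    print("generation %d: %.1f MB" % (generation, rss_mb))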
Here's the runtime and memory usage under different conditions:
Version: time taken, memory used at end of run
PyPy 2.5.0: 100s, 173MB
PyPy with PYPY_GC_MAX = 100MB and PYPY_GC_GROWTH = 1.1: 102s, 178MB
PyPy with gc.collect(): 108s, 131MB
Python 2.7.3: 167s, 14MB
As you can see, the program is much quicker under PyPy than under CPython, which is why I moved to it in the first place, but at the cost of a roughly 10-fold increase in memory.
The program is an implementation of Genetic Programming, where I'm evolving an arithmetic binary tree over 100 generations, with 200 individuals in the population. Each node in the tree has a reference to its 2 children and these trees can increase in size although for this experiment they stay relatively stable. Depending on the application this program can be running for 10 minutes up to a few hours, but for the results here I've set it to a smaller dataset to highlight the issue.
Does anyone have any idea a) what could be causing this, and b) if it's possible to limit the memory usage to somewhat more respectable levels?
PyPy is known to use more baseline memory than CPython, and this number is known to increase over time, as the JIT compiles more and more machine code. It does (or at least should) converge --- what this means is that the memory usage should increase as your program runs, but only until a maximum. You should get roughly the same usage after running for 10 minutes or after several hours.
We can discuss endlessly if 170MB is too much or not for a "baseline". What I can tell is that a program that uses several GBs of memory on CPython uses not significantly more on PyPy --- that's our goal and our experience so far; but please report it as a bug if your experience is different.
