I've been looking at an in-memory database, and it got me thinking: how does Python handle IO that's not tied to a connection (and even data that is), for example hashes, sets, etc.? Is this configured somewhere, or is it managed dynamically based on available resources? Are there "easy" ways to view the effect resources are having on a real program, and to simulate what the performance hit would be on differing hardware setups?
NOTE: If it matters, Redis is the in-memory data store I'm looking at; there's an implementation of a wrapper for Redis datatypes so they mimic the datatypes found in Python.
Python allocates all the memory that the application asks for, so there is not much room for policy. The only question is when to release memory. CPython immediately releases any memory that is no longer referenced (this is not tunable). Memory that is referenced only from itself (i.e. reference cycles) is released by the garbage collector, which has tunable settings.
It is the operating system's decision to write some of the memory into the pagefile.
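As a rough illustration of the two release paths described above, here is a minimal sketch using the standard gc and weakref modules. The Node class and both function names are hypothetical, made up for this example; the behaviour shown is CPython's.

```python
import gc
import weakref


class Node:
    """Hypothetical object used only to build a reference cycle."""


def cycle_survives_until_collect():
    """Return (alive_before, alive_after) around an explicit gc.collect()."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a      # a and b now reference each other
        probe = weakref.ref(a)       # lets us check whether a still exists
        del a, b                     # no outside references remain
        alive_before = probe() is not None
        gc.collect()                 # the cycle collector frees the pair
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()


def make_collector_less_eager(factor=10):
    """Raise the youngest-generation threshold (the tunable part)."""
    gen0, gen1, gen2 = gc.get_threshold()
    gc.set_threshold(gen0 * factor, gen1, gen2)
    return gc.get_threshold()
```

Reference counting alone cannot free the pair because each object keeps the other alive, so it is the collector, driven by the thresholds, that releases it.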
Not exactly what you're asking for, but Dowser is a Python tool for interactively browsing the memory usage of your running program. Very useful in understanding memory usage and allocation patterns.
http://www.aminus.net/wiki/Dowser
I'm coding a program that requires high memory usage.
I use python 3.7.10.
During the program I create about 3GB of python objects, modifying them.
Some of the objects I create contain pointers to other objects.
Also, sometimes I need to deepcopy one object to create another.
My problem is that creating and modifying these objects takes a lot of time and causes performance issues.
I wish I could do some of the creation and modification in parallel. However, there are some limitations:
the program is very CPU-bound and uses almost no IO or network, so the threading library will not help because of the GIL
the system I work with has no copy-on-write feature, so the multiprocessing library spends a lot of time forking the process
the objects do not contain numbers and most of the work in the program is not mathematical, so I cannot benefit from numpy or ctypes
What would be a good alternative for this kind of memory that would let me parallelize my code better?
Deepcopy is extremely slow in Python. A possible solution is to serialize the objects to disk and load them back. See this answer for viable options, perhaps ujson and cPickle (in Python 3, the pickle module, which uses its C implementation automatically). Furthermore, you can serialize and deserialize objects asynchronously using aiofiles.
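A minimal sketch of the serialization idea, assuming the objects are picklable. It round-trips through pickle in memory rather than through a file, and pickle_copy is a hypothetical helper name. Whether this beats copy.deepcopy depends on the object graph, so it is worth timing on real data.

```python
import pickle


def pickle_copy(obj):
    """Deep-copy obj by serializing it and loading it back.

    Only works for picklable objects; references shared inside obj
    stay shared in the copy.
    """
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


# Usage: two keys pointing at the same list.
shared = [1, 2, 3]
original = {"a": shared, "b": shared}
clone = pickle_copy(original)
```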
Can't you use your GPU RAM and use CUDA?
https://developer.nvidia.com/how-to-cuda-python
If it doesn't need to be realtime I'd use PySpark (see streaming section https://spark.apache.org/docs/latest/api/python/) and work with remote machines.
Can you tell me a bit about the application? Perhaps you're searching for something like the PyTorch framework (https://pytorch.org/).
You may also like to try using Transparent Huge Pages and a hugepage-aware allocator, such as tcmalloc. That may speed up your application by 5-15% without having to change a line of code.
See thp-usage for more information.
This is a follow-up to a Stack Overflow answer from 2009:
How can I explicitly free memory in Python?
Unfortunately (depending on your version and release of Python) some
types of objects use "free lists" which are a neat local optimization
but may cause memory fragmentation, specifically by making more and
more memory "earmarked" for only objects of a certain type and thereby
unavailable to the "general fund".
The only really reliable way to ensure that a large but temporary use
of memory DOES return all resources to the system when it's done, is
to have that use happen in a subprocess, which does the memory-hungry
work then terminates. Under such conditions, the operating system WILL
do its job, and gladly recycle all the resources the subprocess may
have gobbled up. Fortunately, the multiprocessing module makes this
kind of operation (which used to be rather a pain) not too bad in
modern versions of Python.
In your use case, it seems that the best way for the subprocesses to
accumulate some results and yet ensure those results are available to
the main process is to use semi-temporary files (by semi-temporary I
mean, NOT the kind of files that automatically go away when closed,
just ordinary files that you explicitly delete when you're all done
with them).
It's been 10 years since that answer, and I am wondering if there is now a better way to create some sort of process/subprocess/function/method that releases all of its memory when it completes.
The motivation for this is an issue I am having, where a for loop raises a memory error despite creating no new variables.
Repeated insertions into sqlite database via sqlalchemy causing memory leak?
The loop inserts into a database. I know it's not the database itself that is causing the memory error, because when I restart my runtime the database is still preserved, but the crash doesn't happen until another several hundred iterations of the for loop.
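The approach in the quoted 2009 answer (do the memory-hungry work in a child process and hand results back through an ordinary file that you delete yourself) can be sketched roughly as follows. This sketch uses the standard subprocess module with a fresh interpreter instead of multiprocessing, and the child script, the JSON format and the function name are all assumptions made for illustration.

```python
import json
import os
import subprocess
import sys
import tempfile

# Hypothetical memory-hungry job; it runs in a separate interpreter,
# so everything it allocates goes back to the OS when it exits.
CHILD_CODE = """
import json, sys
big = list(range(1_000_000))            # large temporary allocation
with open(sys.argv[1], "w") as fh:      # result goes to the given file
    json.dump({"total": sum(big)}, fh)
"""


def run_in_subprocess():
    """Run CHILD_CODE in a child process and return its result."""
    fd, path = tempfile.mkstemp(suffix=".json")  # not auto-deleted on close
    os.close(fd)
    try:
        subprocess.run([sys.executable, "-c", CHILD_CODE, path], check=True)
        with open(path) as fh:
            return json.load(fh)
    finally:
        os.remove(path)  # explicit cleanup once we are done with the file
```

For a long loop, the same pattern could process one batch of iterations per child, so that whatever leaks in a batch disappears when that child exits.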
I intend to build a program structure like the one below.
PS1 is a persistently running Python program. PC1, PC2 and PC3 are client Python programs. PS1 holds a variable, hashtable, and whenever PC1, PC2, ... ask for the hashtable, PS1 passes it to them.
The intention is to keep the table in memory since it is a huge variable (takes 10G memory) and it is expensive to calculate it every time. It is not feasible to store it in the hard disk (using pickle or json) and read it every time when it is needed. The read just takes too long.
So I was wondering if there is a way to keep a python variable persistently in the memory, so it can be used very fast whenever it is needed.
You are trying to reinvent a square wheel, when nice round wheels already exist!
Let's go one level up to how you have described your needs:
one large data set, that is expensive to build
different processes need to use the dataset
performance constraints rule out simply reading the full set from permanent storage
IMHO, this is exactly what databases were created for. For common use cases, having many processes each use their own copy of a 10G object is a waste of memory, and the common way is for one single process to hold the data while the others send it requests. You did not describe your problem in enough detail, so I cannot say whether the best solution will be:
a SQL database like PostgreSQL or MariaDB - as they can cache, if you have enough memory, all will be held automatically in memory
a NoSQL database (MongoDB, etc.) if your only (or main) need is single key access - very nice when dealing with a lot of data requiring fast but simple access
a dedicated server using a dedicated query language if your needs are very specific and none of the above solutions meet them
a process setting up a huge piece of shared memory that will be used by client processes - that last solution will certainly be fastest provided:
all clients make read-only accesses - it can be extended to r/w accesses but could lead to a synchronization nightmare
you are sure to have enough memory on your system to never use swap - if you do you will lose all the cache optimizations that real databases implement
the size of the database and the number of client process and the external load of the whole system never increase to a level where you fall in the swapping problem above
TL;DR: My advice is to first measure the performance of a good-quality database, optionally with a dedicated cache. Those solutions allow load balancing across different machines almost out of the box. Only if that does not work, carefully analyze the memory requirements, document the limits on the number of client processes and database size for future maintenance, and use shared memory. Read-only data is a hint that shared memory can be a good solution.
In short, to accomplish what you are asking about, you need to create a byte array as a RawArray from the multiprocessing.sharedctypes module in the PS1 server, large enough to hold your entire hashtable, and then store the hashtable in that RawArray. PS1 needs to be the process that launches PC1, PC2, etc., so that they can inherit access to the RawArray. You can write your own class that provides the hashtable interface on top of the shared RawArray; an instance of it can be passed to each of the PC# processes, which then read individual entries through it.
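A minimal sketch of such a wrapper class, under simplifying assumptions: keys and values are 64-bit integers, the table is read-only, and lookup is a binary search over sorted keys rather than real hashing. SharedIntTable is a hypothetical name, and a table of strings or arbitrary objects would need its own byte layout.

```python
import bisect
import ctypes
from multiprocessing.sharedctypes import RawArray


class SharedIntTable:
    """Read-only int -> int mapping backed by shared-memory arrays."""

    def __init__(self, mapping):
        items = sorted(mapping.items())
        # RawArray allocates from shared memory and takes no lock,
        # which is fine as long as every process only reads.
        self._keys = RawArray(ctypes.c_longlong, [k for k, _ in items])
        self._values = RawArray(ctypes.c_longlong, [v for _, v in items])

    def __getitem__(self, key):
        # Binary search over the sorted key array.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        raise KeyError(key)

    def __len__(self):
        return len(self._keys)


# Usage in the server process (PS1 in the question's naming).
table = SharedIntTable({42: 4200, 7: 700, 19: 1900})
```

The server would build the table once and pass it to each client it launches, for example as an argument to multiprocessing.Process, so that the clients read the same memory instead of copying 10G each.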
I am currently trying to debug the memory usage of my Python program (on Windows with CPython 2.7). But unfortunately, I can't even find any way to reliably measure the amount of memory it's currently using.
I've been using the Task Manager/Resource Monitor to measure the process memory, but this appears to only be useful for determining peak memory consumption. Often times Python will not reduce the Commit or Working Set even long after the relevant objects have been garbage collected.
Is there any way to find out how much memory Python is actually using, or failing that, to force it to free up its unused memory? I'd prefer not to use anything that would require recompiling the interpreter.
An example of the behavior that proves it isn't freeing unused memory:
(after some calculations) # 290k
gc.collect() # still 290k
x = range(9999999) # 444k
del x # 405k
gc.collect() # 40k
Is there any way to find out how much memory Python is actually using,
Not from within Python.
You can get a rough idea of memory usage per object using sys.getsizeof. However, that doesn't capture total memory usage, overallocation, fragmentation, or memory that is unused but not released back to the OS.
There is a third-party tool called Pympler that can help with memory analysis. There is also a programming environment called Guppy for object and heap memory sizing, profiling and analysis, and a similar project called PySizer with a memory usage profiler for Python code.
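To show why sys.getsizeof is only a rough guide, here is a small sketch; the helper name is made up for this example. The size reported for a container covers the container itself, not the objects it refers to.

```python
import sys


def shallow_and_contents(seq):
    """Return (size of the container itself, summed size of its items)."""
    shallow = sys.getsizeof(seq)                         # container only
    contents = sum(sys.getsizeof(item) for item in seq)  # one level down
    return shallow, contents


# Usage: a short list holding two 1000-character strings.
shallow, contents = shallow_and_contents(["a" * 1000, "b" * 1000])
```

Here the list reports a small size while its strings account for most of the memory, and neither number says anything about allocator overhead or fragmentation.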
or failing that, to force it to free up its unused memory?
There is no public API for forcing memory to be released.
I have a Python program that dies with a MemoryError when I feed it a large file. Are there any tools that I could use to figure out what's using the memory?
This program ran fine on smaller input files. The program obviously needs some scalability improvements; I'm just trying to figure out where. "Benchmark before you optimize", as a wise person once said.
(Just to forestall the inevitable "add more RAM" answer: This is running on a 32-bit WinXP box with 4GB RAM, so Python has access to 2GB of usable memory. Adding more memory is not technically possible. Reinstalling my PC with 64-bit Windows is not practical.)
EDIT: Oops, this is a duplicate of Which Python memory profiler is recommended?
Heapy is a memory profiler for Python, which is the type of tool you need.
The simplest, most lightweight way would likely be to use Python's built-in memory query capabilities, such as sys.getsizeof: run it on your objects for a reduced problem (i.e. a smaller file) and see what takes a lot of memory.
In your case, the answer is probably very simple: do not read the whole file at once, but process it chunk by chunk. That may be very easy or complicated depending on your usage scenario. For example, an MD5 checksum can be computed much more efficiently for huge files without reading the whole file in. That change dramatically reduced memory consumption in some SCons usage scenarios, but the problem was almost impossible to trace with a memory profiler.
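The chunk-by-chunk MD5 example mentioned above might look like this with the standard hashlib module. The function name and the 1 MiB chunk size are arbitrary choices for this sketch.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute a file's MD5 hex digest while holding one chunk in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # fh.read returns b"" at end of file, which stops the iteration.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Peak memory then depends on chunk_size rather than on the size of the input file.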
If you still need a memory profiler: eliben already suggested sys.getsizeof. If that doesn't cut it, try Heapy or Pympler.
You asked for a tool recommendation:
Python Memory Validator allows you to monitor the memory usage, allocation locations, GC collections, object instances, memory snapshots, etc of your Python application. Windows only.
http://www.softwareverify.com/python/memory/index.html
Disclaimer: I was involved in the creation of this software.