I'm trying to implement a check on system resources for the current shell (basically everything in ulimit) in Python to see if enough resources can be allocated. I've found the resource module, but it doesn't seem to have all the information ulimit provides (e.g. POSIX message queues and real-time priority). Is there a way to find the soft and hard limits for these in Python without using external libraries? I'd like to avoid running ulimit as a subprocess if possible but if it's the only way, will do so.
Use resource.getrlimit(). If the constant you need isn't defined in the resource module, look up its value in /usr/include/bits/resource.h:
$ grep RLIMIT_MSGQUEUE /usr/include/bits/resource.h
__RLIMIT_MSGQUEUE = 12,
#define RLIMIT_MSGQUEUE __RLIMIT_MSGQUEUE
Then you can define the constant yourself:
import resource

# Value taken from /usr/include/bits/resource.h (see the grep above)
RLIMIT_MSGQUEUE = 12
print(resource.getrlimit(RLIMIT_MSGQUEUE))
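A slightly more defensive sketch: recent Python versions do expose some of these constants themselves (e.g. resource.RLIMIT_MSGQUEUE on Linux), so you can prefer the module's attribute and only fall back to the hard-coded header value when it's missing. Note the value 12 is Linux-specific and may differ elsewhere.

import resource

# Prefer the module's constant if it exists; otherwise fall back to the
# value found in /usr/include/bits/resource.h (Linux-specific)
RLIMIT_MSGQUEUE = getattr(resource, 'RLIMIT_MSGQUEUE', 12)

soft, hard = resource.getrlimit(RLIMIT_MSGQUEUE)
print("POSIX message queues: soft=%s hard=%s" % (soft, hard))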
I've got a script that uses the resource module from Python (see http://docs.python.org/library/resource.html for information). Now I want to port this script to Windows. Is there any alternative version of this (the Python docs label it as "Unix only")?
If there isn't, is there any other workaround?
I'm using the following method/constant:
resource.getrusage(resource.RUSAGE_CHILDREN)
resource.RLIMIT_CPU
Thank you
PS: I'm using Python 2.7 / 3.2
There's no good way of doing this generically for all resources -- which is why it's a Unix-only module. For CPU specifically, you can either use registry keys to set limits for the process:
http://technet.microsoft.com/en-us/library/ff384148%28WS.10%29.aspx
As done here:
http://code.activestate.com/recipes/286159/
IMPORTANT: Backup your registry before trying anything with registry
Or you could set the thread priority:
http://msdn.microsoft.com/en-us/library/ms685100%28VS.85%29.aspx
As done here:
http://nullege.com/codes/search/win32process.SetThreadPriority
For other resources you'll have to piece together similar DLL-access APIs to achieve the desired effect. You should first ask yourself whether you really need this behavior. Often you can limit CPU time by sleeping the thread at convenient points so the OS can schedule other processes, and memory controls can be done programmatically by checking data structure sizes.
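If lowering thread priority is all you need, a minimal sketch using only ctypes (no pywin32 required) might look like the following; SetThreadPriority and GetCurrentThread are standard kernel32 calls, but treat this as an untested sketch rather than a drop-in solution:

import ctypes

THREAD_PRIORITY_BELOW_NORMAL = -1  # constant from the Win32 headers

kernel32 = ctypes.windll.kernel32
# GetCurrentThread() returns a pseudo-handle for the calling thread
handle = kernel32.GetCurrentThread()
if not kernel32.SetThreadPriority(handle, THREAD_PRIORITY_BELOW_NORMAL):
    raise ctypes.WinError()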
This answer, stating that the naming of classes in Python is not done because of special privileges, confuses me.
How can I access lower rings in Python?
Is the low-level io for accessing lower level rings?
If it is, which rings I can access with that?
Is the statement "This function is intended for low-level I/O." referring to lower level rings or to something else?
C tends to be the prominent language in OS programming. Since there is the os class in Python, does that mean I can access C code through that class?
Suppose I am playing with bizarre machine-language code and I want to somehow understand what it means. Are there some tools in Python which I can use to analyze such things? If there is not, is there some way that I could still use Python to control some tool which controls the bizarre machine language? [ctypes suggested in comments]
If Python has nothing to do with the low-level privileged stuff, does it still offer some wrappers to control the privileged parts?
Windows and Linux both use ring 0 for kernel code and ring 3 for user processes. The advantage of this is that user processes can be isolated from one another, so the system continues to run even if a process crashes. By contrast, a bug in ring 0 code can potentially crash the entire machine.
One of the reasons ring 0 code is so critical is that it can access hardware directly. By contrast, when a user-mode (ring 3) process needs to read some data from a disk:
1. the process executes a special instruction telling the CPU it wants to make a system call
2. CPU switches to ring 0 and starts executing kernel code
3. kernel checks that the process is allowed to perform the operation
4. if permitted, the operation is carried out
5. kernel tells the CPU it has finished
6. CPU switches back to ring 3 and returns control to the process
Processes belonging to "privileged" users (e.g. root/Administrator) run in ring 3 just like any other user-mode code; the only difference is that the check at step 3 always succeeds. This is a good thing because:
root-owned processes can crash without taking the entire system down
many user-mode features are unavailable in the kernel, e.g. swappable memory, private address space
As for running Python code in lower rings - kernel-mode is a very different environment, and the Python interpreter simply isn't designed to run in it, e.g. the procedure for allocating memory is completely different.
In the other question you reference, both os.open() and open() end up making the open() system call, which checks whether the process is allowed to open the corresponding file and performs the actual operation.
I think SimonJ's answer is very good, but I'm going to post my own because from your comments it appears you're not quite understanding things.
Firstly, when you boot an operating system, what you're doing is loading the kernel into memory and saying "start executing at address X". The kernel, that code, is essentially just a program, but of course nothing else is loaded, so if it wants to do anything it has to know the exact commands for the specific hardware it has attached to it.
You don't have to run a kernel. If you know how to control all the attached hardware, you don't need one, in fact. However, it was rapidly realised way back when that there are many types of hardware one might face and having an identical interface across systems to program against would make code portable and generally help get things done faster.
So the function of the kernel, then, is to control all the hardware attached to the system and present it through a common interface, called an API (application programming interface). Programs that run on the system don't talk directly to hardware; they talk to the kernel. So userland programs don't need to know how to ask a specific hard disk to read sector 0x213E or whatever, but the kernel does.
Now, the description of ring 3 provided in SimonJ's answer is how userland is implemented - with isolated, unprivileged processes with virtual private address spaces that cannot interfere with each other, for the benefits he describes.
There's also another level of complexity in here, namely the concept of permissions. Most operating systems have some form of access control, whereby "administrators" have total control of the system and "users" have a restricted subset of options. So a kernel request from an ordinary user to open a file belonging to an administrator should fail under this sort of approach. The user who runs the program forms part of the program's context, if you like, and what the program can do is constrained by what that user can do.
Most of what you could ever want to achieve (unless your intention is to write a kernel) can be done in userland as the root/administrator user, where the kernel does not deny any API requests made to it. It's still a userland program. It's still a ring 3 program. But for most (nearly all) uses it is sufficient. A lot can be achieved as a non-root/administrative user.
That applies to the python interpreter and by extension all python code running on that interpreter.
Let's deal with some uncertainties:
The naming of os and sys I think is because these are "systems" tasks (as opposed to say urllib2). They give you ways to manipulate and open files, for example. However, these go through the python interpreter which in turn makes a call to the kernel.
I do not know of any kernel-mode python implementations. Therefore to my knowledge there is no way to write code in python that will run in the kernel (linux/windows).
There are two senses of "privileged": privileged in terms of hardware access, and privileged in terms of the access-control system provided by the kernel. Python can be run as root/an administrator (indeed on Linux many of the administration GUI tools are written in Python), so in that sense it can access privileged code.
Writing a C extension for Python or controlling a C application from Python would ostensibly mean you are either using code added to the interpreter (userland) or controlling another userland application. However, if you wrote a kernel module in C (Linux) or a driver in C (Windows), it would be possible to load that code and interact with it via the kernel's APIs from Python. An example might be creating a /proc entry in C and then having your Python application pass messages to that entry via read/write (which the kernel module would have to handle with read/write handlers). Essentially, you write the code you want to run in kernel space and add to/extend the kernel API in one of many ways so that your program can interact with that code.
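To make that last idea concrete, here is a rough sketch of the userland side only; the /proc path and the message format are hypothetical and depend entirely on what your kernel module actually implements:

# Talk to a hypothetical kernel module that exposes /proc/mymodule
PROC_ENTRY = '/proc/mymodule'

def send_command(cmd):
    # The kernel module's write handler receives this string
    with open(PROC_ENTRY, 'w') as f:
        f.write(cmd)

def read_status():
    # The kernel module's read handler produces this output
    with open(PROC_ENTRY, 'r') as f:
        return f.read()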
"Low-level" IO means having more control over the type of IO that takes place and how you get that data from the operating system. It is low level compared to higher level functions still in Python that give you easier ways to read files (convenience at the cost of control). It is comparable to the difference between read() calls and fread() or fscanf() in C.
Health warning: Writing kernel modules, if you get it wrong, will at best result in that module not being properly loaded; at worst your system will panic/bluescreen and you'll have to reboot.
The final point about machine instructions I cannot answer here. It's a totally separate question and it depends. There are many tools capable of analysing code like that, I'm sure, but I'm not a reverse engineer. However, I do know that many of the tools that hook into binary code (e.g. gdb, valgrind) do not need kernel modules to do their work.
You can use the inpout library: http://logix4u.net/parallel-port/index.php
import ctypes

# Example of strobing data out with the nStrobe pin (note - inverted)
# Get 50 kbaud without the read, 30 kbaud with
read = []
for n in range(4):
    ctypes.windll.inpout32.Out32(0x37a, 1)
    ctypes.windll.inpout32.Out32(0x378, n)
    read.append(ctypes.windll.inpout32.Inp32(0x378))  # Dummy read to see what is going on
    ctypes.windll.inpout32.Out32(0x37a, 0)
print(read)
[Note: I was wrong. User-mode code can no longer access ring 0 on modern Unix systems. -- jc 2019-01-17]
I've forgotten what little I ever knew about Windows privileges. In all Unix systems with which I'm familiar, the root user can access all ring-0 privileges. But I can't think of any mapping of Python modules to privilege rings.
That is, the 'os' and 'sys' modules don't give you any special privileges. You have them, or not, due to your login credentials.
How can I access lower rings in Python?
ctypes
Is the low-level io for accessing lower level rings?
No.
Is the statement "This function is intended for low-level I/O." referring to lower level rings or to something else?
Something else.
C tends to be the prominent language in OS programming. Since there is the os class in Python, does that mean I can access C code through that class?
All of CPython is implemented in C.
The os module (it's not a class, it's a module) is for accessing OS APIs. C has nothing to do with access to OS APIs. Python accesses the APIs "directly".
Suppose I am playing with bizarre machine-language code and I want to somehow understand what it means. Are there some tools in Python which I can use to analyze such things?
"playing with"?
"understand what it means"? is your problem. You read the code, you understand it. Whether or not Python can help is impossible to say. What don't you understand?
If there is not, is there some way that I could still use Python to control some tool which controls the bizarre machine language? [ctypes suggested in comments]
ctypes
If Python has nothing to do with the low-level privileged stuff, does it still offer some wrappers to control the privileged parts?
You don't "wrap" things to control privileges.
Most OSes work like this.
You grant privileges to a user account.
The OS APIs check the privileges granted to the user making the OS API request.
If the user has the privileges, the OS API works.
If the user lacks the privileges, the OS API raises an exception.
That's all there is to it.
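In Python that failed check typically surfaces as an exception such as PermissionError; a tiny illustration (the path is just an example of a file that is usually readable only by root on Linux):

try:
    with open('/etc/shadow') as f:   # usually root-only
        f.read()
except PermissionError as exc:
    print("The kernel refused the request:", exc)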
I am writing a file manager in (wx)Python - a lot already works. When copying files there is already a progress dialog, overwrite handling etc.
Now in Vista, when the user wants to copy a file to certain directories (e.g. %Program Files%), the application/script needs elevation, which cannot be asked for at runtime. So I have to start another app/script elevated, which does the work but needs to communicate with the main app so the latter can update the progress etc.
I searched and found a lot of articles saying shared memory and pipes are the easiest way. So what I am looking for is a 'high level', platform-independent IPC library with Python bindings that uses shared memory or pipes.
I already found omniORB, Fnorb, etc. They look very interesting, but they use TCP/IP; is there an equivalent lib using shared memory or pipes? Since the IPC client is always on the same machine, sockets don't seem necessary here. And I am also afraid the user would have to allow IPC socket communications on his/her personal firewall.
EDIT: I really mean high level: it would be great to be able to just call some functions like when using omniORB instead of sending strings to stdin/stdout.
How about just communicating with the second process using stdin/stdout?
There are some caveats due to input and output buffering, but take a look at this Python Cookbook recipe, and also Pexpect, for ideas on how to do this.
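A minimal sketch of the stdin/stdout approach using only the stdlib; the helper script name and the line-based "PROGRESS n" protocol are made up for illustration, and in the real elevation case the helper would be launched elevated (e.g. via ShellExecute with the "runas" verb) rather than by a plain subprocess call:

import subprocess
import sys

# Parent side: spawn the helper and read progress lines from its stdout.
# The helper (copy_worker.py, hypothetical) prints lines like "PROGRESS 42"
# and must flush stdout so the parent sees them promptly.
proc = subprocess.Popen(
    [sys.executable, 'copy_worker.py', 'src.txt', 'dst.txt'],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    bufsize=1,  # line-buffered
)
for line in proc.stdout:
    print('helper says:', line.strip())
proc.wait()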
My production system occasionally exhibits a memory leak I have not been able to reproduce in a development environment. I've used a Python memory profiler (specifically, Heapy) with some success in the development environment, but it can't help me with things I can't reproduce, and I'm reluctant to instrument our production system with Heapy because it takes a while to do its thing and its threaded remote interface does not work well in our server.
What I think I want is a way to dump a snapshot of the production Python process (or at least gc.get_objects), and then analyze it offline to see where it is using memory. How do I get a core dump of a python process like this? Once I have one, how do I do something useful with it?
Using Python's gc garbage collector interface and sys.getsizeof() it's possible to dump all the python objects and their sizes. Here's the code I'm using in production to troubleshoot a memory leak:
import gc
import os
import sys
import cPickle
import psutil

def memory_dump():
    dump = open("memory.pickle", 'wb')
    xs = []
    for obj in gc.get_objects():
        i = id(obj)
        size = sys.getsizeof(obj, 0)
        # referrers = [id(o) for o in gc.get_referrers(obj) if hasattr(o, '__class__')]
        referents = [id(o) for o in gc.get_referents(obj) if hasattr(o, '__class__')]
        if hasattr(obj, '__class__'):
            cls = str(obj.__class__)
            xs.append({'id': i, 'class': cls, 'size': size, 'referents': referents})
    cPickle.dump(xs, dump)

# Resident set size of this process (newer psutil versions call this memory_info())
rss = psutil.Process(os.getpid()).get_memory_info().rss
# Dump variables if using more than 100MB of memory
if rss > 100 * 1024 * 1024:
    memory_dump()
    os.abort()
Note that I'm only saving data from objects that have a __class__ attribute because those are the only objects I care about. It should be possible to save the complete list of objects, but you will need to take care choosing other attributes. Also, I found that getting the referrers for each object was extremely slow so I opted to save only the referents. Anyway, after the crash, the resulting pickled data can be read back like this:
with open("memory.pickle", 'rb') as dump:
objs = cPickle.load(dump)
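From there, a quick aggregation (not part of the original snippet) usually points at the offender, for example grouping by class and summing sizes:

from collections import Counter

total_size = Counter()
count = Counter()
for o in objs:
    total_size[o['class']] += o['size']
    count[o['class']] += 1

# Top 10 classes by accumulated size
for cls, size in total_size.most_common(10):
    print(cls, count[cls], 'objects,', size, 'bytes')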
Added 2017-11-15
The Python 3.6 version is here:
import gc
import sys
import _pickle as cPickle
def memory_dump():
    with open("memory.pickle", 'wb') as dump:
        xs = []
        for obj in gc.get_objects():
            i = id(obj)
            size = sys.getsizeof(obj, 0)
            # referrers = [id(o) for o in gc.get_referrers(obj) if hasattr(o, '__class__')]
            referents = [id(o) for o in gc.get_referents(obj) if hasattr(o, '__class__')]
            if hasattr(obj, '__class__'):
                cls = str(obj.__class__)
                xs.append({'id': i, 'class': cls, 'size': size, 'referents': referents})
        cPickle.dump(xs, dump)
I will expand on Brett's answer from my recent experience. The Dozer package is well maintained, and despite advancements like the addition of tracemalloc to the stdlib in Python 3.4, its gc.get_objects counting chart is my go-to tool for tackling memory leaks. Below I use dozer > 0.7, which had not been released at the time of writing (well, because I contributed a couple of fixes there recently).
Example
Let's look at a non-trivial memory leak. I'll use Celery 4.4 here and will eventually uncover the feature which causes the leak (and because it's a bug/feature kind of thing, it can be called mere misconfiguration, caused by ignorance). So there's a Python 3.6 venv where I pip install celery < 4.5, and I have the following module.
demo.py
import time
import celery

redis_dsn = 'redis://localhost'
app = celery.Celery('demo', broker=redis_dsn, backend=redis_dsn)

@app.task
def subtask():
    pass

@app.task
def task():
    for i in range(10_000):
        subtask.delay()
        time.sleep(0.01)

if __name__ == '__main__':
    task.delay().get()
Basically a task which schedules a bunch of subtasks. What can go wrong?
I'll use procpath to analyse the Celery node's memory consumption (pip install procpath). I have 4 terminals:
1. procpath record -d celery.sqlite -i1 "$..children[?('celery' in @.cmdline)]" to record the Celery node's process tree stats
2. docker run --rm -it -p 6379:6379 redis to run Redis, which will serve as the Celery broker and result backend
3. celery -A demo worker --concurrency 2 to run the node with 2 workers
4. python demo.py to finally run the example
(4) will finish in under 2 minutes.
Then I use sqliteviz (pre-built version) to visualise what procpath has recorded. I drop celery.sqlite there and use this query:
SELECT datetime(ts, 'unixepoch', 'localtime') ts, stat_pid, stat_rss / 256.0 rss
FROM record
And in sqliteviz I create a line chart trace with X=ts, Y=rss, and add a split transform By=stat_pid. The resulting chart shape is likely pretty familiar to anyone who has fought memory leaks.
Finding leaking objects
Now it's time for dozer. I'll show the non-instrumented case (you can instrument your code in a similar way if you're able to). To inject the Dozer server into the target process I'll use Pyrasite. There are two things to know about it:
To run it, ptrace has to be configured as "classic ptrace permissions": echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope, which may be a security risk
There are non-zero chances that your target Python process will crash
With that caveat I:
pip install https://github.com/mgedmin/dozer/archive/3ca74bd8.zip (that's the to-be-released 0.8 I mentioned above)
pip install pillow (which dozer uses for charting)
pip install pyrasite
After that I can get Python shell in the target process:
pyrasite-shell 26572
And inject the following, which will run Dozer's WSGI application using the stdlib's wsgiref server.
import threading
import wsgiref.simple_server

import dozer

def run_dozer():
    app = dozer.Dozer(app=None, path='/')
    with wsgiref.simple_server.make_server('', 8000, app) as httpd:
        print('Serving Dozer on port 8000...')
        httpd.serve_forever()

threading.Thread(target=run_dozer, daemon=True).start()
Opening http://localhost:8000 in a browser, you should see something like:
After that I run python demo.py from (4) again and wait for it to finish. Then in Dozer I set "Floor" to 5000, and here's what I see:
Two types related to Celery grow as the subtasks are scheduled:
celery.result.AsyncResult
vine.promises.promise
weakref.WeakMethod has the same shape and numbers and must be caused by the same thing.
Finding root cause
At this point, from the leaking types and the trends, it may already be clear what's going on in your case. If it's not, Dozer has a "TRACE" link per type, which allows tracing (e.g. seeing the object's attributes) the chosen object's referrers (gc.get_referrers) and referents (gc.get_referents), and continuing the process by traversing the graph further.
But a picture is worth a thousand words, right? So I'll show how to use objgraph to render a chosen object's dependency graph.
pip install objgraph
apt-get install graphviz
Then:
I run python demo.py from (4) again
in Dozer I set floor=0, filter=AsyncResult
and click "TRACE" which should yield
Then in Pyrasite shell run:
objgraph.show_backrefs([objgraph.at(140254427663376)], filename='backref.png')
The PNG file should contain:
Basically there's some Context object containing a list called _children that in turn contains many instances of celery.result.AsyncResult, which leak. Changing Filter to celery.*context in Dozer, here's what I see:
So the culprit is celery.app.task.Context. Searching for that type would certainly lead you to the Celery task page. Quickly searching for "children" there, here's what it says:
trail = True
If enabled the request will keep track of subtasks started by this task, and this information will be sent with the result (result.children).
Disabling the trail by setting trail=False like:
@app.task(trail=False)
def task():
    for i in range(10_000):
        subtask.delay()
        time.sleep(0.01)
Then restarting the Celery node from (3) and running python demo.py from (4) yet again shows this memory consumption.
Problem solved!
Could you record the traffic (via a log) on your production site, then replay it on your development server instrumented with a Python memory debugger? (I recommend Dozer: http://pypi.python.org/pypi/Dozer)
Make your program dump core, then clone an instance of the program on a sufficiently similar box using gdb. There are special macros to help with debugging python programs within gdb, but if you can get your program to concurrently serve up a remote shell, you could just continue the program's execution, and query it with python.
I have never had to do this, so I'm not 100% sure it'll work, but perhaps the pointers will be helpful.
I don't know how to dump an entire python interpreter state and restore it. It would be useful, I'll keep my eye on this answer in case anyone else has ideas.
If you have an idea where the memory is leaking, you can add checks on the refcounts of your objects. For example:
import sys

x = SomeObject()
# ... later ...
oldRefCount = sys.getrefcount(x)
suspiciousFunction(x)
if oldRefCount != sys.getrefcount(x):
    print("Possible memory leak...")
You could also check for reference counts higher than some number that is reasonable for your app. To take it further, you could modify the Python interpreter to do these kinds of checks by replacing the Py_INCREF and Py_DECREF macros with your own. This might be a bit dangerous in a production app, though.
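As a rough sketch of that threshold idea (the limit of 1000 is arbitrary and would need tuning for your application):

import gc
import sys

REFCOUNT_THRESHOLD = 1000  # arbitrary; tune for your app

def report_suspicious_refcounts():
    for obj in gc.get_objects():
        # getrefcount() reports one extra reference for its own argument
        count = sys.getrefcount(obj)
        if count > REFCOUNT_THRESHOLD:
            print(type(obj), count)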
Here is an essay with more info on debugging these sorts of things. It's more geared for plugin authors but most of it applies.
Debugging Reference Counts
The gc module has some functions that might be useful, like listing all objects the garbage collector found to be unreachable but cannot free, or a list of all objects being tracked.
If you have a suspicion which objects might leak, the weakref module could be handy to find out if/when objects are collected.
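For example, a weakref callback tells you when (and whether) a suspect object is actually collected; a minimal sketch:

import gc
import weakref

class Suspect:
    pass

obj = Suspect()
# The callback fires when the object is actually collected
ref = weakref.ref(obj, lambda r: print("Suspect instance was collected"))

del obj        # drop the last strong reference
gc.collect()   # force a collection so the callback runs promptly
print(ref())   # None once the object is gone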
Meliae looks promising:
This project is similar to heapy (in the 'guppy' project), in its attempt to understand how memory has been allocated.
Currently, its main difference is that it splits the task of computing summary statistics, etc of memory consumption from the actual scanning of memory consumption. It does this, because I often want to figure out what is going on in my process, while my process is consuming huge amounts of memory (1GB, etc). It also allows dramatically simplifying the scanner, as I don't allocate python objects while trying to analyze python object memory consumption.