I'm working on a project where I distribute compute tasks to multiple Python Processes, each associated with its own CUDA device.
When spawning the subprocesses, I use the following code:
import pycuda.driver as cuda

class ComputeServer(object):
    def _init_workers(self):
        self.workers = []
        cuda.init()
        for device_id in range(cuda.Device.count()):
            print "initializing device {}".format(device_id)
            worker = CudaWorker(device_id)
            worker.start()
            self.workers.append(worker)
The CudaWorker is defined in another file as follows:
from multiprocessing import Process
import pycuda.driver as cuda

class CudaWorker(Process):
    def __init__(self, device_id):
        Process.__init__(self)
        self.device_id = device_id

    def run(self):
        self._init_cuda_context()
        while True:
            pass  # process requests here

    def _init_cuda_context(self):
        # the following line fails
        cuda.init()
        device = cuda.Device(self.device_id)
        self.cuda_context = device.make_context()
When I run this code on Windows 7 or Linux, I have no issues. When running the code on my MacBook Pro with OSX 10.8.2, Cuda 5.0, and PyCuda 2012.1 I get the following error:
Process CudaWorker-1:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/Users/tombnorwood/pymodules/computeserver/worker.py", line 32, in run
self._init_cuda_context()
File "/Users/tombnorwood/pymodules/computeserver/worker.py", line 38, in _init_cuda_context
cuda.init()
RuntimeError: cuInit failed: no device
I have no issues running PyCuda scripts without forking new processes on my Mac. I only get this issue when spawning a new Process.
Has anyone run into this issue before?
This is really just an educated guess based on my experience, but I suspect that the OS X implementation of CUDA (or possibly PyCuda) relies on some APIs that can't be used safely after fork, while the linux implementation does not.* Since the POSIX implementation of multiprocessing uses fork without exec to create child processes, this would explain why it fails on OS X but not linux. (And on Windows, there is no fork, just a spawn equivalent, so this isn't an issue.)
The simplest solution would be to drop multiprocessing. If CUDA and PyCUDA are thread-safe (I don't know if they are), and your code is not CPU-bound (just GPU-bound), you might be able to just drop in threading.Thread in place of multiprocessing.Process and be done with it. Or you could consider one of the other parallel-processing libraries that provide similar APIs to multiprocessing. (There are a few people who use pp only because it always execs…)
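For illustration, here's roughly what that swap might look like for the worker above. This is a hypothetical sketch, and it assumes (untested) that your PyCUDA build tolerates one context per thread:

import pycuda.driver as cuda
from threading import Thread

class CudaWorkerThread(Thread):
    def __init__(self, device_id):
        Thread.__init__(self)
        self.device_id = device_id

    def run(self):
        # No fork happens here, so cuInit runs inside the original process.
        cuda.init()
        device = cuda.Device(self.device_id)
        self.cuda_context = device.make_context()
        while True:
            pass  # process requests here, as in the Process version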
However, it's pretty easy to hack up multiprocessing to exec/spawn a new Python interpreter and then do everything Windows-style instead of POSIX-style. (Getting every case right is difficult, but getting one specific use case right is easy.)
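For example, here's a rough sketch of the Windows-style idea using plain subprocess, not a drop-in replacement; worker_main.py and device_count are hypothetical placeholders for your own entry script and device discovery:

import subprocess
import sys

def spawn_worker(device_id):
    # exec a fresh interpreter per worker, so no state crosses a fork
    return subprocess.Popen([sys.executable, 'worker_main.py', str(device_id)])

workers = [spawn_worker(i) for i in range(device_count)]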
Or, if you look at bug #8713, there's some work being done on making this work right in general. And there are working patches. Those patches are for 3.3, not 2.7, so you'd probably need a bit of massaging, but it shouldn't be very much. So, just cp $MY_PYTHON_LIB/multiprocessing.py $MY_PROJECT_DIR/mymultiprocessing.py, patch it, use mymultiprocessing in place of multiprocessing, and add the appropriate call to pick spawn/fork+exec/whatever the mode is called in the latest patch before you do anything else.
* The OP says he suspected the same thing, so I probably don't need to explain this to him, but for future readers: It's not about a difference between Darwin and other Unixes, but about the fact that Apple ships a lot of non-Unix-y mid-level libraries like CoreFoundation.framework, Accelerate.framework, etc. that use unsafe-after-fork functionality (or just assert that they're not being used after a fork because Apple doesn't want to put in the rigorous testing that would be warranted before they could say "as of 10.X, Foo.framework is safe after fork"). Also, if you compare the way OS X and linux deal with graphics and other hardware, there's a lot more mid-level in-each-process-userspace going on in OS X.
Related
I need a way to ensure only one Python process is processing a directory.
The lock/semaphore should be local to the machine (linux operating system).
Networking or NFS is not involved.
I would like to avoid file based locks, since I don't know where I should put the lock file.
There are libraries on PyPI which provide POSIX IPC.
Is there no way to use Linux semaphores with Python without a third-party library?
The lock provided by multiprocessing.Lock does not help, since both Python interpreters don't share the same parent.
Threading is not involved. All processes have only one thread.
I am using Python 2.7 on linux.
How to synchronize two Python scripts on Linux (without file-based locking)?
Required feature: If one process dies, then the lock/semaphore should get released by the operating system.
flock the directory itself — then you never need worry about where to put the lock file:
import errno
import fcntl
import os
import sys

# This will work on Linux
dirfd = os.open(THE_DIRECTORY, os.O_RDONLY)  # FIXME: FD_CLOEXEC
try:
    fcntl.flock(dirfd, fcntl.LOCK_EX | fcntl.LOCK_NB)
except IOError as ex:
    if ex.errno != errno.EAGAIN:
        raise
    print "Somebody else is working here; quitting."  # FIXME: logging
    sys.exit(1)

do_the_work()
os.close(dirfd)
I would like to avoid file based locks, since I don't know where I should put the lock file.
You can lock the existing file or directory (the one being processed).
Required feature: If one process dies, then the lock/semaphore should get released by the operating system.
That is exactly how file locks work.
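A quick way to see that behavior for yourself: take the lock, then die without any cleanup; a second run of the script acquires the lock immediately, because the kernel releases it when the process exits. A minimal (hypothetical) demonstration:

import fcntl
import os

dirfd = os.open('.', os.O_RDONLY)
fcntl.flock(dirfd, fcntl.LOCK_EX | fcntl.LOCK_NB)
os.abort()  # simulate a crash; the kernel drops the lock with the process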
I'm trying to embed a Python 3.3 x64 script that uses 'multiprocessing' into C++ code under Windows 7 x64.
Simple script like:
from multiprocessing import Process

def spawnWork(fileName, index):
    print("spawnWork: Entry... ")
    process = Process(target=execute, args=(fileName, index,))
    process.start()
    print("spawnWork: ... Exit.")

def execute(fileName, index):
    print("execute: Entry... ")
    # Do some long processing
    print("execute: ... Exit.")
works fine when run directly from Python, but when embedded it gets stuck at .start() and locks up.
I'm using all the relevant API calls to ensure safe GIL handling for Python. It works pretty well when not dealing with the 'multiprocessing' package, but locks up when attempting to start another 'Process'.
Is it possible to use both Python/C++ mix and 'multiprocessing'?
Thanks
I wouldn't expect this to work, as the way multiprocessing works on Windows (where there's no fork) is to CreateProcess another copy of the same executable. And since that executable is your embedding C++ app, not the Python interpreter, you will probably have to cooperate very closely with it to make that work. You can see the relevant code in multiprocessing's Windows process-spawning module (popen_spawn_win32.py in modern CPython).
Another potential problem is that on Windows, multiprocessing relies on a C extension module that fakes POSIX semaphores on top of Windows kernel semaphores; I haven't read through the code, but that could easily be doing something funky to GIL/threadstate and/or relying on something under the covers to share the semaphores with child Python executables.
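That said, if you do want to pursue it, the stdlib provides a hook aimed at exactly this embedding situation: multiprocessing.set_executable() points child creation at a real interpreter instead of your host .exe. A minimal sketch (untested in an embedded host):

import os
import sys
import multiprocessing

# Tell multiprocessing to CreateProcess a real python.exe rather than
# re-launching the embedding C++ host executable.
multiprocessing.set_executable(os.path.join(sys.exec_prefix, 'python.exe'))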
I have a Linux board with a Samsung S3C6410 SoC (ARM11).
I built the rootfs with Buildroot:
Python 2.7.1, uClibc-0.9.31.
Linux kernel:
Linux buildroot 2.6.28.6 #177 Mon Oct 3 12:50:57 EEST 2011 armv6l GNU/Linux
My app, written in Python, raises these exceptions under some mysterious conditions:
1)
exception:
File "./dfbUtils.py", line 3209, in setItemData
ValueError: (4, 'Interrupted system call')
code:
currentPage=int(math.floor(float(rowId)/self.pageSize))==self.selectedPage
2)
exception:
File "./terminalGlobals.py", line 943, in getFirmawareName
OSError: [Errno 4] Interrupted system call: 'firmware'
code:
for fileName in os.listdir('firmware'):
Some info about the app: it has 3-7 threads, listens on serial ports via the 'serial' module, and uses a GUI implemented as a C extension that wraps DirectFB. I can't reproduce these exceptions; they are not predictable.
I googled for EINTR exceptions in Python, but only found that EINTR can occur on slow system calls, and that Python's socket, subprocess, and a few other modules already handle EINTR. So what is happening in my app? Why can a simple math function call interrupt the program at any time? It's not reliable at all. My only guesses are a uClibc bug or a kernel/hardware handling bug, but those guesses don't point me to a solution.
For now I have created wrapper functions (that restart the operation in case of EINTR) around some functions from the os module, but wrapping the math module would double the execution time. That raises another question: if math can be interrupted, then other modules can be too, so how do I get reliability?
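The wrappers look roughly like this (a simplified sketch; retry_on_eintr is my own helper name):

import errno
import os

def retry_on_eintr(func, *args):
    # Re-issue the call if a signal interrupted it (EINTR).
    while True:
        try:
            return func(*args)
        except (OSError, IOError) as ex:
            if ex.errno != errno.EINTR:
                raise

fileNames = retry_on_eintr(os.listdir, 'firmware')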
P.S. I realize that a library call (to libm, for example) is not a system call, so why do I get "Interrupted system call"?
There was an old bug with threads and EINTR in uClibc (#4994) that they fixed in 0.9.30. The fix was tested against pthreads, so I would second the suggestion by tMC to check how you configured threads when building uClibc.
You could also try compiling with the malloc-simple option; it is slow, but if your issue disappears, that may point to threading issues as well:
malloc-simple is trivially simple and slow as molasses. It
was written from scratch for uClibc, and is the simplest possible
(and therefore smallest) malloc implementation.
This uses only the mmap() system call to allocate and free memory,
and does not use the brk() system call at all, making it a fine
choice for MMU-less systems with very limited memory. It's 100%
standards compliant, thread safe, very small, and releases freed
memory back to the OS immediately rather than keeping it in the
process's heap for reallocation. It is also VERY SLOW.
Time for another newbie question, I fear. I'm attempting to use Python 3.2.2 (the version is important, in this case) to monitor a particular Windows path for changes. The simplest method, and the method I'm using, is:
import datetime
import os
import time

original_state = os.listdir(path_string)
while os.listdir(path_string) == original_state:
    time.sleep(1)
change_time = datetime.datetime.now()
I'm writing this code to do some timing tests of another application. With that goal in mind, the Python script needs to (a) not adversely affect system performance, and (b) be relatively precise -- a margin of error of +/- 1 second is the absolute maximum I can justify. Unfortunately, this method doesn't meet the first criterion: When running this particular bit of code, the virtual environment is hammered, drastically slowing down the operations whose performance I'm trying to accurately measure.
I've read how to watch a File System for change, How do I watch a file for changes?, and http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html (an article recommended as a solution to that second SO question.) Unfortunately, Tim Golden's code appears to be Python 2.x code -- as near as I can tell, the pywin32 module isn't supported in Python 3.
What can I do in Python 3 to monitor this particular path without running into the same performance problems?
According to the ActivePython 3.2 Documentation, their pywin32 now supports Python 3.x
On Linux there are inotify and pyinotify. The similar asynchronous notification mechanism on Windows is the FindFirstChangeNotification function, the Win32 counterpart of what the .NET FileSystemWatcher class provides.
Please look at solutions on the Tim Golden's page:
http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html
http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html#use_findfirstchange
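If pywin32 is not an option on Python 3.2, the same FindFirstChangeNotification approach can be reached through ctypes. This is a minimal, untested sketch; the path and the filter constants are just examples:

import ctypes

FILE_NOTIFY_CHANGE_FILE_NAME = 0x00000001
FILE_NOTIFY_CHANGE_LAST_WRITE = 0x00000010
INFINITE = 0xFFFFFFFF
WAIT_OBJECT_0 = 0

kernel32 = ctypes.windll.kernel32

handle = kernel32.FindFirstChangeNotificationW(
    "C:\\path\\to\\watch",  # directory to monitor (placeholder)
    False,                  # do not watch subdirectories
    FILE_NOTIFY_CHANGE_FILE_NAME | FILE_NOTIFY_CHANGE_LAST_WRITE)
if handle == -1:
    raise ctypes.WinError()
try:
    # Blocks in the kernel until something changes -- no polling loop,
    # so no CPU cost while waiting.
    if kernel32.WaitForSingleObject(handle, INFINITE) == WAIT_OBJECT_0:
        print("change detected")
finally:
    kernel32.FindCloseChangeNotification(handle)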
It is also possible to monitor a file or directory using GFileMonitor, with Gio taking care of the underlying operating system details. Although, granted, you likely won't be using GTK if this is a Windows program. For posterity:
from gi.repository import Gio
gfile = Gio.file_new_for_path('/home/user/Downloads')
gfilemonitor = gfile.monitor(Gio.FileMonitorFlags.NONE, None)
gfilemonitor.connect('changed', callback_func)
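Note that the 'changed' callback only fires while a GLib main loop is running, so a standalone script also needs something like:

from gi.repository import GLib

GLib.MainLoop().run()  # blocks; the monitor's callbacks fire from here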
Following on from this OS-agnostic question, and specifically this response: similar to the data available from the likes of /proc/meminfo on Linux, how can I read system information from OS X using Python (including, but not limited to, memory usage)?
You can get a large amount of system information from the command line utilities sysctl and vm_stat (as well as ps, as in this question.)
If you don't find a better way, you could always call these using subprocess.
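For example, the subprocess route for two common values (hw.memsize is the OS X sysctl key for total physical memory; this is a sketch, not a full parser):

import subprocess

# Total physical memory in bytes, via sysctl's raw-value flag.
total_bytes = int(subprocess.check_output(['sysctl', '-n', 'hw.memsize']))

# Raw vm_stat output; parsing its "Pages free: 12345." lines is up to you.
vm_stat_output = subprocess.check_output(['vm_stat'])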
The only stuff that's really nicely accessible is available from the platform module, but it's extremely limited (CPU, OS version, architecture, etc.). For CPU usage and uptime, I think you will have to wrap the command line utilities 'uptime' and 'vm_stat'.
I built you one for vm_stat, the other one is up to you ;-)
import os

def memoryUsage():
    result = dict()
    # Skip vm_stat's header line, then turn each 'Pages free: 12345.'
    # style line into a dict entry like {'pages_free': 12345}.
    for l in [l.split(':') for l in os.popen('vm_stat').readlines()[1:8]]:
        result[l[0].strip(' "').replace(' ', '_').lower()] = int(l[1].strip('.\n '))
    return result

print memoryUsage()
I did some more googling (looking for "OS X /proc") -- it looks like the sysctl command might be what you want, although I'm not sure if it will give you all the information you need. Here's the manpage: http://developer.apple.com/DOCUMENTATION/Darwin/Reference/ManPages/man8/sysctl.8.html
Also, wikipedia.
I was searching for this same thing, and I noticed there was no accepted answer for this question. In the intervening time since the question was originally asked, a Python module called psutil was released:
https://github.com/giampaolo/psutil
For memory utilization, you can use the following:
>>> psutil.virtual_memory()
svmem(total=8374149120L, available=2081050624L, percent=75.1, used=8074080256L, free=300068864L, active=3294920704, inactive=1361616896, buffers=529895424L, cached=1251086336)
>>> psutil.swap_memory()
sswap(total=2097147904L, used=296128512L, free=1801019392L, percent=14.1, sin=304193536, sout=677842944)
>>>
There are functions for CPU utilization, process management, disk, and network as well. The only omission from the module is a function for retrieving load average, but the Python stdlib has os.getloadavg() if you are on a UNIX-like system.
psutil claims to support Linux, Windows, OS X, FreeBSD, and Sun Solaris, but I have only tried OS X Mavericks and Fedora 20.
Here's a MacFUSE-based /proc fs:
http://www.osxbook.com/book/bonus/chapter11/procfs
If you have control of the boxes you're running your Python program on, it might be a reasonable solution. At any rate, it's nice to have a /proc to look at!