Error in Python multiprocessing process

I am trying to write a Python program with multiple processes whose structure and flow are something like this:
import multiprocessing
import ctypes
import time
import errno

m = multiprocessing.Manager()
mylist = m.list()
var1 = m.Value('i', 0)
var2 = m.Value('i', 1)
var3 = m.Value('i', 2)
var4 = m.Value(ctypes.c_char_p, "a")
var5 = m.Value(ctypes.c_char_p, "b")
var6 = 3
var7 = 4
var8 = 5
var9 = 6
var10 = 7

def func(var1, var2, var4, var5, mylist):
    i = 0
    try:
        if var1.value == 0:
            print var2.value, var4.value, var5.value
            mylist.append(time.time())
        elif var1.value == 1:
            i = i + 2
            print var2.value + 2, var4.value, var5.value
            mylist.append(time.time())
    except IOError as e:
        if e.errno == errno.EPIPE:
            var3.value = var3.value + 1
            print "Error"

def work():
    for i in range(var3.value):
        print i, var6, var7, var8, var9, var10
        p = multiprocessing.Process(target=func, args=(var1, var2, var4, var5, mylist))
        p.start()

work()
When I run this code, sometimes it works perfectly, sometimes it does not run for the exact number of loop iterations, and sometimes I get the following error:
0
1
Process Process-2:
Traceback (most recent call last):
File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
self.run()
File "/usr/lib64/python2.6/multiprocessing/process.py", line 88, in run
self._target(*self._args, **self._kwargs)
File "dummy.py", line 19, in func
if var1.value==0:
File "/usr/lib64/python2.6/multiprocessing/managers.py", line 1005, in get
return self._callmethod('get')
File "/usr/lib64/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib64/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib64/python2.6/multiprocessing/connection.py", line 149, in Client
answer_challenge(c, authkey)
File "/usr/lib64/python2.6/multiprocessing/connection.py", line 383, in answer_challenge
message = connection.recv_bytes(256) # reject large message
EOFError
What does this error mean, and what am I doing wrong here? Kindly guide me to the correct path. I am using CentOS 6.5.

Working with shared variables in multiprocessing is tricky. Because of the Python Global Interpreter Lock (GIL), threads cannot run Python code in parallel, which is why the multiprocessing module is used. When you use the multiprocessing module you can launch several tasks in different processes, BUT the processes cannot share memory directly.
In your case you need shared state, so you use shared values. But what happens here is that several processes try to read the same shared value at the same time. To avoid corruption, a process locks the value it is currently reading, forbidding the other processes from accessing it until it has finished reading.
Here you have multiple processes trying to evaluate var1.value in the first if block of your func: the first process reads the value and the others are blocked, raising an error.
To avoid this, you should always manage the locks of your shared variables yourself. You can try this syntax:
var1 = multiprocessing.Value('i', 0)  # create a shared variable with an internal lock
var1.acquire()                        # get the lock: waits until the lock is available
var1.value                            # read the value
var1.release()                        # release the lock
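For instance, here is a minimal sketch of protecting the read inside func this way (a sketch only; it assumes the shared value is created with multiprocessing.Value rather than with a Manager as in the question):

import multiprocessing

var1 = multiprocessing.Value('i', 0)      # shared integer; carries its own lock

def func(var1):
    var1.acquire()                        # wait until no other process holds the lock
    try:
        if var1.value == 0:               # safe: nobody else can touch var1 here
            print var1.value
    finally:
        var1.release()                    # always give the lock back

p = multiprocessing.Process(target=func, args=(var1,))
p.start()
p.join()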
External documentation:
Locks: https://docs.python.org/2/library/multiprocessing.html#synchronization-between-processes
GIL : https://docs.python.org/2/glossary.html#term-global-interpreter-lock

Related

"Dictionary size changed during iteration" from Pebble ProcessPool

We have some parallel processing code built around Pebble; it has been working robustly for quite some time, but we seem to have run into an odd edge case.
Based on the exception trace (and the rock-simple code feeding it) I suspect it's actually a bug in Pebble, but who knows.
The code feeding the process pool is pretty trivial:
pool = ProcessPool(max_workers=10, max_tasks=10)
for path in filepaths:
    try:
        future = pool.schedule(function=self.analyse_file, args=[path], timeout=30)
        future.add_done_callback(self.process_result)
    except Exception as e:
        print("Exception fired: " + str(e))  # NOT where the exception is firing
pool.close()
pool.join()
So in essence, we schedule a bunch of stuff to run, close out the pool, then wait for the pool to complete the scheduled tasks. NOTE: the exception is not being thrown in the schedule loop; it gets fired AFTER we call join().
This is the exception stack trace:
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.6.0/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/user/.pyenv/versions/3.6.0/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/user/.pyenv/versions/scrapeapp/lib/python3.6/site-packages/pebble/pool/process.py", line 150, in task_scheduler_loop
pool_manager.schedule(task)
File "/home/user/.pyenv/versions/scrapeapp/lib/python3.6/site-packages/pebble/pool/process.py", line 198, in schedule
self.worker_manager.dispatch(task)
File "/home/user/.pyenv/versions/scrapeapp/lib/python3.6/site-packages/pebble/pool/process.py", line 327, in dispatch
self.pool_channel.send(WorkerTask(task.id, task.payload))
File "/home/user/.pyenv/versions/scrapeapp/lib/python3.6/site-packages/pebble/pool/channel.py", line 66, in send
return self.writer.send(obj)
File "/home/user/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/home/user/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
RuntimeError: dictionary changed size during iteration
I think it's got to be some weird race condition, as the code will work flawlessly on some data sets but fail at what appears to be a random point on another dataset.
We were using Pebble 4.3.1 when we first ran into the issue (the same version we had used since the beginning); we tried upgrading to 4.5.0, with no change.
Has anybody run into similar issues with Pebble in the past? If so what was your fix?

Invalid requirement, parse error at "''"

I'm trying to connect to a host from different threads in Python, but I sometimes get an error (about 1 in 25 executions).
I have seen similar threads and hoped that updating pip to 8.1.1 would solve this, but it did not.
Code snippet:
def getkpis(self, cmd, host):
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        ssh.connect(host, username='root', look_for_keys=True)
        stdin, stdout, stderr = ssh.exec_command(cmd)
        paramiko.util.log_to_file("kpiparamiko.log")
        output = stdout.read()
        appendarray = output.split('\n')
        sys.stdin.flush()
        ssh.close()
    except paramiko.SSHException, e:
        print str(e)
Error seen:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 764, in run
self.__target(*self.__args, **self.__kwargs)
File "/conf/home/smodugu/kpiparse.py", line 56, in getkpis
ssh.connect(host,username='root',look_for_keys=True)
File "/usr/lib/python2.7/site-packages/paramiko/client.py", line 338, in connect
t.start_client()
File "/usr/lib/python2.7/site-packages/paramiko/transport.py", line 493, in start_client
raise e
RequirementParseError: Invalid requirement, parse error at "''"
Yesterday I was able to get around this by using an older version of setuptools (pip install "setuptools<34"), but today the problem came back. I was then able to get around it by adding a 0.1 second sleep in the loop that was queuing the threads. Why multiple threaded calls to paramiko's SSHClient cause this error with pip/setuptools, I have no idea.
It looks like the connect function is not thread safe in the version of paramiko for Python 2.7.
The solution is to use the Lock object from the threading module:
from threading import Lock
Then wrap the call to the paramiko client's connect function with the lock object.
For example:
from threading import Lock
lock = Lock()
...
lock.acquire()
client.connect(...)
lock.release()
The code above ensures that only one thread uses connect at a time, which works around the fact that the function is not thread safe.
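The same idea reads a little more cleanly with the lock used as a context manager. A minimal sketch, assuming one lock shared by every thread that calls getkpis (connect_lock is just an illustrative name):

import paramiko
from threading import Lock

connect_lock = Lock()              # one lock shared by all worker threads

def getkpis(cmd, host):
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    with connect_lock:             # only one thread runs connect() at a time
        ssh.connect(host, username='root', look_for_keys=True)
    stdin, stdout, stderr = ssh.exec_command(cmd)
    output = stdout.read()
    ssh.close()
    return output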
(I am not sure whether the problem still exists in newer versions of paramiko; it is worth a look.)

Cement framework receive signal 15 on pool worker close

I'm experiencing a problem with the Cement framework for Python (using Python 3 at the moment). I have a multiprocess application which uses Python's Pool workers. At the end of every multiprocessing section (it does not interfere with the results) my stdout is filled with one or more of these exceptions:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/util.py", line 254, in _run_finalizers
finalizer()
File "/usr/lib/python3.5/multiprocessing/util.py", line 186, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/usr/lib/python3.5/multiprocessing/queues.py", line 198, in _finalize_join
thread.join()
File "/usr/lib/python3.5/threading.py", line 1054, in join
self._wait_for_tstate_lock()
File "/usr/lib/python3.5/threading.py", line 1070, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
File "/home/yogaub/.virtualenvs/seminar/lib/python3.5/site-packages/cement/core/foundation.py", line 123, in cement_signal_handler
raise exc.CaughtSignal(signum, frame)
cement.core.exc.CaughtSignal: Caught signal 15
Does anyone know why this happens, and how to prevent it?
Thanks
edit: I should add that I'm logging with the multiprocess logging system of this question. I don't really know if there is any correlation.
edit2: This is the process pool creation and termination:
pool = Pool(processes=core_num)
pool.map(worker_unpacker.work, formatted_input)
pool.close()
t2 = time.time()
I've tried catching SIGTERM with Cement's hook system, but it doesn't work. The only solution I have found so far is to completely ignore signals in the Cement app configuration (but that is not really a solution I like...).
This is an educated guess: the parent process kills (terminate()s) the started worker processes on exit. If you call pool.join() in the parent process, the parent waits until all subprocesses have finished and will not send SIGTERM to them.
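If that guess is right, a minimal sketch of the ordering would be: close the pool, then join it before the parent goes on to exit (this reuses the names from the snippet above, which are not defined here):

from multiprocessing import Pool

pool = Pool(processes=core_num)
pool.map(worker_unpacker.work, formatted_input)
pool.close()   # no new tasks will be submitted
pool.join()    # wait for the workers to exit instead of terminating them at shutdown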

Convert a multi-threaded Python to a multi-process one using concurrent futures

I have the following working code (Python 3.5) which uses concurrent.futures to parse files in a threaded manner and then does some post-processing on the results when they come back (in any order).
from concurrent import futures

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    # A dictionary which will contain the future in the key, and the filename in the value
    jobs = {}
    # Loop through the files, and run the parse function for each file, sending the file-name
    # to it, along with the kwargs of parser_variables.
    # The results of the functions can come back in any order.
    for this_file in files_list:
        job = executor.submit(parse_log_file.parse, this_file, **parser_variables)
        jobs[job] = this_file
    # Get the completed jobs whenever they are done
    for job in futures.as_completed(jobs):
        debug.checkpointer("Multi-threaded Parsing File finishing")
        # Send the result of the file the job is based on (jobs[job]) and the job (job.result)
        result_content = job.result()
        this_file = jobs[job]
I want to convert this to use processes instead of threads because threads don't offer any speedup. In theory I just need to change ThreadPoolExecutor into ProcessPoolExecutor.
The problem is, if I do that I get this exception:
Process Process-2:
Traceback (most recent call last):
File "C:\Python35\lib\multiprocessing\process.py", line 254, in _bootstrap
self.run()
File "C:\Python35\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "C:\Python35\lib\concurrent\futures\process.py", line 169, in _process_worker
call_item = call_queue.get(block=True)
File "C:\Python35\lib\multiprocessing\queues.py", line 113, in get
return ForkingPickler.loads(res)
TypeError: Required argument 'fileno' (pos 1) not found
Traceback (most recent call last):
File "c:/myscript/main.py", line 89, in <module>
main()
File "c:/myscript/main.py", line 59, in main
system_counters = process_system(system, filename)
File "c:\myscript\per_system.py", line 208, in process_system
system_counters = process_filelist(**file_handling_variables)
File "c:\myscript\per_logfile.py", line 31, in process_filelist
results_list = job.result()
File "C:\Python35\lib\concurrent\futures\_base.py", line 398, in result
return self.__get_result()
File "C:\Python35\lib\concurrent\futures\_base.py", line 357, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
I think that this might have something to do with pickling, but googling for the error hasn't found anything.
How do I convert the above to use multiple processes?
It turns out this is because one of the things I'm passing inside parser_variables is a class (a reader from a third-party module). If I remove the class, the above works fine.
For whatever reason, pickle doesn't seem to be able to handle this particular object.
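A quick way to confirm which argument is the culprit is to try pickling each entry of parser_variables yourself before submitting, since ProcessPoolExecutor has to pickle everything it sends to the workers. A small sketch:

import pickle

# Probe each keyword argument before handing it to the executor. Anything that
# raises here cannot cross the process boundary and should instead be created
# inside the worker function.
for name, value in parser_variables.items():
    try:
        pickle.dumps(value)
    except Exception as exc:
        print('{} is not picklable: {}'.format(name, exc))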

pytables crash with threads

The following code shows a problem in the interaction between pytables and threading. I'm creating an HDF file and reading it with 100 concurrent threads:
import threading
import pandas as pd
from pandas.io.pytables import HDFStore, get_store

filename = 'test.hdf'

with get_store(filename, mode='w') as store:
    store['x'] = pd.DataFrame({'y': range(10000)})

def process(i, filename):
    # print 'start', i
    with get_store(filename, mode='r') as store:
        df = store['x']
    # print 'end', i
    return df['y'].max()

threads = []
for i in range(100):
    t = threading.Thread(target=process, args=(i, filename))
    t.daemon = True
    t.start()
    threads.append(t)

for t in threads:
    t.join()
The program usually executes cleanly. But now and then I get exceptions like this:
Exception in thread Thread-27:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "crash.py", line 13, in process
with get_store(filename,mode='r') as store:
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.py", line 259, in get_store
store = HDFStore(path, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.py", line 398, in __init__
self.open(mode=mode, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.py", line 528, in open
self._handle = tables.openFile(self._path, self._mode, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tables/_past.py", line 35, in oldfunc
return obj(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tables/file.py", line 298, in open_file
for filehandle in _open_files.get_handlers_by_name(filename):
RuntimeError: Set changed size during iteration
or
[...]
File "/usr/local/lib/python2.7/dist-packages/tables/_past.py", line 35, in oldfunc
return obj(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tables/file.py", line 299, in open_file
omode = filehandle.mode
AttributeError: 'File' object has no attribute 'mode'
While reducing the code I got very different error messages, some of them indicating memory corruption.
Here are my library versions:
>>> pd.__version__
'0.13.1'
>>> tables.__version__
'3.1.0'
I have already had an error with threads that occurred while writing files, and I solved it by recompiling hdf5 with the options --enable-threadsafe --with-pthread.
Can anyone reproduce the problem? How can it be solved?
Anthony already pointed out that hdf5 (PyTables is basically a wrapper around the hdf5 C library) is not thread-safe. If you want to access an hdf5 file from a web application, you have basically two options:
1. Use a dedicated process that handles all the hdf5 I/O. Processes/threads of the web application must communicate with this process through, e.g., Unix domain sockets. The obvious downside of this approach is that it scales very badly. If one web request is accessing the hdf5 file, all other requests must wait.
2. Implement a read-write locking mechanism that allows concurrent reading, but uses an exclusive lock for writing (see the sketch further below). Cf. http://en.wikipedia.org/wiki/Readers-writers_problem.
Note that with a mod_wsgi application — depending on the configuration — you have to deal with threads and processes!
I am also currently struggling with using hdf5 as a database backend for a web application. I think the 2nd approach above provides a decent solution. But still, hdf5 is not a database system. If you want a real array database server with a Python interface, have a look at http://www.scidb.org. It is not nearly as light-weight as an hdf5-based solution, though.
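A very rough sketch of the second option: a hand-rolled readers-writer lock guarding every HDFStore access. The ReadWriteLock class below is written for illustration, following the classic pattern from the linked article; it is not an existing library class.

import threading
from pandas.io.pytables import get_store

class ReadWriteLock(object):
    """Many concurrent readers OR a single exclusive writer."""
    def __init__(self):
        self._readers = 0
        self._counter_lock = threading.Lock()   # protects the reader counter
        self._write_lock = threading.Lock()     # held while anyone writes

    def acquire_read(self):
        with self._counter_lock:
            self._readers += 1
            if self._readers == 1:
                self._write_lock.acquire()      # first reader blocks writers

    def release_read(self):
        with self._counter_lock:
            self._readers -= 1
            if self._readers == 0:
                self._write_lock.release()      # last reader lets writers in

    def acquire_write(self):
        self._write_lock.acquire()

    def release_write(self):
        self._write_lock.release()

hdf_lock = ReadWriteLock()

def read_x(filename):
    hdf_lock.acquire_read()
    try:
        with get_store(filename, mode='r') as store:
            return store['x']
    finally:
        hdf_lock.release_read()

Note that, as the question itself shows, concurrent reads can already crash a non-thread-safe HDF5 build, so this pattern assumes the reads themselves are safe (for example after rebuilding HDF5 with --enable-threadsafe, as mentioned below).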
One thing that has not been mentioned yet: recompile HDF5 to be thread-safe using
--enable-threadsafe --with-pthread=DIR
https://support.hdfgroup.org/HDF5/faq/threadsafe.html
I had some hard-to-find bugs in my keras code, which uses HDF5, and this was what solved it.
PyTables is not fully thread-safe. Use multiprocessing pools instead.
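A minimal sketch of that suggestion applied to the example from the question: each task runs in a worker process from a multiprocessing pool and opens its own handle to the file (reusing the pandas get_store helper from the question):

import multiprocessing
import pandas as pd
from pandas.io.pytables import get_store

filename = 'test.hdf'

def process(args):
    i, filename = args
    with get_store(filename, mode='r') as store:   # each worker process gets its own handle
        df = store['x']
    return df['y'].max()

if __name__ == '__main__':
    with get_store(filename, mode='w') as store:
        store['x'] = pd.DataFrame({'y': range(10000)})
    pool = multiprocessing.Pool(processes=4)
    results = pool.map(process, [(i, filename) for i in range(100)])
    pool.close()
    pool.join()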
