multiprocessing.pool.MaybeEncodingError: Error sending result occurs at last object - python

I keep having an issue when executing a function multiple times at once using the multiprocessing.Pool class.
I am using Python 3.8.3 on Windows 10 with PyCharm 2017.3.
The function I am executing opens Excel files from my hard disk and serialises them into custom objects that I want to iterate through later on.
The error always occurs after the last execution of the function.
Here is what it says:
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<IntegListe.IntegrityList object at 0x037481F0>, <IntegListe.IntegrityList object at 0x03D86CE8>, <IntegListe.IntegrityList object at 0x03F50F88>]'. Reason: 'TypeError("cannot pickle '_thread.RLock' object")'
Here is what my code looks like:
from multiprocessing import Pool
p = Pool()
ilList = p.starmap(extract_excel, [(f, spaltennamen) for f in files])
p.close()  # close() must be called before join()
p.join()
And this is the function I am trying to execute in parallel:
def extract_excel(file_path, spaltennamen) -> IntegrityList:  # starmap unpacks each (file, columns) tuple into two arguments
    il = IntegrityList(file_path)
    print(il)
    spaltennamen = list(map(lambda x: Excel.HereOrFind(il.ws, x, x.value), spaltennamen))  # Update position of column headers
    il.parse_columns(spaltennamen, il.ws)
    il.close()
    return il
Since I am quite new to Python, I am having trouble figuring out the magic behind this multiprocessing error. Executing the function serially works perfectly fine and I get the desired output, which proves that the function and all its sub-functions work as expected. I would be glad for any information that could help solve this problem. Thanks!
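A quick way to confirm that the pool fails while pickling the return value rather than while running the function is to try pickling one serially produced result. This is a minimal check, reusing the question's extract_excel, files, and spaltennamen:
import pickle

il = extract_excel(files[0], spaltennamen)  # serial call, known to work
try:
    pickle.dumps(il)  # this is what the pool does when sending the result back
except TypeError as exc:
    print("result is not picklable:", exc)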

Okay, so for future readers: I solved the error with the help of this website: https://www.synopsys.com/blogs/software-security/python-pickling/#:~:text=Whenever%20an%20object%20is%20pickled,reconstruct%20this%20object%20when%20unpickling..
It explains that every custom object sent through a parallel process needs to be picklable; implementing the __reduce__ method tells pickle how to reconstruct the object when unpickling.
I simply added this code to my custom object:
def __reduce__(self):
    return IntegrityList, (self.file_path,)
After that, the parallel execution works great.
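To show the pattern in isolation, here is a minimal sketch; IntegrityList's internals are not shown in the question, so the Record class below is a hypothetical stand-in for an object that holds an unpicklable handle (such as the '_thread.RLock' from the error message):
from multiprocessing import Pool
import threading

class Record:
    """Hypothetical stand-in for a class that keeps an unpicklable handle."""
    def __init__(self, path):
        self.path = path
        self._lock = threading.RLock()  # an RLock cannot be pickled

    def __reduce__(self):
        # Tell pickle to rebuild the object by calling Record(self.path);
        # the lock is recreated in __init__ instead of being serialized.
        return Record, (self.path,)

def load(path):
    return Record(path)

if __name__ == "__main__":
    with Pool() as pool:
        records = pool.map(load, ["a.xlsx", "b.xlsx"])
    print([r.path for r in records])  # without __reduce__ this raises MaybeEncodingError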


Using Thread.join() method with threads that are inside a class

I am solving this question from LeetCode: 1116. Print Zero Even Odd
I am running this solution in VS Code with my own main function to understand the issue in depth.
After reading this question and the suggested solutions, in addition to reading this explanation, I added this code to the code from the solution:
from threading import Semaphore
import threading

def threaded(fn):
    def wrapper(*args, **kwargs):
        threading.Thread(target=fn, args=args, kwargs=kwargs).start()
    return wrapper
and before each of those functions from the question I added the decorator: @threaded
I added a printNumber function and main function to run it on VS Code.
def printNumber(num):
    print(num, end="")

if __name__ == "__main__":
    a = ZeroEvenOdd(7)
    handle = a.zero(printNumber)
    handle = a.even(printNumber)
    handle = a.odd(printNumber)
Running this code gives me the correct answer, but I do not get a new line printed in the terminal after it. For input 7 in my main function the output is 01020304050607hostname and not what I want it to be:
01020304050607
hostname
So, I added print("\n") in the main and I saw that I get a random output like:
0102
0304050607
or
0
1020304050607
still without a new line at the end.
When I try to use the join function handle.join() then I get the error:
Exception has occurred: AttributeError: 'NoneType' object has no attribute 'join'
I tried to do this:
handle1 = a.zero(printNumber)
handle2 = a.even(printNumber)
handle3 = a.odd(printNumber)
handle1.join()
handle2.join()
handle3.join()
Still got the same error.
Where in the code should I do the waiting until the threads will terminate?
Thanks.
When I try to use...handle.join()...I get the error: "...'NoneType' object has no attribute 'join'"
The error message means that the value of handle was None at the point in your program where your code tried to call handle.join(). There is no join() operation available on the None value.
You probably wanted to join() a thread (i.e., the object returned by threading.Thread(...)). For a single thread, you could do this:
t = threading.Thread(...)
t.start()
...
t.join()
Your program creates three threads, so you won't be able to just use a single variable t. You could use three separate variables, or you could create a list, or... I'll leave that up to you.
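One way to get a joinable handle back, shown here as a sketch against the decorator from the question (the ZeroEvenOdd class itself is not reproduced), is to have the wrapper return the Thread it starts:
import threading

def threaded(fn):
    def wrapper(*args, **kwargs):
        t = threading.Thread(target=fn, args=args, kwargs=kwargs)
        t.start()
        return t  # return the Thread so the caller can join() it
    return wrapper

# With this version, each decorated call yields a Thread handle:
# handle1 = a.zero(printNumber)
# handle2 = a.even(printNumber)
# handle3 = a.odd(printNumber)
# handle1.join(); handle2.join(); handle3.join()
# print()  # the trailing newline appears only after all threads finish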

Python: Why does Pool.map() hang when attempting to use the input arguments to its map function?

I have the following function (shortened for readability), which I parallelize using Python's (3.5) multiprocessing module:
def evaluate_prediction(enumeration_tuple):
    i = enumeration_tuple[0]
    logits_pred = enumeration_tuple[1]
    print("This prints succesfully")
    print("This never gets printed: ")
    print(enumeration_tuple[0])
    filename = sample_names_test[i]
    onehots_pred = logits_to_onehots(logits_pred)
    np.save("/media/nfs/7_raid/ebos/models/fcn/" + channels + "/test/ndarrays/" + filename, onehots_pred)
However, this function hangs whenever I attempt to read its input argument. Execution can get past the logits_pred = enumeration_tuple[1] line, as evidenced by a print statement printing a simple string, but it halts whenever I print(logits_pred). So apparently, whenever I actually need the passed value, the process stops. I do not get an exception or error message. When using either Python's built-in map() function or a for-loop, the function finishes successfully. I should have sufficient memory and computing power available. All processes are writing to different files. enumerate(predictions) yields correct index-value pairs, as expected. I call this function using Pool.map():
pool = multiprocessing.Pool()
file_results = pool.map(evaluate_prediction, enumerate(predictions))
Why is it hanging? And how can I get an exception, so I know what's going wrong?
UPDATE: After outsourcing the mapped function to another module, importing it from there, and adding __init__.py to my directory, I manage to print the first item in the tuple, but not the second.
I had a similar issue before, and a solution that worked for me was to put the function you want to parallelize in a separate module and then import it.
from eval_prediction import evaluate_prediction
pool = multiprocessing.Pool()
file_results = pool.map(evaluate_prediction, enumerate(predictions))
I assume you will save the function definition in a file named eval_prediction.py in the same directory. Make sure you have an __init__.py there as well.
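A minimal sketch of that layout, assuming predictions is an iterable of model outputs and that the real saving logic from the question goes where the comment sits:
# eval_prediction.py -- the module that holds the worker function
def evaluate_prediction(enumeration_tuple):
    i, logits_pred = enumeration_tuple
    # ... convert logits_pred to one-hots and save it to disk in the real code ...
    print("processing item", i)

# main.py -- the script that builds the pool
import multiprocessing
from eval_prediction import evaluate_prediction

if __name__ == "__main__":  # guard is needed when worker processes are spawned
    predictions = [[0.1, 0.9], [0.8, 0.2]]  # stand-in for the real model output
    with multiprocessing.Pool() as pool:
        pool.map(evaluate_prediction, enumerate(predictions))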

Segmentation fault when initializing array

I am getting a segmentation fault when initializing an array.
I have a callback function that runs when an RFID tag gets read:
IDS = []

def readTag(e):
    epc = str(e.epc, 'utf-8')
    if not epc in IDS:
        now = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
        IDS.append([epc, now, "name.instrument"])
and a main function from which it's called
def main():
    for x in vals:
        IDS.append([vals[0], vals[1], vals[2]])
    for x in IDS:
        print(x[0])
    r = mercury.Reader("tmr:///dev/ttyUSB0", baudrate=9600)
    r.set_region("NA")
    r.start_reading(readTag, on_time=1500)
    input("press any key to stop reading: ")
    r.stop_reading()
The error occurs because of the line IDS.append([epc, now, "name.instrument"]). I know because when I replace it with a print call instead, the program runs just fine. I've tried using different types for the array objects (integers), creating an array of the same objects outside of the append function, etc. For some reason just creating an array inside the readTag function, like row = [1, 2, 3], causes the segmentation fault.
Does anyone know what causes this error and how I can fix it? To be a little more specific: the readTag function works fine for the first two (only ever two) calls, but then it crashes. The Reader object that has the start_reading() function is from the mercury-api.
This looks like a scoping issue to me; the mercury library doesn't have permission to access your list's memory address, so when it invokes your callback function readTag(e) a segfault occurs. I don't think that the behavior you want is supported by that library.
To extend Michael's answer, this appears to be an issue with scoping and the API you're using. In general pure-Python doesn't seg-fault. Or at least, it shouldn't seg-fault unless there's a bug in the interpreter, or some extension that you're using. That's not to say pure-Python won't break, it's just that a genuine seg-fault indicates the problem is probably the result of something messy outside of your code.
I'm assuming you're using this Python API.
In that case, the README.md mentions that the Reader.start_reading() method you're using is "asynchronous", meaning it spawns a new thread (or process), returns immediately, and the background thread then calls your callback each time something is scanned.
I don't really know enough about the nitty-gritty of CPython to say exactly what's going on, but you've declared IDS = [] as a global variable and it seems like the background thread is running the callback with a different context to the main program. So when it attempts to access IDS it's reading memory it doesn't own, hence the seg-fault.
Because of how restrictive the callback is and the apparent lack of a buffer, this might be an oversight on the behalf of the developer. If you really need asynchronous reads it's worth sending them an issue report.
Otherwise, considering you're just waiting for input you probably don't need the asynchronous reads, and you could use the synchronous Reader.read() method inside your own busy loop instead with something like:
try:
    while True:
        readTags(r.read(timeout=10))
except KeyboardInterrupt:  # break loop on SIGINT (Ctrl-C)
    pass
Note that r.read() returns a list of tags rather than just one, so you'd need to modify your callback slightly, and if you're writing more than just a quick script you probably want to use threads to interrupt the loop properly as SIGINT is pretty hacky.
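A sketch of what that slightly modified callback could look like, assuming r.read() returns a list of tag objects with the same .epc attribute used in the question:
import datetime

IDS = []

def readTags(tags):
    # r.read() returns a list of tags, so iterate instead of handling a single one
    for tag in tags:
        epc = str(tag.epc, 'utf-8')
        if epc not in (entry[0] for entry in IDS):  # compare against the stored EPCs
            now = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
            IDS.append([epc, now, "name.instrument"])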

Python Memory Leak - Why is it happening?

For some background on my problem: I'm importing a module, data_read_module.pyd, written by someone else, and I cannot see the contents of that module.
I have one file, let's call it myfunctions. Ignore the ### for now; I'll comment on the commented portions later.
import data_read_module

def processData(fname):
    data = data_read_module.read_data(fname)
    ''' process data here '''
    return t, x
    ### return 1
I call this within the framework of a larger program, a Tkinter GUI specifically. For the purposes of this post, I've pared it down to the bare essentials. Within the GUI code, I call the above as follows:
import myfunctions

class MyApplication:
    def __init__(self, parent):
        self.t = []
        self.x = []

    def openFileAndProcessData(self):
        # self.t = None
        # self.x = None
        self.t, self.x = myfunctions.processData(fname)
        ## myfunctions.processData(fname)
I noticed that every time I run openFileAndProcessData, Windows Task Manager reports that my memory usage increases, so I thought I had a memory leak somewhere in my GUI application. So the first thing I tried is the
# self.t = None
# self.x = None
that you see commented above. Next, I tried calling myfunctions.processData without assigning the output to any variables as follows:
## myfunctions.processData(fname)
This also had no effect. As a last ditch effort, I changed the processData function so it simply returns 1 without even processing any of the data that comes from the module, data_read_module.pyd. Unfortunately, even this results in more memory being taken up with each successive call to processData, which narrows the problem down to data_read_module.read_data. I thought that within the Python framework, this is the exact type of thing that is automatically taken care of. Referring to this website, it seems that memory taken up by a function will be released when the function terminates. In my case, I would expect the memory used in processData to be released after a call [with the exception of the output that I am keeping track of with self.t and self.x]. I understand I won't get a fix to this kind of issue without access to data_read_module.pyd, but I'd like to understand how this can happen to begin with.
A .pyd file is basically a DLL. You're calling code written in C, C++, or another such compiled language. If that code allocates memory and doesn't release it properly, you will get a memory leak. The fact that the code is being called from Python won't magically fix it.
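If the leak really is inside data_read_module.read_data, one common workaround is to isolate each call in a short-lived worker process so the operating system reclaims everything when that process exits. This is only a sketch, assuming the arguments to and the (t, x) result of processData are picklable:
import multiprocessing
import myfunctions

def processDataIsolated(fname):
    # Run the call in a one-worker pool that lives only for this call;
    # any memory leaked by the compiled extension is freed when the worker exits.
    with multiprocessing.Pool(processes=1) as pool:
        return pool.apply(myfunctions.processData, (fname,))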

Getting "maximum recursion depth exceeded while calling a Python object" with Celery

I am a beginner. I'm running a task in Celery and getting this strange error:
maximum recursion depth exceeded while calling a Python object
You can check the full error in this pastebin
I don't quite understand it, because I haven't changed anything and yesterday it was working fine. I ran the task without Celery in the Python interpreter and it runs fine. You can check the function here. Finally, for what it is worth, this task is getting created 12 times by another task.
Do you see anything that could create such an error?
EDIT:
This is the task from which I call this function / task:
@celery.task(ignore_result=True)
def get_classicdata(leagueid):
    print "getting team data for %s" % leagueid
    returned_data = {}
    for team in r.smembers('league:%s' % leagueid):
        data = scrapteam.delay(team, r.get('currentgw'))
        returned_data[team] = data.get()
Everything looks fine. The traceback implies that the returned object somewhere cannot be pickled, but your returned 'team' data structure is a dictionary containing a non-recursive data structure of basic types, so that can't cause a problem. For better remote debugging, please put a print statement before the "return team", so that it shows the content of the team. You might also try just having it return a {} and see if that changes things.
Then also add a debugging print statement in get_classicdata showing the content of data.get(), as well as something just before the return there, in order to verify if that function reaches completion.
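As a sketch of the debugging the answer suggests, keeping the question's Python 2 print syntax and assuming scrapteam returns a plain dict, the extra prints could look like this (the final return is illustrative, since the question's snippet does not show one):
@celery.task(ignore_result=True)
def get_classicdata(leagueid):
    print "getting team data for %s" % leagueid
    returned_data = {}
    for team in r.smembers('league:%s' % leagueid):
        data = scrapteam.delay(team, r.get('currentgw'))
        result = data.get()
        print "scrapteam result for %s: %r" % (team, result)  # inspect what comes back
        returned_data[team] = result
    print "get_classicdata finished for %s" % leagueid  # confirm the task completes
    return returned_data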
