I use multiprocessing in Python to make several requests to a database (and other stuff):
po = multiprocessing.Pool()
results = []
for element in setOfElements:
    results.append(po.apply_async(myDBRequestModule, (element, other stuff...)))
po.close()
po.join()
for r in results:
    newSet.add(r.get())
myDBRequestModule returns an object I defined, made of a list and two numbers. I redefined __hash__ and __eq__ in order to define what I mean by equality in my sets of these objects:
class myObject:
    def __init__(self, aList, aNumber, anotherNumber):
        self.list = aList
        self.number1 = aNumber
        self.number2 = anotherNumber

    def __hash__(self):
        # turn elements of the list into a string, in order to hash the string
        hash_text = ""
        for element in self.list:
            hash_text += str(element.x.id)  # I use the ID of each list element...
        return hash(hash_text)

    def __eq__(self, other):
        self_hash_text = ""
        other_hash_text = ""
        for element in self.list:
            self_hash_text += str(element.x.id)
        for element in other.list:
            other_hash_text += str(element.x.id)
        return self_hash_text == other_hash_text
And in most cases it works as it should. Twice, for no known reason and in exactly the same context, I had a bug:
newSet.add(r.get())
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 422, in get
    raise self._value
TypeError: 'str' object does not support item assignment
It comes from the get method (last line):
def get(self, timeout=None):
    self.wait(timeout)
    if not self._ready:
        raise TimeoutError
    if self._success:
        return self._value
    else:
        raise self._value
Since I had only seen this error once and it then disappeared, I initially decided to let it go, but it caused a second failure recently, and I really don't know how to fight this bug.
In particular, it's hard for me to tell why it almost never happens and usually works perfectly fine.
multiprocessing is not the issue here.
You have not given us the right code to diagnose the issue. The pool catches any exception raised inside the worker function, stores it as self._value, and re-raises it in the parent when get() is called; so the TypeError is actually coming from your own myDBRequestModule, not from pool.py. Look at everywhere that code assigns into a string and you will be on your way to finding this error.
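A minimal sketch of the mechanism (the worker function and the string mutation are invented for illustration):

import multiprocessing

def worker(s):
    # Any exception raised here is caught by the pool, stored on the
    # result object, and re-raised in the parent by .get().
    s[0] = "x"  # a str does not support item assignment: TypeError

if __name__ == "__main__":
    po = multiprocessing.Pool()
    res = po.apply_async(worker, ("hello",))
    po.close()
    po.join()
    res.get()  # raises the worker's TypeError here, via `raise self._value` in pool.py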
Related
Suppose you have the following in a file hey.py:

import pickle

class hey:
    def __init__(self):
        self.you = 1

ins = hey()
temp = open("cool_class", "wb")
pickle.dump(ins, temp)
temp.close()
Now suppose you delete the file hey.py and you run the following code:
pkl_file = open("cool_class", 'rb')
obj = pickle.load(pkl_file)
pkl_file.close()
You'll get an error. I get that you probably can't work around the problem: if you don't have the file hey.py, with the class and its attributes at the top level, then you can't open the pickle. But it must be possible to find out what the attributes of the serialized class are, so that I can reconstruct the deleted file and open the pickle. I have pickles that are 2 years old, I have deleted the file I used to construct them, and I just need to find out what the attributes of those classes are so that I can reopen these pickles.
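In many cases, recreating a bare class with the same module and name is enough, because unpickling a plain instance bypasses __init__ and simply restores the saved __dict__. A minimal sketch, assuming the class was pickled under module hey:

# hey.py (recreated): the body can be empty, since unpickling a plain
# instance does not call __init__; it just restores the saved __dict__.
class hey:
    pass

# elsewhere:
import pickle

pkl_file = open("cool_class", 'rb')
obj = pickle.load(pkl_file)
pkl_file.close()
print(obj.you)  # 1 -- the attribute values come from the pickled state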
##### UPDATE
I know from the error messages the name of the module that originally contained the old class; let's just call it 'hey.py'. And I know the name of the class; let's call it 'you'. But even after recreating the module and building a class called 'you', I still can't get the pickle to open. So I wrote this code in the hey.py module:
class hey:
    def __init__(self):
        self.hey = 1

    def __setstate__(self):
        self.__dict__ = ''
        self.you = 1
But I get the error message: TypeError: __init__() takes 1 positional argument but 2 were given
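(For reference, __setstate__ receives the pickled state as an argument, so a minimal correct version looks like this:)

class hey:
    def __setstate__(self, state):
        # pickle passes the saved attribute dict (or whatever
        # __getstate__ returned) as `state`
        self.__dict__.update(state)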
##### UPDATE 2
I changed the code from

class hey:

to

class hey():

I then got an AttributeError, but it doesn't tell me which attribute is missing. I then performed
obj = pickletools.dis(file)
and got an error in the pickletools.py file, here:
def _genops(data, yield_end_pos=False):
    if isinstance(data, bytes_types):
        data = io.BytesIO(data)

    if hasattr(data, "tell"):
        getpos = data.tell
    else:
        getpos = lambda: None

    while True:
        pos = getpos()
        code = data.read(1)
        opcode = code2op.get(code.decode("latin-1"))
        if opcode is None:
            if code == b"":
                raise ValueError("pickle exhausted before seeing STOP")
            else:
                raise ValueError("at position %s, opcode %r unknown" % (
                                 "<unknown>" if pos is None else pos,
                                 code))
        if opcode.arg is None:
            arg = None
        else:
            arg = opcode.arg.reader(data)
        if yield_end_pos:
            yield opcode, arg, pos, getpos()
        else:
            yield opcode, arg, pos
        if code == b'.':
            assert opcode.name == 'STOP'
            break
At this line:
code = data.read(1)
saying: AttributeError: 'str' object has no attribute 'read'
I will now try the other methods in the pickletools
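That AttributeError is explained by the call above: pickletools.dis() expects the pickle bytes or an open binary file, not a path string. Something like this should get past that particular error:

import pickletools

# dis() accepts bytes or a file-like object opened in binary mode;
# passing the path string is what produced
# "'str' object has no attribute 'read'".
with open('hey.pkl', 'rb') as f:
    pickletools.dis(f)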
##### UPDATE 3
I wanted to see what happened when I saved an object composed mostly of a dictionary, where some of the values in the dictionary were classes. Here is the class in question:
class fss(frozenset):
    def __init__(self, *args, **kwargs):
        super(frozenset, self).__init__()

    def __str__(self):
        # lbr and rbr are presumably the brace strings '{' and '}', defined elsewhere
        str1 = lbr + "{}" + rbr
        return str1.format(','.join(str(x) for x in self))
Now keep in mind that the pickled object is mostly a dictionary, and that class exists within the dictionary. After performing

obj = pickletools.genops(file)
I get the following output:
(two screenshots of the genops opcode listing omitted)
I don't see how I would be able to construct the class referred to with that data if I hadn't known what the class was.
##### UPDATE 4
@AKX
Thanks for helping me out. I can see how your code works, but my pickled file was saved 2 years ago, and its module and class have long since been deleted; I cannot get it into a bytes-like object, which seems to be a necessity.
So the path of the file is

file = 'hey.pkl'
pkl_file = open(file, 'rb')
x = MagicUnpickler(io.BytesIO(pkl_file)).load()
This returns the error:
TypeError: a bytes-like object is required, not '_io.BufferedReader'
But I thought the object was a bytes object, since I opened it with open(file, 'rb').
##### UPDATE 5
Actually, I think that with AKX's help I've solved the problem.
So using the code:
pkl_file = open(name, 'rb')
x = MagicUnpickler(pkl_file).load()
I then created two blank modules which had once contained the classes found in the saved pickle, but I did not have to put the classes back in them. I was getting an error in the file pickle.py, here:
def load_reduce(self):
    stack = self.stack
    args = stack.pop()
    func = stack[-1]
    try:
        stack[-1] = func(*args)
    except TypeError:
        pass
dispatch[REDUCE[0]] = load_reduce
So after catching and ignoring that error, everything worked. I really want to thank AKX for helping me out. I have been trying to solve this problem for about 5 years, because I use pickles far more often than most programmers. I used to not understand that altering a class breaks any pickled files saved with that class, so I ran into this problem again and again. Now that I'm going back over code that is 2 years old, and it looks like some of the files were deleted, I'm going to need this approach a lot in the future. So I really appreciate your help in getting this problem solved.
Well, with a bit of hacking and magic, sure, you can hydrate missing classes, but I'm not guaranteeing this will work for all pickle data you may encounter; for one, this doesn't touch the __setstate__/__reduce__ protocols, so I don't know if they work.
Given a script file (so72863050.py in my case):
import io
import pickle
import types
from logging import Formatter
# Create a couple empty classes. Could've just used `class C1`,
# but we're coming back to this syntax later.
C1 = type('C1', (), {})
C2 = type('C2', (), {})
# Create an instance or two, add some data...
inst = C1()
inst.child1 = C2()
inst.child1.magic = 42
inst.child2 = C2()
inst.child2.mystery = 'spooky'
inst.child2.log_formatter = Formatter('heyyyy %(message)s') # To prove we can unpickle regular classes still
inst.other_data = 'hello'
inst.some_dict = {'a': 1, 'b': 2}
# Pickle the data!
pickle_bytes = pickle.dumps(inst)
# Let's erase our memory of these two classes:
del C1
del C2
try:
    print(pickle.loads(pickle_bytes))
except Exception as exc:
    pass  # Can't get attribute 'C1' on <module '__main__'> – yep, it certainly isn't there!
we now have successfully created some pickle data that we can't load anymore, since we forgot about those two classes. Now, since the unpickling mechanism is customizable, we can derive a magic unpickler, that in the face of certain defeat (or at least an AttributeError), synthesizes a simple class from thin air:
# Could derive from Unpickler, but that may be a C class, so our tracebacks would be less helpful
class MagicUnpickler(pickle._Unpickler):
    def __init__(self, fp):
        super().__init__(fp)
        self._magic_classes = {}

    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except AttributeError:
            return self._create_magic_class(module, name)

    def _create_magic_class(self, module, name):
        cache_key = (module, name)
        if cache_key not in self._magic_classes:
            cls = type(f'<<Emulated Class {module}:{name}>>', (types.SimpleNamespace,), {})
            self._magic_classes[cache_key] = cls
        return self._magic_classes[cache_key]
Now, when we run that magic unpickler against a stream from the aforebuilt pickle_bytes that plain ol' pickle.loads() couldn't load...
x = MagicUnpickler(io.BytesIO(pickle_bytes)).load()
print(x)
print(x.child1.magic)
print(x.child2.mystery)
print(x.child2.log_formatter._style._fmt)
prints out
<<Emulated Class __main__:C1>>(child1=<<Emulated Class __main__:C2>>(magic=42), child2=<<Emulated Class __main__:C2>>(mystery='spooky'), other_data='hello', some_dict={'a': 1, 'b': 2})
42
spooky
heyyyy %(message)s
Hey, magic!
The error in function load_reduce(self) can be re-created by:

class Y(set):
    pass

pickle_bytes = io.BytesIO(pickle.dumps(Y([2, 3, 4, 5])))
del Y
print(MagicUnpickler(pickle_bytes).load())
AKX's answer does not solve cases where the class inherits from built-in base classes such as set, dict, or list; see the sketch below for one way around that.
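A possible extension, not from the original answer: make the emulated class tolerate the positional arguments that the REDUCE opcode passes for subclasses of built-ins. A sketch under that assumption (the class name is made up):

import io
import pickle
import types

class TolerantMagicUnpickler(pickle._Unpickler):
    """Variant of AKX's MagicUnpickler whose emulated classes accept
    positional arguments, so REDUCE calls made for set/dict/list
    subclasses don't raise TypeError."""
    def __init__(self, fp):
        super().__init__(fp)
        self._magic_classes = {}

    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except AttributeError:
            return self._create_magic_class(module, name)

    def _create_magic_class(self, module, name):
        cache_key = (module, name)
        if cache_key not in self._magic_classes:
            def __init__(self, *args, **kwargs):
                # Stash whatever the pickle's REDUCE op passed in,
                # instead of crashing like a bare SimpleNamespace would.
                self.args = args
                self.__dict__.update(kwargs)
            cls = type(f'<<Emulated Class {module}:{name}>>',
                       (types.SimpleNamespace,), {'__init__': __init__})
            self._magic_classes[cache_key] = cls
        return self._magic_classes[cache_key]

class Y(set):
    pass

pickle_bytes = io.BytesIO(pickle.dumps(Y([2, 3, 4, 5])))
del Y
print(TolerantMagicUnpickler(pickle_bytes).load())
# prints roughly: <<Emulated Class __main__:Y>>(args=([2, 3, 4, 5],))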
I created an iterator to increment the figure number in various plotting function calls:
import itertools

figndx = itertools.count()
I then call these throughout my code, passing next(figndx) as an argument to increment the value I use for the figure number, for example:
an.plotimg(ref_frame,next(figndx),'Ref Frame')
an.plotimg(new_frame,next(figndx),'New Frame')
etc...
After some particular function call, I want to read back the figndx value and store it in a variable for later use. However, when I look at figndx, it returns count(7), for example. How do I extract the '7' from this?
I've tried:

figndx
figndx.__iter__()

and I can't find anything else in the suggested methods (when I type the dot) that will get the actual iterator value. Can this be done?
Just wrap a count object:

import itertools

class MyCount:
    def __init__(self, *args, **kwargs):
        self._c = itertools.count(*args, **kwargs)
        self._current = next(self._c)

    def __next__(self):
        current = self._current
        self._current = next(self._c)
        return current

    def __iter__(self):
        return self

    def peek(self):
        return self._current
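A quick usage sketch:

c = MyCount()
print(next(c))   # 0
print(next(c))   # 1
print(c.peek())  # 2 -- the value the next call to next(c) will return
print(next(c))   # 2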
You can create yourself a peeker, using itertools.tee, and encapsulate the peek:

from itertools import count, tee

def peek(iterator):
    iterator, peeker = tee(iterator)
    return iterator, next(peeker)
Then you can call it like
figndx = count(1)
next(figndx)
next(figndx)
figndx, next_value = peek(figndx)
next_value
# 3
I'm receiving an unknown number of records for background processing from generators. If a more important job comes in, I have to stop and release the process.
The main process is best described as:
def main():
    generator_source = generator_for_test_data()  # 1. contact server to get data.
    uw = UploadWrapper(generator_source)          # 2. wrap the data.
    while not interrupt():                        # 3. check for interrupts.
        row = next(uw)
        if row is None:
            return
        print(long_running_job(row))              # 4. do the work.
Is there a way to get to __next__ without having to plug in __iter__?
Having two steps, (1) make an iterator, then (2) iterate over it, just seems clumsy.
There are many cases where I'd prefer to submit a function to a function manager (mapreduce style), but in this case I need an instantiated class with some settings. Registering a single function can therefore only work if that function alone is __next__.
class UploadWrapper(object):
    def __init__(self, generator):
        self.generator = generator
        self._iterator = None

    def __iter__(self):
        for page in self.generator:
            yield from page.data

    def __next__(self):
        if self._iterator is None:  # ugly bit.
            self._iterator = self.__iter__()
        try:
            return next(self._iterator)
        except StopIteration:
            return None
Q: Is there a simpler way?
Working sample added for completeness:
import time
import random


class Page(object):
    def __init__(self, data):
        self.data = data


def generator_for_test_data():
    for t in range(10):
        page = Page(data=[(t, i) for i in range(100, 110)])
        yield page


def long_running_job(row):
    time.sleep(random.randint(1, 10) / 100)
    assert len(row) == 2
    assert row[0] in range(10)
    assert row[1] in range(100, 110)
    return row


def interrupt():  # interrupt check
    if random.randint(1, 50) == 1:
        print("INTERRUPT SIGNAL!")
        return True
    return False


class UploadWrapper(object):
    def __init__(self, generator):
        self.generator = generator
        self._iterator = None

    def __iter__(self):
        for ft in self.generator:
            yield from ft.data

    def __next__(self):
        if self._iterator is None:
            self._iterator = self.__iter__()
        try:
            return next(self._iterator)
        except StopIteration:
            return None


def main():
    gen = generator_for_test_data()
    uw = UploadWrapper(gen)
    while not interrupt():  # check for job interrupt.
        row = next(uw)
        if row is None:
            return
        print(long_running_job(row))


if __name__ == "__main__":
    main()
Your UploadWrapper seems overly complex; there is more than one simpler solution.
My first thought is to ditch the class altogether and just use a function instead:
def uploadwrapper(page_gen):
    for page in page_gen:
        yield from page.data
Just replace uw = UploadWrapper(gen) with uw = uploadwrapper(gen), and that'll work.
If you insist on the class, you can just get rid of the __next__() and replace uw = UploadWrapper(gen) with uw = iter(UploadWrapper(gen)), and it'll work.
In either case, you must also catch the StopIteration in the caller. __next__() is supposed to raise StopIteration when it's done, not return None like yours does. Otherwise, it won't work with things that expect a well-behaved iterator, e.g. for loops.
I think you might have some misconceptions about how it all is supposed to fit together, so I'll try my best to explain how it's supposed to work, to the best of my knowledge:

The point of __iter__() is that if you have e.g. a list, you can get multiple independent iterators by calling iter(). When you have a for loop, you're essentially first getting an iterator with iter() and then calling next() on it on every loop iteration. If you have two nested loops that use the same list, the iterators and their positions are still separate, so there's no conflict. __iter__() is supposed to return an iterator for the container it's on, or, if it's called on an iterator, to just return self. In that sense, it's kind of wrong for UploadWrapper not to return self in __iter__(), since it wraps a generator and so can't really give independent iterators.

As for why leaving out __next__() works: when you define a generator (i.e. use yield in a function), the generator has an __iter__() (that returns self, as it should) and a __next__() that does what you'd expect. In your original code, you're not really using __iter__() at all for what it's supposed to be used for: the code works even if you rename it to something else! This is because you never call iter() on the instance; you just directly call next(). Both points are demonstrated in the sketch below.
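A small demonstration of those two points (independent iterators from a container, versus an iterator returning itself):

nums = [1, 2, 3]
it1, it2 = iter(nums), iter(nums)  # two independent iterators
print(next(it1))  # 1
print(next(it1))  # 2
print(next(it2))  # 1 -- positions are tracked separately

gen = (n for n in nums)
print(iter(gen) is gen)  # True -- an iterator's __iter__() returns itself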
If you wanted to do it "properly" as a class, I think something like this might suffice:
class UploadWrapper(object):
    def __init__(self, generator):
        self.generator = generator
        self.subgen = iter(next(generator).data)

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                return next(self.subgen)
            except StopIteration:
                self.subgen = iter(next(self.generator).data)
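Used with the test data from the question's working sample, iteration then behaves the way the rest of Python expects; a quick sketch:

uw = UploadWrapper(generator_for_test_data())
for row in uw:   # __iter__() returns self, and __next__() lets
    print(row)   # StopIteration through once everything is exhausted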
So I'm working on a chemistry project for fun, and I have a function that initializes a list from a text file. What I want to do is make the function replace itself with that list. Here's my first attempt, which randomly will or won't work, and I don't know why:
def periodicTable():
    global periodicTable
    tableAtoms = open('/Users/username/Dropbox/Python/Chem Project/atoms.csv', 'r')
    listAtoms = tableAtoms.readlines()
    tableAtoms.close()
    del listAtoms[0]
    atoms = []
    for atom in listAtoms:
        atom = atom.split(',')
        atoms.append(Atom(*atom))
    periodicTable = atoms
It gets called in this way:
def findAtomBySymbol(symbol):
    try:
        periodicTable()
    except:
        pass
    for atom in periodicTable:
        if atom.symbol == symbol:
            return atom
    return None
Is there a way to make this work?
Don't do that. The correct thing to do would be to use a decorator that ensures the function is only executed once and caches the return value:

def cachedfunction(f):
    cache = []
    def deco(*args, **kwargs):
        if cache:
            return cache[0]
        result = f(*args, **kwargs)
        cache.append(result)
        return result
    return deco

@cachedfunction
def periodicTable():
    # etc.
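A quick sketch of the caching behaviour, with a made-up function:

@cachedfunction
def expensive():
    print("loading...")  # printed only on the first call
    return [1, 2, 3]

print(expensive())  # loading... then [1, 2, 3]
print(expensive())  # [1, 2, 3] straight from the cache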
That said, there's nothing stopping you from replacing the function itself after it has been called, so your approach should generally work. I think the reason it doesn't is that an exception is thrown before you assign the result to periodicTable, and thus it never gets replaced. Try removing the try/except block, or replacing the blanket except with except TypeError, to see what exactly happens.
This is very bad practice.
What would be better is to have your function remember if it has already loaded the table:
def periodicTable(_table=[]):
    if _table:
        return _table
    tableAtoms = open('/Users/username/Dropbox/Python/Chem Project/atoms.csv', 'r')
    listAtoms = tableAtoms.readlines()
    tableAtoms.close()
    del listAtoms[0]
    atoms = []
    for atom in listAtoms:
        atom = atom.split(',')
        atoms.append(Atom(*atom))
    _table[:] = atoms
    return _table
The first two lines check to see if the table has already been loaded, and if it has it simply returns it.
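The trick relies on a default argument being evaluated once, at function definition time, so the same list object survives across calls; a tiny illustration with a made-up function:

def stash(item, _memo=[]):
    _memo.append(item)  # the same list object lives across calls
    return _memo

print(stash(1))  # [1]
print(stash(2))  # [1, 2]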
Running this:

class DontList(object):
    def __getitem__(self, key):
        print 'Getting item %s' % key
        if key == 10: raise KeyError("You get the idea.")
        return None

    def __getattr__(self, name):
        print 'Getting attr %s' % name
        return None

list(DontList())
Produces this:
Getting attr __length_hint__
Getting item 0
Getting item 1
Getting item 2
Getting item 3
Getting item 4
Getting item 5
Getting item 6
Getting item 7
Getting item 8
Getting item 9
Getting item 10
Traceback (most recent call last):
  File "list.py", line 11, in <module>
    list(DontList())
  File "list.py", line 4, in __getitem__
    if key == 10: raise KeyError("You get the idea.")
KeyError: 'You get the idea.'
How can I change this so that list(DontList()) gives me [], while still allowing access to items like [1]?
(I've tried putting in def __length_hint__(self): return 0, but it doesn't help.)
My real use case (for perusal if it'll be useful; feel free to ignore past this point):
After applying a certain patch to iniparse, I've found a nasty side-effect of my patch: __getattr__ is set on my Undefined class and returns a new Undefined object. Unfortunately, this means that list(iniconfig.invalid_section) (where isinstance(iniconfig, iniparse.INIConfig)) is doing this (I put simple prints in __getattr__ and __getitem__):
Getting attr __length_hint__
Getting item 0
Getting item 1
Getting item 2
Getting item 3
Getting item 4
Et cetera ad infinitum.
If you want to override the iteration, just define the __iter__ method in your class.
As @Sven says, that's the wrong error to raise. But that's not the point; the point is that this is broken because it's not something you should do: preventing __getattr__ from raising AttributeError means that you have overridden Python's default mechanism for testing whether an object has an attribute and replaced it with a new one (ini_defined(foo.bar)).
But Python already has hasattr! Why not use that?
>>> class Foo:
... bar = None
...
>>> hasattr(Foo, "bar")
True
>>> hasattr(Foo, "baz")
False
Just raise IndexError instead of KeyError. KeyError is meant for mapping-like classes (e.g. dict), while IndexError is meant for sequences.
If you define the __getitem__() method on your class, Python will automatically generate an iterator from it, and that iterator terminates upon IndexError; see PEP 234.
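A minimal sketch of that protocol, with a made-up class name:

class TenItems(object):
    def __getitem__(self, key):
        if key == 10:
            raise IndexError("ten items only")  # IndexError ends the implicit iteration
        return key

print(list(TenItems()))  # [0, 1, ..., 9] -- no traceback, iteration just stops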
Override how your class is iterated by implementing an __iter__() method. Iterators signal that they're finished by raising a StopIteration exception, which is part of the normal iterator protocol and is not propagated further. Here's one way of applying that to your example class:
class DontList(object):
    def __getitem__(self, key):
        print 'Getting item %s' % key
        if key == 10: raise KeyError("You get the idea.")
        return None

    def __iter__(self):
        class iterator(object):
            def __init__(self, obj):
                self.obj = obj
                self.index = -1
            def __iter__(self):
                return self
            def next(self):
                if self.index < 9:
                    self.index += 1
                    return self.obj[self.index]
                else:
                    raise StopIteration
        return iterator(self)

list(DontList())
print 'done'

# Getting item 0
# Getting item 1
# ...
# Getting item 8
# Getting item 9
# done
I think that using return iter([]) is the right way, but let's start by thinking about how list() works: it gets an iterator from __iter__() and keeps calling next() on it, collecting the elements, until it receives StopIteration. So you just have to return an empty iterator from __iter__, for example a generator expression such as (x for x in xrange(0, 0)), or simply iter([]).
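Applied to the question's class, a minimal sketch:

class DontList(object):
    def __getitem__(self, key):
        if key == 10: raise KeyError("You get the idea.")
        return None

    def __iter__(self):
        return iter([])  # empty iterator: nothing to yield

print(list(DontList()))  # [] -- and DontList()[1] still works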