Python threads/processes and class vars

The question may be really stupid, but I've been working on this code since this morning and now even simple things are hard :\
I've got this code, and I call it by creating 8 processes and running them.
Then there's another thread that has to print info about these 8 processes (code is below).
import MSCHAPV2
import threading
import binascii
import multiprocessing

class CrackerThread(multiprocessing.Process):
    password_header = "s."
    current_pin = ""
    username = ""
    server_challenge = ""
    peer_challenge = ""
    nt_response = ""
    starting_pin = 0
    limit = 0
    testing_pin = 0
    event = None

    def __init__(self, username, server_challenge, peer_challenge, nt_response, starting_pin, limit, event):
        #threading.Thread.__init__(self)
        super(CrackerThread, self).__init__()
        self.username = username
        self.server_challenge = server_challenge
        self.peer_challenge = peer_challenge
        self.nt_response = nt_response
        self.starting_pin = starting_pin
        self.limit = limit
        self.event = event
        self.testing_pin = starting_pin
        #self.setDaemon(True)

    def run(self):
        mschap = MSCHAPV2.MSCHAPV2()
        pin_range = self.starting_pin + self.limit
        while self.testing_pin <= pin_range and not self.event.isSet():
            self.current_pin = "%s%08d" % (self.password_header, self.testing_pin)
            if mschap.CheckPassword(self.server_challenge, self.peer_challenge, self.username, self.current_pin.encode("utf-16-le"), self.nt_response):
                self.event.set()
                print "Found valid password!"
                print "user =", self.username
                print "password =", self.current_pin
            self.testing_pin += 1
        print "Thread for range (%d, %d) ended with no success." % (self.starting_pin, pin_range)

    def getCurrentPin(self):
        return self.testing_pin
def printCrackingState(threads):
    info_string = '''
    ++++++++++++++++++++++++++++++++++
    + Starting password = s.%08d +
    +--------------------------------+
    + Current pin = s.%08d +
    ++++++++++++++++++++++++++++++++++
    + Missing pins = %08d +
    ++++++++++++++++++++++++++++++++++
    '''
    while 1:
        for t in threads:
            printed_string = info_string % (t.starting_pin, t.getCurrentPin(), t.getMissingPinsCount())
            sys.stdout.write(printed_string)
        sys.stdout.write("--------------------------------------------------------------------")
        time.sleep(30)
printCrackingState is called by these lines in my "main":
infoThread = threading.Thread(target = utils.printCrackingState, args=([processes]))
#infoThread = cursesTest.CursesPrinter(threads, processes, event)
infoThread.setDaemon(True)
infoThread.start()
Now the question is: why do t.starting_pin and t.getCurrentPin() print the SAME value?
It's like t.getCurrentPin() returns the value set in the __init__() method and is not aware that I'm incrementing it!
Suggestions?

Your problem here is that you're trying to update a variable in one process, and read it in another process. You can't do that. The whole point of multiprocessing, as opposed to multithreading, is that variables are not shared by default.
Read the docs, especially Exchanging objects between processes and Sharing state between processes, and they will explain the various ways around this. But really, there are two: either you need some kind of channel/API to let the parent process ask the child process for its current state, or you need some kind of shared memory to store the data in. And you may need a lock to protect either the channel or the shared memory.
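For the channel approach, a minimal sketch (illustrative names only, not the classes from the question): have each worker push progress reports onto a multiprocessing.Queue that the parent reads:

import multiprocessing

def worker(worker_id, start, limit, progress_queue):
    # Hypothetical worker: does its share of the work, then reports back.
    last_pin = start
    for pin in range(start, start + limit):
        # ... test the pin here ...
        last_pin = pin
    progress_queue.put((worker_id, last_pin))

if __name__ == '__main__':
    progress = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(i, i * 10000, 10000, progress))
             for i in range(8)]
    for p in procs:
        p.start()
    for _ in procs:
        worker_id, last_pin = progress.get()
        print('worker %d reached pin %d' % (worker_id, last_pin))
    for p in procs:
        p.join()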
While shared memory may seem like the "obvious" answer here, you may want to time the following:
from multiprocessing import Value, Lock

# Plain (unshared) integer increments:
val = 0
for i in range(10000):
    val += 1

# Shared-memory Value guarded by a Lock:
val = Value('i', 0)
lock = Lock()
for i in range(10000):
    with lock:
        val.value += 1
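If you want to put numbers on that comparison, a quick harness (a sketch using time.perf_counter; adjust the iteration count to taste):

import time
from multiprocessing import Value, Lock

def time_plain(n=100000):
    start = time.perf_counter()
    val = 0
    for i in range(n):
        val += 1
    return time.perf_counter() - start

def time_shared(n=100000):
    start = time.perf_counter()
    val = Value('i', 0)
    lock = Lock()
    for i in range(n):
        with lock:
            val.value += 1
    return time.perf_counter() - start

print('plain int: %.4f s' % time_plain())
print('shared Value with Lock: %.4f s' % time_shared())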
It's worth noting that your code would also be incorrect with threads—although it would probably work, in CPython. If you don't do any synchronization, there is no guaranteed ordering. If you write a value in one thread and read it "later" in another thread, you can still read the older value. How much later? Well, if thread 0 runs on core 0, and thread 1 on core 1, and they both have the variable in their cache, and nobody tells the CPUs to flush the cache, thread 1 will go on reading the old value forever. In practice, CPython's Global Interpreter Lock eventually synchronizes everything implicitly (so we're talking milliseconds rather than infinity), and all variables have explicit memory locations rather than being, say, optimized into registers, and so on, so you can usually get away with writing unprotected races. But, thanks to Murphy's Law, you should read "usually" as "every time until the first demo to the investors" or "until we attach the live nuclear reactor".
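Applied to the code in the question, a minimal sketch of the shared-memory option (an illustration of the idea, not a drop-in patch): store the current pin in a multiprocessing.Value, update it inside run(), and read it from the monitoring thread in the parent:

import multiprocessing

class CrackerThread(multiprocessing.Process):
    def __init__(self, starting_pin, limit):
        super(CrackerThread, self).__init__()
        self.starting_pin = starting_pin
        self.limit = limit
        # A shared integer: both the parent and the child see the same memory.
        self.testing_pin = multiprocessing.Value('i', starting_pin)

    def run(self):
        while self.testing_pin.value <= self.starting_pin + self.limit:
            # ... check the current pin here ...
            with self.testing_pin.get_lock():
                self.testing_pin.value += 1

    def getCurrentPin(self):
        # Safe to call from the parent: reads the shared memory, not a stale copy.
        return self.testing_pin.value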

Related

Unable to update a class variable with multiprocessing

I'm making a GUI application that tracks time spent on each foreground window. I attempted to do this with a loop for every process being monitored as such:
class processes(object):
    def __init__(self, name, pid):
        self.name = name
        self.pid = pid
        self.time_spent = 0
        self.time_active = 0
        p1 = multiprocessing.Process(target=self.loop, args=())
        p1.start()

    def loop(self):
        t = 0
        start_time = time.time()
        while True:
            # While the process is running, check if the foreground window
            # (window currently being used) is the same as the process.
            h_wnd = user32.GetForegroundWindow()
            pid = wintypes.DWORD()
            user32.GetWindowThreadProcessId(h_wnd, ctypes.byref(pid))
            p = psutil.Process(pid.value)
            name = str(p.name())
            name2 = str(self.name)
            if name2 == name:
                t = time.time() - start_time
                # Log the total time the user spent using the window
                self.time_active += t
                self.time_spent = time.perf_counter()
            time.sleep(2)

    def get_time(self):
        print("{:.2f}".format(self.time_active) + " name: " + self.name)
I select the process I want in the GUI and find it by name in a list. Once found, I call get_time(), which is supposed to print how long the selected process has been in the foreground.
def display_time(Lb2):
    for s in Lb2.curselection():
        for e in process_list:
            if Lb2.get(s) == e.name:
                e.get_time()
The problem is that time_active is 0 every time I print it.
I've debugged the program and can tell it's somewhat working (not perfectly; it still records time while the program is not in the foreground) and updating the variable inside the loop. However, when it comes to printing it out, the value remains 0. I think I'm having trouble understanding multiprocessing; I'd appreciate it if anyone could clear up the confusion.
The simplest solution was offered by @TheLizzard, i.e. just use threading instead of multiprocessing:
import threading
...
#p1 = multiprocessing.Process(target=self.loop, args=())
p1 = threading.Thread(target=self.loop, args=())
But that doesn't explain why creating a process instead did not work. What happened is that your processes.__init__ code first creates several attributes such as self.time_active, self.time_spent, etc. This code executes in the main process. But when you execute the following two statements ...
p1 = multiprocessing.Process(target=self.loop, args=())
p1.start()
... the process object that was created (along with everything it references, including self) must now be serialized and deserialized into the new address space in which the new Process instance you just created will run. Consequently, when you execute a statement such as self.time_active += t in the loop method, you are updating the copy of self.time_active that "lives" in the address space of the sub-process. But the code that prints out the value of self.time_active is executing in the main process's address space and therefore prints only the original value of that attribute.
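A tiny standalone demonstration of that separation (a sketch of my own, not the asker's code): the child's update is invisible to the parent because each side has its own copy of the object:

import multiprocessing

class Counter:
    def __init__(self):
        self.n = 0
        p = multiprocessing.Process(target=self.bump)   # note: not stored on self
        p.start()
        p.join()                 # wait for the child to finish

    def bump(self):
        self.n += 100            # updates the copy of self in the child process

if __name__ == '__main__':
    c = Counter()
    print(c.n)                   # prints 0: the parent's copy was never touched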
If you had to use multiprocessing because your loop method was CPU-intensive and you needed the parallelism with other processes, then the solution would be to create self.time_active and self.time_spent in shared memory so that both the main process and the sub-process would be accessing the same, shared attributes:
class processes(object):
    def __init__(self, name, pid):
        self.name = name
        self.pid = pid
        # Create shared floating point values:
        self.time_spent = multiprocessing.Value('f', 0)
        self.time_active = multiprocessing.Value('f', 0)
        ...

    def loop(self):
        ...
        self.time_active.value += t
        self.time_spent.value = time.perf_counter()
        ...

    def get_time(self):
        print("{:.2f}".format(self.time_active.value) + " name: " + self.name)

How to solve the Sleeping Barbers analogy with multiple barbers in Python?

I have a working solution for the Sleeping Barber operating system problem using Python 2.7 and threading that works with a single barber and a certain number of chairs. But I would like it to work in a situation where there are multiple barbers, in the same way as there are multiple customers.
Here is my current solution with a single barber:
from threading import Thread, Lock, Event
import time, random
from sys import exit

lock = Lock()

customerIntervalMin = 5
customerIntervalMax = 15
haircutDurationMin = 3
haircutDurationMax = 15

class BarberShop:
    waitingCustomers = []
    threads = []
    finishedCustomers = []

    def __init__(self, barber, numberOfSeats):
        self.barber = barber
        self.numberOfSeats = numberOfSeats

    def openShop(self):
        print 'Barber shop is opening'
        workingThread = Thread(target = self.barberGoToWork)
        workingThread.start()
        self.threads.append(workingThread)

    def barberGoToWork(self):
        while True:
            lock.acquire()
            if len(self.waitingCustomers) > 0 and len(self.finishedCustomers) < 5:
                c = self.waitingCustomers[0]
                del self.waitingCustomers[0]
                lock.release()
                self.barber.cutHair(c)
                self.finishedCustomers.append(c)
            elif len(self.waitingCustomers) == 0 and len(self.finishedCustomers) < 5:
                lock.release()
                print 'Aaah, all done, {0} is going to sleep'.format(barber.name)
                barber.sleeps()
                print '{0} woke up'.format(barber.name)
            elif len(self.waitingCustomers) == 0 and len(self.finishedCustomers) == 5:
                lock.release()
                print 'The barber shop is closed. Come back tomorrow.'
                exit(0)

    def enterBarberShop(self, customer):
        lock.acquire()
        print '{0} entered the shop and is looking for a seat'.format(customer.name)
        if len(self.waitingCustomers) == self.numberOfSeats:
            print 'Waiting room is full, {0} is leaving.'.format(customer.name)
            lock.release()
        else:
            print '{0} sat down in the waiting room'.format(customer.name)
            self.waitingCustomers.append(c)
            lock.release()
            barber.wakeUp()

class Customer:
    def __init__(self, name):
        self.name = name

class Barber:
    def __init__(self, name):
        self.name = name
    barberEvent = Event()

    def sleeps(self):
        self.barberEvent.wait()

    def wakeUp(self):
        self.barberEvent.set()

    def cutHair(self, customer):
        self.barberEvent.clear()
        print '{0} is having a haircut done by {1}'.format(customer.name, self.name)
        randomHairCuttingTime = random.randrange(haircutDurationMin, haircutDurationMax+1)
        time.sleep(randomHairCuttingTime)
        print '{0} is done with {1}'.format(customer.name, self.name)

if __name__ == '__main__':
    customers = []
    customers.append(Customer('Ken'))
    customers.append(Customer('Scott'))
    customers.append(Customer('Larry'))
    customers.append(Customer('Liam'))
    customers.append(Customer('Kieran'))

    barber = Barber('Mark')
    barberShop = BarberShop(barber, numberOfSeats=10)
    barberShop.openShop()

    while len(customers) > 0:
        c = customers.pop()
        barberShop.enterBarberShop(c)
        customerInterval = random.randrange(customerIntervalMin, customerIntervalMax+1)
        time.sleep(customerInterval)
I'm confused as to how to go about this. I originally thought it would be the same as the customers list inside the main: just append the class with the given name parameter into a list, loop through the list, pop each instance and assign it to barber, and keep the original barberShop definition in the main. But upon reflection that can't be right, because it would just create 3 different threads with 10 seats each. So now I am unsure how to solve the last part of this problem. While there are plenty of online implementations of this specific aspect of the problem in languages like Java and C, I don't have enough experience with those languages to understand those solutions, let alone translate them into Python and apply them to the solution above.
Is there any other way I can implement multiple barbers into this solution? Any help at all on this aspect of the problem or any suggestions for improvements that can be made to my solution would be greatly appreciated.
You need to design your system exactly as you've described in words: you instantiate three barbers, each of whom has full permissions on the same ten-chair waiting room. This means that your mutex operations have to be hardy enough for 3 actors and 10 targets. You also have to allow a new customer to wake up a sleeping barber, regardless of how many are asleep.
Think in terms of your barber states and your customer states. Focus on each combination in turn, and design how the system should react. Then, given those various actions (e.g. "customer wakes up barber", "barber moves customer to work chair"), decide how to allocate your methods to classes.
Does that help you get moving?
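For instance, a rough sketch of that shape (my own names and structure, not a finished solution): one thread per barber, all sharing one lock-protected waiting list, with a Condition so that an arriving customer can wake any sleeping barber:

import threading, time, random

class MultiBarberShop(object):
    def __init__(self, barber_names, number_of_seats):
        self.number_of_seats = number_of_seats
        self.waiting = []                                  # the shared waiting-room chairs
        self.lock = threading.Lock()
        self.customer_ready = threading.Condition(self.lock)
        self.barbers = [threading.Thread(target=self.barber_work, args=(name,))
                        for name in barber_names]

    def open_shop(self):
        for b in self.barbers:
            b.daemon = True
            b.start()

    def barber_work(self, name):
        while True:
            with self.customer_ready:
                while not self.waiting:
                    print('%s is going to sleep' % name)
                    self.customer_ready.wait()             # sleep until a customer arrives
                customer = self.waiting.pop(0)
            print('%s is cutting hair for %s' % (name, customer))
            time.sleep(random.randrange(3, 16))

    def enter(self, customer):
        with self.customer_ready:
            if len(self.waiting) == self.number_of_seats:
                print('Waiting room is full, %s is leaving' % customer)
                return
            self.waiting.append(customer)
            self.customer_ready.notify()                   # wake exactly one sleeping barber

if __name__ == '__main__':
    shop = MultiBarberShop(['Mark', 'Anna', 'Raj'], number_of_seats=10)
    shop.open_shop()
    for customer in ['Ken', 'Scott', 'Larry', 'Liam', 'Kieran']:
        shop.enter(customer)
        time.sleep(random.randrange(5, 16))
    time.sleep(30)   # crude: give the daemon barber threads time to finish (sketch only)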

Threading and target function in external file (python)

I want to move some functions to an external file to make the code clearer.
Let's say I have this example code (which does indeed work):
import threading
from time import sleep

testVal = 0

def testFunc():
    while True:
        global testVal
        sleep(1)
        testVal = testVal + 1
        print(testVal)

t = threading.Thread(target=testFunc, args=())
t.daemon = True
t.start()

try:
    while True:
        sleep(2)
        print('testval = ' + str(testVal))
except KeyboardInterrupt:
    pass
Now I want to move testFunc() to a new Python file. My guess was the following, but the global variables don't seem to be the same.
testserver.py:
import threading
import testclient
from time import sleep

testVal = 0

t = threading.Thread(target=testclient.testFunc, args=())
t.daemon = True
t.start()

try:
    while True:
        sleep(2)
        print('testval = ' + str(testVal))
except KeyboardInterrupt:
    pass
and testclient.py:
from time import sleep
from testserver import testVal as val

def testFunc():
    while True:
        global val
        sleep(1)
        val = val + 1
        print(val)
my output is:
1
testval = 0
2
3
testval = 0 (testval didn't change)
...
while it should:
1
testval = 1
2
3
testval = 3
...
Any suggestions? Thanks!
Your immediate problem is not due to multithreading (we'll get to that) but due to how you use global variables. The thing is, when you use this:
from testserver import testVal as val
You're essentially doing this:
import testserver
val = testserver.testVal
i.e. you're creating a local reference val that points to the testserver.testVal value. This is all fine and dandy when you read it (the first time at least) but when you try to assign its value in your function with:
val = val + 1
You're actually re-assigning the local (to testclient.py) val variable, not setting the value of testserver.testVal. You have to directly reference the actual pointer (i.e. testserver.testVal += 1) if you want to change its value.
That being said, the next problem you might encounter stems directly from multithreading: you can encounter a race-condition oddity where the GIL pauses one thread right after reading the value but before actually writing it, the next thread reads it and overwrites the current value, and then the first thread resumes and writes the same value, resulting in a single increase despite two increments. You need to use some sort of mutex to make sure that all non-atomic operations execute exclusively in one thread if you want to use your data this way. The easiest way to do it is with a Lock that comes with the threading module:
testserver.py:

# ...
testVal = 0
testValLock = threading.Lock()
# ...

testclient.py:

# ...
with testserver.testValLock:
    testserver.testVal += 1
# ...
A third and final problem you might encounter is a circular dependency (testserver.py requires testclient.py, which requires testserver.py), and I'd advise you to re-think the way you want to approach this problem. If all you want is a common global store, create it separately from the modules that might depend on it. That way you ensure the proper loading and initializing order without the danger of unresolvable circular dependencies.
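For example, a minimal sketch of that layout (the module name shared.py is my own choice): put the state and its lock in a module that imports nothing from the other two, and have both files import it:

shared.py:

import threading

testVal = 0
testValLock = threading.Lock()

testclient.py:

from time import sleep
import shared

def testFunc():
    while True:
        sleep(1)
        with shared.testValLock:
            shared.testVal += 1
        print(shared.testVal)

testserver.py:

import threading
from time import sleep
import shared
import testclient

t = threading.Thread(target=testclient.testFunc)
t.daemon = True
t.start()
try:
    while True:
        sleep(2)
        print('testval = ' + str(shared.testVal))
except KeyboardInterrupt:
    pass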

Is instance variable in Python 3.5 process-safe?

Testing Environment:
Python Version: 3.5.1
OS Platform: Ubuntu 16.04
IDE: PyCharm Community Edition 2016.3.2
I wrote a simple program to test process safety. I find that subprocess2 won't run until subprocess1 has finished. It seems that the instance variable self.count is process-safe. How do the processes share this variable? Do they share self directly?
Another question: when I use a Queue, I have to use multiprocessing.Manager to guarantee process safety manually, or the program won't run as expected. (If you uncomment self.queue = multiprocessing.Queue(), this program won't run normally, but using self.queue = multiprocessing.Manager().Queue() is OK.)
The last question is: why is the final result 900? I think it should be 102.
Sorry for asking so many questions, but I'm indeed curious about these things. Thanks a lot!
Code:
import multiprocessing
import time

class Test:
    def __init__(self):
        self.pool = multiprocessing.Pool(1)
        self.count = 0
        #self.queue = multiprocessing.Queue()
        #self.queue = multiprocessing.Manager().Queue()

    def subprocess1(self):
        for i in range(3):
            print("Subprocess 1, count = %d" % self.count)
            self.count += 1
            time.sleep(1)
        print("Subprocess 1 Completed")

    def subprocess2(self):
        self.count = 100
        for i in range(3):
            print("Subprocess 2, count = %d" % self.count)
            self.count += 1
            time.sleep(1)
        print("Subprocess 2 Completed")

    def start(self):
        self.pool.apply_async(func=self.subprocess1)
        print("Subprocess 1 has been started")
        self.count = 900
        self.pool.apply_async(func=self.subprocess2)
        print("Subprocess 2 has been started")
        self.pool.close()
        self.pool.join()

    def __getstate__(self):
        self_dict = self.__dict__.copy()
        del self_dict['pool']
        return self_dict

    def __setstate__(self, state):
        self.__dict__.update(state)

if __name__ == '__main__':
    test = Test()
    test.start()
    print("Final Result, count = %d" % test.count)
Output:
Subprocess 1 has been started
Subprocess 2 has been started
Subprocess 1, count = 0
Subprocess 1, count = 1
Subprocess 1, count = 2
Subprocess 1 Completed
Subprocess 2, count = 100
Subprocess 2, count = 101
Subprocess 2, count = 102
Subprocess 2 Completed
Final Result, count = 900
The underlying details are rather tricky (see the Python3 documentation for more, and note that the details are slightly different for Python2), but essentially, when you pass self.subprocess1 or self.subprocess2 as an argument to self.pool.apply_async, Python ends up calling:
pickle.dumps(self)
in the main process—the initial one on Linux before forking, or the one invoked as __main__ on Windows—and then, eventually, pickle.loads() of the resulting byte-string in the pool process.[1] The pickle.dumps code winds up calling your own __getstate__ function; that function's job is to return something that can be serialized to a byte-string.[2] The subsequent pickle.loads creates a blank instance of the appropriate type, does not call its __init__, and then uses its __setstate__ function to fill in the object (instead of __init__ing it).
Your __getstate__ returns the dictionary holding the state of self, minus the pool object, for good reason:
>>> import multiprocessing
>>> x = multiprocessing.Pool(1)
>>> import pickle
>>> pickle.dumps(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 492, in __reduce__
    'pool objects cannot be passed between processes or pickled'
NotImplementedError: pool objects cannot be passed between processes or pickled
Since pool objects refuse to be pickled (serialized), we must avoid even attempting to do that.
In any case, all of this means that the pool process has its own copy of self, which has its own copy of self.count (and is missing self.pool entirely). These items are not shared in any way so it is safe to modify self.count there.
I find the simplest mental model of this is to give each worker process a name: Alice, Bob, Carol, and so on, if you like. You can then think of the main process as "you": you copy something and give the copy to Alice, then copy it and give that one to Bob, and so on. Function calls, such as apply or apply_async, copy all of their arguments—including the implied self for bound methods.
When using a multiprocessing.Queue, you get something that knows how to work between the various processes, sharing data as needed, with appropriate synchronization. This lets you pass copies of data back and forth. However, like a pool instance, a multiprocessing.Queue instance cannot be copied. The multiprocessing routines do let you copy a multiprocessing.Manager().Queue() instance, which is good if you want a copied and otherwise private Queue() instance. (The internal details of this are complicated.[3])
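To illustrate the difference (a small sketch of my own, not from the question's code): a Manager().Queue() proxy can be pickled and handed to a pool worker, whereas a plain multiprocessing.Queue() passed the same way raises a RuntimeError:

import multiprocessing

def worker(q):
    q.put("hello from the pool worker")

if __name__ == '__main__':
    with multiprocessing.Pool(1) as pool:
        q = multiprocessing.Manager().Queue()   # proxy object: safe to pickle
        pool.apply(worker, (q,))
        print(q.get())
        # Passing a multiprocessing.Queue() here instead would raise:
        # RuntimeError: Queue objects should only be shared between processes
        # through inheritance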
The final result you get is just 900 because you are looking only at the original self object.
Note that each applied function (from apply or apply_async) returns a result. This result is copied back from the worker process to the main process. With apply_async, you may choose to get called back as soon as the result is ready. If you want this result you should save it somewhere, or use the get function (as shown in the answer below) to wait for it when you need it.
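A tiny illustration of that (a sketch of my own, assuming a top-level function f): keep the AsyncResult objects and call get() on them:

import multiprocessing

def f(x):
    return x * x

if __name__ == '__main__':
    with multiprocessing.Pool(2) as pool:
        results = [pool.apply_async(f, (i,)) for i in range(4)]
        print([r.get() for r in results])   # [0, 1, 4, 9]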
[1] We can say "the" pool process here without worrying about which one, as you limited yourself to just one. In any case, though, there is a simple byte-oriented, two-way communications stream, managed by the multiprocessing code, connecting each worker process with the parent process that invoked it. If you create two such pool processes, each one has its own byte-stream connecting to the main process. This means it would not matter if there were two or more: the behavior would be the same.
[2] This "something" is often a dictionary, but see Simple example of use of __setstate__ and __getstate__ for details.
[3] The output of pickle.dumps on such an instance is:
>>> pickle.dumps(y)
(b'\x80\x03cmultiprocessing.managers\n'
b'RebuildProxy\n'
b'q\x00(cmultiprocessing.managers\n'
b'AutoProxy\n'
b'q\x01cmultiprocessing.managers\n'
b'Token\n'
b'q\x02)\x81q\x03X\x05\x00\x00\x00Queueq\x04X$\x00\x00\x00/tmp/pymp-pog4bhub/listener-0_uwd8c9q\x05X\t\x00\x00\x00801b92400q\x06\x87q\x07bX\x06\x00\x00\x00pickleq\x08}q\tX\x07\x00\x00\x00exposedq\n'
b'(X\x05\x00\x00\x00emptyq\x0bX\x04\x00\x00\x00fullq\x0cX\x03\x00\x00\x00getq\rX\n'
b'\x00\x00\x00get_nowaitq\x0eX\x04\x00\x00\x00joinq\x0fX\x03\x00\x00\x00putq\x10X\n'
b'\x00\x00\x00put_nowaitq\x11X\x05\x00\x00\x00qsizeq\x12X\t\x00\x00\x00task_doneq\x13tq\x14stq\x15Rq\x16.\n')
I did a little trickiness to split this at newlines and then manually added the parentheses, just to keep the long line from being super-long. The arguments will vary on different systems; this particular one uses a file system object that is a listener socket, that allows cooperating Python processes to establish a new byte stream between themselves.
Question: ... why the final result is 900? I think it should be 102.
The result should be 106: range is 0-based, so each task does 3 iterations, and the two tasks end at counts 3 and 103.
You can get the expected output, for instance:
import multiprocessing as mp
import time

class PoolTasks(object):
    def __init__(self):
        self.count = None

    def task(self, n, start):
        import os
        pid = os.getpid()
        count = start
        print("Task %s in Process %s has been started - start=%s" % (n, pid, count))
        for i in range(3):
            print("Task %s in Process %s, count = %d " % (n, pid, count))
            count += 1
            time.sleep(1)
        print("Task %s in Process %s has been completed - count=%s" % (n, pid, count))
        return count

    def start(self):
        with mp.Pool(processes=4) as pool:
            # launching multiple tasks asynchronously using processes
            multiple_results = [pool.apply_async(self.task, (p)) for p in [(1, 0), (2, 100)]]
            # sum result from tasks
            self.count = 0
            for res in multiple_results:
                self.count += res.get()

if __name__ == '__main__':
    pool = PoolTasks()
    pool.start()
    print('sum(count) = %s' % pool.count)
Output:
Task 1 in Process 5601 has been started - start=0
Task 1 in Process 5601, count = 0
Task 2 in Process 5602 has been started - start=100
Task 2 in Process 5602, count = 100
Task 1 in Process 5601, count = 1
Task 2 in Process 5602, count = 101
Task 1 in Process 5601, count = 2
Task 2 in Process 5602, count = 102
Task 1 in Process 5601 has been completed - count=3
Task 2 in Process 5602 has been completed - count=103
sum(count) = 106
Tested with Python 3.4.2

How to properly set up multiprocessing proxy objects for objects that already exist

I'm trying to share an existing object across multiple processes using the proxy methods described here. My multiprocessing idiom is the worker/queue setup, modeled after the 4th example here.
The code needs to do some calculations on data that are stored in rather large files on disk. I have a class that encapsulates all the I/O interactions, and once it has read a file from disk, it saves the data in memory for the next time a task needs to use the same data (which happens often).
I thought I had everything working from reading the examples linked above. Here is a mock-up of the code that just uses numpy random arrays to model the disk I/O:
import numpy
from multiprocessing import Process, Queue, current_process, Lock
from multiprocessing.managers import BaseManager

nfiles = 200
njobs = 1000

class BigFiles:
    def __init__(self, nfiles):
        # Start out with nothing read in.
        self.data = [ None for i in range(nfiles) ]
        # Use a lock to make sure only one process is reading from disk at a time.
        self.lock = Lock()

    def access(self, i):
        # Get the data for a particular file
        # In my real application, this function reads in files from disk.
        # Here I mock it up with random numpy arrays.
        if self.data[i] is None:
            with self.lock:
                self.data[i] = numpy.random.rand(1024,1024)
        return self.data[i]

    def summary(self):
        return 'BigFiles: %d, %d Storing %d of %d files in memory'%(
            id(self), id(self.data),
            (len(self.data) - self.data.count(None)),
            len(self.data) )

# I'm using a worker/queue setup for the multiprocessing:
def worker(input, output):
    proc = current_process().name
    for job in iter(input.get, 'STOP'):
        (big_files, i, ifile) = job
        data = big_files.access(ifile)
        # Do some calculations on the data
        answer = numpy.var(data)
        msg = '%s, job %d'%(proc, i)
        msg += '\n Answer for file %d = %f'%(ifile, answer)
        msg += '\n ' + big_files.summary()
        output.put(msg)

# A class that returns an existing file when called.
# This is my attempted workaround for the fact that Manager.register needs a callable.
class ObjectGetter:
    def __init__(self, obj):
        self.obj = obj
    def __call__(self):
        return self.obj

def main():
    # Prior to the place where I want to do the multiprocessing,
    # I already have a BigFiles object, which might have some data already read in.
    # (Here I start it out empty.)
    big_files = BigFiles(nfiles)
    print 'Initial big_files.summary = ',big_files.summary()

    # My attempt at making a proxy class to pass big_files to the workers
    class BigFileManager(BaseManager):
        pass
    getter = ObjectGetter(big_files)
    BigFileManager.register('big_files', callable = getter)
    manager = BigFileManager()
    manager.start()

    # Set up the jobs:
    task_queue = Queue()
    for i in range(njobs):
        ifile = numpy.random.randint(0, nfiles)
        big_files_proxy = manager.big_files()
        task_queue.put( (big_files_proxy, i, ifile) )

    # Set up the workers
    nproc = 12
    done_queue = Queue()
    process_list = []
    for j in range(nproc):
        p = Process(target=worker, args=(task_queue, done_queue))
        p.start()
        process_list.append(p)
        task_queue.put('STOP')

    # Log the results
    for i in range(njobs):
        msg = done_queue.get()
        print msg
    print 'Finished all jobs'
    print 'big_files.summary = ',big_files.summary()

    # Shut down the workers
    for j in range(nproc):
        process_list[j].join()
    task_queue.close()
    done_queue.close()

main()
This works in the sense that it calculates everything correctly, and it is caching the data that is read along the way. The only problem I'm having is that at the end, the big_files object doesn't have any of the files loaded. The final msg returned is:
Process-2, job 999. Answer for file 198 = 0.083406
BigFiles: 4303246400, 4314056248 Storing 198 of 200 files in memory
But then after it's all done, we have:
Finished all jobs
big_files.summary = BigFiles: 4303246400, 4314056248 Storing 0 of 200 files in memory
So my question is: What happened to all the stored data? It's claiming to be using the same self.data according to the id(self.data). But it's empty now.
I want the end state of big_files to have all the saved data that it accumulated along the way, since I actually have to repeat this entire process many times, so I don't want to have to redo all the (slow) I/O each time.
I'm assuming it must have something to do with my ObjectGetter class. The examples for using BaseManager only show how to make a new object that will be shared, not how to share an existing one. So am I doing something wrong with the way I get the existing big_files object? Can anyone suggest a better way to do this step?
Thanks much!
