Threading with asyncio issue - python

I want to run a few threads, and in each thread run an independent asyncio event loop that processes a list of async coroutines.
Each thread creates its own local instance of the class 'data', but in practice it behaves like a single object shared between the threads. I don't understand why this happens.
So, my questions are:
Why does this happen? Each thread should have its own local instance of 'data' (unique).
How do I solve this issue? Synchronization of the 'data' object across threads is not needed.
Here is the code; don't worry about exceptions, thread joining, etc. It's simplified as an example.
Expected output:
id=1, list a: ['1', '1', '1']
Actual output:
id=1, list a: ['1', '3', '2', '1', '3', '2', '3', '2', '1']
Data processing:
class data:
    id = 0
    a = []
    b = []

    def __init__(self, id):
        self.id = id

    async def load_0(self):
        for i in range(0, 3):
            self.a.append(str(self.id))
            await asyncio.sleep(0.1)

    async def load_n(self):
        for i in range(0, 3):
            self.b.append(str(self.id))
            await asyncio.sleep(0.1)
Run asyncio tasks in thread:
async def thread_loop(loop, id):
    tasks = []
    d = data(id)
    # 0 .. n tasks
    tasks.append(asyncio.create_task(d.load_0()))
    tasks.append(asyncio.create_task(d.load_n()))
    await asyncio.gather(*tasks, return_exceptions=True)
    if (id == 1):
        print('id=' + str(d.id) + ', list a: ' + str(d.a))
New event loop in thread:
def thread_main(id):
    loop = asyncio.new_event_loop()
    loop.run_until_complete(thread_loop(loop, id))
Create and start threads:
async def start(threads):
    threads.append(threading.Thread(target=thread_main, args=(1,)))
    threads.append(threading.Thread(target=thread_main, args=(2,)))
    for thread in threads:
        thread.start()
    while True:
        await asyncio.sleep(0.1)
Main:
if __name__ == '__main__':
    threads = []
    loop = asyncio.get_event_loop()
    loop.run_until_complete(start(threads))

Each of your threads does have its own instance of data; you get that with d = data(id). The behavior you're seeing when you inspect d.a and d.b comes from the fact that those lists are shared across all instances of the class, and therefore across all threads. This isn't related to threads or asyncio; it's the way you define your class.
When you assign mutable objects to class-level attributes, those objects are shared by every instance of the class.
>>> class C:
...     l = []
...
>>> c1 = C()
>>> c2 = C()
>>>
>>> c1.l.append(1)
>>> c2.l
[1]
The way to fix this is to move the assignment of the initial value to __init__.
>>> class C:
...     def __init__(self):
...         self.l = []
...
>>> c1 = C()
>>> c2 = C()
>>>
>>> c1.l.append(1)
>>> c2.l
[]
In your case that would be
class data:
    id = 0

    def __init__(self, id):
        self.id = id
        self.a = []
        self.b = []
You can even remove id = 0 from the class's definition since you assign a value in __init__.
class data:
    def __init__(self, id):
        self.id = id
        self.a = []
        self.b = []
This may be more than you need, especially without knowing what your real code looks like, but you could also consider using a dataclass.
from dataclasses import dataclass, field

@dataclass
class data:
    id: int
    a: list[str] = field(default_factory=list)
    b: list[str] = field(default_factory=list)
Note: Using list[str] requires either Python 3.9+ or from __future__ import annotations. Otherwise you'll need to use typing.List[str] instead.
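As a quick check (my own sketch, not part of the original answer), instantiating the fixed class from two places now gives independent lists:
# Minimal sketch: with a and b created per instance, nothing is shared.
d1 = data(1)
d2 = data(2)
d1.a.append('1')
print(d1.a)  # ['1']
print(d2.a)  # [] -- each instance (and therefore each thread) owns its own lists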

Related

Python multiprocessing shared list not getting written to

I have a certain class that has an attribute list. There are some functions in the class that write to, but never read from, this list. I initialize the class with a list and call the functions from multiple processes; however, after waiting for all of them to finish, the list remains empty.
The value order in the list does not matter.
from multiprocessing import Process

class TestClass():
    def __init__(self, vals):
        self.vals = vals

    def linearrun(self):
        Is = range(2000)
        Js = range(2000)
        for i in Is:
            for j in Js:
                self.vals.append(i + j)

if __name__ == "__main__":
    vals = []
    instantiated_class = TestClass(vals)
    processes = []
    for _ in range(10):
        new_process = Process(target=instantiated_class.linearrun)
        processes.append(new_process)
        new_process.start()
    for p in processes:
        p.join()
    print(vals)

trying to understand threads

I'm in doubt, so please help me.
Let's say I have a class like this:
class class_1():
    def __init__(self):
        self.a = 0
        self.b = False

    def run(self):
        while True:
            self.worker()
            if self.a > 10:
                self.b = True

    def worker(self):
        self.a = self.a + 1
        time.sleep(1)

    def get_a(self):
        return self.a

    def get_b(self):
        return self.b
I would like to start an instance of the class in a thread (and at some point have several instances), and pull values from the instance(s) into the main thread or into a separate thread.
something like this:
def run_instance_1():
    global one
    one = class_1()
    one.run()

def pull_1():
    global one
    w = one.get_a()
    z = one.get_b()
    print('one.a = ' + str(w))
    print('one.b = ' + str(z))

if __name__ == '__main__':
    t1 = threading.Thread(name='process 1', target=run_instance_1)
    t2 = threading.Thread(name='process 2', target=pull_1)
    t1.start()
    t2.start()
My question is: if I instantiate the class in a global variable like above, will it really run in a separate thread? The basic goal is to obtain data from "another" thread. The reason for instantiating the class in a global variable is to make it accessible from any thread. Am I on to something, or totally off?
You can share the same object between threads, but instead of using global variables, I'd much rather pass the instance to both threads:
def run_instance_1(one):
    one.run()

def pull_1(one):
    w = one.get_a()
    z = one.get_b()
    print('one.a = ' + str(w))
    print('one.b = ' + str(z))

if __name__ == '__main__':
    one = class_1()
    t1 = threading.Thread(name='process 1', target=run_instance_1, args=(one,))
    # alternatively: t1 = threading.Thread(target=one.run)
    t2 = threading.Thread(name='process 2', target=pull_1, args=(one,))
    t1.start()
    t2.start()
That way you make sure the instance is created before both threads start, and the functions become reusable. Another potential problem is that the instance's attributes are modified from a thread, and those modifications are not atomic. You're unlikely to run into trouble in this particular case, but consider using locks for shared access to mutable data (a small sketch follows).
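For the lock suggestion, a minimal sketch (my addition, not part of the original answer) of guarding a shared counter with a threading.Lock; the lock attribute, loop bounds, and method names are assumptions for illustration:
import threading
import time

class class_1():
    def __init__(self):
        self.a = 0
        self.lock = threading.Lock()   # guards self.a

    def run(self):
        for _ in range(10):
            with self.lock:            # writer takes the lock
                self.a += 1
            time.sleep(0.1)

    def get_a(self):
        with self.lock:                # reader takes the same lock
            return self.a

one = class_1()
t = threading.Thread(target=one.run)
t.start()
print(one.get_a())                     # consistent snapshot of a
t.join()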

Updating member variable of object while using multiprocessing pool

I have a class B which is composed of another class A.
In class B I am using a multiprocessing pool to call a method of class A. This method updates a member variable of A (which is a dict).
When I print out this member variable, it doesn't seem to have been updated. Here is the code describing the issue:
import multiprocessing as mp

class A():
    def __init__(self):
        self.aDict = {'key': 0}

    def set_lock(self, lock):
        self.lock = lock

    def do_work(self, item):
        print("Doing work for item: {}".format(item))
        self.aDict['key'] += 1
        return [1, 2, 3]  # return some list

class B():
    def __init__(self):
        self.objA = A()

    def run_with_mp(self):
        items = ['item1', 'item2']
        with mp.Pool(processes=mp.cpu_count()) as pool:
            result = pool.map_async(self.objA.do_work, items)
            result.wait()
            pool.terminate()
        print(self.objA.aDict)

    def run(self):
        items = ['item1', 'item2']
        for item in items:
            self.objA.do_work(item)
        print(self.objA.aDict)

if __name__ == "__main__":
    b = B()
    b.run_with_mp()  # prints {'key': 0}
    b.run()          # prints {'key': 2}
b.run_with_mp() prints {'key': 0} while b.run() prints {'key': 2}. I thought the multiprocessing pool version would do the same, since the object self.objA is in scope for the whole of class B, where the multiprocessing pool runs.
I think each worker in the pool sees a different copy of self.objA, different from the one in the main program flow. Is there a way to make all the workers update a common variable?
You are close to the explanation: each spawned process holds its own area of memory, which means they are independent. When you run do_work, each process updates its own copy of aDict, because that variable is not shared. If you want to share a variable, the easiest way is to use a Manager, for example:
import multiprocessing as mp

class A():
    def __init__(self):
        self.aDict = mp.Manager().dict({'key': 0})

    def set_lock(self, lock):
        self.lock = lock

    def do_work(self, item):
        print("Doing work for item: {}".format(item))
        self.aDict['key'] += 1
        return [1, 2, 3]  # return some list

class B():
    def __init__(self):
        self.objA = A()

    def run_with_mp(self):
        items = ['item1', 'item2']
        with mp.Pool(processes=mp.cpu_count()) as pool:
            result = pool.map_async(self.objA.do_work, items)
            result.wait()
            pool.terminate()
        print(self.objA.aDict)

    def run(self):
        items = ['item1', 'item2']
        for item in items:
            self.objA.do_work(item)
        print(self.objA.aDict)

if __name__ == "__main__":
    b = B()
    b.run_with_mp()  # prints {'key': 2}
    b.run()          # prints {'key': 4}
I modified your example so that the aDict variable is shared; now each process updates that property (in both the run_with_mp and run methods). Consider reading more in the docs.
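One caveat, as my own addition rather than part of the answer: the += on a manager dict is a separate read and write, so concurrent workers can still race on it; a Manager lock can guard the update. A minimal sketch with hypothetical names:
import multiprocessing as mp

def increment(args):
    shared, lock = args
    with lock:                 # guard the read-modify-write on the proxy dict
        shared['key'] += 1

if __name__ == '__main__':
    manager = mp.Manager()
    shared = manager.dict({'key': 0})
    lock = manager.Lock()
    with mp.Pool(4) as pool:
        pool.map(increment, [(shared, lock)] * 8)
    print(shared['key'])       # 8, regardless of scheduling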

Tornado execution order

I am trying to understand and check how Tornado executes coroutines.
I noticed a behavior which makes me think that gen.coroutine doesn't work.
Look at the test below. It passes, but I expected to get l = ["S1", "BF", "S2", "S3"], because when submain yields asyncSubMain it should return to the event loop's queue to pull the next callback, and that should be "beforeYield", since it was scheduled earlier.
def test_call_coroutine_function(ioLoop):
    l = []

    def syncSubMain():
        return "Oh! at last I understood how tornado turns"

    @gen.coroutine
    def asyncSubMain():
        l.append("S2")
        return syncSubMain()

    def beforeYield():
        l.append("BF")

    @gen.coroutine
    def submain():
        l.append("S1")
        ioLoop.add_callback(beforeYield)
        y = yield asyncSubMain()
        l.append("S3")
        raise gen.Return(y)

    @gen.coroutine
    def main():
        x = yield submain()
        raise gen.Return(x)

    assert ioLoop.run_sync(main).startswith("Oh!")
    assert l == ["S1", "S2", "S3", "BF"]
The following test behaves the way I want, and I don't even use @gen.coroutine.
def test_sync_all_async(ioLoop):
    class C:
        f = 0
        l = []

        def mf(self):
            return 1

        # gen.coroutine is not needed, and if you just call a method
        # decorated with gen.coroutine it's executed synchronously
        def m3(self, a):
            self.l.append(a)
            self.f = self.mf()

        def m2(self):
            self.l.append("A")
            ioLoop.add_callback(self.m3, "B")
            self.l.append("C")

        def m(self):
            self.m2()

    c = C()
    ioLoop.run_sync(c.m)
    assert c.f == 1
    assert c.l == ["A", "C", "B"]
I thought that @gen.coroutine was just syntactic sugar for the pattern in the test above.
From these tests it follows that either it's not working, or it's something different from an event loop with callbacks.

pass by reference between processes

I have an object:
from multiprocessing import Pool
import time

class ASYNC(object):
    def __init__(self, THREADS=[]):
        print('do')
        pool = Pool(processes=len(THREADS))
        self.THREAD_POOL = {}
        thread_index = 0
        for thread_ in THREADS:
            self.THREAD_POOL[thread_index] = {
                'thread': thread_['thread'],
                'args': thread_['args'],
                'callback': thread_['callback']
            }
            self.THREAD_POOL[thread_index]['running'] = True
            pool.apply_async(self.run, [thread_index], callback=thread_['callback'])
            thread_index += 1

    def run(self, thread_index):
        print('enter')
        while (self.THREAD_POOL[thread_index]['running']):
            print("loop")
            self.THREAD_POOL[thread_index]['thread'](self.THREAD_POOL[thread_index])  # HERE
            time.sleep(1)
        self.THREAD_POOL[thread_index]['running'] = False

    def wait_for_finish(self):
        for pool in self.THREAD_POOL:
            while (self.THREAD_POOL[pool]['running']):
                print("sleep" + str(self.THREAD_POOL[pool]['running']))
                time.sleep(1)

def x(pool):  # HERE
    print(str(pool))
    if (pool['args'][0] >= 15):
        pool['running'] = False
    pool['args'][0] += 1

def y(str):
    print("done")

A = ASYNC([{'thread': x, 'args': [10], 'callback': y}])
print("start")
A.wait_for_finish()
I am having issues passing self.THREAD_POOL[thread_index] by reference to def x(pool).
I need x(pool) to change the value of the variable in the object.
If I check the value in wait_for_finish, the object has not changed.
Passing an object by reference (tested and works properly):
x = {"1": "one", "2": "two"}
def test(a):
    a["1"] = "ONE"
test(x)
print(x["1"])  # outputs ONE as expected
This means that dictionaries in Python are passed by reference, so why is my code passing by value?
SOLUTION
Following @DevShark's suggestion, from the multiprocessing docs:
from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))
    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()
    print(num.value)
    print(arr[:])
According to the documentation you should not do this unless absolutely needed, so I decided not to use it: https://docs.python.org/2/library/multiprocessing.html#multiprocessing.JoinableQueue
Instead I will be doing:
from multiprocessing import Pool
import time

class ASYNC(object):
    def __init__(self, THREADS=[]):
        print('do')
        pool = Pool(processes=len(THREADS))
        self.THREAD_POOL = {}
        thread_index = 0
        for thread_ in THREADS:
            self.THREAD_POOL[thread_index] = {
                'thread': thread_['thread'],
                'args': thread_['args'],
                'callback': thread_['callback']
            }
            self.THREAD_POOL[thread_index]['running'] = True
            pool.apply_async(self.run, [thread_index], callback=thread_['callback'])
            thread_index += 1

    def run(self, thread_index):
        print('enter')
        while (self.THREAD_POOL[thread_index]['running']):
            print("loop")
            self.THREAD_POOL[thread_index]['thread'](thread_index)
            time.sleep(1)
        self.THREAD_POOL[thread_index]['running'] = False

    def wait_for_finish(self):
        for pool in self.THREAD_POOL:
            while (self.THREAD_POOL[pool]['running']):
                print("sleep" + str(self.THREAD_POOL[pool]['running']))
                time.sleep(1)

def x(index):
    global A
    pool = A.THREAD_POOL[index]
    print(str(pool))
    if (pool['args'][0] >= 15):
        pool['running'] = False
    pool['args'][0] += 1

def y(str):
    print("done")

A = ASYNC([{'thread': x, 'args': [10], 'callback': y}])
print("start")
A.wait_for_finish()
You are running your function in a different process; that's the way multiprocessing works. Therefore it does not matter what you do with the object: modifications made in one process will not be seen in the other processes.
To share data between processes, see the docs, as someone noted in a comment.
Data can be stored in a shared memory map using Value or Array; a small sketch of that approach follows.
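To illustrate that last point, here is a minimal sketch (my addition, not the answerer's code) of a shared running flag and counter kept in Value objects; it uses Process rather than Pool, since synchronized Value objects should be passed at process creation rather than as pool task arguments, and all names here are hypothetical:
from multiprocessing import Process, Value
import time

def worker(counter, running):
    while running.value:
        with counter.get_lock():       # Value carries its own lock
            counter.value += 1
            if counter.value >= 15:
                running.value = False  # visible to the parent process
        time.sleep(0.1)

if __name__ == '__main__':
    counter = Value('i', 10)
    running = Value('b', True)
    p = Process(target=worker, args=(counter, running))
    p.start()
    while running.value:               # parent polls the shared flag
        time.sleep(0.1)
    p.join()
    print(counter.value)               # 15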
