How to change "for" into a multithreaded pool in python

So I made this program that I want to loop forever until it is closed. At the moment I use this piece of code:
while True:
    a = start()
    for aaa in a:
        check(a[aaa], 0)
But that is pretty slow. How can I multithread this? This is my attempt (it's incorrect, of course):
pool = ThreadPool(threads)
results = pool.map(check, a, 0)
I tried that code with threads = 1, and it produced no output. Could anyone help me with this?
==== EDIT ====
Start function:
def start():
    global a
    url = "URL_WAS_HERE"  # receives a json like {"a":56564356, "b":654653453} etc. etc.
    r = requests.get(url)
    a = json.loads(r.text)
    return a
Check function:
def check(idd, tries):
    global checked
    global snipe
    global notworking
    if tries < 1:
        checked = checked+1
        url = "URL_WAS_HERE"+str(idd)  # Receives json with extra information about the id
        r = requests.get(url)
        try:
            b = json.loads(r.text)
            if b['rap'] > b['best_price']:
                difference = b['rap']-b['best_price']
                print(str(idd)+" has a "+str(difference)+ "R$ difference. Price: "+str(b['best_price'])+" //\\ Rap: "+str(b['rap']))
                snipe = snipe+1
        except:
            time.sleep(1)
            tries = tries+1
            notworking = notworking+1
            check(idd, tries)
    settitle("Snipes; "+str(snipe)+" //\\ Checked; "+str(checked)+" //\\ Errors; "+str(notworking))
I hope this helps a bit.

Perhaps start by using a documented object, ThreadPoolExecutor. ThreadPool is an undocumented language feature.
The docs offer minimal examples to get you started. For your example try the following construction:
from concurrent.futures import ThreadPoolExecutor, as_completed
# the question's loop passes a[aaa], i.e. the values of the dict returned by start()
values_to_test = start().values()
result_container = []
with ThreadPoolExecutor(max_workers=2) as executor:  # set `max_workers` as appropriate
    pool = {executor.submit(check, val, tries=0): val for val in values_to_test}
    for future in as_completed(pool):
        try:
            result_container.append(future.result())
        except Exception:
            pass  # handle exceptions here
If you are set on using the map method, you cannot pass 0 as an argument because it is not an iterable; see the method signature.
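As a rough sketch of the map route: Executor.map takes one iterable per positional parameter, so the constant can be supplied either with itertools.repeat or by freezing it with functools.partial. The check function and the ids below are hypothetical stand-ins for the ones in the question (no network access), used only to make the example runnable.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import repeat


def check(idd, tries):
    # Hypothetical stand-in for the question's check(): just echoes its arguments
    return (idd, tries)


ids = [56564356, 654653453]  # example ids, taken from the JSON shape in the question

with ThreadPoolExecutor(max_workers=2) as executor:
    # Option 1: pass an endless iterable of zeros; map stops at the shortest iterable
    via_repeat = list(executor.map(check, ids, repeat(0)))
    # Option 2: bind tries=0 up front so map only needs the ids
    via_partial = list(executor.map(partial(check, tries=0), ids))
```

Both variants return results in the order of the input ids, unlike as_completed, which yields futures as they finish.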

Related

Python How to check whether the variable state is changed which is shared and edited in another scheduled thread without using while loop to check

My API receives users' texts, and within 900 ms they are sent to a model that calculates their length (just a simple demo). I already have it working, but the implementation is ugly. I start a background thread that runs on a schedule. The API receives each query in the main thread and puts it in a queue shared by the main thread and the background thread. On each tick, the background thread takes all texts from the queue and sends them to the model. After the model has processed them, the results are stored in a shared dict. In the main thread, the get_response method uses a while loop to check for the result in the shared dict. My question is: how can I get rid of the while loop in get_response? I would like a more elegant method. Thanks!
This is the server code; I need to remove the while/sleep loop in get_response because it's ugly:
import asyncio
import uuid
from typing import Union, List
import threading
from queue import Queue
from fastapi import FastAPI, Request, Body, APIRouter
from fastapi_utils.tasks import repeat_every
import uvicorn
import time
import logging
import datetime
logger = logging.getLogger(__name__)
app = APIRouter()
def feed_data_into_model(queue,shared_dict,lock):
    if queue.qsize() != 0:
        data = []
        ids = []
        while queue.qsize() != 0:
            task = queue.get()
            task_id = task[0]
            ids.append(task_id)
            text = task[1]
            data.append(text)
        result = model_work(data)
        # print("model result:",result)
        for index,task_id in enumerate(ids):
            value = result[index]
            handle_dict(task_id,value,action = "put",lock=lock, shared_dict = shared_dict)
class TestThreading(object):
    def __init__(self, interval, queue,shared_dict,lock):
        self.interval = interval
        thread = threading.Thread(target=self.run, args=(queue,shared_dict,lock))
        thread.daemon = True
        thread.start()
    def run(self,queue,shared_dict,lock):
        while True:
            # More statements comes here
            # print(datetime.datetime.now().__str__() + ' : Start task in the background')
            feed_data_into_model(queue,shared_dict,lock)
            time.sleep(self.interval)
if __name__ != "__main__":
    # since uvicorn will init and reload the file, and __name__ will change, not as __main__, so I init variable here
    # otherwise, we will have 2 background thread (one is empty) , it doesn't run but hard to debug due to the confusion
    global queue, shared_dict, lock
    queue = Queue(maxsize=64) #
    shared_dict = {} # model result saved here!
    lock = threading.Lock()
    tr = TestThreading(0.9, queue,shared_dict,lock)
def handle_dict(key, value = None, action = "put", lock = None, shared_dict = None):
    lock.acquire()
    try:
        if action == "put":
            shared_dict[key] = value
        elif action == "delete":
            del shared_dict[key]
        elif action == "get":
            value = shared_dict[key]
        elif action == "exist":
            value = key in shared_dict
        else:
            pass
    finally:
        # Always called, even if exception is raised in try block
        lock.release()
    return value
def model_work(x:Union[str,List[str]]):
    time.sleep(3)
    if isinstance(x,str):
        result = [len(x)]
    else:
        result = [len(_) for _ in x]
    return result
async def get_response(task_id, lock, shared_dict):
    not_exist_flag = True
    while not_exist_flag:
        not_exist_flag = handle_dict(task_id, None, action= "exist",lock=lock, shared_dict = shared_dict) is False
        await asyncio.sleep(0.02)
    value = handle_dict(task_id, None, action= "get", lock=lock, shared_dict = shared_dict)
    handle_dict(task_id, None, action= "delete",lock=lock, shared_dict = shared_dict)
    return value
@app.get("/{text}")
async def demo(text:str):
    global queue, shared_dict, lock
    task_id = str(uuid.uuid4())
    logger.info(task_id)
    state = "pending"
    item= [task_id,text,state,""]
    queue.put(item)
    # TODO: await query_from_answer_dict , need to change since it's ugly to while wait the answer
    value = await get_response(task_id, lock, shared_dict)
    return 1
if __name__ == "__main__":
    # what I want to do:
    # single process run every 900ms, if queue is not empty then pop them out to model
    # and model will save result in thread-safe dict, key is task-id
    uvicorn.run("api:app", host="0.0.0.0", port=5555)
client code:
for n in {1..5}; do curl http://localhost:5555/a & done
The usual way to run a blocking task in asyncio code is to use asyncio's builtin run_in_executor to handle it for you. You can either set up an executor yourself, or let asyncio do it for you:
import asyncio
from time import sleep
def proc(t):
    print("in thread")
    sleep(t)
    return f"Slept for {t} seconds"
async def submit_task(t):
    print("submitting:", t)
    res = await loop.run_in_executor(None, proc, t)
    print("got:", res)
async def other_task():
    for _ in range(4):
        print("poll!")
        await asyncio.sleep(1)
loop = asyncio.new_event_loop()
loop.create_task(other_task())
loop.run_until_complete(submit_task(3))
Note that if loop is not defined globally, you can get it inside the function with asyncio.get_event_loop(). I've deliberately used a simple example without fastapi/uvicorn to illustrate the point, but the idea is the same: fastapi (etc) just run in the event loop, which is why you define coroutines for the endpoints.
The advantage of this is that we can simply await the response directly, without messing about with awaiting an event and then using some other means (shared dict with mutex, pipe, queue, whatever) to get the result out, which keeps the code clean and readable, and is likely also a good deal quicker. If, for some reason, we want to make sure it runs in processes and not threads we can make our own executor:
from concurrent.futures import ProcessPoolExecutor
e = ProcessPoolExecutor()
...
res = await loop.run_in_executor(e, proc, t)
See the docs for more information.
Another option would be using a multiprocessing.Pool to run the task, and then apply_async. But you can't await multiprocessing results directly. There is a library, aiomultiprocess, to make the two play together, but I have no experience with it and cannot see a reason to prefer it over the builtin executor for this case (running a single background task per invocation of the coro).
Lastly do note that the main reason to avoid a polling while loop is not that it's ugly (although it is), but that it's not nearly as performant as almost any other solution.
I think I already found the answer: use asyncio.Event to communicate across threads, with set, clear, wait and asyncio.get_event_loop().
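A minimal sketch of that idea, under stated assumptions: asyncio.Event is not thread-safe, so the worker thread should not call set() directly but hand it to the loop with loop.call_soon_threadsafe. The names wait_for_result, results and worker are hypothetical, and the worker only imitates the model by computing a string length.

```python
import asyncio
import threading
import time


async def wait_for_result(results, key, worker_delay=0.05):
    loop = asyncio.get_running_loop()
    done = asyncio.Event()

    def worker():
        # Runs in a plain thread: pretend to be the model, then store the result
        time.sleep(worker_delay)
        results[key] = len(key)
        # Schedule done.set() on the event loop instead of calling it from this thread
        loop.call_soon_threadsafe(done.set)

    threading.Thread(target=worker, daemon=True).start()
    await done.wait()  # suspends this coroutine without a polling loop
    return results.pop(key)


value = asyncio.run(wait_for_result({}, "hello"))
```

In the server from the question, the same pattern would mean one event per task_id, set by the background thread once the batch result has been stored.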

Python: Terminate Loop Using Timer

I'm quite new to Python and working on a school project with this logic: users have to answer a series of questions as fast as they can, within the given time.
For instance, if the time allotted is 30 seconds, I would loop through a dictionary of questions and get the answers. On timeout, the loop should stop, even if the script is still waiting for input.
def start_test():
    for item in questions:
        print(item)
        answers.append(input(' : '))
I've tried using multiprocessing and multithreading, but I found out that stdin doesn't work in subprocesses.
I'm looking for something like:
while duration > 0:
    start_test()
def countdown():
    global duration
    while duration > 0:
        duration -= 1
        time.sleep(1)
    # something like start_test().stop()
But I can't figure out how to run the countdown function in parallel with the start_test function.
Any ideas?
As far as I know, input is only accessible from the main thread. I might be wrong.
If that is the case, however, you need non-blocking input.
Check this blog; the answer below is based on it.
Note: This is a really quick and dirty solution.
I have checked this on Linux. If it doesn't work on Windows, try this link for further reference.
import _thread
import sys
import select
import time
def start_test():
    questions = ['1','2','3']
    answers = []
    for item in questions:
        print(item)
        # Input in a non-blocking way
        loop_flag = True
        while loop_flag:
            # Read documentation and examples on select
            ready = select.select([sys.stdin], [], [], 0)[0]
            if not ready:
                # Check if timer has expired
                if timeout:
                    return answers
            else:
                for file in ready:
                    line = file.readline()
                    if not line: # EOF, input is closed
                        loop_flag = False
                        break
                    elif line.rstrip():
                        # We have some input
                        answers.append(line)
                        # So as to get out of while
                        loop_flag = False
                        # Breaking out of for
                        break
    return answers
def countdown():
    global timeout
    time.sleep(30)
    timeout = True
# Global Timeout Flag
timeout = False
timer = _thread.start_new_thread(countdown, ())
answers = start_test()
print(answers)

Python - multiprocessing max # of processes

I would like to create and run at most N processes at once.
As soon as a process is finished, a new one should take its place.
The following code works (assuming Dostuff is the function to execute).
The problem is that I am using a loop and need time.sleep to allow the processes to do their work. This is rather inefficient.
What's the best method for this task?
import time,multiprocessing
if __name__ == "__main__":
    Jobs = []
    for i in range(10):
        while len(Jobs) >= 4:
            NotDead = []
            for Job in Jobs:
                if Job.is_alive():
                    NotDead.append(Job)
            Jobs = NotDead
            time.sleep(0.05)
        NewJob = multiprocessing.Process(target=Dostuff)
        Jobs.append(NewJob)
        NewJob.start()
After a bit of tinkering, I thought about creating new threads and then
launching my processes from these threads like so:
import threading,multiprocessing,time
def processf(num):
    print("in process:",num)
    # time.clock() was removed in Python 3.8; perf_counter() is used here instead
    now=time.perf_counter()
    while time.perf_counter()-now < 2:
        pass ##..Intensive processing..
def main():
    z = [0]
    lock = threading.Lock()
    def threadf():
        while z[0] < 20:
            lock.acquire()
            work = multiprocessing.Process(target=processf,args=(z[0],))
            z[0] = z[0] +1
            lock.release()
            work.start()
            work.join()
    activet =[]
    for i in range(2):
        newt = threading.Thread(target=threadf)
        activet.append(newt)
        newt.start()
    for i in activet:
        i.join()
if __name__ == "__main__":
    main()
This solution is better (it doesn't slow down the launched processes); however, I wouldn't really trust code that I wrote in a field I don't know.
I've had to use a list (z = [0]) since an integer is immutable.
Is there a way to embed processf into main()? I'd prefer not to need an additional global variable. If I simply copy/paste the function inside, I get a nasty error (AttributeError: Can't pickle local object 'main.<locals>.processf').
Why not use concurrent.futures.ThreadPoolExecutor?
from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor(max_workers=20)
res = executor.submit(any_def)  # any_def is a placeholder for the function to run
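To illustrate how an executor caps concurrency without a sleep loop, here is a small sketch. It uses threads so it stays self-contained; do_stuff is a hypothetical stand-in for Dostuff, and the stats dict exists only to record how many jobs overlap. For CPU-bound work, concurrent.futures.ProcessPoolExecutor offers the same submit/map interface and the same max_workers parameter.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4
stats = {"active": 0, "peak": 0}  # bookkeeping for the demonstration only
stats_lock = threading.Lock()


def do_stuff(n):
    # Hypothetical job: note how many jobs run at once, then do some "work"
    with stats_lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.05)
    with stats_lock:
        stats["active"] -= 1
    return n * n


# Ten jobs are queued, but at most MAX_WORKERS run at any moment;
# a waiting job starts as soon as a worker becomes free.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
    results = list(executor.map(do_stuff, range(10)))
```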

read a variable while multithreading

I get two streams of data from an API, so there are 3 threads: the main one, stream1 and stream2. Stream1 and Stream2 need to process this data, and once they're done they store the results in main_value1 and main_value2.
From the main thread I need to read the last value at any given time (so if I need this value while it is still being processed, I get the last processed/stored one). What would be the optimal way? In the code example here I need help coding the functions get_main_value1() and, of course, get_main_value2().
def stream1():
    while True:
        main_value1 = process()
def stream2():
    while True:
        main_value2 = process2()
def get_main_value1(): ?
def get_main_value2(): ?
def main():
    threading.Thread(target=stream1).start()
    threading.Thread(target=stream2).start()
    while True:
        time.sleep(random.randint(0,10))
        A = get_main_value1()
        B = get_main_value2()
One way would be to make them global:
import random
import threading
import time
STREAM1_LAST_VALUE = None
def stream1():
    global STREAM1_LAST_VALUE
    while True:
        main_value1 = process()
        STREAM1_LAST_VALUE = main_value1
STREAM2_LAST_VALUE = None
def stream2():
    global STREAM2_LAST_VALUE
    while True:
        main_value2 = process2()
        STREAM2_LAST_VALUE = main_value2
def get_main_value1():
    return STREAM1_LAST_VALUE
def get_main_value2():
    return STREAM2_LAST_VALUE
def main():
    threading.Thread(target=stream1).start()
    threading.Thread(target=stream2).start()
    while True:
        time.sleep(random.randint(0,10))
        A = get_main_value1()
        B = get_main_value2()
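If bare globals feel too loose, one alternative is a small lock-protected holder that each stream writes to and the main thread reads from. This is only a sketch: LatestValue and stream are hypothetical names, and the worker processes a short finite range instead of an endless API stream so the example terminates.

```python
import threading


class LatestValue:
    """Keeps the most recently stored value; can be read from any thread."""

    def __init__(self, initial=None):
        self._lock = threading.Lock()
        self._value = initial

    def set(self, value):
        with self._lock:
            self._value = value

    def get(self):
        with self._lock:
            return self._value


def stream(holder, source):
    # Hypothetical stand-in for stream1/stream2: process each item, publish the result
    for item in source:
        holder.set(item * 2)


main_value1 = LatestValue()
worker = threading.Thread(target=stream, args=(main_value1, range(5)))
worker.start()
worker.join()
last = main_value1.get()  # the last value the worker published
```

While the worker is still running, get() simply returns whatever was stored most recently, which matches the "last processed value" requirement.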

Threading and target function in external file (python)

I want to move some functions to an external file to make the code clearer.
Let's say I have this example code (which does indeed work):
import threading
from time import sleep
testVal = 0
def testFunc():
    while True:
        global testVal
        sleep(1)
        testVal = testVal + 1
        print(testVal)
t = threading.Thread(target=testFunc, args=())
t.daemon = True
t.start()
try:
    while True:
        sleep(2)
        print('testval = ' + str(testVal))
except KeyboardInterrupt:
    pass
Now I want to move testFunc() to a new Python file. My guess was the following, but the global variables don't seem to be the same.
testserver.py:
import threading
import testclient
from time import sleep
testVal = 0
t = threading.Thread(target=testclient.testFunc, args=())
t.daemon = True
t.start()
try:
    while True:
        sleep(2)
        print('testval = ' + str(testVal))
except KeyboardInterrupt:
    pass
and testclient.py:
from time import sleep
from testserver import testVal as val
def testFunc():
    while True:
        global val
        sleep(1)
        val = val + 1
        print(val)
my output is:
1
testval = 0
2
3
testval = 0 (testval didn't change)
...
while it should:
1
testval = 1
2
3
testval = 3
...
any suggestions? Thanks!
Your immediate problem is not due to multithreading (we'll get to that) but due to how you use global variables. The thing is, when you use this:
from testserver import testVal as val
You're essentially doing this:
import testserver
val = testserver.testVal
i.e. you're creating a module-level name val in testclient.py that refers to the same object as testserver.testVal. This is all fine and dandy when you read it (the first time at least), but when you try to assign its value in your function with:
val = val + 1
You're actually rebinding the val name that belongs to testclient.py, not setting the value of testserver.testVal. To change it, you have to assign through the module attribute itself (i.e. testserver.testVal += 1).
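The difference can be reproduced in a single file. In this sketch the testserver module is built in memory with types.ModuleType purely for illustration (an assumption made so the example is self-contained); the names mirror the ones in the question.

```python
import types

# In-memory stand-in for testserver.py
testserver = types.ModuleType("testserver")
testserver.testVal = 0

# Equivalent of: from testserver import testVal as val
val = testserver.testVal

val = val + 1                    # rebinds only the local name val
after_rebind = testserver.testVal

testserver.testVal += 1          # assigns through the module attribute
after_attr_update = testserver.testVal
```

After the first assignment the module still holds 0; only the second one changes what other importers of testserver will see.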
That being said, the next problem you might encounter stems directly from multithreading: a race condition. The interpreter can switch threads right after one thread has read the value but before it has written the result back. A second thread then reads the same value and writes its increment; when the first thread resumes, it writes the same number, so two calls produce a single increase. You need some sort of mutex to make sure that every non-atomic operation on the shared data runs in only one thread at a time. The easiest way to do it is with a Lock from the threading module:
testserver.py:
# ...
testVal = 0
testValLock = threading.Lock()
# ...
testclient.py:
# ...
with testserver.testValLock:
    testserver.testVal += 1
# ...
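For a self-contained view of the same locking pattern, the sketch below has several threads increment one shared counter. The counter dict, increment function and thread count are hypothetical choices for the demonstration; the lock plays the role of testValLock.

```python
import threading

counter = {"value": 0}
counter_lock = threading.Lock()


def increment(times):
    for _ in range(times):
        # The read-modify-write happens while holding the lock,
        # so no other thread can interleave between the read and the write
        with counter_lock:
            counter["value"] += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = counter["value"]
```

With the lock in place every increment is counted, so total equals the number of threads times the increments per thread.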
A third and final problem you might encounter is a circular dependency (testserver.py requires testclient.py, which requires testserver.py), and I'd advise you to rethink the way you want to approach this problem. If all you want is a common global store, create it separately from the modules that depend on it. That way you ensure proper loading and initialization order without the danger of unresolvable circular dependencies.
