Choose Items from a List in multithreaded python - python

I am a beginner in Python and can't figure out how to do this:
I am running a Python script that appends a new value to a list every 5-10 seconds. I want another, multithreaded Python script to take these elements from the list, but one value per thread, so that no value is reused; if there is no next value, it should wait until one is present. I have some code where I tried to do it, but with no success:
Script that creates values:
import random
import time

values = ['a', 'b', 'c', 'd', 'e', 'f']
cap = []
while True:
    cap.append(random.choice(values))
    print(cap)
    time.sleep(5)
Script that needs these values:
def adding(self):
    p = cap.pop()
    print(p)
However, in a multithreaded environment each thread gives me the same value, even though I want the value for each thread to be different (e.g. a value should be removed once a thread has used it). What are my options here?

If I understood correctly, you want to use one thread (a producer) to fill a list with values, and a few different threads (consumers) to remove values from that same list, so that the consumers end up with mutually exclusive subsets of the values added by the producer.
A possible outcome might be:
Producer
cap.append('a')
cap.append('c')
cap.append('b')
cap.append('f')
Consumer 1
cap.pop() # a
cap.pop() # f
Consumer 2
cap.pop() # c
cap.pop() # b
If this is the behavior you want, I recommend using a thread-safe object like Queue.Queue (Python 2) or queue.Queue (Python 3).
Here is one possible implementation
Producer
import Queue   # "queue" in Python 3
import random
import time

values = ['a', 'b', 'c', 'd', 'e', 'f']
q = Queue.Queue()
while True:
    q.put(random.choice(values))
    print q
    time.sleep(5)
Consumer
val = q.get() # this call will block (aka wait) for something to be available
print(val)
It's also very important that both the producer and the consumer have access to the same instance of q.
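For illustration, here is a minimal sketch of that wiring (the thread setup and consumer names are mine, not part of the original answer), with one queue instance shared by a producer thread and two consumer threads:
import queue            # "Queue" in Python 2
import random
import threading
import time

q = queue.Queue()        # the single shared instance

def producer():
    values = ['a', 'b', 'c', 'd', 'e', 'f']
    while True:
        q.put(random.choice(values))
        time.sleep(5)

def consumer(name):
    while True:
        val = q.get()              # blocks until the producer has put something
        print(name, 'got', val)    # each value is delivered to exactly one consumer

threading.Thread(target=producer, daemon=True).start()
for i in range(2):
    threading.Thread(target=consumer, args=('consumer-%d' % i,), daemon=True).start()

time.sleep(30)           # let the demo run for a while
Because both consumers call get() on the same queue object, a value taken by one consumer can never be seen by the other.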

Related

Too many Threads delay Keyboard Inputs in general. (Python) Better Solution?

I'm running a python script that is creating 5 threads on the windows OS.
All of the threads are doing light work (api calls, simple computations)
However, if I type on my keyboard while the script is running, the letters appear with a very noticeable delay.
My assumption: the OS keyboard event listener process and my Python script process share the same core, and my threads are keeping that core busy, so there is less time for other threads (like the key event listener).
I removed a single thread, and with 4 of the 5 threads everything works perfectly fine. However, I need all 5 threads, so I also tried changing the CPU affinity of my Python script to use only one specific processor core:
import psutil

p = psutil.Process()
p.cpu_affinity([1])
Now everything works fine, but I'm not sure whether there is a better solution...
Edit:
Unfortunately I can't share the exact script, but I have tried to simplify the code while keeping its meaning:
There are two different message types a user can input at any time, A and B. They are preprocessed, and the final result gets appended to the corresponding message list. Another thread generates a response (in the meantime, new messages could arrive at the reader thread, which is why I chose multithreading). After a response is generated, it is appended to list_C, where another thread (response_execution) handles the execution of the response. The last thread (on_activate_h) is a precaution, because I want to be able to cancel the robot at any time with a certain key combination.
from threading import Thread

from pynput import keyboard   # GlobalHotKeys comes from pynput

list_A = []
list_B = []
list_C = []

def reader():
    global list_A
    global list_B
    # checks constantly for input messages and evaluates them
    # preprocesses input messages (reformatting, etc.)
    # if a message is okay, append it to list_A or list_B (depending on the message type)
    pass

def response_generator():
    global list_A
    global list_B
    global list_C
    while True:
        if list_A:
            # if there is a new message in list_A, generate a corresponding response
            # an api call is involved, which sometimes takes some time
            response = generate_response_A()
        if list_B:
            # if there is a new message in list_B, generate a corresponding response
            # an api call is involved, which sometimes takes some time
            response = generate_response_B()
        # append the final response to list_C
        list_C.append(response)

def response_execution():
    global list_C
    while True:
        if list_C:
            # api calls involved
            # execute the generated response
            pass

def on_activate_h():
    # cancel execution immediately
    robot.stop_execution()

input_thread = Thread(target=reader)
response_generator_thread = Thread(target=response_generator)
response_execution_thread = Thread(target=response_execution)
stop_robot_thread = keyboard.GlobalHotKeys({
    '<ctrl>+<alt>': on_activate_h})

input_thread.start()
response_generator_thread.start()
response_execution_thread.start()
stop_robot_thread.start()

input_thread.join()
response_generator_thread.join()
response_execution_thread.join()
stop_robot_thread.join()
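No answer is shown for this question, but since the rest of this page revolves around queue-based producer-consumer designs, here is a minimal sketch (not from the original post) of how the busy while True polling loops above could be rewritten around blocking queue.Queue calls; busy polling keeps a core saturated even when no messages arrive, which is one plausible contributor to the input lag. The generate_response_A/B helpers and the execution step below are stand-ins for the poster's real API calls:
import queue
from threading import Thread

messages = queue.Queue()    # the reader would put ('A' or 'B', payload) tuples here
responses = queue.Queue()   # generated responses awaiting execution

def generate_response_A(payload):       # stand-in for the real (API-calling) helper
    return 'response to A: %s' % payload

def generate_response_B(payload):       # stand-in for the real (API-calling) helper
    return 'response to B: %s' % payload

def response_generator():
    while True:
        kind, payload = messages.get()  # blocks instead of spinning on empty lists
        if kind == 'A':
            responses.put(generate_response_A(payload))
        else:
            responses.put(generate_response_B(payload))

def response_execution():
    while True:
        response = responses.get()      # blocks, no busy loop
        print('executing', response)    # stand-in for the real execution step

Thread(target=response_generator, daemon=True).start()
Thread(target=response_execution, daemon=True).start()

# the reader thread would call, e.g., messages.put(('A', some_message))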

Passing updated args to multiple threads periodically in python

I have three base stations that have to work in parallel, and every 10 seconds they receive a list that contains information about their cluster. I want to run this code for about 10 minutes. So every 10 seconds my three threads have to call the target method with new arguments, and this process should last for 10 minutes. I don't know how to do this, but I came up with the idea below, which doesn't seem to be a good one, so I'd appreciate any help.
I have a list named base_centroid_assign, and I want to pass each item of it to a distinct thread. The list content will be updated frequently (say every 10 seconds), so I want to reuse my previous threads and give the updated items to them.
In the code below, the list contains three items, each of which holds multiple items (it's nested). I want the three threads to stop after executing the quite simple target function and then be called again with the updated items; however, when I run the code below, I end up with 30 threads! (the run_time variable is 10 and the list's length is 3).
How can I implement idea as mentioned above?
import threading
import time

run_time = 10

def cluster_status_broadcasting(info_base_cent_avr):
    print(threading.current_thread().name)
    info_base_cent_avr.sort(key=lambda item: item[2], reverse=True)

# base_centroid_assign is the nested list of three items described above
start = time.time()
while run_time > 0:
    for item in base_centroid_assign:
        t = threading.Thread(target=cluster_status_broadcasting, args=(item,))
        t.daemon = True
        t.start()
    print('Entire job took:', time.time() - start)
    run_time -= 1
Welcome to Stack Overflow.
Problems with thread synchronisation can be so tricky to handle that Python already has some very useful libraries specifically for such tasks. The primary one here is queue.Queue in Python 3. The idea is to have a queue for each "worker" thread: the main thread collects new data and puts it onto each worker's queue, and the worker threads get the data from their own queue.
When you call a Queue's get method, its normal action is to block the thread until something is available, but presumably you want the threads to keep working on their current inputs until new ones arrive, in which case it makes more sense to poll the queue and continue with the current data if there is nothing new from the main thread.
I outline such an approach in my answer to this question, though in that case the worker threads are actually sending return values back on another queue.
The structure of your worker threads' run method would then need to be something like the following pseudo-code:
def run(self):
    request_data = self.inq.get()   # wait for the first item
    while True:
        process_with(request_data)
        try:
            request_data = self.inq.get(block=False)
        except queue.Empty:
            continue
You might like to add logic to terminate the thread cleanly when a sentinel value such as None is received.
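Putting those pieces together, here is a minimal runnable sketch with three worker threads that each poll their own queue while the main thread pushes updated cluster data every 10 seconds. The Worker class, the placeholder data, the None sentinel and the shortened 6-round loop are illustrative assumptions, not code from the question:
import queue
import threading
import time

class Worker(threading.Thread):
    def __init__(self, name):
        super().__init__(name=name, daemon=True)
        self.inq = queue.Queue()

    def run(self):
        request_data = self.inq.get()            # wait for the first item
        while True:
            if request_data is None:             # sentinel: shut down cleanly
                return
            print(self.name, 'broadcasting', request_data)
            time.sleep(1)                        # stand-in for the real work
            try:
                request_data = self.inq.get(block=False)
            except queue.Empty:
                continue                         # keep using the current data

workers = [Worker('base-%d' % i) for i in range(3)]
for w in workers:
    w.start()

# main thread: push fresh cluster info to each base station every 10 seconds
for _ in range(6):                               # 6 rounds here; 60 would give ten minutes
    updated_items = [['cluster', i, i * 0.5] for i in range(3)]   # placeholder data
    for w, item in zip(workers, updated_items):
        w.inq.put(item)
    time.sleep(10)

for w in workers:
    w.inq.put(None)                              # ask each worker to stop
for w in workers:
    w.join()
The three threads are created once and simply receive new arguments through their queues, which avoids spawning 30 short-lived threads.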

How to use multiple, but a limited number of threads in Python to process a list

I have a dataframe, several thousand rows in length, that contains two pairs of GPS coordinates in one of its columns, and I am trying to calculate the drive time between those coordinates. I have a function that takes in those coordinates and returns the drive time, and it takes maybe 3-8 seconds to calculate each entry, so the total process can take quite a while.
What I'd like to be able to do is: using maybe 3-5 threads, iterate through the list, calculate the drive time, and move on to the next entry while the other threads are completing, without creating more than 5 threads in the process.
Independently, I have everything working - I can run multiple threads, I can track the thread count and wait until the number of active threads drops below the limit before starting the next one, and I can iterate the dataframe and calculate the drive time. However, I'm having trouble piecing it all together. Here's an edited, slimmed down version of what I have.
import pandas
import threading
import arcgis

class MassFunction:
    # This is intended to keep track of the active threads
    threadCount = 0   # class attribute (referencing MassFunction here would fail at class-definition time)

    def startThread(functionName, params=None):
        # This kicks off a new thread and counts up to keep track of the threads
        MassFunction.threadCount += 1
        if params is None:
            t = threading.Thread(target=functionName)
        else:
            t = threading.Thread(target=functionName, args=[params])
        t.daemon = True
        t.start()

class GeoAnalysis:
    # This class handles the connection to the ArcGIS services
    def __init__(self):
        super(GeoAnalysis, self).__init__()
        self.my_gis = arcgis.gis.GIS("https://www.arcgis.com", username, pw)

    def drivetimeCalc(self, coordsString):
        # The coords come in as a string, formatted as 'lat_1,long_1,lat_2,long_2'
        # This is the bottleneck of the process, as the calculation/response
        # below takes a few seconds to get a response
        points = coordsString.split(", ")
        route_service_url = self.my_gis.properties.helperServices.route.url
        self.route_layer = arcgis.network.RouteLayer(route_service_url, gis=self.my_gis)
        point_a_to_point_b = "{0}, {1}; {2}, {3}".format(points[1], points[0], points[3], points[2])
        result = self.route_layer.solve(stops=point_a_to_point_b,
                                        return_directions=False, return_routes=True,
                                        output_lines='esriNAOutputLineNone',
                                        return_barriers=False,
                                        return_polygon_barriers=False,
                                        return_polyline_barriers=False)
        travel_time = result['routes']['features'][0]['attributes']['Total_TravelTime']
        # This is intended to 'remove' one of the active threads
        MassFunction.threadCount -= 1
        return travel_time

class MainFunction:
    # This gives access to the GeoAnalysis class from this class
    GA = GeoAnalysis()

    def closureDriveTimeCalc(self, coordsList):
        # This is intended to loop while a fifth thread is running and
        # prevent additional threads from starting
        while MassFunction.threadCount > 4:
            pass
        MassFunction.startThread(MainFunction.GA.drivetimeCalc, coordsList)

    def driveTimeAnalysis(self, location):
        # This reads a csv file containing a few thousand entries.
        # Each entry/row contains gps coordinates, which need to be
        # iterated over to calculate the drivetimes
        locationMemberFile = pandas.read_csv(someFileName)
        # The built-in apply() method in pandas seems to be the
        # fastest way to iterate through the rows
        locationMemberFile['DRIVETIME'] = locationMemberFile['COORDS_COL'].apply(self.closureDriveTimeCalc)
When I run this right now, using VS Code, I can see the thread count go up into the thousands in the call stack, so I feel like it is not waiting for a thread to finish before adding to or subtracting from the threadCount value. Any ideas/suggestions/tips would be much appreciated.
EDIT: Essentially my problem is how to get the travel_time value back so that it can be placed into the dataframe. I currently have no return statement for the closureDriveTimeCalc function, so while the function runs correctly, it doesn't send any information back to the apply() method.
Rather than do this in an apply, I'd use multiprocessing's Pool.map:
from multiprocessing import Pool

with Pool(processes=4) as pool:
    locationMemberFile['DRIVETIME'] = pool.map(self.closureDriveTimeCalc, locationMemberFile['COORDS_COL'])
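On the EDIT above (getting travel_time back into the dataframe): whatever callable is handed to map has to return the value. Below is a hedged sketch that swaps in a thread-based pool (multiprocessing.dummy exposes the same Pool API), which caps the work at five concurrent workers without the manual threadCount bookkeeping and avoids pickling the GIS connection; GeoAnalysis, drivetimeCalc, someFileName, COORDS_COL and DRIVETIME are the question's own names:
from multiprocessing.dummy import Pool   # thread-based Pool with the same API as multiprocessing.Pool
import pandas

ga = GeoAnalysis()                        # the question's ArcGIS wrapper

locationMemberFile = pandas.read_csv(someFileName)
with Pool(processes=5) as pool:           # at most 5 worker threads at a time
    # drivetimeCalc already returns travel_time, so map collects one value per row
    locationMemberFile['DRIVETIME'] = pool.map(ga.drivetimeCalc,
                                               locationMemberFile['COORDS_COL'])
Because the slow step is a network call, threads are enough here and the busy-wait loop and threadCount counter can be dropped entirely.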

How do I continue iterating without having to wait for the output to finish?

import sqlite3

conn = sqlite3.connect('output.db')
count = 0
items = []
for item in InfStream:  # assume I have an infinite stream
    items.append((item,))
    count += 1
    if count == 10000:
        conn.executemany("INSERT INTO table VALUES (?)", items)
        conn.commit()
        items = []
In this Python code, I have a stream of unknown length called InfStream coming from an API, and I would like to insert each item from the stream into a table in a SQLite database. I first build a list of 10,000 items and then insert them into the db using executemany, which takes around 1 hour overall. However, the code has a problem: while executemany is running, I have to wait around 15 seconds for it to finish. This is not acceptable in my case, because I need to keep getting items from the stream, or otherwise it will be disconnected if I delay too long.
I would like the loop to continue while executemany is running at the same time. Is it possible to do so?
NB: the input is far slower than the write. 10,000 items from the input take around 1 hour, while the output takes only 15 seconds.
This is a classic producer-consumer problem that is best handled with a Queue.
The producer in this case is your InfStream, and the consumer is everything inside your for block.
It is straightforward to convert your sequential code into a multithreaded producer-consumer model, using a Queue to dispatch data between the threads.
Consider your Code
import sqlite3

conn = sqlite3.connect('output.db')
count = 0
items = []
for item in InfStream:  # assume I have an infinite stream
    items.append((item,))
    count += 1
    if count == 10000:
        conn.executemany("INSERT INTO table VALUES (?)", items)
        conn.commit()
        items = []
Create a Consumer function, to consume the data
def consumer(q):
    def helper():
        while True:
            items = [(q.get(),) for _ in range(10000)]
            conn.executemany("INSERT INTO table VALUES (?)", items)
            conn.commit()
    return helper
And a producer function to keep producing ad infinitum:
from Queue import Queue   # "queue" in Python 3
from threading import Thread

def producer():
    q = Queue()
    t = Thread(target=consumer(q))
    t.daemon = True
    t.start()
    for item in InfStream:
        q.put(item)
Additional notes in response to the comments:
Theoretically, the queue can scale to an infinite size, limited only by system resources.
If the consumer cannot keep pace with the producer:
Spawn multiple consumers.
Cache the data on a faster IO device and flush it to the database later.
Make the count configurable and dynamic.
It sounds like executemany is blocked on IO, so threading might actually help here, and I would try that first. In particular, create a separate thread which simply calls executemany on data that the first thread puts onto a shared queue. Then the first thread can keep reading while the second thread does the executemany. As the other answer pointed out, this is a producer-consumer problem.
If that does not solve the problem, switch to multiprocessing.
Note that if your input flows in more quickly than you can write in the second thread or process, then neither solution will work, because you will fill up memory faster than you can empty it. In that case you will have to throttle the input reading rate regardless.
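As a concrete illustration of that arrangement, here is a minimal sketch of a dedicated writer thread draining a shared queue in batches; the table name and batch size mirror the question, and opening the sqlite3 connection inside the writer thread (an assumption) keeps sqlite3's default same-thread check satisfied:
import queue
import sqlite3
import threading

BATCH_SIZE = 10000
q = queue.Queue()

def writer():
    # the connection is created inside the thread that uses it
    conn = sqlite3.connect('output.db')
    while True:
        items = [(q.get(),) for _ in range(BATCH_SIZE)]  # blocks until a full batch is queued
        conn.executemany("INSERT INTO table VALUES (?)", items)
        conn.commit()

threading.Thread(target=writer, daemon=True).start()

for item in InfStream:   # the reading loop never blocks on the database
    q.put(item)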

Picking up items progressively as soon as a queue is available

I am looking for a solid implementation that allows me to progressively work through a list of items using a Queue.
The idea is that I want a set number of workers that will go through a list of 20+ database-intensive tasks and return the results. I want Python to start with the first five items and, as soon as it is done with one task, start on the next task in the queue.
This is how I am currently doing it without Threading.
for key, v in self.sources.iteritems():
    # Do Stuff
I would like to have a similar approach, but possibly without having to split the list up into subgroups of five, so that it automatically picks up the next item in the list. The goal is to make sure that if one database is slowing down the process, it will not have a negative impact on the whole application.
You can implement this yourself, but Python 3 already comes with an Executor-based solution for thread management, which you can use in Python 2.x by installing the backported version.
Your code could then look like
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    future_to_key = {}
    for key, value in sources.items():
        future_to_key[executor.submit(do_stuff, value)] = key
    for future in concurrent.futures.as_completed(future_to_key):
        key = future_to_key[future]
        result = future.result()
        # process result
If you are using Python 3, I recommend the concurrent.futures module. If you are not using Python 3 and are not attached to threads (versus processes), then you might try multiprocessing.Pool (though it comes with some caveats, and I have had trouble with pools not closing properly in my applications). If you must use threads in Python 2, you may end up writing the code yourself: spawn 5 threads running consumer functions and push the calls (function + args) onto a queue for the consumers to find and process, as sketched below.
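A rough sketch of that manual pattern (written with Python 3 module names; the Queue module in Python 2 works the same way). Here sources and do_stuff stand in for the question's own data and work function:
import queue                      # "Queue" in Python 2
import threading

task_q = queue.Queue()

def worker():
    while True:
        item = task_q.get()
        if item is None:          # sentinel: no more work
            break
        func, args = item
        func(*args)               # e.g. the per-source database call
        task_q.task_done()

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()

for key, value in sources.items():        # sources as in the question
    task_q.put((do_stuff, (key, value)))  # do_stuff is a placeholder

task_q.join()                     # wait for all queued work to finish
for _ in threads:
    task_q.put(None)              # tell each worker to exit
for t in threads:
    t.join()
Each worker blocks on get(), so idle workers cost nothing, and whichever worker frees up first simply takes the next item off the queue.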
You could do it using only stdlib:
#!/usr/bin/env python
from multiprocessing.dummy import Pool  # use threads

def db_task(key_value):
    try:
        key, value = key_value
        # compute result..
        return result, None
    except Exception as e:
        return None, e

def main():
    pool = Pool(5)
    for result, error in pool.imap_unordered(db_task, sources.items()):
        if error is None:
            print(result)

if __name__ == "__main__":
    main()
