Is it right to init multiprocess in class __init__? - python

import os
import json
import logging
import threading
from collections import deque
from multiprocessing.dummy import Pool as ThreadPool

import redis
from flask import Flask, request

app = Flask(__name__)

class TSNew:
    def __init__(self):
        self.redis_client = redis.StrictRedis(host="172.17.31.147", port=4401, db=0)
        self.global_switch = 0
        self.pool = ThreadPool(40)  # init pool
        self.dnn_model = None
        self.nnf = None
        self.md5sum_nnf = "initialize"
        self.thread = threading.Thread(target=self.load_model_item)
        self.ts_picked_ids = None
        self.thread.start()
        self.memory = deque(maxlen=3000)
        self.process = threading.Thread(target=self.process_user_dict)
        self.process.start()

    def load_model_item(self):
        '''
        code
        '''

    def predict_memcache(self, user_dict):
        '''
        code
        '''

    def process_user_dict(self):
        while True:
            '''
            code to generate user_dicts which is a list
            '''
            results = self.pool.map(self.predict_memcache, user_dicts)
            '''
            code
            '''
TSNew_ = TSNew()

def get_user_result():
    logging.info("----------------come in ------------------")
    if request.method == 'POST':
        user_dict_json = request.get_data()  # userid
        if user_dict_json == '' or user_dict_json is None:
            logging.info("----------------user_dict_json is ''------------------")
            return ''
        try:
            user_dict = json.loads(user_dict_json)
        except ValueError:
            logging.info("json load error, pass")
            return ''
        TSNew_.memory.append(user_dict)
        logging.info('add to deque TSNew_.memory size: %d PID: %d', len(TSNew_.memory), os.getpid())
        logging.info("add to deque userid: %s, nation: %s \n", user_dict['user_id'], user_dict['user_country'])
        return 'SUCCESS\n'

@app.route('/', methods=['POST'])
def get_ts_gbdt_id():
    return get_user_result()

from werkzeug.contrib.fixers import ProxyFix
app.wsgi_app = ProxyFix(app.wsgi_app)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=4444)
I create a thread pool in the class's __init__ and use self.pool
to map the function predict_memcache.
I have two doubts:
(a) Should I initialize the pool in __init__, or just init it right before
results = self.pool.map(self.predict_memcache, user_dicts)?
(b) Since the pool is a multi-threaded operation and it is executed in the thread running process_user_dict, is there any hidden error?
Thanks.

Question (a):
It depends. If you need to run process_user_dict more than once, then it makes sense to start the pool in the constructor and keep it running. Creating a thread pool always comes with some overhead and by keeping the pool alive between calls to process_user_dict you would avoid that additional overhead.
If you just want to process one set of input, you can as well create your pool right inside process_user_dict. But probably not right before results = self.pool.map(self.predict_memcache, user_dicts) because that would create a pool for every iteration of your surrounding while loop.
In your specific case, it does not make any difference. You create your TSNew_ object at module level, so it remains alive (and with it the thread pool) while your app is running; the same thread pool from the same TSNew instance is used to process all requests during the lifetime of app.run().
Since you seem to be using that construct with self.process = threading.Thread(target=self.process_user_dict) as some sort of listener on self.memory, creating the pool in the constructor is functionally equivalent to creating the pool inside of process_user_dict (but outside the loop).
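For illustration, here is a minimal sketch of the "pool inside process_user_dict, but outside the loop" placement. The get_batch and predict callables stand in for your deque-draining and predict_memcache logic; they, and the 0.1-second idle sleep, are assumptions made purely for the sake of a runnable example:

from multiprocessing.dummy import Pool as ThreadPool
import time

def process_user_dict(get_batch, predict):
    # The pool is created once, before the loop, so every iteration
    # of the listener loop reuses the same 40 worker threads.
    pool = ThreadPool(40)
    while True:
        user_dicts = get_batch()  # e.g. drain pending dicts from the deque
        if user_dicts:
            results = pool.map(predict, user_dicts)
        time.sleep(0.1)  # avoid busy-waiting while the deque is empty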
Question (b):
Technically, there is no hidden error by default when creating a thread inside a thread. In the end, any additional thread's ultimate parent is always the MainThread, that is implicitly created for every instance of a Python interpreter. Basically, every time you create a thread inside a Python program, you create a thread in a thread.
Actually, your code does not even create a thread inside a thread. Your self.pool is created inside the MainThread. When the pool is instantiated via self.pool = ThreadPool(40) it creates the desired number (40) of worker threads, plus one worker handler thread, one task handler thread and one result handler thread. All of these are child threads of the MainThread. All you do with regards to your pool inside your thread under self.process is calling its map method to assign tasks to it.
However, I do not really see the point of what you are doing with that self.process here.
Making a guess, I would say that you want the loop in process_user_dict to act as a kind of listener on self.memory, so that the pool starts processing user_dicts as soon as they show up in the deque. From what I see you doing in get_user_result, you get one user_dict per request. I understand that you might have concurrent user sessions passing in these dicts, but do you really see a benefit from process_user_dict running in an infinite loop over simply calling TSNew_.process_user_dict() after TSNew_.memory.append(user_dict)? You could even omit self.memory completely and pass the dict directly to process_user_dict, as sketched below, unless I am missing something you did not show us.
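A minimal sketch of that simplification, assuming predict_memcache only needs the dict from the current request (the route and names mirror your code; the apply_async call is an illustrative choice so the request handler does not block on the prediction):

def process_user_dict(self, user_dict):
    # No listener loop and no deque: hand this request's dict to the pool directly.
    return self.pool.apply_async(self.predict_memcache, (user_dict,))

@app.route('/', methods=['POST'])
def get_ts_gbdt_id():
    user_dict = json.loads(request.get_data())
    TSNew_.process_user_dict(user_dict)
    return 'SUCCESS\n'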

Related

Run only one Instance of a Thread

I am pretty new to Python and have a question about threading.
I have one function that is called pretty often. This function starts another function in a new Thread.
def calledOften(id):
    t = threading.Thread(target=doit, args=(id,))
    t.start()

def doit(arg):
    while True:
        pass  # long-running work that uses arg
Every time calledOften is called, a new thread is created. My goal is to always terminate the previously running thread, so that at all times there is only one running doit() function.
What I tried:
How to stop a looping thread in Python?
def calledOften(id):
    t = threading.Thread(target=doit, args=(id,))
    t.start()
    time.sleep(5)
    t.do_run = False
This code (with a modified doit function) worked for me to stop the thread after 5 seconds.
But I cannot set t.do_run = False before I start the new thread... That's pretty obvious, because it is not defined yet...
Does somebody know how to stop the last running thread and start a new one?
Thank you ;)
I think you can decide from inside the thread itself when to terminate its execution; that should not cause you any problems. You could take a thread-manager approach, something like the one below:
import threading

class DoIt(threading.Thread):
    def __init__(self, id, stop_flag):
        super().__init__()
        self.id = id
        self.stop_flag = stop_flag

    def run(self):
        while not self.stop_flag():
            pass  # do something

class CalledOftenManager:
    __stop_run = False
    __instance = None

    @classmethod
    def _stop_flag(cls):
        return cls.__stop_run

    @classmethod
    def calledOften(cls, id):
        if cls.__instance is not None:
            cls.__stop_run = True
            while cls.__instance.is_alive():
                pass  # wait for the previous thread to terminate
            cls.__stop_run = False
        cls.__instance = DoIt(id, cls._stop_flag)
        cls.__instance.start()

# Call Manager always
CalledOftenManager.calledOften(1)
CalledOftenManager.calledOften(2)
CalledOftenManager.calledOften(3)
What I tried here is to make a controller for invoking the DoIt thread. It's one approach to achieve what you need.
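A leaner variant of the same idea, sketched with threading.Event instead of a hand-rolled flag; the names and the 0.1-second poll interval are illustrative assumptions, not part of the answer above:

import threading

def doit(id, stop_event):
    while not stop_event.is_set():
        stop_event.wait(0.1)  # replace with the long-running work for `id`

_current = None  # (thread, event) of the last started worker

def calledOften(id):
    global _current
    if _current is not None:
        thread, event = _current
        event.set()    # ask the previous worker to stop...
        thread.join()  # ...and wait until it actually has
    event = threading.Event()
    thread = threading.Thread(target=doit, args=(id, event))
    thread.start()
    _current = (thread, event)

calledOften(1)
calledOften(2)
calledOften(3)
_current[1].set()  # stop the last worker before exiting
_current[0].join()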

Select method to execute inside thread before starting thread

I created a small Flask app to download images and text from pages. This can take a very long time, so
I would like to execute my requests in parallel. I create threaded tasks, which I keep in a list of workers; these tasks should be able to download text or images from sites.
However, I would like to select which method a thread will execute, and only then start the thread.
How can I pass my method to the thread's run() method? And will this be a sub-daemon thread?
import threading
import time

workers = []

class SavePage:
    def get_text(self):
        print("Getting text")

    def get_images(self):
        print("Getting images")

class Task(threading.Thread):
    def __init__(self):
        super().__init__()
        self.save_page = SavePage()

    def get_text_from_page(self):
        self.save_page.get_text()

    def get_images_from_page(self):
        self.save_page.get_images()

if __name__ == '__main__':
    task = Task()
    task.get_images_from_page()  # Why does this execute, when I didn't put task.start()?
                                 # Moreover, is this really threaded? Or does it just use a method from class Task?
    workers.append(task)  # I want this list to be empty after the job is finished
    print("".join(str(worker.is_alive()) for worker in workers))
    print(workers)
task.get_images_from_page()  # Why does this execute, when I didn't put task.start()?
                             # Moreover, is this really threaded?
It's not threaded. It's just a normal method call, in the main thread.
Thread.start() is the method that starts the Thread.run function inside another thread.
You could set some state in __init__ to choose which function to execute:
class Task(threading.Thread):
    def __init__(self, action):
        super().__init__()
        self.save_page = SavePage()
        self.action = action

    def get_text_from_page(self):
        self.save_page.get_text()

    def get_images_from_page(self):
        self.save_page.get_images()

    def run(self):
        if self.action == "text":
            self.get_text_from_page()
        elif self.action == "images":
            self.get_images_from_page()
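Usage would then look something like this (a small sketch; the join() calls are only there so the example waits for both downloads to finish):

if __name__ == '__main__':
    text_task = Task("text")
    image_task = Task("images")
    text_task.start()   # runs get_text_from_page in its own thread
    image_task.start()  # runs get_images_from_page in its own thread
    text_task.join()
    image_task.join()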
Keep in mind that threads can be run in a simpler way by passing a target function:

def target_func():
    save_page = SavePage()
    save_page.get_images()

t = threading.Thread(target=target_func)
t.start()

# or, in this simple case:
save_page = SavePage()
t = threading.Thread(target=save_page.get_images)
t.start()

Thread queue is running serially, not in parallel?

I'm making remote API calls using threads, with no join, so that the program can make the next API call without waiting for the last one to complete.
Like so:

def run_single_thread_no_join(function, args):
    thread = Thread(target=function, args=(args,))
    thread.start()
    return

The problem was that I needed to know when all the API calls were completed, so I moved to code that uses a queue and join.
The threads now seem to run serially.
I can't figure out how to get the join to work so that the threads execute in parallel.
What am I doing wrong?
import threading
from queue import Queue

def run_que_block(methods_list, num_worker_threads=10):
    '''
    Runs methods on threads. Stores method returns in a list, then outputs that list
    after all methods in the list have been completed.

    :param methods_list: example ((method_name, args), (method_2, args), (method_3, args))
    :param num_worker_threads: The number of threads to use in the block.
    :return: The full list of returns from each method.
    '''
    method_returns = []
    # log = StandardLogger(logger_name='run_que_block')

    # lock to serialize console output
    lock = threading.Lock()

    def _output(item):
        # Make sure the whole print completes, or threads can mix up output in one line.
        with lock:
            if item:
                print(item)
            msg = threading.current_thread().name, item
            # log.log_debug(msg)
        return

    # The worker thread pulls an item from the queue and processes it.
    def _worker():
        while True:
            item = q.get()
            if item is None:
                break
            method_returns.append(item)
            _output(item)
            q.task_done()

    # Create the queue and thread pool.
    q = Queue()
    threads = []

    # Start the worker threads.
    for i in range(num_worker_threads):
        t = threading.Thread(target=_worker)
        t.daemon = True  # thread dies when main thread (only non-daemon thread) exits.
        t.start()
        threads.append(t)

    for method in methods_list:
        q.put(method[0](*method[1]))

    # block until all tasks are done
    q.join()

    # stop workers
    for i in range(num_worker_threads):
        q.put(None)

    for t in threads:
        t.join()

    return method_returns
You're doing all the work in the main thread:
for method in methods_list:
q.put(method[0](*method[1]))
Assuming each entry in methods_list is a callable and a sequence of arguments for it, you did all the work in the main thread, then put the result of each function call in the queue. That doesn't allow any parallelization aside from the printing (which is generally not a big enough cost to justify the thread/queue overhead).
Presumably, you want the threads to do the work for each function, so change that loop to:
for method in methods_list:
    q.put(method)  # Don't call it; queue it to be called in a worker
and change the _worker function so it calls the function that does the work in the thread:
def _worker():
    while True:
        task = q.get()
        if task is None:  # sentinel value: no more work, let the thread exit
            break
        method, args = task   # Extract and unpack the callable and its arguments
        item = method(*args)  # Call the callable with the provided args and store the result
        method_returns.append(item)
        _output(item)
        q.task_done()
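For comparison, the same fan-out-and-collect pattern is much shorter with concurrent.futures, which manages the queue, the worker threads, and the shutdown sentinels for you. This is a sketch, not a drop-in replacement for the locking/logging details above:

from concurrent.futures import ThreadPoolExecutor

def run_block(methods_list, num_worker_threads=10):
    # Each entry of methods_list is (callable, args), as in run_que_block.
    with ThreadPoolExecutor(max_workers=num_worker_threads) as pool:
        futures = [pool.submit(method, *args) for method, args in methods_list]
        return [f.result() for f in futures]  # blocks until every call finishes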

Run a script with web server and maintain data to be used later

Note: I want to implement this without using any framework.
I have to create a web application using Python. The application should maintain a running average of the CPU usage of each process over the past 60 seconds. It should act as a web server, and when it gets a request, it should return the current average for each process. Following are the scripts I've written. record_usage.py is a script I want to run as soon as server.py is run, so that it keeps maintaining the CPU usage data, which I intend to read whenever I get an XHR request and send back to the client.
So, my problem is: how do I wire this up? I tried running record_usage.py using subprocess.Popen after starting the server. record_usage.py starts running in the background as well, but when I try to access the data it creates, the class object I create is not the one it uses but a new one. How do I complete this link?
Kindly ask about anything I could not make clear.
Latest changes in server.py
if __name__ == '__main__':
    RU_OBJ = RU(settings.SAMPLING_FREQ, settings.AVG_INTERVAL)
    RU_LOCK = RLock()

    # Record CPU usage in a thread.
    ru_thread = Thread(target=RU_OBJ.record, args=(RU_LOCK,))
    ru_thread.daemon = True
    ru_thread.start()

    # Run server.
    run()
Latest change in record_usage.py
def record(self, lock):
    while True:
        with lock:
            self.add_processes()
        time.sleep(self.sampling_freq)
Is this a proper way of applying locks? A similar lock is applied when I read the process information. Would it work?
Added the functions:
def add_processes(self):
    for _process in psutil.process_iter():
        try:
            new_proc = _process.as_dict(attrs=['cpu_times', 'name', 'pid',
                                               'status'])
        except psutil.NoSuchProcess:
            continue
        pid, (user, _sys) = new_proc['pid'], new_proc.pop('cpu_times')
        # Get or create the details object for the process.
        existing = self.processes.setdefault(pid, new_proc)
        # Get or create the queue object for the CPU times of the process.
        queue_dict = self.process_queue.setdefault(pid, dict())
        # User CPU time.
        user_q = queue_dict.setdefault('user_q', PekableQueue(self.avg_interval))
        user_q.enqueue(user)
        user_avg = get_avg(user_q)
        # System CPU time.
        sys_q = queue_dict.setdefault('sys_q', PekableQueue(self.avg_interval))
        sys_q.enqueue(_sys)
        sys_avg = get_avg(sys_q)
        # Update the details object for the process.
        existing.update(user_avg=user_avg, sys_avg=sys_avg, **new_proc)

def get_curr_processes(self):
    return [self.processes[pid] for pid in psutil.get_pid_list()
            if pid in self.processes]
To collect statistics in another thread:
if __name__ == '__main__':
    from threading import Thread, Lock
    import record_usage

    lock = Lock()
    t = Thread(target=record_usage.record, args=[lock])
    t.daemon = True
    t.start()

    run(lock)
If you change some shared data in one thread and read it in another, then you should protect the places where you access/change the value with a lock:

# ...
with self.lock:
    existing = self.processes.setdefault(pid, new_proc)
# ...
with self.lock:
    existing.update(user_avg=user_avg, sys_avg=sys_avg, **new_proc)
# ...
def get_curr_processes(self):
    with self.lock:
        return [self.processes[pid] for pid in psutil.get_pid_list()
                if pid in self.processes]

It is essential that self.lock is the same object in all threads. If self.processes is a dict, then you don't need a lock for individual operations in CPython: dict methods are implemented in C, and the interpreter doesn't release the GIL (global interpreter lock) while calling them, i.e., only one thread at a time accesses the dict.
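A minimal self-contained sketch of that pattern, with one recorder thread and one reader sharing a single lock (all the names here are illustrative, not taken from the question's code):

import threading
import time

class Stats:
    def __init__(self):
        self.lock = threading.Lock()  # one Lock object, shared by all threads
        self.processes = {}

    def record(self):
        while True:
            with self.lock:  # the writer holds the lock while mutating
                self.processes['demo'] = time.time()
            time.sleep(0.5)

    def snapshot(self):
        with self.lock:  # the reader takes the same lock
            return dict(self.processes)

stats = Stats()
threading.Thread(target=stats.record, daemon=True).start()
time.sleep(1)
print(stats.snapshot())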

How can I invoke a thread multiple times in Python?

I'm sorry if this is a stupid question. I am trying to use a number of thread classes to finish different jobs, which involves invoking these threads many times and at different points. But I am not sure which method to use. The code looks like this:
class workers1(Thread):
    def __init__(self, i):
        Thread.__init__(self)
        self.i = i
    def run(self):
        pass  # do some stuff

class workers2(Thread):
    def __init__(self, i):
        Thread.__init__(self)
        self.i = i
    def run(self):
        pass  # do some stuff

class workers3(Thread):
    def __init__(self, i):
        Thread.__init__(self)
        self.i = i
    def run(self):
        pass  # do some stuff

WorkerList1 = [workers1(i) for i in range(X)]
WorkerList2 = [workers2(i) for i in range(XX)]
WorkerList3 = [workers3(i) for i in range(XXX)]

while True:
    for thread in WorkerList1:
        thread.run()  # start? join? or?
    for thread in WorkerList2:
        thread.run()  # start? join? or?
    for thread in WorkerList3:
        thread.run()  # start? join? or?
    # do sth.
I am trying to have all the threads in all the worker lists start functioning at the same time, or at least start around the same time. After some time, once they have all terminated, I would like to invoke all the threads again.
If there were no loop, I could just use start(); but since I can only start a thread once, start apparently does not fit here. If I use run(), it seems that all the threads run sequentially, not only the threads within the same list but also the threads from different lists.
Can anyone please help?
There are a lot of misconceptions here:
You can only start a specific instance of a thread once. But in your case, the for loop iterates over different instances of a thread, each instance being assigned to the variable thread in the loop, so there is no problem at all in calling the start() method on each thread. (You can think of the variable thread as an alias for the Thread object instantiated in your list.)
run() is not the same as join(): calling run() behaves as if you were programming sequentially. The run() method does not start a new thread; it simply executes the statements in the method, as with any other function call.
join() does not start executing anything: it only waits for a thread to finish. For join() to work properly on a thread, you have to call start() on that thread first.
Additionally, note that you cannot restart a thread once it has finished execution: you have to recreate the thread object for it to be started again. One workaround is to call Thread.__init__() at the end of the run() method. However, I would not recommend this, since it prevents you from using the join() method to detect the end of the thread's execution.
If you called thread.start() in the loops, you would actually start every thread only once, because all the entries in your lists are distinct thread objects (it does not matter that they belong to the same class). You should never call the run() method of a thread directly; it is meant to be called by the start() method. Calling it directly does not run it in a separate thread. The recreate-and-restart pattern is sketched below.
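A sketch of that recreate-and-restart pattern (the Worker class and the counts are stand-ins for the worker classes and list sizes in the question):

from threading import Thread

class Worker(Thread):
    def __init__(self, i):
        super().__init__()
        self.i = i

    def run(self):
        print("worker %d doing some stuff" % self.i)

for _round in range(3):  # each "invoke all the threads again" round
    workers = [Worker(i) for i in range(4)]  # fresh thread objects every round
    for w in workers:
        w.start()  # all started close together, running in parallel
    for w in workers:
        w.join()   # wait until the whole round has terminated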
The code below creates a class that is just a Thread, except that its start() and run() methods call the Thread class's initializer again, so the object never knows it has already been started and can therefore be started repeatedly.
from threading import Thread

class MTThread(Thread):
    def __init__(self, name="", target=None):
        self.mt_name = name
        self.mt_target = target
        Thread.__init__(self, name=name, target=target)

    def start(self):
        super().start()
        Thread.__init__(self, name=self.mt_name, target=self.mt_target)

    def run(self):
        super().run()
        Thread.__init__(self, name=self.mt_name, target=self.mt_target)

def code():
    pass  # Some code

thread = MTThread(name="SomeThread", target=code)
thread.start()
thread.start()
I had this same dilemma and came up with this solution, which has worked perfectly for me. It also allows a thread-killing decorator to be used efficiently.
The key feature is the use of a thread refresher, which is instantiated and started in main. This thread-refreshing thread runs a function that instantiates and starts all of the other (real, task-performing) threads. Decorating the thread-refreshing function with a thread killer allows you to kill all the threads when a certain condition is met, such as main terminating.
# @ThreadKiller(arg)  # what is this
def RefreshThreads():
    threadTask1 = threading.Thread(name="Task1", target=Task1, args=(anyArguments,))
    threadTask2 = threading.Thread(name="Task2", target=Task2, args=(anyArguments,))
    threadTask1.start()
    threadTask2.start()

# Main
while True:
    # do stuff
    threadRefreshThreads = threading.Thread(name="RefreshThreads", target=RefreshThreads, args=())
    threadRefreshThreads.start()
from threading import Thread
from time import sleep

def runA():
    while a == 1:
        print('A\n')
        sleep(0.5)

if __name__ == "__main__":
    a = 1
    t1 = Thread(target=runA)
    t1.daemon = True
    t1.start()
    sleep(2)

    a = 0
    print("now def runA stops")
    sleep(3)

    print("and now def runA continues")
    a = 1
    t1 = Thread(target=runA)
    t1.start()
    sleep(2)
