Run a function from another module in new thread? - python

I created a simple plugin system for my application and for now, I want to run each plugin in a new thread.
Here is a part of my code:
def newThread(self, f, args=()):
    t = threading.Thread(target=f, args=args)
    t.daemon = True
    t.start()
    return t

print "s"
for mod in imported_modules:
    if 'init' in vars(mod):
        newThread(mod.init, None)
print 1
One of my plugins is a TCP server that listens on a socket. If I run it in the main thread, the application doesn't load the other plugins and waits until the server stops!
Also, the above code does not run the init function of my plugin.
Now the question is:
How do I call an external function in a new thread?
Thanks in advance!

The problem is that threading.Thread expects args to be a tuple of arguments for the function it runs in the new thread. Passing None makes the call fail inside the new thread, so init never runs. If the function doesn't take any parameters, pass an empty tuple like this:
newThread(mod.init, ())
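As a minimal sketch of the same idea in Python 3 syntax, the helper below normalises a missing args value to an empty tuple before creating the thread. The names new_thread, init and run_demo are hypothetical and only stand in for the plugin code from the question.

```python
import threading


def new_thread(f, args=None):
    """Start f in a daemon thread; treat a missing args value as 'no arguments'."""
    # Thread expects a tuple of positional arguments, so None becomes ().
    t = threading.Thread(target=f, args=() if args is None else tuple(args))
    t.daemon = True
    t.start()
    return t


def run_demo():
    started = []

    def init():
        # Hypothetical stand-in for a plugin's init function.
        started.append("init")

    new_thread(init).join()      # args omitted
    new_thread(init, ()).join()  # explicit empty tuple, as in the answer above
    return started
```

Calling run_demo() runs the stand-in init function twice, once for each way of saying "no arguments".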

Related

Dash running in a PyQt QThread() - most appropriate way to re-initialize with new data

At the moment I have a PyQt5 app running Dash in a QThread like so:
class MapView(QObject):
    message = pyqtSignal(str)
    shutting_down = pyqtSignal(bool)

    def __init__(self, data):
        super().__init__()
        self.app = dash.Dash(__name__)
        self.data = data
        # manual callbacks
        self.app.callback(
            Output('hover-data', 'children'),
            Input('basic-interactions', 'hoverData'))(self.display_hover_data)
        self.app.callback(
            Output('page-content', 'children'),
            Input('url', 'pathname'))(self.shutdown)
        # ...more callbacks

    def shutdown(self, pathname):
        if pathname != '/shutdown':
            return
        print("Trying to shutdown map dash")
        func = request.environ.get('werkzeug.server.shutdown')
        if func is None:
            raise RuntimeError('Not running with the Werkzeug Server')
        func()
        self.shutting_down.emit(True)

    # .....lots of stuff like graphs, the layout, a function to start the server etc
and then the threading code in my main pyqt5 app is:
def show_map(self):
    self.mapthread = QThread()
    self.map = map_dash.MapView(self.data)
    self.map.moveToThread(self.mapthread)
    self.map.message.connect(self.update_output)
    self.map.shutting_down.connect(self.close_map_thread)
    self.mapthread.finished.connect(self.open_project)
    self.mapthread.started.connect(self.map.run)
    self.mapthread.start()
    self.browser.setUrl(QUrl('http://127.0.0.1:8050'))
    self.update_output("Map plotted.")
My initial problem here was that if I tried to run show_map() with new data when Dash was already running (the user goes to File -> Open when a project is already running), I would get the fatal "QThread: Destroyed while thread is still running". So I embarked along the path of shutting down the Flask server, then closing the thread, then going back to the function that opens a new project. It's like this:
1. User goes to File -> Open
2. def open_project() checks mapthread.isRunning()
3. It's not running so it opens a new project, creates a new QThread and new MapView instance
4. User goes to File -> Open again
5. The check in (2) returns True so the Flask server is asked to shut down
6. After the server has shut down, the shutting_down signal causes the thread to be asked to quit() (I'm not sure how robust this is because it isn't a signal from Flask, just a line after I've asked it to shut down. But it seems to work for now).
7. Once the thread has finished, the thread emits finished() which calls open_project() again
8. open_project this time sees that the thread is not running and allows the user to open a new file.
Steps 4 to 8 don't take too long to run, but it all seems a bit complicated and the Dash layout glitches a bit for whatever reason. As it stands, when the QThread finishes under any other circumstances it would also call open_project (although I could probably work around that). Is there a better way to do this? Can I feed the existing map instance new data somehow? I have read the documentation on dcc.Interval but that doesn't seem a great way to do it either...
Update (as per comments below):
Now what I'm doing is passing new data to the thread with self.map.data = new_data, then using a callback on url to re-draw the map and refresh the layout. Exactly like this:
elif pathname == '/refresh':
    self.draw_map()
    self.set_layout()
But here is the problem. self.set_layout() refreshes dash with the OLD figure. I have verified that self.draw_map() is drawing a new figure. But calling /refresh a second time does cause dash to use the new figure. So is dash storing the old figure in a cache that hasn't been updated in the fraction of a second it takes to set the layout again? How can I make dash wait for the cache to be updated?
One way to do this is to pass the new data into the Dash instance:
self.map.data = new_data
but this does not refresh the layout, and you can't simply call self.map.set_layout() from the main thread. Instead you have to use the url callback like this:
def shutdown(self, pathname):
    if pathname == '/shutdown':
        print("Trying to shutdown map dash")
        func = request.environ.get('werkzeug.server.shutdown')
        if func is None:
            raise RuntimeError('Not running with the Werkzeug Server')
        func()
        self.shutting_down.emit(True)
    elif pathname == '/refresh':
        self.draw_map()
        self.set_layout()
Of course, it would be more appropriate to call that method something like handle_url() now.
But there is a further problem. I'm guessing that, due to some caching issue, Dash displays the old version of the figure unless the user manually calls refresh again. The solution is to add a dcc.Interval callback that updates the figure (it calls draw_map()). It doesn't need to fire every second; you can set the interval to 999999, for example. In fact you can set max_intervals=1 and it still works. As long as the callback is there, it will be called the first time the page refreshes and the figure will be updated.
In fact, with that in place, the draw_map() and set_layout() functions don't even need to be called in the url callback. So the code now looks like this:
def shutdown(self, pathname):
    if pathname == '/shutdown':
        print("Trying to shutdown map dash")
        func = request.environ.get('werkzeug.server.shutdown')
        if func is None:
            raise RuntimeError('Not running with the Werkzeug Server')
        func()
        self.shutting_down.emit(True)
    elif pathname == '/refresh':
        print("I'm doing nothing")
And it works. But the url callback itself IS necessary.

How to redirect logs from secondary threads in Azure Functions using Python

I am using Azure Functions to run a Python script that launches multiple threads (for performance reasons). Everything works as expected, except that only the info logs from the main() thread appear in the Azure Functions log.
All the logs that I am using in the "secondary" threads that I start in main() do not appear in the Azure Functions logs.
Is there a way to ensure that the logs from the secondary threads show on the Azure Functions log?
The modules that I am using are "logging" and "threading".
I am using Python 3.6; I have already tried to lower the logging level in the secondary threads, but this did not help unfortunately.
The various secondary thread functions are in different modules.
My function has a structure similar to the following pseudo-code:
def main() -> None:
    logging.basicConfig(level=logging.INFO)
    logging.info("Starting the process...")
    thread1 = threading.Thread(target=foo, args=("one arg",))
    thread2 = threading.Thread(target=foo, args=("another arg",))
    thread3 = threading.Thread(target=foo, args=("yet another arg",))
    thread1.start()
    thread2.start()
    thread3.start()
    logging.info("All threads started successfully!")
    return

# in another module
def foo(st: str) -> None:
    logging.basicConfig(level=logging.INFO)
    logging.info(f"Starting thread for arg {st}")
The current Azure log output is:
INFO: Starting the process...
INFO: "All threads started successfully!"
I would like it to be something like:
INFO: Starting the process...
INFO: Starting thread for arg one arg
INFO: Starting thread for arg another arg
INFO: Starting thread for arg yet another arg
INFO: All threads started successfully!
(of course the order of the secondary threads could be anything)
The Azure Functions Python worker sets an AsyncLoggingHandler as a handler on the root logger. Between this handler and its destination, log records appear to be filtered by invocation_id.
An invocation_id is set when the framework starts threads itself, as it does for the main sync function. If we start threads ourselves from the main function, however, we must set the invocation_id in each started thread for its logs to reach their destination.
The azure_functions_worker.dispatcher.get_current_invocation_id function checks whether the current thread has a running event loop. If no running loop is found, it checks azure_functions_worker.dispatcher._invocation_id_local, which is thread-local storage, for an attribute named v that holds the invocation_id.
Because the threads we start don't have a running event loop, we have to get the invocation_id from the context and set it on azure_functions_worker.dispatcher._invocation_id_local.v in every thread we start.
The framework makes the invocation_id available through the context parameter of the main function.
Tested on Ubuntu 18.04 with azure-functions-core-tools-4 and Python 3.8.
import sys
import azure.functions as func
import logging
import threading

# import thread local storage
from azure_functions_worker.dispatcher import (
    _invocation_id_local as tls,
)


def main(req: func.HttpRequest, context: func.Context) -> func.HttpResponse:
    logging.info("Starting the process...")
    thread1 = threading.Thread(
        target=foo,
        args=(
            context,
            "one arg",
        ),
    )
    thread2 = threading.Thread(
        target=foo,
        args=(
            context,
            "another arg",
        ),
    )
    thread3 = threading.Thread(
        target=foo,
        args=(
            context,
            "yet another arg",
        ),
    )
    thread1.start()
    thread2.start()
    thread3.start()
    logging.info("All threads started successfully!")
    name = req.params.get("name")
    if not name:
        try:
            req_body = req.get_json()
        except ValueError:
            pass
        else:
            name = req_body.get("name")
    if name:
        return func.HttpResponse(
            f"Hello, {name}. This HTTP triggered function executed successfully."
        )
    else:
        return func.HttpResponse(
            "This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response.",
            status_code=200,
        )


# in another module
def foo(context, st: str) -> None:
    # invocation_id_local = sys.modules[
    #     "azure_functions_worker.dispatcher"
    # ]._invocation_id_local
    # invocation_id_local.v = context.invocation_id
    tls.v = context.invocation_id
    logging.info(f"Starting thread for arg {st}")
https://github.com/Azure/azure-functions-python-worker/blob/81b84102dc14b7d209ad7e00be68f25c37987c1e/azure_functions_worker/dispatcher.py
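The thread-local behaviour this answer relies on can be seen without Azure. The sketch below uses only the standard library; _invocation_local, current_invocation_id and worker are hypothetical stand-ins for the worker's internals, not its real API. It shows that a value set in one thread is invisible in a newly started thread unless that thread sets it again.

```python
import threading

# Hypothetical stand-in for the worker's thread-local storage object.
_invocation_local = threading.local()


def current_invocation_id():
    # Mirrors the lookup described above: read attribute "v" if this thread has one.
    return getattr(_invocation_local, "v", None)


def worker(results, key, invocation_id=None):
    if invocation_id is not None:
        # Each thread must set its own copy of the value.
        _invocation_local.v = invocation_id
    results[key] = current_invocation_id()


def run_demo():
    _invocation_local.v = "abc-123"  # set in the calling thread only
    results = {}
    t1 = threading.Thread(target=worker, args=(results, "not_set"))
    t2 = threading.Thread(target=worker, args=(results, "set", "abc-123"))
    for t in (t1, t2):
        t.start()
    for t in (t1, t2):
        t.join()
    return results
```

The thread that never sets the value sees None, which corresponds to the situation in which the log records are dropped.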
This must be something in your Azure setup: in a non-Azure setup, it works as expected. You should add join() calls for your threads. And basicConfig() should be called only once, from a main entry point.
Are your threads I/O bound? Due to the GIL, having multiple compute-bound threads doesn't give your code any performance advantages. It might be better to structure your code around concurrent.futures.ProcessPoolExecutor or multiprocessing.
Here is a Repl which shows a slightly modified version of your code working as expected.
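A minimal sketch of the two suggestions above, outside Azure: logging is configured once at the entry point, and main() joins its threads before returning. ListHandler is a hypothetical helper that collects messages so the result can be inspected, and the force argument of basicConfig assumes Python 3.8 or later.

```python
import logging
import threading


class ListHandler(logging.Handler):
    """Hypothetical handler that stores messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def foo(st):
    # No basicConfig() here: the worker uses the already-configured root logger.
    logging.info("Starting thread for arg %s", st)


def main(handler):
    # Configure logging once, at the entry point (force= needs Python 3.8+).
    logging.basicConfig(level=logging.INFO, handlers=[handler], force=True)
    logging.info("Starting the process...")
    threads = [threading.Thread(target=foo, args=(arg,))
               for arg in ("one arg", "another arg", "yet another arg")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for the workers before returning
    logging.info("All threads finished.")
    return handler.messages
```

Because of the join() calls, the final message is always logged after the three worker messages, whose relative order can vary.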
I may be wrong, but I suspect Azure runs your main function in a daemon thread.
Quoting https://docs.python.org/3/library/threading.html: The entire Python program exits when no alive non-daemon threads are left.
When daemon is not set in the Thread constructor, the new thread inherits the value from the thread that creates it.
You can check whether this is your issue by printing thread1.daemon before starting your child threads.
Anyway, I can reproduce the issue on my PC (without any Azure, just plain python3) with:
import logging
import threading

def main():
    logging.basicConfig(level=logging.INFO)
    logging.info("Starting the process...")
    thread1 = threading.Thread(target=foo, args=("one arg",), daemon=True)
    thread2 = threading.Thread(target=foo, args=("another arg",), daemon=True)
    thread3 = threading.Thread(target=foo, args=("yet another arg",), daemon=True)
    thread1.start()
    thread2.start()
    thread3.start()
    logging.info("All threads started successfully!")
    return

def foo(st):
    for i in range(2000):  # Giving a bit of time for the race condition to happen
        print('tamere', file=open('/dev/null', 'w'))
    logging.basicConfig(level=logging.INFO)
    logging.info(f"Starting thread for arg {st}")

main()
If I force daemon to False or leave it undefined, it works. Thus I guess your issue is that Azure starts your main function in a daemon thread, and since you don't override the daemon flag to False, the whole process exits instantly.
PS: I know nothing about Azure. It is possible that you are indeed doing something the wrong way and that there is another interface that does exactly what you want in the way Azure expects. So this answer is potentially just an explanation of what happens rather than real guidance.
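The inheritance rule this answer depends on can be checked with the standard library alone. In the sketch below, spawn_child and child_daemon_flag are hypothetical helper names; the code creates a child thread from inside a parent thread and records the daemon flag the child received by default.

```python
import threading


def spawn_child(report):
    # No daemon argument: the child inherits the flag of the thread creating it.
    child = threading.Thread(target=lambda: None)
    report.append(child.daemon)
    child.start()
    child.join()


def child_daemon_flag(parent_is_daemon):
    """Return the default daemon flag of a thread created inside a parent thread."""
    report = []
    parent = threading.Thread(target=spawn_child, args=(report,),
                              daemon=parent_is_daemon)
    parent.start()
    parent.join()
    return report[0]
```

child_daemon_flag(True) returns True and child_daemon_flag(False) returns False. Whether Azure actually runs main() in a daemon thread remains this answer's guess.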
Azure Functions is an async environment.
If you define your function with async def, it is run with asyncio.
Otherwise it is run in a concurrent.futures.ThreadPoolExecutor.
It's better to define your functions as async.
Threading works, but you don't need to start threads manually: the thread pool executes your blocking code. You have to make it work for you.
https://learn.microsoft.com/en-us/azure/azure-functions/functions-app-settings#python_threadpool_thread_count

Win32com events not raising inside thread?

I am new to both COM and Python, so I'm not very familiar with the exact terminology. Apologies in advance for any inexact terms.
I am trying to connect to a desktop application via a proprietary COM interface using pywin32.
I created a PoC and it runs fine. The COM function call is processed and I get the expected event.
class MyEvents:
    def __init__(self):
        print("Callback class initialized")

    def OnMyEvent(self, data):
        print('MyEvent raised')


class ComUser:
    comObj = None

    def __init__(self):
        comObj = win32com.client.DispatchWithEvents("ProprietaryInterface.InterfaceClass",
                                                    MyEvents)
        comObj.Register()
        comObj.DoSomething(data)
        time.sleep(120)


userObj = ComUser()
So far so good. I get the event on the screen
Callback class initialized
MyEvent raised
Next I tried to put it into my application where I have multiple threads. To explain it in simple terms:
Main creates an object of Class X which initializes an XMLRPC Server thread.
The XMLRPC handler simply takes incoming info and puts it into a queue
The queue is from multiprocessing lib.
Another thread waits on this queue for an incoming message
def __startPollingThread(self):
    pythoncom.CoInitialize()
    pollingThread = Thread(target=self.__checkQueue)
    pollingThread.start()
    pythoncom.CoUninitialize()
This is the polling thread method:
def __checkQueue(self):
    try:
        pythoncom.CoInitialize()
        while True:
            currMessage = self.__messageQueue.get()
            self.__processMessage(currMessage)
    except:
        # Log message
        pass
    finally:
        pythoncom.CoUninitialize()
The __processMessage call passes through multiple classes (something like a strategy pattern plus a state pattern) before it hits the class that handles the COM interface.
In the ComUser class, I have a method which registers with the client application's COM interface:
def initSystem(self):
    import pythoncom
    try:
        pythoncom.CoInitialize()
        self.ComConnector = win32com.client.DispatchWithEvents("ProprietaryInterface.InterfaceClass",
                                                               MyEvents)
        self.ComConnector.Register()
    except:
        pass  # error handling omitted in the original snippet
    finally:
        pythoncom.CoUninitialize()
Another method handles the specific requests as they arrive and makes the corresponding COM calls.
def handleMessage(self, message):
    # if message = this then
    comObj.DoSomething(data)
Both methods are called from the __processMessage method. All my classes reside in separate .py files, except ComUser and MyEvents, which are in the same module.
I can call the COM interface and see the application reacting to the COM method calls, but I can't see any events being raised. I have tried a whole lot of combinations of CoInitialize, CoUninitialize and "import pythoncom" statements to ensure that it is not a problem with the threading. I also tried setting sys.coinit_flags = 0. It seems to make no difference; I just don't see any events.
Is it a problem that I call DispatchWithEvents in a child thread instead of the main thread (the calls seem to work fine)? Or is it that the main thread (i.e. the main method of the program) dies? I tried putting a long sleep there too. I even tried a separate thread with a PumpWaitingMessages loop, but it made no difference. I can't think of any solutions.

Unable to access variables after creating a daemon thread in python

This is my first program using threading and I am completely new to OS concepts. I was just trying to understand how to do things asynchronously in Python. I am trying to establish a session and send keepalives on a daemon thread while sending protocol messages from the main thread. However, I noticed that once I create the thread I am unable to access the variables that I could access before creating it. I no longer see the global variables that I used to see before I created this new thread.
Can someone help me understand how threading works here? I am trying to understand:
How to print properly so that logging is useful?
How to kill the thread that we created?
How to access variables from one thread
def pcep_init(ip):
    global pcc_client,keeppkt,pkt,thread1
    accept_connection(ip)
    send_pcep_open()
    # Create new Daemon thread to send KA
    thread1 = myThread(1, "Keepalive-Thread\r")
    thread1.setDaemon(True)
    # Start new Threads
    print "This is thread1 before start %r" % thread1
    thread1.start()
    print "This is thread1 after start %r" % thread1
    print "Coming out of pcep_init"
    return 1
However, when I executed the API, I saw that the printed output is misaligned, presumably because the thread runs asynchronously:
>>> ret_val=pcep_init("192.168.25.2").
starting pce server on 192.168.25.2 port 4189
connection from ('192.168.25.1', 42352)
, initial daemon)>fore start <myThread(Keepalive-Thread
, started daemon 140302767515408)>ead(Keepalive-Thread
Coming out of pcep_init
>>> Starting Keepalive-Thread <------ I am supposed to hit the enter button to get the python prompt not sure why thats needed.
>>> thread1
Traceback (most recent call last):
File "<console>", line 1, in <module>
NameError: name 'thread1' is not defined
>>> threading.currentThread()
<_MainThread(MainThread, started)>
>>> threading.activeCount()
2
>>> threading.enumerate() <-------------- Not sure why this is not showing the Main Thread
, started daemon 140302767515408)>], <myThread(Keepalive-Thread
>>>

Errno 9 using the multiprocessing module with Tornado in Python

For operations in my Tornado server that are expected to block (and can't be easily modified to use things like Tornado's asynchronous HTTP request client), I have been offloading the work to separate worker processes using the multiprocessing module. Specifically, I was using a multiprocessing Pool because it offers a method called apply_async, which works very well with Tornado since it takes a callback as one of its arguments.
I recently realized that a pool preallocates the number of processes, so if they all become blocking, operations that require a new process will have to wait. I do realize that the server can still take connections since apply_async works by adding things to a task queue, and is rather immediately finished, itself, but I'm looking to spawn n processes for n amount of blocking tasks I need to perform.
I figured that I could use the add_handler method for my Tornado server's IOLoop to add a handler for each new PID that I create to that IOLoop. I've done something similar before, but it was using popen and an arbitrary command. An example of such use of this method is here. I wanted to pass arguments into an arbitrary target Python function within my scope, though, so I wanted to stick with multiprocessing.
However, it seems that something doesn't like the PIDs that my multiprocessing.Process objects have. I get IOError: [Errno 9] Bad file descriptor. Are these processes restricted somehow? I know that the PID isn't available until I actually start the process, but I do start the process. Here's the source code of an example I've made that demonstrates this issue:
#!/usr/bin/env python
"""Creates a small Tornado program to demonstrate asynchronous programming.
Specifically, this demonstrates using the multiprocessing module."""

import tornado.httpserver
import tornado.ioloop
import tornado.web
import multiprocessing as mp
import random
import time

__author__ = 'Brian McFadden'
__email__ = 'brimcfadden@gmail.com'


def sleepy(queue):
    """Pushes a string to the queue after sleeping for 5 seconds.
    This sleeping can be thought of as a blocking operation."""
    time.sleep(5)
    queue.put("Now I'm awake.")
    return


def random_num():
    """Returns a string containing a random number.
    This function can be used by handlers to receive text for writing which
    facilitates noticing change on the webpage when it is refreshed."""
    n = random.random()
    return "<br />Here is a random number to show change: {0}".format(n)


class SyncHandler(tornado.web.RequestHandler):
    """Demonstrates handing a request synchronously.
    It executes sleepy() before writing some more text and a random number to
    the webpage. While the process is sleeping, the Tornado server cannot
    handle any requests at all."""

    def get(self):
        q = mp.Queue()
        sleepy(q)
        val = q.get()
        self.write(val)
        self.write('<br />Brought to you by SyncHandler.')
        self.write('<br />Try refreshing me and then the main page.')
        self.write(random_num())


class AsyncHandler(tornado.web.RequestHandler):
    """Demonstrates handing a request asynchronously.
    It executes sleepy() before writing some more text and a random number to
    the webpage. It passes the sleeping function off to another process using
    the multiprocessing module in order to handle more requests concurrently to
    the sleeping, which is like a blocking operation."""

    @tornado.web.asynchronous
    def get(self):
        """Handles the original GET request (normal function delegation).
        Instead of directly invoking sleepy(), it passes a reference to the
        function to the multiprocessing pool."""
        # Create an interprocess data structure, a queue.
        q = mp.Queue()
        # Create a process for the sleepy function. Provide the queue.
        p = mp.Process(target=sleepy, args=(q,))
        # Start it, but don't use p.join(); that would block us.
        p.start()
        # Add our callback function to the IOLoop. The async_callback wrapper
        # makes sure that Tornado sends an HTTP 500 error to the client if an
        # uncaught exception occurs in the callback.
        iol = tornado.ioloop.IOLoop.instance()
        print "p.pid:", p.pid
        iol.add_handler(p.pid, self.async_callback(self._finish, q), iol.READ)

    def _finish(self, q):
        """This is the callback for post-sleepy() request handling.
        Operation of this function occurs in the original process."""
        val = q.get()
        self.write(val)
        self.write('<br />Brought to you by AsyncHandler.')
        self.write('<br />Try refreshing me and then the main page.')
        self.write(random_num())
        # Asynchronous handling must be manually finished.
        self.finish()


class MainHandler(tornado.web.RequestHandler):
    """Returns a string and a random number.
    Try to access this page in one window immediately after (<5 seconds of)
    accessing /async or /sync in another window to see the difference between
    them. Asynchronously performing the sleepy() function won't make the client
    wait for data from this handler, but synchronously doing so will!"""

    def get(self):
        self.write('This is just responding to a simple request.')
        self.write('<br />Try refreshing me after one of the other pages.')
        self.write(random_num())


if __name__ == '__main__':
    # Create an application using the above handlers.
    application = tornado.web.Application([
        (r"/", MainHandler),
        (r"/sync", SyncHandler),
        (r"/async", AsyncHandler),
    ])
    # Create a single-process Tornado server from the application.
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(8888)
    print 'The HTTP server is listening on port 8888.'
    tornado.ioloop.IOLoop.instance().start()
Here is the traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/tornado/web.py", line 810, in _stack_context
    yield
  File "/usr/local/lib/python2.6/dist-packages/tornado/stack_context.py", line 77, in StackContext
    yield
  File "/usr/local/lib/python2.6/dist-packages/tornado/web.py", line 827, in _execute
    getattr(self, self.request.method.lower())(*args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/tornado/web.py", line 909, in wrapper
    return method(self, *args, **kwargs)
  File "./process_async.py", line 73, in get
    iol.add_handler(p.pid, self.async_callback(self._finish, q), iol.READ)
  File "/usr/local/lib/python2.6/dist-packages/tornado/ioloop.py", line 151, in add_handler
    self._impl.register(fd, events | self.ERROR)
IOError: [Errno 9] Bad file descriptor
The above code is actually modified from an older example that used process pools. I've had it saved for reference for my coworkers and myself (hence the heavy amount of comments) for quite a while. I constructed it in such a way so that I could open two small browser windows side-by-side to demonstrate to my boss that the /sync URI blocks connections while /async allows more connections. For the purposes of this question, all you need to do to reproduce it is try to access the /async handler. It errors immediately.
What should I do about this? How can the PID be "bad"? If you run the program, you can see it be printed to stdout.
For the record, I'm using Python 2.6.5 on Ubuntu 10.04. Tornado is 1.1.
add_handler takes a valid file descriptor, not a PID. As an example of what's expected, tornado itself uses add_handler normally by passing in a socket object's fileno(), which returns the object's file descriptor. PID is irrelevant in this case.
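To illustrate the difference between a PID and a file descriptor, here is a minimal standard-library sketch in Python 3 syntax. It uses no Tornado, and it assumes a Unix-like system where select works on pipe descriptors. The receiving end of a multiprocessing.Pipe exposes fileno(), which is the kind of value an event loop can watch for readability. The function names are hypothetical, and a thread stands in for the worker process only to keep the example short.

```python
import multiprocessing as mp
import select
import threading
import time


def sleepy(conn):
    """Send a message after a short pause (stand-in for blocking work)."""
    time.sleep(0.1)
    conn.send("Now I'm awake.")
    conn.close()


def run_demo():
    # duplex=False gives a receive-only end and a send-only end.
    recv_conn, send_conn = mp.Pipe(duplex=False)
    worker = threading.Thread(target=sleepy, args=(send_conn,))
    worker.start()

    fd = recv_conn.fileno()  # a real file descriptor, unlike a PID
    # Wait until the descriptor is readable, as an event loop would.
    readable, _, _ = select.select([fd], [], [], 5.0)
    message = recv_conn.recv() if readable else None

    worker.join()
    recv_conn.close()
    return fd, message
```

In the question's handler, the analogous change would be to hand the worker one end of such a pipe and register the other end's fileno() with the IOLoop instead of p.pid. That adaptation is an untested assumption here.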
Check out this project:
https://github.com/vukasin/tornado-subprocess
It allows you to start arbitrary processes from Tornado and get a callback when they finish (with access to their status, stdout and stderr).
