When should I be considering using threading


I am creating an application in Python that uses SQLite databases and wxPython. I want to implement it using MVC in some way. I am just curious about threading. Should I be doing this in any scenario that uses a GUI? Would this kind of application require it?

One thing I learned from JavaScript/Node.js is that there is a difference between asynchronous programming and parallel programming. In asynchronous programming you may have things running out of sequence, but any given task runs to completion before something else starts running. That way you don't have to worry about synchronizing shared resources with semaphores, locks and the like. That would be an issue if you had multiple threads running in parallel, which either run simultaneously or might get preempted at any point, hence the need for locks.
Most likely you are already writing some sort of asynchronous code in a GUI environment, and there isn't any need for you to also write parallel, multithreaded code.
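A minimal Python illustration of that distinction, using the standard asyncio module as an analogy to the Node.js model (the function names are purely illustrative): two coroutines update a shared counter without any lock, because each one only hands control back to the event loop at an explicit `await`. The read-modify-write between two awaits can therefore never be interrupted by the other task, even though the two tasks' steps run interleaved, out of sequence.

```python
import asyncio


async def worker(name, steps, state, log):
    """Increment a shared counter; control only changes hands at `await`."""
    for i in range(steps):
        value = state["count"]       # read...
        state["count"] = value + 1   # ...modify/write, with no await in between
        log.append(f"{name}{i}")
        await asyncio.sleep(0)       # explicitly hand control back to the event loop


async def main():
    state = {"count": 0}             # shared state, deliberately unprotected by locks
    log = []
    await asyncio.gather(worker("a", 3, state, log), worker("b", 3, state, log))
    return state["count"], log


if __name__ == "__main__":
    print(asyncio.run(main()))  # (6, ['a0', 'b0', 'a1', 'b1', 'a2', 'b2'])
```

The interleaved log shows the "out of sequence" part; the final count of 6 shows that no update was lost without any locking.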

You will use multithreading to perform parallel or background tasks that you don't want the main thread to wait for: you don't want them to hang the GUI while they run, or to interfere with user interactivity or some other priority tasks.
Most applications today don't use multithreading, or use very little of it. Even if they do use multiple threads, it's usually because of libraries the programmer is using, and he may not even be aware that multithreading is happening there as he develops his application.
Even major software like AutoCAD uses very little multithreading. It's not that it's poorly made; multithreading just has very specific applications. For instance, it is pointless to allow user interaction while the project the user wants to work on is still loading. Software designed to interact with a single user will hardly need it.
Where multithreading plays a really important role is in servers, where a single application can serve requests from thousands of users without them interfering with each other. In this scenario the easiest way to make sure everyone is happy is to create a new thread for each request.
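The thread-per-request model described here is what the standard library's `socketserver.ThreadingTCPServer` implements. A minimal sketch, assuming a toy line-based echo protocol (the handler and helper names are illustrative): every connection is handled in a freshly created thread, and the reply names that thread so you can see it is not the main one.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    """Handles one client connection; ThreadingTCPServer runs each in its own thread."""

    def handle(self):
        line = self.rfile.readline()
        # Prefix the echoed line with the name of the thread serving this request.
        reply = f"{threading.current_thread().name}: ".encode() + line
        self.wfile.write(reply)


def start_server(host="127.0.0.1", port=0):
    """Start a thread-per-request server in the background (port 0 = any free port)."""
    server = socketserver.ThreadingTCPServer((host, port), EchoHandler)
    server.daemon_threads = True  # handler threads won't block interpreter exit
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask(address, message: bytes) -> bytes:
    """Send one line to the server and return its one-line reply."""
    with socket.create_connection(address) as sock, sock.makefile("rb") as reader:
        sock.sendall(message + b"\n")
        return reader.readline()


if __name__ == "__main__":
    server = start_server()
    print(ask(server.server_address, b"hello"))
    server.shutdown()
    server.server_close()
```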

Actually, GUIs are typically single-threaded implementations where a single thread (called the UI thread) keeps polling for events and executing them in the order they occur.
Regarding the main question, consider this scenario.
At the click of a button you want to do something time-consuming that takes, say, 5-10 seconds or more. You have two options:
Do that operation in the main UI thread itself. This will freeze the UI for that duration and the user will not be able to interact with it.
Do that operation in a separate thread that, on completion, just notifies the main UI thread (in case the UI thread needs to make any UI updates based on the result of the operation). This option will not block the UI thread, and the user can continue to use the application.
However, there will be situations where you do not want the user to be using the application while something happens. In such cases you can usually still use a separate thread, but block the UI with some sort of overlay / progress indicator combination.
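A sketch of the second option, assuming no particular toolkit: a `queue.Queue` stands in for the GUI's event queue, and a toy loop stands in for the UI thread. In wxPython the worker would normally hand its result back with `wx.CallAfter(...)` rather than touching widgets directly; `slow_operation`, `ui_events`, `start_background_job` and `ui_loop` are hypothetical names.

```python
import queue
import threading
import time

ui_events = queue.Queue()  # stands in for the GUI toolkit's event queue


def slow_operation(seconds):
    time.sleep(seconds)  # placeholder for the 5-10 second job
    return "done"


def start_background_job(seconds):
    """Run the job off the UI thread and post the result back when finished."""
    def run():
        result = slow_operation(seconds)
        # Never update widgets from here; hand the result to the UI thread instead.
        ui_events.put(("job_finished", result))

    threading.Thread(target=run, daemon=True).start()


def ui_loop(timeout):
    """Toy UI loop: keeps serving the user until the job reports back."""
    handled = []
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            event = ui_events.get(timeout=0.05)  # wait briefly, never freeze
        except queue.Empty:
            handled.append("idle")  # the UI is still responsive meanwhile
            continue
        handled.append(event)
        if event[0] == "job_finished":
            break
    return handled
```

Calling `start_background_job(0.3)` and then `ui_loop(5)` returns a number of `"idle"` entries (the UI kept running) followed by `("job_finished", "done")`.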

Almost certainly you already are...
A lot of wx is already driven by an asynchronous event loop.
That said, you should use pubsub (wx.lib.pubsub) for communication within an MVC-style wx application, but it is unlikely that you will need to implement any kind of threading (you practically get it for free).
A few good places to use Python threading (constrained by the GIL) are:
serial communication
socket servers
A few places to use multiprocessing (each process has its own GIL, so the work can actually run on different cores):
bitcoin miners
anything that requires massive amounts of data processing that can be parallelized
There are lots more places to use it; however, most GUIs are already fairly asynchronously driven by events (not entirely true, but close enough), and sqlite3 queries should definitely be executed one at a time from the same thread (in fact, by default Python's sqlite3 module raises an error if a connection is used from a thread other than the one that created it).
This is likely all a gross oversimplification.
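One common way to honour the one-thread rule for sqlite3 is to give the database its own thread and let every other thread send it queries through a queue. This is a sketch under that assumption (the `DatabaseThread` class is hypothetical, not part of wx or sqlite3); the connection is created inside the thread that uses it, so sqlite3's same-thread check is satisfied.

```python
import queue
import sqlite3
import threading


class DatabaseThread(threading.Thread):
    """Owns the only sqlite3 connection; other threads submit work via a queue."""

    def __init__(self, path=":memory:"):
        super().__init__(daemon=True)
        self.path = path
        self.requests = queue.Queue()

    def run(self):
        conn = sqlite3.connect(self.path)  # created in, and used only by, this thread
        while True:
            item = self.requests.get()
            if item is None:  # sentinel: shut down
                break
            sql, params, reply = item
            try:
                rows = conn.execute(sql, params).fetchall()
                conn.commit()
                reply.put(rows)
            except sqlite3.Error as exc:
                reply.put(exc)  # hand the error back to the caller's thread
        conn.close()

    def execute(self, sql, params=()):
        """Callable from any thread; blocks until the DB thread has run the query."""
        reply = queue.Queue(maxsize=1)
        self.requests.put((sql, params, reply))
        result = reply.get()
        if isinstance(result, Exception):
            raise result
        return result

    def stop(self):
        self.requests.put(None)
        self.join()
```

Queries from all threads are serialized, so they run one at a time. The caller still blocks until its query finishes, so long queries should be issued from a worker thread rather than the GUI thread.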

Related

What's the best way to use IB / TWS in algorithmic trading for Python?

I was wondering what is the most efficient way, from a performance perspective, to use the TWS/IB API in Python. I want to compute and update my strategies based on real-time data (I think Python has a lot of libraries that may be helpful here, in contrast to Java) and, based on that, send buy/sell orders. These strategy computations may involve quite some processing time, so I was thinking about implementing some sort of threading/concurrency (for Java it uses 3 threads, if I understand correctly, see *1).
I know there is IBpy (I think it is the same, only with some things wrapped up for convenience). I came across IB-insync as an alternative to threading in Python, due to Python's concurrency limitations, if I understand correctly:
https://ib-insync.readthedocs.io/api.html
which implements the IB API asynchronously and single-threaded.
Reading about concurrency in Python here:
https://realpython.com/python-concurrency/
Async has some major advantages, if I understand correctly, since Python was designed with a Global Interpreter Lock (GIL), which lets only one thread hold control of the Python interpreter at a time. However, the IB-insync library may have some limitations too (which can be fixed by adapting code, as suggested below):
If, for example, the user code spends much time in a calculation, or uses time.sleep() with a long delay, the framework will stop spinning, messages accumulate and things may go awry.
If a user operation takes a long time then it can be farmed out to a different process. Alternatively the operation can be made such that it periodically calls IB.sleep(0); This will let the framework handle any pending work and return when finished. The operation should be aware that the current state may have been updated during the sleep(0) call.
For introducing a delay, never use time.sleep() but use sleep() instead.
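The advice quoted above is the general cooperative-multitasking pattern. The sketch below shows it with plain asyncio, not with the ib_insync API (`IB.sleep` is not used here): a long calculation is split into chunks and calls `asyncio.sleep(0)` between them, so a concurrently running task (a hypothetical `heartbeat`, standing in for the framework's message handling) keeps getting turns.

```python
import asyncio


async def heartbeat(ticks, stop):
    """Stands in for framework work that must keep running (e.g. message handling)."""
    while not stop.is_set():
        ticks.append(1)
        await asyncio.sleep(0)


async def long_calculation(n, chunk=10_000):
    """CPU-bound work split into chunks, yielding to the event loop between them."""
    total = 0
    for i in range(n):
        total += i * i
        if i % chunk == 0:
            await asyncio.sleep(0)  # let pending work run, like the quoted IB.sleep(0)
    return total


async def main(n=200_000):
    ticks, stop = [], asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    result = await long_calculation(n)
    stop.set()
    await hb
    return result, len(ticks)
```

`asyncio.run(main())` returns the sum together with the number of heartbeat ticks. Without the `await asyncio.sleep(0)` inside the loop, the heartbeat would not get a single turn while the calculation was running.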
Would a multi-threading solution be better just like Java (I do not know if there is a Java Async equivalent which can be combined with a lot of easy tools/libs that manipulate data)? Or should I stick to Python Async? Other suggestions are welcome, too. With regard to multiple threads in Python (and Java), the following site:
https://interactivebrokers.github.io/tws-api/connection.html
mentions (*1):
API programs always have at least two threads of execution. One thread is used for sending messages to TWS, and another thread is used for reading returned messages. The second thread uses the API EReader class to read from the socket and add messages to a queue. Every time a new message is added to the message queue, a notification flag is triggered to let other threads know that there is a message waiting to be processed. In the two-thread design of an API program, the message queue is also processed by the first thread. In a three-thread design, an additional thread is created to perform this task.
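The reader-thread/queue arrangement described in the quote can be sketched with the standard `threading` and `queue` modules. This only illustrates the two-thread design and is not the IB API: a list of strings stands in for the socket, the hypothetical `reader` function plays the role of the EReader thread, and the main thread processes the queue. The sending side is omitted.

```python
import queue
import threading


def reader(source, messages):
    """Reader thread: pulls raw messages off the 'socket' and queues them."""
    for raw in source:       # hypothetical stand-in for reading from a socket
        messages.put(raw)    # put() also wakes any thread blocked in get()
    messages.put(None)       # sentinel: connection closed


def process_messages(messages):
    """Runs on the main thread, as in the two-thread design."""
    handled = []
    while True:
        msg = messages.get()  # blocks until the reader signals a new message
        if msg is None:
            break
        handled.append(msg.upper())  # placeholder for real message handling
    return handled


if __name__ == "__main__":
    messages = queue.Queue()
    t = threading.Thread(target=reader, args=(["tick", "fill"], messages), daemon=True)
    t.start()
    print(process_messages(messages))  # ['TICK', 'FILL']
    t.join()
```

In a three-thread design, `process_messages` would simply run in its own thread instead of the main one.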
The phrase "The two-threaded design is used in the IB Python sample Program.py..." suggests that there are already two threads involved, which is a little confusing to me since the second reference mentions Python being single-threaded.

Python is not technically single-threaded: you can create multiple threads in Python, but there is the GIL, which only allows one thread to run at a time. That is why it is sometimes called single-threaded! But the GIL handles this so efficiently that it doesn't feel single-threaded! I have used multithreading in Python and it works well. The GIL handles all the orchestration of switching between threads; this gives single-threaded programs a small speed boost, at the cost of making multithreaded programs a bit slower.
I am also searching for a multithreaded SDK for the IB API! I have not found one yet, except the native one, which is a bit complicated for me.
And IB_Insync does not allow multithreading :(
Btw, I am new to Stack Overflow, so don't mind me ...

How to create cancellable tasks in Python?

I'm building a Python IDE, which needs to highlight all occurrences of the name under cursor (using Jedi library). The process of finding the occurrences can be quite slow.
In order to avoid freezing the GUI, I could run the search in another thread, but when the user moves quickly over several words, the background threads could pile up while working on now obsolete tasks. I would like to cancel the search for previous occurrences when user moves to new name.
Killing a thread looks complicated in Python. What other options are there for creating easily cancellable background tasks in Python 3.4+?

I think concurrent.futures is the answer.
You can create a Thread / Process pool, submit any callable, receive a Future, which you can cancel if needed.
Reference: https://docs.python.org/3/library/concurrent.futures.html
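A minimal sketch of that approach, assuming a hypothetical `find_occurrences` function in place of the real Jedi call. One caveat from the `concurrent.futures` documentation matters here: `Future.cancel()` only succeeds while the call is still waiting in the pool's queue and returns `False` once it has started running. So this discards stale searches that have not begun yet, but cannot interrupt one already in progress. A single worker keeps at most one search running at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def find_occurrences(name):
    """Hypothetical stand-in for the slow Jedi search."""
    time.sleep(0.2)
    return [f"{name}@line1", f"{name}@line7"]


executor = ThreadPoolExecutor(max_workers=1)
pending = None  # Future for the most recently requested search


def on_cursor_moved(name):
    """Submit a search for the new name and try to cancel the previous one."""
    global pending
    if pending is not None:
        pending.cancel()  # no effect if that search is already running
    pending = executor.submit(find_occurrences, name)
    return pending
```

If the cursor moves over `a`, `b` and `c` in quick succession, the search for `a` (already running) completes, the queued search for `b` is cancelled, and only `c` is searched next.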

A thread cannot be stopped by another one. This is not really a Python quirk: forcibly killing a thread is unsafe at the OS level too, so Python offers no API for it. The only thing you can do is periodically inspect a variable and, if it is set, have the thread stop itself (just return).
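A minimal sketch of that cooperative approach with `threading.Event`, assuming the search can be broken into small steps (simulated here with `time.sleep`; `CancellableSearch` is a hypothetical name). The flag is only a request: the thread notices it at its next check and returns.

```python
import threading
import time


class CancellableSearch(threading.Thread):
    """Worker that checks a stop flag between units of work and returns early."""

    def __init__(self, items):
        super().__init__(daemon=True)
        self.items = items
        self.results = []
        # Not named `_stop`: CPython's threading.Thread uses that name internally.
        self._stop_requested = threading.Event()

    def cancel(self):
        self._stop_requested.set()  # a request only; the thread decides when to stop

    def run(self):
        for item in self.items:
            if self._stop_requested.is_set():
                return               # the thread stops itself by returning
            time.sleep(0.01)         # placeholder for one slice of the real search
            self.results.append(item)
```

The finer the slices of work, the sooner a cancelled search actually stops.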
Moreover, threads in Python suffer from the GIL. This means that CPU-intensive operations, when carried out in a separate thread, will still affect your main loop, as only one thread per process can execute Python bytecode at a time.
I'd recommend running the search in a separate process, which you can easily cancel whenever you want.
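A sketch of the process-based alternative with the standard `multiprocessing` module, assuming a hypothetical `search` function: unlike a thread, a child process can be killed from outside with `terminate()`. The `multiprocessing` documentation warns that terminating a process while it uses a pipe or queue can corrupt that pipe or queue, so the sketch simply discards the queue together with the stale process.

```python
import multiprocessing as mp
import time


def search(name, results):
    """Hypothetical slow search running in a child process."""
    time.sleep(5)  # placeholder for a long, CPU-heavy Jedi call
    results.put(f"occurrences of {name}")


def start_search(name):
    results = mp.Queue()
    proc = mp.Process(target=search, args=(name, results), daemon=True)
    proc.start()
    return proc, results


if __name__ == "__main__":
    proc, results = start_search("foo")
    time.sleep(0.2)     # the user moves the cursor to another name...
    proc.terminate()    # ...so the stale search is simply killed
    proc.join()
    print(proc.exitcode)  # non-zero: the process was terminated
```

The `if __name__ == "__main__":` guard is needed on platforms that start child processes with the spawn method (the default on Windows and macOS).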
What the YouCompleteMe guys are doing, for example, is wrapping Jedi in an HTTP server which they can query in the background. If the user moves the cursor before the completion comes back, the IDE can simply drop the request.

Well, my personal favorites are work queues. If it's a one-time application you should take a look at Python RQ; it's extremely easy and fun to use. If you want to build something more "professional-grade", take a look at something like Celery.
You might also want to look at multiprocessing.

High availability for Python's asyncio

I am trying to create an asynchronous application using Python's asyncio module. However, all the implementations I can find in the documentation are based on a single event loop.
Is there any way to launch multiple Event Loops running the same application, so I can achieve high availability and fault tolerance? In other words, I want to scale-out my application by inserting new nodes that would share the execution of coroutines behind a load balancer.
I understand there is an inherent issue between asynchronous programming and thread-safety; perhaps what I have in mind isn't even possible. If so, how can I avoid this kind of SPOF in asynchronous architectures?

The standard way to deal with this is by starting multiple server processes (each with its own event loop), with a load balancer in front. Each such process typically cannot utilize more than one CPU core, so you might want to have as many processes as you have cores.
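A minimal sketch of that layout, assuming a toy line-based protocol and leaving the load balancer itself out: `start_workers` launches one process per port, and each process runs its own independent asyncio event loop via `asyncio.run`. Replies are tagged with the process ID so you can see which worker answered. The function names are illustrative.

```python
import asyncio
import multiprocessing as mp
import os


async def handle(reader, writer):
    """Echo one line back, tagged with the PID of the process that served it."""
    line = await reader.readline()
    writer.write(f"{os.getpid()}: ".encode() + line)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def serve(port, ready):
    server = await asyncio.start_server(handle, "127.0.0.1", port)
    ready.set()  # tell the parent this worker is accepting connections
    async with server:
        await server.serve_forever()


def worker(port, ready):
    asyncio.run(serve(port, ready))  # one independent event loop per process


def start_workers(ports):
    """Start one server process per port; a load balancer would sit in front."""
    procs = []
    for port in ports:
        ready = mp.Event()
        p = mp.Process(target=worker, args=(port, ready), daemon=True)
        p.start()
        ready.wait(timeout=10)
        procs.append(p)
    return procs
```

A load balancer would then be pointed at the listed ports; if one worker process dies, the others keep serving.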

I've done this before. I even wrote code to monitor the processes I spawned. But it turned out that Python & asyncio are quite stable by themselves, I never saw a critical error that stopped the whole event loop. So I don't recommend spawning multiple processes for the sole purpose of high availability.
The code is here if you are interested: https://github.com/l04m33/shinpachi
You may want to check out shinpachi/__init__.py and shinpachi/processes.py.

How to implement pause (and more) functionality?

My apologies beforehand for the length of the question, I didn't want to leave anything out.
Some background information
I'm trying to automate a data entry process by writing a Python application that uses the Windows API to simulate keystrokes, mouse movement and window/control manipulation. I have to resort to this method because I do not (yet) have the security clearance required to access the datastore/database directly (e.g. using SQL) or indirectly through a better suited API. Bureaucracy, it's a pain ;-)
The data entry process involves the correction of sales orders due to changes in article availability. The unavailable articles are either removed from the order or replaced by another suitable article.
Initially I want a human to be able to monitor the automatic data entry process to make sure everything goes right. To achieve this I slow down the actions on the one hand but also inform the user of what is currently going on through a pinned window.
The actual question
To allow the user to halt the automation process I'm registering the Pause/Break key as a hotkey and in the handler I want to pause the automation functionality. However, I'm currently struggling to figure out a way to properly pause the execution of the automation functionality. When the pause function is invoked I want the automation process to stop dead in its tracks, no matter what it is doing. I don't want it to even execute another keystroke.
UPDATE [23/01]: I actually want to do more than just pause, I want to be able to communicate with the automation process while it is running and request it to pause, skip the current sales order, give up completely and perhaps even more.
Can anybody show me The Right Way (TM) to achieve what I want?
Some more information
Here's an example of how the automation works (I'm using the pywinauto library):
from pywinauto import application
app = application.Application()
app.start_("notepad")
app.Notepad.TypeKeys("abcdef")
UPDATE [25/01]: After a few days of working on my application I've noticed I don't really use pywinauto that much. Right now I'm only using it for finding windows, and then I directly use SendKeysCtypes.SendKeys to simulate keyboard input and win32api functions to simulate mouse input.
What I've found out so far
Here are a few methods I've come across so far in my search for an answer:
I could separate the automation functionality and the interface + hotkey listener in two separate processes. Let's refer to the former as "automator" and the latter as "manager". The manager can then pause the execution of the automator by sending the process a SIGSTOP signal and unpause it using the SIGCONT signal (or the Windows equivalents through SuspendThread/ResumeThread).
To be able to update the user interface the automator will need to inform the manager of its progression through some sort of an IPC mechanism.
Cons:
Would using SIGSTOP not be a little harsh? Would it even work properly? Lots of people seem to be advising against it and even calling it "dangerous".
I am worried that implementing the IPC mechanism is going to be a bit complicated. On the other hand, I have worked with DBus which wouldn't be too hard to implement.
The second method and one that lots of people seem to be suggesting involves using threads and essentially boils down to the following (simplified):
while True:
    while self.pause:   # pause requested: wait here until the flag is cleared
        time.sleep(0.1)
    # Do the work...
However, done this way it seems it will only pause once the current work is finished. The only way I see this method working would be to divide the work (the entire automation process) into smaller work segments (i.e. tasks). Before starting on a new task, the worker thread would check whether it should pause and wait.
Cons:
It seems like an implementation that divides the work into smaller segments, such as the one above, would be very ugly code-wise (aesthetically).
The way I imagine it, all statements would be transformed to look something like: queue.put((function, args)) (e.g. queue.put((app.Notepad.TypeKeys, "abcdef"))) and you'd have the automating process thread running through the tasks and continuously checking for the pause state before starting a task. That just can't be right...
The program would not actually stop dead in its tracks, but would first finish a task (however small) before actually pausing.
Progress made
UPDATE [23/01]: I've implemented a version of my application using the first method through the mentioned SuspendThread/ResumeThread functionality. So far this seems to work very nicely and also allows me to write the automation stuff just like you'd write any other script. The only quirk I've come across is that keyboard modifiers (CTRL, ALT, SHIFT) get "stuck" while paused. Something I can probably easily work around.
I've also written a test using the second method (threads and signals/message passing) and implemented the pause functionality. However, it looks really ugly (both checking for the pause flag and everything related to the "doing the work"). So if anybody can show me a proper example of something similar to the second method I'd appreciate it.
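One way to write the second method without turning every statement into a queued task is to let the worker call a single `checkpoint()` between steps, sketched here with `threading.Condition` (the primitive Alex Martelli mentions in the answer quoted further down). The `Control` class, the exception names and `run_automation` are hypothetical. As noted in the question, the worker still only stops at checkpoints, never in the middle of an action.

```python
import threading


class SkipOrder(Exception):
    """Raised at a checkpoint to abandon the current sales order."""


class AbortAutomation(Exception):
    """Raised at a checkpoint to give up completely."""


class Control:
    """Shared state the manager (GUI/hotkey thread) uses to steer the worker."""

    def __init__(self):
        self._cond = threading.Condition()
        self._paused = False
        self._command = None  # None, "skip" or "abort"

    def pause(self):
        with self._cond:
            self._paused = True

    def resume(self):
        with self._cond:
            self._paused = False
            self._cond.notify_all()

    def request(self, command):
        """Ask the worker to 'skip' the current order or 'abort' entirely."""
        with self._cond:
            self._command = command
            self._paused = False  # a paused worker must wake up to obey
            self._cond.notify_all()

    def checkpoint(self):
        """Called by the worker between steps: blocks while paused, raises on commands."""
        with self._cond:
            while self._paused:
                self._cond.wait()
            command, self._command = self._command, None
        if command == "skip":
            raise SkipOrder()
        if command == "abort":
            raise AbortAutomation()


def run_automation(orders, steps, control, log):
    """Hypothetical automation loop; log.append stands in for real keystrokes."""
    try:
        for order in orders:
            try:
                for step in steps:
                    control.checkpoint()  # safe point between two actions
                    log.append((order, step))
            except SkipOrder:
                log.append((order, "skipped"))
    except AbortAutomation:
        log.append(("aborted",))
```

The hotkey handler would call `control.pause()`, `control.resume()`, `control.request("skip")` or `control.request("abort")` from the GUI thread while `run_automation` runs in a worker thread.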
Related questions
Pausing a process?
Pausing a thread using threading class
Alex Martelli posted an answer saying:
There is no method for other threads to forcibly pause a thread (any more than there is for other threads to kill that thread) -- the target thread must cooperate by occasionally checking appropriate "flags" (a threading.Condition might be appropriate for the pause/unpause case).
He then referred to the multiprocessing module and SIGSTOP/SIGCONT.
Is there a way to indefinitely pause a thread?
Pausing a process in Windows
An answer to this question quotes the MSDN documentation regarding SuspendThread:
This function is primarily designed for use by debuggers. It is not intended to be used for thread synchronization. Calling SuspendThread on a thread that owns a synchronization object, such as a mutex or critical section, can lead to a deadlock if the calling thread tries to obtain a synchronization object owned by a suspended thread. To avoid this situation, a thread within an application that is not a debugger should signal the other thread to suspend itself. The target thread must be designed to watch for this signal and respond appropriately.
Is there any way to kill a Thread in Python?
How do I pass an exception between threads in python

Keep in mind that although at your level of abstraction "executing a keystroke" is a single atomic operation, on the machine it is implemented as a rather complicated sequence of instructions. So pausing a thread at arbitrary points could leave things in an indeterminate state. Sending SIGSTOP is just as dangerous as pausing a thread at an arbitrary point. Depending on where you are in a particular step, your automation could be broken, for example if you pause in the middle of a timing-dependent step.
It seems to me that this problem would be best solved at the level of the automation library. I'm not very familiar with the automation library that you're using. It might be worth contacting the developers of the library to see if they have any suggestions for pausing the execution of automation steps at safe sub-step levels.

I don't know pywinauto. But I'll assume that you have something like an Application class which you obtain and have methods like SendKeys/SendMouseEvent/etc to do things.
Create your own MyApplication class which holds a reference to pywinauto's application class. Provide the same methods but before each method check whether a pause event has occurred. If it has, you can jump into code which handles the pause event. That way you are checking for a pause every time you cause an event, but this all is handled by the one class without putting pause all over your code.
Once you've detected the pause you can handle it any way you like. For example, you can throw an exception to force giving up on the current task.
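A sketch of that wrapper idea, assuming nothing about pywinauto's actual classes: `PausableApp` (a hypothetical name) forwards any method call to the wrapped object, but first blocks while paused and raises a hypothetical `GiveUp` exception if an abort was requested. Because the check happens in one place, the automation script itself stays free of pause logic.

```python
import threading


class GiveUp(Exception):
    """Raised inside the automation thread to abandon the current task."""


class PausableApp:
    """Forwards method calls to the real app object, checking pause/abort first."""

    def __init__(self, app):
        self._app = app
        self.running = threading.Event()
        self.running.set()            # cleared means paused
        self.abort = threading.Event()

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself.
        target = getattr(self._app, name)
        if not callable(target):
            return target

        def guarded(*args, **kwargs):
            self.running.wait()       # block here while paused
            if self.abort.is_set():
                raise GiveUp(name)
            return target(*args, **kwargs)

        return guarded
```

For example, the worker thread uses `app = PausableApp(real_app)`; the hotkey handler calls `app.running.clear()` / `app.running.set()` to pause and resume, and `app.abort.set()` to make the next action raise `GiveUp`.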

Separating the functionality and the interface into separate threads/processes is definitely the best option IMHO; the second solution is quicker and easier, but definitely not better.
Perhaps using multiple threads and an exception would be a better idea than using multiple processes. But if you're using multiple processes, then SIGSTOP might be your only way to get it to work.
Is there anything against using 2 threads for this?
1 thread for actually executing
1 thread for reading the user input

I use Python but not pywinauto; for this sort of task I use AutoHotkey. One way to implement a simple pause in an AutoHotkey script may be to use a "toggle" key like ScrollLock and test the key state in the script. The script can also restore the key state after switching the internal pause setting on/off.

python threading/fork?

I'm making a Python script that needs to do 3 things simultaneously.
What is a good way to achieve this? Given what I've heard about the GIL, I'm not so keen on using threads anymore.
2 of the things that the script needs to do will be heavily active; they will have lots of work to do. Then I need the third thing to report to the user over a socket, when he asks, about the status of the other 2 processes (so it will be like a tiny server).
Now my question is: what would be a good way to achieve this? I don't want to have three different scripts, and because of the GIL I think threads won't give me much performance and will make things worse.
Is there a fork() for Python like in C, so that from my script I can fork 2 processes that do their job while the main process reports to the user? And how can I communicate from the forked processes with the main process?
Later edit: to be more precise, one thread should fetch emails from an IMAP server and store them in a database, another thread should get the messages that need to be sent from the DB and send them, and the main thread should be a tiny HTTP server that accepts just one URL and shows the status of those two threads in JSON format. So are threads OK? Will the work be done simultaneously, or will there be performance issues due to the GIL?

I think you could use the multiprocessing package, which has an API similar to the threading package and will allow you to get better performance on multiple cores.
To see the performance gain of using multiprocessing instead of threading, check this link about the average time comparison of the same program using multiprocessing vs. threading.

The GIL is really only something to care about if you want to do CPU-bound work in parallel, that is, spread the load over several cores/processors. If that is the case, and it kind of sounds like it from your description, use multiprocessing.
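A minimal sketch of that suggestion for the setup described in the question, assuming a hypothetical `worker` function in place of the IMAP fetcher and the mail sender: each runs in its own process and reports progress over a `multiprocessing.Queue`, and the main process keeps the latest status per worker, which the tiny HTTP server could return as JSON (the server itself is not shown).

```python
import multiprocessing as mp
import queue
import time


def worker(name, jobs, status):
    """Hypothetical worker, e.g. the IMAP fetcher or the mail sender."""
    for done in range(1, jobs + 1):
        time.sleep(0.01)          # placeholder for real work
        status.put((name, done))  # report progress to the main process


def collect(status, latest, timeout=0.1):
    """Drain pending status updates into `latest` (runs in the main process)."""
    while True:
        try:
            name, done = status.get(timeout=timeout)
        except queue.Empty:
            return latest
        latest[name] = done


def run(jobs=5):
    status = mp.Queue()
    procs = [mp.Process(target=worker, args=(name, jobs, status))
             for name in ("fetcher", "sender")]
    for p in procs:
        p.start()
    latest = {}
    while any(p.is_alive() for p in procs):
        collect(status, latest)   # a status server would serve `latest` as JSON
    collect(status, latest)       # pick up updates sent just before exit
    for p in procs:
        p.join()
    return latest


if __name__ == "__main__":
    print(sorted(run().items()))  # [('fetcher', 5), ('sender', 5)]
```

The queue is drained before joining the workers, which avoids waiting on a child that still has unsent queue items buffered.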
If you just need to do three things "simultaneously" in the sense that you need to wait in the background for things to happen, then threads are just fine. That's what threads are for in the first place. 8-I)
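A quick way to see this, assuming `time.sleep` as a stand-in for waiting on an IMAP server, an SMTP server or a socket: CPython releases the GIL while a thread is blocked in such a call, so five threads that each wait 0.2 seconds finish in roughly 0.2 seconds in total rather than 1 second. The function names are illustrative.

```python
import threading
import time


def wait_for_io(seconds, results, index):
    time.sleep(seconds)  # blocking stand-in for network I/O; releases the GIL
    results[index] = seconds


def run_concurrently(n=5, seconds=0.2):
    """Run n blocking waits in parallel threads and time the whole batch."""
    results = [None] * n
    threads = [threading.Thread(target=wait_for_io, args=(seconds, results, i))
               for i in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results
```

The same experiment with a CPU-bound loop instead of `time.sleep` would show little or no speedup, which is where the multiprocessing advice applies.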
