Python - How to manage a list of threads [closed]

I am using Python 2.7.6 and the threading module.
I am fairly new to Python threading. I am trying to write a program that reads files from a filesystem and stores some hashes in my database. There are a lot of files, and I would like to process them in threads: for example, one thread for every folder that starts with a, one thread for every folder that starts with b, and so on. Since I want to use a database connection in the threads, I don't want to start 26 threads at once. Instead, I would like to have 10 threads running, and whenever one of them finishes I want to start a new one.
The main program should hold a list of threads with a specified maximum number of threads (e.g. 10)
The main program should start 10 threads
The main program should be notified when a thread finishes
When a thread finishes, a new one should be started
And so on ... until the job is done and every thread has finished
I am not quite sure what the main program should look like. How can I manage this list of threads without much overhead?

I'd like to point out that Python does not handle multi-threading well. As you may or may not know, CPython comes with a Global Interpreter Lock (GIL) that prevents real parallelism: only one thread executes Python bytecode at a time. (You will not see the execution as a strictly sequential one, though, because the operating system's scheduler switches between the threads.) Threads that are blocked on I/O, such as file reads or database calls, do release the GIL, so threading can still help with that kind of work.
Take a look here for more information: http://www.dabeaz.com/python/UnderstandingGIL.pdf
That said, if you still want to do it this way, take a look at semaphores: every thread has to acquire the semaphore before it starts working, and if you initialize it to 10, only 10 threads at a time will be able to hold it.
https://docs.python.org/2/library/threading.html#threading.Semaphore
Hope it helps
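As a rough illustration of the semaphore idea, here is a minimal sketch. It is written for Python 3 (the question uses 2.7, where the threading API is very similar), and `process_folder` with its `time.sleep` call is only a hypothetical stand-in for the real hashing and database work.

```python
import threading
import time

MAX_WORKERS = 10  # limit taken from the question
slots = threading.BoundedSemaphore(MAX_WORKERS)

state_lock = threading.Lock()
running = 0   # threads currently inside the limited section
peak = 0      # highest value `running` ever reached
done = []     # folder prefixes that were processed


def process_folder(prefix):
    """Hypothetical worker: the sleep stands in for hashing files."""
    global running, peak
    with slots:  # blocks while MAX_WORKERS threads already hold a slot
        with state_lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # placeholder for the real work
        with state_lock:
            running -= 1
            done.append(prefix)


threads = [threading.Thread(target=process_folder, args=(letter,))
           for letter in "abcdefghijklmnopqrstuvwxyz"]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("processed", len(done), "folders, peak concurrency", peak)
```

All 26 threads are created up front, but at most 10 of them are inside the `with slots:` block at any moment, so the main program does not need to track which thread finished.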

Related

Python: Async IO Tasks vs. Threads [closed]

I recently heard of a feature in Python 3.7+ where asyncio introduced something called "tasks", which people refer to as background tasks. So that's my first question:
Do these tasks really run in background?
Also, when comparing asyncio tasks to threads in Python, we know that Python has a GIL, so there is no true parallelism. I know the difference in core structure, i.e. asyncio tasks run in an event loop inside a single thread, while Python threads are separate OS threads. But when it comes to speed, neither of them runs in parallel.
We can call them concurrent instead. So the second question is:
Which of these two would be faster?
A few things I have learned about memory consumption are:
Threads consume a fair amount of memory since each thread needs to have its own stack. With async code, all the code shares the same stack and the stack is kept small due to continuously unwinding the stack between tasks.
Threads are OS structures and therefore require more memory for the platform to support. There is no such problem with asynchronous tasks.
References:
What does asyncio.create_task() do?
How does asyncio actually work?
Coming to my last question:
When should you use asyncio tasks rather than threads? (This question came to mind because we can even fire an async task from sync code.)
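To make the "event loop inside a single thread" description concrete, here is a minimal sketch, assuming Python 3.7+. The coroutine name `fetch` and the delays are made up for illustration. It records the thread id seen by each task and the order in which the tasks start and finish.

```python
import asyncio
import threading


async def fetch(name, delay, log):
    """Hypothetical task: the sleep stands in for waiting on I/O."""
    log.append((name, "start", threading.get_ident()))
    await asyncio.sleep(delay)  # hands control back to the event loop
    log.append((name, "end", threading.get_ident()))
    return name


async def main():
    log = []
    # create_task schedules both coroutines on the running event loop
    slow = asyncio.create_task(fetch("slow", 0.05, log))
    fast = asyncio.create_task(fetch("fast", 0.01, log))
    results = [await slow, await fast]
    return results, log


results, log = asyncio.run(main())
for name, event, thread_id in log:
    print(name, event, thread_id)
```

Both tasks report the same thread id, and the "fast" task finishes while the "slow" one is still waiting: the tasks interleave at `await` points rather than running on separate threads.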

Why is this statement true? "Tasks that spend much of their time waiting for external events are generally good candidates for threading." [closed]

I am very new to the concept of threading. I was going over the content on this site about threading and came across the claim that tasks that spend much of their time waiting for external events are generally good candidates for threading. Why is this statement true?
Threading allows for efficient CPU usage. A task that spends a lot of time waiting for other events to finish can be put to sleep (that is, temporarily stopped) when it runs in its own thread.
While a thread is asleep, the CPU it was running on is free to execute other tasks until the thread is woken up again.
The ability to sleep and wake up allows:
(1) Faster computation without much overhead
(2) A reduction in wasted computational resources
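A small experiment can illustrate this. The sketch below assumes that `time.sleep` is a fair stand-in for waiting on an external event; the constants are arbitrary. Four threads each wait 0.25 seconds, and the total elapsed time is measured.

```python
import threading
import time

WAIT_SECONDS = 0.25  # arbitrary wait, stands in for a network or disk delay
N_TASKS = 4


def wait_for_external_event():
    time.sleep(WAIT_SECONDS)  # the thread sleeps; the CPU is free meanwhile


start = time.monotonic()
threads = [threading.Thread(target=wait_for_external_event)
           for _ in range(N_TASKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

print("sequential would take about", WAIT_SECONDS * N_TASKS, "s")
print("threaded run took", round(elapsed, 2), "s")
```

Because the waits overlap, the threaded run should take roughly the time of one wait instead of the sum of all four.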
Alternative viewpoint:
I don't know about Python specifically, but in many other programming languages/libraries, there will be some kind of "async I/O" mechanism, or "select" mechanism, or "event-driven" paradigm that enables a single-threaded program to actively do one thing while simultaneously waiting for multiple other things to happen.
The problem that threads solve comes from the fact that each independent source of input/events typically drives some independent "activity," and each of those activities has its own separate state. In an async/event-driven programming style, the state of each activity must be explicitly represented in some set of variables: When the program receives an event that drives activity X, it has to be able to examine those variables so that it can "pick up where it left off" last time it was working on activity X.
With threads, part or all of the state of activity X can be implicit in the X thread's context (that is, in the value of its program counter, in its registers, and in the contents of its call stack). The code for a thread that handles one particular activity can look a lot like the pure-procedural code that we all first learned to write when we were rank beginners, and much more familiar than any system of "event handlers" and explicit state variables.
The downside of using multiple threads is that the familiar look and feel of the code can lull us into a false sense of security: we can easily overlook the possibility of deadlocks, race conditions, and the other hazards to which multi-threading exposes us. Multi-threaded code can be easy to read, but it can be much harder to write without making subtle mistakes that are hard to catch in testing.
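The contrast between explicit and implicit state can be sketched in Python. Both versions below handle the same toy activity, "receive two numbers and add them"; the names `SumTwoHandler` and `sum_two_thread` are invented for this illustration and do not come from any library.

```python
import queue
import threading


class SumTwoHandler:
    """Event-driven style: state between events lives in explicit attributes."""

    def __init__(self):
        self.first = None   # explicit "where did I leave off?" state
        self.result = None

    def on_event(self, value):
        if self.first is None:
            self.first = value                 # remember it, return to the loop
        else:
            self.result = self.first + value   # second event completes the job


def sum_two_thread(inbox, outbox):
    """Threaded style: state is implicit in local variables and the call stack."""
    first = inbox.get()    # blocks here until the first value arrives
    second = inbox.get()   # blocks again; `first` simply stays on the stack
    outbox.put(first + second)


# Event-driven version: something else calls the handler once per event.
handler = SumTwoHandler()
for event in (3, 4):
    handler.on_event(event)

# Threaded version: the same events are delivered through a queue.
inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=sum_two_thread, args=(inbox, outbox))
worker.start()
for event in (3, 4):
    inbox.put(event)
worker.join()
threaded_result = outbox.get()

print(handler.result, threaded_result)
```

The threaded function reads top to bottom like ordinary procedural code, while the handler has to reconstruct its position from `self.first` on every call.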

How to get all threads ready and send all of them at once [closed]

Is there any way to improve the performance of a Python script by getting all threads ready and releasing all of them at once?
For example, "get ready" 100 different threads with HTTP requests, and when they are all ready, release them at the same time with the smallest delay possible.
Is it possible to get all threads ready (for example 500 threads) and send all of them without waiting?
Yes.
What you need is a synchronization object. Basically you start all threads, but they try to acquire access to a resource, which is not possible initially. When all 500 threads are waiting, you release that resource and all 500 threads will run.
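One possible way to sketch this in Python uses `threading.Event` as the "resource" that is initially unavailable. This is only an illustration: the thread count is reduced to 20, and the HTTP request is replaced by a timestamp so the example stays self-contained.

```python
import threading
import time

N_THREADS = 20                   # the question mentions 100 to 500
ready = threading.Semaphore(0)   # counts threads that have reached the gate
go = threading.Event()           # the gate itself, closed at first
start_times = []
times_lock = threading.Lock()


def worker():
    # ... prepare the request here ...
    ready.release()              # tell the main thread "I am ready"
    go.wait()                    # block until the gate opens
    started = time.monotonic()   # the HTTP request would be sent here
    with times_lock:
        start_times.append(started)


threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()

for _ in range(N_THREADS):       # wait until every thread is at the gate
    ready.acquire()

release_time = time.monotonic()
go.set()                         # open the gate for all threads at once
for t in threads:
    t.join()

print("spread between first and last start:",
      max(start_times) - min(start_times), "s")
```

`threading.Barrier` would be an alternative with similar behavior. In either case the threads become runnable together, but how closely together they actually run is up to the operating system's scheduler.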
Please note that
a typical computer can only run as many threads truly in parallel as it has CPU cores, e.g. 8 threads on an 8-core CPU. So starting 500 threads with 1 HTTP request each will likely perform about the same as running 8 threads that each do around 62 HTTP requests in a loop.
specifically for Python, the GIL (global interpreter lock) prevents threads from executing Python code in parallel, so for real parallelism you need multiprocessing rather than multithreading.
this seems to be intended for load testing. Software built specifically for that purpose already exists and is reliable and tested. Don't reinvent the wheel; that's error-prone.
thread scheduling on Windows is done in intervals of roughly 15.6 ms by default, AFAIK. That's because a hardware timer causes an interrupt, which gives the kernel control over the CPU. So your 10 ms requirement may not be achievable.

Python. Threading [closed]

Hi, I have a server/client model using the SocketServer module. The server's job is to receive a test name from the clients and launch the test.
The test is launched using the subprocess module.
I would like the server to keep answering clients, with any new jobs placed on a list or queue and launched one after the other. The only restriction is that the server must not launch a test until the currently running one has completed.
Thanks
You can use the multiprocessing module to start new processes. On the server side, you would keep a variable that refers to the currently running process. You can still have your SocketServer running, accepting requests and storing them in a list. Every second (or whatever interval you want), in another thread, you would check whether the current process is still running by calling is_alive(). If it has finished, simply run the next test on the list.
Another (better) way to do it is to call .join() on the process from that checking thread, so that the next line of code only runs once the current process has finished. That way you don't have to keep polling every second, which is more efficient.
What you might want to do is:
Get test name in server socket, put it in a Queue
In a separate thread, read test names from the Queue one by one
Execute the process and wait for it to end using communicate()
Keep polling Queue for new tests, repeat steps 2, 3 if test names are available
Meanwhile server continues receiving and putting test names in Queue
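The steps above can be sketched as follows. This is an illustration under assumptions, not the poster's code: it targets Python 3 (where the Queue module is named `queue`), leaves out the socket server, and runs a trivial Python one-liner as a hypothetical stand-in for the real test command.

```python
import queue
import subprocess
import sys
import threading

test_queue = queue.Queue()
results = []
STOP = object()  # sentinel that tells the runner thread to exit


def run_tests():
    """Runs queued tests strictly one after another."""
    while True:
        name = test_queue.get()   # blocks until a test name is available
        if name is STOP:
            break
        # Hypothetical test command: just echoes the test name.
        proc = subprocess.Popen(
            [sys.executable, "-c",
             "import sys; print('ran ' + sys.argv[1])", name],
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )
        out, _ = proc.communicate()   # waits for the test to finish
        results.append((name, proc.returncode, out.strip()))


runner = threading.Thread(target=run_tests)
runner.start()

# In the real server, the request handler would call test_queue.put(name).
for name in ("test_a", "test_b", "test_c"):
    test_queue.put(name)

test_queue.put(STOP)
runner.join()
print(results)
```

Because `test_queue.get()` blocks, the runner thread does not need to poll, and `communicate()` ensures the next test only starts after the current one has ended.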

Parallelizing a program in Python [closed]

I work a lot with texts in Python, but I'm fairly new to the language and don't yet know how to use multi-threading in it.
My use case is the following:
Single producer P (database/XML) which generates texts T_s.
Each of the texts in T_s could be processed independently. Processed texts compose T_p set.
The resulting set is written to a text-file/XML/database by a single thread S.
Data volumes are huge, so the processing cannot keep anything in memory except the data currently being handled.
I would organize the process as follows:
The producer puts the texts into the Q_s queue.
There is a set of workers and a manager that gets texts from the queue and distributes them among the workers.
Each worker puts the processed text into Q_p.
The sink process reads processed texts from Q_p and persists them.
Beyond all that, the producer should be able to tell the manager and the sink that it has finished reading the input data source.
Summary: I have learned so far that there is a nice library or solution for each of the typical tasks in Python. Is there one for my current task?
Due to the nature of CPython (see the GIL), you will need to use multiple processes rather than threads if your tasks are CPU-bound rather than I/O-bound. Python comes with the multiprocessing module, which has everything you need to get the job done. Specifically, it has pools and process-safe queues.
In your case, you need an input queue and an output queue that you pass to each worker; the workers asynchronously read from the input queue and write to the output queue. The single-threaded producer and consumer just operate on their respective queues, keeping only what's necessary in memory. The only potential quirk here is that the order of the outputs may not match the order of the inputs.
Note: you can communicate status with the JoinableQueue class.
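A minimal sketch of such a pipeline is shown below. It makes several assumptions that are not in the answer: `text.upper()` stands in for the real processing, `None` is used as an end-of-input marker (instead of JoinableQueue), the queues are bounded so that only a limited amount of data is in flight, and the producer runs in a helper thread so it cannot block the sink.

```python
import multiprocessing as mp
import threading

SENTINEL = None  # end-of-input marker; assumes None is never a real text


def worker(q_in, q_out):
    # Read texts until the sentinel arrives, then pass the sentinel on.
    for text in iter(q_in.get, SENTINEL):
        q_out.put(text.upper())   # placeholder for the real processing
    q_out.put(SENTINEL)


def produce(texts, q_in, n_workers):
    for text in texts:
        q_in.put(text)            # blocks while the bounded queue is full
    for _ in range(n_workers):
        q_in.put(SENTINEL)        # one end marker per worker


def run_pipeline(texts, n_workers=3, max_in_flight=100):
    q_in = mp.Queue(maxsize=max_in_flight)
    q_out = mp.Queue(maxsize=max_in_flight)
    workers = [mp.Process(target=worker, args=(q_in, q_out))
               for _ in range(n_workers)]
    for w in workers:
        w.start()

    producer = threading.Thread(target=produce, args=(texts, q_in, n_workers))
    producer.start()

    # Sink: runs in the main process and stops once every worker is done.
    processed, finished = [], 0
    while finished < n_workers:
        item = q_out.get()
        if item is SENTINEL:
            finished += 1
        else:
            processed.append(item)  # a real sink would write to file or DB

    producer.join()
    for w in workers:
        w.join()
    return processed


if __name__ == "__main__":
    print(sorted(run_pipeline(["alpha", "beta", "gamma", "delta"])))
```

The output order depends on which worker finishes first, which is why the usage line sorts the result before printing it.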
