Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
Hi, I have a server/client model built on the SocketServer module. The server's job is to receive a test name from a client and launch that test.
The test is launched using the subprocess module.
I would like the server to keep answering clients while any new jobs are stacked on a list or queue and launched one after the other. The only restriction is that the server must not launch a test until the currently running one has completed.
Thanks
You can use the multiprocessing module to start new processes. On the server side, keep a variable that refers to the currently running process. Your SocketServer can keep running, accepting requests and storing them in a list. Every second (or at whatever interval you like), another thread checks whether the current process is still alive by calling is_alive() (for a subprocess.Popen object, poll() plays the same role). If it is dead, simply run the next test on the list.
Another (better) way is to have that checking thread call .join() on the process, so that the next line of code only runs once the current process has finished. That way you don't have to poll every second, which is more efficient.
What you might want to do is:
1. Get the test name in the server socket and put it in a Queue
2. In a separate thread, read test names from the Queue one by one
3. Execute the process and wait for it to end using communicate()
4. Keep polling the Queue for new tests; repeat steps 2 and 3 while test names are available
5. Meanwhile, the server continues receiving test names and putting them in the Queue
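The steps above can be sketched roughly as follows. This is a minimal illustration, not the poster's code: it uses Python 3 module names (queue rather than Queue), and build_command, run_tests_in_order and demo are hypothetical names. Each "test" here is only a child Python process that prints its own name; in a real server the request handler would be the one calling job_queue.put().

```python
import queue
import subprocess
import sys
import threading

STOP = None  # sentinel that tells the worker thread to exit


def build_command(test_name):
    # Hypothetical mapping from a test name to a command line; here each
    # "test" is just a child Python process that prints its own name.
    return [sys.executable, "-c", "import sys; print(sys.argv[1])", test_name]


def run_tests_in_order(job_queue, results):
    """Take test names off the queue and run them strictly one at a time."""
    while True:
        test_name = job_queue.get()  # blocks until a job is available
        if test_name is STOP:
            break
        proc = subprocess.Popen(
            build_command(test_name),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        out, _ = proc.communicate()  # wait for this test to finish
        results.append((test_name, proc.returncode, out.decode().strip()))


def demo(test_names):
    job_queue = queue.Queue()
    results = []
    worker = threading.Thread(target=run_tests_in_order, args=(job_queue, results))
    worker.start()
    # Stand-in for the socket server: it would call job_queue.put(name)
    # from its request handler and return to the client immediately.
    for name in test_names:
        job_queue.put(name)
    job_queue.put(STOP)
    worker.join()
    return results
```

Because job_queue.get() blocks, the worker thread needs no explicit polling interval, and because only one thread ever launches tests, two tests can never run at the same time.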
Closed. This question is opinion-based. It is not currently accepting answers.
Closed last month.
I use an API that has ~30 endpoints, and I have settings for how often I need to send a request to each endpoint. For some endpoints it's seconds and for others hours. I want to implement a Python app that calls each API endpoint (and executes some code) every N seconds, where N can be different for each endpoint. If one call is still in progress when a second one is due, the second one should be added to a queue (or something similar) and executed after the first one finishes.
What would be the correct way to implement this in Python?
I have some experience with RabbitMQ but I think that might be overkill for this problem.
You said "executed after the first one finishes", so it's a single-threaded program.
Just define some functions with def and then execute them one by one.
For example:
import time

def task1(n):
    print("Task1 start")
    time.sleep(n)
    print("Task1 end ")

def task2(n):
    print("Task2 start")
    time.sleep(n)
    print("Task2 end ")

task1(5)  # task1 takes 5 seconds; task2 starts only after it ends
task2(3)  # task2 takes 3 seconds to execute
You could build your code in this way:
Store the URL, method and parameters for each type of query somewhere. A dictionary would be nice: {"query1": {"url":"/a","method":"GET","parameters":None} , "query2": {"url":"/b", "method":"GET","parameters":"c"}} but you can do this any way you want, including a database if needed.
Store the relationship between query type and interval somewhere. Again, you could do this with a case statement, with a dict (maybe the same one you used above), or with an interval column in a database.
Every N seconds, push the corresponding query entry onto a queue (queue.put).
A worker built on an HTTP client library such as requests runs continuously: it removes an element from the queue, performs the HTTP request, and once it has the result it moves on to the next element.
Of course if your code is going to be distributed across multiple nodes for scalability or high availability, you will need a distributed queue such as RabbitMQ, Ray or similar.
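A minimal single-process sketch of that design might look like the following. The names scheduler, worker, run_for and the "interval" key are assumptions made for illustration, and the actual HTTP call is passed in as call_endpoint so the sketch does not depend on any particular client library. The intervals are deliberately tiny so the example finishes quickly.

```python
import heapq
import queue
import threading
import time

# Hypothetical endpoint table: one dict holding both the request details and
# the interval (in seconds) for each query type.
ENDPOINTS = {
    "query1": {"url": "/a", "method": "GET", "parameters": None, "interval": 0.05},
    "query2": {"url": "/b", "method": "GET", "parameters": "c", "interval": 0.1},
}


def scheduler(endpoints, jobs, stop_event):
    """Push an endpoint name onto the queue each time its interval elapses."""
    now = time.monotonic()
    due_times = [(now + cfg["interval"], name) for name, cfg in endpoints.items()]
    heapq.heapify(due_times)
    while not stop_event.is_set():
        due, name = due_times[0]  # the entry that is due soonest
        delay = due - time.monotonic()
        if delay > 0:
            stop_event.wait(delay)  # sleep, but wake early if asked to stop
            continue
        # Reschedule this endpoint, then hand the call over to the worker.
        heapq.heapreplace(due_times, (due + endpoints[name]["interval"], name))
        jobs.put(name)


def worker(endpoints, jobs, call_endpoint):
    """Run queued calls one at a time, so calls never overlap."""
    while True:
        name = jobs.get()
        if name is None:  # sentinel: no more work
            break
        call_endpoint(name, endpoints[name])


def run_for(seconds, endpoints, call_endpoint):
    jobs = queue.Queue()
    stop_event = threading.Event()
    producer = threading.Thread(target=scheduler, args=(endpoints, jobs, stop_event))
    consumer = threading.Thread(target=worker, args=(endpoints, jobs, call_endpoint))
    producer.start()
    consumer.start()
    time.sleep(seconds)
    stop_event.set()
    producer.join()
    jobs.put(None)
    consumer.join()
```

If a call takes longer than an interval, the due entries simply pile up in the queue and are executed in order once the worker is free, which matches the "add it to a queue and run it after the first one finishes" requirement.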
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I am using Python 2.7.6 and the threading module.
I am fairly new to Python threading. I am trying to write a program that reads files from a filesystem and stores some hashes in my database. There are a lot of files, and I would like to process them in threads: for example, one thread for every folder that starts with a, one thread for every folder that starts with b, and so on. Since I want to use a database connection in the threads, I don't want to spawn 26 threads at once. Instead I would like to have 10 threads running, and whenever one of them finishes I want to start a new one.
The main program should hold a list of threads with a specified maximum number of threads (e.g. 10)
The main program should start 10 threads
The main program should be notified when one thread finished
If a thread is finished start a new one
And so on ... until the job is done and every thread is finished
I am not quite sure what the main program should look like. How can I manage this list of threads without a big overhead?
I'd like to point out that Python does not handle multi-threading well for CPU-bound work: as you may or may not know, CPython comes with a Global Interpreter Lock (GIL) that prevents true parallelism, so only one thread executes Python bytecode at a time. (You will not see the execution as strictly sequential, though, because the interpreter keeps switching between threads.) Threads can still help when the work is I/O-bound, since the GIL is released while a thread waits on blocking I/O.
Take a look here for more information: http://www.dabeaz.com/python/UnderstandingGIL.pdf
That said, if you still want to do it this way, take a look at semaphores: every thread has to acquire the semaphore before doing its work, and if you initialize it to 10, only 10 threads at a time will be able to hold it.
https://docs.python.org/2/library/threading.html#threading.Semaphore
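As a rough sketch of the semaphore idea (written in Python 3 syntax, although threading.BoundedSemaphore behaves the same way in 2.7), the hypothetical helper run_limited below starts one thread per item but lets at most max_threads of them do real work at any moment. The work callable stands in for "hash the files in one folder and store the result".

```python
import threading


def run_limited(items, work, max_threads=10):
    """Start one thread per item, but let only max_threads work at once."""
    slots = threading.BoundedSemaphore(max_threads)

    def guarded(item):
        # Blocks here until one of the max_threads slots is free; the slot
        # is released automatically when the with-block exits.
        with slots:
            work(item)

    threads = [threading.Thread(target=guarded, args=(item,)) for item in items]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every item has been processed
```

Note that all the threads are created up front and the extra ones simply wait on the semaphore. If you would rather cap the number of threads that exist at all, a fixed set of worker threads reading from a queue is a common alternative.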
Hope it helps
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I work a lot with texts in Python, but I'm kinda new to the language and don't yet know how to use multi-threading in it.
My use case is the following:
Single producer P (database/XML) which generates texts T_s.
Each of the texts in T_s could be processed independently. Processed texts compose T_p set.
The resulting set is written to a text-file/XML/database by a single thread S.
Data volumes are huge, so the processing cannot keep anything in memory except the current data.
I would organize the process as follows:
The producer puts the texts into the Q_s queue.
There is a set of workers and a manager that gets texts from the queue and distributes them among the workers.
Each worker puts its processed text into Q_p.
The sink process reads processed texts from Q_p and persists them.
Beyond all that, the producer should be able to tell the manager and the sink that it has finished reading the input data source.
Summary: I have learned so far that there is a nice library/solution for each typical task in Python. Is there one for my current task?
Due to the nature of CPython (see the GIL), you will need to use multiple processes rather than threads if your tasks are CPU-bound rather than I/O-bound. Python comes with the multiprocessing module, which has everything you need to get the job done. Specifically, it has pools and process-safe queues.
In your case, you need an input queue and an output queue that you pass to each worker; the workers asynchronously read from the input queue and write to the output queue. The single-threaded producer and consumer just operate on their respective queues, keeping only what's necessary in memory. The only potential quirk is that the order of the outputs may not match the order of the inputs.
Note: you can communicate status with the JoinableQueue class (its task_done() and join() methods let you wait until every queued item has been processed).
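One way to sketch this layout is shown below. It is an illustration under several assumptions: the names worker, sink and run_pipeline are hypothetical, end-of-input is signalled with a None sentinel rather than JoinableQueue, the sink runs as a thread in the parent process, and the queues are bounded so that only a limited number of texts sit in memory at once.

```python
import multiprocessing as mp
import threading

STOP = None  # sentinel meaning "no more input"


def worker(process, q_in, q_out):
    """Read texts from q_in, process them, and write results to q_out."""
    while True:
        text = q_in.get()
        if text is STOP:
            q_out.put(STOP)  # tell the sink that this worker is done
            break
        q_out.put(process(text))


def sink(q_out, n_workers, persist):
    """Persist results until every worker has reported that it is done."""
    finished = 0
    while finished < n_workers:
        item = q_out.get()
        if item is STOP:
            finished += 1
        else:
            persist(item)


def run_pipeline(texts, process, persist, n_workers=2, ctx=None):
    # ctx lets the caller pick a start method. With "spawn", process must be
    # picklable and the call must sit under `if __name__ == "__main__":`.
    ctx = ctx or mp.get_context()
    q_in = ctx.Queue(maxsize=100)   # bounded: limits texts held in memory
    q_out = ctx.Queue(maxsize=100)
    workers = [
        ctx.Process(target=worker, args=(process, q_in, q_out))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    sink_thread = threading.Thread(target=sink, args=(q_out, n_workers, persist))
    sink_thread.start()
    for text in texts:        # the producer: feeds texts one at a time
        q_in.put(text)
    for _ in workers:         # one sentinel per worker
        q_in.put(STOP)
    for w in workers:
        w.join()
    sink_thread.join()
```

The sink drains the output queue concurrently with the producer, which avoids a deadlock when both bounded queues fill up. As noted above, results may arrive in a different order than the inputs.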
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I need to develop a Python script driven by system performance. Here is the scenario.
I need to send logs to ELK (Elasticsearch, Logstash, Kibana)
from Yocto Linux, but only when system resources are free enough.
So what I need is a Python script that continuously monitors
system performance: when a resource such as CPU usage is below 50%,
it starts sending the logs, and if CPU usage goes above 50% again, it pauses the logging.
I don't know whether we can pause a process with Python
or not. I want this for logs, so that when it starts
again it sends the logs from where it stopped last time.
Yes, all your requirements are possible in Python.
In fact it's possible in basically any language because you're not asking for cutting edge stuff, this is basic scripting.
Sending logs to ES/Kibana
It's possible. Kibana, ES and Splunk all have public APIs with good documentation on how to do it.
Pausing a process in Linux
Yes, also possible. If it's an external process, simply find the PID of the process and run kill -STOP <PID> to stop it; to resume it, run kill -CONT <PID>. If it's your own process that you want to pause, simply enter a sleep loop in your code (simple example: while PAUSED: time.sleep(0.5)).
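Both variants can be sketched in a few lines. Everything named here is an assumption for illustration: ship_logs, pause_process and resume_process are hypothetical helpers, send stands for whatever pushes one log line to ELK, and cpu_percent is any callable that returns the current CPU usage (how you measure that is up to you). The returned position is what lets a later run continue from where the previous one stopped.

```python
import os
import signal
import time


def ship_logs(lines, send, cpu_percent, threshold=50.0, poll_interval=0.5, start=0):
    """Send lines[start:] one by one, pausing while CPU usage is too high.

    Returns the index of the next unsent line, so a caller can store it
    and resume from the same place later.
    """
    position = start
    while position < len(lines):
        # The "sleep loop": wait here for as long as the system is busy.
        while cpu_percent() >= threshold:
            time.sleep(poll_interval)
        send(lines[position])
        position += 1
    return position


def pause_process(pid):
    # Same effect as `kill -STOP <PID>` (Unix only).
    os.kill(pid, signal.SIGSTOP)


def resume_process(pid):
    # Same effect as `kill -CONT <PID>` (Unix only).
    os.kill(pid, signal.SIGCONT)
```

In this sketch the CPU check happens before every line, so sending stops at the next line boundary once usage crosses the threshold.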
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I have a script A (python script) which opens the database and executes some queries and then closes the database connection.
I am not sure how long script A will run; it all depends on the load.
I have another script B (a shell script) that runs script A in a while loop, which means that script A is always running.
My database uses almost 100% or more of my CPU. I think it is because of repeatedly opening and closing connection.
Is there any way to improve performance?
I am using a MySQL database and am planning to move to PostgreSQL.
I want to store the connection somewhere and reuse it if it is still active, or create a new one otherwise. I am not sure how to do that. Any ideas?
I think it is because of repeatedly opening and closing connection.
Based on what evidence? Done any tracing/profiling to try to trace it?
All the Python interpreter starts won't help either. Overall, this all sounds very inefficient.
Personally I recommend getting rid of the shell script wrapper; do it in the same Python script. Connect once in the outer loop and re-use the same connection in each inner iteration.
You can't "save" the connection. When the script terminates, the connection closes.
You could use a connection pooler like PgBouncer to reduce the overhead of creating and destroying all those connections but it won't be as good as just doing everything within the single script.
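The "connect once and reuse the connection" suggestion could look roughly like this. The sketch uses the standard-library sqlite3 module purely as a stand-in so that it runs anywhere; with MySQL or PostgreSQL you would pass in your own driver's connect call instead. run_batches is a hypothetical name, and the list of queries stands in for the successive iterations that script B used to trigger.

```python
import sqlite3


def run_batches(connect, batches):
    """Open one connection and reuse it for every batch of work."""
    conn = connect()  # connect once, outside the loop
    try:
        results = []
        for query in batches:  # each iteration was one run of script A
            cur = conn.cursor()
            cur.execute(query)
            results.append(cur.fetchall())
            cur.close()
        return results
    finally:
        conn.close()  # close once, when all the work is done


def connect_in_memory():
    # Stand-in for a real database connection.
    return sqlite3.connect(":memory:")
```

The interpreter also starts only once this way, which removes the other source of overhead mentioned above.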
You can add a boolean flag inside script B and not execute A unless it has finished its previous run. Set the flag when you start script A and clear it when A ends. This prevents runs of A from overlapping or executing in parallel.