Why pass a Queue and database connection as parameters in multithreading? [closed] - python

I was reading about multi-threading with a priority queue here. In that example I don't understand why workQueue is passed as a parameter to __init__ in class myThread; we could have used workQueue directly instead of self.q. So I rewrote it without the parameter and it worked. But then I tried to do the same thing for connecting to a database: I opened one shared DB connection and let every thread use it. That did not work (my update was not reflected in the database). I thought that because the threads were being pre-empted, it was not possible for them to keep the connection to execute the query. Then I gave every thread its own DB connection, passed in through __init__, basically implementing this, and to my surprise that worked. How is it different from what I was doing?

I don't understand why workQueue is passed as a parameter to __init__ in class myThread; we could have used workQueue directly instead of self.q
In this particular example, sure, you could just reference the global workQueue variable.
But that's not a very general approach; global variables often create a mess. What if you want the object to be able to work with several different work queues for different purposes? Better to pass in the queue you want the object to work with, instead of having the object reference a global variable.
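As a rough sketch of that parameterised style (the Worker class, the queue names and the job strings are just illustrative, not taken from the tutorial):

import queue
import threading

class Worker(threading.Thread):
    def __init__(self, q):
        threading.Thread.__init__(self)
        self.q = q                      # work with whichever queue we were given

    def run(self):
        while True:
            item = self.q.get()
            if item is None:            # sentinel value: stop this worker
                break
            print("processing", item)

high_priority = queue.Queue()
low_priority = queue.Queue()

# The same class serves two different queues, no globals needed.
workers = [Worker(high_priority), Worker(low_priority)]
for w in workers:
    w.start()

high_priority.put("urgent job")
low_priority.put("background job")
for q in (high_priority, low_priority):
    q.put(None)                         # tell each worker to shut down
for w in workers:
    w.join()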
I opened one shared DB connection and let every thread use it.
Database connections are not thread safe, so expect random stuff to happen when you do that.
As the documentation states:
The MySQL protocol can not handle multiple threads using the same connection at once. ... The general upshot of this is: Don't share connections between threads.
So what you should do is use one connection per thread, which, as you discovered, works fine. This is different from how the Queue is used: in the example code it is properly locked whenever you access it.
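For example, a minimal sketch with one connection per worker thread and a single shared queue (the connection parameters and the INSERT statement are placeholders, assuming the MySQLdb driver):

import queue
import threading
import MySQLdb

work_queue = queue.Queue()              # the Queue does its own locking

def worker():
    # Each thread opens its own connection instead of sharing one.
    conn = MySQLdb.connect(host="localhost", user="user",
                           passwd="secret", db="testdb")
    cursor = conn.cursor()
    while True:
        item = work_queue.get()
        if item is None:                # sentinel value: shut this worker down
            break
        cursor.execute("INSERT INTO results (value) VALUES (%s)", (item,))
        conn.commit()
    conn.close()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for value in ["a", "b", "c"]:
    work_queue.put(value)
for _ in threads:
    work_queue.put(None)
for t in threads:
    t.join()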

According to the documentation:
The MySQL protocol can not handle multiple threads using the same connection at once.
That's why it doesn't work: you can't share a DB connection (at least not with MySQL) between threads.
The example you linked to is creating a connection for each thread:
for thread in range(threads):
    try:
        connections.append(MySQLdb.connect(host=mysql_host, user=mysql_user, passwd=mysql_pass, db=mysql_db, port=mysql_port))
    except MySQLdb.Error:
        pass   # error handling from the linked example is omitted here


Sending requests to different API endpoints every N seconds [closed]

I use an API that has ~30 endpoints, and I have settings for how often I need to send a request to each endpoint. For some endpoints it's seconds, for others hours. I want to implement a Python app that will call each API endpoint (and execute some code) every N seconds, where N can be different for each endpoint. If one call is still in progress when the next one kicks in, that call should be added to a queue (or something similar) and executed after the first one finishes.
What would be the correct way to implement this using python?
I have some experience with RabbitMQ but I think that might be overkill for this problem.
You said "executed after the first one finishes", so it's a single thread program.
Just use def() to create some functions and then execute them one by one.
For example
import time

def task1(n):
    print("Task1 start")
    time.sleep(n)
    print("Task1 end")

def task2(n):
    print("Task2 start")
    time.sleep(n)
    print("Task2 end")

task1(5)  # after 5 seconds task1 ends and task2 executes
task2(3)  # task2 needs 3 seconds to execute
You could build your code this way:
Store somewhere the URL, method and parameters for each type of query. A dictionary would be nice: {"query1": {"url": "/a", "method": "GET", "parameters": None}, "query2": {"url": "/b", "method": "GET", "parameters": "c"}}, but you can do this any way you want, including a database if needed.
Store somewhere the relationship between query type and interval. Again, you could do this with a case statement, with a dict (maybe the same one you used above), or with an interval column in a database.
Every N seconds, push the corresponding query entry onto a queue (queue.put).
A worker using an HTTP client library such as requests runs continuously: it takes an element from the queue, runs the HTTP request, and when the result comes back it moves on to the next element. A minimal sketch of this is shown after the last point.
Of course, if your code is going to be distributed across multiple nodes for scalability or high availability, you will need a distributed queue such as RabbitMQ, Ray or similar.
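As a rough single-process sketch of that design (the endpoint paths, intervals and base URL are made up, and the requests library is assumed to be available):

import queue
import threading
import time

import requests

# Hypothetical configuration: endpoint path -> polling interval in seconds
ENDPOINTS = {
    "/status": 5,
    "/metrics": 30,
}
BASE_URL = "https://api.example.com"    # placeholder

work_queue = queue.Queue()

def producer(path, interval):
    # Push the endpoint onto the queue every `interval` seconds.
    while True:
        work_queue.put(path)
        time.sleep(interval)

def worker():
    # Process queued requests one at a time, in arrival order.
    while True:
        path = work_queue.get()
        response = requests.get(BASE_URL + path, timeout=10)
        print(path, response.status_code)   # execute your real code here
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
for path, interval in ENDPOINTS.items():
    threading.Thread(target=producer, args=(path, interval), daemon=True).start()

while True:                             # keep the main thread alive
    time.sleep(1)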

Python. Threading [closed]

Hi, I have a server/client model using the SocketServer module. The server's job is to receive test names from the clients and launch the tests.
The tests are launched using the subprocess module.
I would like the server to keep answering clients while new jobs are stacked on a list or queue and launched one after the other. The only restriction I have is that the server should not launch a test unless the currently running one has completed.
Thanks
You can use the multiprocessing module to start new processes. On the server side, you would keep a variable that refers to the currently running process. Your SocketServer can keep running, accepting requests and storing them in a list. Every second (or whatever interval you want), another thread checks whether the current process is dead by calling is_alive(). If it is, simply run the next test from the list.
A better way is to have that third (checking) thread call .join() on the process, so that it only moves on to the next line of code once the current process has ended. That way you don't have to keep polling every second, and it is more efficient.
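A toy sketch of that idea (the test names and the run_test body are invented; the real tests would come from your SocketServer handler):

import multiprocessing
import threading
import time

pending_tests = []                      # filled by the SocketServer handler
pending_lock = threading.Lock()

def run_test(name):
    print("running", name)
    time.sleep(2)                       # stand-in for the real test

def runner():
    # Run queued tests one after another, each in its own process.
    while True:
        with pending_lock:
            name = pending_tests.pop(0) if pending_tests else None
        if name is None:
            time.sleep(0.5)             # nothing queued yet
            continue
        proc = multiprocessing.Process(target=run_test, args=(name,))
        proc.start()
        proc.join()                     # block until this test finishes

if __name__ == "__main__":
    threading.Thread(target=runner, daemon=True).start()
    with pending_lock:
        pending_tests.extend(["test_login", "test_search"])   # hypothetical names
    time.sleep(10)                      # give the runner time to work in this demo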
What you might want to do is:
Get test name in server socket, put it in a Queue
In a separate thread, read test names from the Queue one by one
Execute the process and wait for it to end using communicate()
Keep polling Queue for new tests, repeat steps 2, 3 if test names are available
Meanwhile server continues receiving and putting test names in Queue
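A minimal sketch of those steps (the run_test.py command is a placeholder for whatever your server actually launches):

import queue
import subprocess
import threading

test_queue = queue.Queue()

def test_runner():
    # Take test names off the queue and run them one at a time.
    while True:
        test_name = test_queue.get()
        proc = subprocess.Popen(["python", "run_test.py", test_name],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate()   # waits until the test has finished
        print(test_name, "finished with code", proc.returncode)
        test_queue.task_done()

threading.Thread(target=test_runner, daemon=True).start()

# Inside your SocketServer request handler you would just do:
# test_queue.put(received_test_name)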

Python - How to manage a list of threads [closed]

I am using Python 2.7.6 and the threading module.
I am fairly new to Python threading. I am trying to write a program that reads files from a filesystem and stores some hashes in my database. There are a lot of files, and I would like to do it in threads: one thread for every folder that starts with a, one for every folder that starts with b, and so on. Since I want to use a database connection in the threads I don't want to start all 26 threads at once. So I would like to have 10 threads running, and whenever one of them finishes I want to start a new one.
The main program should hold a list of threads with a specified max amount of threads (e.g. 10)
The main program should start 10 threads
The main program should be notified when one thread finished
If a thread is finished start a new one
And so on ... until the job is done and every thread is finished
I am not quite sure what the main program should look like. How can I manage this list of threads without a big overhead?
I'd like to point out that Python doesn't handle multi-threading very well: as you may (or may not) know, Python comes with a Global Interpreter Lock (GIL) that doesn't allow real concurrency. Only one thread executes at a time (although the execution will not look sequential, thanks to your machine's scheduler).
Take a look here for more information : http://www.dabeaz.com/python/UnderstandingGIL.pdf
That said, if you still want to do it this way, take a look at semaphores: every thread has to acquire it, and if you initialize the semaphore to 10, only 10 threads at a time will be able to hold it.
https://docs.python.org/2/library/threading.html#threading.Semaphore
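A small sketch of the semaphore approach (the folder letters and the hashing work are placeholders; note that all 26 threads are created, but only 10 run their body at any time):

import string
import threading

MAX_THREADS = 10
pool_sema = threading.BoundedSemaphore(MAX_THREADS)

def process_folder(letter):
    with pool_sema:                     # at most 10 threads hold the semaphore
        print("hashing files in folders starting with " + letter)
        # ... open a DB connection here, walk the folders, store the hashes ...

threads = []
for letter in string.ascii_lowercase:
    t = threading.Thread(target=process_folder, args=(letter,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()                            # wait until every folder is done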
Hope it helps

python non blocking recv with pipe between processes?

I've seen this line of code but could not find documentation for it:
self.conn.setblocking(0)
The question is: how do you poll a pool of pipes without blocking?
I've got a parent process that needs to communicate with some unstable child processes, and I want to poll periodically to check whether they have something to say. I don't want to block if they decide they need more time before they have something new to say. Will this magically do that?
Creating a pipe returns two connection objects. A connection object offers polling functionality, with which you can check whether there is anything to read; it also lets you specify a timeout to wait for.
If you have a group of connection objects that you are waiting on, you can use multiprocessing.connection.wait(), or the non-multiprocessing version of it.
For details, see
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.connection.Connection
which shows the connection object details. Look at the poll() function.
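A small sketch of both approaches, poll() on a single connection and wait() on a pool (the child worker and its messages are made up; wait() needs Python 3.3+):

import multiprocessing
import random
import time
from multiprocessing.connection import wait

def child(conn):
    # Pretend to be an unstable worker that reports occasionally.
    for i in range(3):
        time.sleep(random.random())
        conn.send("update %d" % i)
    conn.close()

if __name__ == "__main__":
    readers = []
    for _ in range(3):
        parent_conn, child_conn = multiprocessing.Pipe()
        multiprocessing.Process(target=child, args=(child_conn,)).start()
        readers.append(parent_conn)

    while readers:
        # conn.poll(0) would be the per-connection, non-blocking check;
        # wait() checks the whole pool with a timeout instead.
        for conn in wait(readers, timeout=0.5):
            try:
                print(conn.recv())
            except EOFError:            # child closed its end of the pipe
                readers.remove(conn)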
This is most likely what you were looking at: https://docs.python.org/2/library/socket.html#socket.socket.setblocking
You don't give much detail so I'm not exactly sure what you are trying to do, but usually when you have a number of sockets that you want to poll, you will use select (see these examples from PyMOTW).
You can check p.poll(0); if the result is True then the pipe is not empty and you can receive the data without blocking.

Mysql Connection, one or many?

I'm writing a script in python which basically queries WMI and updates the information in a mysql database. One of those "write something you need" to learn to program exercises.
In case something breaks in the middle of the script, for example the remote computer turning off, it's separated out into functions:
Query Some WMI data
Update that to the database
Query Other WMI data
Update that to the database
Is it better to open one mysql connection at the beginning and leave it open or close the connection after each update?
It seems as though one connection would use fewer resources. (Although I'm just learning, so this is a complete guess.) However, opening and closing the connection with each update seems more 'neat': functions would be more stand-alone, rather than depending on code outside that function.
"However, opening and closing the connection with each update seems more 'neat'. "
It's also a huge amount of overhead -- and there's no actual benefit.
Creating and disposing of connections is relatively expensive. More importantly, what's the actual reason? How does it improve, simplify, clarify?
Generally, most applications have one connection that they use from when they start to when they stop.
I don't think there is a "better" solution. It's too early to think about resources, and since WMI is quite slow (compared to a SQL connection) the DB is not an issue.
Just make it work, then make it better.
The good thing about working with an open connection here is that the "natural" solution is to use objects and not just functions, so it will be a learning experience (in case you are learning Python and not MySQL).
Think for a moment about the following scenario:
for dataItem in dataSet:
    update(dataItem)
If you open and close your connection inside of the update function and your dataSet contains a thousand items then you will destroy the performance of your application and ruin any transactional capabilities.
A better way would be to open a connection and pass it to the update function. You could even have your update function call a connection manager of sorts. If you intend to perform single updates periodically then open and close your connection around your update function calls.
In this way you will be able to use functions to encapsulate your data operations and be able to share a connection between them.
However, this approach is not great for performing bulk inserts or updates.
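For example, a minimal sketch of passing the connection in (the table, columns and connection parameters are invented, assuming the MySQLdb driver):

import MySQLdb

def update(conn, data_item):
    # Run one UPDATE using a connection owned by the caller.
    cursor = conn.cursor()
    cursor.execute("UPDATE wmi_info SET value = %s WHERE host = %s",
                   (data_item["value"], data_item["host"]))
    cursor.close()

data_set = [{"host": "pc-01", "value": "8GB"},      # stand-in for WMI results
            {"host": "pc-02", "value": "16GB"}]

conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="inventory")
try:
    for data_item in data_set:
        update(conn, data_item)
    conn.commit()                       # one commit for the whole batch
finally:
    conn.close()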
Useful clues in S.Lott's and Igal Serban's answers. I think you should first find out your actual requirements and code accordingly.
Just to mention a different strategy: some applications keep a pool of database (or whatever) connections and, whenever a transaction is needed, pull one from that pool. It seems rather obvious that you just need one connection for this kind of application, but you can still keep a pool of one connection and apply the following (a small sketch follows at the end of this answer):
Whenever a database transaction is needed, the connection is pulled from the pool and returned at the end.
(optional) The connection is expired (and replaced by a new one) after a certain amount of time.
(optional) The connection is expired after a certain amount of usage.
(optional) The pool can check (by sending an inexpensive query) if the connection is alive before handing it over to the program.
This is somewhat in between single connection and connection per transaction strategies.
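A toy sketch of a one-connection pool with a liveness check (the SELECT 1 probe and the reconnect handling are illustrative, again assuming MySQLdb):

import MySQLdb

class TinyPool(object):
    # Holds a single connection and replaces it if it has gone away.

    def __init__(self, **connect_kwargs):
        self._kwargs = connect_kwargs
        self._conn = MySQLdb.connect(**connect_kwargs)

    def get(self):
        try:
            self._conn.cursor().execute("SELECT 1")       # cheap liveness probe
        except MySQLdb.OperationalError:
            self._conn = MySQLdb.connect(**self._kwargs)  # expire and replace
        return self._conn

pool = TinyPool(host="localhost", user="user", passwd="secret", db="inventory")
conn = pool.get()                       # always hands back a working connection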
