Python threading test not working - python

EDIT
I solved the issue by forking the process instead of using threads. From the comments and links in the comments, I don't think threading is the right move here.
Thanks everyone for your assistance.
FINISHED EDIT
I haven't done much with threading before. I've created a few simple example "Hello World" scripts but nothing that actually did any work.
To help me grasp it, I wrote a simple script using the binaries from Nagios to query services like HTTP. The script works, but the checks run one after another: with a timeout of 1 second, if 10 services time out, the script takes over 10 seconds to finish.
What I want to do is run all checks in parallel to each other. This should reduce the time it takes to complete.
I'm currently getting segfaults, but not all the time. Strangely, at the point where I check the host in the processCheck function, I can print out all hosts. Just after checking the host, though, the hosts variable only prints one or two of the hosts in the set. I have a feeling it's a namespace issue, but I'm not sure how to resolve it.
I've posted the entire code here, sans the MySQL DB, but a result from the service_list view looks like the rows shown below.
Any assistance is greatly appreciated.
(6543L, 'moretesting.com', 'smtp')
(6543L, 'moretesting.com', 'ping')
(6543L, 'moretesting.com', 'http')
from commands import getstatusoutput
import MySQLdb
import threading
import Queue
import time
def checkHost(x, service):
    command = {}
    command['http'] = './plugins/check_http -t 1 -H '
    command['smtp'] = './plugins/check_smtp -t 1 -H '
    cmd = command[service]
    cmd += x
    retval = getstatusoutput(cmd)
    if retval[0] == 0:
        return 0
    else:
        return retval[1]
def fetchHosts():
    hostList = []
    cur.execute('SELECT veid, hostname, service from service_list')
    for row in cur.fetchall():
        hostList.append(row)
    return hostList
def insertFailure(veid, hostname, service, error):
    query = 'INSERT INTO failures (veid, hostname, service, error) '
    query += "VALUES ('%s', '%s', '%s', '%s')" % (veid, hostname, service, error)
    cur.execute(query)
    cur.execute('COMMIT')
def processCheck(host):
    #If I print the host tuple here I get all hosts/services
    retval = checkHost(host[1], host[2])
    #If I print the host tuple here, I get one host maybe two
    if retval != 0:
        try:
            insertFailure(host[0], host[1], host[2], retval)
        except:
            pass
    else:
        try:
            #if service is back up, remove old failure entry
            query = "DELETE FROM failures WHERE veid='%s' AND service='%s' AND hostname='%s'" % (host[0], host[2], host[1])
            cur.execute(query)
            cur.execute('COMMIT')
        except:
            pass
    return 0
class ThreadClass(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def run(self):
        processCheck(queue.get())
        time.sleep(1)
def main():
    for host in fetchHosts():
        queue.put(host)
        t = ThreadClass(queue)
        t.setDaemon(True)
        t.start()
if __name__ == '__main__':
    conn = MySQLdb.connect('localhost', 'root', '', 'testing')
    cur = conn.cursor()
    queue = Queue.Queue()
    main()
    conn.close()

The MySQL DB driver isn't thread safe. You're using the same cursor concurrently from all threads.
Try creating a new connection in each thread, or create a pool of connections that the threads can use (e.g. keep them in a Queue; each thread gets a connection and puts it back when it's done).
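The pooling idea can be sketched roughly as follows. This is a minimal Python 3 sketch (so the module is `queue`, not `Queue`), not the code from the question: `ConnectionPool`, `FakeConnection` and `run_checks` are hypothetical names, and `FakeConnection` stands in for a real MySQLdb connection so the example runs without a database.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: threads borrow a connection and put it back when done."""

    def __init__(self, make_connection, size):
        # make_connection is a hypothetical zero-argument factory, e.g.
        # lambda: MySQLdb.connect('localhost', 'root', '', 'testing')
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(make_connection())

    @contextmanager
    def connection(self):
        conn = self._pool.get()       # blocks until a connection is free
        try:
            yield conn
        finally:
            self._pool.put(conn)      # always hand it back, even on error


class FakeConnection:
    """Stand-in for a real DB connection; counts how often it was used."""

    def __init__(self):
        self.in_use = False
        self.uses = 0

    def execute(self, sql):
        # The pool guarantees only one thread holds this object at a time.
        assert not self.in_use, "connection shared by two threads"
        self.in_use = True
        self.uses += 1
        self.in_use = False


def run_checks(pool, n_threads):
    """Start n_threads workers that each borrow one connection from the pool."""
    def worker():
        with pool.connection() as conn:
            conn.execute("SELECT 1")

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With a pool of 3 and 10 threads, at most 3 threads touch the database at once and no connection object is ever shared between two running threads.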

You should be constructing and populating your queue first. When the entire queue is constructed and populated, then you should construct a number of threads which then, each in a loop, polls the queue and processes an item on the queue.
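A rough Python 3 sketch of that ordering, assuming a hypothetical `handle` callable that does the per-host work (in the question that would be `processCheck`). The function name `run_workers` and the `n_workers` parameter are illustrative only.

```python
import queue
import threading


def run_workers(items, handle, n_workers=4):
    """Fill the queue first, then let n_workers threads drain it."""
    work = queue.Queue()
    for item in items:
        work.put(item)

    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return        # the queue was filled up front, so empty means done
            outcome = handle(item)
            with results_lock:
                results.append((item, outcome))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()              # wait for the work instead of relying on daemon threads
    return results
```

Joining the threads also avoids the situation in the question where `main()` returns and the connection is closed while daemon threads may still be running.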

You realize Python doesn't do true multi-threading as you would expect on a multi-core processor:
See Here
And Here
Don't expect those 10 things to take 1 second each. Besides, even in true multi-threading there is a little overhead associated with the threads. I'd like to add that this isn't a slur against Python.

Related

Creating a Pool of Connections as Threads for MongoDB Connection with Python

Things I would like to Achieve:
I want to keep my MongoDB connections alive in memory, held by 3-4 threads, so that they form a pool of connections. I don't want to create a connection every time my core functions run. With a pool, I can take a connection from the pool, use it, and release it back to the pool; this is the typical use case.
What I have tried:
I thought of creating a daemon process that creates one thread per worker. The problem is how to keep the connections alive at all times, so that I can take a connection whenever I need it and release it afterwards.
Links I have referred
I read that MongoDB has an internal connection pooling mechanism, which I can use by setting maxPoolSize=200 (from https://api.mongodb.com/python/current/faq.html#how-does-connection-pooling-work-in-pymongo). But in my case one DB connection would still be opened for each process, and I can't afford that. Instead, I would like to create some connections when my server boots and keep them alive.
https://stackoverflow.com/a/14700365 for variable sharing between processes, using this I am able to talk between python scripts.
https://stackoverflow.com/a/14299004 for concurrent.futures library in Python, using this I am able to create pool.
https://realpython.com/intro-to-python-threading/ for various threading related libraries in Python
I am not able to combine above programs.
Question
Am I doing the right thing?
Are there other ways to pool the connections and keep them in RAM, so that I can access them whenever my core script needs a DB connection?
I don't want to do this via sockets, because connecting over a socket may also add overhead (I feel so, but I am not sure).
In the following script I tried to create threads with the help of a thread pool and open some connections. I am able to do so, but I am not sure how to keep the connections in memory so that I can access them each time.
import threading
import time
import logging
import configparser
from pymongo import MongoClient
logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)-9s) %(message)s',)
class ThreadPool(object):
    def __init__(self):
        super(ThreadPool, self).__init__()
        self.active = []
        self.lock = threading.Lock()
    def makeActive(self, name):
        with self.lock:
            self.active.append(name)
            logging.debug('Running: %s', self.active)
    def makeInactive(self, name):
        with self.lock:
            self.active.remove(name)
            logging.debug('Running: %s', self.active)
def f(s, pool):
    logging.debug('Waiting to join the pool')
    with s:
        name = threading.currentThread().getName()
        config = configparser.ConfigParser()
        config.read('.env')
        url = config['mongoDB']['url']
        port = config['mongoDB']['port']
        user = config['mongoDB']['user']
        password = config['mongoDB']['password']
        db = config['mongoDB']['db']
        # MongoDB URIs separate the credentials from the host with '@'
        connectionString = 'mongodb://' + user + ':' + password + '@' + url + ':' + port + '/' + db
        pool.makeActive(name)
        conn = MongoClient(connectionString)
        logging.debug(conn)
        #time.sleep(0.5)
        pool.makeInactive(name)
if __name__ == '__main__':
    pool = ThreadPool()
    s = threading.Semaphore(2)
    for i in range(10):
        t = threading.Thread(target=f, name='thread_'+str(i), args=(s, pool))
        t.daemon = True
        t.start()

When using multiprocessing to access MySQL, it always throws the following error, how to fix?

When I use Python multiprocessing to access a MySQL database, I always get these errors:
OperationalError: (2006, 'MySQL server has gone away')
Lost connection to MySQL server during query
I would greatly appreciate if somebody could explain this to me.
Here is my code:
class MetricSource:
    def __init__(self, task):
        self.task = task
    def get_next_task_time(self):
        try:
            task_id = self.task.id
            next_task = Task.objects.get(id=task_id)
            next_time = next_task.last_success_time
        except Task.DoesNotExist as e:
            print 'Find Exception: %d' % self.task.id
def access_queue(input_queue, output_queue):
    while True:
        try:
            metric_source = input_queue.get(timeout=0.5)
            metric_source.get_next_task_time()
            output_queue.put(metric_source)
        except Queue.Empty:
            print "Queue Empty Error"
            continue
class Command(BaseCommand):
    def handle(self, *args, **options):
        self.manager = multiprocessing.Manager()
        self.input_queue = self.manager.Queue()
        self.output_queue = self.manager.Queue()
        self.init_data()
        for i in range(PROCESS_NUM):
            Process(target=access_queue, args=(self.input_queue, self.output_queue)).start()
    def init_data(self):
        for i in range(200):
            try:
                task = Task.objects.get(id=i+1)
                self.input_queue.put(MetricSource(task))
            except Exception as e:
                print 'find task_id %d' % i
                continue
            except IOError as e:
                print "find IOError: %r" % e
                continue
I suspected my MySQL configuration at first, but I don't think that is the problem.
Here is my.cnf:
[mysqld]
default-character-set=utf8
collation_server = utf8_general_ci
character_set_server = utf8
max_allowed_packet = 100M
datadir=/home/work/data1/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
slow_query_log
slow_query_log_file=/home/work/data1/mysql/mysql-slow.log
max_allowed_packet=100M
log-error=/home/work/data1/mysql/error.log
general_log
general_log_file=/home/work/data1/mysql/mysql.log
tmp_table_size=2G
max_heap_table_size=2G
wait_timeout=2880000
interactive_timeout=2880000
innodb_data_home_dir=/home/work/data1/mysql/ibdata/
[mysqld_safe]
default-character-set=utf8
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
I have found the reason why the DB connection crashes, so I am sharing some points here.
The problem is caused by the order of 'init_data' and forking the subprocesses:
self.init_data()
for i in range(PROCESS_NUM):
    Process(target=access_queue, args=(self.input_queue, self.output_queue)).start()
In this order, init_data() opens the DB connection before the subprocesses are forked, so the connection is copied into each subprocess. All the subprocesses are then actually using the same connection, which causes unpredictable problems.
When I change the order to:
for i in range(PROCESS_NUM):
    Process(target=access_queue, args=(self.input_queue, self.output_queue)).start()
self.init_data()
and add a sleep to the subprocess:
def access_queue(id, input_queue, output_queue):
    time.sleep(5)
    while True:
        ...
it works. With this change, the subprocesses are created before the connection is opened, so each subprocess opens its own separate connection to the DB.
So I have a question:
Is there a graceful way to prevent this kind of problem when using multiprocessing to access a DB through an ORM?
Can anyone share some pointers?
Thanks.
I have found a good post: Django multiprocessing and database connections
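One generic way to avoid depending on start-up order is to remember which process opened the connection and reopen it when the process ID has changed. The following is only a Python 3 sketch under stated assumptions: `PerProcessConnection` is a hypothetical helper, `sqlite3` stands in for the real database driver, and how an inherited connection should be disposed of depends on the driver, so this sketch simply stops using it.

```python
import os
import sqlite3


class PerProcessConnection:
    """Hand out a connection that was opened by the current process."""

    def __init__(self, connect):
        # connect is a zero-argument callable that opens a new connection
        self._connect = connect
        self._conn = None
        self._pid = None

    def get(self):
        pid = os.getpid()
        if self._conn is None or self._pid != pid:
            # First use, or we are in a child that inherited the parent's
            # connection across fork(): open a fresh one for this process.
            self._conn = self._connect()
            self._pid = pid
        return self._conn


# Stand-in for the real database; a real setup would pass its own connect call.
holder = PerProcessConnection(lambda: sqlite3.connect(":memory:"))
```

A worker would call `holder.get()` each time it needs the database, so a forked child never reuses the socket that the parent opened.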

Using Processes as Threads with Networking in Python

Basically, my idea was to write some sort of basic server where I could connect to my computer and then run a command remotely. This didn't seem to be much of a problem; but then I had the bright idea that the next step would logically be to add some sort of threading so I could spawn multiple connections.
I read that, because of the GIL, multiprocessing.Process would be the best to try to do this. I don't completely understand threading and it's hard to find good documentation on it; so I'm kind of just throwing stuff and trying to figure out how it works.
Well, it seems like I might be close to doing this right, but I have a feeling I'm just as likely to be nowhere near doing this correctly. My program now does allow multiple connections, which it didn't when I first started working with threading; but once a connection is established, and then another is established, the first connection is no longer able to send a command to the server. I would appreciate it if someone could give me any help, or point me in the right direction on what I need to learn and understand.
Here's my code:
class server:
    def __init__(self):
        self.s = socket.socket()
        try:
            self.s.bind(("",69696))
            self.s.listen(1)
        except socket.error,(value,message):
            if self.s:
                self.s.close()
    def connection(self):
        while True:
            client , address = self.s.accept()
            data = client.recv(5)
            password = 'hello'
            while 1:
                if data == password:
                    subprocess.call('firefox')
                    client.close()
                else:
                    client.send('wrong password')
                    data = client.recv(5)
p = Process(target=x.connection())
p.start()
x = server()
if __name__ == '__main':
    main()
Well, this answer only applies if you're on a Unix or Unix-like operating system (Windows does not have os.fork(), which we use).
One of the most common approaches for doing these things on unix platforms is to fork a new process to handle the client connection while the master process continues to listen for requests.
Below is code for a simple echo server that can handle multiple simultaneous connections. You just need to modify handle_client_connection() to fit your needs.
import socket
import os
class ForkingServer:
    def serve_forever(self):
        self.s = socket.socket()
        try:
            self.s.bind(("", 9000))
            self.s.listen(1)
        except socket.error, (value,message):
            print "error:", message
            if self.s:
                self.s.close()
            return
        while True:
            client,address = self.s.accept()
            pid = os.fork()
            # You should read the documentation for how fork() works if you don't
            # know it already
            # The short version is that at this point in the code, there are 2 processes
            # completely identical to each other which are simultaneously executing
            # The only difference is that the parent process gets the pid of the child
            # returned from fork() and the child process gets a value of 0 returned
            if pid == 0:
                # only the newly spawned process will execute this
                self.handle_client_connection(client, address)
                break
            # In the meantime the parent process will continue on to here
            # thus it will go back to the beginning of the loop and accept a new connection
            # The parent closes its own copy of the client socket first;
            # the child keeps its copy open until it is done
            client.close()
    def handle_client_connection(self, client,address):
        #simple echo server
        print "Got a connection from:", address
        while True:
            data = client.recv(5)
            if not data:
                # client closed the connection
                break
            client.send(data)
        print "Connection from", address, "closed"
server = ForkingServer()
server.serve_forever()
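For readers on Python 3, the same fork-per-connection pattern might look roughly like this. It is a sketch rather than a drop-in replacement: the function names `echo_until_closed` and `serve_forever` and the default port are illustrative, and it remains Unix-only because it relies on `os.fork()`.

```python
import os
import socket


def echo_until_closed(client):
    """Echo bytes back until the peer closes its end of the connection."""
    while True:
        data = client.recv(1024)
        if not data:              # empty bytes means the client closed the connection
            break
        client.sendall(data)
    client.close()


def serve_forever(host="", port=9000):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(5)
    while True:
        client, address = listener.accept()
        pid = os.fork()
        if pid == 0:
            # Child: it never accepts connections, so drop the listener.
            listener.close()
            echo_until_closed(client)
            os._exit(0)           # exit the child without running the parent's cleanup
        # Parent: the child owns this connection now.
        client.close()
        # Reap children that have already finished, without blocking.
        try:
            while os.waitpid(-1, os.WNOHANG)[0] != 0:
                pass
        except ChildProcessError:
            pass                  # no children left to wait for
```

Calling `serve_forever()` starts the accept loop; only `echo_until_closed()` needs to change to implement a different per-client behavior.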

Thread synchronization in Python

I am currently working on a school project where the assignment, among other things, is to set up a threaded server/client system. Each client in the system is supposed to be assigned its own thread on the server when connecting to it. In addition, I would like the server to run other threads, one handling input from the command line and another broadcasting messages to all clients. However, I can't get this to run as I want it to. It seems like the threads are blocking each other. I would like my program to take input from the command line at the "same time" as the server listens to connected clients, and so on.
I am new to Python programming and multithreading, and although I think my idea is good, I'm not surprised my code doesn't work. The thing is, I'm not exactly sure how to implement the message passing between the different threads. Nor am I sure exactly how to implement the resource lock commands properly. I'm going to post the code for my server file and my client file here, and I hope someone can help me with this. I think these should actually be two relatively simple scripts. I have tried to comment my code as well as I can.
import select
import socket
import sys
import threading
import client
class Server:
    #initializing server socket
    def __init__(self, event):
        self.host = 'localhost'
        self.port = 50000
        self.backlog = 5
        self.size = 1024
        self.server = None
        self.server_running = False
        self.listen_threads = []
        self.local_threads = []
        self.clients = []
        self.serverSocketLock = None
        self.cmdLock = None
        #here i have also declared some events for the command line input
        #and the receive function respectively, not sure if correct
        self.cmd_event = event
        self.socket_event = event
    def openSocket(self):
        #binding server to port
        try:
            self.server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self.server.bind((self.host, self.port))
            self.server.listen(5)
            print "Listening to port " + str(self.port) + "..."
        except socket.error, (value,message):
            if self.server:
                self.server.close()
            print "Could not open socket: " + message
            sys.exit(1)
    def run(self):
        self.openSocket()
        #making Rlocks for the socket and for the command line input
        self.serverSocketLock = threading.RLock()
        self.cmdLock = threading.RLock()
        #set blocking to non-blocking
        self.server.setblocking(0)
        #making two threads always running on the server,
        #one for the command line input, and one for broadcasting (sending)
        cmd_thread = threading.Thread(target=self.server_cmd)
        broadcast_thread = threading.Thread(target=self.broadcast,args=[self.clients])
        cmd_thread.daemon = True
        broadcast_thread.daemon = True
        #append the threads to thread list
        self.local_threads.append(cmd_thread)
        self.local_threads.append(broadcast_thread)
        cmd_thread.start()
        broadcast_thread.start()
        self.server_running = True
        while self.server_running:
            #connecting to "knocking" clients
            try:
                c = client.Client(self.server.accept())
                self.clients.append(c)
                print "Client " + str(c.address) + " connected"
                #making a thread for each client and appending it to client list
                listen_thread = threading.Thread(target=self.listenToClient,args=[c])
                self.listen_threads.append(listen_thread)
                listen_thread.daemon = True
                listen_thread.start()
                #setting event "client has connected"
                self.socket_event.set()
            except socket.error, (value, message):
                continue
        #close threads
        self.server.close()
        print "Closing client threads"
        for c in self.listen_threads:
            c.join()
    def listenToClient(self, c):
        while self.server_running:
            #the idea here is to wait until the thread gets the message "client
            #has connected"
            self.socket_event.wait()
            #then clear the event immediately...
            self.socket_event.clear()
            #and acquire the socket resource
            self.serverSocketLock.acquire()
            #the below is the receive thingy
            try:
                recvd_data = c.client.recv(self.size)
                if recvd_data == "" or recvd_data == "close\n":
                    print "Client " + str(c.address) + (" disconnected...")
                    self.socket_event.clear()
                    self.serverSocketLock.release()
                    return
                print recvd_data
                #I put these here to avoid locking the resource if no message
                #has been received
                self.socket_event.clear()
                self.serverSocketLock.release()
            except socket.error, (value, message):
                continue
    def server_cmd(self):
        #this is a simple command line utility
        while self.server_running:
            #got to have a smart way to make this work
            self.cmd_event.wait()
            self.cmd_event.clear()
            self.cmdLock.acquire()
            cmd = sys.stdin.readline()
            if cmd == "":
                continue
            if cmd == "close\n":
                print "Server shutting down..."
                self.server_running = False
            self.cmdLock.release()
    def broadcast(self, clients):
        while self.server_running:
            #this function will broadcast a message received from one
            #client, to all other clients, but i guess any thread
            #aspects applied to the above, will work here also
            try:
                send_data = sys.stdin.readline()
                if send_data == "":
                    continue
                else:
                    for c in clients:
                        c.client.send(send_data)
                self.serverSocketLock.release()
                self.cmdLock.release()
            except socket.error, (value, message):
                continue
if __name__ == "__main__":
    e = threading.Event()
    s = Server(e)
    s.run()
And then the client file
import select
import socket
import sys
import server
import threading
class Client(threading.Thread):
    #initializing client socket
    def __init__(self,(client,address)):
        threading.Thread.__init__(self)
        self.client = client
        self.address = address
        self.size = 1024
        self.client_running = False
        self.running_threads = []
        self.ClientSocketLock = None
    def run(self):
        #connect to server
        self.client.connect(('localhost',50000))
        #making a lock for the socket resource
        self.clientSocketLock = threading.Lock()
        self.client.setblocking(0)
        self.client_running = True
        #making two threads, one for receiving messages from server...
        listen = threading.Thread(target=self.listenToServer)
        #...and one for sending messages to server
        speak = threading.Thread(target=self.speakToServer)
        #not actually sure what daemon means
        listen.daemon = True
        speak.daemon = True
        #appending the threads to the thread-list
        self.running_threads.append(listen)
        self.running_threads.append(speak)
        listen.start()
        speak.start()
        #this while-loop is just for avoiding the script terminating
        while self.client_running:
            dummy = 1
        #closing the threads if the client goes down
        print "Client operating on its own"
        self.client.close()
        #close threads
        for t in self.running_threads:
            t.join()
        return
    #defining "listen"-function
    def listenToServer(self):
        while self.client_running:
            #here i acquire the socket to this function, but i realize I also
            #should have a message passing wait()-function or something
            #somewhere
            self.clientSocketLock.acquire()
            try:
                data_recvd = self.client.recv(self.size)
                print data_recvd
            except socket.error, (value,message):
                continue
            #releasing the socket resource
            self.clientSocketLock.release()
    #defining "speak"-function, doing much the same as for the above function
    def speakToServer(self):
        while self.client_running:
            self.clientSocketLock.acquire()
            try:
                send_data = sys.stdin.readline()
                if send_data == "close\n":
                    print "Disconnecting..."
                    self.client_running = False
                else:
                    self.client.send(send_data)
            except socket.error, (value,message):
                continue
            self.clientSocketLock.release()
if __name__ == "__main__":
    c = Client((socket.socket(socket.AF_INET, socket.SOCK_STREAM),'localhost'))
    c.run()
I realize this is quite a few lines of code for you to read through, but as I said, I think the concept and the script itself should be quite simple to understand. It would be very much appreciated if someone could help me synchronize my threads in a proper way =)
Thanks in advance
---Edit---
OK. So I have now simplified my code to contain just send and receive functions in both the server and the client modules. The clients connecting to the server get their own threads, and the send and receive functions in both modules operate in their own separate threads. This works like a charm, with the broadcast function in the server module echoing strings it gets from one client to all clients. So far so good!
The next thing I want my script to do is take specific commands, e.g. "close", in the client module to shut down the client and join all running threads in the thread list. I'm using an event flag to notify the listenToServer thread and the main thread that the speakToServer thread has read the input "close". It seems like the main thread jumps out of its while loop and starts the for loop that is supposed to join the other threads. But here it hangs. It seems like the while loop in the listenToServer thread never stops, even though client_running should be set to False when the event flag is set.
I'm posting only the client module here, because I guess an answer to get these two threads to synchronize will relate to synchronizing more threads in both the client and the server module also.
import select
import socket
import sys
import server_bygg0203
import threading
from time import sleep
class Client(threading.Thread):
    #initializing client socket
    def __init__(self,(client,address)):
        threading.Thread.__init__(self)
        self.client = client
        self.address = address
        self.size = 1024
        self.client_running = False
        self.running_threads = []
        self.ClientSocketLock = None
        self.disconnected = threading.Event()
    def run(self):
        #connect to server
        self.client.connect(('localhost',50000))
        #self.client.setblocking(0)
        self.client_running = True
        #making two threads, one for receiving messages from server...
        listen = threading.Thread(target=self.listenToServer)
        #...and one for sending messages to server
        speak = threading.Thread(target=self.speakToServer)
        #not actually sure what daemon means
        listen.daemon = True
        speak.daemon = True
        #appending the threads to the thread-list
        self.running_threads.append((listen,"listen"))
        self.running_threads.append((speak, "speak"))
        listen.start()
        speak.start()
        while self.client_running:
            #check if event is set, and if it is
            #set while statement to false
            if self.disconnected.isSet():
                self.client_running = False
        #closing the threads if the client goes down
        print "Client operating on its own"
        self.client.shutdown(1)
        self.client.close()
        #close threads
        #the script hangs at the for-loop below, and
        #refuses to close the listen-thread (and possibly
        #also the speak thread, but it never gets that far)
        for t in self.running_threads:
            print "Waiting for " + t[1] + " to close..."
            t[0].join()
        self.disconnected.clear()
        return
    #defining "speak"-function
    def speakToServer(self):
        #sends strings to server
        while self.client_running:
            try:
                send_data = sys.stdin.readline()
                self.client.send(send_data)
                #I want the "close" command
                #to set an event flag, which is being read by all other threads,
                #and, at the same time set the while statement to false
                if send_data == "close\n":
                    print "Disconnecting..."
                    self.disconnected.set()
                    self.client_running = False
            except socket.error, (value,message):
                continue
        return
    #defining "listen"-function
    def listenToServer(self):
        #receives strings from server
        while self.client_running:
            #check if event is set, and if it is
            #set while statement to false
            if self.disconnected.isSet():
                self.client_running = False
            try:
                data_recvd = self.client.recv(self.size)
                print data_recvd
            except socket.error, (value,message):
                continue
        return
if __name__ == "__main__":
    c = Client((socket.socket(socket.AF_INET, socket.SOCK_STREAM),'localhost'))
    c.run()
Later on, when I get this server/client system up and running, I will use it on some elevator models we have here in the lab, with each client receiving floor orders or "up" and "down" calls. The server will run a distribution algorithm and update the elevator queues on the clients that are most appropriate for the requested order. I realize there is a long way to go, but I guess one should just take one step at a time =)
Hope someone has the time to look into this. Thanks in advance.
The biggest problem I see with this code is that you have far too much going on right away to easily debug your problem. Threading can get extremely complicated because of how non-linear the logic becomes. Especially when you have to worry about synchronizing with locks.
The reason you are seeing clients blocking on each other is because of the way you are using your serverSocketLock in your listenToClient() loop in the server. To be honest this isn't exactly your problem right now with your code, but it became the problem when I started to debug it and turned the sockets into blocking sockets. If you are putting each connection into its own thread and reading from them, then there is no reason to use a global server lock here. They can all read from their own sockets at the same time, which is the purpose of the thread.
Here is my recommendation to you:
Get rid of all the locks and extra threads that you don't need, and start from the beginning
Have the clients connect as you do, and put them in their thread as you do. And simply have them send data every second. Verify that you can get more than one client connecting and sending, and that your server is looping and receiving. Once you have this part working, you can move on to the next part.
Right now you have your sockets set to non-blocking. This is causing them all to spin really fast over their loops when data is not ready. Since you are threading, you should set them to block. Then the reader threads will simply sit and wait for data and respond immediately.
Locks are used when threads access shared resources. You obviously need one any time a thread tries to modify a shared server attribute such as a list or a value, but not when each thread is working on its own private socket.
The event you are using to trigger your readers doesn't seem necessary here. You have received the client, and you start the thread afterwards. So it is ready to go.
In a nutshell: simplify and test one bit at a time. When it's working, add more. There are too many threads and locks right now.
Here is a simplified example of your listenToClient method:
def listenToClient(self, c):
    while self.server_running:
        try:
            recvd_data = c.client.recv(self.size)
            print "received:", c, recvd_data
            if recvd_data == "" or recvd_data == "close\n":
                print "Client " + str(c.address) + (" disconnected...")
                return
            print recvd_data
        except socket.error, (value, message):
            # 35 is EAGAIN on macOS/BSD (no data ready on a non-blocking
            # socket); the number differs on other platforms
            if value == 35:
                continue
            else:
                print "Error:", value, message
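The recommendations above (blocking sockets, one reader thread per client, a lock only around shared state) could be sketched in Python 3 roughly as follows. `ChatServer`, `add_client` and the `received` list are hypothetical names introduced for this sketch; `received` exists only so the behavior can be observed.

```python
import socket
import threading


class ChatServer:
    def __init__(self):
        self.clients = []                      # shared between threads
        self.clients_lock = threading.Lock()   # guards the shared lists only
        self.received = []                     # kept so the sketch is observable

    def add_client(self, conn):
        """Register an accepted socket and start its reader thread."""
        with self.clients_lock:
            self.clients.append(conn)
        t = threading.Thread(target=self.listen_to_client, args=(conn,))
        t.daemon = True
        t.start()
        return t

    def listen_to_client(self, conn):
        # Blocking recv: the thread sleeps until data arrives. No lock is
        # needed here because no other thread reads from this socket.
        while True:
            data = conn.recv(1024)
            if not data or data == b"close\n":
                break
            with self.clients_lock:
                self.received.append(data)
        with self.clients_lock:
            self.clients.remove(conn)
        conn.close()
```

In a real server, the accept loop would call `add_client(conn)` for each socket returned by `accept()`.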
Back up your work, then toss it - partially.
You need to implement your program in pieces, and test each piece as you go. First, tackle the input part of your program. Don't worry about how to broadcast the input you received. Instead worry that you are able to successfully and repeatedly receive input over your socket. So far - so good.
Now, I assume you would like to react to this input by broadcasting to the other attached clients. Well too bad, you can't do that yet! Because, I left one minor detail out of the paragraph above. You have to design a PROTOCOL.
What is a protocol? It's a set of rules for communication. How does your server know when the client has finished sending its data? Is it terminated by some special character? Or perhaps you encode the size of the message to be sent as the first byte or two of the message.
This is turning out to be a lot of work, isn't it? :-)
What's a simple protocol? A line-oriented protocol is simple. Read 1 character at a time until you get to the end-of-record terminator - '\n'. So, clients would send records like this to your server --
HELO\n
MSG DAVE Where Are Your Kids?\n
So, assuming you have this simple protocol designed, implement it. For now, DON'T WORRY ABOUT THE MULTITHREADING STUFF! Just worry about making it work.
Your current protocol is to read 1024 bytes. Which may not be bad, just make sure you send 1024 byte messages from the client.
Once you have the protocol stuff setup, move on to reacting to the input. But for now you need something that will read input. Once that is done, we can worry about doing something with it.
jdi is right, you have too much program to work with. Pieces are easier to fix.
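A line-oriented protocol like the one described above might be read as follows. This Python 3 sketch buffers whole chunks and splits on '\n' rather than reading one character at a time, which yields the same records with fewer `recv` calls; `read_lines` and `parse_record` are hypothetical helper names, and UTF-8 is an assumed encoding.

```python
import socket


def read_lines(conn, bufsize=1024):
    """Yield complete newline-terminated records from a stream socket."""
    buffer = b""
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:                  # peer closed; any partial record is dropped
            return
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.decode("utf-8")


def parse_record(line):
    """Split 'MSG DAVE text...' into (command, arguments)."""
    command, _, rest = line.partition(" ")
    return command, rest
```

A record that arrives split across two `recv` calls is still delivered as one line, which a fixed "read 1024 bytes" rule cannot guarantee.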

Why are my threaded MySQLdb queries in Python slower than the same non-threaded queries?

I am building a threaded class to run MySQL queries using Python and MySQLdb. I don't understand why running these queries threaded is slower than running them non-threaded. Here's my code to show what I'm doing.
First, here's the non-threaded function.
def testQueryDo(query_list):
    db = MySQLdb.connect('localhost', 'user', 'pass', 'db_name')
    cursor = db.cursor()
    q_list = query_list
    for each in q_list:
        cursor.execute(each)
        results = cursor.fetchall()
    db.close()
Here's my threaded class:
class queryThread(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.db = MySQLdb.connect('localhost', 'user', 'pass', 'db_name')
        self.cursor = self.db.cursor()
    def run(self):
        cur_query = self.queue.get()
        self.cursor.execute(cur_query)
        results = self.cursor.fetchall()
        self.db.close()
        self.queue.task_done()
And here's the handler:
def queryHandler(query_list):
    queue = Queue.Queue()
    for query in query_list:
        queue.put(query)
    total_queries = len(query_list)
    for query in range(total_queries):
        t = queryThread(queue)
        t.setDaemon(True)
        t.start()
    queue.join()
I'm not sure why this threaded code is running slower. What's interesting is that if I use the same code, only do something simple like addition of numbers, the threaded code is significantly faster.
I understand that I must be missing something completely obvious, however any support would be much appreciated!
You're starting N threads, each of which creates its own connection to MySQL, and you're using a synchronous queue to deliver the queries to the threads. Each thread is blocking on queue.get() (acquiring an exclusive lock) to get a query, then creating a connection to the database, and then calling task_done() which lets the next thread proceed. So while thread 1 is working, N-1 threads are doing nothing. This overhead of lock acquire/release, plus the additional overhead of serially creating and closing several connections to the database, adds up.
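One way to reduce that overhead is to start a small, fixed number of threads, each of which opens a single connection and reuses it for every query it takes from the queue. The following Python 3 sketch assumes `sqlite3` as a stand-in for MySQLdb so that it runs anywhere; `query_worker`, `query_handler` and `n_threads` are illustrative names, and whether this is actually faster depends on the queries and the server.

```python
import queue
import sqlite3
import threading


def query_worker(work, results, results_lock):
    # One connection per thread, opened inside the thread and reused for
    # every query this thread picks up.
    db = sqlite3.connect(":memory:")    # stand-in for MySQLdb.connect(...)
    try:
        while True:
            try:
                query = work.get_nowait()
            except queue.Empty:
                return                  # nothing left to do
            rows = db.execute(query).fetchall()
            with results_lock:
                results[query] = rows
    finally:
        db.close()


def query_handler(query_list, n_threads=4):
    work = queue.Queue()
    for query in query_list:
        work.put(query)
    results = {}
    results_lock = threading.Lock()
    threads = [
        threading.Thread(target=query_worker, args=(work, results, results_lock))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With 4 threads and 100 queries this opens 4 connections instead of 100.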
