Can you have an object shared across multiple WSGI threads/processes (and in a manner that would work on both *NIX and Windows)?
The basic premise:
(1) I have a WSGI front end that will connect to a back end server. I have a serialization class, that contains rules on how to serialize/unserialize various objects, including classes specific to this project. As such, it needs to have some setup telling it how to handle custom objects. However, it is otherwise stateless - multiple threads can access it at the same time to serialize their data without issue.
(2) There are sockets to connect to the back end. I'd rather have a socket pool than create/destroy every time there's a connection.
Note: I don't mind a solution where there are multiple instances of (1) and (2), but ideally, I'd like them created/initialized as few times as possible. I'm not sure about the internals, but if a thread loops rather than closing and having the server reopen on a new request, it would be fine to have one data set per thread (and hence, the socket and serializer are initialized once per thread, but reused each subsequent call it handles.) Actually having one socket per thread, if that's how it works, would be best, since I wouldn't need a socket pool and have to deal with mutexes.
Note: this is not sessions and has nothing to do with sessions. This shouldn't care who is making the call to the server. It's only about tweaking performance on systems that have slow thread creation, or a lot of memory but relatively slow CPUs.
Edit 2: The below code will give some info on how your system shares variables. You'll need to load the page a few times to get some diagnostics...
from datetime import *;
from threading import *;
import thread;
from cgi import escape;
from os import getpid;
count = 0;
responses = [];
lock = RLock();
start_ident = "%08X::%08X" % (thread.get_ident(), getpid());
show_env = False;
def application(environ, start_response):
status = '200 OK';
this_ident = "%08X::%08X" % (thread.get_ident(), getpid());
lock.acquire();
current_response = """<HR>
<B>Request Number</B>: """ + str(count) + """<BR>
<B>Request Time</B>: """ + str(datetime.now()) + """<BR>
<B>Current Thread</B>: """ + this_ident + """<BR>
<B>Initializing Thread</B>: """ + start_ident + """<BR>
Multithread/Multiprocess: """ + str(environ["wsgi.multithread"]) + "/" + str(environ["wsgi.multiprocess"]) +"""<BR>
"""
global count;
count += 1;
global responses;
responses.append(current_response)
if(len(responses) >= 100):
responses = responses[1:];
lock.release();
output="<HTML><BODY>";
if(show_env):
output+="<H2>Environment</H2><TABLE><TR><TD>Key</TD><TD>Value</TD></TR>";
for k in environ.keys():
output += "<TR><TD>"+escape(k)+"</TD><TD>"+escape(str(environ[k]))+"</TD></TR>";
output+="</TABLE>";
output += "<H2>Response History</H2>";
for r in responses:
output += r;
output+="</BODY></HTML>"
response_headers = [('Content-type', 'text/html'),
('Content-Length', str(len(output)))]
start_response(status, response_headers)
return [output]
For some reading on how mod_wsgi process/threading model works see:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
Pay particular note to the section on building a portable application.
The comments there aren't really any different no matter what WSGI server you use and. All WSGI servers also fall into one of those multi process/single process, multi thread/single thread categories.
By my reading of http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading, if you have multiprocessing and multithreading on (as with worker, or if you have
WSGIDaemonProcess example processes=2 threads=25
then you have BOTH problems: multiple threads mean you could share a variable, but it would only be shared within each of 2 processes. There is no real way to share vars between processes unless you explicitly have another daemon (NON-APACHE) handling the message passing and requests.
Let's say you have a simple need for a database connection pool. With the above configuration, you'd have two pools, each serving 25 threads. This is fine for most people, as threads are lightweight and processes aren't (supposedly).
So, how to implement this?
In one of your modules, create a variable that instantiates a connection pool. Have each thread (really, the code that services an individual request) at an appropriate time get a connection use it, and return it to the pool. Don't forget the last part, you'll run out of connections quickly.
Create another daemon (not Apache related). instantiate an array of shared memory. Into this array put objects that consist of a db connection and a process Id (null to start). In a while-True loop, listen for connections on a socket, and when you get one, spawn a subprocess, passing in the shared array, the number of the array element. the subprocess fills in the process id it knows, it handles the request, closes any cursors, then removes its process id and exits.
Hire a programmer familiar with WSGI and db connection pooling to do it for you ;-) .
Related
To begin with, we're given the following piece of code:
from validate_email import validate_email
import time
import os
def verify_emails(email_path, good_filepath, bad_filepath):
good_emails = open(good_filepath, 'w+')
bad_emails = open(bad_filepath, 'w+')
emails = set()
with open(email_path) as f:
for email in f:
email = email.strip()
if email in emails:
continue
emails.add(email)
if validate_email(email, verify=True):
good_emails.write(email + '\n')
else:
bad_emails.write(email + '\n')
if __name__ == "__main__":
os.system('cls')
verify_emails("emails.txt", "good_emails.txt", "bad_emails.txt")
I expect contacting SMTP servers to be the most expensive part by far from my program when emails.txt contains large amount of lines (>1k). Using some form of parallel or asynchronous I/O should speed this up a lot, since I can wait for multiple servers to respond instead of waiting sequentially.
As far as I have read:
Asynchronous I/O operates by queuing a request for I/O to the file
descriptor, tracked independently of the calling process. For a file
descriptor that supports asynchronous I/O (raw disk devcies
typically), a process can call aio_read() (for instance) to request a
number of bytes be read from the file descriptor. The system call
returns immediately, whether or not the I/O has completed. Some time
later, the process then polls the operating system for the completion
of the I/O (that is, buffer is filled with data).
To be sincere, I didn't quite understand how to implement async I/O on my program. Can anybody take a little time and explain me the whole process ?
EDIT as per PArakleta suggested:
from validate_email import validate_email
import time
import os
from multiprocessing import Pool
import itertools
def validate_map(e):
return (validate_email(e.strip(), verify=True), e)
seen_emails = set()
def unique(e):
if e in seen_emails:
return False
seen_emails.add(e)
return True
def verify_emails(email_path, good_filepath, bad_filepath):
good_emails = open(good_filepath, 'w+')
bad_emails = open(bad_filepath, 'w+')
with open(email_path, "r") as f:
for result in Pool().imap_unordered(validate_map,
itertools.ifilter(unique, f):
(good, email) = result
if good:
good_emails.write(email)
else:
bad_emails.write(email)
good_emails.close()
bad_emails.close()
if __name__ == "__main__":
os.system('cls')
verify_emails("emails.txt", "good_emails.txt", "bad_emails.txt")
You're asking the wrong question
Having looked at the validate_email package your real problem is that you're not efficiently batching your results. You should be only doing the MX lookup once per domain and then only connect to each MX server once, go through the handshake, and then check all of the addresses for that server in a single batch. Thankfully the validate_email package does the MX result caching for you, but you still need to be group the email addresses by server to batch the query to the server itself.
You need to edit the validate_email package to implement batching, and then probably give a thread to each domain using the actual threading library rather than multiprocessing.
It's always important to profile your program if it's slow and figure out where it is actually spending the time rather than trying to apply optimisation tricks blindly.
The requested solution
IO is already asynchronous if you are using buffered IO and your use case fits with the OS buffering. The only place you could potentially get some advantage is in read-ahead but Python already does this if you use the iterator access to a file (which you are doing). AsyncIO is an advantage to programs that are moving large amounts of data and have disabled the OS buffers to prevent copying the data twice.
You need to actually profile/benchmark your program to see if it has any room for improvement. If your disks aren't already throughput bound then there is a chance to improve the performance by parallel execution of the processing of each email (address?). The easiest way to check this is probably to check to see if the core running your program is maxed out (i.e. you are CPU bound and not IO bound).
If you are CPU bound then you need to look at threading. Unfortunately Python threading doesn't work in parallel unless you have non-Python work to be done so instead you'll have to use multiprocessing (I'm assuming validate_email is a Python function).
How exactly you proceed depends on where the bottleneck's in your program are and how much of a speed up you need to get to the point where you are IO bound (since you cannot actually go any faster than that you can stop optimising when you hit that point).
The emails set object is hard to share because you'll need to lock around it so it's probably best that you keep that in one thread. Looking at the multiprocessing library the easiest mechanism to use is probably Process Pools.
Using this you would need to wrap your file iterable in an itertools.ifilter which discards duplicates, and then feed this into a Pool.imap_unordered and then iterate that result and write into your two output files.
Something like:
with open(email_path) as f:
for result in Pool().imap_unordered(validate_map,
itertools.ifilter(unique, f):
(good, email) = result
if good:
good_emails.write(email)
else:
bad_emails.write(email)
The validate_map function should be something simple like:
def validate_map(e):
return (validate_email(e.strip(), verify=True), e)
The unique function should be something like:
seen_emails = set()
def unique(e):
if e in seen_emails:
return False
seen_emails.add(e)
return True
ETA: I just realised that validate_email is a library which actually contacts SMTP servers. Given that it's not busy in Python code you can use threading. The threading API though is not as convenient as the multiprocessing library but you can use multiprocessing.dummy to have a thread based Pool.
If you are CPU bound then it's not really worth having more threads/processes than cores but since your bottleneck is network IO you can benefit from many more threads/processes. Since processes are expensive you want to swap to threads and then crank up the number running in parallel (although you should be polite not to DOS-attack the servers you are connecting to).
Consider from multiprocessing.dummy import Pool as ThreadPool and then call ThreadPool(processes=32).imap_unordered().
TL;DR: Getting different results after running code with threading and multiprocessing and single threaded. Need guidance on troubleshooting.
Hello, I apologize in advance if this may be a bit too generic, but I need a bit of help troubleshooting an issue and I am not sure how best to proceed.
Here is the story; I have a bunch of data indexed into a Solr Collection (~250m items), all items in that collection have a sessionid. Some items can share the same session id. I am combing through the collection to extract all items that have the same session, massage the data a bit and spit out another JSON file for indexing later.
The code has two main functions:
proc_day - accepts a day and processes all the sessions for that day
and
proc_session - does everything that needs to happen for a single session.
Multiprocessing is implemented on proc_day, so each day would be processed by a separate process, the proc_session function can be ran with threads. Below is the code I am using for threading/multiprocessing below. It accepts a function, a list of arguments and number of threads / multiprocesses. It will then create a queue based on input args, then create processes/threads and let them go through it. I am not posting the actual code, since it generally runs fine single threaded without any issues, but can post it if needed.
autoprocs.py
import sys
import logging
from multiprocessing import Process, Queue,JoinableQueue
import time
import multiprocessing
import os
def proc_proc(func,data,threads,delay=10):
if threads < 0:
return
q = JoinableQueue()
procs = []
for i in range(threads):
thread = Process(target=proc_exec,args=(func,q))
thread.daemon = True;
thread.start()
procs.append(thread)
for item in data:
q.put(item)
logging.debug(str(os.getpid()) + ' *** Processes started and data loaded into queue waiting')
s = q.qsize()
while s > 0:
logging.info(str(os.getpid()) + " - Proc Queue Size is:" + str(s))
s = q.qsize()
time.sleep(delay)
for p in procs:
logging.debug(str(os.getpid()) + " - Joining Process {}".format(p))
p.join(1)
logging.debug(str(os.getpid()) + ' - *** Main Proc waiting')
q.join()
logging.debug(str(os.getpid()) + ' - *** Done')
def proc_exec(func,q):
p = multiprocessing.current_process()
logging.debug(str(os.getpid()) + ' - Starting:{},{}'.format(p.name, p.pid))
while True:
d = q.get()
try:
logging.debug(str(os.getpid()) + " - Starting to Process {}".format(d))
func(d)
sys.stdout.flush()
logging.debug(str(os.getpid()) + " - Marking Task as Done")
q.task_done()
except:
logging.error(str(os.getpid()) + " - Exception in subprocess execution")
logging.error(sys.exc_info()[0])
logging.debug(str(os.getpid()) + 'Ending:{},{}'.format(p.name, p.pid))
autothreads.py:
import threading
import logging
import time
from queue import Queue
def thread_proc(func,data,threads):
if threads < 0:
return "Thead Count not specified"
q = Queue()
for i in range(threads):
thread = threading.Thread(target=thread_exec,args=(func,q))
thread.daemon = True
thread.start()
for item in data:
q.put(item)
logging.debug('*** Main thread waiting')
s = q.qsize()
while s > 0:
logging.debug("Queue Size is:" + str(s))
s = q.qsize()
time.sleep(1)
logging.debug('*** Main thread waiting')
q.join()
logging.debug('*** Done')
def thread_exec(func,q):
while True:
d = q.get()
#logging.debug("Working...")
try:
func(d)
except:
pass
q.task_done()
I am running into problems with validating data after python runs under different multiprocessing/threading configs. There is a lot of data, so I really need to get multiprocessing working. Here are the results of my test yesterday.
Only with multiprocessing - 10 procs:
Days Processed 30
Sessions Found 3,507,475
Sessions Processed 3,514,496
Files 162,140
Data Output: 1.9G
multiprocessing and multithreading - 10 procs 10 threads
Days Processed 30
Sessions Found 3,356,362
Sessions Processed 3,272,402
Files 424,005
Data Output: 2.2GB
just threading - 10 threads
Days Processed 31
Sessions Found 3,595,263
Sessions Processed 3,595,263
Files 733,664
Data Output: 3.3GB
Single process/ no threading
Days Processed 31
Sessions Found 3,595,263
Sessions Processed 3,595,263
Files 162,190
Data Output: 1.9GB
These counts were gathered by grepping and counties entries in the log files (1 per main process). The first thing that jumps out is that days processed doesn't match. However, I manually checked the log files and it looks like a log entry was missing, there are follow on log entries to indicate that the day was actually processed. I have no idea why it was omitted.
I really don't want to write more code to validate this code, just seems like a terrible waste of time, is there any alternative?
I gave some general hints in the comments above. I think there are multiple problems with your approach, at very different levels of abstraction. You are also not showing all code of relevance.
The issue might very well be
in the method you are using to read from solr or in preparing read data before feeding it to your workers.
in the architecture you have come up with for distributing the work among multiple processes.
in your logging infrastructure (as you have pointed out yourself).
in your analysis approach.
You have to go through all of these points, and as of the complexity of the issue surely nobody here will be able to identify the exact issues for you.
Regarding points (3) and (4):
If you are not sure about the completeness of your log files, you should perform the analysis based on the payload output of your processing engine. What I am trying to say: the log files probably are just a side product of your data processing. The primary product is the thing you should analyze. Of course it is also important to get your logs right. But these two problems should be treated independently.
My contribution regarding point (2) in the list above:
What is especially suspicious about your multiprocessing-based solution is your way to wait for the workers to finish. You seem not to be sure by which method you should wait for your workers, so you apply three different methods:
First, you are monitoring the size of the queue in a while loop and wait for it to become 0. This is a non-canonical approach, which might actually work.
Secondly, you join() your processes in a weird way:
for p in procs:
logging.debug(str(os.getpid()) + " - Joining Process {}".format(p))
p.join(1)
Why are you defining a timeout of one second here and do not respond to whether the process actually terminated within that time frame? You should either really join a process, i.e. wait until it has terminated or you specify a timeout and, if that timeout expires before the process finishes, treat that situation specially. Your code does not distinguish these situations, so p.join(1) is like writing time.sleep(1) instead.
Thirdly, you join the queue.
So, after making sure that q.qsize() returns 0 and after waiting for another second, do you really think that joining the queue is important? Does it make any difference? One of these approaches should be enough, and you need to think about which of these criteria is most important to your problem. That is, one of these conditions should deterministically implicate the other two.
All this looks like a quick & dirty hack of a multiprocessing solution, whereas you yourself are not really sure how that solution should behave. One of the most important insights I have obtained while working on concurrency architectures: You, the architect, must be 100 % aware of how the communication and control flow works in your system. Not properly monitoring and controlling the state of your worker processes may very well be the source of the issues you are observing.
I figured it out, I followed Jan-Philip's advice and started examining the output data of the multiprocess/multithreaded process. Turned out that an object that does all these things with the data from Solr was shared among threads. I did not have any locking mechanisms, so in a case it had mixed data from multiple sessions which caused inconsistent output. I validated this by instantiating a new object for every thread and the counts matched up. It is a bit slower, but still workable.
Thanks
Background
I'm looking into using celery (3.1.8) to process huge text files (~30GB) each. These files are in fastq format and contain about 118M sequencing "reads", which are essentially each a combination of header, DNA sequence, and quality string). Also, these sequences are from a paired-end sequencing run, so I'm iterating two files simultaneously (via itertools.izip). What I'd like to be able to do is take each pair of reads, send them to a queue, and have them be processed on one of the machines in our cluster (don't care which) to return a cleaned-up version of the read, if cleaning needs to happen (e.g., based on quality).
I've set up celery and rabbitmq, and my workers are launched as follows:
celery worker -A tasks --autoreload -Q transient
and configured like:
from kombu import Queue
BROKER_URL = 'amqp://guest#godel97'
CELERY_RESULT_BACKEND = 'rpc'
CELERY_TASK_SERIALIZER = 'pickle'
CELERY_RESULT_SERIALIZER = 'pickle'
CELERY_ACCEPT_CONTENT=['pickle', 'json']
CELERY_TIMEZONE = 'America/New York'
CELERY_ENABLE_UTC = True
CELERYD_PREFETCH_MULTIPLIER = 500
CELERY_QUEUES = (
Queue('celery', routing_key='celery'),
Queue('transient', routing_key='transient',delivery_mode=1),
)
I've chosen to use an rpc backend and pickle serialization for performance, as well as not
writing anything to disk in the 'transient' queue (via delivery_mode).
Celery startup
To set up the celery framework, I first launch the rabbitmq server (3.2.3, Erlang R16B03-1) on a 64-way box, writing log files to a fast /tmp disk. Worker processes (as above) are launched on each node on the cluster (about 34 of them) ranging anywhere from 8-way to 64-way SMP for a total of 688 cores. So, I have a ton of available CPUs for the workers to use to process of the queue.
Job submission/performance
Once celery is up and running, I submit the jobs via an ipython notebook as below:
files = [foo, bar]
f1 = open(files[0])
f2 = open(files[1])
res = []
count = 0
for r1, r2 in izip(FastqGeneralIterator(f1), FastqGeneralIterator(f2)):
count += 1
res.append(tasks.process_read_pair.s(r1, r2))
if count == 10000:
break
t.stop()
g = group(res)
for task in g.tasks:
task.set(queue="transient")
This takes about a 1.5s for 10000 pairs of reads. Then, I call delay on the group to submit to the workers, which takes about 20s, as below:
result = g.delay()
Monitoring with rabbitmq console, I see that I'm doing OK, but not nearly fast enough.
Question
So, is there any way to speed this up? I mean, I'd like to see at least 50,000 read pairs processed every second rather than 500. Is there anything obvious that I'm missing in my celery configuration? My worker and rabbit logs are essentially empty. Would love some advice on how to get my performance up. Each individual read pair processes pretty quickly, too:
[2014-01-29 13:13:06,352: INFO/Worker-1] tasks.process_read_pair[95ec7f2f-0143-455a-a23b-c032998951b8]: HWI-ST425:143:C04A5ACXX:3:1101:13938:2894 1:N:0:ACAGTG HWI-ST425:143:C04A5ACXX:3:1101:13938:2894 2:N:0:ACAGTG 0.00840497016907 sec
Up to this point
So up to this point, I've googled all I can think of with celery, performance, routing, rabbitmq, etc. I've been through the celery website and docs. If I can't get the performance higher, I'll have to abandon this method in favor of another solution (basically dividing up the work into many smaller physical files and processing them directly on each compute node with multiprocessing or something). It would be a shame to not be able to spread this load out over the cluster, though. Plus, this seems like an exquisitely elegant solution.
Thanks in advance for any help!
Not an answer but too long for a comment.
Let's narrow the problem down a little...
Firstly, try skipping all your normal logic/message preparation and just do the tightest possible publishing loop with your current library. See what rate you get. This will identify if it's a problem with your non-queue-related code.
If it's still slow, set up a new python script but use amqplib instead of celery. I've managed to get it publishing at over 6000/s while doing useful work (and json encoding) on a mid-range desktop, so I know that it's performant. This will identify if the problem is with the celery library. (To save you time, I've snipped the following from a project of mine and hopefully not broken it when simplifying...)
from amqplib import client_0_8 as amqp
try:
lConnection = amqp.Connection(
host=###,
userid=###,
password=###,
virtual_host=###,
insist=False)
lChannel = lConnection.channel()
Exchange = ###
for i in range(100000):
lMessage = amqp.Message("~130 bytes of test data..........................................................................................................")
lMessage.properties["delivery_mode"] = 2
lChannel.basic_publish(lMessage, exchange=Exchange)
lChannel.close()
lConnection.close()
except Exception as e:
#Fail
Between the two approaches above you should be able to track down the problem to one of the Queue, the Library or your code.
Reusing the producer instance should give you some performance improvement:
with app.producer_or_acquire() as producer:
task.apply_async(producer=producer)
Also the task may be a proxy object and if so must be evaluated for every invocation:
task = task._get_current_object()
Using group will automatically reuse the producer and is usually what you would
do in a loop like this:
process_read_pair = tasks.process_read_pair.s
g = group(
process_read_pair(r1, r2)
for r1, r2 in islice(
izip(FastGeneralIterator(f1), FastGeneralIterator(f2)), 0, 1000)
)
result = g.delay()
You can also consider installing the librabbitmq module which is written in C.
The amqp:// transport will automatically use it if available (or can be specified manually using librabbitmq://:
pip install librabbitmq
Publishing messages directly using the underlying library may be faster
since it will bypass the celery routing helpers and so on, but I would not
think it was that much slower. If so there is definitely room for optimization in Celery,
as I have mostly focused on optimizing the consumer side so far.
Note also that you may want to process multiple DNA pairs in the same task,
as using coarser task granularity may be beneficial for CPU/memory caches and so on,
and it will often saturate parallelization anyway since that is a finite resource.
NOTE: The transient queue should be durable=False
One solution you have is that the reads are highly compressible so replacing the following
res.append(tasks.process_read_pair.s(r1, r2))
by
res.append(tasks.process_bytes(zlib.compress(pickle.dumps((r1, r2))),
protocol = pickle.HIGHEST_PROTOCOL),
level=1))
and calling a pickle.loads(zlib.decompress(obj)) on the other side.
It should win a factor around big factor for long enough DNA sequence if they are not long enough you can grouping them by chunk in an array which you dumps and compress.
another win can be to use zeroMQ for transport if you don't do yet.
I'm not sure what process_byte should be
Again, not an answer, but too long for comments. Per Basic's comments/answer below, I set up the following test using the same exchange and routing as my application:
from amqplib import client_0_8 as amqp
try:
lConnection = amqp.Connection()
lChannel = lConnection.channel()
Exchange = 'celery'
for i in xrange(1000000):
lMessage = amqp.Message("~130 bytes of test data..........................................................................................................")
lMessage.properties["delivery_mode"] = 1
lChannel.basic_publish(lMessage, exchange=Exchange, routing_key='transient')
lChannel.close()
lConnection.close()
except Exception as e:
print e
You can see that it's rocking right along.
I guess now it's up to finding out the difference between this and what's going on inside of celery
I added amqp into my logic, and it's fast. FML.
from amqplib import client_0_8 as amqp
try:
import stopwatch
lConnection = amqp.Connection()
lChannel = lConnection.channel()
Exchange = 'celery'
t = stopwatch.Timer()
files = [foo, bar]
f1 = open(files[0])
f2 = open(files[1])
res = []
count = 0
for r1, r2 in izip(FastqGeneralIterator(f1), FastqGeneralIterator(f2)):
count += 1
#res.append(tasks.process_read_pair.s(args=(r1, r2)))
#lMessage = amqp.Message("~130 bytes of test data..........................................................................................................")
lMessage = amqp.Message(" ".join(r1) + " ".join(r2))
res.append(lMessage)
lMessage.properties["delivery_mode"] = 1
lChannel.basic_publish(lMessage, exchange=Exchange, routing_key='transient')
if count == 1000000:
break
t.stop()
print "added %d tasks in %s" % (count, t)
lChannel.close()
lConnection.close()
except Exception as e:
print e
So, I made a change to submit an async task to celery in the loop, as below:
res.append(tasks.speed.apply_async(args=("FML",), queue="transient"))
The speed method is just this:
#app.task()
def speed(s):
return s
Submitting the tasks I'm slow again!
So, it doesn't appear to have anything to do with:
How I'm iterating to submit to the queue
The message that I'm submitting
but rather, it has to do with the queueing of the function?!?! I'm confused.
Again, not an answer, but more of an observation. By simply changing my backend from rpc to redis, I more than triple my throughput:
I have a fairly high-level question about Python and running interactive simulations. Here is the setup:
I am porting to Python some simulation software I originally wrote in Smalltalk (VW). It is a kind of Recurrent Neural Network controlled interactively from a graphical interface. The interface allows the manipulation of most the network's parameters in real time, in addition to controlling the simulation itself (starting it, stopping it, etc). In the original Smalltalk implementation, I had two processes running with different priority levels:
The interface itself with a higher priority
The neural network running forever at a lower priority
Communication between the two processes was trivial, because all Smalltalk processes share the same address space (the Object memory).
I am now starting to realize that replicating a similar setup in Python is not so trivial. The threading module does not allow its threads to share address space, as far as I can tell. The multiprocessing module does, but in a rather complex way (with Queues, etc).
So I am starting to think that my Smalltalk perspective is leading me astray and I am approaching a relatively simple problem from the wrong angle altogether. Problem is, I don't know what is the right angle! How would you recommend I approach the problem? I am fairly new to Python (obviously) and more than willing to learn. But I would greatly appreciate suggestions on how to frame the issues and which multiprocessing modules (if any!) I should delve into.
Thanks,
Stefano
I'll offer my take on how to approach this problem. Within the multiprocessing module the Pipe and Queue IPC mechanisms are really the best way to go; in spite of the added complexity you allude to, it's worth learning how they work. The Pipe is fairly straightforward so I'll use that to illustrate.
Here's the code, followed by some explanation:
import sys
import os
import random
import time
import multiprocessing
class computing_task(multiprocessing.Process):
def __init__(self, name, pipe):
# call this before anything else
multiprocessing.Process.__init__(self)
# then any other initialization
self.name = name
self.ipcPipe = pipe
self.number1 = 0.0
self.number2 = 0.0
sys.stdout.write('[%s] created: %f\n' % (self.name, self.number1))
# Do some kind of computation
def someComputation(self):
try:
count = 0
while True:
count += 1
self.number1 = (random.uniform(0.0, 10.0)) * self.number2
sys.stdout.write('[%s]\t%d \t%g \t%g\n' % (self.name, count, self.number1, self.number2))
# Send result via pipe to parent process.
# Can send lists, whatever - anything picklable.
self.ipcPipe.send([self.name, self.number1])
# Get new data from parent process
newData = self.ipcPipe.recv()
self.number2 = newData[0]
time.sleep(0.5)
except KeyboardInterrupt:
return
def run(self):
sys.stdout.write('[%s] started ... process id: %s\n'
% (self.name, os.getpid()))
self.someComputation()
# When done, send final update to parent process and close pipe.
self.ipcPipe.send([self.name, self.number1])
self.ipcPipe.close()
sys.stdout.write('[%s] task completed: %f\n' % (self.name, self.number1))
def main():
# Create pipe
parent_conn, child_conn = multiprocessing.Pipe()
# Instantiate an object which contains the computation
# (give "child process pipe" to the object so it can phone home :) )
computeTask = computing_task('foo', child_conn)
# Start process
computeTask.start()
# Continually send and receive updates to/from the child process
try:
while True:
# receive data from child process
result = parent_conn.recv()
print "recv: ", result
# send new data to child process
parent_conn.send([random.uniform(0.0, 1.0)])
except KeyboardInterrupt:
computeTask.join()
parent_conn.close()
print "joined, exiting"
if (__name__ == "__main__"):
main()
I have encapsulated the computing to be done inside a class derived from Process. This isn't strictly necessary but makes the code easier to understand and extend, in most cases. From the main process you can start your computing task with the start() method on an instance of this class (this will start a separate process to run the contents of your object).
As you can see, we use Pipe in the parent process to create two connectors ("ends" of the pipe) and give one to the child while the the parent holds the other. Each of these connectors is a two-way communication mechanism between the processes holding the ends, with send() and recv() methods for doing what their names imply. In this example I've used the pipe to transmit lists of numbers and text, but in general you can send lists, tuples, objects, or anything that's picklable (i.e. serializable with Python's pickle facility). So you've got some latitude for what you send back and forth between processes.
So you set up your connectors, invoke start() on your new process, and you're off and computing. Here we're just multiplying two numbers, but you can see it's being done "interactively" in the subprocess with updates sent from the parent. Likewise the parent process is informed regularly of new results from the computing process.
Note that the connector's recv() method is blocking, i.e. if the other end hasn't sent anything yet, recv() will wait until something is there to read, and prevent anything else from happening in the meantime. So just be aware of that.
Hope this helps. Again, this is a barebones example and in real life you'll want to do more error handling, possibly use poll() on the connection objects, and so forth, but hopefully this conveys the major ideas and gets you started.
I have a site that runs with follow configuration:
Django + mod-wsgi + apache
In one of user's request, I send another HTTP request to another service, and solve this by httplib library of python.
But sometimes this service don't get answer too long, and timeout for httplib doesn't work. So I creating thread, in this thread I send request to service, and join it after 20 sec (20 sec - is a timeout of request). This is how it works:
class HttpGetTimeOut(threading.Thread):
def __init__(self,**kwargs):
self.config = kwargs
self.resp_data = None
self.exception = None
super(HttpGetTimeOut,self).__init__()
def run(self):
h = httplib.HTTPSConnection(self.config['server'])
h.connect()
sended_data = self.config['sended_data']
h.putrequest("POST", self.config['path'])
h.putheader("Content-Length", str(len(sended_data)))
h.putheader("Content-Type", 'text/xml; charset="utf-8"')
if 'base_auth' in self.config:
base64string = base64.encodestring('%s:%s' % self.config['base_auth'])[:-1]
h.putheader("Authorization", "Basic %s" % base64string)
h.endheaders()
try:
h.send(sended_data)
self.resp_data = h.getresponse()
except httplib.HTTPException,e:
self.exception = e
except Exception,e:
self.exception = e
something like this...
And use it by this function:
getting = HttpGetTimeOut(**req_config)
getting.start()
getting.join(COOPERATION_TIMEOUT)
if getting.isAlive(): #maybe need some block
getting._Thread__stop()
raise ValueError('Timeout')
else:
if getting.resp_data:
r = getting.resp_data
else:
if getting.exception:
raise ValueError('REquest Exception')
else:
raise ValueError('Undefined exception')
And all works fine, but sometime I start catching this exception:
error: can't start new thread
at the line of starting new thread:
getting.start()
and the next and the final line of traceback is
File "/usr/lib/python2.5/threading.py", line 440, in start
_start_new_thread(self.__bootstrap, ())
And the answer is: What's happen?
Thank's for all, and sorry for my pure English. :)
The "can't start new thread" error almost certainly due to the fact that you have already have too many threads running within your python process, and due to a resource limit of some kind the request to create a new thread is refused.
You should probably look at the number of threads you're creating; the maximum number you will be able to create will be determined by your environment, but it should be in the order of hundreds at least.
It would probably be a good idea to re-think your architecture here; seeing as this is running asynchronously anyhow, perhaps you could use a pool of threads to fetch resources from another site instead of always starting up a thread for every request.
Another improvement to consider is your use of Thread.join and Thread.stop; this would probably be better accomplished by providing a timeout value to the constructor of HTTPSConnection.
You are starting more threads than can be handled by your system. There is a limit to the number of threads that can be active for one process.
Your application is starting threads faster than the threads are running to completion. If you need to start many threads you need to do it in a more controlled manner I would suggest using a thread pool.
I was running on a similar situation, but my process needed a lot of threads running to take care of a lot of connections.
I counted the number of threads with the command:
ps -fLu user | wc -l
It displayed 4098.
I switched to the user and looked to system limits:
sudo -u myuser -s /bin/bash
ulimit -u
Got 4096 as response.
So, I edited /etc/security/limits.d/30-myuser.conf and added the lines:
myuser hard nproc 16384
myuser soft nproc 16384
Restarted the service and now it's running with 7017 threads.
Ps. I have a 32 cores server and I'm handling 18k simultaneous connections with this configuration.
I think the best way in your case is to set socket timeout instead of spawning thread:
h = httplib.HTTPSConnection(self.config['server'],
timeout=self.config['timeout'])
Also you can set global default timeout with socket.setdefaulttimeout() function.
Update: See answers to Is there any way to kill a Thread in Python? question (there are several quite informative) to understand why. Thread.__stop() doesn't terminate thread, but rather set internal flag so that it's considered already stopped.
I completely rewrite code from httplib to pycurl.
c = pycurl.Curl()
c.setopt(pycurl.FOLLOWLOCATION, 1)
c.setopt(pycurl.MAXREDIRS, 5)
c.setopt(pycurl.CONNECTTIMEOUT, CONNECTION_TIMEOUT)
c.setopt(pycurl.TIMEOUT, COOPERATION_TIMEOUT)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.POST, 1)
c.setopt(pycurl.SSL_VERIFYHOST, 0)
c.setopt(pycurl.SSL_VERIFYPEER, 0)
c.setopt(pycurl.URL, "https://"+server+path)
c.setopt(pycurl.POSTFIELDS,sended_data)
b = StringIO.StringIO()
c.setopt(pycurl.WRITEFUNCTION, b.write)
c.perform()
something like that.
And I testing it now. Thanks all of you for help.
If you are tying to set timeout why don't you use urllib2.
I'm running a python script on my machine only to copy and convert some files from one format to another, I want to maximize the number of running threads to finish as quickly as possible.
Note: It is not a good workaround from an architecture perspective If you aren't using it for a quick script on a specific machine.
In my case, I checked the max number of running threads that my machine can run before I got the error, It was 150
I added this code before starting a new thread. which checks if the max limit of running threads is reached then the app will wait until some of the running threads finish, then it will start new threads
while threading.active_count()>150 :
time.sleep(5)
mythread.start()
If you are using a ThreadPoolExecutor, the problem may be that your max_workers is higher than the threads allowed by your OS.
It seems that the executor keeps the information of the last executed threads in the process table, even if the threads are already done. This means that when your application has been running for a long time, eventually it will register in the process table as many threads as ThreadPoolExecutor.max_workers
As far as I can tell it's not a python problem. Your system somehow cannot create another thread (I had the same problem and couldn't start htop on another cli via ssh).
The answer of Fernando Ulisses dos Santos is really good. I just want to add, that there are other tools limiting the number of processes and memory usage "from the outside". It's pretty common for virtual servers. Starting point is the interface of your vendor or you might have luck finding some information in files like
/proc/user_beancounters