Could someone suggest a concurrency framework (threaded or event-based) in Python that can handle two tasks: one that has to process a LOT of events, and another that processes one or more commands at a slower rate? I am going to prototype with Twisted and see if it meets my needs. I also went through http://www.slideshare.net/dabeaz/an-introduction-to-python-concurrency, which is informative; judging from it, the other choice I could try seems to be the multiprocessing module.
Background
I am trying to write a program that can interface with a C program on one side and the network on the other. The C program generates events at a high rate (hundreds of thousands, possibly a million messages per second), which need to be consumed without letting it block, and it also needs to receive commands arriving from the network.
I think Python with zeromq (http://www.zeromq.org/) will suffice for consuming the events from the C program. But I need to also concurrently process commands from the network in my program. I have used Python with Twisted before to do asynchronous programming, but am not sure if it can handle the zeromq messages concurrently with other tasks fast enough.
I am going to try it out, but I was wondering if anybody has any thoughts on other ways of doing things. I would rather use Python as it would make handling of the commands and keeping state easier than if I had to do it in C.
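For concreteness, here is roughly what I have in mind for the consuming side; the endpoint and the PUSH/PULL socket pair are just assumptions for illustration, not settled design:

    import zmq

    # Assumed endpoint; the C program would bind a PUSH socket here.
    ENDPOINT = "tcp://127.0.0.1:5555"

    ctx = zmq.Context()
    sock = ctx.socket(zmq.PULL)
    sock.connect(ENDPOINT)

    while True:
        msg = sock.recv()  # blocks until the next event arrives
        # Process the event here; keep this fast so the C side never blocks.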
Let's say I want to run some function once a day at 10 am.
Do I simply keep a script running in the background forever?
What if I don't want to keep my laptop open/on for many days at a time?
Will the process eat a lot of CPU?
Are the answers to these questions different if I use cron/launchd vs scheduling programmatically? Thanks!
The answer to this question will likely depend on your platform, the available facilities and your particular project needs.
First let me address system resources. If you want to use the fewest resources, just call time.sleep(NNN), where NNN is the number of seconds until the next instance of 10 AM. time.sleep suspends execution of your program and should consume zero (or virtually zero) resources. The Python GC may periodically wake up and do maintenance, but its work should be negligible.
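For example, a minimal sketch of that approach (run_daily_task is a hypothetical stand-in for your own function):

    import time
    from datetime import datetime, timedelta

    def seconds_until(hour, minute=0):
        # Seconds from now until the next occurrence of hour:minute.
        now = datetime.now()
        target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        if target <= now:
            target += timedelta(days=1)  # already past today; use tomorrow
        return (target - now).total_seconds()

    while True:
        time.sleep(seconds_until(10))
        run_daily_task()  # hypothetical placeholder for your function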
If you're on Unix, cron is the typical facility for scheduling future tasks. It implements a fairly efficient Franta–Maly event-list manager: based on the list of tasks, it determines which will occur next and sleeps until then.
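For comparison, the equivalent crontab entry would look something like this (interpreter and script paths are placeholders):

    0 10 * * * /usr/bin/python /path/to/script.py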
On Windows, you have the Task Scheduler. It's a Frankenstein of complexity, but it's incredibly flexible and can handle running missed events caused by power outages, laptop hibernation, and so on.
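If you go that route, something like the following should register a daily 10 AM task from the command line (task name and paths are placeholders; treat this as a sketch rather than a recipe):

    schtasks /Create /TN "MyDailyTask" /TR "python C:\path\to\script.py" /SC DAILY /ST 10:00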
I am writing a unicast chat server; the flow is as follows:
The sender sends a message to the chat server; the message also specifies the recipient id
The chat server will route the message to the right client, based on the recipient id
I implemented the chat server using the standard-library asyncore module. I found that CPU usage jumps once a client connects to the server (from 1% to 24%). I believe the performance is limited by the looping of the handle_write function.
Is there a better (e.g. more efficient) framework to accomplish my chat server requirement?
Thanks in advance
Of course we'd need actual code to debug the problem. But what you're mainly asking is:
Is there a better (e.g. more efficient) framework to accomplish my chat implementation?
Yes. It's generally accepted that asyncore sucks. It's hard to use as well as being inefficient. It's especially bad on Windows, because select especially sucks on Windows.
So, yes, using a different framework will probably make things better.
Unfortunately, an SO question is not a good place to get recommendations for frameworks, but I can throw out a list of usual suspects: twisted, monocle, gevent, eventlet, tulip.
Alternatively, if you're not worried about scalability to more than a few dozen clients, just about performance at the small scale, using a thread per client (or even two threads, one for reads and one for writes) and blocking I/O is incredibly simple.
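To give a sense of how simple, here's a bare-bones thread-per-client sketch of the routing you described. The wire protocol (first line is the client's own id; after that, each line is "recipient_id:text") and the port are assumptions for illustration:

    import socket
    import threading

    clients = {}  # recipient id -> connected socket
    clients_lock = threading.Lock()

    def handle(conn):
        f = conn.makefile("r")
        client_id = f.readline().strip()  # assumed: first line is the client's id
        with clients_lock:
            clients[client_id] = conn
        try:
            for line in f:  # assumed format: "recipient_id:text"
                recipient, _, text = line.partition(":")
                with clients_lock:
                    target = clients.get(recipient.strip())
                if target:
                    target.sendall(text.encode())
        finally:
            with clients_lock:
                clients.pop(client_id, None)
            conn.close()

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("", 9999))  # port is arbitrary
    server.listen(5)
    while True:
        conn, _ = server.accept()
        t = threading.Thread(target=handle, args=(conn,))
        t.daemon = True
        t.start()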
Finally, if you're staying up to date with Python 3.x, there's a good chance that 3.4 will have a new and improved async I/O module that's much more efficient and much easier to use than asyncore (and it will almost certainly be based on tulip). So… the best solution may be to get a time machine and go forward a few months. (Or, if you're a reader searching for this answer in the future, look in the standard library under IPC and guess which module is the new-and-improved async I/O module.)
I just read a web page comparing the efficiency of different Python web servers (link).
I think I will use gevent, as it seems very efficient.
I'm writing a program in Python for which I'm considering a local client-server model, but I am struggling to figure out the best way for the server to communicate with the client(s). A simple, canned solution would be best--I'm not looking to reinvent the wheel. Here are my needs for this program:
Runs on Linux
Server and clients are on the same system, so I don't need to go over a network.
Latency that's not likely to be annoying to an interactive user.
Multiple clients can connect to the same server.
Clients are started independently of the server and can connect/disconnect at any time.
The number of clients is measurable in dozens; I don't need to scale very high.
Clients can come in a few different flavors:
Stream readers - Read a continuous stream of data (in practice, this is all text).
State readers - Read some state information that updates every once in a while.
Writers - Send some data to the server, receiving some response each time.
Client type 1 seems simple enough; it's a unidirectional dumb pipe. Client type 2 is a bit more interesting. I want to avoid simply polling the server to check for new data periodically since that would add noticeable latency for the user. The server needs some way to signal to all and only the relevant clients when the state information is updated so that the client can receive the updated state from the server. Client type 3 must be bidirectional; it will send user-supplied data to the server and receive some kind of response after each send.
I've looked at Python's IPC page (http://docs.python.org/2/library/ipc.html), but I don't think any of those solutions are right for my needs. The subprocess module is completely inappropriate, and everything else is a bit more low-level than I'd like.
The similar question Efficient Python to Python IPC isn't quite the same; I don't need to transfer Python objects, I'm not especially worried about CPU efficiency for the number of clients I'll have, I only care about Linux, and none of the answers to that question are especially helpful to me anyway.
Update:
I cannot accept an answer that just points me at a framework/library/module/tool without actually giving an explanation of how it can be used for my three different server-client relationships. If you say, "All of this can be done with named pipes!" I will have to ask "How?" Code snippets would be ideal, but a high-level description of a solution can work too.
Have you already looked into ZeroMQ? It has excellent Python support, and the documented examples already cover your use cases.
It's easy to use on a single platform, single machine setup, but it can be very easily extended to a network.
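For a sense of how your three client types might map onto ZeroMQ, here's a sketch using ipc:// transports so everything stays on one machine. The endpoints, the topic prefix, and the PUB/SUB-plus-REQ/REP split are assumptions for illustration:

    import zmq

    ctx = zmq.Context()

    # Server side: one PUB socket covers client types 1 and 2 (stream and
    # state readers); updates are pushed to subscribers, so no polling.
    pub = ctx.socket(zmq.PUB)
    pub.bind("ipc:///tmp/demo-pub")

    # One REP socket covers client type 3 (writers): each send gets a reply.
    rep = ctx.socket(zmq.REP)
    rep.bind("ipc:///tmp/demo-rep")

    while True:
        cmd = rep.recv()           # a writer's request
        rep.send(b"ok: " + cmd)    # its response
        pub.send(b"state " + cmd)  # notify state readers via a topic prefix

A state-reader client would then connect a SUB socket to ipc:///tmp/demo-pub and call setsockopt(zmq.SUBSCRIBE, b"state"), so it receives only state updates, and only when they actually happen.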
I have a working Python script that checks the 6,300 or so sites we have to ensure they are up, by sending an HTTP request to each and measuring the response. Currently the script takes about 40 minutes to run completely. I was interested in ways to speed it up; two thoughts were threading or running multiple instances.
This is the order of execution now:
MySQL query to get all of the active domains to scan (6,300 give or take)
Iterate through the domains, using urllib to send an HTTP request to each
If the site doesn't return '200' then log the results
repeat until complete
This seems like it could possibly be sped up significantly with threading but I am not quite sure how that process flow would look since I am not familiar with threading.
If someone could offer a sample high-level process flow and any other pointers for working with threading or offer any other insights on how to improve the script in general it would be appreciated.
The flow would look something like this:
Create a domain Queue
Create a result Queue
MySQL query to get all of the active domains to scan
Put the domains in the domain Queue
Spawn a pool of worker threads
Run the threads
Each worker will get a domain from the domain Queue, send a request and put the result in the result Queue
Wait for the threads to finish
Get everything from the result Queue and log it
You'll probably want to tune the number of threads in the pool rather than spawning 6,300 threads, one per domain.
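Here's a rough sketch of that flow in code. It assumes active_domains (the result of your MySQL query) holds full URLs, and the worker count is just a starting point to tune. It's shown with Python 3's queue and urllib.request, but the same idea works with Queue and urllib2 on Python 2:

    import queue
    import threading
    import urllib.request

    NUM_WORKERS = 50  # tune this for your bandwidth and server limits

    def worker(domains, results):
        while True:
            try:
                url = domains.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker is done
            try:
                status = urllib.request.urlopen(url, timeout=10).getcode()
            except Exception as exc:
                status = str(exc)  # treat errors as "not 200"
            results.put((url, status))

    domains = queue.Queue()
    results = queue.Queue()
    for url in active_domains:  # assumed: full URLs from the MySQL query
        domains.put(url)

    threads = [threading.Thread(target=worker, args=(domains, results))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    while not results.empty():
        url, status = results.get()
        if status != 200:
            print(url, status)  # stand-in for your real logging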
You can take a look at the Scrapy framework. It's made for web scraping, it's asynchronous (built on Twisted), and it's pretty fast.
In your case you can just take the list of domains and check whether each returns 200, without actually scraping anything. It should be much faster.
Here's the link:
http://scrapy.org/
Threading is definitely what you need. It will remove the serialized nature of your algorithm, and since the work is mostly I/O bound, you will gain a lot by sending HTTP requests in parallel.
Your flow would become:
MySQL query to get all of the active domains to scan (6,300 give or take)
For each domain, create a thread that uses urllib to send an HTTP request
Log the results from within the threads
You can make this better by creating n worker threads with queues and adding the domains to the queues, instead of creating one thread per domain. I just wanted to make things a little easier for you since you're not familiar with threads.
I guess you should go for threading, while investigating the optimal number of threads to start in order to avoid overwhelming your client. The Python manual offers good examples; by the way, take a look at Download multiple pages concurrently? and at the urllib, threading, and multiprocessing docs.
I did see this post but it does not answer my question: C/Python Socket Performance?
I have been tasked with creating an application that can create thousands of connections based on sockets. I can do this in Python, but I want to have room for performance improvements. I know it's possible in Python because of my past projects, but I'm curious how much of a performance improvement I would see if I were to do this project in C (not C++).
It really depends on what you're doing with the sockets.
The best generic answer is: Usually Python is good enough that it doesn't matter, but sometimes it's not.
The overhead of creating and connecting sockets is minimal, and reading and writing isn't much worse. But that hardly matters: there's pretty much never significant time spent doing any of that anyway.
There are reactors and proactors for Python every bit as good as the general-purpose ones available for C (and half of the C libraries have Python bindings). If you're not doing much significant work beyond the sockets, this is often your main bottleneck. If you've got a very specific use pattern and/or very tightly specified hardware, you might be able to write a custom reactor or proactor that beats out anything general-purpose. In that case, you pretty much have to go with C, not Python.
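As a toy illustration of the reactor pattern in pure Python, here's a single-threaded echo server built on select (the port and buffer size are arbitrary, and a real reactor would also buffer partial writes):

    import select
    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("", 8000))  # port is arbitrary
    server.listen(128)
    server.setblocking(False)

    sockets = [server]
    while True:
        readable, _, _ = select.select(sockets, [], [])
        for s in readable:
            if s is server:
                conn, _ = s.accept()       # new client
                conn.setblocking(False)
                sockets.append(conn)
            else:
                data = s.recv(4096)
                if data:
                    s.send(data)           # echo; real code would buffer leftovers
                else:
                    sockets.remove(s)      # client disconnected
                    s.close()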
But usually, you've got significant work to do beyond just manipulating sockets.
If that work is mostly independent and highly parallelizable, C obviously beats Python (because of the GIL), unless the jobs are heavy enough that you can multi-process them (and keep in mind that "heavy enough" can be pretty heavy on Windows platforms). Except, of course, that it's incredibly easy to screw up performance (not to mention stability) writing multi-threaded C code; really, something like Erlang or Haskell is probably a better bet here than either C or Python. (If you're about to say, "But we've got people who are experienced at C but they can't learn Haskell", then those people are probably not good enough programmers to write multi-threaded code.)
If that work is mostly memory copying between socket buffers, and you can deal with a tightly-specified system, you may be able to write C code that optimizes for zero-copy, and there's no way to do that in Python.
But if it's mostly typical things like waiting on disk or serialized computation, then it scarcely matters how you write the socket-stuff, because it's going to end up waiting on the real code anyway.
So, without any more information, I'd go with Python, because the time you save getting things up and running and debugged vs. C can be spent optimizing or otherwise improving whatever turns out to matter.
If you're using the Windows platform, learn the one-thread-per-core concept behind IOCPs, and stay away from thread pools that entail more or less one thread per socket.