Hey guys, I have looked into the best ways to do this and want to make sure I understand correctly.
If I want:
http1
http2
http3
http...
to be sent at exactly the same time, should I set each of these up in its own thread and then start the threads? I need to make sure they go out at the exact same time.
I think this can be done in Java, but I'm not familiar with it. Thanks guys for any help you can give!
After reading up more on the process, I don't think this was super clear. Will async processing send these packets at the same time, so they arrive at the destination at the same time? From reading different articles, it seems like async is just that - asynchronous, not simultaneous.
I believe that for what I'm looking for, I would need to use a synchronous method like multiprocessing.
Thoughts?
Your question is not totally clear to me, but have you looked at Twisted? It's an event-driven networking engine written in Python. If you're not familiar with event-driven programming, this Linux Journal article is a good introduction. Basically, instead of threads, asynchronous I/O is used with the reactor pattern (which encapsulates an event loop).
Twisted has multiple web clients. You should probably start with the newer one, called Agent (twisted.web.client.Agent), rather than the older getPage.
If you want to understand Twisted, I can recommend Dave Peticolas's Twisted Introduction. It's long but accessible and detailed.
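To make the Agent suggestion concrete, here is a minimal sketch of firing several requests at once with it. The URLs are placeholders, and note that "at exactly the same time" can only ever be approximate at the network level:

    from twisted.internet import reactor
    from twisted.internet.defer import DeferredList
    from twisted.web.client import Agent

    agent = Agent(reactor)
    urls = [b'http://example.com/1', b'http://example.com/2', b'http://example.com/3']

    def on_response(response, url):
        print(url, response.code)

    # Issue every request up front; once reactor.run() starts, they are
    # all in flight concurrently rather than one after another.
    deferreds = []
    for url in urls:
        d = agent.request(b'GET', url)
        d.addCallback(on_response, url)
        deferreds.append(d)

    # Stop the reactor once every request has succeeded or failed.
    DeferredList(deferreds, consumeErrors=True).addCallback(lambda _: reactor.stop())
    reactor.run()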
I've created a socket object for Telnet communication, and I'm using it to communicate with an API, sending and receiving data. I need to configure it in such a way that I can send and receive data at the same time. By that, I mean data should be sent as soon as the application tries to send it, and data should be processed immediately on receipt. Currently, I have a configuration which allows receipt to be instant, and sending to be second priority with a very short delay.
Currently the best way I have found to do this is by having an event queue, pushing data to send into it, and having a response queue into which I put messages from the server. I have a thread which polls the buffer every 0.1 seconds to check for new data; if there isn't any, it then checks the request queue and processes anything there, and that's running in a continuous loop. I then have other threads insert data into the request queue and read data from the response queue. Everything is just about linear enough that this works fine.
This is not "asynchronous", in a sense that I've had to make it as asynchronous as possible without actually achieving it. Is there a proper way to do this? Or is anything under the hood going to be doing exactly the same as I am?
Other things I have investigated as a solution to this problem:
A callback system, where I might call socket.on_receipt(handle_message, args) to call the method handle_message with args as a parameter, passing the received data into the method. The only way I could find to achieve this is by implementing what I already have and then registering a callback for it (in fact, this is very close to what I already have).
Please note: I am approaching this as a learning exercise to understand better how asynchronous systems work, not to understand how to use a particular library, so please do not suggest an existing library unless it contains very clear code which is simple to understand and answers the question fully and concisely.
This seems like a pretty straightforward use case for asyncio. I wouldn't consider using asyncio as "using a particular library" since socket programming paired with asyncio's event loop is pretty low-level and the concept is very transparent if you have experience with other languages and just want to see how async programming works in Python.
You can use this async chat as an example: https://gist.github.com/gregvish/7665915
Essentially, you create a non-blocking socket; see the standard library reference on socket.setblocking(0):
https://docs.python.org/3/library/socket.html#socket.socket.setblocking
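As a minimal sketch of that idea (the host, port, and payloads are placeholders), using asyncio's socket helpers on a non-blocking socket so reads and writes interleave on one event loop:

    import asyncio
    import socket

    async def main():
        loop = asyncio.get_running_loop()

        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setblocking(False)  # the non-blocking socket described above
        await loop.sock_connect(sock, ('127.0.0.1', 8888))

        async def reader():
            # Processes data the moment it arrives.
            while True:
                data = await loop.sock_recv(sock, 4096)
                if not data:
                    break
                print('received:', data)

        async def writer():
            # Sends data as soon as the application produces it.
            for line in (b'hello\r\n', b'world\r\n'):
                await loop.sock_sendall(sock, line)

        # Both coroutines share the event loop; neither blocks the other.
        await asyncio.gather(reader(), writer())

    asyncio.run(main())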
I'd also suggest this amazing session by David Beazley as a must-see for async Python programming. He explains the concurrency concepts in Python using sockets, exactly what you need: https://www.youtube.com/watch?v=MCs5OvhV9S4
Using web.py, I'm building a website in which I display search results from two third-party websites through their public APIs. Unfortunately, it takes each API about 4 seconds to send back its results. If I query the second API only after I've received the answer from the first, this obviously takes about 8 seconds, which is way too long. To bring this down, I want to send the requests to both APIs simultaneously and simply continue as soon as I've received an answer from each.
My problem is now: how to do this?
I've never worked with parallel computing, but I've heard of multiprocessing and threading. I don't really know what the differences or advantages of each are. I also know that, for example, C++ can do parallel computation. It could therefore also be an option to write the part that queries the APIs in C++ (I'm a beginner in C++, but I think I'd manage). Finally, there could of course be options that I am totally overlooking. Maybe web.py has a way to do this, or maybe there are Python modules made specifically for it?
Since just researching and understanding all of these options would take me quite a lot of time, I thought I'd ask you guys here for some tips.
So which one do you think I should go for? And most importantly: why? All tips are welcome!
You want an asynchronous HTTP request library. Examples of this would be gevent or grequests.
Alternatively, you could use Python's built-in threading module to run synchronous requests in multiple threads.
Either way, no need to go to another language.
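For instance, a minimal sketch with the threading module (assuming the third-party requests library; the API URLs are placeholders):

    import threading
    import requests  # assumed third-party HTTP library

    results = {}

    def fetch(name, url):
        # Each thread performs one blocking request; the two run concurrently.
        results[name] = requests.get(url).json()

    threads = [
        threading.Thread(target=fetch, args=('api1', 'https://api1.example.com/search?q=foo')),
        threading.Thread(target=fetch, args=('api2', 'https://api2.example.com/search?q=foo')),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # total wall time is ~max(4s, 4s) instead of ~8s

    print(results)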
I want to create 2 applications in Python which should communicate with each other. One of these applications should behave like a server, and the second should be the GUI of a client. They could run on the same machine, or remotely on different devices.
I want to ask which technology I should use: AMQP messaging (like RabbitMQ), a Twisted-like server (or Tornado), or ZeroMQ, and connect the applications to it. In the future I would like to have some kind of authentication, etc.
I have read a lot of questions and articles (like this one: Why do we need to use rabbitmq), and a lot of people say "rabbitmq and twisted are different". I know they are. I would really love to know the differences and why one of these solutions would be superior to the other in this case.
EDIT:
I want to use it with following requirements:
There will be more than one user connected at a time - I think there will be 1-10 users connected to the same program, working collaboratively.
The data sent are "messages" telling what a user did - something like remote calls (but don't focus on that; the GUIs can be written in different languages, so the messages will be something like JSON documents - see the sketch after this list).
The system should allow for collaborative work, so it should be as interactive as possible (data will be sent continually as users type or perform actions).
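As a hypothetical illustration of such a message (the field names are made up):

    import json

    # A small, language-agnostic JSON document describing what the user
    # did; any client (Python, C++, JavaScript GUI, ...) can produce or
    # consume it without sharing code with the server.
    message = json.dumps({
        "type": "user_action",
        "user": "alice",
        "action": "insert_text",
        "payload": {"position": 42, "text": "hello"},
    })
    print(message)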
Additionally, I would love to hear why one solution would be better than the other, not only in this particular case.
Twisted is used to solve the C10k networking problem by giving you asynchronous networking through the reactor pattern. It's also convenient because it provides a nice concurrency abstraction, as threading/concurrency in Python is not as easy as in, say, Erlang. Consequently some people use Twisted to dispatch work tasks, but that's not what it was designed for.
RabbitMQ is based on the message queue pattern. It's all about reliable message passing and is not about networking. I stress the reliable part, as there are many different asynchronous networking frameworks (Vert.x, for example) that provide message passing (also known as pub/sub).
More often than not, people combine the two patterns to create a "message bus" that will work with a variety of networking needs without unnecessary network blocking, and with great integration and scalability.
The reason a "message queue" goes so well with a networking "reactor loop" is that you should not block on the reactor loop so you have to dispatch blocking work to some other process (thread, lwp, separate machine process, queue, etc..). In practice the cleanest way to do this is distributed message passing.
Based on your requirements, it sounds like you should use asynchronous networking if you want the results to show up instantly and scale, but you could probably get away with a simple system that just polls, given that you only have a handful of clients. So the questions are: how many total users (Twisted)? And how reliably do you want the updates to be delivered (RabbitMQ)? Finally, do you want your architecture to be language- and platform-agnostic - maybe you want to use Node.js later (then focus on the message queue instead of async networking, i.e. RabbitMQ)? Personally, I would take a look at Vert.x, which allows you to write in Python.
When someone tells you that Twisted and RabbitMQ are different, it's because comparing the two is like comparing two things with different purposes.
Twisted is an asynchronous framework, like Tornado. RabbitMQ is a message queue system. You can't compare them directly.
You should split your question into two new ones. First, which protocol should I use to communicate between my processes? The answer can be figured out with words like AMQP, Protocol Buffers, ...
And second, which framework should I use to write my client and server programs? Here the answer can fall on Twisted, Tornado, ...
I'm working on a simple experiment in Python. I have a "master" process in charge of all the others, and every single process has a connection via unix socket to the master process. I would like the master process to be able to monitor all of the sockets for a response - but there could theoretically be almost a hundred of them. How would threads impact the memory and performance of the application? What would be the best solution? Thanks a lot!
One hundred simultaneous threads might be pushing the reasonable limits of threading. If you find this is the cleanest way to organize your code, I'd say give it a try, but threading really doesn't scale very far.
What works better is to use a technique like select to wait for one of the sockets to become readable or writable, or to have an error to report. This mechanism lets you go to sleep until something interesting happens, handle as many sockets as have content to handle, and then go back to sleep again, all in a single thread of execution. Removing the multi-threading can often reduce the chances for errors, and this style of programming should get you into the hundreds of connections with no trouble. (If you want to go beyond about 100, I'd use the poll functionality instead of select -- constantly rebuilding the list of interesting file descriptors takes time that poll does not require.)
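A minimal sketch of that select-based style, assuming the master process creates a listening unix socket at startup (the path is a placeholder):

    import select
    import socket

    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind('/tmp/master.sock')
    listener.listen(100)

    sockets = [listener]
    while True:
        # Sleep until at least one socket is readable or has an error.
        readable, _, errored = select.select(sockets, [], sockets)
        for s in readable:
            if s is listener:
                conn, _ = s.accept()      # a new worker connected
                sockets.append(conn)
            else:
                data = s.recv(4096)       # a response from a worker
                if not data:              # the worker hung up
                    sockets.remove(s)
                    s.close()
                else:
                    print('worker said:', data)
        for s in errored:
            if s in sockets:
                sockets.remove(s)
                s.close()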
Something to consider is the Python Twisted Framework. They've gone to some length to provide a consistent way to hook callbacks onto events for this exact sort of programming. (If you're familiar with node.js, it's a bit like that, but Python.) I must admit a slight aversion to Twisted -- I never got very far in their documentation without being utterly baffled -- but a lot of people made it further in the docs than I did. You might find it a better fit than I have.
The easiest way to conduct comparative tests of threads versus processes for socket handling is to use the SocketServer module in Python's standard library (socketserver in Python 3). You can easily switch approaches (while keeping everything else the same) by inheriting from either ThreadingMixIn or ForkingMixIn. Here is a simple example to get you started.
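A minimal sketch of that switch (Python 3 naming; a trivial echo handler stands in for your real logic):

    import socketserver  # called SocketServer in Python 2

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            # Echo each line back to the client; replace with your protocol.
            for line in self.rfile:
                self.wfile.write(line)

    # To compare approaches, swap ThreadingMixIn for ForkingMixIn here;
    # nothing else needs to change.
    class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
        pass

    if __name__ == '__main__':
        with ThreadedServer(('127.0.0.1', 9999), Handler) as server:
            server.serve_forever()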
Another alternative is a select/poll approach using non-blocking sockets in a single process and a single thread.
If you're interested in software that is already fully developed and highly evolved, consider these high-performance Python based server packages:
The Twisted framework uses the async single process, single thread style.
The Tornado framework is similar (less evolved, less full-featured, but easier to understand).
And Gunicorn, which is a high-performance forking server.
I'm writing a simple site spider and I've decided to take this opportunity to learn something new in concurrent programming in Python. Instead of using threads and a queue, I decided to try something else, but I don't know what would suit me.
I have heard about Stackless, Celery, Twisted, Tornado, and other things. I don't want to have to set up a database and all the other dependencies of Celery, but I would if it were a good fit for my purpose.
My question is: What is a good balance between suitability for my app and usefulness in general? I have taken a look at the tasklets in Stackless, but I'm not sure that the urlopen() call won't block or that they will execute in parallel; I haven't seen that mentioned anywhere.
Can someone give me a few details on my options and what would be best to use?
Thanks.
Tornado is a web server, so it wouldn't help you much in writing a spider. Twisted is much more general (and, inevitably, complex), good for all kinds of networking tasks (and with good integration with the event loop of several GUI frameworks). Indeed, there used to be a twisted.web.spider (but it was removed years ago, since it was unmaintained -- so you'll have to roll your own on top of the facilities Twisted does provide).
I must say that Twisted gets my vote.
Performing event-driven tasks is fairly straightforward in Twisted. Integration with other important system components such as GTK+ and DBus is very easy.
The HTTP client support is basic for now but improving (>9.0.0): see related question.
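As a rough sketch of rolling your own fetches on top of that HTTP client (shown here with the newer Agent and readBody rather than the old getPage; the seed URLs are placeholders and link extraction is omitted):

    from twisted.internet import reactor
    from twisted.internet.defer import DeferredList
    from twisted.web.client import Agent, readBody

    agent = Agent(reactor)

    def fetch(url):
        d = agent.request(b'GET', url)
        d.addCallback(readBody)  # collect the full response body
        d.addCallback(lambda body: print(url, len(body), 'bytes'))
        return d

    seeds = [b'http://example.com/', b'http://example.org/']
    DeferredList([fetch(u) for u in seeds], consumeErrors=True
                 ).addCallback(lambda _: reactor.stop())
    reactor.run()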
The added bonus is that Twisted is available in the Ubuntu default repository ;-)
For a quick look at package sizes, see ohloh.net/p/compare. Of course, source size is only a rough metric (what I'd really like is the number of pages of docs, the number of pages of examples, and the dependencies), but it can help.