Efficient Python IPC [closed]

Efficient Python IPC [closed] - python

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
Improve this question
I'm making an application in Python3, which will be divided in batch and gui parts.
Batch is responsible for processing logic and gui is responsible for displaying it.
Which inter-process communication (IPC) framework should I use with the following requirements:
The GUI can be run on other device than batch (GUI can be run on the same device, on smartphone, tablet etc, locally or over network).
The batch (Python3 IPc library) should work with no problem on Linux, Mac, Windows, ...
The IPC should support GUI written in different languages (Python, Javascript, ...)
The performance of IPC is important - it should be as "interactive" as possible, but without losing information.
Several GUI could be connected to the same batch.
additional: Will the choice be other if the GUI will be guaranteed to be written in Python also?
Edit:
I have found a lot of IPC libraries, like here: Efficient Python to Python IPC or ActiveMQ or RabbitMQ or ZeroMQ or.
The best looking options I have found so far are:
rabbitmq
zeromq
pyro
Are they appropriate slutions to this problem? If not why? And if something is better, please tell me why also.

The three you mentioned seem a good fit and will uphold your requirements. I think you should go on with what you feel most comfortable\familiar with.
From my personal experience, I do believe ZeroMQ is the best combination between efficiency, ease of use and inter-operability. I had an easy time integrating zmq 2.2 with Python 2.7, so that would be my personal favorite. However as I said I'm quite sure you can't go wrong with all 3 frameworks.
Half related: Requirements tend to change with time, you may decide to switch framework later on, therefore encapsulating the dependency on the framework would be a good design pattern to use. (e.g. having a single conduit module that interacts with the framework and have its API use your internal datastructures and domain language)

I've used the Redis engine for this. Extremely simple, and lightweight.
Server side does:
import redis
r = redis.Redis() # Init
r.subscribe(['mychannel']) # Subscribe to "channel"
for x in r.listen():
print "I got message",x
Client side does:
import redis
r = redis.Redis() # Init
r.publish('mychannel',mymessage)
"messages" are strings (of any size). If you need to pass complex data structures, I like to use json.loads and json.dumps to convert between python dicts/arrays and strings -
"pickle" is perhaps the better way to do this for python-to-python communication, though JSON means "the other side" can be written in anything.
Now there are a billion other things Redis is good for - and they all inherently are just as simple.

You are asking for a lot of things from the framework; network enabled, multi-platform, multi-language, high performance (which ideally should be further specified - what does it mean, bandwidth? latency? what is "good enough"; are we talking kB/s, MB/s, GB/s? 1 ms or 1000 ms round-trip?) Plus there are a lot of things not mentioned which can easily come into play, e.g. do you need authentication or encryption? Some frameworks give you such functionality, others rely on implementing that part of the puzzle yourself.
There probably exists no silver bullet product which is going to give you an ideal solution which optimizes all those requirements at the same time. As for the 'additional' component of your question - yes, if you restrict language requirements to python only, or further distinguish between key vs. nice-to-have requirements, there would be more solutions available.
One technology you might want to have a look at is Versile Python (full disclosure: I am one of the developers). It is multi-platform and supports python v2.6+/v3, and java SE6+. Regarding performance, it depends on what are your requirements. If you have any questions about the technology, just ask on the forum.

The solution is dbus
It is a mature solution and availiable for a lot of languages (C, Python, ..., just google for dbus + your favorite language), though not as fast as shared memory, but still fast enough for pretty much everything not requiring (hard) realtime properties.

I'll take a different tack here and say why not use the de facto RPC language of the Internet? I.e. HTTP REST APIs?
With Python Requests on the client side and Flask on the server side, you get these kinds of benefits:
Existing HTTP REST tools like Postman can access and test your server.
Those same tools can document your API.
If you also use JSON, then you get a lot of tooling that works with that too.
You get proven security practices and solutions (session based security and SSL).
It's a familiar pattern for a lot of different developers.

Related

Distributed programming in Python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I plan do program a simple data flow framework, which basically consists of lazy method calls of objects. If I ever consider distributed programming, what is the easiest way to enable that in Python? Any transparent solution without me doing network programming?
Or for a start, how can I make use of multi-core processors in Python?

lazy method calls of objects
Can be anything at all really, so let's break it down:
Simple Let-Me-Call-That-Function (RPC)
Well lucky you! python has the one of greatest implementations of Remote Procedure Calls:
RPyC.
Just run the server (double click a file, see the tutorial),
Open an interpreter and:
import rpyc
conn = rpyc.classic.connect("localhost")
data_obj = conn.modules.lazyme.AwesomeObject("ABCDE")
print(data_obj.calculate(10))
And a lazy version (async):
# wrap the remote function with async(), which turns the invocation asynchronous
acalc = rpyc.async(data_obj.calculate)
res = acalc(10)
print(res.ready, res.value)
Simple Data Distribution
You have a defined unit of work, say a complex image manipulation.
What you do is roughly create Node(s), which does the actual work (aka, take an image, do the manipulation, and return the result), someone who collect the results (a Sink) and someone who create the work (the Distributor).
Take a look at Celery.
If it's very small scale, or if you just want to play with it, see the Pool object in the multiprocessing package:
from multiprocessing import Pool
p = Pool(5)
def f(x):
return x*x
print(p.map(f, [1,2,3]))
And the truly-lazy version:
print(p.map_async(f, [1,2,3]))
Which returns a Result object which can be inspected for results.
Complex Data Distribution
Some multi-level more-than-just-fire&forget complex data manipulation, or a multi-step processing use case.
In such case, you should use a Message Broker such as ZeroMQ or RabbitMQ.
They allow to you send 'messages' across multiple servers with great ease.
They save you from the horrors of the TCP land, but they are a bit more complex (some, like RabbitMQ, require a separate process/server for the Broker). However, they give you much more fine-grained control over the flow of data, and help you build a truly scalable application.
Lazy-Anything
While not data-distribution per se, It is the hottest trend in web server back-ends: use 'green' threads (or events, or coroutines) to delegate IO heavy tasks to a dedicated thread, while the application code is busy maxing-out the CPU.
I like Eventlet a lot, and gevent is another option.

Try Gearman http://gearman.org/
Gearman provides a generic application framework to farm out work to
other machines or processes that are better suited to do the work. It
allows you to do work in parallel, to load balance processing, and to
call functions between languages. It can be used in a variety of
applications, from high-availability web sites to the transport of
database replication events. In other words, it is the nervous system
for how distributed processing communicates.

Please read python.org official resoureces as the starter:
http://wiki.python.org/moin/ParallelProcessing

Another framework you might consider is Versile Python (full disclosure: I am a VPy developer). Documentation recipes has relevant code examples. With the framework it is easy to set up and connect to services, and you can either define explicit public method interfaces to classes or use the native python type framework to remotely access local methods.
Note you would have to set up your program to run in multiple processes in order to take advantage of multiple cores (due to the python global interpreter lock).

What's so cool about Twisted? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I'm increasingly hearing that Python's Twisted framework rocks and other frameworks pale in comparison.
Can anybody shed some light on this and possibly compare Twisted with other network programming frameworks.

There are a lot of different aspects of Twisted that you might find cool.
Twisted includes lots and lots of protocol implementations, meaning that more likely than not there will be an API you can use to talk to some remote system (either client or server in most cases) - be it HTTP, FTP, SMTP, POP3, IMAP4, DNS, IRC, MSN, OSCAR, XMPP/Jabber, telnet, SSH, SSL, NNTP, or one of the really obscure protocols like Finger, or ident, or one of the lower level protocol-building-protocols like DJB's netstrings, simple line-oriented protocols, or even one of Twisted's custom protocols like Perspective Broker (PB) or Asynchronous Messaging Protocol (AMP).
Another cool thing about Twisted is that on top of these low-level protocol implementations, you'll often find an abstraction that's somewhat easier to use. For example, when writing an HTTP server, Twisted Web provides a "Resource" abstraction which lets you construct URL hierarchies out of Python objects to define how requests will be responded to.
All of this is tied together with cooperating APIs, mainly due to the fact that none of this functionality is implemented by blocking on the network, so you don't need to start a thread for every operation you want to do. This contributes to the scalability that people often attribute to Twisted (although it is the kind of scalability that only involves a single computer, not the kind of scalability that lets your application grow to use a whole cluster of hosts) because Twisted can handle thousands of connections in a single thread, which tends to work better than having thousands of threads, each for a single connection.
Avoiding threading is also beneficial for testing and debugging (and hence reliability in general). Since there is no pre-emptive context switching in a typical Twisted-based program, you generally don't need to worry about locking. Race conditions that depend on the order of different network events happening can easily be unit tested by simulating those network events (whereas simulating a context switch isn't a feature provided by most (any?) threading libraries).
Twisted is also really, really concerned with quality. So you'll rarely find regressions in a Twisted release, and most of the APIs just work, even if you aren't using them in the common way (because we try to test all the ways you might use them, not just the common way). This is particularly true for all of the code added to Twisted (or modified) in the last 3 or 4 years, since 100% line coverage has been a minimum testing requirement since then.
Another often overlooked strength of Twisted is its ten years of figuring out different platform quirks. There are lots of undocumented socket errors on different platforms and it's really hard to learn that they even exist, let alone handle them. Twisted has gradually covered more and more of these, and it's pretty good about it at this point. Younger projects don't have this experience, so they miss obscure failure modes that will probably only happen to users of any project you release, not to you.
All that say, what I find coolest about Twisted is that it's a pretty boring library that lets me ignore a lot of really boring problems and just focus on the interesting and fun things. :)

Well it's probably according to taste.
Twisted allows you to easily create event driven network servers/clients, without really worrying about everything that goes into accomplishing this. And thanks to the MIT License, Twisted can be used almost anywhere. But I haven't done any benchmarking so I have no idea how it scales, but I'm guessing quite good.
Another plus would be the Twisted Projects, with which you can quickly see how to implement most of the server/services that you would want to.
Twisted also has some great documentation, when I started with it a couple of weeks ago I was able to quickly get a working prototype.
Quite new to the python scene please correct me if i'm in the wrong.

How should I jump into the Flex-Python Boat

http://www.artima.com/weblogs/viewpost.jsp?thread=208528
Bruce Eckel talked about using Flex and Python together. Since then, we have had PyAMF and the likes.
It has been almost three years, but googling does not reveal much more than a bunch of articles/comments linking to that article above (or related ones). There is no buzz, no excitement. Not much on SO either.
I am thinking of attempting something using Flex/Python which would require me to be heavily invested in it. What I worry about is that the support system is very weak and activity is almost nonexistent.
I really want to do this. Can anyone direct me towards some useful resource?

An application written in Flex/Flash is server agnostic...and it should be easy to replace the server side language with another one. The client application will consume some web services exposed by the server(REST/SOAP), or it can use as an alternative remote method invocation. The last one is implemented for the most important languages, from what I know.
There are some exceptions..if you want to use messaging the professional solutions are offered mainly by the frameworks build on top of Java.
So if you do not rely heavy on messaging the heavily investment is going to be mainly of the client side, especially if you haven't worked before with the so called "fat" clients. But not on the integration side..it not so complicated.
Regarding useful Flex resources, my suggestion is to take a look at http://www.adobe.com/devnet/flex.html

How would an irc bot written in tcl stack up against a python/node.js clone?

I believe eggdrop is the most active/popular bot and it's written in tcl ( and according to wiki the core is C but I haven't confirmed that ).
I'm wondering if there would be any performance benefit of recoding it's functionality in node.js or Python, in addition to making it more accessible since Python and JS are arguably more popular languages and not many are familiar with tcl.
So, how would they stack up vs tcl in general, performance-wise?

As you suspected, eggdrop is not written in tcl, it is written in C, however it does use tcl as its scripting/extension language.
I would expect that in the case of an eggdrop, the performance difference between using tcl as a scripting language, and using Python, Lua, JS, or virtually anything else would be negligible, as eggdrops generally aren't performing high load tasks.
In the event it really was an issue, your question would need more specifics. Performance for what task under what conditions? Memory use? CPU efficiency? Latency? And the answer would probably be "measure and find out". Given the typical use of an eggdrop, it doesn't take particularly efficient code to respond to the occasional IRC trigger command once every few minutes or hours.
As a more general case, I'm sure you could find benchmark comparisons of specific algorithms or tasks performed by various scripting languages on particular operating systems or environments, at which point it wouldn't really have anything to do with IRC or eggdrop.

If you're not doing much other than waiting on a quiet channel for something to happen, performance is pretty much irrelevant. You could probably write that in BF (well, with network connectivity primitives added) and have it perform OK.
If you're running on lots of busy channels with many things being watched for, that's different. Tcl's very good at event-driven IO, which is ideal for this sort of situation. (Python can do that, but needs external libraries, as does Lua. I don't know JS enough to comment there.)
If you're needing to do significant non-IO-bound processing for some message responses, you're into needing threads. I know that both Tcl and Python support threads, but with utterly different threading models (Python has a shared-memory model which makes it easier to pass some types of task around, especially when the data is large, and Tcl has an apartment model which greatly reduces the amount of locking required in the implementation for a good performance boost in CPU-bound code).
How is that relevant for IRC bots? Well, it all depends on what you're doing in the bot.

What are my options for doing multithreaded/concurrent programming in Python?

I'm writing a simple site spider and I've decided to take this opportunity to learn something new in concurrent programming in Python. Instead of using threads and a queue, I decided to try something else, but I don't know what would suit me.
I have heard about Stackless, Celery, Twisted, Tornado, and other things. I don't want to have to set up a database and the whole other dependencies of Celery, but I would if it's a good fit for my purpose.
My question is: What is a good balance between suitability for my app and usefulness in general? I have taken a look at the tasklets in Stackless but I'm not sure that the urlopen() call won't block or that they will execute in parallel, I haven't seen that mentioned anywhere.
Can someone give me a few details on my options and what would be best to use?
Thanks.

Tornado is a web server, so it wouldn't help you much in writing a spider. Twisted is much more general (and, inevitably, complex), good for all kinds of networking tasks (and with good integration with the event loop of several GUI frameworks). Indeed, there used to be a twisted.web.spider (but it was removed years ago, since it was unmaintained -- so you'll have to roll your own on top of the facilities Twisted does provide).

I must say that Twisted gets my vote.
Performing event-drive tasks is fairly straightforward in Twisted. Integration with other important system components such as GTK+ and DBus is very easy.
The HTTP client support is basic for now but improving (>9.0.0): see related question.
The added bonus is that Twisted is available in the Ubuntu default repository ;-)

For a quick look at package sizes, see
ohloh.net/p/compare .
Of course source size is only a rough metric (what I'd really like is nr pages doc, nr pages examples,
dependencies), but it can help.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.