What's so cool about Twisted? [closed] - python

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I'm increasingly hearing that Python's Twisted framework rocks and other frameworks pale in comparison.
Can anybody shed some light on this and possibly compare Twisted with other network programming frameworks?

There are a lot of different aspects of Twisted that you might find cool.
Twisted includes lots and lots of protocol implementations, meaning that more likely than not there will be an API you can use to talk to some remote system (either client or server in most cases) - be it HTTP, FTP, SMTP, POP3, IMAP4, DNS, IRC, MSN, OSCAR, XMPP/Jabber, telnet, SSH, SSL, NNTP, or one of the really obscure protocols like Finger, or ident, or one of the lower level protocol-building-protocols like DJB's netstrings, simple line-oriented protocols, or even one of Twisted's custom protocols like Perspective Broker (PB) or Asynchronous Messaging Protocol (AMP).
Another cool thing about Twisted is that on top of these low-level protocol implementations, you'll often find an abstraction that's somewhat easier to use. For example, when writing an HTTP server, Twisted Web provides a "Resource" abstraction which lets you construct URL hierarchies out of Python objects to define how requests will be responded to.
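To make the "URL hierarchy out of Python objects" idea concrete, here is a toy sketch in plain Python. It is not Twisted's code or API: the `Node` class, `add_child`, and `resolve` are hypothetical names invented for illustration, and a real Twisted Web resource does considerably more.

```python
class Node:
    """A toy resource: holds named children and renders a response body."""

    def __init__(self, body=""):
        self.body = body
        self.children = {}

    def add_child(self, name, node):
        self.children[name] = node
        return node

    def render(self):
        return self.body


def resolve(root, path):
    """Walk the tree one path segment at a time; return None if nothing matches."""
    node = root
    for segment in path.strip("/").split("/"):
        if not segment:
            continue  # "/" or doubled slashes: stay on the current node
        node = node.children.get(segment)
        if node is None:
            return None
    return node


# The object tree mirrors the URL tree: /, /docs, /docs/faq
root = Node("index")
docs = root.add_child("docs", Node("documentation home"))
docs.add_child("faq", Node("frequently asked questions"))
```

Looking up `/docs/faq` walks from `root` to `docs` to the FAQ node, which is roughly the shape of lookup a resource tree gives you.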
All of this is tied together with cooperating APIs, mainly due to the fact that none of this functionality is implemented by blocking on the network, so you don't need to start a thread for every operation you want to do. This contributes to the scalability that people often attribute to Twisted (although it is the kind of scalability that only involves a single computer, not the kind of scalability that lets your application grow to use a whole cluster of hosts) because Twisted can handle thousands of connections in a single thread, which tends to work better than having thousands of threads, each for a single connection.
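As a rough illustration of many connections being served from one thread, here is a minimal sketch. It uses `asyncio` from the standard library rather than Twisted (so it shows the model, not Twisted's API), and the `handle`, `client`, and `demo` names are made up for this example.

```python
import asyncio


async def handle(reader, writer):
    # Each connection gets its own coroutine, not its own thread.
    data = await reader.readline()
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def client(port, text):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(text.encode() + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode().strip()


async def demo(n_clients=50):
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # All clients are in flight at once, interleaved by the event loop.
        return await asyncio.gather(
            *(client(port, f"hello {i}") for i in range(n_clients))
        )
```

Calling `asyncio.run(demo())` serves all fifty connections from a single event loop; no per-connection thread is started.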
Avoiding threading is also beneficial for testing and debugging (and hence reliability in general). Since there is no pre-emptive context switching in a typical Twisted-based program, you generally don't need to worry about locking. Race conditions that depend on the order of different network events happening can easily be unit tested by simulating those network events (whereas simulating a context switch isn't a feature provided by most (any?) threading libraries).
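A small sketch of what simulating network events looks like in a test. The `LineBuffer` class is a hypothetical stand-in written in plain Python (it only imitates the general shape of an event-driven protocol object); the point is that the test decides exactly which bytes arrive in which order.

```python
class LineBuffer:
    """Toy protocol object: collects complete lines, tolerating split delivery."""

    def __init__(self):
        self.buffer = b""
        self.lines = []

    def data_received(self, data):
        # An event loop would call this when bytes arrive; the test calls it by hand.
        self.buffer += data
        while b"\n" in self.buffer:
            line, self.buffer = self.buffer.split(b"\n", 1)
            self.lines.append(line)


def test_line_split_across_packets():
    proto = LineBuffer()
    # Simulate an awkward ordering: lines arrive chopped across three events.
    proto.data_received(b"hel")
    proto.data_received(b"lo\nwor")
    proto.data_received(b"ld\n")
    assert proto.lines == [b"hello", b"world"]
    assert proto.buffer == b""
```

Because no real socket or thread is involved, the awkward ordering is reproduced the same way on every run.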
Twisted is also really, really concerned with quality. So you'll rarely find regressions in a Twisted release, and most of the APIs just work, even if you aren't using them in the common way (because we try to test all the ways you might use them, not just the common way). This is particularly true for all of the code added to Twisted (or modified) in the last 3 or 4 years, since 100% line coverage has been a minimum testing requirement since then.
Another often overlooked strength of Twisted is its ten years of figuring out different platform quirks. There are lots of undocumented socket errors on different platforms and it's really hard to learn that they even exist, let alone handle them. Twisted has gradually covered more and more of these, and it's pretty good about it at this point. Younger projects don't have this experience, so they miss obscure failure modes that will probably only happen to users of any project you release, not to you.
All that said, what I find coolest about Twisted is that it's a pretty boring library that lets me ignore a lot of really boring problems and just focus on the interesting and fun things. :)

Well it's probably according to taste.
Twisted allows you to easily create event-driven network servers and clients without really worrying about everything that goes into accomplishing this. And thanks to the MIT License, Twisted can be used almost anywhere. I haven't done any benchmarking, so I don't know how well it scales, but I'm guessing quite well.
Another plus would be the Twisted Projects, with which you can quickly see how to implement most of the server/services that you would want to.
Twisted also has some great documentation; when I started with it a couple of weeks ago, I was able to quickly get a working prototype.
I'm quite new to the Python scene, so please correct me if I'm wrong.

Related

Twisted, gevent, asyncoro - are they what I might need?

Learning Python and trying to do something ambitious (perhaps too much).
The application (console, that runs silently like a server), needs to talk to 2 serial ports, needs to deal with timers, needs to push information on Redis KV-store, write logs, and interact with bunch of other similar applications using unix IPC (or socket comm.)
The easier way (to my mind) to think of such an application is to work with threads and event queues. However, due to what I understand to be a GIL-enforced limitation on threading, that is not quite an option with Python (unless I misunderstood things). The alternative, as I understand it, is to work with an asynchronous I/O framework, green threads, coroutines, etc.
Are Twisted, gevent and asyncoro really alternatives in Python for the asynchronous, event-driven program that I intend to write?
Since learning Twisted seems to be such a big investment (in terms of time and effort), I was wondering whether gevent or asyncoro could be an easier and better alternative. From the bit of superficial documentation reading I have done so far, asyncoro seems to be the simplest, with a very limited amount of new learning, and Twisted is the other extreme, with gevent somewhere in the middle. But then I am not sure whether they are really comparable.
Here's an example of what the application would do if it were multi-threaded:
Thread:1 - Monitor health of serial port, periodically i.e. with a timer. Say check every 2 minutes if last state was healthy. If last state was unhealthy then check every 30 seconds for first 5 mins, then every minute for next 10 mins... like in exponential backoff. Note that there are multiple such serial ports.
Thread:2 - Monitor the state of application-level sessions that come and go from time to time over the serial ports, and the communication that happens over them. Redis is planned to be used as a distributed KV-store, so that other instances of the application (running on the same or other servers) can coordinate certain other actions.
Thread:3 - Performs some other housekeeping tasks.
All of the threads need to do logging, all the threads use timers (& other events) to do certain things. Timers are used for periodic execution of some logic and as timeouts to guard certain actions (blocking or non-blocking).
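For what it's worth, the health-check timer from Thread:1 could be sketched without threads roughly like this. It uses `asyncio` from the standard library; `next_check_delay`, `monitor`, and the `probe` callback are hypothetical names, and the delay after the first 15 unhealthy minutes is an assumption, since the schedule above trails off at that point.

```python
import asyncio


def next_check_delay(healthy, seconds_unhealthy=0):
    """Seconds to wait before the next check, following the schedule above."""
    if healthy:
        return 120                     # healthy: every 2 minutes
    if seconds_unhealthy < 5 * 60:
        return 30                      # first 5 minutes: every 30 seconds
    if seconds_unhealthy < 15 * 60:
        return 60                      # next 10 minutes: every minute
    return 120                         # ASSUMPTION: not specified in the question


async def monitor(port_name, probe, rounds, sleep=asyncio.sleep):
    """Run `rounds` checks on one port; return the delays that were used."""
    delays = []
    unhealthy_for = 0
    for _ in range(rounds):
        healthy = await probe(port_name)   # probe is supplied by the caller
        if healthy:
            unhealthy_for = 0
        delay = next_check_delay(healthy, unhealthy_for)
        delays.append(delay)
        await sleep(delay)                 # a timer, not a blocked thread
        if not healthy:
            unhealthy_for += delay
    return delays
```

Several ports can be watched at once with `asyncio.gather(monitor("port0", ...), monitor("port1", ...))`, each with its own timer state.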
My experience with Python is extremely limited, but I have experience writing similar programs in C/C++ and Java. Using Python for this, to learn.
You can use any of the libraries you've mentioned here to implement the application you've described. You can also use traditional threads. The GIL prevents you from achieving hardware-level parallelism in the execution of Python byte code operations (as distinct from, say, native code being invoked from your Python program). It does not prevent you from performing parallel I/O operations - which is what it sounds like your application is primarily concerned with.
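A quick way to see that the GIL does not serialize blocking I/O is a sketch like the following. `time.sleep` stands in for a blocking read on a serial port or socket (an assumption made so the example needs no hardware); like real blocking I/O calls, it releases the GIL while waiting. The function names are made up for the example.

```python
import threading
import time


def blocking_io(results, index):
    time.sleep(0.2)          # stand-in for a blocking read; the GIL is released here
    results[index] = index


def run_parallel(n=5):
    results = [None] * n
    threads = [
        threading.Thread(target=blocking_io, args=(results, i)) for i in range(n)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start
```

With five threads the waits overlap, so the total elapsed time should be close to one 0.2 second wait rather than five of them back to back.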
There isn't enough detail in your question to provide a recommendation of one of these tools over another (and if there were enough detail, the question would probably be enormous and the effort to answer it correctly would probably discourage anyone on SO from doing so). It's typically safe to say that the threading approach is probably the worst, though (for a variety of reasons I won't even attempt to explain here; they're documented well enough on the internet at large).

Thread vs Event Loop - network programming (language agnostic) [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am writing a simple daemon to receive data from N many mobile devices. The device will poll the server and send the data it needs as simple JSON. In generic terms the server will receive the data and then "do stuff" with it.
I know this topic has been beaten a bunch of times but I am having a hard time understanding the pros and cons.
Would threads or events (think Twisted in Python) work better for this situation as far as concurrency and scalability is concerned? The event model seems to make more sense but I wanted to poll you guys. Data comes in -> Process data -> wait for more data. What if the "do stuff" was something very computationally intensive? What if the "do stuff" was very IO intensive (such as inserting into a database). Would this block the event loop? What are the pros and drawbacks of each approach?
I can only answer in the context of Python, since that's where most of my experience is. The answer is actually probably a little different depending on the language you choose. Python, for example, is a lot better at parallelizing I/O intensive operations than CPU intensive operations.
Asynchronous programming libraries like twisted, tornado, gevent, etc. are really good at handling lots of I/O in parallel. If your workload involves many clients connecting, doing light CPU operations and/or lots of I/O operations (like db reads/writes), or if your clients are making long-lasting connections primarily used for I/O (think WebSockets), then an asynchronous library will work really well for you. Most of the asynchronous libraries for Python have asynchronous drivers for popular DBs, so you'll be able to interact with them without blocking your event loop.
If your server is going to be doing lots of CPU intensive work, you can still use asynchronous libraries, but you have to understand that every time you're doing CPU work, the event loop will be blocked. No other clients are going to be able to do anything at all. However, there are ways around this. You can use thread/process pools to farm the CPU work out, and just wait on the response asynchronously. But obviously that complicates your implementation a little bit.
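Here is a minimal sketch of farming work out to a pool and waiting on it asynchronously, using `asyncio` and `concurrent.futures` from the standard library (not Twisted or Tornado). The `cpu_heavy` and `demo_offload` names are invented for the example. A thread pool is used to keep the sketch simple; for pure-Python CPU work you would likely swap in `ProcessPoolExecutor` to get real parallelism.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def cpu_heavy(n):
    # Stand-in for the expensive "do stuff" step.
    return sum(i * i for i in range(n))


async def demo_offload(n=200_000):
    loop = asyncio.get_running_loop()
    ticks = 0

    async def heartbeat():
        # Represents other clients being served while the work runs.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    with ThreadPoolExecutor(max_workers=1) as pool:
        # The loop keeps running other tasks while this future is pending.
        result = await loop.run_in_executor(pool, cpu_heavy, n)
    hb.cancel()
    return result, ticks
```

The heartbeat task keeps getting scheduled while the pool does the work, which is the behavior you lose if `cpu_heavy` is called directly inside a coroutine.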
With Python, using threads instead actually doesn't buy you all that much with CPU operations, because in most cases only one thread can run at a time, so you're not really reaping the benefits of having a multi-core CPU (google "Python GIL" to learn more about this). Ignoring Python-specific issues with threads, threads will let you avoid the "blocked event loop" problem completely, and threaded code is usually easier to understand than asynchronous code, especially if you're not familiar with how asynchronous programming works already. But you also have to deal with the usual thread headaches (synchronizing shared state, etc.), and they don't scale as well as asynchronous I/O does with lots of clients (see http://en.wikipedia.org/wiki/C10k_problem).
Both approaches are used very successfully in production, so it's really up to you to decide what fits your needs/preferences better.
I think your question is in the 'it depends' category.
Different languages have different strengths and weaknesses when it comes to threads, processes, and events (Python has some special weaknesses in threading tied to the global interpreter lock).
Beyond that, operating systems ALSO have different strengths and weaknesses when you look at processes vs threads vs events. What is right on unix isn't going to be the same as windows.
With that said, the way that I sort out multifaceted IO projects is:
These projects are complex, and no tool will simply make the complexity go away. Therefore you have two choices for how to deal with it:
Have the OS deal with as much complexity as possible, making life easier for the programmers, but at the cost of machine efficiency.
Have the programmer take on as much complexity as is practical so that they can optimize the design and squeeze as much performance out of the machine as possible, at the cost of more complex code that requires significantly higher-end programmers.
Option 1 is normally best accomplished by breaking apart the task into threads or processes with one blocking state-machine per thread/process
Option 2 is normally best accomplished by multiplexing all the tasks into one process and using the OS hook for an event system. (select/poll/epoll/kqueue/WaitForMultipleObjects/CoreFoundation/ libevent etc..)
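A bare-bones sketch of Option 2 in Python, using the standard `selectors` module (which picks epoll, kqueue, or select depending on the platform). Socket pairs stand in for real client connections so the example needs no network; the `multiplex` function is a made-up name, and the payloads are assumed to be small and non-empty.

```python
import selectors
import socket


def multiplex(messages):
    """Send each message over its own socket pair; read them all from one loop."""
    sel = selectors.DefaultSelector()
    writers = []
    for i, payload in enumerate(messages):
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        # `data` tags each registration so we know which task an event belongs to.
        sel.register(reader, selectors.EVENT_READ, data=i)
        writer.sendall(payload)
        writers.append(writer)

    received = {}
    while len(received) < len(messages):
        # One blocking call waits on every socket at once.
        for key, _events in sel.select(timeout=1.0):
            received[key.data] = key.fileobj.recv(4096)
            sel.unregister(key.fileobj)
            key.fileobj.close()

    for writer in writers:
        writer.close()
    sel.close()
    return [received[i] for i in range(len(messages))]
```

Frameworks like the ones listed above wrap this kind of loop and add timers, buffering, and error handling on top.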
In my experience, the choice of project framework and internal architecture often comes down to the skills of the programmers at hand (and the budget the project has for hardware).
If you have programmers with a background in OS internals: Twisted will work great for Python, Node.js will work great for JavaScript, and libevent/libev will work great for C or C++. You'll also end up with super efficient code that can scale easily, though you'll have a nightmare trying to hire more programmers.
If you have newbie programmers and you can dump money into lots of cloud services, then breaking the project into many threads or processes will give you the best chance of getting something working, though scaling will eventually become a problem.
All in all, I would say the sanest pattern for a project with multiple iterations is to prototype in simple blocking tools (Flask) and then rewrite into something harder but more scalable (Twisted); otherwise you're falling into the classic "premature optimization is the root of all evil" trap.
The connection scheme is also important in the choice. How many concurrent connections do you expect? How long will a client stay connected?
If each connection is tied to a thread, many concurrent connections or very long-lasting connections (as with WebSockets) will choke the system. For these scenarios an event-loop-based solution will be better.
When the connections are short and the heavy processing happens after the client disconnects, the two models are roughly equivalent.

Efficient Python IPC [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 3 years ago.
I'm making an application in Python3, which will be divided in batch and gui parts.
Batch is responsible for processing logic and gui is responsible for displaying it.
Which inter-process communication (IPC) framework should I use with the following requirements:
The GUI can be run on other device than batch (GUI can be run on the same device, on smartphone, tablet etc, locally or over network).
The batch (Python3 IPC library) should work with no problem on Linux, Mac, Windows, ...
The IPC should support GUI written in different languages (Python, Javascript, ...)
The performance of IPC is important - it should be as "interactive" as possible, but without losing information.
Several GUI could be connected to the same batch.
Additional: Would the choice be different if the GUI were guaranteed to be written in Python as well?
Edit:
I have found a lot of IPC libraries, like here: Efficient Python to Python IPC, or ActiveMQ, or RabbitMQ, or ZeroMQ.
The best looking options I have found so far are:
rabbitmq
zeromq
pyro
Are they appropriate solutions to this problem? If not, why? And if something is better, please tell me why also.
The three you mentioned seem a good fit and will meet your requirements. I think you should go with whichever you feel most comfortable or familiar with.
From my personal experience, I do believe ZeroMQ is the best combination between efficiency, ease of use and inter-operability. I had an easy time integrating zmq 2.2 with Python 2.7, so that would be my personal favorite. However as I said I'm quite sure you can't go wrong with all 3 frameworks.
Half related: Requirements tend to change with time, you may decide to switch framework later on, therefore encapsulating the dependency on the framework would be a good design pattern to use. (e.g. having a single conduit module that interacts with the framework and have its API use your internal datastructures and domain language)
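A rough sketch of such a conduit module, to show the shape of the idea. All names here (`Conduit`, `InMemoryConduit`, `report_progress`) are hypothetical, and the in-memory queue is only a stand-in backend; a ZeroMQ, RabbitMQ, or Pyro version would implement the same two methods.

```python
import json
import queue


class Conduit:
    """The only messaging interface the rest of the application talks to."""

    def send(self, event: dict) -> None:
        raise NotImplementedError

    def receive(self) -> dict:
        raise NotImplementedError


class InMemoryConduit(Conduit):
    """Stand-in backend for tests; a real one would wrap the chosen framework."""

    def __init__(self):
        self._queue = queue.Queue()

    def send(self, event):
        self._queue.put(json.dumps(event))   # wire format stays private to the conduit

    def receive(self):
        return json.loads(self._queue.get(timeout=1))


def report_progress(conduit: Conduit, done: int, total: int) -> None:
    # Application code speaks its own domain language and never imports the framework.
    conduit.send({"type": "progress", "done": done, "total": total})
```

Switching frameworks later then means writing one new `Conduit` subclass instead of touching every caller.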
I've used the Redis engine for this. Extremely simple, and lightweight.
Server side does:
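import redis
r = redis.Redis()  # connects to localhost:6379 by default
p = r.pubsub()  # in current redis-py, subscribing is done on a PubSub object
p.subscribe('mychannel')  # subscribe to "mychannel"
for x in p.listen():  # the first item is the subscribe confirmation
    print("I got message", x)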
Client side does:
import redis
r = redis.Redis()  # Init
mymessage = 'hello'  # example payload; any string works
r.publish('mychannel', mymessage)
"Messages" are strings (of any size). If you need to pass complex data structures, I like to use json.loads and json.dumps to convert between Python dicts/arrays and strings.
"pickle" is perhaps the better way to do this for Python-to-Python communication, though JSON means "the other side" can be written in anything.
Now there are a billion other things Redis is good for - and they all inherently are just as simple.
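The JSON round trip mentioned above could look something like this. It uses only the standard `json` module; the `encode` and `decode` helpers are made-up names. redis-py returns message data as bytes unless configured otherwise, which `json.loads` accepts directly.

```python
import json


def encode(message: dict) -> str:
    """Turn a dict into a compact string suitable for publishing."""
    return json.dumps(message, separators=(",", ":"))


def decode(raw) -> dict:
    """Accepts str or bytes, so it works on data handed back by the client library."""
    return json.loads(raw)


payload = {"job": 7, "tags": ["a", "b"], "ok": True}
wire = encode(payload)                    # what the publisher sends
restored = decode(wire.encode("utf-8"))   # what the subscriber gets back
```

Only JSON-representable types survive the trip (dicts, lists, strings, numbers, booleans, None), which is the price of keeping the other side language-neutral.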
You are asking for a lot of things from the framework; network enabled, multi-platform, multi-language, high performance (which ideally should be further specified - what does it mean, bandwidth? latency? what is "good enough"; are we talking kB/s, MB/s, GB/s? 1 ms or 1000 ms round-trip?) Plus there are a lot of things not mentioned which can easily come into play, e.g. do you need authentication or encryption? Some frameworks give you such functionality, others rely on implementing that part of the puzzle yourself.
There probably exists no silver bullet product which is going to give you an ideal solution which optimizes all those requirements at the same time. As for the 'additional' component of your question - yes, if you restrict language requirements to python only, or further distinguish between key vs. nice-to-have requirements, there would be more solutions available.
One technology you might want to have a look at is Versile Python (full disclosure: I am one of the developers). It is multi-platform and supports python v2.6+/v3, and java SE6+. Regarding performance, it depends on what are your requirements. If you have any questions about the technology, just ask on the forum.
The solution is D-Bus.
It is a mature solution and available for a lot of languages (C, Python, ...; just google for dbus + your favorite language). It is not as fast as shared memory, but still fast enough for pretty much everything that doesn't require (hard) realtime properties.
I'll take a different tack here and say why not use the de facto RPC language of the Internet? I.e. HTTP REST APIs?
With Python Requests on the client side and Flask on the server side, you get these kinds of benefits:
Existing HTTP REST tools like Postman can access and test your server.
Those same tools can document your API.
If you also use JSON, then you get a lot of tooling that works with that too.
You get proven security practices and solutions (session based security and SSL).
It's a familiar pattern for a lot of different developers.

How should I jump into the Flex-Python Boat

http://www.artima.com/weblogs/viewpost.jsp?thread=208528
Bruce Eckel talked about using Flex and Python together. Since then, we have had PyAMF and the likes.
It has been almost three years, but googling does not reveal much more than a bunch of articles/comments linking to that article above (or related ones). There is no buzz, no excitement. Not much on SO either.
I am thinking of attempting something using Flex/Python which would require me to be heavily invested in it. What I worry about is that the support system is very weak and activity is almost nonexistent.
I really want to do this. Can anyone direct me towards some useful resource?
An application written in Flex/Flash is server agnostic, so it should be easy to replace the server-side language with another one. The client application consumes web services exposed by the server (REST/SOAP), or alternatively it can use remote method invocation. As far as I know, the latter is implemented for the most important languages.
There are some exceptions: if you want to use messaging, the professional solutions are offered mainly by frameworks built on top of Java.
So if you do not rely heavily on messaging, the heavy investment is going to be mainly on the client side, especially if you haven't worked before with so-called "fat" clients. The integration side is not so complicated.
Regarding useful Flex resources, my suggestion is to take a look at http://www.adobe.com/devnet/flex.html

What are my options for doing multithreaded/concurrent programming in Python?

I'm writing a simple site spider and I've decided to take this opportunity to learn something new in concurrent programming in Python. Instead of using threads and a queue, I decided to try something else, but I don't know what would suit me.
I have heard about Stackless, Celery, Twisted, Tornado, and other things. I don't want to have to set up a database and the whole other dependencies of Celery, but I would if it's a good fit for my purpose.
My question is: What is a good balance between suitability for my app and usefulness in general? I have taken a look at the tasklets in Stackless but I'm not sure that the urlopen() call won't block or that they will execute in parallel, I haven't seen that mentioned anywhere.
Can someone give me a few details on my options and what would be best to use?
Thanks.
Tornado is a web server, so it wouldn't help you much in writing a spider. Twisted is much more general (and, inevitably, complex), good for all kinds of networking tasks (and with good integration with the event loop of several GUI frameworks). Indeed, there used to be a twisted.web.spider (but it was removed years ago, since it was unmaintained -- so you'll have to roll your own on top of the facilities Twisted does provide).
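If you do roll your own, the crawl loop itself is small. Below is a sketch using `asyncio` from the standard library rather than Twisted, so it illustrates the structure and not Twisted's API. The `crawl` function, the `fetch` callback (which returns the links found on a page), and the fake site are all hypothetical; a real spider would plug in an actual non-blocking HTTP fetch and link parser.

```python
import asyncio


async def crawl(start, fetch, max_concurrency=5):
    """Visit every page reachable from `start`; `fetch(url)` returns its links."""
    seen = {start}
    limit = asyncio.Semaphore(max_concurrency)   # cap simultaneous downloads

    async def visit(url):
        async with limit:
            links = await fetch(url)
        new = []
        for link in links:
            if link not in seen:
                seen.add(link)        # mark before scheduling to avoid duplicates
                new.append(link)
        await asyncio.gather(*(visit(link) for link in new))

    await visit(start)
    return seen


# A fake four-page site so the sketch runs without network access.
FAKE_SITE = {"/": ["/a", "/b"], "/a": ["/b", "/c"], "/b": ["/"], "/c": []}


async def fake_fetch(url):
    await asyncio.sleep(0)            # yield to the loop, as a real download would
    return FAKE_SITE.get(url, [])
```

Running `asyncio.run(crawl("/", fake_fetch))` visits each page once even though the fake site contains a cycle.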
I must say that Twisted gets my vote.
Performing event-driven tasks is fairly straightforward in Twisted. Integration with other important system components such as GTK+ and DBus is very easy.
The HTTP client support is basic for now but improving (>9.0.0): see related question.
The added bonus is that Twisted is available in the Ubuntu default repository ;-)
For a quick look at package sizes, see ohloh.net/p/compare.
Of course, source size is only a rough metric (what I'd really like is the number of pages of documentation, the number of pages of examples, and the dependencies), but it can help.
