I'm experimenting with running twisted/crossbar.io on QNX (target: powerpc-unknown-nto-qnx6.5.0); however, it appears that QNX does not have siginterrupt(), and the SA_RESTART flag is not supported. As a result, signal.siginterrupt() does not exist in the embedded Python.
Is there any way to run/patch Python/Twisted on a system like this? Right now it dies when the handlers are installed, because the signal module does not have siginterrupt(). Even in the old 2.6 days, when internet/signals were built as a C library, they relied on essentially implementing siginterrupt() using SA_RESTART.
Is there any other alternative?
Did you try reactor.run(installSignalHandlers=False)? This limits the reactor's functionality a bit, but it may allow you to limp along.
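For illustration, a minimal sketch of that suggestion (note that without its own handlers the reactor won't catch SIGINT or reap child processes, so clean shutdown and spawnProcess behavior are on you):

    from twisted.internet import reactor

    # ... set up your listeners and services as usual ...

    # Skip signal-handler installation entirely, so siginterrupt() is
    # never needed. Ctrl-C handling and child reaping become your job.
    reactor.run(installSignalHandlers=False)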
Is there any way to run/patch Python/Twisted on a system like this?
The general answer is "port Twisted to your target platform". Twisted interacts extensively with the platform it runs on. You might trick it into not dying with an AttributeError in one place with a simple patch, but that doesn't mean Twisted will actually behave the way it is intended to behave.
Do you have plans to complete a porting effort of Twisted to QNX? Or do you just have your fingers crossed that with signal issues out of the way everything else will Just Work? At minimum, you should be running the test suite to see where there may be problems (though passing tests also don't guarantee Twisted is actually working correctly, since those tests were all written with other platforms in mind).
A more specific answer is that you could grab an older version of the twisted.internet._signals module (try r35834; r35835 deleted a lot of old support code). The Python 3 porting effort removed some of the alternate (not as good but more portable) signal handling strategies from this module.
Related
I've got a relatively simple SSL server running in Twisted and I'd like to write some unit tests for it. I'm not really sure of the best way to do this when using Python 3. All the documentation I've found describes using Twisted Trial, which unfortunately is incomplete for Py3k.
What I'm thinking is doing something like this:
loading up my code and doing everything but reactor.run()
send the data I want my code to handle
run reactor.doIteration() (or maybe reactor.iterate() is better?)
check that my server did what it was supposed to
Is this a legitimate way of handling this sort of situation?
EDIT:
An answer from glyph suggests this may be a bad idea (though it's not specifically talking about testing).
EDIT 2:
I guess the major issue is when you're trying to test components intertwined with Twisted and you're not sure how to pull them apart to properly test the individual components. Is there any reliable way to test this? Should .run() be called, and then an event scheduled a few seconds after you've completed your action to stop the reactor and check the result?
Are you writing low-level networking code? Does your application interact with BSD socket APIs? If not, why do you want to run the reactor at all in your tests? Unit tests should exercise a well-defined, narrow set of code. This should be your application code - not the implementation of the TLS transport in Twisted.
To specifically address the idea of calling reactor.iterate, either in your unit tests or elsewhere: yes, it is a bad idea. This is not how Twisted itself is tested. There is no reason to expect that because you write code that works when you call reactor.iterate it will work when you call reactor.run instead. reactor.iterate is a left-over from a mistaken idea about how event loops should be integrated with other systems. There may be an extreme edge case or two where the idea of reactor.iterate is correct and useful but in practice no one uses Twisted in such cases and no one who works on Twisted bears them in mind when making changes. So this is not a place you want your application to be. When things go wrong you'll be very lonely.
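To illustrate testing protocol logic without touching a real reactor or socket, here is a minimal sketch using twisted.test.proto_helpers.StringTransport; EchoProtocol is a hypothetical stand-in for your application protocol:

    from twisted.internet import protocol
    from twisted.test import proto_helpers

    class EchoProtocol(protocol.Protocol):
        # Stand-in for the application protocol under test.
        def dataReceived(self, data):
            self.transport.write(data)

    def test_echo():
        proto = EchoProtocol()
        transport = proto_helpers.StringTransport()
        proto.makeConnection(transport)       # no reactor, no TLS, no socket
        proto.dataReceived(b"hello")
        assert transport.value() == b"hello"  # inspect what was "sent"

The transport is a fake: your protocol code runs synchronously and you inspect its output directly, which is exactly the narrow unit-test boundary described above.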
I have a Python library that performs asynchronous network I/O via multicast, which may garner replies from other services. It hides the dirty work by returning a Future that will capture a reply. I am integrating this library into an existing gevent application. The call pattern is as simple as:
future = service.broadcast()
# next call blocks the current thread
reply = future.result(some_timeout)
Under the hood, concurrent.futures.Future.result() uses threading.Condition.wait().
With a monkey-patched threading module, this seems fine and safe, and non-blocking with greenlets.
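For concreteness, a sketch of that setup; service.broadcast is the library call from the snippet above, the timeout value is arbitrary, and the patch has to run before threading is imported anywhere:

    from gevent import monkey
    monkey.patch_all()  # Condition.wait() now yields to the gevent hub

    def get_reply(service, some_timeout=5.0):
        future = service.broadcast()
        # Blocks only this greenlet; other greenlets keep running while
        # the patched Condition waits.
        return future.result(some_timeout)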
Is there any reason to be worried here or when mixing gevent and concurrent.futures?
Well, as far as I can tell, futures isn't documented to work on top of threading.Condition, and gevent isn't documented to be able to patch futures safely. So, in theory, someone could write a Python implementation that would break gevent.
But in practice? It's hard to imagine what such an implementation would look like. You obviously need some kind of sync objects to make a Future work. Sure, you could use an Event, a Lock, and an RLock instead of a Condition, but that won't cause a problem for gevent. The only way an implementation could plausibly break things would be to go directly to the pthreads/Win32/Java/.NET/whatever sync objects instead of using the wrappers in threading.
How would you deal with that if it happened? Well, futures is implemented in pure Python, and it's pretty simple Python, and there's a fully functional backport that works with 2.5+/3.2+. So, you'd just have to grab that backport and swap out concurrent.futures for futures.
So, if you're doing something wacky like deploying a server that's going to run for 5 years unattended and may have its Python repeatedly upgraded underneath it, maybe I'd install the backport now and use that instead.
Otherwise, I'd just document the assumption (and the workaround in case it's ever broken) in the appropriate place, and then just use the stdlib module.
Let's say there is a server on the internet that one can send a piece of code to for evaluation. At some point the server takes all the code that has been submitted and starts running and evaluating it. However, at some point it will definitely bump into os.system('rm -rf *') sent by some evil programmer. Apart from rm -rf, you could expect people to try using the server to send spam or DoS someone, or to fool around with while True: pass kinds of things.
Is there a way to cope with such unfriendly/untrusted code? In particular, I'm interested in a solution for Python. However, if you have info for any other language, please share.
If you are not tied to the CPython implementation, you should consider looking at PyPy for these purposes; this Python dialect allows transparent code sandboxing.
Otherwise, you can provide fake __builtin__ and __builtins__ in the corresponding globals/locals arguments to exec or eval.
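For example, a minimal sketch of the fake-builtins idea; be warned that this is trivially escapable in CPython and should not be treated as a real security boundary:

    untrusted = "result = sum([1, 2, 3])"

    # Expose only an explicit whitelist of builtins.
    safe_globals = {"__builtins__": {"sum": sum}}
    exec(untrusted, safe_globals)
    assert safe_globals["result"] == 6

    # Anything outside the whitelist fails:
    try:
        exec("open('/etc/passwd')", safe_globals)
    except NameError:
        pass  # open() is not in the whitelist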
Moreover, you can provide a dictionary-like object instead of a real dictionary and trace what the untrusted code does with its namespace.
Moreover, you can actually trace that code (by calling sys.settrace() inside the restricted environment before any other code executes), so you can break execution if something goes bad.
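A sketch of the tracing idea, aborting after a fixed budget of executed lines, which crudely guards against while True: pass; note that sys.settrace() only sees Python-level code, not time spent inside C calls. The names BudgetExceeded, make_tracer and run_limited are hypothetical:

    import sys

    class BudgetExceeded(Exception):
        pass

    def make_tracer(max_lines):
        count = [0]
        def tracer(frame, event, arg):
            if event == "line":
                count[0] += 1
                if count[0] > max_lines:
                    # Raising from the trace function aborts the traced code.
                    raise BudgetExceeded("too many executed lines")
            return tracer
        return tracer

    def run_limited(code, max_lines=100000):
        sys.settrace(make_tracer(max_lines))
        try:
            exec(code, {"__builtins__": {}})
        finally:
            sys.settrace(None)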
If none of these solutions is acceptable, use OS-level sandboxing like chroot or unionfs, and the standard multiprocessing Python module, to spawn a code worker in a separate, secured process.
You can check pysandbox, which does just that, though the VM route is probably safer if you can afford it.
It's impossible to provide an absolute solution for this because the definition of 'bad' is pretty hard to nail down.
Is opening and writing to a file bad or good? What if that file is /dev/ram?
You can profile signatures of behavior, or you can try to block anything that might be bad, but you'll never win. JavaScript is a pretty good example of this: people run arbitrary JavaScript code all the time on their computers. It's supposed to be sandboxed, but all sorts of security problems and edge conditions crop up.
I'm not saying don't try, you'll learn a lot from the process.
Many companies have spent millions (Intel just spent billions on McAfee) trying to understand how to detect 'bad code', and every day machines running McAfee anti-virus get infected with viruses. Python code isn't any less dangerous than C. You can run system calls, bind to C libraries, etc.
I would seriously consider virtualizing the environment to run this stuff, so that exploits in whatever mechanism you implement can be firewalled one more time by the configuration of the virtual machine.
The number of users and the kind of code you expect to test/run would have considerable influence on your choices, by the way. If submissions aren't expected to access files or databases, or to run computationally intensive tasks, and you have very low pressure, you could be almost fine by just preventing file access entirely and imposing a time limit on the process before it gets killed and the submission flagged as too expensive or malicious.
If the code you're supposed to test might be any arbitrary Django extension or page, then you're in for a lot of work probably.
You can try a generic sandbox such as Sydbox or Gentoo's sandbox. They are not Python-specific.
Both can be configured to restrict read/write to some directories. Sydbox can even sandbox sockets.
I think a fix like this is going to be really hard and it reminds me of a lecture I attended about the benefits of programming in a virtual environment.
If you're doing it virtually, it's cool if they bugger it. It won't solve a while True: pass, but rm -rf / won't matter.
Unless I'm mistaken (and I very well might be), this is much of the reason behind the way Google changed Python for the App Engine. You run Python code on their server, but they've removed the ability to write to files. All data is saved in the "nosql" database.
It's not a direct answer to your question, but an example of how this problem has been dealt with in some circumstances.
We're using Twisted extensively for apps requiring a great deal of asynchronous I/O. There are some cases where stuff is CPU-bound instead, and for that we spawn a pool of processes to do the work, and we have a system for managing these across multiple servers as well; all done in Twisted. It works great. The problem is that it's hard to bring new team members up to speed. Writing asynchronous code in Twisted has a near-vertical learning curve. It's as if humans just don't think that way naturally.
We're considering a mixed approach, perhaps: keep the XML-RPC server part and process management in Twisted, and implement the other stuff in code that at least looks synchronous to some extent while not actually being so. Then again, I like explicit over implicit, so I have to think about this a bit more. Anyway, on to greenlets: how well does that stuff work? There's Stackless, and as you can see from my Gallentean avatar, I'm well aware first hand of the tremendous success of its use in CCP's flagship EVE Online game. What about Eventlet or gevent? Well, for now only Eventlet works with Twisted. However, gevent claims to be faster, as it's not a pure Python implementation but relies upon libevent instead. It also claims to have fewer idiosyncrasies and defects. As far as I can tell, gevent is maintained by one guy. This makes me somewhat leery, but all great projects start this way, so... Then there's PyPy; I haven't even finished reading about that one yet. I just saw it in this thread: Drawbacks of Stackless.
So confusing. I'm wondering what the heck to do. It sounds like Eventlet is probably the best bet, but is it really stable enough? Does anyone out there have any experience with it? Or should we go with Stackless instead, since it's been around and is proven technology, just as Twisted is, and the two work together nicely? But then I hate having to have a separate version of Python to do this. What to do...
This somewhat obnoxious blog entry hit the nail on the head for me, though: Asynchronous IO for Grownups. I don't get the "Twisted is being like Java" remark, as to me Java is typically where you are in the threading mindset, but whatever. Nevertheless, if that monkey-patching thing really works just like that, then wow. Just wow!
You might want to check out:
Comparing gevent to eventlet
Reports from users who moved from twisted or eventlet to gevent
Eventlet and gevent are not really comparable to Stackless, because Stackless ships with a standard library that is not aware of tasklets. There are implementations of socket for Stackless, but there isn't anything as comprehensive as gevent.monkey. CCP does not use bare-bones Stackless; it has something called Stackless I/O, which AFAIK is Windows-only and was never open sourced (?).
Both eventlet and gevent could be made to run on Stackless rather than on greenlet. At some point we even tried to do this as a GSoC project but did not find a student.
Answering part of your question - if you look at http://speed.pypy.org you'll see that using twisted on top of PyPy may give you some speedups. This depends of course on your workload, but it's probably worth checking out.
Cheers,
fijal
I've built a little real time web app on top of eventlet and repoze.bfg (I gave up on django quite a while ago). I've found eventlet and monkey patching to be just as easy as Ted says.
Gevent isn't pure Python, and it strictly depends on CPython.
Of the web frameworks you mentioned, Eventlet (OpenStack) and Tornado (FriendFeed, Quora) have the biggest deployments.
I was recently reading this document which lists a number of strategies that could be employed to implement a socket server. Namely, they are:
Serve many clients with each thread, and use nonblocking I/O and level-triggered readiness notification
Serve many clients with each thread, and use nonblocking I/O and readiness change notification
Serve many clients with each server thread, and use asynchronous I/O
Serve one client with each server thread, and use blocking I/O
Build the server code into the kernel
Now, I would appreciate a hint on which should be used in CPython, which we know has some good points, and some bad points. I am mostly interested in performance under high concurrency, and yes a number of the current implementations are too slow.
So if I may start with the easy one, "5" is out, as I am not going to be hacking anything into the kernel.
"4" Also looks like it must be out because of the GIL. Of course, you could use multiprocessing in place of threads here, and that does give a significant boost. Blocking IO also has the advantage of being easier to understand.
And here my knowledge wanes a bit:
"1" is traditional select or poll which could be trivially combined with multiprocessing.
"2" is the readiness-change notification, used by the newer epoll and kqueue
"3" I am not sure there are any kernel implementations for this that have Python wrappers.
So, in Python we have a bag of great tools like Twisted. Perhaps they are a better approach, though I have benchmarked Twisted and found it too slow on a multi-processor machine. Perhaps having four Twisted processes behind a load balancer might do it; I don't know. Any advice would be appreciated.
asyncore is basically "1": it uses select internally, and you just have one thread handling all requests. According to the docs it can also use poll. (EDIT: removed the Twisted reference; I thought it used asyncore, but I was wrong.)
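For reference, the classic asyncore echo server, essentially the example from the stdlib docs: one thread, all connections multiplexed over select() (localhost:8080 is arbitrary):

    import asyncore
    import socket

    class EchoHandler(asyncore.dispatcher_with_send):
        def handle_read(self):
            data = self.recv(8192)
            if data:
                self.send(data)

    class EchoServer(asyncore.dispatcher):
        def __init__(self, host, port):
            asyncore.dispatcher.__init__(self)
            self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
            self.set_reuse_addr()
            self.bind((host, port))
            self.listen(5)

        def handle_accept(self):
            pair = self.accept()
            if pair is not None:
                sock, addr = pair
                EchoHandler(sock)  # one handler object per connection

    EchoServer("localhost", 8080)
    asyncore.loop()  # pass use_poll=True to use poll() instead of select()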
"2" might be implemented with python-epoll (Just googled it - never seen it before).
EDIT (from the comments): in Python 2.6 the select module has epoll, kqueue and kevent built in (on supported platforms), so you don't need any external libraries to do edge-triggered serving.
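To make that concrete, a minimal level-triggered echo loop on select.epoll (Linux, Python 2.6+; pass select.EPOLLET for edge-triggered behavior). The port is arbitrary, and the write path is simplified: it assumes send() won't block:

    import select
    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("localhost", 8080))
    server.listen(50)
    server.setblocking(False)

    epoll = select.epoll()
    epoll.register(server.fileno(), select.EPOLLIN)
    connections = {}

    while True:
        for fd, event in epoll.poll():
            if fd == server.fileno():                  # new client
                conn, addr = server.accept()
                conn.setblocking(False)
                epoll.register(conn.fileno(), select.EPOLLIN)
                connections[conn.fileno()] = conn
            elif event & select.EPOLLIN:               # client readable
                conn = connections[fd]
                data = conn.recv(4096)
                if data:
                    conn.send(data)  # simplification: echo without buffering
                else:                # empty read: peer closed
                    epoll.unregister(fd)
                    connections.pop(fd).close()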
Don't rule out "4", as the GIL is released while a thread is actually doing or waiting for I/O operations (which is most of the time, probably). It doesn't make sense if you've got huge numbers of connections, of course. If you've got lots of processing to do, then Python may not make sense with any of these schemes.
For flexibility maybe look at Twisted?
In practice your problem boils down to how much processing you are going to do for requests. If you've got a lot of processing, and need to take advantage of multi-core parallel operation, then you'll probably need multiple processes. On the other hand if you just need to listen on lots of connections, then select or epoll, with a small number of threads should work.
How about "fork"? (I assume that is what the ForkingMixIn does) If the requests are handled in a "shared nothing" (other than DB or file system) architecture, fork() starts pretty quickly on most *nixes, and you don't have to worry about all the silly bugs and complications from threading.
Threads are a design illness forced on us by OSes with too-heavy-weight processes, IMHO. Cloning a page table with copy-on-write attributes seems a small price, especially if you are running an interpreter anyway.
Sorry I can't be more specific, but I'm more of a Perl-transitioning-to-Ruby programmer (when I'm not slaving over masses of Java at work).
Update: I finally did some timings on thread vs fork in my "spare time". Check it out:
http://roboprogs.com/devel/2009.04.html
Expanded:
http://roboprogs.com/devel/2009.12.html
One solution is gevent. gevent marries libevent-based event polling with the lightweight cooperative task switching implemented by greenlet.
What you get is all the performance and scalability of an event system with the elegance and straightforward model of blocking I/O programming.
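A small sketch of what that model looks like in practice (the host names are arbitrary); after monkey-patching, plain blocking-style socket code only blocks its own greenlet:

    from gevent import monkey
    monkey.patch_all()  # stdlib sockets now yield to the event loop

    import gevent
    import socket

    def fetch_head(host):
        # Reads like ordinary blocking code, but only this greenlet waits.
        s = socket.create_connection((host, 80))
        s.sendall(b"HEAD / HTTP/1.0\r\nHost: " + host.encode("ascii") + b"\r\n\r\n")
        try:
            return s.recv(1024)
        finally:
            s.close()

    jobs = [gevent.spawn(fetch_head, h) for h in ("example.com", "python.org")]
    gevent.joinall(jobs, timeout=10)  # all requests run concurrently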
(I don't know what the SO convention about answering really old questions is, but I decided I'd still add my 2 cents.)
Can I suggest additional links?
cogen is a cross-platform library for network-oriented, coroutine-based programming using the enhanced generators from Python 2.5. On the main page of the cogen project there are links to several projects with similar purposes.
I like Douglas' answer, but as an aside...
You could use a centralized dispatch thread/process that listens for readiness notifications using select and delegates to a pool of worker threads/processes to help accomplish your parallelism goals.
As Douglas mentioned, however, the GIL won't be held during most lengthy I/O operations (since no Python-API things are happening), so if it's response latency you're concerned about, you can try moving the critical portions of your code to the CPython API.
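A rough sketch of the delegation idea, simplified so that the dispatcher just accepts and hands each connection to a fixed pool of workers; a fuller version, as described above, would also select() on the client sockets and re-queue them as they become readable. Pool size and port are arbitrary:

    import socket
    import threading
    import Queue  # "queue" on Python 3

    tasks = Queue.Queue()

    def worker():
        while True:
            conn = tasks.get()  # each worker owns its connection outright
            try:
                while True:
                    data = conn.recv(4096)
                    if not data:
                        break
                    conn.sendall(data)
            finally:
                conn.close()

    for _ in range(8):          # fixed-size pool caps concurrency
        t = threading.Thread(target=worker)
        t.daemon = True
        t.start()

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("localhost", 8080))
    server.listen(50)

    while True:                 # dispatcher thread: accept and delegate
        conn, addr = server.accept()
        tasks.put(conn)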
http://docs.python.org/library/socketserver.html#asynchronous-mixins
As for multi-processor (multi-core) machines: with CPython, due to the GIL, you'll need at least one process per core to scale. As you say that you need CPython, you might try benchmarking that with ForkingMixIn. With Linux 2.6 you might get some interesting results.
Another way is to use Stackless Python. That's how EVE solved it. But I understand that it's not always possible.