I've got a relatively simple SSL server running in Twisted and I'd like to write some unit tests for it. I'm not really sure the best way to do this when using Python 3. All the documentation I've found describes using Twisted Trial which unfortunately is incomplete for Py3k.
What I'm thinking is doing something like this:
loading up my code and doing everything but reactor.run()
send the data I want my code to handle
run reactor.doIteration() (or maybe reactor.iterate() is better?)
check that my server did what it was supposed to
Is this a legitimate way of handling this sort of situation?
EDIT:
answer from glyph that this may be a bad idea (but it's not specifically talking about testing)
EDIT 2:
I guess the major issue is when you're trying to test components intertwined with Twisted and you're not sure how to pull it apart to properly test the individual components. Is there any reliable way to test this? Should .run() be called and then insert an event that's run a few seconds after you've completed your action to stop the reactor and test the result?
Are you writing low-level networking code? Does your application interact with BSD socket APIs? If not, why do you want to run the reactor at all in your tests? Unit tests should exercise a well-defined, narrow set of code. This should be your application code - not the implementation of the TLS transport in Twisted.
To specifically address the idea of calling reactor.iterate, either in your unit tests or elsewhere: yes, it is a bad idea. This is not how Twisted itself is tested. There is no reason to expect that because you write code that works when you call reactor.iterate it will work when you call reactor.run instead. reactor.iterate is a left-over from a mistaken idea about how event loops should be integrated with other systems. There may be an extreme edge case or two where the idea of reactor.iterate is correct and useful but in practice no one uses Twisted in such cases and no one who works on Twisted bears them in mind when making changes. So this is not a place you want your application to be. When things go wrong you'll be very lonely.
Related
I'm experimenting with running twisted/crossbar.io on QNX (target:powerpc-unknown-nto-qnx6.5.0), however it appears that QNX does not have siginterrupt() and SA_RESTART flag is not supported. As result signals.siginterrupt() does not exist in embedded python.
Is there any way to run/patch python/twisted on a system like this? Right now it dies when the handlers are installed because signals module does not have siginterrupt(). Even in the old 2.6 days when iternet/signals were built as c library, they relied on essentially implementing siginterrupt using SA_RESTART.
Is there any other alternative?
Did you try reactor.run(installSignalHandlers=False)? This limits the reactor's functionality a bit, but it may allow you to limp along.
Is there any way to run/patch python/twisted on a system like this?
The general answer is "port Twisted to your target platform". Twisted has interacts extensively with the platform it is running on. You might trick it into not dying with an AttributeError in one place with a simple patch but this doesn't mean that Twisted will actually behave the way it is intended to behave.
Do you have plans to complete a porting effort of Twisted to QNX? Or do you just have your fingers crossed that with signal issues out of the way everything else will Just Work? At minimum, you should be running the test suite to see where there may be problems (though passing tests also don't guarantee Twisted is actually working correctly, since those tests were all written with other platforms in mind).
A more specific answer is that you could grab an older version of the twisted.internet._signals module (try r35834; r35835 deleted a lot of old support code). The Python 3 porting effort removed some of the alternate (not as good but more portable) signal handling strategies from this module.
I'm working on a simple experiment in Python. I have a "master" process, in charge of all the others, and every single process has a connection via unix socket to the master process. I would like to be able for the master process to be able to monitor all of the sockets for a response - but there could theoretically be almost a hundred of them. How would threads impact the memory and performance of the application? What would be the best solution? Thanks a lot!
One hundred simultaneous threads might be pushing the reasonable limits of threading. If you find this is the cleanest way to organize your code, I'd say give it a try, but threading really doesn't scale very far.
What works better is to use a technique like select to wait for one of the sockets to be readable / writable / or has an error to report. This mechanism lets you go to sleep until something interesting happens, handle as many sockets have content to handle, and then go back to sleep again, all in a single thread of execution. Removing the multi-threading can often reduce chances for errors, and this style of programming should get you into the hundreds of connections no trouble. (If you want to go beyond about 100, I'd use the poll functionality instead of select -- constantly rebuilding the list of interesting file descriptors takes time that poll does not require.)
Something to consider is the Python Twisted Framework. They've gone to some length to provide a consistent way to hook callbacks onto events for this exact sort of programming. (If you're familiar with node.js, it's a bit like that, but Python.) I must admit a slight aversion to Twisted -- I never got very far in their documentation without being utterly baffled -- but a lot of people made it further in the docs than I did. You might find it a better fit than I have.
The easiest way to conduct comparative tests of threads versus processes for socket handling is to use the SocketServer in Python's standard library. You can easily switch approaches (while keeping everything else the same) by inheriting from either ThreadingMixIn or ForkingMixIn. Here is a simple example to get you started.
Another alternative is a select/poll approach using non-blocking sockets in a single process and a single thread.
If you're interested in software that is already fully developed and highly evolved, consider these high-performance Python based server packages:
The Twisted framework uses the async single process, single thread style.
The Tornado framework is similar (less evolved, less full featured, but easier to understand)
And Gunicorn which is a high-performance forking server.
This question already has answers here:
How can I sandbox Python in pure Python?
(7 answers)
Python, safe, sandbox [duplicate]
(9 answers)
Closed 9 years ago.
Let's say there is a server on the internet that one can send a piece of code to for evaluation. At some point server takes all code that has been submitted, and starts running and evaluating it. However, at some point it will definitely bump into "os.system('rm -rf *')" sent by some evil programmer. Apart from "rm -rf" you could expect people try using the server to send spam or dos someone, or fool around with "while True: pass" kind of things.
Is there a way to coop with such unfriendly/untrusted code? In particular I'm interested in a solution for python. However if you have info for any other language, please share.
If you are not specific to CPython implementation, you should consider looking at PyPy[wiki] for these purposes — this Python dialect allows transparent code sandboxing.
Otherwise, you can provide fake __builtin__ and __builtins__ in the corresponding globals/locals arguments to exec or eval.
Moreover, you can provide dictionary-like object instead of real dictionary and trace what untrusted code does with it's namespace.
Moreover, you can actually trace that code (issuing sys.settrace() inside restricted environment before any other code executed) so you can break execution if something will go bad.
If none of solutions is acceptable, use OS-level sandboxing like chroot, unionfs and standard multiprocess python module to spawn code worker in separate secured process.
You can check pysandbox which does just that, though the VM route is probably safer if you can afford it.
It's impossible to provide an absolute solution for this because the definition of 'bad' is pretty hard to nail down.
Is opening and writing to a file bad or good? What if that file is /dev/ram?
You can profile signatures of behavior, or you can try to block anything that might be bad, but you'll never win. Javascript is a pretty good example of this, people run arbitrary javascript code all the time on their computers -- it's supposed to be sandboxed but there's all sorts of security problems and edge conditions that crop up.
I'm not saying don't try, you'll learn a lot from the process.
Many companies have spent millions (Intel just spent billions on McAffee) trying to understand how to detect 'bad code' -- and every day machines running McAffe anti-virus get infected with viruses. Python code isn't any less dangerous than C. You can run system calls, bind to C libraries, etc.
I would seriously consider virtualizing the environment to run this stuff, so that exploits in whatever mechanism you implement can be firewalled one more time by the configuration of the virtual machine.
Number of users and what kind of code you expect to test/run would have considerable influence on choices btw. If they aren't expected to link to files or databases, or run computationally intensive tasks, and you have very low pressure, you could be almost fine by just preventing file access entirely and imposing a time limit on the process before it gets killed and the submission flagged as too expensive or malicious.
If the code you're supposed to test might be any arbitrary Django extension or page, then you're in for a lot of work probably.
You can try some generic sanbox such as Sydbox or Gentoo's sandbox. They are not Python-specific.
Both can be configured to restrict read/write to some directories. Sydbox can even sandbox sockets.
I think a fix like this is going to be really hard and it reminds me of a lecture I attended about the benefits of programming in a virtual environment.
If you're doing it virtually its cool if they bugger it. It wont solve a while True: pass but rm -rf / won't matter.
Unless I'm mistaken (and I very well might be), this is much of the reason behind the way Google changed Python for the App Engine. You run Python code on their server, but they've removed the ability to write to files. All data is saved in the "nosql" database.
It's not a direct answer to your question, but an example of how this problem has been dealt with in some circumstances.
I wrote an application server (using python & twisted) and I want to start writing some tests. But I do not want to use Twisted's Trial due to time constraints and not having time to play with it now. So here is what I have in mind: write a small test client that connects to the app server and makes the necessary requests (the communication protocol is some in-house XML), store in a static way the received XML and then write some tests on those static data using unitest.
My question is: Is this a correct approach and if yes, what kind of tests are covered with this approach?
Also, using this method has several disadvantages, like: not being able to access the database layer in order to build/rebuild the schema, when will the test client going to connect to the server: per each unit test or before running the test suite?
You should use Trial. It really isn't very hard. Trial's documentation could stand to be improved, but if you know how to use the standard library unit test, the only difference is that instead of writing
import unittest
you should write
from twisted.trial import unittest
... and then you can return Deferreds from your test_ methods. Pretty much everything else is the same.
The one other difference is that instead of building a giant test object at the bottom of your module and then running
python your/test_module.py
you can simply define your test cases and then run
trial your.test_module
If you don't care about reactor integration at all, in fact, you can just run trial on a set of existing Python unit tests. Trial supports the standard library 'unittest' module.
"My question is: Is this a correct approach?"
It's what you chose. You made a lot of excuses, so I'm assuming that your pretty well fixed on this course. It's not the best, but you've already listed all your reasons for doing it (and then asked follow-up questions on this specific course of action). "correct" doesn't enter into it anymore, so there's no answer to this question.
"what kind of tests are covered with this approach?"
They call it "black-box" testing. The application server is a black box that has a few inputs and outputs, and you can't test any of it's internals. It's considered one acceptable form of testing because it tests the bottom-line external interfaces for acceptable behavior.
If you have problems, it turns out to be useless for doing diagnostic work. You'll find that you need to also to white-box testing on the internal structures.
"not being able to access the database layer in order to build/rebuild the schema,"
Why not? This is Python. Write a separate tool that imports that layer and does database builds.
"when will the test client going to connect to the server: per each unit test or before running the test suite?"
Depends on the intent of the test. Depends on your use cases. What happens in the "real world" with your actual intended clients?
You'll want to test client-like behavior, making connections the way clients make connections.
Also, you'll want to test abnormal behavior, like clients dropping connections or doing things out of order, or unconnected.
I think you chose the wrong direction. It's true that the Trial docs is very light. But Trial is base on unittest and only add some stuff to deal with the reactor loop and the asynchronous calls (it's not easy to write tests that deal with deffers). All your tests that are not including deffer/asynchronous call will be exactly like normal unittest.
The Trial command is a test runner (a bit like nose), so you don't have to write test suites for your tests. You will save time with it. On top of that, the Trial command can output profiling and coverage information. Just do Trial -h for more info.
But in any way the first thing you should ask yourself is which kind of tests do you need the most, unit tests, integration tests or system tests (black-box). It's possible to do all with Trial but it's not necessary allways the best fit.
haven't used twisted before, and the twisted/trial documentation isn't stellar from what I just saw, but it'll likely take you 2-3 days to implement correctly the test system you describe above. Now, like I said I have no idea about Trial, but I GUESS you could probably get it working in 1-2 days, since you already have a Twisted application. Now if Trial gives you more coverage in less time, I'd go with Trial.
But remember this is just an answer from a very cursory look at the docs
I'm building a program that has a class used locally, but I want the same class to be used the same way over the network. This means I need to be able to make synchronous calls to any of its public methods. The class reads and writes files, so I think XML-RPC is too much overhead. I created a basic rpc client/server using the examples from twisted, but I'm having trouble with the client.
c = ClientCreator(reactor, Greeter)
c.connectTCP(self.host, self.port).addCallback(request)
reactor.run()
This works for a single call, when the data is received I'm calling reactor.stop(), but if I make any more calls the reactor won't restart. Is there something else I should be using for this? maybe a different twisted module or another framework?
(I'm not including the details of how the protocol works, because the main point is that I only get one call out of this.)
Addendum & Clarification:
I shared a google doc with notes on what I'm doing. http://docs.google.com/Doc?id=ddv9rsfd_37ftshgpgz
I have a version written that uses fuse and can combine multiple local folders into the fuse mount point. The file access is already handled within a class, so I want to have servers that give me network access to the same class. After continuing to search, I suspect pyro (http://pyro.sourceforge.net/) might be what I'm really looking for (simply based on reading their home page right now) but I'm open to any suggestions.
I could achieve similar results by using an nfs mount and combining it with my local folder, but I want all of the peers to have access to the same combined filesystem, so that would require every computer to bee an nfs server with a number of nfs mounts equal to the number of computers in the network.
Conclusion:
I have decided to use rpyc as it gave me exactly what I was looking for. A server that keeps an instance of a class that I can manipulate as if it was local. If anyone is interested I put my project up on Launchpad (http://launchpad.net/dstorage).
If you're even considering Pyro, check out RPyC first, and re-consider XML-RPC.
Regarding Twisted: try leaving the reactor up instead of stopping it, and just ClientCreator(...).connectTCP(...) each time.
If you self.transport.loseConnection() in your Protocol you won't be leaving open connections.
For a synchronous client, Twisted probably isn't the right option. Instead, you might want to use the socket module directly.
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((self.host, self.port))
s.send(output)
data = s.recv(size)
s.close()
The recv() call might need to be repeated until you get an empty string, but this shows the basics.
Alternatively, you can rearrange your entire program to support asynchronous calls...
Why do you feel that it needs to be synchronous?
If you want to ensure that only one of these is happening at a time, invoke all of the calls through a DeferredSemaphore so you can rate limit the actual invocations (to any arbitrary value).
If you want to be able to run multiple streams of these at different times, but don't care about concurrency limits, then you should at least separate reactor startup and teardown from the invocations (the reactor should run throughout the entire lifetime of the process).
If you just can't figure out how to express your application's logic in a reactor pattern, you can use deferToThread and write a chunk of purely synchronous code -- although I would guess this would not be necessary.
If you are using Twisted you should probably know that:
You will not be making synchronous calls to any network service
The reactor can only ever be run once, so do not stop it (by calling reactor.stop()) until your application is ready to exit.
I hope this answers your question. I personally believe that Twisted is exactly the correct solution for your use case, but that you need to work around your synchronicity issue.
Addendum & Clarification:
Part of what I don't understand is
that when I call reactor.run() it
seems to go into a loop that just
watches for network activity. How do I
continue running the rest of my
program while it uses the network? if
I can get past that, then I can
probably work through the
synchronicity issue.
That is exactly what reactor.run() does. It runs a main loop which is an event reactor. It will not only wait for entwork events, but anything else you have scheduled to happen. With Twisted you will need to structure the rest of your application in a way to deal with its asynchronous nature. Perhaps if we knew what kind of application it is, we could advise.