How to sniff a network interface with Twisted? - python

I need to receive raw packets from a network interface within Twisted code. The packets will not have the correct IP or MAC address, nor valid headers, so I need the raw thing.
I have tried looking into twisted.pair, but I was not able to figure out how to use it to get at the raw interface.
Normally, I would use scapy.all.sniff. However, that is blocking, so I can't just use it with Twisted. (I also cannot use scapy.all.sniff with a timeout and busy-loop, because I don't want to lose packets.)
A possible solution would be to run scapy.all.sniff in a thread and somehow call back into Twisted when I get a packet. This seems a bit inelegant (and also, I don't know how to do it because I am a Twisted beginner), but I might settle for that if I don't find anything better.

You could run a distributed system and pass the data through a central queuing system. Take the Unix philosophy and create a single application that does a few tasks and does them well. Create one application that sniffs the packets (you can use scapy here since it won't really matter if you block anything) then sends them to a queue (RabitMQ, Redis, SQS, etc) and have another application process the packet from the queue. This method should give you the least amount of headache.
If you need to run everything in a single application, then threads/multiprocessing is the only option. But there are some design patterns you'll want to follow. You can also break up the following code into separate functions and use a dedicated queuing system.
from threading import Thread
from time import sleep
from twisted.internet import defer, reactor
class Sniffer(Thread):
def __init__(self, _reactor, shared_queue):
super().__init__()
self.reactor = _reactor
self.shared_queue = shared_queue
def run(self):
"""
Sniffer logic here
"""
while True:
self.reactor.callFromThread(self.shared_queue.put, 'hello world')
sleep(5)
#defer.inlineCallbacks
def consume_from_queue(_id, _reactor, shared_queue):
item = yield shared_queue.get()
print(str(_id), item)
_reactor.callLater(0, consume_from_queue, _id, _reactor, shared_queue)
def main():
shared_queue = defer.DeferredQueue()
sniffer = Sniffer(reactor, shared_queue)
sniffer.daemon = True
sniffer.start()
workers = 4
for i in range(workers):
consume_from_queue(i+1, reactor, shared_queue)
reactor.run()
main()
The Sniffer class starts outside of Twisted's control. Notice the sniffer.daemon = True, this is so that the thread will stop when the main thread has stopped. If it were set to False (default) then the application will exit only if all the threads have come to an end. Depending on the task at hand this may or may not always be possible. If you can take breaks from sniffing to check a thread event, then you might be able to stop the thread in a safer way.
self.reactor.callFromThread(self.shared_queue.put, 'hello world') is necessary so that the item being put into the queue happens in the main reactor thread as opposed to the thread the Sniffer executes. The main benefit of this would be that there would be some sort of synchronization of the messages coming from the threads (assuming you plan to scale to sniffing multiple interfaces). Also, I wasn't sure of DeferredQueue objects are thread safe :) I treated them like they were not.
Since Twisted isn't managing the threads in this case, it's vital that the developer does. Notice the worker loop and consume_from_queue(i+1, reactor, shared_queue). This loop ensures only the desired number of workers are handling tasks. Inside the consume_from_queue() function, shared_queue.get() will wait (non-blocking) until an item is put into the queue, prints the item, then schedule another consume_from_queue().

Related

Workaround for exiting threads in Python

I'm trying to write a server program and I have a thread for listening for new clients:
class ClientFinder(Thread):
def __init__(self, port):
Thread.__init__(self)
self._continue = True
self._port = port
# try to create socket
def run(self):
# listen for new clients
while self._continue:
# add new clients
def stop(self):
# stop client
self._continue = False
client_finder = ClientFinder(8000)
client_finder.start()
client_finder.stop()
client_finder.join()
I can't join client_finder because it never ends. Calling stop() lets the thread stop after the next client is accepted, so the program just hangs forever.
1) Is it okay for my program to just end even if I haven't joined all my threads (such as by removing the join)? Or is this lazy/bad practice?
2) If it is a problem, what's the solution/best practice to avoid this? From what I've found so far, there's no way to force a thread to stop.
Whether waiting for the current clients to finish is a problem is really your choice. It may be a good idea, or you may prefer to kill connections.
Waiting for a new client is probably a worse thing, since it may never happen. An easy solution would be to have some reasonable timeout for the listening - let's say if nobody connects in 5s, you go back to the loop to check the flag. This is short enough for a typical shutdown solution, but long enough that rechecking shouldn't affect your CPU usage.
If you don't want to wait for a short timeout, you can add a pipe/socket between the thread doing shutdown and your ClientFinder and send a notification to shutdown. Instead of only waiting for a new client, you'd need to wait on both fds (I'm assuming ClientFinder uses sockets) and check which of them got a message.

Design of asynchronous request and blocking processing using Tornado

I'm trying to implement a Python app that uses async functions to receive and emit messages using NATS, using a client based on Tornado. Once a message is received, a blocking function must be called, that I'm trying to implement on a separate thread, to allow the reception and publication of messages to put messages in a Tornado queue for later processing of the blocking function.
I'm very new to Tornado (and to python multithreading), but after reading several times the Tornado documentation and other sources, I've been able to put up a working version of the code, that looks like this:
import tornado.gen
import tornado.ioloop
from tornado.queues import Queue
from concurrent.futures import ThreadPoolExecutor
from nats.io.client import Client as NATS
messageQueue = Queue()
nc = NATS()
#tornado.gen.coroutine
def consumer():
def processMessage(currentMessage):
# process the message ...
while True:
currentMessage = yield messageQueue.get()
try:
# execute the call in a separate thread to prevent blocking the queue
EXECUTOR.submit(processMessage, currentMessage)
finally:
messageQueue.task_done()
#tornado.gen.coroutine
def producer():
#tornado.gen.coroutine
def enqueueMessage(currentMessage):
yield messageQueue.put(currentMessage)
yield nc.subscribe("new_event", "", enqueueMessage)
#tornado.gen.coroutine
def main():
tornado.ioloop.IOLoop.current().spawn_callback(consumer)
yield producer()
if __name__ == '__main__':
main()
tornado.ioloop.IOLoop.current().start()
My questions are:
1) Is this the correct way of using Tornado to call a blocking function?
2) What's the best practice for implementing a consumer/producer scheme that is always listening? I'm afraid my while True: statement is actually blocking the processor...
3) How can I inspect the Queue to make sure a burst of calls is being enqueued? I've tried using Queue().qSize(), but it always returns zero, which makes me wonder if the enqueuing is done correctly or not.
General rule (credits to NYKevin) is:
multiprocessing for CPU- and GPU-bound computations.
Event-driven stuff for non-blocking I/O (which should be preferred over blocking I/O where possible, since it scales much more effectively).
Threads for blocking I/O (you can also use multiprocessing, but the per-process overhead probably isn't worth it).
ThreadPoolExecutor for IO, ProcessPoolExecutor for CPU. Both have internal queue, both scale to at most specified max_workers. More info about concurrent executors in docs.
So answer are:
Reimplementing pool is an overhead. Thread or Process depends on what you plan to do.
while True is not blocking if you have e.g. some yielded async calls (even yield gen.sleep(0.01)), it gives back control to ioloop
qsize() is the right to call, but since I have not run/debug this and I would take a different approach (existing pool), it is hard to find a problem here.

Non-blocking, non-concurrent tasks in Python

I am working on an implementation of a very small library in Python that has to be non-blocking.
On some production code, at some point, a call to this library will be done and it needs to do its own work, in its most simple form it would be a callable that needs to pass some information to a service.
This "passing information to a service" is a non-intensive task, probably sending some data to an HTTP service or something similar. It also doesn't need to be concurrent or to share information, however it does need to terminate at some point, possibly with a timeout.
I have used the threading module before and it seems the most appropriate thing to use, but the application where this library will be used is so big that I am worried of hitting the threading limit.
On local testing I was able to hit that limit at around ~2500 threads spawned.
There is a good possibility (given the size of the application) that I can hit that limit easily. It also makes me weary of using a Queue given the memory implications of placing tasks at a high rate in it.
I have also looked at gevent but I couldn't see an example of being able to spawn something that would do some work and terminate without joining. The examples I went through where calling .join() on a spawned Greenlet or on an array of greenlets.
I don't need to know the result of the work being done! It just needs to fire off and try to talk to the HTTP service and die with a sensible timeout if it didn't.
Have I misinterpreted the guides/tutorials for gevent ? Is there any other possibility to spawn a callable in fully non-blocking fashion that can't hit a ~2500 limit?
This is a simple example in Threading that does work as I would expect:
from threading import Thread
class Synchronizer(Thread):
def __init__(self, number):
self.number = number
Thread.__init__(self)
def run(self):
# Simulating some work
import time
time.sleep(5)
print self.number
for i in range(4000): # totally doesn't get past 2,500
sync = Synchronizer(i)
sync.setDaemon(True)
sync.start()
print "spawned a thread, number %s" % i
And this is what I've tried with gevent, where it obviously blocks at the end to
see what the workers did:
def task(pid):
"""
Some non-deterministic task
"""
gevent.sleep(1)
print('Task', pid, 'done')
for i in range(100):
gevent.spawn(task, i)
EDIT:
My problem stemmed out from my lack of familiarity with gevent. While the Thread code was indeed spawning threads, it also prevented the script from terminating while it did some work.
gevent doesn't really do that in the code above, unless you add a .join(). All I had to do to see the gevent code do some work with the spawned greenlets was to make it a long running process. This definitely fixes my problem as the code that needs to spawn the greenlets is done within a framework that is a long running process in itself.
Nothing requires you to call join in gevent, if you're expecting your main thread to last longer than any of your workers.
The only reason for the join call is to make sure the main thread lasts at least as long as all of the workers (so that the program doesn't terminate early).
Why not spawn a subprocess with a connected pipe or similar and then, instead of a callable, just drop your data on the pipe and let the subprocess handle it completely out of band.
As explained in Understanding Asynchronous/Multiprocessing in Python, asyncoro framework supports asynchronous, concurrent processes. You can run tens or hundreds of thousands of concurrent processes; for reference, running 100,000 simple processes takes about 200MB. If you want to, you can mix threads in rest of the system and coroutines with asyncoro (provided threads and coroutines don't share variables, but use coroutine interface functions to send messages etc.).

Pattern for a background Twisted server that fills an incoming message queue and empties an outgoing message queue?

I'd like to do something like this:
twistedServer.start() # This would be a nonblocking call
while True:
while twistedServer.haveMessage():
message = twistedServer.getMessage()
response = handleMessage(message)
twistedServer.sendResponse(response)
doSomeOtherLogic()
The key thing I want to do is run the server in a background thread. I'm hoping to do this with a thread instead of through multiprocessing/queue because I already have one layer of messaging for my app and I'd like to avoid two. I'm bringing this up because I can already see how to do this in a separate process, but what I'd like to know is how to do it in a thread, or if I can. Or if perhaps there is some other pattern I can use that accomplishes this same thing, like perhaps writing my own reactor.run method. Thanks for any help.
:)
The key thing I want to do is run the server in a background thread.
You don't explain why this is key, though. Generally, things like "use threads" are implementation details. Perhaps threads are appropriate, perhaps not, but the actual goal is agnostic on the point. What is your goal? To handle multiple clients concurrently? To handle messages of this sort simultaneously with events from another source (for example, a web server)? Without knowing the ultimate goal, there's no way to know if an implementation strategy I suggest will work or not.
With that in mind, here are two possibilities.
First, you could forget about threads. This would entail defining your event handling logic above as only the event handling parts. The part that tries to get an event would be delegated to another part of the application, probably something ultimately based on one of the reactor APIs (for example, you might set up a TCP server which accepts messages and turns them into the events you're processing, in which case you would start off with a call to reactor.listenTCP of some sort).
So your example might turn into something like this (with some added specificity to try to increase the instructive value):
from twisted.internet import reactor
class MessageReverser(object):
"""
Accept messages, reverse them, and send them onwards.
"""
def __init__(self, server):
self.server = server
def messageReceived(self, message):
"""
Callback invoked whenever a message is received. This implementation
will reverse and re-send the message.
"""
self.server.sendMessage(message[::-1])
doSomeOtherLogic()
def main():
twistedServer = ...
twistedServer.start(MessageReverser(twistedServer))
reactor.run()
main()
Several points to note about this example:
I'm not sure how your twistedServer is defined. I'm imagining that it interfaces with the network in some way. Your version of the code would have had it receiving messages and buffering them until they were removed from the buffer by your loop for processing. This version would probably have no buffer, but instead just call the messageReceived method of the object passed to start as soon as a message arrives. You could still add buffering of some sort if you want, by putting it into the messageReceived method.
There is now a call to reactor.run which will block. You might instead write this code as a twistd plugin or a .tac file, in which case you wouldn't be directly responsible for starting the reactor. However, someone must start the reactor, or most APIs from Twisted won't do anything. reactor.run blocks, of course, until someone calls reactor.stop.
There are no threads used by this approach. Twisted's cooperative multitasking approach to concurrency means you can still do multiple things at once, as long as you're mindful to cooperate (which usually means returning to the reactor once in a while).
The exact times the doSomeOtherLogic function is called is changed slightly, because there's no notion of "the buffer is empty for now" separate from "I just handled a message". You could change this so that the function is installed called once a second, or after every N messages, or whatever is appropriate.
The second possibility would be to really use threads. This might look very similar to the previous example, but you would call reactor.run in another thread, rather than the main thread. For example,
from Queue import Queue
from threading import Thread
class MessageQueuer(object):
def __init__(self, queue):
self.queue = queue
def messageReceived(self, message):
self.queue.put(message)
def main():
queue = Queue()
twistedServer = ...
twistedServer.start(MessageQueuer(queue))
Thread(target=reactor.run, args=(False,)).start()
while True:
message = queue.get()
response = handleMessage(message)
reactor.callFromThread(twistedServer.sendResponse, response)
main()
This version assumes a twistedServer which works similarly, but uses a thread to let you have the while True: loop. Note:
You must invoke reactor.run(False) if you use a thread, to prevent Twisted from trying to install any signal handlers, which Python only allows to be installed in the main thread. This means the Ctrl-C handling will be disabled and reactor.spawnProcess won't work reliably.
MessageQueuer has the same interface as MessageReverser, only its implementation of messageReceived is different. It uses the threadsafe Queue object to communicate between the reactor thread (in which it will be called) and your main thread where the while True: loop is running.
You must use reactor.callFromThread to send the message back to the reactor thread (assuming twistedServer.sendResponse is actually based on Twisted APIs). Twisted APIs are typically not threadsafe and must be called in the reactor thread. This is what reactor.callFromThread does for you.
You'll want to implement some way to stop the loop and the reactor, one supposes. The python process won't exit cleanly until after you call reactor.stop.
Note that while the threaded version gives you the familiar, desired while True loop, it doesn't actually do anything much better than the non-threaded version. It's just more complicated. So, consider whether you actually need threads, or if they're merely an implementation technique that can be exchanged for something else.

A programming strategy to bypass the os thread limit?

The scenario: We have a python script that checks thousands of proxys simultaneously.
The program uses threads, 1 per proxy, to speed the process. When it reaches the 1007 thread, the script crashes because of the thread limit.
My solution is: A global variable that gets incremented when a thread spawns and decrements when a thread finishes. The function which spawns the threads monitors the variable so that the limit is not reached.
What will your solution be, friends?
Thanks for the answers.
You want to do non-blocking I/O with the select module.
There are a couple of different specific techniques. select.select should work for every major platform. There are other variations that are more efficient (and could matter if you are checking tens of thousands of connections simultaneously) but you will then need to write the code for you specific platform.
I've run into this situation before. Just make a pool of Tasks, and spawn a fixed number of threads that run an endless loop which grabs a Task from the pool, run it, and repeat. Essentially you're implementing your own thread abstraction and using the OS threads to implement it.
This does have drawbacks, the major one being that if your Tasks block for long periods of time they can prevent the execution of other Tasks. But it does let you create an unbounded number of Tasks, limited only by memory.
Does Python have any sort of asynchronous IO functionality? That would be the preferred answer IMO - spawning an extra thread for each outbound connection isn't as neat as having a single thread which is effectively event-driven.
Using different processes, and pipes to transfer data. Using threads in python is pretty lame. From what I heard, they don't actually run in parallel, even if you have a multi-core processor... But maybe it was fixed in python3.
My solution is: A global variable that gets incremented when a thread spawns and decrements when a thread finishes. The function which spawns the threads monitors the variable so that the limit is not reached.
The standard way is to have each thread get next tasks in a loop instead of dying after processing just one. This way you don't have to keep track of the number of threads, since you just fire a fixed number of them. As a bonus, you save on thread creation/destruction.
A counting semaphore should do the trick.
from socket import *
from threading import *
maxthreads = 1000
threads_sem = Semaphore(maxthreads)
class MyThread(Thread):
def __init__(self, conn, addr):
Thread.__init__(self)
self.conn = conn
self.addr = addr
def run(self):
try:
read = conn.recv(4096)
if read == 'go away\n':
global running
running = False
conn.close()
finally:
threads_sem.release()
sock = socket()
sock.bind(('0.0.0.0', 2323))
sock.listen(1)
running = True
while running:
conn, addr = sock.accept()
threads_sem.acquire()
MyThread(conn, addr).start()
Make sure your threads get destroyed properly after they've been used or use a threadpool, although per what I see they're not that effective in Python
see here:
http://code.activestate.com/recipes/203871/
Using the select module or a similar library would most probably be a more efficient solution, but that would require bigger architectural changes.
If you just want to limit the number of threads, a global counter should be fine, as long as you access it in a thread safe way.
Be careful to minimize the default thread stack size. At least on Linux, the default limit puts severe restrictions on the number of created threads. Linux allocates a chunk of the process virtual address space to the stack (usually 10MB). 300 threads x 10MB stack allocation = 3GB of virtual address space dedicated to stack, and on a 32 bit system you have a 3GB limit. You can probably get away with much less.
Twisted is a perfect fit for this problem. See http://twistedmatrix.com/documents/current/core/howto/clients.html for a tutorial on writing a client.
If you don't mind using alternate Python implmentations, Stackless has light-weight (non-native) threads. The only company I know doing much with it though is CCP--they use it for tasklets in their game on both the client and server. You still need to do async I/O with Stackless because if a thread blocks, the process blocks.
As mentioned in another thread, why do you spawn off a new thread for each single operation? This is a classical producer - consumer problem, isn't it? Depending a bit on how you look at it, the proxy checkers might be comsumers or producers.
Anyway, the solution is to make a "queue" of "tasks" to process, and make the threads in a loop check if there are any more tasks to perform in the queue, and if there isn't, wait a predefined interval, and check again.
You should protect your queue with some locking mechanisms, i.e. semaphores, to prevent race conditions.
It's really not that difficult. But it requires a bit of thinking getting it right. Good luck!

Categories