Thread locals in Python - negatives, with regards to scalability?

Thread locals in Python - negatives, with regards to scalability? - python

I'm wondering if there are some serious implications I might be creating for myself by using thread locals. I noticed that in the case of Flask, they use thread locals, and mention that it can cause issues with servers that aren't built with threads in mind. Is this an outdated concern? I'm using thread locals with Django for a few things, deploying with NGINX in front of UWSGI, or Gunicorn, on Ubuntu 10.04 with Postgres (not that the OS or DB probably matter, but just for clarity). Do I need to be worried?

Threadlocals aren't the most robust or secure way to do things - check out this note, for instance. [ Though also see Glenn's comment, below ]
I suppose if you have coded cleanly, with the idea that you're putting stuff into a big global pot of info, accepting unguaranteed data consistency in those threaded locals and taking care to avoid race conditions, etc, etc, you might well be ok.
But, even with that in mind, there's still the 'magic'ness of threaded local vars, so documenting clearly what the heck is going on and any time a threadedlocal var is used might help you/future developers of the codebase down the line.

Related

How to use python multiprocessing proxy objects in multithreading

I am using pyhton's multiprocessing package in a standard client-server model.
I have a few types of objects in the server that I register through the BaseManager.register method, and use from the client through proxies (based on the AutoProxy class).
I've had random errors pop up when I was using those proxies from multiple client threads, and following some reading I discovered that the Proxy instances themselves are not thread safe. See from the python multiprocessing documentation:
Thread safety of proxies
Do not use a proxy object from more than one thread unless you protect it with a lock.
(There is never a problem with different processes using the same proxy.)
My scenario fits this perfectly then. OK, I know why it fails. But I want it to work :) so I seek an advice - what is the best method to make this thread-safe?
My particular case being that (a) I work with a single client thread 90% of the time, (b) the actual objects behind the proxy are thread safe, and (c) that I would like to call multiple methods of the same proxied-object concurrently.
As always, Internet, those who help me shall live on and never die! Those who do their best might get a consolation prize too.
Thanks,
Yonatan

Unfortunately, this issue probably doesn't relate to too many people :(
Here's what we're doing, for future readers:
We'll be using regular thread locks to guard usage of the 'main' proxy
The main proxy provides proxies to other instances in the server process, and these are thread-specific in our contexts, so we'll be using those without a lock
Not a very interesting solution, yeah, I know. We also considered making a tailored version of the AutoProxy (the package supports that) with built-in locking. We might do that if this scenario repeats in our system. Auto-locking should be done carefully, since this is done 'behind the scenes' and can lead to race-condition deadlocks.
If anyone has similar issues in the future - please comment or contact me directly.

sandbox to execute possibly unfriendly python code [duplicate]

This question already has answers here:
How can I sandbox Python in pure Python?
(7 answers)
Python, safe, sandbox [duplicate]
(9 answers)
Closed 9 years ago.
Let's say there is a server on the internet that one can send a piece of code to for evaluation. At some point server takes all code that has been submitted, and starts running and evaluating it. However, at some point it will definitely bump into "os.system('rm -rf *')" sent by some evil programmer. Apart from "rm -rf" you could expect people try using the server to send spam or dos someone, or fool around with "while True: pass" kind of things.
Is there a way to coop with such unfriendly/untrusted code? In particular I'm interested in a solution for python. However if you have info for any other language, please share.

If you are not specific to CPython implementation, you should consider looking at PyPy[wiki] for these purposes — this Python dialect allows transparent code sandboxing.
Otherwise, you can provide fake __builtin__ and __builtins__ in the corresponding globals/locals arguments to exec or eval.
Moreover, you can provide dictionary-like object instead of real dictionary and trace what untrusted code does with it's namespace.
Moreover, you can actually trace that code (issuing sys.settrace() inside restricted environment before any other code executed) so you can break execution if something will go bad.
If none of solutions is acceptable, use OS-level sandboxing like chroot, unionfs and standard multiprocess python module to spawn code worker in separate secured process.

You can check pysandbox which does just that, though the VM route is probably safer if you can afford it.

It's impossible to provide an absolute solution for this because the definition of 'bad' is pretty hard to nail down.
Is opening and writing to a file bad or good? What if that file is /dev/ram?
You can profile signatures of behavior, or you can try to block anything that might be bad, but you'll never win. Javascript is a pretty good example of this, people run arbitrary javascript code all the time on their computers -- it's supposed to be sandboxed but there's all sorts of security problems and edge conditions that crop up.
I'm not saying don't try, you'll learn a lot from the process.
Many companies have spent millions (Intel just spent billions on McAffee) trying to understand how to detect 'bad code' -- and every day machines running McAffe anti-virus get infected with viruses. Python code isn't any less dangerous than C. You can run system calls, bind to C libraries, etc.

I would seriously consider virtualizing the environment to run this stuff, so that exploits in whatever mechanism you implement can be firewalled one more time by the configuration of the virtual machine.
Number of users and what kind of code you expect to test/run would have considerable influence on choices btw. If they aren't expected to link to files or databases, or run computationally intensive tasks, and you have very low pressure, you could be almost fine by just preventing file access entirely and imposing a time limit on the process before it gets killed and the submission flagged as too expensive or malicious.
If the code you're supposed to test might be any arbitrary Django extension or page, then you're in for a lot of work probably.

You can try some generic sanbox such as Sydbox or Gentoo's sandbox. They are not Python-specific.
Both can be configured to restrict read/write to some directories. Sydbox can even sandbox sockets.

I think a fix like this is going to be really hard and it reminds me of a lecture I attended about the benefits of programming in a virtual environment.
If you're doing it virtually its cool if they bugger it. It wont solve a while True: pass but rm -rf / won't matter.

Unless I'm mistaken (and I very well might be), this is much of the reason behind the way Google changed Python for the App Engine. You run Python code on their server, but they've removed the ability to write to files. All data is saved in the "nosql" database.
It's not a direct answer to your question, but an example of how this problem has been dealt with in some circumstances.

having to run multiple instances of a web service for ruby/python seems like a hack to me

Is it just me or is having to run multiple instances of a web server to scale a hack?
Am I wrong in this?
Clarification
I am referring to how I read people run multiple instances of a web service on a single server. I am not talking about a cluster of servers.

Not really, people were running multiple frontends across a cluster of servers before multicore cpus became widespread
So there has been all the infrastructure for supporting sessions properly across multiple frontends for quite some time before it became really advantageous to run a bunch of threads on one machine.
Infact using asynchronous style frontends gives better performance on the same hardware than a multithreaded approach, so I would say that not running multiple instances in favour of a multithreaded monster is a hack

Since we are now moving towards more cores, rather than faster processors - in order to scale more and more, you will need to be running more instances.
So yes, I reckon you are wrong.
This does not by any means condone brain-dead programming with the excuse that you can just scale it horizontally, that just seems retarded.

With no details, it is very difficult to see what you are getting at. That being said, it is quite possible that you are simply not using the right approach for your problem.
Sometimes multiple separate instances are better. Sometimes, your Python services are actually better deployed behind a single Apache instance (using mod_wsgi) which may elect to use more than a single process. I don't know about Ruby to opinionate there.
In short, if you want to make your service scalable then the way to do so depends heavily on additional details. Is it scaling up or scaling out? What is the operating system and available or possibly installable server software? Is the service itself easily parallelized and how much is it database dependent? How is the database deployed?

Even if Ruby/Python interpreters were perfect, and could utilize all avail CPU with single process, you would still reach maximal capability of single server sooner or later and have to scale across several machines, going back to running several instances of your app.

I would hesitate to say that the issue is a "hack". Or indeed that threaded solutions are necessarily superior.
The situation is a result of design decisions used in the interpreters of languages like Ruby and Python.
I work with Ruby, so the details may be different for other languages.
BUT ... essentially, Ruby uses a Global Interpreter Lock to prevent threading issues:
http://en.wikipedia.org/wiki/Global_Interpreter_Lock
The side-effect of this is that to achieve concurrency with frameworks like Rails, rather than relying on multiple threads within the VM, we use multiple processes, each with its own interpreter and instance of your framework and application code
Each instance of the app handles a single request at a time. To achieve concurrency we have to spin up multiple instances.
In the olden days (2-3 years ago) we would run multiple mongrel (or similar) instances behind a proxy (generally apache). Passenger changed some of this because it is smart enough to manage the processes itself, rather than requiring manual setup. You tell Passenger how many processes it can use and off it goes.
The whole structure is actually not as bad as the thread-orthodoxy would have you believe. For a start, it's pretty easy to make this type of architecture work in a multicore environment. Any modern database is designed to handle highly concurrent loads, so having multiple processes has very little if any effect at that level.
If you use a language like JRuby you can deploy into a threaded app server like Tomcat and have a deployment that looks much more "java-like". However, this is not as big a win as you might think, because now your application needs to be much more thread-aware and you can see side effects and strangeness from threading issues.

Your assumption that Tomcat's and IIS's single process per server is superior is flawed. The choice of a multi-threaded server and a multi-process server depends on a lot of variables.
One main thing is the underlying operating system. Unix systems have always had great support for multi-processing because of the copy-on-write nature of the fork system call. This makes multi-processes a really attractive option because web-serving is usually very shared-nothing and you don't have to worry about locking. Windows on the other hand had much heavier processes and lighter threads so programs like IIS would gravitate to a multi-threading model.
As for the question to wether it's a hack to run multiple servers really depends on your perspective. If you look at Apache, it comes with a variety of pluggable engines to choose from. The MPM-prefork one is the default because it allows the programmer to easily use non-thread-safe C/Perl/database libraries without having to throw locks and semaphores all over the place. To some that might be a hack to work around poorly implemented libraries. To me it's a brilliant way of leaving it to the OS to handle the problems and letting me get back to work.
Also a multi-process model comes with a few features that would be very difficult to implement in a multi-threaded server. Because they are just processes, zero-downtime rolling-updates are trivial. You can do it with a bash script.
It also has it's short-comings. In a single-server model setting up a singleton that holds some global state is trivial, while on a multi-process model you have to serialize that state to a database or Redis server. (Of course if your single-process server outgrows a single server you'll have to do that anyway.)
Is it a hack? Yes and no. Both original implementations (MRI, and CPython) have Global Interpreter Locks that will prevent a multi-core server from operating at it's 100% potential. On the other hand multi-process has it's advantages (especially on the Unix-side of the fence).
There's also nothing inherent in the languages themselves that makes them require a GIL, so you can run your application with Jython, JRuby, IronPython or IronRuby if you really want to share state inside a single process.

Threading in Python [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
What are the modules used to write multi-threaded applications in Python? I'm aware of the basic concurrency mechanisms provided by the language and also of Stackless Python, but what are their respective strengths and weaknesses?

In order of increasing complexity:
Use the threading module
Pros:
It's really easy to run any function (any callable in fact) in its
own thread.
Sharing data is if not easy (locks are never easy :), at
least simple.
Cons:
As mentioned by Juergen Python threads cannot actually concurrently access state in the interpreter (there's one big lock, the infamous Global Interpreter Lock.) What that means in practice is that threads are useful for I/O bound tasks (networking, writing to disk, and so on), but not at all useful for doing concurrent computation.
Use the multiprocessing module
In the simple use case this looks exactly like using threading except each task is run in its own process not its own thread. (Almost literally: If you take Eli's example, and replace threading with multiprocessing, Thread, with Process, and Queue (the module) with multiprocessing.Queue, it should run just fine.)
Pros:
Actual concurrency for all tasks (no Global Interpreter Lock).
Scales to multiple processors, can even scale to multiple machines.
Cons:
Processes are slower than threads.
Data sharing between processes is trickier than with threads.
Memory is not implicitly shared. You either have to explicitly share it or you have to pickle variables and send them back and forth. This is safer, but harder. (If it matters increasingly the Python developers seem to be pushing people in this direction.)
Use an event model, such as Twisted
Pros:
You get extremely fine control over priority, over what executes when.
Cons:
Even with a good library, asynchronous programming is usually harder than threaded programming, hard both in terms of understanding what's supposed to happen and in terms of debugging what actually is happening.
In all cases I'm assuming you already understand many of the issues involved with multitasking, specifically the tricky issue of how to share data between tasks. If for some reason you don't know when and how to use locks and conditions you have to start with those. Multitasking code is full of subtleties and gotchas, and it's really best to have a good understanding of concepts before you start.

You've already gotten a fair variety of answers, from "fake threads" all the way to external frameworks, but I've seen nobody mention Queue.Queue -- the "secret sauce" of CPython threading.
To expand: as long as you don't need to overlap pure-Python CPU-heavy processing (in which case you need multiprocessing -- but it comes with its own Queue implementation, too, so you can with some needed cautions apply the general advice I'm giving;-), Python's built-in threading will do... but it will do it much better if you use it advisedly, e.g., as follows.
"Forget" shared memory, supposedly the main plus of threading vs multiprocessing -- it doesn't work well, it doesn't scale well, never has, never will. Use shared memory only for data structures that are set up once before you spawn sub-threads and never changed afterwards -- for everything else, make a single thread responsible for that resource, and communicate with that thread via Queue.
Devote a specialized thread to every resource you'd normally think to protect by locks: a mutable data structure or cohesive group thereof, a connection to an external process (a DB, an XMLRPC server, etc), an external file, etc, etc. Get a small thread pool going for general purpose tasks that don't have or need a dedicated resource of that kind -- don't spawn threads as and when needed, or the thread-switching overhead will overwhelm you.
Communication between two threads is always via Queue.Queue -- a form of message passing, the only sane foundation for multiprocessing (besides transactional-memory, which is promising but for which I know of no production-worthy implementations except In Haskell).
Each dedicated thread managing a single resource (or small cohesive set of resources) listens for requests on a specific Queue.Queue instance. Threads in a pool wait on a single shared Queue.Queue (Queue is solidly threadsafe and won't fail you in this).
Threads that just need to queue up a request on some queue (shared or dedicated) do so without waiting for results, and move on. Threads that eventually DO need a result or confirmation for a request queue a pair (request, receivingqueue) with an instance of Queue.Queue they just made, and eventually, when the response or confirmation is indispensable in order to proceed, they get (waiting) from their receivingqueue. Be sure you're ready to get error-responses as well as real responses or confirmations (Twisted's deferreds are great at organizing this kind of structured response, BTW!).
You can also use Queue to "park" instances of resources which can be used by any one thread but never be shared among multiple threads at one time (DB connections with some DBAPI compoents, cursors with others, etc) -- this lets you relax the dedicated-thread requirement in favor of more pooling (a pool thread that gets from the shared queue a request needing a queueable resource will get that resource from the apppropriate queue, waiting if necessary, etc etc).
Twisted is actually a good way to organize this minuet (or square dance as the case may be), not just thanks to deferreds but because of its sound, solid, highly scalable base architecture: you may arrange things to use threads or subprocesses only when truly warranted, while doing most things normally considered thread-worthy in a single event-driven thread.
But, I realize Twisted is not for everybody -- the "dedicate or pool resources, use Queue up the wazoo, never do anything needing a Lock or, Guido forbid, any synchronization procedure even more advanced, such as semaphore or condition" approach can still be used even if you just can't wrap your head around async event-driven methodologies, and will still deliver more reliability and performance than any other widely-applicable threading approach I've ever stumbled upon.

It depends on what you're trying to do, but I'm partial to just using the threading module in the standard library because it makes it really easy to take any function and just run it in a separate thread.
from threading import Thread
def f():
...
def g(arg1, arg2, arg3=None):
....
Thread(target=f).start()
Thread(target=g, args=[5, 6], kwargs={"arg3": 12}).start()
And so on. I often have a producer/consumer setup using a synchronized queue provided by the Queue module
from Queue import Queue
from threading import Thread
q = Queue()
def consumer():
while True:
print sum(q.get())
def producer(data_source):
for line in data_source:
q.put( map(int, line.split()) )
Thread(target=producer, args=[SOME_INPUT_FILE_OR_SOMETHING]).start()
for i in range(10):
Thread(target=consumer).start()

Kamaelia is a python framework for building applications with lots of communicating processes.
(source: kamaelia.org) Kamaelia - Concurrency made useful, fun
In Kamaelia you build systems from simple components that talk to each other. This speeds development, massively aids maintenance and also means you build naturally concurrent software. It's intended to be accessible by any developer, including novices. It also makes it fun :)
What sort of systems? Network servers, clients, desktop applications, pygame based games, transcode systems and pipelines, digital TV systems, spam eradicators, teaching tools, and a fair amount more :)
Here's a video from Pycon 2009. It starts by comparing Kamaelia to Twisted and Parallel Python and then gives a hands on demonstration of Kamaelia.
Easy Concurrency with Kamaelia - Part 1 (59:08)
Easy Concurrency with Kamaelia - Part 2 (18:15)

Regarding Kamaelia, the answer above doesn't really cover the benefit here. Kamaelia's approach provides a unified interface, which is pragmatic not perfect, for dealing with threads, generators & processes in a single system for concurrency.
Fundamentally it provides a metaphor of a running thing which has inboxes, and outboxes. You send messages to outboxes, and when wired together, messages flow from outboxes to inboxes. This metaphor/API remains the same whether you're using generators, threads or processes, or speaking to other systems.
The "not perfect" part is due to syntactic sugar not being added as yet for inboxes and outboxes (though this is under discussion) - there is a focus on safety/usability in the system.
Taking the producer consumer example using bare threading above, this becomes this in Kamaelia:
Pipeline(Producer(), Consumer() )
In this example it doesn't matter if these are threaded components or otherwise, the only difference is between them from a usage perspective is the baseclass for the component. Generator components communicate using lists, threaded components using Queue.Queues and process based using os.pipes.
The reason behind this approach though is to make it harder to make hard to debug bugs. In threading - or any shared memory concurrency you have, the number one problem you face is accidentally broken shared data updates. By using message passing you eliminate one class of bugs.
If you use bare threading and locks everywhere you're generally working on the assumption that when you write code that you won't make any mistakes. Whilst we all aspire to that, it's very rare that will happen. By wrapping up the locking behaviour in one place you simplify where things can go wrong. (Context handlers help, but don't help with accidental updates outside the context handler)
Obviously not every piece of code can be written as message passing and shared style which is why Kamaelia also has a simple software transactional memory (STM), which is a really neat idea with a nasty name - it's more like version control for variables - ie check out some variables, update them and commit back. If you get a clash you rinse and repeat.
Relevant links:
Europython 09 tutorial
Monthly releases
Mailing list
Examples
Example Apps
Reusable components (generator & thread)
Anyway, I hope that's a useful answer. FWIW, the core reason behind Kamaelia's setup is to make concurrency safer & easier to use in python systems, without the tail wagging the dog. (ie the big bucket of components
I can understand why the other Kamaelia answer was modded down, since even to me it looks more like an ad than an answer. As the author of Kamaelia it's nice to see enthusiasm though I hope this contains a bit more relevant content :-)
And that's my way of saying, please take the caveat that this answer is by definition biased, but for me, Kamaelia's aim is to try and wrap what is IMO best practice. I'd suggest trying a few systems out, and seeing which works for you. (also if this is inappropriate for stack overflow, sorry - I'm new to this forum :-)

I would use the Microthreads (Tasklets) of Stackless Python, if I had to use threads at all.
A whole online game (massivly multiplayer) is build around Stackless and its multithreading principle -- since the original is just to slow for the massivly multiplayer property of the game.
Threads in CPython are widely discouraged. One reason is the GIL -- a global interpreter lock -- that serializes threading for many parts of the execution. My experiance is, that it is really difficult to create fast applications this way. My example codings where all slower with threading -- with one core (but many waits for input should have made some performance boosts possible).
With CPython, rather use seperate processes if possible.

If you really want to get your hands dirty, you can try using generators to fake coroutines. It probably isn't the most efficient in terms of work involved, but coroutines do offer you very fine control of co-operative multitasking rather than pre-emptive multitasking you'll find elsewhere.
One advantage you'll find is that by and large, you will not need locks or mutexes when using co-operative multitasking, but the more important advantage for me was the nearly-zero switching speed between "threads". Of course, Stackless Python is said to be very good for that as well; and then there's Erlang, if it doesn't have to be Python.
Probably the biggest disadvantage in co-operative multitasking is the general lack of workaround for blocking I/O. And in the faked coroutines, you'll also encounter the issue that you can't switch "threads" from anything but the top level of the stack within a thread.
After you've made an even slightly complex application with fake coroutines, you'll really begin to appreciate the work that goes into process scheduling at the OS level.

Writing a socket-based server in Python, recommended strategies?

I was recently reading this document which lists a number of strategies that could be employed to implement a socket server. Namely, they are:
Serve many clients with each thread, and use nonblocking I/O and level-triggered readiness notification
Serve many clients with each thread, and use nonblocking I/O and readiness change notification
Serve many clients with each server thread, and use asynchronous I/O
serve one client with each server thread, and use blocking I/O
Build the server code into the kernel
Now, I would appreciate a hint on which should be used in CPython, which we know has some good points, and some bad points. I am mostly interested in performance under high concurrency, and yes a number of the current implementations are too slow.
So if I may start with the easy one, "5" is out, as I am not going to be hacking anything into the kernel.
"4" Also looks like it must be out because of the GIL. Of course, you could use multiprocessing in place of threads here, and that does give a significant boost. Blocking IO also has the advantage of being easier to understand.
And here my knowledge wanes a bit:
"1" is traditional select or poll which could be trivially combined with multiprocessing.
"2" is the readiness-change notification, used by the newer epoll and kqueue
"3" I am not sure there are any kernel implementations for this that have Python wrappers.
So, in Python we have a bag of great tools like Twisted. Perhaps they are a better approach, though I have benchmarked Twisted and found it too slow on a multiple processor machine. Perhaps having 4 twisteds with a load balancer might do it, I don't know. Any advice would be appreciated.

asyncore is basically "1" - It uses select internally, and you just have one thread handling all requests. According to the docs it can also use poll. (EDIT: Removed Twisted reference, I thought it used asyncore, but I was wrong).
"2" might be implemented with python-epoll (Just googled it - never seen it before).
EDIT: (from the comments) In python 2.6 the select module has epoll, kqueue and kevent build-in (on supported platforms). So you don't need any external libraries to do edge-triggered serving.
Don't rule out "4", as the GIL will be dropped when a thread is actually doing or waiting for IO-operations (most of the time probably). It doesn't make sense if you've got huge numbers of connections of course. If you've got lots of processing to do, then python may not make sense with any of these schemes.
For flexibility maybe look at Twisted?
In practice your problem boils down to how much processing you are going to do for requests. If you've got a lot of processing, and need to take advantage of multi-core parallel operation, then you'll probably need multiple processes. On the other hand if you just need to listen on lots of connections, then select or epoll, with a small number of threads should work.

How about "fork"? (I assume that is what the ForkingMixIn does) If the requests are handled in a "shared nothing" (other than DB or file system) architecture, fork() starts pretty quickly on most *nixes, and you don't have to worry about all the silly bugs and complications from threading.
Threads are a design illness forced on us by OSes with too-heavy-weight processes, IMHO. Cloning a page table with copy-on-write attributes seems a small price, especially if you are running an interpreter anyway.
Sorry I can't be more specific, but I'm more of a Perl-transitioning-to-Ruby programmer (when I'm not slaving over masses of Java at work)
Update: I finally did some timings on thread vs fork in my "spare time". Check it out:
http://roboprogs.com/devel/2009.04.html
Expanded:
http://roboprogs.com/devel/2009.12.html

One sollution is gevent. Gevent maries a libevent based event polling with lightweight cooperative task switching implemented by greenlet.
What you get is all the performance and scalability of an event system with the elegance and straightforward model of blocking IO programing.
(I don't know what the SO convention about answering to realy old questions is, but decided I'd still add my 2 cents)

Can I suggest additional links?
cogen is a crossplatform library for network oriented, coroutine based programming using the enhanced generators from python 2.5. On the main page of cogen project there're links to several projects with similar purpose.

I like Douglas' answer, but as an aside...
You could use a centralized dispatch thread/process that listens for readiness notifications using select and delegates to a pool of worker threads/processes to help accomplish your parallelism goals.
As Douglas mentioned, however, the GIL won't be held during most lengthy I/O operations (since no Python-API things are happening), so if it's response latency you're concerned about you can try moving the critical portions of your code to CPython API.

http://docs.python.org/library/socketserver.html#asynchronous-mixins
As for multi-processor (multi-core) machines. With CPython due to GIL you'll need at least one process per core, to scale. As you say that you need CPython, you might try to benchmark that with ForkingMixIn. With Linux 2.6 might give some interesting results.
Other way is to use Stackless Python. That's how EVE solved it. But I understand that it's not always possible.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.