I am trying to understand the with statement in Python. Everywhere I look, it is described in terms of opening and closing a file, and as a replacement for the try-finally block. Could someone post some other examples too? I am just trying out Flask and there are with statements galore in it. I would appreciate it if someone could provide some clarity on it.
There's a very nice explanation here. Basically, the with statement calls two special methods on the associated object: __enter__ and __exit__. The __enter__ method returns the value that is bound to the variable named after "as" in the with statement, while the __exit__ method is called when the block finishes (normally or through an exception) to handle any cleanup, such as closing a file.
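As a minimal sketch of that protocol (the Tracked class and its events list are made up for illustration, not taken from any library), the following records when each special method runs, including when the body raises:

```python
class Tracked:
    """Hypothetical resource that records what happens to it."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this value is bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        # Called on normal exit and when the body raises.
        self.events.append("exit")
        return False  # a false value lets any exception propagate


resource = Tracked()
try:
    with resource as r:
        r.events.append("body")
        raise ValueError("boom")
except ValueError:
    pass

print(resource.events)  # ['enter', 'body', 'exit']
```

Returning a true value from __exit__ would instead swallow the exception, which is rarely what you want.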
The idea of the with statement is to make "doing the right thing" the path of least resistance. While the file example is the simplest, threading locks actually provide a more classic example of non-obviously buggy code:
try:
    lock.acquire()
    # do stuff
finally:
    lock.release()
This code is broken - if the lock acquisition ever fails, either the wrong exception will be thrown (since the code will attempt to release a lock that it never acquired), or, worse, if this is a recursive lock, it will be released early. The correct code looks like this:
lock.acquire()
try:
    # do stuff
finally:
    # If lock.acquire() fails, this *doesn't* run
    lock.release()
By using a with statement, it becomes impossible to get this wrong, since it is built into the context manager:
with lock:  # The lock *knows* how to correctly handle acquisition and release
    # do stuff
The other place where the with statement helps greatly is similar to the major benefit of function and class decorators: it takes "two piece" code, which may be separated by an arbitrary number of lines of code (the function definition for decorators, the try block in the current case) and turns it into "one piece" code where the programmer simply declares up front what they're trying to do.
For short examples, this doesn't look like a big gain, but it actually makes a huge difference when reviewing code. When I see lock.acquire() in a piece of code, I need to scroll down and check for a corresponding lock.release(). When I see with lock:, though, no such check is needed - I can see immediately that the lock will be released correctly.
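To make the "released correctly" claim concrete, here is a small sketch (the update function and counter dict are hypothetical) showing that the lock is released even when the body of the with block raises:

```python
import threading

lock = threading.Lock()


def update(counter):
    with lock:
        counter["value"] += 1
        raise RuntimeError("failure inside the critical section")


counter = {"value": 0}
try:
    update(counter)
except RuntimeError:
    pass

# The with statement released the lock on the way out of the block.
print(lock.locked())  # False
```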
There are twelve examples of using with in PEP 343, including the file-open example:
A template for ensuring that a lock, acquired at the start of a block, is released when the block is left
A template for opening a file that ensures the file is closed when the block is left
A template for committing or rolling back a database transaction
Example 1 rewritten without a generator
Redirect stdout temporarily
A variant on opened() that also returns an error condition
Another useful example would be an operation that blocks signals
Another use for this feature is the Decimal context
Here's a simple context manager for the decimal module
A generic "object-closing" context manager
A released() context to temporarily release a previously acquired lock by swapping the acquire() and release() calls
A "nested" context manager that automatically nests the supplied contexts from left-to-right to avoid excessive indentation
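Two of these ideas have ready-made counterparts in today's standard library. The sketch below uses contextlib.redirect_stdout and decimal.localcontext rather than the hand-written versions in the PEP, and assumes the default decimal context has not been changed elsewhere:

```python
import contextlib
import decimal
import io

# Redirect stdout temporarily: print() writes into the buffer inside the block.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print("captured")
captured = buffer.getvalue()

# Decimal context: the reduced precision applies only inside the block.
with decimal.localcontext() as ctx:
    ctx.prec = 5
    low_precision = decimal.Decimal(1) / decimal.Decimal(7)
default_precision = decimal.Decimal(1) / decimal.Decimal(7)

print(repr(captured))   # 'captured\n'
print(low_precision)    # 0.14286
```

In both cases the previous state (the real stdout, the original precision) is restored when the block is left.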
I'm using a mutex to protect part of the code in the first function. Can I unlock the mutex in the second function?
For example:
import threading

mutex = threading.Lock()

def function1():
    mutex.acquire()
    # do something

def function2():
    # do something
    mutex.release()
    # do something
You certainly can do what you're asking, locking the mutex in one function and unlocking it in another one. But you probably shouldn't. It's bad design. If the code that uses those functions calls them in the wrong order, the mutex may be locked and never unlocked, or be unlocked when it isn't locked (or even worse, when it's locked by a different thread). If you can only ever call the functions in exactly one order, why are they even separate functions?
A better idea may be to move the lock-handling code out of the functions and make the caller responsible for locking and unlocking. Then you can use a with statement that ensures the lock and unlock are exactly paired up, even in the face of exceptions or other unexpected behavior.
with mutex:
    function1()
    function2()
Or if not all parts of the two functions are "hot" and need the lock held to ensure they run correctly, you might consider factoring out the parts that need the lock into a third function that runs in between the other two:
function1_cold_parts()
with mutex:
    hot_parts()
function2_cold_parts()
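As a rough, runnable illustration of both points (the function bodies and the log list are placeholders), the caller-holds-the-lock version keeps the pairing exact, while a stray release() on an unlocked threading.Lock raises RuntimeError:

```python
import threading

mutex = threading.Lock()
log = []


def function1():
    # Placeholder work; records whether the lock is held while it runs.
    log.append(("function1", mutex.locked()))


def function2():
    log.append(("function2", mutex.locked()))


# The caller owns the lock, so acquire and release are always paired.
with mutex:
    function1()
    function2()

# Calling release() without a matching acquire(), as would happen if an
# unlocking function ran before the locking one, fails loudly.
try:
    mutex.release()
    misuse_detected = False
except RuntimeError:
    misuse_detected = True

print(log, misuse_detected)
```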
I am a bit of a Python newbie, but I am implementing a benchmarking tool in Python that will, for example, create several sets of resources which depend on each other. When the program goes out of scope, I want to clean up the resources in the correct order.
I'm from a C++ background; in C++ I know I can do this with RAII (constructors, destructors).
What is an equivalent pattern in Python for this problem? Is there a way to do RAII in Python, or is there a better way to solve this problem?
You are probably looking for a context manager, which is an object that can be used in a with statement:
with context() as c:
    do_something(c)
When the with statement is entered, the expression (in this case, context()) will be evaluated, and should return a context manager. __enter__() will be called on the context manager, and the result (which may or may not be the same object as the context manager) is assigned to the variable specified with as. No matter how control is exiting the with body, __exit__() will be called on the context manager, with arguments that specify whether an exception was thrown or not.
As an example: the builtin open() should be used in this way in order to close the opened file after interacting with it.
A new context manager type can easily be defined with contextlib.
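For the benchmarking case in the question, one possible sketch combines contextlib.contextmanager with contextlib.ExitStack, which tears resources down in the reverse of the order they were set up. The resource names and the events list are invented for illustration:

```python
from contextlib import ExitStack, contextmanager

events = []


@contextmanager
def resource(name):
    """Hypothetical resource: set up before the yield, torn down after it."""
    events.append(f"create {name}")
    try:
        yield name
    finally:
        events.append(f"destroy {name}")


with ExitStack() as stack:
    network = stack.enter_context(resource("network"))
    server = stack.enter_context(resource("server"))  # depends on network
    events.append(f"benchmark using {server} on {network}")

# Cleanup ran in reverse order: server first, then network.
for event in events:
    print(event)
```

This is the closest analogue to nested C++ scopes: each resource is destroyed before the ones it depends on, even if the benchmark raises.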
For a more one-off solution, you can use try/finally: the finally block is executed after the try block, no matter how control exits the try block:
try:
    do_something()
finally:
    cleanup()
I am hoping for some clarification on the best way to deal with handling "first" deferreds, i.e. not just adding callbacks and errbacks to existing Twisted methods that return a deferred, but the best way of creating those original deferreds.
As a concrete example, here are two variations of the same method: it just counts the number of lines in some rather big text files, and is used as the starting point for a chain of deferreds.
Method 1:
This one does not feel so good, as the deferred is fired directly by the reactor.callLater method.
def get_line_count(self):
    deferred = defer.Deferred()

    def count_lines(result):
        try:
            print_file = open(self.print_file_path, "r")
            self.line_count = sum(1 for line in print_file)
            print_file.close()
            return self.line_count
        except Exception as inst:
            raise InvalidFile()

    deferred.addCallback(count_lines)
    reactor.callLater(1, deferred.callback, None)
    return deferred
Method 2:
Slightly better, as the deferred is actually fired when the result is available.
def get_line_count(self):
    deferred = defer.Deferred()

    def count_lines():
        try:
            print_file = open(self.print_file_path, "r")
            self.line_count = sum(1 for line in print_file)
            print_file.close()
            deferred.callback(self.line_count)
        except Exception as inst:
            deferred.errback(InvalidFile())

    reactor.callLater(1, count_lines)
    return deferred
Note: You could also point out that both of these are actually synchronous, and potentially blocking, methods (and perhaps I could use maybeDeferred?).
But that is actually one of the aspects I get confused by.
For Method 2, if the count_lines method is very slow (counting the lines in some huge files, etc.), will it potentially "block" the whole Twisted app?
I read quite a lot of documentation on how callbacks, errbacks and the reactor behave together (callbacks need to be executed quickly, or return deferreds themselves, etc.), but in this case I just don't see it, and would really appreciate some pointers, examples, etc.
Are there some articles or clear explanations that deal with the best approach to creating these "first" deferreds? I have read through these excellent articles, and they have helped a lot with some of the basic understanding, but I still feel like I am missing a piece.
For blocking code, would this be a typical case for deferToThread or reactor.spawnProcess?
I read through a lot of questions like this one and this article, but I am still not 100% sure how to deal with potentially blocking code, mostly when dealing with file I/O.
Sorry if any of this seems too basic, but I really want to get the hang of using Twisted more thoroughly. (It has been a really powerful tool for all the more network-oriented aspects.)
Thank you for your time!
Yes, you've got it right: you need threads or separate processes to avoid blocking the Twisted event loop. Using Deferreds won't magically make your code non-blocking. For your questions:
Yes, you would block the event loop if count_lines is very slow. Deferring it to a thread would solve this.
I used Twisted's documentation to learn how Deferreds work, but I guess you've already been through that. The article on database support was informative since it clearly says that this library is built using threads. This is how you bridge the synchronous–asynchronous gap.
If the call is truly blocking, then you need deferToThread. Python itself is kind-of single threaded, meaning that only one thread can execute Python byte code at a time. However, if the thread you create will block on I/O anyway, then this model works fine: the thread will release the global interpreter lock and so let other Python threads run, including the main thread with the Twisted event loop.
It can also be the case that you can use non-blocking I/O in your code. This can be done with the select module, for example. In that case, you don't need a separate thread. Twisted uses this technique internally and you don't have to think of this if you do normal network I/O. But if you're doing something exotic, then it's good to know how things are built so that you can do the same.
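The thread-based approach described above might look roughly like the sketch below. It reuses the names count_lines and get_line_count from the question but takes the file path as a plain argument, which is an assumption made to keep the example short; deferToThread is the function from twisted.internet.threads.

```python
def count_lines(path):
    """Blocking helper; meant to run in a worker thread, not the reactor thread."""
    with open(path, "r") as print_file:
        return sum(1 for _ in print_file)


def get_line_count(path):
    """Return a Deferred that fires with the number of lines in path."""
    # Imported here so count_lines() stays usable without Twisted installed.
    from twisted.internet.threads import deferToThread

    # The reactor keeps running while a pool thread does the file I/O.
    return deferToThread(count_lines, path)
```

A caller chains onto the result as usual, for example get_line_count(path).addCallback(handle_count).addErrback(handle_error), where the two handlers are placeholders. An exception raised inside count_lines arrives at the errback as a Failure, so there is no need to create or fire the Deferred by hand.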
I hope that makes things a bit clearer!
I have a bunch of different methods that are not supposed to run concurrently, so I use a single lock to synchronize them. Looks something like this:
selected_method = choose_method()
with lock:
    selected_method()
In some of these methods, I sometimes call a helper function that does some slow network IO. (Let's call that one network_method()). I would like to release the lock while this function is running, to allow other threads to continue their processing.
One way to achieve this would be by calling lock.release() and lock.acquire() before and after calling the network method. However, I would prefer to keep the methods oblivious to the lock, since there are many of them and they change all the time.
I would much prefer to rewrite network_method() so that it checks to see whether the lock is held, and if so release it before starting and acquire it again at the end.
Note that network_method() sometimes gets called from other places, so it shouldn't release the lock if it's not on the thread that holds it.
I tried using the locked() method on the Lock object, but that method only tells me whether the lock is held, not if it is held by the current thread.
By the way, lock is a global object and I'm fine with that.
> I would much prefer to rewrite network_method() so that it checks to see whether the lock is held, and if so release it before starting and acquire it again at the end.
> Note that network_method() sometimes gets called from other places, so it shouldn't release the lock if it's not on the thread that holds it.
This just sounds like entirely the wrong thing to do :(
For a start, it's bad to have a function that sometimes has some other magical side-effect depending on where you call it from. That's the sort of thing that is a nightmare to debug.
Secondly, a lock should have clear acquire and release semantics. If I look at code that says "lock(); do_something(); unlock();" then I expect it to be locked for the duration of do_something(). In fact, it is also telling me that do_something() requires a lock. If I find out that someone has written a particular do_something() which actually unlocks the lock that I just saw to be locked, I will either (a) fire them or (b) hunt them down with weapons, depending on whether I am in a position of seniority relative to them or not.
> By the way, lock is a global object and I'm fine with that.
Incidentally, this is also why globals are bad. If I modify a value, call a function, and then modify a value again, I don't want that function in the middle being able to reach back out and modify this value in an unpredictable way.
My suggestion to you is this: your lock is in the wrong place, or doing the wrong thing, or both. You say these methods aren't supposed to run concurrently, but you actually want some of them to run concurrently. The fact that one of them is "slow" can't possibly make it acceptable to remove the lock - either you need the mutual exclusion during this type of operation for it to be correct, or you do not. If the slower operation is indeed inherently safe when the others are not, then maybe it doesn't need the lock - but that implies the lock should go inside each of the faster operations, not outside them. But all of this is dependent on what exactly the lock is for.
Why not just do this?
with lock:
    before_network()
do_network_stuff()
with lock:
    after_network()
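A runnable sketch of that structure, with time.sleep standing in for the slow network call and a shared dict standing in for whatever state the lock protects (both are placeholders), could look like this:

```python
import threading
import time

lock = threading.Lock()
shared = {"requests": 0, "responses": 0}


def do_network_stuff():
    # Stand-in for slow network I/O; runs without the lock held.
    time.sleep(0.05)
    return "payload"


def selected_method():
    with lock:
        shared["requests"] += 1   # the before_network() part
    payload = do_network_stuff()
    with lock:
        shared["responses"] += 1  # the after_network() part
    return payload


workers = [threading.Thread(target=selected_method) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(shared)  # {'requests': 4, 'responses': 4}
```

The lock is visibly held only around the shared-state updates, so other threads can proceed while one of them waits on the network.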
I need to dynamically load code (comes as source), run it and get the results. The code that I load always includes a run method, which returns the needed results. Everything looks ridiculously easy, as usual in Python, since I can do
exec(source)  # source includes run() definition
result = run(params)
# do stuff with result
The only problem is, the run() method in the dynamically generated code can potentially not terminate, so I need to only run it for up to x seconds. I could spawn a new thread for this, and specify a timeout for the .join() method, but then I cannot easily get the result out of it (or can I?). Performance is also an issue to consider, since all of this is happening in a long while loop.
Any suggestions on how to proceed?
Edit: to clear things up per dcrosta's request: the loaded code is not untrusted, but generated automatically on the machine. The purpose for this is genetic programming.
The only "really good" solutions -- imposing essentially no overhead -- are going to be based on SIGALRM, either directly or through a nice abstraction layer; but as already remarked Windows does not support this. Threads are no use, not because it's hard to get results out (that would be trivial, with a Queue!), but because forcibly terminating a runaway thread in a nice cross-platform way is unfeasible.
This leaves high-overhead multiprocessing as the only viable cross-platform solution. You'll want a process pool to reduce process-spawning overhead (since presumably the need to kill a runaway function is only occasional, most of the time you'll be able to reuse an existing process by sending it new functions to execute). Again, Queue (the multiprocessing kind) makes getting results back easy (albeit with a modicum more caution than for the threading case, since in the multiprocessing case deadlocks are possible).
If you don't need to strictly serialize the executions of your functions, but rather can arrange your architecture to try two or more of them in parallel, AND are running on a multi-core machine (or multiple machines on a fast LAN), then suddenly multiprocessing becomes a high-performance solution, easily paying back for the spawning and IPC overhead and more, exactly because you can exploit as many processors (or nodes in a cluster) as you can use.
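For the SIGALRM route mentioned at the start of this answer, a minimal Unix-only sketch might look like the following. The names time_limit and TimeLimitExceeded are made up here, and the code must run in the main thread because that is where Python delivers signals:

```python
import signal
from contextlib import contextmanager


class TimeLimitExceeded(Exception):
    """Hypothetical exception raised when the alarm fires."""


@contextmanager
def time_limit(seconds):
    """Raise TimeLimitExceeded in the main thread after the given seconds."""

    def on_alarm(signum, frame):
        raise TimeLimitExceeded()

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)     # cancel a pending alarm
        signal.signal(signal.SIGALRM, previous)     # restore the old handler
```

Usage would be along the lines of "with time_limit(2.0): result = run(params)", catching TimeLimitExceeded around the block. The exception can only interrupt Python bytecode, so a call stuck inside a long-running C extension may not be cut short promptly.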
You could use the multiprocessing library to run the code in a separate process, and call .join() on the process to wait for it to finish, with the timeout parameter set to whatever you want. The library provides several ways of getting data back from another process - using a Value object (seen in the Shared Memory example on that page) is probably sufficient. You can use the terminate() call on the process if you really need to, though it's not recommended.
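Since the code in the question arrives as source text, one simple variant of the separate-process idea is to hand that text to a child interpreter. Note that this sketch swaps in the subprocess module instead of multiprocessing.Process with a Value; run_with_timeout and the small runner suffix are hypothetical, and it assumes params and the result are JSON-serializable:

```python
import json
import subprocess
import sys

# Hypothetical suffix appended to the generated source: it calls run()
# and prints the JSON-encoded result as the last line on stdout.
_RUNNER = """
import json, sys
print(json.dumps(run(json.loads(sys.argv[1]))))
"""


def run_with_timeout(source, params, timeout):
    """Execute source's run(params) in a child interpreter.

    Raises subprocess.TimeoutExpired if the child does not finish in time;
    subprocess.run() kills the child in that case.
    """
    completed = subprocess.run(
        [sys.executable, "-c", source + "\n" + _RUNNER, json.dumps(params)],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # a crash in the generated code raises CalledProcessError
    )
    return json.loads(completed.stdout.splitlines()[-1])


print(run_with_timeout("def run(p):\n    return p * 2\n", 21, timeout=10))  # 42
```

Starting an interpreter per call is expensive, so in a long loop this fits best when runaway code is common enough that the process would have to be killed anyway.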
You could also use Stackless Python, as it allows for cooperative scheduling of microthreads. Here you can specify a maximum number of instructions to execute before returning. Setting up the routines and getting the return value out is a little more tricky though.
> I could spawn a new thread for this, and specify a time for .join() method, but then I cannot easily get the result out of it
If the timeout expires, that means the method didn't finish, so there's no result to get. If you have incremental results, you can store them somewhere and read them out however you like (keeping threadsafety in mind).
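Getting the result out of a thread can be sketched with a queue.Queue; call_with_timeout is a made-up helper name. As noted above, a timeout only means the caller stops waiting: the thread itself keeps running in the background.

```python
import queue
import threading
import time


def call_with_timeout(func, args, timeout):
    """Run func(*args) in a thread; return (finished, result)."""
    results = queue.Queue(maxsize=1)

    def worker():
        # If func raises, nothing is put on the queue and the caller times out.
        results.put(func(*args))

    # daemon=True so a runaway worker does not keep the interpreter alive.
    threading.Thread(target=worker, daemon=True).start()
    try:
        return True, results.get(timeout=timeout)
    except queue.Empty:
        return False, None


print(call_with_timeout(pow, (2, 10), timeout=5))          # (True, 1024)
print(call_with_timeout(time.sleep, (1,), timeout=0.05))   # (False, None)
```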
Using SIGALRM-based systems is dicey, because it can deliver async signals at any time, even during an except or finally handler where you're not expecting one. (Other languages deal with this better, unfortunately.) For example:
try:
    # code
finally:
    cleanup1()
    cleanup2()
    cleanup3()
A signal passed up via SIGALRM might happen during cleanup2(), which would cause cleanup3() to never be executed. Python simply does not have a way to terminate a running thread in a way that's both uncooperative and safe.
You should just have the code check the timeout on its own.
import threading
from datetime import datetime, timedelta

local = threading.local()

class ExecutionTimeout(Exception):
    pass

def start(max_duration=timedelta(seconds=1)):
    local.start_time = datetime.now()
    local.max_duration = max_duration

def check():
    if datetime.now() - local.start_time > local.max_duration:
        raise ExecutionTimeout()

def do_work():
    start()
    while True:
        check()
        # do stuff here
    return 10

try:
    print(do_work())
except ExecutionTimeout:
    print("Timed out")
(Of course, this belongs in a module, so the code would actually look like "timeout.start()"; "timeout.check()".)
If you're generating code dynamically, then generate a timeout.check() call at the start of each loop.
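If the generator itself is hard to change, one way to get the same effect is to insert the calls after the fact with the ast module. This is only a sketch: InsertChecks, instrument and make_check are invented names, and it assumes the generated code does not define its own check.

```python
import ast
import time


class ExecutionTimeout(Exception):
    pass


class InsertChecks(ast.NodeTransformer):
    """Prepend a check() call to the body of every for and while loop."""

    def _instrument(self, node):
        self.generic_visit(node)  # handle nested loops first
        call = ast.Expr(
            value=ast.Call(
                func=ast.Name(id="check", ctx=ast.Load()), args=[], keywords=[]
            )
        )
        node.body.insert(0, call)
        return node

    visit_For = _instrument
    visit_While = _instrument


def instrument(source):
    """Compile source with check() calls inserted into its loops."""
    tree = InsertChecks().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return compile(tree, "<generated>", "exec")


def make_check(max_seconds):
    """Build a check() that raises once max_seconds have elapsed."""
    deadline = time.monotonic() + max_seconds

    def check():
        if time.monotonic() > deadline:
            raise ExecutionTimeout()

    return check


generated = "def run(n):\n    while True:\n        n += 1\n"
namespace = {"check": make_check(0.2)}
exec(instrument(generated), namespace)

try:
    namespace["run"](0)
    timed_out = False
except ExecutionTimeout:
    timed_out = True

print(timed_out)  # True
```

Like the hand-written version, this only helps with loops written in Python; a single long-running builtin call is not interrupted.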
Consider using the stopit package, which can be useful in some cases where you need timeout control. Its documentation emphasizes the limitations.
https://pypi.python.org/pypi/stopit
A quick Google search for "python timeout" reveals a TimeoutFunction class.
Executing untrusted code is dangerous, and should usually be avoided unless it's impossible to do so. I think you're right to be worried about the time of the run() method, but the run() method could do other things as well: delete all your files, open sockets and make network connections, begin cracking your password and email the result back to an attacker, etc.
Perhaps if you can give some more detail on what the dynamically loaded code does, the SO community can help suggest alternatives.