Let's say that I am writing code where I need a fail-safe section that - in case of a failure - will save me from wasting money. For example, code that buys virtual servers via AWS API and since those are paid per hour, it's preferable to shut them down as soon as they stop being useful.
The problem is, I don't know how many such instances I will use; I create them on the fly and add them to a list or something similar. The "destructor" of each instance might raise an unexpected exception, so I'm afraid that any code built around a finally clause would get ugly. I also can't see how I would use with, because I'm introducing objects on the fly. What other fail-safe solutions could I use with Python?
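For context, the closest thing I've found so far is contextlib.ExitStack, which at least copes with objects registered on the fly: you register a cleanup callback for each server as you create it, and the stack runs every callback on exit, even when some of them raise. A rough sketch, with terminate_server() standing in for the real AWS shutdown call:

import contextlib

def terminate_server(server_id):
    # hypothetical placeholder for the real AWS termination call (e.g. via boto3)
    print("terminating", server_id)

with contextlib.ExitStack() as stack:
    servers = []
    for i in range(3):  # instances created on the fly, count unknown up front
        server_id = "i-%04d" % i  # hypothetical id returned by the create call
        stack.callback(terminate_server, server_id)
        servers.append(server_id)
    # ... use the servers; on leaving the block every registered callback runs,
    # and an exception raised by one callback does not stop the remaining ones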
Recently I had a problem when a coworker changed the return signature of a function. We have clients that call the function like this:
example = function()
But as I was depending on his changes, he unintentionally changed it to this:
example, other_stuff = function()
I was not aware of this change; I did the merge and everything seemed fine, but then the error happened because I was expecting one value and the call was now trying to unpack two.
So my question is: knowing Python is not a statically typed language, is there a way (a tool or something) to detect this and prevent this behavior? Sadly, I only noticed when a runtime error was raised. How should we handle this?
Sounds like a process error. An API shouldn't change its signature without considering its users. Within a project it's easy: just search. Externally, the API version number should be bumped and the change should be in the change notes.
The API should have unit tests which include return value tests. "he unintentionally changed" issues should all be caught there. Since this didn't happen, a bug report against the tests should be written.
Of course, the coworker could just change those tests. But all of that should be in a code reviewed change set in your source repository. The coworker should have to justify the change and how to mitigate breakage. Since this API appears to have external clients, it should be very difficult to get an API signature change as all clients will need to be notified.
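As a rough illustration of such a return-value test (module and function names are made up), a test that pins down the old contract would have failed the moment the return shape changed:

import unittest

from mylib import function  # hypothetical module exposing the changed function

class ReturnContractTest(unittest.TestCase):
    def test_returns_a_single_value(self):
        result = function()
        # the documented contract: one value, not a tuple to unpack
        self.assertNotIsInstance(result, tuple)

if __name__ == "__main__":
    unittest.main()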
I'm trying to achieve strong consistency with ndb using python.
And it looks like I'm missing something, as my reads behave like they're not strongly consistent.
The query is:
links = Link.query(ancestor=lead_key).filter(
    Link.last_status == None).fetch(keys_only=True)
if links:
    do_action()
The key structure is:
Lead root (generic key) -> Lead -> Website (one per lead) -> Link
I have many tasks that are executed concurrently using TaskQueue, and this query is performed at the end of every task. Sometimes I get a "too much contention" exception when updating the last_status field, but I deal with it using retries. Can that break strong consistency?
The expected behavior is having do_action() called when there are no links left with last_status equal to None. The actual behavior is inconsistent: sometimes do_action() is called twice and sometimes not called at all.
Using an ancestor key to get strong consistency has a limitation: you're limited to one update per second per entity group. One way to work around this is to shard the entity groups. Sharding Counters describes the technique. It's an old article, but as far as I know, the advice is still sound.
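Roughly, the technique from that article looks like the sketch below (model name and shard count are assumptions you'd adapt to your Link entities); the point is that each shard is its own entity group, so the 1 write/sec limit applies per shard rather than to one big group:

import random
from google.appengine.ext import ndb

NUM_SHARDS = 20  # assumption: tune to your expected write rate

class CounterShard(ndb.Model):
    # hypothetical shard entity; each shard sits in its own entity group
    count = ndb.IntegerProperty(default=0)

@ndb.transactional(retries=3)
def increment(name):
    shard_key = ndb.Key(CounterShard, "%s-%d" % (name, random.randint(0, NUM_SHARDS - 1)))
    shard = shard_key.get() or CounterShard(key=shard_key)
    shard.count += 1
    shard.put()

def total(name):
    keys = [ndb.Key(CounterShard, "%s-%d" % (name, i)) for i in range(NUM_SHARDS)]
    return sum(s.count for s in ndb.get_multi(keys) if s)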
Adding to Dave's answer, which is the first thing to check:
One thing which isn't well documented and can be a bit surprising is that the contention can be caused by read operations as well, not only by the write ones.
Whenever a transaction starts the entity groups being accessed (by read or write ops, doesn't matter) are marked as such. The too much contention error indicates that too many parallel transactions simultaneously try to access the same entity group. It can happen even if none of the transactions actually attempts to write!
Note: this contention is NOT emulated by the development server, it can only be seen when deployed on GAE, with the real datastore!
What can add to the confusion is the automatic re-tries of the transactions, which can happen after both actual write conflicts or just plain access contention. These retries may appear to the end-user as suspicious repeated execution of some code paths - which I suspect could explain your reports of do_action() being called twice.
Usually when you run into such problems you have to revisit your data structures and/or the way you're accessing them (your transactions). In addition to solutions that maintain strong consistency (which can be quite expensive), you may want to re-check whether consistency is actually a must. In some cases it's added as a blanket requirement just because it appears to simplify things. From my experience it doesn't :)
There is nothing in your sample that ensures that your code is only called once.
For the moment, I am going to assume that your "do_action" function does something to the Link entities, specifically that it sets the "last_status" property.
If you do not perform the query and the write to the Link Entity inside a transaction, then it is possible for two different requests (task queue tasks) to get results back from the query, then both write their new value to the Link entity (with the last write overwriting the previous value).
Remember that even if you do use a transaction, you don't know until the transaction is successfully completed that nobody else tried to perform a write. This is important if you are trying to do something external to datastore (for example, making a http request to an external system), as you may see http requests from transactions that would eventually fail with a concurrent modification exception.
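A hedged sketch of that pattern with legacy ndb (the helper name and status value are made up). Note that a query inside a datastore transaction sees a snapshot from the start of the transaction, so it will still list the link you just wrote to; that is why it's excluded explicitly:

from google.appengine.ext import ndb

@ndb.transactional(retries=3)
def finish_link(lead_key, link_key):
    # update this task's Link and check for remaining work in one transaction
    link = link_key.get()
    link.last_status = 'done'  # hypothetical status value
    link.put()
    pending = Link.query(ancestor=lead_key).filter(
        Link.last_status == None).fetch(keys_only=True)
    # the snapshot still contains the link updated above, so ignore it
    return all(key == link_key for key in pending)

# at the end of each task:
# if finish_link(lead_key, link_key):
#     do_action()  # only the transaction that commits the final update
#                  # sees no other pending links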
I've read quite a few answers on here about testing helper methods (not necessarily private) in my unit tests and I'm still not quite sure what the best approach should be for my current situation.
I currently have a block of logic that runs as a scheduled job. It does a number of mostly related things like update local repositories, convert file types, commit these to other repos, clean up old repos, etc. I need all of this code to run in a specific order, so rather than setting a bunch of scheduled jobs, I took a lot of these small methods and put them into one large method that would enforce the order in which the code is run:
def mainJob():
    sync_repos()
    convert_files()
    commit_changes()
and so on. Now I'm not sure how to write my tests for this thing. It's frustrating to test the entire mainJob() function because it does so many things and is really more of a reliability feature anyway. I see a lot of people saying I should only test the public interface, but I worry that there will potentially be code that isn't directly verified.
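For what it's worth, the closest I've come is to only verify that mainJob() calls its steps in the right order, with the steps mocked out (the jobs module name is a placeholder for wherever mainJob actually lives); I'm not sure whether this is the right approach:

import unittest
from unittest import mock

import jobs  # hypothetical module containing mainJob and its helpers

class MainJobOrderTest(unittest.TestCase):
    def test_steps_run_in_order(self):
        manager = mock.Mock()
        with mock.patch.object(jobs, 'sync_repos', manager.sync_repos), \
             mock.patch.object(jobs, 'convert_files', manager.convert_files), \
             mock.patch.object(jobs, 'commit_changes', manager.commit_changes):
            jobs.mainJob()
        # attaching all the mocks to one parent records the overall call order
        self.assertEqual(manager.mock_calls, [
            mock.call.sync_repos(),
            mock.call.convert_files(),
            mock.call.commit_changes(),
        ])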
E.g. if I am trying to open a file, can I not simply check os.path.exists(myfile) instead of using try/except? I think the answer to why I should not rely on os.path.exists(myfile) is that there may be a number of other reasons why the file may not open.
Is that the logic behind why error handling using try/except should be used?
Is there a general guideline on when to use exceptions in Python?
Race conditions.
In the time between checking whether a file exists and doing an operation that file might have been deleted, edited, renamed, etc...
On top of that, an exception gives you an OS error code, which tells you more precisely why the operation failed.
Finally, it's considered Pythonic to ask for forgiveness, rather than ask for permission.
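A minimal sketch of that style for the file case (the path is a placeholder):

try:
    with open("myfile.txt") as f:  # placeholder path
        data = f.read()
except FileNotFoundError:
    # no check-then-open race: the failure is reported at the moment of the attempt
    data = None
except PermissionError as exc:
    # exc.errno tells you *why* it failed, which os.path.exists() never could
    raise RuntimeError("cannot read myfile.txt (errno %d)" % exc.errno)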
Generally you use try/except when you handle things that are outside of the parameters that you can influence.
Within your script you can check variables for type, lists for length, etc., and you can be sure the result is sufficient, since you are the only one handling these objects. As soon as you handle files in the file system or connect to remote hosts, however, you can neither influence nor check all the parameters any more, and you cannot be sure that the result of the check stays valid.
As you said,
the file might exist but you don't have access rights
you might be able to ping a host address but a connection is declined
There are too many factors that could go wrong to check them all separately, and even if you do, they might still change before you actually perform your command.
With try/except you can generally catch every exception and handle the most important errors individually. You make sure the error is handled even if a check succeeds at first but things fail after you start running your commands.
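For the connection example that could look roughly like this, handling the important cases individually and still catching anything unexpected last (host and port are made up):

import socket

try:
    conn = socket.create_connection(("example.com", 443), timeout=5)
except ConnectionRefusedError:
    print("host is reachable but the connection was declined")
except socket.timeout:
    print("host did not answer in time")
except OSError as exc:
    # any other OS-level failure (DNS, network unreachable, ...)
    print("connection failed:", exc)
else:
    conn.close()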
I've been racking my brain on this for the last few weeks and I just can't seem to understand it. I'm hoping you folks here can give me some clarity.
A LITTLE BACKGROUND
I've built an API to help serve a large website and, like all of us, I'm trying to keep the API as efficient as possible. Part of this efficiency is NOT creating an object that contains custom business logic (for example, a service class) over and over again as requests are made. To give some personal background, I come from the Java world, so I'm used to using an IoC container or DI to handle object creation and injection into my classes, ensuring classes are NOT created over and over on a per-request basis.
WHAT I'VE READ
While looking at many Python IoC and DI posts I've become rather confused about how best to create a given class without having to worry about the server getting overloaded with too many objects based on the number of requests it may be handling.
Some people say an IoC container or DI really isn't needed. But as I run my Django app, I find that unless I construct the object I want globally (at the top of the file) for views.py to use later, rather than within each view class or def inside views.py, I run the risk of creating multiple objects of the same type, which from what I understand would cause memory bloat on the server.
So what's the right, Pythonic way to keep objects from being built over and over? Should I invest in using an IoC container / DI or not? Can I safely rely on setting up my service.py files to just contain defs instead of classes that contain defs? Is the garbage collector just THAT efficient, so I don't even have to worry about it?
I've purposely not placed any code in this post since this seems like a general question, but I can provide a few code examples if that helps.
Thanks
From a confused engineer that wants to be as pythonic as possible
You come from a background where everything needs to be a class. I've programmed web apps in Java too, and sometimes it's harder to unlearn old things than to learn new things; I understand.
In Python / Django you wouldn't make anything a class unless you need many instances and need to keep state.
For a service that's hardly the case, and you'll notice that in Java-like web apps some services are made singletons, which is just a workaround and a rather big anti-pattern in Python.
Pythonic
Python is flexible enough that a "service class" isn't required; you'd just have a Python module (e.g. services.py) with a number of functions, the emphasis being on functions that take something in and return something, in a completely stateless fashion.
# services.py
# This is a module: it doesn't keep any state within.
# It may read and write to the DB, do some processing etc., but it doesn't remember things.
from myapp.models import Score  # assumption: Score is a Django model in this app

def get_scores(student_id):
    return Score.objects.filter(student=student_id)

# views.py
# receives HTTP requests
from myapp import services  # assumption: the same app's services module

def view_scores(request, student_id):
    scores = services.get_scores(student_id)
    # e.g. use the scores queryset in a template and return an HTML page
Notice how if you need to swap out the service, you'll just be swapping out a single Python module (just a file really), so Pythonistas hardly bother with explicit interfaces and other abstractions.
Memory
Now, per Django worker process, you'd have that one services module, used over and over for all the requests that come in; when the Score queryset has been used and is no longer referenced in memory, it'll be cleaned up.
I saw your other post, and instantiating a ScoreService object for each request, or keeping an instance of it in global scope, is just unnecessary; the above example does the job with one module in memory and doesn't need us to be smart about it.
And if you did need to keep state in between several requests, keeping it in live instances of ScoreService would be a bad idea anyway, because now every user might need their own instance; that's not viable (too many live objects keeping context). Not to mention that such an instance is only accessible from the same process unless you have some sharing mechanism in place.
Keep state in a datastore
In case you want to keep state between requests, you'd keep that state in a datastore. When a request comes in, you hit the services module again to get the context back from the datastore, pick up where you left off and do your business, and return your HTTP response; then unused things get garbage collected.
The emphasis being on keeping things stateless, where any given HTTP request can be processed on any given django process, and all state objects are garbage collected after the response is returned and objects go out of scope.
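With Django that could be as simple as the cache framework; a sketch with made-up key names and fields:

# services.py
from django.core.cache import cache

def save_progress(student_id, progress):
    # hypothetical example: park per-user state in the shared cache backend
    # (memcached/redis/db) instead of in a long-lived service object
    cache.set("progress:%s" % student_id, progress, timeout=3600)

def load_progress(student_id):
    # any worker process can pick the state back up on the next request
    return cache.get("progress:%s" % student_id, {})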
This may not be the fastest request/response cycle we can pull, but it's scalable as hell
Look at some major web apps written in Django
I suggest you look at some open source Django projects and see how they're organized; you'll find that a lot of the things you're busting your brains over, Djangonauts just don't bother with.