Using annotate instead of model property - python

The annotate function is very useful for defining computed fields for each row of a table. Model properties can also be used to define computed fields, but they are limited (they cannot be used for sorting, for instance).
Can a model property be completely replaced by an annotated field? When is it adequate to use each?

Differences in annotations, properties, and other methods
There are some cases where annotations are definitely better and easier than properties. These are usually calculations that are easy to make in the database and where the logic is easy to read.
Django's @property, on the other hand, is a very easy and Pythonic way to write calculation logic into your models. Some think properties are neat; others think they should be burned and hidden away because they mix program logic into data objects, which increases complexity. I think that especially the @cached_property decorator is rather neat.
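As a minimal sketch of the difference (the model and field names here are purely illustrative), the same computed value can be expressed both ways; only the annotated version can be used for sorting or filtering:

    from django.db import models
    from django.db.models import DecimalField, ExpressionWrapper, F
    from django.utils.functional import cached_property

    class Order(models.Model):
        unit_price = models.DecimalField(max_digits=10, decimal_places=2)
        quantity = models.PositiveIntegerField()

        @cached_property
        def total(self):
            # Computed in Python, one instance at a time; invisible to the database
            return self.unit_price * self.quantity

    # Computed in the database; usable in order_by() and filter()
    orders = Order.objects.annotate(
        total_db=ExpressionWrapper(
            F("unit_price") * F("quantity"),
            output_field=DecimalField(max_digits=12, decimal_places=2),
        )
    ).order_by("-total_db")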
Properties and annotations are not, however, the only ways to query and calculate things in the Django ORM.
In many complex cases, Model, Manager, or QuerySet methods (especially with custom QuerySets and Managers) offer the most flexibility when it comes to customizing your queries.
Most of the time it does not matter speed-wise which method you use; you should pick the cleanest or most compact option that is the simplest to write and easiest to read. Keep it simple and stupid and you will have the least amount of complex code to maintain.
Exploring, benchmarking, and optimizing
In some cases, when you have performance problems and end up analyzing your SQL queries, you might be forced to use annotations and custom queries to tame the complexity of the queries you are making. This is especially true when you are making complex lookups in the database and have to resort either to calculating things in properties or to writing custom SQL queries.
Calculating stuff in properties can be horrible for complexity if you have large querysets, because you have to run those calculations in Python where objects are large and iteration is slow. On the other hand, calculating stuff via custom SQL queries can be a nightmare to maintain, especially if the SQL you are maintaining is not, well, written by you.
In the end it comes down to the speed requirements and the cost of the calculation. If calculating in plain Python doesn't slow your service down or cost you money, you probably shouldn't optimize. If you are paying for a fleet of servers, then of course reasonable optimization might bring savings you can use elsewhere. Spending ten hours optimizing some snippet might never pay for itself, so be very careful here.
In optimization cases you have to weigh the different upsides and downsides and try different solutions, unless you are a prophet who instinctively knows what the problem is. If the problem were obvious, it would probably have been optimized away earlier, right?
When experimenting with different options, the Django Debug Toolbar, SQL EXPLAIN/ANALYZE, and Python profiling are your friends.
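For example, reusing the hypothetical Order model sketched above, you can inspect what the ORM actually sends to the database (QuerySet.explain() assumes Django 2.1+; on older versions, run EXPLAIN by hand):

    qs = Order.objects.annotate(
        total_db=ExpressionWrapper(
            F("unit_price") * F("quantity"),
            output_field=DecimalField(max_digits=12, decimal_places=2),
        )
    ).filter(total_db__gt=100)
    print(qs.query)      # the SQL Django will generate for this queryset
    print(qs.explain())  # the database's query plan (Django 2.1+)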
Remember that many query problems are also database problems: you might be hurting your performance with poor database design or maintenance. Remember to run VACUUM periodically and try to keep your database design normalized.
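On PostgreSQL, for instance, such maintenance can be triggered from a management command or a cron job; a sketch, noting that VACUUM must run outside a transaction (which is the case under Django's default autocommit):

    from django.db import connection

    with connection.cursor() as cursor:
        # Reclaim dead rows and refresh the query planner's statistics
        cursor.execute("VACUUM ANALYZE;")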
Tool-wise, the Django Debug Toolbar is especially useful because it can help with both profiling and SQL analysis. Many IDEs, such as PyCharm, also offer profiling even on a running server, which is pretty useful if you want to integrate different tools into your development setup.

Related

Is it good practice to send the schema in a web application?

I want to build a web app (SPA) that sends the schema (not necessarily the raw DB schema, but a representation of the data, potentially in a JSON format) to the view, so that in the view we can:
Generate grids based on that schema instead of hand-wiring columns
Handle additional information about these fields, such as whether they are editable or not, and the like.
This web app will allow users to see tabular information in a grid, and potentially do CRUD operations.
I see a lot of benefits in using the schema (we can implement validators based on the schema, form generation should be very simple, and best of all, the impact of adding a simple field to the web app should be easily handled).
My question is: is this a good strategy? Could you help me identify some drawbacks of this approach? (The stack I am using is not very important, but just for the sake of clarity: Bottle (Python) on the backend and React on the frontend.)
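For concreteness, the kind of schema payload being described might look like this (a sketch; all names are illustrative, using Bottle as mentioned in the question):

    import json
    from bottle import Bottle, response

    app = Bottle()

    # Purely illustrative: the frontend would generate its grids and forms from this
    CUSTOMER_SCHEMA = {
        "entity": "customer",
        "fields": [
            {"name": "id", "type": "integer", "editable": False},
            {"name": "email", "type": "string", "editable": True, "required": True},
            {"name": "created", "type": "datetime", "editable": False},
        ],
    }

    @app.route("/api/schema/customer")
    def customer_schema():
        response.content_type = "application/json"
        return json.dumps(CUSTOMER_SCHEMA)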
One drawback I see is the time consumed to maintain this addition you mention: schema generation, sending, and interpreting in the view. But of course this is for you to decide: whether this overhead is compensated by the advantages you mentioned. If it is, then go for it.
One other thing I would mention: you want to do validation based on this schema. How many of the validations in your application can be done this way? Are there many cases in which validation will not fit this pattern? The same question applies to grid generation, form generation, and so on. If there are a lot, then maybe it is not worth it. I have more than once found an automatic solution like this that got me excited, only to see later that the exceptions to the pattern were many and overall I did not gain much :).
Overall, you decide. One last thing: try to think long term. 90% of the lifetime of an application is spent in maintenance. Try to understand what happens after you release the application and bug reports and small feature requests start to come in.

TDD with large data in Python

I wonder if TDD could help my programming. However, I cannot use it simply, as most of my functions take large network objects (many nodes and links) and do operations on them. Sometimes I even read from SQL tables.
Most of the time it's not really the logic that breaks (i.e. not semantic bugs), but rather some function calls after refactoring :)
Do you think I can use TDD with this kind of data? What do you suggest for that (mock frameworks, etc.)?
Should I somehow take real data, process it with a function, validate the output, save the input/output states to some kind of mock object, and then write a test against it? I mean, just in case I cannot provide hand-made input data.
I haven't started TDD yet, so references are welcome :)
You've pretty much got it. Database testing is done by starting with a clean, up-to-date schema and adding a small amount of known, fixed data into the database. You can then do operations on this controlled environment, knowing what results you expect to see.
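A minimal sketch of that pattern using only the standard library (the table and the function under test are illustrative):

    import sqlite3
    import unittest

    def total_quantity(conn):
        # Function under test: aggregates over whatever rows exist
        return conn.execute("SELECT COALESCE(SUM(quantity), 0) FROM items").fetchone()[0]

    class ItemQueryTests(unittest.TestCase):
        def setUp(self):
            # A clean, in-memory schema plus a small amount of known, fixed data
            self.conn = sqlite3.connect(":memory:")
            self.conn.execute("CREATE TABLE items (name TEXT, quantity INTEGER)")
            self.conn.executemany("INSERT INTO items VALUES (?, ?)", [("a", 2), ("b", 3)])

        def tearDown(self):
            self.conn.close()

        def test_total_quantity(self):
            self.assertEqual(total_quantity(self.conn), 5)

    if __name__ == "__main__":
        unittest.main()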
Working with network objects is a bit more complex, but it normally involves stubbing them (i.e. removing the inner functionality entirely) or mocking them so that a fixed set of known data is returned.
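For the network objects, unittest.mock can stand in for the heavy object so that a fixed set of known data is returned (the graph API here is hypothetical):

    from unittest import mock

    def count_reachable(graph, start):
        # Code under test: normally walks a large network object
        return len(graph.reachable_from(start))

    def test_count_reachable():
        graph = mock.Mock()
        graph.reachable_from.return_value = ["a", "b", "c"]
        assert count_reachable(graph, "a") == 3
        graph.reachable_from.assert_called_once_with("a")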
There is always a way to test your code. If it's proving difficult, it's normally the code design that needs some rethinking.
I don't know of any Python-specific TDD resources, but a great resource on TDD in general is "Test-Driven Development: A Practical Guide" by David Astels. It uses Java as the language, but the principles are the same.
most of my functions take large network objects
Without knowing anything about your code, it is hard to assess this claim, but you might want to redesign your code so it is easier to unit test, by decomposing it into smaller methods. Although some high-level methods might deal with those troublesome large objects, perhaps low-level methods do not. You can then unit test those low-level methods, relying on integration tests to test the high-level methods.
Edited:
Before getting to grips with TDD you might want to try just adding some unit tests.
Unit testing is about testing at the fine-grained level: see my answer to the question How do you unit test the real world.
You might have to introduce some indirection into your program, to isolate parts that are impossible to unit test.
You might find it useful to decompose your data into an assembly of smaller classes, which can be tested individually.

What are the limitations of Django's ORM? [closed]

I've heard that some developers don't want to use an ORM, but I don't know why. What are the shortcomings of the ORM?
Let me start by saying that I fully advocate the use of the ORM for most simple cases. It offers a lot of convenience when working with a very straightforward (relational) data model.
But, since you asked for shortcomings...
From a conceptual point of view, an ORM can never be an effective representation of the underlying data model. It will, at best, be an approximation of your data - and most of the time, this is enough.
The problem is that an ORM will map on a "one class -> one table" basis, which doesn't always work.
If you have a very complex data model, one which cannot be properly represented by a one-class-to-one-table mapping, then you may find that you spend a lot of time fighting against the ORM, rather than having it work for you.
On a practical level, you'll find that there is always a workaround; some developers will be partisan in their support for or against ORMs, but I favour a hybrid approach. Django works well for this, as you can easily drop into raw SQL as needed. Something like:
Model.objects.raw("SELECT ...")
ORMs take a lot of the work out of the other 99.99% of cases, where you're performing simple CRUD operations against your data.
In my experience, the two best reasons to avoid an ORM altogether are:
When you have complex data that is frequently retrieved via multiple joins and aggregations. Often, writing the SQL by hand will be clearer.
Performance. ORMs are pretty good at constructing optimised queries, but nothing can compete with writing a nice, efficient piece of SQL.
But, when all's said and done, after working extensively with Django, I can count on one hand the number of occasions that the ORM hasn't allowed me to do what I want.
The creator of SQLAlchemy's response to the question Is Django considered now pythonic? shows a lot of the differences and a deep understanding of the system.
The sqlalchemy_vs_django_db discussion on Reddit.
Note: both links are pretty long and will take time to read. I am not writing a gist of them, which might lead to misunderstanding.
Another answer from a Django fan, but:
If you use inheritance and query for parent classes, you can't get children (while you can with SQLAlchemy).
GROUP BY and HAVING clauses are really hard to express through aggregate/annotate.
Some queries the ORM generates are just ridiculously long, and sometimes you end up with stuff like model.id IN [1, 2, 3... ludicrously long list].
There is a way to ask for rows where "stuff is in field" using __contains, but not "field is in stuff". Since there is no portable way to do this across DBMSs, writing raw SQL for it is really annoying (see the sketch after this list). A lot of small edge cases like this one appear as your application grows complex, because, as Gary Chambers said, data in the DBMS doesn't always match the OO model.
It's an abstraction, and sometimes, the abstraction leaks.
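To illustrate the __contains point above (a sketch; the model is hypothetical, and the raw SQL is the non-portable part, since e.g. MySQL spells string concatenation differently):

    # "stuff is in field": supported out of the box
    Post.objects.filter(body__contains="django")

    # "field is in stuff" (the column value is a substring of a given string)
    # has no built-in lookup; one workaround is raw SQL using '||' concatenation,
    # which works on SQLite/PostgreSQL but not on MySQL:
    Post.objects.extra(
        where=["%s LIKE '%%' || body || '%%'"],
        params=["some long text to search within"],
    )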
But more often than not, the people I meet who don't want to use an ORM do it for the wrong reason: intellectual laziness. Some people won't make the effort to give a fair try to something new; they know one thing and want to stick to it. And it's scary how many of them you can find in computer science, where a good part of the job is about keeping up with new stuff.
Of course, in some areas it just makes sense. But usually someone with a good reason not to use an ORM will still use one in other cases. I have never met any serious computer scientist who rejects them entirely, just people who don't use them in some cases and are able to explain why.
And to be fair, a lot of programmers are not computer scientists; there are biologists, mathematicians, teachers, or Bob, the guy next door who just wants to help. From their point of view, it's perfectly logical not to spend hours learning new stuff when you can do what you want with your existing toolbox.
There are various problems that seem to arise with every Object-Relational Mapping system, about which I think the classic article is by Ted Neward, who described the subject as "The Vietnam of Computer Science". (There's also a followup in response to comments on that post and some comments from Stack Overflow's own Jeff Atwood here.)
In addition, one simple practical problem with ORM systems is that they make it hard to see how many queries (and which queries) are actually being run by a given bit of code, which obviously can lead to performance problems. In Django, using the assertNumQueries assertion in your unit tests really helps to avoid this, as does using django-devserver, a replacement for runserver that can output queries as they're performed.
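A sketch of how assertNumQueries is used in practice (the URL and the expected count are illustrative):

    from django.test import TestCase

    class ArticleListTests(TestCase):
        def test_list_view_query_count(self):
            # Fails if the view runs more (or fewer) queries than expected,
            # which catches accidental N+1 query patterns early
            with self.assertNumQueries(2):
                self.client.get("/articles/")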
One of the biggest problems that comes to mind is that building inheritance into Django's ORM is difficult. Essentially this is because (Django) ORM layers are trying to bridge the gap by being both relational and OO. Another thing is, of course, multi-field (composite) foreign keys.
One charge leveled at the Django ORM is that it abstracts away so much of the database engine that writing efficient, scalable applications with it is impossible. For some kinds of applications, those with millions of accesses and highly interrelated models, this assertion is often true.
The vast majority of web applications never reach such huge audiences and don't achieve that level of complexity. The Django ORM is designed to get projects off the ground quickly and to help developers jump into database-driven projects without requiring a deep knowledge of SQL. As your website gets bigger and more popular, you will certainly need to audit performance. Eventually, you may need to start replacing ORM-driven code with raw SQL or stored procedures (or a tool like SQLAlchemy).
Happily, the capabilities of Django's ORM continue to evolve. Django 1.1's aggregation support is a major step forward, allowing efficient query generation while still providing a familiar object-oriented syntax. For even greater flexibility, Python developers should also look at SQLAlchemy, especially for Python web applications that don't rely on Django.
IMHO the biggest issue with the Django ORM is the lack of composite primary keys; this prevents me from using some legacy databases with django.contrib.admin.
I do prefer SQLAlchemy over the Django ORM; for projects where django.contrib.admin is not important, I tend to use Flask instead of Django.
Django 1.4 is adding some nice "batch" tools to the ORM.
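For reference, the kind of batch tool meant here is bulk_create, which landed in Django 1.4 (the model is illustrative):

    # One INSERT for many rows, instead of one query per object
    Entry.objects.bulk_create([
        Entry(headline="First"),
        Entry(headline="Second"),
    ])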

Can a Python list, set or dictionary be implemented invisibly using a database?

The Python native capabilities for lists, sets & dictionaries totally rock. Is there a way to continue using the native capability when the data becomes really big? The problem I'm working on involves matching (intersection) of very large lists. I haven't pushed the limits yet (actually I don't really know what the limits are) and don't want to be surprised by a big reimplementation after the data grows as expected.
Is it reasonable to deploy on something like Google App Engine, which advertises no practical scale limit, and continue using the native capability as-is forever, without really thinking about this?
Is there some Python magic that can hide whether a list, set, or dictionary lives in Python-managed memory vs. in a DB, so the physical deployment of data can be kept distinct from what I do in code?
How do you, Mr. or Ms. Python Super Expert, deal with lists, sets & dicts as data volume grows?
I'm not quite sure what you mean by native capabilities for lists, sets & dictionaries. However, you can create classes that emulate container and sequence types by defining some methods with special names. That means you could create a class that behaves like a list but stores its data in a SQL database or in the GAE datastore. Simply speaking, this is what an ORM does. However, mapping objects to a database is very complicated, and it is probably better to use an existing ORM than to invent your own.
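The standard library's shelve module is a ready-made example of this idea. A hand-rolled sketch of a dict backed by SQLite might look like this (illustrative, not production code):

    import sqlite3
    from collections.abc import MutableMapping

    class SqliteDict(MutableMapping):
        # A dict-like object whose items live in a SQLite table
        def __init__(self, path=":memory:"):
            self.conn = sqlite3.connect(path)
            self.conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

        def __getitem__(self, key):
            row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
            if row is None:
                raise KeyError(key)
            return row[0]

        def __setitem__(self, key, value):
            self.conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))

        def __delitem__(self, key):
            if self.conn.execute("DELETE FROM kv WHERE k = ?", (key,)).rowcount == 0:
                raise KeyError(key)

        def __iter__(self):
            return (k for (k,) in self.conn.execute("SELECT k FROM kv"))

        def __len__(self):
            return self.conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    d = SqliteDict()
    d["answer"] = "42"
    print(d["answer"], len(d))  # behaves like a dict, stored in SQLite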
I'm afraid there is no one-size-fits-all solution. GAE especially is not some kind of Magic Fairy Dust you can sprinkle on your code to make it scale. There are several limitations you have to keep in mind to create an application that can scale. Some of them are general, like computational complexity; others are specific to the environment your code runs in. E.g. on GAE the maximum response time is limited to 30 seconds, and querying the datastore works differently than on other databases.
It's hard to give any concrete advice without knowing your specific problem, but I doubt that GAE is the right solution.
In general, if you want to work with large datasets, you either have to keep that in mind from the start or you will have to rework your code, algorithms and data structures as the datasets grow.
You are describing my dreams! However, I think you cannot do it. I always wanted something just like LINQ for Python, but the language does not permit using Python syntax for native database operations, AFAIK. If it were possible, you could just write code using lists and then use the same code for retrieving data from a database.
I would not recommend writing a lot of code based only on lists and sets, because it will not be easy to migrate to a scalable platform. I recommend using something like an ORM. GAE even has its own ORM-like system, and you can use other ones, such as SQLAlchemy and SQLObject, with e.g. SQLite.
Unfortunately, you cannot use awesome stuff such as list comprehensions to filter data from the database. Sure, you can filter data after it has been fetched from the DB, but you'll still need to build a query in some SQL-like language for querying objects, or else return a lot of objects from the database.
OTOH, there is Buzhug, a curious non-relational database system written in Python which allows the use of natural Python syntax. I have never used it and I do not know whether it scales, so I would not put my money on it. However, you can test it and see if it can help you.
You can use an ORM (Object-Relational Mapping): a class gets a table, an object gets a row. I like the Django ORM. You can use it for non-web apps, too. I never used it on GAE, but I think it is possible.
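That mapping, in Django terms (a sketch; the model is illustrative, and standalone use requires configuring Django's settings first):

    from django.db import models

    class Person(models.Model):                  # one class -> one table
        name = models.CharField(max_length=100)  # one field -> one column

    # one object -> one row
    p = Person.objects.create(name="Ada")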

Performance between Django and raw Python

I was wondering what the performance difference is between using plain Python files to make web pages and using Django. Is there a significant difference between the two? Thanks.
Django IS plain Python, so the execution time of any given statement or expression will be the same. What needs to be understood is that many, many components are put together to offer several advantages when developing for the web:
Removal of common tasks into libraries (auth, data access, templating, routing)
Correctness of algorithms (cookies/sessions, crypto)
Decreased custom code (due to libraries), which directly influences bug count, dev time, etc.
Following conventions leads to improved team work, and the ability to understand code
Pluggability: create or find new functionality blocks that can be used with minimal integration cost
Documentation and help; many people understand the tech and are able to help (StackOverflow?)
Now, if you were to write your own site from scratch, you'd need to implement at least several of these components yourself. You'd also lose most of the above benefits, unless you spent an extraordinary amount of time developing your site. Django, and other web frameworks in every other language, are designed to provide the common stuff and let you get straight to work on business requirements.
If you ever banged out custom session code and data access code in PHP before the rise of web frameworks, you won't even think about the performance cost associated with a framework that makes your job interesting and easier.
Now, that said, Django ships with a LOT of components. It is designed in such a way that, most of the time, they won't affect you. Still, a surprising amount of code is executed for each request. If you build out a site with Django and the performance just doesn't cut it, you can feel free to remove all the bits you don't need. Or you can use a 'slim' Python framework.
Really, just use Django. It is quite awesome. It powers many sites millions of times larger than anything you (or I) will build. There are ways to improve performance significantly, such as utilizing caching, that matter far more than micro-optimizing a loop in custom middleware.
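For instance, per-view caching in Django is a one-line change (a sketch; the view is hypothetical):

    from django.http import HttpResponse
    from django.views.decorators.cache import cache_page

    @cache_page(60 * 15)  # cache the rendered response for 15 minutes
    def article_list(request):
        return HttpResponse("expensive page, now served from cache")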
Depends on how your "plain Python" makes web pages. If it uses a templating engine, for instance, the performance of that engine is going to make a huge difference. If it uses a database, the kind of data access layer you use (in the context of the requirements for that layer) is going to make a difference.
The question thus becomes whether your arbitrary (and presently unstated) toolchain choices have better runtime performance than the ones selected by Django. If performance is your primary, overriding goal, you certainly should be able to make more optimal selections. However, in terms of overall cost (i.e. buying more web servers for the slower-runtime option vs. buying more programmer-hours for the more-work-to-develop option), the question simply has too many open elements to be answerable.
Premature optimisation is the root of all evil.
Django makes things extremely convenient if you're doing web development. That plus a great community with hundreds of plugins for common tasks is a real boon if you're doing serious work.
Even if your "raw" implementation is faster, I don't think it will be fast enough to seriously affect your web application. Build it using tools that work at the right level of abstraction and if performance is a problem, measure it and find out where the bottlenecks are and apply optimisations. If after all this you find out that the abstractions that Django creates are slowing your app down (which I don't expect that they will), you can consider moving to another framework or writing something by hand. You will probably find that you can get performance boosts by caching, load balancing between multiple servers and doing the "usual tricks" rather than by reimplementing the web framework itself.
Django is also plain Python, so performance mostly relies on how efficient your code is.
Most performance issues in software arise from inefficient code rather than from the choice of tools and language, so the implementation matters. AFAIK Django does this excellently, and its performance is above the mark.
