This is more of an efficiency question. My django web page is working fine, in the sense that I don't get any errors, but it is very slow. That being said, I don't know where else I would ask this, other than here, so here goes:
I am developing a sales dashboard. In doing so, I am accessing the same data over and over and I would like to speed things up.
For example, one of my metrics is number of opportunities won. This accesses my Opportunities model, sorts out the opportunities won within the last X days and reports it.
Another metric is neglected opportunities. That is, opportunities that are still reported as being worked on, but that there has been no activity on them for Y days. This metric also accesses my Opportunities model.
I read here that querysets are lazy, which, if I understand this concept correctly, would mean that my actual database is accessed only at the very end. Normally this would be an ideal situation, as all of the filters are in place and the queryset only accesses a minimal amount of information.
Currently, I have a separate function for each metric. So, for the examples above, I have compile_won_opportunities and compile_neglected_opportunities. Each function starts with something like:
won_opportunities_query = Opportunities.objects.all()
and then I filter it down from there. If I am reading the documentation correctly, this means that I am accessing the same database many, many times.
There is a noticeable lag when my web page loads. In an attempt to find out what is causing the lag, I commented out different sections of code. When I comment out the code that accesses my database for each function, my web page loads immediately. My initial thought was to access my database in my calling function:
opportunities_query = Opportunities.objects.all()
and then pass that query to each function that uses it. My rationale was that the database would only be accessed one time, but apparently django doesn't work that way, as it made no obvious difference in my page load time. So, after my very long-winded explanation, how can I speed up my page load time?
Use django-debug-toolbar to see exactly how many queries each page runs and how long each one takes: https://pypi.org/project/django-debug-toolbar/
Btw, go with select_related to pull related objects in the same query instead of issuing one query per foreign-key access: https://docs.djangoproject.com/en/2.2/ref/models/querysets/#select-related
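If the only numbers you need are counts, you can also collapse both metrics into a single query with conditional aggregation (Django 2.0+). A rough sketch, assuming hypothetical field names like status, closed_on and last_activity on your Opportunities model:

from datetime import timedelta

from django.db.models import Count, Q
from django.utils import timezone

from .models import Opportunities  # adjust the import to your app


def compile_dashboard_metrics(days_won=30, days_neglected=14):
    # Field names here are guesses -- substitute your real Opportunities fields.
    now = timezone.now()
    return Opportunities.objects.aggregate(
        won=Count('pk', filter=Q(
            status='won',
            closed_on__gte=now - timedelta(days=days_won),
        )),
        neglected=Count('pk', filter=Q(
            status='open',
            last_activity__lte=now - timedelta(days=days_neglected),
        )),
    )

If you need the actual rows rather than counts, another option is to evaluate the queryset once (e.g. opps = list(Opportunities.objects.all())) and let each metric function filter that Python list, so the database is only hit one time.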
Related
I'm writing a web application in Python and PostgreSQL. Users access a lot of information during a session. Almost all of that information is indexed in the database. My question is, should I litter the code with specific queries, or is it better practice to query larger chunks of information, cache them, and let Python process the chunks for the finer pieces?
For example: a user asks for entries in a payment log. Either one writes a query asking for the specific entries requested, or one collects the payment history of the user and then uses Python to select the specific entries.
Of course caching is preferred when working with heavy queries, but since nearly all my data is indexed, direct database access is fast and the caching approach would not yield much, if any, extra speed. But are there other factors that may still render the caching approach preferable?
Database designers spend a lot of time on caching and optimization. Unless you hit a specific problem, it's probably better to let the database do the database stuff, and your code do the rest instead of having your code try to take over some of the database functionality.
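To make that concrete, here is a minimal sketch of the two approaches with psycopg2; the payments table, its columns, and the connection string are all made up for illustration:

import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # placeholder connection details


def entries_via_db(user_id, since):
    # Let PostgreSQL do the filtering -- it can use the index on (user_id, created_at).
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, amount, created_at FROM payments "
            "WHERE user_id = %s AND created_at >= %s",
            (user_id, since),
        )
        return cur.fetchall()


def entries_via_python(user_id, since):
    # Pull the whole history and filter in Python -- more data over the wire,
    # and the date cut no longer benefits from the index.
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, amount, created_at FROM payments WHERE user_id = %s",
            (user_id,),
        )
        return [row for row in cur.fetchall() if row[2] >= since]

With the index in place, the first version returns only the rows you asked for; the second ships the entire history over the wire just so Python can throw most of it away.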
I'm teaching myself backend and frontend web development (I'm using Flask, if it matters) and I need a few pointers when it comes to unit testing my app.
I am mostly concerned with these different cases:
The internal consistency of the data: that's the easy one - I'm aiming for 100% coverage when it comes to issues like the login procedure and, most generally, checking that everything that happens between the Python code and the database after every request remains consistent.
The JSON responses: What I'm doing atm is performing a test-request for every get/post call on my app and then asserting that the json response must be this-and-that, but honestly I don't quite appreciate the value in doing this - maybe because my app is still at an early stage?
Should I keep testing every json response for every request?
If yes, what are the long-term benefits?
External APIs: I read conflicting opinions here. Say I'm using an external API to translate some text:
Should I test only the very high level API, i.e. see if I get the access token and that's it?
Should I test that the returned json is what I expect?
Should I test nothing to speed up my test suite and don't make it dependent from a third-party API?
The outputted HTML: I'm lost on this one as well. Say I'm testing the function add_post():
Should I test that on the page that follows the request the desired post is actually there?
I started checking for the presence of strings/html tags in the raw response.data, but then I kind of gave up because 1) it takes a lot of time and 2) I would have to constantly rewrite the tests since I'm changing the app so often.
What is the recommended approach in this case?
Thank you and sorry for the verbosity. I hope I made myself clear!
Most of this is personal opinion and will vary from developer to developer.
There are a ton of python libraries for unit testing - that's a decision best left to you as the developer of the project to find one that fits best with your tool set / build process.
This isn't exactly 'unit testing' per se, I'd consider it more like integration testing. That's not to say this isn't valuable, it's just a different task and will often use different tools. For something like this, testing will pay off in the long run because you'll have peace of mind that your bug fixes and feature additions aren't breaking your end-to-end behaviour. If you're already doing it, I would continue. These sorts of tests are highly valuable when refactoring down the road to ensure consistent functionality.
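For reference, such a test can stay quite small with Flask's built-in test client (Flask 1.0+). A sketch, assuming a hypothetical /api/posts endpoint in a module called myapp:

import unittest

from myapp import app  # hypothetical application module


class PostsApiTest(unittest.TestCase):
    def setUp(self):
        app.config['TESTING'] = True
        self.client = app.test_client()

    def test_add_post_returns_created_post(self):
        # Send JSON in, assert on the JSON that comes back.
        resp = self.client.post('/api/posts', json={'title': 'hello'})
        self.assertEqual(resp.status_code, 201)
        self.assertEqual(resp.get_json()['title'], 'hello')


if __name__ == '__main__':
    unittest.main()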
I would not waste time testing 3rd party APIs. It's their job to make sure their product behaves reliably. You'll be there all day if you start testing 3rd party features. A big reason to use 3rd party APIs is so you don't have to test them. If you ever discover that your app is breaking because of a 3rd party API it's probably time to pick a different API. If your project scales to a size where you're losing thousands of dollars every time that API fails you have a whole new ball of issues to deal with (and hopefully the resources to address them) at that time.
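If you still want your own code paths around such an API covered without actually calling it, the usual trick is to mock the thin wrapper you wrote around it. A sketch, where translate_text and translated_greeting are hypothetical helpers in your own app:

import unittest
from unittest import mock

from myapp import views  # hypothetical module that wraps the external call


class TranslationTest(unittest.TestCase):
    @mock.patch('myapp.views.translate_text', return_value='hola')
    def test_translation_logic_without_hitting_api(self, fake_translate):
        # The external service is never contacted; only our own logic is tested.
        result = views.translated_greeting('hello', target='es')
        fake_translate.assert_called_once_with('hello', target='es')
        self.assertEqual(result, 'hola')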
In general, I don't test static content or HTML. There are tools out there (web scraping tools) that will let you crawl your own website and check for consistent functionality. I would personally leave this as a last priority for the final stages of refinement, if you have time. The look and feel of most websites changes so often that writing tests for it isn't worth it. Look and feel is also really easy to test manually because it's so visual.
I have 10 HTML pages and 10 functions in my views.py file. I have the same query in every function (and the same results). So what is the best way to optimize things so that I don't run the same SQL query on every page?
It's a little unclear whether you mean you don't want to hit the database multiple times, or you don't want to write out the actual code multiple times (which is a major part of being "pythonic").
If it's the latter, then look into class-based views. These are very powerful tools that allow you to write far less boilerplate code and make your app far more maintainable. For example, you could write one base class that contains the query, and all the other views could inherit from this base class and add their own template/whatever it is that you're changing from view to view.
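As a rough sketch (the Report model and template names below are placeholders), the base-class idea could look like this:

from django.views.generic import TemplateView

from .models import Report  # hypothetical model shared by all ten pages


class ReportBaseView(TemplateView):
    """Runs the shared query once per request and puts it in the context."""

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        context['report_rows'] = Report.objects.filter(active=True)
        return context


class SalesPageView(ReportBaseView):
    template_name = 'pages/sales.html'


class InventoryPageView(ReportBaseView):
    template_name = 'pages/inventory.html'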
If it's the former, and you're worried about database hits then I'd question A) Is the query really that expensive that it needs to be optimised? and B) If it is that expensive, are you sure there's a very good reason that you need to repeat it in 10 different views?
If the answer is yes to both of those then you'll want to look into caching using something like memcached and django's caching framework.
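A minimal sketch of that, using Django's low-level cache API and the same hypothetical Report model as above; the key name and the five-minute timeout are arbitrary:

from django.core.cache import cache

from .models import Report  # hypothetical model from the sketch above


def get_report_rows():
    # get_or_set runs the queryset only on a cache miss; the cached result is
    # then shared by every view for the next five minutes.
    return cache.get_or_set(
        'report_rows',
        lambda: list(Report.objects.filter(active=True)),
        timeout=300,
    )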
In future please include more information in your question. Examples of your views/explanations of WHY you need to do what you're trying to do can not only help people answer your question, but also point you in the direction of a better solution to your problem.
The most recent release of the GAE states the following changes:
Datastore
Cross Group (XG) Transactions: For those who need transactional writes
to entities in multiple entity groups (and that's everyone, right?),
XG Transactions are just the thing. This feature uses two phase commit
to make cross group writes atomic just like single group writes.
I think I could use this change within the code of a project I created a while ago but I would like further information regarding this update to the App Engine. I can't seem to find any additional information. So...
How has coding transactions changed, in regards to this update? In layman's terms, how can I implement a cross-group transaction and are there still some limitations to data store transactions that I need to be aware of?
I know this is a rather vague question. My problem is that this sounds very useful, but I'm not sure how to correctly (and effectively) use this change.
Have you read any of the docs? It sounds like you haven't (based on you saying "I can't seem to find any additional information"). In that case, check out the links below and see if you still have any questions.
Conceptually, doing a cross group transaction is pretty similar to a typical GAE transaction, just slower, and only available in the HRD. Note that in general, GAE transactions, both "normal" and XG have different isolation characteristics than what you may be used to coming from a SQL database. The second link discusses this immediately after the XG section.
Here is an excerpt from the first link showing how simple using XG can be.
from google.appengine.ext import db

xg_on = db.create_transaction_options(xg=True)

def my_txn():
    x = MyModel(a=3)
    x.put()
    y = MyModel(a=7)
    y.put()

db.run_in_transaction_options(xg_on, my_txn)
quick example
slightly more detail
I am coding a psychology experiment in Python. I need to store user information and scores somewhere, and I need it to work as a web application (and be secure).
Don't know much about this - I'm considering XML databases, BerkeleyDB, SQLite, an OpenOffice spreadsheet, or I'm very interested in the Python "shelve" library.
(most of my info coming from this thread: http://developers.slashdot.org/story/08/05/20/2150246/FOSS-Flat-File-Database)
DATA: I figure that I'm going to have maximally 1000 users. For each user I've got to store...
Username / Pass
User detail fields (for a simple profile)
User scores on the exercise (2 datapoints per trial: a result (correct/incorrect/timeout) and an associated number from 0.1 to 1.0 that I need to record)
Metadata about the trials (when, who, etc.)
Results of data analysis for user
VERY rough estimate, each user generates 100 trials / day. So maximum of 10k datapoints / day. It needs to run that way for about 3 months, so about 1m datapoints. Safety multiplier 2x gives me a target of a database that can handle 2m datapoints.
((note: I could either store trial response data as individual data points, or group trials into Python list objects of varying length (user "sessions"). The latter would dramatically bring down the number of database entries, though not the amount of data. Does it matter? How?))
I want a solution that will work (at least) until I get to this 1000 users level. If my program is popular beyond that level, I'm alright with doing some work modding in a beefier DB. Also reiterating that it must be easily deployable as a web application.
Beyond those basic requirements, I just want the easiest thing that will make this work. I'm pretty green.
Thanks for reading
Tr3y
SQLite can certainly handle that amount of data; it has a very large user base, with a few very well known users, on all the major platforms. It's fast, light, and there are awesome GUI clients that allow you to browse and extract/filter data with a few clicks.
SQLite won't scale indefinitely, of course, but severe performance problems begin only when simultaneous inserts are needed, which I would guess is a problem several orders of magnitude beyond your projected load.
I've been using it for a few years now, and I've never had a problem with it (although for larger sites I use MySQL). Personally I find that "Small. Fast. Reliable. Choose any three." (the tagline on SQLite's site) is quite accurate.
As for ease of use... the sqlite3 bindings are part of the Python standard library. Here you can find a small tutorial. Interestingly enough, simplicity is a design criterion for SQLite. From here:
Many people like SQLite because it is small and fast. But those qualities are just happy accidents. Users also find that SQLite is very reliable. Reliability is a consequence of simplicity. With less complication, there is less to go wrong. So, yes, SQLite is small, fast, and reliable, but first and foremost, SQLite strives to be simple.
There's a pretty spot-on discussion of when to use SQLite here. My favorite line is this:
Another way to look at SQLite is this: SQLite is not designed to replace Oracle. It is designed to replace fopen().
It seems to me that for your needs, SQLite is perfect. Indeed, it seems to me very possible that you will never need anything else:
With the default page size of 1024 bytes, an SQLite database is limited in size to 2 terabytes (2^41 bytes).
It doesn't sound like you'll have that much data at any point.
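To give a feel for how little code the standard-library sqlite3 module needs, here is a rough sketch of a trials table along the lines you describe; the schema and file name are just guesses:

import sqlite3

conn = sqlite3.connect('experiment.db')
conn.execute("""
    CREATE TABLE IF NOT EXISTS trials (
        id        INTEGER PRIMARY KEY,
        username  TEXT NOT NULL,
        outcome   TEXT NOT NULL,      -- correct / incorrect / timeout
        score     REAL NOT NULL,      -- the 0.1 - 1.0 value
        recorded  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")


def record_trial(username, outcome, score):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO trials (username, outcome, score) VALUES (?, ?, ?)",
            (username, outcome, score),
        )


record_trial('tr3y', 'correct', 0.8)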
I would consider MongoDB. It's very easy to get started, and is built for multi-user setups (unlike SQLite).
It also has a much simpler model. Instead of futzing around with tables and fields, you simply take all the data in your form and stuff it in the database. Even if your form changes (oops, forgot a field) you won't need to change MongoDB.
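For illustration, with pymongo that "stuff the form in the database" workflow looks roughly like this (the database, collection, and field names are invented):

from pymongo import MongoClient

client = MongoClient()              # assumes a local mongod on the default port
trials = client.experiment.trials   # database and collection names are made up

# Whatever the form sends is stored as-is; adding a field later needs no migration.
form_data = {
    'username': 'tr3y',
    'outcome': 'correct',
    'score': 0.8,
}
trials.insert_one(form_data)

# Later: pull back all of one user's trials.
for doc in trials.find({'username': 'tr3y'}):
    print(doc['outcome'], doc['score'])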