Live Updating Widget for 100+ concurrent users - python

What would you use if you had a div box on your website that had to be updated constantly with new HTML content from the server?
Simple polling is probably not very resource efficient - imagine also having 10,000 users while the div has to update.
What is the most efficient or elegant solution for such a problem?
Are there existing widgets which contain this "autoupdate" functionality?

Consider using memcached. By caching content in memory you will reduce the number of calls to the (database?) server that generates the content.
To keep the content up to date you should use the memcache pattern. A short expiration time will provide more up-to-date content; a long expiration time will provide better performance.
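For illustration, here is a minimal sketch of that cache-aside pattern in Python, assuming the pymemcache client and a hypothetical render_widget_html() that does the expensive database/template work:

from pymemcache.client.base import Client

cache = Client(("127.0.0.1", 11211))
CACHE_KEY = "widget:html"
EXPIRE_SECONDS = 5  # short TTL -> fresher content, more load on the backend

def get_widget_html():
    html = cache.get(CACHE_KEY)          # returns None on a cache miss
    if html is None:
        html = render_widget_html()      # hypothetical: expensive DB/template work
        cache.set(CACHE_KEY, html, expire=EXPIRE_SECONDS)
    return html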


Notion API quickly delete and repopulate entire DB

Background
I'm creating a Notion DB that will contain data about different analyzers my team uses (analyzer name, location, last time the analyzer sent data, etc.). Since I'm using live data, I need a way to quickly update the data of all analyzers in the Notion DB.
I'm currently using a Python script to get the analyzers' data and upload it to the Notion DB. Currently I read each row and get its ID, which I use to update the row's data - but this is too slow: it takes more than 30 seconds to update 100 rows.
The Question
I'd like to know if there's a way to quickly update the data of many rows (maybe in one big bulk operation). The goal is roughly 100 row updates per second (instead of 100 rows in 30 seconds).
There are multiple things one could do here - sadly none of them will improve the updates drastically. Currently there is no way to update multiple rows, or to be more precise pages. I am not sure what "read each row" refers to, but you can retrieve multiple pages of a database at once - up to 100. If you are retrieving them one by one, this could be improved.
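For illustration, a minimal sketch of fetching up to 100 pages of a database in a single request with plain requests; the token and database id are placeholders:

import requests

NOTION_TOKEN = "secret_..."   # placeholder
DATABASE_ID = "..."           # placeholder

def fetch_pages():
    resp = requests.post(
        f"https://api.notion.com/v1/databases/{DATABASE_ID}/query",
        headers={
            "Authorization": f"Bearer {NOTION_TOKEN}",
            "Notion-Version": "2022-06-28",   # adjust to the API version you target
        },
        json={"page_size": 100},  # maximum page size per request
    )
    resp.raise_for_status()
    # map page_id -> page object, e.g. to cache the ids for later updates
    return {page["id"]: page for page in resp.json()["results"]}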
Secondly, I'd like to know how often the analyzers change and, if they do, whether they will be altered by the Python script or updated in Notion. If this does not happen too often, you might be able to cache the page_ids and avoid retrieving the ids every time you update. Sadly the last_edited_time of the database does not reflect any addition or removal of its children, so simply checking this is not an option.
The third and last way to improve performance is multi-threading. You can send multiple requests at the same time, since waiting for requests one after another is usually the bottleneck.
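A minimal sketch of that, assuming a hypothetical update_page(page_id, properties) helper that wraps the single-page update endpoint:

from concurrent.futures import ThreadPoolExecutor

def update_all(updates):
    # updates: dict mapping page_id -> properties payload
    with ThreadPoolExecutor(max_workers=3) as pool:   # keep this small, see the rate limit below
        futures = [
            pool.submit(update_page, page_id, props)  # update_page is hypothetical
            for page_id, props in updates.items()
        ]
        for future in futures:
            future.result()   # re-raise any request errors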
I know none of these will really help you, but sadly no efficient method to update multiple pages exists.
There is also the rate limit of 3 requests per second, which is enforced by Notion to ensure fair performance for all users. If you send more requests, you will start receiving responses with an HTTP 429 code. Your integration should respect this response and avoid sending further requests until the indicated number of seconds has passed, as per this page in the Notion developer API guidelines.
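A minimal sketch of respecting those responses, assuming the wait time is exposed in a Retry-After header:

import time
import requests

def request_with_backoff(method, url, **kwargs):
    while True:
        resp = requests.request(method, url, **kwargs)
        if resp.status_code != 429:
            return resp
        # wait for the number of seconds the API asks for (assumed Retry-After header)
        time.sleep(float(resp.headers.get("Retry-After", "1")))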

Storing queryset after fetching it once

I am new to Django and web development.
I am building a website with a database of considerable size.
A large amount of data has to be shown on many pages, and a lot of this data is repeated - I need to show the same data on many pages.
Is it a good idea to query the database for the data on every GET request? It takes many seconds to get the data every time I refresh the page or request another page that shows the same data.
Is there a way to fetch the data once, store it somewhere, display it on every page, and only refetch it when updates are made?
I thought about the session, but I found that it is limited to 5MB, which is too small for my data.
Any suggestions?
Thank you.
Django's cache - as mentioned by Leistungsabfall - can help, but like most cache systems it has some drawbacks too if you use it naively for this kind of problem (long queries/computations): when the cache expires, the next request will have to recompute the whole thing - which might take some time, during which every new request will trigger a recomputation... Also, proper cache invalidation can be really tricky.
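For reference, a minimal sketch of Django's low-level cache API, assuming a configured cache backend and a hypothetical build_shared_data() that runs the expensive queries:

from django.core.cache import cache

def get_shared_data():
    # recomputes only when the key is missing or has expired
    return cache.get_or_set("shared_data", build_shared_data, timeout=300)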
Actually there's no one-size-fits-all answer to your question; the right solution is often a mix of different solutions (code optimisation, caching, denormalisation etc), based on your actual data, how often it changes, how many visitors you have, how critical it is to have up-to-date data etc. But the very first steps would be to:
check the code fetching the data and find out if there are possible optimisations at this level, using QuerySet features (.select_related() / .prefetch_related(), .values() and/or .values_list(), annotations etc) to avoid issues like the "n+1 queries" problem, fetching whole records and building whole model instances when you only need a single field's value, or doing computations at the Python level when they could be done at the database level (a sketch follows this list)
check your db schema's indexes - well-used indexes can vastly improve performance, badly used ones can vastly degrade it...
and of course use the right tools (db query logging, Python's profiler etc) to make sure you identify the real issues.
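To illustrate the first point, a minimal sketch of those QuerySet features, using hypothetical Book/Author models:

from django.db.models import Count
from myapp.models import Author, Book   # hypothetical models

# "n+1 queries": one extra query per book to fetch its author
for book in Book.objects.all():
    print(book.author.name)

# a single query with a JOIN instead
for book in Book.objects.select_related("author"):
    print(book.author.name)

# fetch only the field you need instead of whole model instances
titles = Book.objects.values_list("title", flat=True)

# compute at the database level instead of in Python
authors = Author.objects.annotate(num_books=Count("book"))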

Documents requests and memory issue using MongoDB

I have documents in Mongo that contain questionnaire-like data (company's name, address, phone numbers, etc.). I need to display this data in a web interface.
Which is better: getting the whole document as a dict into a variable and filling the web form from its data, or making lots of database requests to fill each field of the web form?
If you are going to display all of the data on the web interface anyway, start simple: get the whole document. It reduces your latency by requiring fewer round trips to the database.
Agree with GPS - if you need all, or most, of the data then it should be a better option to retrieve the document in one request as opposed to hitting the database x number of times.
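For illustration, a minimal pymongo sketch of the one-request approach; the database, collection and field names are placeholders:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
companies = client["mydb"]["companies"]

# one round trip, whole document as a dict
doc = companies.find_one({"name": "Acme Corp"})
form_data = {
    "name": doc["name"],
    "address": doc["address"],
    "phone": doc["phone"],
}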

Caching online prices fetched via API unless they change

I'm currently working on a site that makes several calls to big-name online sellers like eBay and Amazon to fetch prices for certain items. The issue is that it currently takes a few seconds (as far as I can tell, this time comes from making the calls) to load the results, which I'd like to be closer to instant (~10 seconds is too much in my opinion).
I've already cached other information that I need to fetch, but that information is static. Is there a way that I can cache the prices but update them only when needed? The code is in Python and I store info in a MySQL database.
I was thinking of somehow using cron or something along those lines to update it every so often, but it would be nice if there was a simpler and less intense approach to this problem.
Thanks!
You can use memcache to do the caching. The first request will be slow, but the remaining requests should be instant. You'll want a cron job to keep it up to date though. More caching info here: Good examples of python-memcache (memcached) being used in Python?
Have you thought about displaying the cached data, then updating the prices via an AJAX callback? You could notify the user if the price changed with an SO-type notification bar or similar.
This way the users get results immediately, and updated prices when they are available.
Edit
You can use jquery:
Assume you have a script named getPrices.php that returns a JSON array of items, each with its id and price.
No error handling etc here, just to give you an idea
My necklace: <div id='1'> $123.50 </div><br>
My bracelet: <div id='2'> $13.50 </div><br>
...
<script>
$(document).ready(function() {
    $.ajax({
        url: "getPrices.php",
        context: document.body,
        success: function(data) {
            // data: array of {id: ..., price: ...} objects from getPrices.php
            for (var i = 0; i < data.length; i++) {
                $('#' + data[i].id).html(data[i].price);
            }
        }
    });
});
</script>
You need to handle the following in your application:
get the price
determine if the price has changed
cache the price information
For step 1, you need to consider how often the item prices will change. I would go with your instinct to set a Cron job for a process which will check for new prices on the items (or on sets of items) at set intervals. This is trivial at small scale, but when you have tens of thousands of items the architecture of this system will become very important.
To avoid delays in page load, try to get the information ahead of time as much as possible. I don't know your use case, but it'd be good to prefetch the information as much as possible and avoid having the user wait for an asynchronous JavaScript call to complete.
For step 2, if it's a new item or if the price has changed, update the information in your caching system.
For step 3, you need to determine the best caching system for your needs. As others have suggested, memcached is a possible solution. There are a variety of "NoSQL" databases you could check out, or even cache the results in MySQL.
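Putting the three steps together, a minimal sketch of a script you could run from cron, assuming pymemcache for the cache and a hypothetical fetch_current_price(item_id) that calls the seller's API:

from pymemcache.client.base import Client

cache = Client(("127.0.0.1", 11211))

def refresh_price(item_id):
    new_price = fetch_current_price(item_id)    # step 1: get the price (hypothetical helper)
    key = f"price:{item_id}"
    cached = cache.get(key)
    if cached is None or cached.decode() != str(new_price):   # step 2: has it changed?
        cache.set(key, str(new_price))          # step 3: cache the new price
        # also persist to MySQL here if you need a durable copy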
How are you getting the price? If you are scraping the data from the normal HTML page using a tool such as BeautifulSoup, that may be slowing down the round-trip time. In this case, it might help to compute a fast checksum (such as MD5) from the page to see if it has changed, before parsing it. If you are using an API which gives a short XML version of the price, this is probably not an issue.
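For illustration, a minimal sketch of the checksum idea, hashing the raw page before parsing and only re-parsing when the hash changes; last_hashes is just a placeholder in-memory store:

import hashlib
import requests

last_hashes = {}   # url -> md5 of the last page we parsed

def page_changed(url):
    body = requests.get(url).content
    digest = hashlib.md5(body).hexdigest()
    if last_hashes.get(url) == digest:
        return False            # unchanged: skip the expensive parsing
    last_hashes[url] = digest
    return True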

How to implement Google-style pagination on app engine?

See the pagination on the app gallery? It has page numbers and a 'start' parameter which increases with the page number. Presumably this app was made on GAE. If so, how did they do this type of pagination? ATM I'm using cursors but passing them around in URLs is as ugly as hell.
You can simply pass in the 'start' parameter as an offset to the .fetch() call on your query. This gets less efficient as people dive deeper into the results, but if you don't expect people to browse past 1000 or so, it's manageable. You may also want to consider keeping a cache, mapping queries and offsets to cursors, so that repeated queries can fetch the next set of results efficiently.
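For illustration, a minimal sketch of offset-based paging with the App Engine ndb API, assuming a hypothetical App model and a 'start' value taken from the URL:

from google.appengine.ext import ndb

PAGE_SIZE = 20

def get_page(start):
    query = App.query().order(-App.created)        # App is a hypothetical model
    return query.fetch(PAGE_SIZE, offset=start)    # offset grows with the page number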
Ben Davies's outstanding PagedQuery class will do everything that you want and more.
