In Django/Python, how do I set the memcache timeout to infinite?

cache.set(key, value, 9999999)
But this is not infinite time...

def _get_memcache_timeout(self, timeout):
    """
    Memcached deals with long (> 30 days) timeouts in a special
    way. Call this function to obtain a safe value for your timeout.
    """
    timeout = timeout or self.default_timeout
    if timeout > 2592000:  # 60*60*24*30, 30 days
        # See http://code.google.com/p/memcached/wiki/FAQ
        # "You can set expire times up to 30 days in the future. After that
        # memcached interprets it as a date, and will expire the item after
        # said date. This is a simple (but obscure) mechanic."
        #
        # This means that we have to switch to absolute timestamps.
        timeout += int(time.time())
    return timeout
And from the FAQ:
What are the limits on setting expire time? (why is there a 30 day limit?)
You can set expire times up to 30 days in the future. After that memcached interprets it as a date, and will expire the item after said date. This is a simple (but obscure) mechanic.

From the docs:
If the value of this setting is None, cache entries will not expire.
Notably, this is different from how the expiration time works in the Memcache standard protocol:
Expiration times can be set from 0, meaning "never expire", to 30 days. Any time higher than 30 days is interpreted as a unix timestamp date.
So, to set a key to never expire, set the timeout to None if you're using Django's cache abstraction, or 0 if you're using Memcache more directly.

Support for a non-expiring cache was added in Django 1.6 by setting timeout=None.
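For example (a sketch; the key and value are placeholders, and the raw client shown is python-memcached):

# Django 1.6+: timeout=None means the entry never expires
from django.core.cache import cache
cache.set('my_key', 'my_value', None)

# Talking to memcached directly (e.g. python-memcached): time=0 means never expire
import memcache
mc = memcache.Client(['127.0.0.1:11211'])  # assumed local memcached instance
mc.set('my_key', 'my_value', time=0)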

Another simple technique is to write the generated HTML out to a file on the disk, and to use that as your cache. It's not hard to implement, and it works quite well as a file-based cache that NEVER expires, is quite transparent, etc.
It's not the django way, but it works well.
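A minimal sketch of that idea (the cache directory and key scheme here are hypothetical):

import os

CACHE_DIR = '/var/tmp/html_cache'  # hypothetical location

def get_cached_html(key, generate):
    """Return cached HTML for `key`, generating and storing it on a miss."""
    path = os.path.join(CACHE_DIR, key + '.html')
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    html = generate()  # generate() is a placeholder for your rendering function
    if not os.path.isdir(CACHE_DIR):
        os.makedirs(CACHE_DIR)
    with open(path, 'w') as f:
        f.write(html)
    return html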


How datetime.datetime.now() works without internet connection?

In Python, by importing the datetime module and using various functions of the datetime.datetime class, we can get basic dates with formatting and even do date arithmetic.
For example, datetime.datetime.now() will return the current date and time.
But today, when I ran this program, my computer had no internet connection, yet it still output today's date.
So how can datetime.datetime.now() return the proper date? Does the algorithm automatically increment after 24 hours?
tl;dr datetime.datetime.now() uses the clock built into your computer.
Computers have been able to keep fairly accurate time for much longer than the Internet has existed.
For example, PCs feature what's called a real-time clock (RTC). It is battery-powered and can keep the time even when the computer is switched off.
Interestingly, some distributed algorithms require very accurate clocks in order to operate reliably. The required accuracy far exceeds anything that a simple oscillator-based clock can provide.
As a result, companies like Google operate GPS and atomic clocks in their data centres (and even those are not without potential issues, as was demonstrated, for example, on 26 January 2017, when some GPS clocks were out by 13 microseconds for ten hours).
Even though the data centres are connected to the Internet, neither GPS nor atomic clocks require an Internet connection to operate. Besides, someone needs to keep all that Internet time infrastructure running... it can't be that everyone gets their time "off the Internet". ;)
Now that we're on the subject of distributing the time across computer networks, the main protocols for doing that are NTP (Network Time Protocol) and PTP (Precision Time Protocol).
The documentation for datetime.datetime.now() does not state the time is received from the internet.
Return the current local date and time. If optional argument tz is None or not specified, this is like today(), but, if possible, supplies more precision than can be gotten from going through a time.time() timestamp (for example, this may be possible on platforms supplying the C gettimeofday() function).
If tz is not None, it must be an instance of a tzinfo subclass, and the current date and time are converted to tz’s time zone. In this case the result is equivalent to tz.fromutc(datetime.utcnow().replace(tzinfo=tz)). See also today(), utcnow().
The datetime comes from the computer's clock. If you are running Windows, for example, try changing the system time; Python will then print the time you set.
See the documentation: https://docs.python.org/2/library/datetime.html
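A quick way to convince yourself (no network involved):

import datetime
# Reads the local system clock (backed by the RTC while the machine is off);
# no internet connection is needed.
print(datetime.datetime.now())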

Delaying 1 second per request, not enough for 3600 per hour

The Amazon API limit is apparently 1 req per second or 3600 per hour. So I implemented it like so:
while True:
    # sql stuff
    time.sleep(1)
    result = api.item_lookup(row[0],
                             ResponseGroup='Images,ItemAttributes,Offers,OfferSummary',
                             IdType='EAN', SearchIndex='All')
    # sql stuff
Error:
amazonproduct.errors.TooManyRequests: RequestThrottled: AWS Access Key ID: ACCESS_KEY_REDACTED. You are submitting requests too quickly. Please retry your requests at a slower rate.
Any ideas why?
This code looks correct, and it looks like the 1 request/second limit is still in effect:
http://docs.aws.amazon.com/AWSECommerceService/latest/DG/TroubleshootingApplications.html#efficiency-guidelines
You want to make sure that no other process is using the same associate account. Depending on where and how you run the code, there may be an old version of the VM, or another instance of your application running, or maybe there is a version on the cloud and other one on your laptop, or if you are using a threaded web server, there may be multiple threads all running the same code.
If you still hit the query limit, just retry, possibly with a TCP-like "additive increase/multiplicative decrease" back-off: start with extra_delay = 0; when a request fails, set extra_delay += 1 and sleep(1 + extra_delay), then retry; when it finally succeeds, set extra_delay *= 0.9, as sketched below.
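A rough sketch of that loop, reusing api and row from the question (the exception class comes from the traceback above):

import time
from amazonproduct.errors import TooManyRequests

extra_delay = 0
while True:
    # sql stuff
    time.sleep(1 + extra_delay)
    try:
        result = api.item_lookup(row[0],
                                 ResponseGroup='Images,ItemAttributes,Offers,OfferSummary',
                                 IdType='EAN', SearchIndex='All')
    except TooManyRequests:
        extra_delay += 1    # additive increase: back off harder after a throttle
        continue
    extra_delay *= 0.9      # multiplicative decrease: slowly speed back up
    # sql stuff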
Computer time is funny
This post is correct in saying "it varies in a non-deterministic manner" (https://stackoverflow.com/a/1133888/5044893). Depending on a whole host of factors, the time measured by a processor can be quite unreliable.
This is compounded by the fact that Amazon's API runs on a different clock than your program does. They are certainly not in sync, and there is likely some overlap between their "1 second" measurement and your program's. Amazon probably tries to average out this inconsistency, and likely allows a small margin of error, maybe +/- 5%. Even so, the discrepancy between your clock and theirs is probably what triggers the RequestThrottled error.
Give yourself some buffer
Here are some thoughts to consider.
Do you really need to hit the Amazon API every single second? Would your program work with a 5-second interval? Even a 2-second interval halves your request rate and makes a lockout far less likely. Also, Amazon may be charging you for every service call, so spacing them out could save you money.
This is really a question of optimization now. If you use a constant (say, SLEEP = 2) to control your API call rate, you can adjust it easily. Fiddle with it, increase and decrease it, and see how your program performs.
Push, not pull
Sometimes, hitting an API every second means that you're polling for new data. Polling is notoriously wasteful, which is why the Amazon API has a rate limit.
Instead, could you switch to a queue-based approach? Amazon SQS can deliver events to your programs. This is especially easy if you host them with AWS Lambda.
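For illustration only, a minimal long-polling SQS consumer with boto3 (the queue URL and the process() handler are hypothetical):

import boto3

sqs = boto3.client('sqs')
queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue'  # hypothetical

while True:
    # Long-poll for up to 20 seconds instead of hammering an API once a second.
    resp = sqs.receive_message(QueueUrl=queue_url,
                               WaitTimeSeconds=20,
                               MaxNumberOfMessages=10)
    for msg in resp.get('Messages', []):
        process(msg['Body'])  # process() is a placeholder for your handler
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=msg['ReceiptHandle'])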

Why is appengine memcache not storing my data for the requested period of time?

I have my employees stored in appengine ndb and I'm running a cron job via the taskque to generate a list of dictionaries containing the email address of each employee. The resulting list looks something like this:
[{"text":"john#mycompany.com"},{"text":"mary#mycompany.com"},{"text":"paul#mycompany.com"}]
The list is used as source data for various Angular components such as ngTags, ngAutocomplete, etc. I want to store the list in memcache so the Angular HTTP calls will run faster.
The problem I'm having is that the values stored in memcache never last for more than a few minutes even though I've set it to last 26 hours. I'm aware that the actual value stored can not be over 1mb so as an experiment I hardcoded the list of employees to contain only three values and the problem still persists.
The appengine console is telling me the job ran successfully and if I run the job manually it will load the values into memcache but they'll only stay there for a few minutes. I've done this many times before with far greater amount of data so I can't understand what's going wrong. I have billing enabled and I'm not over quota.
Here is an example of the function used to load the data into memcache:
def update_employee_list():
    try:
        # Get all 3000+ employees and generate a list of dictionaries
        fresh_emp_list = [{"text":"john#mycompany.com"},{"text":"mary#mycompany.com"},{"text":"paul#mycompany.com"}]
        the_cache_key = 'my_emp_list'
        emp_data = memcache.get(the_cache_key)
        # Kill the memcache packet so we can rebuild it.
        if emp_data is not None:
            memcache.delete(the_cache_key)
        # Rebuild the memcache packet
        memcache.add(the_cache_key, fresh_emp_list, 93600)  # this should last for 26 hours
    except Exception as e:
        logging.info('ERROR!!!...A failure occured while trying to setup the memcache packet: %s' % e.message)
        raise deferred.PermanentTaskFailure()
Here is an example of the function the angular components use to get the data from memcache:
@route
def get_emails(self):
    self.meta.change_view('json')
    emp_emails = memcache.get('my_emp_list')
    if emp_emails is not None:
        self.context['data'] = emp_emails
    else:
        self.context['data'] = []
Here is an example of the cron setting in cron.yaml:
- url: /cron/lookups/update_employee_list
  description: Daily rebuild of Employee Data
  schedule: every day 06:00
  timezone: America/New_York
Why can't appengine memcache hold on to a list of three dictionaries for more than a few minutes?
Any ideas are appreciated. Thanks
Unless you are using dedicated memcache (a paid service), the cached values can and will be evicted at any time.
What you tell memcache by specifying a lifetime is when your value becomes invalid and can therefore be removed from memcache. It does not guarantee that the value will stay in the cache that long; it is just a cap on the value's maximum lifetime.
Note: the more you put in memcache, the more likely it is that other values will get dropped. Therefore you should carefully consider what data you put in the cache. You should definitely not put every value you come across into memcache.
On a side note: in the projects I recently worked on, we saw a de facto maximum cache lifetime of about a day. No cache value ever lasted longer than that, even if the desired lifetime was much higher. Interestingly enough, the cache got cleared out at about the same time every day, even including very new values.
Thus: never rely on memcache. Always use persistent storage, and use memcache for performance boosts with high-volume traffic, as in the read-through sketch below.
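A minimal read-through pattern along those lines (a sketch; Employee stands in for the question's ndb model, and the key matches the question's code):

from google.appengine.api import memcache

EMP_CACHE_KEY = 'my_emp_list'

def get_employee_emails():
    emails = memcache.get(EMP_CACHE_KEY)
    if emails is None:
        # Cache miss (eviction or first call): rebuild from the persistent store.
        emails = [{'text': e.email} for e in Employee.query()]
        memcache.set(EMP_CACHE_KEY, emails, 93600)  # best-effort 26-hour cap
    return emails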

Django Static Precompiler for LESS

I'm using this - https://github.com/andreyfedoseev/django-static-precompiler - and everything seems to work just fine, but I've got one question. Does the compilation of the LESS file occur every time a template that uses it is rendered? Or is there some kind of caching? I'm asking because the LESS file can be rather big, and if compilation runs on every user request, that's really frustrating.
At https://github.com/andreyfedoseev/django-static-precompiler, you can read:
STATIC_PRECOMPILER_USE_CACHE
Whether to use cache for inline compilation. Default: True.
STATIC_PRECOMPILER_CACHE_TIMEOUT
Cache timeout for inline styles (in seconds). Default: 30 days.
STATIC_PRECOMPILER_MTIME_DELAY
Cache timeout for reading the modification time of source files (in seconds). Default: 10 seconds.
STATIC_PRECOMPILER_CACHE_NAME
Name of the cache to be used. If not specified then the default django cache is used. Default: None.
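So compilation results are cached by default. If you wanted to tune these settings, a sketch of the relevant settings.py lines might look like this (the values are illustrative, not recommendations):

# settings.py
STATIC_PRECOMPILER_USE_CACHE = True                   # cache inline compilation results
STATIC_PRECOMPILER_CACHE_TIMEOUT = 30 * 24 * 60 * 60  # 30 days, the default
STATIC_PRECOMPILER_MTIME_DELAY = 10                   # re-check source mtimes every 10 s
STATIC_PRECOMPILER_CACHE_NAME = None                  # use Django's default cache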

I want to measure time passed between action in the code

I want to measure the time in milliseconds that this line took:
before = datetime.datetime.now()
response = urllib2.urlopen("https://www.google.com")
after = datetime.datetime.now()
It is supposed to be a kind of workaround for a server that doesn't ping back, so I have to measure it from the server's response.
I can get the string 0:00:00.034225 back if I subtract the two times, and I could grab the milliseconds as a substring, but I would like to get the milliseconds in a cleaner way (the whole difference in ms, including the time converted from seconds, in case the server responds with a really big delay).
after - before is a datetime.timedelta object whose total_seconds method will give you what you are looking for. You can find additional information in the Python docs.
You will just have to multiply by 1000 to get milliseconds. Don't worry, although the method is called total_seconds, it includes milliseconds as decimal places. Sample output:
>>> d = t1 - t0
>>> d.total_seconds()
2.429001
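Putting it together with the question's code (Python 2, matching the urllib2 import):

import datetime
import urllib2

before = datetime.datetime.now()
response = urllib2.urlopen("https://www.google.com")
after = datetime.datetime.now()

elapsed_ms = (after - before).total_seconds() * 1000  # whole difference in ms
print("request took %.1f ms" % elapsed_ms)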
This won't give you a timeout though, only a measurement of the duration.
urlopen allows you to pass a timeout parameter, and will automatically abort after that much time has elapsed. From the docs:
urllib2.urlopen(url[, data][, timeout])
The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This actually only works for HTTP, HTTPS and FTP connections.
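For example (a sketch; the 5-second value is arbitrary):

import socket
import urllib2

try:
    response = urllib2.urlopen("https://www.google.com", timeout=5)
except (urllib2.URLError, socket.timeout):
    # urlopen may raise either, depending on where the timeout hits
    print("no response within 5 seconds")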
Python actually has a mechanism for timing small pieces of code -- timeit.Timer -- but that's for performance profiling and testing, not for implementing your own timeouts.
