Django Static Precompiler for LESS - python

I'm using this - https://github.com/andreyfedoseev/django-static-precompiler - and everything seems to work just fine, but I've got one question. Does compilation of the LESS file occur every time a template that uses LESS is rendered? Or is there some kind of caching? I'm asking because the LESS file can be rather big, and if LESS compilation ran on every user request that would be really frustrating.

At https://github.com/andreyfedoseev/django-static-precompiler you can read:
STATIC_PRECOMPILER_USE_CACHE
Whether to use cache for inline compilation. Default: True.
STATIC_PRECOMPILER_CACHE_TIMEOUT
Cache timeout for inline styles (in seconds). Default: 30 days.
STATIC_PRECOMPILER_MTIME_DELAY
Cache timeout for reading the modification time of source files (in seconds). Default: 10 seconds.
STATIC_PRECOMPILER_CACHE_NAME
Name of the cache to be used. If not specified then the default django cache is used. Default: None.
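For example, you could pin these down explicitly in settings.py; the values below simply mirror the documented defaults, and with them in place compilation results are cached and the precompiler should only re-check the source file's modification time every few seconds rather than recompiling on every request:

# settings.py -- these values mirror the documented defaults
STATIC_PRECOMPILER_USE_CACHE = True                     # cache inline compilation results
STATIC_PRECOMPILER_CACHE_TIMEOUT = 60 * 60 * 24 * 30    # 30 days, in seconds
STATIC_PRECOMPILER_MTIME_DELAY = 10                     # re-read source mtime at most every 10 s
STATIC_PRECOMPILER_CACHE_NAME = None                    # use Django's default cache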

Related

How can I force Python code to read input files again without rebooting my computer

I am scanning through a large number of files looking for some markers. I am becoming fairly confident that once I have run through the code one time, Python is not re-reading the actual files from disk. I find this behavior strange because I was told that one reason I needed to structure my file access the way I have is so that the handle and file contents are flushed, but that can't be the case.
There are 9,568 file paths in the list I am reading from. If I shut down Python and reboot my computer it takes roughly 6 minutes to read the files and determine if there is anything returned from the regular expression.
However, if I run the code a second time it takes about 36 seconds. Just for grins, the average document has 53,000 words.
Therefore I am concluding that Python still has access to the files it read in the first iteration.
I want to also observe that the first time I do this I can hear the disk spin (E:\ - Python is on C:). E is just a spinning disk with 126 MB cache - I don't think the cache is big enough to hold the contents of these files. When I do it later I do not hear the disk spin.
Here is the code:
import re

test_7A_re = re.compile(r'\n\s*ITEM\s*7\(*a\)*[.]*\s*-*\s*QUANT.*\n', re.IGNORECASE)
no7a = []
for path in path_list:
    path = path.strip()
    with open(path, 'r') as fh:
        string = fh.read()
    items = [item for item in re.finditer(test_7A_re, string)]
    if len(items) == 0:
        no7a.append(path)
        continue
I care about this for a number of reasons. One is that I was thinking about using multiprocessing, but if the bottleneck is reading in the files I don't see that I will gain much. I also think this is a problem because I would be worried about a file being modified and not having the most recent version available.
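For what it's worth, here is a rough sketch of the multiprocessing variant I was considering (the pool size is arbitrary and the worker just mirrors the loop above), with the caveat that the gain is limited if the disk really is the bottleneck:

import re
from multiprocessing import Pool

test_7A_re = re.compile(r'\n\s*ITEM\s*7\(*a\)*[.]*\s*-*\s*QUANT.*\n', re.IGNORECASE)

def has_no_7a(path):
    # Worker: return the path if the marker is NOT found, else None.
    path = path.strip()
    with open(path, 'r') as fh:
        string = fh.read()
    return path if not test_7A_re.search(string) else None

if __name__ == '__main__':
    # path_list is the same list of 9,568 file paths as above.
    pool = Pool(processes=4)  # arbitrary pool size
    no7a = [p for p in pool.map(has_no_7a, path_list) if p is not None]
    pool.close()
    pool.join()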
I am tagging this 2.7 because I have no idea whether this behavior is consistent across versions.
To confirm this behavior I modified my code to run as a .py file, and added some timing code. I then rebooted my computer - the first time it ran it took 5.6 minutes and the second time (without rebooting) the time was 36 seconds. Output is the same in both cases.
The really interesting thing is that even if I shut down IDLE (but do not reboot my computer), it still takes only 36 seconds to run the code.
All of this suggests to me that the files are not read from disk after the first time - this is amazing behavior to me but it seems dangerous.
To be clear, the results are the same - I believe given the timing tests I have run and the fact that I do not hear the disk spinning that somehow the files are still accessible to Python.
This is caused by caching in Windows. It is not related to Python.
To stop Windows from caching your reads, you could:
Disable the paging file in Windows and fill the RAM up to 90%.
Use some tool to disable file caching in Windows, like this one.
Run your code in a Linux VM on your Windows machine with limited RAM; in Linux you can control the caching much better.
Make the files much bigger, so that they won't fit in the cache.
I fail to see why this is a problem. I'm not 100% certain of how Windows handles file cache invalidation, but unless the "Last modified time" changes, you and I and Windows would assume that the file still holds the same content. If the file holds the same content, I don't see why reading from cache can be a problem.
I'm pretty sure that if you change the last modified date, say, by opening the file for write access then closing it right away, Windows will hold sufficient doubts over the file content and invalidate the cache.
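For example, a minimal sketch of "touching" the files so their modification time changes, assuming (as above) that an mtime change is enough to make Windows doubt its cached copy:

import os

def touch(path):
    # Update the file's access and modification times to "now"
    # without rewriting its contents.
    os.utime(path, None)

for path in path_list:
    touch(path.strip())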

Why does running Django's session cleanup command kill my machine's resources?

I have a production site that has been running for a year, configured with the django.contrib.sessions.backends.cached_db session backend on top of a MySQL database. The reason I chose cached_db is a mix of security and read performance.
The problem is that the cleanup command, responsible for deleting all expired sessions, was never executed, resulting in a session table with 2.3 GB of data, 6 million rows and a 500 MB index.
When I try to run the ./manage.py cleanup command (in Django 1.3), or ./manage.py clearsessions (its Django 1.5 equivalent), the process never ends (or at least my patience doesn't last 3 hours).
The code that Django uses to do this is:
Session.objects.filter(expire_date__lt=timezone.now()).delete()
My first impression was that this is normal because the table has 6M rows, but after inspecting the system monitor I discovered that all of the memory and CPU was being used by the python process, not mysqld, exhausting my machine's resources. I think something is terribly wrong with this command's code. It seems that Python iterates over every expired session row it finds before deleting them one by one. In that case, refactoring the code to issue a raw DELETE FROM statement would solve my problem and help the Django community, right? But if that is the case, the QuerySet delete command is acting strangely and is not optimized, in my opinion. Am I right?
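For reference, a minimal sketch of the raw DELETE I have in mind, assuming the default django_session table name and accepting that it bypasses the ORM's per-object delete() logic and signals:

from django.db import connection
from django.utils import timezone

def clear_expired_sessions():
    # One DELETE statement executed on the database side;
    # nothing is fetched into Python.
    cursor = connection.cursor()
    cursor.execute(
        "DELETE FROM django_session WHERE expire_date < %s",
        [timezone.now()]
    )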

aws python boto: looking for reliable way to interrupt get_contents_to_filename

I have a python function that downloads a file from S3 to some temp location on a local drive and then processes it. The download part looks like this:
def processNewDataFile(key):
    ## templocation below is just some temp local path
    key.get_contents_to_filename(templocation)
    ## further processing
Here key is the AWS key for the file to download. What I've noticed is that occasionally get_contents_to_filename seems to freeze. In other parts of my code I have some solution that interrupts blocks of code (and raises an exception) if these blocks do not complete in a specified amount of time. This solution is hard to use here since files that I need to download vary in size a lot and sometimes S3 responds slower than other times.
So is there any reliable way of interrupting/timing out get_contents_to_filename that does NOT involve a hard predetermined time limit?
thanks
You could use a callback function with get_contents_to_filename:
http://boto.cloudhackers.com/en/latest/ref/gs.html#boto.gs.key.Key.get_contents_to_file
The callback function takes two parameters: the number of bytes transmitted so far and the total size of the file.
You can specify the granularity (the maximum number of times the callback will get called) as well, although I've only used it with small files (less than 10 KB) and it usually only gets called twice - once on start and once on end.
The important thing is that it will pass the size of the file to the callback function at the start of the transfer, which could then start a timer based on the size of the file.
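A hedged sketch of that idea: let the callback (passed via cb/num_cb) compute an allowed duration from the reported total size and raise if the transfer has been running far longer than that. The bytes-per-second threshold is an illustrative assumption, and note that the callback only fires while data is actually flowing, so a completely stalled socket may still need a socket-level timeout on top of this:

import time

class ProgressWatchdog(object):
    # boto invokes this with (bytes_transmitted, total_size) during the
    # transfer; raising here aborts get_contents_to_filename.
    def __init__(self, min_bytes_per_sec=50 * 1024):  # illustrative threshold
        self.start = None
        self.min_bytes_per_sec = min_bytes_per_sec

    def __call__(self, bytes_transmitted, total_size):
        now = time.time()
        if self.start is None:
            self.start = now
        allowed = max(30.0, float(total_size) / self.min_bytes_per_sec)
        if now - self.start > allowed:
            raise RuntimeError("S3 download exceeded %.0f seconds" % allowed)

def processNewDataFile(key):
    ## templocation below is just some temp local path
    key.get_contents_to_filename(templocation,
                                 cb=ProgressWatchdog(),
                                 num_cb=100)  # ask for more frequent callbacks
    ## further processing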

GAE Backend fails to respond to start request

This is probably a truly basic thing that I'm simply having an odd time figuring out in a Python 2.5 app.
I have a process that will take roughly an hour to complete, so I made a backend. To that end, I have a backend.yaml that has something like the following:
- name: mybackend
  options: dynamic
  start: /path/to/script.py
(The script is just raw computation. There's no notion of an active web session anywhere.)
On toy data, this works just fine.
This used to be public, so I would navigate to the page, the script would start, and it would time out after about a minute (HTTP timeout plus the 30s shutdown grace period, I assume). I figured this was a browser issue, so I repeated the same thing with a cron job. No dice. I then switched to using a push queue and adding a targeted task, since on paper it looks like it would wait for 10 minutes. Same thing.
All 3 time out after that minute, which means I'm not decoupling the request from the backend like I believe I am.
I'm assuming that I need to write a proper Handler for the backend to do work, but I don't exactly know how to write the Handler/webapp2 Route. Do I handle _ah/start/ or make a new endpoint for the backend? How do I handle the subdomain? It still seems like the wrong thing to do (I'm sticking a long-running process directly into a request of sorts), but I'm at a loss otherwise.
So the root cause ended up being the following code in the script itself:
models = MyModel.all()
for model in models:
    # Magic happens
I was basically taking for granted that the query would automatically batch my Query.all() over many entities, but it was dying at the 1000th entity or so. I originally wrote that the script was just raw computation because I completely ignored the fact that the reads can fail.
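For reference, a hedged sketch of what explicit batching with datastore cursors could have looked like under the old db API (the batch size is illustrative):

BATCH_SIZE = 200  # illustrative; small enough for one fetch to succeed

def process_in_batches():
    # MyModel is the same db.Model subclass as in the snippet above.
    query = MyModel.all()
    cursor = None
    while True:
        if cursor:
            query.with_cursor(cursor)
        batch = query.fetch(BATCH_SIZE)
        if not batch:
            break
        for model in batch:
            pass  # Magic happens, one bounded batch at a time
        cursor = query.cursor()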
The actual solution to the problem we wanted to solve ended up being "use the map-reduce library", since we were trying to look at each model for analysis.

RW-locking a Windows file in Python, so that at most one test instance runs per night

I have written a custom test harness in Python (existing stuff was not a good fit due to lots of custom logic). Windows task scheduler kicks it off once per hour every day. As my tests now take more than 2 hours to run and are growing, I am running into problems. Right now I just check the system time and do nothing unless hour % 3 == 0, but I do not like that. I have a text file that contains:
# This is a comment
LatestTestedBuild = 25100
# Blank lines are skipped too
LatestTestRunStartedDate = 2011_03_26_00:01:21
# This indicates that it has not finished yet.
LatestTestRunFinishDate =
Sometimes, when I kick off a test manually, it can happen at any time, including 12:59:59.99.
I want to remove race conditions as much as possible. I would rather put in some extra effort once and not worry about the practical probability of something happening. So, I think locking this text file atomically is the best approach.
I am using Python 2.7, Windows Server 2008R2 Pro and Windows 7 Pro. I prefer not to install extra libraries (Python has not been "sold" to my co-workers yet, but I could copy over a file locally that implements it all, provided the license permits it).
So, please suggest a good, bullet-proof way to solve this.
When you start running a test make a file called __LOCK__ or something. Delete it when you finish, using a try...finally block to ensure that it always gets cleared up. Don't run the test if the file exists. If the computer crashes or similar, delete the file by hand. I doubt you need more cleverness than that.
Are you sure you need 2 hours of tests?! I think 2 minutes is a more reasonable amount of time to spend, though I guess if you are running some complicated numerics you might need more.
Example code:
import os

if os.path.exists("__LOCK__"):
    raise RuntimeError("Already running.")  # or whatever

try:
    open("__LOCK__", "w").write("Put some info here if you want.")
    # ... run the tests here ...
finally:
    if os.path.exists("__LOCK__"):
        os.unlink("__LOCK__")
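If you also want the check-and-create step itself to be atomic (closing the small window between the os.path.exists check and the open above), here is a sketch using os.open with O_CREAT | O_EXCL, which fails if the file already exists; the file name and messages are just placeholders:

import os

LOCK_PATH = "__LOCK__"

def acquire_lock():
    # O_CREAT | O_EXCL makes creation fail atomically if the file is already
    # there, so two instances cannot both believe they own the lock.
    try:
        fd = os.open(LOCK_PATH, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError:
        raise RuntimeError("Already running.")
    os.write(fd, "Put some info here if you want.")
    os.close(fd)

def release_lock():
    if os.path.exists(LOCK_PATH):
        os.unlink(LOCK_PATH)

acquire_lock()
try:
    pass  # ... run the tests here ...
finally:
    release_lock()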
