Get original path from django filefield - python

My Django app accepts two files (in this case a jad and jar combo). Is there a way I can preserve the folders they came from?
I need this so I can check later that they came from the same path.
(And later on accept a whole load of files and be able to work out which came from the same folder).

I don't think that is possible. Most browsers (at least Firefox 3.0) do not expose the full path, so you cannot get it even from the JavaScript side.
If you could get the full path you could send it to the server, but I think you will have to be satisfied with the file name.
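For what it's worth, here is a minimal sketch of what a Django view actually receives for an uploaded file (the view and field names are made up for illustration); only the base filename is available, never the client-side folder:

from django.http import HttpResponse

def upload(request):
    # Field names ("jad_file", "jar_file") are illustrative.
    jad = request.FILES["jad_file"]
    jar = request.FILES["jar_file"]
    # Browsers submit only the base filename, so this is all the server ever sees.
    return HttpResponse("got %s and %s" % (jad.name, jar.name))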


Python: iNotify_Simple getting files from other directories

I'm using inotify_simple to get notifications from a directory of directories. I'm accessing a directory that has multiple sub-directories, looping through those sub-directories and wanting to use inotify within each directory. My goal for this program is to be notified anytime anything happens (in particular, a creation of a file) in one of the sub-directories.
My file structure:
-mainDir
    -subDir1
        -file1
        -file2
    -subDir2
        -file3
        -file4
    ...etc.
I'm looping through the directories in mainDir, and setting that path as the path for inotify to search in:
for directory in os.listdir(self.path):
    new_path = os.path.join(self.path, directory)
    new_curr = self.inotify.new_curr_file_notification(new_path)
New path values are exactly what I expect:
.../mainDir/subDir1
.../mainDir/subDir2
When passing new_path into my function (which is the path to give inotify), I'm expecting inotify to only look in that directory. However, I'm getting notifications caused by files in other directories.
path for inotify .../mainDir/subDir1
Event(wd=1, mask=256, cookie=0, name='someFileInSubDir2')
flags.CREATE
Does anyone know why this is happening? And, if anyone has any suggestions to make this process easier/better, I'm all ears! Thanks!
I'm the author of inotify_simple, and since it doesn't have a method called new_curr_file_notification, I'm guessing that's something you wrote. Without seeing that method, or some more code that demonstrates exactly how you're using the library, I unfortunately can't give you any advice; there's not enough information here to see how you're using inotify_simple.
If you post a complete example, I will probably be able to tell what's going wrong.
Feel free to post a bug report on the project's GitHub if it looks like there might be a bug.
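For context, here is a minimal sketch of how inotify_simple is typically used to watch several subdirectories with a single INotify instance; it is a generic example, not a reconstruction of the asker's new_curr_file_notification method, and the path is a placeholder. Each event carries the watch descriptor (wd) of the watch that produced it, so events from different subdirectories can be told apart:

import os
from inotify_simple import INotify, flags

main_dir = "/path/to/mainDir"  # placeholder path
inotify = INotify()
wd_to_path = {}

# One watch per subdirectory; the returned watch descriptor identifies it.
for name in os.listdir(main_dir):
    sub = os.path.join(main_dir, name)
    if os.path.isdir(sub):
        wd = inotify.add_watch(sub, flags.CREATE)
        wd_to_path[wd] = sub

# Map each event back to the directory it came from via its wd.
for event in inotify.read():
    print(wd_to_path[event.wd], event.name, flags.from_mask(event.mask))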

Sensible way to create filenames for files based on URLs?

I am screenshotting a bunch of web pages, using Python with Selenium. I want to save the PNGs locally for reference. The list of URLs looks something like this:
www.mysite.com/dir1/pageA
www.mysite.com/dir1/pageB
My question is about what filenames to give the screenshotted PNGs.
If I name the image files after the URL, e.g. www.mysite.com/dir1/pageA.png, the embedded slashes will inevitably cause problems at some point.
I could replace all the / characters in the URL with _, but I suspect that might cause problems too, e.g. if there are already _ characters in the URL. (I don't strictly need to be able to work backwards from the filename to the URL, but it wouldn't be a bad thing.)
What's a sensible way to handle the naming?
The easiest way to represent what's almost certainly a directory structure on the server is to do like wget does and replicate that structure on your local machine.
Thus the / characters become directory delimiters, and your www.mysite.com/dir1/pageA.png would become a PNG file called pageA.png in a directory called dir1, and dir1 is located in a directory called www.mysite.com.
It's simple, guaranteed to be reversible, and doesn't risk ambiguous results.
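As a rough sketch of that approach (the helper name and base directory are made up), the URL can be split into host and path components and mapped onto a local directory tree before saving the screenshot:

import os
from urllib.parse import urlparse

def local_path_for(url, base_dir="screenshots"):
    # "www.mysite.com/dir1/pageA" -> "screenshots/www.mysite.com/dir1/pageA.png"
    parsed = urlparse(url if "//" in url else "//" + url)
    path = os.path.join(base_dir, parsed.netloc, parsed.path.lstrip("/") + ".png")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return path

print(local_path_for("www.mysite.com/dir1/pageA"))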
What if you use '%2F'? It's the '/' character, URL-encoded.
source:
http://www.w3schools.com/tags/ref_urlencode.asp
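As a small sketch of that idea using the standard library, percent-encoding every unsafe character gives a flat filename that can still be decoded back to the original URL:

from urllib.parse import quote, unquote

url = "www.mysite.com/dir1/pageA"
filename = quote(url, safe="") + ".png"      # 'www.mysite.com%2Fdir1%2FpageA.png'
print(unquote(filename[:-len(".png")]))      # round-trips back to the URL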

How can I find if the contents in a Google Drive folder have changed

I am currently working on an app that syncs one specific folder in a user's Google Drive. I need to find out when any of the files/folders in that specific folder have changed. The actual syncing process is easy, but I don't want to do a full sync every few seconds.
I am considering one of these methods:
1) Monitor the changes feed and look for any file changes
This method is easy but it will cause a sync if ANY file in the drive changes.
2) Frequently request all files in the whole drive, e.g. service.files().list().execute(), and look for changes within the specific tree. This is a brute-force approach. It will be too slow if the user has thousands of files in their drive.
3) Start at the specific folder, and move down the folder tree looking for changes.
This method will be fast if there are only a few directories in the specific tree, but it will still lead to numerous API requests.
Are there any better ways to find out whether a specific folder and its contents have changed?
Are there any optimisations I could apply to methods 1, 2 or 3?
As you have correctly stated, you will need to keep (or work out) the file hierarchy for a changed file to know whether a file has changed within a folder tree.
There is no way of knowing directly from the changes feed whether a deeply nested file within a folder has been changed. Sorry.
There are a couple of tricks that might help.
Firstly, if your app is using drive.file scope, then it will only see its own files. Depending on your specific situation, this may equate to your folder hierarchy.
Secondly, files can have multiple parents. So when creating a file in folder-top/folder-1/folder-1a/folder-1ai, you could declare both folder-1ai and folder-top as parents. Then you simply need to check for folder-top.
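As a sketch of that second trick, assuming the v2 Drive Python client that the question appears to use (service is an authorised client and the folder IDs are placeholders), a file can be created with both its immediate folder and folder-top as parents, and the changes feed can then be filtered on folder-top:

FOLDER_1AI_ID = "..."   # the file's "real" location (placeholder ID)
FOLDER_TOP_ID = "..."   # the top of the synced tree (placeholder ID)

body = {
    "title": "file1.txt",
    "parents": [{"id": FOLDER_1AI_ID}, {"id": FOLDER_TOP_ID}],
}
service.files().insert(body=body).execute()

# When walking the changes feed, test each changed file for membership of
# the synced tree by looking for folder-top among its parents.
for change in service.changes().list().execute().get("items", []):
    f = change.get("file", {})
    if any(p.get("id") == FOLDER_TOP_ID for p in f.get("parents", [])):
        print("changed inside the synced tree:", f.get("title"))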

Python/Django: how to get files fastest (based on path and name)

My website users can upload image files, which then need to be found whenever they are to be displayed on a page (using src = ""). Currently, I put all images into one directory. What if there are many files - is it slow to find the right file? Are they indexed? Should I create subdirectories instead?
I use Python/Django. Everything is on webfaction.
The access time for an individual file is not affected by the quantity of files in the same directory.
Running ls -l on a directory with more files in it will of course take longer, as will viewing that directory in a file browser. It might be easier to work with these images if you store them in a subdirectory named after the user, but that just depends on what you are going to be doing with them. There is no technical reason to do so.
Think about it like this. The full path to the image file (/srv/site/images/my_pony.jpg) is the actual address of the file. Your web server process looks there, and returns any data it finds or a 404 if there is nothing. What it doesn't do is list all the files in /srv/site/images and look through that list to see if it contains an item called my_pony.jpg.
If only for organizational purposes, and to help with system maintenance, you should create subdirectories. Beyond that, there is very little chance you'll run into the maximum number of files that a directory can hold.
There is negligible performance implication for the web. For other applications though (file listing, ftp, backup, etc.) there may be consequences, but only if you reach a very large number of files.
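If you do go the subdirectory route in Django, a minimal sketch (model and field names are made up) is to pass an upload_to callable to the FileField so uploads are sharded into per-user subdirectories:

import os
import uuid
from django.conf import settings
from django.db import models

def user_directory_path(instance, filename):
    # e.g. MEDIA_ROOT/uploads/<username>/<random hex><ext>
    ext = os.path.splitext(filename)[1]
    return os.path.join("uploads", instance.owner.username, uuid.uuid4().hex + ext)

class UserImage(models.Model):
    owner = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    image = models.FileField(upload_to=user_directory_path)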

Good way (and/or platform-independent place) to cache data from web

My pygtk program is an editor for XML-based documents which reference other documents, possibly online, that may in turn reference further documents.
When I load a file, the references are resolved and the documents loaded (already asynchronously). However, this process repeats every time I start the editor, so I want some local caching to save bandwidth and time for both the user and the server hosting the referenced documents.
Are there any typical ways this is done? My idea so far would be:
1) Get a path to a cache directory somehow (platform-independent).
2) Put a file named md5(url) there.
3) If a cache file already exists and it's not older than $cache_policy_age, take it; otherwise use HTTP (can urllib do that?) to check whether it has been modified since it was downloaded.
Any ideas?
Personally, I use os.path.expanduser to find a good place for caches. It's quite common in a Unix environment for the current user's config/cache to live under their home directory, in a directory whose name starts with a dot, making it a "hidden" directory.
I would do something like:
directory = os.path.join(os.path.expanduser("~"), ".my_cache")
As for the modification date of the remote file, you can use urllib:
import urllib
u = urllib.urlopen("http://www.google.com")
u.info().get("last-modified")
However, you should check that your HTTP server actually provides the Last-Modified header and that it is a sensible value! (This is not always the case.)
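Putting the pieces together, here is a rough sketch of the cache check described above, written against Python 3's urllib.request (the cache directory name and age policy are arbitrary). It names the cache file after md5(url) and uses an If-Modified-Since request so the server can answer 304 Not Modified instead of resending the document:

import hashlib
import os
import time
import urllib.error
import urllib.request
from email.utils import formatdate

CACHE_DIR = os.path.join(os.path.expanduser("~"), ".my_cache")
CACHE_POLICY_AGE = 24 * 60 * 60  # one day, arbitrary

def fetch(url):
    os.makedirs(CACHE_DIR, exist_ok=True)
    cache_file = os.path.join(CACHE_DIR, hashlib.md5(url.encode()).hexdigest())

    if os.path.exists(cache_file):
        if time.time() - os.path.getmtime(cache_file) < CACHE_POLICY_AGE:
            with open(cache_file, "rb") as f:   # fresh enough, use the cache
                return f.read()
        # Ask the server whether the document changed since we cached it.
        request = urllib.request.Request(url, headers={
            "If-Modified-Since": formatdate(os.path.getmtime(cache_file), usegmt=True)})
        try:
            response = urllib.request.urlopen(request)
        except urllib.error.HTTPError as err:
            if err.code == 304:                 # not modified, cache is still valid
                with open(cache_file, "rb") as f:
                    return f.read()
            raise
    else:
        response = urllib.request.urlopen(url)

    data = response.read()
    with open(cache_file, "wb") as f:
        f.write(data)
    return data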
