I am using Pythonista on iOS, although I hope that does not matter. Some library calls require a path to a JSON file in order to render the content into a form/user interface. However, as far as I can see, there is no API to render the JSON data from a variable. I can read in the JSON data and write it out again as a file and use that file, and that all works correctly. However, I would like some kind of virtual filename that points to a file object in memory which I can pass to the function, so that the function being called is oblivious to the fact that the path I have provided is really an in-memory file handle. I have searched here, but it seems this subject is not dealt with well, or I have searched incorrectly. I would imagine this functionality is very sought after.
Related
I have a python function (using the Pythonista app) to show an image in the console. I have the image saved in a BytesIO object but the function requires a file path.
Is there any way to give it a path to the BytesIO object, or somehow give it the image without needing to save it as a file?
The specific function is console.show_image(image_path)
The general answer is that if the function you call expects a filesystem path and cannot handle a file-like object instead, then your only solution is to write your data to a file (and ask the function's author to add support for file-like objects, or, if it's OSS, implement it yourself and send a merge request).
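As a workaround sketch under the question's assumptions (buf stands in for the BytesIO object; the .png suffix is illustrative), you can spill the in-memory data into a temporary file and hand that file's path to the function:

import tempfile
import console  # Pythonista-only module, taken from the question above

# buf is assumed to be the io.BytesIO object that already holds the image data.
with tempfile.NamedTemporaryFile(suffix='.png', delete=False) as tmp:
    tmp.write(buf.getvalue())
    image_path = tmp.name

# The called function only ever sees an ordinary filesystem path.
console.show_image(image_path)

The temporary file can be removed with os.remove(image_path) once the call returns.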
I am looking for the method Django uses to generate a unique filename when we upload a file.
For example, if I upload a file called test.csv twice into the same directory, the first one will be saved as test.csv and the second file will be saved as test_2.csv. I already tried to find out how Django manages that, but I only found django.utils.text.get_valid_filename, which could be useful, but that's not what I am looking for...
I already saw other topics with random-naming solutions; that's not what I am looking for here :) I am really trying to understand how Django manages that problem.
I actually took a closer look with your help and found something :)
So basically I have to do something like:
from django.core.files.storage import FileSystemStorage
fss = FileSystemStorage()
filepath = fss.get_available_name(filepath)
Thank you all :)
PS: If you are interested, the comment from django.core.files.storage.FileSystemStorage._save says:
There's a potential race condition between get_available_name and
saving the file; it's possible that two threads might return the
same name, at which point all sorts of fun happens. So we need to
try to create the file, but if it already exists we have to go back
to get_available_name() and try again.
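A rough sketch of the retry described in that comment (not the Django source; the helper name is hypothetical): create the file with O_EXCL so a concurrent thread that grabbed the same name is detected, then pick a new name and try again.

import errno
import os


def save_with_retry(storage_root, name, content, pick_new_name):
    # pick_new_name is a hypothetical helper that returns the next free
    # candidate (e.g. test_2.csv), in the spirit of get_available_name().
    while True:
        path = os.path.join(storage_root, name)
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
        except OSError as e:
            if e.errno == errno.EEXIST:
                name = pick_new_name(storage_root, name)
                continue
            raise
        with os.fdopen(fd, 'wb') as f:
            f.write(content)
        return name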
If you look at the implementation of the class django.core.files.storage.Storage, you will see how Django 1.6 manages file names.
Look at the save method of this class. In it, the line
name = self.get_available_name(name)
is doing the trick.
This is the default implementation for finding a new file name before saving the file. If you want to write your own version (for example, so the existing file gets overwritten instead), then consider writing your own custom storage system.
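As a rough illustration (not the exact Django source), get_available_name essentially keeps appending a counter to the name until it finds one that does not exist yet:

import itertools
import os


def get_available_name(storage_root, name):
    # Simplified sketch of Storage.get_available_name's behaviour:
    # try test.csv, then test_2.csv, test_3.csv, ... until a name is free.
    root, ext = os.path.splitext(name)
    candidate = name
    for counter in itertools.count(2):
        if not os.path.exists(os.path.join(storage_root, candidate)):
            return candidate
        candidate = '%s_%d%s' % (root, counter, ext)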
Actually, you were on the right track.
From the docs,
Internally, Django uses a django.core.files.File instance any time it
needs to represent a file.
And also,
Behind the scenes, Django delegates decisions about how and where to
store files to a file storage system
Which means that, when the file is uploaded using the default storage (FileSystemStorage), Django delegates the naming (or rather, finding an available name) behind the scenes to the storage, which then uses get_available_name(name).
So, if you want to change the way files are named when they are uploaded, you need to add a custom file storage, which would basically only override get_available_name. The documentation on the matter is here.
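For example, a minimal custom storage that overwrites an existing file instead of renaming the new one could look roughly like this (a sketch; the class name is mine, and newer Django versions pass an extra max_length argument):

from django.core.files.storage import FileSystemStorage


class OverwriteStorage(FileSystemStorage):
    # Sketch: keep the requested name by deleting any existing file first.
    def get_available_name(self, name, max_length=None):
        if self.exists(name):
            self.delete(name)
        return name

You would then point Django at it either through the DEFAULT_FILE_STORAGE setting or the storage argument of a FileField.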
I understand that opening a file just creates a file handle that takes a fixed amount of memory, irrespective of the size of the file.
Django has a type called InMemoryUploadedFile that represents files uploaded via forms.
I get the handle to my file object inside the Django view like this:
file_object = request.FILES["uploadedfile"]
This file_object has type InMemoryUploadedFile.
Now we can see for ourselves that file_object has the method .read(), which is used to read files into memory.
bytes = file_object.read()
Wasn't file_object of type InMemoryUploadedFile already "in memory"?
The read() method on a file object is a way to access content from within a file object, irrespective of whether that file is held in memory or stored on disk. It is similar to other file access methods like readlines() or seek().
The behaviour is the same as that of Python's built-in file objects, which in turn sit on top of the C library's fread() function.
Read at most size bytes from the file (less if the read hits EOF
before obtaining size bytes). If the size argument is negative or
omitted, read all data until EOF is reached. The bytes are returned as
a string object. An empty string is returned when EOF is encountered
immediately. (For certain files, like ttys, it makes sense to continue
reading after an EOF is hit.) Note that this method may call the
underlying C function fread() more than once in an effort to acquire
as close to size bytes as possible. Also note that when in
non-blocking mode, less data than was requested may be returned, even
if no size parameter was given.
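To illustrate (a small sketch, not Django-specific), read() behaves the same whether the underlying object lives in memory or on disk:

import io
import tempfile

data = b"hello world"

# An in-memory file-like object...
mem_file = io.BytesIO(data)
print(mem_file.read(5))   # b'hello'
print(mem_file.read())    # b' world' -- the rest of the buffer
print(mem_file.read())    # b'' -- empty bytes once EOF is reached

# ...and a real file on disk expose the same read() semantics.
with tempfile.TemporaryFile() as disk_file:
    disk_file.write(data)
    disk_file.seek(0)         # rewind before reading back
    print(disk_file.read(5))  # b'hello'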
On the question of where exactly the InMemoryUploadedFile is stored, it is a bit more complicated.
Before you save uploaded files, the data needs to be stored somewhere.
By default, if an uploaded file is smaller than 2.5 megabytes, Django
will hold the entire contents of the upload in memory. This means that
saving the file involves only a read from memory and a write to disk
and thus is very fast.
However, if an uploaded file is too large, Django will write the
uploaded file to a temporary file stored in your system’s temporary
directory. On a Unix-like platform this means you can expect Django to
generate a file called something like /tmp/tmpzfp6I6.upload. If an
upload is large enough, you can watch this file grow in size as Django
streams the data onto disk.
These specifics – 2.5 megabytes; /tmp; etc. – are simply “reasonable
defaults”. Read on for details on how you can customize or completely
replace upload behavior.
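A hedged sketch of what this means inside a view (the field name comes from the question; the view body is illustrative): depending on the upload size, request.FILES hands you either an object held entirely in memory or a thin wrapper around a temporary file on disk, and read() works the same on both.

from django.core.files.uploadedfile import (
    InMemoryUploadedFile,
    TemporaryUploadedFile,
)
from django.http import HttpResponse


def upload_view(request):
    uploaded = request.FILES["uploadedfile"]

    if isinstance(uploaded, TemporaryUploadedFile):
        # Large upload: the data was streamed to a temp file on disk.
        location = uploaded.temporary_file_path()
    elif isinstance(uploaded, InMemoryUploadedFile):
        # Small upload (below FILE_UPLOAD_MAX_MEMORY_SIZE): kept in memory.
        location = "memory"

    data = uploaded.read()  # identical API in both cases
    return HttpResponse("received %d bytes (stored in %s)" % (len(data), location))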
One thing to consider is that in Python, file-like objects have an API that is pretty strictly adhered to. This allows code to be very flexible: they are abstractions over I/O streams, and they let your code avoid worrying about where the data is coming from, i.e. memory, filesystem, network, etc.
File-like objects usually define a couple of methods, one of which is read.
I am not sure of the actual implementation of InMemoryUploadedFile, or how they are generated or where they are stored (I am assuming they are held entirely in memory, though), but you can rest assured that they are file-like objects and provide a read method, because they adhere to the file API.
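A tiny illustration of that flexibility (the function and names are mine): code written against the file API does not care what kind of object it receives.

import io


def count_bytes(fileobj):
    # Works with anything exposing read(): disk files, BytesIO, uploads, ...
    total = 0
    while True:
        chunk = fileobj.read(8192)
        if not chunk:
            break
        total += len(chunk)
    return total


print(count_bytes(io.BytesIO(b"abc" * 100)))  # an in-memory object
with open(__file__, "rb") as f:
    print(count_bytes(f))                     # a real file on disk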
For the implementation you could start checking out the source:
https://github.com/django/django/blob/master/django/core/files/uploadedfile.py#L90
https://github.com/django/django/blob/master/django/core/files/base.py
https://github.com/django/django/blob/master/django/core/files/uploadhandler.py
I'm trying to open a logfile which is kept open by another process and remove the first few lines.
On Unix I'd simply do a os.open('/tmp/file.log', os.O_NONBLOCK) and that would get me closer to my goal.
Now I'm stuck with Windows and I need to rotate this log somehow without terminating the application that is holding the file. Is this even possible?
At first I considered opening a file handle at the location where the application expected the log to be and just acting as a pipe into a file handle in Python, but I couldn't find any way of doing that on Windows either.
I also thought of just moving the file on a regular basis and letting the application recreate it, but since the file is being held open by another process that doesn't do much good.
I thought of O_SHLOCK as well, but then again, that's Unix and not Windows.
So I went for mmap'ing the file, hoping that it would make things a bit more flexible, but that led me nowhere.
import mmap
import contextlib
import time

with open(r'test.log', 'r+') as f:
    with contextlib.closing(mmap.mmap(f.fileno(), 0)) as m:
        while 1:
            line = m.readline()
            if len(line) > 0:
                print(line)
            time.sleep(0.5)
This results in the application not being able to access the file because Python is holding it (and vice versa).
I came to think of signal.SIGHUP, but that doesn't exist on Windows either, so back to square one.
I'm stuck and I've tried it all; can Python help me here, or do I need to switch languages?
Even if the application opens the file as a shared object, Python can't, so they can't get along by the looks of it.
It's not so bad :). You can (in fact, have to) open the file using CreateFile, as pointed out by Augusto, and you can use the standard ctypes module for this. The question Using a struct as a function argument with the python ctypes module shows how to do it.
Then you have to associate a C run-time file descriptor with the operating-system file handle you obtained in the previous step. You can use _open_osfhandle from the MS C run-time library (CRT) to do this; you can call it via ctypes once again, accessing it as ctypes.cdll.msvcrt._open_osfhandle.
Finally you have to associate a Python file object with the C run-time file descriptor obtained in the previous step. In Python 3 you simply pass the file descriptor as the first argument to the built-in open function. According to the docs,
file is either a string or bytes object giving the pathname (absolute or relative to the current working directory) of the file to
be opened or an integer file descriptor of the file to be wrapped.
In Python 2 you have to use os.fdopen; according to the docs, its task is to
Return an open file object connected to the file descriptor fd
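Putting the three steps together, a rough, untested sketch for Python 3 on Windows might look like this (the constants are copied from the Win32 headers; the standard msvcrt module's open_osfhandle is used here as a convenience wrapper around the CRT call mentioned above):

import ctypes
import msvcrt  # standard-library CRT wrapper; exposes open_osfhandle
import os
from ctypes import wintypes

GENERIC_READ = 0x80000000
FILE_SHARE_READ = 0x00000001
FILE_SHARE_WRITE = 0x00000002
FILE_SHARE_DELETE = 0x00000004
OPEN_EXISTING = 3

kernel32 = ctypes.windll.kernel32
kernel32.CreateFileW.restype = wintypes.HANDLE
kernel32.CreateFileW.argtypes = (
    wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD,
    wintypes.LPVOID, wintypes.DWORD, wintypes.DWORD, wintypes.HANDLE)
INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value

# Step 1: open the file with CreateFile, explicitly allowing other
# processes to keep reading, writing and deleting it.
handle = kernel32.CreateFileW(
    r'test.log', GENERIC_READ,
    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
    None, OPEN_EXISTING, 0, None)
if handle == INVALID_HANDLE_VALUE:
    raise ctypes.WinError()

# Step 2: associate a C run-time file descriptor with the OS handle.
fd = msvcrt.open_osfhandle(handle, os.O_RDONLY)

# Step 3: wrap the descriptor in a Python file object (Python 3's open
# accepts an integer file descriptor directly).
with open(fd, 'r') as log:
    print(log.readline())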
All of the above should not be required for such a simple thing. There's hope it will become much simpler once CPython's implementation on Windows starts using the native Windows API for files instead of going through the C run-time library, which does not give access to many features of the Windows platform. See the Add new io.FileIO using the native Windows API issue for details.
Do you have any control over the application generating the logfile? Because depending on the way the file is opened by that application, you really can't modify it.
This link may seem off-topic here, but deep down in Windows, what determines the file access available to other applications is the dwShareMode parameter of the CreateFile function: http://msdn.microsoft.com/en-us/library/windows/desktop/aa363858%28v=vs.85%29.aspx
The application should enable FILE_SHARE_WRITE and possibly FILE_SHARE_DELETE, and it should also flush and update the file position every time it writes to the file. Looking at the Python documentation for open(), there is no such detailed parameter.
So, I'm trying to create a Google App Engine (Python) app that allows people to share files. I have file uploads working well, but my concern is about checking the file extension and making sure, primarily, that the files are read-only, and secondly, that they are of the file type that is specified. These will not be image files, as I know there are a lot of image resources already. Specifically, .stl mesh files, but I'd like to be able to do this more generally.
I know there are modules that can do this (python-magic seems to be able to, for example), but I can't seem to find any that I'm able to import without hitting LoadModuleRestricted. I'm considering writing my own parser, but that would be a lot of work for such a common (I'm assuming) issue.
Anyway, I'm totally stumped, so this is my first Stack Overflow question; I hope I'm doing well etiquette-wise. Let me know, and thanks!
It sounds like you want to read the first few bytes of the uploaded file to verify that its signature matches the purported MIME type. Assuming that you're uploading to the blobstore (i.e., via a URL obtained from blobstore.get_upload_url()), then once you're redirected to the upload handler whose path you gave to get_upload_url, you can open the blob using a BlobReader, then read and verify the signature.
The Blobstore sample app lays out the framework. You'd glue in code in UploadHandler once you have blob_info (using blob_info.key() to open the blob).
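A hedged sketch of that glue code (the form field name, the redirect path, and the "solid" check for ASCII STL files are my assumptions; binary STL has no fixed magic number, so this is only a heuristic):

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers


class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        blob_info = self.get_uploads('file')[0]

        # Read only the first few bytes of the blob to inspect its signature.
        reader = blobstore.BlobReader(blob_info.key())
        header = reader.read(16)

        # ASCII STL files conventionally begin with the word "solid".
        if not header.lstrip().lower().startswith('solid'):
            blobstore.delete(blob_info.key())
            self.error(400)
            return

        self.redirect('/serve/%s' % blob_info.key())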