Put Django file object into Tika server - Python

In my project I receive multiple files using request.FILES.getlist('filedname') and save them using the Django form's save method. I then read the same files back using the Python Tika server API:
from tika import parser

def read_by_tika(self, path):
    '''Read a file's contents using the Tika server.'''
    parsed = parser.from_file(str(path))
    contents = parsed["content"].encode('utf-8')
    return contents
Is there any way to pass the files from request.FILES directly to the Tika server without saving them to disk?

If the files are small, try Tika's .from_buffer() with file.read(). However, files larger than 2.5 MB are written to temporary files by Django anyway (see "Where uploaded data is stored"); in that case, use read_by_tika(file.temporary_file_path()). See also the file upload settings.
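A minimal sketch covering both cases (the view function is a placeholder, reusing the 'filedname' field from the question):

from tika import parser

def parse_uploads(request):
    contents = []
    for f in request.FILES.getlist('filedname'):
        if hasattr(f, 'temporary_file_path'):
            # Large upload: Django has already written it to a temp file on disk.
            parsed = parser.from_file(f.temporary_file_path())
        else:
            # Small upload kept in memory: hand the raw bytes straight to Tika.
            parsed = parser.from_buffer(f.read())
        contents.append(parsed["content"])
    return contents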

Related

Is it possible to use Django to parse locally stored files (csv for example) without uploading them?

I would like to develop a web app that parses locally stored data and allows users to create a sorted Excel file.
It would be amazing if I could somehow avoid uploading the files. Some users are worried about their data, and the files can get really big, so I would have to implement async processes and so on...
Is something like that possible?

Streaming a zip file in Python3

In my Django project, I have large zip files I need to send to a user for them to download.
These files can be up to 10 GB in size.
The files are already in zip format, and I don't want to extract them.
Instead, I would like to send, say, 100 MB of the file at a time, wait for the frontend to receive it, and then ask for the next 100 MB until the file has been completely read. I can handle the collection and download on the frontend, but how can I perform this request in Django?
Disclaimer: I'm a frontend developer, and very new to Python.
Have you tried https://github.com/travcunn/django-zip-stream? It looks like what you may be looking for.
Example Use Case:
from django_zip_stream.responses import TransferZipResponse

def download_zip(request):
    # Files are located at /home/travis but Nginx is configured to serve from /data
    files = [
        ("/chicago.jpg", "/data/home/travis/chicago.jpg", 4096),
        ("/portland.jpg", "/data//home/travis/portland.jpg", 4096),
    ]
    return TransferZipResponse(filename='download.zip', files=files)
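If pulling in a third-party package isn't an option, a plain Django view can also stream an existing zip from disk in fixed-size chunks; a rough sketch, with the path and chunk size as placeholders:

from django.http import StreamingHttpResponse

def stream_zip(request):
    path = '/data/archives/download.zip'   # placeholder path to the existing zip
    chunk_size = 100 * 1024 * 1024         # ~100 MB per chunk, adjust as needed

    def file_iterator():
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk

    response = StreamingHttpResponse(file_iterator(), content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename="download.zip"'
    return response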

What's the best way to handle large file uploads with aiohttp (server)

I'm currently working on a kind of cloud file manager. You can basically upload your files through a website. This website is connected to a Python backend which will store your files and manage them using a database. It will basically put each of your files inside a folder and rename it with its hash. The database will associate the name of the file and its categories (kind of like folders) with the hash so that you can retrieve the file easily.
My problem is that I would like the file upload to be really user friendly: I have quite a bad connection, and when I try to download or upload a file on the internet I often run into problems like the upload failing at 90% and having to be restarted. I'd like to avoid that.
I'm using aiohttp to achieve this; how could I allow a file to be uploaded in multiple chunks? What should I use to upload large files?
In a previous code which managed really small files (less than 10 MB), I was using something like this:
data = await request.post()
data['file'].file.read()
Should I continue to do it this way for large files?
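request.post() buffers the whole body, so for large files a common alternative is aiohttp's streaming multipart reader, which writes the upload to disk chunk by chunk; a rough sketch, with the destination path as a placeholder:

from aiohttp import web

async def handle_upload(request):
    reader = await request.multipart()
    field = await reader.next()              # assumes a single file field
    filename = field.filename or 'upload.bin'

    size = 0
    with open('/tmp/' + filename, 'wb') as f:   # placeholder destination
        while True:
            chunk = await field.read_chunk()    # 8 KB by default
            if not chunk:
                break
            size += len(chunk)
            f.write(chunk)

    return web.Response(text='stored %s (%d bytes)' % (filename, size))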

Saving images from Django database in a management command

I've found out how to load images from the file system into Django, but how do you get them out?
I've figured out how to get stuff from the database in my management command and can do a query like:
for m in my_models.objects.filter(get=some):
    image = m.image
    # How do I copy this to another non-Django location?
Images (or other FileFields) are actually regular files stored somewhere (a file system, or block storage like S3). Assuming your files reside on the server's file system, you can use the path property to get the original file path and use Python's shell utilities to copy it to another location:
import shutil

for m in my_models.objects.filter(get=some):
    image = m.image
    shutil.copy(image.path, '/var/tmp/')
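If the storage backend isn't the local file system (S3, for example), FieldFile.path raises NotImplementedError; a sketch of a backend-agnostic copy, keeping the destination directory from the example above:

import os
import shutil

for m in my_models.objects.filter(get=some):
    image = m.image
    dest = os.path.join('/var/tmp/', os.path.basename(image.name))
    image.open('rb')                     # works regardless of the storage backend
    with open(dest, 'wb') as dst:
        shutil.copyfileobj(image, dst)   # stream the contents to the destination
    image.close()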

Uploading a file to GCS in GAE (Python)

I'm uploading files to my buckets in GCS through GAE
upload_url = blobstore.create_upload_url('/upload', gs_bucket_name='my_bucket')
as described in the documentation and this question
Everything works fine, but when I try to read the contents, I find that the filename has been changed to a key value such as:
L2FwcGhvc3RpbmdfcHJvZC9ibG9icy9BRW5CMlVwOW93MmJzVWRyZ2RQSHJpMlNhMkZNUkloYm9xcnZnZlFzNEZCYnpWaGNENGkROOFk5b2pHSHBMcDIwcGVrVFZtYzdROHRDRWFpdy50YTNpMFdpNmNCQU9NU0xt
Is there any way to get the uploaded name of the file?
Thanks in advance
see: https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_the_Blobstore_API_with_Google_Cloud_Storage
"In your upload handler, you need to process the returned FileInfo metadata and explicitly store the GCS filename needed to retrieve the blob later."
More on FileInfo: https://cloud.google.com/appengine/docs/python/blobstore/fileinfoclass
and I think this question is similar to How to get FileInfo/gcs file_name for files uploaded via BlobstoreUploadHandler?
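A minimal sketch of such an upload handler (the FileRecord model is just an illustration of where you might persist the mapping):

from google.appengine.ext.webapp import blobstore_handlers

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        file_info = self.get_file_infos()[0]     # metadata for the uploaded file
        original_name = file_info.filename       # the name the user uploaded
        gcs_name = file_info.gs_object_name      # '/gs/bucket/...' name GCS stored it under
        # Persist the mapping so the file can be looked up by its original name later, e.g.:
        # FileRecord(filename=original_name, gs_object_name=gcs_name).put()
        self.redirect('/')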
