Generic way to display any file from GridFS


Given a file from GridFS, I'd like to be able to display it on a webpage.
The files in my database can be of any common type, including jpgs, pngs, xml, txt, csv, etc.
A user should be able to click on the name of a file and have it displayed in a new tab, whether it is an image or a text file, or click download and get the file with its original extension.
The application is in Python. I have seen some solutions on here, but they require reading the bytes into a buffer, concatenating, and formatting image markup with the bytes as a base64 string. They also require the programmer to know the file's extension and to handle and format each extension case separately.
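
One way to avoid per-extension handling is to let the browser do the rendering: send the raw bytes together with a Content-Type derived from the stored filename, and use Content-Disposition to choose between displaying (inline) and downloading (attachment). Below is a minimal sketch using only the standard library. The function name response_headers is hypothetical, and the sketch assumes the original filename was stored with the file in GridFS; reading the bytes from GridFS and wiring this into a web framework are left out.

```python
import mimetypes


def response_headers(filename, download=False):
    """Build HTTP headers for serving a stored file by its original name."""
    # Guess the MIME type from the extension; fall back to a generic binary type.
    content_type, _encoding = mimetypes.guess_type(filename)
    if content_type is None:
        content_type = "application/octet-stream"

    # "inline" asks the browser to display the file, "attachment" to save it.
    disposition = "attachment" if download else "inline"
    safe_name = filename.replace('"', "")  # keep the quoted header value valid
    return {
        "Content-Type": content_type,
        "Content-Disposition": f'{disposition}; filename="{safe_name}"',
    }


print(response_headers("photo.jpg"))
print(response_headers("report.csv", download=True))
```

The response body would then be the unmodified bytes of the stored file, so no base64 encoding or per-type markup is needed. Whether a given type is actually shown in the tab or downloaded anyway is up to the browser.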

Related

Python Camelot - export one PDF file to one converted file

Python 3.7 with Camelot 0.7.3.
By default, Camelot exports separate converted files for each page of the pdf file. I need it so that one pdf file exports to one converted file (HTML conversion is what we use), regardless of how many pages the pdf file is. The documentation does not cover this scenario. Is there a way to achieve this without using compress=true? A zip file will not work in our application.
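
One possible approach (a sketch, not something the Camelot documentation prescribes) is to skip the per-table export and build the single file yourself: read all pages, turn each table's DataFrame into an HTML fragment, and write the fragments into one document. The helper names combine_html and pdf_to_single_html are hypothetical; the sketch assumes camelot.read_pdf with pages="all" and the .df attribute on each table behave as in Camelot 0.7.x.

```python
def combine_html(fragments, title="Converted tables"):
    """Wrap several HTML table fragments in a single HTML document."""
    body = "\n<hr>\n".join(fragments)
    return (
        "<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\">"
        f"<title>{title}</title></head>\n<body>\n{body}\n</body>\n</html>\n"
    )


def pdf_to_single_html(pdf_path, out_path):
    """Extract tables from every page and write them to one HTML file."""
    import camelot  # third-party; imported lazily so combine_html works alone

    tables = camelot.read_pdf(pdf_path, pages="all")
    # Each table exposes its data as a pandas DataFrame via .df
    fragments = [table.df.to_html() for table in tables]
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write(combine_html(fragments))
```

This produces one HTML file per PDF without going through a zip archive.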

Store pkl / binary in MetaData

I am writing a function that is supposed to store a text representation of a custom class object, cl.
I have some code that writes to a file and takes the necessary information out of cl.
Now I need to go backwards: read the file and return a new instance of cl. The problem is that the file doesn't keep all of the important parts of cl, because some of them are unnecessary for the purposes of this text document.
A .jpg file allows you to store metadata like shutter speed and location. I would like to store the parts of cl that are not supposed to be in the text portion in the metadata of a .txt or .csv file. Is there a way to explicitly write something to the metadata of a text file in Python?
Additionally, would it be possible to write the pickled (.pkl) byte representation of the entire object in the metadata?

Text files don't have metadata in the same way that a jpg file does. A jpeg file is specifically designed to have ways of including metadata as extra structured information in the image. Text files aren't: every character in the text file is generally displayed to the user.
Similarly, everything in a CSV file is part of one cell in the table represented by the file.
That said, there are some things similar to text file metadata that have existed over the years, or still exist, that might give you some ideas. I don't think any of these is ideal, but I'll give some examples to show how complex the area of metadata is and what people have done in similar situations.
Some filesystems have metadata associated with each file that can be extended. As an example, NTFS has streams; HFS and HFS+ have resource forks or other attributes; Linux has extended attributes on most of its filesystems. You could potentially store your pickle information in that filesystem metadata. There are disadvantages. Some filesystems don't have this metadata. Some tools for copying and manipulating files will not recognize (or will intentionally strip) metadata.
You could have a .txt file and a .pkl file, where the .txt file contains your text representation and the .pkl file contains the other information.
Back in the day, some DOS programs would stop reading a text file at a DOS EOF (decimal character 26). I don't think anything behaves like that anymore, but it's an example of a file format that allowed you to end the file and then still have extra data that programs could use.
With a format like HTML or an actual spreadsheet instead of CSV, there are ways you could include things in metadata easily.
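
Of the options above, the pair of files is probably the simplest to get working. A minimal sketch, assuming the object is picklable; the function names save_with_sidecar and load_from_sidecar are made up for illustration:

```python
import pickle
from pathlib import Path


def save_with_sidecar(obj, text, txt_path):
    """Write the readable text to txt_path and the full object next to it."""
    txt_path = Path(txt_path)
    txt_path.write_text(text, encoding="utf-8")
    # Same name, different suffix: report.txt -> report.pkl
    txt_path.with_suffix(".pkl").write_bytes(pickle.dumps(obj))


def load_from_sidecar(txt_path):
    """Rebuild the object from the .pkl file that accompanies txt_path."""
    return pickle.loads(Path(txt_path).with_suffix(".pkl").read_bytes())
```

The obvious weakness is that the two files can get separated, and unpickling should only be done on files you trust, since loading a pickle can run arbitrary code.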

the difference between .bin file and .mat files

Can TensorFlow read a file containing normal images, for example JPG, or can it only read a .bin file that contains images?
What is the difference between a .mat file and a .bin file?
Also, when I rename a .bin file to .mat, does the data in the file change?
Sorry if my language is not clear; I cannot speak English very well.

A file-name suffix is just a suffix (which sometimes helps to get information about the file; e.g. Windows uses it to decide which tool is called on double-click). A suffix does not need to be correct. And of course, changing the suffix will not change the content.
Every format needs its own decoder: JPG, PNG, MAT and so on.
To some extent, the right decoder is picked automatically by reading the file's metadata (which involves some assumptions!). Many image tools have an imread function that works for jpg and png even if there is no suffix (because it checks for common, supported image formats).
I'm not sure what tensorflow does automatically, but:
jpg, png, bmp should be no problem
worst-case: use scipy to read and convert
mat is usually a matrix (which can be encoded in many different ways) and is often MATLAB-based
scipy can read many matlab-based formats
bin can be anything (usually stands for binary; no clear mapping like the above)
Don't get me wrong, but I expect someone trying to use TensorFlow (neither a small nor a simple tool) to know that changing a suffix will never magically transform the content into the new format (especially across the lossless/lossy divide, as with png and jpg). I hope you have evaluated this decision and are not running blindly into using a popular tool.
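
To make the point about suffixes concrete, here is a small standard-library sketch. It writes a file that begins with the PNG signature, names it .bin, renames it to .mat, and shows that the bytes are identical and that a check of the leading bytes still says PNG. The helper names sniff_format and sha256_of are hypothetical, and the signature table only covers the formats mentioned in this answer.

```python
import hashlib
import os
import tempfile

# Leading bytes ("magic numbers") of the formats mentioned above.
SIGNATURES = {
    b"\xff\xd8\xff": "jpg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"MATLAB 5.0 MAT-file": "mat (MATLAB v5)",
}


def sniff_format(path):
    """Guess the format from the file's first bytes, ignoring its suffix."""
    with open(path, "rb") as handle:
        head = handle.read(32)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return "unknown"


def sha256_of(path):
    """Return the SHA-256 hex digest of the file's content."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


# Demo: a file that starts with the PNG signature but is named .bin
folder = tempfile.mkdtemp()
original = os.path.join(folder, "image.bin")
with open(original, "wb") as handle:
    handle.write(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16)

digest_before = sha256_of(original)
renamed = os.path.join(folder, "image.mat")
os.rename(original, renamed)

print(sha256_of(renamed) == digest_before)  # True: the bytes are unchanged
print(sniff_format(renamed))                # png: the suffix does not matter
```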

A '.mat' file contains MATLAB-formatted data (not MATLAB code like you would expect from a '.m' file). I'm not sure if you're even using MATLAB, since you didn't include the tag in your question. '.mat' files are associated with the MATLAB workspace; if you wanted to save your current workspace in MATLAB, you would save it as a '.mat' file.
A '.bin' file is a binary file read by the computer. In general, executable (ready-to-run) programs are often identified as binary files. I think this is what you would want to use. I am unsure what you really want though because the wording of the question is difficult to understand and it seems like you have two questions here.
Changing the suffix of a file just changes which program will open the file. For example, if I were to change test.txt to test.py, the data inside the file remains the same, but the way the file is opened has changed. In this case, the file was a text file usually opened using Notepad (or some variation), and after the change it is opened by Python. If you were to change a .jpg file to a .txt file, you wouldn't be able to view it as a picture anymore; instead, you would open a text file with a bunch of seemingly random characters which describe the picture. The picture data never changed, but the way you see it and are able to use it does.
Take a look at this website which describes the .bin extension pretty well. Also, a quick Google search goes a long way especially with questions like this.

Convert text file into pdf

I have a task to convert a simple text file into PDF format. I also need to add a header to the newly created PDF file.
The server that holds this text file and will convert it does not have Microsoft Office or any other conversion tools. Someone suggested using Python for the task, since the server has it installed.
Could you please help me get started with converting text to PDF using Python?
P.S. My system does not have the pyPdf module and I failed to install it.
Thanks
Here is an update:
I run a program which generates a manifest at the end. The manifest is a simple text file which looks like a .csv file, but its columns are separated by white space. I ship this manifest to the client. My current task is to ship the client, in addition to the manifest, another file which has the same content plus a header with the client name, and is in PDF format.

I am all set now.
I figured out that my server already has pdf installed and the only thing I had to do was to call it. Sorry for the confusion.
The ticket can be closed.
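
For a server where nothing can be installed, a plain-text manifest can also be turned into a very simple PDF with the standard library alone. The following is a rough sketch, not a general converter: the function name text_to_pdf is hypothetical, it writes a single US Letter page (roughly 45 lines), uses the built-in Courier font so whitespace-separated columns stay aligned, and replaces characters outside Latin-1 with "?".

```python
def text_to_pdf(lines, header):
    """Return the bytes of a one-page PDF showing header, then lines."""

    def esc(text):
        # Backslash and parentheses are special inside PDF string literals.
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    # Content stream: 12pt Courier, 14pt line spacing, start one inch from
    # the top-left corner; T* moves to the next line.
    ops = ["BT", "/F1 12 Tf", "14 TL", "72 720 Td", f"({esc(header)}) Tj", "T*"]
    for line in lines:
        ops.append("T*")
        ops.append(f"({esc(line)}) Tj")
    ops.append("ET")
    stream = "\n".join(ops).encode("latin-1", "replace")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Courier >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, for the xref
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Usage would be along the lines of reading the manifest with open(...).read().splitlines(), calling text_to_pdf(lines, "Client: ..."), and writing the result to a file opened in "wb" mode. Anything beyond one page or plain text is better served by a real PDF library.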

Can I reliably figure out the correct mime type to serve untrusted content?

Say I let users upload files to my server, and I let users download them. I'd like to set the mime type to something other than just application/octet-stream, so that if the browser can just open them, it does (say, for images, pdf files, plain text files, etc.) Of course, since the files are uploaded by users, I can't trust the file extension, etc.
Is there a good library for figuring out what mime type goes with an arbitrary blob? Preferably usable from Python :-)
Thanks!

Try python-magic.

Beware of text files: there's no way of knowing what encoding they're in, and there's no reliable way of guessing, especially since most of those created on Windows are in 8-bit MBCS encodings, which are indistinguishable without language heuristics. You need to know the encoding, not just the MIME type, to set the complete Content-Type for a file to be viewable in a browser. If you want to allow uploading and displaying text, it's much safer to use an HTML text form than a raw file upload.
Also, note that a file can be multiple file types; for example, self-extracting ZIPs are both valid Windows executables and ZIP files, and can be treated as either.
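
To illustrate the idea of judging by content rather than by extension, here is a deliberately conservative standard-library sketch. It is a stand-in for what a library like python-magic does much more thoroughly: it recognises a handful of signatures, treats a blob as text only if it is valid UTF-8, and falls back to application/octet-stream for everything else. The function name safe_content_type and the short allowlist are assumptions made for this example.

```python
# A few well-known leading byte sequences and the types they indicate.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]
FALLBACK = "application/octet-stream"


def safe_content_type(blob):
    """Pick a Content-Type from the bytes themselves, never the filename."""
    for magic, mime in SIGNATURES:
        if blob.startswith(magic):
            return mime
    # NUL bytes are a strong hint that the data is not plain text.
    if b"\x00" in blob:
        return FALLBACK
    try:
        blob.decode("utf-8")
    except UnicodeDecodeError:
        # Could be an 8-bit legacy encoding; there is no safe way to tell which.
        return FALLBACK
    return "text/plain; charset=utf-8"


print(safe_content_type(b"\x89PNG\r\n\x1a\n...."))        # image/png
print(safe_content_type("héllo".encode("utf-8")))          # text/plain; charset=utf-8
print(safe_content_type("héllo".encode("latin-1")))        # application/octet-stream
```

With python-magic installed, the signature loop could be replaced by a call to magic.from_buffer(blob, mime=True). Either way, a matching signature only says how the file starts, which is exactly the multiple-file-types caveat above.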
