Building a web application that can work with local files - python

I'd like to build a local (not online) application that uses front-end web technology for the UI. The application simply displays PDFs and has a few text fields where the user enters notes about the PDF they're currently viewing. The user can then export their notes, together with the file path of each document, in CSV format:
comment about file, some more notes, C:\somefolder\doc1.pdf
comment about file, some more notes, C:\somefolder\doc2.pdf
My first issue: JavaScript can't access the local file system, so I used a file upload form. That worked, except the file paths were shown as blob paths rather than the actual system file paths. Other than that, my "application" worked as intended.
I then learned Flask in the hope of using Python for the back end. That works great, except that when I pass the PDF's file path (C:\SomeFolder\doc1.pdf) in the 'src' attribute, Chrome says it can't access local files. So I'm back to square one!
How can I go about building this application with local file access?

If you need to access local files, you can create an endpoint in Flask that launches a file dialog GUI. This only works because your application is hosted locally. You can use either tkinter or the native Windows API via win32ui.
Assuming you are using the standard Flask format:
from flask import jsonify
from app import app

@app.route('/file_select', methods=['GET', 'POST'])
def file_select():
    from tkinter import Tk
    from tkinter.filedialog import askopenfilename
    root = Tk()
    root.withdraw()  # hide the empty main window
    # ensure the file dialog pops to the top window
    root.wm_attributes('-topmost', 1)
    fname = askopenfilename(parent=root)
    root.destroy()  # release the Tk instance once a file is chosen
    return jsonify({'filepath': fname})
or using the win32ui API
from flask import jsonify

@app.route('/file_select', methods=['GET', 'POST'])
def file_select():
    import win32ui
    # 1 = "open" dialog; default extension ".pdf"; no initial file name; no flags
    winobj = win32ui.CreateFileDialog(1, ".pdf", "", 0,
                                      "PDF Files (*.pdf)|*.pdf|All Files (*.*)|*.*|")
    winobj.DoModal()
    return jsonify({'filepath': winobj.GetPathName()})
Now just add a button that calls the /file_select route. It opens a file dialog through the local Python server and returns the path of the selected file.
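Once the real path comes back from /file_select, the CSV export described in the question can be handled with Python's standard csv module. This is only a minimal sketch: the function name export_notes and the (comment, notes, filepath) tuple layout are assumptions made for illustration, not part of the original application.

```python
import csv
import io


def export_notes(rows, stream):
    """Write (comment, notes, filepath) tuples to `stream` as CSV rows."""
    writer = csv.writer(stream)
    for comment, notes, filepath in rows:
        writer.writerow([comment, notes, filepath])


# Usage: build the CSV in memory; a real app would instead write to a file
# opened with newline='' so the csv module controls the line endings.
buffer = io.StringIO()
export_notes(
    [("comment about file", "some more notes", r"C:\somefolder\doc1.pdf")],
    buffer,
)
csv_text = buffer.getvalue()
```

The csv module quotes any field that itself contains a comma, so free-form notes will not break the column layout.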

Assuming you are accessing the page via http://localhost:8080/page or something like that, you should serve the PDFs through that same server. Rather than referencing the files as paths on the local file system, create an application route and associate it with a handler that retrieves the appropriate PDF from the local filesystem and sends back a response with Content-Type: application/pdf in the HTTP response headers and the bytes of the PDF file in the response body.
To avoid duplicating someone else's solution for the approach described above, I would recommend taking a look at this answer for "Flask handling a PDF as its own page".
Because you are technically sending the response back from localhost -- or whatever name you are serving it with -- rather than trying to load a local file directly from the client's web-page, Chrome shouldn't throw any complaints.
Of course, best practices should be followed when determining which file to load if this were going to be anything more than a learning project. Any legitimate system that did this kind of thing would need to check the requested files to ensure that a malicious user cannot abuse the application to leak files from the local filesystem beyond those intended to be served. (To that end, you might have the src attribute contain a parameter set to a hash or unique ID for the file, which is then mapped via some database to the file's actual path. Alternatively, you might use a parameter in the src that contains the file name without the full path, and then check that the user-provided value for that parameter does not contain any characters outside a charset like [a-zA-Z0-9_-].) Ultimately, it sounds like this particular warning doesn't apply to your case, but I'm including it in case anyone else reads this in the future.
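The route-plus-validation idea above could be sketched roughly as follows. This is an illustration under assumptions, not the linked answer's code: the names resolve_pdf, create_app and the /pdf/<name> URL are hypothetical, and the whitelist is the [a-zA-Z0-9_-] charset mentioned above with a ".pdf" suffix added.

```python
import re
from pathlib import Path

# Only letters, digits, underscore and hyphen, followed by ".pdf".
SAFE_PDF_NAME = re.compile(r"[A-Za-z0-9_-]+\.pdf")


def resolve_pdf(pdf_dir, name):
    """Return the absolute path of `name` inside `pdf_dir`, or None if unsafe or missing."""
    if not SAFE_PDF_NAME.fullmatch(name):
        return None  # rejects path separators, "..", and other extensions
    path = Path(pdf_dir).resolve() / name
    return path if path.is_file() else None


def create_app(pdf_dir):
    """Build a Flask app that serves PDFs from `pdf_dir` at /pdf/<name>."""
    # Imported here so that resolve_pdf can be used and tested without Flask.
    from flask import Flask, abort, send_file

    app = Flask(__name__)

    @app.route("/pdf/<name>")
    def serve_pdf(name):
        path = resolve_pdf(pdf_dir, name)
        if path is None:
            abort(404)
        # Sends the file bytes with Content-Type: application/pdf.
        return send_file(str(path), mimetype="application/pdf")

    return app
```

With something like this in place, the page would point its src at /pdf/doc1.pdf on the local server instead of at C:\SomeFolder\doc1.pdf.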

I think mht is exactly what you want. mht is a file extension recognized by IE; internally it is an HTML file. IE (only) treats an mht file with the same security restrictions that an exe might have, so you can access the file system, delete a file, display a file, etc. It is everything that HTML/JavaScript security was trying to prevent. Now that IE has changed significantly, I don't know what the support for this is nowadays. I couldn't find a reference page to link to, but it is simple enough: just save an HTML file with an mht extension.

Related

Twisted web problem when serving docx files

I have a question for you!
I'm running a simple webserver with twistd web and it works great most of the time. I have a problem serving .docx files.
Let me explain with an example.
On my webserver I have two files: file.pdf and file.docx (the x is important).
Now, on my browser, if I enter the URL of the pdf file, the browser will start the download (or open it depending on user preferences). This is the expected behavior.
But if I enter a link to a docx, instead of downloading it, the browser will display it as a sequence of strange letters and numbers.
It is not a browser issue, because if I click on a docx file served from another webserver, the browser will download it.
I'm starting the webserver directly from the Windows cmd prompt using twistd. The line looks like this:
twistd -no web --path d:\shares\
The question is: how can I tell twistd to force the download of docx file the same way it does for pdf?
Thanks
It might help if you shared some of your code, but I think the basic idea is that you should add the correct MIME type to the header that your server returns, which will help the browser know what to do with it rather than try to render it as text. Based on the docs here it looks like you want something like this:
from twisted.web import static
root = static.File("/files")
root.contentTypes[".docx"] = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
This uses the (somewhat long-winded) MIME type for docx.

Change the Directory when Upload a file

I would like to change the default location used by the browser when uploading a file.
That is, when the client clicks "upload file" on my website, I would like the file dialog to open in a target directory (e.g. Downloads) instead of Desktop.
How can I do it?
My website uses Flask / Python.
That's probably impossible. It's the client, not the server, that chooses where data is read from or saved to.
The server would need to know the folder paths on the client's computer, and obtaining them without permission would be hacking, which is illegal.

Quick and dirty way to access files external to Django?

I'm working on a demo for a program that creates some files in its own directory. The demo will be shown via VPN to someone who is physically far away, so I made a simple Django project just to receive an input, call some scripts and display the output (the generated file). However, I don't have permission to open the file to display it, since it's in a directory outside of the Django project (the result is a permission denied error).
I'm aware it's not good practice or even safe for a web server to have access to files outside of its directories, but since this will run in a closed environment for a short amount of time only, is there a workaround?
Think of it this way: if the web server can generate the files, it can also display them.
As for your question: if you know the path of the file, use Python's built-in open function to read the file, and pass the result to a template from inside your view.
with open('file_path', encoding='utf-8') as f:  # decode the file as UTF-8 text
    data = f.read()
return render(request, template, context={'data': data})

Send PDF file from Django website to LogicalDOC

I've been developing my Django website for about 2 months, and I'm starting to get good overall results with my own functions.
But now I have to start what seems to me a very hard part, and I need some advice and ideas before doing so.
My Django website creates PDF files from HTML templates with Django variables. Up to now, I've been saving the PDF files directly on my Desktop (in a specific folder), but that is completely insecure.
So I installed another web application, LogicalDOC, so that the PDF files can be created and sent directly to it instead.
LogicalDOC has two APIs, SOAP and REST (http://wiki.logicaldoc.com/rest/#/), and I know that Django can communicate over REST.
I'm also reading this part of the Django documentation to understand how to proceed: https://docs.djangoproject.com/en/dev/topics/http/file-uploads/
I made a diagram to illustrate what I'm describing:
Then I write a script that does the following:
When the PDF file is created, it creates a folder inside LogicalDOC named, for example, lastname_firstname_birthday.
If the folder already exists, it doesn't create a new one; otherwise it creates it.
Once that's done, it sends the PDF file into the matching folder, found by comparing the PDF name with the folder name.
I have some questions about this process:
Firstly, is it possible to do this kind of thing?
Is it hard to do?
What advice could you give me?
Thank you so much !
PS: If you need some part of my script, mainly the PDF creation part, I can post it after my question ;)
The idea is pretty simple; however, it always requires some practice.
I strongly advise you to use the REST API and forget about SOAP, as the only thing it can bring you is 'pain' :)
If we check the documentation for document/create, it gives the following information.
Endpoint we have to communicate with.
[protocol]://[server]:[port]/document/create
HTTP method to use - POST
List of parameters to provide with your request: body, document, content
What's more, you can test the API by clicking the "Try it out" button and inspecting the requests in the "Network" tab of your browser's Developer Tools.
I am not sure what kind of metadata you have to provide in the 'document' parameter, but you can easily get an idea of what is needed by testing it and putting XML or JSON data into the 'document' parameter.
Content is an array of bytes transferred to the server (which would be your file).
To sum up, a request to the 'document/create' URI will be simple:
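import json
import requests

body = {'headers': {}, 'object': {}}
document = "<note>data</note>"
# Assumption: the three parameters travel as multipart form fields; verify the exact encoding with "Try it out"
with open('report.xls', 'rb') as content:  # r - reading, b - binary
    r = requests.post('http://logicaldoc/document/create', data={'body': json.dumps(body), 'document': document}, files={'content': content})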
Please keep in mind that file transfer requests take time, and sometimes you may get a timeout exception. Your code will block while waiting for the response, so it may be a good idea to get some practice with asyncio or celery. Just keep these kinds of possible issues in mind.
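As a rough illustration of the asyncio option, the sketch below runs a blocking upload in a worker thread and stops waiting after a deadline. The function upload_blocking is a hypothetical stand-in for the real requests.post call (it only sleeps), and asyncio.to_thread requires Python 3.9 or later.

```python
import asyncio
import time


def upload_blocking(path, delay_s):
    """Hypothetical stand-in for the blocking requests.post(...) upload call."""
    time.sleep(delay_s)  # simulates a slow network transfer
    return f"uploaded {path}"


async def upload_with_timeout(path, delay_s, timeout_s):
    """Run the blocking upload in a worker thread; stop waiting after timeout_s."""
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(upload_blocking, path, delay_s), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        return None  # the caller decides whether to retry or report the failure


# A fast transfer finishes in time; a slow one hits the deadline.
fast = asyncio.run(upload_with_timeout("report.pdf", 0.01, 1.0))
slow = asyncio.run(upload_with_timeout("report.pdf", 0.3, 0.05))
```

Note that the timeout only ends the wait: the worker thread itself keeps running until the blocking call returns, so the real request should also be given its own timeout.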

GAE Python GCS filename accesses old file

In my GAE Python app, I'm writing code to store images in GCS.
I write the images as follows:
bucket_name = os.environ.get(u'BUCKET_NAME', app_identity.get_default_gcs_bucket_name())
filename = u'/{}/{}/image'.format(bucket_name, str(object.key.id()))
mimetype = self.request.POST[u'image_file'].type
gcs_file = cloudstorage.open(filename, 'w', content_type=mimetype,
options={'x-goog-acl': 'public-read'})
gcs_file.write(self.request.get(u'image_file'))
gcs_file.close()
The first time I use this code to write a particular filename, I can access that file with its filename:
https://storage.googleapis.com/<app>.appspot.com/<id>/image
And I can also click the name "image" on the GCS Storage Browser and see the image.
Yay! It all seems to work.
But when I upload a different image to the same filename, something confusing happens: when I display the filename in the browser, either via an <img> tag or as the URL in a separate browser tab, the old image appears. Yet when I display "image" via the GCS Storage Browser, it shows the new image.
By the way, as an additional data point, although I specify public-read when I open the file for writing, the "shared publicly" column is blank for that file on the GCS Storage Browser page.
I tried deleting the file before the open statement, even though w is supposed to act as an overwrite, but it didn't make any difference.
Can anyone explain how the filename continues to access the old version of the file, even though the GCS Storage Browser shows the new version, and more importantly, what I need to do to make the filename access the new version?
EDIT:
Continuing to research this problem, I found the following statement at https://cloud.google.com/storage/docs/accesscontrol:
If you need to ensure that updates become visible immediately, you should set
a Cache-Control header of "Cache-Control:private, max-age=0, no-transform" on
such objects.
However, I can't see how to do this using Cloudstorage "open" command or in any other way from my Python program. So if this is the solution, can someone tell me how to set the Cache-Control header for these image files I'm creating?
Here is an example of open() setting the cache control:
with gcs.open(new_zf.gcs_filename, 'w', content_type=b'multipart/x-zip',
              options={b'x-goog-acl': b'public-read', b'cache-control': b'private, max-age=0, no-cache'}) as nzf:
    ...  # write the file contents to nzf here
taken from this repository
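Applied to the image-upload code in the question, the same option might be passed along the following lines. This is a sketch only: the helper name gcs_write_options is hypothetical, and the header value is the one quoted from the Cloud Storage documentation in the question's edit.

```python
# Value quoted from the Cloud Storage documentation cited in the question.
NO_CACHE = "private, max-age=0, no-transform"


def gcs_write_options(public_read=True, cache_control=NO_CACHE):
    """Build the `options` dict passed to cloudstorage.open(..., 'w', ...)."""
    options = {"cache-control": cache_control}
    if public_read:
        options["x-goog-acl"] = "public-read"
    return options


# Intended use inside the request handler from the question (not run here):
# gcs_file = cloudstorage.open(filename, 'w', content_type=mimetype,
#                              options=gcs_write_options())
```

Keeping the options in one place makes it harder to forget the cache-control header when another upload path is added later.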
