In my GAE Python app, I'm writing code to store images in GCS.
I write the images as follows:
bucket_name = os.environ.get(u'BUCKET_NAME', app_identity.get_default_gcs_bucket_name())
filename = u'/{}/{}/image'.format(bucket_name, str(object.key.id()))
mimetype = self.request.POST[u'image_file'].type
gcs_file = cloudstorage.open(filename, 'w', content_type=mimetype,
                             options={'x-goog-acl': 'public-read'})
gcs_file.write(self.request.get(u'image_file'))
gcs_file.close()
The first time I use this code to write a particular filename, I can access that file with its filename:
https://storage.googleapis.com/<app>.appspot.com/<id>/image
And I can also click the name "image" on the GCS Storage Browser and see the image.
Yay! It all seems to work.
But when I upload a different image to the same filename, something confusing happens: when I display the filename in the browser, either via an <img> tag or as the URL in a separate browser tab, the old image appears. Yet when I display "image" via the GCS Storage Browser, it shows the new image.
By the way, as an additional data point, although I specify public-read when I open the file for writing, the "shared publicly" column is blank for that file on the GCS Storage Browser page.
I tried deleting the file before the open statement, even though w is supposed to act as an overwrite, but it didn't make any difference.
Can anyone explain how the filename continues to access the old version of the file, even though the GCS Storage Browser shows the new version, and more importantly, what I need to do to make the filename access the new version?
EDIT:
Continuing to research this problem, I found the following statement at https://cloud.google.com/storage/docs/accesscontrol:
If you need to ensure that updates become visible immediately, you should set
a Cache-Control header of "Cache-Control:private, max-age=0, no-transform" on
such objects.
However, I can't see how to do this using Cloudstorage "open" command or in any other way from my Python program. So if this is the solution, can someone tell me how to set the Cache-Control header for these image files I'm creating?
Here is an example of an open call that sets Cache-Control:
with gcs.open(new_zf.gcs_filename, 'w', content_type=b'multipart/x-zip',
              options={b'x-goog-acl': b'public-read',
                       b'cache-control': b'private, max-age=0, no-cache'}) as nzf:
    pass  # write the file contents through nzf here
taken from this repository
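Applied to the image upload in the question, the same idea might look like the sketch below. The helper name gcs_write_options is made up for illustration; the 'x-goog-acl' and 'cache-control' option keys are the ones used in the snippets above, and the default header value is the one quoted from the documentation. The cloudstorage.open call is shown only as a comment because it needs the App Engine runtime.

```python
def gcs_write_options(public=True,
                      cache_control='private, max-age=0, no-transform'):
    """Build the options dict passed to cloudstorage.open(..., options=...).

    Hypothetical helper, not part of the cloudstorage library.
    """
    options = {'cache-control': cache_control}
    if public:
        options['x-goog-acl'] = 'public-read'
    return options


# Intended use inside the request handler (requires the App Engine runtime):
#
#   gcs_file = cloudstorage.open(filename, 'w', content_type=mimetype,
#                                options=gcs_write_options())
#   gcs_file.write(self.request.get(u'image_file'))
#   gcs_file.close()
```

Keeping the options in one place makes it harder to forget the header on one of several upload paths.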
Related
I am trying to overwrite an image in my Cloud Storage bucket through the Python API, but after I overwrite it and refresh the Cloud web page or the public link (with the browser cache cleared), the image is still the same, even the next day. Sometimes, though, it randomly updates to the new image!
Edit: The metadata gets updated, but not the file size, and the old image still shows in the Cloud web page and at the public URL.
What I expect is that after uploading a file to Cloud Storage via the API, I can download the new file from the public link a short time afterwards instead of the old image.
I expected to be able to control the caching behaviour with the Cache-Control directive (Edit: it is probably not a caching issue, because even the next day the old image is still served).
This is my code:
blob = bucket.blob(name)
blob.cache_control = "no-store"
blob.upload_from_filename(name)
I tried:
Deleting the old image in the Cloud web page and then, after a few
seconds, uploading the new image with the same name via Python: it works!
I can download the new image from the public link and see it in the
Cloud web page. Edit: It seems to work only sometimes!
Deleting the image with Python and uploading the new image via Python
directly afterwards: not working. While the image is deleted, the public
link doesn't show it. But after I upload the new one, the public link
shows the old one again.
I read that the default cache setting for public bucket files is
"public, max-age=3600". So I used the Cache-Control directive and set
it to "no-store" or "public, age=0". Then I confirmed these
Cache-Control settings are reflected in the headers in the browser
debug console. But the old image still loads every time.
I changed the bucket type to regional instead of multi-region. Even after deleting the bucket, recreating it and moving the data into it again, the old image still shows up!
Any tip is highly appreciated!
I made it work!
It was probably not related to Google Cloud Storage.
In case someone makes the same mistake I did:
I used Django's FileSystemStorage class and saved the new file with the same name as the old one in the /temp directory, assuming that the old one would be overwritten if it still existed. Instead, Django gives the new file a different name. Later I uploaded the old file with blob.upload_from_filename(name).
That's why everything happened so randomly.
Thanks to all who thought about solving this!
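The pitfall can be reproduced without Django. The sketch below uses a made-up helper, save_without_overwrite, that only mimics the behaviour described above (a name collision yields a new, different filename); it is not Django's implementation. The point is to upload the path that the save step returns, not the name that was requested.

```python
import os
import tempfile
import uuid


def save_without_overwrite(directory, name, data):
    """Hypothetical stand-in for a storage backend that never overwrites.

    If `name` is already taken, a random suffix is inserted before the
    extension. Returns the path that was actually written.
    """
    path = os.path.join(directory, name)
    if os.path.exists(path):
        root, ext = os.path.splitext(name)
        new_name = '{}_{}{}'.format(root, uuid.uuid4().hex[:7], ext)
        path = os.path.join(directory, new_name)
    with open(path, 'wb') as f:
        f.write(data)
    return path


tmp_dir = tempfile.mkdtemp()
old_path = save_without_overwrite(tmp_dir, 'image.png', b'old image')
new_path = save_without_overwrite(tmp_dir, 'image.png', b'new image')

# The requested name still points at the OLD bytes; only new_path holds the
# new ones, so new_path is what should go to blob.upload_from_filename(...).
with open(old_path, 'rb') as f:
    old_bytes = f.read()
with open(new_path, 'rb') as f:
    new_bytes = f.read()
```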
I'd like to build a local (not online) application that uses front-end web technology for the UI. The application simply displays PDFs and has a few text fields for the user to fill in about the PDF they're currently viewing. The user can then export their notes, along with the file path of each document, in CSV format:
comment about file, some more notes, C:\somefolder\doc1.pdf
comment about file, some more notes, C:\somefolder\doc2.pdf
My first issue: JavaScript can't access the local file system, so I used a file upload form. That worked, except the file paths were shown as blob paths and not the actual system file paths. Other than that, my "application" worked as intended.
I went and learned Flask in hopes of using Python for the back end, which works great, except that when I pass the PDF's file path, C:\SomeFolder\doc1.pdf, into the 'src' attribute of the element that displays it, Chrome says it can't access local files. So I'm back to square one!
How can I go about building this application with local file access?
If you need to access the local files, you can create an endpoint in Flask that launches a file dialog GUI. This only works because your application is hosted locally. You can use either tkinter or the native Windows API via win32ui.
Assuming you are using the standard Flask format:
from flask import jsonify
from app import app

@app.route('/file_select', methods=['GET', 'POST'])
def file_select():
    from tkinter import Tk
    from tkinter.filedialog import askopenfilename
    root = Tk()
    root.withdraw()
    # ensure the file dialog pops to the top window
    root.wm_attributes('-topmost', 1)
    fname = askopenfilename(parent=root)
    # tear down the hidden Tk root once the dialog has closed
    root.destroy()
    return jsonify({'filepath': fname})
or using the win32ui API:
@app.route('/file_select', methods=['GET', 'POST'])
def file_select():
    import win32ui
    # 1 = "Open" dialog; default extension .pdf; no initial filename; no flags
    winobj = win32ui.CreateFileDialog(1, ".pdf", "", 0,
                                      "PDF Files (*.pdf)|*.pdf|All Files (*.*)|*.*|")
    winobj.DoModal()
    return jsonify({'filepath': winobj.GetPathName()})
Now just add a button that points to the /file_select route and you will open a file dialog via the python local server and return the selected file.
Assuming you are accessing the page via http://localhost:8080/page or something like that, you should serve the PDFs through the same server. Rather than referencing the files as paths on the local file system, create an application route and associate it with a handler that retrieves the appropriate PDF from the local filesystem and sends back a response with Content-Type: application/pdf in the HTTP response headers and the bytes of the PDF file in the response body.
To avoid duplicating someone else's solution for the approach described above, I would recommend taking a look at this answer for "Flask handling a PDF as its own page".
Because you are technically sending the response back from localhost -- or whatever name you are serving it with -- rather than trying to load a local file directly from the client's web-page, Chrome shouldn't throw any complaints.
Of course, it's worth noting that best practices should be taken when determining the file to load, if this were going to be anything more than a learning project. In any legitimate system that did this kind of thing, it would be necessary to perform checks on the requested files to ensure a malicious user does not abuse the application to leak files from the local filesystem, beyond those files which are intended to be served. (To that end, you typically might have the src element contain a parameter that is set to the hash/unique ID for the file which is then mapped via some database to the correct path of the file. Alternatively, you might use a param in the src that contains the name of the file without the full path, and then check that the user-provided value for that parameter in the request does not contain any characters outside of a charset like [a-zA-Z0-9_-].) Ultimately, it sounds like this particular warning doesn't apply to your case, but still providing it in case anyone else reads this in the future.
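The filename check mentioned in the parenthesis could be sketched as follows. The names resolve_pdf and PDF_DIR are invented for this example, and the directory is only a placeholder; the character set is the one given above.

```python
import os
import re

# Placeholder directory; only files directly inside it may be served.
PDF_DIR = 'pdfs'

# Bare names only: letters, digits, underscore, hyphen (no dots or slashes).
_SAFE_NAME = re.compile(r'[a-zA-Z0-9_-]+')


def resolve_pdf(name, base_dir=PDF_DIR):
    """Map a user-supplied document name to a path inside base_dir.

    Returns None when the name is empty or contains any character outside
    the allowed set, so values like '../secret' never reach the filesystem.
    """
    if not _SAFE_NAME.fullmatch(name):
        return None
    return os.path.join(base_dir, name + '.pdf')
```

In a Flask handler, the returned path would then be handed to something like send_file, and a None result turned into a 404 response.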
I think mht is exactly what you want. mht is a file extension recognized by IE. Internally it is an HTML file. IE (only) treats an mht file with the same security restrictions that an exe might have, so you could access the file system, delete a file, display a file, etc. It is everything that HTML/JavaScript security was trying to prevent. Now that IE has changed significantly, I don't know how well this is supported nowadays. I couldn't find a reference page to link to, but it is simple enough: just save an HTML file with an mht extension.
I'd like to download multiple files from a single website, but the biggest quirk I have is that the server automatically generates a random filename upon requesting the file to download. The issue here is then I won't know which file is which, without having to manually go through each file. However, on the site that has the links to download the files, they all have a name. For example...
File name   -> Resultant file name (fake file names)
Week1.pdf      2asd123e.pdf
Week1_1.jpg    dsfgp142.jpg
.
.
Week10.pdf     19fgmo2o.pdf
Week11.pdf     0we5984w.pdf
If I were to download them manually, I would click "download" and a "Save as" popup would come up, which gives me the option to change the file name. I would then click OK to confirm, at which point the download starts.
Currently, my code opens the website, logs into my account, goes to the files page, and then finds a file along with its corresponding server request link. IE: . I am able to store the name of the file, "Week1.pdf", in a variable and click on the request link, but the only problem is that the Save as menu doesn't let me change the filename; it only gives me the option to view the file or save it immediately. I've looked around a little and tried playing with the Firefox profile settings, but nothing has worked. How would I go about solving this problem?
Thanks
I can think of a few things that you might try...
After the file is saved, look in the downloads folder for the most recently saved file (with the correct extension) using time stamps. This will probably be OK as long as you aren't running this threaded.
Get the list of files in the download directory, download the file, find the file that doesn't exist in the list of files. Again this should be safe unless you are running this threaded.
Create a new folder, set the download directory to the newly created folder, download the file. It should be the only file in that directory. As far as I know, you can only set the download directory before creating the driver instance.
In each of these cases, if you plan to download multiple files I would rename each file as you download them or move them into some known directory to make it easier on yourself.
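The second option (compare the directory listing before and after) could be sketched roughly like this. It is only an illustration: wait_for_new_file and download_and_rename are made-up names, the '.part' suffix is assumed to be what the browser uses for in-progress downloads, and the click that triggers the download is left to your existing code.

```python
import os
import shutil
import time


def wait_for_new_file(download_dir, before, timeout=30.0, poll=0.5):
    """Return the name of a finished file in download_dir not in `before`.

    `before` is a set of names taken with os.listdir() prior to the download.
    Names ending in '.part', or that still have a '<name>.part' sibling, are
    treated as unfinished (an assumption about the browser's behaviour).
    Raises RuntimeError if nothing new shows up within `timeout` seconds.
    """
    deadline = time.time() + timeout
    while True:
        for name in sorted(set(os.listdir(download_dir)) - before):
            partial = os.path.join(download_dir, name + '.part')
            if not name.endswith('.part') and not os.path.exists(partial):
                return name
        if time.time() >= deadline:
            raise RuntimeError('no new file appeared in ' + download_dir)
        time.sleep(poll)


def download_and_rename(download_dir, wanted_name, trigger_download):
    """Snapshot the directory, trigger the download, rename the new file."""
    before = set(os.listdir(download_dir))
    trigger_download()  # e.g. a function that clicks the request link
    random_name = wait_for_new_file(download_dir, before)
    target = os.path.join(download_dir, wanted_name)
    shutil.move(os.path.join(download_dir, random_name), target)
    return target
```

Renaming immediately after each download keeps the "before" snapshot meaningful for the next file, which matches the advice above about handling multiple downloads one at a time.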
There's another method I ran across in another answer.
I first had a problem updating a file with the Google Drive API. I followed the Quickstart example, and after making some changes to it, the file on Google Drive is updated successfully. But now a new problem appears after updating, and I am not sure whether it is because my changes to the Quickstart are wrong or something else. After updating an Excel file on Google Drive with an Excel file from my local machine, the local Excel file is not editable unless I close the IDLE terminal; once I close the IDLE window, I can do anything with the Excel file and save the changes. For example, if I make some changes to the Excel file without closing IDLE and try to save it, the system reports something like a sharing violation and saves the file as a temporary file 62635600.... If I try to delete the Excel file, the system says the file is being used by pythonw.exe. After closing the IDLE window, the Excel file goes back to normal. Does anybody have any idea?
Assuming you are using API v3, I believe the following code will do what you are trying to achieve:
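from googleapiclient.http import MediaFileUpload
def update_file(file_path, fileId):
    # file_service is assumed to be the files() resource of an authorized
    # Drive v3 service, e.g. file_service = drive_service.files()
    media_body = MediaFileUpload(file_path, mimetype="application/vnd.ms-excel")
    results = file_service.update(fileId=fileId, media_body=media_body).execute()
    return results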
I am positive about it since I use it in one of my own applications I created to periodically backup files to my Google Drive account.
It also contains some more practical examples to interact with the Google Drive API, in case you are interested.
You can install Google Drive on your local machine and copy the file into the Google Drive directory at the correct position. The Google Drive client software will then update the file.
I found an example here, and followed it.
My question is: since the "title", "description" and "mimeType" of the local file are exactly the same as those of the file on Google Drive (the local file just has one more row than the file on Google Drive; everything else is the same), do I need to assign values to these three elements? If I don't, I can't update the file on Google Drive with the local file. If I assign a value to just "title", the script updates the file on Google Drive with the local file, but the weird thing is that unless I close the IDLE terminal, the file on my local machine is not editable, even though the Python script has run successfully and finished. The message I get is something like "the file is being used by pythonw.exe....". How do I handle this?
from googleapiclient import errors
from googleapiclient.http import MediaFileUpload

def update_file(service, file_id, new_title, new_description, new_mime_type,
                new_filename, new_revision):
    try:
        # First retrieve the file's current metadata from the API.
        file = service.files().get(fileId=file_id).execute()
        # File's new metadata.
        file['title'] = new_title
        file['description'] = new_description
        file['mimeType'] = new_mime_type
        # File's new content.
        media_body = MediaFileUpload(
            new_filename, mimetype=new_mime_type, resumable=True)
        # Send the request to the API.
        updated_file = service.files().update(
            fileId=file_id,
            body=file,
            newRevision=new_revision,
            media_body=media_body).execute()
        return updated_file
    except errors.HttpError as error:
        print('An error occurred: %s' % error)
        return None
I have a simple script set up for getting a file off my Google Drive account and updating it. I have no problems authenticating and getting access to the drive. The file is a Google spreadsheet on the drive. Thus, when I have the PyDrive file object, I get the URL of the file in CSV format via google_file['exportLinks']['text/csv']. This has worked in the past; however, today I tried the same method for a new file, and instead of getting the data in CSV format, I keep getting HTML. In addition, if I copy the link from google_file['exportLinks']['text/csv'] and paste it into a browser, the browser begins to download the file in CSV format as requested. I really have no idea what is going on, especially since this has worked in the past.
Here is basically what my code does:
import requests
from pydrive.drive import GoogleDrive

drive = GoogleDrive(gauth)  # gauth: the already-authenticated auth object
drivefiles = drive.ListFile().GetList()
form_file = None
for f in drivefiles:
    if f['title'] == formFileName:
        form_file = f
        break
output = requests.get(form_file['exportLinks']['text/csv'])
print(output.text)  # this ends up being HTML, not text/csv
Has anyone else out there seen this problem before? Should I just try to delete and re-add the google spreadsheet file on the drive?
EDIT: UPDATE
So, after changing the permissions on the file in Google Drive from "accessible only to people the file has been shared with" to "accessible/editable by all", I was able to access and download the file. Does the clients_secret.json file allow a particular Google Drive user to authorize securely, or is it a general key that allows anyone who has it to access the Google API in that particular session? Does any special data have to be sent with the HTTP request if the file has only been shared with a limited set of email addresses?
I also found handling the idiosyncrasies of the Google Drive API a little bit tricky. I have written a wrapper around the Google Drive API that makes it relatively easy to deal with. See if it helps:
Google Drive Client
Sample code:
file_id = 'abc'
file_access_token = '34244324324'
scope = 'https://www.googleapis.com/auth/drive.readonly'
path_to_private_key_file = '#'
service_account_name = '284765467291-0qnqb1do03dlaj0crl88srh4pbhocj35@developer.gserviceaccount.com'
drive_client = GoogleDriveClient(
    scope=scope,
    private_key=open(path_to_private_key_file).read(),
    service_account_name=service_account_name)
file = drive_client.get_drive_file(file_id, file_access_token)
Of course, you need to change the access variables according to your profile.
In case anyone else has been having problems with this: changing the settings on the file in Drive to "everyone can view" solved the problem; it seems there were some permission restrictions. I would suggest changing the permissions on the file to the least restrictive setting possible while debugging.
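If making the file public is not acceptable, one likely explanation for the HTML is that requests.get sends the export URL without any credentials, so a sign-in page comes back unless the file is public. A minimal sketch of attaching the session's OAuth token follows. The helper names are made up, and reading the token from gauth.credentials.access_token is an assumption about how the PyDrive session exposes it.

```python
def bearer_headers(access_token):
    """Build the HTTP header that carries an OAuth 2.0 access token."""
    return {'Authorization': 'Bearer ' + access_token}


def looks_like_html(body):
    """Heuristic: did a web page (e.g. a sign-in page) come back, not CSV?"""
    head = body.lstrip()[:100].lower()
    return head.startswith('<!doctype html') or head.startswith('<html')


# Intended use with the objects from the question (needs network access):
#
#   token = gauth.credentials.access_token   # assumed attribute, see above
#   output = requests.get(form_file['exportLinks']['text/csv'],
#                         headers=bearer_headers(token))
#   if looks_like_html(output.text):
#       raise RuntimeError('got HTML back; request was probably not authorized')
```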