Reliably upload large files to Google Drive

I am using the Python googleapiclient to upload some large files to Google Drive. I want to make sure the files are uploaded correctly. I looked for ways to get the file's MD5 checksum on Google Drive with no luck. Here is the code:
def print_file_metadata(service, file_id):
    """Print a file's metadata.
    Args:
        service: Drive API service instance.
        file_id: ID of the file to print metadata for.
    """
    try:
        file = service.files().get(fileId=file_id).execute()
        print('Title: %s' % file['title'])
        print('MIME type: %s' % file['mimeType'])
    except errors.HttpError as error:
        print('An error occurred: %s' % error)
For the files I tested, it appears the file dict does not contain its MD5 checksum. Is there any way to get it from the API? Or is there another way of checking whether a file has been uploaded correctly?

You want to retrieve the file's MD5 checksum using Drive API.
You have already been able to put and get files with Drive API.
From your script, it seems that you are using Drive API v2.
You want to achieve this using google-api-python-client with python.
If my understanding is correct, how about this answer?
When Drive API v2 is used, the values returned from files().get() include the file's MD5 checksum. In this case, please modify your script as follows.
Modified script: For Drive API v2
file = service.files().get(fileId=file_id).execute()
print('Title: %s' % file['title'])
print('MIME type: %s' % file['mimeType'])
print('MD5: %s' % file['md5Checksum']) # <--- Added
Modified script: For Drive API v3
In Drive API v3, files().get() returns only a small default set of fields, which does not include the file's MD5 checksum. One of several ways to get it is to request all fields with fields='*', as below (a narrower request such as fields='name,mimeType,md5Checksum' also works). Note that in v3 the filename is file['name'], not file['title'].
file = service.files().get(fileId=file_id, fields='*').execute() # <--- Modified
print('Title: %s' % file['name']) # <--- Modified
print('MIME type: %s' % file['mimeType'])
print('MD5: %s' % file['md5Checksum']) # <--- Added
References:
Files of Drive API v2
Files: get of Drive API v2
Files of Drive API v3
Files: get of Drive API v3
If I misunderstood your question and this was not the result you want, I apologize.
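To answer the "uploaded correctly" part of the question, the checksum from Drive can be compared with one computed locally. The following is a minimal sketch under stated assumptions: it assumes Drive API v3, and the helper names local_md5 and upload_matches are hypothetical, not part of the client library. Drive only reports md5Checksum for files with binary content, so native Google Docs formats cannot be checked this way.

```python
import hashlib


def local_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat for large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_matches(service, file_id, local_path):
    """Compare Drive's md5Checksum (API v3) with the local file's digest."""
    remote = service.files().get(fileId=file_id, fields="md5Checksum").execute()
    remote_md5 = remote.get("md5Checksum")
    if remote_md5 is None:
        # No checksum reported (for example a native Google Docs file),
        # so the upload cannot be confirmed this way.
        return False
    return remote_md5 == local_md5(local_path)
```

With Drive API v2 the same comparison should work without the fields argument, since the default response already carries md5Checksum.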

Related

python "azure-storage-blob" package upload creates empty files

I am using the "azure-storage-blob" package within FastAPI to upload an image to my Azure Blob Storage container. After a lot of trial and error I decided to just copy a static file from my directory to the container, but every time I upload the file it is added as empty. If I write the file locally everything goes fine.
I am using the official documentation as described here:
https://pypi.org/project/azure-storage-blob/
I have the following code:
@app.post("/files/")
async def upload(incoming_file: UploadFile = File(...)):
    fs = await incoming_file.read()
    file_size = len(fs)
    print(file_size)
    if math.ceil(file_size / 1024) > 64:
        raise HTTPException(400, detail="File must be smaller than 64kb.")
    if incoming_file.content_type not in ["image/png", "image/jpeg"]:
        raise HTTPException(400, detail="File type must either be JPEG or PNG.")
    try:
        blob = BlobClient.from_connection_string(conn_str=az_connection_string, container_name="app-store-logos",
                                                 blob_name="dockerLogo.png")
        with open("./dockerLogo.png", "rb") as data:
            blob.upload_blob(data)
    except Exception as err:
        return {"message": "There was an error uploading the file {0}".format(err)}
    finally:
        await incoming_file.close()
    return {"message": f"Successfully uploaded {incoming_file.filename}"}
When I upload the file to the storage container, the entry gets saved but is empty.
If I change any filenames or storage names I do get an error, so the files exist and are in the right place, though it seems the Azure Storage SDK doesn't copy over the contents of the file.
If anyone has any pointers I would be grateful.

Error downloading a file from Google Drive

I exported some images from Google Earth Engine to Google Drive. I need to download those images to a local drive using a Python script, so I tried using oauth2client and apiclient, as I had seen in tutorials.
I got a list of the files in Drive and their corresponding IDs, then used each ID to try to download the file with the gdown library:
gdown.download(f'https://drive.google.com/uc?id={file_data["id"]}',
               f'{download_path}{os.sep}{filename_to_download}.tif')
I got the following error message:
Access denied with the following error:
Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses.
You may still be able to access the file from the browser:
https://drive.google.com/uc?id=<id>
Since I got the Drive file list, I suppose that the Drive authentication is OK. If I open the link suggested by the error message in the browser, I can download the file. If I check the file's properties in Drive, I can see:
Who can access: not shared.
What should I do to download the files?
This is the complete code:
# https://medium.com/swlh/google-drive-api-with-python-part-i-set-up-credentials-1f729cb0372b
# https://levelup.gitconnected.com/google-drive-api-with-python-part-ii-connect-to-google-drive-and-search-for-file-7138422e0563
# https://stackoverflow.com/questions/38511444/python-download-files-from-google-drive-using-url
import os
from apiclient import discovery
from httplib2 import Http
from oauth2client import client, file, tools
import gdown
class GoogleDrive(object):
    # define API scope
    def __init__(self, secret_credentials_file_path = './credentials'):
        self.DriveFiles = None
        SCOPE = 'https://www.googleapis.com/auth/drive'
        self.store = file.Storage(f'{secret_credentials_file_path}{os.sep}credentials.json')
        self.credentials = self.store.get()
        if not self.credentials or self.credentials.invalid:
            flow = client.flow_from_clientsecrets(f'{secret_credentials_file_path}{os.sep}client_secret.json',
                                                  SCOPE)
            self.credentials = tools.run_flow(flow, self.store)
        oauth_http = self.credentials.authorize(Http())
        self.drive = discovery.build('drive', 'v3', http=oauth_http)
    def RetrieveAllFiles(self):
        results = []
        page_token = None
        while True:
            try:
                param = {}
                if page_token:
                    param['pageToken'] = page_token
                files = self.drive.files().list(**param).execute()
                # append the files from the current result page to our list
                results.extend(files.get('files'))
                # Google Drive API shows our files in multiple pages when the number of files exceed 100
                page_token = files.get('nextPageToken')
                if not page_token:
                    break
            except Exception as error:
                print(f'An error has occurred: {error}')
                break
        self.DriveFiles = results
    def GetFileData(self, filename_to_search):
        for file_data in self.DriveFiles:
            if file_data.get('name') == filename_to_search:
                return file_data
        else:
            return None
    def DownloadFile(self, filename_to_download, download_path):
        file_data = self.GetFileData(f'{filename_to_download}.tif')
        gdown.download(f'https://drive.google.com/uc?id={file_data["id"]}',
                       f'{download_path}{os.sep}{filename_to_download}.tif')

Google Drive may not be the best tool for this. You could instead upload the images to a raw file hosting service such as Imgur and download them with requests. You can then write the response to a file, or skip the file entirely and use image.content directly as the image data. Here's an example:
import requests
image = requests.get("https://i.imgur.com/5SMNGtv.png")
with open("image.png", 'wb') as file:
    file.write(image.content)
You can choose where the file is saved by putting a path before the file name, like this:
image = requests.get("https://i.imgur.com/5SMNGtv.png")
with open("C:/Users/Admin/Desktop/image.png", 'wb') as file:
    file.write(image.content)

Solution 1.
Access denied with the following error:
Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses.
You may still be able to access the file from the browser:
https://drive.google.com/uc?id=<id>
In the sharing dialog on Google Drive (right-click the image, then open Share or Get link), change the access setting to "Anyone with the link". Your code should then work.
Solution 2.
If you can use Google Colab, you can mount your Drive easily and access the files there using
from google.colab import drive
drive.mount('/content/gdrive')
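A third option, not taken from the answers above, is to download through the already-authenticated Drive API service instead of a public link, so the file can stay unshared. The sketch below assumes a Drive v3 service object like self.drive in the question and uses MediaIoBaseDownload from google-api-python-client. The function name download_drive_file and the downloader_cls parameter are hypothetical; the parameter exists only so the loop can be exercised without the library installed.

```python
def download_drive_file(service, file_id, destination_path, downloader_cls=None):
    """Stream a binary Drive file to disk using authenticated API calls."""
    if downloader_cls is None:
        # Imported lazily; this is the real downloader from the client library.
        from googleapiclient.http import MediaIoBaseDownload
        downloader_cls = MediaIoBaseDownload

    # get_media requests the file's content rather than its metadata.
    request = service.files().get_media(fileId=file_id)
    with open(destination_path, "wb") as handle:
        downloader = downloader_cls(handle, request)
        done = False
        while not done:
            # next_chunk() returns (status, done); done is True after the last chunk.
            _status, done = downloader.next_chunk()
    return destination_path
```

In the question's class this could replace the gdown call, for example download_drive_file(self.drive, file_data["id"], target_path), where target_path is the same destination string built in DownloadFile.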

Google has a policy of not accepting your regular Google/Gmail password. It only accepts so-called "App Passwords", which you need to create for your Google account in order to authenticate when using third-party apps.

Uploading image string to Google Drive using pydrive

I need to upload an image string (like the one you get from requests.get(url).content) to Google Drive using the PyDrive package. I checked a similar question, but the accepted answer there was to save it to a temporary file on a local drive and then upload that.
However, I cannot do that because of local storage and permission restrictions.
The previously accepted answer was to use SetContentString(image_string.decode('utf-8')), since SetContentString requires a parameter of type str, not bytes.
However, this raised the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte, as noted in the comments on that answer.
Is there any way to do this without a temporary file, for example using PIL or BytesIO to convert the data so it can be uploaded correctly as a string, or by manipulating it as a PIL image and uploading it with SetContentFile()?
A basic example of what I'm trying to do is:
img_content = requests.get('https://i.imgur.com/A5gIh7W.jpeg').content
file = drive.CreateFile({...})
file.SetContentString(img_content.decode('utf-8'))
file.Upload()

The PyDrive documentation (Upload and update file content) says the following:
Managing file content is as easy as managing file metadata. You can set file content with either SetContentFile(filename) or SetContentString(content) and call Upload() just as you did to upload or update file metadata.
I looked for a PyDrive method that uploads binary data directly to Google Drive but could not find one, so there may be no such method. In this answer I therefore propose uploading the binary data with the requests module, using the access token retrieved by PyDrive's authorization flow. The sample script is as follows.
Sample script:
from pydrive.auth import GoogleAuth
import io
import json
import requests
url = 'https://i.imgur.com/A5gIh7W.jpeg'  # Please set the direct link of the image file.
filename = 'sample file'  # Please set the filename on Google Drive.
folder_id = 'root'  # Please set the folder ID. The file is put to this folder.
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
metadata = {
    "name": filename,
    "parents": [folder_id]
}
files = {
    'data': ('metadata', json.dumps(metadata), 'application/json'),
    'file': io.BytesIO(requests.get(url).content)
}
r = requests.post(
    "https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart",
    headers={"Authorization": "Bearer " + gauth.credentials.access_token},
    files=files
)
print(r.text)
Note:
This script assumes that your URL is a direct link to the image file.
Here, uploadType=multipart is used. The official documentation says the following:
Use this upload type to quickly transfer a small file (5 MB or less) and metadata that describes the file, in a single request. To perform a multipart upload, refer to Perform a multipart upload.
To upload large data, use a resumable upload instead.
References:
Upload and update file content of pydrive
Upload file data of Drive API
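For the large-file case mentioned in the note, a resumable upload of in-memory bytes might look like the sketch below. It is not part of the original answer: it assumes a Drive v3 service built with google-api-python-client (not PyDrive) and uses that library's MediaIoBaseUpload. The function name upload_bytes_resumable and the media_cls parameter are hypothetical; the parameter exists only so the chunk loop can be exercised without the library installed.

```python
import io


def upload_bytes_resumable(service, data, name, mimetype, media_cls=None,
                           chunk_size=5 * 1024 * 1024):
    """Upload in-memory bytes to Drive v3 in resumable chunks."""
    if media_cls is None:
        # Imported lazily; this is the real upload wrapper from the client library.
        from googleapiclient.http import MediaIoBaseUpload
        media_cls = MediaIoBaseUpload

    # No temporary file is needed: the bytes are wrapped in a BytesIO stream.
    media = media_cls(io.BytesIO(data), mimetype=mimetype,
                      chunksize=chunk_size, resumable=True)
    request = service.files().create(body={"name": name}, media_body=media,
                                     fields="id,md5Checksum")
    response = None
    while response is None:
        # next_chunk() returns (status, response); response stays None
        # until the final chunk has been accepted.
        _status, response = request.next_chunk()
    return response
```

Requesting md5Checksum in fields means the returned metadata can be compared against a locally computed digest of data to confirm the upload.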

KeyError when running codes to read file from google drive in Python

I want to read a file directly from Google Drive using the Google Drive API in Visual Studio Code with Python.
Here is part of the code:
file2 = drive.CreateFile({'id': file1['<the file ID of my file that is inside the Google Drive>']})
file2.GetContentString('testing.csv')
Upon running this, I get a
KeyError: KeyError('<the file ID of my file that is inside the Google Drive>')
I searched on the internet the possible ways to solve this but nothing seems to work so far...
I followed this tutorial: Hands-on tutorial for managing Google Drive files with Python

{'id': file1['id']} means that you want to retrieve the id of file1, a file that you are expected to have uploaded in previous steps.
If you did not define file1 in your code, you can instead hardcode the valid ID of any file on your Drive as the parameter.
Sample:
file2 = drive.CreateFile({'id': '<the file ID of a csv file on your Google Drive>'})
file2.GetContentString('testing.csv')
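The KeyError itself is ordinary dictionary behaviour, which the following sketch illustrates. A plain dict stands in for the PyDrive file object here, and the ID value is a made-up placeholder.

```python
# A plain dict standing in for a PyDrive file object (hypothetical values).
file1 = {"id": "FILE_ID_PLACEHOLDER", "title": "testing.csv"}

# Correct: look up the value stored under the key 'id'.
file_id = file1["id"]

# Incorrect: using the ID itself as the key raises KeyError,
# because no entry is stored under that name.
try:
    file1[file_id]
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

So the string inside the square brackets must be the literal key 'id', while the actual file ID belongs on the value side of {'id': ...}.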

Uploading a file to Google Drive
You need to send a MediaFileUpload for the file to be uploaded:
file_metadata = {'name': 'photo.jpg'}
media = MediaFileUpload('files/photo.jpg', mimetype='image/jpeg')
file = drive_service.files().create(body=file_metadata,
                                    media_body=media,
                                    fields='id').execute()
print('File ID: %s' % file.get('id'))
Viewing the uploaded file in the Google Drive web application
Calling files.get returns a File resource. If the file you uploaded is of a type that Google Drive can open and display, the resource will have a property called webViewLink:
webViewLink string A link for opening the file in a relevant Google editor or viewer in a browser.
You can use that link to open the file in the Google Drive web application. Note, however, that the user opening the file must have permission on the file to be able to view it.
Reading data programmatically
Remember that the Google Drive API is just a file storage API; it stores files but cannot open them. If you are working with a CSV, consider converting it to a Google Sheet and then using the Google Sheets API to access the data programmatically.

python + google drive: upload xlsx, convert to google sheet, get sharable link

The flow of my desired program is:
1. Upload an xlsx spreadsheet to drive (it was created using pandas to_excel)
2. Convert it to Google Sheets format
3. Specify that it is editable by anyone with the link
4. Get the link and share it with someone who will enter information
5. Download the completed sheet
I am currently using PyDrive, which solves steps 1 and 5, but there are a few unsolved problems.
How can I convert to google sheets format? I tried to just specify the mimeType as 'application/vnd.google-apps.spreadsheet' when I created the file to upload with PyDrive, but that gave me an error.
How can I set the file to be editable by anyone with the link? Once that is set, I can get the sharing link easily enough with PyDrive.
UPDATE: conversion from xlsx to google sheets is easy with a convert=True flag. See below. I am still seeking a way to set the sharing settings of my new file to "anyone with the link can edit".
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
test_file = drive.CreateFile({'title': 'testfile.xlsx'})
test_file.SetContentFile('testfile.xlsx')
test_file.Upload({'convert': True})

There is an optional query parameter, convert, for both the insert and copy methods:
convert=true
Whether to convert this file to the corresponding Google Docs format. (Default: false)
There is a Python example in the Google documentation for the copy method.
You need to use the Python client library for the code to work.
from apiclient import errors
from apiclient.http import MediaFileUpload
# ...
def insert_file(service, title, description, parent_id, mime_type, filename):
    """Insert new file.
    Args:
        service: Drive API service instance.
        title: Title of the file to insert, including the extension.
        description: Description of the file to insert.
        parent_id: Parent folder's ID.
        mime_type: MIME type of the file to insert.
        filename: Filename of the file to insert.
    Returns:
        Inserted file metadata if successful, None otherwise.
    """
    media_body = MediaFileUpload(filename, mimetype=mime_type, resumable=True)
    body = {
        'title': title,
        'description': description,
        'mimeType': mime_type
    }
    # Set the parent folder.
    if parent_id:
        body['parents'] = [{'id': parent_id}]
    try:
        file = service.files().insert(
            body=body,
            convert=True,
            media_body=media_body).execute()
        # Uncomment the following line to print the File ID
        # print('File ID: %s' % file['id'])
        return file
    except errors.HttpError as error:
        print('An error occurred: %s' % error)
        return None
I haven't tried this, so you'll need to test it.

To make the file editable by anyone with the link, you have to insert a new permission with the following information:
from apiclient import errors
# ...
def share_with_anyone(service, file_id):
    """Shares the file with anyone with the link
    Args:
        service: Drive API service instance.
        file_id: ID of the file to insert permission for.
    Returns:
        The inserted permission if successful, None otherwise.
    """
    new_permission = {
        'type': "anyone",
        'role': "writer",
        'withLink': True
    }
    try:
        return service.permissions().insert(
            fileId=file_id, body=new_permission).execute()
    except errors.HttpError as error:
        print('An error occurred: %s' % error)
        return None
Then, to get the link, read file["alternateLink"].
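Since the question uses PyDrive rather than the raw client library, the same permission can probably be set on the PyDrive file object directly. The sketch below relies on PyDrive's InsertPermission method and the Drive v2 alternateLink field. The wrapper name share_pydrive_file is hypothetical, and it assumes the file has already been uploaded so that its metadata is populated.

```python
def share_pydrive_file(drive_file, role="writer"):
    """Give anyone with the link the given role and return the file's link."""
    # Same permission body as the Drive v2 example, expressed through PyDrive.
    drive_file.InsertPermission({
        "type": "anyone",
        "value": "anyone",
        "role": role,
    })
    # alternateLink is part of the v2 metadata PyDrive stores after Upload().
    return drive_file["alternateLink"]
```

With the question's code, this would be called as share_pydrive_file(test_file) after test_file.Upload({'convert': True}).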
