Cannot download Excel file through Drive API (Python)

I am trying to download an Excel file using Drive API. Here is my code:
import io
import shutil

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload

def downloadXlsx(vars, file, creds):
    try:
        service = build('drive', 'v3', credentials=creds)
        fileId = file['id']
        fileName = file['name']
        # request = service.files().get_media(fileId=fileId)
        request = service.files().get_media(fileId=fileId, acknowledgeAbuse=True)
        # request = service.files().get(fileId=fileId, supportsTeamDrives=True, fields='*').execute()
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print("Download %d%%." % int(status.progress() * 100))
        fh.seek(0)
        print('%s%s' % (vars.download_directory, fileName))
        with open('%s%s' % (vars.download_directory, fileName), 'wb') as f:
            shutil.copyfileobj(fh, f, length=131072)
    except HttpError as error:
        print(f'An error occurred: {error}')
Every time I run it, most files return this error:
An error occurred: <HttpError 403 when requesting https://www.googleapis.com/drive/v3/files/xxxxxxxxxxxxxxxxxxxx?acknowledgeAbuse=true&alt=media returned "This file has been identified as malware or spam and cannot be downloaded.". Details: "[{'domain': 'global', 'reason': 'cannotDownloadAbusiveFile', 'message': 'This file has been identified as malware or spam and cannot be downloaded.'}]">
I tried adding the acknowledgeAbuse=True flag but it doesn't change anything. Previously it would give me this error:
An error occurred: <HttpError 403 when requesting
https://www.googleapis.com/drive/v3/files/1fr7NwhToKFgvbNgExl0QMgurLJlx8KmV?acknowledgeAbuse=true&alt=media
returned "Only the owner can download abusive files.". Details:
"[{'domain': 'global', 'reason': 'cannotDownloadAbusiveFile',
'message': 'Only the owner can download abusive files.',
'locationType': 'parameter', 'location': 'acknowledgeAbuse'}]">
But I no longer get this error, and I'm not sure why, as I haven't changed anything.
I also tried this line:
request = service.files().get(fileId=fileId, supportsTeamDrives=True, fields='*').execute()
This downloaded a file, but it was corrupted and could not be opened.
Does anyone have a clue how I can get around this? Is there a different method I could try, or a way to get .get() to download the file properly? I don't know why it says I'm not the owner. If anyone knows how the Drive API determines who is executing the request, that would be helpful.
Edit:
I'm looking at the files.get method documentation here and it reads the following:
By default, this responds with a Files resource in the response body.
If you provide the URL parameter alt=media, then the response includes
the file contents in the response body. Downloading content with
alt=media only works if the file is stored in Drive. To download
Google Docs, Sheets, and Slides use files.export instead. For further
information on downloading files, refer to Download files
Seems like I need to specify alt=media somehow but not sure if that is possible in my situation. Maybe that's referring to get_media?

Fixed! It was a bug with Google Drive API. https://issuetracker.google.com/issues/238551542

There is an optional parameter that you can send with your files.get request:
acknowledgeAbuse (boolean): Whether the user is acknowledging the risk of downloading known malware or other abusive files. This is only applicable when alt=media. (Default: false)
Try:
request = service.files().get(fileId=fileId, supportsTeamDrives=True, fields='*', acknowledgeAbuse='true').execute()
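The documentation quoted in the question distinguishes two download routes: alt=media for files stored in Drive, and files.export for Google Docs, Sheets, and Slides. In the Python client the alt=media route corresponds to get_media (the error URL in the question shows alt=media being added by that call). A minimal sketch of choosing between the two follows. The helper name build_download_request and the choice of XLSX as the export format are assumptions for illustration, and the sketch does not work around the cannotDownloadAbusiveFile error itself.

```python
# Sketch only: picks the request type from the file's mimeType.
# `service` is assumed to be a Drive v3 client created with googleapiclient's build().

GOOGLE_SHEET_MIME = 'application/vnd.google-apps.spreadsheet'
XLSX_MIME = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'


def build_download_request(service, file):
    """Return an un-executed request that can be passed to MediaIoBaseDownload.

    `file` is a dict with at least 'id' and 'mimeType' (hypothetical helper,
    not part of the Drive client library).
    """
    if file['mimeType'] == GOOGLE_SHEET_MIME:
        # A native Google Sheet has no binary content, so it is exported instead.
        return service.files().export_media(fileId=file['id'], mimeType=XLSX_MIME)
    # Files stored in Drive (for example an uploaded .xlsx) are fetched with alt=media.
    return service.files().get_media(fileId=file['id'])
```

The returned request can then be handed to MediaIoBaseDownload exactly as in the question's loop. The file dict needs mimeType, so it would have to be included in the fields requested when listing files.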

Unable to improve transcription accuracy with speech adaptation boost

I'm using SpeechRecognition Python library to perform Speech to Text operations. I'm using the recognize_google_cloud function to use Google Cloud Speech-to-Text API.
Here is my code:
import speech_recognition as sr
import json

j = ''
with open('key.json', 'r') as f:
    j = f.read().replace('\n', '')
js = json.loads(j)
r = sr.Recognizer()
mic = sr.Microphone()
# NOTE: `candide` is not defined anywhere in the snippet as posted.
with candide as source:
    audio = r.record(source)
print(r.recognize_google_cloud(audio, language='fr-FR', preferred_phrases=['pistoles', 'disait'], credentials_json=j))
The recognize_google_cloud function sends the audio captured by the microphone to the Google API and selects the most probable transcription of the speech from a set of alternatives.
The preferred_phrases parameter, as explained in this page of the documentation, is used to favor an alternative that contains the listed words.
It is possible to improve these results using speech adaptation boost. As this version of the SpeechRecognition library doesn't let us specify a boost value, I updated the speech_recognition/__init__.py file with a hard-coded boost value:
if preferred_phrases is not None:
    speech_config["speechContexts"] = {"phrases": preferred_phrases, "boost": 19}
Unfortunately, when I execute my code, I get the following error:
Traceback (most recent call last):
File "/home/pierre/.local/lib/python3.8/site-packages/speech_recognition/__init__.py", line 931, in recognize_google_cloud
response = request.execute()
File "/home/pierre/.local/lib/python3.8/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
return wrapped(*args, **kwargs)
File "/home/pierre/.local/lib/python3.8/site-packages/googleapiclient/http.py", line 915, in execute
raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://speech.googleapis.com/v1/speech:recognize?alt=json returned "Invalid JSON payload received. Unknown name "boost" at 'config.speech_contexts': Cannot find field.". Details: "[{'#type': 'type.googleapis.com/google.rpc.BadRequest', 'fieldViolations': [{'field': 'config.speech_contexts', 'description': 'Invalid JSON payload received. Unknown name "boost" at \'config.speech_contexts\': Cannot find field.'}]}]">
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "spech_reco.py", line 23, in <module>
print(r.recognize_google_cloud(audio, language='fr-FR', preferred_phrases=['pistoles', 'disait'], credentials_json=j));
File "/home/pierre/.local/lib/python3.8/site-packages/speech_recognition/__init__.py", line 933, in recognize_google_cloud
raise RequestError(e)
speech_recognition.RequestError: <HttpError 400 when requesting https://speech.googleapis.com/v1/speech:recognize?alt=json returned "Invalid JSON payload received. Unknown name "boost" at 'config.speech_contexts': Cannot find field.". Details: "[{'#type': 'type.googleapis.com/google.rpc.BadRequest', 'fieldViolations': [{'field': 'config.speech_contexts', 'description': 'Invalid JSON payload received. Unknown name "boost" at \'config.speech_contexts\': Cannot find field.'}]}]">
Is there an error in my request?
I understand that you are modifying the speech_recognition/__init__.py file of the SpeechRecognition library in order to include the "boost" parameter in your request.
When reviewing this file, I noticed that it uses the 'v1' version of the API; however, the "boost" parameter is only supported in the 'v1p1beta1' version.
Therefore, another change you could make in that file is the following:
`speech_service = build("speech", "v1p1beta1", credentials=api_credentials, cache_discovery=False)`
With this modification, you should no longer see the BadRequest error.
At the same time, please keep in mind that this is a third-party library that uses the Google Speech-to-Text API internally. If it does not cover all your needs, an alternative is to write your own implementation directly on top of the Speech-to-Text API Python client library.
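To make the shape of the request concrete, here is a minimal sketch of the recognition config that a patched library could send. It relies on the answer's statement that boost is only accepted by the v1p1beta1 endpoint, and it assumes that speechContexts is a list of context objects. The helper name build_speech_config, the FLAC encoding, and the 16000 Hz sample rate are illustrative assumptions, not taken from the library's source.

```python
import json


def build_speech_config(language, sample_rate, preferred_phrases=None, boost=None):
    """Build a recognition config dict (hypothetical helper for illustration)."""
    config = {
        "encoding": "FLAC",              # assumed audio encoding
        "sampleRateHertz": sample_rate,  # assumed to match the recorded audio
        "languageCode": language,
    }
    if preferred_phrases:
        context = {"phrases": list(preferred_phrases)}
        if boost is not None:
            # Per the answer above, only send this to the v1p1beta1 endpoint.
            context["boost"] = boost
        config["speechContexts"] = [context]
    return config


config = build_speech_config("fr-FR", 16000, ["pistoles", "disait"], boost=19)
print(json.dumps(config, indent=2))
```

Keeping boost optional means the same helper still produces a config without the boost field when it is called with only the phrases.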

How can I retrieve the URL of a video I have just uploaded?

I have a Python script that uploads a video file to our Vimeo page via the API, which works perfectly. I'm just having trouble retrieving the link for the video we just uploaded. I found a snippet in the example documentation, but it doesn't seem to work.
import vimeo
import sys
client = vimeo.VimeoClient(
    token="xxxxx",
    key="xxxxx",
    secret="xxxxx"
)
# Make the request to the server for the "/me" endpoint.
about_me = client.get("/me")
# Make sure we got back a successful response.
assert about_me.status_code == 200
# Load the body's JSON data. WORKS UP TO THIS LINE ENABLE BELOW
print (about_me.json())
#sys.exit(0)
# Path to upload file
path_to_file = r"C:\Users\mydocs\Documents\SS19xGEN.mp4"
print('Uploading: %s' % path_to_file)
# Push file with credentials
client.upload(path_to_file, data={'name': 'TEST', 'description': 'test'})
# Return the uri
print("The uri for the video is %s" % (client))
video_data = client.get(client + 'fields=link').json()
print(('"%s" has been uploaded to %s' % (path_to_file, video_data['link'])))
The script works well up until the last two lines, which are my attempt at retrieving the URL of the video I just uploaded, but they give me this error:
Exception has occurred: TypeError
unsupported operand type(s) for +: 'VimeoClient' and 'str'
I have pored over the documentation and can't find any examples of how to do this. Apologies for the beginner question!
According to the docs, the upload method returns the URI of the new video:
# Push file with credentials
video_uri = client.upload(path_to_file, data={'name': 'TEST', 'description': 'test'})
# Print the returned uri
print("The uri for the video is %s" % (video_uri))
Another way is to request your list of videos with the sort and per_page=1 parameters:
video_data = client.get('https://api.vimeo.com/me/videos?sort=date&per_page=1').json()
Append the fields you need to the end of the URL.
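Putting the two steps together, a minimal sketch might look like the following. It assumes, as the answer says, that upload returns the video URI, and that appending '?fields=link' to that URI (what the question's last two lines were attempting) returns a JSON body containing a link key. The function name upload_and_get_link is hypothetical.

```python
def upload_and_get_link(client, path_to_file, name, description):
    """Upload a file and return (video_uri, link).

    `client` is assumed to be a vimeo.VimeoClient, or any object with the
    same upload() and get() methods.
    """
    # upload() is assumed to return the URI of the new video, e.g. '/videos/123'.
    video_uri = client.upload(path_to_file, data={'name': name, 'description': description})
    # Concatenate the URI string (not the client object) with the query string.
    video_data = client.get(video_uri + '?fields=link').json()
    return video_uri, video_data['link']
```

The TypeError in the question came from adding a string to the client object itself; here the string returned by upload is used instead.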

How do I use python to download an S3 file from a link with a signature and expiration?

I have an S3 link provided to me by a third party with the following structure: http://s3.amazonaws.com/bucket_name_possibly/path/to/file_possibly/filename?AWSAccessKeyId=SomeKey&Expires=888888&Signature=SomeCharactersPossiblyHTMLencoded
Clicking on the link downloads the file for me. However, in Python, when I try to use urllib.request.urlretrieve(link_string) on the link, I get the error HTTP Error 403: Forbidden.
I have also tried using boto3 and manually parsing out the bucket_name, key, and AWSAccessKeyId, as well as the signature (treating it as the AWSSecretAccessKey; I know that this is probably wrong). I set up a client with the credentials and tried to run the get_object method, something similar to the code below:
client = boto3.client(
    's3',
    aws_access_key_id='AWSACCESSKEY',
    aws_secret_access_key='SomeCharactersPossiblyHTMLencoded',
    config=Config(signature_version='s3v4')  # tried with/without this option
)
client.get_object(
    Bucket='bucket_name_possibly',
    Key='path/to/file_possibly/filename'
)
The resulting error is: An error occurred (SignatureDoesNotMatch) when calling the GetObject operation: The request signature we calculated does not match the signature you provided. Check your key and signing method.
I am stuck. How can I get Python to download the link programmatically?
You can use boto3 to download a file as follows. Note that this uses your own AWS credentials rather than the signature embedded in the link.
import boto3
import botocore

BUCKET_NAME = 'my-bucket'  # replace with your bucket name
KEY = 'my_image_in_s3.jpg'  # replace with your object key

s3 = boto3.resource('s3')
try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'my_local_image.jpg')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
for more info you can refer this
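Because a link of this form already carries its own signature in the query string, another option is to fetch it over plain HTTP without any AWS credentials. The sketch below is an illustration under stated assumptions, not a confirmed fix for the 403: it assumes the link may have been copied with HTML-escaped ampersands (&amp;), as the placeholder name in the question hints, and that Expires is a Unix timestamp in seconds. All helper names are hypothetical.

```python
import html
import shutil
import time
import urllib.request
from urllib.parse import parse_qs, urlsplit


def clean_presigned_url(link):
    """Undo HTML escaping such as '&amp;' that would corrupt the query string."""
    return html.unescape(link.strip())


def is_expired(url, now=None):
    """Return True if the Expires parameter (assumed epoch seconds) has passed."""
    params = parse_qs(urlsplit(url).query)
    expires = params.get('Expires')
    if not expires:
        return False  # No expiry information to check.
    now = time.time() if now is None else now
    return int(expires[0]) < now


def download_presigned(link, destination):
    """Download the object behind a pre-signed link to a local path."""
    url = clean_presigned_url(link)
    if is_expired(url):
        raise ValueError('The link has expired; ask the provider for a new one.')
    # The URL is requested as-is: altering it would invalidate the signature.
    with urllib.request.urlopen(url) as response, open(destination, 'wb') as out:
        shutil.copyfileobj(response, out)
    return destination
```

If the request still returns 403 after these checks, the cause lies elsewhere (for example, the link may be restricted in ways the URL does not show), and the provider of the link would need to confirm.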

Python Youtube API: Attempt to access Watch Later results in invalid URI

BELIEVED SOLVED: Python API only supports v1, while watch later was added in v2. SOURCE
SOLUTION: Use "Experimental" API v3
I am attempting to use the Youtube API to access my Watch Later playlist. Below is the code I am using.
import gdata.youtube
import gdata.youtube.service
yt_service = gdata.youtube.service.YouTubeService()
yt_service.ssl = True
yt_service.developer_key = 'REDACTED'
yt_service.email = 'REDACTED'
yt_service.password = 'REDACTED'
yt_service.ProgrammaticLogin()
playlist_uri = 'https://gdata.youtube.com/feeds/api/users/default/watch_later?v=2'
playlist_video_feed = yt_service.GetYouTubePlaylistVideoFeed(uri=playlist_uri)
for playlist_video_entry in playlist_video_feed.entry:
    print playlist_video_entry.title.text
I am receiving the following error.
Traceback (most recent call last):
File "Youtube.py", line 21, in <module>
playlist_video_feed = yt_service.GetYouTubePlaylistVideoFeed(uri=playlist_uri)
File "/Library/Python/2.6/site-packages/gdata/youtube/service.py", line 393, in GetYouTubePlaylistVideoFeed
uri, converter=gdata.youtube.YouTubePlaylistVideoFeedFromString)
File "/Library/Python/2.6/site-packages/gdata/service.py", line 1108, in Get
'reason': server_response.reason, 'body': result_body}
gdata.service.RequestError: {'status': 400, 'body': 'Invalid request URI', 'reason': 'Bad Request'}
It would seem the URI https://gdata.youtube.com/feeds/api/users/default/watch_later?v=2 is invalid. However, this is the one the Google documentation says to use. Am I using it wrong, or is there another issue here?
In addition, if I change the URI to http://gdata.youtube.com/feeds/api/playlists/63F0C78739B09958, it works as expected.
You should check your authentication. According to Retrieving and updating a user's 'Watch Later' playlist:
Again, the link will only be present in a profile entry if either of the following conditions is true:
You submit an authenticated request to retrieve the logged-in user's own profile.
The watch_later playlist is publicly available for the user whose profile you are retrieving.
The API server will return a 40x HTTP response code if you try to retrieve a watch_later playlist and neither of the above conditions is true.
The second link most likely works because the second condition (the playlist is publicly available) is met. One thing I notice is missing from your example is the client ID/source:
# A complete client login request
yt_service.email = 'jo@gmail.com'
yt_service.password = 'mypassword'
yt_service.source = 'my-example-application'
yt_service.developer_key = 'ABC123...'
yt_service.client_id = 'my-example-application'
yt_service.ProgrammaticLogin()
You should look into that and ensure that your authentication is happening properly.

watch History feed gdata python

I'm trying to get the watch history feed of an authenticated YouTube user with Python.
This is my code:
import getpass

import gdata.youtube.service

yt_service = gdata.youtube.service.YouTubeService()

def LogIn():
    login_name = raw_input('Email:')
    login_pass = getpass.getpass()
    try:
        yt_service.email = login_name
        yt_service.password = login_pass
        yt_service.ProgrammaticLogin()
    except:
        print 'False username or password. Unable to authenticate.'
        exit()

def GetHistoryFeed():
    uri = 'https://gdata.youtube.com/feeds/api/users/default/watch_history?v=2'
    feed = yt_service.GetYouTubeVideoFeed(uri)
    #PrintVideoFeed(yt_service.GetYouTubeVideoFeed(uri),'history')

LogIn()
GetHistoryFeed()
and it says gdata.service.RequestError: {'status': 400, 'body': 'Invalid request URI', 'reason': 'Bad Request'}. I know I have to make an authenticated GET request, but I don't know how. What am I doing wrong?
EDIT
I am facing a major problem. The program is the same as above, but with yt_service.developer_key = DEVELOPER_KEY added under the password line and uri = 'https://gdata.youtube.com/feeds/api/users/default/watch_history?v=2&key=%s'%DEVELOPER_KEY. I tested it on 4 PCs, and it runs without errors on only one of them. I get this error:
File "/usr/local/lib/python2.6/dist-packages/gdata/youtube/service.py", line 186, in
return self.Get(uri, converter=gdata.youtube.YouTubeVideoFeedFromString)
File "/usr/local/lib/python2.6/dist-packages/gdata/service.py", line 1108, in Get
'reason': server_response.reason, 'body': result_body}
gdata.service.RequestError: {'status': 400, 'body': 'Invalid request URI', 'reason': 'Bad Request'}
I use Python 2.7 and gdata Python 2.0. Why does one PC execute it while the rest do not? What can I do to fix it? Please help!
Before you call the YouTube API, you first need to register a new application. Reference: https://developers.google.com/youtube/2.0/developers_guide_protocol_authentication
Visit http://code.google.com/apis/youtube/dashboard/ to register your application and retrieve the developer key that is generated for you.
Thereafter, whenever you make a call to the YouTube APIs, you should include the key query parameter (reference: https://developers.google.com/youtube/2.0/developers_guide_protocol#Developer_Key).
Set the key on your instantiated yt_service:
yt_service.developer_key = DEVELOPER_KEY
where DEVELOPER_KEY is the one shown on your newly registered application's dashboard (http://code.google.com/apis/youtube/dashboard/).
Without this DEVELOPER_KEY, YouTube cannot tell whether your Python script is a recognized application with proper access rights.
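For the key query parameter mentioned above, one way to append it without hand-formatting the string is sketched below. The helper name with_developer_key is hypothetical, and the snippet is written for Python 3 with the standard library only, whereas the gdata code in the question is Python 2.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_developer_key(uri, developer_key):
    """Return `uri` with a key=<developer_key> query parameter added or replaced."""
    parts = urlsplit(uri)
    query = dict(parse_qsl(parts.query))
    query['key'] = developer_key
    # urlencode escapes any characters in the key that are not URL-safe.
    return urlunsplit(parts._replace(query=urlencode(query)))


history_uri = with_developer_key(
    'https://gdata.youtube.com/feeds/api/users/default/watch_history?v=2',
    'DEVELOPER_KEY',  # placeholder, as in the question
)
print(history_uri)
```

Building the query this way keeps the existing v=2 parameter and avoids a malformed URI if the key contains characters that need escaping.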
