I am trying to write a GCP Cloud Function that collects some data from an API call and then stores selected data points in Firestore. I would also like to pass the event IDs to a Pub/Sub topic so that I can use them in other Cloud Functions.
So far I have the following:
import base64
import os
import requests
import json

from firebase_admin import firestore
from google.cloud import pubsub_v1

# FIRESTORE DATABASE
db = firestore.Client(project='puntau')

# API INFO
Base_url = 'https://xxxx.net/v1/feeds/sportsbookv2'
Sport_id = 'xxxx'
AppID = 'xxxx'
AppKey = 'xxxx'
Country = 'en_AU'
Site = 'www.xxxx.com.au'

# Publishes a message to a Cloud Pub/Sub topic.
def event_info(self):
    event_ids = []

    link = f'{Base_url}/event/group/{Sport_id}.json?app_id={AppID}&app_key={AppKey}&local={Country}&site={Site}'
    print(link)

    # Request data from link as 'str'
    data = requests.get(link).text

    # convert 'str' to Json
    data = json.loads(data)

    # JSON PARSE
    for event_data in data['events']:
        if event_data['path'][1]['name'] == 'NBA' and event_data['groupId'] == 1000093652 and 'MATCH' in event_data['tags']:
            competition = event_data['group']
            event_id = event_data['id']
            event_name = event_data['name']
            event_start = event_data['start']
            event_status = event_data['state']
            print(f'{competition} {event_id} {event_name} {event_start} {event_status}')
            event_ids.append(event_id)

            # WRITE TO FIRESTORE
            doc_ref = db.collection(u'xxxx_au').document(u'basketball_nba').collection(u'event_info').document(f'{event_id}')
            doc_ref.set({
                u'competition': competition,
                u'event_id': event_id,
                u'event_name': event_name,
                u'event_start': event_start,
                u'event_status': event_status,
                u'timestamp': firestore.SERVER_TIMESTAMP,
            })

    return str(event_ids)

    event_keys = str(event_ids)

    project_id = 'puntau'
    topic_id = 'unibet_basketball_nba'

    # Instantiates a Pub/Sub client
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)

    data = event_keys.encode('utf-8')

    # Publishes a message to a Pub/Sub topic
    future = publisher.publish(topic_path, data)
    print(future.result())
Output in the logs:
[1018936416, 1018936327, 1018936419, 1018936392, 1018936473, 1018936375, 1018936471]
The data for Firestore is also captured and stored with no issue.
The problem I have is that the output above (event_keys) is not passed to the Pub/Sub topic.
Is there an issue with my code or with the setup of the function in GCP?
Since you have written the Pub/Sub code below the return statement, that code never executes. After rearranging the code so that the message is published before the function returns, it should work as expected.
From the comments it is clear that, after modifying the code, it is working as intended.
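For reference, a minimal sketch of how the end of event_info can be rearranged (same project and topic IDs as in the question), publishing before the return:

    event_keys = str(event_ids)

    project_id = 'puntau'
    topic_id = 'unibet_basketball_nba'

    # Instantiate a Pub/Sub client and build the topic path
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)

    # Pub/Sub message payloads must be bytes
    future = publisher.publish(topic_path, event_keys.encode('utf-8'))
    print(future.result())  # prints the message ID assigned by Pub/Sub

    return event_keys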
I am having a problem with the Google Cloud Speech API. Every time I run the script, this error occurs:
six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 RecognitionAudio not set.
It doesn't seem to recognize RecognitionAudio for some reason. I already checked the API documentation but couldn't solve the problem.
I don't understand the reason for the error, so I will leave my code here in case anyone knows and can help me. Thanks.
import telebot
import requests
from pydub import AudioSegment
import os
import io

from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "./chatbot.json"

token = "1233361335"
bot = telebot.TeleBot(token)
downloadAudio = "https://api.telegram.org/file/bot{token}/".format(token=token)

@bot.message_handler(commands=['start'])
def send_welcome(message):
    bot.reply_to(message, "welcome")

@bot.message_handler(content_types=['voice'])
def handlerAudio(message):
    # get the audio from Telegram
    messageVoice = message.voice

    # get the download link
    audioPath = bot.get_file(messageVoice.file_id).file_path
    audioLink = downloadAudio + audioPath

    # download the file
    audioFile = requests.get(audioLink)
    audioName = "audio.ogg"

    # save it locally
    open(audioName, 'wb').write(audioFile.content)

    # convert the format to .wav
    AudioSegment.from_file(audioName).export("audio.wav", format="wav")
    sound = AudioSegment.from_wav("audio.wav")
    sound = sound.set_channels(1)  # convert to mono
    sound.export("audio.wav", format="wav")

    client = speech.SpeechClient()

    with io.open("audio.wav", 'rb') as audio_file:
        content = audio_file.read()

    audio = types.RecognitionAudio(content=content)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=48000,
        language_code='pt-BR')

    response = client.recognize(config, audio)

    for result in response.results:
        print(u'Transcript: {}'.format(result.alternatives[0].transcript))
        #bot.reply_to(message, result.alternatives[0].transcript)

bot.polling()
First of all, I'm a noob with Dialogflow and web services. I'm trying to take a Dialogflow agent I just created and integrate it with my app on my local computer. I was able to get the project_id and all the other important information, but no matter where I look, no one seems to talk about where session IDs come from. Here is the audio-to-text code I'm using, forked from the api.ai GitHub page:
import os
import dialogflow_v2 as dialogflow

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "My Google Credential"

project_id = 'project id'
session_id = "this i don't know where to get"
audio_file_path = 'my wave file directory name'
language_code = 'en'

def detect_intent_audio(project_id, session_id, audio_file_path,
                        language_code):
    """Returns the result of detect intent with an audio file as input.

    Using the same `session_id` between requests allows continuation
    of the conversation."""
    session_client = dialogflow.SessionsClient()

    # Note: hard coding audio_encoding and sample_rate_hertz for simplicity.
    audio_encoding = dialogflow.enums.AudioEncoding.AUDIO_ENCODING_LINEAR_16
    sample_rate_hertz = 44100

    session = session_client.session_path(project_id, session_id)
    print('Session path: {}\n'.format(session))

    with open(audio_file_path, 'rb') as audio_file:
        input_audio = audio_file.read()

    audio_config = dialogflow.types.InputAudioConfig(
        audio_encoding=audio_encoding, language_code=language_code,
        sample_rate_hertz=sample_rate_hertz)
    query_input = dialogflow.types.QueryInput(audio_config=audio_config)

    response = session_client.detect_intent(
        session=session, query_input=query_input,
        input_audio=input_audio)

    print('=' * 20)
    print('Query text: {}'.format(response.query_result.query_text))
    print('Detected intent: {} (confidence: {})\n'.format(
        response.query_result.intent.display_name,
        response.query_result.intent_detection_confidence))
    print('Fulfillment text: {}\n'.format(
        response.query_result.fulfillment_text))

detect_intent_audio(project_id, session_id, audio_file_path,
                    language_code)
I enabled the webhook and linked it to Heroku, but I still don't see where I can get this session ID. Can someone help me?
On the page
https://dialogflow.com/docs/reference/api-v2/rest/v2/projects.agent.sessions/detectIntent
it is stated, under the HTTP request's session path parameters, that:
"It's up to the API caller to choose an appropriate session ID. It can be a random number or some type of user identifier (preferably hashed). The length of the session ID must not exceed 36 bytes."
I'm running online predictions on the Google Cloud Machine Learning API, using the Google API Python client and a model hosted for me on Google Cloud.
When I predict with one image, the request, including all traffic, takes about 40 seconds. When I send two images, after some time I receive the message:
timeout: The read operation timed out
I would like to set the timeout to a different value, but I couldn't find out how.
This is my code:
import base64
import io
import time
from PIL import Image

from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient import discovery

SCOPES = ['https://www.googleapis.com/auth/cloud-platform']
SERVICE_ACCOUNT_FILE = 'mycredentialsfile.json'

credentials = ServiceAccountCredentials.from_json_keyfile_name(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)

ml = discovery.build('ml', 'v1', credentials=credentials)

projectID = 'projects/{}'.format('projectID') + '/models/{}'.format('modelID')

width = 640
height = 480
instances = []

for image in ["image5.jpg", "image6.jpg"]:
    img = Image.open(image)
    img = img.resize((width, height), Image.ANTIALIAS)
    output_str = io.BytesIO()
    img.save(output_str, "JPEG")
    instance = {"b64": base64.b64encode(output_str.getvalue()).decode("utf-8")}
    output_str.close()
    instances.append(instance)

input_json = {"instances": instances}

request = ml.projects().predict(body=input_json, name=projectID)

print("Starting prediction")
start_time = time.time()
response = request.execute()
print("%s seconds" % (time.time() - start_time))
I found a way by researching samples from the Google API Python client on GitHub and trying the same changes.
Using httplib2 to authorize, you can set the timeout.
The modified code follows:
import base64
import io
import time
from PIL import Image

# Need: pip install google-api-python-client
import httplib2
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient import discovery

SCOPES = ['https://www.googleapis.com/auth/cloud-platform']

# APIs & Services -> Credentials -> Create credentials -> Service account key
SERVICE_ACCOUNT_FILE = 'mycredentialsfile.json'

credentials = ServiceAccountCredentials.from_json_keyfile_name(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)

# Build an http object with the desired timeout and authorize it
http = httplib2.Http(timeout=200)
http = credentials.authorize(http)

ml = discovery.build('ml', 'v1', http=http)

projectID = 'projects/{}'.format('projectID') + '/models/{}'.format('modelID')

width = 640
height = 480
instances = []

for image in ["image5.jpg", "image6.jpg"]:
    img = Image.open(image)
    img = img.resize((width, height), Image.ANTIALIAS)
    output_str = io.BytesIO()
    img.save(output_str, "JPEG")
    instance = {"b64": base64.b64encode(output_str.getvalue()).decode("utf-8")}
    output_str.close()
    instances.append(instance)

input_json = {"instances": instances}

request = ml.projects().predict(body=input_json, name=projectID)

print("Starting prediction")
start_time = time.time()
response = request.execute()
print("%s seconds" % (time.time() - start_time))
I think that with a few modifications you can use this approach to set a timeout for almost any Google Cloud API in the Python client.
I hope this helps.
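For example, here is a sketch of the same pattern applied to a different discovery-based client (the Drive API here is just an illustration, not something from the question):

# Hypothetical example: the same authorized-http-with-timeout pattern
# applied to another discovery-based API.
http = httplib2.Http(timeout=200)
http = credentials.authorize(http)
drive = discovery.build('drive', 'v3', http=http)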
You have already solved the problem, but I found another way to do this.
import socket
socket.setdefaulttimeout(150)
If you call discovery.build without http, the http client is instantiated by build_http inside the build method:
https://googleapis.github.io/google-api-python-client/docs/epy/googleapiclient.http-pysrc.html#build_http
As you can see there, build_http creates the http client instance with a timeout if one has been set before the client is created.
So all you have to do is set this value with socket.setdefaulttimeout :)
Yes, I agree with Shohei's answer above. It took me a while to find this simple and elegant solution. You only need to add the following to the code:
import socket

timeout_in_sec = 60 * 3  # 3-minute timeout limit
socket.setdefaulttimeout(timeout_in_sec)

# Then create your ML service object as usual, and it will have the extended timeout limit.
ml_service = discovery.build('ml', 'v1')
I am trying to download the closed captions for this public YouTube video (just for testing): https://www.youtube.com/watch?v=Txvud7wPbv4
I am using the code sample (captions.py) below, which I got from this link: https://developers.google.com/youtube/v3/docs/captions/download
I have already stored client_secrets.json (OAuth2 authentication) and youtube-v3-api-captions.json in the same directory (as required by the sample code).
I ran this command in cmd: python captions.py --videoid='Txvud7wPbv4' --action='download'
I get this error:
I don't know why it doesn't recognise the video ID of this public video.
Has anyone had a similar issue?
Thank you all in advance.
Code sample:
# Usage example:
# python captions.py --videoid='<video_id>' --name='<name>' --file='<file>' --language='<language>' --action='action'

import httplib2
import os
import sys

from apiclient.discovery import build_from_document
from apiclient.errors import HttpError
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client.tools import argparser, run_flow

# The CLIENT_SECRETS_FILE variable specifies the name of a file that contains
# the OAuth 2.0 information for this application, including its client_id and
# client_secret. You can acquire an OAuth 2.0 client ID and client secret from
# the {{ Google Cloud Console }} at
# {{ https://cloud.google.com/console }}.
# Please ensure that you have enabled the YouTube Data API for your project.
# For more information about using OAuth2 to access the YouTube Data API, see:
#   https://developers.google.com/youtube/v3/guides/authentication
# For more information about the client_secrets.json file format, see:
#   https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
CLIENT_SECRETS_FILE = "client_secrets.json"

# This OAuth 2.0 access scope allows for full read/write access to the
# authenticated user's account and requires requests to use an SSL connection.
YOUTUBE_READ_WRITE_SSL_SCOPE = "https://www.googleapis.com/auth/youtube.force-ssl"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

# This variable defines a message to display if the CLIENT_SECRETS_FILE is
# missing.
MISSING_CLIENT_SECRETS_MESSAGE = """
WARNING: Please configure OAuth 2.0

To make this sample run you will need to populate the client_secrets.json file
found at:
   %s
with information from the APIs Console
https://console.developers.google.com

For more information about the client_secrets.json file format, please visit:
https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
""" % os.path.abspath(os.path.join(os.path.dirname(__file__),
                                   CLIENT_SECRETS_FILE))

# Authorize the request and store authorization credentials.
def get_authenticated_service(args):
  flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE, scope=YOUTUBE_READ_WRITE_SSL_SCOPE,
    message=MISSING_CLIENT_SECRETS_MESSAGE)

  storage = Storage("%s-oauth2.json" % sys.argv[0])
  credentials = storage.get()

  if credentials is None or credentials.invalid:
    credentials = run_flow(flow, storage, args)

  # Trusted testers can download this discovery document from the developers page
  # and it should be in the same directory with the code.
  with open("youtube-v3-api-captions.json", "r") as f:
    doc = f.read()

  return build_from_document(doc, http=credentials.authorize(httplib2.Http()))

# Call the API's captions.list method to list the existing caption tracks.
def list_captions(youtube, video_id):
  results = youtube.captions().list(
    part="snippet",
    videoId=video_id
  ).execute()

  for item in results["items"]:
    id = item["id"]
    name = item["snippet"]["name"]
    language = item["snippet"]["language"]
    print "Caption track '%s(%s)' in '%s' language." % (name, id, language)

  return results["items"]

# Call the API's captions.insert method to upload a caption track in draft status.
def upload_caption(youtube, video_id, language, name, file):
  insert_result = youtube.captions().insert(
    part="snippet",
    body=dict(
      snippet=dict(
        videoId=video_id,
        language=language,
        name=name,
        isDraft=True
      )
    ),
    media_body=file
  ).execute()

  id = insert_result["id"]
  name = insert_result["snippet"]["name"]
  language = insert_result["snippet"]["language"]
  status = insert_result["snippet"]["status"]

  print "Uploaded caption track '%s(%s) in '%s' language, '%s' status." % (name,
      id, language, status)

# Call the API's captions.update method to update an existing caption track's draft status
# and publish it. If a new binary file is present, update the track with the file as well.
def update_caption(youtube, caption_id, file):
  update_result = youtube.captions().update(
    part="snippet",
    body=dict(
      id=caption_id,
      snippet=dict(
        isDraft=False
      )
    ),
    media_body=file
  ).execute()

  name = update_result["snippet"]["name"]
  isDraft = update_result["snippet"]["isDraft"]

  print "Updated caption track '%s' draft status to be: '%s'" % (name, isDraft)
  if file:
    print "and updated the track with the new uploaded file."

# Call the API's captions.download method to download an existing caption track.
def download_caption(youtube, caption_id, tfmt):
  subtitle = youtube.captions().download(
    id=caption_id,
    tfmt=tfmt
  ).execute()

  print "First line of caption track: %s" % (subtitle)

# Call the API's captions.delete method to delete an existing caption track.
def delete_caption(youtube, caption_id):
  youtube.captions().delete(
    id=caption_id
  ).execute()

  print "Caption track '%s' deleted successfully." % (caption_id)

if __name__ == "__main__":
  # The "videoid" option specifies the YouTube video ID that uniquely
  # identifies the video for which the caption track will be uploaded.
  argparser.add_argument("--videoid",
    help="Required; ID for video for which the caption track will be uploaded.")
  # The "name" option specifies the name of the caption track to be used.
  argparser.add_argument("--name", help="Caption track name", default="YouTube for Developers")
  # The "file" option specifies the binary file to be uploaded as a caption track.
  argparser.add_argument("--file", help="Captions track file to upload")
  # The "language" option specifies the language of the caption track to be uploaded.
  argparser.add_argument("--language", help="Caption track language", default="en")
  # The "captionid" option specifies the ID of the caption track to be processed.
  argparser.add_argument("--captionid", help="Required; ID of the caption track to be processed")
  # The "action" option specifies the action to be processed.
  argparser.add_argument("--action", help="Action", default="all")

  args = argparser.parse_args()

  if args.action in ('upload', 'list', 'all'):
    if not args.videoid:
      exit("Please specify videoid using the --videoid= parameter.")

  if args.action in ('update', 'download', 'delete'):
    if not args.captionid:
      exit("Please specify captionid using the --captionid= parameter.")

  if args.action in ('upload', 'all'):
    if not args.file:
      exit("Please specify a caption track file using the --file= parameter.")
    if not os.path.exists(args.file):
      exit("Please specify a valid file using the --file= parameter.")

  youtube = get_authenticated_service(args)
  try:
    if args.action == 'upload':
      upload_caption(youtube, args.videoid, args.language, args.name, args.file)
    elif args.action == 'list':
      list_captions(youtube, args.videoid)
    elif args.action == 'update':
      update_caption(youtube, args.captionid, args.file)
    elif args.action == 'download':
      download_caption(youtube, args.captionid, 'srt')
    elif args.action == 'delete':
      delete_caption(youtube, args.captionid)
    else:
      # All the available methods are used in sequence just for the sake of an example.
      upload_caption(youtube, args.videoid, args.language, args.name, args.file)
      captions = list_captions(youtube, args.videoid)
      if captions:
        first_caption_id = captions[0]['id']
        update_caption(youtube, first_caption_id, None)
        download_caption(youtube, first_caption_id, 'srt')
        delete_caption(youtube, first_caption_id)
  except HttpError, e:
    print "An HTTP error %d occurred:\n%s" % (e.resp.status, e.content)
  else:
    print "Created and managed caption tracks."
Your app seems overly complex... it's structured to be able to do everything that can be done with captions, not just download them. That makes it harder to debug, so I wrote an abridged (Python 2 or 3) version that just downloads & displays captions:
UPDATED SAMPLE (May 2022) (new Python auth libs)
from __future__ import print_function
import os

from google.auth.transport.requests import Request
from google.oauth2 import credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient import discovery

creds = None
SCOPES = 'https://www.googleapis.com/auth/youtube.force-ssl'
TOKENS = 'storage.json'

if os.path.exists(TOKENS):
    creds = credentials.Credentials.from_authorized_user_file(TOKENS)
if not (creds and creds.valid):
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    else:
        flow = InstalledAppFlow.from_client_secrets_file('client_secret.json', SCOPES)
        creds = flow.run_local_server()
    with open(TOKENS, 'w') as token:
        token.write(creds.to_json())

YOUTUBE = discovery.build('youtube', 'v3', credentials=creds)

def process(vid):
    caption_info = YOUTUBE.captions().list(part='id',
            videoId=vid).execute().get('items', [])
    caption_str = YOUTUBE.captions().download(id=caption_info[0]['id'],
            tfmt='srt').execute().decode('utf-8')
    caption_data = caption_str.split('\n\n')
    for line in caption_data:
        if line.count('\n') > 1:
            i, timecode, caption = line.split('\n', 2)
            print('%02d) [%s] %s' % (
                    int(i), timecode, ' '.join(caption.split())))

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 2:
        vid = sys.argv[1]
        process(vid)
    else:
        print('Usage: python captions-download.py VIDEO_ID')
ORIGINAL SAMPLE (Mar 2017)
from __future__ import print_function

from googleapiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/youtube.force-ssl'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
YOUTUBE = discovery.build('youtube', 'v3', http=creds.authorize(Http()))

def process(vid):
    caption_info = YOUTUBE.captions().list(
            part='id', videoId=vid).execute().get('items', [])
    caption_str = YOUTUBE.captions().download(
            id=caption_info[0]['id'], tfmt='srt').execute()
    caption_data = caption_str.split('\n\n')
    for line in caption_data:
        if line.count('\n') > 1:
            i, cap_time, caption = line.split('\n', 2)
            print('%02d) [%s] %s' % (
                    int(i), cap_time, ' '.join(caption.split())))

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 2:
        vid = sys.argv[1]
        process(vid)
    else:
        print('Usage: python captions-download.py VIDEO_ID')
The way it works is this:
You pass in the video ID (VID) as the only argument (sys.argv[1]).
It uses that VID to look up the caption IDs with YOUTUBE.captions().list().
Assuming the video has (at least) one caption track, I grab its ID (caption_info[0]['id']).
Then it calls YOUTUBE.captions().download() with that caption ID, requesting the srt track format.
All individual captions are delimited by double NEWLINEs, so split on 'em.
Loop through each caption; there's data if there are at least 2 NEWLINEs in the line, so only split() on the first pair.
Display the caption #, the timecode for when it appears, then the caption itself, changing all remaining NEWLINEs to spaces.
When I run it, I get the expected result... here on a video I own:
$ python captions-download.py MY_VIDEO_ID
01) [00:00:06,390 --> 00:00:09,280] iterator cool but that's cool
02) [00:00:09,280 --> 00:00:12,280] your the moment
03) [00:00:13,380 --> 00:00:16,380] and sellers very thrilled
:
A couple of things...
I think you need to be the owner of the video whose captions you're trying to download: I tried my script on your video and got a 403 HTTP Forbidden error. (There are other errors you may get from the API as well.)
In your case, it looks like something is messing up the video ID you're passing in: it thinks you're giving it <code> and </code> (notice the hex 0x3c & 0x3e values in the error)... rich text?
Anyway, this is why I wrote my own, shorter version... so I have a more controlled environment to experiment in.
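If it helps, here's a hedged sketch of how you could surface that kind of failure in my short script (wrapping the process() call from the samples above; HttpError comes from googleapiclient.errors):

from googleapiclient.errors import HttpError

try:
    process(vid)
except HttpError as e:
    # e.g. 403 Forbidden when you don't own the video's caption track
    print('HTTP error %d: %s' % (e.resp.status, e))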
FWIW, since you're new to using Google APIs, I've made a couple of intro videos to get developers on-boarded with using Google APIs in this playlist. The auth code is the toughest part, so focus on videos 3 and 4 in that playlist to help get you acclimated.
I don't really have any videos that cover the YouTube APIs (as I focus more on G Suite APIs), although I do have one Google Apps Script example (video 22 in the playlist); if you're new to Apps Script, review your JavaScript first and then check out video 5. Hope this helps!