Can't get all comments from YouTube Data API v3 [Python]

I have a Python function that fetches all comments from a YouTube video. For this I use the YouTube Data API v3 commentThreads.list method.
import requests

key = 'My Key'
textFormat = 'plainText'
part = 'snippet'
maxResult = '100'
order = 'relevance'
nextToken = ''
videoId = 'Ix9NXVIbm2A'
commentids = []
authornames = []
textOriginals = []
while True:
    response = requests.get("https://www.googleapis.com/youtube/v3/commentThreads?key=" + key + "&part=" + part + "&videoId=" + videoId + "&maxResults=" + maxResult + "&order=" + order + "&pageToken=" + nextToken)
    data = response.json()  # kind - etag - ?nextPageToken
    if 'error' in data:
        print(data)
        break
    for item in data['items']:
        snippet = item["snippet"]
        toplevelcomment = snippet['topLevelComment']
        content = toplevelcomment['snippet']
        commentid = toplevelcomment['id']
        authorname = content['authorDisplayName']
        textOriginal = content['textOriginal']
        # append to the result lists
        commentids.append(commentid)
        authornames.append(authorname)
        textOriginals.append(textOriginal)
    if 'nextPageToken' in data:
        nextToken = data['nextPageToken']
    else:
        break
Everything works fine from one pageToken to the next, but when it reaches pageToken number 13, the API always returns:
{
  'error': {
    'errors': [
      {
        'domain': 'youtube.commentThread',
        'reason': 'processingFailure',
        'message': "The API server failed to successfully process the request. While this can be a transient error, it usually indicates that the request's input is invalid. Check the structure of the commentThread resource in the request body to ensure that it is valid.",
        'locationType': 'other',
        'location': 'body'
      }
    ],
    'code': 400,
    'message': "The API server failed to successfully process the request. While this can be a transient error, it usually indicates that the request's input is invalid. Check the structure of the commentThread resource in the request body to ensure that it is valid."
  }
}
I'm using a valid key, and the pageToken is valid too (it was returned by the API).
Does anyone have the same problem, or am I doing something wrong?

This error comes up because your API quota is exhausted. YouTube changes the API limits from time to time. Network problems also occur sometimes, so you should write code that retries a request when it fails.
You can read the full documentation here: https://developers.google.com/youtube/v3/getting-started#quota
As of 11 January 2019, YouTube decreased the daily quota from 1M to 10K units; currently v3 allows only 10,000 units per day.
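Since the error message itself says the failure "can be a transient error", it is also worth retrying a failed page before giving up. A minimal sketch of such a retry helper (the name, attempt count, and backoff delays are arbitrary choices, not values from the API docs):

import time
import requests

def get_with_retries(url, max_attempts=3):
    # Retry the request a few times with exponential backoff,
    # since 'processingFailure' can be a transient error.
    data = None
    for attempt in range(max_attempts):
        data = requests.get(url).json()
        if 'error' not in data:
            return data
        time.sleep(2 ** attempt)  # wait 1s, then 2s, then 4s
    return data  # still an error after all attempts

The loop in the question could then call get_with_retries(...) instead of requests.get(...) and only break if the error persists.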

Related

I am facing an error while scraping YouTube information using the Google API

I have a list of YouTube video IDs, and I want to get details about each video such as title, view count, like count, etc. My code was working well until yesterday; suddenly it is throwing an error.
For the sample below I have used 3 IDs, but in reality I have 100+ IDs.
video_ids = ['GGHiHodljug', 'PY8cEyi2MzM', 'QTCcJipjgxI']
Then I defined a function to retrieve the video information and append it to various lists.
def get_video_details(youtube, video_ids):
    all_video_stats = []
    for i in video_ids:
        request = youtube.videos().list(part='snippet,statistics', id=video_ids)
        response = request.execute()
        for video in response['items']:
            video_stats = dict(Title=video['snippet']['title'],
                               Published_date=video['snippet']['publishedAt'],
                               Views=video['statistics']['viewCount'],
                               Likes=video['statistics']['likeCount'],
                               # Dislikes=video['statistics']['dislikeCount'],
                               Comments=video['statistics']['commentCount']
                               # Shares=video['statistics']['']
                               )
            all_video_stats.append(video_stats)
    return all_video_stats
Now, when I call the function, it throws an error which I never faced until yesterday.
video_details = get_video_details(youtube, video_ids)
The error that was received is shown below.
HttpError: <HttpError 400 when requesting
https://youtube.googleapis.com/youtube/v3/videos?part=snippet%2Cstatistics&id=kwlJUFUJeBU&id=9qjgV9UynUU&id=9BRNhfz9TZU&id=Cm614VNaapI&id=Gk0O1hBZL1g&id=QuIfcSDxEZY&id=lO48EoPn7-A&id=n1S886NBQYA&id=y4DGkrs7KcM&id=vk3ahhtY5qE&id=Zr3cAIU6bZM&id=fLC_LfstdAA&id=s4aneR0tc1s&id=A0FJpd-lieI&id=7fFFfU6Cgmk&id=4VJSE59j1pM&id=ns51Rp7o_Bw&id=LQrMIBpGrbA&id=XOeTeM-1qKc&id=f6Ms318wTj0&id=etfoGZquiSA&id=a1LUlovdL_A&id=nTTBa4-0Z7I&id=6KOYprSy1KA&id=wjPmweQ4peQ&id=Cw0xY42b_Mo&id=-n_mVFPzeuY&id=R8ZNqKerC1o&id=JnZhAPBfPYw&id=8o2yLPDr_d8&id=DOoE4hPiJuw&id=M4HRJXRZZAM&id=cNedzwOOqag&id=Qqj96rMBxTE&id=_0J7VcMtKkQ&id=kZ8ObOkzq4w&id=6P_wnhUb02g&id=kH5UPfi3cFQ&id=qtDl9Yu8or0&id=Xzn-URGHItA&id=xutJ2uQ4tT0&id=bZ71zj8BunU&id=gHtjvLejc6E&id=92pTgb_7QI4&id=KKZkES9gfuE&id=Qj01vTif2yI&id=Qkc1A8cJEmw&id=whBSkar_rCs&id=wOPEC95vVUs&id=pi3Tm9lzvLU&id=Ip-PimAxgjw&id=r1W13UnMwLU&id=z68p5nBZ03k&id=6wMuvqUkgqY&id=b7AYFCw8cQA&id=nMrEiom0S74&id=-bGqNqT8Ckg&id=CSXPZuNVqGw&id=wP09CGKyDdg&id=eY_fPjwZ3hU&id=nT9mgg7BtH4&id=D2SqeQPr38I&id=h-0ELP6rNhQ&id=33OOngFBcnk&id=ojbMrcTlQ_s&id=Kt6xncrWZUQ&id=xUCWiEhCIgI&id=jDEC_sV67B4&id=Dxw9xEpkeYk&id=tFiAsk_sOC0&id=eLLXCtd187I&id=3y499rX5A8Q&id=keTSJlzVu2I&id=RtnXrhDxpgo&id=VmbouDaMoM4&id=NWBNOA_fwcA&id=7mTnG30Y8c8&id=gOM5A0w8V1s&id=rLEDXBhuspk&id=XQ-NNWBySIo&id=_klVNQEzauU&id=ukNGhEZLEhw&id=eI8quor4HmM&id=DLbfMGozcyM&id=GZBAUuUbCFs&id=qwquUC4Wk0s&id=aEsEnJMiRqU&id=EKPQX6LC4Uw&id=HmAn0BgKGhA&id=0jlFNP8nDzs&id=n3iSKVMfbFw&id=Ypaah0zPWIo&id=oyOHOcyp2T8&id=aLmJ5zXsVtk&id=dEsEaJGYAZM&id=KGe7nId4GGs&id=BoRgRqqeiw8&id=GGHiHodljug&id=PY8cEyi2MzM&id=QTCcJipjgxI&key=AIzaSyDpPBFs9LZ-33rVUeSKyHCaz5E0UmWmZXk&alt=json
returned "The request specifies an invalid filter parameter.".
Details: "[{'message': 'The request specifies an invalid filter
parameter.', 'domain': 'youtube.parameter', 'reason':
'invalidFilters', 'location': 'parameters.', 'locationType':
'other'}]">
To pass more than one video ID, the elements must be separated by commas.
Instead of:
&id=f6Ms318wTj0&id=etfoGZquiSA&id=a1LUlovdL_A
use:
&id=f6Ms318wTj0,etfoGZquiSA,a1LUlovdL_A
The API endpoint does not accept more than 50 IDs, so if we have more than 50 IDs, the information about each video has to be obtained by looping over chunks, as sketched below.
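A minimal sketch of that loop, assuming the same youtube client object as in the question (the chunk size of 50 matches the limit mentioned above; the helper name is hypothetical):

def get_video_details_chunked(youtube, video_ids):
    # videos.list accepts at most 50 IDs per call, so walk the
    # list in chunks of 50 and join each chunk with commas.
    all_video_stats = []
    for start in range(0, len(video_ids), 50):
        chunk = video_ids[start:start + 50]
        request = youtube.videos().list(part='snippet,statistics',
                                        id=','.join(chunk))
        response = request.execute()
        for video in response['items']:
            all_video_stats.append(dict(
                Title=video['snippet']['title'],
                Published_date=video['snippet']['publishedAt'],
                Views=video['statistics']['viewCount'],
                Likes=video['statistics']['likeCount'],
                Comments=video['statistics']['commentCount'],
            ))
    return all_video_stats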

Meraki API call (organization uplink statuses): find failed connections and translate networkId into a network name

I've been looking for a few weeks, and nowhere have I found anything that could help me with this specific problem.
I have a large output from an API call (Meraki), and I'm looking to extract certain features out of the list.
Task: read the output from the API call, loop through it until a 'failed' status is detected, print the interface and networkId of that item, translate the networkId into a network name from a predefined list, and continue printing all failed interfaces until the end of the output.
The API call covers the entire organisation, and I want to match the list of networkIds with network names (since they aren't returned by the same API call), so it's readable which network has a failed interface.
The output contains a lot of data, and I don't need all of the values, like IP, gateway, DNS, etc.
An example of the output from the API call:
{'networkId': 'A_1234567890', 'serial': 'A1B2-C3D4-E5F6', 'model': 'MX64', 'lastReportedAt': '2021-01-01T10:00:00Z', 'uplinks': [{'interface': 'wan1', 'status': 'active', 'ip': '192.168.1.2', 'gateway': '192.168.1.1', 'publicIp': '192.168.1.3', 'primaryDns': '8.8.8.8', 'secondaryDns': '8.8.4.4', 'ipAssignedBy': 'static'}, {'interface': 'wan2', 'status': 'ready', 'ip': '172.16.1.2', 'gateway': '172.16.1.1', 'publicIp': '172.16.1.3', 'primaryDns': '8.8.8.8', 'secondaryDns': '8.8.4.4', 'ipAssignedBy': 'static'}]}
This is one network; there are 50 in this organisation whose status I want to check.
I'm pretty new to Python. I've tried using while loops to sift through the output to find the failed status, but I can't output the whole network's information connected to it; most examples I've looked at use small predefined lists of separate words or numbers.
The API call I'm using (I found the template and modified it where necessary to get a total list of all networks in my organisation):
import requests
from pprint import pprint

url = "https://api.meraki.com/api/v1/organizations/{ORG_ID}/uplinks/statuses"
payload = None
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "X-Cisco-Meraki-API-Key": "API_KEY"
}
response = requests.request('GET', url, headers=headers, data=payload)
pprint(response.json())
The answer given in another post, by @Szabolcs:
net_names = {"A_1234567890": "Name"}
for network_data in json_data:
network_id = network_data.get("networkId")
for uplink_data in network_data.get("uplinks", []):
if uplink_data["status"] == "failed":
print(
"network ID:",
network_id, ""
"Network:",
net_names.get(network_id, "n/a"),
"- Interface:",
uplink_data["interface"],
"- failed",)
It does everything I want.
Based on your sample output, it looks like the network ID appears only once per response, while the interfaces appear multiple times under the uplinks attribute.
Hence, you can parse the API response as a JSON object, keep the networkId-to-network-name mapping in a dictionary, and do something like the following to get the failed statuses:
net_names = {'A_1234567890': 'abc', 'b': 'xyz'}

network_id = response_json.get('networkId')
for item in response_json['uplinks']:
    if item['status'] == "failed":
        print('network ID:', network_id, 'network_name:', net_names.get(network_id), 'Interface:', item['interface'])
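Since the question notes that network names aren't returned by the uplink statuses call, the net_names dictionary doesn't have to be hardcoded: it can be built from a second call to the Meraki Dashboard API v1 networks endpoint. A rough sketch, assuming the same ORG_ID placeholder and headers dictionary as in the question's snippet:

import requests

# Fetch every network in the organisation to map networkId -> network name.
networks_url = "https://api.meraki.com/api/v1/organizations/{ORG_ID}/networks"
networks = requests.get(networks_url, headers=headers).json()
net_names = {net['id']: net['name'] for net in networks}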

KeyError: 'name' in Spotify Web API

I'm using Spotify's Web API to get song information for a Discord bot I'm making. I'm hosting the bot on Heroku and using the tracks endpoint to get the track name and artist name from a song's ID. When the bot is on Heroku, it throws the following error:
https://pastebin.com/smqqqDfY
However, when I host the same code on my laptop, it gives no such error. I even ran the Spotify code separately to check whether the JSON response has a key named 'name', and it works!
The code is:
# pulls the name and artist name from the API and link
def spotifypull(uri):
    r = requests.get(spotify_base.format(id=uri), headers=headers)
    r = r.json()
    return (r['name'] + " " + r['artists'][0]['name'])

# checks if the link is a spotify link (this is from the "request" function)
if query.find("spotify") != -1:
    uri = query[31:53]
    name = spotifypull(uri)
This same code gives the proper output when run separately on my local machine:
import requests

query = "https://open.spotify.com/track/6WkrFOo6SGAjhGMrjIwAD4?si=VDwYLniGQLGmqzUK3RdBow"
uri = query[31:53]
SPOTIFY_ID = "<id>"
SPOTIFY_SECRET = "<secret>"
AUTH_URL = 'https://accounts.spotify.com/api/token'
ytbase = "https://www.youtube.com/watch?v="

auth_response = requests.post(AUTH_URL, {
    'grant_type': 'client_credentials',
    'client_id': SPOTIFY_ID,
    'client_secret': SPOTIFY_SECRET,
})
auth_response_data = auth_response.json()
access_token = auth_response_data['access_token']

headers = {
    'Authorization': 'Bearer {token}'.format(token=access_token)
}

spotify_base = 'https://api.spotify.com/v1/tracks/{id}'
r = requests.get(spotify_base.format(id=uri), headers=headers)
r = r.json()
name = r['name'] + " " + r['artists'][0]['name']
print(name)
Output of the above:
Wasn't Enough CrySpy
Any help would be massively appreciated! The full code is here if needed.
Edit: when run locally,
r.text is https://pastebin.com/sjrW3exW
r.status_code is 200
Okay, I found the issue: the Spotify token was expiring after an hour. Thanks to @ygrorg for the idea.
Edit: For more clarity: Spotify access tokens are valid for only an hour. I solved it by requesting a new token every time a song is played. Alternatively, you could put the token regeneration commands in a loop with a delay of 30-45 minutes at the end, so you always have a fresh token.
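A minimal sketch of the first approach, assuming the SPOTIFY_ID, SPOTIFY_SECRET, AUTH_URL, and spotify_base values from the question (get_spotify_headers is a hypothetical helper name):

import requests

def get_spotify_headers():
    # Client-credentials tokens expire after an hour, so request a
    # fresh one right before each lookup instead of reusing a stale one.
    auth_response = requests.post(AUTH_URL, {
        'grant_type': 'client_credentials',
        'client_id': SPOTIFY_ID,
        'client_secret': SPOTIFY_SECRET,
    })
    token = auth_response.json()['access_token']
    return {'Authorization': 'Bearer {}'.format(token)}

def spotifypull(uri):
    r = requests.get(spotify_base.format(id=uri), headers=get_spotify_headers())
    r = r.json()
    return r['name'] + " " + r['artists'][0]['name']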

Retrieve all emails from Gmail: I only got about 3000 emails, not all of them

What is the way to pull all emails from Gmail?
I did a full sync, but that didn't return all of my email - only about 3000 messages, while I know I have more. The documentation does not mention this.
My code snippet:
history = service.users().history().list(
userId='me',
startHistoryId=start_history_id,
maxResults=500,
labelId='INBOX'
).execute()
if "history" in history:
try:
for message in history["history"]:
batch.add(
service.users().messages().get(userId='me', id=message["messages"][0]["id"]),
callback="somecallbak",
request_id=request_id
)
batch.execute()
while 'nextPageToken' in history:
If you are doing a full sync, you should refer to this documentation, which recommends two steps:
listing all the messages with the users.messages.list method
getting the required information for each entry with the users.messages.get method
So you don't need to use users.history.list, as you would have a hard time finding the startHistoryId from which to start.
You can achieve this with a snippet similar to the one below (tested and working on my Python 3.x console). As suggested by others, I used the Python client's pagination and batch-request functionality.
from httplib2 import Http
from googleapiclient.discovery import build
from oauth2client import client, tools, file

# callback for the batch request (see below)
def print_gmail_message(request_id, response, exception):
    if exception is not None:
        print('messages.get failed for message id {}: {}'.format(request_id, exception))
    else:
        print(response)

# Scopes
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly', ]

# where do we store our credentials?
creds_store = file.Storage('gmail-list.json')
start_creds = creds_store.get()

# standard oauth2 authentication flow
if not start_creds or start_creds.invalid:
    # client_id.json is exported from your gcp project
    start_flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
    start_creds = tools.run_flow(start_flow, creds_store)

# Gmail SDK
http = Http()
gmail_sdk = build('gmail', 'v1', http=start_creds.authorize(http))

# messages.list parameters
msg_list_params = {
    'userId': 'me'
}

# messages.list API
message_list_api = gmail_sdk.users().messages()

# first request
message_list_req = message_list_api.list(**msg_list_params)

while message_list_req is not None:
    gmail_msg_list = message_list_req.execute()

    # we build the batch request
    batch = gmail_sdk.new_batch_http_request(callback=print_gmail_message)
    for gmail_message in gmail_msg_list['messages']:
        msg_get_params = {
            'userId': 'me',
            'id': gmail_message['id'],
            'format': 'full',
        }
        batch.add(gmail_sdk.users().messages().get(**msg_get_params), request_id=gmail_message['id'])
    batch.execute(http=http)

    # pagination handling
    message_list_req = message_list_api.list_next(message_list_req, gmail_msg_list)
As suggested in this link, you may use batch requests.
Use batching and request 100 messages at a time. You will need to make 1000 requests, but the good news is that's quite fine, and it'll be easier for everyone (no downloading a 1 GB response in a single request!).
Also, based on this thread, you could save the nextPageToken from every response and use it in your next request, as sketched below. If there is no nextPageToken in the response, you know that you have retrieved all messages.
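A minimal sketch of that manual token handling, assuming the same service object as in the question:

page_token = None
all_message_ids = []
while True:
    # Request the next page of message IDs; pageToken is omitted on the first call.
    kwargs = {'userId': 'me', 'maxResults': 500}
    if page_token:
        kwargs['pageToken'] = page_token
    result = service.users().messages().list(**kwargs).execute()
    all_message_ids.extend(m['id'] for m in result.get('messages', []))
    page_token = result.get('nextPageToken')
    if not page_token:
        break  # no nextPageToken: all messages retrieved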

Python: how to check whether the Google Street View API returns no image or the API key is expired?

I want to use the Google Street View API to download some images in Python.
Sometimes there is no image returned for an area, even though the API key is fine.
Other times the API key is expired or invalid, and no image is returned either, with an error like:
The Google Maps API server rejected your request. The provided API key is expired.
How can I distinguish these two situations in Python code?
Thank you very much.
One way to do this would be to query the Street View Static API's metadata endpoint with the requests library and parse the JSON response (note that the image endpoint itself returns binary image data, not JSON, so the metadata endpoint is the one to parse):

import requests

url = 'https://maps.googleapis.com/maps/api/streetview/metadata?location=46.414382,10.013988&key=YOUR_API_KEY'
r = requests.get(url)
results = r.json()
error_message = results.get('error_message')
Now error_message will be the text of the error message (e.g., 'The provided API key is expired.'), or will be None if there is no error message. So later in your code you can check if there is an error message, and do things based on the content of the message:
if error_message and 'The provided API key is invalid.' in error_message:
    do_something()
elif ...:
    do_something_else()
You could also check the 'status' key if you just want to see whether the request was successful or not, as in the sketch below:
status = results.get('status')
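Putting the two checks together, the metadata 'status' field distinguishes exactly the two situations the question asks about: 'ZERO_RESULTS' or 'NOT_FOUND' means no image exists at that location, while key problems come back as 'REQUEST_DENIED' along with an error_message. A sketch (check_street_view is a hypothetical helper name):

import requests

METADATA_URL = 'https://maps.googleapis.com/maps/api/streetview/metadata'

def check_street_view(location, key):
    # Query the metadata endpoint before downloading the actual image.
    results = requests.get(METADATA_URL, params={'location': location, 'key': key}).json()
    status = results.get('status')
    if status == 'OK':
        return 'image available'
    if status in ('ZERO_RESULTS', 'NOT_FOUND'):
        return 'no image at this location'
    # e.g. REQUEST_DENIED for an expired or invalid key
    return 'request failed: {}'.format(results.get('error_message', status))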
