How to get all YouTube comments with Python's gdata module? - python

Looking to grab all the comments from a given video, rather than go one page at a time.
from gdata import youtube as yt
from gdata.youtube import service as yts
client = yts.YouTubeService()
client.ClientLogin(username, pwd) #the pwd might need to be application specific fyi
comments = client.GetYouTubeVideoComments(video_id='the_id')
a_comment = comments.entry[0]
The above code will let you grab a single comment, likely the most recent one, but I'm looking for a way to grab all the comments at once. Is this possible with Python's gdata module?
The Youtube API docs for comments, the comment feed docs and the Python API docs

The following achieves what you asked for using the Python YouTube API:
from gdata.youtube import service
USERNAME = 'username@gmail.com'
PASSWORD = 'a_very_long_password'
VIDEO_ID = 'wf_IIbT8HGk'
def comments_generator(client, video_id):
    comment_feed = client.GetYouTubeVideoCommentFeed(video_id=video_id)
    while comment_feed is not None:
        for comment in comment_feed.entry:
            yield comment
        next_link = comment_feed.GetNextLink()
        if next_link is None:
            comment_feed = None
        else:
            comment_feed = client.GetYouTubeVideoCommentFeed(next_link.href)
client = service.YouTubeService()
client.ClientLogin(USERNAME, PASSWORD)
for comment in comments_generator(client, VIDEO_ID):
    author_name = comment.author[0].name.text
    text = comment.content.text
    print("{}: {}".format(author_name, text))
Unfortunately the API limits the number of entries that can be retrieved to 1000. This was the error I got when I tried a tweaked version with a hand crafted GetYouTubeVideoCommentFeed URL parameter:
gdata.service.RequestError: {'status': 400, 'body': 'You cannot request beyond item 1000.', 'reason': 'Bad Request'}
Note that the same principle should apply to retrieve entries in other feeds of the API.
If you want to hand craft the GetYouTubeVideoCommentFeed URL parameter, its format is:
'https://gdata.youtube.com/feeds/api/videos/{video_id}/comments?start-index={start_index}&max-results={max_results}'
The following restrictions apply: start-index <= 1000 and max-results <= 50.
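As a rough sketch of how those limits shape the hand-crafted URLs, the helper below lists every page URL that stays within them. The function name comment_page_urls is made up for illustration, and it assumes start-index is 1-based and that the last requested item must not go past item 1000, which is how the error message above reads.

```python
FEED_URL = ('https://gdata.youtube.com/feeds/api/videos/{video_id}/comments'
            '?start-index={start_index}&max-results={max_results}')

MAX_ITEMS = 1000    # "You cannot request beyond item 1000."
MAX_RESULTS = 50    # largest page size the feed accepts


def comment_page_urls(video_id, page_size=MAX_RESULTS, max_items=MAX_ITEMS):
    """Return the feed URLs that cover items 1..max_items (hypothetical helper)."""
    if not 1 <= page_size <= MAX_RESULTS:
        raise ValueError('page_size must be between 1 and %d' % MAX_RESULTS)
    urls = []
    start = 1  # assumption: start-index counts from 1
    while start <= max_items:
        # Shrink the last page so it never reaches past max_items.
        size = min(page_size, max_items - start + 1)
        urls.append(FEED_URL.format(video_id=video_id,
                                    start_index=start,
                                    max_results=size))
        start += size
    return urls
```

Each URL could then be passed to GetYouTubeVideoCommentFeed in place of the next link, although following GetNextLink() as in the generator above is simpler.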

This is the only solution I've got for now, but it doesn't use the API and gets slow when there are several thousand comments.
import bs4, re, urllib2
#grab the page source for the video
data = urllib2.urlopen(r'http://www.youtube.com/all_comments?v=video_id') #example XhFtHW4YB7M
#pull out comments
soup = bs4.BeautifulSoup(data)
cmnts = soup.findAll(attrs={'class': 'comment yt-tile-default'})
#do something with them, ie count them
print len(cmnts)
Note that because 'class' is a reserved keyword in Python, you can't pass it as a regular keyword argument, which is why the search uses an attrs dict; as a result you can't do the usual 'startswith' searches via regex or lambdas as seen here. It also gets pretty slow due to BeautifulSoup, but it has to be used because etree and minidom don't find matching tags for some reason, even after calling prettify() with bs4.
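If the goal is only a 'startswith' match on the class attribute, one possible workaround is to skip BeautifulSoup and count tags with the standard library's tolerant HTML parser. This is a swapped-in technique, not the approach used above; it is written for Python 3 (html.parser), and the names ClassPrefixCounter and count_by_class_prefix are invented for this sketch.

```python
from html.parser import HTMLParser


class ClassPrefixCounter(HTMLParser):
    """Counts start tags that have at least one class beginning with a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = dict(attrs).get('class') or ''
        if any(name.startswith(self.prefix) for name in classes.split()):
            self.count += 1


def count_by_class_prefix(html, prefix):
    parser = ClassPrefixCounter(prefix)
    parser.feed(html)
    parser.close()
    return parser.count
```

Calling count_by_class_prefix(page_source, 'comment') would then play the role of len(cmnts), assuming the comment markup still uses class names that start with 'comment'.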

Related

Search through JSON query from Valve API in Python

I am looking to find various statistics about players in games such as CS:GO from the Steam Web API, but cannot work out how to search through the JSON returned from the query (e.g. here) in Python.
I just need to be able to get a specific part of the list that is provided, e.g. finding total_kills from the link above. If I had a way to sort through all of the information provided and filter it down to just that specific thing (in this case total_kills), that would help a load!
The code I have at the moment to turn it into something Python can read is:
url = "http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/?key=FE3C600EB76959F47F80C707467108F2&steamid=76561198185148697&include_appinfo=1"
data = requests.get(url).text
data = json.loads(data)
If you are looking for a way to search through the stats list then try this:
import requests
import json
def findstat(data, stat_name):
    for stat in data['playerstats']['stats']:
        if stat['name'] == stat_name:
            return stat['value']
url = "http://api.steampowered.com/ISteamUserStats/GetUserStatsForGame/v0002/?appid=730&key=FE3C600EB76959F47F80C707467108F2&steamid=76561198185148697"
data = requests.get(url).text
data = json.loads(data)
total_kills = findstat(data, 'total_kills') # change 'total_kills' to your desired stat name
print(total_kills)
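If several stats are needed, it may be easier to convert the list into a dictionary once and look values up by name. The sketch below assumes the response has the same playerstats/stats shape that findstat relies on; the sample values are invented for illustration.

```python
def stats_to_dict(data):
    """Map each stat name to its value for direct lookups."""
    return {stat['name']: stat['value']
            for stat in data['playerstats']['stats']}


# Invented sample in the shape the answer above expects.
sample = {
    'playerstats': {
        'stats': [
            {'name': 'total_kills', 'value': 1234},
            {'name': 'total_deaths', 'value': 987},
        ]
    }
}

stats = stats_to_dict(sample)
total_kills = stats.get('total_kills')    # None if the stat is missing
```

Unlike findstat, which scans the list on every call, this pays the cost of one pass and then answers each lookup directly.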

Python Facebook API - cursor pagination

My question is about how to retrieve my entire list of friends using Facebook's Python API. The current result returns an object with a limited number of friends and a link to the 'next' page. How do I use this to fetch the next set of friends? (Please post links to possible duplicates.) Any help would be much appreciated. In general, I need to learn about the pagination involved in using the API.
import facebook
import json
ACCESS_TOKEN = "my_token"
g = facebook.GraphAPI(ACCESS_TOKEN)
print json.dumps(g.get_connections("me","friends"),indent=1)
Sadly, the documentation of pagination has been an open issue for almost 2 years. You should be able to paginate like this (based on this example) using requests:
import facebook
import requests
ACCESS_TOKEN = "my_token"
graph = facebook.GraphAPI(ACCESS_TOKEN)
friends = graph.get_connections("me","friends")
allfriends = []
# Wrap this block in a while loop so we can keep paginating requests until
# finished.
while(True):
    try:
        for friend in friends['data']:
            allfriends.append(friend['name'].encode('utf-8'))
        # Attempt to make a request to the next page of data, if it exists.
        friends=requests.get(friends['paging']['next']).json()
    except KeyError:
        # When there are no more pages (['paging']['next']), break from the
        # loop and end the script.
        break
print allfriends
Update: There's a new generator method available which implements above behavior and can be used to iterate over all friends like this:
for friend in graph.get_all_connections("me", "friends"):
    # Do something with this friend.
    pass
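For readers who want to see the idea without the SDK, here is a minimal sketch of a generator that follows 'next' links. It is not the SDK's implementation; iter_all and fetch_json are hypothetical names, and the fake pages below stand in for real Graph API responses so the loop can be run offline.

```python
def iter_all(first_page, fetch_json):
    """Yield every item in 'data', following paging['next'] until it is absent."""
    page = first_page
    while True:
        for item in page.get('data', []):
            yield item
        next_url = page.get('paging', {}).get('next')
        if not next_url:
            return
        page = fetch_json(next_url)


# Fake responses keyed by URL, used instead of real HTTP requests.
FAKE_PAGES = {
    'page-2': {'data': [{'name': 'Carol'}], 'paging': {'next': 'page-3'}},
    'page-3': {'data': [{'name': 'Dave'}]},
}
first = {'data': [{'name': 'Alice'}, {'name': 'Bob'}],
         'paging': {'next': 'page-2'}}

names = [friend['name'] for friend in iter_all(first, FAKE_PAGES.get)]
```

With real data, fetch_json could be something like lambda url: requests.get(url).json(), matching the loop shown earlier.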
While searching for an answer, I found what I think is a much better approach:
import facebook
access_token = ""
graph = facebook.GraphAPI(access_token = access_token)
totalFriends = []
friends = graph.get_connections("me", "/friends&summary=1")
while 'paging' in friends:
    for i in friends['data']:
        totalFriends.append(i['id'])
    friends = graph.get_connections("me", "/friends&summary=1&after=" + friends['paging']['cursors']['after'])
At the end you will get one response where data is empty and there is no 'paging' key; at that point the loop breaks and all the data has been stored.
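The same cursor logic can be sketched as a small generator that is independent of the SDK. It assumes the response shape described above (a 'data' list, and a 'paging' key with cursors['after'] on every page except the last); fetch_page is a hypothetical callable, and the fake responses exist only so the sketch runs offline.

```python
def iter_cursor_pages(fetch_page):
    """Yield items page by page, passing the 'after' cursor to each request."""
    after = None
    while True:
        page = fetch_page(after)
        for item in page.get('data', []):
            yield item
        paging = page.get('paging')
        if not paging:
            return  # last response: no 'paging' key, so stop
        after = paging['cursors']['after']


# Fake responses keyed by the cursor that requests them.
FAKE_RESPONSES = {
    None: {'data': [{'id': '1'}, {'id': '2'}],
           'paging': {'cursors': {'after': 'c1'}}},
    'c1': {'data': [{'id': '3'}],
           'paging': {'cursors': {'after': 'c2'}}},
    'c2': {'data': []},
}

friend_ids = [f['id'] for f in iter_cursor_pages(FAKE_RESPONSES.__getitem__)]
```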
I couldn't find this anywhere. These answers seem overly complicated, and I wouldn't use an SDK at all if I had to do things like that when paging with a plain POST is so easy to start with. However:
from facebook_business.api import FacebookAdsApi
from facebook_business.adobjects.adaccount import AdAccount
FacebookAdsApi.init(my_app_id, my_app_secret, my_access_token)
my_account = AdAccount('act_23423423423423423')
# In the below, I added the limit to the max rows, 250.
# More importantly, paging: the SDK has a really sneaky way of doing this.
# Enclose the request in list(); the results end up the same, but this makes
# the script request new objects until there are no more.
# I tested this example and compared to Graph API and as of right now, 1/22 9:47AM, I get 81 from Graph and 81 here.
fields = ['name']
params = {'limit': 250}
ads = list(my_account.get_ads(
    fields=fields,
    params=params,
))
Trick from the docs: "NOTE: We wrap the return value of get_ad_accounts with list() because get_ad_accounts returns an EdgeIterator object (located in facebook_business.adobjects) and we want to get the full list right away instead of having the iterator lazily loading accounts."
https://github.com/facebook/facebook-python-business-sdk
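To illustrate why wrapping the call in list() matters, here is a toy stand-in for a lazily loading iterator. LazyPages is a hypothetical class written for this sketch and is not the SDK's EdgeIterator; it only counts how many simulated page requests have been made.

```python
class LazyPages:
    """Toy lazy iterator: a page is 'requested' only when iteration reaches it."""

    def __init__(self, pages):
        self._pages = pages
        self.requests_made = 0

    def __iter__(self):
        for page in self._pages:
            self.requests_made += 1  # one simulated request per page
            for item in page:
                yield item


lazy = LazyPages([['ad-1', 'ad-2'], ['ad-3']])
first_ad = next(iter(lazy))            # touches only the first page
requests_after_first = lazy.requests_made

eager = LazyPages([['ad-1', 'ad-2'], ['ad-3']])
all_ads = list(eager)                  # list() drains every page up front
```

The trade-off is the usual one: list() gives the full result immediately at the cost of making every request now, while plain iteration spreads the requests out.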
In this example you offset (paginate) by one at a time. I think my while loop is simple, since it only checks whether the pagination key "next" is None; if it doesn't exist, we have finished looping and the results are in a list.
In this example I am just looking for all the people called Jacob.
import requests
import facebook
token = access_token="your token goes here"
fb = facebook.GraphAPI(access_token=token)
limit = 1
offset = 0
data = {"q": "jacob",
        "type": "user",
        "fields": "id",
        "limit": limit,
        "offset": offset}
req = fb.request('/search', args=data, method='GET')
users = []
for item in req['data']:
    users.append(item["id"])
pag = req['paging']
while pag.get("next") is not None:
    offset += limit
    data["offset"] = offset
    req = fb.request('/search', args=data, method='GET')
    for item in req['data']:
        users.append(item["id"])
    pag = req.get('paging')
print users

google custom search api return is different from google.com

I am using the Google API via Python and it works, but the results I get from the API are totally different from google.com. The top results given by Custom Search are Google Calendar, Google Earth and patents. I wonder if there is a way to get the same results from the Custom Search API. Thank you.
def googleAPICall(self,userInput):
    try:
        userInput = urllib.quote(userInput)
        for i in range(0,1):
            index = i*10+1
            url = ('https://www.googleapis.com/customsearch/v1?'
                   'key=%s'
                   '&cx=%s'
                   '&alt=json'
                   '&num=10'
                   '&start=%d'
                   '&q=%s')%(self.KEY,self.CX,index,userInput)
            print (url)
            request = urllib2.Request(url)
            response = urllib2.urlopen(request)
            returnResults = simplejson.load(response)
            webs = returnResults['items']
            for web in webs:
                self.result.append(web["link"])
    except:
        print ("search error")
        self.result.append("http://en.wikipedia.org/wiki/Climate_change")
    return self.result
There is a 'search outside of google' checkbox in the dashboard. You will get the same results after you check it. It took me a while to find this out. The default setting only returns search results from within Google's own websites.
After some searches, the answer is "It is impossible to have the same result as google.com".
Google clearly stated it:
https://support.google.com/customsearch/answer/141877?hl=en
Hope that this is the definite answer.
Just to add to galaxyan's answer: you can still do that by changing "Sites to search" from "Search only included sites" to "Search the entire web".
I think you need to experiment with four parameters cr, gl, hl, lr
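A small sketch of how those four parameters could be added to the request URL from the question is shown below. The function name build_search_url is made up, it uses Python 3's urllib.parse rather than the urllib/urllib2 calls in the question, and the example values ('us', 'lang_en') are only plausible guesses to experiment with, not recommended settings.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.googleapis.com/customsearch/v1'
LOCALE_PARAMS = ('cr', 'gl', 'hl', 'lr')


def build_search_url(key, cx, query, start=1, num=10, **locale):
    """Build a Custom Search request URL with optional cr/gl/hl/lr values."""
    unknown = set(locale) - set(LOCALE_PARAMS)
    if unknown:
        raise ValueError('unsupported parameters: %s' % sorted(unknown))
    params = {'key': key, 'cx': cx, 'q': query, 'start': start, 'num': num}
    params.update(locale)
    # urlencode also takes care of quoting the query string.
    return BASE_URL + '?' + urlencode(params)


url = build_search_url('MY_KEY', 'MY_CX', 'climate change',
                       gl='us', lr='lang_en')
```

Changing one parameter at a time and comparing the returned links makes it easier to see which setting moves the results closer to google.com.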

Dictionary / JSON issue using Python 2.7

I'm looking at scraping some data from Facebook using Python 2.7. My code basically increments the Facebook profile ID by 1 on each pass and captures the details returned by the page.
An example of the page I'm looking to capture the data from is graph.facebook.com/4.
Here's my code below:
import scraperwiki
import urlparse
import simplejson
source_url = "http://graph.facebook.com/"
profile_id = 1
while True:
    try:
        profile_id +=1
        profile_url = urlparse.urljoin(source_url, str(profile_id))
        results_json = simplejson.loads(scraperwiki.scrape(profile_url))
        for result in results_json['results']:
            print result
            data = {}
            data['id'] = result['id']
            data['name'] = result['name']
            data['first_name'] = result['first_name']
            data['last_name'] = result['last_name']
            data['link'] = result['link']
            data['username'] = result['username']
            data['gender'] = result['gender']
            data['locale'] = result['locale']
            print data['id'], data['name']
            scraperwiki.sqlite.save(unique_keys=['id'], data=data)
            #time.sleep(3)
    except:
        continue
    profile_id +=1
I am using the scraperwiki site to carry out this check, but no data is printed back to the console despite the line print data['id'], data['name'], which is there just to check that the code is working.
Any suggestions on what is wrong with this code? As said, for each returned profile, the unique data should be captured and printed to screen as well as populated into the sqlite database.
Thanks
Any suggestions on what is wrong with this code?
Yes. You are swallowing all of your errors. There could be a huge number of things going wrong in the block under try. If anything goes wrong in that block, you move on without printing anything.
You should only ever use a try / except block when you are looking to handle a specific error.
Modify your code so that it looks like this:
while True:
    profile_id +=1
    profile_url = urlparse.urljoin(source_url, str(profile_id))
    results_json = simplejson.loads(scraperwiki.scrape(profile_url))
    for result in results_json['results']:
        print result
        data = {}
        # ... more ...
and then you will get detailed error messages when specific things go wrong.
As for your concern in the comments:
The reason I have the error handling is because, if you look for
example at graph.facebook.com/3, this page contains no user data and
so I don't want to collate this info and skip to the next user, ie. no
4 etc
If you want to handle the case where there is no data, then find a way to handle that case specifically. It is bad practice to swallow all errors.
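One way to handle the "no user data" case specifically, sketched under assumptions: the helper below treats a response as empty when it is not a JSON object or lacks the id and name fields, and lets every other problem (such as malformed JSON) raise as usual. The name parse_profile and the choice of required fields are illustrative, not something the Graph API prescribes.

```python
import json

REQUIRED_FIELDS = ('id', 'name')
WANTED_FIELDS = ('id', 'name', 'first_name', 'last_name',
                 'link', 'username', 'gender', 'locale')


def parse_profile(raw_json):
    """Return the wanted profile fields, or None when the page has no user data."""
    payload = json.loads(raw_json)  # malformed JSON still raises ValueError
    if not isinstance(payload, dict):
        return None
    if any(field not in payload for field in REQUIRED_FIELDS):
        return None
    # Optional fields that are absent come back as None instead of raising.
    return {field: payload.get(field) for field in WANTED_FIELDS}
```

In the scraping loop, a None result would mean "skip this ID and move on", while any exception would still surface with a full traceback.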

Invalid request URI while adding a video to playlist via youtube api

I have been unable to overcome this error while trying to add a video to my playlist using the youtube gdata python api.
gdata.service.RequestError: {'status': 400, 'body': 'Invalid request URI', 'reason': 'Bad Request'}
This seems to be the same error, but there are no solutions as yet. Any help guys?
import getpass
import gdata.youtube
import gdata.youtube.service
yt_service = gdata.youtube.service.YouTubeService()
# The YouTube API does not currently support HTTPS/SSL access.
yt_service.ssl = False
yt_service = gdata.youtube.service.YouTubeService()
yt_service.email = #myemail
yt_service.password = getpass.getpass()
yt_service.developer_key = #mykey
yt_service.source = #text
yt_service.client_id= #text
yt_service.ProgrammaticLogin()
feed = yt_service.GetYouTubePlaylistFeed(username='default')
# iterate through the feed as you would with any other
for entry in feed.entry:
    if (entry.title.text == "test"):
        lst = entry;
        print entry.title.text, entry.id.text
custom_video_title = 'my test video on my test playlist'
custom_video_description = 'this is a test video on my test playlist'
video_id = 'Ncakifd_16k'
playlist_uri = lst.id.text
playlist_video_entry = yt_service.AddPlaylistVideoEntryToPlaylist(playlist_uri, video_id, custom_video_title, custom_video_description)
if isinstance(playlist_video_entry, gdata.youtube.YouTubePlaylistVideoEntry):
    print 'Video added'
The confounding thing is that updating the playlist works, but adding a video does not.
playlist_entry_id = lst.id.text.split('/')[-1]
original_playlist_description = lst.description.text
updated_playlist = yt_service.UpdatePlaylist(playlist_entry_id,'test',original_playlist_description,playlist_private=False)
The video_id is not wrong because it's the video from the sample code. What am I missing here? Somebody help!
Thanks.
Gdata seems to use v1 API. So, the relevant documentation is here: http://code.google.com/apis/youtube/1.0/developers_guide_protocol.html#Retrieving_a_playlist
This means, your "playlist_uri" should not take the value of "lst.id.text", but should take the "feedLink" element's "href" attribute in order to be used with "AddPlaylistVideoEntryToPlaylist"
Even if you happen to use the v2 API, you should take the URI from the "content" element's "src" attribute, as explained in the documentation that you get by substituting 2.0 in the above URL. (SO doesn't allow me to post two hyperlinks because I don't have enough reputation.)
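To make the difference between the entry's id and its feedLink concrete, here is a sketch that pulls the href out of a playlist entry's raw XML with the standard library. It does not use the gdata client classes; the sample entry and its URLs are invented placeholders, and playlist_uri_from_entry is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

GD_NS = 'http://schemas.google.com/g/2005'

# Invented sample entry; a real one would come from the playlist feed.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:gd="http://schemas.google.com/g/2005">
  <id>http://gdata.youtube.com/feeds/api/users/default/playlists/PLAYLIST_ID</id>
  <title>test</title>
  <gd:feedLink href="http://gdata.youtube.com/feeds/api/playlists/PLAYLIST_ID"/>
</entry>"""


def playlist_uri_from_entry(entry_xml):
    """Return the href of the entry's gd:feedLink element."""
    entry = ET.fromstring(entry_xml)
    feed_link = entry.find('{%s}feedLink' % GD_NS)
    if feed_link is None:
        raise ValueError('entry has no gd:feedLink element')
    return feed_link.get('href')


playlist_uri = playlist_uri_from_entry(SAMPLE_ENTRY)
```

The value returned here, rather than the text of the id element, is what the answer above suggests passing as playlist_uri.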
