How to get past the 1000-result limit of the YouTube Data API? - python

I need the IDs of all videos on a specific subject published between dates X and Y. I used the YouTube Data API v3 and everything was fine until it started returning empty items after 1000 results. I searched this issue and found out it's an API limitation! Does anyone have a solution?
If it helps, I code in Python. I tested in PHP too and got the same result, so it doesn't depend on the language.
thanks

Just split your date range into multiple requests, so that each request returns fewer results than the limit.
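For example, a minimal sketch with requests against the v3 search.list endpoint (the API key, subject string, and weekly date boundaries below are placeholders; publishedAfter and publishedBefore take RFC 3339 timestamps):

import requests

SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"
API_KEY = "<your_api_key>"  # placeholder

def video_ids(query, published_after, published_before):
    """Collect the video IDs for one sub-range of dates."""
    ids, page_token = [], None
    while True:
        params = {
            "part": "id",
            "q": query,
            "type": "video",
            "publishedAfter": published_after,
            "publishedBefore": published_before,
            "maxResults": 50,  # the per-page maximum for search.list
            "key": API_KEY,
        }
        if page_token:
            params["pageToken"] = page_token
        body = requests.get(SEARCH_URL, params=params).json()
        ids += [item["id"]["videoId"] for item in body.get("items", [])]
        page_token = body.get("nextPageToken")
        if not page_token:
            return ids

# Split X..Y into sub-ranges small enough that each stays under the cap,
# then merge. These weekly boundaries are just an illustration.
all_ids = []
for start, end in [("2015-01-01T00:00:00Z", "2015-01-08T00:00:00Z"),
                   ("2015-01-08T00:00:00Z", "2015-01-15T00:00:00Z")]:
    all_ids += video_ids("your subject", start, end)

Each sub-range pages through its own nextPageToken chain, so as long as no single sub-range exceeds the cap, the merged list covers the whole period.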

Related

Trouble understanding tweepy pagination

I want to use search terms in the Twitter API between 2 dates, which may return 1000s of results. Is this even possible in the free version? I'm using tweepy and unsure if I should be using pages or not.

When I set 'count' to any number I always get the same number of results back. I have set count=100 and got over 900 results in my latest test. How can I count the number of results returned? I have done lots of googling but can't find the answer to this simple question. It's basic stuff, I know, but I find the Twitter documentation as clear as mud and the tweepy documentation not simple enough for a newbie.

I want to store the results in an SQLite database for analysis. I'm happy with the database side but don't understand pagination in tweepy. The documentation uses api.friends; I want to use api.search, and I want to retrieve the maximum number of results possible.
Any help hugely appreciated!
Basically the bit I'm not understanding is how to return the maximum number of tweets using code like:
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang='en',
                   since=date_since,
                   count=200).items()
db.add_tweets(tweets)
This returns more than 200 tweets. Is it returning all the tweets, or will I hit a 3200 limit? Is it saying there are 200 tweets per page? Some of the examples in the tweepy docs use pages(). I don't understand pagination! How do I find the number of tweets in each page if there aren't 200? How do I know how many tweets have been returned from the query without doing a count on the database? How would I iterate over every page if I use pages()? I apologise for my lack of understanding, but the documentation is definitely not for newbies! Please can someone help? Thanks
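For reference, a rough sketch of the two approaches with tweepy's v3-era Cursor, assuming api, search_words and date_since are already defined as in the question:

import tweepy

# items(n) flattens the pages for you and stops after n tweets; with no
# argument it keeps requesting pages until the API stops returning results.
tweets = list(tweepy.Cursor(api.search,
                            q=search_words,
                            lang='en',
                            since=date_since,
                            count=100).items(1000))
print(len(tweets))  # how many tweets actually came back

# pages() hands you one API response at a time instead; count is the page
# size requested per call, and the last page may be shorter.
for page in tweepy.Cursor(api.search,
                          q=search_words,
                          lang='en',
                          since=date_since,
                          count=100).pages():
    print(len(page))  # tweets on this page

Note that count is only a request: for api.search the server caps it at 100 per page, which is why asking for 200 and asking for 100 can behave the same.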

Using chunking with json/requests to get large data into Python

I am trying to pull a large dataset into Python using an API, but I am not able to get the entire dataset; the request retrieves only the first 1000 rows.
import pandas as pd
import requests

r = requests.get("https://data.cityofchicago.org/resource/6zsd-86xi.json")
data = r.json()  # avoid shadowing the stdlib json module
df = pd.DataFrame(data)
df.drop(df.columns[[0, 1, 2, 3, 4, 5, 6, 7]], axis=1, inplace=True)  # dropping some columns
df.shape
The output is
(1000, 22)
The dataset contains almost 6 million rows, yet only 1000 are retrieved. How do I get around this? Is chunking the right option? Can someone please help me with the code?
Thanks.
You'll need to paginate through the results to get the entire dataset. Most APIs limit the number of results returned in a single request. According to the Socrata docs, you need to add $limit and $offset parameters to the request URL.
For example, for the first page of results you would start with -
https://data.cityofchicago.org/resource/6zsd-86xi.json?$limit=1000&$offset=0
Then for the next page you would just increment the offset -
https://data.cityofchicago.org/resource/6zsd-86xi.json?$limit=1000&$offset=1000
Continue incrementing until you have the entire dataset.
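A minimal version of that loop, sketched on the assumption that an empty page signals the end of the dataset:

import pandas as pd
import requests

BASE = "https://data.cityofchicago.org/resource/6zsd-86xi.json"
LIMIT = 1000  # rows per request

frames, offset = [], 0
while True:
    r = requests.get(BASE, params={"$limit": LIMIT, "$offset": offset})
    rows = r.json()
    if not rows:  # an empty page means we've paged past the end
        break
    frames.append(pd.DataFrame(rows))
    offset += LIMIT

df = pd.concat(frames, ignore_index=True)
print(df.shape)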

Tweepy not finding results that should be there

I am writing a script in Python that uses tweepy to search for tweets with a given keyword. Here is the snippet:
for tweet in tweepy.Cursor(api.search, q=keyword, lang="en").items(10):
    print(tweet.id)
I have everything authenticated properly and the code works most of the time. However, when I try to search for some keywords (examples below) it doesn't return anything.
The keywords that cause trouble are "digitalkidz" (a tech conference) and "newtrendbg" (a Bulgarian company). If you do a quick search on Twitter for either of those you will see that there are results. However, tweepy doesn't find anything. Again, it does work for pretty much any other keyword I use.
Do you have any ideas what might be the problem and how to fix it?
Thank you
I believe you're forgetting an important aspect of the Twitter API: it's not exhaustive.
Taken from the API docs:
Please note that Twitter’s search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface.
Regardless of whether you're using the streaming or REST API, you're going to have issues with this if you're looking for specific tweets.
REST API
When looking for historical tweets, you unfortunately won't be able to obtain anything older than a week using api.search(). This is also noted in the docs:
Keep in mind that the search index has a 7-day limit. In other words, no tweets will be found for a date older than one week.
There are other ways of getting older tweets; this post details those options.
Streaming API
While it doesn't sound like you're using Twitter's streaming API, it should be noted that this only gives a small sample of Twitter's current tweet traffic (~1-2%).
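For illustration, a rough sketch of the streaming approach with tweepy's v3-era classes; it only catches tweets posted while the stream is open, and api is assumed to be an authenticated tweepy.API:

import tweepy

class KeywordListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.id)  # handle each matching tweet as it arrives

stream = tweepy.Stream(auth=api.auth, listener=KeywordListener())
stream.filter(track=['digitalkidz', 'newtrendbg'])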
Hopefully this is helpful. Let me know if you have any questions.

Facebook Graph API {page}/links only returns 60 days of results

I'm managing a Facebook page and analyzing its insights. We own the page and every post on the page feed (the page doesn't allow other users to post). I'm doing an analysis of all the posts we've ever created.
I've been using the {page}/posts edge to get the post IDs but found that it only returns a subset of the data. Then I tried {page}/links and {page}/videos, because these are the post types I'm mostly interested in. The videos edge works great; it gave me all of the video IDs from the page. However, {page}/links only returned two months' worth of link IDs.
Here is a sample GET I'm using (I'm trying to get the post IDs from 10/2014 to 12/2014):
https://graph.facebook.com/v2.2/{actual_page_id}/links?fields=id,created_time&since=1414175236&until=1419445636&access_token=[The_actual_access_token]
But I get an empty result string:
{"data": []}
And when I set the dates within the two-month window, I get a proper response.
My question is: is there a way to get ALL of the Facebook page post IDs that we have created? I've tried setting limits and paging, but neither has worked. Thank you very much for your help.
The snippet below should solve your issue. It uses Facepy, which handles paging on its own.
from facepy import GraphAPI

access_token = '<access_token>'
page_id = '<page_id>'

graph = GraphAPI(access_token)
pages = graph.get(page_id + "/posts?fields=id", page=True, retry=5)

post_ids = []
for page in pages:  # with page=True, Facepy yields one page of results at a time
    for post in page['data']:
        post_ids.append(post['id'])
print(post_ids)
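If you also need the date filtering from the question, the Graph API's since/until parameters (the same Unix timestamps used in the question's GET) can, as far as I can tell, simply be appended to the same /posts path, and Facepy still handles the paging:

pages = graph.get(page_id + "/posts?fields=id,created_time"
                  "&since=1414175236&until=1419445636",
                  page=True, retry=5)
post_ids = [post['id'] for page in pages for post in page['data']]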

Get unlimited tweets from the Twitter API

Is there a way to get unlimited tweets, or at least 500 tweets, from Twitter?
I'm using Python.
I can get 100 tweets by using twitter_search = Twitter(domain="search.twitter.com"), but it is limited to 100 tweets.
Edit:
I'm using the pypi.python.org/pypi/twitter/1.9.0 library.
It should be public tweets, not tweets from my account and my followers.
I've been having the same issue. As far as I can tell, there is really no getting around the Twitter API limits, and there don't seem to be any other APIs that give access to archives of tweets.
One option, albeit a challenging one, is downloading all tweets in bulk from archive.org:
http://archive.org/details/twitterstream
Each month of data is more than 30 GB even compressed, though, so it won't be easy to handle. But if you are determined, this will give you full control over the raw data with no limits.
The Twitter API limits the results to a maximum of 200 per request. How to set the count in your example to this maximum depends on the library you are using (which you didn't state, so I can't give you any information on that).
So, unlimited won't be possible in one request, no matter which library you are using.
See the "count" parameter here: https://dev.twitter.com/docs/api/1.1/get/statuses/home_timeline
If you can switch to Orange, as described here, you can get 9999 tweets per request. Hope someone finds this helpful.
https://medium.com/analytics-vidhya/twitter-sentiment-analysis-with-orange-vader-powerbi-part-1-184b693b9d70
