Trouble understanding tweepy pagination - python

I want to use search terms in the Twitter API between two dates, which may return thousands of results. Is this even possible in the free version? I'm using tweepy and unsure whether I should be using pages or not. When I set 'count' to any number I always get the same number of results back; I set count=100 and got over 900 results in my latest test. How can I count the number of results returned?

I have done lots of googling but can't find the answer to this simple question. It's basic stuff, I know, but I find the Twitter documentation as clear as mud and the tweepy documentation not simple enough for a newbie.

I want to store the results in an SQLite database for analysis. I'm happy with the database side, but I don't understand the pagination in tweepy. The documentation uses api.friends; I want to use api.search, and I want to retrieve the maximum number of results possible.
Any help hugely appreciated!
Basically the bit I'm not understanding is how to return the maximum number of tweets using code like:
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang='en',
                   since=date_since,
                   count=200).items()
db.add_tweets(tweets)
This returns more than 200 tweets. Is it returning all the tweets, or will I hit a 3200 limit? Is it saying there are 200 tweets per page? Some of the examples in the tweepy docs use pages(). I don't understand pagination! How do I find the number of tweets in each page if it isn't 200? How do I know how many tweets the query returned without doing a count on the database? How would I iterate over every page if I use pages()? I apologise for my lack of understanding, but the documentation is definitely not for newbies. Please can someone help? Thanks
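For reference, here is a minimal sketch of the two Cursor styles with tweepy's v1 interface, reusing the names from the snippet above (api, search_words, date_since, db); the db.add_tweet per-tweet insert is hypothetical, and note that the standard search endpoint caps count at 100 per page, so count=200 is silently clamped.

import tweepy as tw

# items(n) flattens pagination: it yields individual tweets and stops
# after n of them, or earlier if the API runs out of results.
tweet_count = 0
for tweet in tw.Cursor(api.search, q=search_words, lang='en',
                       since=date_since, count=100).items(1000):
    db.add_tweet(tweet)          # hypothetical per-tweet insert
    tweet_count += 1
print(tweet_count, 'tweets stored')

# pages() yields one batch per API call; count=100 asks for up to 100
# tweets per page, but any page (especially the last) can hold fewer.
for page in tw.Cursor(api.search, q=search_words, lang='en',
                      since=date_since, count=100).pages():
    print(len(page), 'tweets on this page')
    db.add_tweets(page)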

Related

How to get all the comments on a subreddit in a tree structure using reddit API?

For my school assignment I need to get all the comments on the last 500 posts in a particular subreddit, with their tree structure. My goal is to find the most used words and make a frequency chart with Python.
Using the Reddit API I was able to get information about the last 500 posts on a subreddit (like title, id, upvotes, etc.), but unfortunately I couldn't find how I can get the comments. I have spent the last couple of hours searching for this on Google but wasn't able to find anything. Do you have any idea about my problem?
Using other APIs like PRAW and PushShift is forbidden.
I'd be glad if you can help me, or at least give me an idea of how I can do this.
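One route that avoids PRAW and Pushshift is Reddit's public JSON endpoints: appending .json to a listing or permalink URL returns the same data the site renders, including the nested comment tree. A rough sketch under that assumption, using only requests; the subreddit name, limits, and User-Agent are placeholders (Reddit tends to throttle default user agents):

import requests

HEADERS = {'User-Agent': 'comment-tree script (placeholder contact)'}

def get_posts(subreddit, limit=100, after=None):
    # /r/<sub>/new.json pages through recent posts, up to 100 at a time;
    # pass data['after'] back in to fetch the next page.
    r = requests.get(f'https://www.reddit.com/r/{subreddit}/new.json',
                     headers=HEADERS, params={'limit': limit, 'after': after})
    r.raise_for_status()
    return r.json()['data']

def get_comment_tree(permalink):
    # <permalink>.json returns [post_listing, comment_listing]; replies
    # to each comment are nested under data -> replies.
    r = requests.get(f'https://www.reddit.com{permalink}.json',
                     headers=HEADERS)
    r.raise_for_status()
    return r.json()[1]['data']['children']

def walk(comments, depth=0):
    # Recursively yield (depth, body) pairs from the comment tree.
    for child in comments:
        if child['kind'] != 't1':    # skip 'more' stubs for brevity
            continue
        data = child['data']
        yield depth, data['body']
        if data.get('replies'):
            yield from walk(data['replies']['data']['children'], depth + 1)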

Sorting Twitter API pages in favorites by time you favorited them not when they were created

Okay, so basically I'm trying to write a program that goes through my Twitter favorites history, finds pieces of media in those tweets, and downloads them. I actually have a working model in Python using Tweepy; the only problem is that it gets the tweets in order of when they were created.

The thing is, I don't want to download my whole favorites history every time, just everything since the last mass download, so at first I set it up to stop when the tweets reached a certain date. Every time I downloaded I would record the date, but then I realized that sometimes I favorite tweets that are from a while ago, before the cut-off date, but are still new to my favorites history.

So I tried something else: I would record the tweet ID of the first tweet in the list from the last time I downloaded, and set it to stop there. This would work fine if api.favorites() returned tweets in the order they appear on your profile, but instead it sorts them by creation date, so if I fav a post from 2010, I would have to cycle all the way back to 2010 before it would appear on one of my returned pages.

When I looked through the docs I found a little bit on sorting, but nothing on sorting the favorites history by when you faved the tweet. And the thing is, I know Twitter stores the order you liked the tweets in, since that's how it displays them on your profile; even if it is just a table they append to every time, it still works. I'm writing this program in Python, but I'm good enough with Java and JavaScript to understand the guts of the API; that's how I got this far. Anyway, if you have some suggestions or know how to do it, please let me know; any help is appreciated! If all else fails I'll try using Selenium to go through my Twitter favs from a user perspective...
I had the same issue, and I found that you can't do this with the regular Twitter API. They have an enterprise API, which requires payment to access, and that will give you the time a tweet was favorited, but it doesn't look cheap, so it's probably not worth it.
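For reference, a rough sketch of the stop-at-the-last-seen-ID approach the question describes, assuming tweepy's v1 api.favorites endpoint; as explained above, it misses older tweets favorited recently, because the endpoint orders by tweet creation date:

import tweepy

def fetch_new_favorites(api, last_seen_id):
    # Collect favorites until we reach the tweet recorded last run.
    # Caveat: api.favorites() is ordered by creation date, not by
    # when you faved, so an old tweet faved yesterday sits deep in
    # the list and is silently skipped by this early break.
    new_favs = []
    for tweet in tweepy.Cursor(api.favorites).items():
        if tweet.id == last_seen_id:
            break
        new_favs.append(tweet)
    return new_favs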

Tweepy not finding results that should be there

I am writing a script in Python, that uses tweepy to search for tweets with a given keyword. Here is the snippet:
for tweet in tweepy.Cursor(api.search, q=keyword, lang="en").items(10):
    print(tweet.id)
I have everything authenticated properly and the code works most of the time. However, when I try to search for some keywords (examples below) it doesn't return anything.
The keywords that cause trouble are "digitalkidz" (a tech conference) and "newtrendbg" (a Bulgarian company). If you do a quick search on Twitter for either of those you will see that there are results. However, tweepy doesn't find anything. Again, it does work for pretty much any other keyword I use.
Do you have any ideas what might be the problem and how to fix it?
Thank you
I believe you're forgetting an important aspect of the Twitter API: it's not exhaustive.
Taken from the API docs:
Please note that Twitter’s search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface.
Regardless of whether you're using the streaming or the REST API, you're going to have issues with this if you're looking for specific tweets.
REST API
When looking for historical tweets, you unfortunately won't be able to obtain anything older than a week using api.search(). This is also shown in the docs.
Keep in mind that the search index has a 7-day limit. In other words, no tweets will be found for a date older than one week.
There are other ways of getting older tweets, this post details those options.
Streaming API
While it doesn't sound like you're using Twitter's streaming API, it should be noted that this only gives a small sample of Twitter's current tweet traffic (~1-2%).
Hopefully this is helpful. Let me know if you have any questions.
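If real-time collection works for your use case, here is a minimal sketch of a keyword stream, assuming tweepy 3.x's StreamListener interface (tweepy 4 renamed these classes) and the api object from your snippet:

import tweepy

class KeywordListener(tweepy.StreamListener):
    def on_status(self, status):
        # Called once per matching tweet as it is published.
        print(status.id, status.text)

    def on_error(self, status_code):
        # Returning False on 420 disconnects instead of retrying,
        # avoiding escalating rate-limit penalties.
        if status_code == 420:
            return False

stream = tweepy.Stream(auth=api.auth, listener=KeywordListener())
stream.filter(track=['digitalkidz', 'newtrendbg'])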

Get unlimited tweets from twitter api

Is there a way that I can get unlimited or 500 tweets from Twitter?
I'm using Python.
I can get 100 tweets by using twitter_search = Twitter(domain="search.twitter.com"), but it has a limit of 100 tweets.
Edit:
I'm using the pypi.python.org/pypi/twitter/1.9.0 library.
It should be public tweets, not the tweets from my account and my followers.
I've been having the same issue. As far as I can tell, there is really no getting around the twitter API limits, and there don't seem to be any other APIs that give access to archives of tweets.
One option, albeit a challenging one, is downloading all tweets in bulk from archive.org:
http://archive.org/details/twitterstream
Each month of data is over 30 GB, compressed, so it won't be easy to handle. But if you are determined, this will give you full control over the raw data with no limits.
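To give a sense of the handling involved: these dumps have been distributed as tar files of bz2-compressed, line-delimited JSON, one tweet object per line (an assumption worth verifying against the grab you download). A sketch that streams a dump without unpacking it to disk; the filename is a placeholder:

import bz2
import json
import tarfile

def iter_tweets(tar_path):
    # Stream tweets out of an archive.org dump without extracting it:
    # each tar member is a bz2 file of JSON lines.
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith('.bz2')):
                continue
            for line in bz2.open(tar.extractfile(member)):
                line = line.strip()
                if not line:
                    continue
                obj = json.loads(line)
                if 'text' in obj:        # skip delete/status notices
                    yield obj

for tweet in iter_tweets('twitterstream-2013-01.tar'):   # placeholder name
    print(tweet['text'])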
The Twitter API limits the results to a maximum of 200 per request. How to set the count in your example to this maximum depends on the library you are using (which you didn't state, so I can't give you any information on that).
So, unlimited won't be possible in one request, no matter which library you are using.
See the "count" parameter here: https://dev.twitter.com/docs/api/1.1/get/statuses/home_timeline
If you can shift to Orange, you can get 9999 tweets per request. Hope someone finds it helpful.
https://medium.com/analytics-vidhya/twitter-sentiment-analysis-with-orange-vader-powerbi-part-1-184b693b9d70

How to Collect Tweets More Quickly Using Twitter API in Python?

For a research project, I am collecting tweets using Python-Twitter. However, running our program nonstop on a single computer for a week, we manage to collect only about 20 MB of data. I am only running this program on one machine so that we do not collect the same tweets twice.
Our program runs a loop that calls getPublicTimeline() every 60 seconds. I tried to improve this by calling getUserTimeline() on some of the users that appeared in the public timeline. However, this consistently got me banned from collecting tweets for about half an hour each time. Even without the ban, there seemed to be very little speed-up from adding this code.
I know about Twitter's "whitelisting", which allows a user to submit more requests per hour. I applied for this about three weeks ago and have not heard back, so I am looking for alternatives that will allow our program to collect tweets more efficiently without going over the standard rate limit. Does anyone know of a faster way to collect public tweets from Twitter? We'd like to get about 100 MB per week.
Thanks.
How about using the streaming API? This is exactly the use case it was created to address. With the streaming API you will not have any problems gathering megabytes of tweets. You still won't be able to access all tweets, or even a statistically significant sample, without being granted access by Twitter, though.
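Since the current code polls getPublicTimeline(), the streaming equivalent is the sample endpoint, which pushes a continuous random feed instead of requiring repeated requests. A sketch using the python-twitter-tools package's TwitterStream (a different library from the asker's Python-Twitter, swapped in here as an assumption); credentials are placeholders:

from twitter import OAuth, TwitterStream

stream = TwitterStream(auth=OAuth('token', 'token_secret',     # placeholders
                                  'consumer_key', 'consumer_secret'))

# statuses/sample pushes a continuous random sample of public tweets,
# so there is no polling loop and no per-request rate limit to trip.
for tweet in stream.statuses.sample():
    if 'text' in tweet:          # the stream also carries delete notices
        print(tweet['text'])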
I did a similar project analyzing data from tweets. If you're just going at this from a pure data collection/analysis angle, you can scrape any of the better sites that collect these tweets for various reasons. Many sites allow you to search by hashtag, so throw in a popular enough hashtag and you've got thousands of results.
I just scraped a few of these sites for popular hashtags, collected those into a large list, queried the list against the site, and scraped all of the usable information from the results. Some sites also allow you to export the data directly, making this task even easier. You'll get a lot of garbage results that you'll probably need to filter (spam, foreign language, etc.), but this was the quickest way that worked for our project. Twitter will probably not grant you whitelisted status, so I definitely wouldn't count on that.
There is a pretty good tutorial from Ars Technica on using the streaming API in Python that might be helpful here.
Otherwise you could try doing it via cURL.