Tweepy not finding results that should be there - python

I am writing a script in Python, that uses tweepy to search for tweets with a given keyword. Here is the snippet:
for tweet in tweepy.Cursor(api.search, q=keyword, lang="en").items(10):
    print(tweet.id)
I have everything authenticated properly and the code works most of the time. However, when I try to search for some keywords (examples below) it doesn't return anything.
The keywords that cause trouble are "digitalkidz" (a tech conference) and "newtrendbg" (a Bulgarian company). If you do a quick search on Twitter for either of those you will see that there are results. However, tweepy doesn't find anything. Again, it does work for pretty much any other keyword I use.
Do you have any ideas what might be the problem and how to fix it?
Thank you

I believe you're forgetting an important aspect of the Twitter API: it's not exhaustive.
Taken from the API docs:
Please note that Twitter’s search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface.
Regardless of whether you're using the Streaming or REST API, you're going to have issues with this if you're looking for specific tweets.
REST API
When looking for historical tweets, you unfortunately won't be able to obtain anything that is older than a week using api.search(). This is also shown in the docs.
Keep in mind that the search index has a 7-day limit. In other words, no tweets will be found for a date older than one week.
There are other ways of getting older tweets, this post details those options.
Streaming API
While it doesn't sound like you're using Twitter's Streaming API, it should be noted that this only gives a small sample of Twitter's current tweet traffic (roughly 1-2%).
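For reference, a minimal keyword-filtered stream using the pre-4.0 tweepy interface might look roughly like the sketch below; it assumes `api` is the authenticated tweepy.API object from the question, and the listener class name is just a placeholder.

import tweepy

class KeywordListener(tweepy.StreamListener):
    def on_status(self, status):
        # Called for each matching tweet in the (sampled) live stream.
        print(status.id, status.text)

    def on_error(self, status_code):
        # Disconnect instead of retrying aggressively when rate limited.
        if status_code == 420:
            return False

stream = tweepy.Stream(auth=api.auth, listener=KeywordListener())
stream.filter(track=["digitalkidz"])  # only matches tweets posted while the stream is open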
Hopefully this is helpful. Let me know if you have any questions.

Related

How do I recursively collect all Twitter replies?

I just recently finished a nearly 6-week-long conversation with multiple people on Twitter. Since several things that were said were quite interesting (particularly in hindsight), I'd like to be able to archive the entire conversation for later reference. From what I can tell, there are no existing solutions similar to threadreaderapp.com for recursively unrolling an entire conversation. As such, I looked into doing it in Python with the Twitter API. In researching it, I found several people saying the free version of the API only lets you search replies from the last 7 days. However, I then found some places (e.g., here) that seemed to indicate the Twitter API v2 added access to a "conversation ID" that lets this limitation be avoided. However, when I tried to run that code to get the replies to my tweet, the response kept coming back empty. Specifically, as best I can tell, the request from line 19 of this code (link ... which is the code from step 7 of the previously mentioned article: direct link) is not returning data.
Am I missing something? Is it possible to recursively get all replies to a tweet from the past 6 weeks without needing to be considered an "Academic Researcher" to be able to access the full Twitter archive (reference)?
Ultimately, I can get all the tweets from the website in the browser, so I suppose if I knew what I was doing I could just use some sort of HTML scraper or something, but I don't.
The Twitter API v2 allows you to use the conversation_id as a search parameter on both the recent search and full-archive search endpoints. The difference is that recent search covers the past seven days (available in the Essential access tier, i.e. to most users), while full-archive search is limited to Academic access at this time.
So, to directly answer your question: no, the API does not allow you to recursively get all replies to a Tweet from the past 6 weeks, unless you are indeed a qualified academic researcher with access to the full-archive search functionality.
Other retrieval methods are beyond the scope of the API and are not supported by Twitter.
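As a rough sketch (not an official recipe), a recent-search lookup by conversation_id with tweepy 4.x could look like the following; the bearer token and conversation id are placeholders, and on the Essential tier this still only reaches back about seven days.

import tweepy

client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")  # placeholder token
conversation_id = 1234567890123456789  # id of the thread's root tweet (placeholder)

# Page through the replies that fall inside the recent-search window (~7 days).
for page in tweepy.Paginator(client.search_recent_tweets,
                             query=f"conversation_id:{conversation_id}",
                             tweet_fields=["author_id", "created_at", "in_reply_to_user_id"],
                             max_results=100):
    for tweet in page.data or []:
        print(tweet.id, tweet.created_at, tweet.text)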

Trouble understanding tweepy pagination

I want to use search terms in the Twitter API between 2 dates which may return 1000s of results. Is this even possible in the free version? I’m using tweepy and unsure if I should be using pages or not. When I set ‘count’ to any number I always get the same number of results back. I have set count=100 and got over 900 results in my latest test. How can I count the number of results returned? I have done lots of googling but can’t find the answer to this simple question. It’s basic stuff I know, but I find the Twitter documentation as clear as mud and the tweepy documentation not simple enough for a newbie. I want to store the results in an SQLite database for doing analysis. I’m happy with the database but don’t understand the pagination in tweepy. The documentation uses api.friends. I want to use api.search. I want to retrieve the maximum number of results possible.
Any help hugely appreciated!
Basically the bit I'm not understanding is how to return the maximum number of tweets using code like:
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang='en',
                   since=date_since,
                   count=200).items()
db.add_tweets(tweets)
This returns more than 200 tweets. Is it returning all the tweets, or will I hit a 3200 limit? Is it saying there are 200 tweets per page? Some of the examples in the tweepy docs use pages(). I don't understand pagination! How do I find the number of tweets in each page if there aren't 200? How do I know how many tweets have been returned from the query without doing a count on the database? How would I iterate over every page if I use pages()? I apologise for my lack of understanding, but the documentation is definitely not for newbies! Please can someone help? Thanks
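For what it's worth, iterating page by page rather than item by item looks roughly like the sketch below with the pre-4.0 api.search interface; search_words, date_since and db are the variables from the snippet above, count is the requested page size (the standard search API caps it at 100), and the 3200 limit applies to user timelines rather than search, which is bounded by the roughly 7-day search index instead.

import tweepy

# Assumes `api` is an authenticated tweepy.API instance (pre-4.0 interface).
cursor = tweepy.Cursor(api.search,
                       q=search_words,
                       lang="en",
                       since=date_since,
                       count=100)  # requested page size; the last page may be shorter

total = 0
for page in cursor.pages():      # each `page` is a list of Status objects
    total += len(page)           # len(page) gives the number of tweets on that page
    db.add_tweets(page)
print("retrieved", total, "tweets")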

get old tweets using tweepy [duplicate]

I have been trying to figure this out, but it is really frustrating. I'm trying to get tweets with a certain hashtag (a large number of tweets) using Tweepy. But this doesn't go back more than one week. I need to go back at least two years for a period of a couple of months. Is this even possible, and if so, how?
Just to check, here is my code:
import tweepy
import csv

consumer_key = '####'
consumer_secret = '####'
access_token = '####'
access_token_secret = '####'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Open/create a file to append data to
csvFile = open('tweets.csv', 'a')
# Use csv writer
csvWriter = csv.writer(csvFile)

for tweet in tweepy.Cursor(api.search, q="#ps4", count=100,
                           lang="en",
                           since_id=2014-06-12).items():
    print(tweet.created_at, tweet.text)
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])
As you have noticed, the Twitter API has some limitations. I have implemented a script that does this using the same strategy Twitter uses when you search in a browser. Take a look, you can get the oldest tweets: https://github.com/Jefferson-Henrique/GetOldTweets-python
You cannot use the twitter search API to collect tweets from two years ago. Per the docs:
Also note that the search results at twitter.com may return historical results while the Search API usually only serves tweets from the past week. - Twitter documentation.
If you need a way to get old tweets, you can get them from individual users because collecting tweets from them is limited by number rather than time (so in many cases you can go back months or years). A third-party service that collects tweets like Topsy may be useful in your case as well (shut down as of July 2016, but other services exist).
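As a sketch of that per-user approach with the pre-4.0 tweepy interface (the screen name is a placeholder; user timelines are capped at roughly the most recent 3,200 tweets per account rather than by date):

import tweepy

# Assumes `api` is an authenticated tweepy.API instance.
for status in tweepy.Cursor(api.user_timeline,
                            screen_name="some_user",   # placeholder account
                            count=200,                  # maximum page size for timelines
                            tweet_mode="extended").items():
    print(status.created_at, status.full_text)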
Found some code that helps retrieve older tweets:
https://github.com/Jefferson-Henrique/GetOldTweets-python
To get old tweets, run the following command in the directory where you extracted the repository:
python Exporter.py --querysearch 'keyword' --since 2016-01-10 --until 2016-01-15 --maxtweets 1000
It returns a file 'output_got.csv' with 1000 tweets from the above date range containing your keyword.
You need to install the 'pyquery' module for this to work.
PS: You can modify the 'Exporter.py' file to get more tweet attributes as per your requirements.
2018 update:
Twitter has Premium search APIs that can return results from the beginning of time (2006):
https://developer.twitter.com/en/docs/tweets/search/overview/premium#ProductPackages
Search Tweets: 30-day endpoint → provides Tweets from the previous 30 days.
Search Tweets: Full-archive endpoint → provides complete and instant access to Tweets dating all the way back to the first Tweet in March 2006.
With an example Python client:
https://github.com/twitterdev/search-tweets-python
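Based on that repository's README, a premium search call might look roughly like this; the credentials file, YAML key, query, and dates are all placeholders, and which endpoint you can hit depends on whether your developer environment is set up for the 30-day or full-archive product.

from searchtweets import load_credentials, gen_rule_payload, collect_results

# Placeholder credentials file and key; see the repository's README for the expected format.
premium_search_args = load_credentials("~/.twitter_keys.yaml",
                                       yaml_key="search_tweets_api",
                                       env_overwrite=False)

# Placeholder query and dates (full-archive environments can go back to 2006).
rule = gen_rule_payload("digitalkidz",
                        results_per_call=100,
                        from_date="2016-01-01",
                        to_date="2016-02-01")

tweets = collect_results(rule, max_results=500, result_stream_args=premium_search_args)
for tweet in tweets:
    print(tweet.all_text)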
I know this is a very old question, but some folks might still be facing the same issue.
After some digging, I found out that Tweepy's search only returns data for the past 7 days, and that this sometimes leads people to buy a third-party service.
I used the Python library GetOldTweets3 and it worked fine for me. The library is really easy to use. Its only limitation is that you can't search for more than one hashtag in a single execution, but it works fine for searching multiple accounts at the same time.
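A sketch of the documented GetOldTweets3 usage, with a placeholder hashtag and date range (the library scrapes the web search rather than the API, which is how it gets around the 7-day limit):

import GetOldTweets3 as got

criteria = (got.manager.TweetCriteria()
            .setQuerySearch("#ps4")       # a single hashtag or keyword per run
            .setSince("2016-01-10")
            .setUntil("2016-01-15")
            .setMaxTweets(100))

for tweet in got.manager.TweetManager.getTweets(criteria):
    print(tweet.date, tweet.text)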
Use the args "since" and "until" to adjust your timeframe. You are presently using since_id, which is meant to correspond to Twitter id values (not dates):
for tweet in tweepy.Cursor(api.search,
                           q="test",
                           since="2014-01-01",
                           until="2014-02-01",
                           lang="en").items():
    print(tweet.created_at, tweet.text)
As others have noted, the Twitter API has the date limitation, but the actual advanced search as implemented on twitter.com does not. So the solution is to use Python's wrapper for Selenium or PhantomJS to iterate through the twitter.com endpoint. Here's an implementation using Selenium that someone has posted on GitHub: https://github.com/bpb27/twitter_scraping/
I can't believe nobody has said this, but this Git repository completely solved my problem. I haven't been able to utilize other solutions such as GOT or Twitter API Premium.
Try this, it's definitely useful:
https://betterprogramming.pub/how-to-scrape-tweets-with-snscrape-90124ed006af
https://github.com/MartinBeckUT/TwitterScraper/tree/master/snscrape/cli-with-python
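A sketch of the snscrape approach described in those links (no API keys are needed because it scrapes the public web search; attribute names such as content can differ between snscrape versions):

import snscrape.modules.twitter as sntwitter

query = "#ps4 since:2016-01-10 until:2016-01-15"  # placeholder hashtag and date range

for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i >= 100:   # stop after 100 tweets for the example
        break
    print(tweet.date, tweet.content)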

Getting tweets with current hashtag from Twitter using Python?

Is there any better way to get tweets from Twitter than crawling twitter.com and mutating URLs?
If there is, how can I get the latest tweets for a given hashtag?
Thank you!
Did you try the Twitter REST API? In particular, you can use the search tweets endpoint. There are some limitations, though, enforced by Twitter.
You can use one of many available python libraries.
For example, some sample code for tweepy can be found here.

tweepy/twitter get all tweets from a location:

I have the following questions about the tweepy Python module:
1. I am trying to retrieve all tweets for a specific location. I am able to do this using tweepy's Streaming API, but I only get those tweets whose geolocation is enabled, which means I lose the tweets from users who have not enabled geolocation. Is there a better way to retrieve all the tweets for a given location?
2. I use the Stream.sample() method to retrieve all the tweets. Can someone tell me about the parameters used in the sample method? I see count and async as parameters. What should we specify here?
3. What does the firehose method in tweepy.Stream do?
Any help is much appreciated
If tweepy doesn't have a feature you need, you can always access Twitter directly with an HTTP request. The full Twitter REST API is described here: https://dev.twitter.com/docs/api
The ones that seem relevant to your interest are:
GET trends/:woeid, which looks up trending topics by woeid, a Yahoo "Where On Earth ID" used to identify a given place/landmark/etc.
GET geo/id/:place_id, which only covers geotagged content (it returns information about a known place whose place_id you can find on geotagged tweets).
There is documentation of all the information available for a GET request but the IP address is not among the available fields: https://dev.twitter.com/docs/api/1/get/search .
Lastly, Twitter has a location search FAQ that may be of interest.
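As a sketch of the raw-HTTP approach against the current v1.1 counterpart of GET trends/:woeid (trends/place.json); the OAuth keys are placeholders, and requests-oauthlib is assumed for request signing:

import requests
from requests_oauthlib import OAuth1  # pip install requests-oauthlib

auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET",
              "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")  # placeholder credentials

# 2459115 is the WOEID (Yahoo "Where On Earth ID") for New York City.
resp = requests.get("https://api.twitter.com/1.1/trends/place.json",
                    params={"id": 2459115},
                    auth=auth)
resp.raise_for_status()

for trend in resp.json()[0]["trends"]:
    print(trend["name"], trend.get("tweet_volume"))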
