I recently finished a nearly six-week-long conversation with multiple people on Twitter. Since several things that were said were quite interesting (particularly in hindsight), I'd like to archive the entire conversation for later reference. As far as I can tell, there are no existing solutions similar to threadreaderapp.com that recursively unroll an entire conversation, so I looked into doing it in Python with the Twitter API. While researching it, I found several people saying the free version of the API only lets you search replies from the last 7 days. However, I then found some places (e.g., here) that seemed to indicate that Twitter API v2 added access to a "conversation ID" that avoids this limitation. Yet when I ran that code to get the replies to my tweet, the response kept coming back empty. Specifically, as best I can tell, the request from line 19 of this code (link ... which is the code from step 7 of the previously mentioned article: direct link) is not returning data.
Am I missing something? Is it possible to recursively get all replies to a tweet from the past 6 weeks without being an "Academic Researcher" with access to the full Twitter archive (reference)?
Ultimately, I can see all the tweets on the website in the browser, so I suppose if I knew what I was doing I could use some sort of HTML scraper, but I don't.
The Twitter API v2 allows you to use conversation_id as a search parameter on both the recent search and full-archive search endpoints. The difference is that the recent search API covers the past seven days (available in the Essential access tier, i.e. to most users), while the full-archive search API is limited to Academic access at this time.
So, to directly answer your question: no, the API does not allow you to recursively get all replies to a Tweet from the past 6 weeks, unless you are indeed a qualified academic researcher with access to the full archive search functionality.
Other retrieval methods are beyond the scope of the API and are not supported by Twitter.
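For illustration, here is a minimal sketch of the recent-search approach described above, using only the Python standard library. It assumes you have a v2 bearer token, that the conversation_id is the ID of the Tweet that started the thread, and that the endpoint is still served at https://api.twitter.com/2/tweets/search/recent; the requested tweet.fields are just an example. Because it uses the recent search endpoint, it can only return replies from roughly the last seven days. The HTTP call is passed in as get_json so the pagination logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

# Twitter API v2 recent-search endpoint (covers roughly the last 7 days).
RECENT_SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_url(conversation_id, next_token=None, max_results=100):
    """Build a recent-search URL matching every Tweet in one conversation."""
    params = {
        "query": f"conversation_id:{conversation_id}",
        "max_results": max_results,  # this endpoint accepts 10-100 per request
        "tweet.fields": "author_id,created_at,in_reply_to_user_id,referenced_tweets",
    }
    if next_token:
        params["next_token"] = next_token  # cursor for the next page of results
    return RECENT_SEARCH_URL + "?" + urllib.parse.urlencode(params)


def http_get_json(url, bearer_token):
    """Perform an authenticated GET request and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {bearer_token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fetch_conversation(conversation_id, bearer_token, get_json=http_get_json):
    """Follow meta.next_token until the endpoint stops returning one."""
    tweets, token = [], None
    while True:
        page = get_json(build_url(conversation_id, token), bearer_token)
        tweets.extend(page.get("data", []))  # "data" is absent when nothing matched
        token = page.get("meta", {}).get("next_token")
        if not token:
            return tweets
```

For example, fetch_conversation("1234567890", BEARER_TOKEN) would return a list of Tweet dictionaries; both arguments are placeholders. An empty list for an older thread is exactly the seven-day limitation described above.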
I want to use search terms in the Twitter API between 2 dates which may return 1000s of results. Is this even possible in the free version? I’m using tweepy and unsure if I should be using pages or not. When I set ‘count’ to any number I always get the same number of results back. I have set count=100 and got over 900 results in my latest test. How can I count the number of results returned? I have done lots of googling but can’t find the answer to this simple question. It’s basic stuff I know, but I find the Twitter documentation as clear as mud and the tweepy documentation not simple enough for a newbie. I want to store the results in an SQLite database for doing analysis. I’m happy with the database but don’t understand the pagination in tweepy. The documentation uses api.friends. I want to use api.search. I want to retrieve the maximum number of results possible.
Any help hugely appreciated!
Basically the bit I'm not understanding is how to return the maximum number of tweets using code like:
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang='en',
                   since=date_since,
                   count=200).items()
db.add_tweets(tweets)
This returns more than 200 tweets. Is it returning all the tweets, or will I hit a 3200 limit? Is it saying there are 200 tweets per page? Some of the examples in the tweepy docs use pages(). I don't understand pagination! How do I find the number of tweets in each page if there aren't 200? How do I know how many tweets the query returned without doing a count on the database? How would I iterate over every page if I use pages()? I apologise for my lack of understanding, but the documentation is definitely not for newbies! Please can someone help? Thanks
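A minimal sketch of page-by-page iteration, assuming tweepy 3.x (where the method is api.search; tweepy 4.x renamed it api.search_tweets). In the standard search API, count only sets how many tweets are requested per page (at most 100 per search request); it is not a total, so .items() with no argument keeps fetching pages until results run out. The total can be capped with .items(n) or .pages(n). The 3,200-tweet cap applies to user timelines, whereas standard search is limited to roughly the last seven days. The save_pages helper and its table layout below are hypothetical; it counts pages and tweets while storing them in SQLite, assuming each page is a list of tweet objects with id, created_at and text.

```python
import sqlite3


def save_pages(pages, conn):
    """Insert every tweet from an iterable of pages; return (pages, tweets) counts.

    `pages` is assumed to yield lists of tweet objects exposing .id,
    .created_at and .text, as tweepy's Cursor(...).pages() does for search.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, created_at TEXT, text TEXT)"
    )
    page_count = tweet_count = 0
    for page in pages:
        page_count += 1
        tweet_count += len(page)  # a page may hold fewer than `count` tweets
        conn.executemany(
            "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?)",  # skip duplicate ids
            [(t.id, str(t.created_at), t.text) for t in page],
        )
    conn.commit()
    return page_count, tweet_count
```

Usage would look like pages = tw.Cursor(api.search, q=search_words, lang='en', count=100).pages(), followed by save_pages(pages, sqlite3.connect('tweets.db')), which returns the number of pages and tweets seen.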
I have been trying to figure this out, but it is really frustrating. I'm trying to get tweets with a certain hashtag (a large number of tweets) using Tweepy, but the results don't go back more than one week. I need to go back at least two years, for a period of a couple of months. Is this even possible? If so, how?
For reference, here is my code:
import tweepy
import csv
consumer_key = '####'
consumer_secret = '####'
access_token = '####'
access_token_secret = '####'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# Open/Create a file to append data
csvFile = open('tweets.csv', 'a')
#Use csv Writer
csvWriter = csv.writer(csvFile)
for tweet in tweepy.Cursor(api.search, q="#ps4", count=100,
                           lang="en",
                           since_id=2014-06-12).items():
    print tweet.created_at, tweet.text
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])
As you have noticed, the Twitter API has some limitations. I have implemented code that does this using the same strategy Twitter uses when running in a browser. Take a look; you can get the oldest tweets: https://github.com/Jefferson-Henrique/GetOldTweets-python
You cannot use the twitter search API to collect tweets from two years ago. Per the docs:
Also note that the search results at twitter.com may return historical results while the Search API usually only serves tweets from the past week. - Twitter documentation.
If you need a way to get old tweets, you can get them from individual users because collecting tweets from them is limited by number rather than time (so in many cases you can go back months or years). A third-party service that collects tweets like Topsy may be useful in your case as well (shut down as of July 2016, but other services exist).
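As a sketch of the per-user approach: the v1.1 user timeline endpoint returns up to 200 tweets per request and only a user's ~3,200 most recent tweets, newest first, so how far back you can go depends on how much the user posts. The hypothetical helper below filters such a timeline down to a date window and stops paging once it passes the start date. It assumes tweet objects with a created_at datetime, as tweepy returns; make sure since and until match the timezone-awareness of created_at.

```python
def tweets_in_window(timeline, since, until):
    """Yield tweets with since <= created_at < until from a newest-first timeline.

    Stops as soon as a tweet older than `since` appears, so no further
    pages are requested once the window has been passed.
    """
    for tweet in timeline:
        if tweet.created_at >= until:
            continue  # newer than the window: keep paging backwards
        if tweet.created_at < since:
            break  # timelines are newest-first, so nothing later can match
        yield tweet
```

With tweepy this might be called as tweets_in_window(tweepy.Cursor(api.user_timeline, screen_name="some_user", count=200).items(), since, until), where "some_user" is a placeholder.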
I found some code that helps retrieve older tweets:
https://github.com/Jefferson-Henrique/GetOldTweets-python
To get old tweets, run the following command in the directory where you extracted the code repository:
python Exporter.py --querysearch 'keyword' --since 2016-01-10 --until 2016-01-15 --maxtweets 1000
It returned a file, 'output_got.csv', with 1000 tweets from those days that matched the keyword.
You need to install the 'pyquery' module for this to work.
PS: You can modify the 'Exporter.py' file to export more tweet attributes as needed.
2018 update:
Twitter has Premium search APIs that can return results from the beginning of time (2006):
https://developer.twitter.com/en/docs/tweets/search/overview/premium#ProductPackages
Search Tweets: 30-day endpoint → provides Tweets from the previous 30 days.
Search Tweets: Full-archive endpoint → provides complete and instant access to Tweets dating all the way back to the first Tweet in March 2006.
With an example Python client:
https://github.com/twitterdev/search-tweets-python
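The premium endpoints took a JSON request body rather than the standard search parameters. A minimal sketch of building that body is below, assuming the fields described in the premium search docs at the time (query, fromDate, toDate, maxResults, next) and the YYYYMMDDhhmm UTC timestamp format; the premium_payload helper is hypothetical. The body would be POSTed to the full-archive or 30-day endpoint for your dev environment label, feeding the next value from each response back in to get the following page. maxResults is assumed to be capped at 100 for sandbox environments.

```python
def premium_payload(query, from_date, to_date, max_results=100, next_token=None):
    """Build a request body for the premium 30-day / full-archive search endpoints.

    from_date and to_date are datetime objects (assumed UTC); the endpoints
    expect them as YYYYMMDDhhmm strings.
    """
    body = {
        "query": query,
        "fromDate": from_date.strftime("%Y%m%d%H%M"),
        "toDate": to_date.strftime("%Y%m%d%H%M"),
        "maxResults": max_results,
    }
    if next_token:
        body["next"] = next_token  # pagination cursor from the previous response
    return body
```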
I know this is a very old question, but some folks might still be facing the same issue.
After some digging, I found out that Tweepy's search only returns data from the past 7 days, which sometimes leads people to buy a third-party service.
I used the Python library GetOldTweets3, and it worked fine for me. It is really easy to use. Its only limitation is that you can't search for more than one hashtag in one execution, but it works fine for searching multiple accounts at the same time.
Use the arguments "since" and "until" to adjust your timeframe. You are currently using since_id, which is meant to correspond to Twitter ID values (not dates):
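for tweet in tweepy.Cursor(api.search,
                           q="test",
                           since="2014-01-01",
                           until="2014-02-01",
                           lang="en").items():
    # Note: the standard search index only covers about the last 7 days.
    print(tweet.created_at, tweet.text)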
As others have noted, the Twitter API has this date limitation, but the advanced search on twitter.com does not. So the solution is to use Python's wrapper for Selenium or PhantomJS to iterate through the twitter.com endpoint. Here's an implementation using Selenium that someone has posted on GitHub: https://github.com/bpb27/twitter_scraping/
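One way to drive such a scraper is to split the date range into one search per day using the since: and until: search operators, so each results page stays short enough to scroll through. A minimal sketch that only builds the URLs is below; the daily_search_urls helper is hypothetical, and the f=live parameter (the "Latest" tab) is an assumption about twitter.com's URL scheme, which may have changed. A Selenium WebDriver would then open each URL with driver.get(url) and scroll to load the results.

```python
from datetime import timedelta
from urllib.parse import urlencode


def daily_search_urls(query, start, end):
    """Return one twitter.com search URL per day in [start, end).

    Each URL restricts results with the since:/until: operators, so a
    browser driven by Selenium only has to scroll through one day at a time.
    """
    urls = []
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        q = f"{query} since:{day.isoformat()} until:{nxt.isoformat()}"
        urls.append("https://twitter.com/search?" + urlencode({"q": q, "f": "live"}))
        day = nxt
    return urls
```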
I can't believe nobody has mentioned this, but this Git repository completely solved my problem. I wasn't able to get other solutions, such as GOT (GetOldTweets) or the Twitter Premium API, to work for me.
Try this, definitely useful:
https://betterprogramming.pub/how-to-scrape-tweets-with-snscrape-90124ed006af
https://github.com/MartinBeckUT/TwitterScraper/tree/master/snscrape/cli-with-python
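For reference, a minimal sketch of calling snscrape from Python rather than the CLI. It assumes an older snscrape release in which snscrape.modules.twitter.TwitterSearchScraper(query).get_items() yields tweet objects with id, date and content attributes (later releases renamed some attributes), and the Twitter module may have stopped working since Twitter began requiring login. The query accepts the same operators as twitter.com search, such as since: and until:. The take and scrape_search helpers are hypothetical.

```python
from itertools import islice


def take(items, limit):
    """Return at most `limit` items from a (possibly endless) generator."""
    return list(islice(items, limit))


def scrape_search(query, limit=1000):
    """Collect up to `limit` (id, date, text) tuples for a Twitter search query."""
    # Imported lazily so the helper above works without snscrape installed.
    import snscrape.modules.twitter as sntwitter

    scraper = sntwitter.TwitterSearchScraper(query)
    return [(t.id, t.date, t.content) for t in take(scraper.get_items(), limit)]
```

For example, scrape_search("#ps4 since:2014-06-12 until:2014-08-12", limit=500) would collect up to 500 matching tweets.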
Is there a better way to get tweets from Twitter than crawling twitter.com and mutating URLs?
If there is, how can I get the latest tweets with a given hashtag?
Thank you!
Have you tried the Twitter REST API? In particular, you can use the search tweets endpoint. There are some limitations, though, enforced by Twitter.
You can use one of many available python libraries.
For example, some sample code for tweepy can be found here.
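If you'd rather not depend on a library, a minimal sketch of calling the v1.1 standard search endpoint directly is below. It assumes app-only (bearer token) authentication and the documented q, result_type=recent, count (max 100) and max_id parameters; the helper names are hypothetical. Results still only cover roughly the last seven days. To page further back, pass max_id as one less than the smallest tweet id already seen.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def latest_hashtag_url(hashtag, count=100, max_id=None):
    """Build a standard-search URL for the most recent tweets with a hashtag."""
    params = {"q": "#" + hashtag.lstrip("#"), "result_type": "recent", "count": count}
    if max_id is not None:
        params["max_id"] = max_id  # page backwards: only tweets with id <= max_id
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def latest_hashtag_tweets(hashtag, bearer_token, max_id=None):
    """Fetch one page (up to 100) of recent tweets and return the 'statuses' list."""
    req = urllib.request.Request(
        latest_hashtag_url(hashtag, max_id=max_id),
        headers={"Authorization": f"Bearer {bearer_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["statuses"]
```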
I have the following questions about the tweepy Python module:
1. I am trying to retrieve all tweets for a specific location. I can do this with the tweepy Python module (Streaming API), but I only get tweets whose geolocation is enabled, which means I lose the tweets of users who have not enabled geolocation. Is there a better way to retrieve all the tweets for a given location?
2. I use the Stream.sample method to retrieve all the tweets. Can someone explain the parameters of the sample method? I see count and async as parameters. What should we specify here?
3. What does the firehose method in tweepy.Stream do?
Any help is much appreciated
If tweepy doesn't have a feature you need, you can always access Twitter directly with an HTTP request. The full Twitter REST API is described here: https://dev.twitter.com/docs/api
The ones that seem relevant to your interest are:
GET trends/:woeid, which looks up trending topics (not individual tweets) by WOEID, Yahoo's "Where On Earth" identifier for a given place/landmark/etc.
GET geo/id/:place_id, which returns information about a place; places are attached only to geotagged tweets.
There is documentation of all the information available for a GET request but the IP address is not among the available fields: https://dev.twitter.com/docs/api/1/get/search .
Lastly, Twitter has a location search FAQ that may be of interest.
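Relevant to question 1: the v1.1 standard search endpoint accepts a geocode parameter ("latitude,longitude,radius", with the radius in mi or km), and its documentation says the location is taken from the tweet's geotag when available but falls back to the user's profile location. That can surface tweets from users without geolocation enabled, at the cost of precision, and still only within the search index's roughly seven-day window. The hypothetical helper below only formats that parameter; with tweepy 3.x it could be passed as api.search(q=..., geocode=...).

```python
def geocode_param(lat, lon, radius_km):
    """Format the standard-search geocode parameter as "latitude,longitude,radius"."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude/longitude out of range")
    return f"{lat},{lon},{radius_km}km"
```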