I have been trying to figure this out, but it is really frustrating. I'm trying to get tweets with a certain hashtag (a large number of tweets) using Tweepy, but the search doesn't go back more than one week. I need to go back at least two years, for a period of a couple of months. Is this even possible? If so, how?
For reference, here is my code:
import tweepy
import csv
consumer_key = '####'
consumer_secret = '####'
access_token = '####'
access_token_secret = '####'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# Open/Create a file to append data
csvFile = open('tweets.csv', 'a')
# Use csv writer
csvWriter = csv.writer(csvFile)
for tweet in tweepy.Cursor(api.search, q="#ps4", count=100,
                           lang="en",
                           since_id=2014-06-12).items():
    print tweet.created_at, tweet.text
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])
As you have noticed, the Twitter API has some limitations, so I have implemented code that does this using the same strategy Twitter uses when running in a browser. Take a look; you can get the oldest tweets: https://github.com/Jefferson-Henrique/GetOldTweets-python
You cannot use the Twitter search API to collect tweets from two years ago. Per the docs:
Also note that the search results at twitter.com may return historical results while the Search API usually only serves tweets from the past week. - Twitter documentation.
If you need a way to get old tweets, you can get them from individual users, because collecting tweets from a user is limited by number rather than by time (so in many cases you can go back months or years). A third-party service that collects tweets, like Topsy, may also be useful in your case (Topsy shut down in July 2016, but other services exist).
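To make the per-user route concrete: a user timeline comes back newest-first, so once you see a tweet older than your window you can stop paging. Below is a minimal sketch of that filter. It assumes each tweet object has a `created_at` datetime (as Tweepy's Status objects do) and that `start`/`end` have the same timezone-awareness as `created_at` (as far as I know, recent Tweepy versions return timezone-aware UTC datetimes and older ones return naive ones). The `api.user_timeline` call in the comment is how I'd expect to feed it, not something from this thread, and the timeline itself is capped at roughly the 3,200 most recent tweets per user.

```python
from datetime import datetime
from types import SimpleNamespace
from typing import Any, Iterable, List


def tweets_in_window(timeline: Iterable[Any], start: datetime, end: datetime) -> List[Any]:
    """Keep tweets with start <= created_at < end from a newest-first timeline.

    Iteration stops at the first tweet older than `start`, so a lazy
    iterator (such as a Tweepy Cursor) requests no further pages.
    """
    kept = []
    for tweet in timeline:
        if tweet.created_at < start:
            break  # timeline is newest-first: everything after this is older
        if tweet.created_at < end:
            kept.append(tweet)
    return kept


# Hypothetical usage with Tweepy (screen_name is a placeholder):
#   timeline = tweepy.Cursor(api.user_timeline, screen_name="some_user", count=200).items()
#   rows = tweets_in_window(timeline, datetime(2014, 6, 1), datetime(2014, 8, 1))

if __name__ == "__main__":
    # Fake newest-first timeline: ids 4..1 created in Sep, Jul, Jun, Mar 2014
    fake = [SimpleNamespace(id=i, created_at=datetime(2014, m, 1))
            for i, m in zip((4, 3, 2, 1), (9, 7, 6, 3))]
    print([t.id for t in tweets_in_window(fake, datetime(2014, 6, 1), datetime(2014, 8, 1))])  # prints [3, 2]
```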
I found some code that helps retrieve older tweets:
https://github.com/Jefferson-Henrique/GetOldTweets-python
To get old tweets, run the following command in the directory where the repository was extracted:
python Exporter.py --querysearch 'keyword' --since 2016-01-10 --until 2016-01-15 --maxtweets 1000
It returns a file, 'output_got.csv', with 1000 tweets containing your keyword from the dates above.
You need to install the 'pyquery' module for this to work.
PS: You can modify the 'Exporter.py' file to get more tweet attributes as required.
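If you want to drive that command from your own script (for example, one run per month), a small sketch is below. It only rebuilds the flags shown in the command above as an argument list, which avoids shell-quoting problems with multi-word queries. The `interpreter` and `repo_dir` parameters are my own assumptions: use whichever Python version the repository supports, and point `repo_dir` at the extracted checkout, where the tool writes output_got.csv.

```python
import subprocess
from typing import List


def exporter_args(query: str, since: str, until: str, max_tweets: int,
                  interpreter: str = "python") -> List[str]:
    """Rebuild the Exporter.py command line above as an argument list."""
    return [
        interpreter, "Exporter.py",
        "--querysearch", query,
        "--since", since,    # YYYY-MM-DD
        "--until", until,    # YYYY-MM-DD
        "--maxtweets", str(max_tweets),
    ]


def run_exporter(query: str, since: str, until: str, max_tweets: int,
                 repo_dir: str = ".") -> None:
    # Runs inside the extracted repository; raises CalledProcessError on failure.
    subprocess.run(exporter_args(query, since, until, max_tweets), cwd=repo_dir, check=True)
```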
2018 update:
Twitter has Premium search APIs that can return results from the beginning of time (2006):
https://developer.twitter.com/en/docs/tweets/search/overview/premium#ProductPackages
Search Tweets: 30-day endpoint → provides Tweets from the previous 30 days.
Search Tweets: Full-archive endpoint → provides complete and instant access to Tweets dating all the way back to the first Tweet in March 2006.
With an example Python client:
https://github.com/twitterdev/search-tweets-python
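The client above handles this for you, but to show what a premium request looks like: as I understand the premium docs, you POST a JSON body with `query`, `fromDate`/`toDate` in UTC `yyyymmddhhmm` format, `maxResults`, and, for later pages, the `next` token from the previous response, to your dev environment's endpoint (something like `https://api.twitter.com/1.1/tweets/search/fullarchive/<env>.json`). The sketch below only builds that body; the field names come from my reading of the docs, so check them against the current reference before relying on them.

```python
import json
from datetime import datetime
from typing import Optional


def premium_search_body(query: str, start: datetime, end: datetime,
                        max_results: int = 100,
                        next_token: Optional[str] = None) -> str:
    """JSON body for one premium (30-day or full-archive) search request."""
    body = {
        "query": query,                              # e.g. "#ps4 lang:en"
        "fromDate": start.strftime("%Y%m%d%H%M"),    # UTC, yyyymmddhhmm
        "toDate": end.strftime("%Y%m%d%H%M"),
        "maxResults": max_results,
    }
    if next_token:
        body["next"] = next_token  # pagination token from the previous response
    return json.dumps(body)
```

To walk a whole date range, keep re-sending the body with the latest `next` value until a response comes back without one.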
I know this is a very old question, but some folks might still be facing the same issue.
After some digging, I found out that Tweepy's search only returns data for the past 7 days, which sometimes pushes people into buying a third-party service.
I used the Python library GetOldTweets3, and it worked fine for me. The library is really easy to use. Its only limitation is that you can't search for more than one hashtag in one execution, but it works fine for searching multiple accounts at the same time.
Use the args "since" and "until" to adjust your timeframe. You are presently using since_id, which is meant to correspond to Twitter ID values (not dates):
for tweet in tweepy.Cursor(api.search,
                           q="test",
                           since="2014-01-01",
                           until="2014-02-01",
                           lang="en").items():
    print(tweet.created_at, tweet.text)
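If you do want to keep using since_id, you can derive an ID from a date: tweet IDs created since late 2010 are "snowflake" IDs whose upper bits hold a millisecond timestamp counted from Twitter's epoch (1288834974657 ms). A minimal sketch of the conversion is below; pass timezone-aware UTC datetimes, since a naive one would be interpreted as local time. Note that this does not get around the 7-day search index: a since_id from 2014 still won't bring back 2014 tweets.

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # snowflake epoch, 2010-11-04 UTC


def snowflake_to_datetime(tweet_id: int) -> datetime:
    """Creation time encoded in a snowflake tweet ID (millisecond precision)."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def datetime_to_snowflake(dt: datetime) -> int:
    """Smallest tweet ID that could have been created at `dt`.

    The lower 22 bits (worker and sequence numbers) are zero, so any tweet
    created at or after `dt` has an ID >= this value.
    """
    ms = int(dt.timestamp() * 1000)
    return (ms - TWITTER_EPOCH_MS) << 22


# Hypothetical usage: since_id=datetime_to_snowflake(datetime(2014, 6, 12, tzinfo=timezone.utc))
```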
As others have noted, the Twitter API has the date limitation, but the advanced search on twitter.com does not. So the solution is to use Python's wrapper for Selenium or PhantomJS to iterate through the twitter.com search endpoint. Here's an implementation using Selenium that someone has posted on GitHub: https://github.com/bpb27/twitter_scraping/
I can't believe nobody has mentioned this, but this Git repository completely solved my problem. I hadn't been able to use the other solutions, such as GOT or the Twitter Premium API.
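The browser approach works by loading one twitter.com search page per small date window, using the `since:` and `until:` search operators in the query. A minimal sketch of generating those URLs, one per day, is below; the `https://twitter.com/search?q=...` URL shape and the daily window size are my assumptions, and the Selenium part (`driver.get(url)`, scrolling, parsing) is left out.

```python
from datetime import date, timedelta
from typing import Iterator
from urllib.parse import urlencode


def daily_search_urls(query: str, start: date, end: date) -> Iterator[str]:
    """Yield one twitter.com search URL per day in [start, end)."""
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        # since: is inclusive and until: is exclusive, so each URL covers one day
        q = f"{query} since:{day.isoformat()} until:{nxt.isoformat()}"
        yield "https://twitter.com/search?" + urlencode({"q": q})
        day = nxt
```

Splitting by day keeps each page's result set small enough that scrolling to the end is realistic.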
Try this, definitely useful:
https://betterprogramming.pub/how-to-scrape-tweets-with-snscrape-90124ed006af
https://github.com/MartinBeckUT/TwitterScraper/tree/master/snscrape/cli-with-python
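Both links use snscrape. A minimal sketch of the Python usage I'd expect from them is below: build a query with the `since:`/`until:` operators and take the first N items from `TwitterSearchScraper(...).get_items()`. The import is kept inside the function so the query helper works without snscrape installed; the attribute holding the tweet text has changed between snscrape versions (`content` vs. `rawContent`), so the sketch returns the tweet objects as-is.

```python
from itertools import islice
from typing import Any, List


def snscrape_query(hashtag: str, since: str, until: str, lang: str = "en") -> str:
    """Search string using Twitter's since:/until: operators (dates as YYYY-MM-DD)."""
    return f"{hashtag} lang:{lang} since:{since} until:{until}"


def scrape(query: str, limit: int) -> List[Any]:
    # Imported lazily: requires snscrape to be installed and a network connection.
    import snscrape.modules.twitter as sntwitter
    items = sntwitter.TwitterSearchScraper(query).get_items()
    return list(islice(items, limit))  # stop after `limit` tweets


# Hypothetical usage: tweets = scrape(snscrape_query("#ps4", "2014-06-01", "2014-08-01"), 1000)
```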
Related
I am currently trying to find tweets that mention a Twitter user, for example #Google. I am trying to get a list of tweets that mention #Google. I have searched for my query on Twitter and received results; however, when I use Tweepy it only brings back one row. I am using Google here as an example; it is not the actual account I am searching for mentions of.
I have tried other searches, and they do bring back results with Tweepy. This is odd, as there are definitely plenty of tweets mentioning #Google.
for tweet in tweepy.Cursor(api.search,
                           q="#Google",
                           count=100,
                           result_type="recent",
                           include_entities=True,
                           lang="en").items():
    print("result found")
I am expecting more than one result. Searching Twitter for #Google returns loads of mentions, so I expect at least a decent subset of them.
# Initiate the connection to Twitter
twitter = Twitter(auth=oauth)
# Search for latest tweets about "pakistan"
results = twitter.search.tweets(q='pakistan',until=2008 - 08 - 19, )
print results
I am trying to retrieve tweets that are earlier than this date by one week. It does not return anything. However, I have searched manually on twitter and found that tweets exist.
When you use the Twitter API to download tweets you will have access to tweets back to roughly one week old. This is despite the fact that you can see tweets older than one week on Twitter's website. This is a built-in limitation of the API.
To get access to a longer time span, you can:
Download each day's data and accumulate it gradually.
Search the web for an existing dataset.
The best way is to ask for the data for a specific time span while you have a developer API account. You can request a quote using this form:
https://www.trackmyhashtag.com/twitter-dataset#request-data-form
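For the first option (collecting every day and accumulating), consecutive 7-day searches overlap, so you need to drop tweets you already stored. A minimal sketch is below, keyed on tweet ID. The `(tweet_id, created_at, text)` row layout and the CSV file are hypothetical choices for illustration; `append_daily_batch` would be run once a day (for example, from cron) with that day's search results.

```python
import csv
import os
from typing import Iterable, List, Set, Tuple

# Hypothetical row layout: (tweet_id, created_at, text)
Row = Tuple[int, str, str]


def new_rows(seen: Set[int], rows: Iterable[Row]) -> List[Row]:
    """Return rows whose tweet ID is not in `seen`, adding their IDs to `seen`."""
    fresh = []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            fresh.append(row)
    return fresh


def append_daily_batch(path: str, rows: Iterable[Row]) -> int:
    """Append a day's rows to the CSV, skipping IDs already stored; return the count added."""
    seen: Set[int] = set()
    if os.path.exists(path):
        with open(path, newline="", encoding="utf-8") as f:
            seen = {int(r[0]) for r in csv.reader(f) if r}
    fresh = new_rows(seen, rows)
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(fresh)
    return len(fresh)
```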
First of all let me note that this is the first time I am using the Twitter API, so I could be missing something obvious.
What I want to do is get all tweets that include a given hashtag. My research led me to the Twitter search API. I've tried using it, but I only seem to get about 6 tweets, even though I know the hashtag has thousands of tweets.
So my question is: how do I actually get all of the tweets for a given hashtag? Or at least more than 6 tweets?
For reference, here is the Python code I am using to fetch tweets with the #hillarysoqualified hashtag (replace the keys, obviously):
from twitter import Twitter, OAuth
ACCESS_TOKEN = 'access_token'
ACCESS_SECRET = 'access_secret'
CONSUMER_KEY = 'consumer_key'
CONSUMER_SECRET = 'consumer_secret'
oauth = OAuth(ACCESS_TOKEN, ACCESS_SECRET, CONSUMER_KEY, CONSUMER_SECRET)
t = Twitter(auth=oauth)
query = t.search.tweets(q='%23hillarysoqualified')
for s in query['statuses']:
    print(s['created_at'], s['text'], '\n')
It appears I had not read the docs - Twitter search API only gives you tweets from the last week. Hope this helps if someone else tries to do what I did without knowing.
Yes, the standard search API only gives you access to the last 7 days, but you can request free access to a premium Search API sandbox, which gives access to up to 30 days. You can find more information here: https://developer.twitter.com/en/docs/tweets/search/overview
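Separately from the 7-day window, one search call only returns a single page (15 tweets by default, up to 100 with `count=100`), which is probably why only a handful come back. To get the rest of the window you page backwards with `max_id`. A minimal sketch of that loop is below; `fetch_page` is a hypothetical wrapper you'd write around one call, e.g. `t.search.tweets(q='#hillarysoqualified', count=100, **({'max_id': m} if m else {}))['statuses']`, omitting `max_id` on the first call.

```python
from typing import Callable, Dict, Iterator, List, Optional

Page = List[Dict]  # each status is a dict with at least an "id" key


def paginate(fetch_page: Callable[[Optional[int]], Page]) -> Iterator[Dict]:
    """Yield statuses page by page, walking backwards in time with max_id."""
    max_id = None  # first call: no upper bound
    while True:
        page = fetch_page(max_id)
        if not page:
            return  # no older results left in the search index
        yield from page
        # max_id is inclusive, so subtract 1 to avoid re-fetching the oldest tweet
        max_id = min(status["id"] for status in page) - 1
```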
I am writing a script in Python, that uses tweepy to search for tweets with a given keyword. Here is the snippet:
for tweet in tweepy.Cursor(api.search, q=keyword, lang="en").items(10):
    print tweet.id
I have everything authenticated properly and the code works most of the time. However, when I try to search for some keywords (examples below) it doesn't return anything.
The keywords that cause trouble are "digitalkidz" (a tech conference) and "newtrendbg" (a Bulgarian company). If you do a quick search on Twitter for either of those you will see that there are results. However, tweepy doesn't find anything. Again, it does work for pretty much any other keyword I use.
Do you have any ideas what might be the problem and how to fix it?
Thank you
I believe you're forgetting an important aspect of the Twitter API: it's not exhaustive.
From the API docs:
Please note that Twitter’s search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface.
Regardless of whether you're using the streaming or REST API, you're going to have issues with this if you're looking for specific tweets.
REST API
When looking for historical tweets, you unfortunately won't be able to obtain anything that is older than a week using api.search(). This is also shown in the docs.
Keep in mind that the search index has a 7-day limit. In other words, no tweets will be found for a date older than one week.
There are other ways of getting older tweets, this post details those options.
Streaming API
While it doesn't sound like you're using Twitter's streaming API, it should be noted that it only gives a small sample of Twitter's current tweet traffic (~1-2%).
Hopefully this is helpful. Let me know if you have any questions.
Is there a better way to get tweets from Twitter than crawling twitter.com and mutating URLs?
If there is, how can I get the latest tweets with a given hashtag?
Thank you!
Did you try the Twitter REST API? In particular, you can use the search tweets endpoint. There are some limitations, though, enforced by Twitter.
You can use one of many available Python libraries.
For example, some sample code for tweepy can be found here.