Please forgive me if this is a gross repeat of a question previously answered elsewhere, but I am lost on how to use the tweepy API search function. Is there any documentation available on how to search for tweets using the api.search() function?
Is there any way I can control features such as number of tweets returned, results type etc.?
The results seem to max out at 100 for some reason.
the code snippet I use is as follows
searched_tweets = self.api.search(q=query,rpp=100,count=1000)
I originally worked out a solution based on Yuva Raj's suggestion to use additional parameters in GET search/tweets - the max_id parameter in conjunction with the id of the last tweet returned in each iteration of a loop that also checks for the occurrence of a TweepError.
However, I discovered there is a far simpler way to solve the problem using a tweepy.Cursor (see tweepy Cursor tutorial for more on using Cursor).
The following code fetches the most recent 1000 mentions of 'python'.
import tweepy
# assuming twitter_authentication.py contains each of the 4 oauth elements (1 per line)
from twitter_authentication import API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET
auth = tweepy.OAuthHandler(API_KEY, API_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
query = 'python'
max_tweets = 1000
searched_tweets = [status for status in tweepy.Cursor(api.search, q=query).items(max_tweets)]
Update: in response to Andre Petre's comment about potential memory consumption issues with tweepy.Cursor, I'll include my original solution, replacing the single statement list comprehension used above to compute searched_tweets with the following:
searched_tweets = []
last_id = -1
while len(searched_tweets) < max_tweets:
count = max_tweets - len(searched_tweets)
try:
new_tweets = api.search(q=query, count=count, max_id=str(last_id - 1))
if not new_tweets:
break
searched_tweets.extend(new_tweets)
last_id = new_tweets[-1].id
except tweepy.TweepError as e:
# depending on TweepError.code, one may want to retry or wait
# to keep things simple, we will give up on an error
break
There's a problem in your code. Based on Twitter Documentation for GET search/tweets,
The number of tweets to return per page, up to a maximum of 100. Defaults to 15. This was
formerly the "rpp" parameter in the old Search API.
Your code should be,
CONSUMER_KEY = '....'
CONSUMER_SECRET = '....'
ACCESS_KEY = '....'
ACCESS_SECRET = '....'
auth = tweepy.auth.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)
search_results = api.search(q="hello", count=100)
for i in search_results:
# Do Whatever You need to print here
The other questions are old and the API changed a lot.
Easy way, with Cursor (see the Cursor tutorial). Pages returns a list of elements (You can limit how many pages it returns. .pages(5) only returns 5 pages):
for page in tweepy.Cursor(api.search, q='python', count=100, tweet_mode='extended').pages():
# process status here
process_page(page)
Where q is the query, count how many will it bring for requests (100 is the maximum for requests) and tweet_mode='extended' is to have the full text. (without this the text is truncated to 140 characters) More info here. RTs are truncated as confirmed jaycech3n.
If you don't want to use tweepy.Cursor, you need to indicate max_id to bring the next chunk. See for more info.
last_id = None
result = True
while result:
result = api.search(q='python', count=100, tweet_mode='extended', max_id=last_id)
process_result(result)
# we subtract one to not have the same again.
last_id = result[-1]._json['id'] - 1
I am working on extracting twitter data for around a location (in here, around India), for all tweets which include a special keyword or a list of keywords.
import tweepy
import credentials ## all my twitter API credentials are in this file, this should be in the same directory as is this script
## set API connection
auth = tweepy.OAuthHandler(credentials.consumer_key,
credentials.consumer_secret)
auth.set_access_secret(credentials.access_token,
credentials.access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True) # set wait_on_rate_limit =True; as twitter may block you from querying if it finds you exceeding some limits
search_words = ["#covid19", "2020", "lockdown"]
date_since = "2020-05-21"
tweets = tweepy.Cursor(api.search, =search_words,
geocode="20.5937,78.9629,3000km",
lang="en", since=date_since).items(10)
## the geocode is for India; format for geocode="lattitude,longitude,radius"
## radius should be in miles or km
for tweet in tweets:
print("created_at: {}\nuser: {}\ntweet text: {}\ngeo_location: {}".
format(tweet.created_at, tweet.user.screen_name, tweet.text, tweet.user.location))
print("\n")
## tweet.user.location will give you the general location of the user and not the particular location for the tweet itself, as it turns out, most of the users do not share the exact location of the tweet
Results:
created_at: 2020-05-28 16:48:23
user: XXXXXXXXX
tweet text: RT #Eatala_Rajender: Media Bulletin on status of positive cases #COVID19 in Telangana. (Dated. 28.05.2020)
# TelanganaFightsCorona
# StayHom…
geo_location: Hyderabad, India
You can search the tweets with specific strings as showed below:
tweets = api.search('Artificial Intelligence', count=200)
Related
I am trying to scrape Twitter profiles for a project I am doing. I have the following code
from tweepy import OAuthHandler
import pandas as pd
"""I like to have my python script print a message at the beginning. This helps me confirm whether everything is set up correctly. And it's nice to get an uplifting message ;)."""
print("You got this!")
access_token = ''
access_token_secret = ''
consumer_key = ''
consumer_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
tweets = []
count = 1
"""Twitter will automatically sample the last 7 days of data. Depending on how many total tweets there are with the specific hashtag, keyword, handle, or key phrase that you are looking for, you can set the date back further by adding since= as one of the parameters. You can also manually add in the number of tweets you want to get back in the items() section."""
for tweet in tweepy.Cursor(api.search, q="#BNonnecke", count=450, since='2020-02-28').items(50000):
print(count)
count += 1
try:
data = [tweet.created_at, tweet.id, tweet.text, tweet.user._json['screen_name'], tweet.user._json['name'], tweet.user._json['created_at'], tweet.entities['urls']]
data = tuple(data)
tweets.append(data)
except tweepy.TweepError as e:
print(e.reason)
continue
except StopIteration:
break
df = pd.DataFrame(tweets, columns = ['created_at','tweet_id', 'tweet_text', 'screen_name', 'name', 'account_creation_date', 'urls'])
"""Add the path to the folder you want to save the CSV file in as well as what you want the CSV file to be named inside the single quotations"""
df.to_csv(path_or_buf = '/Users/Name/Desktop/FolderName/FileName.csv', index=False)
however, I keep getting the error "API" object has no attribute "search" from the line "for tweet in tweepy.Cursor(api.search, q="#BNonnecke", count=450, since='2020-02-28').items(50000):" I am not really sure why and don't know how to resolve this issue.
Thanks so much!
The latest version of Tweepy (v4 upwards) now has a search_tweets method instead of a search method. Check the documentation.
API.search_tweets(q, *, geocode, lang, locale, result_type, count, until, since_id, max_id, include_entities)
Also, read the comment in your code :-) The Search API has a 7 day history limit, so searching for Tweets since 2020-02-28 will only return Tweets posted in the 7 days before the date you run your code.
I have a list of tweets Id more than 100 and I want to get all retweets Id for each tweet Id the code that I used is for one tweet Id how can I give the list of tweets Id and check if there is retweets for this tweet print the user ids
# import the module
import tweepy
# assign the values accordingly
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
# authorization of consumer key and consumer secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# set access to user's access key and access secret
auth.set_access_token(access_token, access_token_secret)
# calling the api
api = tweepy.API(auth)
# the ID of the tweet
ID = 1265889240300257280
# getting the retweeters
retweets_list = api.retweets(ID)
# printing the screen names of the retweeters
for retweet in retweets_list:
print(retweet.user.screen_name)
can anyone help me ?
For getting Retweets from a list of Tweets, you'll need to iterate over your list of Tweet IDs and call the api.retweets function for each one in turn.
If your Tweets themselves have more than 100 Retweets, you'll hit a limitation in the API.
Per the Tweepy documentation:
API.retweets(id[, count])
Returns up to 100 of the first retweets of the given tweet.
The Twitter API itself only supports retrieving up to 100 Retweets, see the API documentation (this is the same API that Tweepy is calling):
GET statuses/retweets/:id
Returns a collection of the 100 most recent retweets of the Tweet specified by the id parameter.
This works for me:
for retweet in retweets_list:
print (retweets_list.retweet_count)
so I came up with this script to get the all of a user tweet from one twitter user
import tweepy
from tweepy import OAuthHandler
import json
def load_api():
Consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
return tweepy.API(auth)
api = load_api()
user = 'allkpop'
tweets = api.user_timeline(id=user, count=2000)
print('Found %d tweets' % len(tweets))
tweets_text = [t.text for t in tweets]
filename = 'tweets-'+user+'.json'
json.dump(tweets_text, open(filename, 'w'))
print('Saved to file:', filename)
but when I run it I can only get 200 tweets per request. Is there a way to get 2000 tweets or at least more than 2000 tweets?
please help me, thank you
The Twitter API has request limits. The one you're using corresponds to the Twitter statuses/user_timeline endpoint. The max number that you can get for this endpoint is documented as 3,200. Also note that there's a max number of requests in a 15-minute window, which might explain why you're only getting 2000, instead of the max. Here are a couple observations that might be interesting for you:
Documentation says that the max count is 200.
There's an include_rts (include retweets) parameter that might return more values. While it's part of the Twitter API, I can't see where Tweepy documents that.
You might try Tweepy Cursors to see if that will bring back more items.
Because of the 15 minute limits, you might be able to pause until the next 15 minute window to continue. That said, I don't know enough about your use case to know if this is practical or not.
I am using the tweepy library in python to search for tweets that contain a certain word. Retrieving all tweets results in a long list, which also includes a lot of retweets. I want to exclude these retweets. The following code works, but now each tweet is processed (also the retweets), which is not ideal considering the rate limit:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
query = 'test'
max_tweets = 100
for tweet in tweepy.Cursor(api.search, q=query).items(max_tweets):
jsontweet = json.dumps(tweet._json)
jsontweet = json.loads(jsontweet)
if not 'retweeted_status' in jsontweet:
print(tweet)
Is there a way in which I can specify within my search request to not include retweets? I found that I could include include_rts = False in my code in this post, but I do not know where, and whether it is also working for the API.search function. I was unable to find how to include this parameter in this function in the tweepy documentation.
I am using tweepy to get tweets about dengue in Brazil.
I am interesting in getting recent tweets about the 10 people with the most followers. I use the search api, not streaming api, because I don't need all the tweets, just the most relevant.
I am surprised to get so few tweets (only 17). Should I use the streaming api instead?
Here is my code:
#api access
consumer_key=""
consumer_secret=""
access_token_key=""
access_token_secret=""
import csv
#write results in file
writer= csv.writer(open(r"twitter.csv", "wt"), lineterminator='\n', delimiter =';')
writer.writerow(["date", "langage", "place", "country", "username", "nb_followers", "tweet_text"])
import tweepy
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token_key, access_token_secret)
api = tweepy.API(auth)
#get tweets for Brazil only
places = api.geo_search(query="Brazil", granularity="country")
place_id = places[0].id
print(place_id)
for tweet in tweepy.Cursor(api.search, q="dengue+OR+%23dengue&place:" + place_id, since="2015-08-01", until="2015-08-25").items():
date=tweet.created_at
langage=tweet.lang
try:
place=tweet.place.full_name
country=tweet.place.country
except:
place=None
country=None
username=tweet.user.screen_name
nb_followers=tweet.user.followers_count
tweet_text=tweet.text.encode('utf-8')
print("created on", tweet.created_at)
print("langage", tweet.lang)
print("place:", place)
print("country:", country)
print("user:", tweet.user.screen_name)
print("nb_followers:", tweet.user.followers_count)
print(tweet.text.encode("utf-8"))
print('')
writer.writerow([date, langage, place, country, username, nb_followers, tweet_text])
Try doing the search manually and seeing what you get. It sounds like your application is appropriate for the search API.
I think I know what the problem is: The place attribute is rarely present in the data. Thus very few tweets are returned.
I am now using the lang attribute with pt value (their is no pt-br langage unfortunately). It's not exactly what I want since it returns tweets from other countries such as Portugal but it's so far the best I could find.
for tweet in tweepy.Cursor(api.search, q="dengue+OR+%23dengue", lang="pt", since=date, until=end_date).items():