Exclude retweets using tweepy's API.search function - python

I am using the tweepy library in Python to search for tweets that contain a certain word. Retrieving all tweets results in a long list, which also includes a lot of retweets. I want to exclude these retweets. The following code works, but every tweet (including the retweets) is still processed, which is not ideal considering the rate limit:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
query = 'test'
max_tweets = 100
for tweet in tweepy.Cursor(api.search, q=query).items(max_tweets):
    jsontweet = json.dumps(tweet._json)
    jsontweet = json.loads(jsontweet)
    if 'retweeted_status' not in jsontweet:
        print(tweet)
Is there a way in which I can specify within my search request not to include retweets? I found in this post that I could include include_rts = False in my code, but I do not know where, and whether it also works for the API.search function. I was unable to find in the tweepy documentation how to pass this parameter to that function.
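A commonly used workaround (an assumption on my part, not something stated in the question above) is to filter on the server side by appending Twitter's standard -filter:retweets search operator to the query string, so retweets never reach the processing loop. A minimal sketch:

import tweepy

# consumer_key, consumer_secret, access_token, access_token_secret are assumed to be defined
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# '-filter:retweets' is a standard Twitter search operator that excludes
# retweets from the results before they are returned to the client.
query = 'test -filter:retweets'
max_tweets = 100

for tweet in tweepy.Cursor(api.search, q=query).items(max_tweets):
    print(tweet.text)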

Related

Twitter user_timeline not returning enough tweets

I'm currently using the GET statuses/user_timeline Twitter API in Python to retrieve tweets. The docs say "This method can only return up to 3,200 of a user's most recent Tweets", yet when I run it, it only returns 100-150.
How do I get it to return more tweets than it already does?
Thanks in advance.
You'll have to write code to work with timelines. Twitter's Working with timelines documentation has a discussion of how and why Twitter timelines work the way they do.
Essentially, set your count to the maximum amount (200) and use since_id and max_id to manage reading each page. Alternatively, you can use an existing library to make the task much easier. Here's an example, using the tweepy library.
consumer_key = "<your consumer_key>"
consumer_secret = "<your consumer_secret>"
access_token = "<your access_token>"
access_token_secret = "<your access_token_secret>"
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
for status in tweepy.Cursor(api.user_timeline, "JoeMayo").items():
    print('status_id: {}, text: {}'.format(status.id, status.text.encode('utf-8')))
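For reference, the manual paging described above (count=200 combined with max_id) might look roughly like the following sketch; the helper name fetch_full_timeline is hypothetical, not part of the answer above.

def fetch_full_timeline(api, screen_name):
    # Hypothetical helper: pages through a user's timeline manually,
    # mirroring the count/max_id approach described above.
    all_statuses = []
    max_id = None
    while True:
        kwargs = {'screen_name': screen_name, 'count': 200}
        if max_id is not None:
            kwargs['max_id'] = max_id
        page = api.user_timeline(**kwargs)
        if not page:
            break
        all_statuses.extend(page)
        # Ask for tweets strictly older than the last one received.
        max_id = page[-1].id - 1
    return all_statuses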

Why tweepy cannot retrieve media data of some tweets?

I'm using tweepy to develop the program that retrieves media urls and download them. While testing some tweets, I found something weird. So this is what I did:
import tweepy
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)
status = api.get_status(908827394856947712)
The original URL of this tweet is 'https://twitter.com/realDonaldTrump/status/908827394856947712', and this tweet DOES contain an image. While studying status._json, I figured out that the links of media files are contained in either status._json['entities'] or status._json['extended_entities'], but I couldn't find ['extended_entities'], and ['entities'] doesn't contain the image link.
What is annoying is that some tweets have this problem while most do not. So why does this happen, and how can I solve it?
If you take a look through the response, you will see "truncated": true,
Twitter recently changed how tweets are presented - see their documentation https://dev.twitter.com/overview/api/upcoming-changes-to-tweets
With your request, you need to set tweet_mode=extended
So: api.get_status('908827394856947712', tweet_mode='extended')
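As a hedged follow-up (the field names below come from the standard v1.1 Tweet JSON, not from the answer above), the media URLs can then be read from the extended_entities section of the full payload:

status = api.get_status(908827394856947712, tweet_mode='extended')

# With tweet_mode='extended' the untruncated payload is returned, so
# 'extended_entities' is present whenever the tweet actually has media.
media = status._json.get('extended_entities', {}).get('media', [])
for item in media:
    print(item['media_url_https'])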

Is there a way to search for Top tweets with tweepy instead of latest tweets?

It seems tweepy's search function only accesses the latest tweets. Is there a way to switch it so it searches the top tweets of a given query? Is there a workaround to retrieve the popular tweets if search cannot do this? I'm running OS X and Python 3.6.
Thanks in advance.
Along with your query topic, pass a result_type='popular' parameter to tweepy's search function. Although the tweepy documentation does not list this in its parameters, it is an available parameter in the Twitter Dev Docs.
popular_tweets = api.search(q='python', result_type='popular')
popular: return only the most popular results in the response.
This should work (Add your Keys)
import tweepy
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
tweets = api.search(q='keyword', result_type='popular')

Using tweepy to follow people tweeting a specific hashtag

This is one of my first Python projects and I'm using Tweepy to try to search for a specific hashtag and follow the people tweeting that hashtag. I don't understand why this doesn't work; I've also tried appending the followers to a list, but that didn't work either. I've read the tweepy docs and this is what I've come up with:
import tweepy
import time
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
for follower in tweepy.Cursor(api.search, q="#kenbone").items():
    api.create_friendship(screen_name=follower)
    print(follower)
The screen_name is part of the author attribute; this works for me:
api.create_friendship(screen_name = follower.author.screen_name)
What you are getting in your loop, the variable follower, is a user object with a whole lot of information about the user, not just a name as you seem to believe. To get the screen name of a user object, use follower.screen_name:
api.create_friendship(screen_name = follower.screen_name)
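Putting the pieces together, a minimal sketch (assuming the api.search and api.create_friendship calls shown above, with wait_on_rate_limit added as a precaution) could look like this, reaching the author via .author as in the first suggestion:

import tweepy

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)

# Each search result is a status; the person who tweeted it is its author.
for status in tweepy.Cursor(api.search, q="#kenbone").items(50):
    screen_name = status.author.screen_name
    api.create_friendship(screen_name=screen_name)
    print(screen_name)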

Managing Tweepy API Search

Please forgive me if this is a gross repeat of a question previously answered elsewhere, but I am lost on how to use the tweepy API search function. Is there any documentation available on how to search for tweets using the api.search() function?
Is there any way I can control features such as the number of tweets returned, result type, etc.?
The results seem to max out at 100 for some reason.
The code snippet I use is as follows:
searched_tweets = self.api.search(q=query,rpp=100,count=1000)
I originally worked out a solution based on Yuva Raj's suggestion to use additional parameters in GET search/tweets - the max_id parameter in conjunction with the id of the last tweet returned in each iteration of a loop that also checks for the occurrence of a TweepError.
However, I discovered there is a far simpler way to solve the problem using a tweepy.Cursor (see tweepy Cursor tutorial for more on using Cursor).
The following code fetches the most recent 1000 mentions of 'python'.
import tweepy
# assuming twitter_authentication.py contains each of the 4 oauth elements (1 per line)
from twitter_authentication import API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET
auth = tweepy.OAuthHandler(API_KEY, API_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
query = 'python'
max_tweets = 1000
searched_tweets = [status for status in tweepy.Cursor(api.search, q=query).items(max_tweets)]
Update: in response to Andre Petre's comment about potential memory consumption issues with tweepy.Cursor, I'll include my original solution, replacing the single statement list comprehension used above to compute searched_tweets with the following:
searched_tweets = []
last_id = -1
while len(searched_tweets) < max_tweets:
    count = max_tweets - len(searched_tweets)
    try:
        new_tweets = api.search(q=query, count=count, max_id=str(last_id - 1))
        if not new_tweets:
            break
        searched_tweets.extend(new_tweets)
        last_id = new_tweets[-1].id
    except tweepy.TweepError as e:
        # depending on TweepError.code, one may want to retry or wait
        # to keep things simple, we will give up on an error
        break
There's a problem in your code. Based on Twitter Documentation for GET search/tweets,
The number of tweets to return per page, up to a maximum of 100. Defaults to 15. This was
formerly the "rpp" parameter in the old Search API.
Your code should be:
CONSUMER_KEY = '....'
CONSUMER_SECRET = '....'
ACCESS_KEY = '....'
ACCESS_SECRET = '....'
auth = tweepy.auth.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)
search_results = api.search(q="hello", count=100)
for i in search_results:
    # do whatever you need with each result here
    print(i.text)
The other answers are old and the API has changed a lot.
The easy way is with Cursor (see the Cursor tutorial). pages() returns a list of elements per page (you can limit how many pages it returns: .pages(5) only returns 5 pages):
for page in tweepy.Cursor(api.search, q='python', count=100, tweet_mode='extended').pages():
    # process the statuses in this page here
    process_page(page)
Here q is the query, count is how many results to bring per request (100 is the maximum per request), and tweet_mode='extended' returns the full text (without it, the text is truncated to 140 characters). More info here. Retweets are still truncated, as confirmed by jaycech3n.
If you don't want to use tweepy.Cursor, you need to pass max_id to bring the next chunk; see the GET search/tweets documentation for more info.
last_id = None
result = True
while result:
    result = api.search(q='python', count=100, tweet_mode='extended', max_id=last_id)
    if not result:
        break
    process_result(result)
    # subtract one so the same tweet is not returned again
    last_id = result[-1]._json['id'] - 1
I am working on extracting Twitter data around a location (here, around India) for all tweets which include a specific keyword or a list of keywords.
import tweepy
import credentials  # all my Twitter API credentials are in this file; it should be in the same directory as this script

## set up the API connection
auth = tweepy.OAuthHandler(credentials.consumer_key,
                           credentials.consumer_secret)
auth.set_access_token(credentials.access_token,
                      credentials.access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)  # set wait_on_rate_limit=True; Twitter may block you from querying if it finds you exceeding some limits

search_words = ["#covid19", "2020", "lockdown"]
date_since = "2020-05-21"

tweets = tweepy.Cursor(api.search, q=search_words,
                       geocode="20.5937,78.9629,3000km",
                       lang="en", since=date_since).items(10)
## the geocode is for India; the format is geocode="latitude,longitude,radius"
## the radius should be in miles (mi) or kilometres (km)

for tweet in tweets:
    print("created_at: {}\nuser: {}\ntweet text: {}\ngeo_location: {}".
          format(tweet.created_at, tweet.user.screen_name, tweet.text, tweet.user.location))
    print("\n")
## tweet.user.location gives the general location of the user, not the location of the tweet itself; as it turns out, most users do not share the exact location of the tweet
Results:
created_at: 2020-05-28 16:48:23
user: XXXXXXXXX
tweet text: RT @Eatala_Rajender: Media Bulletin on status of positive cases #COVID19 in Telangana. (Dated. 28.05.2020)
#TelanganaFightsCorona
#StayHom…
geo_location: Hyderabad, India
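As a side note on the location caveat above: a tweet only carries its own geo data when the author explicitly geotagged it. A hedged sketch of how one might check for that, using the standard v1.1 coordinates and place fields (these attributes come from the Twitter Tweet object, not from the answer above):

for tweet in tweepy.Cursor(api.search, q=search_words,
                           geocode="20.5937,78.9629,3000km",
                           lang="en", since=date_since).items(10):
    # 'coordinates' and 'place' are only populated when the author
    # explicitly geotagged the tweet, which most users do not do.
    if tweet.coordinates is not None:
        print("exact coordinates:", tweet.coordinates["coordinates"])
    elif tweet.place is not None:
        print("tagged place:", tweet.place.full_name)
    else:
        print("falling back to user profile location:", tweet.user.location)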
You can search for tweets containing specific strings as shown below:
tweets = api.search('Artificial Intelligence', count=200)
