Fetch the latest tweet from Twitter with Tweepy - python

I want to fetch the latest tweet from a bunch of Twitter users in real time whenever it contains a keyword.
This code fetches the latest tweet every 5 seconds, forever, and appends it to the "store" variable if it contains the keyword 'Twitter'.
Is there a way to fetch the tweet only if it isn't already present in the store variable? If it's already there, the loop should keep running and look for the next tweet without storing the same one again.
import tweepy
import time
api = 'APIKEY'
apisq = 'APISQ'
acc_tok = 'TOK'
acc_sq = 'TOkSQ'
auth = tweepy.OAuthHandler(api, apisq)
auth.set_access_token(acc_tok, acc_sq)
api = tweepy.API(auth)
store = []
username = 'somename'
while True:
    first = []
    get_tweets = api.user_timeline(screen_name=username, count=1)
    test = get_tweets[0]
    first.append(test.text)
    time.sleep(5)
    if any('Twitter' in word for word in first):
        store.append(first)
        print(store)
    else:
        continue
I've tried some conditional statements but haven't been very successful yet.

I think the important piece of data would be the 'id' field returned in the list. You could either add the tweets to a dictionary where the key would be the 'id' and the value the text of the tweet, or create a second list that contains the 'id' and then create a filter condition to validate that the 'id' isn't present in the other list before adding the new tweet.
The dictionary method is likely the quickest computationally, but the second list method is likely the easiest conceptually.
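A minimal sketch of the dictionary approach, assuming each status object exposes `id` and `text` attributes as Tweepy's status objects do. The helper name `remember_if_new` and the stand-in tweets built with `SimpleNamespace` are made up for illustration; in the polling loop from the question you would pass `get_tweets[0]` instead.

```python
from types import SimpleNamespace


def remember_if_new(store, tweet, keyword="Twitter"):
    """Store tweet.text under tweet.id; return True only if it was added."""
    if keyword not in tweet.text:
        return False  # keyword not met, ignore this tweet
    if tweet.id in store:
        return False  # already stored, skip it
    store[tweet.id] = tweet.text
    return True


# Stand-in objects instead of real API results (hypothetical data).
store = {}
first = SimpleNamespace(id=1, text="Hello Twitter")
second = SimpleNamespace(id=2, text="No keyword here")

added_first = remember_if_new(store, first)
added_again = remember_if_new(store, first)  # same id seen a second time
added_second = remember_if_new(store, second)
print(store)
```

Because the lookup is keyed on the id, the same tweet is never stored twice, even if the loop sees it on every poll.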

Related

How to iterate through a list of Twitter users using Snscrape?

I am trying to retrieve tweets for a list of users. However, in the snscrape call the username argument sits inside a quoted string, so the username is taken as a fixed input.
import snscrape.modules.twitter as sntwitter
tweets_list1 = []
users_name = [{'username':'#bbcmundo'},{'username':'#nytimes'}]
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:{}').get_items().format(username)):
    if i>100:
        break
    tweets_list1.append([tweet.date, tweet.id, tweet.content, tweet.url,\
        tweet.user.username, tweet.user.followersCount,tweet.replyCount,\
        tweet.retweetCount, tweet.likeCount, tweet.quoteCount, tweet.lang,\
        tweet.outlinks, tweet.media, tweet.retweetedTweet, tweet.quotedTweet,\
        tweet.inReplyToTweetId, tweet.inReplyToUser, tweet.mentionedUsers,\
        tweet.coordinates, tweet.place, tweet.hashtags, tweet.cashtags])
As output, Python gives:
AttributeError: 'generator' object has no attribute 'format'
The code works fine if I replace the curly braces with the username and delete the .format call. If you want to replicate this code, be sure to install the snscrape library using:
pip install git+https://github.com/JustAnotherArchivist/snscrape.git
I found some mistakes I made when writing this code, so I want to share them in case you are stuck on this same problem or a similar one:
First: I changed users_name from a list of dicts to a plain list of strings.
Second: I put the format call in the right place, right after the query string.
Third: I added a nested loop to scrape each Twitter user account.
users_name = ['bbcmundo','nytimes']
for user in users_name:
    for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:{}'.format(user)).get_items()):
        if i>100:
            break
        tweets_list1.append([tweet.date, tweet.id, tweet.content, tweet.url,\
            tweet.user.username, tweet.user.followersCount,tweet.replyCount,\
            tweet.retweetCount, tweet.likeCount, tweet.quoteCount, tweet.lang,\
            tweet.outlinks, tweet.media, tweet.retweetedTweet, tweet.quotedTweet,\
            tweet.inReplyToTweetId, tweet.inReplyToUser, tweet.mentionedUsers,\
            tweet.coordinates, tweet.place, tweet.hashtags, tweet.cashtags])
You can avoid making several requests by combining more than one from: criterion in a single query:
users = ['bbcmundo','nytimes']
filters = ['since:2022-07-06', 'until:2022-07-07']
from_filters = []
for user in users:
    from_filters.append(f'from:{user}')
filters.append(' OR '.join(from_filters))
tweets = list(sntwitter.TwitterSearchScraper(' '.join(filters)).get_items())
# The argument is 'since:2022-07-06 until:2022-07-07 from:bbcmundo OR from:nytimes'
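The query construction above can be wrapped in a small helper so it is easy to check before any scraping happens. This is only a sketch: `build_query` is a hypothetical name, not part of snscrape, and the resulting string would be passed to `sntwitter.TwitterSearchScraper(...)` exactly as in the snippet above.

```python
def build_query(users, since=None, until=None):
    """Join optional date filters and 'from:' terms into one search string."""
    filters = []
    if since:
        filters.append(f"since:{since}")
    if until:
        filters.append(f"until:{until}")
    if users:
        # One 'from:' term per user, combined with OR
        filters.append(" OR ".join(f"from:{user}" for user in users))
    return " ".join(filters)


query = build_query(["bbcmundo", "nytimes"], since="2022-07-06", until="2022-07-07")
print(query)
```

Keeping the string building separate from the scraping call makes it possible to inspect the query without sending a request.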

How to filter keyword search in tweepy

I need to create a program with tweepy for a homework assignment. I'm not a programmer.
I would like the program to search for menacing tweets directed at, for example, Justin Trudeau, and then send me an email when it spots one.
To determine whether a tweet is menacing, it would have to contain, for example, the keyword "trudeau" and one of the following: "bomb" or "kill". Once I get this to work, I'll refine the keyword filter.
So I have tried this:
api = tweepy.API(auth)
searchterm1 = "trudeau"
searchterm2 = "bomb" or "kill"
search = tweepy.Cursor(api.search,
                       q=searchterm1 and searchterm2,
                       lang="en",
                       result_type="recent").items(10)
for item in search:
    print(item.text)
But it only shows me tweets with the last keyword, not tweets with either one as I expected from the or operator.
I want to show only tweets that contain the word "trudeau" and one of the keywords in searchterm2.
Thanks for your help.
You're going to need the Twitter search operators documentation, where you'll find that your query string should be:
q = 'trudeau bomb OR kill'
From your example you can get to that query string like this:
searchterm1 = 'trudeau'
searchterm2 = 'bomb OR kill'
q = ' '.join([searchterm1, searchterm2])
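It may help to see why the original attempt sends only one word to the API. Python evaluates `or` and `and` between strings before Tweepy ever sees them, and each expression collapses to a single operand. The short sketch below shows that evaluation and then builds the operator into the query string instead; the `keywords` list is an illustrative name, not something from the question.

```python
# `or` returns its first truthy operand, so "kill" is discarded.
searchterm2 = "bomb" or "kill"

# `and` returns its last operand when all operands are truthy.
combined = "trudeau" and searchterm2

# Put the OR operator inside the query string instead.
keywords = ["bomb", "kill"]
q = " ".join(["trudeau", " OR ".join(keywords)])

print(searchterm2)  # bomb
print(combined)     # bomb
print(q)            # trudeau bomb OR kill
```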

How to get id of the tweet posted in tweepy

I want to check if a certain tweet is a reply to the tweet that I sent. Here is how I think I can do it:
Step1: Post a tweet and store id of posted tweet
Step2: Listen to my handle and collect all the tweets that have my handle in it
Step3: Use tweet.in_reply_to_status_id to see if tweet is reply to the stored id
In this logic, I am not sure how to get the status id of the tweet that I am posting in step 1. Is there a way I can get it? If not, is there another way in which I can solve this problem?
One option is to get the last n tweets from a user and then read the tweet.id of the relevant one. This can be done with:
latestTweets = api.user_timeline(screen_name = 'user', count = n, include_rts = False)
I, however, doubt that it is the most efficient way.
When you call the update_status method of your tweepy.API object, it returns a Status object. This object contains all of the information about your tweet.
Example:
my_tweet_ids = list()
api = tweepy.API(auth)
# Post to twitter
status = api.update_status(status='Testing out my Twitter')
# Append this tweet's id to my list of tweets
my_tweet_ids.append(status.id)
Then you can iterate over the list (for tweet_id in my_tweet_ids) and check the replies for each one.
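Step 3 from the question could then look roughly like the sketch below. It assumes the collected tweets expose `in_reply_to_status_id`, as mentioned in the question; the helper `replies_to_mine` and the stand-in mention objects are hypothetical, and in practice the mentions might come from a call such as `api.mentions_timeline()`.

```python
from types import SimpleNamespace


def replies_to_mine(mentions, my_tweet_ids):
    """Return the mentions whose in_reply_to_status_id matches a stored id."""
    mine = set(my_tweet_ids)
    return [m for m in mentions if m.in_reply_to_status_id in mine]


# Stand-in data instead of real API results (hypothetical ids).
my_tweet_ids = [111]
mentions = [
    SimpleNamespace(id=201, in_reply_to_status_id=111),   # reply to my tweet
    SimpleNamespace(id=202, in_reply_to_status_id=None),  # plain mention
    SimpleNamespace(id=203, in_reply_to_status_id=999),   # reply to someone else
]

reply_ids = [m.id for m in replies_to_mine(mentions, my_tweet_ids)]
print(reply_ids)
```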

Do tweets from the search API overlap?

I'm very new to the Twitter API. If I call the search API every minute to retrieve about 1000 tweets, will I get duplicate tweets when fewer than 1000 tweets were created for the given criteria, or when I call it more often than once a minute?
I hope my question is clear. In case it matters, I use the python-twitter library,
and the way I get tweets is:
self.api = twitter.Api(consumer_key, consumer_secret ,access_key, access_secret)
self.api.VerifyCredentials()
self.api.GetSearch(self.hashtag, per_page=100)
Your search results will overlap because the API has no idea what you searched before. One way to prevent the overlap is to use the tweet ID from the last retrieved tweet. Here is a Python 2.7 snippet from my code:
maxid = 10000000000000000000
for i in range(0,10):
    with open('output.json','a') as outfile:
        time.sleep(5) # don't piss off twitter
        print 'maxid=',maxid,', twitter loop',i
        results = api.GetSearch('search_term', count=100,max_id = maxid)
        for tweet in results:
            tweet = str(tweet).replace('\n',' ').replace('\r',' ') # remove new lines
            tweet = (json.loads(tweet))
            maxid = tweet['id'] # redefine maxid
            json.dump(tweet,outfile)
            outfile.write('\n') #print tweets on new lines
This code runs 10 loops of up to 100 tweets each, working back from the last id, which is redefined each time through the loop. It then writes a JSON file (with one tweet per line). I use this code to search into the recent past, but you can adapt it to fetch only newer, non-overlapping tweets by changing 'max_id' to 'since_id'.
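A Python 3 sketch of the same paging idea, written against a stand-in search function so it runs without credentials. As far as I understand the v1.1 search API, max_id is inclusive, so this version steps one below the oldest id seen to avoid fetching the boundary tweet twice. The names `page_backwards`, `fake_search` and `FAKE_TWEETS` are hypothetical; a real `search` callable would wrap something like `api.GetSearch`.

```python
def page_backwards(search, pages=10, count=100):
    """Collect tweets page by page, walking into the past without overlap."""
    collected = []
    max_id = None
    for _ in range(pages):
        results = search(count=count, max_id=max_id)
        if not results:
            break  # nothing older is left
        collected.extend(results)
        # Assumes max_id is inclusive: step one below the oldest id seen
        max_id = min(t["id"] for t in results) - 1
    return collected


# Hypothetical stand-in for the real search call, newest tweets first.
FAKE_TWEETS = [{"id": i} for i in range(25, 0, -1)]


def fake_search(count, max_id=None):
    matching = [t for t in FAKE_TWEETS if max_id is None or t["id"] <= max_id]
    return matching[:count]


tweets = page_backwards(fake_search, pages=10, count=10)
ids = [t["id"] for t in tweets]
print(len(ids), len(set(ids)))
```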

How to get tweet IDs (since_id, max_id) in tweepy (python)?

I want a way of getting tweet IDs so I can keep track of which tweets have already been displayed in the user's timeline in the Python app I am making with tweepy.
There doesn't seem to be a way to extract the tweet IDs or keep track of them. The parameter for keeping track is since_id. Any help would be appreciated.
The tweepy library follows the twitter API closely. All attributes returned by that API are available on the result objects; so for status messages you need to look at the tweet object description to see they have an id parameter:
for status in api.user_timeline():
    print(status.id)
Store the most recent id to poll for updates.
The max_id and since_id are parameters for the api.user_timeline() method.
Using the tweepy.Cursor() object might look something like this:
tweets = []
for tweet in tweepy.Cursor(api.user_timeline,
                           screen_name=<twitter_handle>,
                           since_id = <since_id>
                           ).items(<count>):
    tweets.append(tweet)
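To keep polling with since_id, the largest id seen so far has to be carried from one call to the next. A minimal sketch of that bookkeeping follows, assuming each tweet object has an `id` attribute as described above; `newest_id` is a hypothetical helper and the tweets are stand-ins built with `SimpleNamespace`.

```python
from types import SimpleNamespace


def newest_id(tweets, previous=None):
    """Return the largest id among tweets and previous, or None if there is none."""
    ids = [t.id for t in tweets]
    if previous is not None:
        ids.append(previous)
    return max(ids) if ids else None


# Stand-in batch instead of a real timeline response (hypothetical ids).
since_id = None
batch = [SimpleNamespace(id=105), SimpleNamespace(id=103)]
since_id = newest_id(batch, since_id)

# An empty poll must not reset the stored id.
after_empty = newest_id([], since_id)
print(since_id, after_empty)
```

The stored value would then be passed as the since_id argument on the next poll.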
