I want to check if a certain tweet is a reply to the tweet that I sent. Here is how I think I can do it:
Step1: Post a tweet and store id of posted tweet
Step2: Listen to my handle and collect all the tweets that have my handle in it
Step3: Use tweet.in_reply_to_status_id to see if tweet is reply to the stored id
In this logic, I am not sure how to get the status id of the tweet that I am posting in step 1. Is there a way I can get it? If not, is there another way in which I can solve this problem?
What one could do is get the last n tweets from a user and then pick out the tweet.id of the relevant tweet. This can be done with:
latestTweets = api.user_timeline(screen_name='user', count=n, include_rts=False)
I doubt, however, that this is the most efficient way.
When you call the update_status method of your tweepy.API object, it returns a Status object. This object contains all of the information about your tweet.
Example:
my_tweet_ids = list()
api = tweepy.API(auth)
# Post to twitter
status = api.update_status(status='Testing out my Twitter')
# Append this tweet's id to my list of tweets
my_tweet_ids.append(status.id)
Then you can just iterate over the list (for tweet_id in my_tweet_ids) and check the replies to each one.
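The reply-matching step itself is just an ID comparison and can be sketched without touching the network; the SimpleNamespace stand-ins and sample IDs below are illustrative, not part of tweepy:

```python
from types import SimpleNamespace

def is_reply_to(tweet, my_tweet_ids):
    """Return True if this tweet is a reply to any tweet ID we posted."""
    return tweet.in_reply_to_status_id in my_tweet_ids

# Stand-ins for tweepy Status objects collected from mentions
mentions = [
    SimpleNamespace(id=111, in_reply_to_status_id=42),    # reply to our tweet
    SimpleNamespace(id=112, in_reply_to_status_id=None),  # not a reply at all
    SimpleNamespace(id=113, in_reply_to_status_id=99),    # reply to someone else
]

my_tweet_ids = {42}  # IDs collected from api.update_status(...).id
replies = [t for t in mentions if is_reply_to(t, my_tweet_ids)]
```

With a real tweepy Status object the same attribute access applies; only the stand-in objects here are fabricated.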
Related
I want to fetch the latest tweet, in real time, from a bunch of users on Twitter whenever it matches a keyword.
This code fetches the latest tweet if the 'Twitter' keyword is matched, stores it in the store variable every 5 seconds, and runs forever.
Is there a way to make it fetch a tweet only if it isn't already present in the store variable? If it is already there, it should keep searching for the next tweet but not fetch it again.
import tweepy
import time
api = 'APIKEY'
apisq = 'APISQ'
acc_tok = 'TOK'
acc_sq = 'TOkSQ'
auth = tweepy.OAuthHandler(api, apisq)
auth.set_access_token(acc_tok, acc_sq)
api = tweepy.API(auth)
store = []
username = 'somename'
while True:
    first = []
    get_tweets = api.user_timeline(screen_name=username, count=1)
    test = get_tweets[0]
    first.append(test.text)
    time.sleep(5)
    if any('Twitter' in word for word in first):
        store.append(first)
        print(store)
    else:
        continue
I've tried some conditional statements but haven't been very successful yet.
I think the important piece of data would be the 'id' field returned in the list. You could either add the tweets to a dictionary where the key would be the 'id' and the value the text of the tweet, or create a second list that contains the 'id' and then create a filter condition to validate that the 'id' isn't present in the other list before adding the new tweet.
The dictionary method is likely the quickest computationally, but the second list method is likely the easiest conceptually.
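A minimal sketch of the second approach, keeping a set of seen IDs alongside the store (the tweet dicts below are made up for illustration; real results from user_timeline would supply the 'id' and 'text' fields):

```python
store = []        # tweet texts we have kept
seen_ids = set()  # IDs already stored

def maybe_store(tweet, keyword='Twitter'):
    """Store a tweet's text only if it matches the keyword and is new."""
    if keyword in tweet['text'] and tweet['id'] not in seen_ids:
        seen_ids.add(tweet['id'])
        store.append(tweet['text'])

# Simulated polling results: the same tweet can come back on repeated polls
for tweet in [{'id': 1, 'text': 'Hello Twitter'},
              {'id': 1, 'text': 'Hello Twitter'},   # duplicate poll result
              {'id': 2, 'text': 'No keyword here'},
              {'id': 3, 'text': 'Twitter again'}]:
    maybe_store(tweet)
```

The set membership test makes the duplicate check O(1) per tweet, which is the computational advantage mentioned above.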
I'm kind of a python newb and stuck with tweepy here.
What I'm trying to do is bring in a bunch of user and tweet objects into a neo4j database with tweet and retweet relationships. My problem is in determining if a given status object is a retweet and if so the screen_name and id_str of the original author.
I can see the data if I print out tweet.retweets, but I can't figure out how to get to it. tweepy's docs mention something about object models and, for more information, to check out ModelsReference, but Google isn't helping me much here.
Any help would be great, even just pointing me in the right direction. Thanks.
Sample code
tweets = api.user_timeline(user_id=1234556)
twitter_user = api.get_user(user_id=123456)
for tweet in tweets:
    neo4j_create_tweet_node(tweet)
    if tweet.user.id == twitter_user.id:
        create_tweet_relationship(twitter_user, tweet)
    else:
        create_retweet_relationship(twitter_user, tweet)
Suppose tweet is a retweet; then:
originalAuthorID = tweet.retweeted_status.user.id_str
According to the Twitter API documentation:
Retweets can be distinguished from typical Tweets by the existence of a retweeted_status attribute. This attribute contains a representation of the original Tweet that was retweeted. Note that retweets of retweets do not show representations of the intermediary retweet, but only the original tweet.
I would use hasattr() to look for the presence of the retweeted_status attribute on each tweepy tweet object.
The following code (where create_tweet_relationship() and create_retweet_relationship() are functions you have defined, as in your example) should work:
for tweet in tweets:
    if hasattr(tweet, 'retweeted_status'):
        create_retweet_relationship(tweet.retweeted_status.author, tweet)
    else:
        create_tweet_relationship(tweet.author, tweet)
If the tweet is a retweet of another, the original retweeted status is included in the JSON object in the field "retweeted_status". You'd get the user information there under the "user" field.
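Working from the raw JSON (as a dict) rather than a tweepy model, the same check is just a key lookup; the sample payloads below are fabricated to mirror the API's shape:

```python
def original_author(status):
    """Return (screen_name, id_str) of the original author for a retweet, else None."""
    rt = status.get('retweeted_status')
    if rt is None:
        return None
    return rt['user']['screen_name'], rt['user']['id_str']

# Fabricated payloads following the documented retweet structure
retweet = {'text': 'RT @alice: hi',
           'retweeted_status': {'user': {'screen_name': 'alice', 'id_str': '42'}}}
plain = {'text': 'just a tweet'}
```

Per the documentation quoted above, a retweet of a retweet still carries only the original tweet here, so no recursion is needed.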
Twitter only returns 100 tweets per "page" when returning search results on the API. They provide the max_id and since_id in the returned search_metadata that can be used as parameters to get earlier/later tweets.
Twython 3.1.2 documentation suggests that this pattern is the "old way" to search:
results = twitter.search(q="xbox", count=423, max_id=421482533256044543)
for tweet in results['statuses']:
    ... do something
and that this is the "new way":
results = twitter.cursor(twitter.search, q='xbox', count=375)
for tweet in results:
    ... do something
When I do the latter, it appears to endlessly iterate over the same search results. I'm trying to push them to a CSV file, but it pushes a ton of duplicates.
What is the proper way to search for a large number of tweets, with Twython, and iterate through the set of unique results?
Edit: Another issue here is that when I try to iterate with the generator (for tweet in results:), it loops repeatedly, without stopping. Ah -- this is a bug... https://github.com/ryanmcgrath/twython/issues/300
I had the same problem, but it seems that you should just loop through a user's timeline in batches using the max_id parameter. The batches should be 100 per Terence's answer (though for user_timeline, 200 is actually the max count); just set max_id to the last ID in the previous set of returned tweets, minus one (because max_id is inclusive). Here's the code:
'''
Get all tweets from a given user.
Batch size of 200 is the max for user_timeline.
'''
from twython import Twython, TwythonError
tweets = []
# Requires Authentication as of Twitter API v1.1
twitter = Twython(PUT YOUR TWITTER KEYS HERE!)
try:
    user_timeline = twitter.get_user_timeline(screen_name='eugenebann', count=200)
except TwythonError as e:
    print e
print len(user_timeline)
for tweet in user_timeline:
    # Add whatever you want from the tweet, here we just add the text
    tweets.append(tweet['text'])
# Count could be less than 200, see:
# https://dev.twitter.com/discussions/7513
while len(user_timeline) != 0:
    try:
        user_timeline = twitter.get_user_timeline(screen_name='eugenebann', count=200, max_id=user_timeline[-1]['id'] - 1)
    except TwythonError as e:
        print e
    print len(user_timeline)
    for tweet in user_timeline:
        # Add whatever you want from the tweet, here we just add the text
        tweets.append(tweet['text'])
# Number of tweets the user has made
print len(tweets)
As per the official Twitter API documentation:
count (optional): The number of tweets to return per page, up to a maximum of 100.
You need to make repeated calls to the Python method. However, there is no guarantee that these will be the next N; if tweets are coming in quickly, it might miss some.
If you want all the tweets in a time frame you can use the streaming api: https://dev.twitter.com/docs/streaming-apis and combine this with the oauth2 module.
How can I consume tweets from Twitter's streaming api and store them in mongodb
python-twitter streaming api support/example
Disclaimer: I have not actually tried this.
As a solution to the problem of returning 100 tweets for a search query using Twython, here is the link showing how it can be done using the "old way":
Twython search API with next_results
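The "old way" pages by pulling max_id out of the next_results query string that Twitter returns in search_metadata; extracting it is plain query-string parsing. The sample metadata below is made up to mirror the API's shape:

```python
from urllib.parse import parse_qs

def next_max_id(search_metadata):
    """Pull max_id from the next_results query string, or None if no more pages."""
    next_results = search_metadata.get('next_results')
    if not next_results:
        return None
    params = parse_qs(next_results.lstrip('?'))
    return int(params['max_id'][0])

# Shape mirrors the 'search_metadata' field of a twitter.search(...) response
metadata = {'next_results': '?max_id=421482533256044542&q=xbox&count=100'}
```

Each subsequent search call would then pass this value as max_id until next_results disappears from the metadata, which signals the last page.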
I'm very new to the Twitter API. If I use the search API and call it every minute to retrieve about 1000 tweets, will I get duplicate tweets if fewer than 1000 tweets were created for a given criteria, or if I call it more often than once a minute?
I hope my question is clear. In case it matters, I use the python-twitter library, and the way I get tweets is:
self.api = twitter.Api(consumer_key, consumer_secret ,access_key, access_secret)
self.api.VerifyCredentials()
self.api.GetSearch(self.hashtag, per_page=100)
Your search results will overlap because the API has no idea what you searched for before. One way to prevent the overlap is to use the tweet ID from the last retrieved tweet. Here is a Python 2.7 snippet from my code:
maxid = 10000000000000000000
for i in range(0, 10):
    with open('output.json', 'a') as outfile:
        time.sleep(5)  # don't piss off twitter
        print 'maxid=', maxid, ', twitter loop', i
        results = api.GetSearch('search_term', count=100, max_id=maxid)
        for tweet in results:
            tweet = str(tweet).replace('\n', ' ').replace('\r', ' ')  # remove new lines
            tweet = json.loads(tweet)
            maxid = tweet['id'] - 1  # redefine maxid (minus one, since max_id is inclusive)
            json.dump(tweet, outfile)
            outfile.write('\n')  # print tweets on new lines
This code gives you 10 loops of 100 tweets below the last ID, which is redefined on each pass through the loop. It then writes a JSON file with one tweet per line. I use this code to search into the recent past, but you can adapt it to collect non-overlapping tweets going forward by changing max_id to since_id.
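The since_id variant of that idea can be simulated without the network; fetch_since below is a stand-in for a real search call and only illustrates the cursor bookkeeping (nothing here is python-twitter API):

```python
def poll(fetch_since, since_id=0, rounds=3):
    """Repeatedly fetch tweets newer than since_id, advancing the cursor each round."""
    collected = []
    for _ in range(rounds):
        batch = fetch_since(since_id)
        if batch:
            since_id = max(t['id'] for t in batch)  # newest ID seen so far
        collected.extend(batch)
    return collected

# Fake backend: five tweets exist; each call returns those newer than since_id
ALL = [{'id': i, 'text': 'tweet %d' % i} for i in range(1, 6)]

def fetch_since(since_id):
    return [t for t in ALL if t['id'] > since_id]

result = poll(fetch_since)
```

Because since_id is exclusive, advancing it to the newest ID seen means later rounds return nothing new, so no tweet is collected twice even when polling more often than tweets arrive.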
I want to know a way of getting tweet IDs to keep a check on what tweets have been displayed in the timeline of user in the python app I am making using tweepy.
There doesn't seem to be a way for me to extract the tweet IDs or keep track of them. The parameter to keep track with is since_id. Please, if anyone could help.
The tweepy library follows the Twitter API closely. All attributes returned by that API are available on the result objects; so for status messages you need to look at the tweet object description to see that they have an id attribute:
for status in api.user_timeline():
    print status.id
Store the most recent id to poll for updates.
The max_id and since_id are parameters for the api.user_timeline() method.
Using the tweepy.Cursor() object might look something like this:
tweets = []
for tweet in tweepy.Cursor(api.user_timeline,
                           screen_name=<twitter_handle>,
                           since_id=<since_id>
                           ).items(<count>):
    tweets.append(tweet)