I'm kind of a python newb and stuck with tweepy here.
What I'm trying to do is bring in a bunch of user and tweet objects into a neo4j database with tweet and retweet relationships. My problem is in determining if a given status object is a retweet and if so the screen_name and id_str of the original author.
I can see the data if I print out tweet.retweets, but I can't figure out how to get to it. Tweepy's docs mention object models and point to ModelsReference for more information, but Google isn't helping me much here.
Any help would be great, even just pointing me in the right direction. Thanks.
Sample code
tweets=api.get_timeling(1234556)
twitter_user=api.get_user(123456)
for tweet in tweets:
    neo4j_create_tweet_node(tweet)
    if tweet.user.id == twitter_user.id:
        create_tweet_relationship(twitter_user,tweet)
    elif tweet.user.id != twitter_user.id:
        create_retweet_relationship(twitter_user,tweet)
Suppose tweet is a retweet. Then:
originalAuthorID = tweet.retweeted_status.user.id_str
According to the Twitter API documentation:
Retweets can be distinguished from typical Tweets by the existence of a retweeted_status attribute. This attribute contains a representation of the original Tweet that was retweeted. Note that retweets of retweets do not show representations of the intermediary retweet, but only the original tweet.
I would use hasattr() to look for the presence of the retweeted_status attribute in each Tweepy tweet object.
The following code (where create_tweet_relationship() and create_retweet_relationship() are functions you have defined as in your example) seems like it should work:
for tweet in tweets:
    if hasattr(tweet, 'retweeted_status'):
        # Retweet: link it to the author of the original tweet
        create_retweet_relationship(tweet.retweeted_status.author, tweet)
    else:
        # Ordinary tweet: link it to its own author
        create_tweet_relationship(tweet.author, tweet)
If the tweet is a retweet of another, the original retweeted status is included in the JSON object in the field "retweeted_status". You'd get the user information there under the "user" field.
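As a minimal sketch of the check these answers describe, the helper below returns the original author's screen_name and id_str when retweeted_status is present. The name original_author is hypothetical (it is not part of tweepy), and SimpleNamespace objects stand in for real Status objects so the snippet runs without API credentials.

```python
from types import SimpleNamespace


def original_author(tweet):
    """Return (screen_name, id_str) of the original author, or None.

    Hypothetical helper, not a tweepy function. It assumes the object
    exposes retweeted_status only when the tweet is a retweet.
    """
    original = getattr(tweet, "retweeted_status", None)
    if original is None:
        return None
    return original.user.screen_name, original.user.id_str


# Stand-in objects that mimic the attributes of a tweepy Status.
plain = SimpleNamespace(user=SimpleNamespace(screen_name="alice", id_str="1"))
retweet = SimpleNamespace(
    user=SimpleNamespace(screen_name="bob", id_str="2"),
    retweeted_status=plain,
)

print(original_author(plain))
print(original_author(retweet))
```

Using getattr() with a default does the same job as hasattr() but avoids looking the attribute up twice.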
Related
I'm writing a program to scrape tweets between two specific dates from a user, while making sure that they are not retweets or replies. I am using snscrape and tweepy.
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:' + lines[x] + ' since:' + startDate + ' until:' + endDate).get_items()):
    if tweet.retweetedTweet is None and tweet.inReplyToTweetId is None and tweet.inReplyToUser is None:
This is what I have for the check. However, if the tweet is a reply to a tweet that has since been deleted, it is no longer considered a reply and the check passes because the fields are None. Is there a way around this? I'm looking at pulling tweets from large companies like Tesco and Sainsburys, and sorting through their tweets by hand would be tedious, so I want to find a way to fix this within the code.
An example of this is this tweet, as the code passes the check for inReplyToTweetId is None
Any help would be greatly appreciated, thank you.
I actually solved this a lot quicker than I thought I would. Turns out, in the tweet object, the mentionedUsers array is empty for these specific tweets, so I added the following if statement which solved the problem:
if not ('@' in tweet.content and not tweet.mentionedUsers):
This checks whether the text of the tweet contains a mention (the @ symbol) while the mentionedUsers array is empty. If both are true, the tweet is treated as a reply and discarded.
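The two checks from this thread can be combined into one predicate. This is a sketch only: looks_like_original is a hypothetical name, the attribute names are the snscrape ones quoted above, and the last rule assumes that an "@" in the text with an empty mentionedUsers marks a reply to a deleted tweet. SimpleNamespace objects stand in for scraped tweets.

```python
from types import SimpleNamespace


def looks_like_original(tweet):
    """Heuristic: True if the tweet is neither a retweet nor a reply."""
    if tweet.retweetedTweet is not None:
        return False
    if tweet.inReplyToTweetId is not None or tweet.inReplyToUser is not None:
        return False
    # Assumed signature of a reply to a deleted tweet: the text still
    # contains an "@", but no mentioned users were resolved.
    if "@" in tweet.content and not tweet.mentionedUsers:
        return False
    return True


# Stand-ins for snscrape tweet objects (illustrative data only).
normal = SimpleNamespace(
    retweetedTweet=None, inReplyToTweetId=None, inReplyToUser=None,
    content="New store opening today", mentionedUsers=None,
)
orphan_reply = SimpleNamespace(
    retweetedTweet=None, inReplyToTweetId=None, inReplyToUser=None,
    content="@customer Sorry to hear that", mentionedUsers=None,
)

print(looks_like_original(normal), looks_like_original(orphan_reply))
```

Since this is a heuristic, a tweet whose text merely contains an "@" (for example in an email address) would also be discarded.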
I've written a simple script to get the most trending 300 tweets containing a specific hashtag.
for self._tweet in tweepy.Cursor(self._api.search,q=self._screen_name,count=300, lang="en").items(300):
    self._csvWriter.writerow([self._tweet.created_at, self._tweet.text.encode('utf-8')])
It works well and it saves the result to CSV, but the tweets are truncated.
I modified the code like this, adding the tweet_mode="extended" parameter:
for self._tweet in tweepy.Cursor(self._api.search,q=self._screen_name,count=300, lang="en", tweet_mode="extended").items(300):
    self._csvWriter.writerow([self._tweet.created_at, self._tweet.text.encode('utf-8')])
But I got this exception:
AttributeError: 'Status' object has no attribute 'text'
My question is: how can I save a complete tweet using a Cursor? (complete = not truncated)
Thanks in advance (and sorry, I'm a Tweepy newbie trying to learn as much as possible)
You're really close, do this instead:
for self._tweet in tweepy.Cursor(self._api.search,q=self._screen_name,count=300, lang="en", tweet_mode="extended").items(300):
    self._csvWriter.writerow([self._tweet.created_at, self._tweet.full_text.encode('utf-8')])
Notice that I used full_text in self._tweet.full_text.encode('utf-8'), rather than just text. When you use tweet_mode='extended', the Status object has no text attribute (hence the AttributeError), and the tweet appears in full_text instead.
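If the same code has to handle both modes, a small helper can pick whichever attribute is present. This is a sketch under assumptions: full_text_of is a hypothetical name, and the retweet branch relies on the behaviour described in tweepy's extended-tweets notes, where a retweet's own full_text may still end in an ellipsis while retweeted_status carries the complete text. SimpleNamespace objects stand in for Status objects.

```python
from types import SimpleNamespace


def full_text_of(status):
    """Return the untruncated text of a status (hypothetical helper)."""
    # For a retweet, read the text from the original tweet instead.
    source = getattr(status, "retweeted_status", None) or status
    # full_text is set with tweet_mode="extended"; text is set otherwise.
    return getattr(source, "full_text", None) or getattr(source, "text", "")


# Stand-ins for tweepy Status objects (illustrative data only).
extended = SimpleNamespace(full_text="a long tweet, not truncated")
compat = SimpleNamespace(text="a short tweet")
retweet = SimpleNamespace(
    full_text="RT @someone: a long tw…",
    retweeted_status=extended,
)

for status in (extended, compat, retweet):
    print(full_text_of(status))
```

Note that for a retweet this returns the original tweet's text without the "RT @user:" prefix.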
I created another Twitter account to help promote my main one, so I was wondering: how do you retweet another account's tweets using Twython? Are there any examples?
I found a few, but I'm still a little confused. Thanks!
What I am trying is:
user_timeline=twitter.getUserTimeline(sreen_name="slaughdaradio", count = 100,)
to tweet in user_timeline:
print tweet ['text']
It keeps giving me a syntax error for 'tweet'.
print tweet ['text']
change to
print(tweet ['text'])
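I assume your Python version is Python 3. In Python 3, print is a function, so it must be called as print(). Also note that the loop line has to start with for rather than to, which is a syntax error in any Python version.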
I want to check if a certain tweet is a reply to the tweet that I sent. Here is how I think I can do it:
Step1: Post a tweet and store id of posted tweet
Step2: Listen to my handle and collect all the tweets that have my handle in it
Step3: Use tweet.in_reply_to_status_id to see if tweet is reply to the stored id
In this logic, I am not sure how to get the status id of the tweet that I am posting in step 1. Is there a way I can get it? If not, is there another way in which I can solve this problem?
One option is to fetch the last n tweets from a user and then read the tweet.id of the relevant one. This can be done with:
latestTweets = api.user_timeline(screen_name = 'user', count = n, include_rts = False)
I, however, doubt that it is the most efficient way.
When you call the update_status method of your tweepy.API object, it returns a Status object. This object contains all of the information about your tweet.
Example:
my_tweet_ids = list()
api = tweepy.API(auth)
# Post to twitter
status = api.update_status(status='Testing out my Twitter')
# Append this tweet's id to my list of tweets
my_tweet_ids.append(status.id)
Then you can iterate over the list (for tweet_id in my_tweet_ids) and check the replies for each one.
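Step 3 of the question can then be sketched as a filter over the collected mentions. The helper name replies_to_mine is hypothetical; in_reply_to_status_id is the attribute named in the question, and SimpleNamespace objects stand in for the Status objects a mentions query would return.

```python
from types import SimpleNamespace


def replies_to_mine(mentions, my_tweet_ids):
    """Keep only the tweets that reply to one of the stored tweet ids."""
    wanted = set(my_tweet_ids)  # set membership is O(1) per lookup
    return [t for t in mentions if t.in_reply_to_status_id in wanted]


# Ids that would have been collected from status.id after posting.
my_tweet_ids = [101, 102]

# Stand-ins for tweets that mention the account (illustrative data only).
mentions = [
    SimpleNamespace(id=201, in_reply_to_status_id=101),
    SimpleNamespace(id=202, in_reply_to_status_id=None),
    SimpleNamespace(id=203, in_reply_to_status_id=999),
]

reply_ids = [t.id for t in replies_to_mine(mentions, my_tweet_ids)]
print(reply_ids)
```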
I want to know a way of getting tweet IDs to keep a check on what tweets have been displayed in the timeline of user in the python app I am making using tweepy.
There doesn't seem to be a way to extract the tweet IDs or keep track of them. The parameter for keeping track is since_id. If anyone could help, I'd appreciate it.
The tweepy library follows the Twitter API closely. All attributes returned by that API are available on the result objects, so for status messages you can look at the tweet object description and see that they have an id attribute:
for status in api.user_timeline():
    print(status.id)
Store the most recent id to poll for updates.
The max_id and since_id are parameters for the api.user_timeline() method.
Using the tweepy.Cursor() object might look something like this:
tweets = []
for tweet in tweepy.Cursor(api.user_timeline,
                           screen_name=<twitter_handle>,
                           since_id=<since_id>
                           ).items(<count>):
    tweets.append(tweet)
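To keep polling without showing the same tweets twice, the largest id seen so far can be carried over as the next since_id. The sketch below assumes that newer tweets have larger ids; newest_id is a hypothetical helper, and SimpleNamespace objects stand in for the Status objects collected in the loop above.

```python
from types import SimpleNamespace


def newest_id(statuses, previous_since_id=None):
    """Return the largest tweet id seen so far, or None if there is none."""
    ids = [status.id for status in statuses]
    if previous_since_id is not None:
        ids.append(previous_since_id)
    return max(ids) if ids else None


# Stand-ins for one batch of fetched statuses (illustrative data only).
batch = [SimpleNamespace(id=5), SimpleNamespace(id=9), SimpleNamespace(id=7)]

since_id = newest_id(batch, previous_since_id=6)
print(since_id)
```

Passing the previous value back in keeps since_id unchanged when a poll returns no new tweets.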