Checking if tweet is in reply to deleted tweet - python

I'm writing a program to scrape tweets between two specific dates from a user, while making sure that they are not retweets or replies. I am using snscrape and tweepy.
    for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:' + lines[x] + ' since:' + startDate + ' until:' + endDate).get_items()):
        if tweet.retweetedTweet is None and tweet.inReplyToTweetId is None and tweet.inReplyToUser is None:
This is what I have for the check. However, if the tweet is in reply to a tweet that has been deleted, it is no longer considered a reply, so inReplyToTweetId is None and the check passes. Is there a way around this? I'm looking at pulling tweets from large companies like Tesco and Sainsbury's, and sorting through their tweets by hand would be tedious, so I'd like to fix this within the code.
An example is this tweet: it passes the check because inReplyToTweetId is None.
Any help would be greatly appreciated, thank you.

I actually solved this a lot quicker than I thought I would. It turns out that, in the tweet object, the mentionedUsers array is empty for these specific tweets, so I added the following if statement, which solved the problem:
    if not ('@' in tweet.content and not tweet.mentionedUsers):
This checks whether the text of the tweet contains a mention (the @ symbol) while the mentionedUsers array is empty. If both are true, the tweet is treated as a reply and discarded.
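The check described above can be tried out without touching the network. The following is only a minimal sketch: it assumes the mention marker is "@" and that each tweet exposes the content and mentionedUsers attributes used above. The function name and the sample objects are hypothetical stand-ins, not real snscrape results.

```python
from types import SimpleNamespace


def looks_like_orphaned_reply(tweet):
    """Return True when the text contains '@' but no mentioned users were parsed."""
    has_at_sign = "@" in (tweet.content or "")
    has_parsed_mentions = bool(tweet.mentionedUsers)
    return has_at_sign and not has_parsed_mentions


# Hypothetical stand-ins that mimic the two attributes used by the check.
orphaned = SimpleNamespace(content="@someone sorry to hear that", mentionedUsers=None)
normal = SimpleNamespace(content="New offers in store today", mentionedUsers=None)
with_mention = SimpleNamespace(content="Thanks @someone", mentionedUsers=["someone"])

# Keep everything that does not look like a reply to a deleted tweet.
kept = [t for t in (orphaned, normal, with_mention) if not looks_like_orphaned_reply(t)]
```

Note that this heuristic would also discard a tweet that merely contains an "@" in some other context (an email address, for example) when no users were parsed, so it may be worth spot-checking what gets dropped.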

Related

Getting tweets for different hashtags in one call using Tweepy

I want to get popular and recent tweets for different hashtags with Tweepy. The solution I have right now is shown below, but it makes a separate call for each hashtag, which is not good given the rate limits. Is it possible to do this in one call? If so, how?
    for ht in hash_tags:
        tweets = tweepy.Cursor(api.search, ht + " -filter:retweets",
                               result_type='mixed', since=date_since,
                               tweet_mode='extended').items(num_of_tweets)
        add_tweets(tweets)
Rather than running a for loop in which you search for one hashtag at a time, you can run a search which contains many of the hashtags you need using the OR operator from Twitter's standard search operators. This is a broader search, and you'll get a mixed bag of results for each hashtag, but it cuts down on the number of requests you make overall.
So add all of your hashtags to a string, and pass this through as your query.
    # just an example, not the nicest!
    query_string = ""
    for i in hash_tags:
        if i == hash_tags[-1]:
            query_string += str(i)
        else:
            query_string += str(i) + " OR "
    tweets = tweepy.Cursor(api.search, q=query_string + " -filter:retweets",
                           result_type='mixed', since=date_since,
                           tweet_mode='extended').items(num_of_tweets)
    add_tweets(tweets)
This means your search will be something like #coolstuff OR #reallycoolstuff OR #evencoolerstuff. It may be worth increasing the number of items returned when doing this, as the search will be much broader.
Keep in mind that there are limits to the size of the query you can search with, so you may need to break this down into smaller queries if you've got a lot of hashtags. Splitting may also give you better results: a more popular hashtag in your query string can take up most of your results, so you could run a separate search for the less popular hashtags. How you'd measure popularity, though, is up to you!
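One way to combine the OR query with the size limit mentioned above is to build the query strings in batches. This is only a sketch: the helper name build_or_queries and the max_len default of 400 characters are assumptions made for illustration, so check the search docs for the real limit on your access tier.

```python
def build_or_queries(hash_tags, max_len=400, suffix=" -filter:retweets"):
    """Join hashtags with OR, starting a new query when max_len would be exceeded."""
    queries, current = [], []
    for tag in hash_tags:
        candidate = " OR ".join(current + [tag]) + suffix
        if current and len(candidate) > max_len:
            # The current batch is full: emit it and start a new one with this tag.
            queries.append(" OR ".join(current) + suffix)
            current = [tag]
        else:
            current.append(tag)
    if current:
        queries.append(" OR ".join(current) + suffix)
    return queries


queries = build_or_queries(["#coolstuff", "#reallycoolstuff", "#evencoolerstuff"])
small_batches = build_or_queries(["#a", "#b", "#c"], max_len=25)
```

Each resulting string can then be passed as q= in its own Cursor call, so the number of requests grows with the number of batches rather than the number of hashtags.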
Hope this helps you get started.

Twitter Search API using regex to filter tweets: "no tweets found"

I'm currently trying to make a Twitter bot that finds one tweet by filtering with a regex and then replies to it.
The relevant code looks as follows:
    questionRegex = re.compile(regex here)
    def searchWeatherRequest(weatherReport) :
        for tweet in tweepy.Cursor(api.search,
                                   q=questionRegex,
                                   lang="en",
                                   since=today).items(1):
            try:
                tweetId = tweet.user.id
                username = tweet.user.screen_name
                print ('\Tweet by: @' + username)
                tweet.retweet()
                api.update_status("@" + username + "Today's weather" + weatherReport)
                print (tweet.text)
            except tweepy.TweepError as e:
                print (e.reason)
            except StopIteration:
                break
            time.sleep(3600)
But whenever I run the code, I receive the message "no tweets found", even after posting a tweet that would match the regex, so I know it's not simply because no matching tweets exist.
I also tried filtering the tweets in steps (first filtering on just one word, then filtering those tweets using the regex), but this did not work either.
Does anyone know what I'm doing wrong? I read multiple articles and questions about this, but none of the solutions seemed to work.
I read in one question that you can't filter tweets using regex, but other answers suggested otherwise. Is it true that you simply can't use regex, or am I encountering a simple coding error?
Unfortunately, regexes won't work here. The q= parameter expects a string, so it won't interpret the regex you're passing. I believe it would either raise an error or treat the compiled pattern from re.compile(regex here) as a plain string, which of course isn't likely to turn up many results, if any.
So it looks like your current method isn't going to work. A workaround could be using Twitter's standard operators. You could build up query strings using filter operators that, when passed to the Cursor, act in essentially the same way as your regex did. Keep in mind, though, that there are character limits, and overly complicated queries may also be rejected. You can find details on that in the search tweets docs.
Another option is to run a fairly general search and then apply your regex to the results to filter them. The answerer of a fairly similar question shares some articles here.
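The second option (search broadly, then filter locally) could look roughly like the sketch below. The pattern and the sample texts are hypothetical, and the list of strings stands in for the tweet.text values you would collect from a general search such as q="weather".

```python
import re

# Hypothetical pattern: the word "weather" followed later by a question mark.
question_pattern = re.compile(r"\bweather\b.*\?", re.IGNORECASE)


def filter_texts(texts, pattern):
    """Keep only the texts in which the compiled pattern matches somewhere."""
    return [text for text in texts if pattern.search(text)]


# Hypothetical tweet texts standing in for the results of a broad search.
sample_texts = [
    "What's the weather like today?",
    "I love this weather",
    "Is the WEATHER going to improve?",
]
matches = filter_texts(sample_texts, question_pattern)
```

The regex runs entirely on your side here, so the API only ever sees a plain string query.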
Hope that helps and gets you on the right path.

Twitter: quoted tweet has not a quoted_status nor a quoted_status_id

I am converting some tweet IDs into tweet object with twython (I use python 2.7 on ubuntu 14.04).
As you can see here, a tweet has a boolean variable is_quote_status with an obvious (I guess) meaning. There are also the variables quoted_status and quoted_status_id. About these two variables, the above link says that "This field only surfaces when the Tweet is a quote Tweet", so I guess they should exist whenever is_quote_status is True.
But the first time in the dataset I find a tweet with is_quote_status is True, this is what I get:
    crazy_ID = XXXXXXXXXXXXXXX
    twt = twitter.show_status(id = crazy_ID)
    print twt['is_quote_status']
    >> True
    print twt['quoted_status']
    >> KeyError: 'quoted_status'
    print twt['quoted_status_id']
    >> KeyError: 'quoted_status_id'
and I really don't know what to think about it. A direct check (i.e. print twt) shows me that is_quote_status is indeed True, but quoted_status and quoted_status_id are not contained in the tweet.
Let me note that the tweet was created in 2011, and I am not even sure quoting existed at that time, but if that is the case I am still wondering why is_quote_status is True.
So here is the question: how is it possible that a tweet has is_quote_status = True but quoted_status and quoted_status_id are not contained in the tweet?
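Whatever the cause turns out to be, the KeyError itself can be avoided by reading the optional fields defensively. The following is only a sketch, assuming the tweet is a plain dict as in the twt[...] lookups above. The helper name quoted_info and the sample payload are hypothetical.

```python
def quoted_info(tweet):
    """Return (is_quote, quoted_status_id, quoted_status), using None for absent keys."""
    return (
        bool(tweet.get("is_quote_status", False)),
        tweet.get("quoted_status_id"),
        tweet.get("quoted_status"),
    )


# Hypothetical payload shaped like the one described above: flag set, fields absent.
twt = {"id": 1, "is_quote_status": True}
is_quote, quoted_id, quoted = quoted_info(twt)

if is_quote and quoted is None:
    note = "flagged as a quote, but the quoted tweet is not included"
else:
    note = "ok"
```

This does not explain why the fields are missing. It only lets the rest of the dataset be processed without crashing on such tweets.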

Tweepy tweets missing from search

This is an example of what my tweepy code looks like:
    api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, compression=True)
    random = random.randint(1,1000)
    for tweet in tweepy.Cursor(api.search, q='twitter', lang='en', result_type='recent').items():
        if not (tweet.retweeted) and 'RT @' not in (tweet.text):
            api.update_status('@' + tweet.user.screen_name + ' ' + str(random) + ': test', in_reply_to_status_id = tweet.id_str)
            print('Replied to the tweet!')
            sleep (900)
The code works but for some reason, after a while of running the code, my tweets go missing from the search. Before it goes missing from the search, it goes missing from the tweet I replied to. I don't really know why this is happening.
The Twitter search function is optimized to show recent tweets (probably especially so when you pass result_type='recent'). The tweet still exists, but it no longer shows up in the search because it is not recent enough. If you open Twitter in a browser, I am sure the tweets and replies are still there (navigating to the user's timeline is the easiest way to find them). Or try removing result_type='recent'.
Hope this helps.
Tweets go missing from search when you are either tweeting too much or someone reports your tweets too often. Twitter relies on algorithms that detect whether you are tweeting the same thing over and over (spamming), following and unfollowing too quickly, or retweeting too much. Give it a rest; it should clear after 72 hours at most. I believe the term is “shadow banned”.

tweepy: finding the original author of a retweet

I'm kind of a python newb and stuck with tweepy here.
What I'm trying to do is bring a bunch of user and tweet objects into a neo4j database with tweet and retweet relationships. My problem is determining whether a given status object is a retweet and, if so, getting the screen_name and id_str of the original author.
I can see the data if I print out tweet.retweets, but I can't figure out how to get to it. tweepy's docs mention something about object models and say to check out ModelsReference for more information, but Google isn't helping me much here.
Any help would be great, even just pointing me in the right direction. Thanks.
Sample code
    tweets=api.get_timeling(1234556)
    twitter_user=api.get_user(123456)
    for tweet in tweets:
        neo4j_create_tweet_node(tweet)
        if tweet.user.id == twitter_user.id:
            create_tweet_relationship(twitter_user,tweet)
        elif tweet.user.id != twitter_user.id:
            create_retweet_relationship(twitter_user,tweet)
Suppose tweet is a retweet:
    originalAuthorID = tweet.retweeted_status.user.id_str
According to the Twitter API documentation:
Retweets can be distinguished from typical Tweets by the existence of a retweeted_status attribute. This attribute contains a representation of the original Tweet that was retweeted. Note that retweets of retweets do not show representations of the intermediary retweet, but only the original tweet.
I would use hasattr() to look for the presence of the retweeted_status attribute in each Tweepy tweet object.
The following code (where create_tweet_relationship() and create_retweet_relationship() are functions you have defined as in your example) seems like it should work:
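    for tweet in tweets:
        if hasattr(tweet, 'retweeted_status'):
            # Retweet: relate it to the author of the original tweet
            create_retweet_relationship(tweet.retweeted_status.author, tweet)
        else:
            # Ordinary tweet: relate it to its own author
            create_tweet_relationship(tweet.author, tweet)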
If the tweet is a retweet of another, the original retweeted status is included in the JSON object in the field "retweeted_status". You'd get the user information there under the "user" field.
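The lookup described in these answers can be wrapped in a small helper. This is only a sketch: original_author is a hypothetical name, and the SimpleNamespace objects are stand-ins that mimic the retweeted_status and user attributes mentioned above, not real Tweepy status objects.

```python
from types import SimpleNamespace


def original_author(tweet):
    """Return the original author's user object if tweet is a retweet, else None."""
    retweeted = getattr(tweet, "retweeted_status", None)
    if retweeted is None:
        return None
    return retweeted.user


# Hypothetical stand-ins for a retweet and an ordinary tweet.
author = SimpleNamespace(id_str="42", screen_name="original_poster")
retweeter = SimpleNamespace(id_str="7", screen_name="retweeter")
retweet = SimpleNamespace(user=retweeter, retweeted_status=SimpleNamespace(user=author))
plain = SimpleNamespace(user=retweeter)

source = original_author(retweet)
```

Using getattr() with a default plays the same role as the hasattr() check, and it returns the user whose screen_name and id_str the question asks about.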
