Tweepy: extended mode with api.search - python

I've written a simple script to get the most trending 300 tweets containing a specific hashtag.
for self._tweet in tweepy.Cursor(self._api.search, q=self._screen_name, count=300, lang="en").items(300):
    self._csvWriter.writerow([self._tweet.created_at, self._tweet.text.encode('utf-8')])
It works well and saves the results to CSV, but the tweets are truncated.
I modified the code like this, adding the tweet_mode="extended" parameter:
for self._tweet in tweepy.Cursor(self._api.search, q=self._screen_name, count=300, lang="en", tweet_mode="extended").items(300):
    self._csvWriter.writerow([self._tweet.created_at, self._tweet.text.encode('utf-8')])
But I got this exception:
AttributeError: 'Status' object has no attribute 'text'
My question is: how can I save a complete tweet using a Cursor? (complete = not truncated)
Thanks in advance (and sorry, I'm a Tweepy newbie trying to learn as much as possible)

You're really close, do this instead:
for self._tweet in tweepy.Cursor(self._api.search, q=self._screen_name, count=300, lang="en", tweet_mode="extended").items(300):
    self._csvWriter.writerow([self._tweet.created_at, self._tweet.full_text.encode('utf-8')])
Notice that I used full_text in self._tweet.full_text.encode('utf-8'), rather than just text. With tweet_mode='extended' the Status object has no text attribute (hence the AttributeError); the complete tweet is in full_text instead.
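One thing to watch for: as far as I know, for retweets the outer full_text can still be cut off, with the complete text sitting on retweeted_status. Here is a minimal sketch of a helper that covers both cases and also falls back to text when extended mode is off. get_full_text is a hypothetical name, and the SimpleNamespace objects only stand in for tweepy Status objects:

```python
from types import SimpleNamespace


def get_full_text(status):
    """Return the longest available text for a tweepy-style status object."""
    # For a retweet, prefer the original tweet carried in retweeted_status.
    original = getattr(status, "retweeted_status", None)
    if original is not None:
        return get_full_text(original)
    # full_text exists with tweet_mode="extended"; text exists otherwise.
    full_text = getattr(status, "full_text", None)
    if full_text is not None:
        return full_text
    return getattr(status, "text", "")


# Stand-ins for Status objects, for illustration only.
extended = SimpleNamespace(full_text="a long tweet body")
compat = SimpleNamespace(text="a short tweet body")
retweet = SimpleNamespace(full_text="RT @someone: a long twe...", retweeted_status=extended)

print(get_full_text(extended))
print(get_full_text(compat))
print(get_full_text(retweet))
```

Note that for a retweet this returns the original tweet's text without the "RT @user:" prefix; add it back yourself if you need it.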

Related

How to retrieve full text from tweet using twarc2

I am using twarc2 for retrieving tweets. The returned jsonl file has the following keys:
dict_keys(['text', 'conversation_id', 'entities', 'author_id', 'public_metrics', 'source', 'id', 'reply_settings', 'edit_history_tweet_ids', 'created_at', 'possibly_sensitive', 'lang', 'referenced_tweets', 'author', '__twarc'])
When I checked the value of data[0]['text'], it terminated with ... like below:
RT @Weather_West: "You may have heard that we have 12 years to fix everything. This is well-meaning nonsense, but it’s still nonsense. We h…
I am wondering how I can get the full text of the tweet. Apparently twarc2 doesn't return retweeted_status, which is what I used for retrieving the full text with tweepy.
Actually, twarc2 csv auto-expands the tweets. So, instead of working with .jsonl, one can first convert to .csv and then one will be able to access the full text from the tweet.
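If you would rather stay with the .jsonl, one possibility is to read the original tweet's text from referenced_tweets. This is only a sketch under two assumptions: that the file is flattened output (the 'author' and '__twarc' keys above suggest so), and that each referenced_tweets entry carries the referenced tweet's own text next to its type. full_text_v2 is a hypothetical helper and the record below is made up:

```python
def full_text_v2(tweet):
    """Return the retweeted tweet's text when available, else the tweet's own text."""
    for ref in tweet.get("referenced_tweets", []):
        # Assumption: a retweet reference has type "retweeted" and,
        # in flattened output, includes the original tweet's "text".
        if ref.get("type") == "retweeted" and "text" in ref:
            return ref["text"]
    return tweet["text"]


# Made-up record shaped like one line of a flattened .jsonl file.
record = {
    "text": "RT @someone: a long twe...",
    "referenced_tweets": [
        {"type": "retweeted", "id": "1", "text": "a long tweet in full"}
    ],
}

print(full_text_v2(record))
print(full_text_v2({"text": "an ordinary tweet"}))
```

If your records don't have a text inside referenced_tweets, this simply falls back to the (possibly truncated) top-level text, and the csv route above is the safer choice.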

Tweets returned by the Twitter API are not showing whole tweets

This is my first time using the python-twitter tool, and I have a question about the results I get back from it. They seem to be cut-off versions of the original tweets.
import twitter
api = twitter.Api(consumer_key='jyd2tcu**OHiIrfg',
                  consumer_secret='****t80qZeM4JYvV5V8UpB0fTtebPSsb0LUjI9kYSZbLTRn',
                  access_token_key='1***74372608-dfi5bz22RTKep7GF04lk6FnPSYBgnD',
                  access_token_secret='5gt0YIw***gwPca5RXiwMksg7GM4ACQtl4')
results = api.GetSearch(
    raw_query="q=immigration%20&result_type=recent")
The text I got back is
Text='RT @ddale8: Fox is now showing Trump\'s comments at Cabinet. He begins the clip by saying he\'s "heard numbers as high as $275 billion" for h…')
It ends with "…". Is that how the Twitter API works, or is there a way I can get whole tweets instead?
thank you
Try passing tweet_mode="extended" to the twitter.Api constructor.
I believe that because the original tweet is longer than 140 characters, the interface has to be told to expect extended tweets; it does not do so by default.

Reading a dictionary from within a dictionary

I have a json file for tweet data. The data that I want to look at is the text of the tweet. For some reason, some of the tweets are too long to put into the normal text part of the dictionary.
It seems like there is a dictionary within another dictionary and I can't figure out how to access it very well.
Basically, what I want in the end is one column of a data frame that will have all of the text from each individual tweet. Here is a link to a small sample of the data that contains a problem tweet.
Here is the code I have so far:
import json
import pandas as pd
tweets = []
# This reads the json file so that I can work with it. This part works correctly.
with open("filelocation.txt") as source:
    for line in source:
        if line.strip():
            tweets.append(json.loads(line))
print(len(tweets))
df = pd.DataFrame.from_dict(tweets)
df.info()
When looking at the info you can see that there will be a column called extended_tweet that only encompasses one of the two sample tweets. Within this column, there seems to be another dictionary with one of those keys being full_text.
I want to add another column to the dataframe that just has this information along with the normal text column when the full_text is null.
My first thought was to try and read that specific column of the dataframe as a dictionary again using:
d = pd.DataFrame.from_dict(tweets['extended_tweet']['full_text'])
But this doesn't work. I don't really understand why that doesn't work as that is how I read the data the first time.
My guess is that I can't look at the specific names because I am going back to the list and it would have to read all or none. The error it gives me says "KeyError: 'full_text' "
I also tried using the recommendation provided by this website. But this gave me a None value no matter what.
Thanks in advance!
I tried to do what @Dan D. suggested, however, this still gave me errors. But it gave me the idea to try this:
tweets[0]['extended_tweet']['full_text']
This works and gives me the value that I am looking for. But I need to run through the whole thing. So I tried this:
df['full'] = [tweets[i]['extended_tweet']['full_text'] for i in range(len(tweets))]
This gives me "KeyError: 'extended_tweet'"
Does it seem like I am on the right track?
I would suggest flattening out the dictionaries like this:
tweet = json.loads(line)
# Not every tweet has extended_tweet, so fall back to the plain text
tweet['full_text'] = tweet.get('extended_tweet', {}).get('full_text', tweet['text'])
tweets.append(tweet)
I don't know if the answer suggested earlier works; I never got it to work. But I did figure out something else that works well for me.
What I really needed was a way to display the full text of a tweet. I first loaded the tweets from the json with what I posted above. Then I noticed that in the data file there is a field called truncated. If this value is true, the tweet is cut short and the full tweet is placed in
tweets[i]['extended_tweet']['full_text']
In order to access it, I used this:
tweet_list = []
for i in range(len(tweets)):
    # json.loads turns JSON true into the boolean True, not the string 'True'
    if tweets[i]['truncated']:
        tweet_list.append(tweets[i]['extended_tweet']['full_text'])
    else:
        tweet_list.append(tweets[i]['text'])
Then I can work with the data using the whole text from each tweet.
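For anyone who wants the same thing without relying on the truncated flag, here is a minimal sketch that just checks whether extended_tweet is present. full_text_of and load_full_texts are hypothetical names, and the two records are made up to stand in for the real file:

```python
import io
import json


def full_text_of(tweet):
    """Return extended_tweet.full_text when present, else the plain text."""
    extended = tweet.get("extended_tweet") or {}
    return extended.get("full_text") or tweet.get("text", "")


def load_full_texts(source):
    """Read one JSON tweet per non-blank line and collect the full texts."""
    return [full_text_of(json.loads(line)) for line in source if line.strip()]


# Two made-up records standing in for the real file.
sample = io.StringIO(
    '{"text": "short tweet", "truncated": false}\n'
    "\n"
    '{"text": "long twe...", "truncated": true, '
    '"extended_tweet": {"full_text": "long tweet in full"}}\n'
)
texts = load_full_texts(sample)
print(texts)
```

With the DataFrame from the question, something like df['full'] = [full_text_of(t) for t in tweets] should then give the single column of complete texts.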

tweepy: finding the original author of a retweet

I'm kind of a python newb and stuck with tweepy here.
What I'm trying to do is bring in a bunch of user and tweet objects into a neo4j database with tweet and retweet relationships. My problem is in determining if a given status object is a retweet and if so the screen_name and id_str of the original author.
I can see the data if I print out tweet.retweets but I can't figure out how to get to it. tweepy's docs mention something about object models and for more information check out ModelsReference but google isn't helping me much here.
Any help would be great, even just pointing me in the right direction. Thanks
Sample code
tweets = api.user_timeline(user_id=123456)
twitter_user = api.get_user(user_id=123456)
for tweet in tweets:
    neo4j_create_tweet_node(tweet)
    if tweet.user.id == twitter_user.id:
        create_tweet_relationship(twitter_user, tweet)
    elif tweet.user.id != twitter_user.id:
        create_retweet_relationship(twitter_user, tweet)
Suppose tweet is a retweet:
originalAuthorID = tweet.retweeted_status.user.id_str
According to the Twitter API documentation:
Retweets can be distinguished from typical Tweets by the existence of a retweeted_status attribute. This attribute contains a representation of the original Tweet that was retweeted. Note that retweets of retweets do not show representations of the intermediary retweet, but only the original tweet.
I would use hasattr() to look for the presence of the retweeted_status attribute in each Tweepy tweet object.
The following code (where create_tweet_relationship() and create_retweet_relationship() are functions you have defined as in your example) should work:
for tweet in tweets:
    if hasattr(tweet, 'retweeted_status'):
        # Retweet: link it to the author of the original tweet
        create_retweet_relationship(tweet.retweeted_status.author, tweet)
    else:
        create_tweet_relationship(tweet.author, tweet)
If the tweet is a retweet of another, the original retweeted status is included in the JSON object in the field "retweeted_status". You'd get the user information there under the "user" field.
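To pull out just the screen_name and id_str the question asks for, a minimal sketch could look like this. original_author is a hypothetical helper, and the SimpleNamespace objects are stand-ins for tweepy Status and User objects:

```python
from types import SimpleNamespace


def original_author(tweet):
    """Return (is_retweet, author), where author wrote the original tweet."""
    original = getattr(tweet, "retweeted_status", None)
    if original is not None:
        return True, original.user
    return False, tweet.user


# Stand-in users and statuses, for illustration only.
alice = SimpleNamespace(id_str="1", screen_name="alice")
bob = SimpleNamespace(id_str="2", screen_name="bob")
plain = SimpleNamespace(user=alice)
retweet = SimpleNamespace(user=bob, retweeted_status=plain)

for tweet in (plain, retweet):
    is_retweet, author = original_author(tweet)
    print(is_retweet, author.screen_name, author.id_str)
```

Here bob's retweet resolves to alice, the author of the original tweet, which is the pair of values you would store on the retweet relationship.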

How to get tweet IDs (since_id, max_id) in tweepy (python)?

I want to know a way of getting tweet IDs so I can keep track of which tweets have already been displayed in a user's timeline in the Python app I am building with tweepy.
There doesn't seem to be a way I can extract the tweet IDs or keep track of them. The parameter for this seems to be since_id. Could anyone help, please?
The tweepy library follows the Twitter API closely. All attributes returned by that API are available on the result objects; so for status messages you need to look at the tweet object description to see they have an id attribute:
for status in api.user_timeline():
    print(status.id)
Store the most recent id and pass it as since_id on the next request to poll for updates.
The max_id and since_id are parameters for the api.user_timeline() method.
Using the tweepy.Cursor() object might look something like this:
tweets = []
for tweet in tweepy.Cursor(api.user_timeline,
                           screen_name=<twitter_handle>,
                           since_id=<since_id>
                           ).items(<count>):
    tweets.append(tweet)
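The "store the most recent id" part might be sketched like this. newest_id and poll are hypothetical helpers, and fake_fetch with its made-up ids stands in for a real timeline call so the example runs without credentials:

```python
from types import SimpleNamespace


def newest_id(statuses, previous=None):
    """Return the largest id seen so far, or previous if nothing new arrived."""
    ids = [status.id for status in statuses]
    if previous is not None:
        ids.append(previous)
    return max(ids) if ids else None


def poll(fetch, since_id=None):
    """Fetch tweets newer than since_id and return them with the new marker."""
    kwargs = {"since_id": since_id} if since_id is not None else {}
    statuses = list(fetch(**kwargs))
    return statuses, newest_id(statuses, since_id)


# Fake timeline standing in for the API, for illustration only.
TIMELINE = [SimpleNamespace(id=i) for i in (101, 102, 105)]


def fake_fetch(since_id=None):
    return [s for s in TIMELINE if since_id is None or s.id > since_id]


first, marker = poll(fake_fetch)
TIMELINE.append(SimpleNamespace(id=110))
second, marker2 = poll(fake_fetch, marker)
print(marker, [s.id for s in second], marker2)
```

With tweepy, fetch could presumably be a small wrapper around api.user_timeline that forwards since_id; persist the returned marker between runs so already displayed tweets are not fetched again.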
