I am retrieving tweets using twarc2 with search terms in the following way:
twarc2 search --archive --start-time "2015-01-01" --end-time "2018-12-31" --limit 25000 "faith OR #faith" results.jsonl
But the resulting tweets are truncated after a certain length, e.g. RT #AndrewYNg: We cannot abdicate responsibility when two children, ages 7 and 8, die in US custody. The US once said: "Give me your tired,… although the actual tweet is a bit longer. I read the twarc2 documentation but can't find any "extended" tweet_mode option for retrieving the full_text. Any help on this would be appreciated.
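Are all the retrieved tweets truncated? The example you provided is a retweet (it starts with "RT"). Retweets are truncated, but the original tweet they reference contains the full text.
You can exclude retweets in twarc2. Try adding the operator below inside your quoted query, e.g. "(faith OR #faith) -is:retweet"; the parentheses make the exclusion apply to both search terms: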
-is:retweet
Hope that helps.
I am using twarc2 for retrieving tweets. The returned jsonl file has the following keys:
dict_keys(['text', 'conversation_id', 'entities', 'author_id', 'public_metrics', 'source', 'id', 'reply_settings', 'edit_history_tweet_ids', 'created_at', 'possibly_sensitive', 'lang', 'referenced_tweets', 'author', '__twarc'])
When I checked the value of data[0]['text'], it ended with an ellipsis (…), as shown below:
RT #Weather_West: "You may have heard that we have 12 years to fix everything. This is well-meaning nonsense, but it’s still nonsense. We h…
I am wondering how I can get the full text of the tweet. Apparently, twarc2 doesn't even return retweeted_status, unlike tweepy, where that field used to be helpful for retrieving the full text.
Actually, twarc2 csv auto-expands the tweets. So, instead of working with the .jsonl file directly, you can first convert it to .csv and then read the full text of each tweet from there.
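If you would rather stay with the .jsonl data, the full text can usually be recovered from the referenced tweet. The following is only a minimal sketch, assuming the records have been flattened so that each entry in referenced_tweets carries the original tweet's own text field (that layout is an assumption; check it against your file). The helper name full_text and the sample record are made up for illustration.

```python
def full_text(tweet):
    """Return the untruncated text for a flattened tweet record (a dict)."""
    for ref in tweet.get("referenced_tweets", []):
        # A retweet's own "text" is cut off. The original tweet is ASSUMED
        # to be embedded in the reference, which holds only for flattened data.
        if ref.get("type") == "retweeted" and "text" in ref:
            return ref["text"]
    # Not a retweet (or nothing embedded): the tweet's own text is all we have.
    return tweet["text"]


# Made-up record, for illustration only
sample = {
    "text": "RT @someone: a long tweet that gets cut o…",
    "referenced_tweets": [
        {"type": "retweeted", "id": "1",
         "text": "a long tweet that gets cut off here"}
    ],
}
print(full_text(sample))
```

Tweets that are not retweets fall through to their own text field, so the helper can be applied to every record in the file.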
This is the first time I am using the python-twitter tool, and I have a question about the results I get back from it. It seems that the original tweets are being cut off.
import twitter
api = twitter.Api(consumer_key='jyd2tcu**OHiIrfg',
                  consumer_secret='****t80qZeM4JYvV5V8UpB0fTtebPSsb0LUjI9kYSZbLTRn',
                  access_token_key='1***74372608-dfi5bz22RTKep7GF04lk6FnPSYBgnD',
                  access_token_secret='5gt0YIw***gwPca5RXiwMksg7GM4ACQtl4')
results = api.GetSearch(
    raw_query="q=immigration%20&result_type=recent")
The text I got back is:
Text='RT #ddale8: Fox is now showing Trump\'s comments at Cabinet. He begins the clip by saying he\'s "heard numbers as high as $275 billion" for h…')
It ends with "…". Is this how the Twitter API works, or is there a way I can get the whole tweet instead?
Thank you.
Try passing tweet_mode="extended" to the twitter.Api constructor.
I believe that since the original tweet is longer than 140 characters, we need to tell the interface to expect this, as it does not do so by default.
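Note that even in extended mode a retweet's own text is still cut off; the complete text sits on the embedded original tweet. A minimal sketch of picking the right field, assuming the v1.1 JSON layout (full_text and retweeted_status keys) and working on plain dicts rather than library objects. The helper name untruncated_text and the sample data are hypothetical.

```python
def untruncated_text(status):
    """Pick the complete text from a v1.1-style status dict."""
    # For retweets, the complete text lives on the embedded original tweet.
    original = status.get("retweeted_status")
    if original is not None:
        status = original
    # "full_text" is present when tweet_mode="extended" was requested;
    # otherwise fall back to the classic "text" field.
    return status.get("full_text") or status.get("text", "")


# Made-up payload, for illustration only
retweet = {
    "full_text": "RT @someone: a long statement that is cut o…",
    "retweeted_status": {"full_text": "a long statement that is cut off here"},
}
print(untruncated_text(retweet))
```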
The following code returns None instead of keywords:
from rake_nltk import Rake
r=Rake()
testscenario='''This document is very important as it has a lot of business objectives mentioned in it.'''
defect='''Current day per security file is going to Bloomberg and we are getting data back from Bloomberg but it is not loading into the MarkIt tables. Last date on MarkIt tables for data loaded was June 29, 2016.BBG Run date for what is going into per security matcher is June 29th.See attached for screen shots.'''
print(r.extract_keywords_from_text(testscenario))
The output that I am getting is None.
The following code can be used. It worked for me.
from rake_nltk import Rake
r=Rake()
testscenario='This document is very important as it has a lot of business objectives mentioned in it.'
r.extract_keywords_from_text(testscenario)
print(r.get_ranked_phrases())
Reference: https://pypi.org/project/rake-nltk/
Refer to the README of the package. It clearly describes what you need to do to get ranked phrases:
r.extract_keywords_from_text(testscenario)
extracts the keywords from the given text. Use
r.get_ranked_phrases()
r.get_ranked_phrases_with_scores()
to get the ranked phrases, optionally together with their scores.
Readme link : https://github.com/csurfer/rake-nltk/blob/master/README.md
I'm searching for separate words used back to back in a tweet, but the results include tweets that merely contain both words somewhere (not as the phrase itself; e.g. "Apple Watch" comes back as something like "#JohnDoe - I watch tv and eat an apple").
The code I'm currently using is as follows:
live_stream.filter(track = ("apple watch"))
I've also tried:
live_stream.filter(track = ("\"apple watch\""))
Neither has worked for the task at hand. Thanks!
The Twitter API doesn't support exact matching in this manner, unfortunately.
https://dev.twitter.com/streaming/overview/request-parameters
Exact matching of phrases (equivalent to quoted phrases in most search engines) is not supported.
The way to do it is as suggested by rbierman - retrieve the tweets and drop the ones that don't have the exact phrase, e.g., something along the lines of:
search_track = "apple watch"
retained_tweets = [tweet for tweet in retrieved_tweets if search_track in tweet]
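A plain substring test is case-sensitive and trips over extra whitespace, so a slightly more forgiving filter may be useful. This is a minimal sketch, assuming the tweets are already available as plain strings; contains_phrase and the sample tweets are made up for illustration.

```python
import re


def contains_phrase(text, phrase):
    """True if the words of `phrase` appear back to back in `text`."""
    words = [re.escape(word) for word in phrase.split()]
    # Allow any run of whitespace between the words and require word
    # boundaries at both ends of the phrase.
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Made-up tweets, for illustration only
retrieved_tweets = [
    "Just bought an Apple Watch!",
    "I watch tv and eat an apple",
    "apple   watch bands on sale",
]
retained_tweets = [t for t in retrieved_tweets
                   if contains_phrase(t, "apple watch")]
print(retained_tweets)
```

With the sample data, the first and third tweets are kept and the second is dropped.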
To build a sentiment summariser, I need to read a large number of tweets. I use the following code to fetch tweets from Twitter, but only 10 to 20 tweets are returned. What changes can be made to this code to increase the number of tweets to 100 or more?
t.statuses.home_timeline()
query = raw_input()
data = t.search.tweets(q=query)
for i in range(len(data['statuses'])):
    test = data['statuses'][i]['text']
    print test
By default, it returns only 20 tweets. Use the count parameter in your query; see the statuses/home_timeline documentation page.
Below is the code to get 100 tweets. Note that count must be less than or equal to 200.
t.statuses.home_timeline(count=100)
Updated at 4.48 after getting output
I tried it with counts of 50 and 100 and got a large number of tweets back. Here's the code:
Save the code below as test.py. Create a new directory, put test.py and the latest Twitter 1.14.1 library in it, open a terminal, go to the new directory using cd path, and run python test.py.
from twitter import *
t = Twitter(
    auth=OAuth('OAUTH_TOKEN', 'OAUTH_SECRET',
               'CONSUMER_KEY', 'CONSUMER_SECRET')
)
query = int(raw_input("Type how many tweets do you need:\n"))
x = t.statuses.home_timeline(count=query)
# Iterate over what was actually returned; it may be fewer than requested
for tweet in x:
    print tweet['text']
There is a limit to the number of tweets an application can fetch in a single request. You need to iterate through the results to get more than what you are returned in a single request. Take a look at this article on the twitter developer site that explains how to work with iterating through the results.
Note that the number of results also depends on the query you are searching for.
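The iteration described above can be sketched as follows. This is only an illustration of max_id-style paging, in which each request asks for tweets strictly older than the oldest one already seen. fetch_page is a hypothetical callable standing in for the real API call, and fake_fetch_page simulates a timeline of 500 tweets so the example runs without credentials.

```python
def fetch_all(fetch_page, total, page_size=200):
    """Collect up to `total` tweets by walking backwards with max_id."""
    tweets = []
    max_id = None
    while len(tweets) < total:
        count = min(page_size, total - len(tweets))
        page = fetch_page(count=count, max_id=max_id)
        if not page:
            break  # nothing older is available
        tweets.extend(page)
        # Next request: only tweets strictly older than the oldest one seen.
        max_id = min(tweet["id"] for tweet in page) - 1
    return tweets


# Hypothetical stand-in for a real API call: a timeline of 500 tweets
def fake_fetch_page(count, max_id=None):
    ids = [i for i in range(500, 0, -1) if max_id is None or i <= max_id]
    return [{"id": i, "text": "tweet %d" % i} for i in ids[:count]]


result = fetch_all(fake_fetch_page, total=450)
print(len(result))
```

In real use, fetch_page would wrap the library call (for example the home_timeline request with count and, after the first page, max_id); rate limits still apply to each request.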