Python + Tweepy (STREAM API) - Parsing JSON output for certain values/objects - python

Question:
I am attempting to experiment with Twitter's streaming API through Tweepy. I have gotten the example code to run, which filters the stream based on certain keywords and dumps the entire block of JSON information to stdout for each of these tweets.
Being new to JSON and the twitter API, I do not know how to extract a certain attribute - say, the name of the poster or the actual text of the tweet - into a string.
I have determined that the JSON output that goes to stdout is a Unicode object, and I have no idea how to access the various elements within the JSON.
I am using Python 2.7.9 (should I upgrade to 3.x?) and Tweepy 3.3.0. The code below is a mostly-unmodified version of a random streaming API tutorial I found by Googling.
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json

consumer_key = "IS A SECWET"
consumer_secret = "IS A SECWET"
access_token = "IS A SECWET"
access_token_secret = "IS A SECWET"

class StdOutListener(StreamListener):
    def on_data(self, data):
        print data  # here I would like to print ONLY the tweet's text, not the entire JSON dump.
        return True

    def on_error(self, status):
        print status

if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    stream.filter(track=['#testing'])
Here is the output in the terminal from one tweet:
{"created_at":"Thu Jul 02 18:59:13 +0000 2015","id":616682557896290306,"id_str":"616682557896290306","text":"I am #testing again","source":"\u003ca href=\"https:\/\/about.twitter.com\/products\/tweetdeck\" rel=\"nofollow\"\u003eTweetDeck\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":247344597,"id_str":"247344597","name":"Techniponi","screen_name":"techniponi","location":"Sugar Land, TX","url":"http:\/\/comeonandsl.am","description":"Internet Marketing Specialist for Wolf Beats, weekly DJ on PonyvilleFM (Sundays 4-5pm Central). I made music like a year ago. Skype wincam98","protected":false,"verified":false,"followers_count":110,"friends_count":187,"listed_count":3,"favourites_count":353,"statuses_count":806,"created_at":"Fri Feb 04 16:14:13 +0000 2011","utc_offset":-18000,"time_zone":"Central Time (US & Canada)","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/612795294728597504\/XISJ1ccp.png","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/612795294728597504\/XISJ1ccp.png","profile_background_tile":true,"profile_link_color":"3B94D9","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/612347971368148992\/Qeoo3RvD_normal.png","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/612347971368148992\/Qeoo3RvD_normal.png","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/247344597\/1431372460","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"testing","indices":[5,13]}],"trends":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1435863553867"}

I found the solution with the help of a friend. I forgot to try the json.loads() function - it worked with print json.loads(data)['text'].
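For anyone hitting the same issue, here is a minimal sketch of the modified listener (assuming the same Python 2.7 / Tweepy 3.3 setup and credentials as above; the field names come straight from the JSON dump shown earlier):

import json

from tweepy.streaming import StreamListener

class StdOutListener(StreamListener):
    def on_data(self, data):
        tweet = json.loads(data)            # parse the raw JSON string into a dict
        if 'text' in tweet:                 # some stream messages (e.g. delete notices) carry no 'text'
            print(tweet['text'])            # just the tweet body
            print(tweet['user']['name'])    # the poster's display name
        return True

    def on_error(self, status):
        print(status)

The rest of the script (authentication and stream.filter) stays exactly as in the question.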

Related

twitter streaming API using python 3

Is there any method to fetch tweets over a specific span of time, using the Twitter streaming API, in Python 3? I am working on a project to fetch tweets dated from April 2017 to June 2017, but all I get is real-time tweets. The following is my code in Python 3.6:
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

access_token = "####"
access_token_secret = "###"
consumer_key = "###"
consumer_secret = "####"

class StdOutListener(StreamListener):
    def on_data(self, data):
        print(data)
        return True

    def on_error(self, status):
        print(status)

if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    stream.filter(track=['earthquake', 'Mexico', '2017'])
What changes should I make to the above code?
Twitter's Streaming API returns only real-time tweets, so the answer is no. Use Twitter's REST API -- specifically, the search/tweets endpoint -- to get historical tweets. But that only gets you the last week's worth of tweets. To get the older tweets that you are interested in, you will need to pay for Twitter's Enterprise service.
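As a rough illustration, a sketch of that REST approach (this assumes Tweepy 3.x, where search/tweets is exposed as API.search; the credentials and query terms are placeholders):

import tweepy

consumer_key = "###"
consumer_secret = "###"
access_token = "###"
access_token_secret = "###"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# search/tweets only reaches back roughly one week, hence the limitation mentioned above
for status in tweepy.Cursor(api.search, q='earthquake Mexico', lang='en').items(100):
    print(status.created_at, status.text)

Anything older than that week requires the paid, enterprise-level access.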

Retrieve tweets tweepy

How can I retrieve only my own tweets with a stream? I tested this, but I don't see my tweets.
My first attempt:
streamingAPI = tweepy.streaming.Stream(auth, CustomStreamListener())
streamingAPI.userstream(_with='followings')
streamingAPI.filter()
My second attempt:
streamingAPI = tweepy.streaming.Stream(auth, CustomStreamListener())
streamingAPI.filter(follow= ['2466458114'])
Thanks a lot.
If you want to stream only the tweets from your own user, you can use the following lines:
from tweepy import StreamListener
from tweepy import Stream
import tweepy

consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

class CustomStreamListener(StreamListener):
    def on_data(self, data):
        print(data)

    def on_error(self, status):
        print(status)

if __name__ == '__main__':
    listener = CustomStreamListener()
    twitterStream = Stream(auth, listener)
    twitterStream.filter(follow=['2466458114'])
In your question, you said that you can't see your tweets. I don't know if this is clear or not, but just to be sure: with streaming you can only see "real time" tweets. So even with my code, if you don't tweet anything, you won't see anything.
UPDATE AFTER CHAT IN COMMENTS
Since Twitter's official API has the bothersome limitation of time constraints, you can't get tweets older than a week.
For this task I suggest you use the GetOldTweets Python library.
It lets you get as many tweets as you want, written whenever you want.
As the documentation says, you can simply use it this way:
tweetCriteria = got.manager.TweetCriteria().setUsername('<user_without_#>').setSince("2015-05-01").setUntil("2015-09-30")
If you are using Python 2.x you can use got; if you are using Python 3.x you can use got3.
I prepared an example in Python 3:
from getOldTweets import got3

tweetCriteria = got3.manager.TweetCriteria().setUsername('barackobama').setSince("2015-09-01").setUntil("2015-09-30")
tweets_list = got3.manager.TweetManager.getTweets(tweetCriteria)
for tweet in tweets_list:
    print(tweet.text)
Let me know.

Mine Tweets between two dates in Python

I would like to mine tweets for two keywords for a specific period of time. I currently have the code below, but how do I modify it so that it only mines tweets between two dates? (10/03/2016 - 10/07/2016) Thank you!
#Import the necessary methods from tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

#Variables that contain the user credentials to access the Twitter API
access_token = "ENTER YOUR ACCESS TOKEN"
access_token_secret = "ENTER YOUR ACCESS TOKEN SECRET"
consumer_key = "ENTER YOUR API KEY"
consumer_secret = "ENTER YOUR API SECRET"

#This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):
    def on_data(self, data):
        print data
        return True

    def on_error(self, status):
        print status

if __name__ == '__main__':
    #This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    #This line filters Twitter Streams to capture data by the keywords: 'python', 'javascript', 'ruby'
    stream.filter(track=['python', 'javascript', 'ruby'])
You can't. Have a look at this question; that is the closest you can get.
The Twitter API does not allow searching by time. Trivially, what you can do is fetch tweets and look at their timestamps afterwards in Python, but that is highly inefficient.
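For completeness, a sketch of that (inefficient) post-filtering idea, using a hypothetical helper tweet_in_window(): collect the streamed JSON as the code above already does, then keep only the tweets whose created_at falls inside the window. The date format is the one visible in the raw stream output (e.g. "Thu Jul 02 18:59:13 +0000 2015"), and the window below interprets 10/03/2016 - 10/07/2016 as October 3-7, 2016.

import json
from datetime import datetime

START = datetime(2016, 10, 3)
END = datetime(2016, 10, 7)

def tweet_in_window(data):
    # data is the raw JSON string handed to on_data()
    tweet = json.loads(data)
    if 'created_at' not in tweet:   # skip delete notices and other non-tweet messages
        return False
    created = datetime.strptime(tweet['created_at'], '%a %b %d %H:%M:%S +0000 %Y')
    return START <= created <= END

Note that a live stream only ever yields current tweets, so this filter is mainly useful on tweets you have already collected and stored.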

twitter crawling hashtag with api for using python

I want to crawl 10,000 tweets on Twitter that contain a particular hashtag, for example #love, and then collect all of the hashtags in each of those tweets.
For example, given a tweet like
[i am sleepy #boring #tired #sleep]
I want the crawled result to look like this:
"#boring" "#tired" "#sleep"
I hope that makes clear what I am trying to do.
I tried to crawl hashtags using the Twitter API for Python, but there are some errors.
My code follows:
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

#Variables that contains the user credentials to access Twitter API
access_token = "mytoken"
access_token_secret = "mytokenscret"
consumer_key = "consumerkey"
consumer_secret = "consumersecret"

class StdOutListener(StreamListener):
    def on_data(self, data):
        print data
        return True

    def on_error(self, status):
        print status

if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    stream.filter(track=['#happy'])
When I run this code, I get an error popup.
How do I fix this, and how do I crawl all of the hashtags from tweets that contain a particular hashtag?
I am using Python 3.3.4 on Windows 8.1 x64.
Please help me, and thanks for reading my question.
It seems you are using Python 3.0+, so you can't use print "Hello world"; you need to use print("Hello world"). Just change your print calls to use parentheses.
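Beyond the print fix, here is a rough sketch of printing only the hashtags of each matching tweet, relying on the entities.hashtags field visible in the sample JSON further up this page (Python 3 syntax; the HashtagListener name is just illustrative):

import json

from tweepy.streaming import StreamListener

class HashtagListener(StreamListener):
    def on_data(self, data):
        tweet = json.loads(data)
        # entities.hashtags is a list of {"text": ..., "indices": [...]} objects
        hashtags = tweet.get('entities', {}).get('hashtags', [])
        if hashtags:
            print(' '.join('#' + h['text'] for h in hashtags))
        return True

    def on_error(self, status):
        print(status)

Use this listener in place of StdOutListener and keep the same stream.filter(track=['#happy']) call.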

Using Tweepy to print tweets just from your friends

I've been trying to figure out Tweepy for the last 3 hours and I'm still stuck.
I would like to get all my friends' tweets for the period between September and October 2014, filtered down to the top 10 by number of retweets.
I'm only vaguely familiar with StreamListener; however, I think it gives a list of tweets in real time. I was wondering if I could go back to last month and grab those tweets from my friends. Can this be done through Tweepy? This is the code I have now.
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import csv

ckey = 'xyz'
csecret = 'xyz'
atoken = 'xyz'
asecret = 'xyz'

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)

class Listener(StreamListener):
    def on_data(self, data):
        print data
        return True

    def on_error(self, status):
        print status

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, Listener())

users = [123456, 7890019, 9919038] # this is the list of friends I would like to output
twitterStream.filter(users, since=09-01-2014, until=10-01-2014)
You are correct in that StreamListener returns real-time tweets. To get past tweets from specific users, you need to use Tweepy's API wrapper -- tweepy.API. An example, which would replace everything from your Listener class on down:
api = tweepy.API(auth)
tweetlist = api.user_timeline(id=123456)
This returns a list of up to 20 status objects. You can play with the parameters to get more results; count and since_id will probably be helpful for your implementation. I think the most you can ask for with a single count is 200 tweets.
P.S. Not a major issue, but you authenticate twice in your code, which is not necessary.
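Putting that together, a rough sketch (assuming Tweepy 3.x; the user IDs and date window come from the question, and the credentials are placeholders) that pulls each friend's recent tweets, keeps only September 2014, and prints the top 10 by retweet count:

from datetime import datetime
import tweepy

ckey, csecret, atoken, asecret = 'xyz', 'xyz', 'xyz', 'xyz'

auth = tweepy.OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
api = tweepy.API(auth)

users = [123456, 7890019, 9919038]
start, end = datetime(2014, 9, 1), datetime(2014, 10, 1)

collected = []
for user_id in users:
    # the timeline comes back newest-first, so stop paging once we pass the window
    for status in tweepy.Cursor(api.user_timeline, id=user_id, count=200).items():
        if status.created_at < start:
            break
        if status.created_at <= end:
            collected.append(status)

# top 10 across all friends, by retweet count
for status in sorted(collected, key=lambda s: s.retweet_count, reverse=True)[:10]:
    print(status.retweet_count, status.text)

Keep in mind that user_timeline only reaches back roughly 3,200 tweets per user, so sufficiently old tweets may simply be out of reach.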
