I've been trying to figure out Tweepy for the last 3 hours and I'm still stuck.
I would like to get all my friends' tweets for the period between September and October 2014, filtered down to the 10 with the most retweets.
I'm only vaguely familiar with StreamListener; as far as I can tell, it returns a stream of tweets in real time. I was wondering if I could go back to last month and grab those tweets from my friends. Can this be done through Tweepy? This is the code I have now.
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import csv
ckey = 'xyz'
csecret = 'xyz'
atoken = 'xyz'
asecret = 'xyz'
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)

class Listener(StreamListener):
    def on_data(self, data):
        print data
        return True

    def on_error(self, status):
        print status

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)

twitterStream = Stream(auth, Listener())
users = [123456, 7890019, 9919038]  # this is the list of friends I would like to output
twitterStream.filter(users, since='09-01-2014', until='10-01-2014')
You are correct in that StreamListener returns real-time tweets. To get past tweets from specific users, you need to use Tweepy's REST API wrapper, tweepy.API. Here's an example, which would replace everything from your Listener class on down:
api = tweepy.API(auth)
tweetlist = api.user_timeline(id=123456)
This returns a list of up to 20 Status objects. You can adjust the parameters to get more results; count and since_id will probably be helpful for your implementation. The most you can ask for with a single request is 200 tweets.
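As a rough sketch (reusing the auth object you already build, with the user IDs and dates from your question), you could page through each friend's timeline, keep only the September 2014 tweets, and then sort by retweet_count. Keep in mind that user_timeline only reaches back roughly 3,200 tweets per account, so very active friends may no longer have their 2014 tweets available this way:

import datetime

import tweepy

# Sketch only: page through each friend's timeline, keep tweets from
# September 2014, then take the 10 most-retweeted ones.
api = tweepy.API(auth)

start = datetime.datetime(2014, 9, 1)
end = datetime.datetime(2014, 10, 1)
users = [123456, 7890019, 9919038]  # your friends' user IDs

collected = []
for user_id in users:
    # Timelines come back newest-first, so stop once we're older than the window.
    for status in tweepy.Cursor(api.user_timeline, user_id=user_id, count=200).items():
        if status.created_at < start:
            break
        if start <= status.created_at < end:
            collected.append(status)

top10 = sorted(collected, key=lambda s: s.retweet_count, reverse=True)[:10]
for s in top10:
    print('%d retweets: %s' % (s.retweet_count, s.text))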
P.S. Not a major issue, but you authenticate twice in your code, which is not necessary.
So I came up with this script to get all of the tweets from one Twitter user:
import tweepy
from tweepy import OAuthHandler
import json
def load_api():
    consumer_key = ''
    consumer_secret = ''
    access_token = ''
    access_secret = ''
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    return tweepy.API(auth)

api = load_api()
user = 'allkpop'
tweets = api.user_timeline(id=user, count=2000)
print('Found %d tweets' % len(tweets))
tweets_text = [t.text for t in tweets]
filename = 'tweets-' + user + '.json'
json.dump(tweets_text, open(filename, 'w'))
print('Saved to file:', filename)
But when I run it, I can only get 200 tweets per request. Is there a way to get 2000 tweets, or at least more than 200?
Please help me, thank you.
The Twitter API has request limits. The one you're using corresponds to the Twitter statuses/user_timeline endpoint. The max number that you can get for this endpoint is documented as 3,200. Also note that there's a max number of requests in a 15-minute window, which might explain why you're only getting 2000, instead of the max. Here are a couple observations that might be interesting for you:
Documentation says that the max count is 200.
There's an include_rts (include retweets) parameter that might return more values. While it's part of the Twitter API, I can't see where Tweepy documents that.
You might try Tweepy Cursors to see if that will bring back more items; there's a sketch of that approach after this list.
Because of the 15 minute limits, you might be able to pause until the next 15 minute window to continue. That said, I don't know enough about your use case to know if this is practical or not.
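As a rough sketch of the Cursor idea (assuming the auth object you build in load_api and the 'allkpop' account from your script), you can let Tweepy wait out the rate-limit windows while it pages toward the cap:

import tweepy

# Sketch only: wait_on_rate_limit tells Tweepy to sleep through the 15-minute
# rate-limit windows instead of raising, so the Cursor can keep paging until
# it approaches the documented 3,200-tweet cap.
api = tweepy.API(auth, wait_on_rate_limit=True)

tweets_text = []
for status in tweepy.Cursor(api.user_timeline,
                            screen_name='allkpop',
                            count=200,           # the maximum allowed per request
                            include_rts=True).items(3200):
    tweets_text.append(status.text)

print('Found %d tweets' % len(tweets_text))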
How can I retrieve only my own tweets with a stream? I tested the attempts below, but I don't see my tweets.
My first attempt:
streamingAPI = tweepy.streaming.Stream(auth, CustomStreamListener())
streamingAPI.userstream(_with='followings')
streamingAPI.filter()
My second attempt:
streamingAPI = tweepy.streaming.Stream(auth, CustomStreamListener())
streamingAPI.filter(follow= ['2466458114'])
Thanks a lot.
If you want to stream only tweets from your own user, you can use the following lines:
from tweepy import StreamListener
from tweepy import Stream
import tweepy
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
class CustomStreamListener(StreamListener):
    def on_data(self, data):
        print(data)

    def on_error(self, status):
        print(status)

if __name__ == '__main__':
    listener = CustomStreamListener()
    twitterStream = Stream(auth, listener)
    twitterStream.filter(follow=['2466458114'])
In your question, you said that you can't see your tweets. I don't know if this is already clear, but just to be sure: with streaming you can only see tweets posted in real time. So even with my code, if you don't tweet anything while it is running, you won't see anything.
UPDATE AFTER CHAT IN COMMENTS
Since the official Twitter API has the bothersome limitation that it doesn't reach back more than about a week, you can't get tweets older than that.
For this task I suggest you use the GetOldTweets Python library.
It lets you get as many tweets as you want, posted whenever you want.
As its documentation says, you can simply use it this way:
tweetCriteria = got.manager.TweetCriteria().setUsername('<user_without_#>').setSince("2015-05-01").setUntil("2015-09-30")
If you are using Python 2.x you can use got; if you are using Python 3.x, use got3.
I prepared an example in Python 3:
from getOldTweets import got3
tweetCriteria = got3.manager.TweetCriteria().setUsername('barackobama').setSince("2015-09-01").setUntil("2015-09-30")
tweets_list = got3.manager.TweetManager.getTweets(tweetCriteria)
for tweet in tweets_list:
    print(tweet.text)
Let me know.
I would like to mine tweets for two keywords for a specific period of time. I currently have the code below, but how do I change it so that it only mines tweets between two dates? (10/03/2016 - 10/07/2016) Thank you!
#Import the necessary methods from tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
#Variables that contains the user credentials to access Twitter API
access_token = "ENTER YOUR ACCESS TOKEN"
access_token_secret = "ENTER YOUR ACCESS TOKEN SECRET"
consumer_key = "ENTER YOUR API KEY"
consumer_secret = "ENTER YOUR API SECRET"
#This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):
    def on_data(self, data):
        print data
        return True

    def on_error(self, status):
        print status

if __name__ == '__main__':
    # This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    # This line filters Twitter Streams to capture data by the keywords: 'python', 'javascript', 'ruby'
    stream.filter(track=['python', 'javascript', 'ruby'])
You can't. Have a look at this question; that is the closest you can get.
The Twitter API does not allow you to search by time. Trivially, what you can do is fetch tweets and look at their timestamps afterwards in Python, but that is highly inefficient.
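A minimal sketch of that "look at timestamps afterwards" idea, assuming you already have raw tweet JSON strings (for example the data argument of on_data) and only care about the 10/03/2016 - 10/07/2016 window:

import datetime
import json

# Sketch only: parse a tweet's created_at field and check whether it falls
# inside the desired window. This is only useful once you have tweets that
# cover the period, since the stream delivers tweets posted right now.
WINDOW_START = datetime.datetime(2016, 10, 3)
WINDOW_END = datetime.datetime(2016, 10, 7)

def in_window(raw_data):
    tweet = json.loads(raw_data)
    created = datetime.datetime.strptime(tweet['created_at'],
                                         '%a %b %d %H:%M:%S +0000 %Y')
    return WINDOW_START <= created <= WINDOW_END

Inside on_data you would call in_window(data) and only keep tweets for which it returns True.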
I want to crawl 10,000 tweets on Twitter that contain a particular word as a hashtag, for example #love, and then collect every hashtag in each of those tweets.
For example, suppose one tweet looks like this:
[i am sleepy #boring #tired #sleep]
I want to crawl that data and see a result like this:
"#boring" "#tired" "#sleep"
I hope that makes sense.
I tried to crawl hashtags using the Twitter API for Python, but there are some errors.
My code is the following:
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
#Variables that contains the user credentials to access Twitter API
access_token = "mytoken"
access_token_secret = "mytokenscret"
consumer_key = "consumerkey"
consumer_secret = "consumersecret"
class StdOutListener(StreamListener):
    def on_data(self, data):
        print data
        return True

    def on_error(self, status):
        print status

if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    stream.filter(track=['#happy'])
When I run this code, a popup like this appears.
How do I fix this, and how do I crawl all of the hashtags in every tweet that contains a particular hashtag?
I am using Python 3.3.4 on Windows 8.1 64-bit.
Please help me.
Thanks for reading my question.
It seems you are using Python 3.0+, so you can't use print "Hello world"; you need print("Hello world"). Just change your print calls to use parentheses.
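Once the print calls are fixed, here is a rough sketch of the hashtag part of your question (the credential strings are placeholders for your own keys): it parses each streamed tweet's entities field and prints every hashtag the tweet contains.

import json

from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener

# Placeholders: fill in your own credentials.
access_token = "mytoken"
access_token_secret = "mytokensecret"
consumer_key = "consumerkey"
consumer_secret = "consumersecret"

class HashtagListener(StreamListener):
    def on_data(self, data):
        tweet = json.loads(data)
        # entities.hashtags holds one {'text': ..., 'indices': ...} dict per hashtag
        tags = [h['text'] for h in tweet.get('entities', {}).get('hashtags', [])]
        print(' '.join('#' + t for t in tags))  # e.g. #boring #tired #sleep
        return True

    def on_error(self, status):
        print(status)

if __name__ == '__main__':
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, HashtagListener())
    stream.filter(track=['#happy'])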
Question:
I am attempting to experiment with Twitter's streaming API through Tweepy. I have gotten the example code to run, which filters the stream based on certain keywords and dumps the entire block of JSON information to stdout for each of these tweets.
Being new to JSON and the twitter API, I do not know how to extract a certain attribute - say, the name of the poster or the actual text of the tweet - into a string.
I have determined that the JSON output that goes to the stdout is a Unicode object, and I have no idea how to access the various elements within the JSON.
I am using Python 2.7.9 (should I upgrade to 3.x?) and Tweepy 3.3.0. The code below is a mostly-unmodified version of a random streaming API tutorial I found by Googling.
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json
consumer_key = "IS A SECWET"
consumer_secret = "IS A SECWET"
access_token = "IS A SECWET"
access_token_secret = "IS A SECWET"
class StdOutListener(StreamListener):
    def on_data(self, data):
        print data  # here I would like to print ONLY the tweet's text, not the entire JSON dump
        return True

    def on_error(self, status):
        print status

if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    stream.filter(track=['#testing'])
Here is the output in the terminal from one tweet:
{"created_at":"Thu Jul 02 18:59:13 +0000 2015","id":616682557896290306,"id_str":"616682557896290306","text":"I am #testing again","source":"\u003ca href=\"https:\/\/about.twitter.com\/products\/tweetdeck\" rel=\"nofollow\"\u003eTweetDeck\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":247344597,"id_str":"247344597","name":"Techniponi","screen_name":"techniponi","location":"Sugar Land, TX","url":"http:\/\/comeonandsl.am","description":"Internet Marketing Specialist for Wolf Beats, weekly DJ on PonyvilleFM (Sundays 4-5pm Central). I made music like a year ago. Skype wincam98","protected":false,"verified":false,"followers_count":110,"friends_count":187,"listed_count":3,"favourites_count":353,"statuses_count":806,"created_at":"Fri Feb 04 16:14:13 +0000 2011","utc_offset":-18000,"time_zone":"Central Time (US & Canada)","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/612795294728597504\/XISJ1ccp.png","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/612795294728597504\/XISJ1ccp.png","profile_background_tile":true,"profile_link_color":"3B94D9","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/612347971368148992\/Qeoo3RvD_normal.png","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/612347971368148992\/Qeoo3RvD_normal.png","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/247344597\/1431372460","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"testing","indices":[5,13]}],"trends":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1435863553867"}
I found the solution with the help of a friend. I forgot to try the json.loads() function - it worked with print json.loads(data)['text'].
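For anyone else landing here, a minimal sketch of how that looks inside the listener (Python 2.7, as in the question):

import json

from tweepy.streaming import StreamListener

class StdOutListener(StreamListener):
    def on_data(self, data):
        # data is a JSON string; json.loads turns it into a dict
        tweet = json.loads(data)
        print(tweet['text'])                   # just the tweet's text
        print(tweet['user']['screen_name'])    # the poster's screen name
        return True

    def on_error(self, status):
        print(status)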