how to get whole user timeline of a specific twitter user - python

I came up with this script to get all of the tweets from one Twitter user:
import tweepy
from tweepy import OAuthHandler
import json
def load_api():
    consumer_key = ''
    consumer_secret = ''
    access_token = ''
    access_secret = ''
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    return tweepy.API(auth)
api = load_api()
user = 'allkpop'
tweets = api.user_timeline(id=user, count=2000)
print('Found %d tweets' % len(tweets))
tweets_text = [t.text for t in tweets]
filename = 'tweets-'+user+'.json'
json.dump(tweets_text, open(filename, 'w'))
print('Saved to file:', filename)
But when I run it, I can only get 200 tweets per request. Is there a way to get 2000 tweets, or at least more than 200?
Please help me, thank you.

The Twitter API has request limits. The call you're using corresponds to the Twitter statuses/user_timeline endpoint. The maximum number of tweets you can retrieve from this endpoint is documented as 3,200. Also note that there's a maximum number of requests in a 15-minute window, which limits how quickly you can page towards that maximum. Here are a couple of observations that might be interesting for you:
Documentation says that the max count is 200.
There's an include_rts (include retweets) parameter that might return more values. While it's part of the Twitter API, I can't see where Tweepy documents that.
You might try Tweepy Cursors to see if that will bring back more items.
Because of the 15 minute limits, you might be able to pause until the next 15 minute window to continue. That said, I don't know enough about your use case to know if this is practical or not.
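To make the paging idea above concrete, here is a minimal sketch of walking a timeline backwards with max_id, 200 tweets at a time, up to the 3,200 cap. The names fetch_timeline, fetch_page, FakeTweet and fake_fetch_page are hypothetical and exist only for illustration. The fake page function stands in for the real API call so the sketch runs without credentials, and the Tweepy wiring shown in the comment is an assumption, not tested code.

```python
from collections import namedtuple


def fetch_timeline(fetch_page, page_size=200, limit=3200):
    """Collect up to `limit` tweets by paging backwards with max_id.

    `fetch_page(count, max_id)` is a hypothetical callable returning a list
    of tweets (newest first), each with an integer `.id` attribute.
    """
    collected = []
    max_id = None
    while len(collected) < limit:
        count = min(page_size, limit - len(collected))
        page = fetch_page(count=count, max_id=max_id)
        if not page:
            break  # nothing older is available
        collected.extend(page)
        # Only ask for tweets strictly older than the oldest one seen so far.
        max_id = page[-1].id - 1
    return collected


# Offline stand-in for the API (assumption: real pages behave like this).
FakeTweet = namedtuple("FakeTweet", "id text")
_ARCHIVE = [FakeTweet(i, "tweet %d" % i) for i in range(1000, 0, -1)]


def fake_fetch_page(count, max_id=None):
    older = [t for t in _ARCHIVE if max_id is None or t.id <= max_id]
    return older[:count]


# With Tweepy, fetch_page might be wired up like this (untested assumption):
# fetch_page = lambda count, max_id: api.user_timeline(
#     screen_name='allkpop', count=count, max_id=max_id)
tweets = fetch_timeline(fake_fetch_page, limit=450)
print("Found %d tweets" % len(tweets))
```

A Tweepy Cursor should do roughly the same bookkeeping for you. Writing the loop by hand mainly helps if you want to pause between pages because of the 15-minute window.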

Related

Using Python and Tweepy - How to reply with set text each time a specific user tweets?

I am able to reply to a specific tweet by getting tweet IDs, but cannot get my configuration to do what I want it to do, which is to reply to every tweet from a specific user. I have that user's username and ID. Currently it appears to only be pulling one tweet, which I suspect has something to do with line 23's tweet.id. I guess what I'm looking for is a way to ensure that my bot replies every single time this user tweets. Here is my current code (sensitive info redacted)
from ast import For
import tweepy
api_key = "###############################################"
api_secret = "###############################################"
bearer_token = r"###############################################"
access_token = "###############################################"
access_token_secret = "###############################################"
client = tweepy.Client(bearer_token, api_key, api_secret, access_token, access_token_secret)
auth = tweepy.OAuth1UserHandler(api_key, api_secret, access_token, access_token_secret)
api = tweepy.API(auth)
toReply = "TwitterUsernameHere"
api = tweepy.API(auth)
tweets = api.user_timeline(screen_name = toReply, count=1)
for tweet in tweets:
    api.update_status("@" + toReply + " Why? ", in_reply_to_status_id = tweet.id)
Assuming that you are following the Twitter automation rules (i.e. that you're only replying to Tweets that the user has opted-in for your app to reply to - otherwise your user account or app will be restricted)...
... your code currently checks the user's Timeline, and then replies to the most recent single Tweet (count=1 on the user_timeline call). You would need this to check for new Tweets in order to reply to different ones. You could store tweet.id somewhere and only reply to it when it changes.
Note that there are a few other things to tidy up:
from ast import For is not required
client = tweepy.Client targets the Twitter API v2 but the rest of the code uses Twitter API v1.1 (via tweepy.API)
bearer_token is unused in this code and will only work for a read operation in v1.1 of the API so you could remove it.
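The "store tweet.id and only reply when it changes" suggestion could look roughly like the sketch below. It is only an illustration: pick_new_tweets is a hypothetical helper, the tweets are plain dicts rather than Tweepy Status objects, and it relies on tweet IDs increasing over time. With Tweepy you would probably pass the stored ID as since_id to user_timeline instead of filtering by hand, and persist it to a file or database between runs.

```python
def pick_new_tweets(timeline, last_seen_id):
    """Return (tweets newer than last_seen_id, oldest first; updated marker)."""
    fresh = [t for t in timeline if last_seen_id is None or t["id"] > last_seen_id]
    fresh.sort(key=lambda t: t["id"])  # reply in chronological order
    newest = fresh[-1]["id"] if fresh else last_seen_id
    return fresh, newest


# Two simulated polls of the same user's timeline (newest first).
first_poll = [{"id": 103, "text": "c"}, {"id": 102, "text": "b"}]
second_poll = [{"id": 105, "text": "e"}, {"id": 104, "text": "d"},
               {"id": 103, "text": "c"}]

# Pretend tweet 102 was already replied to on an earlier run.
to_reply, last_seen = pick_new_tweets(first_poll, last_seen_id=102)
to_reply_2, last_seen_2 = pick_new_tweets(second_poll, last_seen)

for tweet in to_reply + to_reply_2:
    # Here you would call api.update_status(..., in_reply_to_status_id=tweet["id"])
    print("would reply to", tweet["id"])
```

Running this in a loop with a sleep between polls, and within the automation rules mentioned above, replies once per new tweet rather than repeatedly to the latest one.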

'API' object has no attribute 'search' using Tweepy

I am trying to scrape Twitter profiles for a project I am doing. I have the following code
import tweepy
from tweepy import OAuthHandler
import pandas as pd
"""I like to have my python script print a message at the beginning. This helps me confirm whether everything is set up correctly. And it's nice to get an uplifting message ;)."""
print("You got this!")
access_token = ''
access_token_secret = ''
consumer_key = ''
consumer_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
tweets = []
count = 1
"""Twitter will automatically sample the last 7 days of data. Depending on how many total tweets there are with the specific hashtag, keyword, handle, or key phrase that you are looking for, you can set the date back further by adding since= as one of the parameters. You can also manually add in the number of tweets you want to get back in the items() section."""
for tweet in tweepy.Cursor(api.search, q="#BNonnecke", count=450, since='2020-02-28').items(50000):
    print(count)
    count += 1
    try:
        data = [tweet.created_at, tweet.id, tweet.text, tweet.user._json['screen_name'], tweet.user._json['name'], tweet.user._json['created_at'], tweet.entities['urls']]
        data = tuple(data)
        tweets.append(data)
    except tweepy.TweepError as e:
        print(e.reason)
        continue
    except StopIteration:
        break
df = pd.DataFrame(tweets, columns = ['created_at','tweet_id', 'tweet_text', 'screen_name', 'name', 'account_creation_date', 'urls'])
"""Add the path to the folder you want to save the CSV file in as well as what you want the CSV file to be named inside the single quotations"""
df.to_csv(path_or_buf = '/Users/Name/Desktop/FolderName/FileName.csv', index=False)
However, I keep getting the error "API" object has no attribute "search" from the line "for tweet in tweepy.Cursor(api.search, q="#BNonnecke", count=450, since='2020-02-28').items(50000):". I am not really sure why and don't know how to resolve this issue.
Thanks so much!
The latest version of Tweepy (v4 upwards) now has a search_tweets method instead of a search method. Check the documentation.
API.search_tweets(q, *, geocode, lang, locale, result_type, count, until, since_id, max_id, include_entities)
Also, read the comment in your code :-) The Search API has a 7 day history limit, so searching for Tweets since 2020-02-28 will only return Tweets posted in the 7 days before the date you run your code.
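If the same script has to run against both older and newer Tweepy installs, one option is to look the method up by name. This is a sketch, not anything from the Tweepy docs: get_search_method, OldApi and NewApi are hypothetical names, and the two stub classes only imitate the v3 and v4 method names so the example runs offline.

```python
def get_search_method(api):
    """Return api.search_tweets if it exists (Tweepy v4+), else api.search (v3)."""
    method = getattr(api, "search_tweets", None)
    if method is None:
        method = getattr(api, "search", None)
    if method is None:
        raise AttributeError("API object has neither search_tweets nor search")
    return method


# Stand-ins for the two API generations (illustration only).
class OldApi:
    def search(self, q):
        return "v3:" + q


class NewApi:
    def search_tweets(self, q):
        return "v4:" + q


print(get_search_method(OldApi())("#BNonnecke"))
print(get_search_method(NewApi())("#BNonnecke"))
```

With a real API object you would pass the result to the cursor, e.g. tweepy.Cursor(get_search_method(api), q=...). Other things may also have changed between major versions, so pinning the Tweepy version is probably the simpler fix.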

Python run script for every entry in a dictionary

I'm trying to write a simple python programme that uses the tweepy API for twitter and wget to retrieve the image link from a twitter post ID (Example: twitter.com/ExampleUsername/12345678), then download the image from the link. The actual programme works fine, but there is a problem. While it runs FOR every ID in the dictionary (if there are 2 IDs, it runs 2 times), it doesn't use every ID, so the script ends up looking at the last ID on the dictionary, then downloading the image from that same id however many times there is an ID in the dictionary. Does anyone know how to make the script run again for every ID?
tl;dr I want the programme to look at the first ID, grab its image link, download it, then do the same thing with the next ID until it's done all of the IDs.
#!/usr/bin/env python
# encoding: utf-8
import tweepy #https://github.com/tweepy/tweepy
import wget
#Twitter API credentials
consumer_key = "nice try :)"
consumer_secret = "nice try :)"
access_key = "nice try :)"
access_secret = "my, this joke is getting really redundant"
def get_all_tweets():
    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    id_list = [1234567890, 9876543210]
    # Hey StackOverflow, these are example ID's. They won't work as they're not real twitter ID's, so if you're gonna run this yourself, you'll want to find some twitter IDs on your own
    # tweets = api.statuses_lookup(id_list)
    for i in id_list:
        tweets = []
        tweets.extend(api.statuses_lookup(id_=id_list, include_entities=True))
        for tweet in tweets:
            spacefiller = (1+1)
            # this is here so the loop runs, if it doesn't the app breaks
        a = len(tweets)
        print(tweet.entities['media'][0]['media_url'])
        url = tweet.entities['media'][0]['media_url']
        wget.download(url)
get_all_tweets()
Thanks,
~CS
I figured it out!
I knew that loop was being used for something...
I moved everything from a = len(tweets) to wget.download(url) into the for tweet in tweets: loop, and removed the for i in id_list: loop.
Thanks to tdelany this programme works now! Thanks everyone!
Here's the new code if anyone wants it:
#!/usr/bin/env python
# encoding: utf-8
import tweepy #https://github.com/tweepy/tweepy
import wget
#Twitter API credentials
consumer_key = "nice try :)"
consumer_secret = "nice try :)"
access_key = "nice try :)"
access_secret = "my, this joke is getting really redundant"
def get_all_tweets():
    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    id_list = [1234567890, 9876543210]
    # Hey StackOverflow, these are example ID's. They won't work as they're not real twitter ID's, so if you're gonna run this yourself, you'll want to find some twitter IDs on your own
    tweets = []
    tweets.extend(api.statuses_lookup(id_=id_list, include_entities=True))
    for tweet in tweets:
        a = len(tweets)
        print(tweet.entities['media'][0]['media_url'])
        url = tweet.entities['media'][0]['media_url']
        wget.download(url)
get_all_tweets()
One strange thing I see is that the variable i declared in the outer loop is never used afterwards. Shouldn't your code be
tweets.extend(api.statuses_lookup(id_=i, include_entities=True))
and not id_=id_list as you wrote?
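Two small helpers might make this kind of script more robust. This is a sketch under a couple of assumptions: that the lookup endpoint accepts a limited number of IDs per request (100, as far as I know), and that tweets without images simply have no 'media' key in their entities. chunked and media_urls are hypothetical names, and the entities below are hand-written dicts, not real API responses.

```python
def chunked(ids, size=100):
    """Yield successive slices of `ids` with at most `size` elements each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def media_urls(entities):
    """Return every media_url in a tweet's entities dict, or [] if none."""
    return [m["media_url"] for m in entities.get("media", []) if "media_url" in m]


# Example IDs only; they are not real tweet IDs.
example_ids = list(range(1, 251))
batch_sizes = [len(batch) for batch in chunked(example_ids)]
print(batch_sizes)

with_image = {"media": [{"media_url": "http://example.com/a.jpg"}]}
text_only = {"urls": []}
print(media_urls(with_image))
print(media_urls(text_only))
```

Each batch would then go to a single api.statuses_lookup call, and each URL to wget.download, so a tweet without media no longer raises a KeyError.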

Random sampling tweets with tweepy

I'm trying to analyze tweets that have the hashtag #contentmarketing. I first tried grabbing 20,000 tweets with tweepy but ran into the rate limit. So I'd like to take a random sample instead (or a couple random samples).
I'm not really familiar with random sampling through an API call. If I had an array that already contained the data, I would take random indices from that array without replacement. However, I don't think I can create that array in the first place without the rate limit kicking in.
Can anyone enlighten me on how to access random tweets (or random data from an API, overall)?
For reference, here's the code that got me in rate limit purgatory:
import json
import tweepy
from tweepy import OAuthHandler
consumerKey = 'my-key'
consumerSecret = 'my-key'
accessToken = 'my-key'
accessSecret = 'my-key'
auth = OAuthHandler(consumerKey, consumerSecret)
auth.set_access_token(accessToken, accessSecret)
api = tweepy.API(auth)
tweets = []
for tweet in tweepy.Cursor(api.search, q='#contentmarketing', count=20000,
                           lang='en', since='2017-06-20').items():
    tweets.append(tweet._json)  # store the raw dict; Status objects are not JSON serializable
with open('content-tweets.json', 'w') as f:
    json.dump(tweets, f, sort_keys=True, indent=4)
This should stop the rate limit from kicking in, just make the following changes to your code:
api = tweepy.API(auth, wait_on_rate_limit=True)
I have never heard of a way to get random tweets. But you can keep collecting tweets indefinitely without ever getting all of them, which amounts to much the same thing.
With the public search API, you can make 450 requests within 15 minutes (app auth). So you can ask for 100 tweets every 2 seconds, and keep doing so indefinitely.
Then change the "count" parameter to 100, and add a time.sleep(2) :
import time
for page in tweepy.Cursor(api.search, q='#contentmarketing', count=100, lang='en', since='2017-06-20').pages():
    tweets.extend(page)
    time.sleep(2)  # one request (up to 100 tweets) every 2 seconds
Reference : https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
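The arithmetic behind the 2-second pause, plus the "random indices without replacement" idea from the question, can be sketched as follows. seconds_between_requests and sample_without_replacement are hypothetical helpers, and the list of integers stands in for tweets that have already been collected. Note that this samples from whatever the search returned, which is not the same as a random sample of all tweets with the hashtag.

```python
import random


def seconds_between_requests(max_requests, window_seconds=15 * 60):
    """Spacing that keeps `max_requests` calls inside one rate-limit window."""
    return window_seconds / max_requests


def sample_without_replacement(items, k, seed=None):
    """Pick up to k distinct items; a seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(items, min(k, len(items)))


delay = seconds_between_requests(450)  # 900 seconds / 450 requests
print(delay)

collected = list(range(20000))  # stand-in for collected tweets
subset = sample_without_replacement(collected, 500, seed=42)
print(len(subset))
```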

Managing Tweepy API Search

Please forgive me if this is a gross repeat of a question previously answered elsewhere, but I am lost on how to use the tweepy API search function. Is there any documentation available on how to search for tweets using the api.search() function?
Is there any way I can control features such as number of tweets returned, results type etc.?
The results seem to max out at 100 for some reason.
the code snippet I use is as follows
searched_tweets = self.api.search(q=query,rpp=100,count=1000)
I originally worked out a solution based on Yuva Raj's suggestion to use additional parameters in GET search/tweets - the max_id parameter in conjunction with the id of the last tweet returned in each iteration of a loop that also checks for the occurrence of a TweepError.
However, I discovered there is a far simpler way to solve the problem using a tweepy.Cursor (see tweepy Cursor tutorial for more on using Cursor).
The following code fetches the most recent 1000 mentions of 'python'.
import tweepy
# assuming twitter_authentication.py contains each of the 4 oauth elements (1 per line)
from twitter_authentication import API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET
auth = tweepy.OAuthHandler(API_KEY, API_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
query = 'python'
max_tweets = 1000
searched_tweets = [status for status in tweepy.Cursor(api.search, q=query).items(max_tweets)]
Update: in response to Andre Petre's comment about potential memory consumption issues with tweepy.Cursor, I'll include my original solution, replacing the single statement list comprehension used above to compute searched_tweets with the following:
searched_tweets = []
last_id = -1
while len(searched_tweets) < max_tweets:
    count = max_tweets - len(searched_tweets)
    try:
        new_tweets = api.search(q=query, count=count, max_id=str(last_id - 1))
        if not new_tweets:
            break
        searched_tweets.extend(new_tweets)
        last_id = new_tweets[-1].id
    except tweepy.TweepError as e:
        # depending on TweepError.code, one may want to retry or wait
        # to keep things simple, we will give up on an error
        break
There's a problem in your code. Based on Twitter Documentation for GET search/tweets,
The number of tweets to return per page, up to a maximum of 100. Defaults to 15. This was
formerly the "rpp" parameter in the old Search API.
Your code should be,
CONSUMER_KEY = '....'
CONSUMER_SECRET = '....'
ACCESS_KEY = '....'
ACCESS_SECRET = '....'
auth = tweepy.auth.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)
search_results = api.search(q="hello", count=100)
for i in search_results:
    # Do Whatever You need to print here
    print(i.text)
The other questions are old and the API changed a lot.
Easy way, with Cursor (see the Cursor tutorial). Pages returns a list of elements (You can limit how many pages it returns. .pages(5) only returns 5 pages):
for page in tweepy.Cursor(api.search, q='python', count=100, tweet_mode='extended').pages():
    # process status here
    process_page(page)
Where q is the query, count is how many results each request brings back (100 is the maximum per request), and tweet_mode='extended' returns the full text (without it, the text is truncated to 140 characters). RTs are still truncated, as confirmed by jaycech3n.
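Since retweets stay truncated even in extended mode, a small helper along these lines can recover the complete text. It is a sketch built on two assumptions: that extended-mode statuses expose full_text instead of text, and that a retweet carries the original tweet under retweeted_status. The name full_text and the SimpleNamespace objects are stand-ins for illustration, not real Status objects.

```python
from types import SimpleNamespace


def full_text(status):
    """Best-effort untruncated text for a status, including retweets."""
    retweeted = getattr(status, "retweeted_status", None)
    # For a retweet, read the text from the original tweet instead.
    source = retweeted if retweeted is not None else status
    text = getattr(source, "full_text", None)
    if text is None:
        # Fall back to the compat-mode attribute.
        text = getattr(source, "text", "")
    return text


retweet = SimpleNamespace(
    full_text="RT @someone: the complete orig...",
    retweeted_status=SimpleNamespace(full_text="the complete original text"),
)
plain = SimpleNamespace(full_text="hello")
legacy = SimpleNamespace(text="short")

print(full_text(retweet))
print(full_text(plain))
print(full_text(legacy))
```

For a retweet this returns the original tweet's text without the "RT @user:" prefix, which may or may not be what you want.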
If you don't want to use tweepy.Cursor, you need to pass max_id to fetch the next chunk.
last_id = None
while True:
    result = api.search(q='python', count=100, tweet_mode='extended', max_id=last_id)
    if not result:
        break  # no more results; also avoids indexing into an empty list
    process_result(result)
    # we subtract one to not have the same again.
    last_id = result[-1]._json['id'] - 1
I am working on extracting Twitter data around a location (here, around India) for all tweets that include a specific keyword or a list of keywords.
import tweepy
import credentials ## all my twitter API credentials are in this file, which should be in the same directory as this script
## set API connection
auth = tweepy.OAuthHandler(credentials.consumer_key,
                           credentials.consumer_secret)
auth.set_access_token(credentials.access_token,
                      credentials.access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True) # set wait_on_rate_limit =True; as twitter may block you from querying if it finds you exceeding some limits
search_words = ["#covid19", "2020", "lockdown"]
date_since = "2020-05-21"
tweets = tweepy.Cursor(api.search, q=search_words,
                       geocode="20.5937,78.9629,3000km",
                       lang="en", since=date_since).items(10)
## the geocode is for India; format for geocode="latitude,longitude,radius"
## radius should be in miles or km
for tweet in tweets:
    print("created_at: {}\nuser: {}\ntweet text: {}\ngeo_location: {}".
          format(tweet.created_at, tweet.user.screen_name, tweet.text, tweet.user.location))
    print("\n")
## tweet.user.location will give you the general location of the user and not the particular location for the tweet itself, as it turns out, most of the users do not share the exact location of the tweet
Results:
created_at: 2020-05-28 16:48:23
user: XXXXXXXXX
tweet text: RT #Eatala_Rajender: Media Bulletin on status of positive cases #COVID19 in Telangana. (Dated. 28.05.2020)
# TelanganaFightsCorona
# StayHom…
geo_location: Hyderabad, India
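As far as I know, the search endpoint's q parameter is a single query string, so a list of keywords generally has to be joined before it is passed in. The sketch below shows one way to build both the query and the geocode string. build_query and build_geocode are hypothetical helpers, and the choice between OR (any keyword) and a space (all keywords) depends on what you actually want to match.

```python
def build_query(keywords, match_all=False):
    """Join keywords into one search string: OR for any, space for all."""
    joiner = " " if match_all else " OR "
    return joiner.join(keywords)


def build_geocode(latitude, longitude, radius, unit="km"):
    """Format a geocode as 'latitude,longitude,radius' with a km or mi unit."""
    if unit not in ("km", "mi"):
        raise ValueError("unit must be 'km' or 'mi'")
    return "{},{},{}{}".format(latitude, longitude, radius, unit)


search_words = ["#covid19", "2020", "lockdown"]
query = build_query(search_words)
geocode = build_geocode(20.5937, 78.9629, 3000)
print(query)
print(geocode)
```

The resulting strings would then be passed as q=query and geocode=geocode in the Cursor call.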
You can search for tweets containing a specific string as shown below:
tweets = api.search('Artificial Intelligence', count=100)
