I am trying to gather the tweets of a user navalny, from 01.11.2017 to 31.01.2018 using tweepy. I have ids of the first and last tweets that I need, so I tried the following code:
import tweepy
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
t = api.user_timeline(screen_name='navalny', since_id = 933000445307518976, max_id = 936533580481814529)
However, the returned value is an empty list.
What is the problem here?
Are there any restrictions on the history of tweets that I can get?
What are possible solutions?
Quick answer:
Using Tweepy you can only retrieve the last 3200 tweets from the Twitter REST API for a given user.
Unfortunately the tweets you are trying to access are older than this.
Detailed answer:
I did a check using the code below:
import tweepy
from tweepy import OAuthHandler
def tweet_check(user):
"""
Scrapes a users most recent tweets
"""
# API keys and initial configuration
consumer_key = ""
consumer_secret = ""
access_token = ""
access_secret = ""
# Configure authentication
authorisation = OAuthHandler(consumer_key, consumer_secret)
authorisation.set_access_token(access_token, access_secret)
api = tweepy.API(authorisation)
# Requests most recent tweets from a users timeline
tweets = api.user_timeline(screen_name=user, count=2,
max_id=936533580481814529)
for tweet in tweets:
tid = tweet.id
print(tid)
twitter_users = ["#navalny"]
for twitter_user in twitter_users:
tweet_check(twitter_user)
This test returns nothing before 936533580481814529
Using a seperate script I scraped all 3200 tweets, the max Twitter will let you scrape and the youngest tweet id I can find is 943856915536326662
Seems like you have run into Twitter's tweet scraping limit for user timelines here.
Related
So I have written code to pull tweets on certain key words and send it to an excel document. I am trying to get it to work with the premium sandbox but cannot figure out how. Any insight?
I have:
-a developer account
-a registered application
-a developer environment set up
what else do I need to do to get this to work? Core code is as follows:
##import library
import os
import tweepy as tw
###variables###
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
##set values for keys
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)
search_list = ["apples oranges -filter:retweets"]
search_words = search_list[sc]
date_since = "2020-07-01"
##set search words and search date limit
def gettweet():
tweets = tw.Cursor(api.search,
q=search_words,
lang="en", since=date_since,until="2020-07-16",tweet_mode="extended").items(50)
#finds tweets. can filter our retweets if wanted
#,until="2020-07-08"
all_tweets = [[tweet.user.screen_name, tweet.user.location, tweet.created_at, tweet.full_text] for tweet in tweets]
print(all_tweets)
#generates list containin username and location
gettweet()
From this I hope to return a dataframe containing tweets containing the keywords 'apples' or 'oranges'. I want these tweets to be from 30 days ago (hence my using the premium sandbox for this)
I'm trying to retrive Tweets that particular accounts has posted. I do use
user_timeline parameter from the tweepy library, but it includes also replies from the concrete Twitter user. Does anyone has a clue how to omit them?
Code:
import tweepy
consumer_key = key
consumer_secret = key
access_key = key
access_secret = key
def get_tweets(username):
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
#set count to however many tweets you want; twitter only allows 200 at once
number_of_tweets = 20
#get tweets
tweets = api.user_timeline(screen_name = username,count = number_of_tweets)
#create array of tweet information: username, tweet id, date/time, text
tweets_for_csv = [[username,tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in tweets]
print(str(tweets_for_csv))
Pass exclude_replies as a kwarg.
tweets = api.user_timeline(screen_name=username, count=number_of_tweets, exclude_replies=True)
See Twitters API documentation for a full list of kwargs you can pass.
I want to get the all of a user tweets from one Twitter user and so far this is what I came up with:
import twitter
import json
import sys
import tweepy
from tweepy.auth import OAuthHandler
CONSUMER_KEY = ''
CONSUMER_SECRET= ''
OAUTH_TOKEN=''
OAUTH_TOKEN_SECRET = ''
auth = twitter.OAuth(OAUTH_TOKEN,OAUTH_TOKEN_SECRET,CONSUMER_KEY,CONSUMER_SECRET)
twitter_api =twitter.Twitter(auth=auth)
print twitter_api
statuses = twitter_api.statuses.user_timeline(screen_name='#realDonaldTrump')
print [status['text'] for status in statuses]
Please ignore the unnecessary imports. One problem is that this only gets a user's recent tweets (or the first 20 tweets). Is it possible to get all of a users tweet? To my knowledge, the GEt_user_timeline (?) only allows a limit of 3200. Is there a way to get at least 3200 tweets? What am I doing wrong?
There's a few issues with your code, including some superfluous imports. Particularly, you don't need to import twitter and import tweepy - tweepy can handle everything you need. The particular issue you are running into is one of pagination, which can be handled in tweepy using a Cursor object like so:
import tweepy
# Consumer keys and access tokens, used for OAuth
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
# OAuth process, using the keys and tokens
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# Creation of the actual interface, using authentication
api = tweepy.API(auth)
for status in tweepy.Cursor(api.user_timeline, screen_name='#realDonaldTrump', tweet_mode="extended").items():
print(status.full_text)
The code below was provided to another user who was scraping the "friends" (not followers) list of a specific Twitter user. For some reason, I get an error when using "api.lookup_users". The error states "Too many terms specified in query". Ideally, I would like to scrape the followers and output a csv with the screen names (not ids). I would like their descriptions as well, but can do this in a separate step unless there is a suggestion for pulling both pieces of information. Below is the code that I am using:
import time
import tweepy
import csv
#Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""
auth = tweepy.auth.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
ids = []
for page in tweepy.Cursor(api.friends, screen_name="").pages():
ids.extend(page)
time.sleep(60)
print(len(ids))
users = api.lookup_users(user_ids=ids) #iterates through the list of users and prints them
for u in users:
print(u.screen_name)
From the error you get, it seems that you are putting too many ids at once in the api.lookup_users request. Try splitting you list of ids in smaller parts and make a request for each part.
import time
import tweepy
import csv
#Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""
auth = tweepy.auth.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
ids = []
for page in tweepy.Cursor(api.friends, screen_name="").pages():
ids.extend(page)
time.sleep(60)
print(len(ids))
idChunks = [ids[i:i + 300] for i in range(0, len(ids), 300)]
users = []
for idChunk in idChunks:
try:
users.append(api.lookup_users(user_ids=idChunk))
except tweepy.error.RateLimitError:
print("RATE EXCEDED. SLEEPING FOR 16 MINUTES")
time.sleep(16*60)
users.append(api.lookup_users(user_ids=idChunk))
for u in users:
print(u.screen_name)
print(u.description)
This code has not been tested, and does not writes the CSV, but it should help you getting past that error you were having. The size of 300 for the chunks is completely arbitrary, adjust it if it is too small (or too big).
I am creating a Twitter bot that will follow the creator of a given status.
I have the status ID (tweet ID), but I need to grab the user ID of the user who posted the tweet in order to follow them. How can I get this? I am using the Twyton package.
You should use the request statuses/show/:id as specified in the Twitter REST API
In Twython, you should call show_status like this:
from twython import Twython
# define application keys here
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
twitter = Twython(consumer_key, consumer_secret, access_token, access_token_secret)
status = twitter.show_status(id='tweet_id')
print status['user']['id_str']