I'm trying to retrieve the tweets that particular accounts have posted. I use the
user_timeline method from the tweepy library, but it also includes that user's replies. Does anyone have a clue how to omit them?
Code:
import tweepy
consumer_key = key
consumer_secret = key
access_key = key
access_secret = key
def get_tweets(username):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    #set count to however many tweets you want; twitter only allows 200 at once
    number_of_tweets = 20
    #get tweets
    tweets = api.user_timeline(screen_name = username,count = number_of_tweets)
    #create array of tweet information: username, tweet id, date/time, text
    tweets_for_csv = [[username,tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in tweets]
    print(str(tweets_for_csv))
Pass exclude_replies as a kwarg.
tweets = api.user_timeline(screen_name=username, count=number_of_tweets, exclude_replies=True)
See Twitter's API documentation for a full list of kwargs you can pass.
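One caveat, as far as I understand Twitter's documentation: count is applied before replies are filtered out, so exclude_replies=True may return fewer than count tweets. If you would rather filter on your side, here is a minimal sketch. The helper name drop_replies is hypothetical, and the sample objects are stand-ins for the Status objects that api.user_timeline returns; the only assumption is that each status has an in_reply_to_status_id attribute.

```python
from types import SimpleNamespace


def drop_replies(tweets):
    """Keep only statuses that are not replies to another status."""
    return [t for t in tweets if getattr(t, "in_reply_to_status_id", None) is None]


# Stand-in objects for illustration; with Tweepy you would pass the
# result of api.user_timeline(...) instead.
sample = [
    SimpleNamespace(id=1, in_reply_to_status_id=None),  # original tweet
    SimpleNamespace(id=2, in_reply_to_status_id=1),     # reply to tweet 1
]
original_only = drop_replies(sample)
print([t.id for t in original_only])
```

Filtering locally lets you request more tweets than you need and trim afterwards, at the cost of extra API calls.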
Related
Let's say I have the ID of a tweet that is a reply to another tweet. How can I get the ID of the parent tweet using tweepy in Python?
First, use tweepy to get the tweet you have the id for. If it is a reply, its in_reply_to_status_id property will be set (otherwise it will be None).
import tweepy
from config import *
# OAuth process, using the keys and tokens
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
tweet_id = 1556276424662831104
tweet = api.get_status(tweet_id)
print(tweet.in_reply_to_status_id)  # prints the parent status id
parent_tweet = api.get_status(tweet.in_reply_to_status_id)
print(parent_tweet)
print(parent_tweet.in_reply_to_status_id)  # prints None if the parent is not itself a reply
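If the parent may itself be a reply, the same lookup can be repeated until in_reply_to_status_id is None. The following is only a sketch: find_root_status and its max_hops guard are hypothetical names, and FakeAPI is a stand-in so the example runs without credentials. With Tweepy you would pass your real api object, and get_status may raise if a tweet in the chain was deleted or is protected.

```python
from types import SimpleNamespace


def find_root_status(api, status_id, max_hops=50):
    """Follow in_reply_to_status_id links until a status that is not a reply."""
    status = api.get_status(status_id)
    hops = 0
    while status.in_reply_to_status_id is not None and hops < max_hops:
        status = api.get_status(status.in_reply_to_status_id)
        hops += 1
    return status


class FakeAPI:
    """Stand-in for tweepy.API: maps a status id to its parent id."""

    def __init__(self, parents):
        self._parents = parents

    def get_status(self, status_id):
        return SimpleNamespace(
            id=status_id,
            in_reply_to_status_id=self._parents.get(status_id),
        )


fake = FakeAPI({3: 2, 2: 1})  # 3 replies to 2, 2 replies to 1
root = find_root_status(fake, 3)
print(root.id)
```

Each hop costs one API request, so long threads count against the rate limit.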
I am trying to retrieve tweets from Trump's Twitter account with the Twitter API.
However, I am not getting the maximum of 3200 tweets with the code below. When I try another screen name, I do get 3200 tweets, but for this account I only get 100-200 tweets (the number differs each time). The code I am using is as follows:
import tweepy
import json
access_token = xxx
access_token_secret = xxx
consumer_key = xxx
consumer_secret = xxx
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
screen_name = "realdonaldtrump"
data = []
for tweets in tweepy.Cursor(api.user_timeline, screen_name = screen_name).pages():
    for tweet in tweets:
        print(tweet.text)
        data.append(tweet._json)
filename = screen_name + "_tweets.json"
with open(filename, "w") as outfile:
    json.dump(data, outfile)
I have a list of more than 100 tweet IDs, and I want to get all the retweets for each of them. The code I am using works for a single tweet ID. How can I pass it the whole list, check whether each tweet has retweets, and print the retweeting users?
# import the module
import tweepy
# assign the values accordingly
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
# authorization of consumer key and consumer secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# set access to user's access key and access secret
auth.set_access_token(access_token, access_token_secret)
# calling the api
api = tweepy.API(auth)
# the ID of the tweet
ID = 1265889240300257280
# getting the retweeters
retweets_list = api.retweets(ID)
# printing the screen names of the retweeters
for retweet in retweets_list:
    print(retweet.user.screen_name)
Can anyone help me?
For getting Retweets from a list of Tweets, you'll need to iterate over your list of Tweet IDs and call the api.retweets function for each one in turn.
If your Tweets themselves have more than 100 Retweets, you'll hit a limitation in the API.
Per the Tweepy documentation:
API.retweets(id[, count])
Returns up to 100 of the first retweets of the given tweet.
The Twitter API itself only supports retrieving up to 100 Retweets; see the API documentation (this is the same API that Tweepy is calling):
GET statuses/retweets/:id
Returns a collection of the 100 most recent retweets of the Tweet specified by the id parameter.
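A rough sketch of that loop follows. The helper name retweeters_for is hypothetical, and FakeAPI is a stand-in so the example runs without credentials; with Tweepy you would pass your real api object. As far as I know, Tweepy 4.x renamed this method from retweets to get_retweets, so the sketch looks for either name.

```python
from types import SimpleNamespace


def retweeters_for(api, tweet_ids):
    """Map each tweet ID to the screen names of its (up to 100) retweeters."""
    # Assumption: the method is called `retweets` in Tweepy 3.x and
    # `get_retweets` in Tweepy 4.x; use whichever the object provides.
    fetch = getattr(api, "get_retweets", None) or getattr(api, "retweets")
    result = {}
    for tweet_id in tweet_ids:
        retweets = fetch(tweet_id)
        result[tweet_id] = [rt.user.screen_name for rt in retweets]
    return result


class FakeAPI:
    """Stand-in for tweepy.API that returns canned retweets."""

    _data = {10: ["alice", "bob"], 20: []}

    def retweets(self, tweet_id):
        return [
            SimpleNamespace(user=SimpleNamespace(screen_name=name))
            for name in self._data[tweet_id]
        ]


by_tweet = retweeters_for(FakeAPI(), [10, 20])
for tweet_id, names in by_tweet.items():
    print(tweet_id, names if names else "no retweets")
```

With a real API object each tweet ID costs one request, so for a long list you will probably want wait_on_rate_limit=True when constructing tweepy.API.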
This works for me:
for retweet in retweets_list:
    print(retweet.retweet_count)
I have written code to pull tweets matching certain keywords and send them to an Excel document. I am trying to get it to work with the premium sandbox but cannot figure out how. Any insight?
I have:
- a developer account
- a registered application
- a developer environment set up
What else do I need to do to get this to work? The core code is as follows:
##import library
import os
import tweepy as tw
###variables###
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
##set values for keys
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)
search_list = ["apples oranges -filter:retweets"]
search_words = search_list[0]  # use the first query in the list
date_since = "2020-07-01"
##set search words and search date limit
def gettweet():
    tweets = tw.Cursor(api.search,
                       q=search_words,
                       lang="en", since=date_since, until="2020-07-16",
                       tweet_mode="extended").items(50)
    #finds tweets. can filter out retweets if wanted
    #,until="2020-07-08"
    all_tweets = [[tweet.user.screen_name, tweet.user.location, tweet.created_at, tweet.full_text] for tweet in tweets]
    print(all_tweets)
    #generates list containing username, location, timestamp and text
gettweet()
From this I hope to get a dataframe of tweets containing the keywords 'apples' or 'oranges'. I want these tweets to go back up to 30 days (hence the premium sandbox).
I am trying to gather the tweets of the user navalny from 01.11.2017 to 31.01.2018 using tweepy. I have the IDs of the first and last tweets that I need, so I tried the following code:
import tweepy
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
t = api.user_timeline(screen_name='navalny', since_id = 933000445307518976, max_id = 936533580481814529)
However, the returned value is an empty list.
What is the problem here?
Are there any restrictions on the history of tweets that I can get?
What are possible solutions?
Quick answer:
Using Tweepy you can only retrieve the last 3200 tweets from the Twitter REST API for a given user.
Unfortunately the tweets you are trying to access are older than this.
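To see where that limit bites, you can page backwards through the timeline with max_id until an empty page comes back. This is only a sketch: fetch_timeline is a hypothetical helper, and FakeAPI is a stand-in so the example runs without credentials. With a real tweepy.API object the loop should stop after roughly 3200 tweets, no matter how old the tweets you are after.

```python
from types import SimpleNamespace


def fetch_timeline(api, screen_name, page_size=200):
    """Page backwards through a user timeline until no more tweets come back."""
    collected = []
    max_id = None
    while True:
        kwargs = {"screen_name": screen_name, "count": page_size}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = api.user_timeline(**kwargs)
        if not page:
            break
        collected.extend(page)
        # max_id is inclusive, so step just below the oldest ID seen so far.
        max_id = min(t.id for t in page) - 1
    return collected


class FakeAPI:
    """Stand-in for tweepy.API serving a fixed set of tweet IDs, newest first."""

    def __init__(self, ids):
        self._ids = sorted(ids, reverse=True)

    def user_timeline(self, screen_name, count, max_id=None):
        eligible = [i for i in self._ids if max_id is None or i <= max_id]
        return [SimpleNamespace(id=i) for i in eligible[:count]]


tweets = fetch_timeline(FakeAPI(range(1, 8)), "someone", page_size=3)
print([t.id for t in tweets])
```

The last element of the result is the oldest tweet the endpoint will give you; anything older than that is out of reach through user_timeline.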
Detailed answer:
I did a check using the code below:
import tweepy
from tweepy import OAuthHandler
def tweet_check(user):
    """
    Scrapes a user's most recent tweets
    """
    # API keys and initial configuration
    consumer_key = ""
    consumer_secret = ""
    access_token = ""
    access_secret = ""
    # Configure authentication
    authorisation = OAuthHandler(consumer_key, consumer_secret)
    authorisation.set_access_token(access_token, access_secret)
    api = tweepy.API(authorisation)
    # Requests most recent tweets from a user's timeline
    tweets = api.user_timeline(screen_name=user, count=2,
                               max_id=936533580481814529)
    for tweet in tweets:
        tid = tweet.id
        print(tid)
twitter_users = ["navalny"]
for twitter_user in twitter_users:
    tweet_check(twitter_user)
This test returns nothing at or before 936533580481814529.
Using a separate script I scraped all 3200 tweets, the maximum Twitter will let you scrape, and the oldest tweet ID I could find is 943856915536326662. Tweet IDs increase over time, so that tweet is newer than the max_id requested above.
It seems you have run into Twitter's tweet scraping limit for user timelines here.
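You can check this yourself by decoding the IDs. Assuming the usual snowflake layout (the millisecond timestamp sits above the low 22 bits, offset by Twitter's epoch of 1288834974657 ms), a small sketch converts an ID to a date. The helper name snowflake_to_datetime is hypothetical.

```python
from datetime import datetime, timezone

# Assumption: standard Twitter snowflake epoch, in milliseconds since the Unix epoch.
TWITTER_EPOCH_MS = 1288834974657


def snowflake_to_datetime(tweet_id):
    """Decode the creation time embedded in a snowflake tweet ID (UTC)."""
    timestamp_ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


requested_max = snowflake_to_datetime(936533580481814529)     # max_id from the question
oldest_reachable = snowflake_to_datetime(943856915536326662)  # oldest ID from the scrape
print(requested_max.date(), oldest_reachable.date())
```

If the first date is earlier than the second, the requested range lies entirely outside the window that user_timeline can return.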