twitter api retweet exclude - python

So i currently trying to mine tweets from Twitter account(s), but i wanted to exclude the retweets so i can get 200 of Tweets only data for my project. Currently I have a working code to mine the data feed, but still have Re-Tweets included. I have founded that to exclude Re-Tweets you need to put
-RT in the code but i simply do not know where since i am pretty new to programming.
(Currently using Twitter API for Python (Tweepy) with Python 3.6 using Spyder.)
import tweepy
from tweepy import OAuthHandler
import pandas as pd
consumer_key = 'consumer_key'
consumer_secret = 'consumer_secret'
access_token = 'access_token'
access_secret = 'access_secret'
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)
api = tweepy.API(auth)
screen_name='screen_name'
tweets = api.user_timeline(screen_name, count=200)
save=['']*len(tweets)
for i in range(len(tweets)):
save[i]=tweets[i].text
print(tweets[i].text)
data = pd.DataFrame(save)
data.to_csv("results.csv")
Can anyone help me, preferrably with complete section for the code to remove the Retweets. Thank you very much

Faced the same issue back when i was using tweepy to retrieve tweets from twitter, what worked for me was that i used the twitter's api with inbuilt request i.e. http requests.
To exclude retweets you could pass -RT operator in query parameter .
Documentation to this api .

Change this line in your code:
tweets = api.user_timeline(screen_name, count=200)
to the following:
tweets = api.user_timeline(screen_name, count=200, include_rts=False)
This Twitter doc may be helpful: https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html

Related

Get old tweets by user using tweepy

I am trying to gather the tweets of a user navalny, from 01.11.2017 to 31.01.2018 using tweepy. I have ids of the first and last tweets that I need, so I tried the following code:
import tweepy
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
t = api.user_timeline(screen_name='navalny', since_id = 933000445307518976, max_id = 936533580481814529)
However, the returned value is an empty list.
What is the problem here?
Are there any restrictions on the history of tweets that I can get?
What are possible solutions?
Quick answer:
Using Tweepy you can only retrieve the last 3200 tweets from the Twitter REST API for a given user.
Unfortunately the tweets you are trying to access are older than this.
Detailed answer:
I did a check using the code below:
import tweepy
from tweepy import OAuthHandler
def tweet_check(user):
"""
Scrapes a users most recent tweets
"""
# API keys and initial configuration
consumer_key = ""
consumer_secret = ""
access_token = ""
access_secret = ""
# Configure authentication
authorisation = OAuthHandler(consumer_key, consumer_secret)
authorisation.set_access_token(access_token, access_secret)
api = tweepy.API(authorisation)
# Requests most recent tweets from a users timeline
tweets = api.user_timeline(screen_name=user, count=2,
max_id=936533580481814529)
for tweet in tweets:
tid = tweet.id
print(tid)
twitter_users = ["#navalny"]
for twitter_user in twitter_users:
tweet_check(twitter_user)
This test returns nothing before 936533580481814529
Using a seperate script I scraped all 3200 tweets, the max Twitter will let you scrape and the youngest tweet id I can find is 943856915536326662
Seems like you have run into Twitter's tweet scraping limit for user timelines here.

Twitter user_timeline not returning enough tweets

I'm currently using the GET statuses/user_timeline twitter API in python to retrieve tweets, It says in the docs This method can only return up to 3,200 of a user’s most recent Tweets however when i run it, it only returns 100-150.
How do i get it to return more tweets than it is already?
Thanks in advance
You'll have to write code to work with timelines. Twitter's Working with timelines documentation has a discussion of how and why Twitter timelines work the way they do.
Essentially, set your count to the maximum amount (200) and use since_id and max_id to manage reading each page. Alternatively, you can use an existing library to make the task much easier. Here's an example, using the tweepy library.
consumer_key = "<your consumer_key>"
consumer_secret = "<your consumer_secret>"
access_token = "<your access_token>"
access_token_secret = "<your access_token_secret>"
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
for status in tweepy.Cursor(api.user_timeline, "JoeMayo").items():
print('status_id: {}, text: {}'.format(status.id, status.text.encode('utf-8')))

Is there a way to search for Top tweets with tweepy instead of latest tweets?

It seem tweepy's search function only accesses the latest tweets. Is there a way to switch it so it searches the top tweets of a given query? Is there a workaround to retrieve the popular tweets if search cannot do this? I'm running OS X and python 3.6.
Thanks in advance.
Along with your query topic, pass a result_type='popular' parameter to tweepy's search function. Although the tweepy documentation does not list this in its parameters, it is an available parameter in the Twitter Dev Docs.
popular_tweets = api.search(q='python', result_type='popular') # e.g. "python"
popular : return only the most popular results in the response.
This should work (Add your Keys)
import tweepy
consumer_key =""
consumer_secret =""
access_token =""
access_token_secret =""
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
tweets = api.search(q='keyword', result_type='popular')

Getting whole user timeline of a Twitter user

I want to get the all of a user tweets from one Twitter user and so far this is what I came up with:
import twitter
import json
import sys
import tweepy
from tweepy.auth import OAuthHandler
CONSUMER_KEY = ''
CONSUMER_SECRET= ''
OAUTH_TOKEN=''
OAUTH_TOKEN_SECRET = ''
auth = twitter.OAuth(OAUTH_TOKEN,OAUTH_TOKEN_SECRET,CONSUMER_KEY,CONSUMER_SECRET)
twitter_api =twitter.Twitter(auth=auth)
print twitter_api
statuses = twitter_api.statuses.user_timeline(screen_name='#realDonaldTrump')
print [status['text'] for status in statuses]
Please ignore the unnecessary imports. One problem is that this only gets a user's recent tweets (or the first 20 tweets). Is it possible to get all of a users tweet? To my knowledge, the GEt_user_timeline (?) only allows a limit of 3200. Is there a way to get at least 3200 tweets? What am I doing wrong?
There's a few issues with your code, including some superfluous imports. Particularly, you don't need to import twitter and import tweepy - tweepy can handle everything you need. The particular issue you are running into is one of pagination, which can be handled in tweepy using a Cursor object like so:
import tweepy
# Consumer keys and access tokens, used for OAuth
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
# OAuth process, using the keys and tokens
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# Creation of the actual interface, using authentication
api = tweepy.API(auth)
for status in tweepy.Cursor(api.user_timeline, screen_name='#realDonaldTrump', tweet_mode="extended").items():
print(status.full_text)

How to use Tweepy set_access_token?

So I am a complete beginner to Tweepy, and I was trying to get started from their tutorial with the following code:
import tweepy
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
public_tweets = api.home_timeline()
for tweet in public_tweets:
print tweet.text
Before the OAuthHandler, I added the consumer_key, consumer_secret, etc as something like:
consumer_key = 'aslkjfdalskdjflsjlfsakdjflasjdkf'
consumer_secret = 'ldsjlfksajldjflasjdljflasjdf'
access_token = 'asldkfjasjdfalsjdflksajkfdlasd'
acess_token_secret = 'alskjdflksajdlfjsalfd'
Of course, with the proper values. however, when I run the whole example, I get nothing printed out. What am I doing wrong? I understand I am supposed to get a list of tweets back. Am I not supposed to pass the values in as a string? I tried looking for tweepy documentation regarding OAuthHandler but didn't really come up with anything.
Any help would be greatly appreciated, thanks!!
For anyone else that might have this problem, it was a very simple issue– I created a new account and didn't have anything in my timeline. Naturally, nothing was printing.

Categories