Why can't tweepy retrieve media data for some tweets? - python

I'm using tweepy to develop a program that retrieves media URLs and downloads them. While testing some tweets, I found something weird. This is what I did:
import tweepy
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)
status = api.get_status(908827394856947712)
The original URL of this tweet is 'https://twitter.com/realDonaldTrump/status/908827394856947712' and the tweet DOES contain an image. While studying status._json, I figured out that the links to media files are contained in either status._json['entities'] or status._json['extended_entities'], but I couldn't find ['extended_entities'], and ['entities'] doesn't contain the image link.
What annoys me is that some tweets have this problem while most do not. Why does this happen, and how can I solve it?

If you look through the response, you will see "truncated": true.
Twitter recently changed how tweets are presented; see their documentation: https://dev.twitter.com/overview/api/upcoming-changes-to-tweets
With your request, you need to set tweet_mode='extended'.
So: api.get_status('908827394856947712', tweet_mode='extended')
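Once the tweet is requested in extended mode, the media links should show up under extended_entities. Below is a minimal sketch of pulling them out of status._json, assuming the v1.1 layout in which each media item carries a media_url_https field. The helper name media_urls and the sample dictionary are made up for illustration and are not real API output.

```python
def media_urls(tweet_json):
    """Return media URLs from a tweet's JSON dict (hypothetical helper)."""
    # Prefer extended_entities, which lists every attached media item;
    # fall back to entities if extended_entities has no media.
    for key in ("extended_entities", "entities"):
        media = tweet_json.get(key, {}).get("media")
        if media:
            return [m["media_url_https"] for m in media if "media_url_https" in m]
    return []


# Made-up sample shaped like status._json (assumption, not real output).
sample = {
    "entities": {"hashtags": []},
    "extended_entities": {
        "media": [
            {"type": "photo", "media_url_https": "https://example.com/a.jpg"},
            {"type": "photo", "media_url_https": "https://example.com/b.jpg"},
        ]
    },
}

print(media_urls(sample))
```

With tweepy, this would be called as media_urls(status._json) on the status returned by the extended-mode request.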

Related

twitter api retweet exclude

I'm currently trying to mine tweets from Twitter account(s), but I want to exclude retweets so that I get 200 original tweets for my project. I have working code to mine the feed, but it still includes retweets. I have found that to exclude retweets you need to put -RT in the code, but I simply do not know where, since I am pretty new to programming.
(Currently using the Twitter API for Python (Tweepy) with Python 3.6 in Spyder.)
import tweepy
from tweepy import OAuthHandler
import pandas as pd
consumer_key = 'consumer_key'
consumer_secret = 'consumer_secret'
access_token = 'access_token'
access_secret = 'access_secret'
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)
screen_name='screen_name'
tweets = api.user_timeline(screen_name, count=200)
# Collect the text of each tweet, printing it along the way
save = []
for tweet in tweets:
    save.append(tweet.text)
    print(tweet.text)
data = pd.DataFrame(save)
data.to_csv("results.csv")
Can anyone help me, preferably with a complete section of code that removes the retweets? Thank you very much.
I faced the same issue when I was using tweepy to retrieve tweets from Twitter. What worked for me was calling Twitter's API directly with plain HTTP requests.
To exclude retweets, you can pass the -RT operator in the query parameter.
Documentation for this API.
Change this line in your code:
tweets = api.user_timeline(screen_name, count=200)
to the following:
tweets = api.user_timeline(screen_name, count=200, include_rts=False)
This Twitter doc may be helpful: https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html
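Note that, according to the Twitter documentation linked above, retweets are still counted towards count even when include_rts is false, so the call may return fewer than 200 tweets. If retweets ever need to be dropped on the client side instead, here is a small sketch. It assumes that a retweet carries a retweeted_status attribute, as tweepy Status objects do. The helper name original_texts and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def original_texts(statuses):
    """Return the text of every status that is not a retweet."""
    return [s.text for s in statuses if not hasattr(s, "retweeted_status")]


# Stand-ins for tweepy Status objects (made-up data for illustration).
fake_timeline = [
    SimpleNamespace(text="my own tweet"),
    SimpleNamespace(text="RT @someone: hello", retweeted_status=SimpleNamespace()),
    SimpleNamespace(text="another original"),
]

print(original_texts(fake_timeline))
```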

Twitter user_timeline not returning enough tweets

I'm currently using the GET statuses/user_timeline Twitter API in Python to retrieve tweets. The docs say "This method can only return up to 3,200 of a user’s most recent Tweets"; however, when I run it, it only returns 100-150.
How do I get it to return more tweets?
Thanks in advance.
You'll have to write code to work with timelines. Twitter's Working with timelines documentation has a discussion of how and why Twitter timelines work the way they do.
Essentially, set your count to the maximum amount (200) and use since_id and max_id to manage reading each page. Alternatively, you can use an existing library to make the task much easier. Here's an example, using the tweepy library.
import tweepy
consumer_key = "<your consumer_key>"
consumer_secret = "<your consumer_secret>"
access_token = "<your access_token>"
access_token_secret = "<your access_token_secret>"
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
for status in tweepy.Cursor(api.user_timeline, "JoeMayo").items():
    print('status_id: {}, text: {}'.format(status.id, status.text.encode('utf-8')))
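For readers who prefer to manage the pages by hand, the max_id approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not tweepy's actual implementation: fetch_page is a hypothetical caller-supplied function that returns one page of tweets (dicts with an integer "id", newest first), and fake_fetch simulates a 450-tweet timeline so the loop can run without network access.

```python
def fetch_all(fetch_page, page_size=200):
    """Collect tweets page by page, walking backwards with max_id.

    fetch_page(count, max_id) is a hypothetical callable returning a list
    of dicts with an integer "id", ordered newest first.
    """
    collected = []
    max_id = None
    while True:
        page = fetch_page(count=page_size, max_id=max_id)
        if not page:
            break
        collected.extend(page)
        # max_id is inclusive, so ask next for tweets strictly older
        # than the oldest one seen so far.
        max_id = page[-1]["id"] - 1
    return collected


# Simulated timeline: ids 450 down to 1, newest first (made-up data).
FAKE_TIMELINE = [{"id": i} for i in range(450, 0, -1)]


def fake_fetch(count, max_id=None):
    older = [t for t in FAKE_TIMELINE if max_id is None or t["id"] <= max_id]
    return older[:count]


tweets = fetch_all(fake_fetch)
print(len(tweets))
```

Subtracting one from the oldest id avoids receiving the same tweet again at the top of the next page.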

Using Tweepy Documentation to find first x followers

Messing around with tweepy. I am creating a desktop application, no callbacks. I want to find the first x followers of a user. Here's what I have so far:
consumer_key='...'
consumer_secret='...'
access_token='...'
access_token_secret='...'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
Basically enough to authenticate my application. Reading the documentation, under the 'User methods' section, I see API.followers returns followers 100 at a time. Is there a way to restrict that to x? Like first 5 followers?
Thank you.
api.followers('screen_name')[0:5]
This will return the first 5 followers for the specified 'screen_name' profile.
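Slicing works, but it fetches a full page of followers before discarding most of them. A general way to take only the first x items from any iterator is itertools.islice. The sketch below uses a made-up generator in place of a real follower iterator; first_n and fake_followers are hypothetical names.

```python
from itertools import islice


def first_n(items, n):
    """Return at most n elements from any iterable, leaving the rest unread."""
    return list(islice(items, n))


def fake_followers():
    # Stand-in for an iterator of follower objects; yields made-up names.
    for i in range(1, 1000):
        yield "user{}".format(i)


print(first_n(fake_followers(), 5))
```

With tweepy, passing a limit to a cursor, as in tweepy.Cursor(api.followers, screen_name='screen_name').items(5), should have a similar effect.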

Using tweepy to follow people tweeting a specific hashtag

This is one of my first Python projects. I'm using Tweepy to search for a specific hashtag and follow the people tweeting that hashtag. I don't understand why this doesn't work; I've also tried appending followers to a list, but that did nothing either. I've read the tweepy docs and this is what I've come up with:
import tweepy
import time
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
for follower in tweepy.Cursor(api.search, q="#kenbone").items():
    api.create_friendship(screen_name = follower)
    print(follower)
The screen_name is part of the author attribute; this works for me:
api.create_friendship(screen_name = follower.author.screen_name)
What you are getting in your loop, the variable follower, is a user object with a lot of information about the user, and not just a name as you seem to believe. To get the screen name of a user object, use follower.screen_name:
api.create_friendship(screen_name = follower.screen_name)
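Because a search yields one result per tweet, the same account can appear several times. One way to avoid repeated follow requests is to collect each screen name once before following. The sketch below assumes each result exposes author.screen_name, as in the first answer; the helper name and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def unique_screen_names(statuses):
    """Return each author's screen name once, in first-seen order."""
    seen = set()
    names = []
    for status in statuses:
        name = status.author.screen_name
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Stand-ins for search results (made-up data for illustration).
fake_results = [
    SimpleNamespace(author=SimpleNamespace(screen_name="alice")),
    SimpleNamespace(author=SimpleNamespace(screen_name="bob")),
    SimpleNamespace(author=SimpleNamespace(screen_name="alice")),
]

print(unique_screen_names(fake_results))
```

Each name in the returned list could then be passed to api.create_friendship(screen_name=name).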

Exclude retweets using tweepy's API.search function

I am using the tweepy library in python to search for tweets that contain a certain word. Retrieving all tweets results in a long list, which also includes a lot of retweets. I want to exclude these retweets. The following code works, but now each tweet is processed (also the retweets), which is not ideal considering the rate limit:
import json
import tweepy
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
query = 'test'
max_tweets = 100
for tweet in tweepy.Cursor(api.search, q=query).items(max_tweets):
    jsontweet = json.dumps(tweet._json)
    jsontweet = json.loads(jsontweet)
    if 'retweeted_status' not in jsontweet:
        print(tweet)
Is there a way to specify within my search request that retweets should not be included? I found in this post that I could add include_rts = False to my code, but I do not know where, or whether it also works for the API.search function. I was unable to find in the tweepy documentation how to pass this parameter to this function.
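As far as the Twitter documentation goes, include_rts is a parameter of the user_timeline endpoint rather than of search. One commonly used option for search is to add the -filter:retweets operator to the query string itself, so that retweets are excluded on Twitter's side. The sketch below only builds such a query; the helper name exclude_retweets is hypothetical, and how the operator behaves for a given account or API tier is an assumption worth checking.

```python
def exclude_retweets(query):
    """Append the -filter:retweets search operator unless already present."""
    operator = "-filter:retweets"
    if operator in query:
        return query
    return "{} {}".format(query, operator)


print(exclude_retweets("test"))
```

The result would then be passed as the query, for example tweepy.Cursor(api.search, q=exclude_retweets(query)).items(max_tweets).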
