I want to store the IDs of some tweets using tweepy - python

This code prints the IDs but also raises a TypeError
for tweet in client.search_recent_tweets(search_string):
    for tweet_id in tweet:
        print(tweet_id['id'])
Simply printing the tweet gives the following data:
Response(data=[<Tweet id=#ID text='#text'>], includes={}, errors=[], meta={'newest_id': '#ID of first tweet', 'oldest_id': '#ID of last tweet', 'result_count': 10, 'next_token': '#Token no.'})
I basically want to extract the Tweet IDs

I don't understand how your double loop is supposed to work.
Anyway, you can see that the tweets are in the response.data, so simply iterate through it:
response = client.search_recent_tweets(search_string)  # Get the API response
tweets = response.data                                  # Tweets are the data
for tweet in tweets:                                    # Iterate through the tweets
    print(tweet.id)                                     # You can now access their id

Here is the documentation on how to get tweet fields: https://docs.tweepy.org/en/stable/examples.html
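If you also want fields besides the ID (for example the creation time), those examples show passing tweet_fields. A minimal sketch, assuming the same client and search_string as above:
# Request an extra field and collect the IDs into a list
response = client.search_recent_tweets(search_string, tweet_fields=["created_at"])
tweet_ids = [tweet.id for tweet in (response.data or [])]  # response.data can be None if nothing matched
print(tweet_ids)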

Related

I want to search a file of tweets to find the most popular hashtags used

For a python project I have been asked to collect tweets over a certain period of time about a certain topic. I now have a file with hundreds of tweets. How do I search for most popular hashtags in that file to create a word cloud?
Let us suppose that your corpus is stored as a list and all the special characters have already been removed. I am using functions from sklearn, pandas, and wordcloud:
from sklearn.feature_extraction.text import TfidfVectorizer
from wordcloud import WordCloud
import pandas as pd

corpus = ['the text of your tweet', 'quote in it']
vectorizer = TfidfVectorizer(stop_words='english')
v = vectorizer.fit_transform(corpus)
names = vectorizer.get_feature_names()  # on newer scikit-learn versions, use get_feature_names_out()
dense = v.todense()
final_list = dense.tolist()
df = pd.DataFrame(final_list, columns=names)
Cloud = WordCloud(background_color="white", max_words=50).generate_from_frequencies(df.T.sum(axis=1))
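To actually display or save the image, the usual pattern is to hand the cloud to matplotlib; a minimal sketch, assuming the Cloud object generated above:
import matplotlib.pyplot as plt

plt.imshow(Cloud, interpolation='bilinear')  # render the generated word cloud
plt.axis('off')                              # hide the axes
plt.show()                                   # or plt.savefig('cloud.png')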
I will suppose that you have the ID of each tweet
You need to send a GET request to this URL:
"
https://twitter.com/i/api/graphql/6n-3uwmsFr53-5z_w5FTVw/TweetDetail?variables=%7B%22focalTweetId<YOUR_TWEET_ID> with_rux_injections%22%3Afalse%2C%22includePromotedContent%22%3Atrue%2C%22withCommunity%22%3Atrue%2C%22withQuickPromoteEligibilityTweetFields%22%3Atrue%2C%22withBirdwatchNotes%22%3Afalse%2C%22withSuperFollowsUserFields%22%3Atrue%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%2C%22withSuperFollowsTweetFields%22%3Atrue%2C%22withVoice%22%3Atrue%2C%22withV2Timeline%22%3Atrue%2C%22__fs_responsive_web_like_by_author_enabled%22%3Afalse%2C%22__fs_dont_mention_me_view_api_enabled%22%3Atrue%2C%22__fs_interactive_text_enabled%22%3Atrue%2C%22__fs_responsive_web_uc_gql_enabled%22%3Afalse%2C%22__fs_responsive_web_edit_tweet_api_enabled%22%3Afalse%7D
"
Note: the URL doesn't look great because of the line breaks, but hopefully you get the idea.
The very first parameter is focalTweetId, which is the ID of the tweet. This API call will return a data object where you'll find all the information about the tweet.
const response = await (await fetch(url)).json()  // parse the JSON body
console.log(response.data.instructions[0].entries[0].content.itemContent.tweet_results.result.legacy.retweet_count)
I did this in JavaScript; in Python you can do the same with
response = requests.get(url)
# ...
This will return the retweet_count, and there is a lot of other useful data you can use.
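A rough Python equivalent of the JavaScript above (a sketch only: it assumes the same url and the response structure used in that example, and it omits the authentication headers/cookies the internal endpoint normally requires):
import requests

response = requests.get(url)   # send the GET request
payload = response.json()      # parse the JSON body
# Same field path as in the JavaScript example above
print(payload['data']['instructions'][0]['entries'][0]['content']['itemContent']['tweet_results']['result']['legacy']['retweet_count'])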

How to remove @user, hashtags, and links from tweet text and put it into a dataframe in Python

I'm a beginner at Python and I'm trying to gather data from Twitter using the API. I want to gather the username, date, and the clean tweet text without @usernames, hashtags and links, and then put it into a dataframe.
I found a way to achieve this by using: ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",tweet.text).split()) but when I implement it in my code, it returns NameError: name 'tweet' is not defined.
Here is my code:
tweets = tw.Cursor(api.search, q=keyword, lang="id", since=date).items()
raw_tweet = ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",tweet.text).split())
data_tweet = [[tweet.user.screen_name, tweet.created_at, raw_tweet] for tweet in tweets]
dataFrame = pd.DataFrame(data=data_tweet, columns=['user', "date", "tweet"])
I know the problem is in the data_tweet, but I don't know how to fix it. Please help me
Thank you.
The problem is actually in the second line:
raw_tweet = ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",tweet.text).split())
Here, you are using tweet.text. However, you have not defined what tweet is yet, only tweets. Also, from reading your third line where you actually define tweet:
for tweet in tweets
I'm assuming you want tweet to be the value you get while iterating through tweets.
So what you have to do is to run both lines through an iterator together, assuming my earlier hypothesis is correct.
So:
for tweet in tweets:
    raw_tweet = ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",tweet.text).split())
    data_tweet = [[tweet.user.screen_name, tweet.created_at, raw_tweet]]
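To end up with the dataframe from the question, you would typically accumulate the rows inside the loop. A minimal sketch, assuming api, keyword, and date are set up exactly as in the question (and the older tweepy api.search endpoint it uses):
import re
import pandas as pd
import tweepy as tw

# Assumes an authenticated api object plus keyword and date, as in the question
tweets = tw.Cursor(api.search, q=keyword, lang="id", since=date).items()

data_tweet = []
for tweet in tweets:
    # Strip @usernames, other special characters, and links from the text
    raw_tweet = ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+://\S+)", " ", tweet.text).split())
    data_tweet.append([tweet.user.screen_name, tweet.created_at, raw_tweet])

dataFrame = pd.DataFrame(data=data_tweet, columns=['user', 'date', 'tweet'])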
You can also use a regex to remove any words that start with '@' (usernames) or 'http' (links) in a pre-defined function and apply the function to the pandas dataframe column:
import re

def remove_usernames_links(tweet):
    tweet = re.sub(r'@[^\s]+', '', tweet)
    tweet = re.sub(r'http[^\s]+', '', tweet)
    return tweet

df['tweet'] = df['tweet'].apply(remove_usernames_links)
If you encounter an "expected string or bytes-like object" error, then just use:
import re

def remove_usernames_links(tweet):
    tweet = re.sub(r'@[^\s]+', '', str(tweet))
    tweet = re.sub(r'http[^\s]+', '', str(tweet))
    return tweet

df['tweet'] = df['tweet'].apply(remove_usernames_links)
Credit: https://www.datasnips.com/59/remove-usernames-http-links-from-tweet-data/
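For illustration, a quick check on a made-up tweet string (the sample text is mine, not from the question):
sample = "Thanks @someone for the link https://example.com #python"
print(remove_usernames_links(sample))  # -> 'Thanks  for the link  #python'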

Tweet Strings via Tweepy

I'm using tweepy to automatically tweet a list of URLs. However, if my list is too long (it can vary from tweet to tweet), I'm not allowed to post it. Is there any way that tweepy can create a thread of tweets when the content is too long? My tweepy code looks like this:
import tweepy

def get_api(cfg):
    auth = tweepy.OAuthHandler(cfg['consumer_key'],
                               cfg['consumer_secret'])
    auth.set_access_token(cfg['access_token'],
                          cfg['access_token_secret'])
    return tweepy.API(auth)

def main():
    # Fill in the values noted in previous step here
    cfg = {
        "consumer_key": "VALUE",
        "consumer_secret": "VALUE",
        "access_token": "VALUE",
        "access_token_secret": "VALUE"
    }
    api = get_api(cfg)
    tweet = "Hello, world!"
    status = api.update_status(status=tweet)
    # Yes, tweet is called 'status', rather confusing

if __name__ == "__main__":
    main()
Your code isn't relevant to the problem you're trying to solve. Not only does main() not seem to take any arguments (tweet text?), but you don't show how you're currently approaching the matter. Consider the following code:
import random

TWEET_MAX_LENGTH = 280

# Sample tweet seed
tweet = """I'm using tweepy to automatically tweet a list of URLs. However if my list is too long (it can vary from tweet to tweet) I am not allowed."""

# Creates a list of tweets of random length
tweets = []
for _ in range(10):
    tweets.append(tweet * random.randint(1, 10))

# Print total initial tweet count and list of lengths for each tweet
print("Initial Tweet Count:", len(tweets), [len(x) for x in tweets])

# Create a list for formatted tweet texts
to_tweet = []
for tweet in tweets:
    while len(tweet) > TWEET_MAX_LENGTH:
        # Take only the first 280 chars
        cut = tweet[:TWEET_MAX_LENGTH]
        # Save as a separate tweet to post later
        to_tweet.append(cut)
        # Replace the existing 'tweet' variable with the remaining chars
        tweet = tweet[TWEET_MAX_LENGTH:]
    # Gets the last chunk, or tweets already < 280
    to_tweet.append(tweet)

# Print total final tweet count and list of lengths for each tweet
print("Formatted Tweet Count:", len(to_tweet), [len(x) for x in to_tweet])
It's separated out as much as possible for ease-of-interpretation. The gist is that one could start with a list of text to be used as tweets. The variable TWEET_MAX_LENGTH defines where each tweet would be split to allow for multi-tweets.
The to_tweet list would contain each tweet, in the order of your initial list, expanded into multiple tweets of <= TWEET_MAX_LENGTH length strings.
You could use that list to feed into your actual tweepy function that posts. This approach is pretty willy-nilly and doesn't do any checks for maintaining the sequence of split tweets. Depending on how you're implementing your final tweet functions, that might be an issue, but also a matter for a separate question.
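If the goal is to post those chunks as an actual thread, one way (sketched here, not part of the original answer) is to reply to the previous status each time; this assumes the authenticated api object from the question and the to_tweet list built above:
previous_id = None
for text in to_tweet:
    if previous_id is None:
        status = api.update_status(status=text)
    else:
        # Reply to the previous chunk so the tweets form a thread;
        # auto_populate_reply_metadata attaches the reply without needing an @mention
        status = api.update_status(status=text,
                                    in_reply_to_status_id=previous_id,
                                    auto_populate_reply_metadata=True)
    previous_id = status.id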

How do I place multiple searched tweets into string

I have a program set up so it searches tweets based on the hashtag I give it, and I can edit how many tweets to search and display, but I can't figure out how to place the searched tweets into a string. This is the code I have so far:
while True:
    for status in tweepy.Cursor(api.search, q=hashtag).items(2):
        tweet = [status.text]
        print tweet
When this is run, it only outputs 1 tweet even though it is set to search for 2.
Your code has nothing to break out of the while loop. One method that comes to mind is to set a variable to an empty list and then append each tweet to it.
foo = []
for status in tweepy.Cursor(api.search, q=hashtag).items(2):
    tweet = status.text
    foo.append(tweet)
print foo
Of course, this will print a list. If you want a string instead, use the string join() method. Adjust the last line of code to look like this:
bar = ' '.join(foo)
print bar
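For illustration, with two short made-up strings (the sample text is mine):
foo = ['first tweet text', 'second tweet text']
bar = ' '.join(foo)
print(bar)  # -> first tweet text second tweet text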

Python extract top user name from json

I'm trying to get a sorted list or table of users from a loaded dict. I was able to print them as below, but I couldn't figure out how to sort them in descending order according to the number of tweets each user name made in the sample. If I'm able to do that, I might figure out how to track the to_user as well. Thanks!
tweets = urllib2.urlopen("http://search.twitter.com/search.json?q=ECHO&rpp=100")
tweets_json = tweets.read()
data = json.loads(tweets_json)
for tweet in data['results']:
    print tweet['from_user_name']
    print tweet['to_user_name']
    print
tweets = data['results']
tweets.sort(key=lambda tw: tw['from_user_name'], reverse=True)
This assumes tw['from_user_name'] contains the number of tweets from the given username.
If tw['from_user_name'] contains the username instead, then:
from collections import Counter
tweets = data['results']
count = Counter(tw['from_user_name'] for tw in tweets)
tweets.sort(key=lambda tw: count[tw['from_user_name']], reverse=True)
To print the top 10 usernames by the number of tweets they sent, you don't need to sort tweets:
print("\n".join(count.most_common(10)))
