I'm using tweepy to automatically tweet a list of URLs. However, if my list is too long (it can vary from tweet to tweet), I am not allowed to post. Is there any way that tweepy can create a thread of tweets when the content is too long? My tweepy code looks like this:
import tweepy

def get_api(cfg):
    auth = tweepy.OAuthHandler(cfg['consumer_key'],
                               cfg['consumer_secret'])
    auth.set_access_token(cfg['access_token'],
                          cfg['access_token_secret'])
    return tweepy.API(auth)

def main():
    # Fill in the values noted in the previous step here
    cfg = {
        "consumer_key": "VALUE",
        "consumer_secret": "VALUE",
        "access_token": "VALUE",
        "access_token_secret": "VALUE"
    }
    api = get_api(cfg)
    tweet = "Hello, world!"
    status = api.update_status(status=tweet)
    # Yes, the tweet is called 'status', rather confusing

if __name__ == "__main__":
    main()
Your code isn't directly relevant to the problem you're trying to solve. Not only does main() not take any arguments (such as the tweet text), but you also don't show how you are currently approaching the matter. Consider the following code:
import random

TWEET_MAX_LENGTH = 280

# Sample tweet seed
tweet = """I'm using tweepy to automatically tweet a list of URLs. However if my list is too long (it can vary from tweet to tweet) I am not allowed."""

# Create a list of tweets of random length
tweets = []
for _ in range(10):
    tweets.append(tweet * random.randint(1, 10))

# Print the initial tweet count and the length of each tweet
print("Initial Tweet Count:", len(tweets), [len(x) for x in tweets])

# Create a list for the formatted tweet texts
to_tweet = []
for tweet in tweets:
    while len(tweet) > TWEET_MAX_LENGTH:
        # Take only the first 280 characters
        cut = tweet[:TWEET_MAX_LENGTH]
        # Save as a separate tweet to post later
        to_tweet.append(cut)
        # Replace the existing 'tweet' variable with the remaining characters
        tweet = tweet[TWEET_MAX_LENGTH:]
    # Keep the last piece, or tweets that were already under 280 characters
    to_tweet.append(tweet)

# Print the final tweet count and the length of each tweet
print("Formatted Tweet Count:", len(to_tweet), [len(x) for x in to_tweet])
It's separated out as much as possible for ease-of-interpretation. The gist is that one could start with a list of text to be used as tweets. The variable TWEET_MAX_LENGTH defines where each tweet would be split to allow for multi-tweets.
The to_tweet list would contain each tweet, in the order of your initial list, expanded into multiple tweets of <= TWEET_MAX_LENGTH length strings.
You could feed that list into your actual tweepy posting function. This approach is fairly rough and doesn't do any checks to maintain the sequence of the split tweets. Depending on how you're implementing your final tweet functions, that might be an issue, but that's a matter for a separate question.
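If the goal is for the split pieces to show up as an actual thread, Tweepy's v1.1 update_status accepts an in_reply_to_status_id parameter, so each piece can be posted as a reply to the previous one. A minimal sketch (the post_thread name is mine; it assumes an authenticated api object like the one returned by get_api above):

```python
def post_thread(api, texts):
    """Post each text in order, chaining each tweet as a reply to the previous one."""
    previous_id = None
    posted_ids = []
    for text in texts:
        status = api.update_status(
            status=text,
            in_reply_to_status_id=previous_id,
            # Needed so the reply actually threads under the previous tweet
            auto_populate_reply_metadata=previous_id is not None,
        )
        previous_id = status.id
        posted_ids.append(status.id)
    return posted_ids
```

You would then call post_thread(api, to_tweet) after building the to_tweet list.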
This code prints the IDs but also raises a TypeError
for tweet in client.search_recent_tweets(search_string):
    for tweet_id in tweet:
        print(tweet_id['id'])
Simply printing Tweet gives the following data
Response(data=[<Tweet id=#ID text='#text'>], includes={}, errors=[], meta={'newest_id': '#ID of first tweet', 'oldest_id': '#ID of last tweet', 'result_count': 10, 'next_token': '#Token no.'})
I basically want to extract the Tweet IDs
I don't understand how your double loop is supposed to work.
Anyway, you can see that the tweets are in the response.data, so simply iterate through it:
response = client.search_recent_tweets(search_string) # Get the API response
tweets = response.data # Tweets are the data
for tweet in tweets: # Iterate through the tweets
print(tweet.id) # You can now access their id
The documentation has examples of how to request tweet fields: https://docs.tweepy.org/en/stable/examples.html
I'm a beginner at Python and I'm trying to gather data from Twitter using the API. I want to gather the username, date, and the cleaned tweet text without #username, hashtags, and links, and then put it into a dataframe.
I found a way to achieve this by using: ' '.join(re.sub("(#[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",tweet.text).split()) but when I implement it in my code, it returns NameError: name 'tweet' is not defined
Here is my code:
tweets = tw.Cursor(api.search, q=keyword, lang="id", since=date).items()
raw_tweet = ' '.join(re.sub("(#[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",tweet.text).split())
data_tweet = [[tweet.user.screen_name, tweet.created_at, raw_tweet] for tweet in tweets]
dataFrame = pd.DataFrame(data=data_tweet, columns=['user', "date", "tweet"])
I know the problem is in the data_tweet, but I don't know how to fix it. Please help me
Thank you.
The problem is actually in the second line:
raw_tweet = ' '.join(re.sub("(#[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",tweet.text).split())
Here, you are using tweet.text. However, you have not defined what tweet is yet, only tweets. Also, from reading your third line where you actually define tweet:
for tweet in tweets
I'm assuming you want tweet to be the value you get while iterating through tweets.
So what you have to do is run both lines inside the same loop, assuming my earlier hypothesis is correct.
So:
data_tweet = []
for tweet in tweets:
    raw_tweet = ' '.join(re.sub("(#[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",tweet.text).split())
    data_tweet.append([tweet.user.screen_name, tweet.created_at, raw_tweet])
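Since the cleaning step is pure string processing, it can also be pulled into a small helper and reused inside the loop or a list comprehension. A sketch (the clean_tweet name is mine; the regex is the one from the question):

```python
import re

def clean_tweet(text):
    """Strip #mentions/hashtags, URLs, and other non-alphanumeric characters,
    collapsing the remaining whitespace."""
    return ' '.join(
        re.sub(r"(#[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", text).split()
    )

# Then, for example:
# data_tweet = [[t.user.screen_name, t.created_at, clean_tweet(t.text)] for t in tweets]
```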
You can also use a regex to remove any words that start with '#' (usernames) or 'http' (links) in a pre-defined function, and apply the function to the pandas dataframe column:
import re

def remove_usernames_links(tweet):
    tweet = re.sub('#[^\s]+', '', tweet)
    tweet = re.sub('http[^\s]+', '', tweet)
    return tweet

df['tweet'] = df['tweet'].apply(remove_usernames_links)
If you encounter an "expected string or bytes-like object" error, then just use:
import re

def remove_usernames_links(tweet):
    tweet = re.sub('#[^\s]+', '', str(tweet))
    tweet = re.sub('http[^\s]+', '', str(tweet))
    return tweet

df['tweet'] = df['tweet'].apply(remove_usernames_links)
Credit: https://www.datasnips.com/59/remove-usernames-http-links-from-tweet-data/
I'm using Tweepy and its cursor to collect Tweets with certain search terms. My goal is to have two lists of words on two different topics, so e.g. list 1 with words about love and list 2 with words about health. I then want to search for tweets that each contain at least one word from list 1 and at least one word from list 2. My problem is that I can't even get a search running that only uses one list.
So I have the following code:
# extracting words from a csv-file
file_loc1 = "search_words/love.xlsx"
love_words = pd.read_excel(file_loc1, index_col=None, na_values=['NA'], usecols = "A", skiprows=11)
love_words = str(love_words['love'].values)
# converting the list to readable search terms (there are probably more elegant ways...)
love_words = love_words.lower()
love_words = love_words.replace("\r","")
love_words = love_words.replace("\n","")
love_words = love_words.replace("' '", " OR ")
love_words = love_words.replace("[", "")
love_words = love_words.replace("]", "")
love_words = love_words.replace("'", "")
search_words = love_words + " -filter:retweets"
date_since = "2020-01-01"
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   since=date_since).items(5000)

tweet_text = [tweet.text for tweet in tweets]
So I'm retrieving the words from a csv file and putting them all into a string that in the end looks like this: word1 OR word2 OR word3 -filter:retweets.
If it's only two or three words, it seems to work and I'm getting a lot of tweets. But if I use more terms, I don't get any tweets. It seems like maybe the OR operator is not working the way I think it is... And in the end I would like to have the search like (love1 OR love2 OR love3 OR ...) AND (health1 OR health2 OR ...), so that I get tweets that contain one or more words from each of the two lists.
I hope that this explanation makes sense. Any suggestions? Thank you!
I have implemented this with Tweepy and found the OR operator not to be sufficient. What I do is a separate search for each keyword and collect all the tweets:
tweet_list = []
for word in keyword_list:
    tweets = api.search(word)
    tweet_list.extend(tweets)  # extend (not append) keeps tweet_list flat
Then, after I have all my tweets, I filter for if they contain the words I'm interested in.
This is not efficient, nor likely to be the best solution. But it works for me.
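The follow-up filtering step can stay in plain Python. A sketch (the matches_both_topics name is mine; it assumes love_words and health_words are lowercase word lists and matches whole words only):

```python
def matches_both_topics(text, love_words, health_words):
    """True if the text contains at least one word from each topic list."""
    words = set(text.lower().split())
    return bool(words & set(love_words)) and bool(words & set(health_words))

# Then, for example:
# kept = [t for t in tweet_list if matches_both_topics(t.text, love_words, health_words)]
```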
I have a program set up so it searches tweets based on the hashtag I give it, and I can edit how many tweets to search and display, but I can't figure out how to place the searched tweets into a string. This is the code I have so far:
while True:
    for status in tweepy.Cursor(api.search, q=hashtag).items(2):
        tweet = [status.text]
        print tweet
When this is run, it only outputs one tweet even though it is set to search for two.
It looks like there's nothing in your code to break out of the while loop. One method that comes to mind is to set a variable to an empty list and then append each tweet to it:
foo = []
for status in tweepy.Cursor(api.search, q=hashtag).items(2):
    tweet = status.text
    foo.append(tweet)
print foo
Of course, this will print a list. If you want a string instead, use the string join() method. Adjust the last line of code to look like this:
bar = ' '.join(foo)
print bar
I'm trying to get a sorted list or table of users from a loaded dict. I was able to print them as below, but I couldn't figure out how to sort them in descending order according to the number of tweets the username made in the sample. If I'm able to do that, I might figure out how to track the 'to' user as well. Thanks!
tweets = urllib2.urlopen("http://search.twitter.com/search.json?q=ECHO&rpp=100")
tweets_json = tweets.read()
data = json.loads(tweets_json)
for tweet in data['results']:
    print tweet['from_user_name']
    print tweet['to_user_name']
    print
tweets = data['results']
tweets.sort(key=lambda tw: tw['from_user_name'], reverse=True)
Assuming tw['from_user_name'] contains number of tweets from given username.
If tw['from_user_name'] contains username instead then:
from collections import Counter
tweets = data['results']
count = Counter(tw['from_user_name'] for tw in tweets)
tweets.sort(key=lambda tw: count[tw['from_user_name']], reverse=True)
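A self-contained demonstration of the Counter-based sort, with made-up usernames:

```python
from collections import Counter

# Sample data: 'ann' tweeted 3 times, 'bob' twice, 'cat' once
tweets = [{'from_user_name': u} for u in ['ann', 'bob', 'ann', 'cat', 'ann', 'bob']]

count = Counter(tw['from_user_name'] for tw in tweets)
tweets.sort(key=lambda tw: count[tw['from_user_name']], reverse=True)

# The most frequent poster's tweets now come first
print([tw['from_user_name'] for tw in tweets])
```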
To print top 10 usernames by number of tweets they send, you don't need to sort tweets:
for name, n in count.most_common(10):
    print(name, n)