How do I place multiple searched tweets into string - python

I have a program that searches tweets for a hashtag I give it, and I can change how many tweets it searches and displays, but I can't figure out how to put the searched tweets into a string. This is the code I have so far:
while True:
    for status in tweepy.Cursor(api.search, q=hashtag).items(2):
        tweet = [status.text]
    print tweet
When this is run, it only outputs 1 tweet even though it is set to search for 2.

It looks like nothing in your code breaks out of the while loop. Also, tweet is reassigned for every status, so only the most recent tweet is kept. One method that comes to mind is to set a variable to an empty list and then append each tweet to that list.
foo = []
for status in tweepy.Cursor(api.search, q=hashtag).items(2):
    tweet = status.text
    foo.append(tweet)
print(foo)
Of course, this will print a list. If you want a string instead, use the string join() method. Replace the last line of code with this:
bar = ' '.join(foo)
print(bar)
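The same collect-then-join idea can be tried without a Twitter connection. This is only a minimal sketch: the `statuses` list below is a hypothetical stand-in for what the Tweepy cursor yields, and the newline separator is just one possible choice.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the status objects a Tweepy cursor would yield.
statuses = [
    SimpleNamespace(text="First tweet about #python"),
    SimpleNamespace(text="Second tweet about #python"),
]

# Collect one string per tweet, then join them into a single string.
texts = [status.text for status in statuses]
combined = "\n".join(texts)  # one tweet per line; use ' ' for a single line

print(combined)
```

Joining once at the end also avoids the problem of tweets running into each other, because the separator is inserted between every pair of items.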

Related

How do I extract tweets that mentions a specific word and/or phrase from the text?

So I want to know what people are saying about KFC, Popeyes, and ChickfilA's chicken sandwiches for a project. NOTE: I already have all the Twitter data I need.
I successfully extracted users and their screennames but have yet to figure out how to go a step further and figure out who mentioned 'sandwich' in their tweet.
I am pretty sure this code extracts only the users whose tweets are exactly 'sandwich'. I cannot figure out how to extract tweets where 'sandwich' is merely mentioned. From what I have read, I think I can do this with re.findall() or with the Tweepy library. Can anybody show me exactly what I need to do?
Here's what I've tried so far:
uniqueusers = {}
keyword = 'sandwich'
for tweetzipfile in tweetzipfiles:
    zf = zipfile.ZipFile(tweetzipfile)
    for i, obj in enumerate(zf.infolist()):
        tweetjson = json.load(zf.open(obj))
        userwhotweeted = tweetjson['user']['screen_name']
        tweettext = tweetjson['text']
        if tweettext == keyword:
            if userwhotweeted in uniqueusers:
                uniqueusers[userwhotweeted] += 1
            if userwhotweeted not in uniqueusers:
                uniqueusers[userwhotweeted] = 1
I would need more than that to test, but if you are looking for the hiccup, it's that you are checking whether tweettext equals the single word. That's why only tweets consisting of exactly that word are matched.
You would need to do something like:
if keyword in tweettext:
    if userwhotweeted in uniqueusers:
        uniqueusers[userwhotweeted] += 1
    elif userwhotweeted not in uniqueusers:
        uniqueusers[userwhotweeted] = 1
else:
    print("No Results")
Something along those lines.
You can also use a variation of this with .split() if you want to turn a block of text from the tweet into individual items in a list.
It will make it easier to work with the keywords.
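As a rough illustration of the substring check, here is a small self-contained sketch. The sample records are hypothetical and only mimic the 'user' / 'screen_name' / 'text' fields used in the question; the case-insensitive comparison and the use of collections.Counter are additions, not part of the original answer.

```python
from collections import Counter

# Hypothetical sample records shaped like the question's tweet JSON.
tweets = [
    {"user": {"screen_name": "alice"}, "text": "That Popeyes SANDWICH is great"},
    {"user": {"screen_name": "bob"}, "text": "Just had lunch"},
    {"user": {"screen_name": "alice"}, "text": "Another sandwich today"},
    {"user": {"screen_name": "carol"}, "text": "sandwiches > burgers"},
]

keyword = "sandwich"
uniqueusers = Counter()  # missing keys start at 0, so no membership check is needed

for tweetjson in tweets:
    # Lowercase both sides so 'SANDWICH' and 'sandwich' are treated alike.
    if keyword.lower() in tweetjson["text"].lower():
        uniqueusers[tweetjson["user"]["screen_name"]] += 1

print(dict(uniqueusers))
```

Note that a substring test also matches 'sandwiches'. If only the whole word should count, comparing against the items of `text.lower().split()` is one option.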

How to remove @user, hashtag, and links from tweet text and put it into dataframe in python

I'm a beginner at Python and I'm trying to gather data from Twitter using the API. I want to collect the username, the date, and the cleaned tweet text (without @username mentions, hashtags, and links) and then put it all into a dataframe.
I found a way to clean the text using ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",tweet.text).split()), but when I use it in my code, it raises NameError: name 'tweet' is not defined
Here is my code:
tweets = tw.Cursor(api.search, q=keyword, lang="id", since=date).items()
raw_tweet = ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",tweet.text).split())
data_tweet = [[tweet.user.screen_name, tweet.created_at, raw_tweet] for tweet in tweets]
dataFrame = pd.DataFrame(data=data_tweet, columns=['user', "date", "tweet"])
I know the problem is in the data_tweet, but I don't know how to fix it. Please help me
Thank you.
The problem is actually in the second line:
raw_tweet = ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",tweet.text).split())
Here, you are using tweet.text. However, you have not defined what tweet is yet, only tweets. Also, from reading your third line where you actually define tweet:
for tweet in tweets
I'm assuming you want tweet to be the value you get while iterating through tweets.
So, assuming my earlier hypothesis is correct, you have to run both lines inside the same loop and append one row per tweet (assigning data_tweet inside the loop would keep only the last tweet):
data_tweet = []
for tweet in tweets:
    raw_tweet = ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet.text).split())
    data_tweet.append([tweet.user.screen_name, tweet.created_at, raw_tweet])
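To see what the loop produces without calling the API, here is a minimal sketch. The `tweets` list uses hypothetical stand-in objects with the same attribute names as in the question, and the dates are made-up strings.

```python
import re
from types import SimpleNamespace

# Same pattern as in the question: mentions, non-alphanumeric characters, URLs.
PATTERN = r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"

def clean(text):
    """Replace every match with a space, then collapse runs of whitespace."""
    return " ".join(re.sub(PATTERN, " ", text).split())

# Hypothetical stand-ins for Tweepy status objects.
tweets = [
    SimpleNamespace(user=SimpleNamespace(screen_name="alice"),
                    created_at="2020-01-01",
                    text="@bob check this out https://t.co/abc123 #python"),
    SimpleNamespace(user=SimpleNamespace(screen_name="carol"),
                    created_at="2020-01-02",
                    text="Good morning, world!"),
]

data_tweet = [[t.user.screen_name, t.created_at, clean(t.text)] for t in tweets]

for row in data_tweet:
    print(row)
```

The resulting list of rows has the shape that pd.DataFrame(data=data_tweet, columns=['user', 'date', 'tweet']) expects. Note that this pattern strips only the '#' character, so the hashtag word itself ('python' above) stays in the text.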
You can also use a regex to remove any words that start with '@' (usernames) or 'http' (links) in a predefined function, and then apply that function to the pandas data frame column:
import re
def remove_usernames_links(tweet):
    tweet = re.sub(r'@[^\s]+', '', tweet)
    tweet = re.sub(r'http[^\s]+', '', tweet)
    return tweet
df['tweet'] = df['tweet'].apply(remove_usernames_links)
If you encounter a "TypeError: expected string or bytes-like object" error, convert the value to a string first:
import re
def remove_usernames_links(tweet):
    tweet = re.sub(r'@[^\s]+', '', str(tweet))
    tweet = re.sub(r'http[^\s]+', '', str(tweet))
    return tweet
df['tweet'] = df['tweet'].apply(remove_usernames_links)
Credit: https://www.datasnips.com/59/remove-usernames-http-links-from-tweet-data/
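The effect of the str() conversion can be checked on a plain list, without pandas. This is a sketch under assumptions: the sample values are made up, and the final whitespace-collapsing step is an extra that the function above does not have.

```python
import re

def remove_usernames_links(value):
    text = str(value)                    # tolerate non-string cells such as NaN
    text = re.sub(r'@\S+', '', text)     # drop @username mentions
    text = re.sub(r'http\S+', '', text)  # drop links
    return ' '.join(text.split())        # extra step: collapse leftover whitespace

# Hypothetical column values, including a missing value stored as a float.
column = ["@bob see https://example.com/x now", float("nan"), "plain text"]
cleaned = [remove_usernames_links(v) for v in column]

print(cleaned)
```

Be aware that str() turns a missing value into the literal text 'nan', so it may be preferable to drop or fill missing rows before cleaning.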

Tweet Strings via Tweepy

I'm using tweepy to automatically tweet a list of URLs. However, if my list is too long (it can vary from tweet to tweet), I am not allowed to post it. Is there any way that tweepy can create a thread of tweets when the content is too long? My tweepy code looks like this:
import tweepy
def get_api(cfg):
    auth = tweepy.OAuthHandler(cfg['consumer_key'],
                               cfg['consumer_secret'])
    auth.set_access_token(cfg['access_token'],
                          cfg['access_token_secret'])
    return tweepy.API(auth)
def main():
    # Fill in the values noted in previous step here
    cfg = {
        "consumer_key" : "VALUE",
        "consumer_secret" : "VALUE",
        "access_token" : "VALUE",
        "access_token_secret" : "VALUE"
    }
    api = get_api(cfg)
    tweet = "Hello, world!"
    status = api.update_status(status=tweet)
    # Yes, tweet is called 'status' rather confusing
if __name__ == "__main__":
    main()
Your code isn't relevant to the problem you're trying to solve. Not only does main() not seem to take any arguments (the tweet text?), but you also don't show how you are currently approaching the matter. Consider the following code:
import random
TWEET_MAX_LENGTH = 280
# Sample Tweet Seed
tweet = """I'm using tweepy to automatically tweet a list of URLs. However if my list is too long (it can vary from tweet to tweet) I am not allowed."""
# Creates list of tweets of random length
tweets = []
for _ in range(10):
    tweets.append(tweet * (random.randint(1, 10)))
# Print total initial tweet count and list of lengths for each tweet.
print("Initial Tweet Count:", len(tweets), [len(x) for x in tweets])
# Create a list for formatted tweet texts
to_tweet = []
for tweet in tweets:
    while len(tweet) > TWEET_MAX_LENGTH:
        # Take only first 280 chars
        cut = tweet[:TWEET_MAX_LENGTH]
        # Save as separate tweet to do later
        to_tweet.append(cut)
        # replace the existing 'tweet' variable with remaining chars
        tweet = tweet[TWEET_MAX_LENGTH:]
    # Gets last tweet or those < 280
    to_tweet.append(tweet)
# Print total final tweet count and list of lengths for each tweet
print("Formatted Tweet Count:", len(to_tweet), [len(x) for x in to_tweet])
It's separated out as much as possible for ease-of-interpretation. The gist is that one could start with a list of text to be used as tweets. The variable TWEET_MAX_LENGTH defines where each tweet would be split to allow for multi-tweets.
The to_tweet list would contain each tweet, in the order of your initial list, split into strings of at most TWEET_MAX_LENGTH characters.
You could use that list to feed into your actual tweepy function that posts. This approach is pretty willy-nilly and doesn't do any checks for maintaining the sequence of split tweets. Depending on how you're implementing your final tweet functions, that might be an issue, but it is also a matter for a separate question.
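Since the question is about a list of URLs, cutting at a fixed character position could split a URL in half. The following is a hedged sketch of an alternative that packs whole items into each tweet and then posts them as a chain of replies. The helper names pack_into_tweets and post_thread are hypothetical, and post_thread assumes `api` behaves like a tweepy.API object whose update_status accepts in_reply_to_status_id and returns an object with an id attribute; it has not been tested against the live API.

```python
def pack_into_tweets(items, max_length=280, sep=" "):
    """Group items into strings of at most max_length characters
    without cutting any single item."""
    tweets, current = [], ""
    for item in items:
        candidate = item if not current else current + sep + item
        if len(candidate) <= max_length:
            current = candidate
        else:
            if current:
                tweets.append(current)
            current = item  # an item longer than max_length is kept whole
    if current:
        tweets.append(current)
    return tweets


def post_thread(api, texts):
    """Post texts in order, each one as a reply to the previous tweet.
    Assumes a tweepy.API-like `api` object (see the note above)."""
    previous_id = None
    for text in texts:
        status = api.update_status(status=text,
                                   in_reply_to_status_id=previous_id)
        previous_id = status.id


# Example input: made-up URLs that would not fit into a single tweet.
urls = ["https://example.com/page/%d" % i for i in range(40)]
chunks = pack_into_tweets(urls)
print(len(chunks), [len(c) for c in chunks])
```

The length check uses Python's len(), which may not match how Twitter itself counts characters in links, so leaving some headroom below the limit is probably wise.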

Joining Strings on New Lines Error Python

Almost there with this one!
Taking user input and removing any trailing punctuation and non-hashed words to spot trends in tweets. Don't ask!
tweet = input('Tweet: ')
tweets = ''
while tweet != '':
    tweets += tweet
    tweet = input('Tweet: ')
print (tweets) # only using this to spot where things are going wrong!
listed_tweets = tweets.lower().rstrip('\'\"-,.:;!?').split(' ')
hashed = []
for entry in listed_tweets:
    if entry[0] == '#':
        hashed.append(entry)
from collections import Counter
trend = Counter(hashed)
for item in trend:
    print (item, trend[item])
Which works, apart from the fact that I get:
Tweet: #Python is #AWESOME!
Tweet: This is #So_much_fun #awesome
Tweet:
#Python is #AWESOME!This is #So_much_fun #awesome
#awesome!this 1
#python 1
#so_much_fun 1
#awesome 1
Instead of:
#so_much_fun 1
#awesome 2
#python 1
So I'm not getting a space at the end of each line of input and it's throwing my list!
It's probably very simple, but after 10hrs straight of self-teaching, my mind is mush!!
The problem is with this line:
tweets += tweet
You're taking each tweet and appending it to the previous one. Thus, the last word of the previous tweet gets joined with the first word of the current tweet.
There are various ways to solve this problem. One approach is to process the tweets one at a time. Start out with an empty array for your hashtags, then do the following in a loop:
1. read a line from the user
2. if the line is empty, break out of the loop
3. otherwise, extract the hashtags and add them to the array
4. return to step 1
The following code incorporates this idea and makes several other improvements. Notice how the interactive loop is written so that there's only one place in the code where we prompt the user for input.
hashtags = []
while True: # Read and clean each line of input.
    tweet = input('Tweet: ').lower().rstrip('\'\"-,.:;!?')
    if tweet == '': # Check for empty input.
        break
    print('cleaned tweet: '+tweet) # Review the cleaned tweet.
    for word in tweet.split(): # Extract hashtags.
        if word[0] == '#':
            hashtags.append(word)
from collections import Counter
trend = Counter(hashtags)
for item in trend:
    print (item, trend[item])
If you continue working on tweet processing, I suspect that you'll find that your tweet-cleaning process is inadequate. What if there is punctuation in the middle of a tweet, for example? You will probably want to embark on the study of regular expressions sooner or later.
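As one possible starting point with regular expressions, here is a small sketch that pulls hashtags out of each tweet directly, so punctuation in the middle of a line no longer matters. The function name count_hashtags is made up for this example, and the pattern treats a hashtag as '#' followed by letters, digits, or underscores, which is an assumption about what should count as a tag.

```python
import re
from collections import Counter

def count_hashtags(tweets):
    """Count hashtags case-insensitively across a list of tweet strings."""
    counts = Counter()
    for tweet in tweets:
        # '#' followed by word characters; any punctuation ends the tag.
        counts.update(re.findall(r'#\w+', tweet.lower()))
    return counts

trend = count_hashtags(["#Python is #AWESOME!", "This is #So_much_fun #awesome"])
for tag, count in trend.most_common():
    print(tag, count)
```

With the two sample lines from the question, '#awesome' is counted twice and the other two tags once each.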

Python extract top user name from json

I'm trying to get a sorted list or table of users from a loaded dict. I was able to print them as below but I couldn't figure out how to sort them in descending order according to the number of tweets the user name made in the sample. If I'm able to do that I might figure out how to track the to user as well. Thanks!
tweets = urllib2.urlopen("http://search.twitter.com/search.json?q=ECHO&rpp=100")
tweets_json = tweets.read()
data = json.loads(tweets_json)
for tweet in data['results']:
...     print tweet['from_user_name']
...     print tweet['to_user_name']
...     print
tweets = data['results']
tweets.sort(key=lambda tw: tw['from_user_name'], reverse=True)
This assumes that tw['from_user_name'] contains the number of tweets from the given username.
If tw['from_user_name'] contains the username instead, then:
from collections import Counter
tweets = data['results']
count = Counter(tw['from_user_name'] for tw in tweets)
tweets.sort(key=lambda tw: count[tw['from_user_name']], reverse=True)
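To print the top 10 usernames by the number of tweets they sent, you don't need to sort tweets. most_common() returns (username, count) pairs, so take the name from each pair:
print("\n".join(name for name, _ in count.most_common(10)))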
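For a quick check without the network call, the counting step can be run on a small made-up sample. The records below are hypothetical and only mimic the 'from_user_name' and 'to_user_name' keys from the question; counting the 'to' user is added here because the question mentions wanting it.

```python
from collections import Counter

# Hypothetical sample shaped like data['results'] in the question.
tweets = [
    {"from_user_name": "alice", "to_user_name": None},
    {"from_user_name": "bob", "to_user_name": "alice"},
    {"from_user_name": "alice", "to_user_name": "bob"},
    {"from_user_name": "carol", "to_user_name": None},
    {"from_user_name": "alice", "to_user_name": None},
    {"from_user_name": "bob", "to_user_name": None},
]

# Tweets sent per user, most active first.
count = Counter(tw["from_user_name"] for tw in tweets)
top = count.most_common(2)
for name, n in top:
    print(name, n)

# The same idea for recipients, skipping tweets with no 'to' user.
to_count = Counter(tw["to_user_name"] for tw in tweets if tw["to_user_name"])
print(dict(to_count))
```

most_common(n) already returns the entries in descending order of count, so no separate sort is needed for a top-n listing.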
