I'm trying to get a sorted list or table of users from a loaded dict. I was able to print them as below, but I couldn't figure out how to sort them in descending order according to the number of tweets each username made in the sample. If I can do that, I might figure out how to track the 'to' user as well. Thanks!
import urllib2
import json

tweets = urllib2.urlopen("http://search.twitter.com/search.json?q=ECHO&rpp=100")
tweets_json = tweets.read()
data = json.loads(tweets_json)
for tweet in data['results']:
    print tweet['from_user_name']
    print tweet['to_user_name']
    print
tweets = data['results']
tweets.sort(key=lambda tw: tw['from_user_name'], reverse=True)
Assuming tw['from_user_name'] contains the number of tweets from the given username.
If tw['from_user_name'] contains the username instead, then:
from collections import Counter
tweets = data['results']
count = Counter(tw['from_user_name'] for tw in tweets)
tweets.sort(key=lambda tw: count[tw['from_user_name']], reverse=True)
To print the top 10 usernames by the number of tweets they sent, you don't need to sort the tweets:
print("\n".join("%s: %d" % pair for pair in count.most_common(10)))
I have a list of revisions from a Wikipedia article that I queried like this:
import urllib.request
import re

def getRevisions(wikititle):
    url = "https://en.wikipedia.org/w/api.php?action=query&format=xml&prop=revisions&rvlimit=500&titles=" + wikititle
    revisions = []  # list of all accumulated revisions
    next = ''       # continuation parameter for the next request
    while True:
        response = urllib.request.urlopen(url + next).read()  # web request
        response = str(response)
        revisions += re.findall('<rev [^>]*>', response)  # adds all revisions from the current request to the list
        cont = re.search('<continue rvcontinue="([^"]+)"', response)
        if not cont:  # break the loop if the 'continue' element is missing
            break
        next = "&rvcontinue=" + cont.group(1)  # continuation value from which to start the next request
    return revisions
Which results in a list with each element being a rev Tag as a string:
['<rev revid="343143654" parentid="6546465" minor="" user="name" timestamp="2021-12-12T08:26:38Z" comment="abc" />',...]
How can I generate a DataFrame from this list?
An "easy" way without using regex would be splitting the string and then parsing:
for rev_string in revisions:
rev_dict = {}
# Skipping the first and last as it's the tag.
attributes = rev_string.split(' ')[1:-1]
#Split on = and take each value as key and value and convert value to string to get rid of excess ""
for attribute in attributes:
key, value = attribute.split("=")
rev_dict[key] = str(value)
df = pd.DataFrame.from_dict(rev_dict)
This sample creates one DataFrame per revision. If you would like to gather multiple revisions into one DataFrame, you need to handle the attributes that vary (I don't know if they change depending on the wiki document); after collecting all the attributes into dictionaries, you can convert them into a single DataFrame.
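A minimal sketch of that idea, using the list returned by getRevisions above (assuming every attribute appears as key="value" inside the tag, as in the sample): collect one dict per revision and build a single DataFrame at the end, so pandas fills in NaN wherever an attribute is missing.

import re
import pandas as pd

rows = []
for rev_string in revisions:
    # Pull out every key="value" pair; attributes absent from a given
    # revision simply don't appear in that row's dict.
    rows.append(dict(re.findall(r'(\w+)="([^"]*)"', rev_string)))

df = pd.DataFrame(rows)  # one row per revision, NaN for missing attributes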
Use the JSON output format instead; then you can easily create a DataFrame from the JSON.
Example URL for JSON output
For help turning JSON into a DataFrame, check out this Stack Overflow question.
Any other solution if I have multiple revisions, like:
''''[, ]''''
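For completeness, here is a rough sketch of the JSON route suggested above. It assumes the standard MediaWiki JSON layout (query -> pages -> pageid -> revisions); the article title is just a placeholder.

import json
import urllib.request
import pandas as pd

wikititle = "Coffee"  # placeholder article title
url = ("https://en.wikipedia.org/w/api.php?action=query&format=json"
       "&prop=revisions&rvlimit=500&titles=" + wikititle)

data = json.loads(urllib.request.urlopen(url).read())

# The revisions sit under query -> pages -> <pageid> -> revisions;
# grab the (single) page entry regardless of its numeric page id.
page = next(iter(data['query']['pages'].values()))
df = pd.DataFrame(page['revisions'])
print(df.head())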
I'm using tweepy to automatically tweet a list of URLs. However, if my list is too long (it can vary from tweet to tweet), I am not allowed to post it. Is there any way that tweepy can create a thread of tweets when the content is too long? My tweepy code looks like this:
import tweepy

def get_api(cfg):
    auth = tweepy.OAuthHandler(cfg['consumer_key'],
                               cfg['consumer_secret'])
    auth.set_access_token(cfg['access_token'],
                          cfg['access_token_secret'])
    return tweepy.API(auth)

def main():
    # Fill in the values noted in the previous step here
    cfg = {
        "consumer_key": "VALUE",
        "consumer_secret": "VALUE",
        "access_token": "VALUE",
        "access_token_secret": "VALUE"
    }
    api = get_api(cfg)
    tweet = "Hello, world!"
    status = api.update_status(status=tweet)
    # Yes, the tweet parameter is called 'status', which is rather confusing

if __name__ == "__main__":
    main()
Your code isn't relevant to the problem you're trying to solve. Not only does main() not take any arguments (the tweet text?), but you also don't show how you are currently approaching the matter. Consider the following code:
import random

TWEET_MAX_LENGTH = 280

# Sample Tweet Seed
tweet = """I'm using tweepy to automatically tweet a list of URLs. However if my list is too long (it can vary from tweet to tweet) I am not allowed."""

# Creates list of tweets of random length
tweets = []
for _ in range(10):
    tweets.append(tweet * random.randint(1, 10))

# Print total initial tweet count and list of lengths for each tweet.
print("Initial Tweet Count:", len(tweets), [len(x) for x in tweets])

# Create a list for formatted tweet texts
to_tweet = []
for tweet in tweets:
    while len(tweet) > TWEET_MAX_LENGTH:
        # Take only first 280 chars
        cut = tweet[:TWEET_MAX_LENGTH]
        # Save as separate tweet to do later
        to_tweet.append(cut)
        # Replace the existing 'tweet' variable with remaining chars
        tweet = tweet[TWEET_MAX_LENGTH:]
    # Gets last tweet or those < 280
    to_tweet.append(tweet)

# Print total final tweet count and list of lengths for each tweet
print("Formatted Tweet Count:", len(to_tweet), [len(x) for x in to_tweet])
It's separated out as much as possible for ease-of-interpretation. The gist is that one could start with a list of text to be used as tweets. The variable TWEET_MAX_LENGTH defines where each tweet would be split to allow for multi-tweets.
The to_tweet list would contain each tweet, in the order of your initial list, expanded into multiple tweets of <= TWEET_MAX_LENGTH length strings.
You could feed that list into your actual tweepy function that posts. This approach is pretty willy-nilly and doesn't do any checks to maintain the sequence of split tweets. Depending on how you're implementing your final tweet functions, that might be an issue, but it's also a matter for a separate question.
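To actually post those chunks as a thread, each piece can be sent as a reply to the previous one. A rough sketch, assuming the v1.1 API object returned by get_api(cfg) and tweepy's update_status parameters in_reply_to_status_id and auto_populate_reply_metadata (available in recent tweepy versions):

def post_thread(api, texts):
    """Post each chunk as a reply to the previous one so they show up as a thread."""
    prev_id = None
    for text in texts:
        status = api.update_status(
            status=text,
            in_reply_to_status_id=prev_id,
            auto_populate_reply_metadata=prev_id is not None,
        )
        prev_id = status.id

# e.g. post_thread(api, to_tweet)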
I have a program set up so it searches tweets based on the hashtag I give it, and I can edit how many tweets to search and display, but I can't figure out how to place the searched tweets into a string. This is the code I have so far:
while True:
    for status in tweepy.Cursor(api.search, q=hashtag).items(2):
        tweet = [status.text]
    print tweet
When this is run, it only outputs 1 tweet even though it is set to search for 2.
Your code doesn't look like it has anything to break out of the while loop. One method that comes to mind is to set a variable to an empty list and then, with each tweet, append to that list.
foo = []
for status in tweepy.Cursor(api.search, q=hashtag).items(2):
    tweet = status.text
    foo.append(tweet)
print foo
Of course, this will print a list. If you want a string instead, use the string join() method. Adjust the last line of code to look like this:
bar = ' '.join(foo)
print bar
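If you would rather have each tweet on its own line of the string, join on a newline instead (same foo list as above, just a different separator):

bar = '\n'.join(foo)
print bar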
I have a simple for loop:
for tweet in tweets.xpath("/tweets/tweet/content"):
    count = count + 1
    print("Tweet n°%s" % count)
    print("=> " + tweet.text)
    print("===================================")
And I want to know how I can automatically put every tweet here into a different variable: if there are 30 tweets, then 30 different variables would be created automatically. I don't know if I'm clear, but thanks for any help!
You should use a list:
from lxml import etree
xmldoc = etree.parse("tweets.xml")
tweets = xmldoc.xpath("/tweets/tweet/content/text()")
To access any of the tweet texts, use an index:
print("first one", tweets[0])
print("last one", tweets[-1])
print("number of tweets", len(tweets))
I know you asked for dynamically creating new variable names for the values, but you would then have to pass information about those dynamically created variables on to the later processing anyway. On the other hand, if you consider tweets[1] a "variable name", you have the solution you asked for.
EDIT: Code modified not to reuse variable tweets for multiple purposes.
In your case, you can do:
for count, tweet in enumerate(tweets.xpath("/tweets/tweet/content"), 1):  # start at 1, like the original counter
    print("Tweet n°%s" % count)
    print("=> " + tweet.text)
    print("===================================")
If you want to store the results, you can do (using a new name so the tweets element isn't overwritten while it's still being iterated):
tweet_texts = dict()
for count, tweet in enumerate(tweets.xpath("/tweets/tweet/content"), 1):
    tweet_texts[count] = tweet.text
Almost there with this one!
Taking user input and removing any trailing punctuation and non-hashed words to spot trends in tweets. Don't ask!
tweet = input('Tweet: ')
tweets = ''
while tweet != '':
    tweets += tweet
    tweet = input('Tweet: ')

print(tweets)  # only using this to spot where things are going wrong!

listed_tweets = tweets.lower().rstrip('\'\"-,.:;!?').split(' ')

hashed = []
for entry in listed_tweets:
    if entry[0] == '#':
        hashed.append(entry)

from collections import Counter
trend = Counter(hashed)
for item in trend:
    print(item, trend[item])
Which works apart from that fact I get:
Tweet: #Python is #AWESOME!
Tweet: This is #So_much_fun #awesome
Tweet:
#Python is #AWESOME!This is #So_much_fun #awesome
#awesome!this 1
#python 1
#so_much_fun 1
#awesome 1
Instead of:
#so_much_fun 1
#awesome 2
#python 1
So I'm not getting a space at the end of each line of input, and it's throwing off my list!
It's probably very simple, but after 10hrs straight of self-teaching, my mind is mush!!
The problem is with this line:
tweets += tweet
You're taking each tweet and appending it to the previous one. Thus, the last word of the previous tweet gets joined with the first word of the current tweet.
There are various ways to solve this problem. One approach is to process the tweets one at a time. Start out with an empty array for your hashtags, then do the following in a loop:
1. read a line from the user
2. if the line is empty, break out of the loop
3. otherwise, extract the hashtags and add them to the array
4. return to step 1
The following code incorporates this idea and makes several other improvements. Notice how the interactive loop is written so that there's only one place in the code where we prompt the user for input.
from collections import Counter

hashtags = []
while True:  # Read and clean each line of input.
    tweet = input('Tweet: ').lower().rstrip('\'\"-,.:;!?')
    if tweet == '':  # Check for empty input.
        break
    print('cleaned tweet: ' + tweet)  # Review the cleaned tweet.
    for word in tweet.split():  # Extract hashtags.
        if word[0] == '#':
            hashtags.append(word)

trend = Counter(hashtags)
for item in trend:
    print(item, trend[item])
If you continue working on tweet processing, I suspect that you'll find that your tweet-cleaning process is inadequate. What if there is punctuation in the middle of a tweet, for example? You will probably want to embark on the study of regular expressions sooner or later.
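For example, pulling hashtags out with a regular expression sidesteps the punctuation problem entirely. A minimal sketch, using #\w+ as a simplistic definition of a hashtag and the sample tweets from above:

import re
from collections import Counter

tweets = [
    "#Python is #AWESOME!",
    "This is #So_much_fun #awesome",
]

# Grab every '#' followed by word characters, ignoring surrounding punctuation.
hashtags = [tag.lower() for tweet in tweets for tag in re.findall(r'#\w+', tweet)]

for tag, n in Counter(hashtags).most_common():
    print(tag, n)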