Trouble with encoding special characters for tweepy - python

I am having a lot of trouble encoding the character '/' so that I can use it with the Twitter streaming API through tweepy. Without any encoding, the query 'EUR/USD' gives error code 401. Note that other search queries work normally and do not produce this error.
I have tried doing this in a couple of ways.
First:
'EUR/USD'.replace('/','%2F')
but the search is not returning anything.
I also tried:
urllib.quote('EUR/USD', '')
and while the output with print is the same (EUR%2FUSD) the search is still not returning any results.
Finally I tried double encoding:
urllib.quote(urllib.quote('EUR/USD', ''),'')
where I get EUR%252FUSD but still no results.
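For reference, the encodings tried above can be reproduced with Python 3's `urllib.parse.quote` (where Python 2's `urllib.quote` now lives). This sketch only shows what each variant produces; it does not explain the 401 on its own. Note that `quote` treats '/' as safe by default, which is why the empty second argument matters.

```python
from urllib.parse import quote

term = "EUR/USD"

# Default safe="/" leaves the slash untouched.
plain = quote(term)
# safe="" forces "/" to be percent-encoded as %2F.
single = quote(term, safe="")
# Encoding a second time turns the "%" itself into %25, giving %252F.
double = quote(single, safe="")
# The str.replace attempt yields the same string as the single encoding.
replaced = term.replace("/", "%2F")

print(plain)     # EUR/USD
print(single)    # EUR%2FUSD
print(double)    # EUR%252FUSD
print(replaced)  # EUR%2FUSD
```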
Furthermore, when searching for just EURUSD, the search works, but only for tweets in which the term is preceded by the $ symbol (e.g. $EURUSD). If the dollar sign is missing from the tweet (e.g. just EURUSD), the search does not detect it.
This is how it works:
import sys
import pprint
import tweepy
def removeNonAscii(s):
    # Assumed definition (the original helper was not shown): drop non-ASCII characters
    return ''.join(c for c in s if ord(c) < 128)
querystring = 'EURUSD'
auth = tweepy.OAuthHandler('key', 'secret')
auth.set_access_token('key', 'secret')
api = tweepy.API(auth)
class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        pprint.pprint([status.user.name, removeNonAscii(status.text), status.lang])
    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True  # Don't kill the stream
    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True  # Don't kill the stream
sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
# track expects a list of phrases; a bare string would be iterated character by character
sapi.filter(track=[querystring], languages=['en'])
Does anyone have an idea of what might be going on here?
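One detail worth checking in code like this: `track` should be a list of phrases, e.g. `track=[querystring]` rather than `track=querystring`. As far as I know, tweepy 3.x serializes the parameter with a comma join (commas mean OR to the streaming API), and joining a plain string iterates over its characters. The sketch below reproduces only that joining step in isolation; `build_track_param` is a hypothetical stand-in, not a tweepy function.

```python
def build_track_param(track):
    """Hypothetical stand-in for how tweepy 3.x is assumed to serialize `track`:
    the phrases are joined with commas (comma = OR for the streaming API)."""
    return u",".join(track)


querystring = "EURUSD"

# A bare string: join() walks its characters, so each letter becomes a phrase.
as_string = build_track_param(querystring)
# A one-element list: the phrase stays intact.
as_list = build_track_param([querystring])

print(as_string)  # E,U,R,U,S,D
print(as_list)    # EURUSD
```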

Related

Tweepy UnicodeEncodeError in streaming filter code

I have the following code, with which I want to collect tweets geolocated to the UK, written in English and containing keywords related to the topics "death" and "covid". I am still very new to all of this, so bear with me; the code is definitely not ideal. After some hours of streaming, I always get the message "UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f449' [...] character maps to <undefined>", with the traceback pointing to the stream.filter line (the last line). At first I thought it was because of the strings, so I added the "u" prefix to every string, but it didn't help.
from tweepy import OAuthHandler, Stream, StreamListener
# consumer_key, consumer_secret, access_token and access_token_secret are defined elsewhere
class StdOutListener(StreamListener):
    def on_status(self, status):
        text = status.text.lower()
        if (u'death' in text or u'dead' in text or u'decease' in text) and (u'corona' in text or u'covid' in text):
            print(status)
        return True
    def on_error(self, status_code):
        print(status_code)
if __name__ == '__main__':
    mystreamlistener = StdOutListener()
    # This handles Twitter authentication and the connection to the Twitter Streaming API
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, mystreamlistener)
    # Stream filtered by location: bounding box around the United Kingdom
    stream.filter(locations=[-6.38, 49.87, 1.77, 55.81], languages=[u'en'])
Just a workaround:
As only very few tweets seem to cause this error, I just wrapped the whole if-statement in the on_status method in a try/except.
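The 'charmap' codec in the message most likely means that stdout (the console or a redirected file) uses a legacy Windows code page with no mapping for emoji such as U+1F449, so the failure happens at print(status) inside on_status, which is called from stream.filter. As an alternative to catching the exception, a minimal sketch is to replace unrepresentable characters before printing. `console_safe` is a hypothetical helper, not part of tweepy.

```python
import sys


def console_safe(text, encoding=None):
    """Return `text` with characters the target encoding can't represent replaced by '?'."""
    encoding = encoding or sys.stdout.encoding or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)


# Inside on_status one could then write, for example:
#     print(console_safe(status.text))
sample = "Stay safe \U0001f449 covid update"
print(console_safe(sample, "cp1252"))  # Stay safe ? covid update
```

With `errors="replace"` nothing is silently dropped: the tweet still prints, and only the offending characters become '?'.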

How to get tweets data that contain multiple keywords

I'm trying to collect tweet data using this fairly typical code. As you can see, I attempt to track tweets containing 'UniversalStudios', 'Disneyland' OR 'LosAngeles'. But what I really want are tweets that contain all of the keywords "UniversalStudios", "Disneyland" AND "LosAngeles" together. Can anyone tell me how to achieve that?
Thanks a lot in advance :)
import json
from textblob import TextBlob
from tweepy import OAuthHandler, Stream, StreamListener
# This is a basic listener that appends received tweets to a JSON file.
class StdOutListener(StreamListener):
    def on_data(self, data):
        all_data = json.loads(data)
        # Skip stream messages that are not tweets (e.g. limit notices have no "text")
        if 'text' not in all_data:
            return True
        tweet = TextBlob(all_data["text"])
        # Add the sentiment data to all_data
        #all_data['sentiment'] = tweet.sentiment
        #print(tweet)
        #print(tweet.sentiment)
        # Open the JSON text file to save the tweets
        with open('tweets.json', 'a') as tf:
            # Write the JSON data directly to the file, one tweet per line
            json.dump(all_data, tf)
            # Alternatively: tf.write(json.dumps(all_data))
            tf.write('\n')
        return True
    def on_error(self, status):
        print(status)
if __name__ == '__main__':
    # This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    # Filter Twitter streams to capture data by the keywords 'UniversalStudios', 'Disneyland', 'LosAngeles'
    stream.filter(languages=['en'], track=['UniversalStudios', 'Disneyland', 'LosAngeles'])
Twitter's API documentation (see "track") says you need spaces between the terms to mean AND (commas mean OR). I'm not sure how the library you're using handles it, but my bet would be:
track=['UniversalStudios Disneyland LosAngeles']
The quote from the docs:
By this model, you can think of commas as logical ORs, while spaces are equivalent to logical ANDs (e.g. ‘the twitter’ is the AND twitter, and ‘the,twitter’ is the OR twitter).
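As an extra safeguard, the same AND condition can also be enforced in the listener before a tweet is saved. A minimal sketch with a hypothetical helper `contains_all`, using a case-insensitive substring test (so "Disneyland" also matches inside "#Disneyland"):

```python
def contains_all(text, keywords):
    """True only if every keyword occurs in `text` (case-insensitive substring match)."""
    lowered = text.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


KEYWORDS = ["UniversalStudios", "Disneyland", "LosAngeles"]

print(contains_all("Trip plan: #Disneyland then #UniversalStudios in #LosAngeles", KEYWORDS))  # True
print(contains_all("Off to #Disneyland today!", KEYWORDS))  # False
```

In `on_data`, one would then skip writing unless `contains_all(all_data.get("text", ""), KEYWORDS)` is true.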

Python issue with saving the output

As a new Python user, I have hit an issue with the following code. Instead of only printing the results of a Twitter search on the screen, I need to save them to a file (ideally pipe-delimited, which I don't yet know how to produce...). The code runs OK but doesn't create the Output.txt file. It did once and then never again. I am running it on Mac OS and ending it with Ctrl+C (as I still don't know how to modify it to return only a specific number of tweets). I thought the issue might be related to flushing, but after trying the options from this post (Flushing issues), none of them seemed to work (unless I did something wrong, which is more than probable...).
import tweepy
import json
import sys
# Authentication details. To obtain these, visit dev.twitter.com
consumer_key = 'xxxxxx'
consumer_secret = 'xxxxx'
access_token = 'xxxxx-xxxx'
access_token_secret = 'xxxxxxxx'
# This is the listener, responsible for receiving data
class StdOutListener(tweepy.StreamListener):
    def on_data(self, data):
        # Twitter returns data in JSON format - we need to decode it first
        decoded = json.loads(data)
        # Also, we convert UTF-8 to ASCII ignoring all bad characters sent by users
        print '#%s: %s' % (decoded['user']['screen_name'], decoded['text'].encode('ascii', 'ignore'))
        print ''
        return True
    def on_error(self, status):
        print status
if __name__ == '__main__':
    l = StdOutListener()
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    print "Showing all new tweets for #Microsoft"
    stream = tweepy.Stream(auth, l)
    stream.filter(track=['Microsoft'])
    sys.stdout = open('Output.txt', 'w')
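I think you would be much better off changing StdOutListener so that it writes to the file directly. Assigning sys.stdout to a file is... weird, and in your code that line comes after stream.filter(), which blocks until the stream ends, so when you stop the script with Ctrl+C it is never reached. Writing to the file directly also means you can still print things for debug output. Also note that file mode "w" truncates the file when it's opened.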
import json
import tweepy
class TweepyFileListener(tweepy.StreamListener):
    def on_data(self, data):
        print "on_data called"
        # Twitter returns data in JSON format - we need to decode it first
        decoded = json.loads(data)
        msg = '#%s: %s\n' % (
            decoded['user']['screen_name'],
            decoded['text'].encode('ascii', 'ignore'))
        # You should really open the file in __init__
        # You should also use a RotatingFileHandler or this guy will get massive
        with open("Output.txt", "a") as tweet_log:
            print "Received: %s\n" % msg
            tweet_log.write(msg)
        return True
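For the pipe-delimited output asked about in the question, the csv module can handle the quoting (a field that itself contains '|' gets wrapped in quotes). A minimal Python 3 sketch; the helper name `to_pipe_delimited` and the choice of columns are assumptions, using fields from the decoded tweet JSON (`id_str`, `user.screen_name`, `created_at`, `text`).

```python
import csv
import io


def to_pipe_delimited(decoded):
    """Format one decoded tweet (a dict from json.loads) as a single '|'-separated line.
    The chosen columns are an assumption; add or remove fields as needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="|", lineterminator="\n")
    writer.writerow([
        decoded["id_str"],
        decoded["user"]["screen_name"],
        decoded["created_at"],
        decoded["text"],
    ])
    return buffer.getvalue()


sample = {
    "id_str": "1",
    "user": {"screen_name": "someone"},
    "created_at": "Mon Jan 01 00:00:00 +0000 2018",
    "text": "Hello | world",
}
print(to_pipe_delimited(sample), end="")  # 1|someone|Mon Jan 01 00:00:00 +0000 2018|"Hello | world"
```

Inside on_data, `tweet_log.write(to_pipe_delimited(decoded))` would replace the '#%s: %s' formatting. To stop after a fixed number of tweets instead of pressing Ctrl+C, a counter on the listener is enough: in tweepy releases of that era, returning False from on_data ends the stream.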

How to add a location filter to tweepy module

I have found the following piece of code that works pretty well for letting me view in Python Shell the standard 1% of the twitter firehose:
import sys
import tweepy
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print status.text
    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True  # Don't kill the stream
    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True  # Don't kill the stream
sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(track=['manchester united'])
How do I add a filter to only parse tweets from a certain location? I've seen people adding GPS to other Twitter-related Python code, but I can't find anything specific to sapi within the Tweepy module.
Any ideas?
Thanks
The streaming API doesn't allow filtering by location AND keyword simultaneously.
Bounding boxes do not act as filters for other filter parameters. For example track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.
Source: https://dev.twitter.com/docs/streaming-apis/parameters#locations
What you can do is ask the streaming API for keyword or located tweets and then filter the resulting stream in your app by looking into each tweet.
If you modify the code as follows, you will capture tweets from the United Kingdom, and those tweets are then filtered to show only those that contain "manchester united":
import sys
import tweepy
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        if 'manchester united' in status.text.lower():
            print status.text
    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True  # Don't kill the stream
    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True  # Don't kill the stream
sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(locations=[-6.38, 49.87, 1.77, 55.81])
Juan gave the correct answer. I'm filtering for Germany only using this:
# Bounding boxes for geolocations
# Online-Tool to create boxes (c+p as raw CSV): http://boundingbox.klokantech.com/
GEOBOX_WORLD = [-180,-90,180,90]
GEOBOX_GERMANY = [5.0770049095, 47.2982950435, 15.0403900146, 54.9039819757]
stream.filter(locations=GEOBOX_GERMANY)
This is a pretty crude box that includes parts of some other countries. If you want a finer grain you can combine multiple boxes to fill out the location you need.
It should be noted, though, that you limit the number of tweets quite a bit if you filter by geotags. This is from roughly 5 million tweets in my test database (the query returns the fraction of tweets that actually contain a geolocation):
> db.tweets.find({coordinates:{$ne:null}}).count() / db.tweets.count()
0.016668392651547598
So only 1.67% of my sample of the 1% stream includes a geotag. However, there are other ways of figuring out a user's location:
http://arxiv.org/ftp/arxiv/papers/1403/1403.2345.pdf
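Going the other way (track the keyword, then keep only tweets geotagged inside a region) needs a point-in-box test in the listener. A minimal Python 3 sketch, assuming decoded tweet dicts from json.loads(data); the function names are hypothetical. Boxes use the same [west, south, east, north] longitude/latitude order as the locations parameter, the tweet's coordinates field is GeoJSON and therefore [longitude, latitude], and several boxes can be checked at once to build the finer-grained region mentioned above.

```python
# Same order as the streaming `locations` parameter:
# [southwest_lon, southwest_lat, northeast_lon, northeast_lat]
GEOBOX_GERMANY = [5.0770049095, 47.2982950435, 15.0403900146, 54.9039819757]
GEOBOX_UK = [-6.38, 49.87, 1.77, 55.81]


def in_box(lon, lat, box):
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def in_any_box(lon, lat, boxes):
    return any(in_box(lon, lat, box) for box in boxes)


def tweet_in_boxes(tweet, boxes):
    """`tweet` is a decoded tweet dict; its `coordinates` field is GeoJSON ([lon, lat])."""
    point = tweet.get("coordinates")
    if not point:
        return False  # most tweets carry no exact geotag
    lon, lat = point["coordinates"]
    return in_any_box(lon, lat, boxes)


berlin = {"coordinates": {"type": "Point", "coordinates": [13.405, 52.52]}}
print(tweet_in_boxes(berlin, [GEOBOX_GERMANY]))  # True
```

Given the low share of geotagged tweets noted above, expect this filter to discard most of a keyword stream.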
You can't filter it while streaming but you could filter it at the output stage, if you were writing the tweets to a file.
sapi.filter(track=['manchester united'],locations=['GPS Coordinates'])

How to close twitter stream

I have included below the code I'm using with Tweepy, a Twitter API library for Python. I've tried most of the approaches I've found online, but they've failed to close the connection or stop the stream. Is there any way to do so?
Inside my function:
setTerms = s.split(',')
streaming_api = tweepy.Stream(auth=auth, listener=StreamListener(), timeout=60)
if (s == '0'):
    streaming_api.disconnect()
    raise web.seeother('/dc')
    print "Failed to see this"
try:
    twt = streaming_api.filter(track=setTerms)
except:
    streaming_api.disconnect()
    #also cannot see this
    raise web.seeother('/stream')
Here is the stream listener class:
class StreamListener(tweepy.StreamListener):
    def on_status(self, status):
        try:
            printer(status.text, status.created_at)
        except Exception, e:
            pass
    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True
    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True
The first time you call streaming_api.disconnect() (inside if (s == '0'):), you haven't called filter yet, so the stream was never connected. The rest of your code should be correct, assuming you're using the latest version of tweepy. Note that the except block will almost never be reached, as any errors that occur while the stream is running are passed to the on_error callback.
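A related pattern for stopping a running stream from other code (such as a web handler) is a shared flag that the listener checks. In tweepy 3.x a callback returning False makes the stream disconnect, and filter(..., is_async=True) (async=True before 3.7) runs the stream in a background thread so the caller isn't blocked. A minimal sketch; StoppableListener and the use of threading.Event are illustrative choices, and the class derives from object here only so the callback logic can be exercised without a network connection (in real code it would subclass tweepy.StreamListener). The flag is only checked when the next status arrives.

```python
import threading
from types import SimpleNamespace


class StoppableListener(object):  # real code: class StoppableListener(tweepy.StreamListener)
    def __init__(self, stop_event):
        super(StoppableListener, self).__init__()
        self.stop_event = stop_event
        self.seen = []

    def on_status(self, status):
        # Checked once per incoming status; returning False asks tweepy 3.x to disconnect.
        if self.stop_event.is_set():
            return False
        self.seen.append(status.text)
        return True


# Exercising the callback directly with stand-in status objects.
stop = threading.Event()
listener = StoppableListener(stop)
first = listener.on_status(SimpleNamespace(text="hello"))
stop.set()  # e.g. called from the web handler that wants the stream closed
second = listener.on_status(SimpleNamespace(text="ignored"))
print(first, second, listener.seen)  # True False ['hello']
```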
