Twitter Streaming in Python: cp949 codec - python

I am currently using tweepy to gather data through the Streaming API.
Here is my code, which I ran from the Anaconda command prompt. When streaming starts, it returns a few tweets and then fails with the following error:
Streaming Started ...
RT #ish10040: Crack Dealer Released Early From Prison By Obama Murders Woman And Her 2 Young Kids… Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\Jae Hee\Anaconda2\lib\threading.py", line 801, in __bootstrap_inner
    self.run()
  File "C:\Users\Jae Hee\Anaconda2\lib\threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Users\Jae Hee\Anaconda2\lib\site-packages\tweepy\streaming.py", line 294, in _run
    raise exception
UnicodeEncodeError: 'cp949' codec can't encode character u'\xab' in position 31: illegal multibyte sequence
I believe this is an encoding problem, so I ran chcp 65001 to work around it, but that did not solve it.
Here is the code:
import tweepy
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)
    def on_error(self, status_code):
        # Returning False here disconnects the stream (420 means rate limited)
        if status_code == 420:
            return False
def main():
    myStreamListener = MyStreamListener()
    myStream = tweepy.Stream(auth = api.auth, listener = myStreamListener)
    print "Streaming Started ..."
    try:
        myStream.filter(track=['Obama'], async = True)
    except:
        print "error!"
        myStream.disconnect()
if __name__ == '__main__':
    main()

All text sent to and returned by the Twitter API should be encoded as UTF-8, so your code should use that codec to handle what comes back.
See here: https://dev.twitter.com/overview/api/counting-characters
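The traceback in the question comes from print trying to encode the tweet for a cp949 console. A minimal sketch of one way to avoid the crash, assuming Python 3 (the question itself runs Python 2) and accepting that characters the console cannot show become '?'. The helper name printable is hypothetical and not part of tweepy:

```python
import sys


def printable(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'."""
    # Fall back to UTF-8 when stdout reports no encoding (for example, when piped).
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # errors="replace" substitutes '?' instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)


# Demonstration with an explicit encoding so the result does not depend on the console.
print(printable("caf\xe9 \xab quoted \xbb", encoding="ascii"))  # caf? ? quoted ?
```

Inside a listener this would be used as print(printable(status.text)), so a single unprintable character no longer stops the stream.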

Related

Twython Not capturing errors

I am using Twython to capture a stream of tweets from a group of users. It worked quite well for an hour or so (just a few tweets) and then crashed with an HTTP IncompleteRead error. I saw this discussed in a few posts, but it was never resolved.
Is there any way to catch this error so that it does not crash the program?
  File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\response.py", line 331, in _error_catcher
    yield
  File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\response.py", line 640, in read_chunked
    chunk = self._handle_chunk(amt)
  File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\response.py", line 586, in _handle_chunk
    value = self._fp._safe_read(amt)
  File "C:\ProgramData\Anaconda3\lib\http\client.py", line 612, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(0 bytes read, 1 more expected)
My code is simple and I see no other options to trap these errors.
from twython import TwythonStreamer
CONSUMER_KEY = '...'
CONSUMER_SECRET = '...'
# Access:
ACCESS_TOKEN = '...'
ACCESS_SECRET = '....'
class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        if 'text' in data:
            if not data['text'].startswith('RT') and not data['text'].startswith('#'):
                print(data['text'])
    def on_error(self, status_code, data):
        print(status_code)
        self.disconnect()
stream = MyStreamer(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_SECRET)
# follow is watching for tweets from a user or list of users
users = [25073877, 19905457, 1058764970010308611, 251918778]
stream.statuses.filter(follow=users, language='en')
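One way to keep the program alive is to wrap the blocking call in a loop that catches the error and reconnects. The following is a rough sketch under assumptions, not Twython's own mechanism: run_with_reconnect and its parameters are hypothetical names, and the exception type is taken from the traceback above. Depending on library versions, the error may reach your code wrapped in another type (for example requests.exceptions.ChunkedEncodingError), in which case that type would go into retry_on instead.

```python
import http.client
import time


def run_with_reconnect(start_stream, retry_on=(http.client.IncompleteRead,),
                       max_retries=5, delay_seconds=0.0):
    """Call start_stream(); if it raises one of retry_on, wait and call it again.

    Returns the number of restarts that were needed. Re-raises the last
    exception once max_retries restarts have been used up.
    """
    restarts = 0
    while True:
        try:
            start_stream()  # blocks until the stream ends or fails
            return restarts
        except retry_on:
            if restarts >= max_retries:
                raise
            restarts += 1
            time.sleep(delay_seconds)
```

With the code from the question, the last line would become something like run_with_reconnect(lambda: stream.statuses.filter(follow=users, language='en'), delay_seconds=5). A delay between attempts is a deliberate choice here, since reconnecting in a tight loop risks being rate limited.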

python script execution failed due to tweepy error 401

I'm using the code below to stream tweets and analyse them for decision making. While running it I got an error. The error occurs for Twitter users who have a friend list of more than 50.
import re
import tweepy
import sys
import time
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
users = tweepy.Cursor(api.friends, screen_name='#myuser').items()
while True:
    try:
        user = next(users)
    except tweepy.TweepError:
        time.sleep(60*15)
        user = next(users)
    except StopIteration:
        break
    for status in tweepy.Cursor(api.user_timeline,screen_name=user.screen_name,result_type='recent').items(5):
        text=status._json['text'].translate(non_bmp_map)
        print (user.screen_name + ' >>>>>> '+text)
While executing this script I got the error below.
Traceback (most recent call last):
  File "D:sensitive2demo.py", line 31, in <module>
    for status in tweepy.Cursor(api.user_timeline,screen_name=user.screen_name,result_type='recent').items(5):
  File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tweepy-3.6.0-py3.6.egg\tweepy\cursor.py", line 49, in __next__
    return self.next()
  File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tweepy-3.6.0-py3.6.egg\tweepy\cursor.py", line 197, in next
    self.current_page = self.page_iterator.next()
  File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tweepy-3.6.0-py3.6.egg\tweepy\cursor.py", line 108, in next
    data = self.method(max_id=self.max_id, parser=RawParser(), *self.args, **self.kargs)
  File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tweepy-3.6.0-py3.6.egg\tweepy\binder.py", line 250, in _call
    return method.execute()
  File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tweepy-3.6.0-py3.6.egg\tweepy\binder.py", line 234, in execute
    raise TweepError(error_msg, resp, api_code=api_error_code)
tweepy.error.TweepError: Twitter error response: status code = 401
I have googled a lot, but nothing worked. Can somebody help me solve the problem?
401 is the HTTP status code for 'Unauthorized'. I would suggest verifying your credentials.
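If the credentials check out, it may be worth finding out whether the 401 is tied to particular accounts (for example a protected account the authenticated user cannot read; that is an assumption, not something the question confirms). A minimal sketch of skipping users whose timeline request is rejected and recording why. The names collect_recent and fetch_timeline are hypothetical, and fetch_timeline stands in for the tweepy.Cursor(api.user_timeline, ...) call:

```python
def collect_recent(screen_names, fetch_timeline, skip_on=(Exception,)):
    """Fetch a timeline per user, skipping users whose request is rejected.

    fetch_timeline is a hypothetical callable that takes a screen name and
    returns an iterable of tweet texts. skip_on lists the exception types
    that mean "skip this user"; with tweepy 3.x that would be
    tweepy.TweepError, the type shown in the traceback.
    """
    results = {}
    skipped = []
    for name in screen_names:
        try:
            results[name] = list(fetch_timeline(name))
        except skip_on as error:
            # Keep the reason so rejected accounts can be inspected later.
            skipped.append((name, str(error)))
    return results, skipped
```

Looking at the skipped list afterwards shows whether every request fails (which points at the credentials) or only some accounts do.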

python - Traceback (most recent call last): ...

I am trying to collect data from Twitter with Python and tweepy.
My code is:
import sys
import tweepy
consumer_key="..."
consumer_secret="..."
access_key = "..."
access_secret = "..."
class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)
    def on_error(self, status_code):
        # Log to stderr and keep going
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True  # Don't kill the stream
    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True  # Don't kill the stream
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(track=['capital'], async=True)
In my console, Python prints:
RT #gucamo74: #rubenuria eso es muy difícil. El Sevilla no tiene el halo protector arbitral de los equipos de la capital y del Barcelona.
RT #TonySantanaZA: #woznyjs On Macro scale, we failed 2 create Corporate stability, 4 investing Companies. Hence Capital flight,2 other mor… "Pour ne pas se faire rouler"...... #Capital
So far so good, but after a few tweets it shows me this message:
Exception in thread Thread-10:
Traceback (most recent call last):
  File "//anaconda/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "//anaconda/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "//anaconda/lib/python3.5/site-packages/tweepy/streaming.py", line 286, in _run
    raise
RuntimeError: No active exception to reraise
Do you have any idea why? I want the stream to keep running until I stop it.
I found the solution in this post:
Unraised exception using Tweepy and MySQL
Looking at tweepy/streaming.py at https://github.com/tweepy/tweepy/blob/master/tweepy/streaming.py, there seems to be a bug in the way tweepy handles exceptions, specifically:
if exception:
    # call a handler first so that the exception can be logged.
    self.listener.on_exception(exception)
    raise
This raise should be raise exception.
It feels like magic, but it works...
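The difference between the two statements can be reproduced without tweepy. The sketch below assumes Python 3 and uses hypothetical function names; it only mirrors the shape of the snippet above, where the exception has been stored in a variable and the except block that caught it has already finished:

```python
def finish_with_bare_raise(exception):
    if exception:
        # No exception is being handled at this point, so there is nothing to re-raise.
        raise


def finish_with_explicit_raise(exception):
    if exception:
        # Raises the stored exception object itself.
        raise exception


def capture(func, stored):
    """Run func(stored) and report which exception type and message came out."""
    try:
        func(stored)
    except Exception as error:
        return type(error).__name__, str(error)
    return None


stored = ValueError("connection dropped")
print(capture(finish_with_bare_raise, stored))      # ('RuntimeError', 'No active exception to reraise')
print(capture(finish_with_explicit_raise, stored))  # ('ValueError', 'connection dropped')
```

The bare raise hides the real cause behind a RuntimeError, which is why the traceback in the question says nothing about what actually went wrong in the stream.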

'charmap' codec can't encode characters

I'm using tweepy and get this error when printing tweet messages on the screen (Windows).
#!/usr/bin/env python
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json
# consumer key, consumer secret, access token, access secret.
ckey = 'xyz'
csecret = 'xyz'
atoken = 'xyz'
asecret = 'xyz'
class Listener(StreamListener):
    def on_data(self, data):
        print json.loads(data)['text']
        return True
    def on_error(self, status):
        print status
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, Listener())
twitterStream.filter(track=['#hash1', '#hash2'], languages=['en'])
Traceback (most recent call last):
  File "C:....twitterSentiment.py", line 34, in <module>
    twitterStream.filter(track=['#hash1', '#hash2'], languages=['en'])
  File line 430, in filter
    self._start(async)
  File "C:......streaming.py", line 346, in _start
    self._run()
  File "C:.....streaming.py", line 286, in _run
    raise exception
UnicodeEncodeError: 'charmap' codec can't encode characters in position 108-111: character maps to <undefined>
It is caused by Windows not supporting all characters. Is there a workaround for this?
You are getting this error because the console cannot print some of the Unicode characters in the tweet text. Encode the text to UTF-8 before printing:
def on_data(self, data):
    print json.loads(data)['text'].encode('utf-8')
    return True
chcp 65001
This is the prescribed solution in multiple threads. I was using a symbol "∞" which was not getting printed. I ran the python code from cmd after running
chcp 65001
It worked like a charm. Hope it helps.
P.S. It only works in cmd, not in the Atom editor or via Cygwin.
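Both answers come down to sending UTF-8 bytes to the console instead of letting print encode with the console's code page. A small sketch of that idea, assuming Python 3 (the code in this thread is Python 2); write_utf8_line is a hypothetical helper name, and whether the console then renders the character still depends on its code page and font:

```python
import io


def write_utf8_line(binary_stream, text):
    """Write text as UTF-8 bytes plus a newline to a binary stream."""
    binary_stream.write(text.encode("utf-8") + b"\n")
    binary_stream.flush()


# Demonstration against an in-memory stream so the bytes can be inspected.
buffer = io.BytesIO()
write_utf8_line(buffer, "\u221e")  # the infinity sign mentioned above
print(buffer.getvalue())  # b'\xe2\x88\x9e\n'
```

In a script the target would be sys.stdout.buffer, for example write_utf8_line(sys.stdout.buffer, tweet_text), run from cmd after chcp 65001.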

Python encoding issue when trying to parse JSON tweets

I am trying to parse out the tweet and username sections of the JSON object returned from Twitter using the following code:
class listener(StreamListener):
    def on_data(self, data):
        all_data = json.loads(data)
        tweet = all_data["text"]
        username = all_data["user"]["screen_name"]
        c.execute("INSERT INTO tweets (tweet_time, username, tweet) VALUES (%s,%s,%s)",
                  (time.time(), username, tweet))
        print (username, tweet)
        return True
    def on_error(self, status):
        print (status)
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track = ["LeBron James"])
But I get the following error. How can the code be adjusted to decode or encode the response properly?
Traceback (most recent call last):
  File "C:/Users/sagars/PycharmProjects/YouTube NLP Lessons/Twitter Stream to DB.py", line 45, in <module>
    twitterStream.filter(track = ["LeBron James"])
  File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 428, in filter
    self._start(async)
  File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 346, in _start
    self._run()
  File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 286, in _run
    raise exception
  File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 255, in _run
    self._read_loop(resp)
  File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 309, in _read_loop
    self._data(next_status_obj)
  File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 289, in _data
    if self.listener.on_data(data) is False:
  File "C:/Users/sagars/PycharmProjects/YouTube NLP Lessons/Twitter Stream to DB.py", line 36, in on_data
    print (username, tweet)
  File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-8: character maps to <undefined>
The problem is that print has to encode the text for the Windows console, and the console's code page (cp1252 in your traceback) cannot represent some of the characters in the tweet, which is what raises the charmap error. One workaround is to encode the text to UTF-8 yourself:
tweet = all_data["text"].encode('utf-8')
username = all_data["user"]["screen_name"].encode('utf-8')
The printed output will then show emoji and other special characters as byte escapes (something like \x89) instead of the characters themselves. If you really need that information (I discard it myself) for sentiment analysis, then you'll need to install a package with a pre-compiled list to convert them accordingly.
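If the goal is only to print without crashing while keeping a readable record of what each character was, the backslashreplace error handler is one option. This is a sketch under assumptions rather than the answer's method: escape_unencodable is a hypothetical name, and cp1252 is used as the default because that is the codec in the traceback.

```python
def escape_unencodable(text, encoding="cp1252"):
    """Replace characters the encoding cannot represent with backslash escapes."""
    # Characters cp1252 supports pass through unchanged; the rest become
    # escapes of the form \xNN, \uNNNN or \UNNNNNNNN.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


print(escape_unencodable("caf\xe9 \U0001F600"))
```

Because the value stored in tweet stays a normal string, the database insert is unaffected; only the text passed to print is escaped, for example print(username, escape_unencodable(tweet)).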
