Python encoding issue when trying to parse JSON tweets - python

I am trying to parse out the tweet and username sections of the JSON object returned from Twitter using the following code:
class listener(StreamListener):
    def on_data(self, data):
        all_data = json.loads(data)
        tweet = all_data["text"]
        username = all_data["user"]["screen_name"]
        c.execute("INSERT INTO tweets (tweet_time, username, tweet) VALUES (%s,%s,%s)",
                  (time.time(), username, tweet))
        print (username, tweet)
        return True

    def on_error(self, status):
        print (status)

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track = ["LeBron James"])
But I get the following error. How can the code be adjusted to decode or encode the response properly?
Traceback (most recent call last):
File "C:/Users/sagars/PycharmProjects/YouTube NLP Lessons/Twitter Stream to DB.py", line 45, in <module>
twitterStream.filter(track = ["LeBron James"])
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 428, in filter
self._start(async)
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 346, in _start
self._run()
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 286, in _run
raise exception
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 255, in _run
self._read_loop(resp)
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 309, in _read_loop
self._data(next_status_obj)
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 289, in _data
if self.listener.on_data(data) is False:
File "C:/Users/sagars/PycharmProjects/YouTube NLP Lessons/Twitter Stream to DB.py", line 36, in on_data
print (username, tweet)
File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-8: character maps to <undefined>

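The data you get from Twitter is not the problem: json.loads already gives you ordinary Python strings. The charmap error comes from print, which has to encode the text for the Windows console (cp1252 here, as the last frame of the traceback shows), and that code page has no mapping for some of the characters in the tweet. One workaround is to encode the strings yourself before printing: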
tweet = all_data["text"].encode('utf-8')
username = all_data["user"]["screen_name"].encode('utf-8')
Note that in Python 3 this gives you bytes objects, so print shows them as b'...' and emoji and other special characters appear as escape sequences such as \xf0\x9f\x98\x80 rather than as the characters themselves. If you really need that information for sentiment analysis (I discard it myself), you'll need to install a package with a pre-compiled list to convert them accordingly.
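If you would rather keep working with str values, a possible alternative is to escape only the characters the console cannot show. This is a minimal sketch, not the answerer's method: the helper name printable_for is hypothetical, and cp1252 is assumed as the default only because it appears in the traceback above.

```python
import sys


def printable_for(text, encoding="cp1252", errors="backslashreplace"):
    """Return text with characters the target encoding lacks escaped."""
    # Round-trip through the console encoding; characters it cannot
    # represent become escapes such as \U0001f600 instead of raising.
    return text.encode(encoding, errors=errors).decode(encoding)


# Use the encoding of the current console if it is known.
console_encoding = sys.stdout.encoding or "cp1252"
print(printable_for("King James \U0001F451", console_encoding))
```

Passing errors="replace" instead would substitute a question mark for each character that cannot be encoded.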

Related

TypeError: 'str' object is not callable when insert tweet data to mysql Python 3

This is my code to insert tweet data into MySQL:
import pymysql
import tweepy
import time
import json
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import pymysql.cursors
ckey= ''
csecret= ''
atoken=''
asecret=''
conn = pymysql.connect(host='localhost', port=3306, user='root', passwd='admin1234', db='mysql')
cur = conn.cursor()
class listener(StreamListener):
    def on_data(self, data):
        all_data = json.loads(data)
        tweet = all_data["text"]
        a=0
        #username = all_data["user"]["screen_name"]
        cur.execute("INSERT INTO tweet (textt) VALUES (%s)" (tweet))
        print (tweet)
        return True

    def on_error(self, status):
        print (status)

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track = ["puasa"])
cur.close()
conn.close()
But I get the error TypeError: 'str' object is not callable. Here is the traceback:
Traceback (most recent call last):
File "collect-sql.py", line 40, in <module>
twitterStream.filter(track = ["puasa"])
File "/Users/amzar/anaconda3/lib/python3.6/site-packages/tweepy/streaming.py", line 450, in filter
self._start(async)
File "/Users/amzar/anaconda3/lib/python3.6/site-packages/tweepy/streaming.py", line 364, in _start
self._run()
File "/Users/amzar/anaconda3/lib/python3.6/site-packages/tweepy/streaming.py", line 297, in _run
six.reraise(*exc_info)
File "/Users/amzar/anaconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/Users/amzar/anaconda3/lib/python3.6/site-packages/tweepy/streaming.py", line 266, in _run
self._read_loop(resp)
File "/Users/amzar/anaconda3/lib/python3.6/site-packages/tweepy/streaming.py", line 327, in _read_loop
self._data(next_status_obj)
File "/Users/amzar/anaconda3/lib/python3.6/site-packages/tweepy/streaming.py", line 300, in _data
if self.listener.on_data(data) is False:
File "collect-sql.py", line 30, in on_data
cur.execute("INSERT INTO tweet (textt) VALUES (%s)" (tweet))
TypeError: 'str' object is not callable
You need 2 extra commas:
cur.execute("INSERT INTO tweet (textt) VALUES (%s)", (tweet,))
The first separates the query string from the arguments; the second turns the parenthesized value into a one-element tuple. (With only one argument it would actually work if you passed a bare string instead of a tuple, but from the look of things this isn't officially supported.)
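The difference between (tweet) and (tweet,) can be checked without a MySQL server. The sketch below uses the standard-library sqlite3 module as a stand-in for pymysql, which is an assumption made only so the example runs anywhere; note that sqlite3 uses ? as its placeholder where pymysql uses %s.

```python
import sqlite3

# sqlite3 stands in for pymysql so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tweet (textt TEXT)")

tweet = "puasa \u201cquoted\u201d"

# (tweet) is just the string in parentheses; (tweet,) is a 1-element tuple.
plain = (tweet)
params = (tweet,)

# The query string and its parameters are two separate arguments.
cur.execute("INSERT INTO tweet (textt) VALUES (?)", params)
conn.commit()

stored = cur.execute("SELECT textt FROM tweet").fetchall()
conn.close()
```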
But this error that you mentioned in the comments:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u201c' in position 97: ordinal not in range(256)
means that Unicode text containing a character outside Latin-1 (here U+201C, a left curly quotation mark) is being encoded as latin-1, which only covers code points 0 to 255.
If the field is already internally defined (in your mysql database) as unicode, you may need to specify the character set to use when connecting e.g.:
conn = pymysql.connect(host='localhost', port=3306, user='root', passwd='admin1234', db='mysql', use_unicode=True, charset="utf8")
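If the column in MySQL is not already using something like UTF-8, then I recommend you alter or otherwise redefine the table to use a Unicode character set for this column. Keep in mind that MySQL's utf8 character set stores at most three bytes per character, so emoji need utf8mb4.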
https://dev.mysql.com/doc/refman/8.0/en/charset-mysql.html
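The Latin-1 failure can be reproduced in plain Python, independent of MySQL. This is only an illustration of the encoding step; the sample text is made up.

```python
text = "He said \u201chello\u201d"  # contains U+201C and U+201D curly quotes

try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    # Latin-1 only covers code points 0-255, so U+201C cannot be encoded.
    latin1_ok = False

# UTF-8 has an encoding for these characters, so the round trip succeeds.
utf8_bytes = text.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")
```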

Python: Tweepy.Cursor : 'JSONParser' object has no attribute 'model_factory'

I am trying to scrape status data from Twitter, but I get an error when running the code below:
for status in tweepy.Cursor(api.user_timeline,screen_name=screenname).items():
    statuses.append(status)
Below is the error:
File "", line 3, in
for status in data:
File "C:\Users\Sriram\Anaconda2\lib\site-packages\tweepy\cursor.py", line 197, in next
self.current_page = self.page_iterator.next()
File "C:\Users\Sriram\Anaconda2\lib\site-packages\tweepy\cursor.py", line 117, in next
model = ModelParser().parse(self.method(create=True), data)
File "C:\Users\Sriram\Anaconda2\lib\site-packages\tweepy\parsers.py", line 102, in parse
result = model.parse_list(method.api, json)
File "C:\Users\Sriram\Anaconda2\lib\site-packages\tweepy\models.py", line 65, in parse_list
results.append(cls.parse(api, obj))
File "C:\Users\Sriram\Anaconda2\lib\site-packages\tweepy\models.py", line 81, in parse
user_model = getattr(api.parser.model_factory, 'user') if api else User
AttributeError: 'JSONParser' object has no attribute 'model_factory'
I also got the same error. I fixed this by removing the JSONParser in the API initialization.
I changed this:
api = tweepy.API(auth, parser=tweepy.parsers.JSONParser())
To this:
api = tweepy.API(auth)

'charmap' codec can't encode characters

I'm using tweepy and get this error when printing tweet messages on the screen (Windows).
#!/usr/bin/env python
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json
#consumer key, consumer secret, access token, access secret.
ckey = 'xyz'
csecret = 'xyz'
atoken = 'xyz'
asecret = 'xyz'
class Listener(StreamListener):
    def on_data(self, data):
        print json.loads(data)['text']
        return True

    def on_error(self, status):
        print status

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, Listener())
twitterStream.filter(track=['#hash1', '#hash2'], languages=['en'])
> Traceback (most recent call last): File
> "C:....twitterSentiment.py",
> line 34, in <module>
> twitterStream.filter(track=['#hash1', '#hash2'], languages=['en']) File
> line 430, in filter
> self._start(async) File "C:......streaming.py",
> line 346, in _start
> self._run() File "C:.....streaming.py",
> line 286, in _run
> raise exception UnicodeEncodeError: 'charmap' codec can't encode characters in position 108-111: character maps to <undefined>
It is caused by Windows not supporting all characters. Is there a workaround for this?
You are getting this error because the console's code page cannot represent some of the Unicode characters in the tweet text. Encode the text to UTF-8 before printing it:
def on_data(self, data):
    print json.loads(data)['text'].encode('utf-8')
    return True
chcp 65001
This is the prescribed solution in multiple threads: it switches the console code page to UTF-8. I was using the symbol "∞", which was not getting printed. I ran the Python code from cmd after running
chcp 65001
It worked like a charm. Hope it helps.
P.S. It only works in cmd, not in the Atom editor or via Cygwin.
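If changing the code page is not an option, a Python-side alternative is to switch the output stream itself to UTF-8. The sketch below assumes Python 3.7 or later, where text streams have a reconfigure method, and uses an in-memory stream in place of sys.stdout so that it runs anywhere. Whether a real terminal then renders the character still depends on its font and settings.

```python
import io

# In-memory stand-in for a console stream that uses the cp1252 code page.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp1252")

try:
    stream.write("\u221e")  # the infinity sign has no mapping in cp1252
    failed_before = False
except UnicodeEncodeError:
    failed_before = True

# Switch the stream to UTF-8; for the real console the equivalent call
# would be sys.stdout.reconfigure(encoding="utf-8").
stream.reconfigure(encoding="utf-8")
stream.write("\u221e")
stream.flush()
written = raw.getvalue()
```

Setting the PYTHONIOENCODING environment variable to utf-8 before starting Python is another way to get the same effect without touching the code.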

Inserting tweets into MySQL DB using Tweepy

I am trying to use the following Python code to insert parsed out tweets into a MySQL database:
#-*- coding: utf-8 -*-
__author__ = 'sagars'
import pymysql
import tweepy
import time
import json
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
class listener(StreamListener):
    def on_data(self, data):
        all_data = json.loads(data)
        tweet = all_data["text"]
        username = all_data["user"]["screen_name"]
        c.execute("INSERT INTO tweets (tweet_time, username, tweet) VALUES (%s,%s,%s)"
                  (time.time(), username, tweet))
        print (username, tweet)
        return True

    def on_error(self, status):
        print (status)

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track = ["LeBron James"])
But I am running into the following error:
Traceback (most recent call last):
File "C:/Users/sagars/PycharmProjects/YouTube NLP Lessons/Twitter Stream to DB.py", line 45, in <module>
twitterStream.filter(track = ["LeBron James"])
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 428, in filter
self._start(async)
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 346, in _start
self._run()
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 286, in _run
raise exception
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 255, in _run
self._read_loop(resp)
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 309, in _read_loop
self._data(next_status_obj)
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 289, in _data
if self.listener.on_data(data) is False:
File "C:/Users/sagars/PycharmProjects/YouTube NLP Lessons/Twitter Stream to DB.py", line 35, in on_data
(time.time(), username, tweet))
TypeError: 'str' object is not callable
How can the code be adjusted to avoid the error? Should the tweets be parsed out of the JSON object in a different way?
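You forgot a comma before (time.time(), username, tweet). Without it, Python reads the string followed by parentheses as an attempt to call the string, hence the error.
To clarify, it should be:
c.execute("INSERT INTO tweets (tweet_time, username, tweet) VALUES (%s,%s,%s)",
          (time.time(), username, tweet))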
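To see why the missing comma produces exactly this message, the call can be reproduced without a database. In this sketch the query is bound to a variable and the parameter values are placeholders, both of which are assumptions made for illustration.

```python
import time

query = "INSERT INTO tweets (tweet_time, username, tweet) VALUES (%s,%s,%s)"
params = (time.time(), "some_user", "some tweet text")  # placeholder values

try:
    # Equivalent to writing the string literal directly followed by (...).
    query(params)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With the comma, the two objects are simply two separate arguments.
arguments = (query, params)
```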

How to add dictionary to json object

In my construct below, I am trying to pass a JSON object through my web service. As a new requirement, I have to pass a dictionary object which is sent in the code below. Can you please guide me on how to add the dictionary to the JSON object?
if plain_text is not None:
    blob = TextBlob(plain_text)
    sentiment = TextBlob(plain_text)
    sent = {}
    for sentence in blob.sentences:
        sent[sentence] = sentence.sentiment.polarity
    print sent
    return json.dumps(
        {'input' : plain_text,
         'Polarity': sentiment.polarity,
         #'sent': json.dumps(sent) # this is where I am stuck as this 'sent' is a dict
        },
        indent=4)
If I uncomment the line I get the below error:
Exception:
TypeError('keys must be a string',)
Traceback:
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\bottle-0.12.7-py2.7.egg\bottle.py", line 862, in _handle
return route.call(**args)
File "C:\Python27\lib\site-packages\bottle-0.12.7-py2.7.egg\bottle.py", line 1729, in wrapper
rv = callback(*a, **ka)
File "C:\Users\hp\Desktop\RealPy\WebServices\bottle\server_bckup.py", line 53, in sentimentEngine
'sent': json.dumps(sent),
File "C:\Python27\lib\json\__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "C:\Python27\lib\json\encoder.py", line 201, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Python27\lib\json\encoder.py", line 264, in iterencode
return _iterencode(o, 0)
TypeError: keys must be a string
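In JSON, dictionary keys must be strings. You have a Python dictionary, sent, that you want to serialize into JSON. This fails because the keys of sent are not strings but textblob.blob.Sentence instances.
If it makes sense, you can change your code to read:
for sentence in blob.sentences:
    sent[str(sentence)] = sentence.sentiment.polarity
Then pass the dictionary itself as the value ('sent': sent) rather than json.dumps(sent); otherwise it ends up as a JSON string nested inside the outer JSON. Note that customizing the Python JSON encoder with a default() hook will not help here on its own, because the json module only calls that hook for values it cannot serialize, not for dictionary keys.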
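Here is a minimal sketch of the string-key approach that runs without TextBlob installed. FakeSentence and build_payload are hypothetical names: FakeSentence only imitates the two things the code relies on (a str() form and a polarity value) and is not the real textblob class.

```python
import json


class FakeSentence:
    """Hypothetical stand-in for a TextBlob sentence, used as a dict key."""

    def __init__(self, text, polarity):
        self.text = text
        self.polarity = polarity

    def __str__(self):
        return self.text


def build_payload(plain_text, sentences):
    # Convert each key to str so the json module accepts the dictionary.
    sent = {str(s): s.polarity for s in sentences}
    # Nest the dict itself; json.dumps(sent) here would embed a string.
    return json.dumps({"input": plain_text, "sent": sent}, indent=4)


sentences = [FakeSentence("I love it.", 0.5), FakeSentence("I hate it.", -0.8)]
payload = build_payload("I love it. I hate it.", sentences)
decoded = json.loads(payload)
```

Because the inner dictionary is nested directly, a client that parses the response gets a real object under "sent" and does not need to decode it a second time.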
