Geograpy3 library is not working properly and gives a traceback error - python

I am trying to extract locations from text using the geograpy3 library, but it throws an error.
for content in feedContent:
    if content != "":
        place = geograpy.get_place_context(text=content)
        placesInFeed.append(place.places)
    else:
        placesInFeed.append("null")
The result is
Traceback (most recent call last):
File "C:/Users/Peshala/Documents/SDGP/Location-based-news-recommendation-master/Backend/rss_scraper.py", line 46, in <module>
place = geograpy.get_place_context(text=content)
File "C:\Users\Peshala\PycharmProjects\Location-based-news-recommendation\venv\lib\site-packages\geograpy\__init__.py", line 11, in get_place_context
pc.set_cities()
File "C:\Users\Peshala\PycharmProjects\Location-based-news-recommendation\venv\lib\site-packages\geograpy\places.py", line 137, in set_cities
self.populate_db()
File "C:\Users\Peshala\PycharmProjects\Location-based-news-recommendation\venv\lib\site-packages\geograpy\places.py", line 30, in populate_db
for row in reader:
File "C:\Users\Peshala\AppData\Local\Programs\Python\Python36\Lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 277: character maps to <undefined>

As a committer of geograpy3, I added a test to the most recent geograpy3 to try to reproduce your issue (https://github.com/somnathrakshit/geograpy3/blob/master/tests/test_extractor.py):
def testStackoverflow55548116(self):
    '''
    see https://stackoverflow.com/questions/55548116/geograpy3-library-is-not-working-properly-and-give-traceback-error
    '''
    feedContent=['Las Vegas is a city in Nevada']
    placesInFeed=[]
    for content in feedContent:
        if content != "":
            e=Extractor(text=content)
            e.find_entities()
            places = e.places
            if self.debug:
                print(places)
            placesInFeed.append(places)
The result might not be what you expect:
['Las', 'Vegas', 'Nevada']
However, the test does not show any error, so please supply the feedContent that does. You might want to fork the project, modify the test, and open a pull request for your problem.
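The traceback in the question ends inside cp1252.py while populate_db iterates over a reader, which suggests the data file was opened with the Windows default encoding. The following is a minimal sketch of that failure mode and one way around it, assuming the data is UTF-8 (the question does not state its encoding). `read_rows` and `cities.csv` are hypothetical names for illustration, not part of geograpy3:

```python
import csv
import os
import tempfile


def read_rows(path, encoding="utf-8"):
    """Read a CSV file with an explicit encoding instead of the platform default."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return list(csv.reader(f))


# Hypothetical sample file: "č" is stored in UTF-8 as the bytes 0xC4 0x8D,
# and 0x8D has no mapping in Python's cp1252 codec.
demo_path = os.path.join(tempfile.mkdtemp(), "cities.csv")
with open(demo_path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(["Koč", "SK"])

rows = read_rows(demo_path)  # explicit UTF-8: decodes cleanly

try:
    read_rows(demo_path, encoding="cp1252")  # roughly what a cp1252 default does
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True
```

If the library's own open() call is the one without an encoding, the change would have to be made there (or in a newer release); the sketch only illustrates the mechanism.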

Related

The tweepy library has started to throw UnicodeEncodeError although it did not before

I'm not sure what is happening, but my Twitter bot script had been running fine until now, when it started raising UnicodeEncodeError whenever I print a tweet object or its attributes.
Here is the code:
def like_and_retweet():
    for tweet in tweepy.Cursor(api.home_timeline).items(10):
        print(tweet)

like_and_retweet()
I literally just want to see the tweet object but it generates the error below
[Running] python -u "c:\Users\Kweronda\Desktop\wen\app.py"
Traceback (most recent call last):
File "c:\Users\Kweronda\Desktop\wen\app.py", line 44, in <module>
like_and_retweet()
File "c:\Users\Kweronda\Desktop\wen\app.py", line 42, in like_and_retweet
print(tweet)
File "C:\Python38\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 2592-2593: character maps to <undefined>
The same error occurs even when I try print(tweet.text).
I'm not sure why, because it wasn't doing that before.
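The traceback shows print() encoding its output through cp1252, so a tweet containing a character outside that code page (an emoji, for instance) would trigger the error even if earlier tweets did not. One possible workaround, sketched here under the assumption of Python 3.7 or later and a regular text stream on standard output, is to switch the stream to UTF-8 or to let unencodable characters be replaced. `write_text` and `use_utf8_stdout` are hypothetical helpers:

```python
import io
import sys


def write_text(text, encoding, errors="strict"):
    """Write text through a text stream with the given encoding; return the bytes produced."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # discard the wrapper without closing the buffer
    return data


def use_utf8_stdout():
    """Switch sys.stdout to UTF-8 where the stream supports it (Python 3.7+)."""
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8", errors="replace")


tweet_text = "Good morning \U0001F600"  # an emoji, which cp1252 cannot represent

try:
    write_text(tweet_text, "cp1252")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

replaced = write_text(tweet_text, "cp1252", errors="replace")  # emoji becomes "?"
as_utf8 = write_text(tweet_text, "utf-8")
```

Calling use_utf8_stdout() once before the first print(tweet) is the part that would apply to the script above; setting the PYTHONIOENCODING environment variable to utf-8 before starting Python is another option. Whether the console then displays the characters correctly depends on the terminal.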

Unable to encode a unicode into a .txt file in python

While experimenting with Python, I tried to write a program that fetches the content of Pastebin URLs and saves each one's content into its own file, but I got an error.
This is the code:
import requests
file = open("file.txt", "r", encoding="utf-8").readlines()
for line in file:
    link = line.rstrip("\n")
    n_link = link.replace("https://pastebin.com/", "https://pastebin.com/raw/")
    pastebin = n_link.replace("https://pastebin.com/raw/", "")
    r = requests.get(n_link, timeout=3)
    x = open(f"{pastebin}.txt", "a+")
    x.write(r.text)
    x.close()
I get the following error:
Traceback (most recent call last):
File "C:\Users\Lenovo\Desktop\Py\Misc. Scripts\ai.py", line 9, in <module>
x.write(r.text)
File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2694' in position 9721: character maps to <undefined>
Can somebody help?
You're doing the right thing at the start by reading the input file as UTF-8. The only thing missing is to do the same with your output file:
x = open(f"{pastebin}.txt", "a+", encoding="utf-8")
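Putting the answer together with a with-block, which also guarantees the file is closed, might look like the sketch below. The network request is left out so the example runs offline; `raw_url_and_name`, `append_text`, and the paste ID `abc123` are hypothetical and stand in for the question's loop body and `r.text`:

```python
import os
import tempfile


def raw_url_and_name(link):
    """Return the raw-content URL and the paste ID for a pastebin.com link."""
    paste_id = link.strip().replace("https://pastebin.com/", "")
    return "https://pastebin.com/raw/" + paste_id, paste_id


def append_text(path, text):
    """Append text to a file as UTF-8; the with-block closes the file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text)


# Stand-in for r.text; "\u2694" is the character named in the traceback.
url, paste_id = raw_url_and_name("https://pastebin.com/abc123\n")
out_path = os.path.join(tempfile.mkdtemp(), f"{paste_id}.txt")
append_text(out_path, "crossed swords: \u2694")

with open(out_path, encoding="utf-8") as f:
    saved = f.read()
```

In the original loop, `append_text(f"{paste_id}.txt", r.text)` would replace the three lines that open, write, and close the file.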

Umlauts in JSON files lead to errors in Python code created by ANTLR4

I generated Python modules from the JSON grammar in the antlr4 GitHub repository with:
antlr4 -Dlanguage=Python3 JSON.g4
I wrote a main program, "JSON2.py", following this guide: https://github.com/antlr/antlr4/blob/master/doc/python-target.md
and downloaded example1.json, also from GitHub.
python3 ./JSON2.py example1.json # works perfectly, but
python3 ./JSON2.py bookmarks-2017-05-24.json # the bookmarks contain German Umlauts like "ü"
...
File "/home/xyz/lib/python3.5/site-packages/antlr4/FileStream.py", line 27, in readDataFrom
return codecs.decode(bytes, encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 227: ordinal not in range(128)
The offending line in JSON2.py is
input = FileStream(argv[1])
I've searched stackoverflow and tried this instead of using the above FileStream:
fp = codecs.open(argv[1], 'rb', 'utf-8')
try:
    input = fp.read()
finally:
    fp.close()
lexer = JSONLexer(input)
stream = CommonTokenStream(lexer)
parser = JSONParser(stream)
tree = parser.json() # This is line 39, mentioned in the error message
Execution of this program ends with an error message, even if the input file doesn't contain Umlauts:
python3 ./JSON2.py example1.json
Traceback (most recent call last):
File "./JSON2.py", line 46, in <module>
main(sys.argv)
File "./JSON2.py", line 39, in main
tree = parser.json()
File "/home/x/Entwicklung/antlr/links/JSONParser.py", line 108, in json
self.enterRule(localctx, 0, self.RULE_json)
File "/home/xyz/lib/python3.5/site-packages/antlr4/Parser.py", line 358, in enterRule
self._ctx.start = self._input.LT(1)
File "/home/xyz/lib/python3.5/site-packages/antlr4/CommonTokenStream.py", line 61, in LT
self.lazyInit()
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 186, in lazyInit
self.setup()
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 189, in setup
self.sync(0)
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 111, in sync
fetched = self.fetch(n)
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 123, in fetch
t = self.tokenSource.nextToken()
File "/home/xyz/lib/python3.5/site-packages/antlr4/Lexer.py", line 111, in nextToken
tokenStartMarker = self._input.mark()
AttributeError: 'str' object has no attribute 'mark'
This parses correctly:
javac *.java
grun JSON json -gui bookmarks-2017-05-24.json
So the grammar itself is not the problem.
So finally the question: How should I process the input file in python, so that lexer and parser can digest it?
Thanks in advance.
Make sure your input file is actually encoded as UTF-8. Many problems with character recognition by the lexer are caused by using other encodings. I just took a testbed application, added ë to the list of available characters for an IDENTIFIER, and it works again. UTF-8 is the key, and make sure your grammar also allows these characters where you want to accept them.
I solved it by passing the encoding info:
input = FileStream(sys.argv[1], encoding = 'utf8')
Without the encoding argument, I get the same error as you:
Traceback (most recent call last):
File "test.py", line 20, in <module>
main()
File "test.py", line 9, in main
input = FileStream(sys.argv[1])
File ".../lib/python3.5/site-packages/antlr4/FileStream.py", line 20, in __init__
super().__init__(self.readDataFrom(fileName, encoding, errors))
File ".../lib/python3.5/site-packages/antlr4/FileStream.py", line 27, in readDataFrom
return codecs.decode(bytes, encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 1: ordinal not in range(128)
Where my input data is
[今明]天(台南|高雄)的?天氣如何
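The position and byte reported in this traceback can be reproduced with the standard library alone, which shows why decoding as 'ascii' fails on such input. A small sketch (no ANTLR involved; the variable names are illustrative):

```python
data = "[今明]天".encode("utf-8")  # the first characters of the input above

try:
    data.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# "[" is a single ASCII byte at position 0; "今" starts at position 1,
# and its first UTF-8 byte is 0xe4, which is outside the ASCII range.
bad_position = error.start
bad_byte = data[bad_position]

decoded = data.decode("utf-8")  # the matching codec round-trips the text
```

The same reasoning applies to the "ü" in the question: its UTF-8 form starts with 0xc3, the byte named in that traceback.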

UnicodeEncodeError in Python 3.6

I'm trying to find a way to disable this error logging in my Python code. The program seems to run fine; the search function just returns a whole buttload of JSON objects with dozens of attributes each. Whenever it hits a character that it can't print, it prints the thousands of JSON objects returned to the console.
I wrapped the guilty code (below) in a try block but it hasn't changed anything.
try:
    results = api.search(query)
    print('Station hits: ', len(results['station_hits']), '\nSong hits: ', len(results['song_hits']), '\nArtist hits: ', len(results['artist_hits']), '\nAlbum hits: ', len(results['album_hits'])).encode('ascii', 'ignore')
except UnicodeEncodeError:
    pass
Here is the error that is printed to the console. (Without the buttload of text referenced earlier)
--- Logging error ---
Traceback (most recent call last):
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\logging\__init__.py", line 989, in emit
stream.write(msg)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2117' in position 108194: character maps to <undefined>
Call stack:
File "gpm.py", line 247, in <module>
main()
File "gpm.py", line 181, in main
results = api.search(query)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\site-packages\gmusicapi\clients\mobileclient.py", line 1806, in search
res = self._make_call(mobileclient.Search, query, max_results)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\site-packages\gmusicapi\clients\shared.py", line 84, in _make_call
return protocol.perform(self.session, self.validate, *args, **kwargs)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\site-packages\gmusicapi\protocol\shared.py", line 243, in perform
log.debug(cls.filter_response(parsed_response))
The traceback reveals that the character '\u2117' cannot be encoded using cp1252, which is not surprising. Use UTF-8 instead.
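The try block in the question has no effect because the exception is raised inside a logging handler's emit(), and the logging module reports such failures itself (the "--- Logging error ---" banner) instead of propagating them to the caller. A minimal sketch of sending log records to a UTF-8 file instead of the cp1252 console follows; the logger name `utf8_demo` and the helper `make_utf8_logger` are hypothetical, and which logger the library actually uses is not shown in the question:

```python
import logging
import os
import tempfile


def make_utf8_logger(name, path):
    """Return a logger that writes DEBUG records to a UTF-8 encoded file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path, encoding="utf-8")
    logger.addHandler(handler)
    logger.propagate = False  # keep records away from console handlers
    return logger, handler


log_path = os.path.join(tempfile.mkdtemp(), "debug.log")
logger, handler = make_utf8_logger("utf8_demo", log_path)
logger.debug("Sound recording copyright sign: \u2117")

# Release the file before reading it back.
logger.removeHandler(handler)
handler.close()

with open(log_path, encoding="utf-8") as f:
    logged = f.read()
```

If the goal is only to silence the report, setting `logging.raiseExceptions = False` suppresses the "--- Logging error ---" output, at the cost of hiding every handler failure.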

While reading file on Python, I got a UnicodeDecodeError. What can I do to resolve this?

This is one of my own projects. This will later help benefit other people in a game I am playing (AssaultCube). Its purpose is to break down the log file and make it easier for users to read.
I kept getting this issue. Anyone know how to fix this? Currently, I am not planning to write/create the file. I just want this error to be fixed.
The line that triggered the error is a blank line (it stopped on line 66346).
This is what the relevant part of my script looks like:
log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r')
for line in log:
and the exception is:
Traceback (most recent call last):
File "C:\Users\Owner\Desktop\Exodus Logs\Log File Translater.py", line 159, in <module>
main()
File "C:\Users\Owner\Desktop\Exodus Logs\Log File Translater.py", line 7, in main
for line in log:
File "C:\Python32\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3074: character maps to <undefined>
Try:
enc = 'utf-8'
log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r', encoding=enc)
If that doesn't work, try:
enc = 'utf-16'
log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r', encoding=enc)
You could also try:
enc = 'iso-8859-15'
or:
enc = 'cp437'
which is very old, but it has "ü" at 0x81, which would fit the string "üßer" that I found on the AssaultCube homepage.
If all of these encodings are wrong, try contacting the AssaultCube developers or, as mentioned in a comment, have a look at https://pypi.python.org/pypi/chardet
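Trying the candidates by hand can be automated. The sketch below attempts each encoding in order and returns the first one that decodes without error; `read_with_fallback` and its default candidate list are assumptions for illustration, not a recommendation of these particular encodings:

```python
def read_with_fallback(path, encodings=("utf-8", "cp1252", "cp437")):
    """Return (text, encoding) for the first candidate that decodes the whole file."""
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the file; try the next one
    raise ValueError(f"none of {encodings!r} could decode {path!r}")
```

Order matters: a single-byte codec such as cp437 assigns a character to every byte and therefore never raises, so it belongs at the end of the list, and a successful decode does not prove the encoding is the right one. Checking a few decoded lines by eye is still needed.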
