I just try to extract location from a text using the geograpy3 library. But it throws an error.
for content in feedContent:
if content != "":
place = geograpy.get_place_context(text=content)
placesInFeed.append(place.places)
else:
placesInFeed.append("null")
The result is
Traceback (most recent call last):
File "C:/Users/Peshala/Documents/SDGP/Location-based-news-recommendation-master/Backend/rss_scraper.py", line 46, in <module>
place = geograpy.get_place_context(text=content)
File "C:\Users\Peshala\PycharmProjects\Location-based-news-recommendation\venv\lib\site-packages\geograpy\__init__.py", line 11, in get_place_context
pc.set_cities()
File "C:\Users\Peshala\PycharmProjects\Location-based-news-recommendation\venv\lib\site-packages\geograpy\places.py", line 137, in set_cities
self.populate_db()
File "C:\Users\Peshala\PycharmProjects\Location-based-news-recommendation\venv\lib\site-packages\geograpy\places.py", line 30, in populate_db
for row in reader:
File "C:\Users\Peshala\AppData\Local\Programs\Python\Python36\Lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 277: character maps to <undefined>
As a committer of geograpy3 to reproduce your issue i added a test to the most recent geograpy3 https://github.com/somnathrakshit/geograpy3/blob/master/tests/test_extractor.py:
def testStackoverflow55548116(self):
'''
see https://stackoverflow.com/questions/55548116/geograpy3-library-is-not-working-properly-and-give-traceback-error
'''
feedContent=['Las Vegas is a city in Nevada']
placesInFeed=[]
for content in feedContent:
if content != "":
e=Extractor(text=content)
e.find_entities()
places = e.places
if self.debug:
print(places)
placesInFeed.append(places)
The result might not be what you expect:
['Las', 'Vegas', 'Nevada']
but the test does not show any error so please supply the feedContent that does - you might want to fork the project and modify the test and add a pull request for your problem.
Related
I'm not sure what is happening but my Twitter bot script has been running fine until now when it started showing UnicodeEncodeError when I just want to print a tweet object or its attributes.
Here is the code
def like_and_retweet():
for tweet in tweepy.Cursor(api.home_timeline).items(10):
print(tweet)
like_and_retweet()
I literally just want to see the tweet object but it generates the error below
[Running] python -u "c:\Users\Kweronda\Desktop\wen\app.py"
Traceback (most recent call last):
File "c:\Users\Kweronda\Desktop\wen\app.py", line 44, in <module>
like_and_retweet()
File "c:\Users\Kweronda\Desktop\wen\app.py", line 42, in like_and_retweet
print(tweet)
File "C:\Python38\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 2592-2593: character maps to <undefined>
even when I try to print(tweet.text) the same error happens.
I'm not sure why because it wasn't doing that before
So while trying to mess around with python i tried making a program which would get me the content from pastebin url's and then save each ones content into a file of their own. I got an error
This is the code :-
import requests
file = open("file.txt", "r", encoding="utf-8").readlines()
for line in file:
link = line.rstrip("\n")
n_link = link.replace("https://pastebin.com/", "https://pastebin.com/raw/")
pastebin = n_link.replace("https://pastebin.com/raw/", "")
r = requests.get(n_link, timeout=3)
x = open(f"{pastebin}.txt", "a+")
x.write(r.text)
x.close
I get the following error :-
Traceback (most recent call last):
File "C:\Users\Lenovo\Desktop\Py\Misc. Scripts\ai.py", line 9, in <module>
x.write(r.text)
File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2694' in position 9721: character maps to <undefined>
Can somebody help?
You’re doing good at the start by reading in the input file as UTF-8. The only thing you’re missing is to do the same thing with your output file:
x = open(f"{pastebin}.txt", "a+", encoding="utf-8")
I've created python modules from the JSON grammar on github / antlr4 with
antlr4 -Dlanguage=Python3 JSON.g4
I've written a main program "JSON2.py" following this guide: https://github.com/antlr/antlr4/blob/master/doc/python-target.md
and downloaded the example1.json also from github.
python3 ./JSON2.py example1.json # works perfectly, but
python3 ./JSON2.py bookmarks-2017-05-24.json # the bookmarks contain German Umlauts like "ü"
...
File "/home/xyz/lib/python3.5/site-packages/antlr4/FileStream.py", line 27, in readDataFrom
return codecs.decode(bytes, encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 227: ordinal not in range(128)
The offending line in JSON2.py is
input = FileStream(argv[1])
I've searched stackoverflow and tried this instead of using the above FileStream:
fp = codecs.open(argv[1], 'rb', 'utf-8')
try:
input = fp.read()
finally:
fp.close()
lexer = JSONLexer(input)
stream = CommonTokenStream(lexer)
parser = JSONParser(stream)
tree = parser.json() # This is line 39, mentioned in the error message
Execution of this program ends with an error message, even if the input file doesn't contain Umlauts:
python3 ./JSON2.py example1.json
Traceback (most recent call last):
File "./JSON2.py", line 46, in <module>
main(sys.argv)
File "./JSON2.py", line 39, in main
tree = parser.json()
File "/home/x/Entwicklung/antlr/links/JSONParser.py", line 108, in json
self.enterRule(localctx, 0, self.RULE_json)
File "/home/xyz/lib/python3.5/site-packages/antlr4/Parser.py", line 358, in enterRule
self._ctx.start = self._input.LT(1)
File "/home/xyz/lib/python3.5/site-packages/antlr4/CommonTokenStream.py", line 61, in LT
self.lazyInit()
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 186, in lazyInit
self.setup()
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 189, in setup
self.sync(0)
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 111, in sync
fetched = self.fetch(n)
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 123, in fetch
t = self.tokenSource.nextToken()
File "/home/xyz/lib/python3.5/site-packages/antlr4/Lexer.py", line 111, in nextToken
tokenStartMarker = self._input.mark()
AttributeError: 'str' object has no attribute 'mark'
This parses correctly:
javac *.java
grun JSON json -gui bookmarks-2017-05-24.json
So the grammar itself is not the problem.
So finally the question: How should I process the input file in python, so that lexer and parser can digest it?
Thanks in advance.
Make sure your input file is actually encoded as UTF-8. Many problems with character recognition by the lexer are caused by using other encodings. I just took a testbed application, added ëto the list of available characters for an IDENTIFIER and it works again. UTF-8 is the key -- and make sure your grammar also allows these characters where you want to accept them.
I solved it by passing the encoding info:
input = FileStream(sys.argv[1], encoding = 'utf8')
If without the encoding info, I will have the same issue as yours.
Traceback (most recent call last):
File "test.py", line 20, in <module>
main()
File "test.py", line 9, in main
input = FileStream(sys.argv[1])
File ".../lib/python3.5/site-packages/antlr4/FileStream.py", line 20, in __init__
super().__init__(self.readDataFrom(fileName, encoding, errors))
File ".../lib/python3.5/site-packages/antlr4/FileStream.py", line 27, in readDataFrom
return codecs.decode(bytes, encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 1: ordinal not in range(128)
Where my input data is
[今明]天(台南|高雄)的?天氣如何
I'm trying to find a way to disable this error logging in my python code. The program seems to actually run fine, the search function just returns a whole buttload of json objects with dozens of attributes each whenever it finds a character that it cant print it will print the thousands of json objects returned
to the console.
I wrapped the guilty code (below) in a try block but it hasn't changed anything.
try:
results = api.search(query)
print('Station hits: ', len(results['station_hits']), '\nSong hits: ', len(results['song_hits']), '\nArtist hits: ', len(results['artist_hits']), '\nAlbum hits: ', len(results['album_hits'])).encode('ascii', 'ignore')
except UnicodeEncodeError:
pass
Here is the error that is printed to the console. (Without the buttload of text referenced earlier)
--- Logging error ---
Traceback (most recent call last):
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\logging\__init__.py", line 989, in emit
stream.write(msg)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2117' in position 108194: character maps to <undefined>
Call stack:
File "gpm.py", line 247, in <module>
main()
File "gpm.py", line 181, in main
results = api.search(query)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\site-packages\gmusicapi\clients\mobileclient.py", line 1806, in search
res = self._make_call(mobileclient.Search, query, max_results)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\site-packages\gmusicapi\clients\shared.py", line 84, in _make_call
return protocol.perform(self.session, self.validate, *args, **kwargs)
File "C:\Users\670854\AppData\Local\Programs\Python\Python36-32\lib\site-packages\gmusicapi\protocol\shared.py", line 243, in perform
log.debug(cls.filter_response(parsed_response))
Trace-back etc reveals: Can't encode u"\u2117" using cp1252, not surprising, use utf8 instead.
This is one of my own projects. This will later help benefit other people in a game I am playing (AssaultCube). Its purpose is to break down the log file and make it easier for users to read.
I kept getting this issue. Anyone know how to fix this? Currently, I am not planning to write/create the file. I just want this error to be fixed.
The line that triggered the error is a blank line (it stopped on line 66346).
This is what the relevant part of my script looks like:
log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r')
for line in log:
and the exception is:
Traceback (most recent call last):
File "C:\Users\Owner\Desktop\Exodus Logs\Log File Translater.py", line 159, in <module>
main()
File "C:\Users\Owner\Desktop\Exodus Logs\Log File Translater.py", line 7, in main
for line in log:
File "C:\Python32\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3074: character maps to <undefined>
Try:
enc = 'utf-8'
log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r', encoding=enc)
if it won't work try:
enc = 'utf-16'
log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r', encoding=enc)
you could also try it with
enc = 'iso-8859-15'
also try:
enc = 'cp437'
wich is very old but it also has the "ü" at 0x81 wich would fit to the string "üßer" wich I found on the homepage of assault cube.
If all the codings are wrong try to contact some of the guys developing assault cube or as mentioned in a comment: have a look at https://pypi.python.org/pypi/chardet