Python: How do I print a Unicode string from a .txt file?

I'm using Python 3.2.3 and IDLE to program a text game.
I'm using a .txt file to store the map layouts, which the program will later open and draw in the terminal (IDLE for the moment).
This is what the .txt file contains:
╔════Π═╗
Π ║
║w bb c□
║w bb c║
╚═□══□═╝
Π: door; □: window; b: bed; c: computer; w: wardrobe
As I'm new to programming, I'm having a hard time doing this.
Here is the code I have so far:
doc = codecs.open("D:\Escritório\Codes\maps.txt")
map = doc.read().decode('utf8')
whereIsmap = map.find('bedroom')
if buldIntel == 1 and localIntel == 1:
whereIsmap = text.find('map1:')
itsGlobal = 1
if espLocation == "localIntel" == 1:
whereIsmap = text.find('map0:')
if buldIntel == 0 and localIntel == 0:
doc.close()
for line in whereIsmap:
(map) = line
mapa.append(str(map))
doc.close()
if itsGlobal == 1:
print(mapa[0])
print(mapa[1])
print(mapa[2])
print(mapa[3])
print(mapa[4])
print(mapa[5])
print(mapa[6])
print(mapa[7])
if itsLocal == 1 and itsGlobal == 0:
print(mapa[0])
print(mapa[1])
print(mapa[2])
print(mapa[3])
print(mapa[4])
There are two maps, and each one has a title. The smaller one is map1 (the one shown above).
Python gives this error message when I try to run the program:
Traceback (most recent call last):
File "C:\Python32\projetoo", line 154, in <module>
gamePlay(ask1, type, selfIntel1, localIntel, buildIntel, whereAmI, HP, time, itsLocal, itsBuild)
File "C:\Python32\projetoo", line 72, in gamePlay
map = doc.read().decode('utf8')
File "C:\Python32\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
What do I do to print the maps to the IDLE terminal exactly as shown above?

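There are two problems here. First, codecs.open called without an encoding opens the file in binary mode, so doc.read() returns bytes and you are decoding them yourself. Second, the decode fails on byte 0xff at position 0, which means the file is not UTF-8. A file that starts with 0xff 0xfe is most likely UTF-16 with a byte order mark, which is what Windows Notepad writes when you save as "Unicode".
To fix this, either re-save the file as UTF-8 or pass the file's actual encoding in your call to codecs.open: codecs.open("...", encoding="utf-16"). With an encoding given, read() returns a str, so you won't need the call to .decode('utf8') later.
Also, since you're using Python 3, you can just use open:
map_text = open("...", encoding="utf-16").read()
(or encoding="utf-8" once the file has been re-saved as UTF-8).
Finally, do not re-encode the string when you print it. In Python 3, printing s.encode("utf-8") shows the bytes repr (b'...') instead of the characters. Print the str directly:
print("\n".join(mapa[0:5]))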
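As a rough illustration of reading such a file, here is a minimal sketch that assumes the map file was saved as UTF-16 with a byte order mark, which is what a leading 0xff byte suggests. The helper name read_map and the temporary demo file are hypothetical and stand in for the real maps.txt.

```python
import os
import tempfile


def read_map(path, encoding="utf-16"):
    """Return the lines of a text file decoded with the given encoding."""
    with open(path, encoding=encoding) as handle:
        return handle.read().splitlines()


# Demo only: write a small map as UTF-16 (Python adds the BOM), then read it back.
sample = "╔════Π═╗\n║w bb c□\n╚═□══□═╝\n"
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(demo_path, "w", encoding="utf-16") as handle:
    handle.write(sample)

# Peek at the raw bytes: the first two are the byte order mark.
with open(demo_path, "rb") as handle:
    first_bytes = handle.read(2)

map_lines = read_map(demo_path)
os.remove(demo_path)

# The lines are ordinary str objects, so they can be printed directly.
print("\n".join(map_lines))
```

If the real file turns out to be UTF-8 after all, passing encoding="utf-8" to the same helper should be enough.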

Related

'UCS-2' codec can't encode characters in position 61-61

When I run my Python code and print(item), I get the following errors:
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 61-61: Non-BMP character not supported in Tk
Here is my code:
def getUserFollowers(self, usernameId, maxid = ''):
    if maxid == '':
        return self.SendRequest('friendships/'+ str(usernameId) +'/followers/?rank_token='+ self.rank_token,l=2)
    else:
        return self.SendRequest('friendships/'+ str(usernameId) +'/followers/?rank_token='+ self.rank_token + '&max_id='+ str(maxid))

def getTotalFollowers(self,usernameId):
    followers = []
    next_max_id = ''
    while 1:
        self.getUserFollowers(usernameId,next_max_id)
        temp = self.LastJson
        for item in temp["users"]:
            print(item)
            followers.append(item)
        if temp["big_list"] == False:
            return followers
        next_max_id = temp["next_max_id"]
How can I fix this?
It's hard to guess without knowing the content of temp["users"], but the error indicates that it contains non-BMP Unicode characters, such as emoji.
If you try to display that in IDLE, you immediately get that kind of error. Simple example to reproduce (on IDLE for Python 3.5):
>>> t = "ab \U0001F600 cd"
>>> print(t)
Traceback (most recent call last):
File "<pyshell#5>", line 1, in <module>
print(t)
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 3-3: Non-BMP character not supported in Tk
(\U0001F600 represents the unicode character U+1F600 grinning face)
The error is indeed caused by Tk not supporting Unicode characters with code points above U+FFFF. A simple workaround is to filter them out of your string:
def BMP(s):
    return "".join(i if ord(i) < 0x10000 else '\ufffd' for i in s)
'\ufffd' is the Python representation for the unicode U+FFFD REPLACEMENT CHARACTER.
My example becomes:
>>> t = "ab \U0001F600 cd"
>>> print(BMP(t))
ab � cd
So your code would become:
for item in temp["users"]:
    print(BMP(str(item)))  # str() in case item is a dict rather than a string
    followers.append(item)
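The filter above can be sketched as a small self-contained function. The name to_bmp and the replacement parameter are illustrative choices, not part of any library. The comparison uses the hexadecimal limit 0xFFFF, so characters inside the Basic Multilingual Plane (Hangul, for instance) pass through unchanged.

```python
def to_bmp(text, replacement="\ufffd"):
    """Replace every character outside the Basic Multilingual Plane."""
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)


sample = "ab \U0001F600 cd"
cleaned = to_bmp(sample)

# Show the result with escapes so it is readable on any terminal.
print(cleaned.encode("unicode_escape").decode("ascii"))
```

Newer IDLE versions may handle such characters themselves, so this kind of filter is mainly useful on older Tk-based setups.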

Get rid of unicode error

I have the following code attempting to print the edge lists of graphs. It looks like the edges are cycled but it's my intention to test whether all edges are contained while going through the function for further processing.
def mapper_network(self, _, info):
    info[0] = info[0].encode('utf-8')
    for i in range(len(info[1])):
        info[1][i] = str(info[1][i])
    l_lst = len(info[1])
    packed = [(info[0], l) for l in info[1]] #each pair of nodes (edge)
    weight = [1 /float(l_lst)] #each edge weight
    G = nx.Graph()
    for i in range(len(packed)):
        edge_from = packed[i][0]
        edge_to = packed[i][1]
        #edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')
        edge_to = edge_to.encode("utf-8")
        weight = weight
        G.add_edge(edge_from, edge_to, weight=weight)
    #print G.size() #yes, this works :)
    G_edgelist = []
    G_edgelist = G_edgelist.append(nx.generate_edgelist(G).next())
    print G_edgelist
With this code, I obtain the error
Traceback (most recent call last):
File "MRQ7_trevor_2.py", line 160, in <module>
MRMostUsedWord2.run()
File "/tmp/MRQ7_trevor_2.vagrant.20160814.201259.655269/job_local_dir/1/mapper/27/mrjob.tar.gz/mrjob/job.py", line 433, in run
mr_job.execute()
File "/tmp/MRQ7_trevor_2.vagrant.20160814.201259.655269/job_local_dir/1/mapper/27/mrjob.tar.gz/mrjob/job.py", line 442, in execute
self.run_mapper(self.options.step_num)
File "/tmp/MRQ7_trevor_2.vagrant.20160814.201259.655269/job_local_dir/1/mapper/27/mrjob.tar.gz/mrjob/job.py", line 507, in run_mapper
for out_key, out_value in mapper(key, value) or ():
File "MRQ7_trevor_2.py", line 91, in mapper_network
G_edgelist = G_edgelist.append(nx.generate_edgelist(G).next())
File "/home/vagrant/anaconda/lib/python2.7/site-packages/networkx/readwrite/edgelist.py", line 114, in generate_edgelist
yield delimiter.join(map(make_str,e))
File "/home/vagrant/anaconda/lib/python2.7/site-packages/networkx/utils/misc.py", line 82, in make_str
return unicode(str(x), 'unicode-escape')
UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 0: \ at end of string
With the modification below
edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')
I obtained
edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')
TypeError: must be unicode, not str
How to get rid of the error of unicode? It seems very troublesome and I highly appreciate your assistance. Thank you!!
I highly recommend reading this article on unicode. It gives a nice explanation of unicode vs. strings in Python 2.
For your problem specifically, when you call unicodedata.normalize("NFKD", edge_to), edge_to must be a unicode string. However, it is not unicode since you set it in this line: info[1][i] = str(info[1][i]). Here's a quick test:
import unicodedata
edge_to = u'edge' # this is unicode
edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')
print edge_to # prints 'edge' as expected
edge_to = 'edge' # this is not unicode
edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')
print edge_to # TypeError: must be unicode, not str
You can get rid of the problem by converting edge_to to unicode before normalizing it, for example with edge_to.decode('utf-8') if it holds UTF-8 encoded bytes.
As an aside, it seems like the encoding/decoding of the whole code chunk is a little confusing. Think out exactly where you want strings to be unicode vs. bytes. You may not need to be doing so much encoding/decoding/normalization.
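One way to keep this straight is to decode bytes once on the way in, work with text everywhere in between, and encode only on the way out. The sketch below illustrates that idea in Python 3 syntax, whereas the question uses Python 2, where the equivalent types are str (bytes) and unicode (text). The helper name normalize_label is hypothetical.

```python
import unicodedata


def normalize_label(label, encoding="utf-8"):
    """Return an NFKD-normalized text string for a bytes or text label."""
    if isinstance(label, bytes):
        # Decode at the boundary so normalize() always receives text.
        label = label.decode(encoding)
    return unicodedata.normalize("NFKD", label)


from_bytes = normalize_label(b"edge")
decomposed = normalize_label("caf\u00e9")

print(from_bytes)
# NFKD splits the accented letter into a base letter plus a combining accent.
print(len(decomposed))
```

With a helper like this, the mapper would not need separate encode and str calls on each node label.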

How can I print Korean characters from an excel sheet using python?

I have an excel sheet like below, I need to read the values from the cell and compare with another file.
0 1 2 3 4 5
1100 ᄀ ᄁ ᄂ ᄃ ᄄ ᄅ
1120 ᄠ ᄡ ᄢ ᄣ ᄤ ᄥ
1140 ᅀ ᅁ ᅂ ᅃ ᅄ ᅅ
After some googling, I wrote the code below to open the Excel sheet and read its contents in Python:
import xlrd
file_location = "C:/Users/yepMe/Desktop/try/language.xlsx"
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(3)
data = [[sheet.cell_value(r,c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
print type(data)
print data[1][0]
This doesn't throw any error and prints 0.0, but when I try to print a cell that contains Korean characters, such as print data[2][2], it gives the error below:
print data[2][2]
File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u1121' in position 0: character maps to <undefined>,
Kindly tell me what needs to be done to get the Korean characters.
If I do print ord(data[2][2]), it prints the Unicode code point, but when I try to loop through the data:
for char in data:
decVal = ord(char)
Traceback (most recent call last):
File "C:\Users\amondapr\Desktop\korean\fonts\ttfq_protima.py", line 78, in <module>
decVal = ord(char)
TypeError: ord() expected string of length 1, but list found
It gives the above error. How do I fetch the values, and what am I doing wrong?

How do I get human-readable output from os.urandom(64)?

My code is:
print os.urandom(64)
which outputs:
> "D:\Python25\pythonw.exe" "D:\zjm_code\a.py"
\xd0\xc8=<\xdbD'
\xdf\xf0\xb3>\xfc\xf2\x99\x93
=S\xb2\xcd'\xdbD\x8d\xd0\\xbc{&YkD[\xdd\x8b\xbd\x82\x9e\xad\xd5\x90\x90\xdcD9\xbf9.\xeb\x9b>\xef#n\x84
which isn't readable, so I tried this:
print os.urandom(64).decode("utf-8")
but then I get:
> "D:\Python25\pythonw.exe" "D:\zjm_code\a.py"
Traceback (most recent call last):
File "D:\zjm_code\a.py", line 17, in <module>
print os.urandom(64).decode("utf-8")
File "D:\Python25\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-3: invalid data
What should I do to get human-readable output?
No shortage of choices. Here's a couple:
>>> os.urandom(64).encode('hex')
'0bf760072ea10140d57261d2cd16bf7af1747e964c2e117700bd84b7acee331ee39fae5cff6f3f3fc3ee3f9501c9fa38ecda4385d40f10faeb75eb3a8f557909'
>>> os.urandom(64).encode('base64')
'ZuYDN1BiB0ln73+9P8eoQ3qn3Q74QzCXSViu8lqueKAOUYchMXYgmz6WDmgJm1DyTX598zE2lClX\n4iEXXYZfRA==\n'
os.urandom is giving you a 64-byte string. Encoding it in hex is probably the best way to make it "human readable" to some extent. E.g.:
>>> s = os.urandom(64)
>>> s.encode('hex')
'4c28351a834d80674df3b6eb5f59a2fd0df2ed2a708d14548e4a88c7139e91ef4445a8b88db28ceb3727851c02ce1822b3c7b55a977fa4f4c4f2a0e278ca569e'
Of course this gives you 128 characters in the result, which may be too long a line to read comfortably; it's easy to split it up, though -- e.g.:
>>> print s[:32].encode('hex')
4c28351a834d80674df3b6eb5f59a2fd0df2ed2a708d14548e4a88c7139e91ef
>>> print s[32:].encode('hex')
4445a8b88db28ceb3727851c02ce1822b3c7b55a977fa4f4c4f2a0e278ca569e
two chunks of 64 characters each shown on separate lines may be easier on the eye.
Random bytes are unlikely to form valid UTF-8 text, so I'm not surprised that you get decoding errors. Instead, you need to convert them somehow. If all you're trying to do is see what they are, then something like:
print [ord(o) for o in os.urandom(64)]
Or, if you'd prefer to have it as hex 0-9a-f (zero-padded so that every byte takes exactly two digits):
print ''.join(['%02x' % ord(o) for o in os.urandom(64)])
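The answers above use Python 2, where a byte string has .encode('hex'). That codec shortcut does not exist on Python 3 bytes objects, so here is a minimal Python 3 sketch of the same two ideas using bytes.hex() and the base64 module. The variable names are illustrative only.

```python
import base64
import os

raw = os.urandom(64)

# Two hex digits per byte, so 64 bytes become 128 characters.
hex_text = raw.hex()

# Base64 is more compact: 64 bytes become 88 characters.
b64_text = base64.b64encode(raw).decode("ascii")

# Split the hex string into two lines of 64 characters for readability.
print(hex_text[:64])
print(hex_text[64:])
print(b64_text)
```

Both forms are reversible, with bytes.fromhex and base64.b64decode respectively, which is handy if the random value has to be stored as text and read back later.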

struct.error: unpack requires a bytes object of length 16 in Python3.5

I have the following code and am trying to run it. But there is an error. Here is the code and the file that I am using.
import sys
import re
import subprocess as commands
import struct
class rbin:
    def file_op(self,rfile):
        self.readfile = rfile
        self.wfile = open('loadmem.txt', 'w')
        for line in self.readfile.readlines():
            for cnt in range (0,4096,1):
                x = cnt*16
                test = line[x:x+16]
                if (len(test) == 14):
                    magic = struct.unpack("<14b",test)
                    for i in range(0,14,1):
                        self.wfile.write("0x%X\n" % (magic[i]))
                else:
                    magic = struct.unpack("<16b",test)
                    for i in range(0,16,1):
                        if ((x <= 498) | ((x <= 65520) & (x >= 65280))):
                            self.wfile.write("0x%X\n" % (magic[i]))
        self.readfile.close()
        self.wfile.close()
# Call Class
T = rbin()
# Call function from class
T.file_op(open('1.ex5','rb'))
The error is:
Traceback (most recent call last):
File "check.py", line 30, in <module>
T.file_op(open('1.ex5','rb'))
File "check.py", line 19, in file_op
magic = struct.unpack("<3b",test)
struct.error: unpack requires a bytes object of length 3
The file is: 1.ex5
Please let me know how to eliminate the error and what I missed.
readlines() splits the data at every \n (0x0a) byte, so each "line" it returns runs from the current position in the file to the next 0x0a. Since this is really a binary file, there is no notion of a linefeed; a 0x0a byte is just data.
If you open your file in a hex editor, you will see things like:
00000d70: 4451 baf1 dab3 7f69 ba67 a75b 1dee e6c2 DQ.....i.g.[....
00000d80: c816 a6cf be27 ace2 e6bb efef 0578 9a50 .....'.......x.P
00000d90: 0a86 28b8 1cae e9b4 e5ff ac5c e664 170a ..(........\.d..
          ^                                    ^
          fake linefeed                        another fake linefeed
A 0x0a byte can fall anywhere, so a "line" is usually not a multiple of 16 bytes long and its last slice has fewer than 16 bytes, which is why your attempt to struct.unpack 16 bytes fails.
Moral: do not run readlines() on binary files. It will "work" (i.e. not throw an exception), but the data is not really lines of text.
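A binary file is usually easier to handle by reading fixed-size chunks with read() instead. The following is a minimal sketch of that approach. The helper names iter_chunks and unpack_signed_bytes are hypothetical, and an in-memory io.BytesIO stream stands in for the real 1.ex5 file.

```python
import io
import struct


def iter_chunks(stream, size=16):
    """Yield successive blocks of up to `size` bytes from a binary stream."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            break
        yield chunk


def unpack_signed_bytes(chunk):
    """Unpack a chunk of any length into a tuple of signed 8-bit integers."""
    return struct.unpack("<%db" % len(chunk), chunk)


# Demo data: 20 bytes that include 0x0a, which is treated like any other byte.
data = bytes([0x0A, 0x86, 0x28, 0xFF]) + bytes(range(16))
stream = io.BytesIO(data)

values = [unpack_signed_bytes(chunk) for chunk in iter_chunks(stream)]
for row in values:
    print(row)
```

Because the format string is built from the real chunk length, a short final chunk unpacks without a struct.error. For a real file, open(path, 'rb') can replace the BytesIO object.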
