I am currently using the Google Vision API in Python to detect Chinese characters in an image, but Google returns escaped UTF-8 byte sequences (such as \xe7\x80\x86\xe7\xab\x91) instead of a human-readable string.
How can I convert them to human-readable UTF-8 text?
Thanks for all your answers. It may be easier for everyone if I post my code.
Here is my code. Basically, I try to take the whole JSON response from Google Vision and save it to a JSON file, but it has not worked so far.
try:
    code = requests.post('https://vision.googleapis.com/v1/images:annotate?key='+GOOGLE_API_KEY, data=params,headers=headers)
    resultText = code.text.encode("utf-8")
    outputFileName = image_path.split('.',1)[0]
    outputDataFile = open(outputFileName+".json", "w")
    outputDataFile.write(json.dumps(resultText))
    outputDataFile.close()
except requests.exceptions.ConnectionError:
    print('Request error')
Thank you
t = '\xe7\x80\x86\xe7\xab\x91'
# Python 2: decode the UTF-8 byte string into a unicode object
t = unicode(t, 'utf8')
print(t)
# Output: 瀆竑
More detailed information about Unicode in here.
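The snippet above relies on Python 2's unicode(). As a minimal sketch of the same idea in Python 3, assuming the value is available as a bytes object (the variable names are illustrative only), decoding looks like this:

```python
# The same six bytes as in the answer above, written as a Python 3 bytes literal
raw = b'\xe7\x80\x86\xe7\xab\x91'

# Decode the UTF-8 bytes into a str of two Chinese characters
text = raw.decode('utf-8')

print(text)
print(len(raw), len(text))  # byte count versus character count
```

Each of the two characters takes three bytes in UTF-8, which is why the escaped form looks so much longer than the text itself.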
I finally solved this using the code below. Thanks to all of you.
try:
    code = requests.post('https://vision.googleapis.com/v1/images:annotate?key='+GOOGLE_API_KEY, data=params,headers=headers)
    # Parse the response so the text becomes real Unicode strings
    resultText = json.loads(code.text)
    outputFileName = image_path.split('.',1)[0]
    # ensure_ascii=False writes the characters as-is instead of \uXXXX escapes;
    # the with block closes the file, so no explicit close() is needed
    with open(outputFileName+".json", "w", encoding='utf8') as f:
        json.dump(resultText, f, ensure_ascii=False, indent=4)
except requests.exceptions.ConnectionError:
    print('Request error')
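To see what ensure_ascii=False changes, here is a small self-contained sketch. The sample dictionary is a made-up stand-in, not real Vision API output, and an in-memory buffer replaces the output file:

```python
import io
import json

# Hypothetical sample data; the real API response has a different shape
sample = {"description": "瀆竑"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes
escaped = json.dumps(sample)

# With ensure_ascii=False the characters are written unchanged
readable = json.dumps(sample, ensure_ascii=False)

# Same call as in the solution above, but writing to an in-memory buffer
buffer = io.StringIO()
json.dump(sample, buffer, ensure_ascii=False, indent=4)

print(escaped)
print(readable)
```

Both forms are valid JSON and load back to the same data; the difference is only in how readable the file is when opened in an editor.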
I assume you mean you have a string that literally contains the characters \xe4\xb8\x89 and you want to convert it into the character 三.
It's very strange that there isn't a straightforward way to do this. The best I can come up with is:
s = '\\xe4\\xb8\\x89'
print(bytes.fromhex(s.replace('\\x', '')).decode('utf-8')) # prints 三
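The bytes.fromhex approach assumes the string contains nothing but \x escapes. As an alternative sketch for strings that mix plain ASCII text with escapes, the standard unicode_escape codec can be chained with latin-1. The helper name unescape_utf8 is hypothetical, and the input is assumed to be ASCII-only:

```python
def unescape_utf8(literal: str) -> str:
    """Turn literal \\xNN escapes into the UTF-8 text they encode.

    Assumes the input contains only ASCII characters.
    """
    # unicode_escape turns each \xNN into the code point U+00NN
    as_code_points = literal.encode('ascii').decode('unicode_escape')
    # latin-1 maps U+0000..U+00FF back to the single bytes 0x00..0xFF
    raw = as_code_points.encode('latin-1')
    # Those bytes are the real UTF-8 encoding of the text
    return raw.decode('utf-8')


result = unescape_utf8('\\xe4\\xb8\\x89')
mixed = unescape_utf8('number: \\xe4\\xb8\\x89')
print(result)
print(mixed)
```

The latin-1 step is only there to turn code points below 256 back into raw bytes; it says nothing about the encoding of the original text.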
I am trying to get the contents of an XML file stored in an S3 bucket to show as text in the browser. However, it just displays as numbers (bytes) rather than a legible string.
My code (views.py):
def file_content(request):
    file_buffer = io.BytesIO()
    s3_client = boto3.client("s3", "eu-west-1")
    s3_client.download_fileobj("mybucket", "myfolder/example.xml", file_buffer)
    file_content = file_buffer.getvalue()
    return HttpResponse(file_content)
What I've tried
I've tried changing this line:
file_content = file_buffer.getvalue()
To:
file_content = file_buffer.getvalue().decode("utf-8")
Or:
file_content = str(file_buffer.getvalue())
But it still displays as numbers/bytes in the browser. The file content displays correctly as a string when I use print(), and type(file_content) in the console confirms it is a string, but the browser does not show it that way.
I'm not sure what's going wrong. If somebody could point me in the right direction, it would be much appreciated. Thank you.
Doh, I fixed this by adding a content_type to the HttpResponse so my function now looks like this:
def file_content(request):
    file_buffer = io.BytesIO()
    s3_client = boto3.client("s3", "eu-west-1")
    s3_client.download_fileobj("mybucket", "myfolder/example.xml", file_buffer)
    file_content = file_buffer.getvalue().decode('UTF-8')
    response = HttpResponse(content_type="text/plain")
    response.write(file_content)
    return response
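As a side note on the two conversions tried in the question, the sketch below shows how str() and decode() differ on a bytes value. It uses a small made-up XML payload rather than the real S3 object and involves no Django code:

```python
# Hypothetical XML payload standing in for the downloaded file contents
xml_bytes = b'<?xml version="1.0" encoding="UTF-8"?>\n<note>caf\xc3\xa9</note>'

# str() on bytes gives the printable representation: b'...' with escapes
as_repr = str(xml_bytes)

# decode() interprets the bytes as UTF-8 and gives the actual text
as_text = xml_bytes.decode('utf-8')

print(as_repr)
print(as_text)
```

Only the decoded value is the real document text; the str() result is a description of the bytes object and still contains the b'' wrapper and backslash escapes.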
I am trying to use a Python code step to download an image in Zapier. Here is some sample code (but it doesn't seem to work):
r = requests.get('https://dummyimage.com/600x400/000/fff')
img = r.raw.read()
return {'image_data': img}
The error I get is
Runtime.MarshalError: Unable to marshal response: b'' is not JSON serializable
Does anyone know how I can use requests in a Python code step in Zapier to get an image? (I am trying to get the image and save it to Dropbox.)
THANKS.
It looks like you need a JSON-serializable object rather than a binary one.
One way to turn the image into a string is to base64-encode it and then decode the result as UTF-8:
Make the image serializable:
import base64
import requests
r = requests.get('https://dummyimage.com/600x400/000/fff')
# r.content holds the raw image bytes; b64encode returns bytes, decode gives a str
img_serializable = base64.b64encode(r.content).decode('utf-8')
# check with json.dumps(img_serializable)
Now return {'image_data': img_serializable} should not give errors.
Recover image from string and save to file:
with open("imageToSave.png", "wb") as f:
f.write(base64.decodebytes(img_serializable.encode('utf-8')))
The same using codecs, which is part of the standard Python library:
import codecs
import requests
r = requests.get('https://dummyimage.com/600x400/000/fff')
content = codecs.encode(r.content, encoding='base64')
img_serializable = codecs.decode(content, encoding='utf-8')
type(img_serializable)
# Out:
# str
with open("imageToSave3.png", "wb") as f:
    f.write(codecs.decode(codecs.encode(img_serializable, encoding='utf-8'),
                          encoding='base64'))
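The round trip can be checked without any network access. The sketch below uses a few made-up bytes as a stand-in for the downloaded image and runs them through JSON and back:

```python
import base64
import json

# Stand-in for r.content: a PNG signature followed by arbitrary bytes
image_bytes = b'\x89PNG\r\n\x1a\n' + bytes(range(16))

# bytes -> base64 bytes -> str, which JSON can carry
encoded = base64.b64encode(image_bytes).decode('ascii')
payload = json.dumps({'image_data': encoded})

# On the receiving side: JSON -> str -> original bytes
restored = base64.b64decode(json.loads(payload)['image_data'])

print(type(encoded).__name__, len(image_bytes), len(encoded))
```

Base64 output is pure ASCII, so decoding it with 'ascii' or 'utf-8' gives the same string. The encoded form is roughly a third larger than the raw bytes, which may matter if the receiving step has size limits.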
Following multiple suggestions from other StackOverflow questions and the mutagen documentation, I was able to come up with code to get and set every ID3 tag in both MP3 and MP4 files. The issue I have is with setting the cover art for M4B files.
I have reproduced the code exactly like it is laid out in this answer:
Embedding album cover in MP4 file using Mutagen
But I am still receiving errors when I attempt to run the code. If I run the code with the 'albumart' value by itself I receive the error:
MP4file.tags['covr'] = albumart
Exception has occurred: TypeError
can't concat int to bytes
However, if I surround the albumart variable with brackets, as shown in the aforementioned StackOverflow question, I get this output:
MP4file.tags['covr'] = [albumart]
Exception has occurred: struct.error
required argument is not an integer
Here is the function in its entirety. The MP3 section works without any problems.
from mutagen.mp3 import MP3
from mutagen.mp4 import MP4, MP4Cover
def set_cover(filename, cover):
    r = requests.get(cover)
    with open('C:/temp/cover.jpg', 'wb') as q:
        q.write(r.content)
    if(filename.endswith(".mp3")):
        MP3file = MP3(filename, ID3=ID3)
        if cover.endswith('.jpg') or cover.endswith('.jpeg'):
            mime = 'image/jpg'
        else:
            mime = 'image/png'
        with open('C:/temp/cover.jpg', 'rb') as albumart:
            MP3file.tags.add(APIC(encoding=3, mime=mime, type=3, desc=u'Cover', data=albumart.read()))
        MP3file.save(filename)
    else:
        MP4file = MP4(filename)
        if cover.endswith('.jpg') or cover.endswith('.jpeg'):
            cover_format = 'MP4Cover.FORMAT_JPEG'
        else:
            cover_format = 'MP4Cover.FORMAT_PNG'
        with open('C:/temp/cover.jpg', 'rb') as f:
            albumart = MP4Cover(f.read(), imageformat=cover_format)
        MP4file.tags['covr'] = [albumart]
I have been trying to figure out what I am doing wrong for two days now. If anyone can help me spot the problem I would be in your debt.
Thanks!
In the source code of mutagen at the location where the exception is being raised I've found the following lines:
def __render_cover(self, key, value):
    ...
    for cover in value:
        try:
            imageformat = cover.imageformat
        except AttributeError:
            imageformat = MP4Cover.FORMAT_JPEG
        ...
        Atom.render(b"data", struct.pack(">2I", imageformat, 0) + cover))
    ...
Here key is the name of the cover tag and value is the collection of covers to write. If you assign the MP4Cover object directly, without the surrounding list, the loop above iterates over the object itself. MP4Cover is a subclass of bytes, so each iteration yields one byte of the image as an int, and adding that int to the bytes returned by struct.pack raises "can't concat int to bytes". That is why the value has to be wrapped in a list.
The second error comes from struct.pack(">2I", imageformat, 0), which requires imageformat to be an integer. In the code you've given, cover_format is the string 'MP4Cover.FORMAT_JPEG' (note the quotes) rather than the constant MP4Cover.FORMAT_JPEG, so struct.pack raises "required argument is not an integer".
Converting the MP4Cover object to bytes, as in [bytes(albumart)], also avoids the exception without editing or patching the mutagen source code, because a plain bytes object has no imageformat attribute and the code above falls back to MP4Cover.FORMAT_JPEG. The drawback is that PNG covers are then labelled as JPEG, so the version below passes the real constants and keeps [albumart].
import requests
from mutagen.id3 import ID3, APIC
from mutagen.mp3 import MP3
from mutagen.mp4 import MP4, MP4Cover
def set_cover(filename, cover):
    # Download the cover image to a temporary file
    r = requests.get(cover)
    with open('C:/temp/cover.jpg', 'wb') as q:
        q.write(r.content)
    if(filename.endswith(".mp3")):
        MP3file = MP3(filename, ID3=ID3)
        if cover.endswith('.jpg') or cover.endswith('.jpeg'):
            mime = 'image/jpeg'  # the registered MIME type is image/jpeg
        else:
            mime = 'image/png'
        with open('C:/temp/cover.jpg', 'rb') as albumart:
            MP3file.tags.add(APIC(encoding=3, mime=mime, type=3, desc=u'Cover', data=albumart.read()))
        MP3file.save(filename)
    else:
        MP4file = MP4(filename)
        # Use the constants themselves, not strings containing their names
        if cover.endswith('.jpg') or cover.endswith('.jpeg'):
            cover_format = MP4Cover.FORMAT_JPEG
        else:
            cover_format = MP4Cover.FORMAT_PNG
        with open('C:/temp/cover.jpg', 'rb') as f:
            albumart = MP4Cover(f.read(), imageformat=cover_format)
        # 'covr' expects a list of covers
        MP4file.tags['covr'] = [albumart]
        MP4file.save(filename)
I've also added MP4file.save(filename) as the last line of the code to persist the changes made to the file.
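Both exceptions from the question can be reproduced with the standard library alone. In the sketch below, Cover is a hypothetical stand-in for a bytes subclass such as MP4Cover, and the value 13 for the image format is only an illustrative integer, not taken from the thread:

```python
import struct


class Cover(bytes):
    """Hypothetical stand-in for a bytes subclass that carries a format."""

    def __new__(cls, data, imageformat=13):  # 13 is an illustrative value
        obj = super().__new__(cls, data)
        obj.imageformat = imageformat
        return obj


cover = Cover(b'\xff\xd8\xff\xe0')
header = struct.pack(">2I", cover.imageformat, 0)

# Iterating over a bytes object yields ints, one per byte
first_item = next(iter(cover))

# bytes + int fails: this is the "can't concat int to bytes" case
try:
    header + first_item
    concat_failed = False
except TypeError:
    concat_failed = True

# Packing a string where an integer is required fails with struct.error
try:
    struct.pack(">2I", "MP4Cover.FORMAT_JPEG", 0)
    pack_failed = False
except struct.error:
    pack_failed = True

# bytes + bytes subclass works and gives header followed by image data
atom_payload = header + cover
print(first_item, concat_failed, pack_failed, len(atom_payload))
```

The last line of the sketch shows that a bytes subclass concatenates with bytes without any conversion, so wrapping the cover in a list and passing an integer format is enough.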
I wrote this code to download an SRT subtitle file, but it doesn't work. Please review it and help me find the mistake I'm making. Thanks.
from urllib import request
srt_url = "https://subscene.com/subtitle/download?mac=LkM2jew_9BdbDSxdwrqLkJl7hDpIL_HnD-s4XbfdB9eqPHsbv3iDkjFTSuKH0Ee14R-e2TL8NQukWl82yNuykti8b_36IoaAuUgkWzk0WuQ3OyFyx04g_vHI_rjnb2290"
def download_srt_file(srt_url):
    response = request.urlopen(srt_url)
    srt = response.read()
    srt_str = str(srt)
    lines = srt_str.split('\\n')
    dest_url = r'srtfile.srt'
    fx = open('dest_url' , 'w')
    for line in lines:
        fx.write(line)
    fx.close()
    download_srt_file(srt_url)
A number of things are wrong or can be improved.
Your function has no return statement. That is not an error in itself, but returning the path of the written file makes the function easier to use.
You are calling the function from within the function, so you are not actually calling it at all. You never enter it to begin with.
dest_url is a variable, but open('dest_url', 'w') passes the literal string 'dest_url', so the output goes to a file named dest_url instead of srtfile.srt.
To avoid having to close and flush the file yourself, just use the with statement.
Your split('\\n') splits on a literal backslash followed by n, not on line breaks. It only seemed necessary because of the str() call described in the next point.
Finally, do not convert srt with str(). In Python 3, response.read() returns bytes, and str() on bytes gives the b'...' representation with literal \n sequences in it. Decode the bytes instead, and normal line splitting works.
Below is a modified and hopefully functioning version of your code with the above implemented.
from urllib import request
def download_srt_file(srt_url):
    response = request.urlopen(srt_url)
    # Decode the downloaded bytes; this assumes the subtitle file is UTF-8
    srt = response.read().decode('utf-8')
    lines = srt.splitlines()
    dest_url = 'srtfile.srt'
    with open(dest_url, 'w', encoding='utf-8') as fx:
        for line in lines:
            # splitlines() drops the line endings, so add them back
            fx.write(line + '\n')
    return dest_url
srt_url = "https://subscene.com/subtitle/download?mac=LkM2jew_9BdbDSxdwrqLkJl7hDpIL_HnD-s4XbfdB9eqPHsbv3iDkjFTSuKH0Ee14R-e2TL8NQukWl82yNuykti8b_36IoaAuUgkWzk0WuQ3OyFyx04g_vHI_rjnb2290"
download_srt_file(srt_url)
Tell me if it works for you.
A final remark is that you are not setting the target directory for the file you are writing, so it ends up in the current working directory. Are you sure you want to do that?
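If the goal is only to save the download unchanged, another option is to skip text handling and write the bytes in binary mode. The sketch below assumes that is acceptable; save_bytes is a hypothetical helper, and a small made-up subtitle payload and a temporary directory stand in for the real download and destination:

```python
import os
import tempfile


def save_bytes(data: bytes, dest_path: str) -> str:
    """Write raw bytes to dest_path and return the path."""
    # 'wb' writes the bytes exactly as received, with no decoding step
    with open(dest_path, 'wb') as fx:
        fx.write(data)
    return dest_path


# Made-up stand-in for response.read(); a real download would replace this
sample = "1\r\n00:00:01,000 --> 00:00:02,000\r\nHello\r\n".encode('utf-8')

dest = save_bytes(sample, os.path.join(tempfile.mkdtemp(), 'srtfile.srt'))

# Read the file back to confirm nothing was altered
with open(dest, 'rb') as check:
    written = check.read()
print(dest, len(written))
```

Binary mode sidesteps both the encoding question and the line-ending handling, at the cost of not being able to inspect or edit the lines along the way.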
I am getting an error when trying to open a URL that I obtained by reading data from a .txt file in Python using match.group(). Below is the code where the error comes up. Any help as to how this can be corrected would be very much appreciated.
with open('output.txt') as f:
    for line in f:
        match = re.search("(?P<url>https?://docs.google.com/file[^\s]+)", line)
        if match is not None:
            urltest = match.group()
            print urltest
print "[*] Opening Map in the web browser..."
kml_url = "urltest"
try:
    webbrowser.get().open_new_tab(kml_url)
Since you have not provided what you are trying to parse I can only guess, but this should pretty much work for your URL:
>>> import re
>>> match = re.search(r'(?P<url>https://docs\.google\.com/file[a-zA-Z0-9-]*)', 'https://docs.google.com/fileCharWithnumbers123')
>>> match.group("url")
'https://docs.google.com/fileCharWithnumbers123'
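One more thing worth checking in the question's code is the line kml_url = "urltest", which assigns the literal text urltest rather than the value of the variable. A minimal sketch of passing the matched value along, assuming a pattern like the one in the question (find_urls and the sample lines are hypothetical):

```python
import re

# Pattern modelled on the one in the question, with the dots escaped
URL_PATTERN = re.compile(r'(?P<url>https?://docs\.google\.com/file\S+)')


def find_urls(lines):
    """Return the first matching URL from each line that contains one."""
    urls = []
    for line in lines:
        match = URL_PATTERN.search(line)
        if match is not None:
            # Keep the matched value itself, not a string naming the variable
            urls.append(match.group('url'))
    return urls


# Hypothetical file contents; the real output.txt is not shown in the thread
sample_lines = [
    "map: https://docs.google.com/file/d/EXAMPLE_ID/view\n",
    "no link on this line\n",
]
urls = find_urls(sample_lines)
print(urls)
```

Each entry in urls could then be passed to webbrowser.open_new_tab(url), without quotes around the variable name.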