Python json load with global language support

Hi, I tried to use non-English text in my script,
but the output shows escaped code points instead of the characters.
Here is my code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json
string ='{\"NAME\":\"ทะเลทอง แลปกุ้ง\",\"DESC\":\"Shop Descriptionอาหารกุ้ง วิตามิน แร่ธาตุ\",\"ADDRESS_LINE_1\":\"29/4หมู่13 บางแก้วซอย1 ต.บางขวัญอ.เมือง\"}'
print json.loads(string)
It returns the following escaped output:
{u'ADDRESS_LINE_1': u'29/4\u0e2b\u0e21\u0e39\u0e4813 \u0e1a\u0e32\u0e07\u0e41\u0e01\u0e49\u0e27\u0e0b\u0e2d\u0e221 \u0e15.\u0e1a\u0e32\u0e07\u0e02\u0e27\u0e31\u0e0d\u0e2d.\u0e40\u0e21\u0e37\u0e2d\u0e07', u'NAME': u'\u0e17\u0e30\u0e40\u0e25\u0e17\u0e2d\u0e07 \u0e41\u0e25\u0e1b\u0e01\u0e38\u0e49\u0e07', u'DESC': u'Shop Description\u0e2d\u0e32\u0e2b\u0e32\u0e23\u0e01\u0e38\u0e49\u0e07 \u0e27\u0e34\u0e15\u0e32\u0e21\u0e34\u0e19 \u0e41\u0e23\u0e48\u0e18\u0e32\u0e15\u0e38'}
This script should support all kinds of languages, such as Thai, Tamil, Chinese, etc.
Expected output:
data = json.loads(string)
print data['NAME']
This should print 'ทะเลทอง แลปกุ้ง'.

Your script works perfectly (as expected), provided you run it in a Unicode-capable terminal.
I use IDLE for Python 2.7.12 for win32 on a Windows 7 box and this code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json
string ='{\"NAME\":\"ทะเลทอง แลปกุ้ง\",\"DESC\":\"Shop Descriptionอาหารกุ้ง วิตามิน แร่ธาตุ\",\"ADDRESS_LINE_1\":\"29/4หมู่13 บางแก้วซอย1 ต.บางขวัญอ.เมือง\"}'
data = json.loads(string)
print data
print data['NAME']
correctly displays:
{u'ADDRESS_LINE_1': u'29/4\u0e2b\u0e21\u0e39\u0e4813 \u0e1a\u0e32\u0e07\u0e41\u0e01\u0e49\u0e27\u0e0b\u0e2d\u0e221 \u0e15.\u0e1a\u0e32\u0e07\u0e02\u0e27\u0e31\u0e0d\u0e2d.\u0e40\u0e21\u0e37\u0e2d\u0e07', u'NAME': u'\u0e17\u0e30\u0e40\u0e25\u0e17\u0e2d\u0e07 \u0e41\u0e25\u0e1b\u0e01\u0e38\u0e49\u0e07', u'DESC': u'Shop Description\u0e2d\u0e32\u0e2b\u0e32\u0e23\u0e01\u0e38\u0e49\u0e07 \u0e27\u0e34\u0e15\u0e32\u0e21\u0e34\u0e19 \u0e41\u0e23\u0e48\u0e18\u0e32\u0e15\u0e38'}
ทะเลทอง แลปกุ้ง
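Put differently, this is not a Python problem, only a terminal configuration issue. The \uXXXX escapes in the first line appear because printing a dict in Python 2 shows the repr of each string; printing a single string, as in the second line, shows the actual characters.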
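To display the whole structure with readable characters rather than escapes, one option is to serialise it back with json.dumps and ensure_ascii=False. The following is a minimal sketch in Python 3 syntax; the shortened input string and the variable names are only illustrative.

```python
import json

# Shortened version of the JSON text from the question.
raw = '{"NAME": "ทะเลทอง แลปกุ้ง"}'
data = json.loads(raw)

# A single value is an ordinary text string.
name = data["NAME"]

# By default json.dumps escapes every non-ASCII character as \uXXXX ...
escaped = json.dumps(data)

# ... while ensure_ascii=False keeps the original characters.
readable = json.dumps(data, ensure_ascii=False)

# Both forms describe exactly the same data.
same_data = json.loads(escaped) == json.loads(readable) == data

print(escaped)
print(readable)
```

Either form can be stored or sent over the network; the escaped one is just an ASCII-safe spelling of the same text.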

import json
string ='{\"NAME\":\"ทะเลทอง แลปกุ้ง\",\"DESC\":\"Shop Descriptionอาหารกุ้ง วิตามิน แร่ธาตุ\",\"ADDRESS_LINE_1\":\"29/4หมู่13 บางแก้วซอย1 ต.บางขวัญอ.เมือง\"}'
print(json.loads(string))
Output:
{'DESC': 'Shop Descriptionอาหารกุ้ง วิตามิน แร่ธาตุ', 'ADDRESS_LINE_1': '29/4หมู่13 บางแก้วซอย1 ต.บางขวัญอ.เมือง', 'NAME': 'ทะเลทอง แลปกุ้ง'}
Just use Python 3.

Related

(MATE) pluma "PLUMA_SELECTED_TEXT" is missing from environment

I'm writing a pluma plugin (in python) to automate HTML markup of a selected text.
According to (the poor and scarce) documentation, the selected text in the editor should be found in os.environ["PLUMA_SELECTED_TEXT"].
However, when I select some text, run my plugin and examine the environment there is no variable such as "PLUMA_SELECTED_TEXT".
I do find 'PLUMA_CURRENT_LINE' but it contains only the last line of the selected text.
Here is the plugin itself (with debugging stuff...)
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import os
import re
print(os.environ)
try:
    ptext = os.environ["PLUMA_SELECTED_TEXT"]
except KeyError:
    ptext = "SELECTION NOT FOUND"
print(ptext)
#ptext = re.sub('\n','<br/>\n',ptext)
#ptext = "<p>\n%s\n</p>\n"%ptext
#print(ptext)
Has anyone run into this?

I found the solution, for the benefit of whoever runs into this.
The selected text is actually sent to the script on STDIN, so it needs to be read from there.
So the code looks like this:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import re
import sys
try:
    ptext = sys.stdin.read()
except Exception:
    ptext = "SELECTION NOT FOUND"
ptext = re.sub('\n','<br/>\n',ptext)
ptext = "<p>\n%s\n</p>\n"%ptext
print(ptext)
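The same filter can be sketched as a function that takes its input and output streams as arguments, which makes it possible to try it without running pluma. The function name markup_selection and the empty-input check are assumptions for illustration, not part of the plugin API; in the real plugin the streams would be sys.stdin and sys.stdout.

```python
import io
import re


def markup_selection(source, sink):
    """Read the selection from source and write HTML paragraph markup to sink."""
    text = source.read()
    if not text:
        # Assumption: an empty read means nothing was selected.
        sink.write("SELECTION NOT FOUND\n")
        return
    # Turn every line break into an explicit <br/> followed by a newline.
    text = re.sub("\n", "<br/>\n", text)
    sink.write("<p>\n%s\n</p>\n" % text)


# Simulate what pluma would send on STDIN.
fake_stdin = io.StringIO("first line\nsecond line")
fake_stdout = io.StringIO()
markup_selection(fake_stdin, fake_stdout)
result = fake_stdout.getvalue()
print(result)
```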

Read JSON data from UTF-8 encoded byte string

I have a script that sends a UTF-8 encoded JSON byte string to a socket (a GitHub project: https://github.com/alios/raildriver). Now I'm writing the Python script that needs to read the incoming data. Right now I can receive the data and print it to the terminal, using the following script: https://www.binarytides.com/code-telnet-client-sockets-python/
Output:
data = '{"Current": 117.42609405517578, "Accelerometer": -5.394751071929932, "SpeedometerKPH": 67.12493133544922, "Ammeter": 117.3575210571289, "Amp": 117.35590362548828, "Acceleration": -0.03285316377878189, "TractiveEffort": -5.394751071929932, "Effort": 48.72163772583008, "RawTargetDistance": 3993.927734375, "TargetDistanceBar": 0.9777777791023254, "TargetDistanceDigits100": -1.0, "TargetDistanceDigits1000": -1.0}'
The problem is that I can't work out how to read the JSON object, for example to read "Ammeter" and store its value, 117.3575210571289, in a new variable.
All the data is received in the variable data.
The code I have right now:
decodedjson = data.decode('utf-8')
dumpedjson = json.dumps(decodedjson)
loadedjson = json.loads(dumpedjson)
Can you please help me?

You are encoding to JSON and then decoding again. Simply don't encode; remove the second line:
decodedjson = data.decode('utf-8')
loadedjson = json.loads(decodedjson)
If you are using Python 3.6 or newer, you don't actually have to decode from UTF-8, as the json.loads() function knows how to deal with UTF-encoded JSON data directly. The same applies to Python 2:
loadedjson = json.loads(data)
Demo using Python 3.7:
>>> import json
>>> data = b'{"Current": 117.42609405517578, "Accelerometer": -5.394751071929932, "SpeedometerKPH": 67.12493133544922, "Ammeter": 117.3575210571289, "Amp": 117.35590362548828, "Acceleration": -0.03285316377878189, "TractiveEffort": -5.394751071929932, "Effort": 48.72163772583008, "RawTargetDistance": 3993.927734375, "TargetDistanceBar": 0.9777777791023254, "TargetDistanceDigits100": -1.0, "TargetDistanceDigits1000": -1.0}'
>>> loadedjson = json.loads(data)
>>> loadedjson['Ammeter']
117.3575210571289
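To see why the extra json.dumps step gets in the way, it may help to compare the two paths side by side. This is a small sketch in Python 3 with a shortened payload; the variable names are illustrative only.

```python
import json

# Shortened version of the bytes received from the socket.
data = b'{"Ammeter": 117.3575210571289, "SpeedometerKPH": 67.12493133544922}'
text = data.decode("utf-8")

# dumps() wraps the text in a JSON string literal, and loads() unwraps it
# again, so the round trip only hands back the original text.
round_trip = json.loads(json.dumps(text))

# Parsing the text directly yields a dict that can be indexed by key.
parsed = json.loads(text)
ammeter = parsed["Ammeter"]

print(type(round_trip).__name__, type(parsed).__name__)
print(ammeter)
```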

Coding: utf-8 doesn't seem to work

UTF-8 doesn't work on my computer. The exact same Python code works on another computer, but not on mine.
My program starts like this:
# -*- coding: utf-8 -*- # Behövs i python 2 för åäö
from Tkinter import *
class Kryssruta(Button):
    """ Knapp som kryssas i/ur när man trycker på den """
    def __init__(self, master, nr = 0, rad = 0, kolumn = 0):
        #Konstruktor, notera master
        Button.__init__(self,master)
        self.master = master
        self.rad = rad
        self.kolumn = kolumn
        self.markerad = False
        self.kryssad = False
        self.cirklad = False
        self["command"] = self.kryssa
    def kryssa(self):
        if self.markerad==False:
            self.master.klickat(self)
On one computer it works like a charm, but on my own computer I get this message:
SyntaxError: Non-ASCII character '\xc3' in file 'blah' but no encoding declared;
see http://www.python.org/peps/pep-0263.html for details
I'm on a PC, running it in PowerShell.
Does anyone know what the problem might be?

You have one or more blank lines above the coding: line. From the document listed in the error message:
To define a source code encoding, a magic comment must
be placed into the source files either as first or second
line in the file, such as:
# coding=<encoding name>
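A quick way to check whether a file satisfies this rule is to look for the magic comment in its first two lines only. The sketch below is written in Python 3 and uses the regular expression quoted in PEP 263; the helper name declared_encoding and the sample strings are hypothetical.

```python
import re

# Pattern quoted in PEP 263 for recognising the magic comment.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_text):
    """Return the declared encoding if it is on line 1 or 2, else None."""
    for line in source_text.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


# Declaration on the first line: found.
good = "# -*- coding: utf-8 -*-\nprint('ok')\n"
# Two blank lines push the declaration down to line 3: not found.
bad = "\n\n# -*- coding: utf-8 -*-\nprint('ok')\n"

print(declared_encoding(good))
print(declared_encoding(bad))
```

This only mirrors the line-position rule; the interpreter's own detection has further details that the sketch ignores.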

You declare that the source file is using utf-8 encoding but actually it isn't, it's using the Windows code page default for your system.
Open the file in Notepad and save it out again with Save As, setting UTF-8 in the Encoding dropdown.
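Before re-saving, it can be worth checking whether the file's bytes really are invalid UTF-8. The following Python 3 sketch does that and, only if needed, converts from an assumed legacy code page. The helper name to_utf8 and the default of cp1252 are assumptions; the actual code page depends on the Windows locale.

```python
def to_utf8(raw, assumed_encoding="cp1252"):
    """Return raw unchanged if it is valid UTF-8, else re-encode it as UTF-8."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Assumption: the file was saved in the given legacy code page.
        return raw.decode(assumed_encoding).encode("utf-8")


comment = "# Behövs i python 2 för åäö\n"
legacy_bytes = comment.encode("cp1252")
utf8_bytes = comment.encode("utf-8")

converted = to_utf8(legacy_bytes)
untouched = to_utf8(utf8_bytes)
```

In practice raw would come from open(path, "rb").read(), and the result would be written back in binary mode.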

Encoding error using Python

I wrote code that connects to an IMAP server, parses the message body, and inserts it into a database, but I am having problems with accented characters.
From the email header I got this information:
Content-Type: text/html; charset=ISO-8859-1
But I am not sure whether I can trust this information.
The email was written in Portuguese, so it contains many accented words. For example, I extracted the following phrase from the email source (using my browser):
"...instalação de eletrônicos..."
So, I connected to imap and fetched some emails:
... typ, data = M.fetch(num, '(RFC822)') ...
When I print the content, I get the following word:
print data[0][1]
instala+º+úo de eletr+¦nicos
I tried .decode('utf-8'), but without success:
instalaÃ§Ã£o de eletrÃ´nicos
How can I make it human-readable? My database uses UTF-8.

The header says it is using "ISO-8859-1" charset. So you need to decode the string with that encoding.
Try this:
data[0][1].decode('iso-8859-1')
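As a rough illustration in Python 3, the sketch below decodes ISO-8859-1 bytes into text and re-encodes them as UTF-8 for the database. The variable raw_body stands in for data[0][1] and is built by hand here, which is an assumption about what the server returned. The last step shows how the "Ã§Ã£" pattern typically arises: UTF-8 bytes being read as ISO-8859-1.

```python
# Stand-in for data[0][1]: the phrase as ISO-8859-1 bytes (assumed).
raw_body = "instalação de eletrônicos".encode("iso-8859-1")

# Decode with the charset announced in the Content-Type header.
text = raw_body.decode("iso-8859-1")

# Encode as UTF-8 before handing the value to a UTF-8 database.
for_database = text.encode("utf-8")

# Reading UTF-8 bytes as ISO-8859-1 produces the familiar garbled form.
garbled = for_database.decode("iso-8859-1")

print(text)
print(garbled)
```

If the garbled form shows up after decoding, the bytes were probably UTF-8 already and the declared charset does not match the actual content.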

Specifying the source code encoding worked for me. It is the comment at the top of the example below, and it must appear at the top of your Python file.
#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
value = """...instalação de eletrônicos...""".decode("iso-8859-15")
print value
# prints: ...instalação de eletrônicos...
import unicodedata
value = unicodedata.normalize('NFKD', value).encode('ascii','ignore')
print value
# prints: ...instalacao de eletronicos...
And now you can do str(value) without an exception as well.
See: http://docs.python.org/2/library/unicodedata.html
This seems to keep all accents:
#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import unicodedata
value = """...instalação de eletrônicos...""".decode("iso-8859-15")
value = unicodedata.normalize('NFKC', value).encode('utf-8')
print value
print str(value)
# prints (without exceptions/errors):
# ...instalação de eletrônicos...
# ...instalação de eletrônicos...
EDIT:
Note that with the last version, even though the output looks the same, comparing the two values with == returns False, because one is a unicode object and the other is a UTF-8 encoded byte string. For example:
#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import unicodedata
inValue = """...instalação de eletrônicos...""".decode("iso-8859-15")
normalizedValue = unicodedata.normalize('NFKC', inValue).encode('utf-8')
try:
    print inValue == normalizedValue
except UnicodeWarning:
    pass
# prints: False
EDIT2:
This returns the same:
normalizedValue = unicode("""...instalação de eletrônicos...""".decode("iso-8859-15")).encode('utf-8')
print normalizedValue
print str(normalizedValue )
# prints (without exceptions/errors):
# ...instalação de eletrônicos...
# ...instalação de eletrônicos...
Though I'm not sure this will actually be valid for a utf-8 encoded database. Probably not?

Thanks to Martijn Pieters. We figured out that the email had parts in two different encodings. I had to split these parts and handle each one individually.
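One way to handle each part with its own charset is to let the standard email package split the message. This is a sketch in Python 3, not the code the poster actually used; the helper name decode_text_parts and the fallback charset are assumptions, and the demo message is built locally instead of being fetched over IMAP.

```python
from email import message_from_bytes
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def decode_text_parts(raw_message, fallback_charset="iso-8859-1"):
    """Return the text of every text/* part, decoded with that part's charset."""
    message = message_from_bytes(raw_message)
    texts = []
    for part in message.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and attachments
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        charset = part.get_content_charset() or fallback_charset
        texts.append(payload.decode(charset, errors="replace"))
    return texts


# Demo message with two parts in different encodings.
demo = MIMEMultipart("alternative")
demo.attach(MIMEText("instalação de eletrônicos", "plain", "iso-8859-1"))
demo.attach(MIMEText("<p>instalação de eletrônicos</p>", "html", "utf-8"))

parts = decode_text_parts(demo.as_bytes())
for part_text in parts:
    print(part_text)
```

With imaplib, the bytes in data[0][1] from the RFC822 fetch would be passed as raw_message.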

Error when I retrieve data from dbpedia

I am trying to retrieve data from DBpedia, but I get an error every time I run the code.
The code in Python is:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subject
WHERE { <http://dbpedia.org/resource/Musée_du_Louvre> dcterms:subject ?subject }
""")
# JSON example
print '\n\n*** JSON Example'
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for result in results["results"]["bindings"]:
    print result["subject"]["value"]
I believe that I must use a different character for "é" in "Musée_du_Louvre", but I can't figure out which.
Thanks!

The first problem is that SPARQLWrapper seems to expect its query to be unicode, but you're passing it a UTF-8 encoded byte string; that's why you get a UnicodeDecodeError. Instead you should pass it a unicode object, either by decoding your UTF-8 string:
unicode_obj = some_utf8_string.decode('utf-8')
or by using a unicode literal:
unicode_obj = u'Hello World'
Passing it a unicode object avoids the UnicodeDecodeError but doesn't yield any results. It looks like the DBpedia API expects URLs containing non-ASCII characters to be percent-encoded, so you need to encode the URL beforehand using urllib.quote_plus:
from urllib import quote_plus
encoded_url = quote_plus(url, safe='/:')
With these two changes your code could look like this:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from SPARQLWrapper import SPARQLWrapper, JSON
from urllib import quote_plus
url = 'http://dbpedia.org/resource/Musée_du_Louvre'
encoded_url = quote_plus(url, safe='/:')
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
query = u"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subject
WHERE { <%s> dcterms:subject ?subject }
""" % encoded_url
sparql.setQuery(query)
# JSON example
print '\n\n*** JSON Example'
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for result in results["results"]["bindings"]:
    print result["subject"]["value"]
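In Python 3 the quoting functions live in urllib.parse, and strings are already Unicode, so the percent-encoding step can be sketched as below. This only builds the query text and does not contact the endpoint; the variable names are illustrative.

```python
from urllib.parse import quote

resource = "http://dbpedia.org/resource/Musée_du_Louvre"

# Percent-encode non-ASCII characters as UTF-8, leaving "/" and ":" alone.
encoded_resource = quote(resource, safe="/:")

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subject
WHERE { <%s> dcterms:subject ?subject }
""" % encoded_resource

print(encoded_resource)
```

The resulting query string can then be passed to sparql.setQuery() exactly as in the code above.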
