How to load unicode string into json in Python? - python

I am trying to read a file from a compressed archive and convert its data into JSON (a dictionary), but I have been struggling with a Unicode issue for a while. Can anyone help?
exfile_obj = tar.extractfile(member)
data = exfile_obj.read()
print(type(data)) ## shows str
print(data) ## it is something like: "{u'building': False, u'displayName': u'Tam\\xe1s Kosztol\\xe1nczi', u'changeSet': {u'items': u'comment'}}"
json_obj = json.loads(data) # it is a unicode object.

That data is a string representation of a Python dictionary. You can convert it to a dictionary using ast.literal_eval, and you can convert that dict to a JSON string using json.dumps.
import ast
import json
src = "{u'building': False, u'displayName': u'Tam\\xe1s Kosztol\\xe1nczi', u'changeSet': {u'items': u'comment'}}"
data = ast.literal_eval(src)
print(data)
j = json.dumps(data)
print(j)
Output:
{'building': False, 'displayName': 'Tamás Kosztolánczi', 'changeSet': {'items': 'comment'}}
{"building": false, "displayName": "Tam\u00e1s Kosztol\u00e1nczi", "changeSet": {"items": "comment"}}
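If readable text is preferred over \u escapes, json.dumps accepts ensure_ascii=False. The sketch below wraps the two steps above in one helper. The name repr_to_json is hypothetical, and the code assumes the data may arrive as UTF-8 bytes, which is what extractfile(...).read() returns under Python 3.

```python
import ast
import json

def repr_to_json(raw, encoding="utf-8"):
    """Turn the repr of a Python dict into a JSON string (hypothetical helper)."""
    # Assumption: bytes input is text in the given encoding.
    if isinstance(raw, bytes):
        raw = raw.decode(encoding)
    # literal_eval only accepts Python literals; it does not run arbitrary code.
    obj = ast.literal_eval(raw)
    # ensure_ascii=False keeps non-ASCII characters instead of \uXXXX escapes.
    return json.dumps(obj, ensure_ascii=False)

src = b"{u'building': False, u'displayName': u'Tam\\xe1s Kosztol\\xe1nczi'}"
result = repr_to_json(src)
print(result)
```

When writing the result to a file, open the file with an explicit encoding such as utf-8 so the non-ASCII characters survive.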

Related

How to parse a string of multiple jsons without separators in python?

Given a single-line string of multiple, arbitrarily nested JSON documents without separators, for example:
contents = r'{"payload":{"device":{"serial":213}}}{"payload":{"device":{"serial":123}}}'
How can contents be parsed into a list of dicts? I tried
df = pd.read_json(contents, lines=True)
But only got a ValueError response:
ValueError: Unexpected character found when decoding array value (2)
You can split the string, then parse each JSON string into a dictionary:
import json
contents = r'{"payload":{"device":{"serial":213}}}{"payload":{"device":{"serial":123}}}'
json_strings = contents.replace('}{', '}|{').split('|')
json_dicts = [json.loads(string) for string in json_strings]
Output:
[{'payload': {'device': {'serial': 213}}}, {'payload': {'device': {'serial': 123}}}]
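The split above breaks if a string value happens to contain "}{" or "|". A more robust sketch, assuming only the standard library, uses json.JSONDecoder.raw_decode, which parses one document and reports where it stopped. The helper name parse_concatenated_json is hypothetical.

```python
import json

def parse_concatenated_json(text):
    """Parse back-to-back JSON documents from one string (hypothetical helper)."""
    decoder = json.JSONDecoder()
    results = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so step over it here.
        if text[pos].isspace():
            pos += 1
            continue
        # Returns the parsed object and the index just past that document.
        obj, pos = decoder.raw_decode(text, pos)
        results.append(obj)
    return results

# The first document deliberately contains "}{" inside a string value.
contents = r'{"payload":{"note":"a}{b"}}{"payload":{"device":{"serial":123}}}'
parsed = parse_concatenated_json(contents)
print(parsed)
```

Malformed input still raises json.JSONDecodeError, so callers may want to catch it.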

JSON - string value conversion to List

Code used for extraction from JSON
import json
string = json.loads(data)
string['Body']
import base64
base64.b64decode(string['Body'])
bytes_data = base64.b64decode(string['Body'])
str(bytes_data, encoding='utf-8')
I have the following string, extracted from the JSON:
"[{"id":"XXXX_U2_170216:XXXX_U2_170216:FBE_23015.Air","values":[{"v":"46","q":192,"t":"2021-10-28T13:47:59.7880096Z"}]},
{"id":"XXXX_U2_170216:XXXX_U2_170216:FBE_23015.Atomise","values":[{"v":"3.1","q":192,"t":"2021-10-28T13:47:59.7880096Z"}]}]"
Any idea how to convert it to an actual list?
[{"id":"XXXX_U2_170216:XXXX_U2_170216:FBE_23015.Air","values":[{"v":"46","q":192,"t":"2021-10-28T13:47:59.7880096Z"}]},
{"id":"XXXX_U2_170216:XXXX_U2_170216:FBE_23015.Atomise","values":[{"v":"3.1","q":192,"t":"2021-10-28T13:47:59.7880096Z"}]}]
Things I have tried:
list(bytearray(bytes_data))
A for loop over this output string, but that is a convoluted way to do it.
Some more conversion attempts. I am looking for something compact.
Reverse engineering your question....
Given a JSON file with base64 data
$ cat /tmp/data.json
{
"Body": "W3siaWQiOiJYWFhYX1UyXzE3MDIxNjpYWFhYX1UyXzE3MDIxNjpGQkVfMjMwMTUuQWlyIiwidmFsdWVzIjpbeyJ2IjoiNDYiLCJxIjoxOTIsInQiOiIyMDIxLTEwLTI4VDEzOjQ3OjU5Ljc4ODAwOTZaIn1dfSwKeyJpZCI6IlhYWFhfVTJfMTcwMjE2OlhYWFhfVTJfMTcwMjE2OkZCRV8yMzAxNS5BdG9taXNlIiwidmFsdWVzIjpbeyJ2IjoiMy4xIiwicSI6MTkyLCJ0IjoiMjAyMS0xMC0yOFQxMzo0Nzo1OS43ODgwMDk2WiJ9XX1dCg=="
}
When read and extracted
import json
import base64
with open('/tmp/data.json') as f:
    string = json.load(f)
body = string['Body']
Then decoded... a list is returned
import pprint
l = json.loads(base64.b64decode(body))
pprint.pprint(l)
[{'id': 'XXXX_U2_170216:XXXX_U2_170216:FBE_23015.Air',
'values': [{'v': '46', 'q': 192, 't': '2021-10-28T13:47:59.7880096Z'}]},
{'id': 'XXXX_U2_170216:XXXX_U2_170216:FBE_23015.Atomise',
'values': [{'v': '3.1', 'q': 192, 't': '2021-10-28T13:47:59.7880096Z'}]}]
Use the built-in json module:
import json
data = json.loads(bytes_data)
It seems like you have a json inside a json, so load it twice:
import json
import base64
string = json.loads(data)
bytes_data = base64.b64decode(string['Body'])
output = json.loads(bytes_data)
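The same "load it twice" idea can be packaged as a small function. This is a sketch only: decode_body and the stand-in payload are hypothetical, and it assumes the base64 field holds UTF-8 encoded JSON.

```python
import base64
import json

def decode_body(outer_json, key="Body"):
    """Parse the outer JSON, base64-decode one field, parse the inner JSON."""
    outer = json.loads(outer_json)
    inner_bytes = base64.b64decode(outer[key])
    # Decode explicitly; assumption: the inner payload is UTF-8 text.
    return json.loads(inner_bytes.decode("utf-8"))

# Build a small stand-in payload so the example is self-contained.
inner = [{"id": "A.Air", "values": [{"v": "46", "q": 192}]}]
encoded = base64.b64encode(json.dumps(inner).encode("utf-8")).decode("ascii")
data = json.dumps({"Body": encoded})

items = decode_body(data)
print(type(items).__name__, items[0]["id"])
```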
Use the json.loads method like this. Suppose you have a JSON array and want to convert it to a list:
import json
array = '{"Items": ["IPhone", "Earphone", "Powerbackup"]}'
data = json.loads(array)
print(data['Items'])

convert json to csv and store it in a variable in python

I am using the csv module to convert JSON to CSV and store it in a file or print it to stdout.
import csv
import sys
def write_csv(data: list, header: list, path: str = None):
    # data is JSON-style data: a list of dicts
    output_file = open(path, 'w') if path else sys.stdout
    out = csv.writer(output_file)
    out.writerow(header)
    for row in data:
        out.writerow([row[attr] for attr in header])
    if path:
        output_file.close()
I want to store the converted CSV in a variable instead of sending it to a file or stdout.
Say I want to create a function like this:
def json_to_csv(data: list, header: list):
    # convert json data into csv string
    return string_csv
NOTE: the format of data is simple.
data is a list of dictionaries mapping strings to strings:
[
{
"username":"srbcheema",
"name":"Sarbjit Singh"
},
{
"username":"testing",
"name":"Test, user"
}
]
I want csv output to look like:
username,name
srbcheema,Sarbjit Singh
testing,"Test, user"
Converting JSON to CSV is not a trivial operation. There is also no standardized way to translate between them...
For example
my_json = {
"one": 1,
"two": 2,
"three": {
"nested": "structure"
}
}
Could be represented in a number of ways...
These are all (to my knowledge) valid CSVs that contain all the information from the JSON structure.
data
'{"one": 1, "two": 2, "three": {"nested": "structure"}}'
one,two,three
1,2,'{"nested": "structure"}'
one,two,three__nested
1,2,structure
In essence, you will have to figure out the best translation between the two based on your knowledge of the data. There is no right answer on how to go about this.
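As one illustration, the flattened form (three__nested) can be produced with a short recursive function. This is a sketch of just one of the possible translations; the name flatten and the "__" separator are choices made here.

```python
def flatten(obj, parent_key="", sep="__"):
    """Flatten nested dicts into one level, joining keys with sep."""
    flat = {}
    for key, value in obj.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

my_json = {"one": 1, "two": 2, "three": {"nested": "structure"}}
flat = flatten(my_json)
print(flat)
```

Lists inside the structure are left untouched here; how to spread them across columns is another decision that depends on the data.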
I'm relatively new to Python so there's probably a better way, but this works:
def get_safe_string(string):
    # quote the value only if it contains a comma
    return '"' + string + '"' if "," in string else string
def json_to_csv(data):
    # column order is taken from the first row's keys
    csv_keys = data[0].keys()
    header = ",".join(csv_keys)
    res = [",".join(get_safe_string(row.get(k)) for k in csv_keys) for row in data]
    res.insert(0, header)
    return "\n".join(res)
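A variant that leaves quoting to the csv module is to write into an in-memory io.StringIO buffer and return its contents. This is a sketch under the question's assumption that every row is a flat dict containing all header keys; setting lineterminator to "\n" is a choice made here (the csv default is "\r\n").

```python
import csv
import io

def json_to_csv(data, header):
    """Return the rows of data as one CSV string."""
    buffer = io.StringIO()
    # csv.writer handles quoting of commas, quotes, and newlines in values.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for row in data:
        writer.writerow([row[attr] for attr in header])
    return buffer.getvalue()

data = [
    {"username": "srbcheema", "name": "Sarbjit Singh"},
    {"username": "testing", "name": "Test, user"},
]
string_csv = json_to_csv(data, ["username", "name"])
print(string_csv)
```

Unlike manual string joining, this also escapes values that themselves contain double quotes.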

Convert json into spacy format

I'm new to Python. Please help me pass a JSON value as a parameter instead of loading it from a file. Please check the code below for reference.
import json
filename = input("Enter your train data filename : ")
print(filename)
with open(filename) as train_data:
    train = json.load(train_data)
TRAIN_DATA = []
for data in train:
    ents = [tuple(entity) for entity in data['entities']]
    TRAIN_DATA.append((data['content'],{'entities':ents}))
with open('{}'.format(filename.replace('json','txt')),'w') as write:
    write.write(str(TRAIN_DATA))
In the code above the JSON value is loaded from a file. Instead of a file, I want to pass the JSON value directly and load it.
Ex:
train_data=[{"content":"what is the price of polo?","entities":[[21,25,"PrdName"]]}]
with open(filename) as train_data:
    train = json.load(train_data)
Thanks,
"JSON value" doesn't mean anything. JSON is a text format, not a data type, and what json.loads() does is transform the JSON text into Python objects (dicts, lists, etc.) according to the JSON syntax and whichever type makes sense in Python (JSON object -> dict, JSON array -> list, etc.). You can check this yourself in your Python shell:
>>> import json
>>> jsonstr = '{"foo":"bar", "baaz":[1, 2, 3]}'
>>> json_data = json.loads(jsonstr)
>>> json_data
{'foo': 'bar', 'baaz': [1, 2, 3]}
>>> type(json_data)
<class 'dict'>
In other words, if you already have the correct Python dict, you have nothing else to do.
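Applied to the question's code, that means the conversion loop can take the data directly. The sketch below accepts either JSON text or an already-parsed list; the function name to_train_data is hypothetical.

```python
import json

def to_train_data(train):
    """Build (text, {'entities': [...]}) tuples from parsed or raw JSON."""
    # Assumption: a str argument is JSON text; anything else is already parsed.
    if isinstance(train, str):
        train = json.loads(train)
    return [
        (item["content"], {"entities": [tuple(e) for e in item["entities"]]})
        for item in train
    ]

train_data = [{"content": "what is the price of polo?", "entities": [[21, 25, "PrdName"]]}]
TRAIN_DATA = to_train_data(train_data)
print(TRAIN_DATA)
```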

How do I convert this list to a valid json object in Python 3.7?

I have the following list that I need to convert into a valid JSON object:
data = ['{"id":"0","jsonrpc":"2.0","method":"RoutingRequest","params":{"barcode":"5694501","itemID":113},"timestamp":"2018-08-06T15:38:40.531"}', '']
I've tried:
import json
my_json = data.decode('utf8').replace("'", '"')
my_json = json.loads(my_json)
I keep getting this error:
raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not list
What am I doing wrong? (btw, I'm new to Python)
You have a list of strings. Iterate over it and call json.loads on each non-empty item.
Ex:
import json
data = ['{"id":"0","jsonrpc":"2.0","method":"RoutingRequest","params":{"barcode":"5694501","itemID":113},"timestamp":"2018-08-06T15:38:40.531"}', '']
data = [json.loads(i) for i in data if i]  # skip empty strings, parse the rest
print(data)
Output (Python 3.7; under Python 2 the strings are shown with a u prefix and the key order may differ):
[{'id': '0', 'jsonrpc': '2.0', 'method': 'RoutingRequest', 'params': {'barcode': '5694501', 'itemID': 113}, 'timestamp': '2018-08-06T15:38:40.531'}]
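If what is needed afterwards is a single JSON document again rather than Python objects, the parsed list can be serialized with json.dumps. A short sketch, with the variable names parsed, as_json_text, and first chosen here for illustration:

```python
import json

data = ['{"id":"0","jsonrpc":"2.0","method":"RoutingRequest","params":{"barcode":"5694501","itemID":113},"timestamp":"2018-08-06T15:38:40.531"}', '']

# Parse every non-empty string into a dict.
parsed = [json.loads(item) for item in data if item]
# One JSON array holding all parsed entries, as text.
as_json_text = json.dumps(parsed)
# If only the first request is wanted, index into the parsed list.
first = parsed[0]
print(first["params"]["itemID"])
```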
