Python List to JSON - python

I am trying to convert the output of my function which returns a list into a JSON object.
The function outputs the following list: [b'E28011600000208', b'E28023232083', b'3000948484']
I would like to create a JSON object that has the following attributes:
{"tag": ["E28011600000208", "E28023232083", "3000948484"]}
Decoding list items was not shown in the similar example I found; I need help with that if that's the right approach to solving this problem.
The function that I am calling is as follows:
reader.read(timeout=500)
Performs a synchronous read, and then returns a list of TagReadData objects resulting from the search. If no tags were found then the list will be empty.
For example:
print(reader.read())
[b'E2002047381502180820C296', b'0000000000000000C0002403']
In my code I have done the following:
tags = reader.read()
data = json.dumps({tag: tags}, separator=(',','b'))
print (data)
I get the error:
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'3000948484' is not JSON serializable
I tried the solution below to remove the byte string. My code is as follows:
tags = reader.read()
tags = list(map(lambda x:x.decode('utf-8'),tags))
data = json.dumps({'tag':tags})
print(data)
I get the error:
AttributeError: 'mercury.TagReadData' object has no attribute 'decode'
The output is now JSON, but I still have the b' prefix in my JSON file. I have the following code:
tag = list(map(lambda x: str(x), tag))
data = json.dumps({'tag': tag})
print(data)
The code outputs the following:
{"tag": ["b'30000000321'", "b'300000000'"]}
How do I go about removing the b? I expected str(x) in Python 3.5 to decode the bytes, but it didn't.

A Python dict must have unique keys, so repeating a key will not work: the dict keeps only one value for it. If you use a single key, tag, with a list as its value, it works.
Having said that, json.dumps cannot serialize bytes objects out of the box (it handles dict, list, tuple, str, int, float, bool and None), and json.loads only parses JSON text. In your example the list items are byte strings, which is what causes the serialization error.
If you don't need to keep the values as bytes, decode them from UTF-8 into ordinary strings:
import json

l = [b'E28011600000208', b'E28023232083', b'3000948484']
l = list(map(lambda x: x.decode('utf-8'), l))
data = json.dumps({'tag': l})
print(data)
# Out: {"tag": ["E28011600000208", "E28023232083", "3000948484"]}
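Two details from the question are worth spelling out. First, str() on a bytes object returns its repr, which is why str(x) produced "b'...'" strings; decode() is what turns bytes into text. Second, the AttributeError shows that reader.read() returns mercury.TagReadData objects, not bytes; their printed form only looks like bytes. The sketch below assumes, based on the python-mercuryapi README, that each TagReadData exposes its raw EPC as bytes in an epc attribute. FakeTagReadData and tag_to_text are hypothetical names, and the stub class only stands in for the real object.

```python
import json


class FakeTagReadData:
    """Hypothetical stand-in for mercury.TagReadData; assumes the EPC bytes live in .epc."""

    def __init__(self, epc: bytes) -> None:
        self.epc = epc


def tag_to_text(tag) -> str:
    """Return the tag's EPC as text (hypothetical helper)."""
    if isinstance(tag, bytes):
        # decode() gives the text; str() would give the repr "b'...'"
        return tag.decode("utf-8")
    # Assumption: TagReadData-like objects expose the raw EPC bytes as .epc
    return tag.epc.decode("utf-8")


print(str(b"3000948484"))  # b'3000948484'

tags = [FakeTagReadData(b"E28011600000208"), b"3000948484"]
data = json.dumps({"tag": [tag_to_text(t) for t in tags]})
print(data)  # {"tag": ["E28011600000208", "3000948484"]}
```

With the real reader you would apply tag_to_text to each item of reader.read(); if the attribute turns out to be named differently, check the library's documentation.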
But if you want to keep the byte strings, look at how to handle serializing and deserializing them with a custom JSON encoder class, passed via the extra cls parameter:
json.dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)
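A minimal sketch of that idea, assuming you want the values back as bytes after a round trip: the encoder's default() method wraps each bytes value in a marker object holding its base64 text, and an object_hook passed to json.loads turns the marker back into bytes. The "__bytes__" key and the names BytesEncoder and bytes_hook are hypothetical conventions for this example, not part of the json module.

```python
import base64
import json


class BytesEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, bytes):
            # Wrap bytes in a marker dict so the decoder can tell them apart from strings
            return {"__bytes__": base64.b64encode(o).decode("ascii")}
        # Anything else unsupported still raises TypeError as usual
        return super().default(o)


def bytes_hook(d):
    # Reverse of BytesEncoder.default: rebuild bytes from a marker dict
    if set(d) == {"__bytes__"}:
        return base64.b64decode(d["__bytes__"])
    return d


l = [b'E28011600000208', b'E28023232083', b'3000948484']
data = json.dumps({'tag': l}, cls=BytesEncoder)
restored = json.loads(data, object_hook=bytes_hook)
print(restored == {'tag': l})  # True
```

Base64 is used instead of decode('utf-8') so that arbitrary bytes, not just valid UTF-8, survive the trip.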
pickle is another useful library for packing and unpacking data, and it can handle most Python objects. However, it returns the packed data as a byte string in a Python-specific format, so it is not useful if you need to share the data with a third-party application. It is very useful for caching arbitrary Python objects, e.g. in memcache.
pickle.loads(pickle.dumps(l))
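For completeness, a round trip through pickle keeps the bytes items intact. Note that pickle.loads should only be used on data you trust, since unpickling can execute arbitrary code.

```python
import pickle

l = [b'E28011600000208', b'E28023232083', b'3000948484']
packed = pickle.dumps(l)          # bytes in a Python-specific format
print(type(packed).__name__)      # bytes
print(pickle.loads(packed) == l)  # True: the bytes items survive unchanged
```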

Related

ijson: How to use ijson to retrieve a dict/list element (from file or from string)?

I am trying to use ijson to retrieve an element from a json dict object.
The json string is inside a file and the only thing in that file is that content:
{"categoryTreeId":"0","categoryTreeVersion":"127","categoryAspects":[1,2,3]}
(that string is very simplified but in fact is over 2GB long)
I need help to do the following:
1/ Open that file and
2/ Use ijson to load that json data in to some object
3/ Retrieve the list "[1,2,3]" from that object
Why not just use the following simple code?
my_json = json.loads('{"categoryTreeId":"0","categoryTreeVersion":"127","categoryAspects":[1,2,3]}')
my_list = my_json['categoryAspects']
Well, imagine that this "[1,2,3]" list is in fact over 2GB long, so using json.loads() will not work (it would just crash).
I tried a lot of combinations (A LOT) and they all failed.
Here are some examples of the things that I tried:
ij = ijson.items(fd,'') -> this does not give any error, but the ones below do:
my_list = ijson.items(fd,'').next()
-> error = '_yajl2.items' object has no attribute 'next'
my_list = ijson.items(fd,'').items()
-> error = '_yajl2.items' object has no attribute 'items'
my_list = ij['categoryAspects']
-> error = '_yajl2.items' object is not subscriptable
This should work:
import ijson

with open('your_file.json', 'rb') as f:
    for n in ijson.items(f, 'categoryAspects.item'):
        print(n)
Additionally, if you know your numbers are "normal" numbers that fit in a regular float, you can pass use_float=True as an extra argument to items for extra speed (ijson.items(f, 'categoryAspects.item', use_float=True) in the code above); ijson then yields plain floats instead of Decimal objects for non-integer numbers. More details are in the documentation.
EDIT: Answering a further question: to simply get a list with all the numbers you can create one directly from the items function like so:
with open('your_file.json', 'rb') as f:
    numbers = list(ijson.items(f, 'categoryAspects.item'))
Mind you, if there are too many numbers you might still run out of memory, defeating the purpose of streaming parsing.
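One middle ground, sketched here as an assumption rather than something ijson itself provides, is to consume the iterator in fixed-size batches with itertools.islice, so that only one batch is in memory at a time. This also relates to the errors in the question: the object returned by ijson.items is an iterator, and in Python 3 iterators are advanced with the built-in next(it), not it.next(). batched is a hypothetical helper name, and range() stands in for ijson.items(...) so the sketch runs without a file or ijson installed.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# In real code `items` would be ijson.items(f, 'categoryAspects.item');
# range() stands in for it here.
total = 0
for chunk in batched(range(10), 4):
    total += sum(chunk)  # only one chunk is held in memory at a time
print(total)  # 45
```

Python 3.12 and later also ship itertools.batched, which yields tuples instead of lists.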
EDIT2: An alternative to a list is a numpy array holding all the numbers, which gives a more compact in-memory representation in case they are all needed at once:
import ijson
import numpy

with open('your_file.json', 'rb') as f:
    numbers = numpy.fromiter(
        ijson.items(f, 'categoryAspects.item', use_float=True),
        dtype='float'  # or int, if these are integers
    )

Python: Json.load gives list and can't parse data from it

I have a data.json file, which looks like this:
["{\"Day\":\"Today\",\"Event\":\"1\", \"Date\":\"2019-03-20\"}"]
I am trying to get "Event" from this file using python and miserably failing at this.
with open('data.json', 'r') as json_file:
    data = json.load(json_file)
print (data['Event'])
I get the following error:
TypeError: list indices must be integers or slices, not str
And even when I try
print (data[0]['Event'])
then I get this error:
TypeError: string indices must be integers
One more thing:
print(type(data))
gives me "list"
I have searched all over and have not found a solution to this. I would really appreciate your suggestions.
You could use the ast module for this:
>>> import ast
>>> mydata = ["{\"Day\":\"Today\",\"Event\":\"1\", \"Date\":\"2019-03-20\"}"]
>>> data = ast.literal_eval(mydata[0])
>>> data
{'Day': 'Today', 'Event': '1', 'Date': '2019-03-20'}
>>> data['Event']
'1'
Edit
Your original code does load the data into a list, but that list contains only a single string entry (which itself holds valid JSON). ast, like json, will parse that string into a Python dict. This works only because the string also happens to be a valid Python literal; JSON's true, false and null would make literal_eval fail, so json.loads is the safer choice for JSON text.
Indexing a list is not the same as looking up a key in a dict, hence the error that list indices cannot be str:
>>> alist = [{'a':1, 'b':2, 'c':3}]
>>> alist['a']
TypeError: list indices must be integers or slices, not str
>>> # need to grab the dict entry from the list
>>> adict = alist[0]
>>> adict['a']
1
You need to convert each element in data to a dict using the json module.
Ex:
import json

with open('data.json') as infile:
    data = json.load(infile)
for d in data:
    print(json.loads(d)['Event'])
Or:
data = list(map(json.loads, data))
print(data[0]["Event"])
Output:
1
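As a side note on the ast answer: ast.literal_eval only works on this file because the embedded string also happens to be a valid Python literal. A small check, using a made-up line containing JSON's true and null literals, shows why json.loads is the safer parser for JSON text:

```python
import ast
import json

line = '{"Event": "1", "Done": true, "Note": null}'

print(json.loads(line))  # {'Event': '1', 'Done': True, 'Note': None}

try:
    ast.literal_eval(line)
except ValueError as exc:
    # true/null are JSON literals, not Python ones, so literal_eval rejects them
    print("literal_eval failed:", exc)
```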
Your problem is that you are parsing it as a list that consists of a single element that is a string.
["{\"Day\":\"Today\",\"Event\":\"1\", \"Date\":\"2019-03-20\"}"]
See how the entire content of the list is surrounded by " on either side and every other " is preceded by a \? The backslash is an escape character: it tells the parser to treat the following " as a literal character inside the string instead of as the end of the string.
If you have control over the file's contents, the easiest solution would be to adjust it. You will want it to be in a format like this:
[{"Day":"Today", "Event": "1", "Date": "2019-03-20"}]
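A likely way to end up with a file like this (an assumption, since the question does not show how data.json was written) is serializing the data twice: json.dumps turns the dict into a string, and dumping a list of those strings escapes every quote again. Serializing the Python objects once gives the format shown above:

```python
import json

event = {"Day": "Today", "Event": "1", "Date": "2019-03-20"}

# Double encoding: dict -> JSON string, then the list of strings is serialized again
double = json.dumps([json.dumps(event)])
# Single encoding: serialize the Python objects once
single = json.dumps([event])

print(double)  # ["{\"Day\": \"Today\", \"Event\": \"1\", \"Date\": \"2019-03-20\"}"]
print(single)  # [{"Day": "Today", "Event": "1", "Date": "2019-03-20"}]
print(json.loads(single)[0]["Event"])  # 1
```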
Edit: As others have suggested, you can also parse it in its current state. Granted, cleaning the data is tedious but often worth the effort, though this may not be one of those cases. I'm leaving this answer up anyway because it may help explain why the OP's initial attempt did not work and why they received the error messages they got.

Converting JSON string into Python dictionary

I don't have much experience in Python, and I've run into a problem converting SQL query data, which is technically a list containing a JSON string, into a Python dictionary. I'm querying an sqlite3 database, which returns a piece of data like this:
def get_answer(self, id):
    self.__cur.execute('select answer from some_table where id= %s;' % id)
    res_data = self.__cur.fetchall()
    return res_data
The data is a single JSON-formatted element; a simplified version looks like this:
[
{"ind": [ {"v": 101}, {"v": 102}, {"v": 65 }]},
{"ind": [ {"v": 33}, {"v": 102}, {"v": 65}]}
]
But when I try to convert the res_data to JSON, with code like this:
temp_json = simplejson.dumps(get_answer(trace_id))
it returns a string, and len(temp_json) gives the number of characters in res_data instead of the number of objects in it. However, if I use Visual Studio's JSON visualizer on what get_answer(trace_id) returns, it properly shows each of the objects in res_data.
I also tried to convert the res_data to a dictionary with code like this:
dict_data = ast.literal_eval(Json_data)
or
dict_data = ast.literal_eval(Json_data[0])
and in both cases it throws a "malformed string" exception. I tried to write it to a file and read it back as JSON, but it didn't work.
Before that, I had copy-pasted res_data into a file manually and used:
with open(file_name) as json_file:
    Json_data = simplejson.load(json_file)
and it worked like a charm. I've been experimenting with different approaches suggested on SO and elsewhere, but although the problem seems very straightforward, I still haven't found a solution, so your help is highly appreciated.
OK, I finally found the solution:
states = json.loads(temp_json[0][0])
One confusing issue was that states = json.loads(temp_json[0]) was throwing the "Expected string or buffer" exception, and temp_json was a list containing only one element, so I didn't expect temp_json[0][0] to give me anything.
I hope it helps others too!
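The reason temp_json[0][0] works is that sqlite3's fetchall() returns a list of row tuples: the first [0] picks the first row and the second [0] picks its first column, which holds the JSON text. The sketch below reproduces this with an in-memory database; the table contents are made up from the simplified data in the question. It also uses sqlite3's ? placeholder instead of % string formatting, which avoids SQL injection.

```python
import json
import sqlite3

# In-memory database standing in for the real one; names follow the question
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, answer TEXT)")
answer = '[{"ind": [{"v": 101}, {"v": 102}, {"v": 65}]}, {"ind": [{"v": 33}, {"v": 102}, {"v": 65}]}]'
cur.execute("INSERT INTO some_table (id, answer) VALUES (?, ?)", (1, answer))

# ? placeholder: the id is passed as a parameter, not formatted into the SQL
cur.execute("SELECT answer FROM some_table WHERE id = ?", (1,))
rows = cur.fetchall()
print(type(rows), type(rows[0]))  # <class 'list'> <class 'tuple'>

states = json.loads(rows[0][0])  # first row, first column -> the JSON text
print(len(states))  # 2
conn.close()
```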
I think you are confusing the data formats. What you supposedly get back from your database wrapper seems to be a list of dictionaries (it is not SQL either, so your question title is misleading). But I actually think sqlite3's fetchall() would give you a list of tuples.
JSON is a text format, or more precisely the serialization of an object to a string. That's why json.dumps (= dump to string) results in a string and json.loads (= load string) results in a Python object (a dictionary or a list of dictionaries).
json.dumps does NOT mean "dump JSON to string". Rather, it is the dumps function of the json module, which takes a Python object (dict, list) and serializes it to a JSON string.
I will extend the answer once I understand what exactly you want to achieve, but note that json.dumps() gives you JSON, which is a string format, whereas simplejson.load() gives you a Python list or dict.
Did you try json.loads(), in case what your database returns is actually a string (which you could easily test)?

How to find all dictionaries from a long string in python

I am trying to retrieve all JSON-like dictionaries from a long string.
For example,
{"uri": "something"} is referencing {"link": "www.aurl.com"}
I want to get {"uri": "something"} and {"link": "www.aurl.com"} as the result. Is there a way to do this with a regex in Python?
Probably the "nicest" way to do this is to let a real JSON decoder do the work, not horrible regexes. Find all open braces as "possible object start points", then try to parse from each one with JSONDecoder's raw_decode method. On success it returns the parsed object and the index where the object ended (here equal to the number of characters consumed, since the string is sliced to start at the brace), which makes it possible to skip successfully parsed objects efficiently. For example:
import json

def get_all_json(teststr):
    decoder = json.JSONDecoder()
    # Find first possible JSON object start point
    sliceat = teststr.find('{')
    while sliceat != -1:
        # Slice off the non-object prefix
        teststr = teststr[sliceat:]
        try:
            # See if we can parse it as a JSON object
            obj, consumed = decoder.raw_decode(teststr)
        except json.JSONDecodeError:
            # If we couldn't, find the next open brace to try again
            sliceat = teststr.find('{', 1)
        else:
            # If we could, yield the parsed object and skip the text it was parsed from
            yield obj
            sliceat = consumed
This is a generator function, so you can either iterate the objects one by one e.g. for obj in get_all_json(mystr): or if you need them all at once for indexing, iterating multiple times or the like, all_objs = list(get_all_json(mystr)).
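A variant of the same approach, offered as a sketch, avoids re-slicing the string on every step by passing a start index to raw_decode. The idx argument exists in CPython's JSONDecoder.raw_decode implementation but is not part of the documented signature, so treat it as an implementation detail; iter_json_objects is a hypothetical name.

```python
import json
from typing import Any, Iterator


def iter_json_objects(text: str) -> Iterator[Any]:
    """Yield every JSON object embedded in `text`, scanning left to right."""
    decoder = json.JSONDecoder()
    pos = text.find('{')
    while pos != -1:
        try:
            # Start parsing at `pos`; `end` is an absolute index into `text`
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Not a valid object here: try the next opening brace
            pos = text.find('{', pos + 1)
        else:
            yield obj
            # Skip past the parsed object (nested braces are not revisited)
            pos = text.find('{', end)


s = '{"uri": "something"} is referencing {"link": "www.aurl.com"}'
print(list(iter_json_objects(s)))  # [{'uri': 'something'}, {'link': 'www.aurl.com'}]
```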

How To Format a JSON Text In Python?

When I request the JSON for Vincent van Gogh's List of Works Wikipedia page, using this url,
it obviously returns a huge blob of text which I believe is some sort of dictionary of lists.
Now, someone has already shown me the wikipedia package (import wikipedia), so skip that. How can I decode this JSON? I feel like I have tried everything in Python 3's standard library and always get an error, for example with this code:
data = urllib.request.urlopen(long_json_url)
stuff = json.load(data) #or json.loads(data)
print(stuff)
it returns
TypeError: the JSON object must be str, not 'bytes'
Or if I try this code:
data = urllib.request.urlopen(longurl)
json_string = data.read().decode('utf-8')
json_data = json.loads(json_string)
print(json_data)
It doesn't return an error, just what looks like nothing:
>>>
>>>
But if I highlight that empty space and paste it, it pastes the same blob of text.
{'warnings': {'main': {'*': "Unrecognized parameter: 'Page'"}}, 'query': {'normalized': [{'from': 'list of works by Vincent van Gogh',... etc
If I try a for loop:
for entry in json_data:
    print(entry)
It returns
>>>
query
warnings
>>>
And that's it. So it's not returning an error there, but not really much else, just two values? How would you make the JSON data into a workable Python dict or list? Or at the very least, into a more vertical format that I could actually read?
How would you make the JSON data into a workable Python dict or list?
You're already doing that with
json_data = json.loads(json_string)
This however:
for entry in json_data:
    print(entry)
will only print the keys of your dictionary. If you want to print the values, you need to use:
for entry in json_data:
    print(json_data[entry])
If you inspect the data, you'll see that the main dictionary has two keys, the ones you already got by iterating over it:
{'query': {...}, 'warnings': {...}}
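To get the "more vertical format" asked about in the question, json.dumps with indent= re-serializes the parsed data over multiple lines, and dict.items() walks keys and values together. The dict below is a small made-up sample shaped like the start of the response in the question, not the real API output:

```python
import json

# Made-up sample shaped like the start of the response in the question
json_data = {
    "warnings": {"main": {"*": "Unrecognized parameter: 'Page'"}},
    "query": {"normalized": [{"from": "list of works by Vincent van Gogh"}]},
}

# indent= spreads nested dicts/lists over multiple lines; sort_keys gives a stable order
pretty = json.dumps(json_data, indent=2, sort_keys=True)
print(pretty)

# items() yields (key, value) pairs, so values are reachable without json_data[key]
for key, value in json_data.items():
    print(key, "->", type(value).__name__)
```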
