How to inspect mystery deserialized object in Python


I'm trying to load JSON back into an object. The "loads" method seems to work without error, but the object doesn't seem to have the properties I expect.
How can I go about examining/inspecting the object that I have? (This is web-based code.)
results = {"Subscriber": {"firstname": "Neal", "lastname": "Walters"}}
subscriber = json.loads(results)
for item in inspect.getmembers(subscriber):
    self.response.out.write("<BR>Item")
    for subitem in item:
        self.response.out.write("<BR> SubItem=" + subitem)
The attempt above returned this:
Item
SubItem=__class__
I don't think it matters, but for context:
The JSON is actually coming from a urlfetch in Google App Engine to
a rest web service created using this utility:
http://code.google.com/p/appengine-rest-server.
The data is being retrieved from a datastore with this definition:
class Subscriber(db.Model):
    firstname = db.StringProperty()
    lastname = db.StringProperty()
Thanks,
Neal
Update #1: Basically I'm trying to deserialize JSON back into an object.
In theory it was serialized from an object, and I want to now get it back into an object.
Maybe the better question is how to do that?
Update #2: I was trying to abstract a complex program down to a few lines of code, so I made a few mistakes in "pseudo-coding" it for purposes of posting here.
Here's a better code sample, now taken out of the website so that I can run it on a PC.
results = '{"Subscriber": {"firstname": "Neal", "lastname": "Walters"}}'
subscriber = json.loads(results)
for key, value in subscriber.items():
    print " %s: %s" %(key, value)
The above runs, but what it displays doesn't look any more structured than the JSON string itself. It displays this:
Subscriber: {u'lastname': u'Walters', u'firstname': u'Neal'}
I have more of a Microsoft background, so when I hear serialize/deserialize, I think going from an object to a string, and from a string back to an object. So if I serialize to JSON, and then deserialize, what do I get, a dictionary, a list, or an object? Actually, I'm getting the JSON from a REST webmethod, that is on my behalf serializing my object for me.
Ideally I want a subscriber object that matches my Subscriber class above, and ideally, I don't want to write one-off custom code (i.e. code that would be specific to "Subscriber"), because I would like to do the same thing with dozens of other classes. If I have to write some custom code, I will need to do it generically so it will work with any class.
Update #3: This is to explain more of why I think this is a needed tool. I'm writing a huge app, probably on Google App Engine (GAE). We are leaning toward a REST architecture for several reasons, but one is that our web GUI should access the data store via a REST web layer. (I'm a lot more used to SOAP, so switching to REST is a small challenge in itself.) So one of the classic ways of getting and updating data is through a business or data tier. By using the REST utility mentioned above, I have the choice of XML or JSON. (I'm hoping to do a small working prototype of both before we develop the huge app.) Then, suppose we have a successful app, and GAE doubles its prices. Then we can rewrite just the data tier, and take our Python/Django user tier (web code), and run it on Amazon or somewhere else.
If I'm going to do all that, why would I want everything to be dictionary objects? Wouldn't I want the power of a full-blown class structure? One of the next tricks is a sort of object-relational mapping (ORM), so that we don't necessarily expose our exact data tables, but more of a logical layer.
We also want to expose a RESTful API to paying users, who might be using any language. For them, they can use XML or JSON, and they wouldn't use the serialize routine discussed here.

JSON only encodes strings, numbers (integers and floats), booleans, null, JavaScript objects (Python dicts) and arrays (Python lists).
You have to create a function that turns the returned dictionary into a class instance, and then pass it to json.loads using the object_hook keyword argument along with the JSON string. Here's some code that fleshes it out:
import json
class Subscriber(object):
    firstname = None
    lastname = None
class Post(object):
    author = None
    title = None
def decode_from_dict(cls, vals):
    # Create an instance of cls and copy every key/value pair onto it.
    obj = cls()
    for key, val in vals.items():
        setattr(obj, key, val)
    return obj
# Maps the wrapper key used in the JSON to the class to instantiate.
SERIALIZABLE_CLASSES = {'Subscriber': Subscriber,
                        'Post': Post}
def decode_object(d):
    # json.loads calls this hook for every decoded JSON object, innermost first.
    for field in d:
        if field in SERIALIZABLE_CLASSES:
            cls = SERIALIZABLE_CLASSES[field]
            return decode_from_dict(cls, d[field])
    return d
# "title" sits inside the "Post" object so that it ends up on the Post instance.
results = '''[{"Subscriber": {"firstname": "Neal", "lastname": "Walters"}},
              {"Post": {"author": {"Subscriber": {"firstname": "Neal",
                                                  "lastname": "Walters"}},
                        "title": "Decoding JSON Objects"}}]'''
result = json.loads(results, object_hook=decode_object)
print(result)
print(result[1].author)
This will handle any class that can be instantiated without arguments to the constructor and for which setattr will work.
Also, this uses json. I have no experience with simplejson so YMMV but I hear that they are identical.
Note that although the values for the two subscriber objects are identical, the resulting objects are not the same object. This could be fixed by memoizing the decode_from_dict function.

results in your snippet is a dict, not a string, so the json.loads would raise an exception. If that is fixed, each subitem in the inner loop is then a tuple, so trying to add it to a string as you are doing would raise another exception. I guess you've simplified your code, but the two type errors should already show that you simplified it too much (and incorrectly). Why not use an (equally simplified) working snippet, and the actual string you want to json.loads instead of one that can't possibly reproduce your problem? That course of action would make it much easier to help you.
Beyond peering at the actual string, and showing some obvious information such as type(subscriber), it's hard to offer much more help based on that clearly-broken code and such insufficient information:-(.
Edit: in "update2", the OP says
It displays this: Subscriber: {u'lastname': u'Walters', u'firstname': u'Neal'}
...and what else could it possibly display, pray?! You're printing the key as string, then the value as string -- the key is a string, and the value is another dict, so of course it's "stringified" (and all strings in JSON are Unicode -- just like in C# or Java, and you say you come from a MSFT background, so why does this surprise you at all?!). str(somedict), identically to repr(somedict), shows the repr of keys and values (with braces around it all and colons and commas as appropriate separators).
JSON, a completely language-independent serialization format though originally centered on Javascript, has absolutely no idea of what classes (if any) you expect to see instances of (of course it doesn't, and it's just absurd to think it possibly could: how could it possibly be language-independent if it hard-coded the very concept of "class", a concept which so many languages, including Javascript, don't even have?!) -- so it uses (in Python terms) strings, numbers, lists, and dicts (four very basic data types that any semi-decent modern language can be expected to have, at least in some library if not embedded in the language proper!). When you json.loads a string, you'll always get some nested combination of the four datatypes above, plus True, False and None for JSON's true, false and null (all strings will be unicode, and numbers come back as int or float depending on how they are written, BTW;-).
If you have no idea (and don't want to encode by some arbitrary convention or other) what class's instances are being serialized, but absolutely must have class instances back (not just dicts etc) when you deserialize, JSON per se can't help you -- that metainformation cannot possibly be present in the JSON-serialized string itself.
If you're OK with the four fundamental types, and just want to see some printed results that you consider "prettier" than the default Python string printing of the fundamental types in question, you'll have to code your own recursive pretty-printing function depending on your subjective definition of "pretty" (I doubt you'd like Python's own pprint standard library module any more than you like your current results;-).
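As a starting point for inspecting whatever json.loads returned, here is a minimal sketch (Python 3 syntax, standard library only). It reuses the JSON string from the question; the variable names outer_type, inner_type and scalar_types are only illustrative.

```python
import json
from pprint import pformat

results = '{"Subscriber": {"firstname": "Neal", "lastname": "Walters"}}'
subscriber = json.loads(results)

# Step 1: ask each level what type it is.
outer_type = type(subscriber).__name__                 # 'dict'
inner_type = type(subscriber["Subscriber"]).__name__   # 'dict'
print(outer_type, inner_type)

# Step 2: print the nested structure in a more readable layout.
print(pformat(subscriber))
print(json.dumps(subscriber, indent=2, sort_keys=True))

# Step 3: check how JSON scalars map to Python types.
scalar_types = [type(v).__name__ for v in json.loads('[1, 2.5, "x", true, null]')]
print(scalar_types)  # ['int', 'float', 'str', 'bool', 'NoneType']
```

Whether pformat or json.dumps(indent=2) counts as "pretty" is subjective; both only change the presentation, and the data is still plain dicts, lists and scalars.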

My guess is that loads is returning a dictionary. To iterate over its content, use something like:
for key, value in subscriber.items():
    self.response.out.write("%s: %s" %(key, value))

Related

Library for converting between python objects and JSON data structures

I need to work in an environment where the server has data objects cached in memory, and some or all of them need to be sent over a websocket to a client. The conversion between the objects and the data structures is very straightforward. For example, here is a TypeScript definition of a data transfer object:
export interface IFieldStruct {
field_name: string;
type: string;
displaylabel: string;
notnull: boolean;
}
The corresponding Python object looks like this:
class FieldStruct:
    def __init__(self, field_name: str, type: str, displaylabel: str, notnull: bool):
        self.field_name = field_name
        self.type = type
        self.displaylabel = displaylabel
        self.notnull = notnull
Actually, the Python objects on the server side are smarter than that. They also have methods, and they also have some attributes that need not be exported to JSON. Some of their attributes can be lists and dictionaries containing other smart objects.
Here is the problem. I would like to take advantage of code completion and code inspection in my Python IDE (pycharm). So I don't want to store this data as a data structure in Python. But I also want to be able to convert and send these objects easily.
I know that I could write my own serializer/deserializer for this. But there will be hundreds of data object classes, and I do not want to write a serializer manually. I wonder if there is a good library that already does this for me with object introspection? I do not want to reinvent the wheel. There are too many libs on PyPI, and I'm not able to find the right one. I'm not asking for opinions, I'm just asking for a list of the most popular libs that can help me with the conversion.

Pickle is one of the most popular (de)serialization libs out there, if not the most popular.
https://docs.python.org/3/library/pickle.html
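Keep in mind that pickle writes a Python-specific byte stream, so a TypeScript client cannot read it. If the wire format has to be JSON, one standard-library option is dataclasses together with json. The sketch below assumes Python 3.7+ and models the FieldStruct from the question as a dataclass; the field values are made up for illustration.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class FieldStruct:
    field_name: str
    type: str
    displaylabel: str
    notnull: bool

# Hypothetical values, only to show the round trip.
field = FieldStruct("email", "string", "E-mail", True)

# asdict() walks the dataclass (including nested dataclasses, lists and dicts)
# and returns plain dicts that json.dumps can encode.
payload = json.dumps(asdict(field))
print(payload)

# Going back: the decoded dict supplies the constructor arguments.
restored = FieldStruct(**json.loads(payload))
print(restored == field)
```

The reverse step shown here only rebuilds a flat object; nested objects would need their own reconstruction logic or a third-party library.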

Check if JSON var has nullable key (Twitter Streaming API)

I'm downloading tweets from the Twitter Streaming API using Tweepy. I manage to check whether the downloaded data has keys such as 'extended_tweet', but I'm struggling with a specific key nested inside another key.
def on_data(self, data):
    savingTweet = {}
    if not "retweeted_status" in data:
        dataJson = json.loads(data)
        if 'extended_tweet' in dataJson:
            savingTweet['text'] = dataJson['extended_tweet']['full_text']
        else:
            savingTweet['text'] = dataJson['text']
        if 'coordinates' in dataJson:
            if 'coordinates' in dataJson['coordinates']:
                savingTweet['coordinates'] = dataJson['coordinates']['coordinates']
            else:
                savingTweet['coordinates'] = 'null'
I'm checking 'extended_tweet' properly, but when I try to do the same with ['coordinates']['coordinates'] I get the following error:
TypeError: argument of type 'NoneType' is not iterable
Twitter documentation says that key 'coordinates' has the following structure:
"coordinates":
{
    "coordinates":
    [
        -75.14310264,
        40.05701649
    ],
    "type":"Point"
}
I managed to work around it by wrapping the problematic check in a try/except, but I don't think this is the most suitable approach to the problem. Any other ideas?

So the Twitter API docs are probably lying a bit about what they return (shock horror!) and it looks like you're getting a None in place of the expected data structure. You've already decided against using try/except, so I won't go over that, but here are a few other suggestions.
Using dict get() default
There are a couple of options that occur to me, the first is to make use of the default ability of the dict get command. You can provide a fall back if the expected key does not exist, which allows you to chain together multiple calls.
For example you can achieve most of what you are trying to do with the following:
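# data is the parsed tweet (a dict), not the raw JSON string.
# "or {}" also covers a key that is present but null (None),
# which a get() default alone does not.
return {
    'text': (data.get('extended_tweet') or {}).get('full_text', data['text']),
    'coordinates': (data.get('coordinates') or {}).get('coordinates', 'null')
}
It's not super pretty, but it does work. It's likely to be a little slower than what you are doing too.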
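One caveat with chained get() calls: the default is only used when the key is missing, so a key that is present with a null value still yields None and the second get() fails. A minimal sketch of a null-safe variant follows (Python 3; the function name extract and the sample payloads are hypothetical, not real API output).

```python
def extract(tweet):
    """Pull text and coordinates out of an already-parsed tweet dict."""
    # "or {}" replaces both a missing key and an explicit null with an empty dict.
    extended = tweet.get("extended_tweet") or {}
    coordinates = tweet.get("coordinates") or {}
    return {
        "text": extended.get("full_text", tweet.get("text")),
        "coordinates": coordinates.get("coordinates", "null"),
    }

# Hypothetical sample payloads covering the three cases.
with_point = {"text": "hi", "coordinates": {"type": "Point",
                                            "coordinates": [-75.14310264, 40.05701649]}}
with_null = {"text": "hi", "coordinates": None}
extended_only = {"text": "short", "extended_tweet": {"full_text": "the long version"}}

for sample in (with_point, with_null, extended_only):
    print(extract(sample))
```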
Using JSONPath
Another option, which is likely overkill for this situation is to use a JSONPath library which will allow you to search within data structures for items matching a query. Something like:
from jsonpath_rw import parse
matches = parse('extended_tweet.full_text').find(data)
if matches:
    print(matches[0].value)
This is going to be a lot slower than what you are doing, and for just a few fields it is overkill, but if you are doing a lot of this kind of work it could be a handy tool in the box. JSONPath can also express much more complicated paths, or very deeply nested paths where the get method might not work, or would be unwieldy.
Parse the JSON first!
The last thing I would mention is to make sure you parse your JSON before you do your test for "retweeted_status". If the text appears anywhere (say inside the text of a tweet) this test will trigger.
JSON parsing with a competent library is usually extremely fast too, so unless you are having real speed problems it's not necessarily worth worrying about.
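To illustrate the difference, here is a small sketch (Python 3) with a hypothetical payload: an ordinary tweet whose text merely mentions the key name. The substring test on the raw string fires, while the key test on the parsed dict does not.

```python
import json

# Hypothetical payload, built here only for the demonstration.
raw = json.dumps({"text": 'what does "retweeted_status" mean?', "coordinates": None})

# Substring test on the raw JSON string: matches the words inside the tweet text.
substring_says_retweet = "retweeted_status" in raw

# Key test on the parsed dict: only looks at the top-level keys.
tweet = json.loads(raw)
key_says_retweet = "retweeted_status" in tweet

print(substring_says_retweet, key_says_retweet)  # True False
```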

UPDATED: Parsing JSON object in python when object contains an array and another associated object at the same level

I'm having trouble with a JSON object being passed to me by one of our product's APIs. I'm using Python 2.7 to create a function that lets our customer service team see details about jobs that are posted on our website. The JSON package returns an array of objects that each contain an array and an object. I need to read the array associated with one of the objects inside the main object, but they're not nested: the array of applicants is not nested inside the job object. This means my usual "response[0][0]['applicantName']" won't work here.
The data below is updated to represent what the API is actually giving me. My apologies: before, I had edited it in order to protect the data. I have still done that, but the structure is the actual result.
What I'd like to do is let a user input the jobId and I'll provide them with a list of all the applicants related to that jobID. Since the jobID can sometimes be non-sequential, I can't use an index number, it must be the jobID number.
Can someone help?
Here's the JSON structure I get:
[{u'bids': [{u'applicantId': 221,
             u'comment': 'I have applied to the job'},
            {u'applicantId': 221,
             u'comment': 'I have applied to the job'}],
  u'job': {u'jobId': 1}},
 {u'bids': [{u'applicantId': 221,
             u'comment': 'I have applied to the job'},
            {u'applicantId': 221,
             u'comment': 'I have applied to the job'}],
  u'job': {u'jobId': 1}}]
As I said, I'm working in python 2.7 using the "requests" library to call the API and .json() to read it.
Thanks in advance!

That content doesn't seem to be a valid json, which means you won't be able to parse it with the typical well-known json.loads function.
Also, you won't be able to use ast.literal_eval cos it's not a valid python expression.
Not sure this is a good idea... but assuming you're getting that content as a string, I'd either try to write my own parser for that type of server object or look for an external library that is able to parse it.
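If, as the question states, the data comes from the requests library's .json() call, it is already a parsed Python list of dicts and needs no further parsing. Under that assumption, looking up applicants by jobId can be sketched as below (Python 3 syntax; the function name applicants_for_job is hypothetical, and the second entry's jobId and applicantId are changed from the question's data so that the filter has something to distinguish).

```python
def applicants_for_job(response, job_id):
    """Collect the bids from every entry whose job has the given jobId."""
    applicants = []
    for entry in response:
        # "bids" and "job" are siblings inside each entry, so match on the
        # job first and then take the bids from the same entry.
        if entry["job"]["jobId"] == job_id:
            applicants.extend(entry["bids"])
    return applicants

# Same shape as the question's data; values in the second entry are illustrative.
response = [
    {"bids": [{"applicantId": 221, "comment": "I have applied to the job"}],
     "job": {"jobId": 1}},
    {"bids": [{"applicantId": 305, "comment": "I have applied to the job"}],
     "job": {"jobId": 2}},
]

ids = [bid["applicantId"] for bid in applicants_for_job(response, 2)]
print(ids)  # [305]
```

A jobId that matches nothing simply returns an empty list, which avoids relying on positional indexes.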

Layer between data extraction and storage

What I am doing:
Get data from data source (could be from API or scraping) in form of a dictionary
Clean/manipulate some of the fields
Combine fields from data source dictionary into new dictionaries that represent objects
Save the created dictionaries into database
Is there a pythonic way to do this? I am wondering about the whole process but I'll give some guiding questions:
What classes should I have?
What methods/classes should the cleaning of fields from the data source to objects be in?
What methods/classes should the combining/mapping of fields from the data source to objects be in?
If the method is different in scraping vs. api, please explain how and why
Here is an example:
API returns:
{data: {
    name: "<b>asd</b>",
    story: "tame",
    story2: "adjet"
  }
}
What you want to do:
1. Clean name
2. Create a name_story object
3. Set name_story.name = dict['data']['name']
4. Set name_story.story = dict['data']['story'] + dict['data']['story2']
5. Save name_story to database
(and consider that there could be multiple objects to create and multiple incoming data sources)
How would you structure this process? An interface of all classes/methods would be enough for me without any explanation.

What classes should I have?
In Python, there is no strong need to use classes. Classes are a way to manage complexity. If your solution is not complex, use functions (or maybe module-level code, if it is a one-time solution).
If the method is different in scraping vs. api, please explain how and why
I prefer to organize my code with respect to modularity and the principle of least knowledge, and to define clear interfaces between the parts of the module system.
Example of modular solution
You can have module (either function or class) for fetching information, and it should return dictionary with specified fields, no matter what exactly it does.
Another module should process dictionary and return dictionary too (for example).
Third module can save information from that dictionary to database.
It is quite possible that this plan is far from what you need or want, in which case you should design your module system yourself.
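The three-module idea above can be sketched as three small functions (Python 3, standard library only). The names fetch, process and save, the in-memory SQLite database, the name_story table and the regex-based tag stripping are all assumptions made for illustration, not a recommended production design.

```python
import re
import sqlite3

def fetch(raw):
    """Stand-in for an API call or a scraper: returns the 'data' dictionary."""
    return raw["data"]

def process(record):
    """Clean fields and combine them into a new dictionary."""
    # Naive tag stripping, for illustration only.
    name = re.sub(r"<[^>]+>", "", record["name"])
    return {"name": name, "story": record["story"] + record["story2"]}

def save(conn, name_story):
    """Persist one dictionary; the keys double as named SQL parameters."""
    conn.execute("INSERT INTO name_story (name, story) VALUES (:name, :story)",
                 name_story)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_story (name TEXT, story TEXT)")

api_response = {"data": {"name": "<b>asd</b>", "story": "tame", "story2": "adjet"}}
save(conn, process(fetch(api_response)))

rows = conn.execute("SELECT name, story FROM name_story").fetchall()
print(rows)  # [('asd', 'tameadjet')]
```

Because each stage only agrees on the shape of the dictionary it receives and returns, a scraper can replace fetch without touching process or save.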
And some words about your wants:
Clean name
Consider this stackoverflow answer
Create a name_story object
Set name_story.name = dict['data']['name']
Set name_story.story = dict['data']['story'] + dict['data']['story2']
If you want to access the attributes of an object with dot notation (as you specified in items 3 and 4), you could use either a Python namedtuple or a plain Python class. If indexed access is OK for you, use a Python dictionary.
In case of namedtuple, it will be:
from collections import namedtuple
NameStory = namedtuple('NameStory', ['name', 'story'])
name_story1 = NameStory(name=dict['data']['name'], story=dict['data']['story'] + dict['data']['story2'])
name_story2 = NameStory(name=dict2['data']['name'], story=dict2['data']['name'])
If your choice is a dictionary, it's easier:
name_story = {
    'name': dict['data']['name'],
    'story': dict['data']['story'] + dict['data']['story2'],
}
Save name_story to database
This is much more complex question.
You can use raw SQL. Specific instructions depend on your database. Google for 'python sqlite' or 'python postgresql' or whatever you want; there are plenty of good tutorials.
Or you can utilize one of python ORMs:
peewee
SQLAlchemy
google for more options
By the way
It's strongly recommended to not override python built-in types (list, dict, str etc), as you did in this line:
name_story.name = dict['data']['name']

Passing a Python list using JSON and Django

I'm trying to send a Python list to the client side (encoded as JSON). This is the code snippet which I have written:
array_to_js = [vld_id, vld_error, False]
array_to_js[2] = True
jsonValidateReturn = simplejson.dumps(array_to_js)
return HttpResponse(jsonValidateReturn, mimetype='application/json')
How do I access it from the client side? Can I access it like the following?
jsonValidateReturn[0]
Or how do I assign a name to the returned JSON array in order to access it?
Actually I'm trying to convert a server side Ajax script that returns an array (see Stack Overflow question Creating a JSON response using Django and Python) and that handles client side POST requests, so I wanted the same thing in return with Python, but it didn't go well.

The JSON array will be dumped without a name / assignment.
That is, in order to give it a name, in your JavaScript code you would do something like this:
var my_json_data_dump = function_that_gets_json_data();
If you want to visualize it, for example, substitute:
var my_json_data_dump = { 'first_name' : 'Bob', 'last_name': 'Smith' };
Also, like Iganacio said, you're going to need something like json2.js to parse the string into the object in the last example. You could wrap that parsing step inside of function_that_gets_json_data, or if you're using jQuery you can do it with a function like jQuery.getJSON().
json2.js is still nice to have, though.
In response to the comment (I need space and markup):
Yes, of course. All the Python side is doing is encoding a string representation (JSON) for you. You could do something like 'var blah = %s' % json.dumps(obj_to_encode) and then on the client side, instead of simply parsing the response as JSON, you parse it as JavaScript.
I wouldn't recommend this for a few reasons:
You're no longer outputting JSON. What if you want to use it in a context where you don't want the variable name, or can't parse JavaScript?
You're evaluating JavaScript instead of simply parsing JSON. It's an operation that's open to security holes (if someone can seed the data, they might be able to execute an XSS attack).
I guess you're facing something I think every Ajax developer runs into. You want one place of truth in your application, but now you're being encouraged to define variables and whatnot in JavaScript. So you have to cross reference your Python code with the JavaScript code that uses it.
I wouldn't get too hung up on it. I can't see why you would absolutely need to control the name of the variable from Python in this manner. If you're counting on the variable name being the same so that you can reference it in subsequent JavaScript or Python code, it's something you might obviate by simply restructuring your code. I don't mean that as a criticism, just a really helpful (in general) suggestion!

If both client and server are in Python, here's what you need to know.
Server. Use a dictionary to get labels on the fields. Write this as the response.
>>> import json
>>> json.dumps( {'vld_id':1,'vls_error':2,'something_else':True} )
'{"vld_id": 1, "something_else": true, "vls_error": 2}'
Client. After reading the response string, create a Python dictionary this way.
>>> json.loads( '{"vld_id": 1, "something_else": true, "vls_error": 2}' )
{u'vld_id': 1, u'something_else': True, u'vls_error': 2}
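For the plain list in the original question, the same round trip works without naming anything: the decoded value is a list again, and elements are reached by index. A minimal sketch in Python 3 syntax follows; the values of vld_id and vld_error are hypothetical.

```python
import json

vld_id, vld_error = 42, "required field"   # hypothetical values
array_to_js = [vld_id, vld_error, True]

# Server side: the list becomes a JSON array, and True becomes true.
body = json.dumps(array_to_js)
print(body)         # [42, "required field", true]

# Client side: decoding gives back a list that is indexed like the original.
received = json.loads(body)
print(received[0])  # 42
```

A JavaScript client would index the parsed array the same way; using a dictionary instead, as shown above, trades positional access for labelled fields.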
