Python - Compare 2 different output formats - Logic

I have a programming case which has stumped me. I'm not necessarily looking for code - I'm looking for logic advice, for which I'm at a loss.
I've tried several different things, but nothing has really come together.
This is for a regression test. I have two files with the same data in two very dissimilar formats. I need to compare the data and automate the process. I'll worry about the 'diff' at a later stage. It should not be too hard if I can arrive at data from both files which can be compared.
File 1 has essentially JSON data. There is other garbage in the file but that can be removed. This is what the data looks like:
{
"Chan-1" : [ {
"key1" : "val1",
"key2" : val2,
"key3" : val3,
}, {
"key1" : "val1",
"key2" : val2,
"key3" : val3,
} ]
}
File 2 has what I can best decipher as a Python list of items. Each item holds data in a key=value format, separated by commas, inside parentheses.
[
spacecraft.telemetry.channel(key1=val1,key2="val2",key3=val3),
spacecraft.telemetry.channel(key1=val1,key2="val2",key3=val3)
]
Each block in one file corresponds to the one in the other, and is essentially going to be diff'd. So in other words:
{
"key1" : "val1",
"key2" : val2,
"key3" : val3,
}
from File 1 will (or should) have the same key-value pairs as File 2:
(key1=val1,key2="val2",key3=val3)
The order is also similar.
Both files contain a plethora of key-value pairs for the "Chan-1" object; I truncated the amount of data for example's sake. There are about 16 key-value pairs in each block, and there are about 400 blocks.
I've tried working on File 2 to make it look like JSON data.
I've tried working on File 1 to make it look more like File 2.
I also tried parsing both files as a 3rd format altogether.
But I've not gotten far with either concept - and something tells me I'm missing something, that this should not be this hard, considering we already have one file in JSON.
I would really appreciate it if someone could give me some advice on the logic to follow here - what appears to be the best route, and what kind of logic should go into making this happen.
Thanks.

For each file:
- extract the 'list' of objects
- convert each object in the list into a dictionary
- for file 1, this step is basically 'convert JSON into dict'
- for file 2, this would involve extracting just the key=value strings, splitting on =, and turning the result into a dict via a dictionary comprehension.
At this point, you have two lists of dictionaries. Your question seems to indicate that you can assume the lists are ordered in the same manner, so now you can check that each dict in one list matches the dict in the same position in the other list. Check out zip(list_1, list_2); it should make this step easier.
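A minimal sketch of that approach, based on the samples shown above (the filenames, the "Chan-1" key, and the regular expressions are assumptions; values are assumed to contain no commas or parentheses):

import json
import re

def parse_file1(text):
    # File 1 is almost JSON: quote the bare values, then drop the
    # trailing commas, so the json module can parse it.
    text = re.sub(r':\s*([A-Za-z0-9_.-]+)\s*([,}\]])', r': "\1"\2', text)
    text = re.sub(r',\s*([}\]])', r'\1', text)  # remove trailing commas
    return json.loads(text)["Chan-1"]           # the list of blocks

def parse_file2(text):
    # File 2: pull out each (...) argument list and split it into pairs.
    blocks = re.findall(r'\(([^)]*)\)', text)
    result = []
    for block in blocks:
        pairs = (item.split("=", 1) for item in block.split(","))
        result.append({k.strip(): v.strip().strip('"') for k, v in pairs})
    return result

with open("file1.txt") as f1, open("file2.txt") as f2:
    list_1 = parse_file1(f1.read())
    list_2 = parse_file2(f2.read())

for i, (d1, d2) in enumerate(zip(list_1, list_2)):  # zip pairs up the blocks
    if d1 != d2:
        print("block", i, "differs:", d1, "vs", d2)

Both parsers normalize every value to a string, so the dict comparison is apples to apples; a length check on the two lists would catch missing blocks that zip would silently skip.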

Related

Parse an embedded object (JSON) into an ordered dictionary in Python

I am looking to parse some JSON into a dictionary but need to preserve order for one particular part of the dictionary.
I know that I can parse the entire JSON file into an ordered dictionary (ex. Can I get JSON to load into an OrderedDict?) but this is not quite what I'm looking for.
{
"foo": "bar",
"columns":
{
"col_1": [],
"col_2": []
}
}
In this example, I would want to parse the entire file in as a dictionary with the "columns" portion being an OrderedDict. Is it possible to get that granular with the JSON parsing tools while guaranteeing that order is preserved throughout? Thank you!
From the comments, I gathered that a completely nested OrderedDict would be fine as well, but this could be a solution too, if you don't mind using some knowledge about the names of the columns:
import json
from collections import OrderedDict

def hook(pairs):
    # object_pairs_hook receives the key/value pairs in document order;
    # object_hook would only see an already-built dict, so the order
    # would be lost before we could preserve it.
    if any(key == "col_1" for key, _ in pairs):
        return OrderedDict(pairs)
    return dict(pairs)

result = json.loads("YOUR JSON STRING", object_pairs_hook=hook)
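Applied to the JSON from the question (as a hypothetical inline string), that gives:

sample = '{"foo": "bar", "columns": {"col_1": [], "col_2": []}}'
result = json.loads(sample, object_pairs_hook=hook)
print(type(result["columns"]))  # <class 'collections.OrderedDict'>
print(list(result["columns"]))  # ['col_1', 'col_2'] - order preserved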
Hope this helps!

Python Dict Transform

I've been having some strange difficulty trying to transform a dataset that I have.
I currently have a dictionary coming from a form as follows:
data['content']['answers']
I would like to have the ['answers'] dict become the first element of a list, like so:
data['content'][0]['answers']
However, when I try to create it like so, I get an empty dataset.
data['content'] = [data['content']['answers']]
I can't for the life of me figure out what I am doing wrong.
EDIT: Here is the opening JSON
I have:
{
"content" : {
"answers" : {
"3" : {
But I need it to be:
{
"content" : [
{
"answers" : {
"3" : {
thanks
You can do what you want by using a dictionary comprehension (one of the most elegant and powerful features in Python).
In your case, the following should work:
d = {k:[v] for k,v in d.items()}
You mentioned JSON in your question. Rather than rolling your own parser (which it seems like you might be trying to do), consider using the json module.
If I've understood the question correctly, it sounds like you need data['content'] to be equal to a list where each element is a dictionary that was previously contained in data['content']?
I believe this might work (works in Python 2.7 and 3.6):
# assuming that data['content'] is equal to {'answers': {'3': 'stuff'}}
data['content'] = [{key: contents} for key, contents in data['content'].items()]
# data['content'] is now [{'answers': {'3': 'stuff'}}]
The list comprehension will preserve the dictionary content for each dictionary that was in data['content'] originally and will return the dictionaries as a list.
Python 2 doc: https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions
Python 3 doc: https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
It would be best if you gave us a concrete example of 'data' (what the dictionary looks like), the code you try to run, the result you get, and the result you expect. I think I have an idea but can't be sure.
Your question isn't clear and lacks an explicit example.
By the way, would something like this work for you?
data_list = list()
for content in data.keys():
    data_list.append(data[content])
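The same list can be built more idiomatically in one step:

data_list = list(data.values())  # every top-level value in data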

Convert data between databases with different schemas and value codings

As part of my job I am meant to write a script to convert data from an available database here on-site to an external database for publication which is similar but far less detailed. I want to achieve this in Python.
Some columns can be converted simply by copying their content and changing the coding, i.e. 2212 in the original database becomes 2 in column X of the external database. To achieve this I wrote the codings in JSON, e.g.
{
"2212": 2,
"2213": 1,
"2214": 2,
...
}
This leads to some repetition, as you can see, but since lists cannot be keys in JSON I don't see a better way to do it that is simple and clean. Sure, I could invert the mapping and use the right-hand side as the key, but then instead of jsonParsedDict["2212"] I would have to go through all the keys 1, 2, ... and find my original key on the right-hand side.
Where it gets ugly (in my opinion) is when information from multiple columns in the original database need to be combined to get the new column. Right now I just wrote a Python function doing a lot of if checks. It works and I will finish the job this way but it just seems aesthetically wrong and I want to learn more about Python's possibilities in that task.
Imagine for example that I have two columns X and Y in the original database. Based on the value in X I either do nothing (for values that are not coded in the external database), return a value directly, or return a result based on what value Y has in the same row. Right now this leads to quite a few if statements. My other idea was to have nested entries in the JSON file, e.g.
{
    "X": {
        "2211": 1,
        "2212": null,
        "2213": {
            "Y": {
                "3112": 1,
                "3212": 2
            }
        },
        "2214": {
            "Y": {
                "3112": 2,
                "3212": 1
            }
        },
        "2215": {
            "Y": {
                "3112": 1,
                "3212": 2
            }
        }
    }
}
But this approach really blows up the JSON file, and the repetition gets even more painful. Alas, I cannot think of any other way to code these kinds of conditions, apart from if statements in the code.
Is this a feasible way to do it, or is there a better solution? It would be great if I could specify the variables, and the associated variables that take part in the decision, only in the JSON file. I want to abstract the conversion process so that it is mostly steered by these JSON files and the Python code stays quite general. If there is a better format for this than JSON then suggestions are very welcome.
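One way to keep the decision logic out of the Python code is to interpret a nested dict in the coding table as "now decide based on this other column" and resolve it generically. A minimal sketch of that idea (the convert helper and the row format are assumptions based on the question):

import json

codings = json.loads("""
{
    "X": {
        "2211": 1,
        "2212": null,
        "2213": {"Y": {"3112": 1, "3212": 2}}
    }
}
""")

def convert(row, codings, column):
    # Resolve row[column] through the coding table; a nested dict means
    # "decide based on another column", so keep resolving.
    result = codings[column].get(str(row[column]))
    while isinstance(result, dict):
        # the nested dict holds exactly one key: the next column to consult
        next_column, sub_table = next(iter(result.items()))
        result = sub_table.get(str(row[next_column]))
    return result  # None means: not coded in the external database

row = {"X": 2213, "Y": 3112}
print(convert(row, codings, "X"))  # -> 1

With this, new conditions stay entirely in the JSON files and the Python stays generic, which is what the question asks for.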

Check values inside of json data

Could you please help me? I have a response with JSON data and would like to check not only the structure of the JSON but also some values inside it. The JSON data is represented by built-in Python types (dict, list, str, ...). Could you please advise an easy way to check the data inside some arbitrary JSON in Python?
For example let's take following json:
{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}
I would like to check that responses have 3 elements in the employees list, with specific values in firstName and lastName.
I understand that if I have json as a python dict I can check any value inside just by doing:
data["employees"][0]["firstName"] == ???
Maybe in this simple case it is not a big deal. But in my case I have responses with complex structures where the data interesting to me sits deep inside, in different places. It is hard to write something like data['a']['b'][0]['c'][1] for each value that should be checked... Is there a better way to check data inside complex JSON?
If you would like to check that you have 3 elements in the employees list with specific firstName values, you could use the check_json_data function from https://github.com/mlyundin/check-json-data:
data = {"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}
exp_data = {"employees": list_verifier([{"firstName":"John"},
{"firstName": "Anna"},
{"firstName": "Peter"}],
linker=lambda item1, item2, i1, i2:item1['firstName'] == item2['firstName'], strict=True)}
print check_json_data(exp_data, data)
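If you would rather not pull in a library, a small path-based getter covers many such checks. This is just a sketch of the idea (get_path is a hypothetical helper, reusing the data dict above):

def get_path(data, *path):
    # Follow a sequence of keys/indexes into nested dicts and lists;
    # return None instead of raising if any step is missing.
    for step in path:
        try:
            data = data[step]
        except (KeyError, IndexError, TypeError):
            return None
    return data

assert len(get_path(data, "employees")) == 3
assert get_path(data, "employees", 0, "firstName") == "John"
assert get_path(data, "employees", 2, "lastName") == "Jones"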

Access a variable within a dictionary with unknown nested location

I have a JSON file and I want to query it using Python. However, I do not know the nested location of a variable beforehand. E.g. to query the JSON object below, loaded into Python and called 'data', I could do the following:
data['experiments']['initial_ns']['icdat']
However, this assumes that I know that the icdat variable is located below initial_ns, which is located under experiments. Unfortunately I do not have this information, and the JSON structure could also change in the future. Is there a simpler way to access variables within a JSON string without explicitly specifying the entire structure?
thanks!!!
{
"experiments": [
{
"management": {
"events": [
{
"date": "19122",
"timp": "TI3",
"eve": "tage"
}
]
},
"initial_ns": {
"icpcr": "MZ",
"icdat": "1922"
},
"observed": {
"mdat": "19403",
"time_series": [
{
"date": "198423",
"etac": "0"
}
],
"adat": "190218"
},
"local_name": "lhi",
"exname": "SE",
"exp_dur": "1"
}
]
}
Have a look at the jsonpath module. http://goessner.net/articles/JsonPath/. I think the search string $..icdat will match your needs.
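A quick sketch of the usage (assuming the jsonpath package, a Python port of Goessner's implementation, with the question's JSON loaded as data):

from jsonpath import jsonpath

matches = jsonpath(data, '$..icdat')  # '..' descends to any depth
print(matches)  # ['1922'] - a list of matches, or False if none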
"...without explicitly specifying the entire structure?"
Yes, there are many ways. Unfortunately you have not specified which answer you are looking for.
To be "unique in terms of the schema" (my terminology) is as follows: If you have for example multiple Foo dictionaries with the key Foo.bar, then that is still unique. What is not unique is if you have Foo objects with Foo.bar, and Baz objects with Baz.bar: searching for {... baz:...} will return different kinds of objects.
If the key is unique in terms of the schema, you can search the entire tree. You can make this go faster by caching all key-value pairs in a dictionary for later use (sketched below); the lookup is then O(1) "instant" amortized cost, since you had to go through the entire data structure to parse it anyway. This even works if you would like to return sets of objects: use a cache = collections.defaultdict(set), and when you preprocess items into the cache, do cache[key].add(value).
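A minimal version of that cache, using lists rather than sets so that unhashable values such as nested dicts can be stored too:

import collections

def build_cache(obj, cache=None):
    # Walk the whole structure once, recording every value seen under each key.
    if cache is None:
        cache = collections.defaultdict(list)
    if isinstance(obj, dict):
        for key, value in obj.items():
            cache[key].append(value)
            build_cache(value, cache)
    elif isinstance(obj, list):
        for item in obj:
            build_cache(item, cache)
    return cache

cache = build_cache(data)
print(cache["icdat"])  # ['1922'] - found without knowing the path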
If the key is not unique in terms of the schema, you will want to make a reasonable guess about the path and provide some partial information, per Hans Then's answer using JsonPath: https://stackoverflow.com/a/12291240/711085 (alternatively, change the schema).
No. You need to know the format, or you'll have to manually loop over everything in it.
You can write a function to recursively search nested containers for a given key, similar to findElementByID() in an XML DOM parser.
def find_key(obj, key):
    # Recursively yield every value stored under `key`, anywhere in the
    # nested structure (dicts and lists of arbitrary depth).
    if isinstance(obj, dict):
        if key in obj:
            yield obj[key]
    if isinstance(obj, (dict, list)):
        for value in (obj.values() if isinstance(obj, dict) else obj):
            if isinstance(value, (dict, list)):
                for item in find_key(value, key):
                    yield item

>>> next(find_key(data, "icdat"))
'1922'
Since the same key may be found in multiple places in the document, this is actually written as a generator. You can iterate over the results to get all the values or, if you just want the first one (or know it's the only one), use next() around it as I've shown above. You could also convert it to a list() if desired.
