Reading a json file and encoding problems

I would like to parse a JSON file and print the source values from this fragment:
{
    "trailers": {
        "quicktime": [],
        "youtube": [
            {
                "source": "mmNhzU6ySL8",
                "type": "Trailer",
                "name": "Trailer 1",
                "size": "HD"
            },
            {
                "source": "CPTIgILtna8",
                "type": "Trailer",
                "name": "Trailer 2",
                "size": "Standard"
            }
        ],
        "id": 27205
    },
I wrote this code:
for item in j:
    if item['trailers']:
        e = item['trailers']
        for k,value in e.iteritems():
            if k == "youtube":
                for innerk, innerv in k.iteritems():
                    if innerk == "source" :
                        print innerv
Unfortunately, I can't resolve this error:
    for innerk, innerv in k.iteritems():
AttributeError: 'unicode' object has no attribute 'iteritems'

Assuming the JSON is formatted properly, the problem is that your code includes this check:
if k == "youtube":
    for innerk, innerv in k.iteritems():
Given that you just asked for k to be "youtube" (an instance of str or unicode), it won't make sense to expect k to have an iteritems method.
I believe instead you are expecting the associated dict that would have come along with k, something like this:
if k == "youtube":
    for innerk, innerv in value.iteritems():
I'm noticing from your JSON, though, that it looks like you should expect multiple dict variables to be loaded as the list-typed value for the case when k == "youtube". In that case, you'll need to iterate over those elements first, asking for each one's iteritems separately:
if k == "youtube":
    for each_dict in value:
        for innerk, innerv in each_dict.iteritems():
or something along those lines. The final full code would be:
for item in j:
    if item['trailers']:
        e = item['trailers']
        for k,value in e.iteritems():
            if k == "youtube":
                for each_dict in value:
                    for innerk, innerv in each_dict.iteritems():
                        if innerk == "source" :
                            print innerv
Aside from the first-order question, you should also take a look at the dict type's built-in method get, which lets you safely retrieve items from a dictionary and gracefully handle the case when they are missing. In your code, the check if item['trailers']: may not behave the way you expect.
First, if trailers is not a key in the dictionary, the lookup raises a KeyError instead of just skipping that conditional block. Second, if the value stored under trailers evaluates to False in a bool context, the conditional block is skipped even if you wanted to handle that case differently. For example, suppose None is a sentinel value signaling that there is no trailer data because of a specific error that you want to log, whereas an empty dict means you really should skip the block.
This may not matter much in one-off data exploration, but in general it's good to build the habit of avoiding these pitfalls, especially when the built-in types make it easy to handle things more gracefully.
Given all of this, a more Pythonic approach might be as follows:
for item in j:
    y_tube = item.get('trailers', {}).get("youtube", [])
    for each_dict in y_tube:
        print each_dict.get("source", "Warning: no entry found for 'source'")
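For readers on Python 3, where iteritems and the print statement no longer exist, the same idea can be sketched as follows. This is only an illustration: the sample string and the names raw, movies and sources are made up here, and it assumes the file holds a list of movie objects shaped like the fragment in the question.

```python
import json

# Hypothetical sample shaped like the fragment in the question,
# wrapped in a list because the original code loops over `j`.
raw = """
[
  {
    "trailers": {
      "quicktime": [],
      "youtube": [
        {"source": "mmNhzU6ySL8", "type": "Trailer", "name": "Trailer 1", "size": "HD"},
        {"source": "CPTIgILtna8", "type": "Trailer", "name": "Trailer 2", "size": "Standard"}
      ],
      "id": 27205
    }
  },
  {"title": "an entry with no trailers key"}
]
"""

movies = json.loads(raw)

sources = []
for item in movies:
    # .get() with a default avoids a KeyError when a key is missing
    for trailer in item.get("trailers", {}).get("youtube", []):
        if "source" in trailer:
            sources.append(trailer["source"])

print(sources)
```

The second entry has no trailers key at all, and the loop simply skips it instead of raising.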

Look at this line:
for k,value in e.iteritems():
So clearly, k is a key (a unicode string in your case). You clearly know this, with your comparison of if k == "youtube".
Unicode strings don't have the iteritems() method.
I have a feeling that what you're looking for is this:
for k,value in e.iteritems():
    for innerk,innerv in value.iteritems():
        # do stuff

Related

KeyError 0 when deleting JSON key with value

I am trying to write a script that deletes a fragment of a JSON file.
Currently I am stuck on deleting a key and its value.
I get KeyError: 0:
File "<stdin>", line 4, in <module>
KeyError: 0
I use the json module and Python 2.7.
My sample JSON file is this:
"1": {
    "aaa": "234235",
    "bbb": "sfd",
    "date": "01.01.2022",
    "ccc": "456",
    "ddd": "dghgdehs"
},
"2": {
    "aaa": "544634436",
    "bbb": "rgdfhfdsh",
    "date": "01.01.2022",
    "ccc": "etw",
    "ddd": "sgedsry"
}
And the faulty code is this:
import json
obj = json.load(open("aaa.json"))
for i in xrange(len(obj)):
    if obj[i]["date"] == "01.01.2022":
        obj.pop(i)
        break
What am I doing wrong here?

i will take on the integer values 0, 1, but your object is a dictionary with string keys "1", "2". So iterate over the keys instead, which is simply done like this:
for i in obj:
    if obj[i]["date"] == "01.01.2022":
        obj.pop(i)
        break

In your loop, xrange yields integers, the first being 0. There is no integer key in your JSON, so this immediately raises a KeyError.
Instead, loop over obj.items(), which yields key-value pairs. Since some of your entries are not dicts themselves, you will need to be careful when accessing the "date" value:
for k, v in obj.items():
    if isinstance(v, dict) and v.get("date") == "01.01.2022":
        obj.pop(k)
        break

The way you're reading it in, obj is a dict. You're trying to access it as a list, with integer indices. This code:
for i in range(len(obj)):
    if obj[i]["date"] == "Your Date":
        ...
first evaluates obj[0]["date"], then obj[1]["date"], and so on. Since obj is not a list, 0 is interpreted as a key, and since obj doesn't have a key 0, you get a KeyError.
A better way to do this would be to iterate through the dict by keys and values:
for k, v in list(obj.items()):  # copy the pairs so popping while looping is safe
    if v["date"] == "your date":  # index using the value
        obj.pop(k)  # delete the key
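Another option that also runs on Python 3, where removing keys while iterating directly over obj.items() raises a RuntimeError, is to build a new dict instead of popping. A minimal sketch, assuming the file's top level is a JSON object like the sample; the inline string stands in for aaa.json, and the date on entry "2" was changed here so that something survives the filter.

```python
import json

raw = """
{
  "1": {"aaa": "234235", "bbb": "sfd", "date": "01.01.2022", "ccc": "456", "ddd": "dghgdehs"},
  "2": {"aaa": "544634436", "bbb": "rgdfhfdsh", "date": "02.01.2022", "ccc": "etw", "ddd": "sgedsry"}
}
"""
obj = json.loads(raw)

# Keep every entry that is not a dict dated 01.01.2022.
kept = {
    key: value
    for key, value in obj.items()
    if not (isinstance(value, dict) and value.get("date") == "01.01.2022")
}

print(json.dumps(kept, indent=2))
```

Unlike the loops above, which stop at the first match, this drops every matching entry.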

Python3 - Parse list of strings inside nested json

Python noob here. I saw many similar questions, but none of them matches my exact use case. I have a simple nested JSON, and I'm trying to access the element name inside metadata. Below is my sample JSON.
{
    "items": [{
            "metadata": {
                "name": "myname1"
            }
        },
        {
            "metadata": {
                "name": "myname1"
            }
        }
    ]
}
Below is the code that I have tried so far, without success.
import json
f = open('./myfile.json')
x = f.read()
data = json.loads(x)
for i in data['items']:
    for j in i['metadata']:
        print (j['name'])
It fails with the error below:
File "pythonjson.py", line 8, in
    print (j['name'])
TypeError: string indices must be integers
When I ran print (type(j)), the output was <class 'str'>. So I can see that it is a list of strings and not a dictionary. How can I parse through a list of strings? Any official documentation or guide that explains the concept would be very helpful.

Your json is bad, and the python exception is clear and unambiguous. You have the basic string "name" and you are trying to ... do a lookup on that?
Let's cut out all the json and look at the real issue. You do not know how to iterate over a dict. You're actually iterating over the keys themselves. If you want to see their values too, you're going to need dict.items()
https://docs.python.org/3/tutorial/datastructures.html#looping-techniques
metadata = {"name": "myname1"}
for key, value in metadata.items():
    if key == "name":
        print ('the name is', value)
But why bother if you already know the key you want to look up?
This is literally why we have dict.
print ('the name is', metadata["name"])

You likely need:
import json
f = open('./myfile.json')
x = f.read()
data = json.loads(x)
for item in data['items']:
    print(item["metadata"]["name"])
Your original JSON is not valid (colons missing).

To access the contents of "metadata", use i["metadata"].keys(), which returns all keys in "metadata".
Working code to access all values of the dictionary in "metadata":
for i in data['items']:
    for j in i["metadata"].keys():
        print (i["metadata"][j])
Update: working code to access the contents of "name" only:
for i in data['items']:
    print (i["metadata"]["name"])
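To see why the original loop failed, it may help to watch what iterating over a dict actually yields. A small sketch using the sample data from the question (the variable names are illustrative only):

```python
import json

raw = '{"items": [{"metadata": {"name": "myname1"}}, {"metadata": {"name": "myname1"}}]}'
data = json.loads(raw)

first = data["items"][0]["metadata"]

# Iterating over a dict yields its keys, which are plain strings here.
keys_seen = [key for key in first]
print(keys_seen)

# Direct lookup is all that is needed when the key is known.
names = [item["metadata"]["name"] for item in data["items"]]
print(names)
```

In the question's inner loop, j is one of those key strings, so j['name'] tries to index a string with another string, which is what the TypeError reports.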

Remove entire JSON object if it contains a specified phrase (from a list in python)

I have JSON file output similar to:
{
    "object1": {
        "json_data": "{json data information}",
        "tables_data": "TABLES_DATA"
    },
    "object2": {
        "json_data": "{json data information}",
        "tables_data": ""
    }
}
Essentially, if there is an empty string for tables_data as shown in object2 (eg. "tables_data": ""), I want the entire object to be removed so that the output would look like:
{
    "object1": {
        "json_data": "{json data information}",
        "tables_data": "TABLES_DATA"
    }
}
What is the best way to go about doing this? Each of these objects corresponds to a separate index in a list called summary[] that I append to.

To achieve this, you could iterate through the JSON dictionary and test the tables_data values, adding the objectX elements to a new dictionary if their tables_data value passes the test:
new_dict = {k: v for k, v in json_dict.items()
            if v.get("tables_data", "") != ""}
If your JSON objectX is stored in a list as you say, these could be processed as follows using a list comprehension:
filtered_summary = [object_dict for object_dict in summary
                    if object_dict.get("tables_data", "") != ""]

Unless you have compelling reasons to do otherwise, the pythonic way to filter out a list or dict (or any iterable) is not to change it in place but to create a new filtered one. For your case this would look like
raw_data = YOUR_DICT_HERE_WHEREVER_IT_COMES_FROM
# NB: an empty string is falsy in a boolean test, so this keeps only non-empty values
cleaned_data = {k: v for k, v in raw_data.items() if v["tables_data"]}
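Putting the two answers together, an end-to-end sketch might look like this. The sample string is a made-up stand-in for the real file, and treating a missing tables_data key the same as an empty one is an assumption, not something the question specifies.

```python
import json

raw = """
{
  "object1": {"json_data": "{json data information}", "tables_data": "TABLES_DATA"},
  "object2": {"json_data": "{json data information}", "tables_data": ""}
}
"""
json_dict = json.loads(raw)

# Dict form: keep only objects whose tables_data is non-empty.
cleaned = {k: v for k, v in json_dict.items() if v.get("tables_data")}

# List form, for the case where the objects live in a list such as `summary`.
summary = list(json_dict.values())
filtered_summary = [entry for entry in summary if entry.get("tables_data")]

print(json.dumps(cleaned, indent=4))
```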

How to compare nested dictionary values from the same dict key using Python

I have two dictionaries, Content_11 and Content_05. Each contains a checksum for every file, and I need to compare them: if the checksums match, print something like success, otherwise failure for that filename. My data structure is shown first, followed by my code snippet.
Content_11 = {
    "controls": {
        "windows-library-1234.zip": "A123455adfasfasdfasdf",  # SHA 256 checksum
        "unix-library-1234.zip": "a2343dfasdfasdfasdfasdfasdfasdf"
    },
    "policies": {
        "oracle-1234.zip": "A123455adfasfasdfasdfad",
        "rhel7-1234.zip": "sdaf23234234234asdf",
    }
}
Content_05 = {
    "controls": {
        "windows-library-1234.zip": "A123455adfasfasdfasdf",
        "unix-library-1234.zip": "a2343dfasdfasdfasdfasdfasdfasdf"
    },
    "policies": {
        "oracle-1234.zip": "A123455adfasfasdfasdfad",
        "rhel7-1234.zip": "sdaf23234234234asdf",
    }
}
I went through some of the questions on Stack Overflow and didn't find one relevant to me. Any suggestions or improvements are appreciated.
for key_05, value_05 in Content_05.items():  # iterating inside content_05 dict
    for key_05_1, value_05_1 in value_05.items():  # iterating inside the content_05 value for getting nested dict
        for key_011, value_011 in Content_11.items():  # iterating inside content_11 dict for comparison
            for key_11_1, value_11_1 in value_011.items():
                if key_05 == key_011:
                    if value_05_1 == value_11_1:
                        print "Key {} and its value is {} is matching with {} and hence Success".format(key_05_1,
                                                                                                        value_05_1,
                                                                                                        value_11_1)
                    else:
                        print "Key {} and its value is {} is not matching with {} and hence FAILURE".format(key_05_1,
                                                                                                            value_05_1,
                                                                                                            value_11_1)

You are doing way too much work; there is no need to loop over both dictionaries, as you can just test if keys from one of the dictionaries are available in the other.
You could use dict.get() to return a default value and simplify testing further; by default dict.get() returns None for a missing value, just use that in the comparison:
for type_, checksums in Content_05.items():
    # type_ is assumed to be present in the other dictionary
    other_checksums = Content_11[type_]
    for filename, hash in checksums.items():
        other_hash = other_checksums.get(filename)
        if hash != other_hash:
            print("Failure: Checksum mismatch for {}, expected {}, got {}".format(
                filename, hash, other_hash))
        else:
            print("Success: {} checksum matched".format(filename))
I tried to use more legible variable names too; filename, checksums and hashes is a lot more comprehensible than key_05_1, etc.
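If the results need to be collected rather than printed, or if one release might lack a whole category such as "policies", the same idea can be wrapped in a function. This is a hedged sketch: compare_checksums and its return shape are invented for illustration, and the checksums are shortened placeholders.

```python
def compare_checksums(expected, actual):
    """Return {category: {filename: (expected_hash, actual_hash)}} for mismatches."""
    mismatches = {}
    for category, checksums in expected.items():
        # A missing category is treated as an empty dict, so every file in it mismatches.
        other = actual.get(category, {})
        for filename, checksum in checksums.items():
            other_checksum = other.get(filename)  # None when the file is absent
            if checksum != other_checksum:
                mismatches.setdefault(category, {})[filename] = (checksum, other_checksum)
    return mismatches


content_05 = {
    "controls": {"windows-library-1234.zip": "A1234", "unix-library-1234.zip": "a2343"},
    "policies": {"oracle-1234.zip": "A1234ad", "rhel7-1234.zip": "sdaf2"},
}
content_11 = {
    "controls": {"windows-library-1234.zip": "A1234", "unix-library-1234.zip": "CHANGED"},
    "policies": {"oracle-1234.zip": "A1234ad"},
}

result = compare_checksums(content_05, content_11)
print(result)
```

An empty result means every file in the first dictionary has a matching checksum in the second. Files that exist only in the second dictionary are not reported; swapping the arguments would catch those.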

Python script to convert complicated flattened data to JSON

Sorry about the vague title, I need some help with Python magic and couldn't think of anything more descriptive.
I have a fixed JSON data structure that I need to convert a CSV file to. The structure is fixed, but deeply nested with lists and such. It's similar to this but more complicated:
{
    "foo" : bar,
    "baz" : qux,
    "nub" : [
        {
            "bub": "gob",
            "nab": [
                {
                    "nip": "jus",
                    "the": "tip",
                },
                ...
            ],
        },
        ...
    ],
    "cok": "hed"
}
Hopefully you get the idea. Lists on dicts on lists on lists and so forth. My csv for that might look like this:
foo, baz, nub.bub, nub.nab.nip, nub.nab.the, cok
bar, qux, "gob" ,,,, "hed"
,,,,, "nab", "jus","tip",,
,,,,, "nab", "other", "values",,
Sorry if this is hard to read, but the basic idea is if there's a listed item it will be in the next row, and values are repeated to denote what sub-lists belong to what.
I'm not looking for anyone to come up with a solution to this mess, just maybe some pointers on techniques or things to look into.
Right now I have a rough plan:
I start by turning the header into a list of tuples containing the keys. For each group of rows (item) I'll create a copy of my template dict. I have a function that will set a dict value from a tuple of keys, unless it finds a list. In this case I'm going to call a funky recursive function and pass it my iterator, and continue filling up the dict in that function, and making recursive calls as I find new lists.
I could also do a lot of hardcoding, but what's the fun in that?
So that's my story. Again, just looking for some pointers on what the best way to do this might be. I wrote this quickly so it might be kinda confusing, please let me know if any more info would help. Thanks!
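The helper described in the plan, one that sets a value in a nested dict from a tuple of keys, could look roughly like the sketch below. It only covers the dict-on-dict case; the list handling the plan mentions is deliberately left out, so "nub" ends up as a plain dict here rather than a list. set_path is a hypothetical name, and the header and row are simplified from the CSV above.

```python
def set_path(target, keys, value):
    """Set target[k0][k1]...[kn] = value, creating intermediate dicts as needed."""
    node = target
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


# Header cells such as "nub.bub" become tuples of keys.
header = ["foo", "baz", "nub.bub", "cok"]
paths = [tuple(cell.split(".")) for cell in header]

row = ["bar", "qux", "gob", "hed"]
record = {}
for path, cell in zip(paths, row):
    set_path(record, path, cell)

print(record)
```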

Your JSON is malformed. Additionally, your json must not contain arrays in order to achieve what you want.
import json

def _tocsv(obj, base=''):
    flat_dict = {}
    for k in obj:
        value = obj[k]
        if isinstance(value, dict):
            # recurse, prefixing nested keys with the parent key and a dot
            flat_dict.update(_tocsv(value, base + k + '.'))
        elif isinstance(value, (int, long, str, unicode, float, bool)):  # Python 2 scalar types
            flat_dict[base + k] = value
        else:
            raise ValueError("Can't serialize value of type " + type(value).__name__)
    return flat_dict

def tocsv(json_content):
    value = json.loads(json_content)
    if isinstance(value, dict):
        return _tocsv(value)
    else:
        raise ValueError("JSON root object must be a hash")
This will let you flatten something like:
{
    "foo": "nestor",
    "bar": "kirchner",
    "baz": {
        "clorch": 1,
        "narf": 2,
        "peep": {
            "ooo": "you suck"
        }
    }
}
into something like:
{"foo": "nestor", "bar": "kirchner", "baz.clorch": 1, "baz.narf": 2, "baz.peep.ooo": "you suck"}
The keys don't preserve any specific order. You can replace flat_dict = {} with the construction of an OrderedDict if you want to preserve order.
assuming you have an array of such flat dicts:
def tocsv_many(json_content):
    # assumes json is imported and _tocsv is defined as above
    value = json.loads(json_content)
    result = []
    if isinstance(value, list):
        for element in value:
            if isinstance(element, dict):
                result.append(_tocsv(element))
            else:
                raise ValueError("root children must be dicts")
    else:
        raise ValueError("The JSON root must be a list")
    return result

flat_dicts = tocsv_many(yourJsonInput)
you could:
create a csvlines = [] list which will hold the CSV lines for your file.
create a keysSet = set() which will hold the possible keys.
for each dict, add its .keys() to the set. No key order is guaranteed with a normal set, so sort the keys once they are all collected. That gives the first CSV line (the header):
for flat_dict in flat_dicts:
    keysSet.update(flat_dict.keys())
keysSet = sorted(keysSet)
csvlines.append(",".join(keysSet))
Then, for each dict (iterating again), generate a line like this:
for flat_dict in flat_dicts:
    csvline = ",".join([json.dumps(flat_dict.get(keyInSet, '')) for keyInSet in keysSet])
    csvlines.append(csvline)
Voilà! You have your lines in csvlines.
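As an alternative to joining the fields by hand, the standard library's csv module can take care of quoting and of missing keys. The following is a sketch for Python 3, not the answer's original code: flatten is a re-implementation of _tocsv without the Python 2-only long and unicode types, and the sample input is invented.

```python
import csv
import io
import json


def flatten(obj, base=""):
    """Flatten nested dicts into one dict with dotted keys (lists are rejected)."""
    flat = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            flat.update(flatten(value, base + key + "."))
        elif isinstance(value, (int, float, str, bool)) or value is None:
            flat[base + key] = value
        else:
            raise ValueError("Can't serialize value of type " + type(value).__name__)
    return flat


raw = '[{"foo": "nestor", "baz": {"clorch": 1}}, {"foo": "kirchner", "baz": {"narf": 2}}]'
flat_dicts = [flatten(element) for element in json.loads(raw)]

# Sorted union of all keys gives a stable header row.
fieldnames = sorted(set().union(*(d.keys() for d in flat_dicts)))

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(flat_dicts)

csv_text = buffer.getvalue()
print(csv_text)
```

Here restval fills in the cells for keys a given row does not have, which plays the role of flat_dict.get(keyInSet, '') in the hand-rolled version.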
