So I have the following dict:
my_dict = {'key1': 'value1',
           'key2': 'value2',
           'key3': json.dumps([
               {"subkey1": "subvalue1", "subkey2": "subvalue2"},
               {"subkey1": "other_subvalue", "subkey2": "other_subvalue2"}])
}
What I need is to somehow make a function where, for each subkey2, I check and change its value (only within that function).
I also need to check, for each subkey1, whether its value is the same as the second subkey1's value.
Please note I am talking only about subkey1, which I have twice.
I don't want to set them manually. I mean, I have this dict as a global and call it from many functions, so I need to make these changes and checks inside each function.
What I tried is:
def recurse_keys(my_dict, indent=''):
    for key in my_dict:
        print(indent + str(key))
        if isinstance(my_dict[key], dict):
            recurse_keys(my_dict[key], indent + '    ')

recurse_keys(my_dict)
For now it only prints all of my params, but I am not sure how to proceed.
Example:
my_dict = {'name': 'georgi',
           'famili': 'ivanov',
           'drinks': json.dumps([
               {"breakfast": "milk", "lunch": "beer"},
               {"breakfast": "tea", "lunch": "vodka"}])
}
def test():
    # ...check if both breakfast values are the same and, if not, make them so...
    # (the dict and the function itself are in the same file)
So I need to check whether the values of the two breakfast keys are the same (without knowing them in advance) and, if they are not, make them so.
I also need to check whether there is a lunch with an empty value or 0 and, again, if not, make it so.
If you want to edit a JSON string, the easiest way is probably to decode it to Python data types with d = json.loads(s), edit it, then encode it back to a string with s = json.dumps(d) (Python json module).
import json

my_dict = {'name': 'georgi',
           'famili': 'ivanov',
           'drinks': json.dumps([
               {"breakfast": "milk", "lunch": "beer"},
               {"breakfast": "tea", "lunch": "vodka"}])}

ddict = json.loads(my_dict["drinks"])  # JSON str to Python data types
seen = {}  # store the items already seen

# for each dictionary object in the decoded list
for d in range(len(ddict)):
    for k in ddict[d]:
        if k in seen:
            # update the value to the one already seen
            ddict[d][k] = seen[k]
            if k == "lunch" and (ddict[d][k] == "" or ddict[d][k] is None):
                ddict[d][k] = alternative_lunch_value  # placeholder: define your own default
        else:
            seen[k] = ddict[d][k]

my_dict["drinks"] = json.dumps(ddict)
print(my_dict)
The result on my machine is:
{'drinks': '[{"breakfast": "milk", "lunch": "beer"}, {"breakfast": "milk", "lunch": "beer"}]',
'famili': 'ivanov',
'name': 'georgi'}
Updating dict values
The code above updates the values in my_dict in place so they can be read by other modules, rather than just reading them. If all you wanted to do was read the values, you could simply iterate over the list ddict as follows:
for value in ddict:
    print("Sub1:{0} Sub2:{1}\n".format(value["subkey1"], value["subkey2"]))
However, since you want to update the values in the existing list, you will need to iterate over the list indexes, as shown below.
range() and len()
range(start, end) yields the values from start up to, but not including, end. So a = list(range(1, 4)) gives [1, 2, 3], and len(a) returns the number of items in the list, 3 in this case. Using these, you can iterate through ddict by index:
for d in range(len(ddict)):
    ddict[d]["subkey1"] = new_value
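As an aside, enumerate is often a more idiomatic way to walk a list when you need both the index and the item; a small sketch of my own, using the example data from the question:

```python
# enumerate yields (index, item) pairs, so we can read each entry
# and still refer back into the list by index when needed.
ddict = [{"breakfast": "milk", "lunch": "beer"},
         {"breakfast": "tea", "lunch": "vodka"}]

for i, entry in enumerate(ddict):
    # force every breakfast to match the first one
    entry["breakfast"] = ddict[0]["breakfast"]

print(ddict)
```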
Hope this helps get you started. If you update your question with more details on exactly what you want (i.e. example input and output, perhaps pseudocode), then we will be able to give you a better answer.
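Since the question asks for a def that can be called from many places, one possible shape is to wrap the whole load-edit-dump cycle into a reusable helper. This is a sketch under my own naming assumptions (sync_drinks, sync_key, clear_key are not from the original post):

```python
import json

def sync_drinks(d, sync_key="breakfast", clear_key="lunch", clear_value=""):
    """Decode d['drinks'], copy the first sync_key value into every entry,
    set every clear_key value to clear_value, then write the result back."""
    entries = json.loads(d["drinks"])
    if entries:
        first = entries[0].get(sync_key)
        for entry in entries:
            entry[sync_key] = first          # make all breakfasts match the first
            entry[clear_key] = clear_value   # set every lunch to the "empty" value
    d["drinks"] = json.dumps(entries)

my_dict = {'name': 'georgi',
           'famili': 'ivanov',
           'drinks': json.dumps([
               {"breakfast": "milk", "lunch": "beer"},
               {"breakfast": "tea", "lunch": "vodka"}])}

sync_drinks(my_dict)
print(my_dict["drinks"])
# [{"breakfast": "milk", "lunch": ""}, {"breakfast": "milk", "lunch": ""}]
```

Because the helper takes the dict as an argument and mutates it in place, each def that uses the global dict can simply call sync_drinks(my_dict) first.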
If I have a dictionary that is nested, and I pass in a string like "key1.key2.key3" which would translate to:
myDict["key1"]["key2"]["key3"]
What would be an elegant way to be able to have a method where I could pass on that string and it would translate to that key assignment? Something like
myDict.set_nested('key1.key2.key3', someValue)
Using only builtin stuff:
def set_nested(my_dict, key_string, value):
    """Given `foo`, 'key1.key2.key3', 'something', set foo['key1']['key2']['key3'] = 'something'."""
    # Note: named set_nested rather than set, to avoid shadowing the builtin.
    # Start off pointing at the original dictionary that was passed in.
    here = my_dict
    # Turn the string of key names into a list of strings.
    keys = key_string.split(".")
    # For every key *before* the last one, we concentrate on navigating through the dictionary.
    for key in keys[:-1]:
        # Try to find here[key]. If it doesn't exist, create it with an empty dictionary. Then,
        # update our `here` pointer to refer to the thing we just found (or created).
        here = here.setdefault(key, {})
    # Finally, set the final key to the given value.
    here[keys[-1]] = value

myDict = {}
set_nested(myDict, "key1.key2.key3", "some_value")
assert myDict == {"key1": {"key2": {"key3": "some_value"}}}
This traverses myDict one key at a time, ensuring that each sub-key refers to a nested dictionary.
You could also solve this recursively, but then you risk RecursionError exceptions without any real benefit.
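For symmetry, a matching getter can be written the same way (my own sketch, using the same dotted-path convention):

```python
def get_nested(my_dict, key_string, default=None):
    """Given foo and 'key1.key2.key3', return foo['key1']['key2']['key3'],
    or default if any key along the path is missing."""
    here = my_dict
    for key in key_string.split("."):
        # Stop early if the path runs off the edge of the nesting.
        if not isinstance(here, dict) or key not in here:
            return default
        here = here[key]
    return here

d = {"key1": {"key2": {"key3": "some_value"}}}
print(get_nested(d, "key1.key2.key3"))        # some_value
print(get_nested(d, "key1.missing", "n/a"))   # n/a
```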
There are a number of existing modules that will already do this, or something very much like it. For example, the jmespath module will resolve jmespath expressions, so given:
>>> mydict={'key1': {'key2': {'key3': 'value'}}}
You can run:
>>> import jmespath
>>> jmespath.search('key1.key2.key3', mydict)
'value'
The jsonpointer module does something similar, although it uses / as a separator instead of a dot.
Given the number of pre-existing modules I would avoid trying to write your own code to do this.
EDIT: OP's clarification makes it clear that this answer isn't what he's looking for. I'm leaving it up here for people who find it by title.
I implemented a class that did this a while back... it should serve your purposes.
I achieved this by overriding the default getattr/setattr functions for an object.
Check it out! AndroxxTraxxon/cfgutils
This lets you do some code like the following...
from cfgutils import obj
a = obj({
    "b": 123,
    "c": "apple",
    "d": {
        "e": "nested dictionary value"
    }
})
print(a.d.e)
# Output: nested dictionary value
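If you'd rather avoid a third-party dependency, the core idea can be sketched with the stdlib alone (my own minimal illustration, not the cfgutils implementation):

```python
class AttrDict(dict):
    """A dict whose string keys are also readable as attributes.

    Nested plain dicts are wrapped on access so chains like a.d.e work."""
    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails,
        # so real dict methods like .items() keep working.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name)
        return AttrDict(value) if isinstance(value, dict) else value

a = AttrDict({
    "b": 123,
    "c": "apple",
    "d": {"e": "nested dictionary value"},
})
print(a.d.e)  # nested dictionary value
```

One caveat of this sketch: a key that collides with a real dict attribute (e.g. "items") is still reachable only via a["items"].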
I need to parse a JSON file which, unfortunately for me, does not follow the prototype. I have two issues with the data, but I've already found a workaround for the second one, so I'll just mention it at the end; maybe someone can help there as well.
So I need to parse entries like this:
"Test": {
    "entry": {
        "Type": "Something"
    },
    "entry": {
        "Type": "Something_Else"
    }
}, ...
The default JSON parser updates the dictionary and therefore keeps only the last entry. I HAVE to somehow store the other one as well, and I have no idea how to do this. I also HAVE to store the keys in the several dictionaries in the same order they appear in the file; that's why I am using an OrderedDict. It works fine, so if there is any way to extend this to handle the duplicate entries, I'd be grateful.
My second issue is that this very same JSON file contains entries like this:
"Test": {
    {
        "Type": "Something"
    }
}
The json.load() function raises an exception when it reaches that line in the JSON file. The only way I could work around this was to remove the inner brackets manually.
Thanks in advance
You can use JSONDecoder.object_pairs_hook to customize how JSONDecoder decodes objects. This hook function will be passed a list of (key, value) pairs that you usually do some processing on, and then turn into a dict.
However, since Python dictionaries don't allow for duplicate keys (and you simply can't change that), you can return the pairs unchanged in the hook and get a nested list of (key, value) pairs when you decode your JSON:
from json import JSONDecoder
def parse_object_pairs(pairs):
    return pairs
data = """
{"foo": {"baz": 42}, "foo": 7}
"""
decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)
Output:
[('foo', [('baz', 42)]), ('foo', 7)]
How you use this data structure is up to you. As stated above, Python dictionaries won't allow for duplicate keys, and there's no way around that. How would you even do a lookup based on a key? dct[key] would be ambiguous.
So you can either implement your own logic to handle a lookup the way you expect it to work, or implement some sort of collision avoidance to make keys unique if they're not, and then create a dictionary from your nested list.
Edit: Since you said you would like to modify the duplicate key to make it unique, here's how you'd do that:
from collections import OrderedDict
from json import JSONDecoder
def make_unique(key, dct):
    counter = 0
    unique_key = key
    while unique_key in dct:
        counter += 1
        unique_key = '{}_{}'.format(key, counter)
    return unique_key

def parse_object_pairs(pairs):
    dct = OrderedDict()
    for key, value in pairs:
        if key in dct:
            key = make_unique(key, dct)
        dct[key] = value
    return dct
data = """
{"foo": {"baz": 42, "baz": 77}, "foo": 7, "foo": 23}
"""
decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)
Output:
OrderedDict([('foo', OrderedDict([('baz', 42), ('baz_1', 77)])), ('foo_1', 7), ('foo_2', 23)])
The make_unique function is responsible for returning a collision-free key. In this example it just suffixes the key with _n where n is an incremental counter - just adapt it to your needs.
Because the object_pairs_hook receives the pairs exactly in the order they appear in the JSON document, it's also possible to preserve that order by using an OrderedDict, I included that as well.
Thanks a lot @Lukas Graf, I got it working as well by implementing my own version of the hook function:
import collections

def dict_raise_on_duplicates(ordered_pairs):
    count = 0
    d = collections.OrderedDict()
    for k, v in ordered_pairs:
        if k in d:
            d[k + '_dupl_' + str(count)] = v
            count += 1
        else:
            d[k] = v
    return d
The only thing remaining is to automatically get rid of the double brackets, and I am done :D Thanks again
If you would prefer to convert those duplicated keys into an array, instead of having separate copies, this could do the work:
def dict_raise_on_duplicates(ordered_pairs):
    """Convert duplicate keys to JSON array."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            if isinstance(d[k], list):
                d[k].append(v)
            else:
                d[k] = [d[k], v]
        else:
            d[k] = v
    return d
And then you just use:
data = json.loads(yourString, object_pairs_hook=dict_raise_on_duplicates)  # avoid the name dict, which shadows the builtin
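To see the hook in action on input like the question's, here is a self-contained demo (the hook body is repeated so the snippet runs on its own):

```python
import json

def duplicates_to_array(ordered_pairs):
    """object_pairs_hook that folds duplicate keys into a list."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            if isinstance(d[k], list):
                d[k].append(v)
            else:
                d[k] = [d[k], v]
        else:
            d[k] = v
    return d

data = '{"entry": {"Type": "Something"}, "entry": {"Type": "Something_Else"}}'
result = json.loads(data, object_pairs_hook=duplicates_to_array)
print(result)  # {'entry': [{'Type': 'Something'}, {'Type': 'Something_Else'}]}
```

Note the hook runs on the inner objects first, then on the outer one, so nested duplicates are handled at every level.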
I have a technical dictionary that I am using to correct various spellings of technical terms.
How can I use this structure (or restructure the below code to work) in order to return the key for any alternate spelling?
For example, if someone has written "craniem" I wish to return "cranium". I've tried a number of different constructions, including the one below, and cannot quite get it to work.
def techDict():
    myDict = {
        'cranium' : ['cranum','crenium','creniam','craniem'],
        'coccyx' : ['coscyx','cossyx','koccyx','kosicks'],
        '1814A' : ['Aero1814','A1814','1814'],
        'SodaAsh' : ['sodaash','na2co3', 'soda', 'washingsoda','sodacrystals']
    }
    return myDict

techDict = techDict()
correctedSpelling = next(val for key, val in techDict.iteritems() if val=='1814')
print(correctedSpelling)
Using in instead of == will do the trick:
next(k for k, v in techDict.items() if 'craniem' in v)
Just reverse and flatten your dictionary:
tech_dict = {
'cranium': ['cranum', 'crenium', 'creniam', 'craniem'],
'coccyx': ['coscyx', 'cossyx', 'koccyx', 'kosicks'],
'1814A': ['Aero1814', 'A1814', '1814'],
'SodaAsh': ['sodaash', 'na2co3', 'soda', 'washingsoda', 'sodacrystals'],
}
lookup = {val: key for key, vals in tech_dict.items() for val in vals}
# ^ note dict.iteritems doesn't exist in 3.x
Then you can trivially get:
corrected_spelling = lookup['1814']
This is far more efficient than potentially scanning through every list for every key in the dictionary to find your search term.
Also note: 1. compliance with the official style guide (PEP 8); and 2. that I've removed the techDict function entirely - it was pointless to write a function just to create a dictionary, especially as you immediately shadowed the function with the dictionary it returned, so you couldn't even call it again.
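If a misspelling isn't in the table at all, one possible extension (my own suggestion, stdlib only) is to fall back on difflib.get_close_matches for a fuzzy match against the known variants:

```python
import difflib

tech_dict = {
    'cranium': ['cranum', 'crenium', 'creniam', 'craniem'],
    'coccyx': ['coscyx', 'cossyx', 'koccyx', 'kosicks'],
}
# reversed, flattened lookup: misspelling -> canonical term
lookup = {val: key for key, vals in tech_dict.items() for val in vals}

def correct(word):
    """Exact lookup first; otherwise fuzzy-match against known misspellings."""
    if word in lookup:
        return lookup[word]
    close = difflib.get_close_matches(word, list(lookup), n=1)
    return lookup[close[0]] if close else word  # give up: return input unchanged

print(correct('craniem'))  # cranium
```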
Sorry about the vague title, I need some help with Python magic and couldn't think of anything more descriptive.
I have a fixed JSON data structure that I need to convert a CSV file to. The structure is fixed, but deeply nested with lists and such. It's similar to this but more complicated:
{
    "foo": bar,
    "baz": qux,
    "nub": [
        {
            "bub": "gob",
            "nab": [
                {
                    "nip": "jus",
                    "the": "tip"
                },
                ...
            ]
        },
        ...
    ],
    "cok": "hed"
}
Hopefully you get the idea. Lists on dicts on lists on lists and so forth. My csv for that might look like this:
foo, baz, nub.bub, nub.nab.nip, nub.nab.the, cok
bar, qux, "gob" ,,,, "hed"
,,,,, "nab", "jus","tip",,
,,,,, "nab", "other", "values",,
Sorry if this is hard to read, but the basic idea is if there's a listed item it will be in the next row, and values are repeated to denote what sub-lists belong to what.
I'm not looking for anyone to come up with a solution to this mess, just maybe some pointers on techniques or things to look into.
Right now I have a rough plan:
I start by turning the header into a list of tuples containing the keys. For each group of rows (item) I'll create a copy of my template dict. I have a function that will set a dict value from a tuple of keys, unless it finds a list. In this case I'm going to call a funky recursive function and pass it my iterator, and continue filling up the dict in that function, and making recursive calls as I find new lists.
I could also do a lot of hardcoding, but what's the fun in that?
So that's my story. Again, just looking for some pointers on what the best way to do this might be. I wrote this quickly so it might be kinda confusing, please let me know if any more info would help. Thanks!
Your JSON is malformed. Additionally, your JSON must not contain arrays in order for this to work:
def _tocsv(obj, base=''):
    flat_dict = {}
    for k in obj:
        value = obj[k]
        if isinstance(value, dict):
            flat_dict.update(_tocsv(value, base + k + '.'))
        elif isinstance(value, (int, float, str, bool)):
            flat_dict[base + k] = value
        else:
            raise ValueError("Can't serialize value of type " + type(value).__name__)
    return flat_dict

def tocsv(json_content):
    # assumes you imported json
    value = json.loads(json_content)
    if isinstance(value, dict):
        return _tocsv(value)
    else:
        raise ValueError("JSON root object must be a hash")
will let you flatten something like:
{
    foo: "nestor",
    bar: "kirchner",
    baz: {
        clorch: 1,
        narf: 2,
        peep: {
            ooo: "you suck"
        }
    }
}
into something like:
{"foo": "nestor", "bar": "kirchner", "baz.clorch": 1, "baz.narf": 2, "baz.peep.ooo": "you suck"}
The keys don't preserve any specific order. You can replace flat_dict = {} with an OrderedDict if you want to preserve order.
assuming you have an array of such flat dicts:
def tocsv_many(json_content):
    # assumes you imported json
    value = json.loads(json_content)
    result = []
    if isinstance(value, list):
        for element in value:
            if isinstance(element, dict):
                result.append(_tocsv(element))
            else:
                raise ValueError("root children must be dicts")
    else:
        raise ValueError("The JSON root must be a list")
    return result

flat_dicts = tocsv_many(yourJsonInput)
You could:
create a csvlines = [] list which will hold the CSV lines for your file;
create a keysSet = set() which will hold the possible keys.
For each dict you have, add its .keys() to the set with update(). A plain set guarantees no particular order, so sort the keys when you emit them. That gives the first CSV line:
for flat_dict in flat_dicts:
    keysSet.update(flat_dict.keys())
csvlines.append(",".join(sorted(keysSet)))
Then, for each dict (iterating again), you generate a CSV line like this:
for flat_dict in flat_dicts:
    csvline = ",".join([json.dumps(flat_dict.get(key, '')) for key in sorted(keysSet)])
    csvlines.append(csvline)
Voilà! You have your lines in csvlines.
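Putting the pieces together, here is one end-to-end sketch (structure and names are my own, not from the answer above) that flattens a list of nested dicts into dotted keys and writes them with the stdlib csv module, which also takes care of quoting:

```python
import csv
import io
import json

def flatten(obj, base=''):
    """Flatten nested dicts into dotted keys, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    for k, value in obj.items():
        if isinstance(value, dict):
            flat.update(flatten(value, base + k + '.'))
        else:
            flat[base + k] = value
    return flat

def to_csv(json_content):
    rows = [flatten(element) for element in json.loads(json_content)]
    # union of all keys across rows, sorted for a stable column order
    fieldnames = sorted({key for row in rows for key in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

print(to_csv('[{"foo": "nestor", "baz": {"clorch": 1}}, {"foo": "cristina"}]'))
```

Note this sketch, like the answer above, punts on arrays inside the JSON; handling them would need the repeated-row scheme the question describes.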