Split string list into dictionary keys in Python

I have a string 'request.context.user_id' and I want to split the string by '.' and use each element in the list as a dictionary key. Is there a way to do this for lists of varying lengths without trying to hard code all the different possible list lengths after the split?
parts = string.split('.')
if len(parts) == 1:
    data = [x for x in logData if x[parts[0]] in listX]
elif len(parts) == 2:
    data = [x for x in logData if x[parts[0]][parts[1]] in listX]
else:
    print("Add more hard code")
listX is a list of string values that should be matched by x[parts[0]][parts[1]].
logData is a list of dicts obtained by reading a JSON file; the list can then be loaded into a DataFrame using json_normalize. The DataFrame part is only there to give some context about the structure:
import json
from pandas.io.json import json_normalize  # deprecated since pandas 1.0; use pandas.json_normalize instead

with open(project_root + "filename") as f:
    logData = json.load(f)
df = json_normalize(logData)

If you want arbitrary depths, you need a loop. You can look up each key in turn (with [] or get()) to drill down through the layers of dictionaries.
parts = "request.context.user_id".split(".")
logData = [{"request": {"context": {"user_id": "jim"}}}]
listX = ["jim"]  # a list, so `in` tests membership rather than substrings

def generate(logData, parts):
    for x in logData:
        ref = x
        # ref will be, successively, x, then the 'request' dictionary, then the
        # 'context' dictionary, then the 'user_id' value 'jim'.
        for key in parts:
            ref = ref[key]
        if ref in listX:
            yield x

data = list(generate(logData, parts))  # [{'request': {'context': {'user_id': 'jim'}}}]
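The same inner loop can be folded into a single expression with functools.reduce and operator.getitem. A minimal sketch; get_path is a hypothetical helper name, and the second sample record is made up for illustration:

```python
from functools import reduce
import operator

def get_path(d, dotted_key):
    # Hypothetical helper: applies d[key] once per '.'-separated part,
    # so "a.b.c" becomes d["a"]["b"]["c"]; raises KeyError if a key is missing.
    return reduce(operator.getitem, dotted_key.split("."), d)

logData = [
    {"request": {"context": {"user_id": "jim"}}},
    {"request": {"context": {"user_id": "bob"}}},  # made-up extra record
]
listX = ["jim"]

data = [x for x in logData if get_path(x, "request.context.user_id") in listX]
print(data)  # [{'request': {'context': {'user_id': 'jim'}}}]
```

It behaves like the generator above: any record that lacks one of the keys raises KeyError rather than being skipped.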

I just realized from the comments that you don't want to create a new dictionary, but to access an existing one, x, by chaining the parts in the list.
(3.b) Use a for loop to get/set the value at the end of the key path
In case you only want to read the value at the end of the path:
import copy

def get_val(key_list, dict_):
    reduced = copy.deepcopy(dict_)
    for key in key_list:
        reduced = reduced[key]
    return reduced

# this solution isn't mine, see the credit below
def set_val(dict_, key_list, value_):
    for key in key_list[:-1]:
        dict_ = dict_.setdefault(key, {})
    dict_[key_list[-1]] = value_
get_val()
Here key_list is the result of string.split('.') and dict_ is the x dictionary in your case.
You can leave out the copy.deepcopy() part; that's just for paranoid peeps like me. The reason is that a Python dict is mutable, so working on a deep copy (a separate but exact copy in memory) guarantees the original can't be changed by accident. Since get_val() only reads, the copy isn't strictly needed, and it costs a full copy on every call.
set_val(): as I said, it's not my idea; credit to @Bakuriu.
dict.setdefault(key, default_value) will take care of non-existing keys in x.
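A minimal sketch, assuming you would rather get a default than a KeyError when part of the path is missing; get_or_default is a hypothetical name, and set_val is repeated so the block runs on its own:

```python
def get_or_default(dict_, key_list, default=None):
    # Hypothetical helper: walks the path like get_val(), but returns
    # `default` as soon as a level is missing or is not a dict.
    ref = dict_
    for key in key_list:
        if not isinstance(ref, dict) or key not in ref:
            return default
        ref = ref[key]
    return ref

def set_val(dict_, key_list, value_):
    # setdefault() creates each missing intermediate level as an empty dict
    for key in key_list[:-1]:
        dict_ = dict_.setdefault(key, {})
    dict_[key_list[-1]] = value_

x = {}
keys = 'request.context.user_id'.split('.')
set_val(x, keys, 'jim')
print(x)                                                 # {'request': {'context': {'user_id': 'jim'}}}
print(get_or_default(x, keys))                           # jim
print(get_or_default(x, ['request', 'missing'], 'n/a'))  # n/a
```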
(3) evaluating a string as code with eval() and/or exec()
So here's an ugly unsafe solution:
def chainer(key_list):
    new_str = ''
    for key in key_list:
        new_str = "{}['{}']".format(new_str, key)
    return new_str
x = {'request': {'context': {'user_id': 'is this what you are looking for?'}}}
keys = 'request.context.user_id'.split('.')
chained_keys = chainer(keys)
# quite dirty but you may use eval() to evaluate a string
print( eval("x{}".format(chained_keys)) )
# will print
is this what you are looking for?
which is the innermost value of the mockup x dict
I assume you could use this in your code like this
data = [x for x in logData if eval("x{}".format(chained_keys)) in listX]
# or in Python 3.6+ with an f-string
data = [x for x in logData if eval(f"x{chained_keys}") in listX]
...or something similar.
Similarly, you can use exec() to execute a string as code if you wanted to write to x, though it's just as dirty and unsafe.
exec("x{} = '...or this, maybe?'".format(chained_keys))
print(x)
# will print
{'request': {'context': {'user_id': '...or this, maybe?'}}}
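One concrete way the eval() approach breaks: chainer() pastes each key between single quotes, so a key that itself contains a quote produces invalid source code. A small sketch with a made-up key:

```python
def chainer(key_list):
    new_str = ''
    for key in key_list:
        new_str = "{}['{}']".format(new_str, key)
    return new_str

x = {"it's": {"id": 1}}  # made-up data with a quote in a key
keys = ["it's", "id"]

# The generated source is x['it's']['id'], which is not valid Python.
try:
    eval("x{}".format(chainer(keys)))
    eval_failed = False
except SyntaxError:
    eval_failed = True
print(eval_failed)  # True

# A plain loop treats the keys as data, so quotes in them are harmless.
ref = x
for key in keys:
    ref = ref[key]
print(ref)  # 1
```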
(2) An actual solution could be a recursive function as so:
def nester(key_list):
    if len(key_list) == 0:
        return 'value'  # can change this to whatever you like
    else:
        return {key_list.pop(0): nester(key_list)}
keys = 'request.context.user_id'.split('.')
# ['request', 'context', 'user_id']
data = nester(keys)
print(data)
# will result
{'request': {'context': {'user_id': 'value'}}}
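Note that nester() calls pop(0) on the list it receives, so keys is empty afterwards. An iterative alternative (not the answer's own code) builds the structure from the innermost key outwards and leaves the list intact; the leaf parameter is an assumption added for convenience:

```python
def nester(key_list, leaf='value'):
    # Start from the leaf value and wrap it in one dict per key,
    # innermost key first; key_list itself is never modified.
    result = leaf
    for key in reversed(key_list):
        result = {key: result}
    return result

keys = 'request.context.user_id'.split('.')
data = nester(keys)
print(data)  # {'request': {'context': {'user_id': 'value'}}}
print(keys)  # ['request', 'context', 'user_id']
```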
(1) A solution with a dict comprehension that splits the string by '.' and uses each element in the list as a dictionary key
data = {}
parts = 'request.context.user_id'.split('.')
if parts:  # one or more items
    data = {part: 'value' for part in parts}
print(data)
# the result
{'request': 'value', 'context': 'value', 'user_id': 'value'}
You can overwrite the values in data afterwards.
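The built-in dict.fromkeys() does the same flat mapping in one call. A short sketch; the second half shows a caveat worth knowing when the default value is mutable:

```python
parts = 'request.context.user_id'.split('.')
data = dict.fromkeys(parts, 'value')
print(data)  # {'request': 'value', 'context': 'value', 'user_id': 'value'}

# Caution: every key shares the same default object,
# which matters if the default is mutable.
shared = dict.fromkeys(parts, [])
shared['request'].append(1)
print(shared['context'])  # [1]
```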

Related

Dictionary creation code. What is going on here most likely?

I am looking at this code:
DICT_IDS = dict(x.split('::')
for x in object.method()
['ids_comma_separated'].split(','))
DICT_ATTRS = dict(x.split('::')
for x in object.method()
['comma_separated_key_value_pairs'].split(','))
So each constant will ultimately refer to a dictionary, but what is going on inside the constructors?
Does this occur first:
x.split('::')
for x in object.method()
So x must be a string that is split on '::', right?
EDIT
Oh....
for x in object.method()
['ids_comma_separated'].split(',')
is executed first. x is probably another dictionary that we key into using ids_comma_separated whose value is a string that needs to be split on the , like "cat,dog, mouse" into a list. So x is going to be a list?
It is just parsing values like this into a dict:
'ids_comma_separated': "somekey::somevalue,anotherkey::anothervalue"
from a method (object.method()) that returns a dictionary:
class object:
    def method():
        return {
            'ids_comma_separated': "somekey::somevalue,anotherkey::anothervalue"
        }
DICT_IDS = dict(x.split('::')
for x in object.method()
['ids_comma_separated'].split(','))
DICT_IDS
# {'somekey': 'somevalue', 'anotherkey': 'anothervalue'}
The part inside dict() is a generator expression, but the line breaks make that a little hard to see:
(x.split('::') for x in object.method()['ids_comma_separated'].split(','))
In each iteration, x is a string such as 'somekey::somevalue', which gets split once more, on '::', into a [key, value] pair that dict() consumes.
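Spelled out step by step on the sample string (the second input at the end is made up to show an edge case):

```python
raw = "somekey::somevalue,anotherkey::anothervalue"

items = raw.split(',')                  # ['somekey::somevalue', 'anotherkey::anothervalue']
pairs = [x.split('::') for x in items]  # [['somekey', 'somevalue'], ['anotherkey', 'anothervalue']]
print(dict(pairs))                      # {'somekey': 'somevalue', 'anotherkey': 'anothervalue'}

# dict() needs exactly two elements per item; maxsplit=1 keeps any further
# '::' inside the value instead of producing a three-element list.
safe = dict(x.split('::', 1) for x in "a::b::c,d::e".split(','))
print(safe)                             # {'a': 'b::c', 'd': 'e'}
```

Without the maxsplit argument, the made-up "a::b::c" entry would make dict() raise ValueError.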

Optimize a dictionary key conditional

I would like to optimize this piece of code. I'm sure there is a way to write it in a single line:
if 'value' in dictionary:
    x = paas_server['support']
else:
    x = []
Use the dictionary's get() method:
x = dictionary.get('support', [])
If 'support' is not a key in the dictionary, get() returns its second argument, here an empty list.
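A related distinction: get() only reads, while setdefault() also stores the default when the key is missing. A small sketch with a made-up dict:

```python
dictionary = {'other': 1}

x = dictionary.get('support', [])
inserted_by_get = 'support' in dictionary
print(x, inserted_by_get)         # [] False: get() leaves the dict unchanged

y = dictionary.setdefault('support', [])
inserted_by_setdefault = 'support' in dictionary
print(y, inserted_by_setdefault)  # [] True: setdefault() stores the default
```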

Build a List of Tuples from a Dict

I have a list y of keys from a dictionary that is derived from a call to the Google Places API.
I would like to build a list of tuples for each point of interest:
lst = []
for i in range(len(y)):
    lst.append((y[i]['name'], y[i]['formatted_address'], y[i]['opening_hours']['open_now'], y[i]['rating']))
This works if the field is in the list and I receive a list of results that look like the one below, which is exactly what I want:
("Friedman's", '1187 Amsterdam Ave, New York, NY 10027, USA', True, 4.2)
However, the script throws an error if a desired field is not in the list y. How can I build a list of tuples that checks whether the desired field is in y before building the tuple?
Here's what I've tried:
for i in range(len(y)):
    t = ()
    if y[i]['name']:
        t = t + lst.append(y[i]['name'])
    if y[i]['formatted_address']:
        t = t + lst.append(y[i]['formatted_address'])
    if y[i]['opening_hours']['open_now']:
        t = t + lst.append(y[i]['opening_hours']['open_now'])
    if y[i]['rating']:
        t = t + lst.append(y[i]['rating'])
    lst.append(t)
However, this doesn't work and seems very inelegant. Any suggestions?
This list comprehension uses default values when one of the keys is not present (using dict.get()). I added variables so you can set the desired default values.
default_name = ''
default_address = ''
default_open_now = False
default_rating = 0.0
new_list = [
    (
        e.get('name', default_name),
        e.get('formatted_address', default_address),
        e.get('opening_hours', {}).get('open_now', default_open_now),
        e.get('rating', default_rating),
    )
    for e in y
]
For a start, you should almost never loop over range(len(something)). Always iterate over the thing directly. That goes a long way to making your code less inelegant.
For the actual issue, you could loop over the keys and only add the item if it is in the dict. That gets a bit more complicated with your one element that is a nested lookup, but if you take it out then your code just becomes:
for item in y:
    lst.append(tuple(item[key] for key in ('name', 'formatted_address', 'opening_hours', 'rating') if key in item))
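The nested 'open_now' lookup can be kept in the key-loop approach by describing each field as a path of keys. A sketch under assumptions: lookup and FIELDS are hypothetical names, the second place is made-up data, and missing fields become None so every tuple keeps the same positions (the version above instead produces shorter tuples):

```python
FIELDS = [("name",), ("formatted_address",), ("opening_hours", "open_now"), ("rating",)]

def lookup(d, path, default=None):
    # Hypothetical helper: follow each key in `path`, returning `default`
    # as soon as a level is missing or is not a dict.
    for key in path:
        if not isinstance(d, dict) or key not in d:
            return default
        d = d[key]
    return d

y = [
    {"name": "Friedman's",
     "formatted_address": "1187 Amsterdam Ave, New York, NY 10027, USA",
     "opening_hours": {"open_now": True},
     "rating": 4.2},
    {"name": "Example Place", "rating": 3.9},  # made-up entry with missing fields
]

lst = [tuple(lookup(place, path) for path in FIELDS) for place in y]
print(lst)
```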
You can use the dict's get() method:
y[i].get('name')
returns the value if y[i] has the key 'name', or None otherwise. For nested dicts, pass an empty dict as the default so the next get() still works:
y[i].get('opening_hours', {}).get('open_now')
As for the data structure, I recommend keeping each result as a dict and adding the dicts to a list.
lst = []
lst.append({'name': "Friedman's", "address": '1187 Amsterdam Ave, New York, NY 10027, USA'})
Try this:
for i in y:
    lst.append(tuple(v for k, v in i.items()))  # tuple() is needed; a bare generator would be appended as-is
You can use the keys() method to find the keys in a dict. In your case:
lst = []
fields = ('name', 'formatted_address', 'opening_hours', 'rating')  # the nested 'open_now' needs its own lookup
for i in range(len(y)):
    data = []
    for f in fields:
        if f in y[i].keys():
            data.append(y[i][f])
        else:
            data.append(None)
    lst.append(tuple(data))  # a tuple keeps the field order; a set would not
Note that you can also use the get() method, which returns None when a key is missing. That simplifies the code a bit. To make it even better, iterate over the list directly rather than calling len(y):
lst = []
fields = ('name', 'formatted_address', 'opening_hours', 'rating')
for i in y:
    data = []
    for key in fields:
        data.append(i.get(key))  # None if the key is missing
    lst.append(tuple(data))

Building Nested dictionary in Python reading in line by line from file

The way I go about nested dictionary is this:
dicty = dict()
tmp = dict()
tmp["a"] = 1
tmp["b"] = 2
dicty["A"] = tmp
dicty == {"A" : {"a" : 1, "b" : 2}}
The problem starts when I try to implement this on a big file, reading in line by line.
This is printing the content per line in a list:
['proA', 'macbook', '0.666667']
['proA', 'smart', '0.666667']
['proA', 'ssd', '0.666667']
['FrontPage', 'frontpage', '0.710145']
['FrontPage', 'troubleshooting', '0.971014']
I would like to end up with a nested dictionary (ignore decimals):
{'FrontPage': {'frontpage': '0.710145', 'troubleshooting': '0.971014'},
'proA': {'macbook': '0.666667', 'smart': '0.666667', 'ssd': '0.666667'}}
As I am reading in line by line, I have to check whether or not the first word is still found in the file (they are all grouped), before I add it as a complete dict to the higher dict.
This is my implementation:
def doubleDict(filename):
    dicty = dict()
    with open(filename, "r") as f:
        row = 0
        tmp = dict()
        oldword = ""
        for line in f:
            values = line.rstrip().split(" ")
            print(values)
            if oldword == values[0]:
                tmp[values[1]] = values[2]
            else:
                if oldword is not "":
                    dicty[oldword] = tmp
                tmp.clear()
                oldword = values[0]
                tmp[values[1]] = values[2]
            row += 1
            if row % 25 == 0:
                print(dicty)
                break #print(row)
    return(dicty)
I would actually like to have this in pandas, but for now I would be happy if this would work as a dict. For some reason after reading in just the first 5 lines, I end up with:
{'proA': {'frontpage': '0.710145', 'troubleshooting': '0.971014'}},
which is clearly incorrect. What is wrong?
Use a collections.defaultdict() object to auto-instantiate nested dictionaries:
from collections import defaultdict

def doubleDict(filename):
    dicty = defaultdict(dict)
    with open(filename, "r") as f:
        for i, line in enumerate(f, 1):  # count from 1, so the first line doesn't trigger the check
            outer, inner, value = line.split()
            dicty[outer][inner] = value
            if i % 25 == 0:
                print(dicty)
                break #print(row)
    return(dicty)
I used enumerate() to generate the line count here; much simpler than keeping a separate counter going.
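A quick check of the defaultdict approach on the five sample lines from the question, using a list of strings as a stand-in for the file (an assumption; each line is expected to have exactly three whitespace-separated fields):

```python
from collections import defaultdict

# Stand-in for the file: the sample lines from the question.
lines = [
    "proA macbook 0.666667",
    "proA smart 0.666667",
    "proA ssd 0.666667",
    "FrontPage frontpage 0.710145",
    "FrontPage troubleshooting 0.971014",
]

dicty = defaultdict(dict)
for line in lines:
    outer, inner, value = line.split()
    dicty[outer][inner] = value

result = dict(dicty)  # plain dict, so later lookups of unknown keys raise KeyError again
print(result)
```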
Even without a defaultdict, you can let the outer dictionary keep the reference to the nested dictionary, and retrieve it again by using values[0]; there is no need to keep the temp reference around:
>>> dicty = {}
>>> dicty['A'] = {}
>>> dicty['A']['a'] = 1
>>> dicty['A']['b'] = 2
>>> dicty
{'A': {'a': 1, 'b': 2}}
All the defaultdict then does is keep us from having to test if we already created that nested dictionary. Instead of:
if outer not in dicty:
    dicty[outer] = {}
dicty[outer][inner] = value
we simply omit the if test as defaultdict will create a new dictionary for us if the key was not yet present.
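The same test-and-create step can also be written with a plain dict and setdefault(), which returns the existing nested dict or inserts and returns a new one. A small sketch on made-up rows already split into fields:

```python
rows = [
    ["proA", "macbook", "0.666667"],
    ["proA", "smart", "0.666667"],
    ["FrontPage", "frontpage", "0.710145"],
]

dicty = {}
for outer, inner, value in rows:
    # setdefault returns dicty[outer], creating it as {} on first sight
    dicty.setdefault(outer, {})[inner] = value

print(dicty)
# {'proA': {'macbook': '0.666667', 'smart': '0.666667'}, 'FrontPage': {'frontpage': '0.710145'}}
```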
While this isn't the ideal way to do things, you're pretty close to making it work.
Your main problem is that you're reusing the same tmp dictionary. After you insert it into dicty under the first key, you then clear it and start filling it with the new values. Replace tmp.clear() with tmp = {} to fix that, so you have a different dictionary for each key, instead of the same one for all keys.
Your second problem is that you're never storing the last tmp value in the dictionary when you reach the end, so add another dicty[oldword] = tmp after the for loop.
Your third problem is that you're checking if oldword is not "":. That may be true even if it's an empty string, because you're comparing identity, not equality. Just change that to if oldword:. (This one, you'll usually get away with, because small strings are usually interned and will usually share identity… but you shouldn't count on that.)
If you fix these, you get this:
{'FrontPage': {'frontpage': '0.710145', 'troubleshooting': '0.971014'},
'proA': {'macbook': '0.666667', 'smart': '0.666667', 'ssd': '0.666667'}}
That is the nested dictionary you said you wanted; hopefully this gets you there.
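Putting the three fixes together, a sketch of the corrected loop. It takes an iterable of lines instead of a filename (an assumption, so it can run without a file; an open file object works the same way), and it leaves out the every-25-rows debugging print:

```python
def doubleDict(lines):
    dicty = {}
    tmp = {}
    oldword = ""
    for line in lines:
        values = line.rstrip().split(" ")
        if oldword == values[0]:
            tmp[values[1]] = values[2]
        else:
            if oldword:              # truthiness test instead of `is not ""`
                dicty[oldword] = tmp
            tmp = {}                 # a new dict per key instead of tmp.clear()
            oldword = values[0]
            tmp[values[1]] = values[2]
    if oldword:
        dicty[oldword] = tmp         # store the last group after the loop
    return dicty

sample = [
    "proA macbook 0.666667\n",
    "proA smart 0.666667\n",
    "proA ssd 0.666667\n",
    "FrontPage frontpage 0.710145\n",
    "FrontPage troubleshooting 0.971014\n",
]
print(doubleDict(sample))
```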
There are two simpler ways to do it:
Group the values with, e.g., itertools.groupby, then transform each group into a dict and insert it all in one step. This, like your existing code, requires that the input already be batched by values[0].
Use the dictionary as a dictionary. You can look up each key as it comes in and add to the value if found, create a new one if not. A defaultdict or the setdefault method will make this concise, but even if you don't know about those, it's pretty simple to write it out explicitly, and it'll still be less verbose than what you have now.
The second version is already explained very nicely in Martijn Pieters's answer.
The first can be written like this:
import itertools
import operator

def doubleDict(filename):
    with open(filename, "r") as f:
        rows = (line.rstrip().split(" ") for line in f)
        return {k: {values[1]: values[2] for values in g}
                for k, g in itertools.groupby(rows, key=operator.itemgetter(0))}
Of course that doesn't print out the dict so far after every 25 rows, but that's easy to add by turning the comprehension into an explicit loop (and ideally using enumerate instead of keeping an explicit row counter).
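A caveat with the groupby version: itertools.groupby only merges consecutive rows with the same key, which is why the input must already be batched. A sketch on made-up rows where proA is split up, before and after sorting (group_rows is a hypothetical name for the comprehension above):

```python
import itertools
import operator

rows = [
    ["proA", "macbook", "0.666667"],
    ["FrontPage", "frontpage", "0.710145"],
    ["proA", "ssd", "0.666667"],  # proA again, not adjacent to its first row
]

def group_rows(rows):
    return {k: {values[1]: values[2] for values in g}
            for k, g in itertools.groupby(rows, key=operator.itemgetter(0))}

unsorted_result = group_rows(rows)
print(unsorted_result)
# {'proA': {'ssd': '0.666667'}, 'FrontPage': {'frontpage': '0.710145'}}  ('macbook' was overwritten)

sorted_result = group_rows(sorted(rows, key=operator.itemgetter(0)))
print(sorted_result)
# {'FrontPage': {'frontpage': '0.710145'}, 'proA': {'macbook': '0.666667', 'ssd': '0.666667'}}
```

Sorting first costs O(n log n) and needs all rows in memory, so for unbatched input the dictionary-based approach is usually the simpler choice.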

Python: Convert a list of python dictionaries to an array of JSON objects

I'm trying to write a function to convert a python list into a JSON array of {"mpn":"list_value"} objects, where "mpn" is the literal string value I need for every object but "list_value" is the value from the python list. I'll use the output of this function for an API get request.
part_nums = ['ECA-1EHG102','CL05B103KB5NNNC','CC0402KRX5R8BB104']
def json_list(list):
    lst = []
    d = {}
    for pn in list:
        d['mpn']=pn
        lst.append(d)
    return json.dumps(lst, separators=(',',':'))
print json_list(part_nums)
This current function is not working and returns last value in the python list for all JSON objects:
[{"mpn":"CC0402KRX5R8BB104"},{"mpn":"CC0402KRX5R8BB104"},{"mpn":"CC0402KRX5R8BB104"}]
However, of course I need my function to return the unique list values in the objects as such:
[{"mpn":"ECA-1EHG102"},{"mpn":"CL05B103KB5NNNC"},{"mpn":"CC0402KRX5R8BB104"}]
Bottom line is I don't understand why this function isn't working. I expected I could append a dictionary with a single {key:value} pair to a python list and it wouldn't matter that all of the dictionaries have the same key because they would be independent. Thanks for your help.
You are adding the exact same dictionary to the list. You should create a new dictionary for each item in the list:
json.dumps([dict(mpn=pn) for pn in part_nums])
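A small sketch that makes the aliasing visible with an identity check, then applies the fix with the compact separators from the question:

```python
import json

part_nums = ['ECA-1EHG102', 'CL05B103KB5NNNC', 'CC0402KRX5R8BB104']

# The bug: one dict object, appended three times.
d = {}
shared = []
for pn in part_nums:
    d['mpn'] = pn
    shared.append(d)
print(shared[0] is shared[2])  # True: every list entry is the same object

# The fix: a fresh dict per item.
result = json.dumps([{'mpn': pn} for pn in part_nums], separators=(',', ':'))
print(result)
# [{"mpn":"ECA-1EHG102"},{"mpn":"CL05B103KB5NNNC"},{"mpn":"CC0402KRX5R8BB104"}]
```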
As others have explained, you should create a new dictionary for each item in the list; otherwise you always reference the same dictionary:
import json
part_nums = ['ECA-1EHG102','CL05B103KB5NNNC','CC0402KRX5R8BB104']
def json_list(list):
    lst = []
    for pn in list:
        d = {}
        d['mpn']=pn
        lst.append(d)
    return json.dumps(lst)
print json_list(part_nums)
prints:
[{"mpn": "ECA-1EHG102"}, {"mpn": "CL05B103KB5NNNC"}, {"mpn": "CC0402KRX5R8BB104"}]
import json
part_nums = ['ECA-1EHG102','CL05B103KB5NNNC','CC0402KRX5R8BB104']
def json_list(list):
    lst = []
    for pn in list:
        d = {}
        d['mpn']=pn
        lst.append(d)
    return json.dumps(lst)
print json_list(part_nums) # for python2
print (json_list(part_nums)) # for python3
