Processing dictionary keys in arbitrary order - python

I want to transform a dictionary into a string. What would be a beginner-level question is complicated by a few rules that I have to adhere to:
There is a list of known keys that must come out in a particular, arbitrary order
Each of the known keys is optional, i.e. it may not be present in the dictionary
It is guaranteed that at least one of the known keys will be present in the dictionary
The dictionary may contain additional keys; they must come after the known keys, and their order is not important
I cannot make assumptions about the order in which keys are added to the dictionary
What is the pythonic way of processing some dictionary keys before others?
So far, I have the following function:
def format_data(input_data):
    data = dict(input_data)
    output = []
    for key in ["title", "slug", "date", "modified", "category", "tags"]:
        if key in data:
            output.append("{}: {}".format(key.title(), data[key]))
            del data[key]
    if data:
        for key in data:
            output.append("{}: {}".format(key.title(), data[key]))
    return "\n".join(output)
data = {
    "tags": "one, two",
    "slug": "post-title",
    "date": "2017-02-01",
    "title": "Post Title",
}

print(format_data(data))

data = {
    "format": "book",
    "title": "Another Post Title",
    "date": "2017-02-01",
    "slug": "another-post-title",
    "custom": "data",
}

print(format_data(data))
Title: Post Title
Slug: post-title
Date: 2017-02-01
Tags: one, two
Title: Another Post Title
Slug: another-post-title
Date: 2017-02-01
Custom: data
Format: book
While this function does produce the expected results, it has some issues that make me think there might be a better approach. Namely, the output.append() line is duplicated, and the input data structure is copied to allow modifying it without side effects.
To sum up, how can I process some keys in particular order and before other keys?

I suggest that you simply run a pair of list comprehensions: one for the desired keys, and one for the rest. Concatenate them in the desired order in bulk, rather than one at a time. This reduces the critical step to a single command to build output.
The first comprehension looks for desired keys in the dict; the second looks for any dict keys not in the "desired" list.
def format_data(input_data):
    data = dict(input_data)
    key_list = ["title", "slug", "date", "modified", "category", "tags"]
    output = ["{}: {}".format(key.title(), data[key]) for key in key_list if key in data] + \
             ["{}: {}".format(key.title(), data[key]) for key in data if key not in key_list]
    return "\n".join(output)

I'd suggest list comprehensions and pop():
def format_data(input_data):
    data = dict(input_data)
    keys = ["title", "slug", "date", "modified", "category", "tags"]
    output = ['{}: {}'.format(key.title(), data.pop(key)) for key in keys if key in data]
    output.extend(['{}: {}'.format(key.title(), val) for key, val in data.items()])
    return "\n".join(output)
As for the concern about deleting during iteration: note that the iteration is over the list of keys, not the dictionary being modified, so I wouldn't consider that a red flag.
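To illustrate that distinction (a small example added here, not from the original answer): mutating a dict while iterating over the dict itself fails in Python 3, but popping keys while iterating over a separate list is fine.

d = {"a": 1, "b": 2}
for key in ["a", "b"]:    # iterating over a separate list of keys
    d.pop(key, None)      # safe: mutating d does not affect the list

d = {"a": 1, "b": 2}
for key in d:             # iterating over the dict itself
    del d[key]            # RuntimeError: dictionary changed size during iteration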

Edited completely: the code below takes a list of primary keys (you can pass them in if you want, or set them in a config file) and puts those at the beginning of your dictionary.
I think I see what you mean now.
Try this:
from collections import OrderedDict

data = {'aaa': 'bbbb',
        'custom': 'data',
        'date': '2017-02-01',
        'foo': 'bar',
        'format': 'book',
        'slug': 'another-post-title',
        'title': 'Another Post Title'}

def format_data(input_data):
    primary_keys = ["title", "slug", "date", "modified", "category", "tags"]
    # list(input_data) is needed on Python 3, where dict.keys() cannot be
    # concatenated to a list directly
    data = OrderedDict((k, input_data.get(k)) for k in primary_keys + list(input_data))
    output = []
    for key, value in data.items():
        if value is not None:  # missing primary keys come back as None from .get()
            output.append("{}: {}".format(key.title(), value))
    return "\n".join(output)

print(format_data(data))
Title: Another Post Title
Slug: another-post-title
Date: 2017-02-01
Aaa: bbbb
Format: book
Custom: data
Foo: bar

Find the difference between the known keys and the keys in the input dictionary; use itertools.chain to iterate over both sets of keys; catch the KeyError for missing keys and just pass. There is no need to copy the input, and no duplication.
import itertools

def format_data(input_data):
    known_keys = ["title", "slug", "date", "modified", "category", "tags"]
    xtra_keys = set(input_data.keys()).difference(known_keys)
    output = []
    for key in itertools.chain(known_keys, xtra_keys):
        try:
            output.append("{}: {}".format(key.title(), input_data[key]))
        except KeyError:
            pass
    return '\n'.join(output)
data = {"tags": "one, two",
        "slug": "post-title",
        "date": "2017-02-01",
        "title": "Post Title",
        "foo": "bar"}
>>> print(format_data(data))
Title: Post Title
Slug: post-title
Date: 2017-02-01
Tags: one, two
Foo: bar

Related

Create dictionary using JSON data

I have a JSON file that has movie data in it. I want to create a dictionary that has the movie title as the key and a count of how many actors are in that movie as the value. An example from the JSON file is below:
{
    "title": "Marie Antoinette",
    "year": "2006",
    "genre": "Drama",
    "summary": "Based on Antonia Fraser's book about the ill-fated Archduchess of Austria and later Queen of France, 'Marie Antoinette' tells the story of the most misunderstood and abused woman in history, from her birth in Imperial Austria to her later life in France.",
    "country": "USA",
    "director": {
        "last_name": "Coppola",
        "first_name": "Sofia",
        "birth_date": "1971"
    },
    "actors": [
        {
            "first_name": "Kirsten",
            "last_name": "Dunst",
            "birth_date": "1982",
            "role": "Marie Antoinette"
        },
        {
            "first_name": "Jason",
            "last_name": "Schwartzman",
            "birth_date": "1980",
            "role": "Louis XVI"
        }
    ]
}
I have the following, but it's counting all of the actors from all of the movies instead of the number of actors per movie. I'm not sure how to do this correctly as I'm newer to Python, so help would be great.
import json

def actor_count(json_data):
    with open("movies_db.json", 'r') as file:
        data = json.load(file)
    for t in data:
        title = [t['title'] for t in data]
    for element in data:
        for actor in element['actors']:
            rolee = [actor['role'] for movie in data for actor in movie['actors']]
            len_role = [len(role)]
    newD = dict(zip(title, len_role))
    print(newD)

json_data = open('movies_db.json')
actor_count(json_data)
You show json that only contains a dictionary, yet you seem to process it as if it were a list of dictionaries with the structure you have shown. Pending clarification, I am answering here as if the latter is true -- you have a list of dictionaries, since you would be asking a different question about a different error if this was not the case.
In your function, each element of data is a dictionary that contains the information for a single movie. To get a dict correlating the title to the count of actors in this movie, you just need to access the "title" key and the length of the "actors" key for each element.
def actor_count(json_data):
    movie_actors = {}
    for movie in json_data:
        title = movie["title"]
        num_actors = len(movie["actors"])
        movie_actors[title] = num_actors
    return movie_actors
Alternatively, use a dictionary comprehension to build this dictionary:
def actor_count(json_data):
    movie_actors = {movie["title"]: len(movie["actors"]) for movie in json_data}
    return movie_actors
Now, load your json file once, and use that when you call actor_count. This will return a dictionary mapping each movie title to the number of actors.
with open("movies_db.json", 'r') as file:
    data = json.load(file)

actor_count(data)
Note that loading the json file again in the function is unnecessary, since you already did it before calling the function, and are passing the parsed object to the function.
If you want to keep your current logic of using list comprehensions, and then zipping the resultant lists to create a dict, that is also possible although slightly less efficient. There are significant changes you will need to make:
def actor_count(json_data):
    title = [t['title'] for t in json_data]
    n_actors = [len(t['actors']) for t in json_data]
    newD = dict(zip(title, n_actors))
    return newD
As before, no need to read the file again in the function
You're already looping over all elements in json_data as part of the list comprehension, so no need for another loop outside this.
You can get the number of actors simply by len(t['actors'])
You seem to have misconceptions about how list comprehensions and loops work. A list comprehension is a self-contained loop that builds a list. If you have a list comprehension, there's usually no need to surround it by the same for ... in ... statement that already exists in the comprehension.
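For instance (a small illustration added here, assuming json_data is the parsed list of movie dicts):

# The comprehension is already a complete loop over every movie...
titles = [t['title'] for t in json_data]

# ...and is equivalent to this explicit loop:
titles = []
for t in json_data:
    titles.append(t['title'])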
def actor_count(json_data):
    newD = dict()
    with open("movies_db.json", 'r') as file:
        data = json.load(file)
    for t in data:
        if t == 'title':
            title_ = data[t]
            newD[title_] = 0
        if t == 'actors':
            newD[title_] = len(data[t])
    print(newD)
Output:
{'Marie Antoinette': 2}

Why is this dictionary comprehension for list of dictionaries not returning values?

I am iterating over a list of dictionaries formatted as follows (each dictionary relates to one item of clothing; I have only listed the first):
new_products = [{'{"uniq_id": "1234", "sku": "abcdefgh", "name": "Levis skinny jeans", '
                 '"list_price": "75.00", "sale_price": "55.00", "category": "womens"}'}]

def find_product(dictionary, uniqid):
    if 'uniq_id' in dictionary:
        if ['uniq_id'] == uniqid:
            return(keys, values in dictionary)

print(find_product(new_products, '1234'))
This is returning
None
The reason for the if statement in there is that not every product has a value for uniq_id so I was getting a key error on an earlier version of my code.
Your dictionary definition is quite unclear.
Assuming that you have given a list of dictionaries of size 1, it should be something like this:
new_products = [{"uniq_id": "1234", "sku": "abcdefgh", "name": "Levis skinny jeans", "list_price": "75.00", "sale_price": "55.00", "category": "womens"}]
def find_product(list_of_dicts, uniqid):
    for dictionary in list_of_dicts:
        if 'uniq_id' in dictionary:
            if dictionary['uniq_id'] == uniqid:
                return dictionary

print(find_product(new_products, '1234'))
You are using something like this:
new_products = [{'{ "some" : "stuff" }'}]
This is a list (the outer []) containing a set (the {})
{'{ "some" : "stuff" }'}
Note {1} is a set containing the number 1. Though it uses the curly braces it isn't a dictionary.
Your set contains a string:
'{ "some" : "stuff" }'
If I ask if 'some' is in this, I get True back, but if I ask for this string's keys there are no keys.
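A quick interactive illustration of that point (added here for clarity):

>>> payload = '{ "some" : "stuff" }'   # just a string, not a dict
>>> 'some' in payload                  # True, but only as a substring match
True
>>> payload.keys()
AttributeError: 'str' object has no attribute 'keys'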
Make your new_products a list containing a dictionary (not a set), and don't put the payload in a string:
new_products = [{"uniq_id": "1234",
                 "sku": "abcdefgh",
                 "name": "Levis skinny jeans",
                 "list_price": "75.00",
                 "sale_price": "55.00",
                 "category": "womens"}]
Then loop over the dictionaries in the list in your function:
def find_product(dictionary_list, uniqid):
    for d in dictionary_list:
        if 'uniq_id' in d:
            if d['uniq_id'] == uniqid:
                return d.keys(), d.values()
    return "not found"  # or something better
>>> find_product(new_products, '1234')
(dict_keys(['uniq_id', 'sku', 'name', 'list_price', 'sale_price', 'category']), dict_values(['1234', 'abcdefgh', 'Levis skinny jeans', '75.00', '55.00', 'womens']))
>>> find_product(new_products, '12345')
'not found'

Format some JSON object with certain fields on one-line?

I want to re-format a JSON file so that certain objects (dictionaries) with some specific keys are on one line.
For example, any object with the key "name" should appear on one line:
{
    "this": "that",
    "parameters": [
        { "name": "param1", "type": "string" },
        { "name": "param2" },
        { "name": "param3", "default": "#someValue" }
    ]
}
The JSON file is generated, and contains programming-language data. Keeping certain fields on one line makes it much easier to visually inspect/review.
I tried to override Python's json.JSONEncoder to turn each matching dict into a string before writing, only to realize the quotes (") within the string are escaped again in the resulting JSON file, defeating my purpose.
I also looked at jq but couldn't figure out a way to do it. I found similar questions and solutions based on line length, but my requirements are simpler, and I don't want other shorter lines to be changed. Only certain objects or fields.
This code recursively replaces each matching dict in the data with a unique string (a UUID) and records those replacements; then, in the indented JSON string, the unique strings are replaced with the desired single-line JSON for the original dict.
replace returns a pair of:
A modified version of the input argument data
A list of pairs of JSON strings where for each pair the first value should be replaced with the second value in the final pretty printed JSON.
import json
import uuid

def replace(o):
    if isinstance(o, dict):
        if "name" in o:
            replacement = uuid.uuid4().hex
            return replacement, [(f'"{replacement}"', json.dumps(o))]
        replacements = []
        result = {}
        for key, value in o.items():
            new_value, value_replacements = replace(value)
            result[key] = new_value
            replacements.extend(value_replacements)
        return result, replacements
    elif isinstance(o, list):
        replacements = []
        result = []
        for value in o:
            new_value, value_replacements = replace(value)
            result.append(new_value)
            replacements.extend(value_replacements)
        return result, replacements
    else:
        return o, []

def pretty(data):
    data, replacements = replace(data)
    result = json.dumps(data, indent=4)
    for old, new in replacements:
        result = result.replace(old, new)
    return result

print(pretty({
    "this": "that",
    "parameters": [
        {"name": "param1", "type": "string"},
        {"name": "param2"},
        {"name": "param3", "default": "#someValue"}
    ]
}))
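For reference (not part of the original answer), with the sample input above this prints roughly the following, with the dicts containing "name" collapsed onto single lines:

{
    "this": "that",
    "parameters": [
        {"name": "param1", "type": "string"},
        {"name": "param2"},
        {"name": "param3", "default": "#someValue"}
    ]
}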

Python Recursively Maintain Keyed Depth

Input/Goal
My input data is an OrderedDict for which there can be a variable depth of nested OrderedDicts so I have opted to handle parsing this output recursively. The desired output is a csv with header.
Elaboration of Problem
My code below will work once I am able to correctly define field_name upon traversing back up a branch after completing all of a branch's leaves. (i.e. Type_1.Field_3.Data will incorrectly be called Type_1.Field_2.Field_3.Data).
Once the leaves on a branch have been exhausted, I want to remove the last .Field_x from the field_name so that a new (correct) one can be added for the following object.
Request for Help
Does anyone see where I can include this feature? Thanks,
...
Dependencies:
Code Snippet:
def get_soql_fields(soql):
    soql_fields = re.search('(?<=select)(?s)(.*)(?=from)', soql)  # get fields
    soql_fields = re.sub(' ', '', soql_fields.group())  # remove extra spaces
    fields = re.split(',|\n|\r', soql_fields)  # split on commas and newlines
    fields = [field for field in fields if field != '']  # remove empty strings
    return fields

def parse_output(data, soql):
    fields = get_soql_fields(soql)
    header = fields
    master = [header]
    for record in data['records']:  # for each 'record' in response
        row = []
        for obj, value in record.iteritems():  # for each obj in record
            if isinstance(value, basestring):  # if query base object has desired fields
                if obj in fields:
                    row.append(value)
            elif isinstance(value, dict):  # traverse down into object
                path = obj
                row.append(_traverse_output(obj, value, fields, row, path))
        master.append(row)
    return master

def _traverse_output(obj, value, fields, row, path):
    for f, v in value.iteritems():  # for each item in obj
        if not isinstance(v, (dict, list, tuple)):
            field_name = '{path}.{name}'.format(path=path, name=f)  # TODO fix this to full field name
            print('FName: {0}'.format(field_name))
            if field_name in fields:
                print('match')
                row.append(v)
        elif isinstance(v, dict):  # it is a dict
            path += '.{obj}'.format(obj=f)
            _traverse_output(f, v, fields, row, path)
Example Salesforce SOQL:
select
    Type_1.Field_1,
    Type_1.Field_2.Data,
    Type_1.Field_3,
    Type_1.Field_4,
    Type_1.Field_5.Data_1.Data,
    Type_1.Field_6,
    Type_2.Field_1,
    Type_2.Field_2
from
    Obj_1
limit
    1
;
Example Salesforce Output:
{
    "records": [
        {
            "attributes": {
                "type": "Obj_1",
                "url": "<url>"
            },
            "Type_1": {
                "attributes": {
                    "type": "Type_1",
                    "url": "<url>"
                },
                "Field_1": "<stuff>",
                "Field_2": {
                    "attributes": {
                        "type": "Field_2",
                        "url": "<url>"
                    },
                    "Data": "<data>"
                },
                "Field_3": "<data>",
                "Field_4": "<data>",
                "Field_5": {
                    "attributes": {
                        "type": "Field_2",
                        "url": "<url>"
                    },
                    "Data_1": {
                        "attributes": {
                            "type": "Data_1",
                            "url": "<url>"
                        },
                        "Data": "<data>"
                    }
                },
                "Field_6": 1.0
            },
            "Type_2": {
                "attributes": {
                    "type": "Type_2",
                    "url": "<url>"
                },
                "Field_1": "<data>",
                "Field_2": "<data>"
            }
        }
    ]
}
I worked out a quick solution for this. I'll just note what I figured out, and append the code I wrote to the end.
Essentially your problem is that you keep trying to modify path in place, which isn't going to work. Instead do something like
new_path = path + '.{obj}'.format(obj=f)
_traverse_output(f, v, fields, row, new_path)
A note about this: it will NOT necessarily result in a row where the values are in the same order as the header (i.e., if Type_1.Field_1 is in position 0 of the header list, then the value corresponding to it might not be).
The easy way to solve this (and handle CSVs in general) is to use DictWriter from the csv module: pass an empty dictionary into your first call, and its keys will end up being the field names and its values their values.
Another way to solve the problem is to pre-populate your row list with None or empty strings, then use the list.index method to assign the value to the appropriate position.
I wrote an implementation of _traverse_output for each approach as an example, though they differ slightly from your code. They take an element of the 'records' list.
Dictionary Example
def _traverse_output_with_dict(record, fields, row_values, field_name=''):
    for obj, value in record.iteritems():
        new_field_name = '{}.{}'.format(field_name, obj) if field_name else obj
        print new_field_name
        if not isinstance(value, dict):
            if new_field_name in fields:
                row_values[new_field_name] = value
        else:
            _traverse_output_with_dict(value, fields, row_values, new_field_name)
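As a follow-up (a minimal sketch added here, not from the original answer), the populated row_values dicts could then be written out with csv.DictWriter. It assumes fields is the header list from get_soql_fields and rows is a list of such dicts; the write_csv helper and output.csv path are made up for the example, and it is written in Python 2 style to match the code above:

import csv

def write_csv(fields, rows, path='output.csv'):
    # rows: a list of row_values dicts keyed by the full field names
    with open(path, 'wb') as f:  # 'wb' is the csv module's expected mode on Python 2
        writer = csv.DictWriter(f, fieldnames=fields, restval='')
        writer.writeheader()     # header row comes straight from the field list
        writer.writerows(rows)   # missing fields are filled with restval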
List Example
def _traverse_output_with_list(record, fields, row, field_name=''):
    while len(row) < len(fields):
        row.append('')
    for obj, value in record.iteritems():
        new_field_name = '{}.{}'.format(field_name, obj) if field_name else obj
        print new_field_name
        if not isinstance(value, dict):
            if new_field_name in fields:
                row[fields.index(new_field_name)] = value
        else:
            _traverse_output_with_list(value, fields, row, new_field_name)
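Likewise (again a sketch, not from the original answer), the pre-populated list rows pair naturally with a plain csv.writer; the helper name and path are made up for the example:

import csv

def write_csv_rows(fields, rows, path='output.csv'):
    # rows: a list of lists, one value per field, already in header order
    with open(path, 'wb') as f:  # Python 2 style, matching the code above
        writer = csv.writer(f)
        writer.writerow(fields)  # header
        writer.writerows(rows)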

Serialize Dictionary with a string key and List[] value to JSON

How can I serialize a Python dictionary, whose keys are strings and whose values are lists (i.e. []), to JSON and pass it back to JavaScript?
if request.is_ajax() and request.method == 'GET':
    groupSet = GroupSet.objects.get(id=int(request.GET["groupSetId"]))
    groups = groupSet.groups.all()
    group_items = []  # list
    groups_and_items = {}  # dictionary
    for group in groups:
        group_items.extend([group_item for group_item in group.group_items.all()])
        # use group as Key name and group_items (LIST) as the value
        groups_and_items[group] = group_items
        data = serializers.serialize("json", groups_and_items)
    return HttpResponse(data, mimetype="application/json")
the result:
[{"pk": 5, "model": "myApp.group", "fields": {"name": "\u6fb4\u9584", "group_items": [13]}}]
while group_items should contain many group_item entries, and each group_item should have a "name" rather than only the id (in this case the id is 13).
I need to serialize the group name, as well as each group_item's id and name, as JSON and pass it back to JavaScript.
I am new to Python and Django; please advise me if you have a better way to do this. Thank you so much. :)
Your 'groups' variable is a QuerySet object, not a dict. You will want to be more explicit with the data that you want to return.
import json

groups_and_items = {}
for group in groups:
    group_items = []
    for item in group.group_items.all():
        group_items.append({'id': item.id, 'name': item.name})
    # <OR> if you just want a list of the group_item names
    #group_items = group.group_items.all().values_list('name', flat=True)
    groups_and_items[group.name] = group_items
data = json.dumps(groups_and_items)
What exactly did you want your data to look like? The above should give you data like this:
{ 'groupA': [{'id': 1, 'name': 'item-1'}],
  'groupB': [{'id': 2, 'name': 'item-2'}, ...],
  'groupC': []
}
Or this if you just want the list of group_item names:
{ 'groupA': ['item-1'],
  'groupB': ['item-2', ...],
  'groupC': []
}
You should use Python's json module to encode your JSON.
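Putting it together (a sketch reusing the mimetype keyword from the question's Django version; on newer Django you would use content_type or JsonResponse instead):

# at the end of the view from the question
data = json.dumps(groups_and_items)
return HttpResponse(data, mimetype="application/json")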
Also, what indentation level do you have data = serializers at? It looks like it could be inside the for loop?
