How to transform a Python dictionary to a desired format

I have the following dictionary, which I got by calling the to_dict() method on a pandas DataFrame.
{
'name' : {
0: 'abc',
1: 'xyz'
},
'email': {
0: 'abc#abc.com',
1: 'xyz#xyz.com',
},
'category': {
0: 'category 1',
1: 'category 2',
}
}
How do I transform it into the following structure?
[
{
'name': 'abc',
'email' : 'abc#abc.com',
'category': 'category 1',
},
{
'name': 'xyz',
'email' : 'xyz#xyz.com',
'category': 'category 2',
}
]
I tried many variations of a for loop, but they all came out as bogus code. If anyone could help or point me to some links, that would be great. Python newbie here :|
EDIT: changed the desired structure to a list of dicts, as dicts are not hashable.

The target structure you originally displayed was a set of dicts. Since dicts are not hashable, such a set cannot be created.
Instead, you probably want a list of dicts.
result = [
{k: yourdict[k][n] for k in yourdict} for n in sorted(yourdict['name'])
]
Testing:
[
{'category': 'category 1', 'email': 'abc#abc.com', 'name': 'abc'},
{'category': 'category 2', 'email': 'xyz#xyz.com', 'name': 'xyz'}
]
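A self-contained sketch of the same comprehension, for when the dict is already in hand. The helper name to_records is made up for illustration, and it assumes every column shares the same index labels.

```python
source = {
    'name': {0: 'abc', 1: 'xyz'},
    'email': {0: 'abc#abc.com', 1: 'xyz#xyz.com'},
    'category': {0: 'category 1', 1: 'category 2'},
}

def to_records(columns):
    """Turn {column: {index: value}} into a list of {column: value} rows."""
    if not columns:
        return []
    # Take the index labels from the first column (assumption: all columns
    # use the same labels, as they do when the dict comes from to_dict()).
    first_column = next(iter(columns.values()))
    return [
        {name: values[idx] for name, values in columns.items()}
        for idx in sorted(first_column)
    ]

records = to_records(source)
print(records)
```

Sorting the index labels keeps the rows in a predictable order even if the inner dicts were built in a different order.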

You could transpose your dataframe before converting it to a dictionary. This will produce a dictionary of dictionaries, where each key is an index value from the original dataframe.
import pandas as pd
pd.DataFrame({
'name' : {
0: 'abc',
1: 'xyz'
},
'email': {
0: 'abc#abc.com',
1: 'xyz#xyz.com',
},
'category': {
0: 'category 1',
1: 'category 2',
}
}).T.to_dict()
Outputs:
{0: {'name': 'abc', 'email': 'abc#abc.com', 'category': 'category 1'},
1: {'name': 'xyz', 'email': 'xyz#xyz.com', 'category': 'category 2'}}

You could just pass 'records' as the desired orientation to to_dict():
df.to_dict('records')
The default orientation, 'dict', produces output like {column -> {index -> value}}, as can be seen in your example, whereas 'records' produces a list like [{column -> value}, … , {column -> value}], which is your desired output.

Related

How to delete all keys in a nested dictionary if the key equals a given key

I have a nested dictionary that's updated dynamically, so I never know how many levels there are. What I need to do is delete every entry in the dictionary whose key equals a given key, "command" for example.
I've tried looping through the dict, but the number of levels changes at runtime, so that didn't work. I was thinking that maybe this should use recursion, but I would like to avoid that if I can.
I have included a sample mock dict; what I want is for all 'command' keys to be removed.
data = {
'id': 1,
'name': 'Option 1',
'command': do_something,
'sub_opt': {
'id': 10,
'name': 'Sub Option',
'command': do_something_more,
'sub_sub_opt': {
'id': 100,
'name': 'Sub Sub Option',
'command': do_something_crazy,
}
}
}
I know you're trying to avoid recursion, but the code isn't all that bad. Here's an example. (I changed the values of the 'command' keys to strings.)
def delete(data, key):
    # Remove the key at this level; no error if it is absent
    data.pop(key, None)
    # Recurse into every nested dictionary
    for k, v in data.items():
        if isinstance(v, dict):
            delete(v, key)
print(data)  # before
delete(data, 'command')
print(data)  # after
{'id': 1, 'name': 'Option 1', 'command': 'do_something', 'sub_opt': {'id': 10, 'name': 'Sub Option', 'command': 'do_something_more', 'sub_sub_opt': {'id': 100, 'name': 'Sub Sub Option', 'command': 'do_something_crazy'}}}
{'id': 1, 'name': 'Option 1', 'sub_opt': {'id': 10, 'name': 'Sub Option', 'sub_sub_opt': {'id': 100, 'name': 'Sub Sub Option'}}}
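If recursion really has to be avoided, the same walk can be done with an explicit stack. This is only a sketch: the name delete_key and the sample menu dict are made up for illustration, and it also descends into lists, which the recursive version above does not.

```python
def delete_key(data, key):
    """Remove `key` from `data` and from every nested dict, without recursion."""
    stack = [data]  # containers (and plain values) still to be visited
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            current.pop(key, None)
            stack.extend(current.values())
        elif isinstance(current, list):
            stack.extend(current)
        # anything else (int, str, function, ...) is simply skipped
    return data

# Hypothetical sample, shaped like the mock dict in the question
menu = {
    'id': 1,
    'command': 'do_something',
    'sub_opt': {
        'id': 10,
        'command': 'do_something_more',
        'items': [{'id': 100, 'command': 'do_something_crazy'}],
    },
}
delete_key(menu, 'command')
print(menu)
```

The stack grows with the size of the data rather than with the call depth, so very deep nesting cannot hit Python's recursion limit.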

Filter python dictionary with dictionary-comprehension

I have a dictionary that is really a geojson:
points = {
'crs': {'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}, 'type': 'name'},
'features': [
{'geometry': {
'coordinates':[[[-3.693162104185235, 40.40734504903418],
[-3.69320229317164, 40.40719570724241],
[-3.693227952841606, 40.40698546120488],
[-3.693677594635894, 40.40712700492216]]],
'type': 'Polygon'},
'properties': {
'name': 'place1',
'temp': 28},
'type': 'Feature'
},
{'geometry': {
'coordinates': [[[-3.703886381691941, 40.405197271972035],
[-3.702972834622821, 40.40506272989243],
[-3.702552994966045, 40.40506798079752],
[-3.700985024825222, 40.405500820623814]]],
'type': 'Polygon'},
'properties': {
'name': 'place2',
'temp': 27},
'type': 'Feature'
},
{'geometry': {
'coordinates': [[[-3.703886381691941, 40.405197271972035],
[-3.702972834622821, 40.40506272989243],
[-3.702552994966045, 40.40506798079752],
[-3.700985024825222, 40.405500820623814]]],
'type': 'Polygon'},
'properties': {
'name': 'place',
'temp': 25},
'type': 'Feature'
}
],
'type': u'FeatureCollection'
}
I would like to filter it to keep only the places that have a specific temperature, for example, more than 25 degrees Celsius.
I have managed to do it this way:
dict(crs = points["crs"],
features = [i for i in points["features"] if i["properties"]["temp"] > 25],
type = points["type"])
But I wondered if there was any way to do it more directly, with a dictionary comprehension.
Thank you very much.
I'm very late. A dict comprehension won't help you, since you have only three keys. But if you meet the following conditions: 1. you don't need a copy of features (e.g. your dict is read-only); 2. you don't need index access to features, you may use a generator expression instead of a list comprehension:
dict(crs = points["crs"],
features = (i for i in points["features"] if i["properties"]["temp"] > 25),
type = points["type"])
Creating the generator takes constant time, while building the list takes O(n). Furthermore, if you create a lot of those dicts, you have only one copy of the features in memory. Keep in mind that a generator can be iterated only once.
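For completeness, here is roughly what a dict comprehension version could look like. It is a sketch, not necessarily an improvement over the dict(...) call: the function name filter_features and the trimmed-down sample (geometry omitted) are made up for illustration.

```python
# Trimmed stand-in for the GeoJSON in the question (geometry left out)
collection = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'name': 'place1', 'temp': 28}},
        {'type': 'Feature', 'properties': {'name': 'place2', 'temp': 27}},
        {'type': 'Feature', 'properties': {'name': 'place', 'temp': 25}},
    ],
}

def filter_features(geojson, min_temp):
    """Copy every top-level key as is, except 'features', which is filtered."""
    return {
        key: [f for f in value if f['properties']['temp'] > min_temp]
        if key == 'features' else value
        for key, value in geojson.items()
    }

warm = filter_features(collection, 25)
print([f['properties']['name'] for f in warm['features']])
```

The original dict is left untouched; only the 'features' list is rebuilt, and the other values are shared by reference.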

Updating complex JSON object in Python

I am grabbing a fairly complex MongoDB document with Python (v3.5). I need to update some values in it, which are scattered all around the object with no particular pattern in the structure, and then save it to a different MongoDB collection. The object looks like this:
# after json.loads(mongo_db_document) my dict looks like this
notification = {
'_id': '570f934f45213b0d14b1256f',
'key': 'receipt',
'label': 'Delivery Receipt',
'version': '0.0.1',
'active': True,
'children': [
{
'key': 'started',
'label': 'Started',
'children': [
'date',
'time',
'offset'
]
},
{
'key': 'stop',
'label': 'Ended',
'children': [
'date',
'time',
'offset'
]
},
{
'label': '1. Particulars',
'template': 'formGroup',
'children': [
{
'children': [
{
'key': 'name',
'label': '2.1 Name',
'value': '********** THIS SHOULD BE UPDATED **********',
'readonly': 'true'
},
{
'key': 'ims_id',
'label': '2.2 IMS Number',
'value': '********** THIS SHOULD BE UPDATED **********',
'readonly': 'true'
}
]
},
{
'children': [
{
'key': 'type',
'readonly': '********** THIS SHOULD BE UPDATED **********',
'label': '2.3 Type',
'options': [
{
'label': 'Passenger',
'value': 'A37'
},
{
'label': 'Cargo',
'value': 'A35'
},
{
'label': 'Other',
'value': '********** THIS SHOULD BE UPDATED **********'
}
]
}
]
}
]
},
{
'template': 'formGroup',
'key': 'waste',
'label': '3. Waste',
'children': [
{
'label': 'Waste',
'children': [
{
'label': 'Plastics',
'key': 'A',
'inputType': 'number',
'inputAttributes': {
'min': 0
},
'value': '********** THIS SHOULD BE UPDATED **********'
},
{
'label': 'B. Oil',
'key': 'B',
'inputType': 'number',
'inputAttributes': {
'min': 0
},
'value': '********** THIS SHOULD BE UPDATED **********'
},
{
'label': 'C. Operational',
'key': 'C',
'inputType': 'number',
'inputAttributes': {
'min': 0
},
'value': '********** THIS SHOULD BE UPDATED **********'
}
]
}
]
},
{
'template': 'formRow',
'children': [
'empty',
'signature'
]
}
],
'filter': {
'timestamp_of_record': [
'date',
'time',
'offset'
]
}
}
My initial idea was to put placeholders (like $var_name) in the places where I need to update values and load the string with Python's string.Template, but that approach unfortunately breaks lots of stuff for other users of the same MongoDB document for some reason.
Is there a way to simply modify this kind of object without hardcoding the paths to the values I need to update?
There's this small script that I wrote a couple of years ago; I used it to find entries in some very long and unnerving JSONs. Admittedly it's not beautiful, but it might help in your case, perhaps?
You can find the script on Bitbucket, here (and here is the code).
Unfortunately it's not documented; at the time I didn't really believe other people would use it, I guess.
Anyway, if you'd like to try it, save the script in your working directory and then use something like this:
from RecursiveSearch import Retriever
def alter_data(json_data, key, original, newval):
    '''
    Alter *all* values of said keys
    '''
    retr = Retriever(json_data)
    for item_no, item in enumerate(retr.__track__(key)):  # i.e. all 'value'
        # Pick parent objects with a last element False in the __track__() result,
        # indicating that `key` is either a dict key or a set element
        if not item[-1]:
            parent = retr.get_parent(key, item_no)
            try:
                if parent[key] == original:
                    parent[key] = newval
            except TypeError:
                # It's a set, this is not the key you're looking for
                pass
if __name__ == '__main__':
    alter_data(notification, key='value',
               original='********** THIS SHOULD BE UPDATED **********',
               newval='*UPDATED*')
Unfortunately as I said the script isn't well documented, so if you want to try it and need more info, I'll be glad to provide it.
Not sure if I understood correctly, but this will dynamically find all keys "value" and "readonly" and print out the paths to address the fields.
def findem(data, trail):
    if isinstance(data, dict):
        for k in data.keys():
            if k in ('value', 'readonly'):
                print("{}['{}']".format(trail, k))
            else:
                findem(data[k], "{}['{}']".format(trail, k))
    elif isinstance(data, list):
        # enumerate() gives the real position even when the list holds equal items
        for i, item in enumerate(data):
            findem(item, '{}[{}]'.format(trail, i))
if __name__ == '__main__':
    findem(notification, 'notification')
notification['children'][2]['children'][0]['children'][0]['readonly']
notification['children'][2]['children'][0]['children'][0]['value']
notification['children'][2]['children'][0]['children'][1]['readonly']
notification['children'][2]['children'][0]['children'][1]['value']
notification['children'][2]['children'][1]['children'][0]['readonly']
notification['children'][2]['children'][1]['children'][0]['options'][0]['value']
notification['children'][2]['children'][1]['children'][0]['options'][1]['value']
notification['children'][2]['children'][1]['children'][0]['options'][2]['value']
notification['children'][3]['children'][0]['children'][0]['value']
notification['children'][3]['children'][0]['children'][1]['value']
notification['children'][3]['children'][0]['children'][2]['value']
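The same walk can update the fields instead of printing their paths. A minimal sketch, assuming the values to change are marked by a known placeholder string; the function name replace_values and the small form dict are made up for illustration.

```python
PLACEHOLDER = '********** THIS SHOULD BE UPDATED **********'

def replace_values(data, keys, old, new):
    """Walk dicts and lists in place; where a key in `keys` holds `old`, store `new`.

    Returns the number of replacements made.
    """
    count = 0
    if isinstance(data, dict):
        for k, v in data.items():
            if k in keys and v == old:
                data[k] = new  # reassigning an existing key is safe while iterating
                count += 1
            else:
                count += replace_values(v, keys, old, new)
    elif isinstance(data, list):
        for item in data:
            count += replace_values(item, keys, old, new)
    return count

# Hypothetical cut-down version of the document in the question
form = {
    'label': '1. Particulars',
    'children': [
        {'key': 'name', 'value': PLACEHOLDER, 'readonly': 'true'},
        {'key': 'type', 'readonly': PLACEHOLDER,
         'options': [{'label': 'Cargo', 'value': 'A35'},
                     {'label': 'Other', 'value': PLACEHOLDER}]},
    ],
}
changed = replace_values(form, ('value', 'readonly'), PLACEHOLDER, '*UPDATED*')
print(changed)
```

In a real document each field would likely need its own new value, so a mapping from the sibling 'key' entry to the replacement may fit better than a single constant.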
Add another list to the JSON object. Each item in that list would be a list of keys that leads to a value to be changed. An example of one such list is: ['children', 2, 'children', 0, 'children', 0, 'value'].
Then, to access the value you could use a loop:
def change(obj, path, newVal):
    # Walk down to the container that holds the final key
    cur = obj
    for key in path[:-1]:
        cur = cur[key]
    cur[path[-1]] = newVal
path = notification['paths'][0]
# path, for example, could be ['children', 2, 'children', 0, 'children', 0, 'value']
newVal = 'what ever you want'
change(notification, path, newVal)
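The path-walking step can also be written with functools.reduce and operator.getitem. This is just a sketch on a tiny made-up document: set_by_path and doc are hypothetical names, and the 'paths' field is the extra list suggested above, not something the original document contains.

```python
from functools import reduce
from operator import getitem

def set_by_path(obj, path, new_value):
    """Follow all but the last key of `path`, then assign at the last key."""
    parent = reduce(getitem, path[:-1], obj)
    parent[path[-1]] = new_value

# Hypothetical miniature document carrying its own list of paths
doc = {
    'children': [
        {'key': 'started'},
        {'children': [{'key': 'name', 'value': 'old'}]},
    ],
    'paths': [['children', 1, 'children', 0, 'value']],
}
for path in doc['paths']:
    set_by_path(doc, path, 'new')
print(doc['children'][1]['children'][0])
```

Note that list positions in a path are integers and dict keys are strings, so a path breaks as soon as someone reorders a 'children' list.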

Mongo Distinct Query with full row object

First of all, I'm new to Mongo, so I don't know much, and I cannot just remove the duplicate rows because of some dependencies.
I have the following data stored in Mongo:
{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 2, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'},
{'id': 5, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}
As you can see, some of the rows are duplicates with different ids.
Since fixing this on the input side will take a long time, I must handle it on output.
I need the data in the following form:
{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}
My query
keys = db.collection.distinct('key', {})
all_data = db.collection.find({'key': {'$in': keys}})
As you can see, it takes two queries to get one result set. Please help me combine them into one, as the database is very large.
I might also create a unique key on the key field, but the value is so long (152 characters) that it will not help me.
Or will it?
You need to use the aggregation framework for this. There are multiple ways to do it; the solution below uses the $$ROOT variable to get the first document for each group:
db.data.aggregate([{
"$sort": {
"_id": 1
}
}, {
"$group": {
"_id": "$key",
"first": {
"$first": "$$ROOT"
}
}
}, {
"$project": {
"_id": 0,
"id":"$first.id",
"key":"$first.key",
"name":"$first.name",
"country":"$first.country"
}
}])
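To make it clearer what the pipeline computes, here is a plain-Python equivalent of "sort, group by key, keep the first document". It is only an illustration on shortened stand-in rows (the key strings are abbreviated and first_per_key is a made-up name); on a large collection the work should stay in the aggregation on the server.

```python
# Shortened stand-ins for the rows in the question
rows = [
    {'id': 1, 'key': 'qscderftgbv', 'name': 'some name', 'country': 'US'},
    {'id': 2, 'key': 'qscderftgbv', 'name': 'some name', 'country': 'US'},
    {'id': 3, 'key': 'pehnvosjiji', 'name': 'some name', 'country': 'IN'},
    {'id': 4, 'key': 'pfvvjwovnew', 'name': 'some name', 'country': 'IN'},
    {'id': 5, 'key': 'pfvvjwovnew', 'name': 'some name', 'country': 'IN'},
]

def first_per_key(docs, field, order_by):
    """Keep the first document for each distinct value of `field`."""
    firsts = {}
    for doc in sorted(docs, key=lambda d: d[order_by]):  # like $sort
        firsts.setdefault(doc[field], doc)               # like $group + $first
    return list(firsts.values())

unique_rows = first_per_key(rows, field='key', order_by='id')
print([row['id'] for row in unique_rows])
```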

How to make one-line choices generator from nested dictionaries?

I have a structure like this:
actions = {
'cat1': {
'visit': {
'id': 1,
'description': 'Desc 1',
'action': 'Act 1',
},
},
'cat2': {
'download': {
'id': 2,
'description': 'Desc 2',
'action': 'Act 2',
},
'click': {
'id': 3,
'description': 'Desc 3',
'action': 'Act 3',
},
...
},
...
}
And the following code generates the choices for a Django choice field:
CHOICES = []
for a in actions.values():
    for c in a.values():
        CHOICES.append((c['id'], c['description']))
Is it possible to write the above code as a one-line nested for loop?
CHOICES = [(c['id'], c['description']) for a in actions.values() for c in a.values()]
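With map and reduce (on Python 3, reduce lives in functools, and the dict views must be converted to lists before they can be concatenated):
from functools import reduce
list(map(lambda x: [x['id'], x['description']], reduce(lambda x, y: x + list(y.values()), actions.values(), [])))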
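Another option is itertools.chain.from_iterable, which flattens the inner dicts without building intermediate lists. A sketch on a filled-in copy of the structure; only the three entries visible in the question are included, and the result is wrapped in tuple() on the assumption that a tuple of tuples is wanted.

```python
from itertools import chain

actions = {
    'cat1': {
        'visit': {'id': 1, 'description': 'Desc 1', 'action': 'Act 1'},
    },
    'cat2': {
        'download': {'id': 2, 'description': 'Desc 2', 'action': 'Act 2'},
        'click': {'id': 3, 'description': 'Desc 3', 'action': 'Act 3'},
    },
}

# Flatten every category's entries into one stream, then pick the two fields
CHOICES = tuple(
    (entry['id'], entry['description'])
    for entry in chain.from_iterable(group.values() for group in actions.values())
)
print(CHOICES)
```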
