Using Python requests, I want to grab a piece of JSON from one source and post it to a destination. However, the structure of the JSON I receive differs a bit from the one the destination requires, so my question is: how do I best map the items from the source structure onto the destination structure?
To illustrate, imagine we get a list of all purchases made by John and Mary, and we now want to post the individual items purchased, linking each one to the person who bought it. (Note: the actual use case involves thousands of entries, so I am looking for an approach that scales accordingly.)
Source JSON:
{
'Total Results': 2,
'Results': [
{
'Name': 'John',
'Age': 25,
'Purchases': [
{
'Fruits': {
'Type': 'Apple',
'Quantity': 3,
'Color': 'Red'}
},
{
'Veggie': {
'Type': 'Salad',
'Quantity': 2,
'Color': 'Green'
}
}
]
},
{
'Name': 'Mary',
'Age': 20,
'Purchases': [
{
'Fruits': {
'Type': 'Orange',
'Quantity': 2,
'Color': 'Orange'
}
}
]
}
]
}
Destination JSON:
[
{
'Purchase': 'Apple',
'Purchased by': 'John',
'Quantity': 3,
'Type': 'Red',
},
{
'Purchase': 'Salad',
'Purchased by': 'John',
'Quantity': 2,
'Type': 'Green',
},
{
'Purchase': 'Orange',
'Purchased by': 'Mary',
'Quantity': 2,
'Type': 'Orange',
}
]
Any help on this would be greatly appreciated! Cheers!
Just loop through the dict:
from pprint import pprint

res = []
for result in d['Results']:  # d is the source JSON shown above
    for purchase in result['Purchases']:
        # each purchase is a single-key dict ('Fruits' or 'Veggie'); grab its value
        item = list(purchase.values())[0]
        value = {}  # build a fresh dict per purchase
        value['Purchase'] = item['Type']
        value['Purchased by'] = result['Name']
        value['Quantity'] = item['Quantity']
        value['Type'] = item['Color']
        res.append(value)

pprint(res)
[{'Purchase': 'Apple', 'Purchased by': 'John', 'Quantity': 3, 'Type': 'Red'},
{'Purchase': 'Salad', 'Purchased by': 'John', 'Quantity': 2, 'Type': 'Green'},
{'Purchase': 'Orange', 'Purchased by': 'Mary', 'Quantity': 2, 'Type': 'Orange'}]
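Since the question also mentions fetching the source JSON and posting the result with requests, here is a minimal sketch of that round trip; the URLs are placeholders and the endpoints are assumed to accept and return JSON:
import requests

SOURCE_URL = "https://example.com/api/purchases"    # placeholder
DESTINATION_URL = "https://example.com/api/items"   # placeholder

# Fetch the source payload
resp = requests.get(SOURCE_URL, timeout=30)
resp.raise_for_status()
d = resp.json()

# Transform with the same loop as above
res = []
for result in d['Results']:
    for purchase in result['Purchases']:
        item = list(purchase.values())[0]
        res.append({
            'Purchase': item['Type'],
            'Purchased by': result['Name'],
            'Quantity': item['Quantity'],
            'Type': item['Color'],
        })

# Post the transformed list; json= serialises it and sets the Content-Type header
post = requests.post(DESTINATION_URL, json=res, timeout=30)
post.raise_for_status()
With thousands of entries the transformation itself is a single linear pass, so it scales fine; if the destination API supports it, posting in batches keeps individual requests small.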
Related
I want to parse this nested JSON using Pandas, and I'm stuck on how to extract the data from the "amount" and "items" columns. The data has hundreds of rows; this is one example:
{
"_id": "62eaa99b014c9bb30203e48a",
"amount": {
"product": 291000,
"shipping": 75000,
"admin_fee": 4500,
"order_voucher_deduction": 0,
"transaction_voucher_deduction": 0,
"total": 366000,
"paid": 366000
},
"status": 32,
"items": [
{
"_id": "62eaa99b014c9bb30203e48d",
"earning": 80400,
"variants": [
{
"name": "Color",
"value": "Black"
},
{
"name": "Size",
"value": "38"
}
],
"marketplace_price": 65100,
"product_price": 62000,
"reseller_price": 145500,
"product_id": 227991,
"name": "Heels",
"sku_id": 890512,
"internal_markup": 3100,
"weight": 500,
"image": "https://product-asset.s3.ap-southeast-1.amazonaws.com/1659384575578.jpeg",
"quantity": 1,
"supplier_price": 60140
}
I've tried the following, but it only shows the index:
import pandas as pd

dfjson = pd.json_normalize(datasetjson)
dfjson.head(3)
##UPDATE
I tried adding pd.DataFrame and it does produce a DataFrame, but I still haven't figured out how to extract _id, earning, and variants.
Given:
data = {
'_id': '62eaa99b014c9bb30203e48a',
'amount': {'admin_fee': 4500,
'order_voucher_deduction': 0,
'paid': 366000,
'product': 291000,
'shipping': 75000,
'total': 366000,
'transaction_voucher_deduction': 0},
'items': [{'_id': '62eaa99b014c9bb30203e48d',
'earning': 80400,
'image': 'https://product-asset.s3.ap-southeast-1.amazonaws.com/1659384575578.jpeg',
'internal_markup': 3100,
'marketplace_price': 65100,
'name': 'Heels',
'product_id': 227991,
'product_price': 62000,
'quantity': 1,
'reseller_price': 145500,
'sku_id': 890512,
'supplier_price': 60140,
'variants': [{'name': 'Color', 'value': 'Black'},
{'name': 'Size', 'value': '38'}],
'weight': 500}],
'status': 32
}
Doing:
import pandas as pd

df = pd.json_normalize(data, ['items'], ['amount'])
df = df.join(df.amount.apply(pd.Series))
df = df.join(df.variants.apply(pd.DataFrame)[0].set_index('name').T.reset_index(drop=True))
df = df.drop(['amount', 'variants'], axis=1)
print(df)
Output:
_id earning marketplace_price product_price reseller_price product_id name sku_id internal_markup weight image quantity supplier_price product shipping admin_fee order_voucher_deduction transaction_voucher_deduction total paid Color Size
0 62eaa99b014c9bb30203e48d 80400 65100 62000 145500 227991 Heels 890512 3100 500 https://product-asset.s3.ap-southeast-1.amazon... 1 60140 291000 75000 4500 0 0 366000 366000 Black 38
There's probably a better way to do some of this, but the sample provided wasn't even a valid json object, so I can't be sure what the real data actually looks like.
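As a possibly simpler variant, if only a few of the amount fields are needed they can be pulled in directly through nested meta paths, which skips the extra apply(pd.Series) step; a sketch assuming you only want total, paid and shipping:
import pandas as pd

df = pd.json_normalize(
    data,
    record_path=['items'],
    meta=[['amount', 'total'], ['amount', 'paid'], ['amount', 'shipping']],
)
print(df[['_id', 'earning', 'amount.total', 'amount.paid', 'amount.shipping']])
The variants column still holds the raw list of dicts and would need the same unpacking as above if you want Color/Size columns.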
Try pd.json_normalize(datasetjson, max_level=0)
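With max_level=0 only the top level is flattened, so amount and items stay as single columns whose cells hold the nested dict/list; roughly:
import pandas as pd

dfjson = pd.json_normalize(datasetjson, max_level=0)
print(dfjson.columns.tolist())        # expected: ['_id', 'amount', 'status', 'items']
print(type(dfjson.loc[0, 'amount']))  # the nested dict is kept as-is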
I think you are confusing working with dictionaries and working with the JSON format.
The line below is the same sample you posted but with the missing ]} added at the end. I reformatted it by removing the blank spaces, but it is otherwise the same:
dfjson = {"_id":"62eaa99b014c9bb30203e48a","amount":{"product":291000,"shipping":75000,"admin_fee":4500,"order_voucher_deduction":0,"transaction_voucher_deduction":0,"total":366000,"paid":366000},"status":32,"items":[{"_id":"62eaa99b014c9bb30203e48d","earning":80400,"variants":[{"name":"Color","value":"Black"},{"name":"Size","value":"38"}],"marketplace_price":65100,"product_price":62000,"reseller_price":145500,"product_id":227991,"name":"Heels","sku_id":890512,"internal_markup":3100,"weight":500,"image":"https://product-asset.s3.ap-southeast-1.amazonaws.com/1659384575578.jpeg","quantity":1,"supplier_price":60140}]}
Now, if you want to call amount:
dfjson['amount']
# Output
{'product': 291000,
'shipping': 75000,
'admin_fee': 4500,
'order_voucher_deduction': 0,
'transaction_voucher_deduction': 0,
'total': 366000,
'paid': 366000}
If you want to call items:
dfjson['items']
# Output
[{'_id': '62eaa99b014c9bb30203e48d',
'earning': 80400,
'variants': [{'name': 'Color', 'value': 'Black'},
{'name': 'Size', 'value': '38'}],
'marketplace_price': 65100,
'product_price': 62000,
'reseller_price': 145500,
'product_id': 227991,
'name': 'Heels',
'sku_id': 890512,
'internal_markup': 3100,
'weight': 500,
'image': 'https://product-asset.s3.ap-southeast-1.amazonaws.com/1659384575578.jpeg',
'quantity': 1,
'supplier_price': 60140}]
To collect the items, you can build a list:
list_items = []
for i in dfjson['items']:
    list_items.append(i)
For how to import the entire JSON data into a pandas DataFrame, check out the answer given by BeRT2me.
But if you are only after extracting _id, earning, and variants into a Pandas DataFrame, giving:
_id _id_id earning variants
0 62eaa99b014c9bb30203e48a 62eaa99b014c9bb30203e48d 80400 [{'name': 'Color', 'value': 'Black'}, {'name':...
as you state in your question:
but I still haven't got to know how to extract the _id, earning, variants
Notice that the problem with extracting _id, earning, and variants is that these values are 'hidden' inside a single-item list. Indexing that list with [0] gives the required values:
import ast

json_text = """\
{'_id': '62eaa99b014c9bb30203e48a',
'amount': {'admin_fee': 4500,
'order_voucher_deduction': 0,
'paid': 366000,
'product': 291000,
'shipping': 75000,
'total': 366000,
'transaction_voucher_deduction': 0},
'items': [{'_id': '62eaa99b014c9bb30203e48d',
'earning': 80400,
'image': 'https://product-asset.s3.ap-southeast-1.amazonaws.com/1659384575578.jpeg',
'internal_markup': 3100,
'marketplace_price': 65100,
'name': 'Heels',
'product_id': 227991,
'product_price': 62000,
'quantity': 1,
'reseller_price': 145500,
'sku_id': 890512,
'supplier_price': 60140,
'variants': [{'name': 'Color', 'value': 'Black'},
{'name': 'Size', 'value': '38'}],
'weight': 500}],
 'status': 32}"""

json_dict = ast.literal_eval(json_text)  # literal_eval is safer than eval() for a dict literal
print(f'{(_id := json_dict["items"][0]["_id"])=}')
print(f'{(earning := json_dict["items"][0]["earning"])=}')
print(f'{(variants := json_dict["items"][0]["variants"])=}')
print('---')
print(f'{_id=}')
print(f'{earning=}')
print(f'{variants=}')
gives:
(_id := json_dict["items"][0]["_id"])='62eaa99b014c9bb30203e48d'
(earning := json_dict["items"][0]["earning"])=80400
(variants := json_dict["items"][0]["variants"])=[{'name': 'Color', 'value': 'Black'}, {'name': 'Size', 'value': '38'}]
---
_id='62eaa99b014c9bb30203e48d'
earning=80400
variants=[{'name': 'Color', 'value': 'Black'}, {'name': 'Size', 'value': '38'}]
If, in addition, you want a Pandas DataFrame whose rows hold all these extracted values, you can loop over all your JSON data files, adding a row to the DataFrame each time:
# Create an empty DataFrame:
df = pd.DataFrame(columns=['_id', '_id_id', 'earning', 'variants'])
# Add rows to df in a loop processing the json data files:
df_to_append = pd.DataFrame(
[[json_dict['_id'], _id, earning, variants]],
columns=['_id', '_id_id', 'earning', 'variants']
)
df = pd.concat([df, df_to_append], ignore_index=True)  # DataFrame.append was removed in pandas 2.x
pd.set_option('display.max_columns', None)
print(df.to_dict())
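Building the DataFrame row by row gets slow once there are hundreds of files, so an alternative sketch (assuming you iterate over the parsed dicts somehow) is to collect plain dicts in a list and construct the DataFrame once at the end:
import pandas as pd

rows = []
for jd in (json_dict,):          # replace with a loop over all your parsed json data files
    item = jd['items'][0]
    rows.append({
        '_id': jd['_id'],
        '_id_id': item['_id'],
        'earning': item['earning'],
        'variants': item['variants'],
    })

df = pd.DataFrame(rows, columns=['_id', '_id_id', 'earning', 'variants'])
print(df.to_dict())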
I want to create an Avro schema for the following Python dictionary:
d = {
'topic': 'example',
'content': (
{ 'description': {'name': 'alex', 'value': 12}, 'id': '234ba' },
{ 'description': {'name': 'john', 'value': 14}, 'id': '823cx' }
)
}
How can I do this?
Have you tried the default serialization and deserialization included in the avro library for Python?
https://avro.apache.org/docs/1.10.0/gettingstartedpython.html
Check whether that is what you want.
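For reference, a schema along these lines could describe that dictionary, modelling content as an array of records; the record and field names here are just read off the sample, so treat it as a sketch rather than a definitive schema:
import json
import avro.schema  # the avro package from PyPI

schema_dict = {
    "type": "record",
    "name": "Message",
    "fields": [
        {"name": "topic", "type": "string"},
        {"name": "content", "type": {
            "type": "array",
            "items": {
                "type": "record",
                "name": "ContentItem",
                "fields": [
                    {"name": "id", "type": "string"},
                    {"name": "description", "type": {
                        "type": "record",
                        "name": "Description",
                        "fields": [
                            {"name": "name", "type": "string"},
                            {"name": "value", "type": "int"},
                        ],
                    }},
                ],
            },
        }},
    ],
}

# parse() validates the schema, as in the linked getting-started guide
schema = avro.schema.parse(json.dumps(schema_dict))
Note that content is a tuple in the Python dict; Avro treats it as an array, so converting it to a list before serialising may be needed depending on the writer.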
I have JSON from a service, where each value is a separate row.
Input example:
[
{'author': 'alf', 'topic': 'topic1', 'lang': 'ge', 'value': 11},
{'author': 'alf', 'topic': 'topic1', 'lang': 'ge', 'value': 22},
{'author': 'bob', 'topic': 'topic1', 'lang': 'ge', 'value': 33},
{'author': 'bob', 'topic': 'topic1', 'lang': 'ge', 'value': 44},
{'author': 'alf', 'topic': 'topic1', 'lang': 'fr', 'value': 99},
{'author': 'alf', 'topic': 'topic2', 'lang': 'ge', 'value': -20},
]
Output example:
{
'alf': {
'topic1': [
{'ge': [11, 22]},
{'fr': [99]}
],
'topic2': [
{'ge': [-20]}
]
},
'bob': {
'topic1': [
{'ge': [33, 44]}
]
}
}
So basically this is a simple transformation: group by the specified keys and collect all values into one array.
I did this transformation by checking for each required key and creating it if it is missing:
parsed = {}
for entry in self._raw_data:
    author = entry["author"]
    topic = entry["topic"]
    lang = entry["lang"]
    value = entry["value"]
    if not parsed.get(author):
        parsed[author] = {}
    if not parsed[author].get(topic):
        parsed[author][topic] = []
    # etc.
I am sure this could be done in a more transparent way. Can anyone recommend something?
If you're willing to change the type of "topic"'s value from list to dict, you can use .setdefault():
res = {}
for entry in raw_data:
    (res.setdefault(entry['author'], {})
        .setdefault(entry['topic'], {})
        .setdefault(entry['lang'], [])
        .append(entry['value']))
OUTPUT:
{
"alf": {
"topic1": {
"fr": [99],
"ge": [11, 22]
},
"topic2": {
"ge": [-20]
}
},
"bob": {
"topic1": {
"ge": [33, 44]
}
}
}
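An equivalent sketch with collections.defaultdict avoids the repeated .setdefault() calls and can read a little clearer:
import json
from collections import defaultdict

# author -> topic -> lang -> [values]
res = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
for entry in raw_data:
    res[entry['author']][entry['topic']][entry['lang']].append(entry['value'])

# defaultdict is a dict subclass, so it serialises just like the output above
print(json.dumps(res, indent=2))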
First of all, I'm new to Mongo, so I don't know much, and I cannot just remove the duplicate rows due to some dependencies.
I have the following data stored in Mongo:
{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 2, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'},
{'id': 5, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}
You can see that some of the rows are duplicates with different ids.
Fixing this on the input side will take a while, so for now I have to tackle it on the output side.
I need the data in the following form:
{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}
My query:
keys = db.collection.distinct('key', {})
all_data = db.collection.find({'key': {'$in': keys}})
As you can see, it takes two queries to get the same result set. Please help me combine them into one, as the database is very large.
I might also create a unique index on key, but the value is so long (152 characters) that I doubt it will help me.
Or will it?
You need to use the aggregation framework for this. There are multiple ways to do it; the solution below uses the $$ROOT variable to get the first document for each group:
db.data.aggregate([{
"$sort": {
"_id": 1
}
}, {
"$group": {
"_id": "$key",
"first": {
"$first": "$$ROOT"
}
}
}, {
"$project": {
"_id": 0,
"id":"$first.id",
"key":"$first.key",
"name":"$first.name",
"country":"$first.country"
}
}])
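Since the surrounding snippets look like Python, the same pipeline could be run through pymongo roughly like this (the connection, database and collection names are assumptions):
from pymongo import MongoClient

client = MongoClient()                # connection details assumed
collection = client['mydb']['data']   # database/collection names assumed

pipeline = [
    {"$sort": {"_id": 1}},
    {"$group": {"_id": "$key", "first": {"$first": "$$ROOT"}}},
    {"$project": {
        "_id": 0,
        "id": "$first.id",
        "key": "$first.key",
        "name": "$first.name",
        "country": "$first.country",
    }},
]

for doc in collection.aggregate(pipeline):
    print(doc)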
If I have a dictionary that contains lists, dictionaries, strings, and integers, how can I copy that dictionary, keeping only specific whitelisted keys?
Filtering old_dict through such a function would give me new_dict. The whitelisted keys can be in whatever format, but the solution should be able to take an arbitrary collection of whitelisted keys and produce the output.
old_dict = [
{
'name': {'first': "John", 'last': "Doe"},
'groups': ["foo", "bar"],
'widgets': [
{'id': 0, 'name': "Acme"},
{'id': 1, 'name': "Anvil"},
],
},
{
'name': {'first': "David"},
'groups': ["bar", "bash", "ding"],
'widgets': [
{'id': 1, 'name': "Anvil"},
{'id': 8, 'name': "Bingo"},
],
},
]
new_dict = [
{
'name': {'last': "Doe"},
'widgets': [
{'name': "Acme"},
{'name': "Anvil"},
],
},
{
'name': { },
'widgets': [
{'name': "Anvil"},
{'name': "Bingo"},
],
},
]
new_dict = []
for dic in old_dict:
    filtered = {}
    for key in white_list:  # list of whitelisted keys
        if key in dic:
            filtered[key] = dic[key]
    new_dict.append(filtered)
Which you can write as a comprehension (note that this only filters the top-level keys):
new_dict = [
    {key: dic[key] for key in white_list if key in dic}
    for dic in old_dict
]
Make a recursive function that handles each type and filters the keys whenever it sees a dict:
def keep_only(keys, val):
if type(val) is list:
return [keep_only(keys, v) for v in val]
if type(val) is dict:
return dict((k, keep_only(keys, v)) for k,v in val.items() if k in keys)
return val
Sample usage:
>>> old_dict = [
{
'name': {'first': "John", 'last': "Doe"},
'groups': ["foo", "bar"],
'widgets': [
{'id': 0, 'name': "Acme"},
{'id': 1, 'name': "Anvil"},
],
},
{
'name': {'first': "David"},
'groups': ["bar", "bash", "ding"],
'widgets': [
{'id': 1, 'name': "Anvil"},
{'id': 8, 'name': "Bingo"},
],
},
]
>>> keep_only(set(['name', 'widgets', 'last']), old_dict)
[{'widgets': [{'name': 'Acme'}, {'name': 'Anvil'}], 'name': {'last': 'Doe'}}, {'widgets': [{'name': 'Anvil'}, {'name': 'Bingo'}], 'name': {}}]
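If you want per-level control rather than one global key set (the question leaves the whitelist format open), the same recursion can be driven by a nested spec; keep_paths and the spec layout below are made up for illustration:
def keep_paths(spec, val):
    """Filter using a nested whitelist: spec maps a key either to True
    (keep the whole value) or to another spec applied to that value."""
    if isinstance(val, list):
        return [keep_paths(spec, v) for v in val]
    if isinstance(val, dict):
        out = {}
        for k, v in val.items():
            if k in spec:
                out[k] = v if spec[k] is True else keep_paths(spec[k], v)
        return out
    return val

spec = {'name': {'last': True}, 'widgets': {'name': True}}
keep_paths(spec, old_dict)
# -> [{'name': {'last': 'Doe'}, 'widgets': [{'name': 'Acme'}, {'name': 'Anvil'}]},
#     {'name': {}, 'widgets': [{'name': 'Anvil'}, {'name': 'Bingo'}]}]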