Create avro schema for python dict - python

I want to create an Avro schema for the following Python dictionary:
d = {
    'topic': 'example',
    'content': (
        {'description': {'name': 'alex', 'value': 12}, 'id': '234ba'},
        {'description': {'name': 'john', 'value': 14}, 'id': '823cx'}
    )
}
How can I do this?

Have you tried the default serialization and deserialization included in the Avro library for Python?
https://avro.apache.org/docs/1.10.0/gettingstartedpython.html
Check whether that does what you want.
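For reference, a schema matching the dictionary in the question could look roughly like the sketch below. The record names (Message, ContentItem, Description) are hypothetical, since the question does not name them, and the field types are inferred from the sample values only. The schema is built as a plain Python dict and dumped to JSON, so nothing beyond the standard library is needed to produce it.

```python
import json

# Hypothetical record names: Avro requires a name for every record type,
# but the question does not specify any.
schema = {
    "type": "record",
    "name": "Message",
    "fields": [
        {"name": "topic", "type": "string"},
        {
            "name": "content",
            # The tuple of dicts maps to an Avro array of records.
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "ContentItem",
                    "fields": [
                        {
                            "name": "description",
                            "type": {
                                "type": "record",
                                "name": "Description",
                                "fields": [
                                    {"name": "name", "type": "string"},
                                    {"name": "value", "type": "int"},
                                ],
                            },
                        },
                        {"name": "id", "type": "string"},
                    ],
                },
            },
        },
    ],
}

# JSON text that can be saved as an .avsc file.
schema_json = json.dumps(schema, indent=2)
```

The resulting JSON can be handed to the schema parser described in the guide linked above. Depending on the library version, the 'content' tuple may need to be converted to a list before writing; that is an assumption worth checking against your setup.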

Related

Mapping JSON key-value pairs from source to destination using Python

Using Python requests I want to grab a piece of JSON from one source and post it to a destination. The structure of the JSON received and the one required by the destination, however, differs a bit so my question is, how do I best map the items from the source structure onto the destination structure?
To illustrate, imagine we get a list of all purchases made by John and Mary. And now we want to post the individual items purchased linking these to the individuals who purchased them (NOTE: The actual use case involves thousands of entries so I am looking for an approach that would scale accordingly):
Source JSON:
{
'Total Results': 2,
'Results': [
{
'Name': 'John',
'Age': 25,
'Purchases': [
{
'Fruits': {
'Type': 'Apple',
'Quantity': 3,
'Color': 'Red'}
},
{
'Veggie': {
'Type': 'Salad',
'Quantity': 2,
'Color': 'Green'
}
}
]
},
{
'Name': 'Mary',
'Age': 20,
'Purchases': [
{
'Fruits': {
'Type': 'Orange',
'Quantity': 2,
'Color': 'Orange'
}
}
]
}
]
}
Destination JSON:
{
[
{
'Purchase': 'Apple',
'Purchased by': 'John',
'Quantity': 3,
'Type': 'Red',
},
{
'Purchase': 'Salad',
'Purchased by': 'John',
'Quantity': 2,
'Type': 'Green',
},
{
'Purchase': 'Orange',
'Purchased by': 'Mary',
'Quantity': 2,
'Type': 'Orange',
}
]
}
Any help on this would be greatly appreciated! Cheers!
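Just loop through the dict, building a new dict for every purchase:
from pprint import pprint

res = []
for result in d['Results']:
    for purchase in result['Purchases']:
        # Each purchase is a single-key dict such as {'Fruits': {...}}.
        item = list(purchase.values())[0]
        # Create a fresh dict per purchase so earlier entries are not overwritten.
        value = {}
        value['Purchase'] = item['Type']
        value['Purchased by'] = result['Name']
        value['Quantity'] = item['Quantity']
        value['Type'] = item['Color']
        res.append(value)
pprint(res)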
[{'Purchase': 'Apple', 'Purchased by': 'John', 'Quantity': 3, 'Type': 'Red'},
{'Purchase': 'Salad', 'Purchased by': 'John', 'Quantity': 2, 'Type': 'Green'},
{'Purchase': 'Orange', 'Purchased by': 'Mary', 'Quantity': 2, 'Type': 'Orange'}]
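Since the question mentions thousands of entries, the same mapping can also be written as a generator so that records are produced one at a time. This is only a sketch: the function name iter_purchases is hypothetical, and it assumes every purchase entry has the 'Type', 'Quantity' and 'Color' keys shown in the source JSON. Unlike the loop above, it walks every category in a purchase dict rather than only the first.

```python
def iter_purchases(source):
    """Yield one flat record per purchased item."""
    for person in source['Results']:
        for purchase in person['Purchases']:
            # A purchase dict is keyed by category ('Fruits', 'Veggie', ...).
            for item in purchase.values():
                yield {
                    'Purchase': item['Type'],
                    'Purchased by': person['Name'],
                    'Quantity': item['Quantity'],
                    'Type': item['Color'],
                }


# Small sample in the same shape as the source JSON.
sample = {
    'Total Results': 1,
    'Results': [
        {
            'Name': 'John',
            'Age': 25,
            'Purchases': [
                {'Fruits': {'Type': 'Apple', 'Quantity': 3, 'Color': 'Red'}},
                {'Veggie': {'Type': 'Salad', 'Quantity': 2, 'Color': 'Green'}},
            ],
        }
    ],
}

flat = list(iter_purchases(sample))
```

The generator can be consumed in batches if the destination accepts only a limited number of items per request.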

How to AJAX-post a form as JSON to Python and get the data in the right format?

js part
$('#btnUpdate').click(function(){
    var formData = JSON.stringify($("#contrast_rule_set").serializeArray());
    $.ajax({
        type: "POST",
        url: "./contrast_rule_set",
        data: formData,
        success: function(){},
        dataType: "json",
        contentType : "application/json"
    });
})
python part
@app.route('/get_test', methods=['GET','POST'])
def get_test():
    web_form_data = request.json
    print(web_form_data)
    print(type(web_form_data))
    print(jsonify(web_form_data))
    print(json.dumps(web_form_data))
The Python console prints:
[{'name': 'logic_1', 'value': '1'}, {'name': 'StudyDescription_1', 'value': ''}, {'name': 'SeriesDescription_1', 'value': 'C\\+'}, {'name': 'ImageComments_1', 'value': ''}, {'name': 'logic_2', 'value': '1'}, {'name': 'StudyDescription_2', 'value': '\\-C'}, {'name': 'SeriesDescription_2', 'value': '\\-C'}, {'name': 'ImageComments_2', 'value': '\\-C'}, {'name': 'logic_3', 'value': '1'}, {'name': 'StudyDescription_3', 'value': ''}, {'name': 'SeriesDescription_3', 'value': '\\+C'}, {'name': 'ImageComments_3', 'value': '\\+C'}]
<class 'list'>
How do I convert this list into JSON-style records? Should I adjust the HTML/JS side or the Python side?
I hope to get data like the following (this sample comes from another JSON file of mine):
{
'Logic': 'AND',
'StudyDescription': '',
'SeriesDescription': 'C\+',
'ImageComments': ''
},
{
'Logic': 'NOT',
'StudyDescription': '\-C',
'SeriesDescription': '\-C',
'ImageComments': '\-C'
},
{
'Logic': 'AND',
'StudyDescription': '',
'SeriesDescription': '\+C',
'ImageComments': '\+C'
}
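One possible approach on the Python side is to group the name/value pairs by the numeric suffix of each field name. The sketch below assumes every name has the form `<field>_<number>`, as in the printed list above; the function name group_form_fields is hypothetical. It keeps the submitted values as they are and does not translate the 'logic' value into 'AND' or 'NOT', because the question does not state that mapping.

```python
from collections import defaultdict


def group_form_fields(pairs):
    """Group serializeArray()-style pairs into one dict per numeric suffix."""
    groups = defaultdict(dict)
    for pair in pairs:
        # 'StudyDescription_2' -> ('StudyDescription', '_', '2')
        base, _, index = pair['name'].rpartition('_')
        groups[int(index)][base] = pair['value']
    # Return the groups ordered by their suffix.
    return [groups[i] for i in sorted(groups)]


form_pairs = [
    {'name': 'logic_1', 'value': '1'},
    {'name': 'StudyDescription_1', 'value': ''},
    {'name': 'SeriesDescription_1', 'value': 'C\\+'},
    {'name': 'logic_2', 'value': '1'},
    {'name': 'StudyDescription_2', 'value': '\\-C'},
]

rules = group_form_fields(form_pairs)
```

Inside the view, the same function could be applied to request.json before the result is serialized.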

How to convert raw json to required format in pythonic way

I have JSON from a service in which each value arrives as a separate row.
Input example:
[
{'author': 'alf', 'topic': 'topic1', 'lang': 'ge', 'value': 11},
{'author': 'alf', 'topic': 'topic1', 'lang': 'ge', 'value': 22},
{'author': 'bob', 'topic': 'topic1', 'lang': 'ge', 'value': 33},
{'author': 'bob', 'topic': 'topic1', 'lang': 'ge', 'value': 44},
{'author': 'alf', 'topic': 'topic1', 'lang': 'fr', 'value': 99},
{'author': 'alf', 'topic': 'topic2', 'lang': 'ge', 'value': -20},
]
Output example:
{
'alf': {
'topic1': [
{'ge': [11, 22]},
{'fr': [99]}
],
'topic2': [
{'ge': [-20]}
]
},
'bob': {
'topic1': [
{'ge': [33, 44]}
]
}
}
So basically this is a simple transformation: group by the specified keys and collect all values into one array.
I did this transformation by checking for each required key and creating it if it is missing:
parsed = {}  # must be created once, before the loop
for entry in self._raw_data:
    author = entry["author"]
    topic = entry["topic"]
    lang = entry["lang"]
    value = entry["value"]
    if not parsed.get(author):
        parsed[author] = {}
    if not parsed[author].get(topic):
        parsed[author][topic] = []
    # etc.
I am sure this could be done in a more transparent way. Can anyone recommend something?
If you're willing to change the type of "topic"'s value from list to dict, you can use .setdefault():
res = {}
for entry in raw_data:
    res.setdefault(entry['author'], {}).setdefault(entry["topic"], {}).setdefault(entry["lang"], []).append(entry["value"])
OUTPUT:
{
"alf": {
"topic1": {
"fr": [99],
"ge": [11, 22]
},
"topic2": {
"ge": [-20]
}
},
"bob": {
"topic1": {
"ge": [33, 44]
}
}
}
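If the list-of-single-key-dicts shape from the question is required after all, the nested dict built with .setdefault() can be converted in a second pass. This is a sketch under that assumption; the function name group_rows is hypothetical, and the order of languages within a topic follows the order of first appearance in the input.

```python
def group_rows(rows):
    """Group rows by author, topic and lang, collecting the values in lists."""
    nested = {}
    for row in rows:
        (nested.setdefault(row['author'], {})
               .setdefault(row['topic'], {})
               .setdefault(row['lang'], [])
               .append(row['value']))
    # Turn each {lang: values} mapping into a list of single-key dicts.
    return {
        author: {
            topic: [{lang: values} for lang, values in langs.items()]
            for topic, langs in topics.items()
        }
        for author, topics in nested.items()
    }


raw_data = [
    {'author': 'alf', 'topic': 'topic1', 'lang': 'ge', 'value': 11},
    {'author': 'alf', 'topic': 'topic1', 'lang': 'ge', 'value': 22},
    {'author': 'bob', 'topic': 'topic1', 'lang': 'ge', 'value': 33},
    {'author': 'bob', 'topic': 'topic1', 'lang': 'ge', 'value': 44},
    {'author': 'alf', 'topic': 'topic1', 'lang': 'fr', 'value': 99},
    {'author': 'alf', 'topic': 'topic2', 'lang': 'ge', 'value': -20},
]

grouped = group_rows(raw_data)
```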

Filter python dictionary with dictionary-comprehension

I have a dictionary that is really a geojson:
points = {
'crs': {'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}, 'type': 'name'},
'features': [
{'geometry': {
'coordinates':[[[-3.693162104185235, 40.40734504903418],
[-3.69320229317164, 40.40719570724241],
[-3.693227952841606, 40.40698546120488],
[-3.693677594635894, 40.40712700492216]]],
'type': 'Polygon'},
'properties': {
'name': 'place1',
'temp': 28},
'type': 'Feature'
},
{'geometry': {
'coordinates': [[[-3.703886381691941, 40.405197271972035],
[-3.702972834622821, 40.40506272989243],
[-3.702552994966045, 40.40506798079752],
[-3.700985024825222, 40.405500820623814]]],
'type': 'Polygon'},
'properties': {
'name': 'place2',
'temp': 27},
'type': 'Feature'
},
{'geometry': {
'coordinates': [[[-3.703886381691941, 40.405197271972035],
[-3.702972834622821, 40.40506272989243],
[-3.702552994966045, 40.40506798079752],
[-3.700985024825222, 40.405500820623814]]],
'type': 'Polygon'},
'properties': {
'name': 'place',
'temp': 25},
'type': 'Feature'
}
],
'type': u'FeatureCollection'
}
I would like to filter it to keep only the places with a specific temperature, for example, more than 25 degrees Celsius.
I have managed to do it this way:
dict(crs = points["crs"],
     features = [i for i in points["features"] if i["properties"]["temp"] > 25],
     type = points["type"])
But I wondered if there was any way to do it more directly, with dictionary comprehension.
Thank you very much.
I'm very late. A dict comprehension won't help you much since you have only three keys. But if you meet the following conditions: 1. you don't need a copy of features (e.g. your dict is read-only); 2. you don't need index access to features, you may use a generator expression instead of a list comprehension:
dict(crs = points["crs"],
     features = (i for i in points["features"] if i["properties"]["temp"] > 25),
     type = points["type"])
The generator is created in constant time, while the list comprehension takes O(n) to build. Furthermore, if you create a lot of those dicts, no extra list of features is built for each one. Keep in mind that a generator can be iterated only once.
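For completeness, the filter can be expressed as a dictionary comprehension, although with only three keys it is arguably no clearer than the dict(...) call. The sketch below is one way to write it; the function name filter_features and the shortened sample data are assumptions made for illustration.

```python
def filter_features(collection, min_temp):
    """Return a copy of the collection keeping only features above min_temp."""
    return {
        key: (
            [f for f in value if f['properties']['temp'] > min_temp]
            if key == 'features'
            else value
        )
        for key, value in collection.items()
    }


# Shortened sample in the same shape as the GeoJSON above (geometry omitted).
sample_points = {
    'crs': {'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}, 'type': 'name'},
    'features': [
        {'properties': {'name': 'place1', 'temp': 28}, 'type': 'Feature'},
        {'properties': {'name': 'place2', 'temp': 27}, 'type': 'Feature'},
        {'properties': {'name': 'place', 'temp': 25}, 'type': 'Feature'},
    ],
    'type': 'FeatureCollection',
}

warm = filter_features(sample_points, 25)
```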

Mongo Distinct Query with full row object

First of all, I'm new to Mongo, so I don't know much, and I cannot simply remove the duplicate rows because of some dependencies.
I have the following data stored in Mongo:
{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 2, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'},
{'id': 5, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}
As you can see, some of the rows are duplicates with different ids.
Since fixing this on the input side will take a long time, I have to handle it on output.
I need the data in the following form:
{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}
My query
keys = db.collection.distinct('key', {})
all_data = db.collection.find({'key': {$in: keys}})
As you can see, it takes two queries to get one result set. Please help me combine them into one, as the database is very large.
I could also create a unique index on key, but the value is so long (152 characters) that I doubt it will help.
Or will it?
You need to use the aggregation framework for this. There are multiple ways to do this, the solution below uses the $$ROOT variable to get the first document for each group:
db.data.aggregate([{
    "$sort": {
        "_id": 1
    }
}, {
    "$group": {
        "_id": "$key",
        "first": {
            "$first": "$$ROOT"
        }
    }
}, {
    "$project": {
        "_id": 0,
        "id": "$first.id",
        "key": "$first.key",
        "name": "$first.name",
        "country": "$first.country"
    }
}])
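From Python, the same idea can be expressed as a list of stage dicts. The sketch below builds such a pipeline with a hypothetical helper, first_per_key_pipeline, and swaps the final $project stage for $replaceRoot (available from MongoDB 3.4), which promotes the stored first document without listing every field. Note that, unlike the $project version above, this keeps each document's original _id.

```python
def first_per_key_pipeline(field):
    """Build a pipeline that keeps the first document (by _id) per value of `field`."""
    return [
        # Sort so that "$first" picks the document with the lowest _id.
        {"$sort": {"_id": 1}},
        # Group on the field and remember the whole first document.
        {"$group": {"_id": "$" + field, "first": {"$first": "$$ROOT"}}},
        # Promote the remembered document back to the top level.
        {"$replaceRoot": {"newRoot": "$first"}},
    ]


pipeline = first_per_key_pipeline("key")
```

With pymongo, the list would be passed to the collection's aggregate() method, for example db.collection.aggregate(pipeline), using the collection name from the question.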
