"dataFrameData": [
{
"intersection": {
"Item": "Item1",
"Customer": "Customer1",
"Month": "1"
},
"measures": {
"Sales": 1212.33,
"Forecast": 400
}
},
{
"intersection": {
"Item": "Item1",
"Customer": "Customer1",
"Month": "2"
},
"measures": {
"Sales": 1200,
"Forecast": 450
}
}
]
I have data stored like this in a list of dictionaries and want to flatten it to one level by removing the "intersection" and "measures" levels. After flattening, it should look like this:
[
{
"Item": "Item1",
"Customer": "Customer1",
"Month": "1"
"Sales": 1212.33,
"Forecast": 400
},
{
"Item": "Item2",
"Customer": "Customer2",
"Month": "12"
"Sales": 1212.33,
"Forecast": 800
}
]
Is there any approach to do this in O(1) space complexity, instead of building a new list and copying items in a loop?
That depends on what you mean by O(1). If you just want to avoid an explicit loop, you can do this:
flattened = [dict(x['intersection'], **x['measures']) for x in dataFrameData]
which returns
[{'Item': 'Item1',
'Customer': 'Customer1',
'Month': '1',
'Sales': 1212.33,
'Forecast': 400},
{'Item': 'Item1',
'Customer': 'Customer1',
'Month': '2',
'Sales': 1200,
'Forecast': 450}]
If you really need the space complexity, you can use
[(x.update(x['intersection']), x.update(x['measures']), x.pop('intersection'), x.pop('measures')) for x in dataFrameData]
While this is a list comprehension, and thus technically a loop that still allocates a throwaway list of None tuples, every call in it mutates the dicts in place. This gives me the output:
[{'Item': 'Item1',
'Customer': 'Customer1',
'Month': '1',
'Sales': 1212.33,
'Forecast': 400},
{'Item': 'Item1',
'Customer': 'Customer1',
'Month': '2',
'Sales': 1200,
'Forecast': 450}]
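If you'd rather avoid the throwaway list entirely, a plain loop performs the same in-place mutation (a minimal sketch over the same dataFrameData):
for x in dataFrameData:
    # hoist the nested keys to the top level, then discard the wrappers
    x.update(x.pop('intersection'))
    x.update(x.pop('measures'))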
Try this to make your new list:
original_list = [
{
"intersection": {
"Item": "Item1",
"Customer": "Customer1",
"Month": "1"
},
"measures": {
"Sales": 1212.33,
"Forecast": 400
}
},
{
"intersection": {
"Item": "Item1",
"Customer": "Customer1",
"Month": "2"
},
"measures": {
"Sales": 1200,
"Forecast": 450
}
}
]
print([{**each_element["intersection"], **each_element["measures"]} for each_element in original_list])
The output:
[{'Item': 'Item1', 'Customer': 'Customer1', 'Month': '1', 'Sales': 1212.33, 'Forecast': 400}, {'Item': 'Item1', 'Customer': 'Customer1', 'Month': '2', 'Sales': 1200, 'Forecast': 450}]
My content is a list of dictionaries, shown below:
test=
[ { 'masterid': '1', 'name': 'Group1', 'BusinessArea': [ { 'id': '14', 'name': 'Accounting', 'parentname': 'Finance'}, { 'id': '3', 'name': 'Research', 'parentname': 'R & D' } ], 'Designation': [ { 'id': '16', 'name': 'L1' }, { 'id': '20', 'name': 'L2' }, { 'id': '25', 'name': 'L2' }] },
{ 'masterid': '2', 'name': 'Group1', 'BusinessArea': [ { 'id': '14', 'name': 'Research', 'parentname': '' }, { 'id': '3', 'name': 'Accounting', 'parentname': '' } ], 'Role': [ { 'id': '5032', 'name': 'Tester' }, { 'id': '5033', 'name': 'Developer' } ], 'Designation': [ { 'id': '16', 'name': 'L1' }, { 'id': '20', 'name': 'L2' }, { 'id': '25', 'name': 'L2' }]},
{ 'masterid': '3', 'name': 'Group1', 'BusinessArea': [ { 'id': '14', 'name': 'Engineering' }, { 'id': '3', 'name': 'Engineering', 'parentname': '' } ], 'Role': [ { 'id': '5032', 'name': 'Developer' }, { 'id': '5033', 'name': 'Developer', 'parentname': '' } ], 'Designation': [ { 'id': '16', 'name': 'L1' }, { 'id': '20', 'name': 'L2' }, { 'id': '25', 'name': 'L2' }]}]
The code below indexes it into an Elasticsearch index:
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.indices.create(index='new')
for e in test:
    es.index(index="new", body=e, id=e['masterid'])
I want to get, for each BusinessArea name, the count of masterids containing it. Here the names are Accounting, Research, and Engineering:
[ {
"name": "BusinessArea",
"values": [
{
"name": "Accounting",
"count": "2"
},
{
"name": "Research",
"count": "2"
},
{
"name": "Engineering",
"count": "1"
}]
}]
Or can I have an answer like below?
{
  "A": {
    "Designation": [
      { "key": "L1", "doc_count": 3 },
      { "key": "L2", "doc_count": 3 }
    ]
  },
  "B": {
    "BusinessArea": [
      { "key": "Accounting", "doc_count": 2 },
      { "key": "Research", "doc_count": 2 },
      { "key": "Engineering", "doc_count": 1 }
    ]
  }
}
If you want per-value counts for a field, you can use the terms aggregation, a multi-bucket value-source aggregation in which buckets are built dynamically, one per unique value.
Search Query:
{
"size":0,
"aggs": {
"countNames": {
"terms": {
"field": "BusinessArea.name.keyword"
}
}
}
}
Search Result:
"aggregations": {
"countNames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Accounting",
"doc_count": 2
},
{
"key": "Research",
"doc_count": 2
},
{
"key": "Engineering",
"doc_count": 1
}
]
}
}
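From Python, the same query can be issued with the client's search method (a sketch, assuming the es client created earlier; the response carries the aggregations shown above):
resp = es.search(index="new", body={
    "size": 0,
    "aggs": {"countNames": {"terms": {"field": "BusinessArea.name.keyword"}}}
})
for bucket in resp["aggregations"]["countNames"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])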
Update 1:
If you want individual counts for Designation as well as BusinessArea:
Search Query:
{
"size": 0,
"aggs": {
"countNames": {
"terms": {
"field": "BusinessArea.name.keyword"
}
},
"designationNames": {
"terms": {
"field": "Designation.name.keyword"
}
}
}
}
Search Result:
"aggregations": {
"designationNames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "L1",
"doc_count": 3
},
{
"key": "L2",
"doc_count": 3
}
]
},
"countNames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Accounting",
"doc_count": 2
},
{
"key": "Research",
"doc_count": 2
},
{
"key": "Engineering",
"doc_count": 1
}
]
}
}
You can simply use the count API of Elasticsearch to get the count of all the documents in the index, or a count based on a condition, as shown in the same doc.
For your case, it should be like
GET /<your-index-name>/_count?q=name:BusinessArea
Or, if masterid is the unique id in your document, you can simply use
GET /<your-index-name>/_count
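From the Python client used above, the equivalent is roughly this sketch (the q-style query becomes a match query in the request body):
resp = es.count(index="new", body={"query": {"match": {"name": "BusinessArea"}}})
print(resp["count"])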
My content is a list of dictionaries, shown below. I need to know how many different name values there are under BusinessArea, and likewise under Designation.
test=
[ { 'masterid': '1', 'name': 'Group1', 'BusinessArea': [ { 'id': '14', 'name': 'Accounting', 'parentname': 'Finance'}, { 'id': '3', 'name': 'Research', 'parentname': 'R & D' } ], 'Designation': [ { 'id': '16', 'name': 'L1' }, { 'id': '20', 'name': 'L2' }, { 'id': '25', 'name': 'L2' }] },
{ 'masterid': '2', 'name': 'Group1', 'BusinessArea': [ { 'id': '14', 'name': 'Research', 'parentname': '' }, { 'id': '3', 'name': 'Accounting', 'parentname': '' } ], 'Role': [ { 'id': '5032', 'name': 'Tester' }, { 'id': '5033', 'name': 'Developer' } ], 'Designation': [ { 'id': '16', 'name': 'L1' }, { 'id': '20', 'name': 'L2' }, { 'id': '25', 'name': 'L2' }]},
{ 'masterid': '3', 'name': 'Group1', 'BusinessArea': [ { 'id': '14', 'name': 'Engineering' }, { 'id': '3', 'name': 'Engineering', 'parentname': '' } ], 'Role': [ { 'id': '5032', 'name': 'Developer' }, { 'id': '5033', 'name': 'Developer', 'parentname': '' } ], 'Designation': [ { 'id': '16', 'name': 'L1' }, { 'id': '20', 'name': 'L2' }, { 'id': '25', 'name': 'L2' }]}]
I want to get, for each BusinessArea and Designation name, the count of masterids containing it. The expected output is below:
[
{
"name": "BusinessArea",
"values": [
{
"name": "Accounting",
"count": "2"
},
{
"name": "Research",
"count": "2"
},
{
"name": "Engineering",
"count": "1"
}
]
},
{
"name": "Designation",
"values": [
{
"name": "L1",
"count": "3"
},
{
"name": "l2",
"count": "3"
}
]
}
]
Try this:
res=[{'name': 'BusinessArea', 'values': []}, {'name': 'Designation', 'values': []}]
listbus=sum([i['BusinessArea'] for i in test], [])
listdes=sum([i['Designation'] for i in test], [])
res[0]['values']=[{'name':i, 'count':0} for i in set(k['name'] for k in listbus)]
res[1]['values']=[{'name':i, 'count':0} for i in set(k['name'] for k in listdes)]
for i in listbus:
for k in range(len(res[0]['values'])):
if i['name']==res[0]['values'][k]['name']:
res[0]['values'][k]['count']+=1
for i in listdes:
for k in range(len(res[1]['values'])):
if i['name']==res[1]['values'][k]['name']:
res[1]['values'][k]['count']+=1
>>> print(res)
[{'name': 'BusinessArea', 'values': [{'name': 'Accounting', 'count': 2}, {'name': 'Research', 'count': 2}, {'name': 'Engineering', 'count': 2}]}, {'name': 'Designation', 'values': [{'name': 'L1', 'count': 3}, {'name': 'L2', 'count': 6}]}]
Note that this counts every occurrence rather than every masterid, so names repeated within a single record (Engineering appears twice under masterid 3, and L2 twice in every Designation list) inflate the counts; deduplicate within each record first if you want the per-masterid counts from the question, as the next answer does.
You could count unique names using a nested collections.defaultdict:
from collections import defaultdict
from json import dumps

keys = ["BusinessArea", "Designation"]
group_counts = defaultdict(lambda: defaultdict(int))

for group in test:
    for key in keys:
        # names under this key, with duplicates within the record dropped
        names = [item["name"] for item in group[key]]
        unique_names = list(dict.fromkeys(names))
        # count each name once per record, i.e. once per masterid
        for name in unique_names:
            group_counts[key][name] += 1

print(dumps(group_counts, indent=2))
Which will give you these counts:
{
"BusinessArea": {
"Accounting": 2,
"Research": 2,
"Engineering": 1
},
"Designation": {
"L1": 3,
"L2": 3
}
}
Then you could modify the result to get the list of dicts you expect:
result = [
{
"name": name,
"values": [{"name": value, "count": count} for value, count in counts.items()],
}
for name, counts in group_counts.items()
]
print(dumps(result, indent=2))
Which gives you this:
[
{
"name": "BusinessArea",
"values": [
{
"name": "Accounting",
"count": 2
},
{
"name": "Research",
"count": 2
},
{
"name": "Engineering",
"count": 1
}
]
},
{
"name": "Designation",
"values": [
{
"name": "L1",
"count": 3
},
{
"name": "L2",
"count": 3
}
]
}
]
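The same unique-per-record counting can also be written with collections.Counter (a compact sketch; the inner set comprehension dedupes names within each record):
from collections import Counter

counts = {
    key: Counter(name
                 for group in test
                 for name in {item["name"] for item in group[key]})
    for key in ["BusinessArea", "Designation"]
}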
I want to update the documents/records of a collection in MongoDB from Python with the min/max/avg of temperature over a given time range.
In the example below, suppose the given time range is "20:09-20:15"; then the last row will not be updated, but the rest will.
Sample Data:
[
{'date': "1-10-2020", 'time': "20:09", 'temperature': 20}, //1
{'date': "1-10-2020", 'time': "20:11", 'temperature': 19}, //2
{'date': "1-10-2020", 'time': "20:15", 'temperature': 18}, //3
{'date': "1-10-2020", 'time': "20:18", 'temperature': 18} //4
]
Required output:
[
{'date': "1-10-2020", 'time': "20:09", 'temperature': 20, 'MIN': 20, 'MAX': 20, 'AVG': 20}, //1
{'date': "1-10-2020", 'time': "20:11", 'temperature': 19, 'MIN': 19, 'MAX': 20, 'AVG': 19.5}, //2
{'date': "1-10-2020", 'time': "20:15", 'temperature': 18, 'MIN': 18, 'MAX': 20, 'AVG': 19}, //3
{'date': "1-10-2020", 'time': "20:18", 'temperature': 18} //4
]
If you're using Mongo version 4.4+, you can use $merge to achieve this with a single pipeline:
db.collection.aggregate([
{
$match: {
time: {
$gte: "20:09",
$lte: "20:15"
}
}
},
{
$group: {
_id: null,
avg: {
$avg: "$temperature"
},
min: {
$min: "$temperature"
},
max: {
$max: "$temperature"
},
root: {
$push: "$$ROOT"
}
}
},
{
$unwind: "$root"
},
{
"$replaceRoot": {
"newRoot": {
"$mergeObjects": [
"$root",
{
"MIN": "$min",
"MAX": "$max",
"AVG": "$avg"
}
]
}
}
},
{
$merge: {
into: "collection",
on: "_id",
whenMatched: "replace"
}
}
])
Mongo Playground
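From pymongo, the same pipeline can be run directly (a sketch; collection is assumed to be your pymongo collection handle, and the string keys mirror the shell syntax above):
collection.aggregate([
    {"$match": {"time": {"$gte": "20:09", "$lte": "20:15"}}},
    {"$group": {"_id": None,
                "avg": {"$avg": "$temperature"},
                "min": {"$min": "$temperature"},
                "max": {"$max": "$temperature"},
                "root": {"$push": "$$ROOT"}}},
    {"$unwind": "$root"},
    {"$replaceRoot": {"newRoot": {"$mergeObjects": [
        "$root", {"MIN": "$min", "MAX": "$max", "AVG": "$avg"}]}}},
    {"$merge": {"into": "collection", "on": "_id", "whenMatched": "replace"}}
])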
If you're on an older Mongo version, you have to split this into two calls: first use the same $group stage to fetch the results, then use those values to update. (I'll write this one in Python since you've tagged pymongo.)
results = list(collection.aggregate([
{
"$match": {
"time": {
"$gte": "20:09",
"$lte": "20:15"
}
}
},
{
"$group": {
"_id": None,
"avg": {
"$avg": "$temperature"
},
"min": {
"$min": "$temperature"
},
"max": {
"$max": "$temperature"
},
"root": {
"$push": "$$ROOT"
}
}
}
]))
collection.update_many(
{
"time": {
"$gte": "20:09",
"$lt": "20:15"
}
},
{
"$set": {
"MAX": results[0]["max"],
"MIN": results[0]["min"],
"AVG": results[0]["avg"],
}
}
)
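One caveat: if the range matches no documents, results is empty and results[0] raises an IndexError, so a small guard around the update is worthwhile (a sketch reusing the names above):
if results:
    stats = results[0]
    collection.update_many(
        {"time": {"$gte": "20:09", "$lte": "20:15"}},
        {"$set": {"MAX": stats["max"], "MIN": stats["min"], "AVG": stats["avg"]}}
    )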
I have a dataframe with goal scorers and I would like to extract the top-scoring group into an array. This group can contain more than one item (in the example below there are two players with 8 goals).
So in the example below it would result in an array like this:
[{'goals': 8, 'name': 'Sergio Agüero', 'team': 'Manchester City'}, {'goals': 8, 'name': 'Tammy Abraham', 'team': 'Chelsea'}]
import pandas as pd
data = [
{
"name": "Sergio Ag\u00fcero",
"team": "Manchester City",
"goals": "8"
},
{
"name": "Tammy Abraham",
"team": "Chelsea",
"goals": "8"
},
{
"name": "Pierre-Emerick Aubameyang",
"team": "Arsenal",
"goals": "7"
},
{
"name": "Raheem Sterling",
"team": "Manchester City",
"goals": "6"
},
{
"name": "Teemu Pukki",
"team": "Norwich",
"goals": "6"
}
]
top_scorers = pd.DataFrame(data, columns=["name", "team", "goals"])
top_scoring_group = top_scorers.groupby("goals")
IIUC,
(top_scorers[top_scorers['goals'].eq(top_scorers['goals'].max())]
.to_dict('records')
)
Output:
[{'name': 'Sergio Agüero', 'team': 'Manchester City', 'goals': '8'},
{'name': 'Tammy Abraham', 'team': 'Chelsea', 'goals': '8'}]
(goals holds strings here; '8' happens to compare above '7' and '6', but cast with .astype(int) first if values like '10' can occur.)
top_scoring_group = (top_scorers.assign(goals=top_scorers['goals'].astype(int))
                     .groupby("team", as_index=False)['goals'].sum()
                     .nlargest(1, 'goals', keep='all')['team'])
This will get the teams with the most total goals, and keep them all if there is a tie (goals is cast to int so the sum and nlargest are numeric rather than string-based).
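If you'd rather finish the groupby route started in the question, a minimal sketch (again casting goals to int so the group keys compare numerically):
groups = top_scorers.assign(goals=top_scorers["goals"].astype(int)).groupby("goals")
top_goals = max(groups.groups)           # highest goal tally among the group keys
top_group = groups.get_group(top_goals)  # rows belonging to the top-scoring group
print(top_group.to_dict("records"))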
I have a JSON list looks like this:
[{ "id": "1", "score": "100" },
{ "id": "3", "score": "89" },
{ "id": "1", "score": "99" },
{ "id": "2", "score": "100" },
{ "id": "2", "score": "59" },
{ "id": "3", "score": "22" }]
I want to sort by id first, so I used
sorted_list = sorted(json_list, key=lambda k: int(k['id']), reverse=False)
This only sorts the list by id; within each id I also want to sort by score, descending. The final list I want is like this:
[{ "id": "1", "score": "100" },
{ "id": "1", "score": "99" },
{ "id": "2", "score": "100" },
{ "id": "2", "score": "59" },
{ "id": "3", "score": "89" },
{ "id": "3", "score": "22" }]
So within each id, the scores should be sorted as well. Any idea how to do that?
Use a tuple as the key, adding -int(k["score"]) as a second sort key to reverse the order when breaking ties (and drop the reverse argument):
sorted_list = sorted(json_list, key=lambda k: (int(k['id']), -int(k["score"])))
[{'score': '100', 'id': '1'},
{'score': '99', 'id': '1'},
{'score': '100', 'id': '2'},
{'score': '59', 'id': '2'},
{'score': '89', 'id': '3'},
{'score': '22', 'id': '3'}]
So we primarily sort by id from lowest to highest, breaking ties by score from highest to lowest. Note that on Python versions before 3.7 plain dicts don't preserve key order, so there is no way to guarantee id prints before score without something like an OrderedDict.
Or use pprint:
from pprint import pprint as pp
pp(sorted_list)
[{'id': '1', 'score': '100'},
{'id': '1', 'score': '99'},
{'id': '2', 'score': '100'},
{'id': '2', 'score': '59'},
{'id': '3', 'score': '89'},
{'id': '3', 'score': '22'}]
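Because Python's sort is stable, an equivalent two-pass approach also works (a sketch): sort by the secondary key first, then by the primary key; ties in the second pass keep their order from the first.
json_list.sort(key=lambda k: -int(k["score"]))  # secondary key: score, descending
json_list.sort(key=lambda k: int(k["id"]))      # primary key: id, ascending (stable)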