I'm new to programming and want to change the following JSON format: I want to remove the "content" key, as shown in the example below.
[{
"content": "abc",
'entities': [
[44, 55, "SEN"],
[27, 31, "FIN"]
]
}, {
"content": "xyz",
'entities': [
[8, 17, "FIN"]
]
}, {
"content": "klm",
'entities': [
[18, 26, "FIN"]
]
}]
to
[
('abc', {
'entities': [(44, 55, "SEN"), (27, 31, "FIN")]
}),
('xyz', {
'entities': [(8, 17, "FIN")]
}),
('klm', {
'entities': [(18, 26, "FIN")]
})
]
Please help.
Thanks
>>> data = [{
... "content": "abc",
... 'entities': [
... [44, 55, "SEN"],
... [27, 31, "FIN"]
... ]
... }, {
... "content": "xyz",
... 'entities': [
... [8, 17, "FIN"]
... ]
... }, {
... "content": "klm",
... 'entities': [
... [18, 26, "FIN"]
... ]
... }]
>>> [(dct["content"], {"entities": list(map(tuple, dct["entities"]))}) for dct in data]
[('abc', {'entities': [(44, 55, 'SEN'), (27, 31, 'FIN')]}), ('xyz', {'entities': [(8, 17, 'FIN')]}), ('klm', {'entities': [(18, 26, 'FIN')]})]
>>>
In a more readable format:
[
# 2. build a tuple...
(
# 3. whose first element is `content`
dct["content"],
# 4. and the second - a dictionary with one element
{
# 5. which is a list of entities that are converted to `tuple`
"entities": list(map(tuple, dct["entities"]))
}
)
# 1. For each dictionary...
for dct in data
]
You can use a list comprehension:
lst = [{
"content": "abc",
'entities': [
[44, 55, "SEN"],
[27, 31, "FIN"]
]
}, {
"content": "xyz",
'entities': [
[8, 17, "FIN"]
]
}, {
"content": "klm",
'entities': [
[18, 26, "FIN"]
]
}]
output = [( elt["content"], { "entities": [tuple(e) for e in elt["entities"]] } ) for elt in lst]
print(output)
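If the data arrives as JSON text rather than as a Python literal, the same comprehension can be applied after parsing. This is only a minimal sketch, assuming the input is valid JSON (double quotes only, unlike the mixed quotes in the question); the function name `to_tuples` is made up for illustration:

```python
import json


def to_tuples(records):
    """Turn [{"content": ..., "entities": [[...], ...]}, ...] into
    [(content, {"entities": [(...), ...]}), ...]."""
    return [
        # Each inner list of "entities" becomes a tuple.
        (rec["content"], {"entities": [tuple(ent) for ent in rec["entities"]]})
        for rec in records
    ]


# Assumed input: a JSON string with the same shape as the question's data.
raw = '[{"content": "abc", "entities": [[44, 55, "SEN"], [27, 31, "FIN"]]}]'
converted = to_tuples(json.loads(raw))
print(converted)
```

Note that tuples do not exist in JSON, so the converted structure is a Python object and would turn back into lists if dumped with `json.dumps`.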
I'm parsing a JSON response and I don't understand how to correctly decompose it into a DataFrame.
The JSON structure I have (API response):
{
"result": {
"data": [],
"totals": [
0
]
},
"timestamp": "2021-11-25 15:19:21"
}
response_ =
{
"result":{
"data":[
{
"dimensions":[
{
"id":"2023-01-10",
"name":""
},
{
"id":"123",
"name":"good3"
}
],
"metrics":[
10,
20,
30,
40
]
},
{
"dimensions":[
{
"id":"2023-01-10",
"name":""
},
{
"id":"234",
"name":"good2"
}
],
"metrics":[
1,
2,
3,
4
]
}
],
"totals":[
11,
22,
33,
44
]
},
"timestamp":"2023-02-07 12:58:40"
}
I don't need "timestamp" and "totals", just "data". So I do:
...
response_ = requests.post(url, headers=head, data=body)
datas = response_.json()
datas_ = datas['result']['data']
df1 = pd.json_normalize(datas_)
I got:
   dimensions                                                          metrics
0  [{'id': '2023-01-10', 'name': ''}, {'id': '123', 'name': 'good1'}]  [10, 20, 30, 40]
1  [{'id': '2023-01-10', 'name': ''}, {'id': '234', 'name': 'good2'}]  [1, 2, 3, 4]
But I need a DataFrame like:
   id_         name_  id   name   metric1  metric2  metric3  metric4
0  2023-01-10         123  good1  10       20       30       40
1  2023-01-10         234  good2  1        2        3        4
When I try:
df1 = pd.json_normalize(datas_, 'dimensions')
I get all the ids and names in one column.
Please explain step by step if possible. Thank you.
Try:
import pandas as pd

response = {
"result": {
"data": [
{
"dimensions": [
{"id": "2023-01-10", "name": ""},
{"id": "123", "name": "good3"},
],
"metrics": [10, 20, 30, 40],
},
{
"dimensions": [
{"id": "2023-01-10", "name": ""},
{"id": "234", "name": "good2"},
],
"metrics": [1, 2, 3, 4],
},
],
"totals": [11, 22, 33, 44],
},
"timestamp": "2023-02-07 12:58:40",
}
tmp = [
{
**{f"{k}_": v for k, v in d["dimensions"][0].items()},
**{k: v for k, v in d["dimensions"][1].items()},
**{f'metric{i}':m for i, m in enumerate(d['metrics'], 1)}
}
for d in response["result"]["data"]
]
df = pd.DataFrame(tmp)
print(df)
Prints:
          id_  name_   id   name  metric1  metric2  metric3  metric4
0  2023-01-10         123  good3       10       20       30       40
1  2023-01-10         234  good2        1        2        3        4
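Since the question asks for a step-by-step explanation, the dictionary comprehension in the answer above can be unrolled into a plain loop. This is a sketch only: the helper name `flatten_record` is hypothetical, and it assumes every record has exactly two entries in "dimensions", as in the sample response.

```python
def flatten_record(record):
    """Build one flat row (a dict) from a single item of result -> data."""
    first, second = record["dimensions"]  # assumes exactly two dimensions
    row = {}
    # Step 1: keys of the first dimension get a trailing underscore (id_, name_).
    for key, value in first.items():
        row[f"{key}_"] = value
    # Step 2: keys of the second dimension are kept as they are (id, name).
    row.update(second)
    # Step 3: each metric becomes its own column: metric1, metric2, ...
    for i, metric in enumerate(record["metrics"], 1):
        row[f"metric{i}"] = metric
    return row


# One record in the same shape as the sample response.
data = [
    {
        "dimensions": [
            {"id": "2023-01-10", "name": ""},
            {"id": "123", "name": "good3"},
        ],
        "metrics": [10, 20, 30, 40],
    },
]
rows = [flatten_record(rec) for rec in data]
print(rows[0])
```

Passing `rows` to `pd.DataFrame` then gives one column per key, because a list of flat dictionaries maps directly onto rows and columns.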
Hopefully the title is not too confusing. I have a dictionary (sample below) that I'm trying to sort by the total number of list items (dictionaries) across all the keys beneath each parent. Hopefully the example makes more sense than my description.
{
"data": {
"London": {
"SHOP 1": [
{
"kittens": 10,
"type": "fluffy"
},
{
"puppies": 11,
"type": "squidgy"
}
],
"SHOP 2": [
{
"kittens": 15,
"type": "fluffy"
},
{
"puppies": 3,
"type": "squidgy"
},
{
"fishes": 132,
"type": "floaty"
}
]
},
"Manchester": {
"SHOP 1": [
{
"kittens": 10,
"type": "fluffy"
},
{
"puppies": 11,
"type": "squidgy"
}
],
"SHOP 2": [
{
"kittens": 15,
"type": "fluffy"
},
{
"puppies": 3,
"type": "squidgy"
},
{
"fishes": 132,
"type": "floaty"
}
],
"SHOP 3": [
{
"kittens": 15,
"type": "fluffy"
},
{
"puppies": 3,
"type": "squidgy"
},
]
},
"Edinburgh": {
"SHOP 1": [
{
"kittens": 10,
"type": "fluffy"
},
{
"puppies": 11,
"type": "squidgy"
}
],
"SHOP 2": [
{
"kittens": 15,
"type": "fluffy"
},
],
"SHOP 3": [
{
"puppies": 3,
"type": "squidgy"
},
]
}
}
}
Summary
# London 2 shops, 5 item dictionaries total
# Manchester 3 shops, 7 item dictionaries total
# Edinburgh 3 shops, 4 item dictionaries total
The desired sorting would be by total items across the shops, so ordered Manchester, London, Edinburgh.
I'd usually use something like the below to sort, but I'm not sure how to do this one, since it involves counting the number of items across a number of keys.
{k: v for k, v in sorted(x.items(), key=lambda item: item[1])}
You need to reverse sort based on the total number of items for each location, which you can generate as:
sum(len(i) for i in s.values())
where s is the shop dictionary for each location.
Putting this into a sorted expression:
dict(sorted(d['data'].items(), key=lambda t:sum(len(i) for i in t[1].values()), reverse=True))
gives:
{
'Manchester': {
'SHOP 1': [{'kittens': 10, 'type': 'fluffy'}, {'puppies': 11, 'type': 'squidgy'}],
'SHOP 2': [{'kittens': 15, 'type': 'fluffy'}, {'puppies': 3, 'type': 'squidgy'}, {'fishes': 132, 'type': 'floaty'}],
'SHOP 3': [{'kittens': 15, 'type': 'fluffy'}, {'puppies': 3, 'type': 'squidgy'}]
},
'London': {
'SHOP 1': [{'kittens': 10, 'type': 'fluffy'}, {'puppies': 11, 'type': 'squidgy'}],
'SHOP 2': [{'kittens': 15, 'type': 'fluffy'}, {'puppies': 3, 'type': 'squidgy'}, {'fishes': 132, 'type': 'floaty'}]
},
'Edinburgh': {
'SHOP 1': [{'kittens': 10, 'type': 'fluffy'}, {'puppies': 11, 'type': 'squidgy'}],
'SHOP 2': [{'kittens': 15, 'type': 'fluffy'}],
'SHOP 3': [{'puppies': 3, 'type': 'squidgy'}]
}
}
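The sort key described above can be checked in isolation on a cut-down copy of the data. In this sketch the item dictionaries are replaced by empty placeholders, since only their count matters, and the helper name `total_items` is made up for illustration:

```python
def total_items(shops):
    """Count the item dictionaries across all shops of one location."""
    return sum(len(items) for items in shops.values())


# Placeholder data: same number of items per shop as in the question,
# but each item dictionary is left empty because only the counts matter.
d = {
    "data": {
        "London": {"SHOP 1": [{}, {}], "SHOP 2": [{}, {}, {}]},
        "Manchester": {"SHOP 1": [{}, {}], "SHOP 2": [{}, {}, {}], "SHOP 3": [{}, {}]},
        "Edinburgh": {"SHOP 1": [{}, {}], "SHOP 2": [{}], "SHOP 3": [{}]},
    }
}

# Sort locations by their total item count, largest first.
ordered = dict(
    sorted(d["data"].items(), key=lambda kv: total_items(kv[1]), reverse=True)
)
print(list(ordered))  # ['Manchester', 'London', 'Edinburgh']
```

Rebuilding a `dict` from the sorted pairs keeps that order only because dictionaries preserve insertion order (Python 3.7 and later).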
No need to make things complex:
adict = adict['data']
result = []
for capital, value in adict.items():
    shop_count = len(value)
    items = sum([len(obj) for obj in value.values()])
    result.append((capital, shop_count, items))
for capital, shop_count, items in sorted(result, key=lambda x: x[2], reverse=True):
    print(f'{capital} {shop_count} shops, {items} item dictionaries total')
Output:
Manchester 3 shops, 7 item dictionaries total
London 2 shops, 5 item dictionaries total
Edinburgh 3 shops, 4 item dictionaries total
I have a list of dictionaries:
rows = [{'sku':123,'barcode':99123,'day_1_qty':9,'store':118},
{'sku':123,'barcode':99123,'day_1_qty':7,'store':109},
{'sku':124,'barcode':99124,'day_1_qty':9,'store':118},
{'sku':123,'barcode':99123,'day_2_qty':10,'store':118}....]
I want to merge them, and this is my desired output:
rows = [{'sku':123,'barcode':99123,'day_1_qty':9,'store':118,'day_2_qty':10},
{'sku':123,'barcode':99123,'day_1_qty':7,'store':109},
{'sku':124,'barcode':99124,'day_1_qty':9,'store':118},....]
I tried merging them by sku, but the other store won't show. Please help.
from collections import ChainMap
from itertools import groupby

def generate_oos(dict_list):
    res = map(lambda dict_tuple: dict(ChainMap(*dict_tuple[1])),
              groupby(sorted(dict_list, key=lambda sub_dict: sub_dict["sku"]),
                      key=lambda sub_dict: sub_dict["sku"]))
    return list(res)
Try:
rows = [
{"sku": 123, "barcode": 99123, "day_1_qty": 9, "store": 118},
{"sku": 123, "barcode": 99123, "day_1_qty": 7, "store": 109},
{"sku": 124, "barcode": 99124, "day_1_qty": 9, "store": 118},
{"sku": 123, "barcode": 99123, "day_2_qty": 10, "store": 118},
]
tmp = {}
for d in rows:
    tmp.setdefault((d["sku"], d["store"]), []).append(d)
out = []
for k, v in tmp.items():
    out.append({})
    for vv in v:
        out[-1].update(vv)
print(out)
Prints:
[
{
"sku": 123,
"barcode": 99123,
"day_1_qty": 9,
"store": 118,
"day_2_qty": 10,
},
{"sku": 123, "barcode": 99123, "day_1_qty": 7, "store": 109},
{"sku": 124, "barcode": 99124, "day_1_qty": 9, "store": 118},
]
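The groupby/ChainMap attempt from the question can also be made to work if rows are grouped on the (sku, store) pair instead of on sku alone, which is why the second store disappeared. This is a sketch under that assumption; the function name `merge_rows` is hypothetical:

```python
from collections import ChainMap
from itertools import groupby
from operator import itemgetter


def merge_rows(dict_list):
    """Merge dictionaries that share the same (sku, store) pair."""
    key = itemgetter("sku", "store")
    merged = []
    # groupby only groups adjacent items, so the rows are sorted by the same key first.
    for _, group in groupby(sorted(dict_list, key=key), key=key):
        # If two rows in a group share a key, ChainMap keeps the value
        # from the earlier row.
        merged.append(dict(ChainMap(*group)))
    return merged


rows = [
    {"sku": 123, "barcode": 99123, "day_1_qty": 9, "store": 118},
    {"sku": 123, "barcode": 99123, "day_1_qty": 7, "store": 109},
    {"sku": 124, "barcode": 99124, "day_1_qty": 9, "store": 118},
    {"sku": 123, "barcode": 99123, "day_2_qty": 10, "store": 118},
]
result = merge_rows(rows)
print(result)
```

Unlike the `setdefault` version, this returns the merged rows sorted by (sku, store) rather than in order of first appearance, and the key order inside each dictionary may differ.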
I'm making a standard find query to my MongoDB database, it looks like this:
MyData = pd.DataFrame(list(db.MyData.find({'datetimer': {'$gte': StartTime, '$lt': Endtime}})), columns=['price', 'amount', 'datetime'])
Now I'm trying to do another query, but it's more complicated and I don't know how to do it. Here is a sample of my data:
{"datetime": "2020-07-08 15:10", "price": 21, "amount": 90}
{"datetime": "2020-07-08 15:15", "price": 22, "amount": 50}
{"datetime": "2020-07-08 15:19", "price": 21, "amount": 40}
{"datetime": "2020-07-08 15:30", "price": 21, "amount": 90}
{"datetime": "2020-07-08 15:35", "price": 32, "amount": 50}
{"datetime": "2020-07-08 15:39", "price": 41, "amount": 40}
{"datetime": "2020-07-08 15:49", "price": 32, "amount": 40}
I need to group that data in intervals of 30 minutes and have them distinct by price. So all the records before 15:30 must have 15:30 as datetime, and all the records before 16:00 need to have 16:00. An example of the expected output:
The previous data becomes this:
{"datetime": "2020-07-08 15:30", "price": 21, "amount": 90}
{"datetime": "2020-07-08 15:30", "price": 22, "amount": 50}
{"datetime": "2020-07-08 16:00", "price": 32, "amount": 50}
{"datetime": "2020-07-08 16:00", "price": 41, "amount": 40}
I don't know if this query is doable, so any kind of advice is appreciated. I can also do that from my code, if it's not possible to do in a query.
I tried the code suggested here, but I got the following result, which is not the expected output:
Query = db.myData.aggregate([
{ "$group": {
"_id": {
"$toDate": {
"$subtract": [
{ "$toLong": "$datetime" },
{ "$mod": [ { "$toLong": "$datetime" }, 1000 * 60 * 15 ] }
]
}
},
"count": { "$sum": 1 }
}}
])
for x in Query:
    print(x)
Output:
{'_id': datetime.datetime(2020, 7, 7, 9, 15), 'count': 39}
{'_id': datetime.datetime(2020, 7, 6, 18, 30), 'count': 44}
{'_id': datetime.datetime(2020, 7, 7, 16, 30), 'count': 54}
{'_id': datetime.datetime(2020, 7, 7, 11, 45), 'count': 25}
{'_id': datetime.datetime(2020, 7, 6, 22, 15), 'count': 48}
{'_id': datetime.datetime(2020, 7, 7, 15, 0), 'count': 30}
...
What @Gibbs suggested is correct, you just have to modify the data a little bit.
Check if the aggregate query below is what you are looking for:
Query = db.myData.aggregate([
{
"$group": {
"_id": {
"datetime":{
"$toDate": {
"$subtract": [
{ "$toLong": "$datetime" },
{ "$mod": [ { "$toLong": "$datetime" }, 1000 * 60 * 30 ] }
]
}
},
"price": "$price",
"amount": "$amount"
},
}
},
{
"$replaceRoot": { "newRoot": "$_id"}
}
])
for x in Query:
    print(x)
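Since the question mentions doing the grouping in application code as a fallback, here is a rough client-side sketch in plain Python. It rests on several assumptions read off the expected output: "datetime" values are strings in the format shown in the sample, each timestamp is rounded up to the next 30-minute boundary (a timestamp already on a boundary stays there), and only the first record per (interval, price) pair is kept. The names `ceil_to_interval` and `bucket_records` are made up for illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"  # assumed format, taken from the sample data


def ceil_to_interval(ts, minutes=30):
    """Round ts up to the next multiple of `minutes`; boundary values stay put."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds())
    step = minutes * 60
    buckets = -(-elapsed // step)  # ceiling division
    return midnight + timedelta(seconds=buckets * step)


def bucket_records(records, minutes=30):
    """Relabel each record with its interval end, keeping one record per price."""
    seen = set()
    out = []
    for rec in records:
        bucket = ceil_to_interval(datetime.strptime(rec["datetime"], FMT), minutes)
        key = (bucket, rec["price"])
        if key in seen:
            continue  # a record with this price already exists in this interval
        seen.add(key)
        out.append({**rec, "datetime": bucket.strftime(FMT)})
    return out


sample = [
    {"datetime": "2020-07-08 15:10", "price": 21, "amount": 90},
    {"datetime": "2020-07-08 15:15", "price": 22, "amount": 50},
    {"datetime": "2020-07-08 15:19", "price": 21, "amount": 40},
    {"datetime": "2020-07-08 15:30", "price": 21, "amount": 90},
    {"datetime": "2020-07-08 15:35", "price": 32, "amount": 50},
    {"datetime": "2020-07-08 15:39", "price": 41, "amount": 40},
    {"datetime": "2020-07-08 15:49", "price": 32, "amount": 40},
]
bucketed = bucket_records(sample)
for rec in bucketed:
    print(rec)
```

On the sample data this yields the four records listed in the question's expected output. It differs from the aggregation above, which rounds down to the start of the interval and treats records with different amounts as distinct.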
I have a list of OrderedDicts that includes some duplicate ids, something like this:
[OrderedDict([('caseId', 20), ('userId', 1), ('emailStatus', 21)]),
OrderedDict([('caseId', 20), ('userId', 1), ('emailStatus', 20)]),
OrderedDict([('caseId', 18), ('userId', 4), ('emailStatus', 21)]),
OrderedDict([('caseId', 19), ('userId', 3), ('emailStatus', 21)]),
OrderedDict([('caseId', 18), ('userId', 1), ('emailStatus', 20)]),
OrderedDict([('caseId', 20), ('userId', 3), ('emailStatus', 21)]),
OrderedDict([('caseId', 18), ('userId', 4), ('emailStatus', 20)]),
OrderedDict([('caseId', 19), ('userId', 1), ('emailStatus', 20)])]
I want to get a list of nested lists, something like this:
[{
"caseId": "20",
"users": [
{
"userId": "1",
"emailStatus": [
{
"emailStatus" : "20"
},
{
"emailStatus" : "21"
}
]
},
{
"userId": "3",
"emailStatus": [
{
"emailStatus" : "21"
}
]
}
]
},
{
"caseId": "19",
"users": [
{
"userId": "1",
"emailStatus": [
{
"emailStatus" : "20"
}
]
},
{
"userId": "3",
"emailStatus": [
{
"emailStatus" : "21"
}
]
}
]
},
{
"caseId": "18",
"users": [
{
"userId": "1",
"emailStatus": [
{
"emailStatus" : "20"
}
]
},
{
"userId": "4",
"emailStatus": [
{
"emailStatus" : "20"
},
{
"emailStatus" : "21"
}
]
}
]
}
]
I tried to achieve this by iterating over both lists, but couldn't work out how to keep track of the previous and next records and of repeated data. It's confusing. If anyone can give me a start on how to iterate over my list, it would be very kind of you.
Many regards..
Updated Question
More detailed question here
First, you can use a loop and dict.setdefault to group the data in a nested dict:
temp = {}
for d in lst:
    temp.setdefault(d["caseId"], {}).setdefault(d["userId"], []).append(d["emailStatus"])
print(temp)
# {18: {1: [20], 4: [21, 20]}, 19: {1: [20], 3: [21]}, 20: {1: [21, 20], 3: [21]}}
Or using a collections.defaultdict:
from collections import defaultdict

temp = defaultdict(lambda: defaultdict(list))
for d in lst:
    temp[d["caseId"]][d["userId"]].append(d["emailStatus"])
Then, use a nested mixed dict and list comprehension to aggregate your final result:
res = [{"caseId": case, "users": [{"userId": user, "emailStatus": [{"emailStatus": s} for s in status]}
for user, status in users.items()]}
for case, users in temp.items()]
print(res)
# [{'caseId': 18, 'users': [{'userId': 1, 'emailStatus': [{'emailStatus': 20}]}, {'userId': 4, 'emailStatus': [{'emailStatus': 21}, {'emailStatus': 20}]}]},
# {'caseId': 19, 'users': [{'userId': 1, 'emailStatus': [{'emailStatus': 20}]}, {'userId': 3, 'emailStatus': [{'emailStatus': 21}]}]},
# {'caseId': 20, 'users': [{'userId': 1, 'emailStatus': [{'emailStatus': 21}, {'emailStatus': 20}]}, {'userId': 3, 'emailStatus': [{'emailStatus': 21}]}]}]
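The result above keeps the ids as integers and follows the grouping order. If the exact layout from the question is wanted, the same two steps can be combined with explicit sorting. This is a sketch only: it assumes the desired order is caseId descending, userId ascending and emailStatus ascending, with every value rendered as a string (all read off the question's example), and the function name `nest` is hypothetical.

```python
from collections import defaultdict

# The eight records from the question, as (caseId, userId, emailStatus).
records = [(20, 1, 21), (20, 1, 20), (18, 4, 21), (19, 3, 21),
           (18, 1, 20), (20, 3, 21), (18, 4, 20), (19, 1, 20)]
lst = [{"caseId": c, "userId": u, "emailStatus": s} for c, u, s in records]


def nest(rows):
    """Group rows by caseId, then userId, and emit the nested structure."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        grouped[row["caseId"]][row["userId"]].append(row["emailStatus"])
    return [
        {
            "caseId": str(case),
            "users": [
                {
                    "userId": str(user),
                    "emailStatus": [{"emailStatus": str(s)} for s in sorted(statuses)],
                }
                for user, statuses in sorted(users.items())
            ],
        }
        # Assumed ordering: highest caseId first, as in the question's example.
        for case, users in sorted(grouped.items(), reverse=True)
    ]


res = nest(lst)
print(res)
```

Plain dictionaries are used here instead of `OrderedDict`; the grouping code only reads the three keys, so either type works.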