How to aggregate after fetching result using groupby using itertools


I have a list like this:
a=[{'name': 'xyz','inv_name':'asd','quant':300,'amt':20000, 'current':30000},{'name': 'xyz','inv_name':'asd','quant':200,'amt':2000,'current':3000}]
I obtained this list using itertools groupby.
I want to add up the quant, amt and current fields for entries with the same name and inv_name, and build a list like: [{'name':'xyz','inv_name':'asd','quant':500,'amt':22000,'current':33000}]
Any suggestions on how to achieve this?

If you are happy using a 3rd party library, pandas accepts a list of dictionaries:
import pandas as pd
a=[{'name': 'xyz','inv_name':'asd','quant':300,'amt':20000, 'current':30000},
{'name': 'xyz','inv_name':'asd','quant':200,'amt':2000,'current':3000}]
df = pd.DataFrame(a)
res = df.groupby(['name', 'inv_name'], as_index=False).sum().to_dict(orient='records')
# [{'amt': 22000,
# 'current': 33000,
# 'inv_name': 'asd',
# 'name': 'xyz',
# 'quant': 500}]
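
If you would rather stay in the standard library, the same aggregation can be sketched with itertools.groupby. This is only one possible approach: the names key_fields, sum_fields and get_key are made up for this example, and the list is sorted first because groupby only merges adjacent items with equal keys.

```python
from itertools import groupby
from operator import itemgetter

a = [
    {'name': 'xyz', 'inv_name': 'asd', 'quant': 300, 'amt': 20000, 'current': 30000},
    {'name': 'xyz', 'inv_name': 'asd', 'quant': 200, 'amt': 2000, 'current': 3000},
]

# Hypothetical helper names: the fields to group on and the fields to add up
key_fields = ('name', 'inv_name')
sum_fields = ('quant', 'amt', 'current')
get_key = itemgetter(*key_fields)

result = []
# groupby only groups consecutive items, so sort by the same key first
for key, rows in groupby(sorted(a, key=get_key), key=get_key):
    rows = list(rows)  # materialize the group so it can be summed several times
    merged = dict(zip(key_fields, key))
    for field in sum_fields:
        merged[field] = sum(row[field] for row in rows)
    result.append(merged)

print(result)
```

Each group produces one dictionary holding the grouping keys plus the summed fields.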

Related

Handle nested lists in pandas

How can I turn a nested list with dict inside into extra columns in a dataframe in Python?
I received information within a dict from an API,
{'orders':
[
{ 'orderId': '2838168630',
'dateTimeOrderPlaced': '2020-01-22T18:37:29+01:00',
'orderItems': [{ 'orderItemId': 'BFC0000361764421',
'ean': '234234234234234',
'cancelRequest': False,
'quantity': 1}
]},
{ 'orderId': '2708182540',
'dateTimeOrderPlaced': '2020-01-22T17:45:36+01:00',
'orderItems': [{ 'orderItemId': 'BFC0000361749496',
'ean': '234234234234234',
'cancelRequest': False,
'quantity': 3}
]},
{ 'orderId': '2490844970',
'dateTimeOrderPlaced': '2019-08-17T14:21:46+02:00',
'orderItems': [{ 'orderItemId': 'BFC0000287505870',
'ean': '234234234234234',
'cancelRequest': True,
'quantity': 1}
]}
]}
which I managed to turn into a simple dataframe by doing this:
pd.DataFrame(recieved_data.get('orders'))
output:
orderId  date  orderItems
1        1-12  [{'orderItemId': 'dfs13', 'ean': '34234'}]
2        etc.
...
I would like to have something like this:
orderId  date  orderItemId  ean
1        1-12  dfs13        34234
2        etc.
...
I already tried to single out the orderItems column with iloc and then turn it into a list so that I could extract the values again. However, I then still end up with a list from which I need to extract another list, which has the dict in it.

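# Load the dataframe as you have already done.
# Expand each list in orderItems into its own column(s)
temp_df = df['orderItems'].apply(pd.Series)
# Concatenate temp_df and the original df side by side (axis=1 aligns on the index)
final_df = pd.concat([df, temp_df], axis=1)
# Drop columns if required
Hope it works for you.
Cheers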

By combining the answers to this question I reached my end goal. I did the following:
# Unlist the orderItems column
temp_df = df['orderItems'].apply(pd.Series)
# Put the items in orderItems into separate columns
temp_df_json = json_normalize(temp_df[0])
# Join the tables
final_df = df.join(temp_df_json)
# Drop the old orderItems column for a clean table
final_df = final_df.drop(["orderItems"], axis=1)
Also, instead of .concat() I applied .join() to join both tables based on the existing index.

Just to make it clear: you are receiving JSON from the API, so you can try the json_normalize function.
Try this:
import pandas as pd
# DataFrame initialization
df = pd.DataFrame({"orderId": [1], "date": ["1-12"], "orderItems": [{'orderItemId': 'dfs13', 'ean': '34234'}]})
# Flatten the inner dicts into columns (pd.json_normalize replaces the deprecated pandas.io.json import)
sub_df = pd.json_normalize(df["orderItems"])
# Drop the column that still holds the raw dicts
df = df.drop(["orderItems"], axis=1)
# Join both dataframes on the index
df = df.join(sub_df)
So the output is:
orderId date orderItemId ean
0 1 1-12 dfs13 34234
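
As a possible shortcut, json_normalize can also be pointed at the raw API payload directly, using record_path for the nested list and meta for the order-level fields. The sketch below assumes pandas 1.0 or later and uses two of the orders from the question; the variable names received_data and flat are only illustrative.

```python
import pandas as pd

# Two orders copied from the question's payload (variable name is illustrative)
received_data = {
    'orders': [
        {'orderId': '2838168630',
         'dateTimeOrderPlaced': '2020-01-22T18:37:29+01:00',
         'orderItems': [{'orderItemId': 'BFC0000361764421',
                         'ean': '234234234234234',
                         'cancelRequest': False,
                         'quantity': 1}]},
        {'orderId': '2708182540',
         'dateTimeOrderPlaced': '2020-01-22T17:45:36+01:00',
         'orderItems': [{'orderItemId': 'BFC0000361749496',
                         'ean': '234234234234234',
                         'cancelRequest': False,
                         'quantity': 3}]},
    ]
}

# One output row per entry in orderItems; the meta fields are repeated on each row
flat = pd.json_normalize(
    received_data['orders'],
    record_path='orderItems',
    meta=['orderId', 'dateTimeOrderPlaced'],
)
print(flat)
```

This also copes with orders that contain more than one item, since every item becomes its own row.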

How to iterate through this nested dictionary within a list using for loop

I have a list of nested dictionaries that I want to get specific values and put into a dictionary like this:
vid = [{'a':{'display':'axe', 'desc':'red'}, 'b':{'confidence':'good'}},
{'a':{'display':'book', 'desc':'blue'}, 'b':{'confidence':'poor'}},
{'a':{'display':'apple', 'desc':'green'}, 'b':{'confidence':'good'}}
]
I saw previous questions similar to this, but I still can't get the values such as 'axe' and 'red'. I would like the new dict to have a 'Description', 'Confidence' and other columns with the values from the nested dict.
I have tried this for loop:
new_dict = {}
for x in range(len(vid)):
    for y in vid[x]['a']:
        desc = y['desc']
        new_dict['Description'] = desc
I got many errors but mostly this error:
TypeError: string indices must be integers
Can someone please help solve how to get the values from the nested dictionary?

You don't need to iterate through the keys in the dictionary (the inner for-loop), just access the value you want.
vid = [{'a':{'display':'axe', 'desc':'red'}, 'b':{'confidence':'good'} },
{'a':{'display':'book', 'desc':'blue'}, 'b':{'confidence':'poor'}},
{'a':{'display':'apple', 'desc':'green'}, 'b':{'confidence':'good'}}
]
list_of_dicts = []
for item in vid:
    # Index straight into the nested dict instead of looping over its keys
    desc = item['a']['desc']
    list_of_dicts.append({'desc': desc})

I have found a temporary solution for this. I decided to use a pandas dataframe instead.
import pandas as pd
df = pd.DataFrame(columns=['Desc'])
for x in range(len(vid)):
    desc = vid[x]['a']['desc']
    # Append one row at the next free integer label
    df.loc[len(df)] = [desc]

Since you want to write this to CSV later, pandas will help you a lot with this problem. Using pandas, you can get the desc values like this:
import pandas as pd
df = pd.DataFrame(vid)
# Collect every description; assigning to a single key inside the loop would keep only the last one
new_dict = {'description': []}
for index, row in df.iterrows():
    new_dict['description'].append(row['a']['desc'])
print(df)
a b
0 {'display': 'axe', 'desc': 'red'} {'confidence': 'good'}
1 {'display': 'book', 'desc': 'blue'} {'confidence': 'poor'}
2 {'display': 'apple', 'desc': 'green'} {'confidence': 'good'}
This is what the dataframe looks like: a and b are the columns of the dataframe, and your nested dicts are the values in its rows.

Try using this list comprehension:
d = [{'Description': i['a']['desc'], 'Confidence': i['b']['confidence']} for i in vid]
print(d)
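
If the goal is a table with one column per nested value, pd.json_normalize can flatten the whole list in one call. This is a sketch assuming pandas 1.0 or later; the names flat and table are made up here, and the column labels 'Description' and 'Confidence' are taken from the question.

```python
import pandas as pd

vid = [{'a': {'display': 'axe', 'desc': 'red'}, 'b': {'confidence': 'good'}},
       {'a': {'display': 'book', 'desc': 'blue'}, 'b': {'confidence': 'poor'}},
       {'a': {'display': 'apple', 'desc': 'green'}, 'b': {'confidence': 'good'}}]

# Nested keys become dotted column names: 'a.display', 'a.desc', 'b.confidence'
flat = pd.json_normalize(vid)

# Keep and rename only the columns of interest
table = flat.rename(columns={'a.desc': 'Description',
                             'b.confidence': 'Confidence'})[['Description', 'Confidence']]
print(table)
```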

Get a list of keys and values in a nested dictionary oriented by index

I have an Excel file with a structure like this:
name age status
anna 35 single
petr 27 married
I have converted such a file into a nested dictionary with a structure like this:
{'anna': {'age': 35, 'status': 'single'},
'petr': {'age': 27, 'status': 'married'}}
using pandas:
import pandas as pd
df = pd.read_excel('path/to/file')
df.set_index('name', inplace=True)
print(df.to_dict(orient='index'))
But now, when I run list(df.keys()), it returns a list of all the keys in the dictionary ('age', 'status', etc.) but not 'name'.
My eventual goal is to get all the keys and values by typing a name.
Is that possible somehow? Or should I import the data some other way to achieve this? Either way, I need to end up with a dictionary, because I will merge it with other dictionaries by key.

I think you need to pass drop=False to set_index so that the name column is not dropped:
import pandas as pd
df = pd.read_excel('path/to/file')
df.set_index('name', inplace=True, drop=False)
print (df)
name age status
name
anna anna 35 single
petr petr 27 married
d = df.to_dict(orient='index')
print (d)
{'anna': {'age': 35, 'status': 'single', 'name': 'anna'},
'petr': {'age': 27, 'status': 'married', 'name': 'petr'}}
print (list(df.keys()))
['name', 'age', 'status']

Given a dataframe from Excel, you can do this to get the structure you want:
resulting_dict = {}
# Series.items() replaces iteritems(), which was removed in pandas 2.0
for name, info in df.groupby('name').apply(lambda x: x.to_dict()).items():
    stats = {}
    for key, values in info.items():
        if key != 'name':
            # Each column maps {row_index: value}; take the single value
            value = list(values.values())[0]
            stats[key] = value
    resulting_dict[name] = stats

Try this :
import pandas as pd
df = pd.read_excel('path/to/file')
df[df['name']=='anna'] #Get all details of anna
df[df['name']=='petr'] #Get all details of petr
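
For the "look up everything by typing a name" goal, one option is to keep the dictionary from the question and index it by name. The sketch below builds the dataframe inline instead of reading the Excel file, so it runs on its own; the variable name people is made up for this example.

```python
import pandas as pd

# Inline stand-in for pd.read_excel('path/to/file'), using the rows from the question
df = pd.DataFrame({'name': ['anna', 'petr'],
                   'age': [35, 27],
                   'status': ['single', 'married']})

# The names become the outer keys of the dictionary
people = df.set_index('name').to_dict(orient='index')

print(list(people))    # the names, i.e. the outer keys
print(people['anna'])  # all keys and values stored for one name
```

The names are the keys of the resulting dictionary, which is why they do not show up in df.keys(): that call lists the dataframe's columns, not its index.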

Pandas df to dictionary with values as python lists aggregated from a df column

I have a pandas df containing 'features' for stocks, which looks like this:
I am now trying to create a dictionary with unique sector as key, and a python list of tickers for that unique sector as values, so I end up having something that looks like this:
{'consumer_discretionary': ['AAP',
'AMZN',
'AN',
'AZO',
'BBBY',
'BBY',
'BWA',
'KMX',
'CCL',
'CBS',
'CHTR',
'CMG',
etc.
I could iterate over the pandas df rows to create the dictionary, but I prefer a more pythonic solution. Thus far, this code is a partial solution:
df.set_index('sector')['ticker'].to_dict()
Any feedback is appreciated.
UPDATE:
The solution by @wrwrwr
df.set_index('ticker').groupby('sector').groups
partially works, but it returns a pandas series as the value instead of a Python list. Any ideas about how to transform the pandas series into a Python list in the same line, without having to iterate over the dictionary?

Wouldn't f.set_index('ticker').groupby('sector').groups be what you want?
For example:
from pandas import DataFrame
f = DataFrame({
'ticker': ('t1', 't2', 't3'),
'sector': ('sa', 'sb', 'sb'),
'name': ('n1', 'n2', 'n3')})
groups = f.set_index('ticker').groupby('sector').groups
# {'sa': Index(['t1']), 'sb': Index(['t2', 't3'])}
To ensure that they have the type you want:
{k: list(v) for k, v in f.set_index('ticker').groupby('sector').groups.items()}
or:
f.set_index('ticker').groupby('sector').apply(lambda g: list(g.index)).to_dict()
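
Another way to get plain Python lists, sketched here on the same toy data, is to group the ticker column directly and apply list to each group. The variable name tickers_by_sector is made up for this example.

```python
import pandas as pd

f = pd.DataFrame({
    'ticker': ('t1', 't2', 't3'),
    'sector': ('sa', 'sb', 'sb'),
    'name': ('n1', 'n2', 'n3')})

# Select the ticker column per sector, turn each group into a list,
# then convert the resulting Series into a dict keyed by sector
tickers_by_sector = f.groupby('sector')['ticker'].apply(list).to_dict()
print(tickers_by_sector)
```

This avoids set_index altogether, since the tickers are read from the column rather than from the index.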

Map multiple lists of values to a list of keys in a python dictionary?

I want to map some values(a list of lists) to some keys(a list) in a python dictionary.
I read Map two lists into a dictionary in Python
and figured I could do that this way :
import itertools
headers = ['name', 'surname', 'city']
values = [
    ['charles', 'rooth', 'kentucky'],
    ['william', 'jones', 'texas'],
    ['john', 'frith', 'los angeles']
]
data = []
for entries in values:
    data.append(dict(itertools.izip(headers, entries)))
But I was just wondering if there is a nicer way to go?
Thanks
PS: I'm on python 2.6.7

You could use a list comprehension:
data = [dict(itertools.izip(headers, entries)) for entries in values]

from functools import partial
from itertools import izip, imap
data = map(dict, imap(partial(izip, headers), values))

It's already really nice...
data = [dict(itertools.izip(headers, entries)) for entries in values]
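
The answers above target Python 2, where itertools.izip is the lazy version of zip. For anyone reading this on Python 3, where izip no longer exists and the built-in zip is already lazy, the same idea would look roughly like this:

```python
headers = ['name', 'surname', 'city']
values = [
    ['charles', 'rooth', 'kentucky'],
    ['william', 'jones', 'texas'],
    ['john', 'frith', 'los angeles'],
]

# Pair each header with the matching entry, one dict per row
data = [dict(zip(headers, entries)) for entries in values]
print(data)
```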
