I have a json file with the following structure:
{'0': {'transaction': [{'transaction_key': '406.l.657872.tr.374',
'transaction_id': '374',
'type': 'add/drop',
'status': 'successful',
'timestamp': '1639593953'},
{'players': {'0': {'player': [[{'player_key': '406.p.100006'},
{'player_id': '100006'},
{'name': {'full': 'Dallas',
'first': 'Dallas',
'last': '',
'ascii_first': 'Dallas',
'ascii_last': ''}},
{'editorial_team_abbr': 'Dal'},
{'display_position': 'DEF'},
{'position_type': 'DT'}],
{'transaction_data': [{'type': 'add',
'source_type': 'freeagents',
'destination_type': 'team',
'destination_team_key': '406.l.657872.t.10',
'destination_team_name': 'Team 1'}]}]},
'1': {'player': [[{'player_key': '406.p.24793'},
{'player_id': '24793'},
{'name': {'full': 'Julio Jones',
'first': 'Julio',
'last': 'Jones',
'ascii_first': 'Julio',
'ascii_last': 'Jones'}},
{'editorial_team_abbr': 'Ten'},
{'display_position': 'WR'},
{'position_type': 'O'}],
{'transaction_data': {'type': 'drop',
'source_type': 'team',
'source_team_key': '406.l.657872.t.10',
'source_team_name': 'Team 1',
'destination_type': 'waivers'}}]},
'count': 2}}]},
'1': {'transaction': [{'transaction_key': '406.l.657872.tr.373',
'transaction_id': '373',
'type': 'add/drop',
'status': 'successful',
'timestamp': '1639575496'},
{'players': {'0': {'player': [[{'player_key': '406.p.32722'},
{'player_id': '32722'},
{'name': {'full': 'Cam Akers',
'first': 'Cam',
'last': 'Akers',
'ascii_first': 'Cam',
'ascii_last': 'Akers'}},
{'editorial_team_abbr': 'LAR'},
{'display_position': 'RB'},
{'position_type': 'O'}],
{'transaction_data': [{'type': 'add',
'source_type': 'freeagents',
'destination_type': 'team',
'destination_team_key': '406.l.657872.t.5',
'destination_team_name': 'Team 2'}]}]},
'1': {'player': [[{'player_key': '406.p.100007'},
{'player_id': '100007'},
{'name': {'full': 'Denver',
'first': 'Denver',
'last': '',
'ascii_first': 'Denver',
'ascii_last': ''}},
{'editorial_team_abbr': 'Den'},
{'display_position': 'DEF'},
{'position_type': 'DT'}],
{'transaction_data': {'type': 'drop',
'source_type': 'team',
'source_team_key': '406.l.657872.t.5',
'source_team_name': 'Team 2',
'destination_type': 'waivers'}}]},
'count': 2}}]},
'2': {'transaction': [{'transaction_key': '406.l.657872.tr.372',
'transaction_id': '372',
'type': 'add/drop',
'status': 'successful',
'timestamp': '1639575448'},
{'players': {'0': {'player': [[{'player_key': '406.p.33413'},
{'player_id': '33413'},
{'name': {'full': 'Travis Etienne',
'first': 'Travis',
'last': 'Etienne',
'ascii_first': 'Travis',
'ascii_last': 'Etienne'}},
{'editorial_team_abbr': 'Jax'},
{'display_position': 'RB'},
{'position_type': 'O'}],
{'transaction_data': [{'type': 'add',
'source_type': 'freeagents',
'destination_type': 'team',
'destination_team_key': '406.l.657872.t.5',
'destination_team_name': 'Team 2'}]}]},
'1': {'player': [[{'player_key': '406.p.24815'},
{'player_id': '24815'},
{'name': {'full': 'Mark Ingram II',
'first': 'Mark',
'last': 'Ingram II',
'ascii_first': 'Mark',
'ascii_last': 'Ingram II'}},
{'editorial_team_abbr': 'NO'},
{'display_position': 'RB'},
{'position_type': 'O'}],
{'transaction_data': {'type': 'drop',
'source_type': 'team',
'source_team_key': '406.l.657872.t.5',
'source_team_name': 'Team 2',
'destination_type': 'waivers'}}]},
'count': 2}}]}}
These are transactions for a fantasy football league, and I'd like to organize each transaction into a dataframe, but I'm running into issues normalizing the data. I figure I need a loop, but I'm stuck and would appreciate any suggestions. Thank you.
Ideally, I'm looking to summarize each transaction with the following dataframe structure:
transaction_id  type      added           pos_1  dropped         pos_2  timestamp
374             add/drop  Dallas          DEF    Julio Jones     WR     1639593953
373             add/drop  Cam Akers       RB     Denver          DEF    1639575496
372             add/drop  Travis Etienne  RB     Mark Ingram II  RB     1639575448
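One possible way to get there, as a rough sketch rather than a definitive answer: walk each transaction, merge the list of single-key player dicts into one dict, and use the 'type' inside 'transaction_data' to decide whether the player goes in the added or dropped columns. The sample below is a trimmed copy of the first transaction, and the helper name flatten_transaction is made up for illustration. It assumes every transaction has the two-element [metadata, players] shape shown above, and that 'transaction_data' is sometimes a list and sometimes a plain dict, as in the sample.

```python
# Trimmed copy of the first transaction from the question.
transactions = {
    '0': {'transaction': [
        {'transaction_key': '406.l.657872.tr.374', 'transaction_id': '374',
         'type': 'add/drop', 'status': 'successful', 'timestamp': '1639593953'},
        {'players': {
            '0': {'player': [
                [{'player_key': '406.p.100006'}, {'player_id': '100006'},
                 {'name': {'full': 'Dallas'}}, {'display_position': 'DEF'}],
                {'transaction_data': [{'type': 'add'}]}]},
            '1': {'player': [
                [{'player_key': '406.p.24793'}, {'player_id': '24793'},
                 {'name': {'full': 'Julio Jones'}}, {'display_position': 'WR'}],
                {'transaction_data': {'type': 'drop'}}]},
            'count': 2}}]},
}


def flatten_transaction(entry):
    """Turn one transaction entry into a flat dict (hypothetical helper)."""
    meta, players_wrapper = entry['transaction']
    row = {'transaction_id': meta['transaction_id'], 'type': meta['type']}
    for key, value in players_wrapper['players'].items():
        if key == 'count':  # 'count' sits next to the player entries; skip it
            continue
        info_list, data_wrapper = value['player']
        # Merge the list of single-key dicts into one dict of player info.
        info = {}
        for part in info_list:
            info.update(part)
        tdata = data_wrapper['transaction_data']
        if isinstance(tdata, list):  # adds come wrapped in a list in the sample
            tdata = tdata[0]
        if tdata['type'] == 'add':
            row['added'] = info['name']['full']
            row['pos_1'] = info['display_position']
        elif tdata['type'] == 'drop':
            row['dropped'] = info['name']['full']
            row['pos_2'] = info['display_position']
    row['timestamp'] = meta['timestamp']
    return row


rows = [flatten_transaction(entry) for entry in transactions.values()]
print(rows)
```

Passing rows to pandas.DataFrame should then give one row per transaction with the columns listed above. Transactions that contain only an add or only a drop would simply be missing the other pair of keys, which pandas fills with NaN.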
I have the following list of dicts:
list_of_dicts = [
{'product': 'car', 'city': 'new york', 'quantity': 13},
{'product': 'car', 'city': 'new york', 'quantity': 25},
{'product': 'bus', 'city': 'miami', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 8}
]
My goal: when the values of 'product' and 'city' are the same, sum the values of 'quantity'.
The result should look like this:
result_list_of_dicts = [
{'product': 'car', 'city': 'new york', 'quantity': 38},
{'product': 'bus', 'city': 'miami', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 13},
]
Is there a Pythonic way? I tried a couple of things, but I'd better not show them because they are really ugly.
Thank you in advance!
You can do the following, using only standard library utils:
from operator import itemgetter
from functools import reduce
from itertools import groupby
pc = itemgetter("product", "city") # sorting and grouping key
q = itemgetter("quantity")
combine = lambda d1, d2: {**d1, "quantity": q(d1) + q(d2)}
[reduce(combine, g) for _, g in groupby(sorted(list_of_dicts, key=pc), key=pc)]
# [{'product': 'bus', 'city': 'miami', 'quantity': 5},
# {'product': 'car', 'city': 'new york', 'quantity': 38},
# {'product': 'container', 'city': 'atlanta', 'quantity': 13}]
Or, maybe even simpler and linear:
from collections import Counter
from operator import itemgetter
pc = itemgetter("product", "city")  # grouping key
q = itemgetter("quantity")
totals = Counter()
for dct in list_of_dicts:
    totals[pc(dct)] += q(dct)
result_list_of_dicts = [
    {"product": p, "city": c, "quantity": qty} for (p, c), qty in totals.items()
]
One approach using collections.Counter
from collections import Counter
list_of_dicts = [
{'product': 'car', 'city': 'new york', 'quantity': 13},
{'product': 'car', 'city': 'new york', 'quantity': 25},
{'product': 'bus', 'city': 'miami', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 8}
]
counts = sum((Counter({(d["product"], d["city"]): d["quantity"]}) for d in list_of_dicts), Counter())
result = [{"product": product, "city": city, "quantity": quantity} for (product, city), quantity in counts.items()]
print(result)
A pandas implementation
Group by "product" and "city", sum over the groups and reset index to get original columns.
import pandas as pd
list_of_dicts = [
{'product': 'car', 'city': 'new york', 'quantity': 13},
{'product': 'car', 'city': 'new york', 'quantity': 25},
{'product': 'bus', 'city': 'miami', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 8}
]
df = pd.DataFrame(list_of_dicts)
print(df)
df = df.groupby(["product", "city"]).sum().reset_index()
print(df)
summed_dict = df.to_dict("records")
print(summed_dict)
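You could do it with a loop, initializing an entry the first time you encounter each product. Note that this keys on 'product' alone, so it assumes each product appears in only one city.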
list_of_dicts = [
{'product': 'car', 'city': 'new york', 'quantity': 13},
{'product': 'car', 'city': 'new york', 'quantity': 25},
{'product': 'bus', 'city': 'miami', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 8}
]
new_dict = {}
for ld in list_of_dicts:
    if ld['product'] not in new_dict:
        new_dict[ld['product']] = {}
        new_dict[ld['product']]['city'] = ld['city']
        new_dict[ld['product']]['quantity'] = 0
    new_dict[ld['product']]['quantity'] += ld['quantity']
# print(new_dict)
# {'car': {'city': 'new york', 'quantity': 38}, 'bus': {'city': 'miami', 'quantity': 5}, 'container': {'city': 'atlanta', 'quantity': 13}}
result_list_of_dicts = [{'product': nd,
                         'city': new_dict[nd]['city'],
                         'quantity': new_dict[nd]['quantity']} for nd in new_dict]
# print(result_list_of_dicts)
# [{'product': 'car', 'city': 'new york', 'quantity': 38}, {'product': 'bus', 'city': 'miami', 'quantity': 5}, {'product': 'container', 'city': 'atlanta', 'quantity': 13}]
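If the same product can show up in more than one city, a variant of the same loop keyed on the (product, city) pair might look like the sketch below. The last input row is a made-up addition, not part of the question, included only to show that the two 'car' groups stay separate.

```python
from collections import defaultdict

list_of_dicts = [
    {'product': 'car', 'city': 'new york', 'quantity': 13},
    {'product': 'car', 'city': 'new york', 'quantity': 25},
    {'product': 'bus', 'city': 'miami', 'quantity': 5},
    {'product': 'container', 'city': 'atlanta', 'quantity': 5},
    {'product': 'container', 'city': 'atlanta', 'quantity': 8},
    # Hypothetical extra row (not in the question): same product, different city.
    {'product': 'car', 'city': 'miami', 'quantity': 2},
]

# Missing keys start at 0, so no explicit initialization step is needed.
totals = defaultdict(int)
for ld in list_of_dicts:
    totals[(ld['product'], ld['city'])] += ld['quantity']

result_list_of_dicts = [
    {'product': product, 'city': city, 'quantity': quantity}
    for (product, city), quantity in totals.items()
]
print(result_list_of_dicts)
```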
I have a list of dictionaries. Each dictionary contains nested dictionaries, and those nested dictionaries have lists as values. The information I need is inside those lists.
I want to turn the lists into dictionaries. The whole list of dictionaries is set up like this:
data = [{'date': 'Aug 1 1980',
'hour': '2PM',
'group': {'location' :
[{'country': 'United States',
'state': 'Utah',
'city': 'St. George',
'coordinates': [37.0965, 113.5684]}]},
{'date': 'Aug 1 1980',
'hour': '4PM',
'group': {'location' :
[{'country': 'United States',
'state': 'Utah',
'city': 'St. George',
'coordinates': [37.0965, 113.5684]}]}]
I need the coordinates, but 'location' is a list. How can I turn this list into a dictionary? Should I start by splitting on ':' and ',' into keys and values? That seems like an awful way to do it, and I'm hoping someone can suggest a better, quicker way.
Edit
I would want my dictionary to look like this:
{'country': 'United States', 'state': 'Utah', 'city' :'St George', 'coordinates': [37.0965, 113.5684]}
I think the following does what you want (although I'm not totally sure, since I had to fix your input data to make it valid and guess a little at what exactly you wanted the result to be).
from pprint import pprint
data = [{'date': 'Aug 1 1980',
'hour': '2PM',
'group': {'location': [{'country': 'United States',
'state': 'Utah',
'city': 'St. George',
'coordinates': [37.0965, 113.5684]}]}},
{'date': 'Aug 1 1980',
'hour': '4PM',
'group': {'location': [{'country': 'United States',
'state': 'Utah',
'city': 'St. George',
'coordinates': [37.0965, 113.5684]}]}}]
fixed_data = []
for dct in data:
    # Replace the one-element list with the dict it contains (this modifies data in place).
    dct['group']['location'] = dct['group']['location'][0]
    fixed_data.append(dct)
pprint(fixed_data, sort_dicts=False)
Printed result:
[{'date': 'Aug 1 1980',
'hour': '2PM',
'group': {'location': {'country': 'United States',
'state': 'Utah',
'city': 'St. George',
'coordinates': [37.0965, 113.5684]}}},
{'date': 'Aug 1 1980',
'hour': '4PM',
'group': {'location': {'country': 'United States',
'state': 'Utah',
'city': 'St. George',
'coordinates': [37.0965, 113.5684]}}}]
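If the goal is only to read the coordinates, or if the original data should stay untouched, something along these lines may be enough. It is a sketch that assumes every 'location' list holds at least one dict, as in the sample; the names coordinates and flattened are just illustrative.

```python
data = [{'date': 'Aug 1 1980',
         'hour': '2PM',
         'group': {'location': [{'country': 'United States',
                                 'state': 'Utah',
                                 'city': 'St. George',
                                 'coordinates': [37.0965, 113.5684]}]}},
        {'date': 'Aug 1 1980',
         'hour': '4PM',
         'group': {'location': [{'country': 'United States',
                                 'state': 'Utah',
                                 'city': 'St. George',
                                 'coordinates': [37.0965, 113.5684]}]}}]

# Read the coordinates straight out of the nested list, no conversion needed.
coordinates = [loc['coordinates']
               for dct in data
               for loc in dct['group']['location']]

# Build new records with 'location' unwrapped, leaving data itself unchanged.
flattened = [
    {**dct, 'group': {**dct['group'], 'location': dct['group']['location'][0]}}
    for dct in data
]

print(coordinates)
print(flattened[0]['group']['location'])
```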
Imagine I have the following list of dictionaries. For every record (row of data), I want to merge the dictionaries of subfields into a single dictionary, so that in the end I have a list of dictionaries, one per record.
Data = [{'Name': 'bob', 'age': '40'},
{'Name': 'tom', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Email': 'tom#fake.com', 'Phone': 'none'}]
Output = [
{'Name': 'bob', 'age': '40', 'Country': 'US', 'City': 'Boston', 'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Name': 'tom', 'age': '30', 'Country': 'US', 'City': 'New York', 'Email': 'tom#fake.com', 'Phone': 'none'}
]
Related: How do I merge a list of dicts into a single dict?
I understand you know which dictionary relates to Bob and which dictionary relates to Tom by their position: dictionaries at even positions relate to Bob, while dictionaries at odd positions relate to Tom.
You can check whether a number is odd or even using % 2:
Data = [{'Name': 'bob', 'age': '40'},
{'Name': 'tom', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Email': 'tom#fake.com', 'Phone': 'none'}]
bob_dict = {}
tom_dict = {}
for i, d in enumerate(Data):
    if i % 2 == 0:
        bob_dict.update(d)
    else:
        tom_dict.update(d)
Output = [bob_dict, tom_dict]
Or alternatively:
Output = [{}, {}]
for i, d in enumerate(Data):
    Output[i % 2].update(d)
This second approach is not only shorter to write, it's also faster to execute and easier to scale if you have more than 2 people.
Splitting the list into more than 2 dictionaries
k = 4 # number of dictionaries you want
Data = [{'Name': 'Alice', 'age': '40'},
{'Name': 'Bob', 'age': '30'},
{'Name': 'Charlie', 'age': '30'},
{'Name': 'Diane', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Country': 'UK', 'City': 'London'},
{'Country': 'UK', 'City': 'Oxford'},
{'Email': 'alice#fake.com', 'Phone': 'alice phone'},
{'Email': 'bob#fake.com', 'Phone': '12345'},
{'Email': 'charlie#fake.com', 'Phone': '0000000'},
{'Email': 'diane#fake.com', 'Phone': 'none'}]
Output = [{} for _ in range(k)]
for i, d in enumerate(Data):
    Output[i % k].update(d)
# Output = [
# {'Name': 'Alice', 'age': '40', 'Country': 'US', 'City': 'Boston', 'Email': 'alice#fake.com', 'Phone': 'alice phone'},
# {'Name': 'Bob', 'age': '30', 'Country': 'US', 'City': 'New York', 'Email': 'bob#fake.com', 'Phone': '12345'},
# {'Name': 'Charlie', 'age': '30', 'Country': 'UK', 'City': 'London', 'Email': 'charlie#fake.com', 'Phone': '0000000'},
# {'Name': 'Diane', 'age': '30', 'Country': 'UK', 'City': 'Oxford', 'Email': 'diane#fake.com', 'Phone': 'none'}
#]
Additionally, instead of hardcoding k = 4:
If you know the number of fields but not the number of people, you can compute k by dividing the initial number of dictionaries by the number of dictionary types:
fields = ['Name', 'Country', 'Email']
assert len(Data) % len(fields) == 0  # make sure Data is consistent with number of fields
k = len(Data) // len(fields)
Or alternatively, you can compute k by counting how many dictionaries contain the 'Name' field:
k = sum(1 for d in Data if 'Name' in d)
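Putting the pieces together, one way to go from the raw list to the merged records is to compute k and then take every k-th dictionary with a slice. This is only a sketch; it assumes the layout from the question, where the dictionaries for person i sit at positions i, i + k, i + 2k, and so on.

```python
Data = [{'Name': 'bob', 'age': '40'},
        {'Name': 'tom', 'age': '30'},
        {'Country': 'US', 'City': 'Boston'},
        {'Country': 'US', 'City': 'New York'},
        {'Email': 'bob#fake.com', 'Phone': 'bob phone'},
        {'Email': 'tom#fake.com', 'Phone': 'none'}]

# Number of people = number of dictionaries that carry a 'Name' field.
k = sum(1 for d in Data if 'Name' in d)

Output = []
for i in range(k):
    merged = {}
    # Data[i::k] picks the i-th dictionary of each field group.
    for d in Data[i::k]:
        merged.update(d)
    Output.append(merged)

print(Output)
```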
I have a list of dictionaries imported from Excel that looks like this:
cities= [{'City': 'Buenos Aires',
'Country': 'Argentina',
'Population': 2891000,
'Area': 4758},
{'City': 'Toronto',
'Country': 'Canada',
'Population': 2800000,
'Area': 2731571},
{'City': 'Pyeongchang',
'Country': 'South Korea',
'Population': 2581000,
'Area': 3194},
{'City': 'Marakesh', 'Country': 'Morocco', 'Population': 928850, 'Area': 200},
{'City': 'Albuquerque',
'Country': 'New Mexico',
'Population': 559277,
'Area': 491},
{'City': 'Los Cabos',
'Country': 'Mexico',
'Population': 287651,
'Area': 3750},
{'City': 'Greenville', 'Country': 'USA', 'Population': 84554, 'Area': 68},
{'City': 'Archipelago Sea',
'Country': 'Finland',
'Population': 60000,
'Area': 8300},
{'City': 'Walla Walla Valley',
'Country': 'USA',
'Population': 32237,
'Area': 33},
{'City': 'Salina Island', 'Country': 'Italy', 'Population': 4000, 'Area': 27},
{'City': 'Solta', 'Country': 'Croatia', 'Population': 1700, 'Area': 59},
{'City': 'Iguazu Falls',
'Country': 'Argentina',
'Population': 0,
'Area': 672}]
I just want the 'Population' value from each city.
What is the most efficient or easiest way to build a list of every city's 'Population'?
Below is the code that I came up with, but it doesn't scale, since every index has to be written out by hand.
City_Population = [cities[0]['Population'], cities[1]['Population'], cities[2]['Population']]
I am currently learning Python and any advice would be helpful!
Thank you!
Using list comprehension:
print([city['Population'] for city in cities])
OUTPUT:
[2891000, 2800000, 2581000, 928850, 559277, 287651, 84554, 60000, 32237, 4000, 1700, 0]
EDIT:
If some cities have no 'Population' key:
print([city['Population'] for city in cities if 'Population' in city])
OUTPUT (removed population from a few cities in the list):
[2891000, 2800000, 2581000, 928850, 287651, 84554, 32237, 4000]
Use dict.get; that way you get None for any city where 'Population' is not defined.
populations = [city.get('Population') for city in cities]
If you don't want the empty values:
populations = [pop for pop in populations if pop is not None]
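To see how these options differ when a key is missing, here is a small sketch. The shortened cities list and the 'Nowhere' row are made up for illustration and are not part of the original data.

```python
cities = [
    {'City': 'Toronto', 'Country': 'Canada', 'Population': 2800000},
    {'City': 'Nowhere', 'Country': 'Atlantis'},  # hypothetical row with no 'Population'
    {'City': 'Solta', 'Country': 'Croatia', 'Population': 1700},
]

# dict.get returns None for the missing key and keeps the list aligned with cities.
with_none = [city.get('Population') for city in cities]

# Filtering on the key drops the incomplete row entirely.
only_present = [city['Population'] for city in cities if 'Population' in city]

# dict.get also accepts a default to use in place of None.
with_default = [city.get('Population', 0) for city in cities]

print(with_none)
print(only_present)
print(with_default)
```

Which one fits depends on whether the positions in the result need to line up with the original list.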