Count values in dict of pandas series - python

I have a pandas series of dicts like this:
print(df['genres'])
0 {'0': '1', '1': '4', '2': '23'}
1 {'0': '1', '1': '25', '2': '4', '3': '37'}
2 {'0': '9'}
print(type(df['genres']))
<class 'pandas.core.series.Series'>
print(type(df['genres'][0]))
<class 'dict'>
I want to count the values to get something like this:
{'1': 2, '4': 2, '9': 1, '23': 1, '25': 1, '37': 1}
I tried the following:
print(Counter(chain.from_iterable(df.genres.values)))
Counter({'0': 3, '1': 2, '2': 2, '3': 1})
print(pd.Series(df['genres']).value_counts())
{'0': '1', '1': '4', '2': '23'} 1
{'0': '1', '1': '25', '2': '4', '3': '37'} 1
{'0': '9'} 1
This is probably easy for someone more experienced than me, but I really don't get it...

Try:
pd.DataFrame(list(df.genres)).stack().value_counts().to_dict()
Output:
{'1': 2, '4': 2, '37': 1, '9': 1, '23': 1, '25': 1}
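The Counter(chain.from_iterable(...)) attempt in the question counted '0', '1', '2', '3' because iterating over a dict yields its keys, not its values. Below is a minimal pure-Python sketch that passes each dict's .values() to chain.from_iterable instead. The list genres is a stand-in for df['genres'], an assumption made so the example runs without pandas. The same generator should also work on the Series itself, because iterating a Series yields its elements.

```python
from collections import Counter
from itertools import chain

# Stand-in for df['genres']: a list of dicts with the same contents as the question.
genres = [
    {'0': '1', '1': '4', '2': '23'},
    {'0': '1', '1': '25', '2': '4', '3': '37'},
    {'0': '9'},
]

# Iterating a dict yields its keys, so hand chain.from_iterable each dict's values.
counts = Counter(chain.from_iterable(d.values() for d in genres))
print(dict(counts))  # {'1': 2, '4': 2, '23': 1, '25': 1, '37': 1, '9': 1}
```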

Related

Create nested dictionaries of arbitrary length

I have a nested dictionary like the following:
A = {'A':{'STATES':['0','1','2']}, 'B':{'STATES':['10','20']}}
What I want to build eventually is a nested dictionary with all the possible combinations of STATES. For a known number of keys in A this is trivial:
_dict = {}
for s in A['A']['STATES']:
    if s not in _dict:
        _dict[s] = {}
    for s1 in A['B']['STATES']:
        _dict[s][s1] = 0
This gives
{'0': {'10': 0, '20': 0}, '1': {'10': 0, '20': 0}, '2': {'10': 0, '20': 0}}
which is what I want. However, I do not know the number of keys in A beforehand. What would be an efficient way to do the same with an arbitrary number of elements in A?
EDIT
For instance, with three elements I would have:
{'0': {'10': {'100': 0}, '20': {'100': 0}}, '1': {'10': {'100': 0}, '20': {'100': 0}}, '2': {'10': {'100': 0}, '20': {'100': 0}}}
This problem is a little complex, but it can be split into three parts:
1. Parse all the values, mapping a list of states to every valid key.
2. Get the list of all the combinations, in the order of dictionary insertion.
3. Translate the list of tuples into a nested dictionary, looping over the values inside each tuple, because we don't know its length.
import itertools
A = {'A':{'STATES':['0','1','2']}, 'B':{'STATES':['10','20']}, 'C':{'STATES':['100']}}
# 1. get your dictionary A, but reduced, so that
# for every key you have a list of states if the key "STATES" exists
Ared = {k: A[k]["STATES"] for k in A if A[k].get("STATES")}
print(Ared) # {'A': ['0', '1', '2'], 'B': ['10', '20'], 'C': ['100']}
# 2. get all the combinations
combs = list(itertools.product(*Ared.values()))
print(combs) # [('0', '10', '100'), ('0', '20', '100'), ('1', '10', '100'), ('1', '20', '100'), ('2', '10', '100'), ('2', '20', '100')]
# 3. translate them into a nested dictionary
d = dict()
for comb in combs:
    old_dict = d
    for i, key in enumerate(comb):
        if i == len(comb) - 1:
            old_dict[key] = 0
        elif not old_dict.get(key):
            old_dict[key] = {}
        old_dict = old_dict[key]
print(d) # {'0': {'10': {'100': 0}, '20': {'100': 0}}, '1': {'10': {'100': 0}, '20': {'100': 0}}, '2': {'10': {'100': 0}, '20': {'100': 0}}}
You can use recursion:
A = {'A':{'STATES':['0','1','2']}, 'B':{'STATES':['10','20']}}
def combos(d):
    return 0 if not d else {i: combos(d[1:]) for i in d[0]}
print(combos([j['STATES'] for j in A.values()]))
Output:
{'0': {'10': 0, '20': 0}, '1': {'10': 0, '20': 0}, '2': {'10': 0, '20': 0}}
With more than two keys:
A = {'A':{'STATES':['0','1','2']}, 'B':{'STATES':['10','20']}, 'C':{'STATES':['100']}}
print(combos([j['STATES'] for j in A.values()]))
Output:
{'0': {'10': {'100': 0}, '20': {'100': 0}}, '1': {'10': {'100': 0}, '20': {'100': 0}}, '2': {'10': {'100': 0}, '20': {'100': 0}}}
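As an alternative sketch (not taken from either answer), the recursive version builds the nesting top-down, and the same structure can also be built bottom-up. Start from the leaf value 0 and wrap it once per STATES list, innermost list first. Each level is deep-copied so that sibling keys do not share one inner dict object. Sharing is the trap with shortcuts such as {s: inner for s in states} or dict.fromkeys(states, inner). The function name nested_from_states is hypothetical.

```python
import copy

A = {'A': {'STATES': ['0', '1', '2']}, 'B': {'STATES': ['10', '20']}, 'C': {'STATES': ['100']}}

def nested_from_states(state_lists, leaf=0):
    """Hypothetical helper: build the nested dict from the innermost level outwards."""
    result = leaf
    for states in reversed(state_lists):
        # deepcopy gives every key its own copy of the level below it
        result = {s: copy.deepcopy(result) for s in states}
    return result

d = nested_from_states([v['STATES'] for v in A.values()])
print(d)
```

Because each key gets its own copy, mutating d['0']['10'] later will not silently change d['1']['10'].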

3 level nested dictionary comprehension in Python

I have a Python dictionary as follows:
d = {'1': {'1': 3, '2': 1, '3': 1, '4': 4, '5': 2, '6': 3},
'2': {'1': 3, '2': 3, '3': 1, '4': 2},
'3': {'1': 1, '2': 1, '3': 3, '4': 2, '5': 1, '6': 1, '7': 1},
'4': {'1': 1, '2': 1, '3': 3, '4': 2, '5': 1, '6': 1, '7': 1}}
I have this operation on the dictionary:
D = {}
for ko, vo in d.items():
    for ki, vi in vo.items():
        for i in range(vi):
            D[f'{ko}_{ki}_{i}'] = someFunc(ko, ki, i)
I want to translate it into a one-liner dictionary comprehension as follows:
D = {f'{ko}_{ki}_{i}': someFunc(ko, ki, i) for i in range(vi) for ki, vi in vo.items() for ko, vo in d.items()}
But I get an error
NameError: name 'vi' is not defined
Can someone help me with the correct syntax for achieving this?
The order of the loops has to be reversed.
This is what you're looking for:
D = {f'{ko}_{ki}_{i}': someFunc(ko, ki, i) for ko, vo in d.items() for ki, vi in vo.items() for i in range(vi) }
The for clauses in a comprehension (list, dict, or set alike) should appear in the same order as in the equivalent for-loop code. The only thing that "moves" is that the innermost assignment is replaced by an expression at the beginning.
Please see https://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/ for details.
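A small self-contained check of the ordering rule, with the question's dictionary. The question never shows someFunc, so the version below is a hypothetical placeholder that just returns its arguments. The sketch builds the result both ways and confirms that the nested loops and the reordered comprehension produce the same keys, in the same order, with the same values.

```python
d = {'1': {'1': 3, '2': 1, '3': 1, '4': 4, '5': 2, '6': 3},
     '2': {'1': 3, '2': 3, '3': 1, '4': 2},
     '3': {'1': 1, '2': 1, '3': 3, '4': 2, '5': 1, '6': 1, '7': 1},
     '4': {'1': 1, '2': 1, '3': 3, '4': 2, '5': 1, '6': 1, '7': 1}}

def someFunc(ko, ki, i):
    # Hypothetical placeholder: the real someFunc is not shown in the question.
    return (ko, ki, i)

# Nested loops, outermost first
D_loop = {}
for ko, vo in d.items():
    for ki, vi in vo.items():
        for i in range(vi):
            D_loop[f'{ko}_{ki}_{i}'] = someFunc(ko, ki, i)

# Same for clauses in the same order; only the assignment moved to the front
D_comp = {f'{ko}_{ki}_{i}': someFunc(ko, ki, i)
          for ko, vo in d.items()
          for ki, vi in vo.items()
          for i in range(vi)}

print(D_loop == D_comp)  # True
print(len(D_comp))       # 43
```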

list of dictionary: aggregate value by grouping by inner dictionary key

I have this signature:
def aggregate_by_player_id(input, playerid, fields):
By 'fields', I mean the fields to sum up within 'input', grouped by the player ID.
I call the function like this:
aggregate_by_player_id(input, 'player', ['stat1','stat3'])
The input looks like this:
[{'player': '1', 'stat1': '3', 'stat2': '4', 'stat3': '5'},
{'player': '1', 'stat1': '1', 'stat2': '4', 'stat3': '1'},
{'player': '2', 'stat1': '1', 'stat2': '2', 'stat3': '3'},
{'player': '2', 'stat1': '1', 'stat2': '2', 'stat3': '1'},
{'player': '3', 'stat1': '4', 'stat2': '1', 'stat3': '6'}]
My output structure is:
nested_dic = {value_of_playerid1: {'player': value_of_playerid1, 'stat1': sum_of_stat1, 'stat3': sum_of_stat3},
value_of_playerid2: {'player': value_of_playerid2, 'stat1': sum_of_stat1, 'stat3': sum_of_stat3},
value_of_playerid3: {'player': value_of_playerid3, 'stat1': sum_of_stat1, 'stat3': sum_of_stat3}}
Hence the output should look like:
{'1': {'player': '1', 'stat1': 4, 'stat3': 6},
'2': {'player': '2', 'stat1': 2, 'stat3': 4},
'3': {'player': '3', 'stat1': 4, 'stat3': 6}}
We can use itertools.groupby for this to group on playerid and then sum values across the fields.
from itertools import groupby
from operator import itemgetter
def aggregate_by_player_id(input_, playerid, fields):
    player = itemgetter(playerid)
    output = {}
    for k, v in groupby(input_, key=player):
        data = list(v)
        stats = {playerid: k}
        for field in fields:
            stats[field] = sum(int(d.get(field, 0)) for d in data)
        output[k] = stats
    return output
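# data is the list of dicts from the question; groupby only merges adjacent items, so sort on the grouping key first
data.sort(key=itemgetter('player'))
results = aggregate_by_player_id(data, 'player', ['stat1', 'stat3'])
print(results)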
{'1': {'player': '1', 'stat1': 4, 'stat3': 6},
'2': {'player': '2', 'stat1': 2, 'stat3': 4},
'3': {'player': '3', 'stat1': 4, 'stat3': 6}}
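The pre-sorting step matters because itertools.groupby only groups consecutive items that share a key. A short sketch on a hypothetical unsorted input (player '1' appears twice, not adjacently) shows the same key coming back as two separate groups. In aggregate_by_player_id, the second group would then overwrite output['1'] instead of adding to it.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted input: the two rows for player '1' are not adjacent.
rows = [
    {'player': '1', 'stat1': '3'},
    {'player': '2', 'stat1': '1'},
    {'player': '1', 'stat1': '1'},
]
player = itemgetter('player')

unsorted_keys = [k for k, _ in groupby(rows, key=player)]
sorted_keys = [k for k, _ in groupby(sorted(rows, key=player), key=player)]

print(unsorted_keys)  # ['1', '2', '1']
print(sorted_keys)    # ['1', '2']
```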
Capturing the result you're after in a single comprehension might be possible, but is likely not very readable. Here's a simple function that does the work:
data = [
{'player': '1', 'stat1': '3', 'stat2': '4', 'stat3': '5'},
{'player': '1', 'stat1': '1', 'stat2': '4', 'stat3': '1'},
{'player': '2', 'stat1': '1', 'stat2': '2', 'stat3': '3'},
{'player': '2', 'stat1': '1', 'stat2': '2', 'stat3': '1'},
{'player': '3', 'stat1': '4', 'stat2': '1', 'stat3': '6'}
]
def aggregate_dicts(ds, id_field, aggr_fields):
    result = {}
    for d in ds:
        identifier = d[id_field]
        if identifier not in result:
            result[identifier] = {f: 0 for f in aggr_fields}
        for f in aggr_fields:
            result[identifier][f] += int(d[f])
    return result
print(aggregate_dicts(data, 'player', ['stat1', 'stat3']))
Result:
{'1': {'stat1': 4, 'stat3': 6}, '2': {'stat1': 2, 'stat3': 4}, '3': {'stat1': 4, 'stat3': 6}}
If you want to repeat the identifier inside the dict, just add this line to the if block:
result[identifier][id_field] = identifier
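A compact variant of the function above (a sketch, not part of the answer; the name aggregate_with_id is hypothetical) puts the identifier first in each inner dict. It seeds each entry with {id_field: identifier, **dict.fromkeys(aggr_fields, 0)}, so the result matches the key order of the output requested in the question. Using dict.fromkeys is safe here because the shared default 0 is an immutable int.

```python
data = [
    {'player': '1', 'stat1': '3', 'stat2': '4', 'stat3': '5'},
    {'player': '1', 'stat1': '1', 'stat2': '4', 'stat3': '1'},
    {'player': '2', 'stat1': '1', 'stat2': '2', 'stat3': '3'},
    {'player': '2', 'stat1': '1', 'stat2': '2', 'stat3': '1'},
    {'player': '3', 'stat1': '4', 'stat2': '1', 'stat3': '6'},
]

def aggregate_with_id(rows, id_field, aggr_fields):
    result = {}
    for row in rows:
        identifier = row[id_field]
        if identifier not in result:
            # Identifier first, then every aggregated field starting at 0
            result[identifier] = {id_field: identifier, **dict.fromkeys(aggr_fields, 0)}
        for f in aggr_fields:
            result[identifier][f] += int(row[f])
    return result

print(aggregate_with_id(data, 'player', ['stat1', 'stat3']))
# {'1': {'player': '1', 'stat1': 4, 'stat3': 6}, '2': {'player': '2', 'stat1': 2, 'stat3': 4}, '3': {'player': '3', 'stat1': 4, 'stat3': 6}}
```

Unlike the groupby version, this does not require the input to be sorted.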

Printing a text file as a dictionary python

The text file I am working on:
3:d:10:i:30
1:d:10:i:15
4:r:30
1:r:15
2:d:12:r:8
4:l:20
5:i:15
3:l:20:r:22
4:d:30
5:l:15:r:15
I am trying to build a dictionary from this text file and print it. It should come out looking like this:
{1: {'d': 10, 'i': 15, 'r': 15},
2: {'d': 12, 'r': 8},
3: {'d': 10, 'i': 30, 'l': 20, 'r': 22},
4: {'d': 30, 'l': 20, 'r': 30},
5: { 'i': 15, 'l': 15, 'r': 15}}
Instead, my code overwrites the entry for a key every time that key appears on a new line, so only the most recent line for each key ends up as its value in the dictionary:
{'3': {'l': '20', 'r': '22'},
'1': {'r': '15'},
'4': {'d': '30'},
'2': {'d': '12', 'r': '8'},
'5': {'l': '15', 'r': '15'}}
This is what I have so far
from collections import defaultdict

def read_db(file):
    d = defaultdict(dict)
    for line in open(file):
        z = line.rstrip().split(':')
        d[z[0]] = dict(zip(z[1::2], z[2::2]))
    print(d)
I tried using +=, but that operator is not supported for dicts. I am a bit stuck; thanks for any help.
This is one approach using a simple iteration and dict.setdefault.
Ex:
res = {}
with open(filename) as infile:  # Open file to read
    for line in infile:  # Iterate over each line
        line = line.strip().split(":")  # Split by colon
        res.setdefault(line.pop(0), {}).update(dict(zip(line[::2], line[1::2])))
print(res)
Output:
{'1': {'d': '10', 'i': '15', 'r': '15'},
'2': {'d': '12', 'r': '8'},
'3': {'d': '10', 'i': '30', 'l': '20', 'r': '22'},
'4': {'d': '30', 'l': '20', 'r': '30'},
'5': {'i': '15', 'l': '15', 'r': '15'}}
Or using collections.defaultdict
Ex:
from collections import defaultdict

res = defaultdict(dict)
with open(filename) as infile:
    for line in infile:
        line = line.strip().split(":")
        res[line.pop(0)].update(dict(zip(line[::2], line[1::2])))
print(res)
Output:
defaultdict(<class 'dict'>,
{'1': {'d': '10', 'i': '15', 'r': '15'},
'2': {'d': '12', 'r': '8'},
'3': {'d': '10', 'i': '30', 'l': '20', 'r': '22'},
'4': {'d': '30', 'l': '20', 'r': '30'},
'5': {'i': '15', 'l': '15', 'r': '15'}})
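Both answers keep the keys and numbers as strings, while the desired output in the question uses int keys and int values. A sketch of the setdefault approach with that conversion added follows. An io.StringIO holding the question's lines stands in for db1.txt (an assumption so the example is self-contained); with the real file you would call read_db on the handle from open('db1.txt'). Sorting the outer and inner keys at the end is only there to match the layout shown in the question.

```python
import io

# Stand-in for db1.txt so the sketch runs on its own.
sample = io.StringIO(
    "3:d:10:i:30\n1:d:10:i:15\n4:r:30\n1:r:15\n2:d:12:r:8\n"
    "4:l:20\n5:i:15\n3:l:20:r:22\n4:d:30\n5:l:15:r:15\n"
)

def read_db(infile):
    res = {}
    for line in infile:
        parts = line.strip().split(':')
        if not parts[0]:
            continue  # skip blank lines
        key = int(parts.pop(0))
        # Merge this line's letter/number pairs into the existing entry, as ints
        res.setdefault(key, {}).update(zip(parts[::2], map(int, parts[1::2])))
    # Sort outer and inner keys to match the layout in the question
    return {k: dict(sorted(v.items())) for k, v in sorted(res.items())}

db = read_db(sample)
print(db)
# {1: {'d': 10, 'i': 15, 'r': 15}, 2: {'d': 12, 'r': 8}, 3: {'d': 10, 'i': 30, 'l': 20, 'r': 22}, 4: {'d': 30, 'l': 20, 'r': 30}, 5: {'i': 15, 'l': 15, 'r': 15}}
```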

Python CSV reader return Row as list

I'm trying to parse a CSV file using Python and would like to index the items in each row, so they can be accessed as row[0], row[1], and so on.
So far this is my code:
def get_bitstats():
    url = 'http://bitcoincharts.com/t/trades.csv?symbol=mtgoxUSD'
    data = urllib.urlopen(url).read()
    dictReader = csv.DictReader(data)
    obj = BitData()
    for row in dictReader:
        obj.datetime = datetime.datetime.fromtimestamp(int(row['0'])/1000000)
        q = db.Query(BitData).filter('datetime', obj.datetime)
        if q != None:
            raise ValueError(str(obj.datetime) + ' is already in database')
        else:
            obj.price = row['1']
            obj.amount = row['2']
            obj.put()
This raises KeyError: '0', and I have no idea how to set it up. I tried this in an interactive shell, and when running
for row in dictReader:
    print row
I get this as the output:
{'1': '3'}
{'1': '6'}
{'1': '2'}
{'1': '6'}
{'1': '9'}
{'1': '8'}
{'1': '6'}
{'1': '4'}
{'1': '4'}
{'1': '', None: ['']}
{'1': '4'}
{'1': '2'}
{'1': '.'}
{'1': '0'}
{'1': '5'}
{'1': '7'}
{'1': '1'}
{'1': '6'}
{'1': '0'}
{'1': '0'}
{'1': '0'}
{'1': '0'}
{'1': '0'}
{'1': '0'}
{'1': '0'}
{'1': '', None: ['']}
{'1': '0'}
{'1': '.'}
{'1': '0'}
{'1': '1'}
{'1': '0'}
{'1': '0'}
{'1': '5'}
{'1': '4'}
{'1': '2'}
{'1': '5'}
{'1': '0'}
{'1': '0'}
{'1': '0'}
{'1': '0'}
{'1': '1'}
{'1': '3'}
{'1': '6'}
{'1': '2'}
{'1': '6'}
{'1': '9'}
{'1': '8'}
{'1': '6'}
{'1': '4'}
{'1': '4'}
...and so on for thousands and thousands of lines (I'm sure the CSV is thousands of digits long).
Why is my CSV printing this way, and is there any way to split a row into a list of three numbers such as [130534543, 47.00009, 23001.9000]?
EDIT:
As the answer states, I was using the wrong csv function in my code above. Fixing it gave me lists, but each list had the same one-character format as the dicts:
['1']
['2']
['1']
['3']
['8']
['3']
['5']
.
.
.
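It turns out I also had to remove the .read() from data = urllib.urlopen(url).read(). csv.reader iterates over its input and treats each item as one line. Iterating over the string returned by .read() yields single characters, whereas the file-like object returned by urlopen yields whole lines.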
csv.reader will return each row as a list
reader = csv.reader(data)
for line_list in reader:
    # line_list is a list of the data contained in a row, so you can access line_list[0]
    pass
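A Python 3 sketch of both points from this thread: why passing a string makes every character its own row, and how to turn each real row into numbers. The sample data is made up in the question's three-column shape (timestamp, price, amount), since the actual trades.csv contents are not shown. io.StringIO stands in for the response object. As an assumption beyond the original Python 2 answer, in Python 3 urllib.request.urlopen yields bytes, so its response would need decoding first, for example by wrapping it in io.TextIOWrapper.

```python
import csv
import io
from itertools import islice

# Made-up sample rows in the question's shape: timestamp, price, amount.
raw = "1305345430,7.00009,2.9\n1305345431,7.1,10.0\n"

# Passing the string itself: csv.reader iterates it one character at a time,
# so each character is parsed as a separate "line".
first_char_rows = list(islice(csv.reader(raw), 3))
print(first_char_rows)  # [['1'], ['3'], ['0']]

# Passing a line iterator (here io.StringIO) gives one list per CSV line,
# which can then be converted to numbers.
rows = [[int(ts), float(price), float(amount)]
        for ts, price, amount in csv.reader(io.StringIO(raw))]
print(rows)  # [[1305345430, 7.00009, 2.9], [1305345431, 7.1, 10.0]]
```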
