How to get the average value of each token from a dict like below? - python

I have a dict like below:

dict = {idx1: {tokenA: 0.1,
               tokenB: 1.3,
               tokenD: 2.3},
        idx2: {tokenC: 0.9,
               tokenE: 3.4},
        ...
        idxn: {tokenA: 0.3,
               tokenF: 0.4,
               ...
               tokenZ: 7.4}
        }

Each index may have different tokens/values. Now I want to get the average value of each token, simply as below:

{tokenA: average_value, tokenB: average_value, ... tokenZ: average_value}

Any efficient way to do this? Thanks in advance!

d = {'idx1': {'tokenA': 0.1,
              'tokenB': 1.3,
              'tokenD': 2.3},
     'idx2': {'tokenC': 0.9,
              'tokenE': 3.4},
     'idxn': {'tokenA': 0.3,
              'tokenF': 0.4,
              'tokenZ': 7.4}
     }

from collections import Counter

token_sums = sum((Counter(v) for k, v in d.iteritems()), Counter())
token_counts = sum((Counter(v.keys()) for k, v in d.iteritems()), Counter())
token_mean = {k: token_sums[k] / token_counts[k] for k in token_sums}
print token_mean
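The snippet above is Python 2 (`iteritems` and the `print` statement). A minimal Python 3 sketch of the same Counter idea, assuming the same `d`:

from collections import Counter

# sum the values per token, and separately count how many indexes contain each token
token_sums = sum((Counter(v) for v in d.values()), Counter())
token_counts = Counter(k for v in d.values() for k in v)
token_mean = {k: token_sums[k] / token_counts[k] for k in token_sums}
print(token_mean)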

from collections import defaultdict

my_lists = defaultdict(list)
for key, val in my_dict.items():       # my_dict is the input dict from the question
    for key2, val2 in val.items():
        my_lists[key2].append(val2)

def average(key_val):
    key, val = key_val
    return (key, sum(val) * 1.0 / len(val))

print dict(map(average, my_lists.items()))
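A Python 3 variant of the same grouping approach, using statistics.mean (a sketch, assuming the question's dict is named d):

from collections import defaultdict
from statistics import mean

# group every value under its token, then average each group
groups = defaultdict(list)
for inner in d.values():
    for token, value in inner.items():
        groups[token].append(value)

token_mean = {token: mean(values) for token, values in groups.items()}
print(token_mean)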

Using pandas:

import pandas

d = {'a': {'t1': 0.1,
           't2': 0.2},
     'b': {'t1': 0.1,
           't3': 0.2}}

data = pandas.DataFrame(d)
data.T.mean()
=>
t1    0.1
t2    0.2
t3    0.2
dtype: float64
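As a side note (not from the original answer), the transpose can be avoided by averaging across the columns directly:

data.mean(axis=1)   # per-token mean across the outer keys, NaN skipped by default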

import collections

d = {'idx1': {'tokenA': 0.1,
              'tokenB': 1.3,
              'tokenD': 2.3},
     'idx2': {'tokenC': 0.9,
              'tokenE': 3.4},
     'idxn': {'tokenA': 0.3,
              'tokenF': 0.4,
              'tokenZ': 7.4}
     }

avg = collections.defaultdict(float)
count = collections.Counter()
for dat in d.itervalues():
    for k, v in dat.iteritems():
        avg[k] += v
        count[k] += 1
for k, v in count.iteritems():
    avg[k] /= count[k]
print avg

Related

Decimal list to float list in dictionary

I have a list from an MSSQL query which includes Decimals, such as:

[(1, Decimal('33.00'), Decimal('5.30'), Decimal('50.00')),
 (2, Decimal('17.00'), Decimal('0.50'), Decimal('10.00'))]

I want to transform that into a dict with float values, like this:

{1: [33.00, 5.30, 50.00],
 2: [17.00, 0.50, 10.00]}

I wrote the line below:

load_dict = {key: values for key, *values in dataRead}

which results in:

{1: [Decimal('33.00'), Decimal('5.30'), Decimal('50.00')],
 2: [Decimal('17.00'), Decimal('0.50'), Decimal('10.00')]}

Is there any way to make this transformation with a list/dict comprehension?
You could use a dict comprehension with a cast to float like this:

from decimal import Decimal

lst = [(1, Decimal('33.00'), Decimal('5.30'), Decimal('50.00')),
       (2, Decimal('17.00'), Decimal('0.50'), Decimal('10.00'))]

ret = {key: [float(f) for f in values] for key, *values in lst}
print(ret)
# {1: [33.0, 5.3, 50.0], 2: [17.0, 0.5, 10.0]}
Apply float to values:

from decimal import Decimal

data = [(1, Decimal('33.00'), Decimal('5.30'), Decimal('50.00')),
        (2, Decimal('17.00'), Decimal('0.50'), Decimal('10.00'))]

load_dict = {key: list(map(float, values)) for key, *values in data}
print(load_dict)

Output

{1: [33.0, 5.3, 50.0], 2: [17.0, 0.5, 10.0]}

Altering all keys of a dictionary

Is it possible to rename/alter all the keys of a dict? As an example, let's look at the following dictionary:

a_dict = {'a_var1': 0.05,
          'a_var2': 4.0,
          'a_var3': 100.0,
          'a_var4': 0.3}

I want to remove all the a_ in the keys, so I end up with

a_dict = {'var1': 0.05,
          'var2': 4.0,
          'var3': 100.0,
          'var4': 0.3}
If you want to alter the existing dict, instead of creating a new one, you can loop the keys, pop the old one, and insert the new, modified key with the old value.
>>> for k in list(a_dict):
...     a_dict[k[2:]] = a_dict.pop(k)
...
>>> a_dict
{'var2': 4.0, 'var1': 0.05, 'var3': 100.0, 'var4': 0.3}
(Iterating a list(a_dict) will prevent errors due to concurrent modification.)
Strictly speaking, this, too, does not alter the existing keys, but inserts new keys, as it has to re-insert them according to their new hash codes. But it does alter the dictionary as a whole.
As noted in comments, updating the keys in the dict in a loop can in fact be slower than a dict comprehension. If this is a problem, you could also create a new dict using a dict comprehension, and then clear the existing dict and update it with the new values.
>>> b_dict = {k[2:]: a_dict[k] for k in a_dict}
>>> a_dict.clear()
>>> a_dict.update(b_dict)
You can use:
{k[2:]: v for k, v in a_dict.items()}
You can do that easily enough with a dict comprehension.
a_dict = {'a_var1': 0.05,
          'a_var2': 4.0,
          'a_var3': 100.0,
          'a_var4': 0.3}

a_dict = {k[2:]: v for k, v in a_dict.items()}

Result:

{'var1': 0.05, 'var2': 4.0, 'var3': 100.0, 'var4': 0.3}
You could use the str.replace function to rewrite each key to match the desired format.

a_dict = {'a_var1': 0.05,
          'a_var2': 4.0,
          'a_var3': 100.0,
          'a_var4': 0.3}

a_dict = {k.replace('a_', ''): v for k, v in a_dict.items()}
# {'var1': 0.05, 'var2': 4.0, 'var3': 100.0, 'var4': 0.3}
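Note that str.replace removes 'a_' anywhere in the key, not just at the start. On Python 3.9+, str.removeprefix only strips a leading prefix (a small sketch, not from the original answers):

a_dict = {k.removeprefix('a_'): v for k, v in a_dict.items()}
# {'var1': 0.05, 'var2': 4.0, 'var3': 100.0, 'var4': 0.3}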

Build dict from list of tuples combining two multi index dfs and column index

I have two multi-index dataframes: mean and std
import numpy as np
import pandas as pd

arrays = [['A', 'A', 'B', 'B'], ['Z', 'Y', 'X', 'W']]

mean = pd.DataFrame(data={0.0: [np.nan, 2.0, 3.0, 4.0],
                          60.0: [5.0, np.nan, 7.0, 8.0],
                          120.0: [9.0, 10.0, np.nan, 12.0]},
                    index=pd.MultiIndex.from_arrays(arrays, names=('id', 'comp')))
mean.columns.name = 'Times'

std = pd.DataFrame(data={0.0: [10.0, 10.0, 10.0, 10.0],
                         60.0: [10.0, 10.0, 10.0, 10.0],
                         120.0: [10.0, 10.0, 10.0, 10.0]},
                   index=pd.MultiIndex.from_arrays(arrays, names=('id', 'comp')))
std.columns.name = 'Times'

My task is to combine them into a dictionary with id as the first level, followed by a second-level dictionary keyed by comp, and then, for each comp, a list of tuples combining (time-point, mean, std). So the result should look like this:

{'A': {
    'Z': [(60.0, 5.0, 10.0),
          (120.0, 9.0, 10.0)],
    'Y': [(0.0, 2.0, 10.0),
          (120.0, 10.0, 10.0)]
      },
 'B': {
    'X': [(0.0, 3.0, 10.0),
          (60.0, 7.0, 10.0)],
    'W': [(0.0, 4.0, 10.0),
          (60.0, 8.0, 10.0),
          (120.0, 12.0, 10.0)]
      }
}

Additionally, when there is NaN in the data, the corresponding triplets are left out: the values for (A, Z) at time 0, (A, Y) at time 60, and (B, X) at time 120.

How do I get there? I have already constructed a dict of dicts of lists of tuples for a single row:

iter = 0
{mean.index[iter][0]: {mean.index[iter][1]: list(zip(mean.columns, mean.iloc[iter], std.iloc[iter]))}}

> {'A': {'Z': [(0.0, 1.0, 10.0), (60.0, 5.0, 10.0), (120.0, 9.0, 10.0)]}}

Now I need to extend this to a dictionary with a loop over each row (inner dict) and add the ids (outer dict). I started with iterrows and a dict comprehension, but I have problems indexing with the tuple ('A', 'Z') that I get from iterrows(), and with building the whole dict iteratively.

{mean.index[iter[1]]: list(zip(mean.columns, mean.loc[iter[1]], std.loc[iter[1]])) for (iter, row) in mean.iterrows()}

creates errors, and it would only give me the inner dict anyway:

KeyError: 'the label [Z] is not in the [index]'

Thanks!

EDIT: I changed the numbers to floats in this example, because integers were generated before, which was not consistent with my real data and would fail in a following json dump.
Here is a solution using a defaultdict:

from collections import defaultdict

mean_as_dict = mean.to_dict(orient='index')
std_as_dict = std.to_dict(orient='index')

mean_clean_sorted = {k: sorted([(i, j) for i, j in v.items()]) for k, v in mean_as_dict.items()}
std_clean_sorted = {k: sorted([(i, j) for i, j in v.items()]) for k, v in std_as_dict.items()}

sol = {k: [j + (std_clean_sorted[k][i][1],) for i, j in enumerate(v) if not np.isnan(j[1])]
       for k, v in mean_clean_sorted.items()}

solution = defaultdict(dict)
for k, v in sol.items():
    solution[k[0]][k[1]] = v

The resulting object is a defaultdict, which you can change to a plain dict easily:

solution = dict(solution)
con = pd.concat([mean, std])
primary = dict()
for i in set(con.index.values):
    if i[0] not in primary.keys():
        primary[i[0]] = dict()
    primary[i[0]][i[1]] = list()
    for x in con.columns:
        primary[i[0]][i[1]].append((x,) + tuple(con.loc[i[0]].loc[i[1]][x].values))
I found a very compact, comprehension-based way of building up this nested dict:

mean_dict_items = mean.to_dict(orient='index').items()

{k[0]: {u[1]: list(zip(mean.columns, mean.loc[u], std.loc[u]))
        for u, v in mean_dict_items if (k[0], u[1]) == u}
 for k, l in mean_dict_items}

creates:

{'A': {'Y': [(0.0, 2.0, 10.0), (60.0, nan, 10.0), (120.0, 10.0, 10.0)],
       'Z': [(0.0, nan, 10.0), (60.0, 5.0, 10.0), (120.0, 9.0, 10.0)]},
 'B': {'W': [(0.0, 4.0, 10.0), (60.0, 8.0, 10.0), (120.0, 12.0, 10.0)],
       'X': [(0.0, 3.0, 10.0), (60.0, 7.0, 10.0), (120.0, nan, 10.0)]}}
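The result above still contains the NaN entries the question wants dropped. A sketch (not from the original answers) that uses DataFrame.stack(), which drops NaN entries by default, to build the nested dict without them:

import pandas as pd

# stack() moves the time columns into the index and drops NaN mean values;
# rows missing from `mean` are then removed from the aligned frame by dropna()
stacked = pd.concat([mean.stack(), std.stack()], axis=1, keys=['mean', 'std']).dropna()

result = {}
for (idx, comp, time), row in stacked.iterrows():
    result.setdefault(idx, {}).setdefault(comp, []).append((time, row['mean'], row['std']))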

Multiple list to a dictionary

I'm looking to convert lists like:

idx = ['id', 'm', 'x', 'y', 'z']
a = ['1, 1.0, 1.11, 1.11, 1.11']
b = ['2, 2.0, 2.22, 2.22, 2.22']
c = ['3, 3.0, 3.33, 3.33, 3.33']
d = ['4, 4.0, 4.44, 4.44, 4.44']
e = ['5, 5.0, 5.55, 5.55, 5.55']

into a dictionary where:

dictlist = {
    'id': [1, 2, 3, 4, 5],
    'm': [1.0, 2.0, 3.0, 4.0, 5.0],
    'x': [1.11, 2.22, 3.33, 4.44, 5.55],
    'y': [1.11, 2.22, 3.33, 4.44, 5.55],
    'z': [1.11, 2.22, 3.33, 4.44, 5.55]
}

But I would like to be able to do this for longer lists (well over 6 elements per list), so I assume a function would be best, one that can build the dict for however many elements are in the idx list.

Edit (in response to g.d.d.c):

I had tried something like:

def make_dict(indx):
    data = dict()
    for item in xrange(0, len(indx)):
        data.update({indx[item]: ''})
    return data

data = make_dict(idx)

which worked for making:

{'id': '', 'm': '', 'x': '', 'y': '', 'z': ''}

but then adding each value to the dictionary became an issue.
result = {}
keys = idx
lists = [a, b, c, d, e]
for index, key in enumerate(keys):
    result[key] = []
    for l in lists:
        result[key].append(l[index])
As a single comprehension

Start by grouping your lists {a, b, c, d, e, ...} into a list of lists:

dataset = [a, b, c, d, e]
idx = ['id', 'm', 'x', 'y', 'z']
d = {k: [v[i] for v in dataset] for i, k in enumerate(idx)}

The last line builds a dictionary by enumerating over idx, using each value as the dict key and its index to pick out the correct column of each data sample.

The comprehension works regardless of the number of fields, as long as each list has the same length as idx.
You can try this:

idx = ['id', 'm', 'x', 'y', 'z']
a = [1, 1.0, 1.11, 1.11, 1.11]
b = [2, 2.0, 2.22, 2.22, 2.22]
c = [3, 3.0, 3.33, 3.33, 3.33]
d = [4, 4.0, 4.44, 4.44, 4.44]
e = [5, 5.0, 5.55, 5.55, 5.55]

dictlist = {x[0]: list(x[1:]) for x in zip(idx, a, b, c, d, e)}
print dictlist
answer = {}
# each of a..e is a one-element list holding a comma-separated string, so parse s[0]
for key, a, b, c, d, e in zip(idx, *map(lambda s: [float(i) for i in s[0].split(',')], [a, b, c, d, e])):
    answer[key] = [a, b, c, d, e]
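Note that in the question each of a through e is a one-element list holding a single comma-separated string. A small parsing sketch (an assumption about that format, not part of the original answers) that first converts them to numeric rows and then builds the dict:

idx = ['id', 'm', 'x', 'y', 'z']
raw = [a, b, c, d, e]                      # each item looks like ['1, 1.0, 1.11, 1.11, 1.11']
rows = [[float(t) for t in item[0].split(',')] for item in raw]
dictlist = {key: [row[i] for row in rows] for i, key in enumerate(idx)}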

Python: merging tally data

Okay - I'm sure this has been answered here before but I can't find it....
My problem: I have a list of lists with this composition
0.2 A
0.1 A
0.3 A
0.3 B
0.2 C
0.5 C
My goal is to output the following:
0.6 A
0.3 B
0.7 C
In other words, I need to merge the data from multiple lines together.
Here's the code I'm using:
unique_percents = []
for line in percents:
    new_percent = float(line[0])
    for inner_line in percents:
        if line[1] == inner_line[1]:
            new_percent += float(inner_line[0])
        else:
            temp = []
            temp.append(new_percent)
            temp.append(line[1])
            unique_percents.append(temp)
            break
I think it should work, but it's not adding the percents up and still has the duplicates. Perhaps I'm not understanding how "break" works?
I'll also take suggestions for a better loop structure or algorithm to use. Thanks, David.
You want to use a dict, but collections.defaultdict can come in really handy here so that you don't have to worry about whether the key exists in the dict or not -- it just defaults to 0.0:
import collections

lines = [[0.2, 'A'], [0.1, 'A'], [0.3, 'A'], [0.3, 'B'], [0.2, 'C'], [0.5, 'C']]
amounts = collections.defaultdict(float)
for amount, letter in lines:
    amounts[letter] += amount

for letter, amount in sorted(amounts.iteritems()):
    print amount, letter
Try this out:
result = {}
for line in percents:
    value, key = line
    result[key] = result.get(key, 0) + float(value)
total = {}
data = [('0.1', 'A'), ('0.2', 'A'), ('.3', 'B'), ('.4', 'B'), ('-10', 'C')]
for amount, key in data:
    total[key] = total.get(key, 0.0) + float(amount)

for key, amount in total.items():
    print key, amount
Since all of the letter grades are grouped together, you can use itertools.groupby (and if not, just sort the list ahead of time to make them so):
data = [
    [0.2, 'A'],
    [0.1, 'A'],
    [0.3, 'A'],
    [0.3, 'B'],
    [0.2, 'C'],
    [0.5, 'C'],
]

from itertools import groupby

summary = dict((k, sum(i[0] for i in items))
               for k, items in groupby(data, key=lambda x: x[1]))
print summary
Gives:
{'A': 0.60000000000000009, 'C': 0.69999999999999996, 'B': 0.29999999999999999}
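The trailing digits (0.60000000000000009 and so on) are just binary floating-point representation noise. If tidier output is wanted, the sums can be rounded for display (a cosmetic sketch, not part of the original answer):

print {k: round(v, 2) for k, v in summary.items()}
# {'A': 0.6, 'C': 0.7, 'B': 0.3}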
If you have a list of lists like this:
[ [0.2, A], [0.1, A], ...] (in fact it looks like a list of tuples :)
res_dict = {}
for pair in lst:
    letter = pair[1]
    val = pair[0]
    try:
        res_dict[letter] += val
    except KeyError:
        res_dict[letter] = val

res_lst = [(val, letter) for letter, val in res_dict.items()]  # note, a list of tuples!
Using collections.defaultdict to tally values
(assuming text data in d):
>>> import collections
>>> s = collections.defaultdict(float)
>>> for ln in d:
...     v, k = ln.split()
...     s[k] += float(v)
>>> s
defaultdict(<type 'float'>, {'A': 0.60000000000000009, 'C': 0.69999999999999996, 'B': 0.29999999999999999})
>>> ["%s %s" % (v, k) for k, v in s.iteritems()]
['0.6 A', '0.7 C', '0.3 B']
>>>
If you are using Python 3.1 or newer, you can use collections.Counter. Also I suggest using decimal.Decimal instead of floats:
# Counter requires Python 3.1 and newer
from collections import Counter
from decimal import Decimal

lines = ["0.2 A", "0.1 A", "0.3 A", "0.3 B", "0.2 C", "0.5 C"]
results = Counter()
for line in lines:
    percent, label = line.split()
    results[label] += Decimal(percent)
print(results)
The result is:
Counter({'C': Decimal('0.7'), 'A': Decimal('0.6'), 'B': Decimal('0.3')})
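If plain floats are needed afterwards, the Counter values can be converted in one pass (a small follow-up, not part of the original answer):

print({label: float(total) for label, total in results.items()})
# {'A': 0.6, 'B': 0.3, 'C': 0.7}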
This is verbose, but works:
# Python 2.7
lines = """0.2 A
0.1 A
0.3 A
0.3 B
0.2 C
0.5 C"""
lines = lines.split('\n')
#print(lines)

pctg2total = {}
thing2index = {}
index = 0
for line in lines:
    pctg, thing = line.split()
    pctg = float(pctg)
    if thing not in thing2index:
        thing2index[thing] = index
        index = index + 1
        pctg2total[thing] = pctg
    else:
        pctg2total[thing] = pctg2total[thing] + pctg

output = ((pctg2total[thing], thing) for thing in pctg2total)
# Let's sort by the first occurrence.
output = list(sorted(output, key=lambda thing: thing2index[thing[1]]))
print(output)
>>>
[(0.60000000000000009, 'A'), (0.29999999999999999, 'B'), (0.69999999999999996, 'C')]
letters = {}
for line in open("data", "r"):
    lineStrip = line.strip().split()
    percent = float(lineStrip[0])
    letter = lineStrip[1]
    if letter in letters:
        letters[letter] = percent + letters[letter]
    else:
        letters[letter] = percent

for letter, percent in letters.items():
    print letter, percent
A 0.6
C 0.7
B 0.3
Let's say we have this:
data = [(b, float(a)) for a, b in
        (line.split() for line in
"""
0.2 A
0.1 A
0.3 A
0.3 B
0.2 C
0.5 C""".splitlines()
         if line)]
print data
# [('A', 0.2), ('A', 0.1), ('A', 0.3), ('B', 0.3), ('C', 0.2), ('C', 0.5)]
You can now just go through this and sum:
counter = {}
for letter, val in data:
    if letter in counter:
        counter[letter] += val
    else:
        counter[letter] = val

print counter.items()
Or group values together and use sum:
from itertools import groupby

# you want the name and the sum of the values
print [(name, sum(value for k, value in grp))
       # from each group
       for name, grp in
       # where the group name of an item `p` is given by `p[0]`
       groupby(sorted(data), key=lambda p: p[0])]
>>> from itertools import groupby, imap
>>> from operator import itemgetter
>>> data = [['0.2', 'A'], ['0.1', 'A'], ['0.3', 'A'], ['0.3', 'B'], ['0.2', 'C'], ['0.5', 'C']]
>>> # data = sorted(data, key=itemgetter(1))
...
>>> for k, g in groupby(data, key=itemgetter(1)):
...     print sum(imap(float, imap(itemgetter(0), g))), k
...
0.6 A
0.3 B
0.7 C
>>>
