How do I iterate over the list values of all keys in parallel, position by position, until I reach the end of the lists? The lists have the same length for every key.
I mean:
my_dict1 = {'c1': [10, 11, 12], 'c2': [100, 110, 120], 'c3': [200, 210, 220]}
my_dict2 = {'c1': 3, 'c2': 1, 'c3': 2}
The result I need to get is:
result = 0.5 * ([10 * 3 + 100 * 1 + 200 * 2] + [11 * 3 + 110 * 1 + 210 * 2] + [12 * 3 + 120 * 1 + 220 * 2])
I checked Multiplying the values of dictionaries with different keys and How to Multiply list elements in dictionary but they did not come in handy here.
The following only worked when the two dictionaries have the same keys:
dict1 = {2: [10, 11, 12], 2: [100, 110, 120]}
dict2 = {2: [100, 110, 120], 2: [100, 110, 120]}
result = {i :[x*y for x, y in zip(dict1[i], dict2[i])] for i in dict1.keys()}
print(result)
Result:
{2: [10000, 12100, 14400]}
Should I use NumPy or Pandas to handle this? In my real job there is a dictionary or a data frame with an unknown number of keys.
You must use the key from my_dict1 to find the multiplier in my_dict2. From there on, you can just zip the lists and sum their items. It can be a plain Python one-liner:
[sum(i)/2 for i in zip(*([i * my_dict2[k] for i in v]
                         for k, v in my_dict1.items()))]
Above gives as expected:
[265.0, 281.5, 298.0]
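To unpack the one-liner, here is a step-by-step sketch of the same computation with the intermediate list made explicit:

```python
my_dict1 = {'c1': [10, 11, 12], 'c2': [100, 110, 120], 'c3': [200, 210, 220]}
my_dict2 = {'c1': 3, 'c2': 1, 'c3': 2}

# Scale each key's list by its multiplier from my_dict2
scaled = [[i * my_dict2[k] for i in v] for k, v in my_dict1.items()]
# scaled == [[30, 33, 36], [100, 110, 120], [400, 420, 440]]

# zip(*scaled) transposes, grouping same-position items across the lists
result = [sum(col) / 2 for col in zip(*scaled)]
print(result)  # [265.0, 281.5, 298.0]
```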
You could also build two NumPy arrays for the same operation:
import numpy as np

arr1 = np.array(list(my_dict1.values()))
arr2 = np.array([my_dict2[k] for k in my_dict1])
Then:
np.sum(np.transpose(arr1) * arr2, axis=1)/2
gives as expected
array([265. , 281.5, 298. ])
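Equivalently (a sketch under the same setup), the transpose-and-multiply step collapses into a single matrix-vector product, since summing multiplier-scaled rows is exactly what `arr2 @ arr1` computes:

```python
import numpy as np

my_dict1 = {'c1': [10, 11, 12], 'c2': [100, 110, 120], 'c3': [200, 210, 220]}
my_dict2 = {'c1': 3, 'c2': 1, 'c3': 2}

arr1 = np.array(list(my_dict1.values()))          # shape (3, 3), one row per key
arr2 = np.array([my_dict2[k] for k in my_dict1])  # the multipliers, aligned with arr1

# Matrix-vector product: sums multiplier * row over the keys
out = arr2 @ arr1 / 2
print(out)  # [265.  281.5 298. ]
```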
One way just using zip:
arr = []
for k, v in my_dict1.items():
    c = my_dict2[k]
    arr.append([i * c for i in v])
out = [sum(a)/2 for a in zip(*arr)]
Or using pandas:
import pandas as pd

out = pd.DataFrame(my_dict1).mul(my_dict2).sum(axis=1).mul(0.5)
Output:
[265.0, 281.5, 298.0]
If you just want to do the math...
my_dict1 = {'c1': [10, 11, 12], 'c2': [100, 110, 120], 'c3': [200, 210, 220]}
my_dict2 = {'c1': 3, 'c2': 1, 'c3': 2}
result = 0
for i in my_dict1:
    for j in range(len(my_dict1[i])):
        result += my_dict1[i][j] * my_dict2[i]
result /= 2
print(result)
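Note that this loop accumulates a single scalar (the halved grand total), unlike the per-position lists from the other answers. A compact equivalent, as a sketch:

```python
my_dict1 = {'c1': [10, 11, 12], 'c2': [100, 110, 120], 'c3': [200, 210, 220]}
my_dict2 = {'c1': 3, 'c2': 1, 'c3': 2}

# Scale each key's list total by its multiplier, then halve the grand total
result = sum(my_dict2[k] * sum(v) for k, v in my_dict1.items()) / 2
print(result)  # 844.5
```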
It seems this question can be solved with a DataFrame, NumPy, or just plain Python.
As Ballesta has replied, in plain Python:
[sum(i)/2 for i in zip(*([i * my_dict2[k] for i in v]
                         for k, v in my_dict1.items()))]
Using Ballesta's answer with NumPy:
arr1 = np.array(list(my_dict1.values()))
arr2 = np.array([my_dict2[k] for k in my_dict1])
np.sum(np.transpose(arr1) * arr2, axis=1)/2
Or using pandas, as Chris has answered:
out = pd.DataFrame(my_dict1).mul(my_dict2).sum(axis=1).mul(0.5)
A simpler way using the DataFrame could be:
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
df = df.reset_index()  # make sure indexes pair with number of rows
for index, row in df.iterrows():
    print((row['c1'] + row['c2']) / 2)
Related
I want to find duplicates in a selection of columns of a df,
from collections import defaultdict
import numpy as np

# convert the sub-df into a matrix
mat = df[['idx', 'a', 'b']].values

str_dict = defaultdict(set)
for x in np.ndindex(mat.shape[0]):
    concat = ''.join(str(v) for v in mat[x][1:])
    # take idx as values of each key a + b
    str_dict[concat].update([mat[x][0]])

dups = {}
for key in str_dict.keys():
    dup = str_dict[key]
    if len(dup) < 2:
        continue
    dups[key] = dup
The code finds duplicates of the concatenation of a and b: it uses the concatenation as the key of a set defaultdict (str_dict), updates each key with the idx values, and finally stores in a dict (dups) any concatenation whose value set has length >= 2.
I am wondering if there is a better way to do that in terms of efficiency.
You can just concatenate and convert to set:
res = set(df['a'].astype(str) + df['b'].astype(str))
Example:
df = pd.DataFrame({'idx': [1, 2, 3],
                   'a': [4, 4, 5],
                   'b': [5, 5, 6]})
res = set(df['a'].astype(str) + df['b'].astype(str))
print(res)
# {'56', '45'}
If you need to map indices too:
df = pd.DataFrame({'idx': [1, 2, 3],
                   'a': [41, 4, 5],
                   'b': [3, 13, 6]})
df['conc'] = (df['a'].astype(str) + df['b'].astype(str))
df = df.reset_index()
res = df.groupby('conc')['index'].apply(set).to_dict()
print(res)
# {'413': {0, 1}, '56': {2}}
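One caveat, sketched on the same frame: plain string concatenation can conflate different pairs. In the example above, rows (41, 3) and (4, 13) both map to '413' and are grouped together even though their (a, b) values differ. Inserting a separator keeps them apart:

```python
import pandas as pd

df = pd.DataFrame({'idx': [1, 2, 3],
                   'a': [41, 4, 5],
                   'b': [3, 13, 6]})

# '41' + '3' and '4' + '13' both concatenate to '413'; a separator avoids that
res = sorted(set(df['a'].astype(str) + '|' + df['b'].astype(str)))
print(res)  # ['41|3', '4|13', '5|6']
```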
You can filter the columns you need before drop_duplicates:
df[['a','b']].drop_duplicates().astype(str).apply(np.sum,1).tolist()
Out[1027]: ['45', '56']
So here is what I am trying to achieve in Python:
I have a list "A" with unsorted and repeated indices.
I have a list "B" with some float values
Length A = Length B
I want list "C" with summed values of B based on the repeated indices in A in a sorted ascending manner.
Example:
A = [0, 1, 0, 3, 2, 1, 2] # (indicates unsorted and repeated indices)
B = [25, 10, 15, 10, 5, 30, 50] # (values to be summed)
C = [25+15, 10+30, 5+50, 10] # (summed values in a sorted manner)
So far I know how to do the sorting bit with:
C = zip(*sorted(zip(A, B)))
Getting the result:
[(0, 0, 1, 1, 2, 2, 3), (15, 25, 10, 30, 5, 50, 10)]
But I do not know how to do the sum.
What would be a good way to create list C?
Use zip() in combination with a dict:
A = [0, 1, 0, 3, 2, 1, 2]
B = [25, 10, 15, 10, 5, 30, 50]

sums = {}
for key, value in zip(A, B):
    try:
        sums[key] += value
    except KeyError:
        sums[key] = value
print(sums)
# {0: 40, 1: 40, 2: 55, 3: 10}
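The same accumulation reads a little shorter with dict.get, and a sorted pass then yields the list C the question asks for (a sketch):

```python
A = [0, 1, 0, 3, 2, 1, 2]
B = [25, 10, 15, 10, 5, 30, 50]

sums = {}
for key, value in zip(A, B):
    sums[key] = sums.get(key, 0) + value

# Sort by key to get list C in ascending index order
C = [sums[k] for k in sorted(sums)]
print(C)  # [40, 40, 55, 10]
```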
You could use groupby, if the order matters:
In [1]: A = [0, 1, 0, 3, 2, 1, 2]
In [2]: B = [25, 10, 15, 10, 5, 30, 50]
In [3]: from itertools import groupby
In [4]: from operator import itemgetter
In [5]: C = [sum(map(itemgetter(1), group))
...: for key, group in groupby(sorted(zip(A, B)),
...: key=itemgetter(0))]
In [6]: C
Out[6]: [40, 40, 55, 10]
or defaultdict(float), if it does not:
In [10]: from collections import defaultdict
In [11]: res = defaultdict(float)
In [12]: for k, v in zip(A, B):
...: res[k] += v
...:
In [13]: res
Out[13]: defaultdict(float, {0: 40.0, 1: 40.0, 2: 55.0, 3: 10.0})
Note that before Python 3.7, dicts made no ordering guarantee (and you are not to trust CPython implementation details), so use the groupby version when order matters.
It is actually a bit unclear what you want, but if you want them to be indexed by whatever the number is, you shouldn't even use a list, but a Counter instead:
>>> from collections import Counter
>>> c = Counter()
>>> A = [0, 1, 0, 3, 2, 1, 2]
>>> B = [25, 10, 15, 10 , 5, 30, 50]
>>> for k, v in zip(A, B):
... c[k] += v
...
>>> c
Counter({2: 55, 0: 40, 1: 40, 3: 10})
>>> c[0]
40
If you really want a list, you can use
>>> [i[1] for i in sorted(c.items())]
but then any missing key would shift the remaining values up by one position, which may or may not be what you want.
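To make that caveat concrete, here is a hypothetical run in which key 2 never occurs:

```python
from collections import Counter

# Suppose key 2 never occurred in A
c = Counter()
for k, v in zip([0, 1, 3], [40, 40, 10]):
    c[k] += v

# Key 3's sum silently lands at list index 2
print([i[1] for i in sorted(c.items())])  # [40, 40, 10]
```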
If I have a list x of consecutive integers (each consecutive integer occurs at least once)
x = [0, 0, 1, 3, 2, 1, 1, 4, 4]
representing the group memberships of another list, y
y = [0, 0, 100, 30, 2000, 100, 1000, 40, 4]
What's the cleanest way to extract a list of all the groupings, z? (Note: the order within each sublist in z does not matter.)
z == [[0,0], [100, 100, 1000], [2000], [30], [40, 4]]
I have a gut feeling that I can do it in 1 line but I can't figure it out.
You can create a dictionary of lists (implemented with a defaultdict) and append each value to the list of its corresponding key, using zip to pair keys with values:
from collections import defaultdict
dd = defaultdict(list)
for key, val in zip(x, y):
    dd[key].append(val)
>>> dd.values()
[[0, 0], [100, 100, 1000], [2000], [30], [40, 4]]
To guarantee the output matches the sorted order of keys:
>>> [dd[key] for key in sorted(dd.keys())]
[[0, 0], [100, 100, 1000], [2000], [30], [40, 4]]
Timings
x = x * 10000
y = y * 10000
%%timeit
od = OrderedDict()
for k, v in zip(x, y):
    od.setdefault(k, []).append(v)
10 loops, best of 3: 49.7 ms per loop
%%timeit
dd = defaultdict(list)
for key, val in zip(x, y):
    dd[key].append(val)
100 loops, best of 3: 17.1 ms per loop
You can keep the order and have readable, efficient code using a for loop with an OrderedDict:
x = [0, 0, 1, 3, 2, 1, 1, 4, 4]
y = [0, 0, 100, 30, 2000, 100, 1000, 40, 4]
from collections import OrderedDict
od = OrderedDict()
for k, v in zip(x, y):
    od.setdefault(k, []).append(v)
print(od.values())
[[0, 0], [100, 100, 1000], [30], [2000], [40, 4]]
Bar the import and dict creation, it is two lines of code to create the pairings which I think is pretty reasonable.
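On Python 3.7 and later, plain dicts preserve insertion order, so the same pattern works without OrderedDict (a sketch):

```python
x = [0, 0, 1, 3, 2, 1, 1, 4, 4]
y = [0, 0, 100, 30, 2000, 100, 1000, 40, 4]

# Plain dict keeps insertion order on Python 3.7+
d = {}
for k, v in zip(x, y):
    d.setdefault(k, []).append(v)

print(list(d.values()))  # [[0, 0], [100, 100, 1000], [30], [2000], [40, 4]]
```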
I like the defaultdict solution. In order to get away from the unused list comprehension you could pair it with a reduce:
from collections import defaultdict
from functools import reduce  # required on Python 3, where reduce is no longer a builtin

z = reduce(lambda dct, kv: dct[kv[0]].append(kv[1]) or dct, zip(x, y),
           defaultdict(list))
This works for both Python 2 and Python 3.
In Python 2 you can use tuple parameter unpacking in the lambda (a shame it was removed in 3):
z = reduce(lambda dct, (key, value): dct[key].append(value) or dct, zip(x, y),
           defaultdict(list))
And it is your one-line solution - I split it but you wouldn't have to :-)
I want to expand list B at the index positions where list A has adjacent matching values, using groupby.
A = [476, 1440, 3060, 3060, 500, 500]
B = [0, 4, 10, 15]
so resultant list is:
B_update1 = [0,4,10,10,15,15]
which after some intermediate steps will be:
B_update2 = [0,4,12,10,20,20]
Now I want to take sum and mean of duplicated values which will give me back:
B_mean = [0,4,11,20]
B_sum = [0,4,22,40]
I am not sure how to do it. Any suggestions?
B_update1, B_update2, B_mean, B_sum = [0, 4, 10, 10, 15, 15], [0, 4, 12, 10, 20, 20], [], []
from itertools import groupby
from operator import itemgetter

for num, grp in groupby(enumerate(B_update1), itemgetter(1)):
    tmp_list = [B_update2[idx] for idx, _ in grp]
    B_mean.append(sum(tmp_list) / len(tmp_list))
    B_sum.append(sum(tmp_list))
print B_mean, B_sum
Output
[0, 4, 11, 20] [0, 4, 22, 40]
The information you essentially need is the number of repetitions:
>>> from itertools import groupby
>>> A = [476, 1440, 3060, 3060, 500, 500]
>>> B = [0, 4, 10, 15]
>>> repetitions = [len(list(g)) for n, g in groupby(A)]
>>> repetitions
[1, 1, 2, 2]
From this, you can construct B_update1 and also reduce back to the short lists:
>>> B_update1 = []
>>> for i, r in enumerate(repetitions):
...     B_update1.extend([B[i]] * r)
>>> B_update1
[0, 4, 10, 10, 15, 15]
>>> B_update2 = [0, 4, 12, 10, 20, 20] # MAGIC
>>> B_sum, B_mean = [], []
>>> i = 0
>>> for r in repetitions:
...     s = sum(B_update2[i:i + r])
...     B_sum.append(s)
...     B_mean.append(s / r)
...     i += r
>>> B_sum
[0, 4, 22, 40]
>>> B_mean
[0.0, 4.0, 11.0, 20.0]
I have n lists of equal length representing values of database rows. The data is pretty complicated, so I'll present simplified values in the example.
Essentially, I want to map the values of these lists (a,b,c) to a dictionary where the keys are the set of the list (id).
Example lists:
id = [1,1,1,2,2,2,3,3,3]
a = [1,2,3,4,5,6,7,8,9]
b = [10,11,12,13,14,15,16,17,18]
c = [20,21,22,23,24,25,26,27,28]
Needed dictionary output:
{id:[[a],[b],[c]],...}
{'1':[[1,2,3],[10,11,12],[20,21,22]],'2':[[4,5,6],[13,14,15],[23,24,25]],'3':[[7,8,9],[16,17,18],[26,27,28]]}
The dictionary now has a list of lists for the values in the original a,b,c subsetted by the unique values in the id list which is now the dictionary key.
I hope this is clear enough.
Try this:
id = ['1','1','1','2','2','2','3','3','3']
a = [1,2,3,4,5,6,7,8,9]
b = [10,11,12,13,14,15,16,17,18]
c = [20,21,22,23,24,25,26,27,28]
from collections import defaultdict
d = defaultdict(list)
# add as many lists as needed, here n == 3
lsts = [a, b, c]
for ki, kl in zip(id, zip(*lsts)):
    d[ki] += [kl]

for k, v in d.items():
    # if you don't mind using tuples, simply do this: d[k] = zip(*v)
    d[k] = map(list, zip(*v))
The result is exactly as expected according to the question:
d == {'1':[[1,2,3],[10,11,12],[20,21,22]],
'2':[[4,5,6],[13,14,15],[23,24,25]],
'3':[[7,8,9],[16,17,18],[26,27,28]]}
=> True
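Note that on Python 3 `map` and `zip` are lazy, so the final step above would leave map objects in the dict. A self-contained Python 3 sketch of the same idea (using `ids` in place of the builtin-shadowing name `id`):

```python
from collections import defaultdict

ids = ['1', '1', '1', '2', '2', '2', '3', '3', '3']
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [10, 11, 12, 13, 14, 15, 16, 17, 18]
c = [20, 21, 22, 23, 24, 25, 26, 27, 28]

# Collect one (a_i, b_i, c_i) row per position under its id
d = defaultdict(list)
for ki, row in zip(ids, zip(a, b, c)):
    d[ki].append(row)

# Transpose each group's rows back into per-list columns, as real lists
result = {k: [list(col) for col in zip(*v)] for k, v in d.items()}
print(result['2'])  # [[4, 5, 6], [13, 14, 15], [23, 24, 25]]
```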
IDs = [1,1,1,2,2,2,3,3,3]
a = [1,2,3,4,5,6,7,8,9]
b = [10,11,12,13,14,15,16,17,18]
c = [20,21,22,23,24,25,26,27,28]
import itertools
d = {}
for key, group in itertools.groupby(sorted(zip(IDs, a, b, c)), key=lambda x: x[0]):
    d[key] = map(list, zip(*group)[1:])  # [1:] to get rid of the ID
print d
OUTPUT:
{1: [[1, 2, 3], [10, 11, 12], [20, 21, 22]],
2: [[4, 5, 6], [13, 14, 15], [23, 24, 25]],
3: [[7, 8, 9], [16, 17, 18], [26, 27, 28]]}
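The snippet above is Python 2; on Python 3, `zip(*group)[1:]` fails because zip returns an iterator. A sketch of the same groupby approach adapted for Python 3:

```python
import itertools

IDs = [1, 1, 1, 2, 2, 2, 3, 3, 3]
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [10, 11, 12, 13, 14, 15, 16, 17, 18]
c = [20, 21, 22, 23, 24, 25, 26, 27, 28]

d = {}
for key, group in itertools.groupby(sorted(zip(IDs, a, b, c)), key=lambda t: t[0]):
    cols = list(zip(*group))[1:]  # materialize, then drop the ID column
    d[key] = [list(col) for col in cols]

print(d[3])  # [[7, 8, 9], [16, 17, 18], [26, 27, 28]]
```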