So, I'm sure similar questions have been asked before but I couldn't find quite what I need.
I have a program that outputs a 2D array like the one below:
arr = [[0.2, 3], [0.3, "End"], ...]
There may be more or fewer elements, but each is a 2-element array, where the first value is a float and the second can be a float or a string.
Both of those values may repeat. Across all of those arrays, the second element takes on only a few possible values.
What I want to do is sum the first elements of all the arrays that share the same second element, and output a similar array in which each second element appears only once.
For example:
input = [[0.4, 1.5], [0.1, 1.5], [0.8, "End"], [0.05, "End"], [0.2, 3.5], [0.2, 3.5]]
output = [[0.5, 1.5], [0.4, 3.5], [0.85, "End"]]
I'd appreciate it if the output array were sorted by this second element (floats ascending, strings at the end), although it's not necessary.
EDIT: Thanks for both answers; I've decided to use the one by Chris, because the code was more comprehensible to me, although groupby seems like a function designed to solve this very problem, so I'll try to read up on that, too.
UPDATE: The values of floats were always positive, by nature of the task at hand, so I used negative values to stop the usage of any strings - now I have a few if statements that check for those "encoded" negative values and replace them with strings again just before they're printed out, so sorting is now easier.
You could use a dictionary to accumulate the sum of the first value in the list keyed by the second item.
To get the string items at the end of the list, the sort key can return positive infinity, float('inf'), for anything that is not a float.
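input_ = [[0.4, 1.5], [0.1, 1.5], [0.8, "End"], [0.05, "End"], [0.2, 3.5], [0.2, 3.5]]
d = dict()
for pair in input_:
    # accumulate the first value under the key given by the second value
    d[pair[1]] = d.get(pair[1], 0) + pair[0]
L = []
for k, v in d.items():
    L.append([v, k])
# floats sort by their value; anything else (the strings) sorts as +infinity, i.e. last
L.sort(key=lambda x: x[1] if type(x[1]) == float else float('inf'))
print(L)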
This prints:
[[0.5, 1.5], [0.4, 3.5], [0.8500000000000001, 'End']]
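A variation on the same idea, as a sketch only: collections.defaultdict removes the need for d.get, and a tuple sort key puts every string after every number without relying on float('inf'). The function name sum_by_label and the rounding to 10 decimal places are assumptions made for this illustration, not part of the answer above.

```python
from collections import defaultdict

def sum_by_label(pairs, ndigits=10):
    """Sum the first items of [value, label] pairs that share a label."""
    totals = defaultdict(float)
    for value, label in pairs:
        totals[label] += value
    # Rounding hides float noise such as 0.8500000000000001
    # (assumption: 10 decimal places is precise enough for the data at hand).
    result = [[round(total, ndigits), label] for label, total in totals.items()]
    # (False, number) sorts before (True, string), so numbers come first in
    # ascending order and strings go last; a number is never compared to a string.
    result.sort(key=lambda item: (isinstance(item[1], str), item[1]))
    return result

data = [[0.4, 1.5], [0.1, 1.5], [0.8, "End"], [0.05, "End"], [0.2, 3.5], [0.2, 3.5]]
print(sum_by_label(data))
# [[0.5, 1.5], [0.4, 3.5], [0.85, 'End']]
```

The tuple key also copes with integer labels, which the type(x[1]) == float check would push to the end together with the strings.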
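You can also try itertools.groupby. Note that groupby only merges consecutive items with the same key, so the input must already be grouped (as in your example) or be sorted by that key first:
import itertools
a = [[0.4, 1.5], [0.1, 1.5], [0.8, "End"], [0.05, "End"], [0.2, 3.5], [0.2, 3.5]]
out = [[sum(elt[0] for elt in val), key] for key, val in itertools.groupby(a, key=lambda elt: elt[1])]
print(out)
# [[0.5, 1.5], [0.8500000000000001, 'End'], [0.4, 3.5]]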
Explanation:
Group the 2D list by the 2nd element of each sublist using itertools.groupby and its key parameter. The lambda key=lambda elt: elt[1] makes groupby group on the 2nd element:
for key, val in itertools.groupby(a, key=lambda elt: elt[1]):
    print(key, val)
# 1.5 <itertools._grouper object at 0x0000026AD1F6E160>
# End <itertools._grouper object at 0x0000026AD2104EF0>
# 3.5 <itertools._grouper object at 0x0000026AD1F6E160>
For each group, compute the sum of the first elements using the built-in function sum:
for key, val in itertools.groupby(a, key=lambda elt: elt[1]):
    print(sum([elt[0] for elt in val]))
# 0.5
# 0.8500000000000001
# 0.4
Compute the desired output:
out = []
for key, val in itertools.groupby(a, key=lambda elt: elt[1]):
    out.append([sum([elt[0] for elt in val]), key])
print(out)
# [[0.5, 1.5], [0.8500000000000001, 'End'], [0.4, 3.5]]
You also mentioned sorting on the 2nd value, but it mixes strings and numbers, which is a problem: Python 3 cannot decide between a number and a string (comparing them raises a TypeError). The objects must be made comparable first, for example through a custom sort key.
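groupby only merges adjacent items with equal keys, and Python 3 cannot compare a number with a string. One way around both, sketched under the assumption that the second elements are only numbers or strings: sort with a key that ranks strings after numbers, then feed the sorted list to groupby. The helper name mixed_key and the shuffled sample data are made up for this example.

```python
import itertools

def mixed_key(elt):
    # Numbers map to (False, value) and strings to (True, text), so the two
    # types are never compared with each other and strings sort last.
    label = elt[1]
    return (isinstance(label, str), label)

# Same pairs as in the question, shuffled so that equal labels are not adjacent.
a = [[0.8, "End"], [0.4, 1.5], [0.2, 3.5], [0.1, 1.5], [0.05, "End"], [0.2, 3.5]]

out = [
    [sum(elt[0] for elt in group), label]
    for label, group in itertools.groupby(sorted(a, key=mixed_key), key=lambda elt: elt[1])
]
print(out)  # labels appear in the order 1.5, 3.5, 'End'
```

Sorting first costs O(n log n), so for large inputs the dictionary approach may be the cheaper of the two.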
I have the following set of values stored in a list.
[-1.7683218, 0.22206295, -0.28429198, 5.925369, -3.952484, -3.0728238, 0.09690776, -0.31914753, 3.9695702, 26.934353, 1.4882066, 1.8194668, -0.5614318, 1.2354431, -0.09714768, -0.15579335, -0.059994906, 1.0105655, -23.25607, 31.982368, -0.09390785, 0.17786688, 0.36164832, -4.673975, 13.495866, -3.57134, 0.5583399, -1.801314, 2.4207468, 2.0513844, -3.429592, -9.599998, 23.412394, -3.963623, 6.930485, 2.5186272, 0.6805691, -1.1615586, -0.915736, -2.6307302, -14.409785, 0.6327307, 10.512744, -0.09292421, -0.61977243, 0.35928893, -1.3844814, 8.098062, -0.8270248, 0.47219157, 0.089366496, 0.9056338, 1.5297629, 3.3246832, -0.9748858, 36.62332, -1.0525678, -0.87139374, 6.7600174, 36.210625, -0.25728267, 14.568578, 0.87466383, -4.2237897, -5.4309, 19.762472, 0.8426512, -0.7807278, 0.03435099, 12.787761, -4.9308186, -1.4322343, 0.49790275, -12.979129, 0.18121482, -0.81953144, -1.5393608, 17.757078, 3.5726204, -11.319154, -0.002896044, -1.8806648, 0.30027565, -2.6210017, 16.230186, -2.2566936, 37.37506, -2.7738526, -0.91440165, -3.652771, 1.8378688, -0.25519317, 0.5222581, 0.2189773, 23.825306, 0.3779062, 2.6709516, 0.84001434, -0.41394734, -0.600579, -3.1629875, 0.2880843, -3.9132822, 5.674796, -0.5569526, 0.30253112, -4.4269695, 4.5206604, -0.8477638, 0.0032483074, -2.2814171, 0.5524869, -1.4271426, -0.24263692, 1.0095457, -3.187037, -1.6656531, 1.4805393, 0.064992905, -4.8124804, -0.07194552, -0.28692132, -0.19502515, 0.010771384, -32.744797, 1.2642047, 6.3942785, -1.2971659, 29.70087, 0.19707158, -2.734262, 2.8497686, -1.710305, -1.3836008, 22.758884, -1.8488939, 4.1740856, 0.26019523, -8.814447, -3.937495, 0.22731477, -0.7874651, 17.22002, -7.89242, -0.5795766, 3.3960745, 1.0440702, 0.5483718, 1.2849183, -0.63732344, -40.38428, -4.25527, 3.034935, 0.25527972, -0.81940174, -7.0720696, 1.7420169, 14.904871, -1.5399592, 0.20110837, 0.1902977, 2.5790472, -28.560707, 0.09560776, -0.973604, 0.6214314, -5.1268454, -0.9104073, 33.082394, 0.23800176, -9.696023, 12.288443, 
-16.52249, -7.6811, -21.928356, 25.690449, -0.6803232, -1.4738222, -1.831514, 0.00013296002, -3.1330614, 3.6067219, -3.0617614, -6.334016, -24.856865, -6.0669985, 2.8829474, 0.76423097, -0.21836776, -2.3173273, -2.092735, -0.19577695, 4.2984896, 0.029742926, 1.0902604, -0.28707412, -0.1671038, -0.4607489, -15.966867, -1.7149612, -1.3445716, 1.400264, 4.906401, -6.314724, -0.92188597, -0.14341217, -6.819194, 1.2750683, 21.634096, 0.5503013, 5.2122655, -0.096101895, -0.69029164, 2.6239898, -26.33101, -3.7901835, 10.026649, 1.0661886, 0.8891293, 34.24628, -0.9036363, -4.4846773, -30.846636, -5.8609247, -0.018534392, 4.657759e-06, 16.96108, 10.725708, -0.3170653, -3.2331817, 0.73887914, 0.69840825, 0.9043666, 1.0727708, 1.6571997, -0.70257163, 2.4863558, 0.07501343, -35.059708, 0.72496796, -3.0723267, -3.2004805, -0.9447444, 0.56954986, 2.6018164, -0.49256825, 22.71359, 0.45523545, -2.1936522, 4.008838, 0.62327665, 10.315046, 1.4006382, 1.1290226, 1.2660133, -8.46607]
I want to be able to create 100 more lists that are similar to this one but contain randomly chosen values between the lowest and highest values of the original list. Let's consider a smaller example to better understand the problem: a list whose lowest value is -1 and whose highest value is 7.2.
original list : [0.5, 0.8, 1.1, 2.5, 7.2, -1]
random list 1 : [0.5, 0.2, 1.4, 4.5, 6.2, -0.5]
random list 2 : [5.3, 0.3, 0.7, 2.3, 4.2, -0.1]
....
random list 100 : [0.5, 0.9, 1.1, 2.1, 6.5, -1]
The key is that not all values have to change (although in some cases they all can, as in list 2, for example). Is there a straightforward way to accomplish this in Python?
The code below prints the output you need. First find the max and min numbers in the original list, then use the random.uniform() function from the random library to draw each new value.
import random
original_list = [0.5, 0.8, 1.1, 2.5, 7.2, -1]
max_number = max(original_list)
min_number = min(original_list)
# because you need 100 more lists
for i in range(100):
    random_list = []
    for j in range(len(original_list)):
        random_list.append(round(random.uniform(min_number, max_number), 1))
    print('random list ' + str(i + 1) + ' ', end='')
    print(random_list)
import random
# original_list is the list from the question
smallest = min(original_list)
largest = max(original_list)
newlist1 = [random.uniform(smallest, largest) for _ in range(len(original_list))]
newlist2 = [random.uniform(smallest, largest) for _ in range(len(original_list))]
# and so on
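The question also says that not every value has to change. A minimal sketch of that variant, assuming each position is independently redrawn with some probability; the function name perturb and the parameter change_prob are hypothetical, chosen only for this illustration.

```python
import random

def perturb(original, change_prob=0.5, rng=random):
    """Return a copy of `original` where each value is redrawn with probability change_prob."""
    low, high = min(original), max(original)
    return [
        # redraw uniformly between the original min and max, or keep the old value
        rng.uniform(low, high) if rng.random() < change_prob else value
        for value in original
    ]

original_list = [0.5, 0.8, 1.1, 2.5, 7.2, -1]
random_lists = [perturb(original_list) for _ in range(100)]
print(random_lists[0])
```

With change_prob=1.0 every value is redrawn, which matches the answers above; with change_prob=0.0 the list comes back unchanged.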
Using list comprehension and numpy.random.uniform:
import numpy as np
orig = [-1.7683218, 0.22206295, -0.28429198, 5.925369, -3.952484, -3.0728238, 0.09690776, -0.31914753, 3.9695702, 26.934353, 1.4882066, 1.8194668, -0.5614318, 1.2354431, -0.09714768, -0.15579335, -0.059994906, 1.0105655, -23.25607, 31.982368, -0.09390785, 0.17786688, 0.36164832, -4.673975, 13.495866, -3.57134, 0.5583399, -1.801314, 2.4207468, 2.0513844, -3.429592, -9.599998, 23.412394, -3.963623, 6.930485, 2.5186272, 0.6805691, -1.1615586, -0.915736, -2.6307302, -14.409785, 0.6327307, 10.512744, -0.09292421, -0.61977243, 0.35928893, -1.3844814, 8.098062, -0.8270248, 0.47219157, 0.089366496, 0.9056338, 1.5297629, 3.3246832, -0.9748858, 36.62332, -1.0525678, -0.87139374, 6.7600174, 36.210625, -0.25728267, 14.568578, 0.87466383, -4.2237897, -5.4309, 19.762472, 0.8426512, -0.7807278, 0.03435099, 12.787761, -4.9308186, -1.4322343, 0.49790275, -12.979129, 0.18121482, -0.81953144, -1.5393608, 17.757078, 3.5726204, -11.319154, -0.002896044, -1.8806648, 0.30027565, -2.6210017, 16.230186, -2.2566936, 37.37506, -2.7738526, -0.91440165, -3.652771, 1.8378688, -0.25519317, 0.5222581, 0.2189773, 23.825306, 0.3779062, 2.6709516, 0.84001434, -0.41394734, -0.600579, -3.1629875, 0.2880843, -3.9132822, 5.674796, -0.5569526, 0.30253112, -4.4269695, 4.5206604, -0.8477638, 0.0032483074, -2.2814171, 0.5524869, -1.4271426, -0.24263692, 1.0095457, -3.187037, -1.6656531, 1.4805393, 0.064992905, -4.8124804, -0.07194552, -0.28692132, -0.19502515, 0.010771384, -32.744797, 1.2642047, 6.3942785, -1.2971659, 29.70087, 0.19707158, -2.734262, 2.8497686, -1.710305, -1.3836008, 22.758884, -1.8488939, 4.1740856, 0.26019523, -8.814447, -3.937495, 0.22731477, -0.7874651, 17.22002, -7.89242, -0.5795766, 3.3960745, 1.0440702, 0.5483718, 1.2849183, -0.63732344, -40.38428, -4.25527, 3.034935, 0.25527972, -0.81940174, -7.0720696, 1.7420169, 14.904871, -1.5399592, 0.20110837, 0.1902977, 2.5790472, -28.560707, 0.09560776, -0.973604, 0.6214314, -5.1268454, -0.9104073, 33.082394, 0.23800176, -9.696023, 12.288443, 
-16.52249, -7.6811, -21.928356, 25.690449, -0.6803232, -1.4738222, -1.831514, 0.00013296002, -3.1330614, 3.6067219, -3.0617614, -6.334016, -24.856865, -6.0669985, 2.8829474, 0.76423097, -0.21836776, -2.3173273, -2.092735, -0.19577695, 4.2984896, 0.029742926, 1.0902604, -0.28707412, -0.1671038, -0.4607489, -15.966867, -1.7149612, -1.3445716, 1.400264, 4.906401, -6.314724, -0.92188597, -0.14341217, -6.819194, 1.2750683, 21.634096, 0.5503013, 5.2122655, -0.096101895, -0.69029164, 2.6239898, -26.33101, -3.7901835, 10.026649, 1.0661886, 0.8891293, 34.24628, -0.9036363, -4.4846773, -30.846636, -5.8609247, -0.018534392, 4.657759e-06, 16.96108, 10.725708, -0.3170653, -3.2331817, 0.73887914, 0.69840825, 0.9043666, 1.0727708, 1.6571997, -0.70257163, 2.4863558, 0.07501343, -35.059708, 0.72496796, -3.0723267, -3.2004805, -0.9447444, 0.56954986, 2.6018164, -0.49256825, 22.71359, 0.45523545, -2.1936522, 4.008838, 0.62327665, 10.315046, 1.4006382, 1.1290226, 1.2660133, -8.46607]
a = min(orig)
b = max(orig)
n = len(orig)
res = [np.random.uniform(a, b, n).tolist() for _ in range(100)]
This gives res, a list of 100 lists (each of size len(orig)) of numbers uniformly distributed over [min(orig), max(orig)).
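As a sketch of a fully vectorized alternative, the size argument of numpy.random.uniform can draw all 100 lists in a single call as a 2D array. The short orig list here is only a stand-in for the real data.

```python
import numpy as np

orig = [0.5, 0.8, 1.1, 2.5, 7.2, -1]  # stand-in for the full list
a, b, n = min(orig), max(orig), len(orig)

# One call draws a (100, n) array; each row is one random list.
res = np.random.uniform(a, b, size=(100, n))
print(res.shape)      # (100, 6)
lists = res.tolist()  # plain nested Python lists, if needed
```

Keeping the result as an array is usually more convenient if the lists are processed with numpy afterwards.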
I have a collection of key value pairs like this:
{
'key1': [value1_1, value2_1, value3_1, ...],
'key2': [value1_2, value2_2, value3_2, ...],
...
}
and also a list which is in the same order as the values list, which contains the weight each variable should have applied. So it looks like [weight_1, weight_2, weight_3, ...].
My goal is to end up with a list of keys ordered by overall score, highest first. Note that the values aren't all standardized / normalized, so value1_x could range from 1 - 10 but value2_x could range from 1 - 100000. This has been the tricky part for me, as I have to normalize the data somehow.
I'm trying to make this algorithm run to scale for many different values, so it would take the same amount of time for 1 or for 100 (or at least logarithmically more time). Is that possible? Is there any really efficient way I can go about this?
You can't do better than linear time, since every value has to be read at least once, but you can make it faster; this looks like a matrix multiplication to me, so I suggest you use numpy:
import numpy as np
keys = ['key1', 'key2', 'key3']
values = np.matrix([
    [1.1, 1.2, 1.3, 1.4],
    [2.1, 2.2, 2.3, 2.4],
    [3.1, 3.2, 3.3, 3.4]
])
weights = np.matrix([[10., 20., 30., 40.]]).transpose()
res = (values * weights).transpose().tolist()[0]
# zip() returns an iterator in Python 3, so use sorted() rather than .sort()
items = sorted(zip(res, keys), reverse=True)
which gives
[(330.0, 'key3'), (230.0, 'key2'), (130.0, 'key1')]
Edit: with thanks to @Ondro for np.dot and to @unutbu for np.argsort, here is an improved version entirely in numpy:
import numpy as np
# set up values
keys = np.array(['key1', 'key2', 'key3'])
values = np.array([
    [1.1, 1.2, 1.3, 1.4],  # values1_x
    [2.1, 2.2, 2.3, 2.4],  # values2_x
    [3.1, 3.2, 3.3, 3.4]   # values3_x
])
weights = np.array([10., 20., 30., 40.])
# crunch the numbers
res = np.dot(values, -weights)  # negative of weights!
# sorting on the negative value gives the same order as a reverse sort;
# there does not seem to be any way to reverse-sort directly
order = res.argsort(axis=0)
sortedkeys = keys[order].tolist()
which results in ['key3', 'key2', 'key1'].
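Here's a normalization function that linearly maps a value from the input range [ilow, ihigh] to the output range [olow, ohigh]; with olow=0 and ohigh=1 it transforms your values into [0,1]:
def normalize(val, ilow, ihigh, olow, ohigh):
    return ((val - ilow) * (ohigh - olow) / (ihigh - ilow)) + olow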
Now, use normalize to compute a new dictionary with normalized values. Then, sort by the weighted sum:
def sort(d, weights, ranges):
    # ranges is a list of tuples containing the lower and upper bounds of the corresponding value
    # (use d.iteritems() instead of d.items() on Python 2)
    newD = {k: [normalize(v, ilow, ihigh, 0, 1) for v, (ilow, ihigh) in zip(vals, ranges)] for k, vals in d.items()}
    # ascending by weighted sum; pass reverse=True to sorted() to get the highest score first
    return sorted(newD, key=lambda k: sum(v*w for v, w in zip(newD[k], weights)))
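Normalization and the weighted sum can also be combined in numpy. The following is only a sketch: it assumes the range of each column can be taken from the data itself (min-max scaling per column), which may not match the true bounds of your variables. The function name rank_keys and the sample numbers are invented for this illustration.

```python
import numpy as np

def rank_keys(d, weights):
    """Return keys ordered from highest to lowest weighted, min-max normalized score."""
    keys = list(d)
    values = np.array([d[k] for k in keys], dtype=float)  # shape (n_keys, n_values)
    low = values.min(axis=0)
    span = values.max(axis=0) - low
    span[span == 0] = 1.0               # constant columns: avoid division by zero
    normalized = (values - low) / span  # every column now lies in [0, 1]
    scores = normalized @ np.asarray(weights, dtype=float)
    order = np.argsort(-scores, kind="stable")  # descending score
    return [keys[i] for i in order]

data = {
    "key1": [1, 50000],
    "key2": [10, 100],
    "key3": [5, 100000],
}
print(rank_keys(data, [1.0, 1.0]))
# ['key3', 'key2', 'key1']
```

The work is proportional to the number of keys times the number of values, plus the sort over the keys, so it grows linearly with the number of values rather than staying constant.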