I'm messing around with lists of lists containing strings and values EX: LofL = [["string", 4.0, 1.1, -3.0, -7.2],["string", 2.0, -1.0, 3.3], ["string", 4.4, 5.5, -6.6, 1.1]] and I'm trying to take the values within each list within the list, and average them as long as the values are not below 0. For example the first would be 5.1/2 since the third digit is negative. This in the end would make the List of lists look like: LofL =[["string", 5.1/2],["string", 2/1], ["string", 9.9/2]]. I've tried this so far:
LofL = *see above example*
avgLofL = LofL
for sublist in LofL:
while sublist in range(1,len(sublist)) > 0.0:
rowavg = [sum(sublist) / range(1,len(sublist)) for sublist in LofL]
for sublist in avgLofL:
for sublist in range(1,len(sublist)):
avgLofL.append(rowavg)
return avgLofL
It says my rowavg isn't referenced before assingment, but when I intitialize it as rowavg = 0 my list has no length. I'm unsure where I'm making a mistake
This is a possible solution:
from statistics import mean
avgLofL = [[next(x for x in lst if isinstance(x, str)),
mean(x for x in lst if not isinstance(x, str) and x >= 0)]
for lst in LofL]
Ok, I think this is what you actually asked for:
LofL = [["string", 4.0, 1.1, -3.0, -7.2],
["string", 2.0, -1.0, 3.3],
["string", 4.4, 5.5, -6.6, 1.1]]
avgLofL = []
for row in LofL:
sublist = []
for x in row[1:]:
if x>=0:
sublist.append(x)
else:
break
avgLofL.append([row[0], sum(sublist)/float(len(sublist))])
print(avgLofL)
Its result seems to match the example:
[['string', 2.55], ['string', 2.0], ['string', 4.95]]
It processes one row at a time, completely. Assumes that the first element is a string which should be kept, then collects the other elements in sublist until it finds a negative one. Then calculates the average of the collection, builds and stores a "[string,average]" pair, and continues with the next row.
In its current form it will die on having a negative number right at the start (division by zero). You can either drop an explicit if somewhere, or some dirty hack, like sum(sublist)/max(1,float(len(sublist))).
def avgofList(LofL):
avgLofL = []
for sublist in LofL:
total = 0
count = 0
for item in sublist:
if isinstance(item, float) and item > 0:
total += item
count += 1
sl = [x for x in sublist if not isinstance(x, float)]
sl.append(total / count)
avgLofL.append(sl)
return avgLofL
Very similar to the answer provided by #tevemadar except that no intermediate lists are used and also accounts for the first negative number being in index 1 of the sub-lists. It does however assume that the sublists are not empty and that the first element is to be retained.
LofL = [["string", 4.0, 1.1, -3.0, -7.2],
["string", 2.0, -1.0, 3.3],
["string", 4.4, 5.5, -6.6, 1.1]]
def process(e):
t, n = 0, 0
for v in e[1:]:
if v >= 0:
t += v
n += 1
else:
break
return [e[0], t / n] if n > 0 else [e[0]]
result = [process(e) for e in LofL]
print(result)
Output:
[['string', 2.55], ['string', 2.0], ['string', 4.95]]
I have the following set of values stored in a list.
[-1.7683218, 0.22206295, -0.28429198, 5.925369, -3.952484, -3.0728238, 0.09690776, -0.31914753, 3.9695702, 26.934353, 1.4882066, 1.8194668, -0.5614318, 1.2354431, -0.09714768, -0.15579335, -0.059994906, 1.0105655, -23.25607, 31.982368, -0.09390785, 0.17786688, 0.36164832, -4.673975, 13.495866, -3.57134, 0.5583399, -1.801314, 2.4207468, 2.0513844, -3.429592, -9.599998, 23.412394, -3.963623, 6.930485, 2.5186272, 0.6805691, -1.1615586, -0.915736, -2.6307302, -14.409785, 0.6327307, 10.512744, -0.09292421, -0.61977243, 0.35928893, -1.3844814, 8.098062, -0.8270248, 0.47219157, 0.089366496, 0.9056338, 1.5297629, 3.3246832, -0.9748858, 36.62332, -1.0525678, -0.87139374, 6.7600174, 36.210625, -0.25728267, 14.568578, 0.87466383, -4.2237897, -5.4309, 19.762472, 0.8426512, -0.7807278, 0.03435099, 12.787761, -4.9308186, -1.4322343, 0.49790275, -12.979129, 0.18121482, -0.81953144, -1.5393608, 17.757078, 3.5726204, -11.319154, -0.002896044, -1.8806648, 0.30027565, -2.6210017, 16.230186, -2.2566936, 37.37506, -2.7738526, -0.91440165, -3.652771, 1.8378688, -0.25519317, 0.5222581, 0.2189773, 23.825306, 0.3779062, 2.6709516, 0.84001434, -0.41394734, -0.600579, -3.1629875, 0.2880843, -3.9132822, 5.674796, -0.5569526, 0.30253112, -4.4269695, 4.5206604, -0.8477638, 0.0032483074, -2.2814171, 0.5524869, -1.4271426, -0.24263692, 1.0095457, -3.187037, -1.6656531, 1.4805393, 0.064992905, -4.8124804, -0.07194552, -0.28692132, -0.19502515, 0.010771384, -32.744797, 1.2642047, 6.3942785, -1.2971659, 29.70087, 0.19707158, -2.734262, 2.8497686, -1.710305, -1.3836008, 22.758884, -1.8488939, 4.1740856, 0.26019523, -8.814447, -3.937495, 0.22731477, -0.7874651, 17.22002, -7.89242, -0.5795766, 3.3960745, 1.0440702, 0.5483718, 1.2849183, -0.63732344, -40.38428, -4.25527, 3.034935, 0.25527972, -0.81940174, -7.0720696, 1.7420169, 14.904871, -1.5399592, 0.20110837, 0.1902977, 2.5790472, -28.560707, 0.09560776, -0.973604, 0.6214314, -5.1268454, -0.9104073, 33.082394, 0.23800176, -9.696023, 12.288443, -16.52249, -7.6811, -21.928356, 25.690449, -0.6803232, -1.4738222, -1.831514, 0.00013296002, -3.1330614, 3.6067219, -3.0617614, -6.334016, -24.856865, -6.0669985, 2.8829474, 0.76423097, -0.21836776, -2.3173273, -2.092735, -0.19577695, 4.2984896, 0.029742926, 1.0902604, -0.28707412, -0.1671038, -0.4607489, -15.966867, -1.7149612, -1.3445716, 1.400264, 4.906401, -6.314724, -0.92188597, -0.14341217, -6.819194, 1.2750683, 21.634096, 0.5503013, 5.2122655, -0.096101895, -0.69029164, 2.6239898, -26.33101, -3.7901835, 10.026649, 1.0661886, 0.8891293, 34.24628, -0.9036363, -4.4846773, -30.846636, -5.8609247, -0.018534392, 4.657759e-06, 16.96108, 10.725708, -0.3170653, -3.2331817, 0.73887914, 0.69840825, 0.9043666, 1.0727708, 1.6571997, -0.70257163, 2.4863558, 0.07501343, -35.059708, 0.72496796, -3.0723267, -3.2004805, -0.9447444, 0.56954986, 2.6018164, -0.49256825, 22.71359, 0.45523545, -2.1936522, 4.008838, 0.62327665, 10.315046, 1.4006382, 1.1290226, 1.2660133, -8.46607]
I want to be able to create 100 more lists that are similar to this one but contain randomly chosen different random values within the highest and lowest values of the original list. Let's consider a smaller example to better understand the problem. Let's consider that I have the list with highest lowest value -1 and highest value 7.2.
original list : [0.5, 0.8, 1.1, 2.5, 7.2, -1]
random list 1 : [0.5, 0.2, 1.4, 4.5, 6.2, -0.5]
random list 2 : [5.3, 0.3, 0.7, 2.3, 4.2, -0.1]
....
random list 100 : [0.5, 0.9, 1.1, 2,1, 6.5, -1]
The key is that not all values have to change(in some cases they can like in list 2 for example). Is there a straightforward way to accomplish this in Python?
Below code prints what you need as the output. First you have to find the max and min numbers in the original list and then you have to use random library and random.uniform() function to get what you need.
import random
original_list = [0.5, 0.8, 1.1, 2.5, 7.2, -1]
max_number = max(original_list)
min_number = min(original_list)
'''because you need 100 more lists'''
for i in range(100):
random_list = []
for j in range(len(original_list)):
random_list.append(round(random.uniform(min_number,max_number),1))
print('random list '+str(i+1)+' ', end='')
print(random_list)
smallest = min(original_list)
largest = max(original_list)
newlist1 = [random.uniform(smallest, largest) for _ in range(len(original_list))]
newlist2 = [random.uniform(smallest, largest) for _ in range(len(original_list))]
# and so on
Using list comprehension and numpy.random.uniform:
import numpy as np
orig = [-1.7683218, 0.22206295, -0.28429198, 5.925369, -3.952484, -3.0728238, 0.09690776, -0.31914753, 3.9695702, 26.934353, 1.4882066, 1.8194668, -0.5614318, 1.2354431, -0.09714768, -0.15579335, -0.059994906, 1.0105655, -23.25607, 31.982368, -0.09390785, 0.17786688, 0.36164832, -4.673975, 13.495866, -3.57134, 0.5583399, -1.801314, 2.4207468, 2.0513844, -3.429592, -9.599998, 23.412394, -3.963623, 6.930485, 2.5186272, 0.6805691, -1.1615586, -0.915736, -2.6307302, -14.409785, 0.6327307, 10.512744, -0.09292421, -0.61977243, 0.35928893, -1.3844814, 8.098062, -0.8270248, 0.47219157, 0.089366496, 0.9056338, 1.5297629, 3.3246832, -0.9748858, 36.62332, -1.0525678, -0.87139374, 6.7600174, 36.210625, -0.25728267, 14.568578, 0.87466383, -4.2237897, -5.4309, 19.762472, 0.8426512, -0.7807278, 0.03435099, 12.787761, -4.9308186, -1.4322343, 0.49790275, -12.979129, 0.18121482, -0.81953144, -1.5393608, 17.757078, 3.5726204, -11.319154, -0.002896044, -1.8806648, 0.30027565, -2.6210017, 16.230186, -2.2566936, 37.37506, -2.7738526, -0.91440165, -3.652771, 1.8378688, -0.25519317, 0.5222581, 0.2189773, 23.825306, 0.3779062, 2.6709516, 0.84001434, -0.41394734, -0.600579, -3.1629875, 0.2880843, -3.9132822, 5.674796, -0.5569526, 0.30253112, -4.4269695, 4.5206604, -0.8477638, 0.0032483074, -2.2814171, 0.5524869, -1.4271426, -0.24263692, 1.0095457, -3.187037, -1.6656531, 1.4805393, 0.064992905, -4.8124804, -0.07194552, -0.28692132, -0.19502515, 0.010771384, -32.744797, 1.2642047, 6.3942785, -1.2971659, 29.70087, 0.19707158, -2.734262, 2.8497686, -1.710305, -1.3836008, 22.758884, -1.8488939, 4.1740856, 0.26019523, -8.814447, -3.937495, 0.22731477, -0.7874651, 17.22002, -7.89242, -0.5795766, 3.3960745, 1.0440702, 0.5483718, 1.2849183, -0.63732344, -40.38428, -4.25527, 3.034935, 0.25527972, -0.81940174, -7.0720696, 1.7420169, 14.904871, -1.5399592, 0.20110837, 0.1902977, 2.5790472, -28.560707, 0.09560776, -0.973604, 0.6214314, -5.1268454, -0.9104073, 33.082394, 0.23800176, -9.696023, 12.288443, -16.52249, -7.6811, -21.928356, 25.690449, -0.6803232, -1.4738222, -1.831514, 0.00013296002, -3.1330614, 3.6067219, -3.0617614, -6.334016, -24.856865, -6.0669985, 2.8829474, 0.76423097, -0.21836776, -2.3173273, -2.092735, -0.19577695, 4.2984896, 0.029742926, 1.0902604, -0.28707412, -0.1671038, -0.4607489, -15.966867, -1.7149612, -1.3445716, 1.400264, 4.906401, -6.314724, -0.92188597, -0.14341217, -6.819194, 1.2750683, 21.634096, 0.5503013, 5.2122655, -0.096101895, -0.69029164, 2.6239898, -26.33101, -3.7901835, 10.026649, 1.0661886, 0.8891293, 34.24628, -0.9036363, -4.4846773, -30.846636, -5.8609247, -0.018534392, 4.657759e-06, 16.96108, 10.725708, -0.3170653, -3.2331817, 0.73887914, 0.69840825, 0.9043666, 1.0727708, 1.6571997, -0.70257163, 2.4863558, 0.07501343, -35.059708, 0.72496796, -3.0723267, -3.2004805, -0.9447444, 0.56954986, 2.6018164, -0.49256825, 22.71359, 0.45523545, -2.1936522, 4.008838, 0.62327665, 10.315046, 1.4006382, 1.1290226, 1.2660133, -8.46607]
a = min(orig)
b = max(orig)
n = len(orig)
res = [[np.random.uniform(a,b,n)] for i in range(100)]
and you get res which is a list of 100 lists (with size len(orig)) of uniformly distributed numbers over [min(orig), max(orig)).
Let's say we want to find the 2 items which have their value the closest to 10:
A = {'abc': 12.3, 'def': 17.3, 'dsfsf': 18, 'ppp': 3.2, "jlkljkjlk": 9.23}
It works with:
def nearest(D, centre, k=10):
return sorted([[d, D[d], abs(D[d] - centre)] for d in D], key=lambda e: e[2])[:k]
print(nearest(A, centre=10, k=2))
[['jlkljkjlk', 9.23, 0.7699999999999996], ['abc', 12.3, 2.3000000000000007]]
But is there a Python built-in way to do this and/or a more optimized version when the dict has a much larger size (hundreds of thousands of items)?
If you do not mind using Pandas:
import pandas as pd
closest = (pd.Series(A) - 10).abs().sort_values()[:2]
#jlkljkjlk 0.77
#abc 2.30
closest.to_dict()
#{'jlkljkjlk': 0.7699999999999996, 'abc': 2.3000000000000007}
You could use heapq.nsmallest():
from heapq import nsmallest
A = {'abc': 12.3, 'def': 17.3, 'dsfsf': 18, 'ppp': 3.2, 'jlkljkjlk': 9.23}
def nearest(D, centre, k=10):
return [[x, D[x], abs(D[x] - centre)] for x in nsmallest(k, D, key=lambda x: abs(D[x] - centre))]
print(nearest(A, centre=10, k=2))
# [['jlkljkjlk', 9.23, 0.7699999999999996], ['abc', 12.3, 2.3000000000000007]]
As far as time complexity, this runs in O(n log(k)) time instead of O(n log(n)) of the solution based on sorting the dictionary.
Given you need to perform a lookup quite often, we can make this an O(log n) algorithm, by first storing the data in a sorted list:
from operator import itemgetter
ks = sorted(A.items(), key=itemgetter(1))
vs = list(map(itemgetter(1), ks))
Then for each item we can use the bisect.bisect_left point to determine the insertion point. We can then check the two surrounding values, to check the smallest, and return the corresponding key. It is also possible that
from bisect import bisect_left
from operator import itemgetter
def closests(v):
idx = bisect_left(vs, v)
i, j = max(0, idx-1), min(idx+2, len(ks))
part = ks[i:j]
return sorted([[*pi, abs(pi[-1]-v)] for pi in part], key=itemgetter(-1))[:2]
The above might not look as an improvement, but here we will always evaluate at most three elements in the sorted(..), and bisect_left will evaluate a logarithmic number of elements.
For example:
>>> closests(1)
[['ppp', 3.2, 2.2], ['jlkljkjlk', 9.23, 8.23]]
>>> closests(3.2)
[['ppp', 3.2, 0.0], ['jlkljkjlk', 9.23, 6.03]]
>>> closests(5)
[['ppp', 3.2, 1.7999999999999998], ['jlkljkjlk', 9.23, 4.23]]
>>> closests(9.22)
[['jlkljkjlk', 9.23, 0.009999999999999787], ['abc', 12.3, 3.08]]
>>> closests(9.24)
[['jlkljkjlk', 9.23, 0.009999999999999787], ['abc', 12.3, 3.0600000000000005]]
The "loading" phase thus takes O(n log n) (with n the number of elements). Then if we generalize the above method to fetch k elements (by increasing the slice), it would take O(log n + k log k) to perform a lookup.
I have a collection of key value pairs like this:
{
'key1': [value1_1, value2_1, value3_1, ...],
'key2': [value1_2, value2_2, value3_2, ...],
...
}
and also a list which is in the same order as the values list, which contains the weight each variable should have applied. So it looks like [weight_1, weight_2, weight_3, ...].
My goal is to end up with an ordered list of keys in accordance to which has the highest overall score of values. Note that the values aren't all standardized / normalized, so value1_x could range from 1 - 10 but value 2_x could range from 1 - 100000. This has been the tricky part for me as I have to normalize the data somehow.
I'm trying to make this algorithm run to scale for many different values, so it would take the same amount of time for 1 or for 100 (or at least logarithmically more time). Is that possible? Is there any really efficient way I can go about this?
You can't get linear-time, but you can do it faster; this looks like a matrix-multiply to me, so I suggest you use numpy:
import numpy as np
keys = ['key1', 'key2', 'key3']
values = np.matrix([
[1.1, 1.2, 1.3, 1.4],
[2.1, 2.2, 2.3, 2.4],
[3.1, 3.2, 3.3, 3.4]
])
weights = np.matrix([[10., 20., 30., 40.]]).transpose()
res = (values * weights).transpose().tolist()[0]
items = zip(res, keys)
items.sort(reverse=True)
which gives
[(330.0, 'key3'), (230.0, 'key2'), (130.0, 'key1')]
Edit: with thanks to #Ondro for np.dot and to #unutbu for np.argsort, here is an improved version entirely in numpy:
import numpy as np
# set up values
keys = np.array(['key1', 'key2', 'key3'])
values = np.array([
[1.1, 1.2, 1.3, 1.4], # values1_x
[2.1, 2.2, 2.3, 2.4], # values2_x
[3.1, 3.2, 3.3, 3.4] # values3_x
])
weights = np.array([10., 20., 30., 40.])
# crunch the numbers
res = np.dot(values, -weights) # negative of weights!
order = res.argsort(axis=0) # sorting on negative value gives
# same order as reverse-sort; there does
# not seem to be any way to reverse-sort
# directly
sortedkeys = keys[order].tolist()
which results in ['key3', 'key2', 'key1'].
Here's a normalization function, that will linearly transform your values into [0,1]
def normalize(val, ilow, ihigh, olow, ohigh):
return ((val-ilow) * (ohigh-olow) / (ihigh - ilow)) + olow
Now, use normalize to compute a new dictionary with normalized values. Then, sort by the weighted sum:
def sort(d, weights, ranges):
# ranges is a list of tuples containing the lower and upper bounds of the corresponding value
newD = {k:[normalize(v,ilow, ihigh, 0, 1) for v,(ilow, ihigh) in zip(vals, ranges)] for k,val in d.iteritems()} # d.items() in python3
return sorted(newD, key=lambda k: sum(v*w for v,w in zip(newD[k], weights)))