Categorizing data based on value of interger using Python - python

I have these two variables:
instance = [0.45,6.54,19.0,3.34,2.34]
distance_tolerance = [5.00,10.00,20.00]
I like to sort each data in instance and categorize it based on their value that is fewer than each data in distance_tolerance and save it in a variable.
For example 0.45 < 5.00 then make a variable to save 0.45 and iterate for every data.
Expected result:
data5 = [0.45,3.34,2.34]
data10 = [0.45,6.54,3.34,2,34]
data20 = [0.45,6.54,19.0,3.34,2.34]
I need to do looping for this task since the real data is large.
What is the best way to perform this task? thanks

You can simply iterative through and append the values of instance which are lower than the value of distance_tolerance that your on.
So for some element in distance_tolerance you could have a function called itemsLowerThan(value) which will return an array of the elements in instance that are lower than the value you pass.
For example:
instance = [0.45,6.54,19.0,3.34,2.34]
distance_tolerance = [5.00,10.00,20.00]
def itemsLowerThan(value):
arr = []
for item in instance:
if (item < value):
arr.append(item)
return arr
for tolerance in distance_tolerance:
print(itemsLowerThan(tolerance))
Would give the output:
[0.45, 3.34, 2.34]
[0.45, 6.54, 3.34, 2.34]
[0.45, 6.54, 19.0, 3.34, 2.34]

You can use a nested list comprehension.
[[i for i in instance if i < tolerance] for tolerance in distance_tolerance]
Which is equivalent to:
[
[0.45, 3.34, 2.34],
[0.45, 6.54, 3.34, 2,34],
[0.45, 6.54, 19.0, 3.34, 2.34],
]

Related

Sum of duplicate values in 2d array

So, I'm sure similar questions have been asked before but I couldn't find quite what I need.
I have a program that outputs a 2D array like the one below:
arr = [[0.2, 3], [0.3, "End"], ...]
There may be more or less elements, but each is a 2-element array, where the first value is a float and the second can be a float or a string.
Both of those values may repeat. In each of those arrays, the second element takes on only a few possible values.
What I want to do is sum the first elements' value within the arrays that have the same value of the second element and output a similar array that does not have those duplicated values.
For example:
input = [[0.4, 1.5], [0.1, 1.5], [0.8, "End"], [0.05, "End"], [0.2, 3.5], [0.2, 3.5]]
output = [[0.5, 1.5], [0.4, 3.5], [0.85, "End"]]
I'd appreciate if the output array was sorted by this second element (floats ascending, strings at the end), although it's not necessary.
EDIT: Thanks for both answers; I've decided to use the one by Chris, because the code was more comprehensible to me, although groupby seems like a function designed to solved this very problem, so I'll try to read up on that, too.
UPDATE: The values of floats were always positive, by nature of the task at hand, so I used negative values to stop the usage of any strings - now I have a few if statements that check for those "encoded" negative values and replace them with strings again just before they're printed out, so sorting is now easier.
You could use a dictionary to accumulate the sum of the first value in the list keyed by the second item.
To get the 'string' items at the end of the list, the sort key could be set to positive infinity, float('inf'), in the sort key .
input_ = [[0.4, 1.5], [0.1, 1.5], [0.8, "End"], [0.05, "End"], [0.2, 3.5], [0.2, 3.5]]
d = dict()
for pair in input_:
d[pair[1]] = d.get(pair[1], 0) + pair[0]
L = []
for k, v in d.items():
L.append([v,k])
L.sort(key=lambda x: x[1] if type(x[1]) == float else float('inf'))
print(L)
This prints:
[[0.5, 1.5], [0.4, 3.5], [0.8500000000000001, 'End']]
You can try to play with itertools.groupby:
import itertools
out = [[key, sum([elt[0]for elt in val])] for key, val in itertools.groupby(a, key=lambda elt: elt[1])]
>>> [[0.5, 1.5], [0.8500000000000001, 'End'], [0.4, 3.5]]
Explanation:
Groupby the 2D list according to the 2nd element of each sublist using itertools.groupby and the key parameters. We define the lambda key=lambda elt: elt[1] to groupby on the 2nd element:
for key, val in itertools.groupby(a, key=lambda elt: elt[1]):
print(key, val)
# 1.5 <itertools._grouper object at 0x0000026AD1F6E160>
# End <itertools._grouper object at 0x0000026AD2104EF0>
# 3.5 <itertools._grouper object at 0x0000026AD1F6E160>
For each value of the group, compute the sum using the buildin function sum:
for key, val in itertools.groupby(a, key=lambda elt: elt[1]):
print(sum([elt[0]for elt in val]))
# 0.5
# 0.8500000000000001
# 0.4
Compute the desired output:
out = []
for key, val in itertools.groupby(a, key=lambda elt: elt[1]):
out.append([sum([elt[0]for elt in val]), key])
print(out)
# [[0.5, 1.5], [0.8500000000000001, 'End'], [0.4, 3.5]]
Then you said about sorting on the 2nd value but there are strings and numbers, it's quite a problem for the computer. It can't make a choice between a number and a string. Objects must be comparable.

Replace random elements in list with random values in python

I have the following set of values stored in a list.
[-1.7683218, 0.22206295, -0.28429198, 5.925369, -3.952484, -3.0728238, 0.09690776, -0.31914753, 3.9695702, 26.934353, 1.4882066, 1.8194668, -0.5614318, 1.2354431, -0.09714768, -0.15579335, -0.059994906, 1.0105655, -23.25607, 31.982368, -0.09390785, 0.17786688, 0.36164832, -4.673975, 13.495866, -3.57134, 0.5583399, -1.801314, 2.4207468, 2.0513844, -3.429592, -9.599998, 23.412394, -3.963623, 6.930485, 2.5186272, 0.6805691, -1.1615586, -0.915736, -2.6307302, -14.409785, 0.6327307, 10.512744, -0.09292421, -0.61977243, 0.35928893, -1.3844814, 8.098062, -0.8270248, 0.47219157, 0.089366496, 0.9056338, 1.5297629, 3.3246832, -0.9748858, 36.62332, -1.0525678, -0.87139374, 6.7600174, 36.210625, -0.25728267, 14.568578, 0.87466383, -4.2237897, -5.4309, 19.762472, 0.8426512, -0.7807278, 0.03435099, 12.787761, -4.9308186, -1.4322343, 0.49790275, -12.979129, 0.18121482, -0.81953144, -1.5393608, 17.757078, 3.5726204, -11.319154, -0.002896044, -1.8806648, 0.30027565, -2.6210017, 16.230186, -2.2566936, 37.37506, -2.7738526, -0.91440165, -3.652771, 1.8378688, -0.25519317, 0.5222581, 0.2189773, 23.825306, 0.3779062, 2.6709516, 0.84001434, -0.41394734, -0.600579, -3.1629875, 0.2880843, -3.9132822, 5.674796, -0.5569526, 0.30253112, -4.4269695, 4.5206604, -0.8477638, 0.0032483074, -2.2814171, 0.5524869, -1.4271426, -0.24263692, 1.0095457, -3.187037, -1.6656531, 1.4805393, 0.064992905, -4.8124804, -0.07194552, -0.28692132, -0.19502515, 0.010771384, -32.744797, 1.2642047, 6.3942785, -1.2971659, 29.70087, 0.19707158, -2.734262, 2.8497686, -1.710305, -1.3836008, 22.758884, -1.8488939, 4.1740856, 0.26019523, -8.814447, -3.937495, 0.22731477, -0.7874651, 17.22002, -7.89242, -0.5795766, 3.3960745, 1.0440702, 0.5483718, 1.2849183, -0.63732344, -40.38428, -4.25527, 3.034935, 0.25527972, -0.81940174, -7.0720696, 1.7420169, 14.904871, -1.5399592, 0.20110837, 0.1902977, 2.5790472, -28.560707, 0.09560776, -0.973604, 0.6214314, -5.1268454, -0.9104073, 33.082394, 0.23800176, -9.696023, 12.288443, -16.52249, -7.6811, -21.928356, 25.690449, -0.6803232, -1.4738222, -1.831514, 0.00013296002, -3.1330614, 3.6067219, -3.0617614, -6.334016, -24.856865, -6.0669985, 2.8829474, 0.76423097, -0.21836776, -2.3173273, -2.092735, -0.19577695, 4.2984896, 0.029742926, 1.0902604, -0.28707412, -0.1671038, -0.4607489, -15.966867, -1.7149612, -1.3445716, 1.400264, 4.906401, -6.314724, -0.92188597, -0.14341217, -6.819194, 1.2750683, 21.634096, 0.5503013, 5.2122655, -0.096101895, -0.69029164, 2.6239898, -26.33101, -3.7901835, 10.026649, 1.0661886, 0.8891293, 34.24628, -0.9036363, -4.4846773, -30.846636, -5.8609247, -0.018534392, 4.657759e-06, 16.96108, 10.725708, -0.3170653, -3.2331817, 0.73887914, 0.69840825, 0.9043666, 1.0727708, 1.6571997, -0.70257163, 2.4863558, 0.07501343, -35.059708, 0.72496796, -3.0723267, -3.2004805, -0.9447444, 0.56954986, 2.6018164, -0.49256825, 22.71359, 0.45523545, -2.1936522, 4.008838, 0.62327665, 10.315046, 1.4006382, 1.1290226, 1.2660133, -8.46607]
I want to be able to create 100 more lists that are similar to this one but contain randomly chosen different random values within the highest and lowest values of the original list. Let's consider a smaller example to better understand the problem. Let's consider that I have the list with highest lowest value -1 and highest value 7.2.
original list : [0.5, 0.8, 1.1, 2.5, 7.2, -1]
random list 1 : [0.5, 0.2, 1.4, 4.5, 6.2, -0.5]
random list 2 : [5.3, 0.3, 0.7, 2.3, 4.2, -0.1]
....
random list 100 : [0.5, 0.9, 1.1, 2,1, 6.5, -1]
The key is that not all values have to change(in some cases they can like in list 2 for example). Is there a straightforward way to accomplish this in Python?
Below code prints what you need as the output. First you have to find the max and min numbers in the original list and then you have to use random library and random.uniform() function to get what you need.
import random
original_list = [0.5, 0.8, 1.1, 2.5, 7.2, -1]
max_number = max(original_list)
min_number = min(original_list)
'''because you need 100 more lists'''
for i in range(100):
random_list = []
for j in range(len(original_list)):
random_list.append(round(random.uniform(min_number,max_number),1))
print('random list '+str(i+1)+' ', end='')
print(random_list)
smallest = min(original_list)
largest = max(original_list)
newlist1 = [random.uniform(smallest, largest) for _ in range(len(original_list))]
newlist2 = [random.uniform(smallest, largest) for _ in range(len(original_list))]
# and so on
Using list comprehension and numpy.random.uniform:
import numpy as np
orig = [-1.7683218, 0.22206295, -0.28429198, 5.925369, -3.952484, -3.0728238, 0.09690776, -0.31914753, 3.9695702, 26.934353, 1.4882066, 1.8194668, -0.5614318, 1.2354431, -0.09714768, -0.15579335, -0.059994906, 1.0105655, -23.25607, 31.982368, -0.09390785, 0.17786688, 0.36164832, -4.673975, 13.495866, -3.57134, 0.5583399, -1.801314, 2.4207468, 2.0513844, -3.429592, -9.599998, 23.412394, -3.963623, 6.930485, 2.5186272, 0.6805691, -1.1615586, -0.915736, -2.6307302, -14.409785, 0.6327307, 10.512744, -0.09292421, -0.61977243, 0.35928893, -1.3844814, 8.098062, -0.8270248, 0.47219157, 0.089366496, 0.9056338, 1.5297629, 3.3246832, -0.9748858, 36.62332, -1.0525678, -0.87139374, 6.7600174, 36.210625, -0.25728267, 14.568578, 0.87466383, -4.2237897, -5.4309, 19.762472, 0.8426512, -0.7807278, 0.03435099, 12.787761, -4.9308186, -1.4322343, 0.49790275, -12.979129, 0.18121482, -0.81953144, -1.5393608, 17.757078, 3.5726204, -11.319154, -0.002896044, -1.8806648, 0.30027565, -2.6210017, 16.230186, -2.2566936, 37.37506, -2.7738526, -0.91440165, -3.652771, 1.8378688, -0.25519317, 0.5222581, 0.2189773, 23.825306, 0.3779062, 2.6709516, 0.84001434, -0.41394734, -0.600579, -3.1629875, 0.2880843, -3.9132822, 5.674796, -0.5569526, 0.30253112, -4.4269695, 4.5206604, -0.8477638, 0.0032483074, -2.2814171, 0.5524869, -1.4271426, -0.24263692, 1.0095457, -3.187037, -1.6656531, 1.4805393, 0.064992905, -4.8124804, -0.07194552, -0.28692132, -0.19502515, 0.010771384, -32.744797, 1.2642047, 6.3942785, -1.2971659, 29.70087, 0.19707158, -2.734262, 2.8497686, -1.710305, -1.3836008, 22.758884, -1.8488939, 4.1740856, 0.26019523, -8.814447, -3.937495, 0.22731477, -0.7874651, 17.22002, -7.89242, -0.5795766, 3.3960745, 1.0440702, 0.5483718, 1.2849183, -0.63732344, -40.38428, -4.25527, 3.034935, 0.25527972, -0.81940174, -7.0720696, 1.7420169, 14.904871, -1.5399592, 0.20110837, 0.1902977, 2.5790472, -28.560707, 0.09560776, -0.973604, 0.6214314, -5.1268454, -0.9104073, 33.082394, 0.23800176, -9.696023, 12.288443, -16.52249, -7.6811, -21.928356, 25.690449, -0.6803232, -1.4738222, -1.831514, 0.00013296002, -3.1330614, 3.6067219, -3.0617614, -6.334016, -24.856865, -6.0669985, 2.8829474, 0.76423097, -0.21836776, -2.3173273, -2.092735, -0.19577695, 4.2984896, 0.029742926, 1.0902604, -0.28707412, -0.1671038, -0.4607489, -15.966867, -1.7149612, -1.3445716, 1.400264, 4.906401, -6.314724, -0.92188597, -0.14341217, -6.819194, 1.2750683, 21.634096, 0.5503013, 5.2122655, -0.096101895, -0.69029164, 2.6239898, -26.33101, -3.7901835, 10.026649, 1.0661886, 0.8891293, 34.24628, -0.9036363, -4.4846773, -30.846636, -5.8609247, -0.018534392, 4.657759e-06, 16.96108, 10.725708, -0.3170653, -3.2331817, 0.73887914, 0.69840825, 0.9043666, 1.0727708, 1.6571997, -0.70257163, 2.4863558, 0.07501343, -35.059708, 0.72496796, -3.0723267, -3.2004805, -0.9447444, 0.56954986, 2.6018164, -0.49256825, 22.71359, 0.45523545, -2.1936522, 4.008838, 0.62327665, 10.315046, 1.4006382, 1.1290226, 1.2660133, -8.46607]
a = min(orig)
b = max(orig)
n = len(orig)
res = [[np.random.uniform(a,b,n)] for i in range(100)]
and you get res which is a list of 100 lists (with size len(orig)) of uniformly distributed numbers over [min(orig), max(orig)).

Best way to convert value in nested list to string

I have a nested list named value, and I need to convert all things inside into string type and join them together.
This is currently how I do it:
value=[['2014-11-20 10:51:50', 7.36, 7.63, 0.4487, 12.37, 10.4, 39.85, 52.27, 0.41, 0.78, 6],
['2014-11-20 11:22:07', 7.41, 7.67, 0.4489, 12.44, 6.6, 40.39, 53.98, 0.41, 0.754, 6]]
for i, n in enumerate(value):
for j, m in enumerate(value[i]):
value[i][j]=str(value[i][j])
",".join(value[i])
As I am new to Python, I would like to know is there a better or faster way to do it. Or maybe there is some built in functions that could do the job?
value = [ ",".join(map(str,i)) for i in value ]
map will convert all float type to str and then join will join them
if you didn't understand about map how it working:
value = [ ",".join(str(x) for x in i) for i in value ]

Fastest way to rank items with multiple values and weightings

I have a collection of key value pairs like this:
{
'key1': [value1_1, value2_1, value3_1, ...],
'key2': [value1_2, value2_2, value3_2, ...],
...
}
and also a list which is in the same order as the values list, which contains the weight each variable should have applied. So it looks like [weight_1, weight_2, weight_3, ...].
My goal is to end up with an ordered list of keys in accordance to which has the highest overall score of values. Note that the values aren't all standardized / normalized, so value1_x could range from 1 - 10 but value 2_x could range from 1 - 100000. This has been the tricky part for me as I have to normalize the data somehow.
I'm trying to make this algorithm run to scale for many different values, so it would take the same amount of time for 1 or for 100 (or at least logarithmically more time). Is that possible? Is there any really efficient way I can go about this?
You can't get linear-time, but you can do it faster; this looks like a matrix-multiply to me, so I suggest you use numpy:
import numpy as np
keys = ['key1', 'key2', 'key3']
values = np.matrix([
[1.1, 1.2, 1.3, 1.4],
[2.1, 2.2, 2.3, 2.4],
[3.1, 3.2, 3.3, 3.4]
])
weights = np.matrix([[10., 20., 30., 40.]]).transpose()
res = (values * weights).transpose().tolist()[0]
items = zip(res, keys)
items.sort(reverse=True)
which gives
[(330.0, 'key3'), (230.0, 'key2'), (130.0, 'key1')]
Edit: with thanks to #Ondro for np.dot and to #unutbu for np.argsort, here is an improved version entirely in numpy:
import numpy as np
# set up values
keys = np.array(['key1', 'key2', 'key3'])
values = np.array([
[1.1, 1.2, 1.3, 1.4], # values1_x
[2.1, 2.2, 2.3, 2.4], # values2_x
[3.1, 3.2, 3.3, 3.4] # values3_x
])
weights = np.array([10., 20., 30., 40.])
# crunch the numbers
res = np.dot(values, -weights) # negative of weights!
order = res.argsort(axis=0) # sorting on negative value gives
# same order as reverse-sort; there does
# not seem to be any way to reverse-sort
# directly
sortedkeys = keys[order].tolist()
which results in ['key3', 'key2', 'key1'].
Here's a normalization function, that will linearly transform your values into [0,1]
def normalize(val, ilow, ihigh, olow, ohigh):
return ((val-ilow) * (ohigh-olow) / (ihigh - ilow)) + olow
Now, use normalize to compute a new dictionary with normalized values. Then, sort by the weighted sum:
def sort(d, weights, ranges):
# ranges is a list of tuples containing the lower and upper bounds of the corresponding value
newD = {k:[normalize(v,ilow, ihigh, 0, 1) for v,(ilow, ihigh) in zip(vals, ranges)] for k,val in d.iteritems()} # d.items() in python3
return sorted(newD, key=lambda k: sum(v*w for v,w in zip(newD[k], weights)))

matrix list comprehension mean

This is an offshoot of a previous question which started to snowball. If I have a matrix A and I want to use the mean/average of each row [1:] values to create another matrix B, but keep the row headings intact, how would I do this? I've included matrix A, my attempt at cobbling together a list comprehension, and the expected result.
from operator import sum,len
# matrix A with row headings and values
A = [('Apple',0.95,0.99,0.89,0.87,0.93),
('Bear',0.33,0.25.0.85,0.44,0.33),
('Crab',0.55,0.55,0.10,0.43,0.22)]
#List Comprehension
B = [(A[0],sum,A[1:]/len,A[1:]) for A in A]
Expected outcome
B = [('Apple', 0.926), ('Bear', 0.44), ('Crab', 0.37)]
Your list comprehension looks a little weird. You are using the same variable for the iterable and the item.
This approach seems to work:
def average(lst):
return sum(lst) / len(lst)
B = [(a[0], average(a[1:])) for a in A]
I've created a function average for readability. It matches your expected values, so I think that's what you want, although your use of mul suggests that I may be missing something.
Taking from #recursive and #Steven Rumbalski:
>>> def average(lst):
... return sum(lst) / len(lst)
...
>>> A = {
... 'Apple': (0.95, 0.99, 0.89, 0.87, 0.93),
... 'Bear': (0.33, 0.25, 0.85, 0.44, 0.33),
... 'Crab': (0.55, 0.55, 0.10, 0.43, 0.22),
... }
>>>
>>> B = [{key: average(values)} for key, values in A.iteritems()]
>>> B
[{'Apple': 0.92599999999999993}, {'Bear': 0.44000000000000006}, {'Crab': 0.37}]

Categories