Binning a list into groups in Python - python

I have a list:
l = [2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 12.0,96.0, 192.0, 480.0, 360.0, 504.0, 300.0]
I want to group the elements of the list into bins of width 10 (i.e. 0-10, 10-20, 20-30, 30-40, etc.).
For example, the output I'm looking for is:
[ [2,4,5,6,7,8,10],[12],[96],[192],[300],[360],[480],[504] ]
I tried using:
list(zip(*[iter(l)] * 10))
But I'm getting the wrong answer.

Use itertools.groupby to group the values after integer-dividing (//) each one by 10. Note that groupby only merges adjacent items with the same key, so the groups come out in input order:
from itertools import groupby
l = [2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 12.0,96.0, 192.0, 480.0, 360.0, 504.0, 300.0]
groups = []
for _, g in groupby(l, lambda x: (x-1)//10):
    groups.append(list(g))  # Store group iterator as a list
print(groups)
Output:
[[2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0], [12.0], [96.0], [192.0], [480.0], [360.0], [504.0], [300.0]]
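Because groupby only merges neighbouring items, sorting the input first should give the ascending order the question asks for. A minimal sketch, where `values`, `bin_key` and `width` are names made up for this illustration:

```python
from itertools import groupby

values = [2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 12.0, 96.0, 192.0, 480.0, 360.0, 504.0, 300.0]

def bin_key(x, width=10):
    # (x - 1) // width keeps 10.0 in the first bin, as in the question's expected output
    return (x - 1) // width

# Sort first so that equal keys are adjacent, then group
sorted_groups = [list(g) for _, g in groupby(sorted(values), key=bin_key)]
print(sorted_groups)
```

The sort costs O(n log n), which is usually negligible for lists of this size.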

A defaultdict might not be bad for this. It's not a single pass, but you can sort the keys to keep everything in order. Integer division by 10 will bin everything for you:
from collections import defaultdict
groups = defaultdict(list)
for i in l:
    groups[int((i-1)//10)].append(i)
groups_list = [groups[k] for k in sorted(groups)]  # order the bins by key
print(groups_list)
Output:
[[2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0], [12.0], [96.0], [192.0], [300.0], [360.0], [480.0], [504.0]]
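The dictionary approach can be wrapped in a small reusable function. This is only a sketch: `bin_values` and its `width` parameter are hypothetical names, and the `(v - 1) // width` key is carried over from the answers above.

```python
from collections import defaultdict

def bin_values(values, width=10):
    """Group values into bins of the given width, ordered by bin index."""
    bins = defaultdict(list)
    for v in values:
        bins[int((v - 1) // width)].append(v)
    # Iterate over the sorted bin indices so the bins come out in ascending order
    return [sorted(bins[k]) for k in sorted(bins)]

binned = bin_values([2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 12.0,
                     96.0, 192.0, 480.0, 360.0, 504.0, 300.0])
print(binned)
```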

Even though an answer has been accepted, here is another way, which groups the values by their number of digits:
l = [2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 12.0,96.0, 192.0, 480.0, 360.0, 504.0, 300.0]
l1 = [int(k) for k in l]
l2 = list(list([k for k in l1 if len(str(k))==j]) for j in range(1,len(str(max(l1))) +1))
Output:
l2 = [[2, 4, 5, 6, 7, 8], [10, 12, 96], [192, 480, 360, 504, 300]]

The list can also be split into sublists with a plain dictionary: the key is (value-1)//10, and a value whose key already exists is appended to that key's list:
gd={}
for i in l:
    k=int((i-1)//10)
    if k in gd:
        gd[k].append(i)
    else:
        gd[k]=[i]
print(gd.values())

You can loop over your list l and build new lists using extend and an if condition:
smaller_list = []
larger_list = []
desired_result_list = []
for element in l:
    if element <= 10:
        smaller_list.extend([element])
    else:
        larger_list.append([element])
desired_result_list.extend(larger_list + [smaller_list])

Related

How to group a dictionary by the first character of their key-values and sort them in ascending order?

I'd like to group the dictionary by the first character of their key-value, find the minimum and maximum value and sort them in ascending order of the maximum value found.
dict = {'1,1': [1.0, 2.0], '3,1': [5.0, 8.0], '2,2': [3.0, 9.0], '2,1': [3.0, 11.0]}
The dictionary after grouping, finding the max and min value, and sort in ascending order of their maximum values should be:
dict = {'1': [1.0, 2.0], '3': [5.0, 8.0], '2': [3.0, 11.0]}
First you can keep concatenating the lists grouped by k[0], and then take minimum and maximum of the lists:
dct = {'1,1': [1.0, 2.0], '3,1': [5.0, 8.0], '2,2': [3.0, 9.0], '2,1': [3.0, 11.0]}
output = {}
for k, v in dct.items():
    output[k[0]] = output.get(k[0], []) + v
output = {k: [min(v), max(v)] for k, v in output.items()}
print(output) # {'1': [1.0, 2.0], '3': [5.0, 8.0], '2': [3.0, 11.0]}
Alternatively, if you are willing to use defaultdict:
from collections import defaultdict # this at the beginning of the script
output = defaultdict(list)
for k, v in dct.items():
    output[k[0]] += v
output = {k: [min(v), max(v)] for k, v in output.items()}
This works, but maybe someone has a more elegant answer:
dictionnary = {'1,1': [1.0, 2.0], '3,1': [5.0, 8.0], '2,2': [3.0, 9.0], '2,1': [3.0, 11.0]}
a = [i[0] for i in dictionnary.keys()]
b = dict.fromkeys(a)
for i in b:
    b[i] = []
    for j in dictionnary:
        if j[0] == i:
            if b[i]:
                if dictionnary[j][0]<b[i][0]:
                    b[i][0] = dictionnary[j][0]
                if dictionnary[j][1]>b[i][1]:
                    b[i][1] = dictionnary[j][1]
            else:
                b[i] = list(dictionnary[j])  # copy, so the input lists are not modified
b
Output:
{'1': [1.0, 2.0], '3': [5.0, 8.0], '2': [3.0, 11.0]}
Also, you shouldn't overwrite the built-in Python name dict.
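The answers above return the groups in insertion order, which happens to match the requested order for this input. To sort explicitly by each group's maximum, a sketch along these lines may help; the names `data`, `merged`, `ranges` and `by_max` are illustrative, and splitting on the comma (rather than taking the first character) is an assumption about the key format:

```python
from collections import defaultdict

data = {'1,1': [1.0, 2.0], '3,1': [5.0, 8.0], '2,2': [3.0, 9.0], '2,1': [3.0, 11.0]}

# Collect every value under the part of the key before the comma
merged = defaultdict(list)
for key, values in data.items():
    merged[key.split(',')[0]].extend(values)

# Reduce each group to [min, max]
ranges = {k: [min(v), max(v)] for k, v in merged.items()}

# dicts preserve insertion order (Python 3.7+), so building the dict
# from items sorted by the maximum fixes the ordering
by_max = dict(sorted(ranges.items(), key=lambda item: item[1][1]))
print(by_max)
```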

How to convert a list containing an even number of floats into a string divided by lists whose size is half of that even number? [duplicate]

This question already has answers here:
Split list into smaller lists (split in half)
Say you have this list of floats, assuming an even number of entries:
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
How could one turn it into this string:
[1.0, 2.0, 3.0][4.0, 5.0, 6.0]
If we had 10 elements we would have:
[1.0, 2.0, 3.0, 4.0, 5.0][6.0, 7.0, 8.0, 9.0, 10.0]
etc.
I tried:
list_to_str = ' '.join([str(e) for e in total_list])
final_str = '[' + list_to_str + ']'
But with this, the first '[' and the last ']' are placed only at the beginning and at the end of the string... the middle ones are missing...
Try this:
total_list = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
str(total_list[:len(total_list)//2]) + str(total_list[len(total_list)//2:])
#'[1.0, 2.0, 3.0][4.0, 5.0, 6.0]'
You can try a list comprehension (note that this variant puts a space between the two halves):
' '.join(map(str, [total_list[i: i+len(total_list)//2] for i in range(0, len(total_list), len(total_list)//2)]))
'[1.0, 2.0, 3.0] [4.0, 5.0, 6.0]'
I'd try something like this.
yourLst = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
middle = len(yourLst)//2
lsts = [yourLst[:middle],yourLst[middle:]]
yourString = ''.join(str(lst) for lst in lsts)
Output:
[1.0, 2.0, 3.0][4.0, 5.0, 6.0]
And for those who crave one-liners:
yourLst = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
yourString = ''.join(str(lst) for lst in [yourLst[:len(yourLst)//2],yourLst[len(yourLst)//2:]])
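The halving approach extends naturally to any number of equal parts. A small sketch, where `chunks_to_str` and `parts` are hypothetical names and the input is assumed to be non-empty with a length divisible by `parts`:

```python
def chunks_to_str(values, parts=2):
    """Concatenate the string forms of `parts` equal-sized slices of values."""
    if not values or len(values) % parts:
        raise ValueError("need a non-empty list whose length is divisible by parts")
    size = len(values) // parts
    # Step through the list one chunk at a time and join the chunk reprs
    return ''.join(str(values[i:i + size]) for i in range(0, len(values), size))

print(chunks_to_str([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
print(chunks_to_str([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], parts=3))
```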

Calculating mean and standard deviation and ignoring 0 values

I have a list of lists with sublists all of which contain float values.
For example the one below has 2 lists with sublists each:
mylist = [[[2.67, 2.67, 0.0, 0.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [0.0, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0]], [[2.67, 2.67, 2.0, 2.0], [0.0, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [0.0, 0.0, 0.0, 0.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0]]]
I want to calculate the standard deviation and the mean of the sublists and what I applied was this:
mean = [statistics.mean(d) for d in mylist]
stdev = [statistics.stdev(d) for d in mylist]
But this also includes the 0.0 values, which I do not want: I only set them to 0 so that the entries would not be empty. Is there a way to ignore these 0s, as if they did not exist in the sublists, and not take them into consideration at all? I could not find a way to do it with my approach.
You can use numpy's nanmean and nanstd functions.
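import numpy as np
def zero_to_nan(d):
    array = np.array(d, dtype=float)  # float dtype so that NaN can be stored
    array[array == 0] = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works everywhere
    return array
mean = [np.nanmean(zero_to_nan(d)) for d in mylist]
stdev = [np.nanstd(zero_to_nan(d)) for d in mylist]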
You can do this with a list comprehension.
The following lambda function flattens the nested list into a single list and filters out all zeros:
flatten = lambda nested: [x for sublist in nested for x in sublist if x != 0]
Note that the list comprehension has two for clauses and one if clause, similar to this code snippet, which does essentially the same:
flat_list = []
for sublist in nested:
    for x in sublist:
        if x != 0:
            flat_list.append(x)
To apply this to your list you can use map. The map function will return an iterator. To get a list we need to pass the iterator to list:
flat_list = list(map(flatten, mylist))
Now you can calculate the mean and standard deviation:
mean = [statistics.mean(d) for d in flat_list]
stdev = [statistics.stdev(d) for d in flat_list]
print(mean)
print(stdev)
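Alternatively, filter the zeros out of each sublist inside the comprehension (the filter has to apply to the elements, since a sublist itself never equals 0):
mean = [statistics.mean([x for x in d if x != 0]) for d in mylist]
stdev = [statistics.stdev([x for x in d if x != 0]) for d in mylist]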
Try:
mean = [statistics.mean([k for k in d if k]) for d in mylist]
stdev = [statistics.stdev([k for k in d if k]) for d in mylist]
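One thing to watch for when filtering: statistics.mean raises StatisticsError on an empty list and statistics.stdev needs at least two values, so a group that is all zeros would fail. A sketch that guards against this, using a shortened version of the data and the illustrative names `groups`, `nonzero_values` and `results`:

```python
import statistics

# Shortened sample in the same nested shape as mylist
groups = [
    [[2.67, 2.67, 0.0, 0.0], [2.67, 2.67, 2.0, 2.0]],
    [[0.0, 2.67, 2.0, 2.0], [0.0, 0.0, 0.0, 0.0]],
]

def nonzero_values(group):
    """Flatten one group of sublists and drop the zero placeholders."""
    return [x for sublist in group for x in sublist if x != 0]

results = []
for group in groups:
    vals = nonzero_values(group)
    # Fall back to None when there is not enough data left after filtering
    mean = statistics.mean(vals) if vals else None
    stdev = statistics.stdev(vals) if len(vals) > 1 else None
    results.append((mean, stdev))

print(results)
```

Returning None for under-populated groups is just one possible convention; raising an error or skipping the group would work as well.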

How to efficiently do a grid search for parameter combinations in Python?

Problem
For a computation engineering model, I want to do a grid search for all feasible parameter combinations. Each parameter has a certain possibility range, e.g. (0 … 100) and the parameter combination must fulfil the condition a+b+c=100. An example:
ranges = {
'a': (95, 99),
'b': (1, 4),
'c': (1, 2)}
increment = 1.0
target = 100.0
So the combinations that fulfil the condition a+b+c=100 are:
[(95, 4, 1), (95, 3, 2), (96, 2, 2), (96, 3, 1), (97, 1, 2), (97, 2, 1), (98, 1, 1)]
This algorithm should run with any number of parameters, range lengths, and increments.
My solutions (so far)
The solutions I have come up with are all brute-forcing the problem. That means calculating all combinations and then discarding the ones that do not fulfil the given condition:
import itertools
import numpy as np
import pandas as pd
def solution1(ranges, increment, target):
    combinations = []
    for parameter in ranges:
        combinations.append(list(np.arange(ranges[parameter][0], ranges[parameter][1], increment)))
        # np.arange() is exclusive of the upper bound, let's fix that
        if combinations[-1][-1] != ranges[parameter][1]:
            combinations[-1].append(ranges[parameter][1])
    combinations = list(itertools.product(*combinations))
    df = pd.DataFrame(combinations, columns=ranges.keys())
    # using np.isclose() so that the algorithm works for floats
    return df[np.isclose(df.sum(axis=1), target)]
Since I ran into RAM problems with solution1(), I used itertools.product as an iterator.
def solution2(ranges, increment, target):
    combinations = []
    for parameter in ranges:
        combinations.append(list(np.arange(ranges[parameter][0], ranges[parameter][1], increment)))
        # np.arange() is exclusive of the upper bound, let's fix that
        if combinations[-1][-1] != ranges[parameter][1]:
            combinations[-1].append(ranges[parameter][1])
    result = []
    for combination in itertools.product(*combinations):
        # using np.isclose() so that the algorithm works for floats
        if np.isclose(sum(combination), target):
            result.append(combination)
    df = pd.DataFrame(result, columns=ranges.keys())
    return df
However, this quickly takes a few days to compute. Hence, both solutions are not viable for large number of parameters and ranges. For instance, one set that I am trying to solve is (already unpacked combinations variable):
[[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0], [22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], [0.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0], [0.0]]
This results in memory use of >40 GB for solution1() and calculation time >400 hours for solution2().
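To see why brute force struggles here, it helps to count how many tuples itertools.product would generate for that parameter set. A quick back-of-the-envelope sketch; the `lengths` list was counted by hand from the value lists shown above, and `math.prod` needs Python 3.8 or later:

```python
import math

# Number of candidate values per parameter, counted from the eleven lists above
lengths = [24, 67, 24, 13, 10, 3, 11, 9, 1, 33, 1]

# The Cartesian product has one tuple per combination of choices
total = math.prod(lengths)
print(f"{total:,} combinations")
```

That is on the order of 4.9 × 10^10 tuples, each of which has to be summed and compared, so pruning the search space matters far more than micro-optimising the loop.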
Question
Do you see a solution that is either faster or more intelligent, i.e. not trying to brute-force the problem?
P.S.: I am not 100% sure if this question would be a better fit on one of the other Stackexchange sites. Please suggest in the comments if you think it should be moved and I will delete it here.
Here is a recursive solution:
a = [95, 100]
b = [1, 4]
c = [1, 2]
Params = (a, b, c)
def GetValidParamValues(Params, constriantSum, prevVals):
    validParamValues = []
    if (len(Params) == 1):
        if (constriantSum >= Params[0][0] and constriantSum <= Params[0][1]):
            validParamValues.append(constriantSum)
        for v in validParamValues:
            print(prevVals + [v])  # v is a single number here, so wrap it in a list
        return
    # Smallest and largest sums the remaining parameters can contribute
    sumOfLowParams = sum([Params[x][0] for x in range(1, len(Params))])
    sumOfHighParams = sum([Params[x][1] for x in range(1, len(Params))])
    lowEnd = max(Params[0][0], constriantSum - sumOfHighParams)
    highEnd = min(Params[0][1], constriantSum - sumOfLowParams) + 1
    if (len(Params) == 2):
        for av in range(lowEnd, highEnd):
            bv = constriantSum - av
            if (bv <= Params[1][1]):
                validParamValues.append([av, bv])
        for v in validParamValues:
            print(prevVals + v)
        return
    for av in range(lowEnd, highEnd):
        nexPrevVals = prevVals + [av]
        subSeParams = Params[1:]
        GetValidParamValues(subSeParams, constriantSum - av, nexPrevVals)
GetValidParamValues(Params, 100, [])  # start with no previously chosen values
The idea is that if there were 2 parameters, a and b, we could list all the valid pairs by passing through the values of a, taking (ai, S - ai), and just checking whether S - ai is a valid value for b.
This can be improved further: we can calculate ahead of time which values of ai make S - ai a valid value for b, so we never check values that don't work.
When the number of params is more than 2, we can again look at every valid value of ai, and we know the sum of the other numbers must be S - ai. So the only thing we need is every possible way for the other numbers to add up to S - ai, which is the same problem with one fewer parameter. By using recursion we can reduce the problem all the way down to 2 parameters and solve it.
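The same pruning idea can be written as a generator that yields tuples instead of printing them. This is a sketch rather than a drop-in replacement: `bounded_sums` is a hypothetical name, the bounds are inclusive (low, high) pairs, and integer steps of 1 are assumed (other increments would need the values rescaled first).

```python
def bounded_sums(bounds, target):
    """Yield every tuple of integers, one per (low, high) pair, that sums to target."""
    (low, high), rest = bounds[0], bounds[1:]
    if not rest:
        # Last parameter: it must take exactly the remaining amount
        if low <= target <= high:
            yield (target,)
        return
    rest_low = sum(lo for lo, _ in rest)
    rest_high = sum(hi for _, hi in rest)
    # Only try values that leave a remainder the other parameters can reach
    start = max(low, target - rest_high)
    stop = min(high, target - rest_low)
    for value in range(start, stop + 1):
        for tail in bounded_sums(rest, target - value):
            yield (value,) + tail

combos = list(bounded_sums([(95, 99), (1, 4), (1, 2)], 100))
print(combos)
```

Because it is a generator, results can be consumed one at a time without holding every combination in memory.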

Unique value in list of lists in Python

I have the following list of lists in Python:
[
u'aaaaa',
[1, 6, u'testing', 20.0, 18.0, 2.0, 'In time'],
u'zzzzzz',
[1, 6, u'testing', 20.0, 18.0, 2.0, 'In time'],
[1, 1, u'xyz ', 30.0, 25.0, 5.0, 'On Going'],
[2, 1, u'abcd', 10.0, 8.0, 2.0, 'In time'],
u'bbbbb',
[1, 6, u'testing', 20.0, 18.0, 2.0, 'In time'],
[1, 1, u'xyz ', 30.0, 25.0, 5.0, 'On Going'],
[2, 1, u'abcd', 10.0, 8.0, 2.0, 'In time'],
[1, 7, u'develop', 20.0, 15.0, 5.0, 'On Going']
]
I want the following output in Python:
[
[u'aaaaa', [1, 6, u'testing', 20.0, 18.0, 2.0, 'In time']],
[u'zzzzzz', [1, 1, u'xyz ', 30.0, 25.0, 5.0, 'On Going'], [2, 1, u'abcd', 10.0, 8.0, 2.0, 'In time']],
[u'bbbbb', [1, 7, u'develop', 20.0, 15.0, 5.0, 'On Going']]
]
Please suggest how this can be done while preserving the order.
The following should give you the desired output. It uses a dictionary to spot duplicate entries.
entries = [
u'aaaaa', [1, 6, u'testing', 20.0, 18.0, 2.0, 'In time'],
u'zzzzzz', [1, 6, u'testing', 20.0, 18.0, 2.0, 'In time'],
[1, 1, u'xyz ', 30.0, 25.0, 5.0, 'On Going'],
[2, 1, u'abcd', 10.0, 8.0, 2.0, 'In time'],
u'bbbbb',
[1, 6, u'testing', 20.0, 18.0, 2.0, 'In time'],
[1, 1, u'xyz ', 30.0, 25.0, 5.0, 'On Going'],
[2, 1, u'abcd', 10.0, 8.0, 2.0, 'In time'],
[1, 7, u'develop', 20.0, 15.0, 5.0, 'On Going']]
d = {}
output = []
entry = []
for item in entries:
    if type(item) == type([]):
        t = tuple(item)
        if t not in d:
            d[t] = 0
            entry.append(item)
    else:
        if len(entry):
            output.append(entry)
        entry = [item]
output.append(entry)
print output
This gives the following output:
[[u'aaaaa', [1, 6, u'testing', 20.0, 18.0, 2.0, 'In time']], [u'zzzzzz', [1, 1, u'xyz ', 30.0, 25.0, 5.0, 'On Going'], [2, 1, u'abcd', 10.0, 8.0, 2.0, 'In time']], [u'bbbbb', [1, 7, u'develop', 20.0, 15.0, 5.0, 'On Going']]]
Tested using Python 2.7
Update: If a list of lists format is needed, simply add [] to item in the above script as follows:
entry.append([item])
This would give the following output:
[[u'aaaaa', [[1, 6, u'testing', 20.0, 18.0, 2.0, 'In time']]], [u'zzzzzz', [[1, 1, u'xyz ', 30.0, 25.0, 5.0, 'On Going']], [[2, 1, u'abcd', 10.0, 8.0, 2.0, 'In time']]], [u'bbbbb', [[1, 7, u'develop', 20.0, 15.0, 5.0, 'On Going']]]]
If you want all unique values from a list:
mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
myset = set(mylist)
print(myset)  # This is now a set containing all unique values.
# This will not maintain the order of the items
1) I really think you should check out Python dictionaries. They would make much more sense, looking at the kind of output you want.
2) In this case, if I understand you correctly, you want to convert a list with elements that are either strings or lists into a list of lists. This list of lists should have a starting element as a string, and the remaining elements as the following list items within the main list, till you hit the next string. (At least that's what it looks like from your example).
output_list = []
for elem in main_list:  # main_list is the list from the question
    if isinstance(elem,basestring):  # basestring is Python 2 only; use str in Python 3
        output_list.append([elem])
    else:
        output_list[-1].append(elem)
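For Python 3, the two ideas above (start a group at each string, skip sublists already seen) can be combined. A sketch with the hypothetical name `group_entries`, assuming the input starts with a string; any list that appears before the first string is dropped:

```python
def group_entries(items):
    """Start a new group at each string; attach each list the first time it is seen."""
    seen = set()
    groups = []
    for item in items:
        if isinstance(item, str):
            groups.append([item])
        else:
            key = tuple(item)  # lists are unhashable, tuples can go in a set
            if key not in seen and groups:
                seen.add(key)
                groups[-1].append(item)
    return groups

sample = ['a', [1, 2], 'z', [1, 2], [3, 4], 'b', [3, 4], [5, 6]]
print(group_entries(sample))
```

Using a set for `seen` keeps the duplicate check constant-time while the output order follows the input.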
