I have two variable-length lists extracted from an Excel file. One holds the wagon numbers and the other the wagon weights, something like this:
wagon_list = [1234567, 2345678, 3456789, 4567890]
weight_list = [1.1, 2.2, 3.3, 4.4]
Sometimes wagon_list will contain a duplicate number. In that case I need to sum the wagon weights and remove the duplicate from both lists:
wagon_list = [1234567, 2345678, 2345678, 4567890]
weight_list = [1.1, 2.2, 3.3, 4.4]
should become:
wagon_list = [1234567, 2345678, 4567890]
weight_list = [1.1, 5.5, 4.4]
My first option was to pop items and sum them while iterating with a for loop. It didn't work because (as I learned after some research) you can't safely change a list you're iterating over.
So I moved to the second option, using an auxiliary list. It doesn't work when it hits the last index. Even after some tweaking of my code, I can't find a solution.
I can also see it would have further problems if the last three elements needed to be added together.
counter_3 = 0
for i in wagon_list:
    if i == wagon_list[-1]: #last entry, simply appends to the new list. This comes first because the next option returns error if running the last entry as i
        new_wagon_list.append(wagon_list[counter_3])
        new_weight_list.append(weight_list[counter_3])
        counter_3 +=2
    elif i != wagon_list[(counter_3 + 1)]: #if they are different, appends.
        new_wagon_list.append(wagon_list[counter_3])
        new_weight_list.append(weight_list[counter_3])
        counter_3 += 1
    elif i == wagon_list[(counter_3 + 1)]: #if equal to next item, appends the wagon and sums the weights
        new_wagon_list.append(wagon_list[counter_3])
        new_weight_list.append(weight_list[counter_3] + weight_list[counter_3 + 1])
This should return:
wagon_list = [1234567, 2345678, 4567890]
weight_list = [1.1, 5.5, 4.4]
But returns
wagon_list = [1234567, 2345678, 3456789, 3456789, 3456789]
weight_list = [1.1, 2.2, 7.7, 7.7, 3.3]
Here is a simple way, using defaultdict (hence the result is correct even if wagon_list is unordered). You could also use groupby but then you have to sort both lists so that duplicate wagons are consecutive.
This solution requires a single pass through the lists, and doesn't change the order of the lists. It just removes duplicate wagons and adds their weight.
from collections import defaultdict
def group_weights(wagon_list, weight_list):
    ww = defaultdict(float)
    for wagon, weight in zip(wagon_list, weight_list):
        ww[wagon] += weight
    return list(ww), list(ww.values())
Example
# set up MRE
wagon_list = [1234567, 2345678, 2345678, 4567890]
weight_list = [1.1, 2.2, 3.3, 4.4]
new_wagon_list, new_weight_list = group_weights(wagon_list, weight_list)
>>> new_wagon_list
[1234567, 2345678, 4567890]
>>> new_weight_list
[1.1, 5.5, 4.4]
Addendum
If you'd like to avoid defaultdict altogether, you can also simply do this (same result as above):
ww = {}
for k, v in zip(wagon_list, weight_list):
    ww[k] = ww.get(k, 0) + v
new_wagon_list, new_weight_list = map(list, zip(*ww.items()))
Explanation
A quick review of some of the tools and syntax used above:
zip(*iterables) "Make an iterator that aggregates elements from each of the iterables." So e.g.:
for x, y in zip(wagon_list, weight_list):
    print(f'x={x}, y={y}')
# prints out
x=1234567, y=1.1
x=2345678, y=2.2
x=2345678, y=3.3
x=4567890, y=4.4
dict.get(key[, default]) "Return the value for key if key is in the dictionary, else default." In other words, with ww[k] = ww.get(k, 0) + v, we are saying: add v to ww[k], but if it doesn't exist yet, then use 0 as a starting point.
The last bit (new_wagon_list, new_weight_list = map(list, zip(*ww.items()))) uses the idiom that "zip() in conjunction with the * operator can be used to unzip a list" (or, in this case, an iterator of tuples key, value obtained from dict.items()). Without the map(list, ...), we would get tuples in the two variables. I thought you may want to stick with lists, so we apply list(.) to each tuple before assigning to new_wagon_list resp. new_weight_list.
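To make the unzip idiom concrete, here is a small sketch using the same sample data. The dictionary literal is typed in by hand for illustration rather than computed, and the names wagons_t and weights_t are made up for this example.

```python
# Hand-written stand-in for the dictionary built in the loop above.
ww = {1234567: 1.1, 2345678: 5.5, 4567890: 4.4}

# zip(*pairs) transposes an iterable of (key, value) pairs
# into one tuple of keys and one tuple of values.
wagons_t, weights_t = zip(*ww.items())
print(wagons_t)   # (1234567, 2345678, 4567890)
print(weights_t)  # (1.1, 5.5, 4.4)

# map(list, ...) converts each tuple to a list before unpacking.
new_wagon_list, new_weight_list = map(list, zip(*ww.items()))
print(new_wagon_list)   # [1234567, 2345678, 4567890]
print(new_weight_list)  # [1.1, 5.5, 4.4]
```

One caveat: if the dictionary is empty, zip(*...) yields nothing and the two-name unpacking raises a ValueError, so an empty input needs its own branch.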
Modifying a list that you're iterating over doesn't work out well. I'd zip the two lists together and use itertools.groupby:
>>> from itertools import groupby
>>> wagon_list = [1234567, 2345678, 2345678, 4567890]
>>> weight_list = [1.1, 2.2, 3.3, 4.4]
>>> wagon_list, weight_list = map(list, zip(*(
... (wagon, sum(weight for _, weight in group))
... for wagon, group in groupby(sorted(
... zip(wagon_list, weight_list)
... ), key=lambda t: t[0])
... )))
>>> wagon_list
[1234567, 2345678, 4567890]
>>> weight_list
[1.1, 5.5, 4.4]
Use a dictionary to combine the values:
In [1]: wagon_list = [1234567, 2345678, 2345678, 4567890]
...: weight_list = [1.1, 2.2, 3.3, 4.4]
Out[1]: [1.1, 2.2, 3.3, 4.4]
In [2]: together = {}
Out[2]: {}
In [3]: for k, v in zip(wagon_list, weight_list):
   ...:     together[k] = together.setdefault(k, 0) + v
   ...:
In [4]: together
Out[4]: {1234567: 1.1, 2345678: 5.5, 4567890: 4.4}
In [6]: new_wagon_list = list(together.keys())
Out[6]: [1234567, 2345678, 4567890]
In [7]: new_weight_list = list(together.values())
Out[7]: [1.1, 5.5, 4.4]
A version with no fluff, frills, dependencies or mystery. Either an index for the current wagon is found, which tells us which weight to modify, or no index is found and we append both new values.
Your entire problem revolves around "Does this already exist?". With a list, we can answer that question with index. index raises a ValueError if the value is not found, so we wrap the call in try and treat except as an else.
def wagon_filter(wagons:list, weights:list) -> tuple:
    #pre-zip and clear so we can reuse the names
    data = zip(wagons, weights)
    wagons, weights = [], []
    #reassign
    for W, w in data:
        try: #(W)agon exists? modify its (w)eight index
            i = wagons.index(W)
            weights[i] += w
        except ValueError: #else append new (W)agon and (w)eight
            wagons.append(W)
            weights.append(w)
    return wagons, weights
usage:
#data
wagons = [1234567, 2345678, 2345678, 4567890]
weights = [1.1, 2.2, 3.3, 4.4]
#print filter results
print(*wagon_filter(wagons, weights), sep='\n')
#[1234567, 2345678, 4567890]
#[1.1, 5.5, 4.4]
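list.index scans the list from the start on every call, so the function above does more work as the number of distinct wagons grows. A possible variation, sketched here under the assumption that wagon numbers are hashable (they are ints in the example), remembers each wagon's position in a dictionary instead. The names wagon_filter_indexed and position are made up for this illustration.

```python
def wagon_filter_indexed(wagons, weights):
    """Merge duplicate wagons, summing weights and keeping first-seen order."""
    position = {}  # hypothetical helper: wagon -> index in the output lists
    out_wagons, out_weights = [], []
    for wagon, weight in zip(wagons, weights):
        if wagon in position:
            # Seen before: add to the weight stored at its position.
            out_weights[position[wagon]] += weight
        else:
            # First occurrence: record where it lands, then append.
            position[wagon] = len(out_wagons)
            out_wagons.append(wagon)
            out_weights.append(weight)
    return out_wagons, out_weights


merged_wagons, merged_weights = wagon_filter_indexed(
    [1234567, 2345678, 2345678, 4567890], [1.1, 2.2, 3.3, 4.4]
)
print(merged_wagons)   # [1234567, 2345678, 4567890]
print(merged_weights)  # [1.1, 5.5, 4.4]
```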
I'm experimenting with lists of lists that contain a string followed by numbers, e.g. LofL = [["string", 4.0, 1.1, -3.0, -7.2],["string", 2.0, -1.0, 3.3], ["string", 4.4, 5.5, -6.6, 1.1]]. For each inner list I want to average the values for as long as they are not below 0. For example, the first list would give 5.1/2, since the third number is negative. In the end the list of lists should look like: LofL =[["string", 5.1/2],["string", 2/1], ["string", 9.9/2]]. I've tried this so far:
LofL = *see above example*
avgLofL = LofL
for sublist in LofL:
    while sublist in range(1,len(sublist)) > 0.0:
        rowavg = [sum(sublist) / range(1,len(sublist)) for sublist in LofL]
for sublist in avgLofL:
    for sublist in range(1,len(sublist)):
        avgLofL.append(rowavg)
return avgLofL
It says rowavg is referenced before assignment, but when I initialize it as rowavg = 0 my list has no length. I'm unsure where I'm making a mistake.
This is a possible solution:
from statistics import mean
avgLofL = [[next(x for x in lst if isinstance(x, str)),
mean(x for x in lst if not isinstance(x, str) and x >= 0)]
for lst in LofL]
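Note that the comprehension above averages every non-negative number in a row. If the intent is to stop at the first negative value, as the question's 2/1 example for the second row suggests, one possible sketch uses itertools.takewhile. It assumes the string is always the first element and uses None as the average when nothing qualifies; both are assumptions made for this illustration, as is the name leading_average.

```python
from itertools import takewhile

LofL = [["string", 4.0, 1.1, -3.0, -7.2],
        ["string", 2.0, -1.0, 3.3],
        ["string", 4.4, 5.5, -6.6, 1.1]]


def leading_average(row):
    # Keep the numbers after the label, up to (not including) the first negative one.
    kept = list(takewhile(lambda x: x >= 0, row[1:]))
    # Assumption: None marks a row with no leading non-negative numbers.
    return [row[0], sum(kept) / len(kept) if kept else None]


avgLofL = [leading_average(row) for row in LofL]
print(avgLofL)
```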
Ok, I think this is what you actually asked for:
LofL = [["string", 4.0, 1.1, -3.0, -7.2],
["string", 2.0, -1.0, 3.3],
["string", 4.4, 5.5, -6.6, 1.1]]
avgLofL = []
for row in LofL:
    sublist = []
    for x in row[1:]:
        if x>=0:
            sublist.append(x)
        else:
            break
    avgLofL.append([row[0], sum(sublist)/float(len(sublist))])
print(avgLofL)
Its result seems to match the example:
[['string', 2.55], ['string', 2.0], ['string', 4.95]]
It processes one row at a time, completely. It assumes that the first element is a string which should be kept, then collects the other elements in sublist until it finds a negative one. Then it calculates the average of the collection, builds and stores a "[string,average]" pair, and continues with the next row.
In its current form it will die if a negative number comes right at the start (division by zero). You can either add an explicit if somewhere, or use a dirty hack like sum(sublist)/max(1,float(len(sublist))).
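The two options for the empty case behave differently, which the following sketch tries to show. The function names are made up, and returning None from the guarded version is an assumption; the question does not say what should happen when there is nothing to average.

```python
def safe_average(values):
    # Explicit guard: report "no data" instead of dividing by zero.
    if not values:
        return None  # assumption: None means "nothing to average"
    return sum(values) / len(values)


def hacked_average(values):
    # The max(1, ...) trick from the text: an empty list yields 0.0.
    return sum(values) / max(1, len(values))


print(safe_average([]))            # None
print(hacked_average([]))          # 0.0
print(safe_average([2.0, 4.0]))    # 3.0
print(hacked_average([2.0, 4.0]))  # 3.0
```

The hack is shorter, but it makes an empty row indistinguishable from a row whose values genuinely average to zero.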
def avgofList(LofL):
    avgLofL = []
    for sublist in LofL:
        total = 0
        count = 0
        for item in sublist:
            if isinstance(item, float) and item > 0:
                total += item
                count += 1
        sl = [x for x in sublist if not isinstance(x, float)]
        sl.append(total / count)
        avgLofL.append(sl)
    return avgLofL
Very similar to the answer provided by @tevemadar, except that no intermediate lists are used and it also accounts for the first negative number being at index 1 of the sub-lists. It does, however, assume that the sub-lists are not empty and that the first element is to be retained.
LofL = [["string", 4.0, 1.1, -3.0, -7.2],
["string", 2.0, -1.0, 3.3],
["string", 4.4, 5.5, -6.6, 1.1]]
def process(e):
    t, n = 0, 0
    for v in e[1:]:
        if v >= 0:
            t += v
            n += 1
        else:
            break
    return [e[0], t / n] if n > 0 else [e[0]]
result = [process(e) for e in LofL]
print(result)
Output:
[['string', 2.55], ['string', 2.0], ['string', 4.95]]
So, I'm sure similar questions have been asked before but I couldn't find quite what I need.
I have a program that outputs a 2D array like the one below:
arr = [[0.2, 3], [0.3, "End"], ...]
There may be more or less elements, but each is a 2-element array, where the first value is a float and the second can be a float or a string.
Both of those values may repeat. In each of those arrays, the second element takes on only a few possible values.
What I want to do is sum the first elements' value within the arrays that have the same value of the second element and output a similar array that does not have those duplicated values.
For example:
input = [[0.4, 1.5], [0.1, 1.5], [0.8, "End"], [0.05, "End"], [0.2, 3.5], [0.2, 3.5]]
output = [[0.5, 1.5], [0.4, 3.5], [0.85, "End"]]
I'd appreciate it if the output array was sorted by this second element (floats ascending, strings at the end), although it's not necessary.
EDIT: Thanks for both answers; I've decided to use the one by Chris, because the code was more comprehensible to me, although groupby seems like a function designed to solve this very problem, so I'll try to read up on that, too.
UPDATE: The values of floats were always positive, by nature of the task at hand, so I used negative values to stop the usage of any strings - now I have a few if statements that check for those "encoded" negative values and replace them with strings again just before they're printed out, so sorting is now easier.
You could use a dictionary to accumulate the sum of the first value of each pair, keyed by the second item.
To get the string items at the end of the list, the sort key can return positive infinity, float('inf'), for them.
input_ = [[0.4, 1.5], [0.1, 1.5], [0.8, "End"], [0.05, "End"], [0.2, 3.5], [0.2, 3.5]]
d = dict()
for pair in input_:
    d[pair[1]] = d.get(pair[1], 0) + pair[0]
L = []
for k, v in d.items():
    L.append([v,k])
L.sort(key=lambda x: x[1] if type(x[1]) == float else float('inf'))
print(L)
This prints:
[[0.5, 1.5], [0.4, 3.5], [0.8500000000000001, 'End']]
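You can try to play with itertools.groupby (a is the input list from the question). Note that groupby only merges consecutive items with equal keys, which is the case in the example:
import itertools
out = [[sum([elt[0] for elt in val]), key] for key, val in itertools.groupby(a, key=lambda elt: elt[1])]
# [[0.5, 1.5], [0.8500000000000001, 'End'], [0.4, 3.5]]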
Explanation:
Group the 2D list by the 2nd element of each sublist using itertools.groupby and its key parameter. We pass key=lambda elt: elt[1] to group on the 2nd element:
for key, val in itertools.groupby(a, key=lambda elt: elt[1]):
    print(key, val)
# 1.5 <itertools._grouper object at 0x0000026AD1F6E160>
# End <itertools._grouper object at 0x0000026AD2104EF0>
# 3.5 <itertools._grouper object at 0x0000026AD1F6E160>
For each group, compute the sum using the built-in function sum:
for key, val in itertools.groupby(a, key=lambda elt: elt[1]):
    print(sum([elt[0] for elt in val]))
# 0.5
# 0.8500000000000001
# 0.4
Compute the desired output:
out = []
for key, val in itertools.groupby(a, key=lambda elt: elt[1]):
    out.append([sum([elt[0] for elt in val]), key])
print(out)
# [[0.5, 1.5], [0.8500000000000001, 'End'], [0.4, 3.5]]
You also asked about sorting on the 2nd value, but it contains both strings and numbers, which is a problem: Python 3 cannot order a float against a string and raises a TypeError when asked to. The objects must be comparable with each other.
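One way around the mixed-type comparison, sketched here rather than taken from the answers, is a sort key that returns a tuple: a flag that is True for strings, followed by the value itself. Floats are then only ever compared with floats and strings with strings, and the strings land at the end. Sorting first also keeps groupby correct when equal keys are not adjacent, which is why the sample input below is deliberately shuffled. The helper name mixed_key is made up for this example.

```python
from itertools import groupby

# Same pairs as in the question, reordered so equal keys are not adjacent.
a = [[0.4, 1.5], [0.8, "End"], [0.2, 3.5], [0.1, 1.5], [0.05, "End"], [0.2, 3.5]]


def mixed_key(pair):
    # (False, number) sorts before (True, string); ties compare like with like.
    label = pair[1]
    return (isinstance(label, str), label)


out = [
    [sum(elt[0] for elt in group), label]
    for (_, label), group in groupby(sorted(a, key=mixed_key), key=mixed_key)
]
print(out)
```

The result has one entry per distinct second element, ordered 1.5, 3.5, then "End"; the sums are subject to the usual floating-point rounding.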
I have the following set of values stored in a list.
[-1.7683218, 0.22206295, -0.28429198, 5.925369, -3.952484, -3.0728238, 0.09690776, -0.31914753, 3.9695702, 26.934353, 1.4882066, 1.8194668, -0.5614318, 1.2354431, -0.09714768, -0.15579335, -0.059994906, 1.0105655, -23.25607, 31.982368, -0.09390785, 0.17786688, 0.36164832, -4.673975, 13.495866, -3.57134, 0.5583399, -1.801314, 2.4207468, 2.0513844, -3.429592, -9.599998, 23.412394, -3.963623, 6.930485, 2.5186272, 0.6805691, -1.1615586, -0.915736, -2.6307302, -14.409785, 0.6327307, 10.512744, -0.09292421, -0.61977243, 0.35928893, -1.3844814, 8.098062, -0.8270248, 0.47219157, 0.089366496, 0.9056338, 1.5297629, 3.3246832, -0.9748858, 36.62332, -1.0525678, -0.87139374, 6.7600174, 36.210625, -0.25728267, 14.568578, 0.87466383, -4.2237897, -5.4309, 19.762472, 0.8426512, -0.7807278, 0.03435099, 12.787761, -4.9308186, -1.4322343, 0.49790275, -12.979129, 0.18121482, -0.81953144, -1.5393608, 17.757078, 3.5726204, -11.319154, -0.002896044, -1.8806648, 0.30027565, -2.6210017, 16.230186, -2.2566936, 37.37506, -2.7738526, -0.91440165, -3.652771, 1.8378688, -0.25519317, 0.5222581, 0.2189773, 23.825306, 0.3779062, 2.6709516, 0.84001434, -0.41394734, -0.600579, -3.1629875, 0.2880843, -3.9132822, 5.674796, -0.5569526, 0.30253112, -4.4269695, 4.5206604, -0.8477638, 0.0032483074, -2.2814171, 0.5524869, -1.4271426, -0.24263692, 1.0095457, -3.187037, -1.6656531, 1.4805393, 0.064992905, -4.8124804, -0.07194552, -0.28692132, -0.19502515, 0.010771384, -32.744797, 1.2642047, 6.3942785, -1.2971659, 29.70087, 0.19707158, -2.734262, 2.8497686, -1.710305, -1.3836008, 22.758884, -1.8488939, 4.1740856, 0.26019523, -8.814447, -3.937495, 0.22731477, -0.7874651, 17.22002, -7.89242, -0.5795766, 3.3960745, 1.0440702, 0.5483718, 1.2849183, -0.63732344, -40.38428, -4.25527, 3.034935, 0.25527972, -0.81940174, -7.0720696, 1.7420169, 14.904871, -1.5399592, 0.20110837, 0.1902977, 2.5790472, -28.560707, 0.09560776, -0.973604, 0.6214314, -5.1268454, -0.9104073, 33.082394, 0.23800176, -9.696023, 12.288443, 
-16.52249, -7.6811, -21.928356, 25.690449, -0.6803232, -1.4738222, -1.831514, 0.00013296002, -3.1330614, 3.6067219, -3.0617614, -6.334016, -24.856865, -6.0669985, 2.8829474, 0.76423097, -0.21836776, -2.3173273, -2.092735, -0.19577695, 4.2984896, 0.029742926, 1.0902604, -0.28707412, -0.1671038, -0.4607489, -15.966867, -1.7149612, -1.3445716, 1.400264, 4.906401, -6.314724, -0.92188597, -0.14341217, -6.819194, 1.2750683, 21.634096, 0.5503013, 5.2122655, -0.096101895, -0.69029164, 2.6239898, -26.33101, -3.7901835, 10.026649, 1.0661886, 0.8891293, 34.24628, -0.9036363, -4.4846773, -30.846636, -5.8609247, -0.018534392, 4.657759e-06, 16.96108, 10.725708, -0.3170653, -3.2331817, 0.73887914, 0.69840825, 0.9043666, 1.0727708, 1.6571997, -0.70257163, 2.4863558, 0.07501343, -35.059708, 0.72496796, -3.0723267, -3.2004805, -0.9447444, 0.56954986, 2.6018164, -0.49256825, 22.71359, 0.45523545, -2.1936522, 4.008838, 0.62327665, 10.315046, 1.4006382, 1.1290226, 1.2660133, -8.46607]
I want to be able to create 100 more lists that are similar to this one but contain randomly chosen values between the lowest and highest values of the original list. Let's consider a smaller example to better understand the problem: a list with lowest value -1 and highest value 7.2.
original list : [0.5, 0.8, 1.1, 2.5, 7.2, -1]
random list 1 : [0.5, 0.2, 1.4, 4.5, 6.2, -0.5]
random list 2 : [5.3, 0.3, 0.7, 2.3, 4.2, -0.1]
....
random list 100 : [0.5, 0.9, 1.1, 2.1, 6.5, -1]
The key is that not all values have to change (although in some cases they all can, as in list 2 for example). Is there a straightforward way to accomplish this in Python?
The code below prints the output you need. First find the max and min numbers in the original list, then use the random.uniform() function from the random library to draw each new value.
import random
original_list = [0.5, 0.8, 1.1, 2.5, 7.2, -1]
max_number = max(original_list)
min_number = min(original_list)
# because you need 100 more lists
for i in range(100):
    random_list = []
    for j in range(len(original_list)):
        random_list.append(round(random.uniform(min_number,max_number),1))
    print('random list '+str(i+1)+' ', end='')
    print(random_list)
import random
smallest = min(original_list)
largest = max(original_list)
newlist1 = [random.uniform(smallest, largest) for _ in range(len(original_list))]
newlist2 = [random.uniform(smallest, largest) for _ in range(len(original_list))]
# and so on
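Rather than writing newlist1, newlist2 and so on by hand, the lists can be collected in a loop. The sketch below also illustrates one possible reading of "not all values have to change": each position keeps its original value with probability keep_prob. The function name, the keep_prob parameter and the seed are assumptions made for this example, not something the question specifies.

```python
import random


def make_random_lists(original, count=100, keep_prob=0.0, seed=None):
    """Return `count` lists of len(original) values between min(original)
    and max(original); each position keeps its original value with
    probability keep_prob (hypothetical parameter)."""
    rng = random.Random(seed)
    low, high = min(original), max(original)
    return [
        [x if rng.random() < keep_prob else rng.uniform(low, high) for x in original]
        for _ in range(count)
    ]


original_list = [0.5, 0.8, 1.1, 2.5, 7.2, -1]
random_lists = make_random_lists(original_list, count=100, keep_prob=0.3, seed=42)
print(len(random_lists), len(random_lists[0]))  # 100 6
```

With keep_prob=0.0 every value is redrawn, which matches the answers above; passing a seed only makes the run repeatable.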
Using list comprehension and numpy.random.uniform:
import numpy as np
orig = [-1.7683218, 0.22206295, -0.28429198, 5.925369, -3.952484, -3.0728238, 0.09690776, -0.31914753, 3.9695702, 26.934353, 1.4882066, 1.8194668, -0.5614318, 1.2354431, -0.09714768, -0.15579335, -0.059994906, 1.0105655, -23.25607, 31.982368, -0.09390785, 0.17786688, 0.36164832, -4.673975, 13.495866, -3.57134, 0.5583399, -1.801314, 2.4207468, 2.0513844, -3.429592, -9.599998, 23.412394, -3.963623, 6.930485, 2.5186272, 0.6805691, -1.1615586, -0.915736, -2.6307302, -14.409785, 0.6327307, 10.512744, -0.09292421, -0.61977243, 0.35928893, -1.3844814, 8.098062, -0.8270248, 0.47219157, 0.089366496, 0.9056338, 1.5297629, 3.3246832, -0.9748858, 36.62332, -1.0525678, -0.87139374, 6.7600174, 36.210625, -0.25728267, 14.568578, 0.87466383, -4.2237897, -5.4309, 19.762472, 0.8426512, -0.7807278, 0.03435099, 12.787761, -4.9308186, -1.4322343, 0.49790275, -12.979129, 0.18121482, -0.81953144, -1.5393608, 17.757078, 3.5726204, -11.319154, -0.002896044, -1.8806648, 0.30027565, -2.6210017, 16.230186, -2.2566936, 37.37506, -2.7738526, -0.91440165, -3.652771, 1.8378688, -0.25519317, 0.5222581, 0.2189773, 23.825306, 0.3779062, 2.6709516, 0.84001434, -0.41394734, -0.600579, -3.1629875, 0.2880843, -3.9132822, 5.674796, -0.5569526, 0.30253112, -4.4269695, 4.5206604, -0.8477638, 0.0032483074, -2.2814171, 0.5524869, -1.4271426, -0.24263692, 1.0095457, -3.187037, -1.6656531, 1.4805393, 0.064992905, -4.8124804, -0.07194552, -0.28692132, -0.19502515, 0.010771384, -32.744797, 1.2642047, 6.3942785, -1.2971659, 29.70087, 0.19707158, -2.734262, 2.8497686, -1.710305, -1.3836008, 22.758884, -1.8488939, 4.1740856, 0.26019523, -8.814447, -3.937495, 0.22731477, -0.7874651, 17.22002, -7.89242, -0.5795766, 3.3960745, 1.0440702, 0.5483718, 1.2849183, -0.63732344, -40.38428, -4.25527, 3.034935, 0.25527972, -0.81940174, -7.0720696, 1.7420169, 14.904871, -1.5399592, 0.20110837, 0.1902977, 2.5790472, -28.560707, 0.09560776, -0.973604, 0.6214314, -5.1268454, -0.9104073, 33.082394, 0.23800176, -9.696023, 12.288443, 
-16.52249, -7.6811, -21.928356, 25.690449, -0.6803232, -1.4738222, -1.831514, 0.00013296002, -3.1330614, 3.6067219, -3.0617614, -6.334016, -24.856865, -6.0669985, 2.8829474, 0.76423097, -0.21836776, -2.3173273, -2.092735, -0.19577695, 4.2984896, 0.029742926, 1.0902604, -0.28707412, -0.1671038, -0.4607489, -15.966867, -1.7149612, -1.3445716, 1.400264, 4.906401, -6.314724, -0.92188597, -0.14341217, -6.819194, 1.2750683, 21.634096, 0.5503013, 5.2122655, -0.096101895, -0.69029164, 2.6239898, -26.33101, -3.7901835, 10.026649, 1.0661886, 0.8891293, 34.24628, -0.9036363, -4.4846773, -30.846636, -5.8609247, -0.018534392, 4.657759e-06, 16.96108, 10.725708, -0.3170653, -3.2331817, 0.73887914, 0.69840825, 0.9043666, 1.0727708, 1.6571997, -0.70257163, 2.4863558, 0.07501343, -35.059708, 0.72496796, -3.0723267, -3.2004805, -0.9447444, 0.56954986, 2.6018164, -0.49256825, 22.71359, 0.45523545, -2.1936522, 4.008838, 0.62327665, 10.315046, 1.4006382, 1.1290226, 1.2660133, -8.46607]
a = min(orig)
b = max(orig)
n = len(orig)
res = [np.random.uniform(a, b, n).tolist() for i in range(100)]
and you get res, which is a list of 100 lists (each of size len(orig)) of numbers uniformly distributed over [min(orig), max(orig)).