Averaging values within a list of lists - python

I'm messing around with lists of lists containing strings and values, e.g. LofL = [["string", 4.0, 1.1, -3.0, -7.2],["string", 2.0, -1.0, 3.3], ["string", 4.4, 5.5, -6.6, 1.1]], and I'm trying to take the values within each sublist and average them as long as the values are not below 0. For example, the first would be 5.1/2, since the third value is negative. In the end the list of lists would look like: LofL = [["string", 5.1/2], ["string", 2/1], ["string", 9.9/2]]. I've tried this so far:
LofL = *see above example*
avgLofL = LofL
for sublist in LofL:
while sublist in range(1,len(sublist)) > 0.0:
rowavg = [sum(sublist) / range(1,len(sublist)) for sublist in LofL]
for sublist in avgLofL:
for sublist in range(1,len(sublist)):
avgLofL.append(rowavg)
return avgLofL
It says my rowavg is referenced before assignment, but when I initialize it as rowavg = 0 my list has no length. I'm unsure where I'm making a mistake.

This is a possible solution:
from statistics import mean
avgLofL = [[next(x for x in lst if isinstance(x, str)),
            mean(x for x in lst if not isinstance(x, str) and x >= 0)]
           for lst in LofL]

Ok, I think this is what you actually asked for:
LofL = [["string", 4.0, 1.1, -3.0, -7.2],
        ["string", 2.0, -1.0, 3.3],
        ["string", 4.4, 5.5, -6.6, 1.1]]
avgLofL = []
for row in LofL:
    sublist = []
    for x in row[1:]:
        if x >= 0:
            sublist.append(x)
        else:
            break
    avgLofL.append([row[0], sum(sublist)/float(len(sublist))])
print(avgLofL)
Its result seems to match the example:
[['string', 2.55], ['string', 2.0], ['string', 4.95]]
It processes one row at a time, completely. It assumes that the first element is a string which should be kept, then collects the other elements in sublist until it finds a negative one. It then calculates the average of the collection, builds and stores a [string, average] pair, and continues with the next row.
In its current form it will fail with a ZeroDivisionError when a row has a negative number right at the start. You can either add an explicit if, or use a quick hack like sum(sublist)/max(1,float(len(sublist))).
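The explicit check mentioned above could look like the sketch below. The function name average_leading_non_negatives and the choice of None as the "no average" marker are assumptions made for illustration; they are not part of the original answer.

```python
def average_leading_non_negatives(rows):
    """Average the values after the label, stopping at the first negative one."""
    result = []
    for row in rows:
        label, values = row[0], row[1:]
        kept = []
        for x in values:
            if x < 0:
                break  # stop at the first negative value, as in the loop above
            kept.append(x)
        # Assumption: store None when nothing was collected, instead of dividing by zero.
        average = sum(kept) / len(kept) if kept else None
        result.append([label, average])
    return result


rows = [["string", 4.0, 1.1, -3.0], ["string", -2.0, 3.3]]
averaged = average_leading_non_negatives(rows)
print(averaged)
```

Whether None, 0, or dropping the row is the right answer for an empty collection depends on what the averages are used for afterwards.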

def avgofList(LofL):
    avgLofL = []
    for sublist in LofL:
        total = 0
        count = 0
        for item in sublist:
            # the question keeps values that are "not below 0", so 0 counts too
            if isinstance(item, float) and item >= 0:
                total += item
                count += 1
        sl = [x for x in sublist if not isinstance(x, float)]
        sl.append(total / count)
        avgLofL.append(sl)
    return avgLofL

Very similar to the answer provided by @tevemadar, except that no intermediate lists are used and it also handles the first negative number being at index 1 of a sublist. It does, however, assume that the sublists are not empty and that the first element is to be retained.
LofL = [["string", 4.0, 1.1, -3.0, -7.2],
["string", 2.0, -1.0, 3.3],
["string", 4.4, 5.5, -6.6, 1.1]]
def process(e):
    t, n = 0, 0
    for v in e[1:]:
        if v >= 0:
            t += v
            n += 1
        else:
            break
    return [e[0], t / n] if n > 0 else [e[0]]
result = [process(e) for e in LofL]
print(result)
Output:
[['string', 2.55], ['string', 2.0], ['string', 4.95]]

Related

How to iterate over a list, modify it but avoid an error?

I have two variable-length lists extracted from an Excel file. One has the wagon numbers and the other the wagon weights, something like this:
wagon_list = [1234567, 2345678, 3456789, 4567890]
weight_list = [1.1, 2.2, 3.3, 4.4]
Sometimes wagon_list will have a duplicate number. In that case I need to sum the wagon weights and remove the duplicate from both lists:
wagon_list = [1234567, 2345678, 2345678, 4567890]
weight_list = [1.1, 2.2, 3.3, 4.4]
should become:
wagon_list = [1234567, 2345678, 4567890]
weight_list = [1.1, 5.5, 4.4]
My first option was to pop items and sum them while iterating with a for loop. It didn't work because (after some research) you can't change a list you're iterating over.
So I moved to the second option, using an auxiliary list. It doesn't work when it hits the last index. Even after some tweaking of my code, I can't find a solution.
I can see it would have further problems if the last three elements had to be summed.
counter_3 = 0
for i in wagon_list:
if i == wagon_list[-1]: #last entry, simply appends to the new list. This comes first because the next option returns error if running the last entry as i
new_wagon_list.append(wagon_list[counter_3])
new_weight_list.append(weight_list[counter_3])
counter_3 +=2
elif i != wagon_list[(counter_3 + 1)]: #if they are different, appends.
new_wagon_list.append(wagon_list[counter_3])
new_weight_list.append(weight_list[counter_3])
counter_3 += 1
elif i == wagon_list[(counter_3 + 1)]: #if equal to next item, appends the wagon and sums the weights
new_wagon_list.append(wagon_list[counter_3])
new_weight_list.append(weight_list[counter_3] + weight_list[counter_3 + 1])
This should return:
wagon_list = [1234567, 2345678, 4567890]
weight_list = [1.1, 5.5, 4.4]
But returns
wagon_list = [1234567, 2345678, 3456789, 3456789, 3456789]
weight_list = [1.1, 2.2, 7.7, 7.7, 3.3]
Here is a simple way, using defaultdict (hence the result is correct even if wagon_list is unordered). You could also use groupby but then you have to sort both lists so that duplicate wagons are consecutive.
This solution requires a single pass through the lists, and doesn't change the order of the lists. It just removes duplicate wagons and adds their weight.
from collections import defaultdict
def group_weights(wagon_list, weight_list):
    ww = defaultdict(float)
    for wagon, weight in zip(wagon_list, weight_list):
        ww[wagon] += weight
    return list(ww), list(ww.values())
Example
# set up MRE
wagon_list = [1234567, 2345678, 2345678, 4567890]
weight_list = [1.1, 2.2, 3.3, 4.4]
new_wagon_list, new_weight_list = group_weights(wagon_list, weight_list)
>>> new_wagon_list
[1234567, 2345678, 4567890]
>>> new_weight_list
[1.1, 5.5, 4.4]
Addendum
If you'd like to avoid defaultdict altogether, you can also simply do this (same result as above):
ww = {}
for k, v in zip(wagon_list, weight_list):
    ww[k] = ww.get(k, 0) + v
new_wagon_list, new_weight_list = map(list, zip(*ww.items()))
Explanation
A quick review of some of the tools and syntax used above:
zip(*iterables) "Make an iterator that aggregates elements from each of the iterables." So e.g.:
for x, y in zip(wagon_list, weight_list):
    print(f'x={x}, y={y}')
# prints out
# x=1234567, y=1.1
# x=2345678, y=2.2
# x=2345678, y=3.3
# x=4567890, y=4.4
dict.get(key[, default]) "Return the value for key if key is in the dictionary, else default." In other words, with ww[k] = ww.get(k, 0) + v, we are saying: add v to ww[k], but if it doesn't exist yet, then use 0 as a starting point.
The last bit (new_wagon_list, new_weight_list = map(list, zip(*ww.items()))) uses the idiom that "zip() in conjunction with the * operator can be used to unzip a list" (or, in this case, an iterator of tuples key, value obtained from dict.items()). Without the map(list, ...), we would get tuples in the two variables. I thought you may want to stick with lists, so we apply list(.) to each tuple before assigning to new_wagon_list resp. new_weight_list.
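One caveat with the unzip idiom: if both input lists are empty, zip(*ww.items()) yields nothing, and unpacking into two names raises a ValueError. A minimal sketch of one way around that, reading the keys and values straight from the dict (the function name merge_wagons is hypothetical):

```python
def merge_wagons(wagon_list, weight_list):
    """Sum the weights of duplicate wagons, keeping first-seen order."""
    ww = {}
    for wagon, weight in zip(wagon_list, weight_list):
        ww[wagon] = ww.get(wagon, 0) + weight
    # Dicts preserve insertion order (Python 3.7+), so keys and values line up.
    return list(ww.keys()), list(ww.values())


wagons, weights = merge_wagons([1234567, 2345678, 2345678, 4567890],
                               [1.1, 2.2, 3.3, 4.4])
empty_wagons, empty_weights = merge_wagons([], [])  # no unpacking error here
```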
Modifying a list that you're iterating over doesn't work out well. I'd zip the two lists together and use itertools.groupby:
>>> from itertools import groupby
>>> wagon_list = [1234567, 2345678, 2345678, 4567890]
>>> weight_list = [1.1, 2.2, 3.3, 4.4]
>>> wagon_list, weight_list = map(list, zip(*(
... (wagon, sum(weight for _, weight in group))
... for wagon, group in groupby(sorted(
... zip(wagon_list, weight_list)
... ), key=lambda t: t[0])
... )))
>>> wagon_list
[1234567, 2345678, 4567890]
>>> weight_list
[1.1, 5.5, 4.4]
Use a dictionary to combine the values:
In [1]: wagon_list = [1234567, 2345678, 2345678, 4567890]
...: weight_list = [1.1, 2.2, 3.3, 4.4]
Out[1]: [1.1, 2.2, 3.3, 4.4]
In [2]: together = {}
Out[2]: {}
In [3]: for k, v in zip(wagon_list, weight_list):
...: together[k] = together.setdefault(k, 0) + v
...:
In [4]: together
Out[4]: {1234567: 1.1, 2345678: 5.5, 4567890: 4.4}
In [6]: new_wagon_list = list(together.keys())
Out[6]: [1234567, 2345678, 4567890]
In [7]: new_weight_list = list(together.values())
Out[7]: [1.1, 5.5, 4.4]
No fluff, frills, dependency or mystery version. Either an index for the current wagon is going to be found, allowing us to pinpoint the weight index to modify or no index is found and we append both of the new values.
Your entire problem revolves around "Does this already exist?". With a list, we can answer that question with index. index raises a ValueError if the value is not found, so we wrap it in try and treat except as an else.
def wagon_filter(wagons:list, weights:list) -> tuple:
    #pre-zip and clear so we can reuse the references
    data = zip(wagons, weights)
    wagons, weights = [], []
    #reassign
    for W, w in data:
        try: #(W)agon exists? modify its (w)eight index
            i = wagons.index(W)
            weights[i] += w
        except ValueError: #else append new (W)agon and (w)eight
            wagons.append(W)
            weights.append(w)
    return wagons, weights
usage:
#data
wagons = [1234567, 2345678, 2345678, 4567890]
weights = [1.1, 2.2, 3.3, 4.4]
#print filter results
print(*wagon_filter(wagons, weights), sep='\n')
#[1234567, 2345678, 4567890]
#[1.1, 5.5, 4.4]

Sum of duplicate values in 2d array

So, I'm sure similar questions have been asked before but I couldn't find quite what I need.
I have a program that outputs a 2D array like the one below:
arr = [[0.2, 3], [0.3, "End"], ...]
There may be more or less elements, but each is a 2-element array, where the first value is a float and the second can be a float or a string.
Both of those values may repeat. In each of those arrays, the second element takes on only a few possible values.
What I want to do is sum the first elements' value within the arrays that have the same value of the second element and output a similar array that does not have those duplicated values.
For example:
input = [[0.4, 1.5], [0.1, 1.5], [0.8, "End"], [0.05, "End"], [0.2, 3.5], [0.2, 3.5]]
output = [[0.5, 1.5], [0.4, 3.5], [0.85, "End"]]
I'd appreciate if the output array was sorted by this second element (floats ascending, strings at the end), although it's not necessary.
EDIT: Thanks for both answers; I've decided to use the one by Chris, because the code was more comprehensible to me, although groupby seems like a function designed to solve this very problem, so I'll try to read up on that, too.
UPDATE: The values of floats were always positive, by nature of the task at hand, so I used negative values to stop the usage of any strings - now I have a few if statements that check for those "encoded" negative values and replace them with strings again just before they're printed out, so sorting is now easier.
You could use a dictionary to accumulate the sum of the first value in the list keyed by the second item.
To get the string items at the end of the list, the sort key can map them to positive infinity, float('inf').
input_ = [[0.4, 1.5], [0.1, 1.5], [0.8, "End"], [0.05, "End"], [0.2, 3.5], [0.2, 3.5]]
d = dict()
for pair in input_:
    d[pair[1]] = d.get(pair[1], 0) + pair[0]
L = []
for k, v in d.items():
    L.append([v,k])
L.sort(key=lambda x: x[1] if type(x[1]) == float else float('inf'))
print(L)
This prints:
[[0.5, 1.5], [0.4, 3.5], [0.8500000000000001, 'End']]
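Note that the type(x[1]) == float test sends every non-float key to the end, including an integer key such as the 3 in the question's first example. A sketch of a key function that sorts any number ahead of any string (the name numbers_then_strings and the sample pairs are assumptions made for this illustration):

```python
from numbers import Real


def numbers_then_strings(pair):
    """Sort key: numeric second elements first (ascending), strings last."""
    key = pair[1]
    # The leading boolean decides the group, so a number is never compared to a string.
    return (not isinstance(key, Real), key)


pairs = [[0.85, "End"], [0.4, 3.5], [0.2, 3], [0.5, 1.5]]
pairs.sort(key=numbers_then_strings)
print(pairs)
```

Several different strings would also be ordered alphabetically among themselves, which the float('inf') trick leaves in input order.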
You can try to play with itertools.groupby:
import itertools
# a is the input list from the question
out = [[sum([elt[0] for elt in val]), key] for key, val in itertools.groupby(a, key=lambda elt: elt[1])]
# [[0.5, 1.5], [0.8500000000000001, 'End'], [0.4, 3.5]]
Explanation:
Group the 2D list by the second element of each sublist using itertools.groupby and its key parameter. The lambda key=lambda elt: elt[1] groups on the second element:
for key, val in itertools.groupby(a, key=lambda elt: elt[1]):
    print(key, val)
# 1.5 <itertools._grouper object at 0x0000026AD1F6E160>
# End <itertools._grouper object at 0x0000026AD2104EF0>
# 3.5 <itertools._grouper object at 0x0000026AD1F6E160>
For each group, compute the sum using the built-in function sum:
for key, val in itertools.groupby(a, key=lambda elt: elt[1]):
    print(sum([elt[0] for elt in val]))
# 0.5
# 0.8500000000000001
# 0.4
Compute the desired output:
out = []
for key, val in itertools.groupby(a, key=lambda elt: elt[1]):
    out.append([sum([elt[0] for elt in val]), key])
print(out)
# [[0.5, 1.5], [0.8500000000000001, 'End'], [0.4, 3.5]]
You also mentioned sorting on the second value, but it holds both strings and numbers, which is a problem: Python 3 cannot order a number against a string, so a plain sort fails. The objects must be comparable, or the sort needs a key that makes them so.
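One more caveat: groupby only merges rows that sit next to each other, so the snippets above rely on equal keys already being adjacent in a. When that is not guaranteed, one option is to sort first with a key that never compares a number to a string. A sketch under that assumption (the helper name mixed_key and the shuffled sample data are made up for illustration):

```python
from itertools import groupby


def mixed_key(elt):
    """Order rows by their second element: numbers ascending, then strings."""
    second = elt[1]
    return (isinstance(second, str), second)


# Same values as the question's example, but with equal keys no longer adjacent.
a = [[0.4, 1.5], [0.8, "End"], [0.2, 3.5], [0.1, 1.5], [0.05, "End"], [0.2, 3.5]]

out = [
    [sum(elt[0] for elt in group), key]
    for key, group in groupby(sorted(a, key=mixed_key), key=lambda elt: elt[1])
]
print(out)
```

As a side effect, the result comes out in the order the question asked for: floats ascending, strings at the end.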

Using list index values to compute list value changes in Python

I have a list of lists for years and unemployment rates:
datalist= [[1947, 3.9], [1948, 3.8], [1949, 5.9], [1950, 5.3], [1951, 3.3],
[1952, 3.0], [1953, 2.9], [1954, 5.5], [1955, 4.4] . . .]
I am able to modify the year (adding 1 to it) via
def newYear(a):
    newList = datalist[:]
    for i in range(len(newList)):
        newList[i][0] += 1
    return newList
I'm looking to create a new list of lists with the year and percent change in the unemployment rate from the previous year. I tried adding b, a and c to the function, but I don't know how to make this work.
def newYear(a):
    newList = datalist[:]
    for i in range(len(newList)):
        newList[i][0] += 1
        b = newList[i + 1][1]
        a = newList[i][1]
        c = (b-a)/a * 100
    return newList
Trying to stick as close as possible to your current approach, you can try this. Looking forwards (i.e. using x+1 indexing) makes things more difficult for you; the first year in your list cannot have a percentage change from the previous year. You would also get an IndexError by using range(len(a)) when you got to the last item in your list. So it's more natural to use x and then look one period backwards (x-1).
datalist = [[1947, 3.9], [1948, 3.8], [1949, 5.9], [1950, 5.3], [1951, 3.3],
            [1952, 3.0], [1953, 2.9], [1954, 5.5], [1955, 4.4]]
def newYear(a):
    new_list = []
    new_list.append([a[0][0], a[0][1], 0])  # No change for first year
    for x in range(1, len(a)):
        year = a[x][0]
        previous_unemployment = a[x-1][1]
        current_unemployment = a[x][1]
        percent_change = ((current_unemployment - previous_unemployment)
                          / previous_unemployment)*100
        new_list.append([year, current_unemployment, percent_change])
    return new_list
calc_percentages = newYear(datalist)
print(calc_percentages)
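The same look-back idea can be written without index arithmetic by pairing each row with its predecessor. A minimal sketch, assuming the first year should simply be left out rather than given a change of 0 (the function name percent_changes is hypothetical):

```python
def percent_changes(rows):
    """Return [year, percent change from the previous year] for every year after the first."""
    changes = []
    # zip(rows, rows[1:]) yields (previous, current) pairs of consecutive rows.
    for (_, prev_rate), (year, rate) in zip(rows, rows[1:]):
        changes.append([year, (rate - prev_rate) / prev_rate * 100])
    return changes


datalist = [[1947, 3.9], [1948, 3.8], [1949, 5.9]]
changes = percent_changes(datalist)
print(changes)
```

Unlike the datalist[:] copy in the question, this builds new rows and leaves the input list untouched.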

find indices of duplicate floats in a list

I have an input of a very large list of floating point numbers, a sample is given
[1.2, 2.4, 3.1, 4.0, 5.6, 6.5, 1.2, 3.1, 8.1, 23.6, 29.3]
I want to find all the duplicates and their index i.e. location in the list. The duplicates will only occur as a pair; never more than twice.
The output should be like
1.2 1 7
3.1 3 8
so there are just two entries 1.2 and 3.1 which occur as duplicates, and their positions are 1, 7 and 3, 8 respectively.
Any suggestions for doing this in Python?
Taking xi_'s answer a bit further. By adding the list comprehension it will provide a list of all indices that contain the value.
x = [1.2, 2.4, 3.1, 4.0, 5.6, 6.5, 1.2, 3.1, 8.1, 23.6, 29.3]
for el in set(x):
    if x.count(el) > 1:
        print(el, " ".join([str(index) for index, value in enumerate(x) if value == el]))
You will get an output of: (0-based index)
1.2 0 6
3.1 2 7
Edit
Explanation of [str(index) for index, value in enumerate(x) if value == el]
This is enumerating x which creates an enumerate object of the list which will return tuple pairs of (<index>, <value>)
Then it loops through this enumerate object using the for index, value in enumerate(x)
The if value == el checks each value and if it is equal to el then we evaluate, otherwise we do nothing.
The str(index) is the part that gets evaluated based on the condition we defined above. It returns a string version of index which is an integral type.
This will provide a list (all the code between the [ and ]) which will then be passed to the string method join(list) which joins all of the items in the list with the value in the " " (in this case a space, it could be any string.) providing a string of space separated values from the list that was created.
I also assume that you may even want this data later on other than just printing it. Here is a version to do that. This creates an empty dictionary y = {} then we create a new entry with a key of the value (el), providing it a list of the indices.
x = [1.2, 2.4, 3.1, 4.0, 5.6, 6.5, 1.2, 3.1, 8.1, 23.6, 29.3]
y = {}
for el in set(x):
    if x.count(el) > 1:
        y[el] = [str(index) for index, value in enumerate(x) if value == el]
If you print y, this is what you should get:
{3.1: ['2', '7'], 1.2: ['0', '6']}
Edit2
To print y so that it matches the output you specified. Do something like this:
print("\n".join(["{} {}".format(key, " ".join(vals)) for key, vals in y.items()]))
output:
3.1 2 7
1.2 0 6
What this is doing is iterating through the dictionary y with (for key, vals in y.items()), making a string of "<key> <values...>" with ("{} {}".format(key, " ".join(vals))). This gives a list of strings, so we join them using "\n" to put each on its own line.
Now it is important to note that, since a dictionary is a hash table, the output order of the keys will not be sorted. If you want that, you could change the code above to this:
print("\n".join(["{} {}".format(key, " ".join(y[key])) for key in sorted(y.keys())]))
output:
1.2 0 6
3.1 2 7
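Calling x.count(el) inside the loop rescans the list for every distinct value, which gets slow on the very large input the question mentions. A single-pass sketch in Python 3 that collects the indices per value (the name duplicate_indices is an assumption; indices are 0-based, so add 1 to get the positions shown in the question):

```python
from collections import defaultdict


def duplicate_indices(values):
    """Map each value that occurs more than once to the list of its 0-based indices."""
    positions = defaultdict(list)
    for index, value in enumerate(values):
        positions[value].append(index)
    return {value: idxs for value, idxs in positions.items() if len(idxs) > 1}


x = [1.2, 2.4, 3.1, 4.0, 5.6, 6.5, 1.2, 3.1, 8.1, 23.6, 29.3]
dupes = duplicate_indices(x)
for value, idxs in dupes.items():
    print(value, *idxs)
```

This matches floats by exact equality, as the answers above do; values that differ only by rounding error would not be treated as duplicates.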
You could try something like that:
x = [1.2, 2.4, 3.1, 4.0, 5.6, 6.5, 1.2, 3.1, 8.1, 23.6, 29.3]
for el in set(x):
    if x.count(el) > 1:
        print(el, x.count(el), len(x) - x[::-1].index(el))
Output (element with duplicates, quantity, index of last occurrence):
1.2 2 7
3.1 2 8

Merging duplicate lists and deleting a field in each list depending on the value in Python

I am still a beginner in Python. I have a list of tuples to be filtered, merged and sorted.
The list looks like this:
id, ts,val
tup = [(213,5,10.0),
(214,5,20.0),
(215,5,30.0),
(313,5,60.0),
(314,5,70.0),
(315,5,80.0),
(213,10,11.0),
(214,10,21.0),
(215,10,31.0),
(313,10,61.0),
(314,10,71.0),
(315,10,81.0),
(315,15,12.0),
(314,15,22.0),
(215,15,32.0),
(313,15,62.0),
(214,15,72.0),
(213,15,82.0)] and so on
Description about the list: The first column(id)can have only these 6 values 213,214,215,313,314,315 but in any different order. The second column(ts) will have same values for every 6 rows. Third column(val) will have some random floating point values
Now my final result should be something like this:
result = [(5,10.0,20.0,30.0,60.0,70.0,80.0),
(10,11.0,21.0,31.0,61.0,71.0,81.0),
(15,82.0,72.0,32.0,62.0,22.0,12.0)]
That is the first column in each row is to be deleted. There should be only one unique row for each unique value in the second column. so the order of each result row should be:
(ts,val corresponding to id 213,val corresponding to 214, corresponding to id 215,val corresponding to 313,corresponding to id 314,val corresponding to 315)
Note: I am restricted to the standard Python libraries, so pandas and numpy cannot be used.
I tried a lot of possibilities but couldn't solve it. Please help me do this. Thanks in advance.
You can use itertools.groupby
from itertools import groupby
result = []
for i, g in groupby(tup, lambda x: x[1]):
    # sort each group by id, then keep only the value column
    group = [i] + [x[-1] for x in sorted(g, key=lambda x: x[0])]
    result.append(tuple(group))
print(result)
Output:
[(5, 10.0, 20.0, 30.0, 60.0, 70.0, 80.0),
(10, 11.0, 21.0, 31.0, 61.0, 71.0, 81.0),
(15, 82.0, 72.0, 32.0, 62.0, 22.0, 12.0)]
With a slight change to your code you can fix it. If you change i[1] in ssd[cnt] to i[1] == ssd[cnt][0] your code may work. Also in else part you should add another list to ssd because you are creating another set of data. Also if the data should come according to their id's you should sort them by (ts,id). After applying the changes:
tup.sort( key = lambda x: (x[1],x[0]) )
ssd = [[]]
cnt = 0
ssd[0].append(tup[0][1])
for i in tup:
    if i[1] == ssd[cnt][0]:
        ssd[cnt].append(i[2])
    else:
        cnt = cnt + 1
        ssd.append([])
        ssd[cnt].append(i[1])
        ssd[cnt].append(i[2])
Output
[[5, 10.0, 20.0, 30.0, 60.0, 70.0, 80.0],
[10, 11.0, 21.0, 31.0, 61.0, 71.0, 81.0],
[15, 82.0, 72.0, 32.0, 62.0, 22.0, 12.0]]
Here's a vanilla python solution, although I do think that using groupby is more pythonic. This does have the disadvantage that it has to build the dicts in memory, so it won't scale to a large tup list.
This does, however, obey the ordering requirement.
from collections import defaultdict
tup = ...
tup_dict = defaultdict(dict)
for id, ts, val in tup:
    print(id, ts, val)
    tup_dict[ts][id] = val
for tup_key in sorted(tup_dict):
    id_dict = tup_dict[tup_key]
    print(tuple([tup_key] + [id_dict[id_key] for id_key in sorted(id_dict)]))
We want to iterate on a sorted instance of your tup, unpacking the items as we go, but first we need an auxiliary variable to store the keys and a variable to store our results
keys, res = [], []
for t0, t1, t2 in sorted(tup, key=lambda x:(x[1],x[0])):
The key argument is a lambda function that instructs the sorted function to sort on the second and then the first item of each tuple. Here is the body of the loop:
    if t1 not in keys:
        keys.append(t1)
        res.append([t1])
That is, if the second integer in the tuple has not been processed yet, we record that it is now being processed and add a new list to our result variable, starting with the value of that second integer.
To finish the operation on an individual tuple: we are sure that there is a list in res that starts with t1, and by indexing the auxiliary variable we know the index of that list, so we can append the float to it:
    i = keys.index(t1)
    res[i].append(t2)
To have all of that in short
keys, res = [], []
for t0, t1, t2 in sorted(tup, key=lambda x:(x[1],x[0])):
    if t1 not in keys:
        keys.append(t1)
        res.append([t1])
    i = keys.index(t1)
    res[i].append(t2)
Now, in res you have a list of lists, if you really need a list of tuples you can convert with a list comprehension
res = [tuple(elt) for elt in res]
Adding to the answer of @Ahsanul Haque: the values also need to be in id order, so instead of list(g) use sorted(g, key=lambda y: y[0]). You can also build a tuple from the start:
result = []
for i, g in groupby(tup, lambda x: x[1]):
    gro = (i,) + tuple(map(lambda x: x[-1], sorted(g, key=lambda y: y[0])))
    result.append(gro)
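Because groupby only merges consecutive rows, the grouping above depends on all rows for one ts arriving together. A self-contained sketch that sorts by (ts, id) first, so that neither the row order nor the id order matters (the function name pivot_by_ts and the shortened sample data are assumptions for illustration):

```python
from itertools import groupby
from operator import itemgetter


def pivot_by_ts(rows):
    """Turn (id, ts, val) rows into one (ts, val, val, ...) tuple per ts, vals ordered by id."""
    ordered = sorted(rows, key=itemgetter(1, 0))  # sort by ts, then by id
    return [
        (ts,) + tuple(val for _, _, val in group)
        for ts, group in groupby(ordered, key=itemgetter(1))
    ]


# Rows for the same ts are deliberately not adjacent here.
tup = [(214, 5, 20.0), (213, 10, 11.0), (213, 5, 10.0), (214, 10, 21.0)]
result = pivot_by_ts(tup)
print(result)
```

Sorting costs O(n log n) but uses only the standard library, which fits the restriction stated in the question.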
