Variable Length Slice in List Comprehension - python

Suppose I have a numpy array of step sizes, N, and a set of variables V of length np.sum(N). For example:
N = np.array([2,3,2])
V = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
Preferably using list comprehension, how do I slice V such that the result is a list of lists, split by the steps in N?
For example:
foo(V, N)
> [[0.1,0.2], [0.3, 0.4,0.5], [0.6,0.7]]

numpy has a split() function that can produce unequal sub-arrays. It takes split indices rather than lengths, so convert the lengths with cumsum() and drop the last element of the result, which is empty when N covers all of V (or holds the leftovers when it doesn't).
N = np.array([2,3,2])
V = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
np.split(V, N.cumsum())[:-1]
# [array([0.1, 0.2]), array([0.3, 0.4, 0.5]), array([0.6, 0.7])]
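The same idea can be wrapped in a small helper. This is only a sketch: the name split_by_lengths is hypothetical, and it assumes the lengths add up to len(values). Dropping the final running total before splitting means np.split never produces the trailing empty array, so no [:-1] is needed afterwards.

```python
import numpy as np

def split_by_lengths(values, lengths):
    """Split `values` into consecutive chunks whose sizes are given by `lengths`."""
    values = np.asarray(values)
    # Split points are the running totals, excluding the last one (the end of the array).
    split_points = np.cumsum(lengths)[:-1]
    return [chunk.tolist() for chunk in np.split(values, split_points)]

N = np.array([2, 3, 2])
V = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
chunks = split_by_lengths(V, N)
print(chunks)
```

If the lengths sum to less than len(values), the last chunk simply absorbs the leftover elements.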

How about this?
Vi = iter(V)
[[next(Vi) for _ in range(n)] for n in N]
# [[0.1, 0.2], [0.3, 0.4, 0.5], [0.6, 0.7]]

I managed to solve the problem using itertools.islice:
from itertools import islice

def UnequalDivide(self, iterable, chunks):  # written as a method; drop `self` to use it as a plain function
    it = iter(iterable)
    return [list(islice(it, c)) for c in chunks]
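For reference, here is a minimal standalone sketch of the same islice approach with a usage example. The function name unequal_divide is an assumption made for illustration. One property worth knowing is that islice stops quietly when the iterator runs out, so a chunk can come back shorter than requested.

```python
from itertools import islice

def unequal_divide(iterable, chunks):
    """Consume `iterable` once, taking `c` items for each `c` in `chunks`."""
    it = iter(iterable)
    return [list(islice(it, c)) for c in chunks]

parts = unequal_divide([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], [2, 3, 2])
print(parts)

# When the input is too short, the last chunk is truncated rather than raising.
short = unequal_divide([1, 2, 3], [2, 3])
print(short)
```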

Here is how you can do that with regular lists and slices:
N = [2,3,2]
V = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
def foo(N,V):
    n, lst = 0, []
    for i,v in enumerate(N, 1):
        lst.append(V[n:n+v])
        n += v
    return lst
print(foo(N,V))
Output:
[[0.1, 0.2], [0.3, 0.4, 0.5], [0.6, 0.7]]
You can do the same with numpy arrays:
import numpy as np
N = np.array([2,3,2])
V = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
def foo(N,V):
    n, lst = 0, []
    for i,v in enumerate(N, 1):
        lst.append(V[n:n+v].tolist())
        n += v
    return lst
print(foo(N,V))
Output:
[[0.1, 0.2], [0.3, 0.4, 0.5], [0.6, 0.7]]
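Since the question asked for a list comprehension, the loop above can also be written as one by precomputing the slice boundaries. The following is a sketch that assumes plain Python sequences; the helper name slice_by_lengths is hypothetical, and itertools.accumulate is used here to build the running totals.

```python
from itertools import accumulate

def slice_by_lengths(values, lengths):
    """Return consecutive slices of `values` with the given lengths."""
    ends = list(accumulate(lengths))   # running totals: where each slice stops
    starts = [0] + ends[:-1]           # each slice starts where the previous one stopped
    return [list(values[s:e]) for s, e in zip(starts, ends)]

N = [2, 3, 2]
V = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
result = slice_by_lengths(V, N)
print(result)
```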

Related

Get indices greater than value and keep value

I have a 2D array that looks like this:
[[0.1, 0.2, 0.4, 0.6, 0.9]
[0.3, 0.7, 0.8, 0.3, 0.9]
[0.7, 0.9, 0.4, 0.6, 0.9]
[0.1, 0.2, 0.6, 0.6, 0.9]]
And I want to save the indices where the array is higher than 0.6 but I also want to keep the value of that position, so the output would be:
[0, 3, 0.6]
[0, 4, 0.9]
[1, 1, 0.7]
and so on.
To get the indices I did this:
x = np.where(PPCF> 0.6)
high_pc = np.asarray(x).T.tolist()
but how do I keep the value in a third position?
Simple, no loops:
x = np.where(PPCF > 0.6) # condition to screen values
vals = PPCF[x] # find values by indices
np.concatenate((np.array(x).T, vals.reshape(vals.size, 1)), axis = 1) # resulting array
Feel free to convert it to a list.
This should work:
x = np.where(PPCF> 0.6)
high_pc = np.asarray(x).T.tolist()
for i in high_pc:
    i.append(float(PPCF[i[0],i[1]]))
You could just run a loop along the columns and rows and check if each element is greater than the threshold and save them in a list.
a = [[0.1, 0.2, 0.4, 0.6, 0.9],
[0.3, 0.7, 0.8, 0.3, 0.9],
[0.7, 0.9, 0.4, 0.6, 0.9],
[0.1, 0.2, 0.6, 0.6, 0.9]]
def find_ix(a, threshold = 0.6):
    res_list = []
    for i in range(len(a)):
        for j in range(len(a[i])):
            if a[i][j] >= threshold:
                res_list.append([i, j, a[i][j]])
    return res_list
print("Resulting list = \n ", find_ix(a))
import numpy as np
arr = np.array([[0.1, 0.2, 0.4, 0.6, 0.9],
[0.3, 0.7, 0.8, 0.3, 0.9],
[0.7, 0.9, 0.4, 0.6, 0.9],
[0.1, 0.2, 0.6, 0.6, 0.9]])
rows, cols = np.where(arr > 0.6) # Get rows and columns where arr > 0.6
values = arr[rows, cols] # Get all values > 0.6 in arr
result = np.column_stack((rows, cols, values)) # Stack three columns to create final array
"""
Result -
[[ 0.  4.  0.9]
 [ 1.  1.  0.7]
 [ 1.  2.  0.8]
 [ 1.  4.  0.9]
 [ 2.  0.  0.7]
 [ 2.  1.  0.9]
 [ 2.  4.  0.9]
 [ 3.  4.  0.9]]
"""
You can convert result into a list.
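One detail to keep in mind when converting: stacking integer indices next to float values gives a single float array, so the indices come back as 0.0, 4.0 and so on. A minimal sketch of a conversion that keeps the indices as ints is shown below; the variable name triples is an assumption made for illustration.

```python
import numpy as np

arr = np.array([[0.1, 0.2, 0.4, 0.6, 0.9],
                [0.3, 0.7, 0.8, 0.3, 0.9],
                [0.7, 0.9, 0.4, 0.6, 0.9],
                [0.1, 0.2, 0.6, 0.6, 0.9]])

rows, cols = np.nonzero(arr > 0.6)   # positions strictly greater than 0.6
values = arr[rows, cols]

# Build [row, col, value] triples with integer indices and float values.
triples = [[int(r), int(c), float(v)] for r, c, v in zip(rows, cols, values)]
print(triples[:3])
```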

Why does random.shuffle fail on numpy lists?

I have an array of row vectors, upon which I run random.shuffle:
#!/usr/bin/env python
import random
import numpy as np
zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
[0.6, 0.7, 0.8, 0.9, 1. ]])
iterations = 100000
f = 0
for _ in range(iterations):
    random.shuffle(zzz)
    if np.array_equal(zzz[0], zzz[1]):
        print(zzz)
        f += 1
print(float(f)/float(iterations))
Between 99.6% and 100% of the time, random.shuffle leaves zzz with two identical rows, e.g.:
$ ./test.py
...
[[ 0.1 0.2 0.3 0.4 0.5]
[ 0.1 0.2 0.3 0.4 0.5]]
0.996
Using numpy.random.shuffle appears to pass this test and shuffle row vectors correctly. I'm curious to know why random.shuffle fails.
If you look at the code of random.shuffle it performs swaps in the following way:
x[i], x[j] = x[j], x[i]
which for a numpy.array would fail, without raising any error. Example:
>>> zzz[1], zzz[0] = zzz[0], zzz[1]
>>> zzz
array([[0.1, 0.2, 0.3, 0.4, 0.5],
[0.1, 0.2, 0.3, 0.4, 0.5]])
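The reason is that Python evaluates the right-hand side completely before assigning (which is why a one-line swap works in Python), but for a numpy array the right-hand side holds views of the rows, not copies. The first assignment overwrites one row in place, so the view of that row already shows the new data by the time the second assignment runs.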
numpy
>>> arr = np.array([[1],[1]])
>>> arr[0], arr[1] = arr[0]+1, arr[0]
>>> arr
array([[2],
[2]])
Python
>>> l = [1,1]
>>> l[0], l[1] = l[0]+1, l[0]
>>> l
[2, 1]
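The difference comes down to views versus copies, which can be checked directly. The sketch below uses np.shares_memory to confirm that indexing a row returns a view, and then shows that the tuple swap works once the right-hand side is copied explicitly.

```python
import numpy as np

zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                [0.6, 0.7, 0.8, 0.9, 1.0]])

# Indexing a row returns a view onto the same buffer, not an independent copy.
shares = np.shares_memory(zzz[0], zzz)
print(shares)

# Copying the rows first makes the one-line swap behave as it does for lists.
zzz[0], zzz[1] = zzz[1].copy(), zzz[0].copy()
print(zzz)
```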
Try it like this, shuffling each row separately:
#!/usr/bin/env python
import random
import numpy as np
zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
[0.6, 0.7, 0.8, 0.9, 1. ]])
iterations = 100000
f = 0
for _ in range(iterations):
    random.shuffle(zzz[0])
    random.shuffle(zzz[1])
    if np.array_equal(zzz[0], zzz[1]):
        print(zzz)
        f += 1
print(float(f)/float(iterations))
In [200]: zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
...: [0.6, 0.7, 0.8, 0.9, 1. ]])
...:
In [201]: zl = zzz.tolist()
In [202]: zl
Out[202]: [[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]]
random.shuffle is probably using an in-place assignment like:
In [203]: zzz[0],zzz[1]=zzz[1],zzz[0]
In [204]: zzz
Out[204]:
array([[0.6, 0.7, 0.8, 0.9, 1. ],
[0.6, 0.7, 0.8, 0.9, 1. ]])
Note the replication.
But applied to a list of lists:
In [205]: zl[0],zl[1]=zl[1],zl[0]
In [206]: zl
Out[206]: [[0.6, 0.7, 0.8, 0.9, 1.0], [0.1, 0.2, 0.3, 0.4, 0.5]]
In [207]: zl[0],zl[1]=zl[1],zl[0]
In [208]: zl
Out[208]: [[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]]
I tested zl = list(zzz) and still got the array behavior: that zl is a list of views of zzz. tolist() makes a list of lists that is totally independent of zzz.
In short, random.shuffle cannot handle in-place modification of an ndarray correctly. np.random.shuffle is designed to work with the first dimension of an array, so it gets it right.
The correct assignment for an ndarray is:
In [211]: zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
...: [0.6, 0.7, 0.8, 0.9, 1. ]])
...:
In [212]: zzz[[0,1]] = zzz[[1,0]]
In [213]: zzz
Out[213]:
array([[0.6, 0.7, 0.8, 0.9, 1. ],
[0.1, 0.2, 0.3, 0.4, 0.5]])
In [214]: zzz[[0,1]] = zzz[[1,0]]
In [215]: zzz
Out[215]:
array([[0.1, 0.2, 0.3, 0.4, 0.5],
[0.6, 0.7, 0.8, 0.9, 1. ]])
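To repeat the original test with a numpy shuffle, something like the sketch below can be used. It swaps in the newer Generator interface (np.random.default_rng) in place of np.random.shuffle, which is an assumption made here rather than anything from the question; both shuffle along the first axis in place. The seed value and the counter name duplicates are also illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the run is repeatable

zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                [0.6, 0.7, 0.8, 0.9, 1.0]])
original_rows = {tuple(row) for row in zzz.tolist()}

duplicates = 0
for _ in range(1000):
    rng.shuffle(zzz)  # reorders the rows in place
    if np.array_equal(zzz[0], zzz[1]):
        duplicates += 1

print(duplicates / 1000)
```

Because whole rows are moved rather than overwritten through views, both original rows survive every shuffle.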

How to sum/average a specific subset of columns or rows and return the new ndarray in numpy?

For the sake of illustration, imagine I have the following ndarray:
x = [[0.5, 0.3, 0.1, 0.1],
[0.4, 0.1, 0.3, 0.2],
[0.4, 0.3, 0.2, 0.1],
[0.6, 0.1, 0.1, 0.2]]
I want to sum the two vectors at columns 1 and 2 (starting the count from 0) so that the new ndarray would be:
y = [[0.5, 0.4, 0.1],
[0.4, 0.4, 0.2],
[0.4, 0.5, 0.1],
[0.6, 0.2, 0.2]]
And then, I want to average the vectors at rows 1 and 2 so that the final result would be:
z = [[0.5, 0.4, 0.1 ],
[0.4, 0.45, 0.15],
[0.6, 0.2, 0.2 ]]
Is there an efficient way to do that in numpy in one command? I really need efficiency as this operation is going to be applied in a nested loop.
Thanks in advance
@hpaulj's solution is very good; be sure to read it.
You can sum columns quite easily:
a_summed = np.sum(a[:,1:3], axis=1)
You can also take the mean of multiple rows:
a_mean = np.mean(a[1:3], axis=0)
All you have to do is replace and delete the remaining columns, so it becomes:
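import numpy as np
a = np.array([[0.5, 0.3, 0.1, 0.1],
              [0.4, 0.1, 0.3, 0.2],
              [0.4, 0.3, 0.2, 0.1],
              [0.6, 0.1, 0.1, 0.2]])  # the array x from the question
a_summed = np.sum(a[:,1:3], axis=1)  # sum columns 1 and 2
a[:, 1] = a_summed
a = np.delete(a, 2, 1)  # drop the now-redundant column 2
a_mean = np.mean(a[1:3], axis=0)  # average rows 1 and 2
a[1] = a_mean
a = np.delete(a, 2, 0)  # drop the now-redundant row 2
print(a)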
Since you are changing the original matrix size it would be better to do it in two steps as mentioned in the previous answers but, if you want to do it in one command, you could do it as follows and it makes for a nice generalized solution:
import numpy as np
x = np.array(([0.5, 0.3, 0.1, 0.1, 1],
[0.4, 0.1, 0.3, 0.2, 1],
[0.4, 0.3, 0.2, 0.1, 1],
[0.6, 0.1, 0.1, 0.2, 1]))
def sum_columns(matrix, col_start, col_end):
    return np.column_stack((matrix[:, 0:col_start],
                            np.sum(matrix[:, col_start:col_end + 1], axis=1),
                            matrix[:, col_end + 1:]))
def avgRows_summedColumns(matrix, row_start, row_end):
    # np.vstack is the canonical name; np.row_stack is an alias for it
    return np.vstack((matrix[0:row_start, :],
                      np.mean(matrix[row_start:row_end + 1, :], axis=0),
                      matrix[row_end + 1:, :]))  # all rows after the averaged block
# call the entire operation in one command
print(avgRows_summedColumns(sum_columns(x, 1, 2), 1, 2))
This way it doesn't matter how big your matrix is.
In [68]: x = [[0.5, 0.3, 0.1, 0.1],
...: [0.4, 0.1, 0.3, 0.2],
...: [0.4, 0.3, 0.2, 0.1],
...: [0.6, 0.1, 0.1, 0.2]]
In [69]: x=np.array(x)
ufuncs like np.add have a reduceat method that lets us perform the action over groups of rows or columns. With that the first reduction is easy (but takes a little playing to understand the parameters):
In [70]: np.add.reduceat(x,[0,1,3], axis=1)
Out[70]:
array([[0.5, 0.4, 0.1],
[0.4, 0.4, 0.2],
[0.4, 0.5, 0.1],
[0.6, 0.2, 0.2]])
Apparently mean is not a ufunc, so I had to settle for add to reduce the rows:
In [71]: np.add.reduceat(Out[70],[0,1,3],axis=0)
Out[71]:
array([[0.5, 0.4, 0.1],
[0.8, 0.9, 0.3],
[0.6, 0.2, 0.2]])
and then divide by the row count to get the mean. I could generalize that to use the same [0,1,3] used in the reduceat, but for now just use a column array:
In [72]: np.add.reduceat(Out[70],[0,1,3],axis=0)/np.array([1,2,1])[:,None]
Out[72]:
array([[0.5 , 0.4 , 0.1 ],
[0.4 , 0.45, 0.15],
[0.6 , 0.2 , 0.2 ]])
and the whole thing in one expression:
In [73]: np.add.reduceat(np.add.reduceat(x,[0,1,3], axis=1),[0,1,3],axis=0)/ np.array([1,2,1])[:,None]
Out[73]:
array([[0.5 , 0.4 , 0.1 ],
[0.4 , 0.45, 0.15],
[0.6 , 0.2 , 0.2 ]])
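The generalization mentioned above, deriving the row counts from the same [0,1,3] start indices instead of hard-coding [1,2,1], could look like the following sketch. The helper name group_mean_rows is hypothetical, and it assumes the start indices are strictly increasing and begin at 0.

```python
import numpy as np

def group_mean_rows(x, starts):
    """Average consecutive groups of rows; `starts` holds each group's first row index."""
    x = np.asarray(x, dtype=float)
    starts = np.asarray(starts)
    sums = np.add.reduceat(x, starts, axis=0)
    # Group sizes are the gaps between start indices, with the row count as the final edge.
    counts = np.diff(np.append(starts, x.shape[0]))
    return sums / counts[:, None]

x = np.array([[0.5, 0.3, 0.1, 0.1],
              [0.4, 0.1, 0.3, 0.2],
              [0.4, 0.3, 0.2, 0.1],
              [0.6, 0.1, 0.1, 0.2]])

y = np.add.reduceat(x, [0, 1, 3], axis=1)  # sum columns 1 and 2
z = group_mean_rows(y, [0, 1, 3])          # average rows 1 and 2
print(z)
```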

Python: Adding list values to each other in a list of lists

I have a list of lists like this:
[[12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.15, 0.1, 0.15, 0.2, 0.1, 0.15, 0.15, 0.15, 0.15], [12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], ...]
If the first and second element of an inner list is the same as the first and second element of another inner list (like the example above), I want to create a function that adds the remaining values and merges them into one list. The example output would be like this:
[12411.0, 31937, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.25, 0.2, 0.25, 0.3, 0.2, 0.25, 0.25, 0.25, 0.25]
I'm having trouble telling Python to recognize and compare the first two elements of each list before merging them. Here is my best attempt so far:
def group(A):
    for i in range(len(A)):
        for j in range(len(A[i])):
            if A[i][0:1] == A[i: ][0:1]:
                return [A[i][0], A[i][1], sum(A[i][j+2], A[i: ][j+2])]
I get an index error, I believe, because of the A[i: ] and A[i: ][j+2] parts of the code. I don't know how to phrase it though in Python to tell the function to add any other lines that meet the criteria.
Here's a function that will merge all sublists where the first two entries match. It also handles cases where the sub-lists are not the same length:
from itertools import zip_longest  # named izip_longest in Python 2
l = [[1,3,4,5,6], [1,3,2,2,2], [2,3,5,6,6], [1,1,1,1,1], [1,1,2,2,2], [1,3,6,2,1,1,2]]
l2 = [[12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.15, 0.1, 0.15, 0.2, 0.1, 0.15, 0.15, 0.15, 0.15], [12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]]
def merge(l):
    d = {}
    for ent in l:
        key = tuple(ent[0:2])  # the first two entries identify the group
        merged = d.get(key, None)
        if merged is None:
            d[key] = list(ent)  # copy, so the input sublists are not modified
        else:
            # pad the shorter tail with zeros so unequal lengths still add up
            merged[2:] = [a+b for a,b in zip_longest(merged[2:], ent[2:], fillvalue=0)]
    return list(d.values())
print(merge(l))
print(merge(l2))
Output:
[[1, 3, 12, 9, 9, 1, 2], [2, 3, 5, 6, 6], [1, 1, 3, 3, 3]]
[[12411.0, 31937.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.25, 0.2, 0.25, 0.30000000000000004, 0.2, 0.25, 0.25, 0.25, 0.25]]
It's implemented by maintaining a dict where the keys are the first two entries of a sub-list (stored as a tuple). As we iterate over the sublists, we check to see if there's an entry in the dict. If there isn't, we store the current sublist in the dict. If there already is an entry, we add up all their values from index 2 onward, and update the dict. Once we're done iterating, we just return all the values from the dict.
This is one way to do it:
>>> a_list = [[12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.15, 0.1, 0.15, 0.2, 0.1, 0.15, 0.15, 0.15, 0.15], [12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]]
>>> result = [a + b for a, b in zip(*a_list)]
>>> result[:2] = a_list[0][:2]
>>> result
[12411.0, 31937.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.25, 0.2, 0.25, 0.30000000000000004, 0.2, 0.25, 0.25, 0.25, 0.25]
This works by blindly adding up corresponding elements in all the sub-lists by doing:
[a + b for a, b in zip(*a_list)]
And then rewriting the first two elements of the result which according to the question does not change, by doing:
result[:2] = a_list[0][:2]
It is not evident from your question what the behavior should be if the first two elements of the sub-lists do not match. But the following snippet will help you check whether they match. Let's assume a_list contains sublists whose first two elements do not match:
>>> a_list = [[12411.0, 31937.0, 0.1, 0.1], [12411.3, 31937.0, 0.1, 0.1]]
then, this condition:
all([True if list(a)[1:] == list(a)[:-1] else False for a in list(zip(*a_list))[:2]])
will return False, and True otherwise. The code extracts the first elements and second elements of all the sub-lists and then checks if they are equal.
You can include the above check in your code and modify your code accordingly for the expected behavior.
To sum it up:
a_list = [[12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.15, 0.1, 0.15, 0.2, 0.1, 0.15, 0.15, 0.15, 0.15], [12411.0, 31937.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]]
check = all([True if list(a)[1:] == list(a)[:-1] else False for a in list(zip(*a_list))[:2]])
result = []
if check:
    result = [a + b for a, b in zip(*a_list)]
    result[:2] = a_list[0][:2]
else:
    pass  # whatever the behavior should be.
This is a function that will take a list of lists A and check internal list i and j using your criteria. It will then either return the summed list you want or None if the first two elements don't match.
def check_internal_ij(A,i,j):
    """ checks internal list i against internal list j """
    if A[i][0:2] == A[j][0:2]:
        new = [x+y for x,y in zip( A[i], A[j] )]
        new[0:2] = A[i][0:2]
        return new
    else:
        return None
Then you can run the function over all combinations of internal lists you want to check.
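One way to run it over every pair is itertools.combinations, as in the sketch below. The sample data and the dictionary name merged are made up for illustration. Note that this only ever combines two sublists at a time, so if three or more sublists share the same first two elements, a dict-based grouping is the better fit.

```python
from itertools import combinations

def check_internal_ij(A, i, j):
    """Check internal list i against internal list j; sum them if the first two entries match."""
    if A[i][0:2] == A[j][0:2]:
        new = [x + y for x, y in zip(A[i], A[j])]
        new[0:2] = A[i][0:2]
        return new
    return None

A = [[1, 3, 4, 5], [2, 3, 5, 6], [1, 3, 2, 2]]  # illustrative data

# Try every unordered pair of indices and keep the pairs that matched.
merged = {}
for i, j in combinations(range(len(A)), 2):
    result = check_internal_ij(A, i, j)
    if result is not None:
        merged[(i, j)] = result

print(merged)
```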
If you are fond of itertools, then with a little effort this can easily be solved by playing around with groupby, islice, izip, imap and chain (izip and imap are Python 2 names; in Python 3 the built-in zip and map play the same role).
And of course you should also remember to use operator.itemgetter.
Implementation
from itertools import groupby, islice, izip, imap, chain  # Python 2 names
from operator import itemgetter
# Create a group of lists where the key (the first two elements of the lists) matches
groups = groupby(sorted(l, key = itemgetter(0, 1)), key = itemgetter(0, 1))
# zip the lists and then chop off the first two elements. Sum the elements of the resultant list
# Remember to add the newly accumulated list with the first two elements
groups_sum = ([k, imap(sum, islice(izip(*g), 2, None))] for k, g in groups )
# Reformat the final list to match the output format
[list(chain.from_iterable(elem)) for elem in groups_sum]
Implementation (if you are a fan of one-liners)
[list(chain.from_iterable([k, imap(sum, islice(izip(*g), 2, None))]))
for k, g in groupby(sorted(l, key = itemgetter(0, 1)), key = itemgetter(0, 1))]
Sample Input
l = [[10,20,0.1,0.2,0.3,0.4],
[11,22,0.1,0.2,0.3,0.4],
[10,20,0.1,0.2,0.3,0.4],
[11,22,0.1,0.2,0.3,0.4],
[20,30,0.1,0.2,0.3,0.4],
[10,20,0.1,0.2,0.3,0.4]]
Sample Output
[[10, 20, 0.3, 0.6, 0.9, 1.2],
[11, 22, 0.2, 0.4, 0.6, 0.8],
[20, 30, 0.1, 0.2, 0.3, 0.4]]
Dissection
groups = groupby(sorted(l, key = itemgetter(0, 1)), key = itemgetter(0, 1))
# After grouping, similar lists get clustered together
[((10, 20),
[[10, 20, 0.1, 0.2, 0.3, 0.4],
[10, 20, 0.1, 0.2, 0.3, 0.4],
[10, 20, 0.1, 0.2, 0.3, 0.4]]),
((11, 22), [[11, 22, 0.1, 0.2, 0.3, 0.4], [11, 22, 0.1, 0.2, 0.3, 0.4]]),
((20, 30), [[20, 30, 0.1, 0.2, 0.3, 0.4]])]
groups_sum = ([k, imap(sum, islice(izip(*g), 2, None))] for k, g in groups )
# Each group is accumulated from the second element onwards
[[(10, 20), [0.3, 0.6, 0.9, 1.2]],
[(11, 22), [0.2, 0.4, 0.6, 0.8]],
[(20, 30), [0.1, 0.2, 0.3, 0.4]]]
[list(chain.from_iterable(elem)) for elem in groups_sum]
# Now it's just a matter of representing in the output format
[[10, 20, 0.3, 0.6, 0.9, 1.2],
[11, 22, 0.2, 0.4, 0.6, 0.8],
[20, 30, 0.1, 0.2, 0.3, 0.4]]
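For Python 3, where izip and imap no longer exist, the same sort, group and sum pipeline might be sketched as follows. The function name merge_by_key is hypothetical, and the generator-based slicing of each row stands in for the islice call in the original.

```python
from itertools import groupby
from operator import itemgetter

def merge_by_key(rows):
    """Group rows by their first two elements and sum the remaining columns."""
    key = itemgetter(0, 1)
    merged = []
    # groupby only groups adjacent items, so sort by the same key first.
    for k, group in groupby(sorted(rows, key=key), key=key):
        # Transpose the tails of the grouped rows and sum each column.
        sums = [sum(col) for col in zip(*(row[2:] for row in group))]
        merged.append(list(k) + sums)
    return merged

l = [[10, 20, 0.1, 0.2, 0.3, 0.4],
     [11, 22, 0.1, 0.2, 0.3, 0.4],
     [10, 20, 0.1, 0.2, 0.3, 0.4],
     [11, 22, 0.1, 0.2, 0.3, 0.4],
     [20, 30, 0.1, 0.2, 0.3, 0.4],
     [10, 20, 0.1, 0.2, 0.3, 0.4]]

result = merge_by_key(l)
print(result)
```

The sums are subject to ordinary floating-point rounding, so values such as 0.3 may print with a long tail of digits.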

Get information out of sub-lists in main list elegantly

Ok, so here's my issue. I have a list composed of N sub-lists composed of M elements (floats) each. So in a general form it looks like this:
a_list = [b_list_1, b_list_2, ..., b_list_N]
with:
b_list_i = [c_float_1, c_float_2, ..., c_float_M]
For this example assume N=9 ; M=3, so the list looks like this:
a = [[1.1, 0.5, 0.7], [0.3, 1.4, 0.2], [0.6, 0.2, 1.], [1.1, 0.5, 0.3], [0.2, 1.1, 0.8], [1.1, 0.5, 1.], [1.2, 0.3, 0.6], [0.6, 0.4, 0.9], [0.6, 0.2, 0.5]]
I need to loop through this list and treat items that share the same first two floats as a single item, whose third float should be averaged before storing. This means I should check whether an item was already identified as a repeat, so that I do not store it again as a new item.
To give a more clear idea of what I mean, this is what the output of processing list a should look like:
a_processed = [[1.1, 0.5, 0.67], [0.3, 1.4, 0.2], [0.6, 0.2, 0.75], [0.2, 1.1, 0.8], [1.2, 0.3, 0.6], [0.6, 0.4, 0.9]]
Note that the first item in this new list was identified three times in a (a[0], a[3] and a[5]) and so it was stored with its third float averaged ((0.7+0.3+1.)/3. = 0.67). The second item was not repeated in a so it was stored as is. The third item was found twice in a (a[2] and a[8]) and stored with its third float averaged ((1.+0.5)/2.=0.75). The rest of the items in the new list were not found as repeated in a so they were also stored with no modifications.
Since I know updating/modifying a list while looping through it is not recommended, I opted to use several temporary lists. This is the code I came up with:
import numpy as np
a = [[1.1, 0.5, 0.7], [0.3, 1.4, 0.2], [0.6, 0.2, 1.], [1.1, 0.5, 0.3],
[0.2, 1.1, 0.8], [1.1, 0.5, 1.], [1.2, 0.3, 0.6], [0.6, 0.4, 0.9],
[0.6, 0.2, 0.5]]
# Final list.
a_processed = []
# Holds indexes of elements to skip.
skip_elem = []
# Loop through all items in a.
for indx, elem in enumerate(a):
    temp_average = []
    temp_average.append(elem)
    # Only process if not found previously.
    if indx not in skip_elem:
        for indx2, elem2 in enumerate(a[(indx+1):]):
            if elem[0] == elem2[0] and elem[1] == elem2[1]:
                temp_average.append(elem2)
                skip_elem.append(indx2+indx+1)
        # Store 1st and 2nd floats and averaged 3rd float.
        a_processed.append([temp_average[0][0], temp_average[0][1],
                            round(np.mean([i[2] for i in temp_average]),2)])
This code works, but I'm wondering if there might be a more elegant/pythonic way of doing this. It just looks too convoluted (Fortran-esque I'd say) as is.
I think you can certainly make your code more concise and easier to read by using defaultdict to create a dictionary from the first two elements in each sublist to all the third items:
from collections import defaultdict
nums = defaultdict(list)
for arr in a:
    key = tuple(arr[:2]) # make the first two floats the key
    nums[key].append( arr[2] ) # append the third float for the given key
a_processed = [[k[0], k[1], sum(vals)/len(vals)] for k, vals in nums.items()]
Using this, I get the same output as you (albeit in a different order):
[[0.2, 1.1, 0.8], [1.2, 0.3, 0.6], [0.3, 1.4, 0.2], [0.6, 0.4, 0.9], [1.1, 0.5, 0.6666666666666666], [0.6, 0.2, 0.75]]
If the order of a_processed is an issue, you can use an OrderedDict, as pointed out by @DSM. On Python 3.7 and later, plain dicts (and defaultdict) already preserve insertion order.
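A minimal sketch of an order-preserving version that also applies the rounding from the question is given below. It assumes Python 3.7 or later, where a plain dict keeps insertion order; the function name average_third and the ndigits parameter are hypothetical.

```python
def average_third(rows, ndigits=2):
    """Group rows by their first two floats and average the third, keeping first-seen order."""
    groups = {}
    for first, second, third in rows:
        groups.setdefault((first, second), []).append(third)
    return [[k[0], k[1], round(sum(v) / len(v), ndigits)] for k, v in groups.items()]

a = [[1.1, 0.5, 0.7], [0.3, 1.4, 0.2], [0.6, 0.2, 1.], [1.1, 0.5, 0.3],
     [0.2, 1.1, 0.8], [1.1, 0.5, 1.], [1.2, 0.3, 0.6], [0.6, 0.4, 0.9],
     [0.6, 0.2, 0.5]]

a_processed = average_third(a)
print(a_processed)
```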
For comparison, here's the pandas approach. If this is really a data processing problem behind the scenes, then you can save yourself a lot of time that way.
>>> a
[[1.1, 0.5, 0.7], [0.3, 1.4, 0.2], [0.6, 0.2, 1.0], [1.1, 0.5, 0.3], [0.2, 1.1, 0.8], [1.1, 0.5, 1.0], [1.2, 0.3, 0.6], [0.6, 0.4, 0.9], [0.6, 0.2, 0.5]]
>>> import pandas as pd
>>> df = pd.DataFrame(a)
>>> df.groupby([0,1]).mean()
                2
0   1
0.2 1.1  0.800000
0.3 1.4  0.200000
0.6 0.2  0.750000
    0.4  0.900000
1.1 0.5  0.666667
1.2 0.3  0.600000
This problem is common enough that it's a one-liner. You can use named columns, compute a host of other useful statistics, handle missing data, etc.
