Computing mean for non-unique elements of numpy array pairs - python

I have three arrays, all of the same size:
arr1 = np.array([1.4, 3.0, 4.0, 4.0, 7.0, 9.0, 9.0, 9.0])
arr2 = np.array([2.3, 5.0, 2.3, 2.3, 4.0, 6.0, 5.0, 6.0])
data = np.array([5.4, 7.1, 9.5, 1.9, 8.7, 1.8, 6.1, 7.4])
arr1 can take any float value and arr2 only a few float values. I want to obtain the unique pairs of arr1 and arr2, e.g.
arr1unique = np.array([1.4, 3.0, 4.0, 7.0, 9.0, 9.0])
arr2unique = np.array([2.3, 5.0, 2.3, 4.0, 6.0, 5.0])
For each non-unique pair I need to average the corresponding elements in the data array, e.g. averaging the values 9.5 and 1.9, since the pairs (arr1[2], arr2[2]) and (arr1[3], arr2[3]) are equal. The same holds for the values in data at indices 5 and 7. The data array therefore becomes
dataunique = np.array([5.4, 7.1, 5.7, 8.7, 4.6, 6.1])

Here is a 'pure numpy' solution to the problem. Pure numpy in quotes because it relies on a numpy enhancement proposal which I am still working on, but you can find the full code here:
http://pastebin.com/c5WLWPbp
group_by((arr1, arr2)).mean(data)
Voila, problem solved. Way faster than any of the posted solutions, and much more elegant too, if I may say so myself ;).
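The linked proposal isn't part of numpy itself, but a similar group-by mean can be sketched with released numpy features only (np.unique with axis=0 needs numpy >= 1.13; note that the unique pairs come back lexicographically sorted rather than in first-appearance order):

```python
import numpy as np

arr1 = np.array([1.4, 3.0, 4.0, 4.0, 7.0, 9.0, 9.0, 9.0])
arr2 = np.array([2.3, 5.0, 2.3, 2.3, 4.0, 6.0, 5.0, 6.0])
data = np.array([5.4, 7.1, 9.5, 1.9, 8.7, 1.8, 6.1, 7.4])

# Treat each (arr1[i], arr2[i]) pair as one row and find the unique rows;
# `inverse` maps every original row to its group index.
pairs = np.stack([arr1, arr2], axis=1)
unique_pairs, inverse = np.unique(pairs, axis=0, return_inverse=True)
inverse = inverse.ravel()  # inverse's shape varies across numpy versions

# Per-group mean: sum of data per group divided by the group size.
dataunique = np.bincount(inverse, weights=data) / np.bincount(inverse)

arr1unique, arr2unique = unique_pairs[:, 0], unique_pairs[:, 1]
```

Because of the sorting, the groups here come out as (1.4, 2.3), (3.0, 5.0), (4.0, 2.3), (7.0, 4.0), (9.0, 5.0), (9.0, 6.0).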

defaultdict can help you here:
>>> import numpy as np
>>> from collections import defaultdict
>>> arr1 = np.array([1.4, 3.0, 4.0, 4.0, 7.0, 9.0, 9.0, 9.0])
>>> arr2 = np.array([2.3, 5.0, 2.3, 2.3, 4.0, 6.0, 5.0, 6.0])
>>> data = np.array([5.4, 7.1, 9.5, 1.9, 8.7, 1.8, 6.1, 7.4])
>>> dd = defaultdict(list)
>>> for x1, x2, d in zip(arr1, arr2, data):
...     dd[x1, x2].append(d)
...
>>> arr1unique = np.array([k[0] for k in dd])
>>> arr2unique = np.array([k[1] for k in dd])
>>> dataunique = np.array([np.mean(v) for v in dd.values()])
>>> print(arr1unique)
[1.4 3.  4.  7.  9.  9. ]
>>> print(arr2unique)
[2.3 5.  2.3 4.  6.  5. ]
>>> print(dataunique)
[5.4 7.1 5.7 8.7 4.6 6.1]
On Python 3.7+ this also preserves the order of first appearance, since plain dicts are insertion-ordered. On older versions the key order is arbitrary; there, you can do basically the same thing with collections.OrderedDict if the ordering matters.

Make a dictionary with arr1 as the key and its equivalent arr2 as the value. For each pair saved to the dictionary, generate its dataunique entry; if the key already exists, skip that iteration and continue.

All you have to do is create an OrderedDict whose keys are the pairs of elements in (arr1, arr2) and whose values are lists of the corresponding elements in data. For any duplicate key (pair from arr1 and arr2), the duplicate entries are appended to the list. You can then traverse the values of the dictionary and take the average of each list. To get the unique keys, iterate over the keys and split the tuples.
Try the following:
>>> import collections
>>> d = collections.OrderedDict()
>>> for k1, k2, v in zip(arr1, arr2, data):
...     d.setdefault((k1, k2), []).append(v)
...
>>> np.array([np.mean(v) for v in d.values()])
array([5.4, 7.1, 5.7, 8.7, 4.6, 6.1])
>>> arr1unique = np.array([e[0] for e in d])
>>> arr2unique = np.array([e[1] for e in d])
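If pandas is an option, the whole grouping collapses to one call (a sketch, assuming pandas is installed; sort=False keeps the groups in first-appearance order, like the OrderedDict approach):

```python
import numpy as np
import pandas as pd

arr1 = np.array([1.4, 3.0, 4.0, 4.0, 7.0, 9.0, 9.0, 9.0])
arr2 = np.array([2.3, 5.0, 2.3, 2.3, 4.0, 6.0, 5.0, 6.0])
data = np.array([5.4, 7.1, 9.5, 1.9, 8.7, 1.8, 6.1, 7.4])

# Group on the (arr1, arr2) pair and average data within each group.
df = pd.DataFrame({"a1": arr1, "a2": arr2, "data": data})
out = df.groupby(["a1", "a2"], sort=False, as_index=False)["data"].mean()

arr1unique = out["a1"].to_numpy()
arr2unique = out["a2"].to_numpy()
dataunique = out["data"].to_numpy()
```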


Is there a way to add list elements so the total is closest to a given number

I am trying to add list elements so that the total is closest to, or equal to, 15.
Starting from a total of 0, the loop should add the 2nd element of each row, from top to bottom.
If adding an element would push the total over 15, that row should be skipped and the loop should continue with the next one.
I am trying the code below; could you suggest what I am doing wrong?
list1 = [
    [5.0, 1.3, 6.6, 5.076923076923077],
    [9.0, 1.5, 7.0, 4.666666666666667],
    [4.0, 1.0, 4.0, 4.0],
    [3.0, 2.0, 5.5, 2.75],
    [7.0, 1.6, 3.5, 2.1875],
    [2.0, 1.7, 3.5, 2.058823529411765],
    [1.0, 3.0, 6.0, 2.0],
    [6.0, 1.0, 2.0, 2.0],
    [8.0, 2.5, 5.0, 2.0],
    [10.0, 1.8, 1.0, 0.5555555555555556]
]
income = 15
total = 0
for i in list1:
    if not (total + i[1] > 15):
        total += i[1]
print(total)
the output should be 14.9
The problem is that you use a break. You have to check that adding the current number in your loop will not result in the total sum being more than 15.
income = 15
total = 0
for i in list1:
    if not (total + i[1] > income):
        total += i[1]
But this code will not always work: because the numbers can come in different orders, there may be a selection that adds up to exactly 15, and finding it is a bit more complicated (it is essentially the subset-sum problem).
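For completeness, that harder version (pick any subset of the numbers whose sum is as large as possible without exceeding the limit) can be brute-forced for a list this small; a sketch, exponential in the number of rows, using the i[1] column from the question:

```python
from itertools import combinations

values = [1.3, 1.5, 1.0, 2.0, 1.6, 1.7, 3.0, 1.0, 2.5, 1.8]  # the i[1] column
income = 15

# Try every subset and keep the largest total that stays within the limit.
best = 0.0
for r in range(1, len(values) + 1):
    for combo in combinations(values, r):
        s = sum(combo)
        if best < s <= income:
            best = s
```

With ten rows that is only 1023 subsets; for larger inputs you would want a dynamic-programming subset-sum instead. Here the best achievable total happens to match the greedy result, 14.9.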

How to pass argument of type char ** from Python to C API [duplicate]

As seen in How do I convert a Python list into a C array by using ctypes?, this code takes a Python list and turns it into a C array:
import ctypes
arr = (ctypes.c_int * len(pyarr))(*pyarr)
What would be the way of doing the same with a list of lists, or a list of lists of lists?
For example, for the following variable
list3d = [[[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]]]
I have tried the following with no luck:
([[ctypes.c_double * 4] *2]*3)(*list3d)
# *** TypeError: 'list' object is not callable
(ctypes.c_double * 4 *2 *3)(*list3d)
# *** TypeError: expected c_double_Array_4_Array_2 instance, got list
Thank you!
EDIT: Just to clarify, I am trying to get one object that contains the whole multidimensional array, not a list of objects. This object's reference will be an input to a C DLL that expects a 3D array.
It works with tuples if you don't mind doing a bit of conversion first:
from ctypes import *
list3d = [
    [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]],
    [[0.2, 1.2, 2.2, 3.2], [4.2, 5.2, 6.2, 7.2]],
    [[0.4, 1.4, 2.4, 3.4], [4.4, 5.4, 6.4, 7.4]],
]
arr = (c_double * 4 * 2 * 3)(*(tuple(tuple(j) for j in i) for i in list3d))
Check that it's initialized correctly in row-major order:
>>> (c_double * 24).from_buffer(arr)[:]
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0,
0.2, 1.2, 2.2, 3.2, 4.2, 5.2, 6.2, 7.2,
0.4, 1.4, 2.4, 3.4, 4.4, 5.4, 6.4, 7.4]
Or you can create an empty array and initialize it using a loop: enumerate over the rows and columns of the list and assign the data to a slice:
arr = (c_double * 4 * 2 * 3)()
for i, row in enumerate(list3d):
    for j, col in enumerate(row):
        arr[i][j][:] = col
I made the change accordingly:
import ctypes

a = [[[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]]]
arr = ((ctypes.c_float * len(a[0][0])) * len(a[0])) * len(a)
arr_instance = arr()
for i in range(len(a)):
    for j in range(len(a[0])):
        for k in range(len(a[0][0])):
            arr_instance[i][j][k] = a[i][j][k]
arr_instance is the object you want.
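If numpy is available, numpy.ctypeslib can build the ctypes object for you (a sketch; as_ctypes shares memory with the array, so keep the numpy array alive for as long as the DLL uses the pointer):

```python
import numpy as np

list3d = [
    [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0.0, 0.0]],
    [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0.0, 0.0]],
    [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0.0, 0.0]],
]

# One contiguous 3x2x4 block of doubles...
np_arr = np.array(list3d, dtype=np.float64)
# ...wrapped as a ctypes 3-D array that shares the same memory.
c_arr = np.ctypeslib.as_ctypes(np_arr)
```

c_arr can then be passed to a C function expecting a double[3][2][4].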

Convert an array stored as a string, to a proper numpy array

For long and tedious reasons, I have lots of arrays that are stored as strings:
tmp = '[[1.0, 3.0, 0.4]\n [3.0, 4.0, -1.0]\n [3.0, 4.0, 0.1]\n [3.0, 4.0, 0.2]]'
Now I obviously do not want my arrays as long strings, I want them as proper numpy arrays so I can use them. Consequently, what is a good way to convert the above to:
tmp_np = np.array([[1.0, 3.0, 0.4],
                   [3.0, 4.0, -1.0],
                   [3.0, 4.0, 0.1],
                   [3.0, 4.0, 0.2]])
such that I can do simple things like tmp_np.shape = (4,3) or simple indexing tmp_np[0,:] = [1.0, 3.0, 0.4] etc.
Thanks
You can use ast.literal_eval, if you replace the \n characters with ,:
import ast
import numpy as np

tmp_np = np.array(ast.literal_eval(tmp.replace('\n', ',')))
Returns:
>>> tmp_np
array([[ 1. , 3. , 0.4],
[ 3. , 4. , -1. ],
[ 3. , 4. , 0.1],
[ 3. , 4. , 0.2]])
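If some of the stored strings came from str() of a numpy array, they may have no commas at all, and literal_eval will fail on them. A regex-based sketch that handles both forms (it assumes every row has the same number of values):

```python
import re
import numpy as np

tmp = '[[1.0, 3.0, 0.4]\n [3.0, 4.0, -1.0]\n [3.0, 4.0, 0.1]\n [3.0, 4.0, 0.2]]'

# Grab every numeric token, then reshape using the row count
# (one '[' per row plus the single outer bracket).
nums = [float(t) for t in re.findall(r'-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?', tmp)]
rows = tmp.count('[') - 1
tmp_np = np.array(nums).reshape(rows, -1)
```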

How to merge the values of a list of lists and a list into one resulting list of lists

I have a list of lists (a) and a list (b) which have the same "length" (in this case "4"):
a = [
[1.0, 2.0],
[1.1, 2.1],
[1.2, 2.2],
[1.3, 2.3]
]
b = [3.0, 3.1, 3.2, 3.3]
I would like to merge the values to obtain the following (c):
c = [
[1.0, 2.0, 3.0],
[1.1, 2.1, 3.1],
[1.2, 2.2, 3.2],
[1.3, 2.3, 3.3]
]
currently I'm doing the following to achieve it:
c = []
for index, elem in enumerate(a):
    x = [a[index], [b[index]]]  # x assigned here for better readability
    c.append(sum(x, []))
my feeling is that there is an elegant way to do this...
note: the lists are a lot larger, for simplicity I shortened them. they are always(!) of the same length.
In Python 3.5+, use zip() within a list comprehension together with iterable unpacking (PEP 448):
In [7]: [[*j, i] for i, j in zip(b, a)]
Out[7]: [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [1.2, 2.2, 3.2], [1.3, 2.3, 3.3]]
In Python 2:
In [8]: [j+[i] for i, j in zip(b, a)]
Out[8]: [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [1.2, 2.2, 3.2], [1.3, 2.3, 3.3]]
Or use numpy.column_stack:
In [16]: import numpy as np
In [17]: np.column_stack((a, b))
Out[17]:
array([[ 1. , 2. , 3. ],
[ 1.1, 2.1, 3.1],
[ 1.2, 2.2, 3.2],
[ 1.3, 2.3, 3.3]])

Return first non NaN value in python list

What would be the best way to return the first non-NaN value from this list?
testList = [nan, nan, 5.5, 5.0, 5.0, 5.5, 6.0, 6.5]
edit:
nan is a float
You can use next, a generator expression, and math.isnan:
>>> from math import isnan
>>> testList = [float('nan'), float('nan'), 5.5, 5.0, 5.0, 5.5, 6.0, 6.5]
>>> next(x for x in testList if not isnan(x))
5.5
>>>
It would be very easy if you were using NumPy:
array[numpy.isfinite(array)][0]
... returns the first finite (non-NaN and non-inf) value in the NumPy array 'array'.
If you're doing it a lot, put it into a function to make it readable and easy:
import math
t = [float('nan'), float('nan'), 5.5, 5.0, 5.0, 5.5, 6.0, 6.5]
def firstNonNan(listfloats):
    for item in listfloats:
        if not math.isnan(item):
            return item

firstNonNan(t)  # 5.5
A one-liner with lambda and filter:
from math import isnan
lst = [float('nan'), float('nan'), 5.5, 5.0, 5.0, 5.5, 6.0, 6.5]
First non-NaN value:
next(filter(lambda x: not isnan(x), lst))
5.5
Index of the first non-NaN value:
lst.index(next(filter(lambda x: not isnan(x), lst)))
2
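One caveat with all the next()-based versions: if every entry is NaN, next() raises StopIteration. Passing a default value to next() avoids that (a small sketch):

```python
from math import isnan, nan

def first_non_nan(values, default=None):
    """Return the first non-NaN value in `values`, or `default` if none exists."""
    return next((x for x in values if not isnan(x)), default)
```

For example, first_non_nan([nan, nan, 5.5]) gives 5.5, while first_non_nan([nan, nan]) quietly gives None instead of raising.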
