Given the following numpy arrays:
import numpy
a = numpy.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
b = numpy.array([[2, 2, 2], [2, 2, 2], [2, 2, 2]])
c = numpy.array([[3, 3, 3], [3, 3, 3], [3, 3, 3]])
and this dictionary containing them all:
mydict = {0: a, 1: b, 2: c}
What is the most efficient way of iterating through mydict so as to compute the average numpy array, which has (1+2+3)/3 = 2 as each of its values?
My attempt fails because I am giving it too many values to unpack. It is also extremely inefficient, as the three nested loops give it O(n^3) time complexity:
aver = numpy.empty([a.shape[0], a.shape[1]])
for c, v in mydict.values():  # <- too many values to unpack (raised here)
    for i in range(0, a.shape[0]):
        for j in range(0, a.shape[1]):
            aver[i][j] = mydict[c][i][j]
The final result should be:
In [17]: aver
Out[17]:
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
EDIT
I am not looking for an average value for each numpy array. I am looking for an average value for each element across my collection of numpy arrays. This is a minimal example; the real thing I am working on has over 120,000 elements per array, and for the same position the values change from array to array.
I think you're making this harder than it needs to be. Either sum them and divide by the number of terms:
In [42]: v = mydict.values()
In [43]: sum(v) / len(v)
Out[43]:
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
Or stack them into one big array -- which it sounds like is the format they probably should have been in to start with -- and take the mean over the stacked axis:
In [44]: np.array(list(v)).mean(axis=0)
Out[44]:
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
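If you'd rather make the new axis explicit, np.stack does the same job. A minimal sketch, assuming NumPy 1.10+ (where np.stack was added):
np.stack(list(v)).mean(axis=0)   # same result: a 3x3 array of 2.0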
You really shouldn't be using a dict of numpy arrays. Just use a multi-dimensional array:
>>> bigarray = numpy.array([arr.tolist() for arr in mydict.values()])
>>> bigarray
array([[[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]],

       [[2, 2, 2],
        [2, 2, 2],
        [2, 2, 2]],

       [[3, 3, 3],
        [3, 3, 3],
        [3, 3, 3]]])
>>> bigarray.mean(axis=0)
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
You should modify your code to not even work with a dict. Especially not a dict with integer keys...
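For instance, here is a minimal sketch that reuses the question's a, b and c and skips the dict entirely (again assuming NumPy 1.10+ for numpy.stack):
collection = numpy.stack([a, b, c])   # one (n_arrays, rows, cols) array, shape (3, 3, 3)
aver = collection.mean(axis=0)        # the same 3x3 array of 2.0 as above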
Related
How can I stack the elements from the same respective index from each array in a list of arrays?
arrays = [np.array([1, 2, 3, 4, 5]),
          np.array([6, 7, 8, 9]),
          np.array([11, 22, 33, 44, 55]),
          np.array([2, 4])]
output = [[1, 6, 11, 2],
          [2, 7, 22, 4],
          [3, 8, 33],
          [4, 9, 44],
          [5, 55]]
arrays is a list of arrays of uneven lengths. The first array in output (I don't mind if it's a list too) contains every available index 0 from each array. The next array within output contains every available index 1, and so on...
The closest thing I can find (but it requires same-shape arrays) is:
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
np.stack((a, b), axis=-1)
# which gives
array([[1, 2],
       [2, 3],
       [3, 4]])
Thanks.
This gets you close. You can't really have a ragged 2D array like the one in your example output, so the missing entries end up padded (with zeros here).
import numpy as np
arrays = [np.array([1, 2, 3, 4, 5]),
          np.array([6, 7, 8, 9]),
          np.array([11, 22, 33, 44, 55]),
          np.array([2, 4])]
maxx = max(x.shape[0] for x in arrays)
for x in arrays:
    # note: resize() pads with zeros and mutates each array in place
    x.resize(maxx, refcheck=False)
output = np.stack(arrays, axis=1)
print(output)
C:\tmp>python x.py
[[ 1  6 11  2]
 [ 2  7 22  4]
 [ 3  8 33  0]
 [ 4  9 44  0]
 [ 5  0 55  0]]
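If mutating the inputs is a problem (resize above pads the original arrays in place), here is an alternative sketch that pads a fresh array with NaN instead, assuming the original un-resized arrays:
maxx = max(x.shape[0] for x in arrays)
padded = np.full((len(arrays), maxx), np.nan)   # NaN padding forces a float dtype
for i, x in enumerate(arrays):
    padded[i, :x.shape[0]] = x
output = padded.T   # row k of output holds every available index-k element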
You could just wrap it in a DataFrame first:
import pandas as pd
arr = pd.DataFrame(arrays).values.T
Output:
array([[  1.,   6.,  11.,   2.],
       [  2.,   7.,  22.,   4.],
       [  3.,   8.,  33.,  nan],
       [  4.,   9.,  44.,  nan],
       [  5.,  nan,  55.,  nan]])
Though if you really want it with different sizes, go with:
arr = [x.dropna().values for _, x in pd.DataFrame(arrays).items()]  # older pandas spelled this .iteritems()
Output:
[array([ 1,  6, 11,  2]),
 array([ 2,  7, 22,  4]),
 array([ 3.,  8., 33.]),
 array([ 4.,  9., 44.]),
 array([ 5., 55.])]
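If you'd rather avoid the pandas dependency altogether, a plain list-comprehension sketch gives the same ragged result (and keeps integer dtypes, since nothing is NaN-padded):
longest = max(len(x) for x in arrays)
output = [np.array([x[i] for x in arrays if i < len(x)])
          for i in range(longest)]
# [array([ 1,  6, 11,  2]), array([ 2,  7, 22,  4]), array([ 3,  8, 33]), ...]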
I want to create a 2D matrix b from an array a, where a contains the range-stop value for each column of the matrix.
For example, with a = [2,3], I want to obtain
b = [[0, 0],
     [1, 1],
     [2, 2],
     [NaN, 3]]
What's the most efficient way (for vectorized calculation) to do it? My current code is:
a = [2, 3]
b = np.zeros((max(a) + 1, len(a)))
b.fill(np.nan)
for i, ai in enumerate(a):
    b[:ai + 1, i] = np.arange(ai + 1)   # stop values are inclusive, per the example above
You can first create the 2D arange using repeat:
a = np.asarray([2, 3])
b = np.repeat(np.arange(np.max(a) + 1, dtype=float)[:, None], len(a), axis=1)
# array([[0., 0.],
#        [1., 1.],
#        [2., 2.],
#        [3., 3.]])
and then compare each column with a to fill in np.nan:
b[b > a] = np.nan
# array([[ 0.,  0.],
#        [ 1.,  1.],
#        [ 2.,  2.],
#        [nan,  3.]])
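An equivalent sketch that skips the repeat and lets broadcasting do the copying inside np.where:
a = np.asarray([2, 3])
rows = np.arange(a.max() + 1, dtype=float)[:, None]  # column of row indices, shape (4, 1)
b = np.where(rows <= a, rows, np.nan)                # broadcasts against a to shape (4, 2)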
I'm confused about which dim refers to which actual dimension in TensorFlow in general, but concretely here, when using tf.metrics.mean_cosine_distance.
Given
x = [
    [1, 2, 3, 4, 5],
    [0, 2, 3, 4, 5],
]
I'd like to calculate the distance column-wise. In other words, which dimension resolves to (pseudocode):
mean([
    cosine_distance(x[0][0], x[1][0]),
    cosine_distance(x[0][1], x[1][1]),
    cosine_distance(x[0][2], x[1][2]),
    cosine_distance(x[0][3], x[1][3]),
    cosine_distance(x[0][4], x[1][4]),
])
It is along dim 0 for your input x. It's intuitive to see this once you construct your input x as a numpy array.
In [49]: x_arr = np.array(x, dtype=np.float32)
In [50]: x_arr
Out[50]:
array([[ 1.,  2.,  3.,  4.,  5.],
       [ 0.,  2.,  3.,  4.,  5.]], dtype=float32)
# compute (mean) cosine distance between `x[0]` & `x[1]`
# where `x[0]` can be considered as `labels`
# while `x[1]` can be considered as `predictions`
In [51]: cosine_dist_axis0 = tf.metrics.mean_cosine_distance(x_arr[0], x_arr[1], 0)
This dim corresponds to the name axis in NumPy terminology. For example, a simple sum operation can be done along axis 0 like:
In [52]: x_arr
Out[52]:
array([[ 1.,  2.,  3.,  4.,  5.],
       [ 0.,  2.,  3.,  4.,  5.]], dtype=float32)
In [53]: np.sum(x_arr, axis=0)
Out[53]: array([ 1., 4., 6., 8., 10.], dtype=float32)
When you compute tf.metrics.mean_cosine_distance, you're essentially computing the cosine distance between the vectors labels and predictions along dim 0 (and then taking the mean), provided your inputs are of shape (n,), where n is the length of each vector (i.e. the number of entries in labels/predictions).
But if you're passing the labels and predictions as column vectors, then tf.metrics.mean_cosine_distance has to be calculated along dim 1.
Example:
If your input label and prediction are column vectors,
# if your `label` is a column vector
In [66]: (x_arr[0])[:, None]
Out[66]:
array([[ 1.],
       [ 2.],
       [ 3.],
       [ 4.],
       [ 5.]], dtype=float32)
# if your `prediction` is a column vector
In [67]: (x_arr[1])[:, None]
Out[67]:
array([[ 0.],
       [ 2.],
       [ 3.],
       [ 4.],
       [ 5.]], dtype=float32)
Then tf.metrics.mean_cosine_distance has to be computed along dim 1:
# inputs
In [68]: labels = (x_arr[0])[:, None]
In [69]: predictions = (x_arr[1])[:, None]
# compute mean cosine distance between them
In [70]: cosine_dist_dim1 = tf.metrics.mean_cosine_distance(labels, predictions, 1)
This tf.metrics.mean_cosine_distance is more or less doing the same thing as scipy.spatial.distance.cosine, but it also takes the mean.
For your example case:
In [77]: x
Out[77]: [[1, 2, 3, 4, 5], [0, 2, 3, 4, 5]]
In [78]: from scipy.spatial import distance
In [79]: distance.cosine(x[0], x[1])
Out[79]: 0.009132
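For reference, a minimal NumPy-only check of that number (nothing TF-specific assumed):
import numpy as np
u = np.array([1, 2, 3, 4, 5], dtype=float)
v = np.array([0, 2, 3, 4, 5], dtype=float)
# cosine distance = 1 - cosine similarity
print(1 - u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v)))   # ~0.0091327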
I'm making a program where I need to make a matrix looking like this:
A = np.array([[ 1.,  2.,  3.],
              [ 1.,  2.,  3.],
              [ 1.,  2.,  3.],
              [ 1.,  2.,  3.]])
So I started thinking about np.arange(1,4). But how do I append n rows of np.arange(1,4) to build A?
As mentioned in the docs, you can use concatenate:
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
       [3, 4, 6]])
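Applied to the question, a sketch (the 4 here is just the desired number of rows):
>>> row = np.arange(1, 4)[None, :]   # shape (1, 3)
>>> np.concatenate([row] * 4, axis=0)
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])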
Here's another way, using broadcasting:
In [69]: np.arange(1,4)*np.ones((4,1))
Out[69]:
array([[ 1.,  2.,  3.],
       [ 1.,  2.,  3.],
       [ 1.,  2.,  3.],
       [ 1.,  2.,  3.]])
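A related sketch: np.broadcast_to (NumPy 1.10+) produces the same broadcast result as a read-only view, without materializing the copies:
np.broadcast_to(np.arange(1, 4), (4, 3))   # read-only view, no copies allocated
# array([[1, 2, 3],
#        [1, 2, 3],
#        [1, 2, 3],
#        [1, 2, 3]])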
You can get something like what you typed in your question with:
N = 3
A = np.tile(np.arange(1, N+1), (N, 1))
I'm assuming you want a square array?
>>> np.repeat([np.arange(1, 4)], 4, 0)
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
In NumPy, how can you efficiently make a 1-D object into a 2-D object where the singleton dimension is inferred from the current object (i.e. a list should go to either a 1-by-length or a length-by-1 vector)?
# This comes from some other, unchangeable code that reads data files.
my_list = [1,2,3,4]
# What I want to do:
my_numpy_array[some_index,:] = numpy.asarray(my_list)
# The above doesn't work because of a broadcast error, so:
my_numpy_array[some_index,:] = numpy.reshape(numpy.asarray(my_list),(1,len(my_list)))
# How to do the above without the call to reshape?
# Is there a way to directly convert a list, or vector, that doesn't have a
# second dimension, into a 1 by length "array" (but really it's still a vector)?
In the most general case, the easiest way to add extra dimensions to an array is by using the keyword None when indexing at the position where you want the extra dimension. For example:
my_array = numpy.array([1,2,3,4])
my_array[None, :] # shape 1x4
my_array[:, None] # shape 4x1
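For readability, None here is the very same object as numpy.newaxis, so an equivalent sketch is:
my_array[numpy.newaxis, :]   # shape 1x4, identical to my_array[None, :]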
Why not simply add square brackets?
>>> my_list
[1, 2, 3, 4]
>>> numpy.asarray([my_list])
array([[1, 2, 3, 4]])
>>> numpy.asarray([my_list]).shape
(1, 4)
.. wait, on second thought, why is your slice assignment failing? It shouldn't:
>>> my_list = [1,2,3,4]
>>> d = numpy.ones((3,4))
>>> d
array([[ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.]])
>>> d[0,:] = my_list
>>> d[1,:] = numpy.asarray(my_list)
>>> d[2,:] = numpy.asarray([my_list])
>>> d
array([[ 1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.]])
even:
>>> d[1,:] = (3*numpy.asarray(my_list)).T
>>> d
array([[  1.,   2.,   3.,   4.],
       [  3.,   6.,   9.,  12.],
       [  1.,   2.,   3.,   4.]])
import numpy as np
a = np.random.random(10)
# np.atleast_2d promotes `a` to shape (1, 10); `idx` is whatever index you need
sel = np.atleast_2d(a)[idx]
What about expand_dims?
np.expand_dims(np.array([1,2,3,4]), 0)
has shape (1,4) while
np.expand_dims(np.array([1,2,3,4]), 1)
has shape (4,1).
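reshape with -1 is another common spelling of the same idea; a small sketch (-1 tells NumPy to infer that dimension):
np.array([1, 2, 3, 4]).reshape(1, -1)   # shape (1, 4)
np.array([1, 2, 3, 4]).reshape(-1, 1)   # shape (4, 1)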
You can always use dstack() to replicate your array:
import numpy
my_list = numpy.array([1, 2, 3, 4])
my_list_2D = numpy.dstack((my_list, my_list))   # note: the result has shape (1, 4, 2)