Convert list to np.arrays efficient - python

I am loading a dataset in my python code with contains two matrices. The name of those matrices are train_dataset_face and train_dataset_audio and the way that I am reading them is as list of np.arrays. Finally I convert them as np.arrrays of np.arrays. Initially my matrices during the debug look like that:
and
Then I convert them into np.arrays using the following code:
train_dataset_face = np.array(train_dataset_face)
train_dataset_audio = np.array(train_dataset_audio)
And in the end my matrices look like:
For some weird reason in the case of train_dataset_face I got this array indication before each vector of my array while in the case of train_dataset_audio i dont have it. Is it possible to remove it? This "array" indication cause me problems when I am trying to apply several algorithms to the train_dataset_face. Any idea what happened here?

You can only create a single array, if all arrays in your list has the same shape, which is true for train_dataset_audio but not for train_dataset_face.
>>> a = [numpy.array([1,2,3,4]), numpy.array([1,2,3,4])]
>>> numpy.array(a)
array([[1, 2, 3, 4],
[1, 2, 3, 4]])
>>> b = [numpy.array([1,2,3]), numpy.array([1,2,3,4])]
>>> numpy.array(b)
array([array([1, 2, 3]), array([1, 2, 3, 4])], dtype=object)

Related

Efficiently slicing an array using indexes from another array

(I apologize in advance if this is a duplicate question, though I looked at many similar questions on SO but didn't find a matching solution)
Suppose you have an array
A = np.array([
[0, 1, 2],
[3, 4, 5],
[6, 7, 8]
])
and another array
I = np.array([1, 1, 2])
For each row in A, I want to get the i-th element of it, where i is the row-th element of I.
In this case, the output I'd like to have is array([1, 4, 8]).
My most intuitive attempt to do so is:
A[:, I]
then I figured that the desired output is actually the diagonal of it, so A[:, I].diagonal() would do the trick.
But it feels that there's some waste of space and time by doing this way, because it requires an intermediate "big" matrix, which diagonal will be extracted from.
Is there a more efficient to perform this slicing?
This would do the trick:
res = A[np.arange(A.shape[0]), I]

Rationale for numpy.split returning a list and not an array

I was surprised that numpy.split yields a list and not an array. I would have thought it would be better to return an array, since numpy has put a lot of work into making arrays more useful than lists. Can anyone justify numpy returning a list instead of an array? Why would that be a better programming decision for the numpy developers to have made?
A comment pointed out that if the slit is uneven, the result can't be a array, at least not one that has the same dtype. At best it would be an object dtype.
But lets consider the case of equal length subarrays:
In [124]: x = np.arange(10)
In [125]: np.split(x,2)
Out[125]: [array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9])]
In [126]: np.array(_) # make an array from that
Out[126]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
But we can get the same array without split - just reshape:
In [127]: x.reshape(2,-1)
Out[127]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
Now look at the code for split. It just passes the task to array_split. Ignoring the details about alternative axes, it just does
sub_arys = []
for i in range(Nsections):
# st and end from `div_points
sub_arys.append(sary[st:end])
return sub_arys
In other words, it just steps through array and returns successive slices. Those (often) are views of the original.
So split is not that sophisticate a function. You could generate such a list of subarrays yourself without a lot of numpy expertise.
Another point. Documentation notes that split can be reversed with an appropriate stack. concatenate (and family) takes a list of arrays. If give an array of arrays, or a higher dim array, it effectively iterates on the first dimension, e.g. concatenate(arr) => concatenate(list(arr)).
Actually you are right it returns a list
import numpy as np
a=np.random.randint(1,30,(2,2))
b=np.hsplit(a,2)
type(b)
it will return type(b) as list so, there is nothing wrong in the documentation, i also first thought that the documentation is wrong it doesn't return a array, but when i checked
type(b[0])
type(b[1])
it returned type as ndarray.
it means it returns a list of ndarrary's.

Numpy: How to stack arrays in columns?

Let's say that I have n numpy arrays of the same length. I would like to now create a numpy matrix, sucht that each column of the matrix is one of the numpy arrays. How can I achieve this? Now I'm doing this in a loop and it produces the wrong results.
Note: I have to be able to stack them next to each other one by one iteratively.
my code looks like assume that get_array is a function that returns a certain array based on its argument. I don't know until after the loop, how many columns that I'm going to have.
matrix = np.empty((n_rows,))
for item in sorted_arrays:
array = get_array(item)
matrix = np.vstack((matrix,array))
any help would be appreciated
You could try putting all your arrays (or lists) into a matrix and then transposing it. This will work if all arrays are the same length.
mymatrix = np.asmatrix((array1, array2, array3)) #... putting arrays into matrix.
mymatrix = mymatrix.transpose()
This should output a matrix with each array as a column. Hope this helps.
Time and again, we recommend collecting the arrays in a list, and making the final array with one call. That's more efficient, and usually easier to get right.
alist = []
for item in sorted_arrays:
alist.append(get_array(item)
or
alist = [get_array(item) for item in sorted_arrays]
There are various ways of assembling the list. Since you want columns, and assuming get_array produces equal sized 1d arrays:
arr = np.column_stack(alist)
Collecting them in rows and transposing that works too:
arr = np.array(alist).T
arr = np.vstack(alist).T
arr = np.stack(alist).T
arr = np.stack(alist, axis=1)
If the arrays are already 2d
arr = np.concatenate(alist, axis=1)
All the stack variations use concatenate, just varying in how they tweak the shape(s) of the input arrays. The key to using concatenate is to understand the dimensions and shapes, and how to add dimensions as needed. That should, soon or later, become fluent in that kind of coding.
If they vary in shape or dimensions, things get messier.
Equally good is to put the arrays in a pre-allocated array. But you need to know the desired final shape
arr = np.zeros((m,n), dtype)
for i, item in enumerate(sorted_arrays):
arr[:,i] = get_array(item)
n is len(sorted_arrays), and m is the length of one of get_array(item). You also need to know the expected dtype (int, float etc).
If you have a, b, c, d np array of same length, the following code will accomplish what you want:
out_matrix = np.vstack([a, b, c, d]).transpose()
An example:
In [3]: a = np.array([1, 2, 3, 4])
In [4]: b = np.array([5, 6, 7, 8])
In [5]: c = np.array([2, 3, 4, 5])
In [6]: d = np.array([6, 8, 2, 4])
In [10]: np.vstack([a, b, c, d]).transpose()
Out[10]:
array([[1, 5, 2, 6],
[2, 6, 3, 8],
[3, 7, 4, 2],
[4, 8, 5, 4]])

numpy array TypeError: only integer scalar arrays can be converted to a scalar index

i=np.arange(1,4,dtype=np.int)
a=np.arange(9).reshape(3,3)
and
a
>>>array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
a[:,0:1]
>>>array([[0],
[3],
[6]])
a[:,0:2]
>>>array([[0, 1],
[3, 4],
[6, 7]])
a[:,0:3]
>>>array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Now I want to vectorize the array to print them all together. I try
a[:,0:i]
or
a[:,0:i[:,None]]
It gives TypeError: only integer scalar arrays can be converted to a scalar index
Short answer:
[a[:,:j] for j in i]
What you are trying to do is not a vectorizable operation. Wikipedia defines vectorization as a batch operation on a single array, instead of on individual scalars:
In computer science, array programming languages (also known as vector or multidimensional languages) generalize operations on scalars to apply transparently to vectors, matrices, and higher-dimensional arrays.
...
... an operation that operates on entire arrays can be called a vectorized operation...
In terms of CPU-level optimization, the definition of vectorization is:
"Vectorization" (simplified) is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.
The problem with your case is that the result of each individual operation has a different shape: (3, 1), (3, 2) and (3, 3). They can not form the output of a single vectorized operation, because the output has to be one contiguous array. Of course, it can contain (3, 1), (3, 2) and (3, 3) arrays inside of it (as views), but that's what your original array a already does.
What you're really looking for is just a single expression that computes all of them:
[a[:,:j] for j in i]
... but it's not vectorized in a sense of performance optimization. Under the hood it's plain old for loop that computes each item one by one.
I ran into the problem when venturing to use numpy.concatenate to emulate a C++ like pushback for 2D-vectors; If A and B are two 2D numpy.arrays, then numpy.concatenate(A,B) yields the error.
The fix was to simply to add the missing brackets: numpy.concatenate( ( A,B ) ), which are required because the arrays to be concatenated constitute to a single argument
This could be unrelated to this specific problem, but I ran into a similar issue where I used NumPy indexing on a Python list and got the same exact error message:
# incorrect
weights = list(range(1, 129)) + list(range(128, 0, -1))
mapped_image = weights[image[:, :, band]] # image.shape = [800, 600, 3]
# TypeError: only integer scalar arrays can be converted to a scalar index
It turns out I needed to turn weights, a 1D Python list, into a NumPy array before I could apply multi-dimensional NumPy indexing. The code below works:
# correct
weights = np.array(list(range(1, 129)) + list(range(128, 0, -1)))
mapped_image = weights[image[:, :, band]] # image.shape = [800, 600, 3]
try the following to change your array to 1D
a.reshape((1, -1))
You can use numpy.ravel to return a flattened array from n-dimensional array:
>>> a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> a.ravel()
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
I had a similar problem and solved it using list...not sure if this will help or not
classes = list(unique_labels(y_true, y_pred))
this problem arises when we use vectors in place of scalars
for example in a for loop the range should be a scalar, in case you have given a vector in that place you get error. So to avoid the problem use the length of the vector you have used
I ran across this error when while trying to access elements of a list using a 1-D array. I was suggested this page but I don't the answer I was looking for.
Let l be the list and myarray be my 1D array. The correct way to access list l using elements of myarray is
np.take(l,myarray)

Efficient way to multiply/add/devide each element of a list with each element of another list in Python

I want to multiply each element of a list with each element of another list.
lst1 = [1, 2, 1, 2]
lst2 = [2, 2, 2]
lst3 = []
for item in lst1:
for i in lst2:
rs = i * item
lst3.append(rs)
This would work, but this is very inefficient in large dataset and can take very long to complete loop. Note, the length of both lists can vary here.
I am fine with using non built-in data structures. I checked numpy and there seems to be way called broadcasting in ndarray. I am not sure if it is way to go. So far, multiplying array with scalar works as expected.
arr = np.arange(3)
arr * 2
This returns:
array([0, 2, 4])
But they way it works with another array is bit different and I can't seem to achieve above.
I guess it must be something straight forward, but I can't seem to find exact solution needed at the moment. Any input will be highly appreciated. Thanks.
Btw, there is similar question for Scheme without considering efficiency here
Edit: Thanks for you answers. Multiplication works, see Dval's answer. However, I also need to do addition and possibly division too exactly same way. For that reason, I updated question a bit.
Edit: I can work with numpy array itself, so I don't need to convert list to array and back.
Numpy is the way to go, specifically numpy.outer, which returns the product of each element as a matrix. Using .flatten() compresses it into 1d.
import numpy
lst1 = numpy.array([1, 2, 1, 2])
lst2 = numpy.array([2, 2, 2])
numpy.outer(lst1, lst2).flatten()
To add to updated question, addition seems to work similar way:
numpy.add.outer(lst1, lst2).flatten()
Linear operations on arrays like this are the meat-n-potatoes of numpy. Once you have defined the arrays, matrix like operations on them are easy, and relatively fast. That includes outer products and inner (matrix) products, as well as element by element operations.
For example:
In [133]: a=np.array([1,2,1,2])
In [134]: b=np.array([2,2,2])
A list comprehension version of your double loop:
In [135]: [i*j for i in a for j in b]
Out[135]: [2, 2, 2, 4, 4, 4, 2, 2, 2, 4, 4, 4]
A numpy product using broadcasting. Think a a[:,None] as turning a into a column vector.
In [136]: a[:,None]*b
Out[136]:
array([[2, 2, 2],
[4, 4, 4],
[2, 2, 2],
[4, 4, 4]])
element by element division also works
In [137]: a[:,None]/b
Out[137]:
array([[ 0.5, 0.5, 0.5],
[ 1. , 1. , 1. ],
[ 0.5, 0.5, 0.5],
[ 1. , 1. , 1. ]])
But this gets more useful when combining operations.
There is overhead in converting lists to arrays, so I wouldn't recommend it for small occasional calculations.
Use numpy - it's a library designed for complex matrix-based arithmetic.
import numpy
lst1 = numpy.array([1, 2, 1, 2])
lst2 = numpy.array([2, 2, 2]]
numpy.outer(lst1, lst2)

Categories