How to check if a Python object is a numpy ndarray

I have a function that takes an array as input and does some computation on it. The input array may or may not be a numpy ndarray (it may be a list, pandas object, etc.).
In the function, I convert the input array (regardless of its type) to a numpy ndarray. But this step may be computationally expensive for large arrays, especially if the function is called multiple times in a for loop.
Hence, I want to convert the input array to numpy ndarray ONLY if it is not already a numpy ndarray.
How can I do this?
import numpy as np

def myfunc(array):
    # Check if array is not already a numpy ndarray
    # Not the correct way, this is where I need help
    if type(array) != 'numpy.ndarray':
        array = np.array(array)

    # The computation on array
    # Do something with array
    new_array = other_func(array)

    return new_array

You're quite close, but you need to compare against the actual class, i.e. numpy.ndarray (here you're just comparing with a string). For this you also have the built-in isinstance, to see whether a given object is an instance of a given class:
def myfunc(array):
    # Check if array is not already a numpy ndarray
    if not isinstance(array, np.ndarray):
        array = np.array(array)

    # The computation on array
    # Do something with array
    new_array = other_func(array)

    return new_array

You can use isinstance here.
import numpy as np

a = np.array([1, 2, 3])
isinstance(a, np.ndarray)
# True

def myfunc(array):
    return array if isinstance(array, np.ndarray) else np.array(array)
This returns array unchanged if it is already an np.ndarray; otherwise it converts it with np.array.
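For illustration, a quick check of how this wrapper behaves (assuming the import and myfunc defined just above):
print(myfunc([1, 2, 3]))   # a list gets converted: [1 2 3]
a = np.array([4, 5, 6])
print(myfunc(a) is a)      # True: an ndarray is returned unchanged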

It is simpler to use asarray:
def myfunc(arr):
    arr = np.asarray(arr)

    # The computation on array
    # Do something with array
    new_array = other_func(arr)

    return new_array
If arr is already an ndarray, asarray does not make a copy, so there's no penalty in passing it through asarray. Let numpy do the testing for you.
numpy functions often pass their inputs through asarray (or a variant) just to make sure the type is what they expect.
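As a quick illustration of the no-copy behaviour (my own check, not part of the original answer):
import numpy as np

a = np.arange(5)
print(np.asarray(a) is a)      # True: already an ndarray, returned unchanged
lst = [0, 1, 2, 3, 4]
print(np.asarray(lst) is lst)  # False: the list had to be converted to a new array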

import numpy as np

def myfunc(array):
    # Check if array is not already a numpy ndarray
    if isinstance(array, np.ndarray):
        # Already an ndarray: skip the conversion
        pass
    else:
        print('Big array, conversion is computationally expensive')
        array = np.array(array)

    # The computation on array
    # Do something with array
    new_array = other_func(array)

    return new_array

Related

Making multiple copies of a smaller matrix into a bigger matrix

Suppose I have a 2-by-2 numpy array. I want to create another numpy array whose elements are each the previous 2-by-2 array, without using an explicit for loop. How can I achieve this? The shape of the new numpy array should be (2, 2, 2, 2).
This helps you copy the numpy matrix, but I did not really understand your point:
import numpy as np
a = np.matrix('1,2; 3,2; 3,2')
b = a.copy()
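Note that the .copy() above just duplicates the matrix and does not produce the (2, 2, 2, 2) result the question asks for. A minimal loop-free sketch (my own illustration, not part of the original answer):
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.tile(a, (2, 2, 1, 1))                 # shape (2, 2, 2, 2); each [i, j] block equals a
c = np.broadcast_to(a, (2, 2, 2, 2)).copy()  # same result via broadcasting; .copy() makes it writable

assert b.shape == (2, 2, 2, 2) and (b[0, 1] == a).all()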

How to read a numpy ndarray from a block of memory?

I have a block of memory that stores a 2D array of float32 numbers.
For example, the shape is (1000, 10), and what I have in memory is something like a C array with 10000 elements.
Can I turn this into a numpy array just by specifying the shape and dtype?
Reading a memory-mapped array from disk involves the numpy.memmap() function. The data type and the shape need to be specified again, as this information is not stored in the file.
Let's call the file containing the data on disk memmapped.dat:
import numpy as np
array = np.memmap('memmapped.dat', dtype=np.float32, shape=(1000, 10))
Ref : https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.memmap.html and https://ipython-books.github.io/48-processing-large-numpy-arrays-with-memory-mapping/
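For illustration, a small round trip (the file name comes from the answer above; the first two lines just create a raw float32 file to read back):
import numpy as np

data = np.random.rand(1000, 10).astype(np.float32)
data.tofile('memmapped.dat')               # write the raw float32 block to disk

arr = np.memmap('memmapped.dat', dtype=np.float32, shape=(1000, 10))
print(arr.shape, np.allclose(arr, data))   # (1000, 10) True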
Turns out numpy supports interpreting a buffer as a 1-D array.
import numpy as np
def load_array(data, shape):
    return np.frombuffer(data, dtype=np.float32).reshape(shape)
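A quick usage sketch of the load_array defined above, where the bytes object stands in for the block of memory from the question:
import numpy as np

raw = np.arange(10000, dtype=np.float32).tobytes()   # stand-in for the memory block
arr = load_array(raw, (1000, 10))
print(arr.shape, arr.dtype)                          # (1000, 10) float32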

Pandas Series.as_matrix() doesn't properly convert a series of nd arrays into a single nd array

I have a pandas dataframe where one column is labeled "feature_vector" and contains a 1d numpy array with a bunch of numbers. Now I need to use this data in a scikit-learn model, so I need it as a single numpy array. So naturally I call DataFrame["feature_vector"].as_matrix() to get the numpy array from the correct series. The only problem is that as_matrix() returns a 1d numpy array where each element is a 1d numpy array containing each vector. When this is passed to a sklearn model's .fit() function, it throws an error. What I need instead is a 2d numpy array rather than a 1d array of 1d arrays. I wrote this workaround, which uses presumably unnecessary memory and computation time:
x = dataframe["feature_vector"].as_matrix()
# x is a 1d array of 1d arrays
l = []
for e in x:
    l.append(e)
x = np.array(l)
# x is now a single 2d array
Is this a bug in pandas .as_matrix()? Is there a better workaround that doesn't require me to change the structure of the original dataframe?
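One shorter workaround, sketched here as a suggestion rather than a verified answer: np.stack turns a sequence of equal-length 1d arrays into a single 2d array in one call (as_matrix() was later deprecated in favour of .to_numpy() / .values):
import numpy as np
import pandas as pd

df = pd.DataFrame({"feature_vector": [np.array([1.0, 2.0]), np.array([3.0, 4.0])]})
x = np.stack(df["feature_vector"].to_numpy())   # single 2d array, shape (2, 2)
print(x.shape)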

How to "remove" mask from numpy array after performing operations?

I have a 2D numpy array that I need to mask based on a condition so that I can apply an operation to the masked array then revert the masked values back to the original.
For example:
import numpy as np
array = np.random.random((3,3))
condition = np.random.randint(0, 2, (3,3))
masked = np.ma.array(array, mask=condition)
masked += 2.0
But how can I change the masked values back to the original and "remove" the mask after applying a given operation to the masked array?
The reason why I need to do this is that I am generating a boolean array based on a set of conditions and I need to modify the elements of the array that satisfy the condition.
I could use boolean indexing to do this with a 1D array, but with the 2D array I need to retain its original shape, i.e. not return a 1D array with only the values satisfying the condition(s).
The accepted answer doesn't answer the question. Setting the mask to False works in practice, but many algorithms do not support masked arrays (e.g. scipy.linalg.lstsq()), and this approach doesn't actually get rid of the mask, so you will still get an error like this:
ValueError: masked arrays are not supported
The only way to really get rid of the mask is to assign the variable to the data of the masked array.
import numpy as np

array = np.random.random((3, 3))
condition = np.random.randint(0, 2, (3, 3))
masked = np.ma.array(array, mask=condition)
masked += 2.0

masked.mask = False
hasattr(masked, 'mask')
# True
Assigning the variable to the data using the MaskedArray data attribute:
masked = masked.data
hasattr(masked, 'mask')
# False
You already have it: it's called array!
This is because while masked makes sure you only increment certain values in the matrix, the data is never actually copied. So once your code executes, array has its unmasked elements (those where condition is False) incremented, and the masked ones remain unchanged.
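A small sanity check of that claim (illustrative, not part of the original answer): np.ma.array does not copy its input by default, so the in-place addition updates the unmasked entries of the original array.
import numpy as np

array = np.zeros((3, 3))
condition = np.array([[1, 0, 0],
                      [0, 1, 0],
                      [0, 0, 1]], dtype=bool)

masked = np.ma.array(array, mask=condition)   # shares memory with array
masked += 2.0
print(array)                                  # 2.0 where the mask is False, 0.0 where it is True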

How to copy data from memory to numpy array in Python

For example, I have a variable that points to a vector containing many elements in memory. I want to copy the elements of the vector into a numpy array; what should I do other than copying them one by one? Thanks.
I am assuming that your vector can be represented like this:
from array import array

x = array('l', [1, 3, 10, 5, 6])  # an array using Python's built-in array module
Converting it to a numpy array is then:
import numpy as np

y = np.array(x)
If the data is packed in a buffer in native float format:
a = numpy.fromstring(buf, dtype=float, count=N)
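As a related sketch (my own illustration): array.array objects and raw byte buffers both support the buffer protocol, so numpy can wrap them without an element-by-element copy. Note that numpy.fromstring is deprecated for binary data in current numpy; numpy.frombuffer is the equivalent call.
from array import array
import numpy as np

x = array('d', [1.0, 3.0, 10.0, 5.0, 6.0])
y = np.frombuffer(x, dtype=np.float64)   # zero-copy view of x's memory
z = np.array(x)                          # independent copy
print(y, z)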
