How to read a numpy ndarray from a block of memory?

I have a block of memory that stores a 2D array of float32 numbers.
For example, the shape is (1000, 10), and what I have in memory is something like a C array with 10000 elements.
Can I turn this into a numpy array just by specifying the shape and dtype?

Reading a memory-mapped array from disk is done with the numpy.memmap() function. The dtype and shape need to be specified again, as this information is not stored in the file.
Let's call the file containing the data on disk memmapped.dat:
import numpy as np
array = np.memmap('memmapped.dat', dtype=np.float32, shape=(1000, 10))
Ref : https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.memmap.html and https://ipython-books.github.io/48-processing-large-numpy-arrays-with-memory-mapping/
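A minimal round-trip sketch of that (the file name and values are made up for illustration): create the file with mode='w+', write into it, then open it again read-only.
import numpy as np

# Create the file and fill it (mode='w+' allocates/overwrites it on disk)
out = np.memmap('memmapped.dat', dtype=np.float32, mode='w+', shape=(1000, 10))
out[:] = np.arange(10000, dtype=np.float32).reshape(1000, 10)
out.flush()  # make sure the data is written to disk

# Read it back; dtype and shape must be supplied again
back = np.memmap('memmapped.dat', dtype=np.float32, mode='r', shape=(1000, 10))
print(back[0, :3])  # -> [0. 1. 2.]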

It turns out numpy supports interpreting a buffer as a 1-D array.
import numpy as np

def load_array(data, shape):
    return np.frombuffer(data, dtype=np.float32).reshape(shape)
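For example, with a bytes buffer simulating the C array from the question (the buffer here is synthesized for illustration):
import numpy as np

# Simulate a raw memory block: 1000*10 float32 values packed contiguously
buf = np.arange(10000, dtype=np.float32).tobytes()

arr = load_array(buf, (1000, 10))
print(arr.shape, arr.dtype)  # -> (1000, 10) float32
Note that np.frombuffer does not copy: the resulting array shares memory with the buffer, so it is read-only when the buffer is immutable (as bytes is).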

Related

Saving a numpy array in binary does not improve disk usage compared to uint8

I'm saving numpy arrays while trying to use as little disk space as possible.
Along the way I realized that saving a boolean numpy array does not improve disk usage compared to a uint8 array.
Is there a reason for that or am I doing something wrong here?
Here is a minimal example:
import sys
import numpy as np
rand_array = np.random.randint(0, 2, size=(100, 100), dtype=np.uint8) # create a random dual state numpy array
array_uint8 = rand_array * 255 # array, type uint8
array_bool = np.array(rand_array, dtype=bool) # array, type bool
print(f"size array uint8 {sys.getsizeof(array_uint8)}")
# ==> size array uint8 10120
print(f"size array bool {sys.getsizeof(array_bool)}")
# ==> size array bool 10120
np.save("array_uint8", array_uint8, allow_pickle=False, fix_imports=False)
# size in fs: 10128
np.save("array_bool", array_bool, allow_pickle=False, fix_imports=False)
# size in fs: 10128
The uint8 and bool data types both occupy one byte of memory per element, so arrays of equal dimensions will always occupy the same amount of memory. If you are aiming to reduce your footprint, you can pack the boolean values as bits into a uint8 array using numpy.packbits, thereby storing the binary data in a significantly smaller array.
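A minimal sketch of that approach (file name and the ~1378-byte figure are illustrative; the .npy header adds some overhead):
import numpy as np

rand_array = np.random.randint(0, 2, size=(100, 100), dtype=np.uint8)

# Pack 8 boolean values into each uint8: 10000 bits -> 1250 bytes
packed = np.packbits(rand_array.astype(bool))
np.save("array_packed", packed, allow_pickle=False)
# size in fs: ~1378 instead of 10128

# Unpack and restore the original shape; count= trims the padding bits
restored = np.unpackbits(packed, count=rand_array.size).reshape(rand_array.shape)
assert np.array_equal(restored, rand_array)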

How to initialise a fixed-size ListArray in pyarrow from a numpy array efficiently?

How would I efficiently initialise a fixed-size pyarrow.ListArray
from a suitably prepared numpy array?
The documentation of pyarrow.array indicates that a nested iterable input structure works, but in practice that does not work if the outer iterable is a numpy array:
import numpy as np
import pyarrow as pa
n = 1000
w = 3
data = np.arange(n * w, dtype="i2").reshape(-1, w)
# this works:
pa.array(list(data), pa.list_(pa.int16(), w))
# this fails:
pa.array(data, pa.list_(pa.int16(), w))
# -> ArrowInvalid: only handle 1-dimensional arrays
It seems ridiculous to split an input array that already matches the Arrow specification into n separate arrays and then re-assemble it from there.
pyarrow.ListArray.from_arrays seems to require an offsets argument, which only has a meaning for variable-size lists.
I believe you are looking for pyarrow.FixedSizeListArray.from_arrays, which, regrettably, appears undocumented (I went ahead and filed a JIRA ticket).
You'll want to flatten your numpy array to a contiguous 1-D array first.
import numpy as np
import pyarrow as pa

n = 10
width = 3

# Or just skip the initial reshape, but keeping it in to simulate real data
arr = np.arange(n * width, dtype="i2").reshape(-1, width)
arr.shape = -1  # flatten in place; this fails if the array is not contiguous
pa.FixedSizeListArray.from_arrays(arr, width)
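As a quick sanity check (the exact printed form may vary between pyarrow versions):
result = pa.FixedSizeListArray.from_arrays(arr, width)
print(result.type)  # e.g. fixed_size_list<item: int16>[3]
print(len(result))  # 10 lists of width 3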

How to check if a python object is a numpy ndarray

I have a function that takes an array as input and does some computation on it. The input array may or may not be a numpy ndarray (may be a list, pandas object, etc).
In the function, I convert the input array (regardless of its type) to a numpy ndarray. But this step may be computationally expensive for large arrays, especially if the function is called multiple times in a for loop.
Hence, I want to convert the input array to numpy ndarray ONLY if it is not already a numpy ndarray.
How can I do this?
import numpy as np

def myfunc(array):
    # Check if array is not already a numpy ndarray
    # Not the correct way, this is where I need help
    if type(array) != 'numpy.ndarray':
        array = np.array(array)
    # The computation on array
    # Do something with array
    new_array = other_func(array)
    return new_array
You're quite close, but you need to compare against the actual class, i.e. numpy.ndarray (here you're just comparing with a string). Better still, for this there is the built-in isinstance, to see if a given object is an instance of a class:
def myfunc(array):
    # Check if array is not already a numpy ndarray
    if not isinstance(array, np.ndarray):
        array = np.array(array)
    # The computation on array
    # Do something with array
    new_array = other_func(array)
    return new_array
You can use isinstance here.
import numpy as np

a = np.array([1, 2, ...])
isinstance(a, np.ndarray)
# True

def myfunc(array):
    return array if isinstance(array, np.ndarray) else np.array(array)
You just return array if it is already an np.ndarray, and otherwise convert it with np.array.
It is simpler to use asarray:
def myfunc(arr):
    arr = np.asarray(arr)
    # The computation on array
    # Do something with array
    new_array = other_func(arr)
    return new_array
If arr is already an array, asarray does not make a copy, so there's no penalty to passing it through asarray. Let numpy do the testing for you.
numpy functions often pass their inputs through asarray (or a variant) just to make sure the type is what they expect.
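A quick demonstration of the no-copy behavior:
import numpy as np

a = np.arange(5)
print(np.asarray(a) is a)      # True: already an ndarray, returned as-is

lst = [1, 2, 3]
print(np.asarray(lst) is lst)  # False: a list is converted to a new ndarray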
import numpy as np

def myfunc(array):
    # Convert only if array is not already a numpy ndarray
    if not isinstance(array, np.ndarray):
        print('Converting input; may be computationally expensive for big arrays')
        array = np.array(array)
    # The computation on array
    # Do something with array
    new_array = other_func(array)
    return new_array

scipy.io.loadmat reads MATLAB (R2016a) structs incorrectly

Instead of loading a MATLAB struct as a dict (as described in http://docs.scipy.org/doc/scipy/reference/tutorial/io.html and other related questions), scipy.io.loadmat is loading it as a strange ndarray, where the values are an array of arrays, and the field names are taken to be the dtype. Minimal example:
(MATLAB):
>> a = struct('b',0)
a =
b: 0
>> save('simple_struct.mat','a')
(Python):
In[1]:
import scipy.io as sio
matfile = sio.loadmat('simple_struct.mat')
a = matfile['a']
a
Out[1]:
array([[([[0]],)]],
dtype=[('b', 'O')])
This problem persists in Python 2 and 3.
This is expected behavior. Numpy is just showing you how MATLAB stores your data under the hood.
MATLAB structs are 2+D cell arrays where one dimension is mapped to a sequence of strings (the field names). In numpy, this same data structure is called a "record array", and the dtype is used to store the field names. And since MATLAB matrices must be at least 2-D, the 0 you stored in MATLAB is really a 2-D matrix with dimensions (1, 1).
So what you are seeing from scipy.io.loadmat is how MATLAB stores your data (minus the dtype bit; MATLAB doesn't have such a thing). Specifically, it is a [1, 1] 2-D record array where one dimension is mapped to the field name 'b', containing a [1, 1] 2-D array. MATLAB hides some of these details from you, but numpy doesn't.
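A sketch of two ways to dig the scalar back out (both loadmat keywords used below are documented scipy options):
import scipy.io as sio

# Option 1: index through the nested layers manually
matfile = sio.loadmat('simple_struct.mat')
b = matfile['a']['b'][0, 0][0, 0]  # -> 0

# Option 2: let loadmat strip singleton dimensions and return structs
# as objects with attribute access
matfile = sio.loadmat('simple_struct.mat', squeeze_me=True, struct_as_record=False)
b = matfile['a'].b  # -> 0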

How to copy data from memory to numpy array in Python

For example, I have a variable that points to a vector containing many elements in memory. I want to copy the elements of the vector into a numpy array; what should I do other than copying them one by one? Thanks
I am assuming that your vector can be represented like this:
import array

x = array.array('l', [1, 3, 10, 5, 6])  # an array using Python's built-in array module
Casting it to a numpy array is then:
import numpy as np

y = np.array(x)
If the data is packed in a buffer in native float format:
a = numpy.frombuffer(buf, dtype=float, count=N)
(The older numpy.fromstring did the same job, but it is deprecated for binary data in favor of numpy.frombuffer.)
