How to copy data from memory to numpy array in Python

For example, I have a variable that points to a vector containing many elements in memory. I want to copy the elements of that vector into a numpy array. What should I do, other than copying them one by one? Thanks

I am assuming that your vector can be represented like this:
import array
x = array.array('l', [1, 3, 10, 5, 6])  # an array using Python's built-in array module
Casting it to a numpy array is then:
import numpy as np
y = np.array(x)

If the data is packed in a buffer in native float format:
a = np.frombuffer(buf, dtype=float, count=N)
(np.fromstring used to serve the same purpose, but it is deprecated for binary data in favor of np.frombuffer.)
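A minimal runnable sketch of that approach; the struct-packed buffer here is just a stand-in for data already sitting in memory. Note that np.frombuffer returns a read-only view over a bytes object, so an explicit .copy() is needed to get an independent, writable array:

```python
import struct
import numpy as np

# Pack five doubles into a raw bytes buffer, simulating data already in memory.
buf = struct.pack("5d", 1.0, 2.0, 3.0, 4.0, 5.0)

# frombuffer creates a zero-copy view over the buffer;
# .copy() materializes an independent writable array.
a = np.frombuffer(buf, dtype=float, count=5).copy()
print(a)  # [1. 2. 3. 4. 5.]
```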

Related

Numpy array with elements of different last axis dimensions

Assume the following code:
import numpy as np
x = np.random.random([2, 4, 50])
y = np.random.random([2, 4, 60])
z = [x, y]
z = np.array(z, dtype=object)
This gives a ValueError: could not broadcast input array from shape (2,4,50) into shape (2,4)
I can understand why this error would occur since the trailing (last) dimension of both arrays is different and a numpy array cannot store arrays with varying dimensions.
However, I happen to have a MAT-file which when loaded in Python through the io.loadmat() function in scipy, contains a np.ndarray with the following properties:
from scipy import io
mat = io.loadmat(file_name='gt.mat')
print(mat.shape)
> (1, 250)
print(mat[0].shape, mat[0].dtype)
> (250,) dtype('O')
print(mat[0][0].shape, mat[0][0].dtype)
> (2, 4, 54), dtype('<f8')
print(mat[0][1].shape, mat[0][1].dtype)
> (2, 4, 60), dtype('<f8')
This is pretty confusing to me. How can the array mat[0] in this file hold numpy arrays with different trailing dimensions as objects, while being a np.ndarray itself, when I am not able to do the same myself?
When calling np.array on a nested sequence of arrays, numpy will try to stack the arrays anyway (note that you are dealing with objects in both cases). It is still possible: one way is to first create an empty array of objects and then fill in the values.
z = np.empty(2, dtype=object)
z[0] = x
z[1] = y
Like in this answer.
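Put together, a minimal runnable sketch of that approach, with shapes chosen to mirror the question:

```python
import numpy as np

x = np.random.random([2, 4, 50])
y = np.random.random([2, 4, 60])

# Pre-allocate a 1-D object array, then assign each sub-array explicitly,
# so numpy never attempts to broadcast/stack the mismatched shapes.
z = np.empty(2, dtype=object)
z[0] = x
z[1] = y

print(z.shape, z.dtype)        # (2,) object
print(z[0].shape, z[1].shape)  # (2, 4, 50) (2, 4, 60)
```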

How to achieve numpy indexing with xarray Dataset

I know the x and the y indices of a 2D array (numpy indexing).
Following this documentation, xarray uses e.g. Fortran style of indexing.
So when I pass e.g.
ind_x = [1, 2]
ind_y = [3, 4]
I expect 2 values, for the index pairs (1, 3) and (2, 4), but xarray returns a 2x2 matrix. How can I achieve numpy-like indexing with xarray?
Note: I want to avoid loading the whole data into memory. So using .values api is not part of the solution I am looking for.
You can access the underlying numpy array to index it directly:
import xarray as xr
x = xr.tutorial.load_dataset("air_temperature")
ind_x = [1, 2]
ind_y = [3, 4]
print(x.air.data[0, ind_y, ind_x].shape)
# (2,)
Edit:
Assuming you have your data in a dask-backed xarray and don't want to load all of it into memory, you need to use vindex on the dask array behind the xarray data object:
import xarray as xr
# simple chunk to convert to dask array
x = xr.tutorial.load_dataset("air_temperature").chunk({"time":1})
extract = x.air.data.vindex[0, ind_y, ind_x]
print(extract.shape)
# (2,)
print(extract.compute())
# array([267.1, 274.1], dtype=float32)
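As a plain-numpy reference for the behavior difference in the question: fancy (pointwise) indexing pairs up the index lists, while np.ix_ builds the outer product that orthogonal indexing returns (the 5x5 array here is just a small illustrative stand-in):

```python
import numpy as np

a = np.arange(25).reshape(5, 5)
ind_x = [1, 2]
ind_y = [3, 4]

# Pointwise: picks elements (3, 1) and (4, 2) -> shape (2,)
pointwise = a[ind_y, ind_x]

# Orthogonal: full 2x2 cross product of the two index lists
outer = a[np.ix_(ind_y, ind_x)]

print(pointwise.shape)  # (2,)
print(outer.shape)      # (2, 2)
```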
To take speed into account, I ran a test with different methods.
from pathlib import Path
from typing import List
import numpy as np
import xarray
from netCDF4 import Dataset

def method_1(file_paths: List[Path], indices) -> List[np.array]:
    data = []
    for file in file_paths:
        d = Dataset(file, 'r')
        data.append(d.variables['hrv'][indices])
        d.close()
    return data

def method_2(file_paths: List[Path], indices) -> List[np.array]:
    data = []
    for file in file_paths:
        data.append(xarray.open_dataset(file, engine='h5netcdf').hrv.values[indices])
    return data

def method_3(file_paths: List[Path], indices) -> List[np.array]:
    data = []
    for file in file_paths:
        data.append(xarray.open_mfdataset([file], engine='h5netcdf').hrv.data.vindex[indices].compute())
    return data
In [1]: len(file_paths)
Out[1]: 4813
The results:
method_1 (using netcdf4 library): 101.9s
method_2 (using xarray and values API): 591.4s
method_3 (using xarray+dask): 688.7s
I guess that xarray+dask spends too much time in the .compute step.

How does the item() function work when accessing pixel values?

I was reading the OpenCV python tutorials, and they said that the NumPy array function item() is the best way to access pixel values in an image, but I don't understand what it does.
import cv2
import numpy as np
img = cv2.imread('image.jpg')
print(img.item(10, 10, 2)) # I don't know what the item() function does/what it's parameters are
From the docs: numpy.ndarray.item() copies an element of an array to a standard Python scalar and returns it.
To put it in other words, calling img.item(i) gets you a copy of the value represented by the index i in your array, similar to img[i] but with the difference that it returns it as a Python scalar instead of an array. Following the docs, getting a Python scalar is useful to speed up access to the elements of the array and doing arithmetic on the values taking advantage of Python's optimized math.
An example:
>>> x = np.random.randint(9, size=(3, 3))
>>> x
array([[1, 8, 4],
       [8, 7, 5],
       [2, 1, 1]])
>>> x.item((1,0))
8
>>> x[1,0] # both calls seem to return the same value, but...
8
>>> type(x[1,0]) # Numpy's own int32
<class 'numpy.int32'>
>>> type(x.item((1,0))) # Python's standard int
<class 'int'>
item takes a single argument, which can be: nothing (valid only for single-item arrays), an int (interpreted as a flat index), or a tuple of ints (interpreted as a multi-dimensional index into the array).
Going back to your concrete question, OpenCV recommends item and itemset when working with individual pixel values, because numpy is optimized for whole-array calculations, so accessing single items through regular indexing is comparatively slow. (Note that ndarray.itemset was deprecated in NumPy 1.25 and removed in 2.0; on recent NumPy versions, plain index assignment is the way to set a single value.)
So instead of doing:
import cv2
import numpy as np
img = cv2.imread('image.jpg')
img[0, 0] = 255 # Setting a single pixel
px = img[0, 0] # Getting a single pixel
Do:
img.itemset((0, 0), 255)
px = img.item((0, 0))

Append value to each array in a numpy array

I have a numpy array of arrays, for example:
x = np.array([[1,2,3],[10,20,30]])
Now let's say I want to extend each row with [4, 40], to generate the following resulting array:
[[1,2,3,4],[10,20,30,40]]
How can I do this without making a copy of the whole array? I tried to change the shape of the array in place but it throws a ValueError:
x[0] = np.append(x[0],4)
x[1] = np.append(x[1],40)
ValueError : could not broadcast input array from shape (4) into shape (3)
You can't do this. Numpy arrays allocate contiguous blocks of memory, if at all possible. Any change to the array size will force an inefficient copy of the whole array. You should use Python lists to grow your structure if possible, then convert the end result back to an array.
However, if you know the final size of the resulting array, you could instantiate it with something like np.empty() and then assign values by index, rather than appending. This does not change the size of the array itself, only reassigns values, so should not require copying.
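A minimal sketch of that preallocation idea, using the shapes from the question:

```python
import numpy as np

x = np.array([[1, 2, 3], [10, 20, 30]])
extra = np.array([4, 40])

# Allocate the final shape once, then fill it by slice assignment:
# no per-append reallocation or copying of the growing array.
out = np.empty((2, 4), dtype=x.dtype)
out[:, :3] = x
out[:, 3] = extra

print(out)
# [[ 1  2  3  4]
#  [10 20 30 40]]
```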
While @roganjosh is right that you cannot modify numpy arrays without making a copy (in the underlying process), there is a simpler way of appending a value to the end of each row of a 2D array, using np.column_stack:
x = np.array([[1,2,3],[10,20,30]])
array([[ 1,  2,  3],
       [10, 20, 30]])
stack_y = np.array([4,40])
array([ 4, 40])
np.column_stack((x, stack_y))
array([[ 1,  2,  3,  4],
       [10, 20, 30, 40]])
Alternatively: create a new matrix, copy in the values of your old matrix, then insert the new values in the last positions:
x = np.array([[1,2,3],[10,20,30]])
new_X = np.zeros((2, 4))  # note: float dtype unless you pass dtype=x.dtype
new_X[:2, :3] = x
new_X[0][-1] = 4
new_X[1][-1] = 40
x = new_X
Or use np.resize() instead (np.reshape() cannot change the total number of elements).

How to add a scalar to a numpy array within a specific range?

Is there a simpler and more memory-efficient way to do the following in numpy alone?
import numpy as np
ar = np.array(a[l:r])
ar += c
a = a[0:l] + ar.tolist() + a[r:]
It may look primitive, but it involves obtaining a subarray copy of the given array, then preparing two more copies to concatenate on the left and right, in addition to the scalar add. I was hoping to find a more optimized way of doing this. I would like a solution entirely in Python lists or entirely in NumPy arrays, but not both, since converting back and forth as shown above causes serious overhead when the data is huge.
You can just do the addition in place, as follows:
import numpy as np
a = np.array([1, 1, 1, 1, 1])
a[2:4] += 5
>>> a
array([1, 1, 6, 6, 1])
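If the data starts out as a Python list, a sketch that converts once and then stays in NumPy, avoiding the slice-and-concatenate copies from the question:

```python
import numpy as np

a = [1, 1, 1, 1, 1]
l, r, c = 2, 4, 5

arr = np.asarray(a)   # single conversion; no further list/array round-trips
arr[l:r] += c         # in-place add on a view of the selected range

print(arr)  # [1 1 6 6 1]
```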
