Reshape data skipping specific number of points - python

I have a 1024^3 data set to read. But this array is too large, so I want to keep only every 2^n-th point. For example, if I set skip = 2, then the array will be 512^3.
import numpy as np
nx = 1024
filename = 'E:/AUTOIGNITION/Z1/Velocity1_inertHIT.bin'
U = np.fromfile(filename, dtype=np.float32, count=nx**3).reshape(nx, nx, nx)
How do I do this by using reshape?

If I understand correctly, you want to sample every 2^n-th value of the array along each of the 3 dimensions.
You can do this by indexing as follows.
>>> import numpy as np
>>> n = 1
>>> example_array = np.zeros((1024,1024,1024))
>>> N = 2**n  # note: ** is exponentiation; ^ is bitwise XOR in Python
>>> example_array_sampled = example_array[::N, ::N, ::N]
>>> example_array_sampled.shape
(512, 512, 512)
Verifying that it actually samples evenly:
>>> np.arange(64).reshape((4,4,4))[::2,::2,::2]
array([[[ 0,  2],
        [ 8, 10]],

       [[32, 34],
        [40, 42]]])
This particular code still assumes that you read the whole array into memory before indexing into it. The variable example_array_sampled will then be a view into the original array, so the slicing itself makes no copy.
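If loading the full array (4 GiB of float32 for 1024^3 values) is itself the problem, one way around it is to memory-map the file and materialize only the sampled subset. A minimal sketch, reusing the filename and layout from the question:
import numpy as np

nx = 1024
N = 2  # keep every N-th point in each dimension
filename = 'E:/AUTOIGNITION/Z1/Velocity1_inertHIT.bin'

# np.memmap maps the file without reading it; slicing touches only the
# pages that are actually accessed
U = np.memmap(filename, dtype=np.float32, mode='r', shape=(nx, nx, nx))
U_sampled = np.array(U[::N, ::N, ::N])  # copy only the (nx//N)**3 subset into RAM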
More info on indexing numpy arrays: https://numpy.org/doc/stable/reference/arrays.indexing.html
Some info on numpy views: https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html
Related question (in one dimension): subsampling every nth entry in a numpy array

Related

a uniform data structure that can represent an ndarray with various size along a given axis

I can use the following code to generate three dimensional array.
import numpy as np
x1 = np.random.rand(8,9,10)
In some scenarios, the studied data set (or array) has a varying length along axis 0. In other words, one subset may be of shape (8, 9, 10) while another is of shape (7, 9, 10); all subsets have the same size along the second and third axes. If I still want to represent the whole data set using a single data structure, how can I achieve this?
One solution would be to use awkward-array:
https://github.com/scikit-hep/awkward-1.0
>>> import numpy as np
>>> import awkward as ak
>>> numpy_arrays = [np.random.rand(8,9,10), np.random.rand(7,9,10)]
>>> irregular_array = ak.Array(numpy_arrays)
>>> irregular_array
<Array [[[ ... ]]] type='var * 9 * 10 * float64'>
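Each entry keeps its own length along the ragged axis. A quick check, based on the awkward API (ak.num counts the sublists at a given depth):
>>> len(irregular_array)
2
>>> ak.num(irregular_array, axis=1)
<Array [8, 7] type='2 * int64'>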

Numpy array with elements of different last axis dimensions

Assume the following code:
import numpy as np
x = np.random.random([2, 4, 50])
y = np.random.random([2, 4, 60])
z = [x, y]
z = np.array(z, dtype=object)
This gives a ValueError: could not broadcast input array from shape (2,4,50) into shape (2,4)
I can understand why this error would occur since the trailing (last) dimension of both arrays is different and a numpy array cannot store arrays with varying dimensions.
However, I happen to have a MAT-file which, when loaded in Python through the io.loadmat() function in scipy, contains an np.ndarray with the following properties:
from scipy import io
mat = io.loadmat(file_name='gt.mat')
print(mat.shape)
> (1, 250)
print(mat[0].shape, mat[0].dtype)
> (250,) dtype('O')
print(mat[0][0].shape, mat[0][0].dtype)
> (2, 4, 54), dtype('<f8')
print(mat[0][1].shape, mat[0][1].dtype)
> (2, 4, 60), dtype('<f8')
This is pretty confusing to me. How is the array mat[0] in this file holding numpy arrays with different trailing dimensions as objects while being an np.ndarray itself, and why am I not able to do the same myself?
When calling np.array on a nested list of arrays, numpy will try to stack the arrays anyway. Note that you are dealing with object dtype in both cases, and it is still possible. One way is to first create an empty array of objects and then fill in the values:
z = np.empty(2, dtype=object)
z[0] = x
z[1] = y
Like in this answer.
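A quick check with the x and y from the question confirms that each element keeps its own shape:
>>> z.shape, z[0].shape, z[1].shape
((2,), (2, 4, 50), (2, 4, 60))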

Change the data type of one element in a matrix

I'm looking to implement a hardware-efficient multiplication of a list of large matrices (on the order of 200,000 x 200,000). The matrices are very nearly the identity matrix, but with some elements changed to irrational numbers.
In an effort to reduce the memory footprint and make the computation go faster, I want to store the 0s and 1s of the identity as single bytes like so.
import numpy as np
size = 200000
large_matrix = np.identity(size, dtype=np.uint8)
and just change a few elements to a different data type.
import sympy as sp
# sympy object
irr1 = sp.sqrt(2)
# float
irr2 = np.e
large_matrix[123456, 100456] = irr1
large_matrix[100456, 123456] = irr2
Is it possible to hold only these elements of the matrix with a different data type, while all the other elements are still bytes? I don't want to have to change everything to a float just because I need one element to be a float.
-----Edit-----
If it's not possible in numpy, then how can I find a solution without numpy?
Maybe you can have a look at SciPy's coordinate-based sparse matrix (coo_matrix). In that case, SciPy stores a sparse matrix (optimized for such large, mostly empty matrices), and with its coordinate format you can access and modify the data as you intend.
From its documentation:
>>> from scipy.sparse import coo_matrix
>>> # Constructing a matrix using ijv format
>>> row = np.array([0, 3, 1, 0])
>>> col = np.array([0, 3, 1, 2])
>>> data = np.array([4, 5, 7, 9])
>>> m = coo_matrix((data, (row, col)), shape=(4, 4))
>>> m.toarray()
array([[4, 0, 9, 0],
       [0, 7, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 5]])
It does not allocate the full matrix, only a set of coordinates with values, which takes far less space than filling a matrix with zeros. (Note that sys.getsizeof only counts the Python object overhead, not the referenced data buffers, so the comparison below understates the real difference; comparing m.data.nbytes + m.row.nbytes + m.col.nbytes with m.toarray().nbytes would be fairer.)
>>> from sys import getsizeof
>>> getsizeof(m)
56
>>> getsizeof(m.toarray())
176
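Applied to the question's setup, a minimal sketch could look like this (the index pairs come from the question; storing the irrational values as float64 rather than exact sympy objects, and representing the matrix as identity-plus-corrections, are assumptions on my part):
import numpy as np
from scipy.sparse import coo_matrix

size = 200000
row = np.array([123456, 100456])
col = np.array([100456, 123456])
data = np.array([np.sqrt(2), np.e])  # irrationals approximated as floats

# store only the deviations S from the identity; the full matrix is I + S,
# so a product (I + S) @ v can be computed as v + S @ v without ever
# materializing the dense 200000 x 200000 array
S = coo_matrix((data, (row, col)), shape=(size, size))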
By definition, NumPy arrays only have one dtype. You can see in the NumPy documentation:
A numpy array is homogeneous, and contains elements described by a dtype object. A dtype object can be constructed from different combinations of fundamental numeric types.
Further reading: https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html
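One caveat worth adding: the single dtype can be object, in which case every element is a pointer to an arbitrary Python object, so mixed element types are technically possible. Each element then costs a pointer plus a boxed Python object, though, which defeats the memory-saving goal of the question. A small illustration:
>>> import numpy as np
>>> import sympy as sp
>>> a = np.identity(4, dtype=object)
>>> a[0, 1] = sp.sqrt(2)
>>> a[0, 1]
sqrt(2)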

python difference between array(10,1) array(10,)

I'm trying to load MNIST dataset into arrays.
When I use
(X_train, y_train), (X_test, y_test)= mnist.load_data()
I get an array y_test of shape (10000,), but I want it to have shape (10000, 1).
What is the difference between an array of shape (10000, 1) and one of shape (10000,)?
How can I convert between the two?
Your first array, with shape (10000,), is a 1-dimensional np.ndarray.
Since the shape attribute of numpy arrays is a tuple, and a tuple of length 1 needs a trailing comma, the shape is (10000,) and not (10000) (which would just be an int). So currently your data looks like this:
import numpy as np
a = np.arange(5) # >>> array([0, 1, 2, 3, 4])
print(a.shape) # >>> (5,)
What you want is a 2-dimensional array with shape (10000, 1).
Adding a dimension of length 1 doesn't require any additional data; it is basically an "empty" dimension. To add a dimension to an existing array you can use either np.expand_dims() or np.reshape().
Using np.expand_dims:
import numpy as np
b = np.arange(5) # >>> array([0, 1, 2, 3, 4])
b = np.expand_dims(b, axis=1) # >>> array([[0],[1],[2],[3],[4]])
The function was specifically made for the purpose of adding empty dimensions to arrays. The axis keyword specifies which position the newly added dimension will occupy.
Using np.reshape:
import numpy as np
a = np.arange(5)
X_test_reshaped = np.reshape(a, (-1, 1)) # >>> array([[0],[1],[2],[3],[4]])
The (-1, 1) argument specifies how the new shape should look after the reshape operation. The -1 is replaced internally by whatever length 'fits the data'.
reshape is a more powerful function than expand_dims and can be used in many different ways. You can read more on other uses of it in the numpy docs for numpy.reshape().
An array with shape (10, 1) is a 2D array with a single column.
An array with shape (10,) is a 1D array.
To convert (10, 1) to (10,), you can simply drop the column axis. For example, take an x with x.shape == (10, 1); then x[:, 0] has shape (10,).
To convert (10,) to (10, 1), you can add a dimension using np.newaxis (after import numpy as np). Take a y with y.shape == (10,); then y[:, np.newaxis] gives a new array of shape (10, 1).
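Putting both directions together in a small runnable sketch:
import numpy as np

y = np.arange(10)            # shape (10,)
y2 = y[:, np.newaxis]        # shape (10, 1)
back = y2[:, 0]              # shape (10,) again
print(y2.shape, back.shape)  # (10, 1) (10,)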

Append value to each array in a numpy array

I have a numpy array of arrays, for example:
x = np.array([[1,2,3],[10,20,30]])
Now lets say I want to extend each array with [4,40], to generate the following resulting array:
[[1,2,3,4],[10,20,30,40]]
How can I do this without making a copy of the whole array? I tried to change the shape of the array in place but it throws a ValueError:
x[0] = np.append(x[0],4)
x[1] = np.append(x[1],40)
ValueError: could not broadcast input array from shape (4) into shape (3)
You can't do this. Numpy arrays allocate contiguous blocks of memory where possible, and any change to the array's size forces an inefficient copy of the whole array. You should use Python lists to grow your structure if possible, then convert the end result back to an array.
However, if you know the final size of the resulting array, you can preallocate it with something like np.empty() and then assign values by index rather than appending. This never changes the size of the array itself, only reassigns values, so it requires no copying, as sketched below.
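For example, a minimal sketch with the arrays from the question:
import numpy as np

# final shape is known: 2 rows, 3 original values plus 1 appended value each
out = np.empty((2, 4), dtype=int)
out[:, :3] = [[1, 2, 3], [10, 20, 30]]  # original data
out[:, 3] = [4, 40]                     # the "appended" column
# out is now array([[ 1,  2,  3,  4],
#                   [10, 20, 30, 40]])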
While #roganjosh is right that you cannot grow numpy arrays in place (a copy is made in the underlying process), there is a simpler way of appending one value to the end of each row of a 2D ndarray, using numpy.column_stack (which likewise returns a new array):
>>> x = np.array([[1,2,3],[10,20,30]])
>>> x
array([[ 1,  2,  3],
       [10, 20, 30]])
>>> stack_y = np.array([4,40])
>>> stack_y
array([ 4, 40])
>>> np.column_stack((x, stack_y))
array([[ 1,  2,  3,  4],
       [10, 20, 30, 40]])
Alternatively: create a new matrix, copy in the values of your old matrix, then insert your new values in the last positions.
x = np.array([[1,2,3],[10,20,30]])
new_X = np.zeros((2, 4))
new_X[:, :3] = x      # copy the old values
new_X[0, -1] = 4      # fill the new last column
new_X[1, -1] = 40
x = new_X
Alternatively, np.resize() returns a new array of the requested size (note that np.reshape() cannot add elements; it only rearranges existing ones).
