scipy.random.rand() and the other functions in the same package all produce arrays of float64 as output
(at least for Python 2.7.3 64-bit on Mac OS, SciPy version 0.12.0).
What I want is a rather large (N gigabytes) randomly initialized matrix of float32.
Is there an easy way to produce one directly, rather than allocating double space
for float64 then converting down to 32 bits?
I would preallocate the array, then copy in batches of random float64s as Warren Weckesser recommends in the comments.
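For example, a minimal sketch of that approach (the sizes here are placeholders, adjust to your target):
n = 10**9        # total number of float32 values wanted (placeholder)
batch = 10**6    # float64 values drawn per iteration
out = np.empty(n, dtype=np.float32)       # preallocate the float32 result once
for start in range(0, n, batch):
    stop = min(start + batch, n)
    # only a batch-sized float64 temporary exists at any time;
    # the slice assignment casts it down to float32 in place
    out[start:stop] = np.random.rand(stop - start)
This keeps the peak overhead at one batch of float64 instead of a full-size float64 array.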
If you're in for a hack, here's ten floats generated using uniform random bits:
>>> bytes_per_float = np.float32(0).nbytes  # ugly, I know
>>> np.frombuffer(np.random.bytes(10 * bytes_per_float), dtype=np.float32)
array([ -3.42894422e-23,  -3.33389699e-01,  -7.63695071e-26,
         7.02152836e-10,   3.45816648e-18,   2.80226597e-09,
        -9.34621269e-10,  -9.75820352e+08,   2.95705402e+20,
         2.57654391e+25], dtype=float32)
Of course, these values don't follow any nice distribution, the array might contain NaN or Inf, and the code might actually crash on some non-x86 machines due to alignment problems.
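Note that on NumPy versions much newer than the one in the question (1.17+), the Generator API can produce float32 uniforms directly, with no float64 temporary at all:
import numpy as np
rng = np.random.default_rng()
x = rng.random(10, dtype=np.float32)   # uniform on [0, 1), generated as float32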
import numpy as np
v = np.zeros((3,10000), dtype=np.float32)
mat = np.zeros((10000,10000000), dtype=np.int8)
w = np.matmul(v, mat)
yields
Traceback (most recent call last):
  File "int_mul_test.py", line 6, in <module>
    w = np.matmul(v, mat)
numpy.core._exceptions.MemoryError: Unable to allocate 373. GiB for an array with shape (10000, 10000000) and data type float32
Apparently, numpy is trying to convert my 10k x 10m int8 matrix to dtype float32. Why does it need to do this? It seems extremely wasteful: if matrix multiplication must work with float numbers in memory, it could convert, say, 1m columns at a time (which shouldn't sacrifice much speed), instead of converting all 10m columns at once.
My current solution is to use a loop to break the matrix into 10 pieces and reduce temporary memory allocation to 1/10 of the 373 GiB:
w = np.empty((v.shape[0], mat.shape[1]), dtype=np.float32)
start = 0
block = 1000000
for i in range(mat.shape[1] // block):
    end = start + block
    w[:, start:end] = np.matmul(v, mat[:, start:end])
    start = end
w[:, start:] = np.matmul(v, mat[:, start:])
# runs in 396 seconds
Is there a numpy-idiomatic way to multiply "piece by piece" without manually coding a loop?
The semantics of NumPy operations force the inputs of a binary operation to be cast to a common type when the left/right types differ. In fact, this is the case in almost all statically typed languages, including C, C++, Java and Rust, but also in many dynamically typed languages (the semantic rules are applied at runtime in that case). Python also (partially) applies such well-defined semantic rules. For example, when you evaluate the expression True * 1.7, the interpreter inspects the types of both operands (bool and float here) and then applies promotion rules until both operands have the same type before performing the actual multiplication. In this case, True of type bool is cast to 1 of type int, which is then cast to 1.0 of type float. Such rules are generally defined so as to be both relatively unambiguous and safe. For example, you do not expect 2 * 1.7 to be equal to 3. NumPy uses semantic rules similar to those of the C language because it is written in C and provides native types. The semantic rules should be defined independently of a given implementation; that said, performance and ease of use matter a lot when designing them. Unfortunately, in your case, this means a huge temporary array has to be allocated.
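As a quick illustration of that promotion rule in NumPy's case:
import numpy as np
# float32 and int8 promote to float32, so the int8 operand gets converted
print(np.result_type(np.float32, np.int8))                        # float32
print((np.ones(3, np.float32) @ np.ones((3, 2), np.int8)).dtype)  # float32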
Note that NumPy could theoretically bypass casting and implement the N * N possible versions of each binary operation for the N different types in order to make them faster (like the "as-if" semantic rule of the C language allows). However, this would be insane for developers to implement, and it would result in more bug-prone code (i.e. less stable and slower development) and huge code bloat (bigger binaries). This is especially true since other parameters would have to be taken into account, like the shape of the arrays and the memory layout (e.g. alignment), or even the target architecture. The current main casting generative function of NumPy is already quite complex and already results in 4 x 18 x 18 = 1296 different C functions being compiled and stored in NumPy's binaries!
In your case, you can use Cython or Numba to generate a memory-efficient (and possibly faster) implementation dedicated to your specific needs. Be careful about possible overflows though.
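For illustration only, here is a rough sketch of what such a dedicated Numba kernel could look like for the float32 x int8 case; this is an untuned example under the stated assumptions, not a drop-in replacement for np.matmul:
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def matmul_f32_i8(v, mat):
    # v: (m, k) float32, mat: (k, n) int8 -> (m, n) float32.
    # Each int8 element is converted on the fly, so no float32
    # copy of the whole int8 matrix is ever materialized.
    m, k = v.shape
    n = mat.shape[1]
    out = np.zeros((m, n), dtype=np.float32)
    for j in prange(n):
        for l in range(k):
            x = np.float32(mat[l, j])
            for i in range(m):
                out[i, j] += v[i, l] * x
    return out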
I am trying to build a matrix consisting of Kronecker products:
from scipy import sparse

def kron_sparse_2(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p):
    kron = sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(sparse.kron(a, b), c), d), e), f), g), h), i), j), k), l), m), n), o), p)
    return kron
res = 0
for i in sd:
    res = res + kron_sparse_2(i, i, I, I, I, I, I, I, I, I, I, I, I, I, I, I)
The i's in sd are 2x2 matrices.
Is there anything I can do further to calculate this without the memory problem?
The error I get is: MemoryError: Unable to allocate 16.0 GiB for an array with shape (536870912, 2, 2) and data type float64
If I understand correctly, you are trying to form the Hamiltonian for some spin problem, and you should be able to go up to 20 spins with ease (if that is indeed the case, also try using np.roll and reduce to rewrite your method efficiently). Try converting all of your matrices (even the 2x2 ones) to a sparse format (say csr or csc) and use SciPy's kron function with the format argument set to the sparse format you used to construct your matrices. As far as I remember, kron(format=None) uses an explicit (dense) representation of the matrices, which causes the memory problems; try format='csc' for instance.
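For example, a minimal sketch along those lines (the 2x2 factors here are placeholders):
import numpy as np
from functools import reduce
from scipy import sparse

def kron_sparse(mats):
    # fold sparse.kron over the factors, keeping every intermediate in CSC
    # format so no dense temporary is ever built
    return reduce(lambda a, b: sparse.kron(a, b, format='csc'), mats)

# usage sketch: two 2x2 operators followed by 14 identities
I = sparse.identity(2, format='csc')
sx = sparse.csc_matrix(np.array([[0.0, 1.0], [1.0, 0.0]]))
res = kron_sparse([sx, sx] + [I] * 14)   # 65536 x 65536, but sparse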
I have a huge csv file which cannot be loaded into memory. Transforming it to libsvm format may save some memory.
There are many NaN values in the csv file. If I read the lines and store them as np.array, with np.nan as NULL, will the array still occupy too much memory?
Does np.nan in an array also occupy memory?
When working with floating point representations of numbers, non-numeric values (NaN and inf) are also represented by a specific binary pattern occupying the same number of bits as any numeric floating point value. Therefore, NaNs occupy the same amount of memory as any other number in the array.
As far as I know, yes: NaN and zero values occupy the same memory as any other value. However, you can address your problem in other ways:
Have you tried using a sparse vector? They are intended for vectors with a lot of zero values, and memory consumption is optimized.
SVM Module Scipy
Sparse matrices Scipy
There you have some info about SVM and sparse matrices; if you have further questions, just ask.
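To make the sparse suggestion above concrete, here is a small sketch (note that NaN is not zero, so you would first have to map NULL entries to 0 for sparsity to help):
import numpy as np
from scipy import sparse

dense = np.zeros((1000, 1000))
dense[0, 0] = 1.0

s = sparse.csr_matrix(dense)   # stores only the nonzero entries
print(dense.nbytes)                                        # 8000000 bytes
print(s.data.nbytes + s.indices.nbytes + s.indptr.nbytes)  # ~4 KB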
Edited to provide an answer as well as a solution
According to the getsizeof() function from the sys module, it does. A simple and fast example:
import sys
import numpy as np
x = np.array([1,2,3])
y = np.array([1,np.nan,3])
x_size = sys.getsizeof(x)
y_size = sys.getsizeof(y)
print(x_size)
print(y_size)
print(y_size == x_size)
This should print out
120
120
True
So my conclusion is that a NaN entry uses as much memory as a normal entry.
Instead you could use sparse matrices (scipy.sparse), which do not store zero/Null values at all and are therefore more memory efficient. But SciPy strongly discourages using NumPy methods directly on sparse matrices (https://docs.scipy.org/doc/scipy/reference/sparse.html), since NumPy might not interpret them correctly.
I'm trying to use numpy to element-wise square an array. I've noticed that some of the values appear as negative numbers. The squared value isn't near the max int limit. Does anyone know why this is happening and how I could fix it? I'd rather avoid using a for loop to square an array element-wise, since my data set is quite large.
Here's an example of what is happening:
import numpy as np
test = [1, 2, 47852]
sq = np.array(test)**2
print(sq)
print(47852*47852)
Output:
[          1           4 -2005153392]
2289813904
This is because NumPy doesn't check for integer overflow - likely because that would slow down every integer operation, and NumPy is designed with efficiency in mind. So when you have an array of 32-bit integers and your result does not fit in 32 bits, it is still interpreted as 32-bit integer, giving you the strange negative result.
To avoid this, you can be mindful of the dtype you need to perform the operation safely, in this case 'int64' would suffice.
>>> np.array(test, dtype='int64')**2
array([         1,          4, 2289813904])
You aren't seeing the same issue with Python ints because Python checks for overflow and promotes to a larger data type as necessary. If I recall correctly, there was a question about this on the mailing list, and the response was that there would be a large performance impact on atomic array ops if the same were done in NumPy.
As for why your default integer type may be 32-bit on a 64-bit system: as Goyo answered on a related question, the default integer type np.int_ is the same as a C long, which is platform dependent and can be 32 bits.
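A quick way to check what the default is on your platform:
import numpy as np

# np.int_ follows C long: 32-bit on Windows, usually 64-bit on Linux/macOS
print(np.dtype(np.int_))
print(np.array([1]).dtype)   # dtype NumPy picks for small Python ints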
I am trying to compute the dot product of two numpy arrays sized respectively (162225, 10000) and (10000, 100). However, if I call numpy.dot(A, B) a MemoryError happens.
I, then, tried to write my implementation:
def slower_dot(A, B):
    """Low-memory implementation of dot product"""
    # Assuming A and B are of the right type and size
    R = np.empty([A.shape[0], B.shape[1]])
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            R[i, j] = np.dot(A[i, :], B[:, j])
    return R
and it works just fine, but is of course very slow. Any idea of 1) what is the reason behind this behaviour and 2) how I could circumvent / solve the problem?
I am using Python 3.4.2 (64bit) and Numpy 1.9.1 on a 64bit equipped computer with 16GB of ram running Ubuntu 14.10.
The reason you're getting a memory error is probably because numpy is trying to copy one or both arrays inside the call to dot. For small to medium arrays this is often the most efficient option, but for large arrays you'll need to micro-manage numpy in order to avoid the memory error. Your slower_dot function is slow largely because of the python function call overhead, which you suffer 162225 x 100 times. Here is one common way of dealing with this kind of situation when you want to balance memory and performance limitations.
import numpy as np

def chunking_dot(big_matrix, small_matrix, chunk_size=100):
    # Make a copy if the array is not already contiguous
    small_matrix = np.ascontiguousarray(small_matrix)
    R = np.empty((big_matrix.shape[0], small_matrix.shape[1]))
    for i in range(0, R.shape[0], chunk_size):
        end = i + chunk_size
        R[i:end] = np.dot(big_matrix[i:end], small_matrix)
    return R
You'll want to pick the chunk_size that works best for your specific array sizes. Typically larger chunk sizes will be faster as long as everything fits in memory.
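A quick usage sketch with the shapes from the question (assuming the float64 inputs themselves fit in RAM, which is already tight at ~12 GB for the large array on a 16 GB machine):
A = np.random.rand(162225, 10000)        # ~12 GB of float64
B = np.random.rand(10000, 100)
W = chunking_dot(A, B, chunk_size=1000)  # result shape (162225, 100)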
I think the problem starts with the matrix A itself, as a 162225 x 10000 matrix already occupies about 12 GB of memory if each element is a double-precision floating point number. That, together with how numpy creates temporary copies to do the dot operation, causes the error. The extra copies are because numpy uses the underlying BLAS operations for dot, which need the matrices to be stored in contiguous C order.
Check out these links if you want more discussions about improving dot performance
http://wiki.scipy.org/PerformanceTips
Speeding up numpy.dot
https://github.com/numpy/numpy/pull/2730