Size of numpy strided array/broadcast array in memory?

I'm trying to create efficient broadcast arrays in numpy, e.g. a set of shape=[1000,1000,1000] arrays that have only 1000 elements, but repeated 1e6 times. This can be achieved both through np.lib.stride_tricks.as_strided and np.broadcast_arrays.
However, I am having trouble verifying that there is no duplication in memory, and this is critical since tests that actually duplicate the arrays in memory tend to crash my machine leaving no traceback.
I've tried examining the size of the arrays using .nbytes, but that doesn't seem to correspond to the actual memory usage:
>>> import numpy as np
>>> import resource
>>> initial_memuse = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> pagesize = resource.getpagesize()
>>>
>>> x = np.arange(1000)
>>> memuse_x = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of x = {0} MB".format(x.nbytes/1e6))
Size of x = 0.008 MB
>>> print("Memory used = {0} MB".format((memuse_x-initial_memuse)*resource.getpagesize()/1e6))
Memory used = 150.994944 MB
>>>
>>> y = np.lib.stride_tricks.as_strided(x, [1000,10,10], strides=x.strides + (0, 0))
>>> memuse_y = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of y = {0} MB".format(y.nbytes/1e6))
Size of y = 0.8 MB
>>> print("Memory used = {0} MB".format((memuse_y-memuse_x)*resource.getpagesize()/1e6))
Memory used = 201.326592 MB
>>>
>>> z = np.lib.stride_tricks.as_strided(x, [1000,100,100], strides=x.strides + (0, 0))
>>> memuse_z = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of z = {0} MB".format(z.nbytes/1e6))
Size of z = 80.0 MB
>>> print("Memory used = {0} MB".format((memuse_z-memuse_y)*resource.getpagesize()/1e6))
Memory used = 0.0 MB
So .nbytes reports the "theoretical" size of the array, but apparently not the actual size. The resource checking is a little awkward, as it looks like there are some things being loaded & cached (perhaps?) that result in the first striding taking up some amount of memory, but future strides take none.
tl;dr: How do you determine the actual size of a numpy array or array view in memory?

One way would be to examine the .base attribute of the array, which references the object from which an array "borrows" its memory. For example:
x = np.arange(1000)
print(x.flags.owndata) # x "owns" its data
# True
print(x.base is None) # its base is therefore 'None'
# True
a = x.reshape(100, 10) # a is a reshaped view onto x
print(a.flags.owndata) # it therefore "borrows" its data
# False
print(a.base is x) # its .base is x
# True
Things are slightly more complicated with np.lib.stride_tricks:
b = np.lib.stride_tricks.as_strided(x, [1000,100,100], strides=x.strides + (0, 0))
print(b.flags.owndata)
# False
print(b.base)
# <numpy.lib.stride_tricks.DummyArray object at 0x7fb40c02b0f0>
Here, b.base is a numpy.lib.stride_tricks.DummyArray instance, which looks like this:
class DummyArray(object):
    """Dummy object that just exists to hang __array_interface__ dictionaries
    and possibly keep alive a reference to a base array.
    """

    def __init__(self, interface, base=None):
        self.__array_interface__ = interface
        self.base = base
We can therefore examine b.base.base:
print(b.base.base is x)
# True
Once you have the base array then its .nbytes attribute should accurately reflect the amount of memory it occupies.
In principle it's possible to have a view of a view of an array, or to create a strided array from another strided array. Assuming that your view or strided array is ultimately backed by another numpy array, you could recursively reference its .base attribute. Once you find an object whose .base is None, you have found the underlying object from which your array is borrowing its memory:
def find_base_nbytes(obj):
    if obj.base is not None:
        return find_base_nbytes(obj.base)
    return obj.nbytes
As expected,
print(find_base_nbytes(x))
# 8000
print(find_base_nbytes(y))
# 8000
print(find_base_nbytes(z))
# 8000
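Since the question also mentions np.broadcast_arrays, note that the same recursive .base lookup should work for broadcast views as well, because they too are ultimately backed by the original array. A minimal sketch (assuming numpy 1.10+ for np.broadcast_to, and reusing x and find_base_nbytes from above):
w = np.broadcast_to(x, (1000, 1000, 1000))  # read-only broadcast view of x
print(w.nbytes)             # 8000000000 -- the "theoretical" size of the broadcast shape
print(find_base_nbytes(w))  # expected: 8000, since only the 1000-element base is stored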

Related

python numpy ndarray subclassing for offset changing

I am working on a framework for processing incoming data.
The data is received from a socket and added to a numpy array A (used as a buffer) by shifting, something like:
A[:-1] = A[1:]
A[-1] = value
The framework allows loading processing units as classes that have access to the incoming data through an array view pointing to A. Every time new data is received and stored in A, a method execute() is called:
def execute(self):
    newSample = self.data[-1]
What is important is that the new sample is always at index -1.
A user can also create their own array views in the __init__ function:
def __init__(self):
    self.myData = self.data[-4:]  # view that contains the last 4 samples
Everything works nicely when I am shifting array A and adding a new value at the end. However, for offline testing, I want to load all the data at the start of the framework and run everything else as before (i.e. the same classes implementing the data processing).
Of course, I could again create the A buffer as a zeros array and shift new values into it. However, this involves copying data between two arrays, which is completely unnecessary and costs time and memory.
What I was thinking about is providing a way to change the boundaries of the numpy array, or to change the A.data pointer. However, none of those approaches is allowed, or they lead to warning messages.
Finally, I am trying to change an internal offset of array A, so that I can advance it and thus make more data available to the algorithms. What is important is that self.data[-1] must always point to the newly arrived sample, and the standard numpy array API should be used.
I have subclassed np.ndarray:
class MyArrayView(np.ndarray):
    def __new__(cls, input_array):
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj._offset = 0
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self._offset = getattr(obj, '_offset', None)

    def advance_index(self):
        self._offset += 1

    def __str__(self):
        return super(MyArrayView, self[:]).__str__()

    def __repr__(self):
        return super(MyArrayView, self[:]).__repr__()

    def __getitem__(self, idx):
        if isinstance(idx, slice):
            start = 0
            stop = self._offset
            step = idx.step
            idx = slice(start, stop, step)
        else:
            idx = self._offset + idx
        return super(MyArrayView, self).__getitem__(idx)
that allows me to do the following:
a = np.array([1,2,3,4,5,6,7,8,9,10])
myA = MyArrayView(a)
b = myA
print("b :", b)
for i in range(1, 5):
    myA.advance_index()
    print(b[:], b[-1])
print("b :", b)
print("b + 10 :", b + 10)
print("b[:] + 20 :", b[:] + 20)
and gives following output:
b : []
[1] 1
[1 2] 2
[1 2 3] 3
[1 2 3 4] 4
b : [1 2 3 4]
b + 10 : [11 12 13 14]
b[:] + 20 : [21 22 23 24]
So far so good. However, if I check the shape:
print("shape", b[:].shape) # shape (4,)
print("shape", b.shape) # shape (10,)
it is different in those two cases. I have tried to change it using shape=(self.internalIndex,), but that only leads to an error message.
I want to ask whether you think this is the right approach, and whether it just requires overloading more functions of the np.ndarray class. Or should I completely abandon this solution and fall back to shifting the array with each new sample? Or can it perhaps be achieved with the standard np.ndarray implementation, since I need to use the standard numpy API?
I also tried this:
a = np.array([1,2,3,4,5,6,7,8,9,10])
b = a.view()[5:]
print(a.data) # <memory at 0x7f09e01d8f48>
print(b.data) # <memory at 0x7f09e01d8f48> They point to the same memory start!
print(np.byte_bounds(a)) # (50237824, 50237904)
print(np.byte_bounds(b)) # (50237864, 50237904) but the byte_bounds are different
So having this in mind, I would say I need to create a view of array a and extend it (or at least move it like a window on top of a).
However, all my attempts to change the byte_bounds had no effect.
I admire your bravery, but I am quite sure that subclassing numpy arrays is overkill for your problem and can cause you a lot of headaches. In the end it might cause a performance hit somewhere that far outweighs the array copying you are trying to avoid.
Why not make the slice (i.e. [-4:] or slice(-4, None)) a parameter to your __init__ function or a class attribute and override that in your test?
def __init__(self, lastfour=slice(-4, None)):
    self.myData = self.data[lastfour]
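For example (MyUnit here is a hypothetical processing-unit class that accepts this parameter), the offline test can simply override the slice:
unit = MyUnit()                            # production: view of the last four samples
test_unit = MyUnit(lastfour=slice(0, 4))   # offline test: view of a fixed window instead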

Numpy: garbage collection after slicing

def foo():
    x = np.ones((10, 10))
    return x[:5, :5]
If I call y = foo() I'll get a 5x5 array (1/4 of the values in x). But what happens to the other values in x? Do they persist in memory, or are they garbage collected in some way? I'd like to understand this.
As kindall says in the comments, basic slicing on a NumPy array creates a view of the original array. The view has to keep the entire original object alive; you can see the reference it uses to do so in the view's base attribute.
In [2]: x = numpy.ones((10, 10))
In [3]: y = x[:5, :5]
In [4]: y.base is x
Out[4]: True
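If you want the large original array to become collectable, a minimal sketch is to return an explicit copy, so the result no longer keeps the full 10x10 buffer alive:
import numpy as np

def foo_copy():
    x = np.ones((10, 10))
    # .copy() materialises the 5x5 block into its own buffer,
    # so x can be garbage collected once the function returns
    return x[:5, :5].copy()

y = foo_copy()
print(y.base is None)  # True: y owns its data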

In NumPy, how do I set array b's data to reference the data of array a?

Say I have ndarrays a and b of compatible type and shape. I now wish for the data of b to refer to the data of a. That is, without changing the array b object itself or creating a new one. (Imagine that b is actually an object of a class derived from ndarray and I wish to set its data reference after construction.) In the following example, how do I perform the b.set_data_reference?
import numpy as np
a = np.array([1,2,3])
b = np.empty_like(a)
b.set_data_reference(a)
This would result in b[0] == 1, and setting operations on one array would affect the other. E.g. if we set a[1] = 22, then we can verify that b[1] == 22.
N.B.: If I had controlled the creation of array b, I am aware that I could have created it like
b = np.array(a, copy=False)
This is, however, not the case.
NumPy does not support this operation. If you controlled the creation of b, you might be able to create it in such a way that it uses a's data buffer, but after b is created, you can't swap its buffer out for a's.
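If you do control creation, a minimal sketch of how b can share a's buffer from the start (this covers the N.B. case, not retargeting an existing array):
import numpy as np

a = np.array([1, 2, 3])

b = a.view()        # a plain view shares a's data buffer
a[1] = 22
print(b[1])         # 22

class MyArray(np.ndarray):
    pass

c = a.view(MyArray) # a subclass instance can also be created as a view onto a
print(c.base is a)  # True: c borrows a's memory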
Every variable in Python is a reference, so you can directly use = as follows:
import numpy as np
a = np.array([1,2,3])
b = a
You can check that b refers to a as follows:
assert a[1] == b[1]
a[1] = 4
assert a[1] == b[1]
Usually, when functions are not always supposed to create their own buffer, they implement an interface like
def func(a, b, c, out=None):
    if out is None:
        out = np.empty_like(a)  # allocate a fresh buffer only if the caller did not supply one
    # ... compute the result into out ...
    return out
That way the caller can control whether an existing buffer is used or not.
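A minimal usage sketch of that out-parameter pattern, using a hypothetical add function:
import numpy as np

def add(a, b, out=None):
    if out is None:
        out = np.empty_like(a)  # allocate only when the caller did not supply a buffer
    np.add(a, b, out=out)       # write the result into the chosen buffer
    return out

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])
buf = np.empty_like(x)
res = add(x, y, out=buf)
print(res is buf)  # True: the caller's buffer was reused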

High memory usage in python

The following simple python code:
class Node:
    NumberOfNodes = 0

    def __init__(self):
        Node.NumberOfNodes += 1

if __name__ == '__main__':
    nodes = []
    for i in xrange(1, 7 * 1000 * 1000):
        if i % 1000 == 0:
            print i
        nodes.append(Node())
takes gigabytes of memory, which I think is unreasonable. Is that normal in Python?
How could I fix that? (In my original code, I have about 7 million objects, each with 10 fields, and that takes 8 gigabytes of RAM.)
If you have a fixed number of fields then you can use __slots__ to save quite a lot of memory. Note that __slots__ does have some limitations, so make sure you read the Notes on using __slots__ carefully before choosing to use them in your application:
>>> import sys
>>> class Node(object):
...     NumberOfNodes = 0
...     def __init__(self):
...         Node.NumberOfNodes += 1
...
>>> n = Node()
>>> sys.getsizeof(n)
64
>>> class Node(object):
...     __slots__ = ()
...     NumberOfNodes = 0
...     def __init__(self):
...         Node.NumberOfNodes += 1
...
>>> n = Node()
>>> sys.getsizeof(n)
16
Python is an inherently memory-heavy programming language. There are some ways you can get around this. __slots__ is one way. Another, more extreme approach is to use numpy to store your data. You can use numpy to create a structured array or record array -- a compound data type that uses minimal memory, but gives up substantial functionality compared to a normal Python class. That is, you are working with the numpy array class rather than your own class, so you cannot define your own methods on your array.
import numpy as np
# data type for a record with three 32-bit ints called x, y and z
dtype = [(name, np.int32) for name in 'xyz']
arr = np.zeros(1000, dtype=dtype)
# access member of x of a record
arr[0]['x'] = 1 # name based access
# or
assert arr[0][0] == 1 # index based access
# accessing all x members of records in array
assert arr['x'].sum() == 1
# size of array used to store elements in memory
assert arr.nbytes == 12000 # 1000 elements * 3 members * 4 bytes per int
See more here.
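As a rough sketch of how the original 7-million-object case might look as a structured array (assuming ten 32-bit integer fields; the real field types would change the numbers):
import numpy as np

# ten hypothetical int32 fields named f0..f9
dtype = [('f%d' % i, np.int32) for i in range(10)]
nodes = np.zeros(7 * 1000 * 1000, dtype=dtype)

# 7e6 records * 10 fields * 4 bytes = 280 MB, versus gigabytes of Python objects
print(nodes.nbytes / 1e6)  # 280.0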

Efficient way to convert string to ctypes.c_ubyte array in Python

I have a string of 20 bytes, and I would like to convert it to a ctypes.c_ubyte array for bit-field manipulation purposes.
import ctypes
str_bytes = '01234567890123456789'
byte_arr = bytearray(str_bytes)
raw_bytes = (ctypes.c_ubyte*20)(*(byte_arr))
Is there a way to avoid a deep copy from str to bytearray for the sake of the cast?
Alternatively, is it possible to convert a string to a bytearray without a deep copy? (With techniques like memoryview?)
I am using Python 2.7.
Performance results:
Using eryksun's and Brian Larsen's suggestions, here are the benchmarks under a VirtualBox VM with Ubuntu 12.04 and Python 2.7.
method1 uses the approach from my original post
method2 uses ctypes from_buffer_copy
method3 uses ctypes cast/POINTER
method4 uses numpy
Results:
method1 takes 3.87sec
method2 takes 0.42sec
method3 takes 1.44sec
method4 takes 8.79sec
Code:
import ctypes
import time
import numpy
str_bytes = '01234567890123456789'
def method1():
    result = ''
    t0 = time.clock()
    for x in xrange(0, 1000000):
        byte_arr = bytearray(str_bytes)
        result = (ctypes.c_ubyte*20)(*(byte_arr))
    t1 = time.clock()
    print(t1-t0)
    return result

def method2():
    result = ''
    t0 = time.clock()
    for x in xrange(0, 1000000):
        result = (ctypes.c_ubyte * 20).from_buffer_copy(str_bytes)
    t1 = time.clock()
    print(t1-t0)
    return result

def method3():
    result = ''
    t0 = time.clock()
    for x in xrange(0, 1000000):
        result = ctypes.cast(str_bytes, ctypes.POINTER(ctypes.c_ubyte * 20))[0]
    t1 = time.clock()
    print(t1-t0)
    return result

def method4():
    result = ''
    t0 = time.clock()
    for x in xrange(0, 1000000):
        arr = numpy.asarray(str_bytes)
        result = arr.ctypes.data_as(ctypes.POINTER(ctypes.c_ubyte*len(str_bytes)))
    t1 = time.clock()
    print(t1-t0)
    return result
print(method1())
print(method2())
print(method3())
print(method4())
I don't think that's working how you think. bytearray creates a copy of the string. Then the interpreter unpacks the bytearray sequence into a starargs tuple and merges this into another new tuple that has the other args (even though there are none in this case). Finally, the c_ubyte array initializer loops over the args tuple to set the elements of the c_ubyte array. That's a lot of work, and a lot of copying, just to initialize the array.
Instead you can use the from_buffer_copy method, assuming the string is a bytestring with the buffer interface (not unicode):
import ctypes
str_bytes = '01234567890123456789'
raw_bytes = (ctypes.c_ubyte * 20).from_buffer_copy(str_bytes)
That still has to copy the string, but it's only done once, and much more efficiently. As was stated in the comments, a Python string is immutable and could be interned or used as a dict key. Its immutability should be respected, even if ctypes lets you violate this in practice:
>>> from ctypes import *
>>> s = '01234567890123456789'
>>> b = cast(s, POINTER(c_ubyte * 20))[0]
>>> b[0] = 97
>>> s
'a1234567890123456789'
Edit
I need to emphasize that I am not recommending using ctypes to modify an immutable CPython string. If you have to, then at the very least check sys.getrefcount beforehand to ensure that the reference count is 2 or less (the call adds 1). Otherwise, you will eventually be surprised by string interning for names (e.g. "sys") and code object constants. Python is free to reuse immutable objects as it sees fit. If you step outside of the language to mutate an 'immutable' object, you've broken the contract.
For example, if you modify an already-hashed string, the cached hash is no longer correct for the contents. That breaks it for use as a dict key. Neither another string with the new contents nor one with the original contents will match the key in the dict. The former has a different hash, and the latter has a different value. Then the only way to get at the dict item is by using the mutated string that has the incorrect hash. Continuing from the previous example:
>>> s
'a1234567890123456789'
>>> d = {s: 1}
>>> d[s]
1
>>> d['a1234567890123456789']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'a1234567890123456789'
>>> d['01234567890123456789']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: '01234567890123456789'
Now consider the mess if the key is an interned string that's reused in dozens of places.
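To make the refcount check from the note above concrete, here is a sketch; it only reduces the risk, since mutating a CPython string is still outside the language's contract:
import sys
from ctypes import POINTER, c_ubyte, cast

s = '01234567890123456789'
# getrefcount adds one temporary reference of its own, so a value of 2
# means the caller holds the only other reference to the string
if sys.getrefcount(s) <= 2:
    buf = cast(s, POINTER(c_ubyte * 20))[0]
    buf[0] = 97  # mutates the CPython string in place -- still off-contract
else:
    print('string is shared; refusing to mutate it')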
For performance analysis it's typical to use the timeit module. Prior to Python 3.3, timeit.default_timer varies by platform: on POSIX systems it's time.time, and on Windows it's time.clock.
import timeit
setup = r'''
import ctypes, numpy
str_bytes = '01234567890123456789'
arr_t = ctypes.c_ubyte * 20
'''
methods = [
    'arr_t(*bytearray(str_bytes))',
    'arr_t.from_buffer_copy(str_bytes)',
    'ctypes.cast(str_bytes, ctypes.POINTER(arr_t))[0]',
    'numpy.asarray(str_bytes).ctypes.data_as('
    'ctypes.POINTER(arr_t))[0]',
]
test = lambda m: min(timeit.repeat(m, setup))
>>> tabs = [test(m) for m in methods]
>>> trel = [t / tabs[0] for t in tabs]
>>> trel
[1.0, 0.060573711879182784, 0.261847116395079, 1.5389279092185282]
Here is another solution for you to benchmark (I would be very interested in the results). Using numpy might add some simplicity, depending on what the whole code looks like.
import numpy as np
import ctypes
str_bytes = '01234567890123456789'
arr = np.asarray(str_bytes)
aa = arr.ctypes.data_as(ctypes.POINTER(ctypes.c_ubyte*len(str_bytes)))
for v in aa.contents: print v
48
49
50
51
52
53
54
55
56
57
48
49
50
51
52
53
54
55
56
57
