I was wondering whether there is any situation in which a numpy array that owns its data is stored non-contiguously.
From a numerical point of view, non-contiguous, row- or column-aligned buffers make sense and are ubiquitous in performance libraries such as IPP. However, it seems that numpy by default converts anything passed to np.array into a contiguous buffer. As far as I can tell, this is not stated explicitly in the documentation.
My question is, does numpy guarantee that any owning array created with np.array is contiguous in memory? More generally, in which situations can we come across a non-contiguous owning array?
EDIT following @Eelco's answer
By non-contiguous, I mean that there are some "empty spaces" in the memory chunk used to store the data (strides[1] > shape[0] * itemsize, if you will). I do not mean an array whose data is stored using two or more memory allocations — I would be surprised if such an owning numpy array existed. This seems to be consistent with numpy's terminology according to this answer.
By owning arrays, I mean arrays for which .flags.owndata is True. I am not interested in non-owning arrays, which can indeed behave wildly.
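For reference, both properties I'm asking about can be inspected through the flags attribute; a minimal illustration:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.flags['OWNDATA'], a.flags['C_CONTIGUOUS'])            # True True

col = a[:, 1]                      # strided view: neither owning nor contiguous
print(col.flags['OWNDATA'], col.flags['C_CONTIGUOUS'])        # False False

packed = col.copy()                # copying re-packs into a fresh contiguous buffer
print(packed.flags['OWNDATA'], packed.flags['C_CONTIGUOUS'])  # True True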
I've heard it said (no source, sorry) that indeed all memory-owning arrays are contiguous. And that makes sense; how could you own a non-contiguous block? It implies you'd have to make an arbitrary number of fragmented deallocation calls when that hypothetical object gets collected... and I don't think that's even possible; I think one can only release the ranges originally allocated. Viewed from the other side: ownership originates at the time of allocation, and we can only ever allocate contiguous blocks. (At least that's how it works at the malloc level; you could have a software-based allocation layer on top of that which implements logic to handle such fragmented ownership, but if any such thing exists it's news to me.)
I've contributed to jsonpickle to expand its numpy support, and this question also came up there. The code I wrote would break (and quite horribly so) if someone were to feed it a non-contiguous owning array; it's been more than a year and I haven't seen any such issue reported, so that's fairly strong empirical evidence, I'd say...
But if you are still worried about this leading to hard-to-track bugs (I don't think there is a limit to the shenanigans a C lib constructing a numpy array can get up to), I'd recommend simply asserting at runtime that no such frankenarrays ever get accidentally passed to the wrong places.
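A minimal sketch of such a guard (the helper name is mine):

import numpy as np

def assert_no_frankenarray(arr):
    # Reject the hypothetical "owning but non-contiguous" case at the door.
    if arr.flags['OWNDATA'] and not (arr.flags['C_CONTIGUOUS'] or arr.flags['F_CONTIGUOUS']):
        raise ValueError("owning array with non-contiguous data; refusing to handle it")
    return arr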
I have to handle sparse matrices that can occasionally be very big, nearing or exceeding RAM capacity. I also need to support mat*vec and mat*mat operations.
Since internally a csr_matrix is three arrays (data, indices and indptr), is it possible to create a csr_matrix from numpy memmaps?
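Something along these lines is what I have in mind (filenames, dtypes and the shape are placeholders; the files are assumed to have been written out beforehand, e.g. with arr.tofile):

import numpy as np
from scipy import sparse

n_rows, n_cols = 1000, 1000   # placeholder shape

data = np.memmap('data.dat', dtype=np.float64, mode='r')
indices = np.memmap('indices.dat', dtype=np.int32, mode='r')
indptr = np.memmap('indptr.dat', dtype=np.int32, mode='r')

# csr_matrix accepts the raw (data, indices, indptr) triple directly
mat = sparse.csr_matrix((data, indices, indptr), shape=(n_rows, n_cols))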
This can partially work, until you try to do much with the array. There's a very good chance the subarrays will be fully read into memory if you subset, or you'll get an error.
An important consideration here is that the underlying code is written assuming the arrays are typical in-memory numpy arrays. The cost of random access is very different for memory-mapped arrays and in-memory arrays. In fact, much of the code here is (at the time of writing) in Cython, which may not be able to work with more exotic array types.
Also, most of this code can change at any time, as long as the behaviour stays the same for in-memory arrays. This has personally bitten me: I learned that some code I worked with was doing this, but with h5py.Dataset objects as the underlying arrays. It worked surprisingly well, until a bug-fix release of scipy completely broke it.
This works without any problems.
As far as I understand it, tuples and strings are immutable to allow optimizations such as re-using memory that won't change. However, one obvious optimisation, making slices of tuples refer to the same memory as the original tuple, is not included in python.
I know that this optimization isn't included because when I time the following function, time taken goes like O(n^2) instead of O(n), so full copying is taking place:
def test(n):
    tup = tuple(range(n))
    for i in xrange(n):
        tup[0:i]
Is there some behavior of python that would change if this optimization was implemented? Is there some performance benefit to copying even when the original is immutable?
By view, are you thinking of something equivalent to what numpy does? I'm familiar with how and why numpy does that.
A numpy array is an object with shape and dtype information, plus a data buffer. You can see this information in the __array_interface__ property. A view is a new numpy object, with its own shape attribute, but with a data pointer that points someplace into the source buffer. It also has a flag that says "I don't own the buffer", and it keeps a reference to the source array, so the data buffer is not destroyed if the original (owner) array is deleted (and garbage collected).
This use of views can be a big time saver, especially with very large arrays (questions about memory errors are common on SO). Views also allow a different dtype, so a data buffer can be viewed as 4-byte integers, or 1-byte characters, etc.
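A quick way to see the shared pointer, the ownership flag, and the dtype reinterpretation in one place:

import numpy as np

base = np.arange(4, dtype=np.int32)
view = base.view(np.int8)        # same 16 bytes, reinterpreted as 1-byte integers

print(base.__array_interface__['data'][0] == view.__array_interface__['data'][0])  # True
print(base.flags['OWNDATA'], view.flags['OWNDATA'])   # True False
print(view.base is base)         # the view keeps a reference to its source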
How would this apply to tuples? My guess is that it would require a lot of extra baggage. A tuple consists of a fixed set of object pointers - probably a C array. A view would use the same array, but with its own start and end markers (pointers and/or lengths). What about sharing flags? Garbage collection?
And what's the typical size and use of tuples? A common use of tuples is to pass arguments to a function. My guess is that a majority of tuples in a typical Python run are small - 0, 1 or 2 elements. Slices are allowed, but are they very common? On small tuples or very large ones?
Would there be any unintended consequences to making tuple slices views (in the numpy sense)? The distinction between views and copies is one of the harder things for numpy users to grasp. Since a tuple is supposed to be immutable - that is the pointers in the tuple cannot be changed - it is possible that implementing views would be invisible to users. But still I wonder.
It may make most sense to try this idea on a branch of the PyPy version - unless you really like digging into CPython code. Or as a custom class with Cython.
Playing with strides in NumPy I realized that you can easily go past the boundaries of arrays:
>>> import numpy as np
>>> from numpy.lib.stride_tricks import as_strided
>>> a = np.array([1], dtype=np.int8)
>>> as_strided(a, shape=(2,), strides=(1,))
array([ 1, -28], dtype=int8)
Like this I can read the bytes outside of the array and also write into them. But I don't understand how this is possible. Why doesn't the operating system stop me? It seems I can go at least 100 KB away from this array before I hit a segmentation fault.
The only thing I can think of is that this memory space is directly allocated by my Python process. Does NumPy do this? Is there a fixed size to this space? What other objects can there be?
There are two different memory allocators in play here:
The operating system, accessible under Unix with e.g. brk(2) or mmap(2). This will generally give you exactly what you ask for, but it's not very user-friendly.
The C runtime heap, accessible with malloc(3) and free(3). This may or may not return freed memory to the operating system immediately. It may also round allocations up to the nearest page, if that is more performant. This is usually implemented in terms of (1).
Most applications, including NumPy and Python, use (2) rather than (1) (or they implement their own memory allocator on top of (2)). As a result, memory that is invalid according to (2) may still be valid according to (1). You only get a segfault if you violate the rules of method (1). It is also possible you are interacting with other live objects on the heap, which has a strong likelihood of causing your program to misbehave in arbitrary ways, even if you are not changing anything.
Python and numpy are built with C, which has no built-in memory protection. Memory is allocated on the "heap", which is a large block of memory. Since all objects are allocated there, the memory area is quite large and filled with objects of all kinds. Writing to this memory will probably crash your program.
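If you want to experiment with as_strided without wandering off the end of the buffer, one option is a small bounds-checking wrapper (a sketch; the helper and its checks are my own, and it assumes the view starts at the array's first byte):

import numpy as np
from numpy.lib.stride_tricks import as_strided

def checked_as_strided(a, shape, strides):
    # Refuse views whose last reachable byte lies past a's own buffer.
    last_byte = sum((n - 1) * s for n, s in zip(shape, strides)) + a.itemsize
    if any(s < 0 for s in strides) or last_byte > a.nbytes:
        raise ValueError("requested view extends beyond the source buffer")
    return as_strided(a, shape=shape, strides=strides)

a = np.array([1], dtype=np.int8)
checked_as_strided(a, shape=(1,), strides=(1,))      # fine
# checked_as_strided(a, shape=(2,), strides=(1,))    # would raise instead of reading stray heap bytes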
This is a follow up to this question
What are the benefits / drawbacks of a list of lists compared to a numpy array of OBJECTS with regards to MEMORY?
I'm interested in understanding the speed implications of using a numpy array vs a list of lists when the array is of type object.
If anyone is interested in the object I'm using:
import gmpy2 as gm
gm.mpfr('0') # <-- this is the object
The biggest usual benefits of numpy, as far as speed goes, come from being able to vectorize operations, which means you replace a Python loop around a Python function call with a C loop around some inlined C (or even custom SIMD assembly) code. There are probably no built-in vectorized operations for arrays of mpfr objects, so that main benefit vanishes.
However, there are some places where you'll still benefit:
Some operations that would require a copy in pure Python are essentially free in numpy—transposing a 2D array, slicing a column or a row, even reshaping the dimensions are all done by wrapping a pointer to the same underlying data with different striding information. Since your initial question specifically asked about A.T, yes, this is essentially free.
Many operations can be performed in-place more easily in numpy than in Python, which can save you some more copies.
Even when a copy is needed, it's faster to bulk-copy a big array of memory and then refcount all of the objects than to iterate through nested lists deep-copying them all the way down.
It's a lot easier to write your own custom Cython code to vectorize an arbitrary operation with numpy than with Python.
You can still get some benefit from using np.vectorize around a normal Python function, pretty much on the same order as the benefit you get from a list comprehension over a for statement.
Within certain size ranges, if you're careful to use the appropriate striding, numpy can allow you to optimize cache locality (or VM swapping, at larger sizes) relatively easily, while there's really no way to do that at all with lists of lists. This is much less of a win when you're dealing with an array of pointers to objects that could be scattered all over memory than when dealing with values that can be embedded directly in the array, but it's still something.
As for disadvantages… well, one obvious one is that using numpy restricts you to CPython or sometimes PyPy (hopefully in the future that "sometimes" will become "almost always", but it's not quite there as of 2014); if your code would run faster in Jython or IronPython or non-NumPyPy PyPy, that could be a good reason to stick with lists.
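To make the np.vectorize point above concrete, here's a rough sketch using the mpfr objects from the question (no speed claims, just the mechanics):

import numpy as np
import gmpy2 as gm

A = np.array([[gm.mpfr(i * j) for j in range(3)] for i in range(3)], dtype=object)

double = np.vectorize(lambda x: x * 2)   # still one Python-level call per element
B = double(A)                            # but the looping and bookkeeping are handled for you

print(A.T[1, 0])                         # transposing is free: same buffer, different strides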
I am trying to adapt the underlying structure of plotting code (matplotlib) that is updated on a timer, going from Python lists for the plot data to numpy arrays. I want to be able to lower the time step for the plot as much as possible, and since the data may get up into the thousands of points, I start to lose valuable time fast if I can't. I know that numpy arrays are preferred for this sort of thing, but I am having trouble figuring out when I need to think like a Python programmer and when I need to think like a C++ programmer to maximize the efficiency of my memory access.
It says in the scipy.org docs for the append() function that it returns a copy of the arrays appended together. Do all these copies get garbage-collected properly? For example:
import numpy as np
a = np.arange(10)
a = np.append(a,10)
print a
This is my reading of what is going on on the C++-level, but if I knew what I was talking about, I wouldn't be asking the question, so please correct me if I'm wrong! =P
First a block of 10 integers gets allocated, and the symbol a points to the beginning of that block. Then a new block of 11 integers is allocated, for a total of 21 ints (84 bytes) being used. Then the a pointer is moved to the start of the 11-int block. My guess is that this would result in the garbage-collection algorithm decrementing the reference count of the 10-int block to zero and de-allocating it. Is this right? If not, how do I ensure I don't create overhead when appending?
I also am not sure how to properly delete a numpy array when I am done using it. I have a reset button on my plots that just flushes out all the data and starts over. When I had lists, this was done using del data[:]. Is there an equivalent function for numpy arrays? Or should I just say data = np.array([]) and count on the garbage collector to do the work for me?
The point of automatic memory management is that you don't think about it. In the code that you wrote, the copies will be garbage-collected fine (it's nigh on impossible to confuse Python's memory management). However, because np.append is not in-place, the code will create a new array in memory (containing the concatenation of a and 10) and then the variable a will be updated to point to this new array. Since a now no longer points to the original array, which had a refcount of 1, its refcount is decremented to 0 and it will be cleaned up automatically. You can use gc.collect to force a full cleanup.
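A small sketch of that rebinding, using an extra name only to make the two objects visible:

import numpy as np

a = np.arange(10)
old = a                  # keep a second reference to the original block
a = np.append(a, 10)     # allocates a new 11-element array; 'a' is rebound to it
print(old is a)          # False: two distinct arrays exist right now
del old                  # drop the last reference; the 10-element block is freed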
Python's strength does not lie in fine-tuning memory access, although it is possible to optimise. You are probably best off pre-allocating a (using e.g. a = np.zeros(<size>)); if you need finer tuning than that, it starts to get a bit hairy. You could have a look at the Cython + Numpy tutorial for a very neat and easy way to integrate C with Python for efficiency.
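A minimal sketch of the pre-allocation idea for the plotting use case (the capacity and names are made up); the slice data[:n] is what you would hand to matplotlib:

import numpy as np

capacity = 4096                      # an upper bound on the points you expect to plot
data = np.zeros(capacity)            # allocated once, up front
n = 0                                # number of valid points so far

def append_point(value):
    global n
    data[n] = value                  # in-place write, no reallocation
    n += 1

def reset():
    global n
    n = 0                            # reuse the same buffer; nothing to free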
Variables in Python just point to the location where their contents are stored; you can del any variable and it will decrease the reference count of its target by one. The target will be cleaned automatically after its reference count hits zero. The moral of this is, don't worry about cleaning up your memory. It will happen automatically.