If you are creating a 1d array, you can implement it as a list, or else use the 'array' module in the standard library. I have always used lists for 1d arrays.
What is the reason or circumstance where I would want to use the array module instead?
Is it for performance and memory optimization, or am I missing something obvious?
Basically, Python lists are very flexible and can hold completely heterogeneous, arbitrary data, and they can be appended to very efficiently, in amortized constant time. If you need to shrink and grow your list time-efficiently and without hassle, they are the way to go. But they use a lot more space than C arrays, in part because each item in the list requires the construction of an individual Python object, even for data that could be represented with simple C types (e.g. float or uint64_t).
The array.array type, on the other hand, is just a thin wrapper on C arrays. It can hold only homogeneous data (that is to say, all of the same type), so it uses only sizeof(one object) * length bytes of memory. Mostly, you should use it when you need to expose a C array to an extension or a system call (for example, ioctl or fcntl).
array.array is also a reasonable way to represent a mutable string in Python 2.x (array('B', bytes)). However, Python 2.6+ and 3.x offer a mutable byte string as bytearray.
However, if you want to do math on a homogeneous array of numeric data, then you're much better off using NumPy, which can automatically vectorize operations on complex multi-dimensional arrays.
To make a long story short: array.array is useful when you need a homogeneous C array of data for reasons other than doing math.
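To make that concrete, here is a minimal sketch (the byte counts come from sys.getsizeof on a 64-bit CPython and vary by version and platform):

import array
import sys

floats_list = [1.0, 2.0, 3.0]                     # three boxed float objects plus pointers
floats_arr = array.array('d', [1.0, 2.0, 3.0])    # one contiguous C double[3]

print(sys.getsizeof(floats_list))   # list header + pointer slots (the floats are separate objects)
print(sys.getsizeof(floats_arr))    # array header + 3 * 8 raw bytes
print(len(floats_arr.tobytes()))    # -> 24; the underlying C buffer, ready for an extension or syscall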
For almost all cases the normal list is the right choice. The array module is more like a thin wrapper over C arrays, which gives you a kind of strongly typed container (see the docs), with access to more C-like types such as signed/unsigned short or double, which are not part of the built-in types. I'd say use the array module only if you really need it; in all other cases stick with lists.
The array module is kind of one of those things that you probably don't have a need for if you don't know why you would use it (and take note that I'm not trying to say that in a condescending manner!). Most of the time, the array module is used to interface with C code. To give you a more direct answer to your question about performance:
Arrays are more efficient than lists for some uses. If you need to allocate an array that you KNOW will not change, then arrays can be faster and use less memory. GvR has an optimization anecdote in which the array module comes out to be the winner (long read, but worth it).
On the other hand, part of the reason lists eat up more memory than arrays is that Python over-allocates: whenever all allocated slots are in use, it reserves room for a few extra elements beyond what is immediately needed. This makes appending items to lists fast, so if you plan on adding items, a list is the way to go (the sketch below shows the over-allocation in action).
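A quick way to observe that over-allocation (the exact numbers vary across CPython versions):

import sys

a = []
last = sys.getsizeof(a)
for i in range(32):
    a.append(i)
    size = sys.getsizeof(a)
    if size != last:        # the buffer grew; capacity jumped by several slots at once
        print(len(a), size)
        last = size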
TL;DR I'd only use an array if you have an exceptional optimization need or you need to interface with C code (and can't use Pyrex).
It's a trade-off!
Pros of each one:
list
- flexible
- can be heterogeneous
array (e.g. a numpy array)
- array of uniform values
- homogeneous
- compact (in size)
- efficient (functionality and speed)
- convenient
My understanding is that arrays are stored more efficiently (i.e. as contiguous blocks of memory vs. pointers to Python objects), but I am not aware of any performance benefit. Additionally, with arrays you must store primitives of the same type, whereas lists can store anything.
The standard library arrays are useful for binary I/O, such as translating a list of ints to a string to write to, say, a wave file. That said, as many have already noted, if you're going to do any real work then you should consider using NumPy.
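For example, here is a hedged sketch of that wave-file use case (the sample data and parameters are invented for illustration):

import array
import wave

samples = array.array('h', [0, 1000, -1000] * 1000)   # 16-bit signed PCM samples

with wave.open('tone.wav', 'wb') as f:
    f.setnchannels(1)                   # mono
    f.setsampwidth(2)                   # 2 bytes per sample, matching typecode 'h'
    f.setframerate(44100)
    f.writeframes(samples.tobytes())    # the array's buffer is already raw C data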
With regard to performance, here are some numbers comparing python lists, arrays and numpy arrays (all with Python 3.7 on a 2017 Macbook Pro).
The end result is that the python list is fastest for these operations.
# Setup shared by all the timings below
import timeit
import numpy as np

# Python list with append()
np.mean(timeit.repeat(setup="a = []", stmt="a.append(1.0)", number=1000, repeat=5000)) * 1000
# 0.054 +/- 0.025 msec
# Python array with append()
np.mean(timeit.repeat(setup="import array; a = array.array('f')", stmt="a.append(1.0)", number=1000, repeat=5000)) * 1000
# 0.104 +/- 0.025 msec
# Numpy array with append()
np.mean(timeit.repeat(setup="import numpy as np; a = np.array([])", stmt="np.append(a, [1.0])", number=1000, repeat=5000)) * 1000
# 5.183 +/- 0.950 msec
# Python list using +=
np.mean(timeit.repeat(setup="a = []", stmt="a += [1.0]", number=1000, repeat=5000)) * 1000
# 0.062 +/- 0.021 msec
# Python array using +=
np.mean(timeit.repeat(setup="import array; a = array.array('f')", stmt="a += array.array('f', [1.0]) ", number=1000, repeat=5000)) * 1000
# 0.289 +/- 0.043 msec
# Python list using extend()
np.mean(timeit.repeat(setup="a = []", stmt="a.extend([1.0])", number=1000, repeat=5000)) * 1000
# 0.083 +/- 0.020 msec
# Python array using extend()
np.mean(timeit.repeat(setup="import array; a = array.array('f')", stmt="a.extend([1.0]) ", number=1000, repeat=5000)) * 1000
# 0.169 +/- 0.034
If you're going to be using arrays, consider the numpy or scipy packages, which give you arrays with a lot more flexibility.
This answer sums up almost all the common questions about when to use a list versus an array:
The main difference between these two data types is the operations you can perform on them. For example, you can divide a numpy array by 3 and it will divide each element of the array by 3; the same cannot be done with a list (see the sketch after this list).
A list is part of Python's syntax, so it doesn't need to be declared, whereas an array has to be imported and constructed before you can use it.
You can store values of different data types in a list (heterogeneous), whereas an array can only store values of the same data type (homogeneous).
Being rich in functionality and fast, arrays are widely used for arithmetic operations and for storing large amounts of data, compared to lists.
Arrays take less memory compared to lists.
Array can only be used for specific types, whereas lists can be used for any object.
Arrays can also hold only data of one type, whereas a list can have entries of various object types.
Arrays are also more efficient for some numerical computation.
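Here is the division example from the first point above, as a minimal sketch:

import numpy as np

a = np.array([3, 6, 9])
print(a / 3)                  # -> [1. 2. 3.]; each element is divided

lst = [3, 6, 9]
# lst / 3 raises TypeError; with a list you need a comprehension instead:
print([x / 3 for x in lst])   # -> [1.0, 2.0, 3.0]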
An important difference between numpy array and list is that array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array.
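A minimal demonstration of the view semantics:

import numpy as np

src = np.arange(5)     # [0 1 2 3 4]
view = src[1:4]        # a view; no data is copied
view[0] = 99
print(src)             # -> [ 0 99  2  3  4]; the source sees the change

lst = [0, 1, 2, 3, 4]
sub = lst[1:4]         # a list slice is a copy
sub[0] = 99
print(lst)             # -> [0, 1, 2, 3, 4]; unchanged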
import numpy as np
v = np.zeros((3,10000), dtype=np.float32)
mat = np.zeros((10000,10000000), dtype=np.int8)
w = np.matmul(v, mat)
yields
Traceback (most recent call last):
File "int_mul_test.py", line 6, in <module>
w = np.matmul(v, mat)
numpy.core._exceptions.MemoryError: Unable to allocate 373. GiB for an array with shape (10000, 10000000) and data type float32
Apparently, numpy is trying to convert my 10k x 10m int8 matrix to dtype float32. Why does it need to do this? It seems extremely wasteful, and if matrix multiplication must work with float numbers in memory, it could convert say 1m columns at a time (which shouldn't sacrifice speed too much), instead of converting all 10m columns all at once.
My current solution is to use a loop to break the matrix into 10 pieces and reduce temporary memory allocation to 1/10 of the 373 GiB:
w = np.empty((v.shape[0], mat.shape[1]), dtype=np.float32)
start = 0
block = 1000000
for i in range(mat.shape[1] // block):
    end = start + block
    w[:, start:end] = np.matmul(v, mat[:, start:end])
    start = end
w[:, start:] = np.matmul(v, mat[:, start:])
# runs in 396 seconds
Is there a numpy-idiomatic way to multiply "piece by piece" without manually coding a loop?
The semantics of NumPy operations force the inputs of a binary operation to be cast when the left and right types differ. This is the case in almost all statically typed languages, including C, C++, Java and Rust, but also in many dynamically typed languages (where the rules are applied at runtime). Python also (partially) applies such well-defined semantic rules. For example, when you evaluate the expression True * 1.7, the interpreter inspects the types of both operands (bool and float here) and applies conversion rules until both operands have the same type before performing the actual multiplication. In this case, True of type bool is cast to 1 of type int, which is then cast to 1.0 of type float. Such rules are generally designed to be both relatively unambiguous and safe; for example, you do not expect 2 * 1.7 to equal 3. NumPy uses semantic rules similar to those of the C language because it is written in C and provides native types. The rules should be defined independently of any given implementation, though performance and ease of use matter a lot when designing them. Unfortunately, in your case, this means a huge array has to be allocated.
Note that NumPy could theoretically bypass casting and implement the N * N possible versions of each binary operation for the N different types, in order to make them faster (like the "as-if" semantic rule of the C language). However, this would be insane for the developers to implement, and it would result in more bug-prone code (i.e. less stable and slower development) and huge code bloat (bigger binaries). This is especially true since other parameters would have to be taken into account, like the shape of the arrays and the memory layout (e.g. alignment), or even the target architecture. The current main casting generative function of NumPy is already quite complex and already results in 4 x 18 x 18 = 1296 different C functions being compiled and stored in the NumPy binaries!
In your case, you can use Cython or Numba to generate a memory-efficient (and possibly faster) implementation dedicated to your specific needs. Be careful about possible overflows though.
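As a hedged sketch of that suggestion (assuming Numba is installed; the function name is invented, and v and mat are as defined in the question): a jit-compiled kernel can cast each int8 element to float32 on the fly, so no float32 copy of mat is ever materialized.

import numpy as np
import numba as nb

@nb.njit(parallel=True)
def matmul_int8(v, mat):
    m, k = v.shape
    n = mat.shape[1]
    out = np.zeros((m, n), dtype=np.float32)
    for j in nb.prange(n):              # parallel over output columns
        for i in range(m):
            acc = np.float32(0.0)
            for l in range(k):
                acc += v[i, l] * np.float32(mat[l, j])   # cast one element at a time
            out[i, j] = acc
    return out

w = matmul_int8(v, mat)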
arr = np.array([Myclass(np.random.random(100)) for _ in range(10000)])
Is there a way to save time in this statement by creating a numpy array of objects directly (avoiding the list construction which is costly)?
I need to create and process a large number of objects of class Myclass, where each object contains several ints, several floats, and a list (or tuple) of floats. The point of using the array (of objects) is to take advantage of numpy arrays' fast computation (e.g., column sums) on slices of the array of objects (and other stuff; the array on which slices are taken has each row made up of one Myclass object and other scalar fields). Other than using np.array as above, is there any other time-saving strategy in this case? Thanks.
Numpy needs to know the length of the array in advance because it must allocate enough memory in a block.
You can start with an empty array of appropriate type using np.empty(10_000, object). (Beware that for most data types empty arrays may contain garbage data, it's usually safer to start with np.zeros() unless you really need the performance, but dtype object does get properly initialized to Nones.)
You can then apply any callable you like (such as a class) over all the values using np.vectorize. It's faster to use NumPy's built-in vectorized functions instead when you can, since np.vectorize basically calls your function once per element in a Python-level for loop. But sometimes you can't.
In the case of random numbers, you can create an array sample of any shape you like using np.random.rand(). It would still have to be converted to a new array of dtype object when you apply your class to it though. I'm not sure if that's any faster than creating the samples in each __init__ (or whatever callable). You'd have to profile it.
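Putting those pieces together, a minimal sketch (Myclass here is a stand-in for the asker's class; profile before trusting any speedup):

import numpy as np

class Myclass:
    def __init__(self, values):
        self.values = values

samples = np.random.rand(10000, 100)   # generate all the random data in one call
arr = np.empty(10000, dtype=object)    # object slots are initialized to None
for i in range(10000):                 # still a Python-level loop, like vectorize
    arr[i] = Myclass(samples[i])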
I have what I thought would be a simple task in numpy, but I'm having trouble.
I have a function which takes an index in the array and returns the value that belongs at that index. I would like to, efficiently, write the values into a numpy array.
I have found numpy.fromfunction, but it doesn't behave remotely like the documentation suggests. It seems to "vectorise" the function, which means that instead of passing the actual indices it passes a numpy array of indices:
def vsin(i):
    return float(round(A * math.sin((2 * pi * wf) * i)))

numpy.fromfunction(vsin, (len,), dtype=numpy.int16)
# TypeError: only length-1 arrays can be converted to Python scalars
(if we use a debugger to inspect i, it is a numpy.array instance.)
So, if we try to use numpy's vectorised sin function:
def vsin(i):
    return (A * numpy.sin((2 * pi * wf) * i)).astype(numpy.int16)

numpy.fromfunction(vsin, (len,), dtype=numpy.int16)
We don't get a type error, but if len > 2**15 we get discontinuities chopping across our oscillator, because numpy is using int16_t to represent the index!
The point here isn't about sin in particular: I want to be able to write arbitrary python functions like this (whether a numpy vectorised version exists or not) and be able to run them inside a tight C loop (rather than a roundabout python one), and not have to worry about integer wraparound.
Do I really have to write my own cython extension in order to be able to do this? Doesn't numpy have support for running python functions once per item in an array, with access to the index?
It doesn't have to be a creation function: I can use numpy.empty (or indeed, reuse an existing array from somewhere else.) So a vectorised transformation function would also do.
I think the issue of integer wraparound is unrelated to numpy's vectorized sin implementation, and even to the use of Python or C.
If you use a 2-byte signed integer and try to generate an array of integer values ranging from 0 to above 32767, you will get a wrap-around error. The array will look like:
[0, 1, 2, ... , 32767, -32768, -32767, ...]
The simplest solution, assuming memory is not too tight, is to use more bytes for your integer array generated by fromfunction so you don't have a wrap-around problem in the first place (up to a few billion):
numpy.fromfunction(vsin, (len,), dtype=numpy.int32)
numpy is optimized to work fast on arrays by passing the whole array around between vectorized functions. I think in general the numpy tools are inconvenient for trying to run scalar functions once per array element.
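If you do need to run a scalar Python function once per index, one hedged workaround (still a Python-level loop, just without building an intermediate list) is to feed a generator to numpy.fromiter; the values of A, wf and the length n below are assumed from the question's context:

import math
import numpy

A, wf, n = 32000, 440.0 / 44100.0, 100000
pi = math.pi

def scalar_vsin(i):
    return float(round(A * math.sin((2 * pi * wf) * i)))

samples = numpy.fromiter((scalar_vsin(i) for i in range(n)),
                         dtype=numpy.int16, count=n)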
I know in python it's hard to see the memory usage of an object.
Is it easier to do this for SciPy objects (for example, sparse matrix)?
you can use array.itemsize (the size of the contained type in bytes) and array.size (the number of elements) to obtain the total:
# a is your array
bytes = a.itemsize * a.size
it's not the exact value, as it ignores the rest of the array infrastructure, but for a big array it's the value that matters (and I guess that you care because you have something big)
if you want to use it on a sparse array you have to modify it, as a sparse matrix doesn't have the itemsize attribute; you have to access the dtype and get the itemsize from it:
bytes = a.dtype.itemsize * a.size
In general I don't think it's easy to evaluate the real memory occupied by a Python object; the numpy array is an exception, being just a thin layer over a C array.
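For a more complete estimate on, say, a CSR matrix, you can instead sum the nbytes of the three underlying arrays (a hedged sketch assuming SciPy's CSR layout):

from scipy import sparse

a = sparse.random(1000, 1000, density=0.01, format='csr')
nbytes = a.data.nbytes + a.indices.nbytes + a.indptr.nbytes   # values + column indices + row pointers
print(nbytes)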
If you are inside IPython, you can also use its %whos magic function, which gives you information about the session's variables, including how much RAM each takes.
For example, how much memory is required to store a list of one million (32-bit) integers?
alist = range(1000000) # or list(range(1000000)) in Python 3.0
"It depends." Python allocates space for lists in such a way as to achieve amortized constant time for appending elements to the list.
In practice, what this means with the current implementation is that the list always has space allocated for a power-of-two number of elements. So range(1000000) will actually allocate a list big enough to hold 2^20 elements (~1.05 million).
This is only the space required to store the list structure itself (which is an array of pointers to the Python objects for each element). A 32-bit system will require 4 bytes per element, a 64-bit system will use 8 bytes per element.
Furthermore, you need space to store the actual elements. This varies widely. For small integers (-5 to 256 currently), no additional space is needed, but for larger numbers Python allocates a new object for each integer, which takes 10-100 bytes and tends to fragment memory.
Bottom line: it's complicated and Python lists are not a good way to store large homogeneous data structures. For that, use the array module or, if you need to do vectorized math, use NumPy.
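A quick sketch comparing the two on the same million integers (sizes vary by Python version and platform, and getsizeof of the list excludes the int objects themselves):

import array
import sys

n = 1000000
lst = list(range(n))
arr = array.array('i', range(n))

print(sys.getsizeof(lst))   # the list structure: header + n pointers (plus over-allocation)
print(sys.getsizeof(arr))   # header + n * 4 raw bytes; no per-element objects at all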
PS- Tuples, unlike lists, are not designed to have elements progressively appended to them. I don't know how the allocator works, but don't even think about using it for large data structures :-)
Useful links:
How to get memory size/usage of python object
Memory sizes of python objects?
if you put data into dictionary, how do we calculate the data size?
However they don't give a definitive answer. The way to go:
Measure memory consumed by Python interpreter with/without the list (use OS tools).
Use a third-party extension module which defines some sort of sizeof(PyObject).
Update:
Recipe 546530: Size of Python objects (revised)
import asizeof
N = 1000000
print asizeof.asizeof(range(N)) / N
# -> 20 (python 2.5, WinXP, 32-bit Linux)
# -> 33 (64-bit Linux)
Addressing "tuple" part of the question
Declaration of CPython's PyTuple in a typical build configuration boils down to this:
struct PyTuple {
    size_t refcount;      // tuple's reference count
    typeobject *type;     // tuple type object
    size_t n_items;       // number of items in tuple
    PyObject *items[1];   // contains space for n_items elements
};
The size of a PyTuple instance is fixed during its construction and cannot be changed afterwards. The number of bytes occupied by a PyTuple can be calculated as
sizeof(size_t) x 2 + sizeof(void*) x (n_items + 1).
This gives shallow size of tuple. To get full size you also need to add total number of bytes consumed by object graph rooted in PyTuple::items[] array.
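You can roughly check the shallow-size formula with sys.getsizeof on a 64-bit CPython (the exact constants differ between versions and builds):

import sys

for n in range(4):
    print(n, sys.getsizeof(tuple(range(n))))
# each additional item should add sizeof(void*) == 8 bytes on a 64-bit build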
It's worth noting that the tuple construction routines make sure that only a single instance of the empty tuple is ever created (a singleton).
References:
Python.h,
object.h,
tupleobject.h,
tupleobject.c
A new function, getsizeof(), takes a Python object and returns the amount of memory used by the object, measured in bytes. Built-in objects return correct results; third-party extensions may not, but can define a __sizeof__() method to return the object's size.
kveretennicov@nosignal:~/py/r26rc2$ ./python
Python 2.6rc2 (r26rc2:66712, Sep 2 2008, 13:11:55)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
>>> import sys
>>> sys.getsizeof(range(1000000))
4000032
>>> sys.getsizeof(tuple(range(1000000)))
4000024
Obviously, the returned numbers don't include the memory consumed by the contained objects (sys.getsizeof(1) == 12).
This is implementation specific, I'm pretty sure. Certainly it depends on the internal representation of integers: you can't assume they'll be stored as 32-bit, since Python gives you arbitrarily large integers, so perhaps small ints are stored more compactly.
On my Python (2.5.1 on Fedora 9 on core 2 duo) the VmSize before allocation is 6896kB, after is 22684kB. After one more million element assignment, VmSize goes to 38340kB. This very grossly indicates around 16000kB for 1000000 integers, which is around 16 bytes per integer. That suggests a lot of overhead for the list. I'd take these numbers with a large pinch of salt.