Normally, I'm happy with the way numpy determines the minimum type required to hold the objects of the sequence in np.array:
>>> np.array([42, 4.2])
array([42. ,  4.2])
That is quite intuitive: I need to upcast an integer to a float in order to handle the data.
However, the following case seems to be less intuitive to me:
>>> np.array([42, 4.2, 'aa'])
array(['42', '4.2', 'aa'], dtype='<U32')
I would prefer the resulting array to have dtype object. I don't want to call
np.array(my_list, dtype=object)
because I would like to keep the old behaviour in the case of my_list = [42, 4.2] and also in the case of my_list = ['aa'] (which would result in the type being <U2).
Is it possible to tweak the default behavior in order to prevent the upcasting of numerical values to a string, or is there any workaround with the same effect?
It looks like you want to do a bit of pre-processing on your data before you let numpy determine the data type. From what I understood of your criteria, if all the objects in the list are numbers, or none of them are, you want to let numpy determine the type. If the categories are mixed, you want to use dtype object.
Fortunately, all numbers in Python have the abstract base class numbers.Number hooked in:
from numbers import Number
isnum = lambda x: isinstance(x, Number)
isntnum = lambda x: not isinstance(x, Number)
if all(map(isnum, my_list)) or all(map(isntnum, my_list)):
    dtype = None
else:
    dtype = object
my_arr = np.array(my_list, dtype=dtype)
This isn't the most elegant code, but it should work, and it gives you a starting point for something more elegant and efficient.
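Wrapped up as a function, the same idea looks like this (a sketch; smart_array is just a name I've picked):

```python
import numpy as np
from numbers import Number

def smart_array(seq):
    """Build an array, falling back to object dtype for mixed data."""
    is_num = [isinstance(x, Number) for x in seq]
    if all(is_num) or not any(is_num):
        return np.array(seq)            # all numbers or all non-numbers: let numpy decide
    return np.array(seq, dtype=object)  # mixed: keep original Python objects

print(smart_array([42, 4.2]).dtype)        # float64
print(smart_array(['aa']).dtype)           # <U2
print(smart_array([42, 4.2, 'aa']).dtype)  # object
```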
After looking through all of the C code that I could in ~30 minutes, I've concluded there is no great way of doing this.
My best bet would be the following:
a = np.array([4.2, 42, '42'])
if str(a.dtype)[:2] == '<U':
    a = np.array([4.2, 42, '42'], dtype=object)
I'll admit that this is really hacky, since it relies on the fact that np.array casts these string/float arrays to unicode data types, but it should work well, at least for small arrays.
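A slightly less fragile variant of the same trick checks a.dtype.kind, which is 'U' for any unicode string dtype (still a workaround, just a cleaner test):

```python
import numpy as np

my_list = [4.2, 42, '42']
a = np.array(my_list)
if a.dtype.kind == 'U':  # numpy fell back to a unicode string dtype
    a = np.array(my_list, dtype=object)

print(a.dtype)  # object
```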
I wrote a function that passes numpy arrays into C code using CFFI. It uses the buffer protocol and memoryview to pass the data efficiently without copying it. However, this means that you need to pass C-contiguous arrays and ensure that you are using the right types. Numpy provides a function, numpy.ascontiguousarray, which does this, so I iterate over the arguments and apply it. The implementation below works and may be of general interest, although it is slow given the number of times it is called. (Any general comments on how to speed it up would be helpful.)
However, the actual question is: when you replace the first list comprehension with a generator expression, or if you refactor the code so that np.ascontiguousarray is called in the second one, the pointers passed into the C code no longer point to the start of the numpy arrays. I think it is not getting called. I'm iterating over the comprehension and only using the return values; why would using a list comprehension or a generator expression change anything?
def cffi_wrap(cffi_func, ndarray_params, pod_params, return_shapes=None):
    """
    Wraps a cffi function to allow it to be called on numpy arrays.

    It uses the numpy buffer protocol and the cffi buffer protocol to pass the
    numpy arrays into the C function without copying any of the parameters.
    You will need to pass dimensions into the C function, which you can do
    using the pod_params.

    Parameters
    ----------
    cffi_func : C function
        A C function declared using cffi. It must take double pointers and
        plain-old-data types. The arguments must be ordered as: input numpy
        arrays, plain-old-data values, and then the returned numpy arrays.
    ndarray_params : iterable of ndarrays
        The numpy arrays to pass into the function.
    pod_params : tuple of plain old data
        The plain-old-data objects to pass in. This may include, for example,
        dimensions.
    return_shapes : iterable of tuples of positive ints
        The shapes of the returned objects.

    Returns
    -------
    return_vals : ndarrays of doubles
        The objects calculated by cffi_func.
    """
    arr_param_buffers = [np.ascontiguousarray(param, np.float64)
                         if np.issubdtype(param.dtype, np.floating)
                         else np.ascontiguousarray(param, np.intc)
                         for param in ndarray_params]
    arr_param_ptrs = [ffi.cast("double *", ffi.from_buffer(memoryview(param)))
                      if np.issubdtype(param.dtype, np.floating)
                      else ffi.cast("int *", ffi.from_buffer(memoryview(param)))
                      for param in arr_param_buffers]
    if return_shapes is not None:
        return_vals_ptrs = tuple(ffi.new("double[" + str(np.prod(shape)) + "]")
                                 for shape in return_shapes)
        returned_val = cffi_func(*arr_param_ptrs, *pod_params, *return_vals_ptrs)
        return_vals = tuple(np.frombuffer(ffi.buffer(
            return_val))[:np.prod(shape)].reshape(shape)
            for shape, return_val in zip(return_shapes, return_vals_ptrs))
    else:
        returned_val = cffi_func(*arr_param_ptrs, *pod_params)
        return_vals = None
    if returned_val is not None and return_vals is not None:
        return_vals = return_vals + (returned_val,)
    elif return_vals is None:
        return_vals = (returned_val,)
    if len(return_vals) == 1:
        return return_vals[0]
    return return_vals
I'm just guessing, but the error could come from keepalives: with arr_param_buffers a list comprehension, as in your posted code, then as long as this local variable exists (i.e. for the whole duration of cffi_wrap()), all the created numpy arrays are alive. This allows you to do ffi.from_buffer(memoryview(...)) on the next line and be sure that they are all pointers to valid data.
If you replace arr_param_buffers with a generator expression, it will generate the new numpy arrays one by one, call ffi.from_buffer(memoryview(param)) on them, and then throw them away. ffi.from_buffer(x) returns an object that should keep x alive, but maybe x = memoryview(nd) does not itself keep the numpy array nd alive, for all I know.
I created an ndarray in Python:
temp = np.array([1, 2, 3, 4])
To measure the length of this array, I can use
temp.size
or
np.size(temp)
Both return 4, but I'm wondering: what's the difference between the two expressions? Also, to get the lena image, I need to write
>>> import scipy.misc
>>> lena = scipy.misc.lena()
I'm wondering why there's a bracket pair after lena. Isn't lena a matrix? Anything followed by () looks like a function call. I understand that lena() is a function that takes no inputs and returns an ndarray; it just feels tedious to write it this way.
In MATLAB, it's quite clear how to distinguish between a constant and a function: a function is defined and called with (), but a constant (or pre-stored value) can be referred to directly, e.g., "blobs.png"
np.size(temp) is a little more general than temp.size. At first glance, they appear to do the same thing:
>>> x = np.array([[1,2,3],[4,5,6]])
>>> x.size
6
>>> np.size(x)
6
This is true when you don't supply any additional arguments to np.size. But if you look at the documentation for np.size, you'll see that it accepts an additional axis parameter, which gives the size along the corresponding axis:
>>> np.size(x, 0)
2
>>> np.size(x, 1)
3
As far as your second question, scipy.misc.lena is a function, as you point out. It is not a matrix; it is a function returning a matrix. The function (presumably) loads the data on the fly so that it isn't placed in memory whenever you import the scipy.misc module. This is a good thing, and actually not all that different from MATLAB.
temp.size is the property numpy.ndarray.size of ndarray, whereas numpy.size is a free function which calls the size attribute of an ndarray or of any other similar object that has one.
numpy.size is more flexible because it can act on ndarray-like objects or on objects that can be converted to an ndarray.
numpy.size also accepts an optional axis, along which it calculates the size.
Here is the implementation of numpy.size:
def size(a, axis=None):
    if axis is None:
        try:
            return a.size
        except AttributeError:
            return asarray(a).size
    else:
        try:
            return a.shape[axis]
        except AttributeError:
            return asarray(a).shape[axis]
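Because of that asarray fallback, np.size also works on plain Python sequences, which have no .size attribute of their own:

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]
print(np.size(nested))      # 6 -- total number of elements
print(np.size(nested, 0))   # 2 -- length along axis 0 (rows)
print(np.size(nested, 1))   # 3 -- length along axis 1 (columns)
# nested.size would raise AttributeError: lists have no .size attribute
```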
I just came across this strange behaviour of numpy.sum:
>>> import numpy
>>> ar = numpy.array([1,2,3], dtype=numpy.uint64)
>>> gen = (el for el in ar)
>>> lst = [el for el in ar]
>>> numpy.sum(gen)
6.0
>>> numpy.sum(lst)
6
>>> numpy.sum(iter(lst))
<listiterator object at 0x87d02cc>
According to the documentation the result should have the same dtype as the input, but then why in the first case is a numpy.float64 returned instead of a numpy.uint64?
And how come the last example does not return any kind of sum and does not raise any error either?
In general, numpy functions don't always do what you might expect when working with generators. To create a numpy array, you need to know its size and type before creating it, and this isn't possible for generators. So many numpy functions either don't work with generators, or do this sort of thing where they fall back on Python builtins.
However, for the same reason, using generators often isn't that useful in Numpy contexts. There's no real advantage to making a generator from a Numpy object, because you already have to have the entire Numpy object in memory anyway. If you need all the types to stay as you specify, you should just not wrap your Numpy objects in generators.
Some more info: Technically, the argument to np.sum is supposed to be an "array-like" object, not an iterable. Array-like is defined in the documentation as:
An array, any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence.
The array interface is documented here. Basically, arrays have to have a fixed shape and a uniform type.
Generators don't fit this protocol and so aren't really supported. Many numpy functions are nice and will accept other sorts of objects that don't technically qualify as array-like, but a strict reading of the docs implies you can't rely on this behavior. The operations may work, but you can't expect all the types to be preserved perfectly.
If the argument is a generator, Python's built-in sum gets used.
You can see this in the source code of numpy.sum (numpy/core/fromnumeric.py):
if isinstance(a, _gentype):
    res = _sum_(a)
    if out is not None:
        out[...] = res
        return out
    return res
_gentype is just an alias of types.GeneratorType, and _sum_ is an alias of the built-in sum.
If you try applying sum to gen and lst, you will see that the results are the same: 6.0.
The second parameter of sum is start, which defaults to 0; this is part of what makes your result a float64:
In [1]: import numpy as np
In [2]: type(np.uint64(1) + np.uint64(2))
Out[2]: numpy.uint64
In [3]: type(np.uint64(1) + 0)
Out[3]: numpy.float64
EDIT:
BTW, I found a ticket on this issue, which is marked as wontfix: http://projects.scipy.org/numpy/ticket/669
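If you do need to consume a generator while keeping control of the dtype, np.fromiter lets you state the dtype up front (a workaround sketch, not something from the ticket):

```python
import numpy as np

ar = np.array([1, 2, 3], dtype=np.uint64)
gen = (el for el in ar)

# fromiter materialises the generator directly into a typed array,
# so the subsequent sum stays uint64 instead of falling back to float64
total = np.fromiter(gen, dtype=np.uint64).sum()
print(total, total.dtype)   # 6 uint64
```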
How do I declare an array in Python?
variable = []
Now variable refers to an empty list*.
Of course this is an assignment, not a declaration. There's no way to say in Python "this variable should never refer to anything other than a list", since Python is dynamically typed.
*The default built-in Python type is called a list, not an array. It is an ordered container of arbitrary length that can hold a heterogeneous collection of objects (their types do not matter and can be freely mixed). This should not be confused with the array module, which offers a type closer to the C array type; the contents must be homogeneous (all of the same type), but the length is still dynamic.
This is a surprisingly complex topic in Python.
Practical answer
Arrays are represented by the class list (see the reference, and do not mix them up with generators).
Check out usage examples:
# empty array
arr = []
# init with values (can contain mixed types)
arr = [1, "eels"]
# get item by index (can be negative to access end of array)
arr = [1, 2, 3, 4, 5, 6]
arr[0] # 1
arr[-1] # 6
# get length
length = len(arr)
# supports append and insert
arr.append(8)
arr.insert(6, 7)
Theoretical answer
Under the hood, Python's list is a wrapper around a real array which contains references to the items. Also, the underlying array is created with some extra space.
The consequences of this are:
random access is really cheap (arr[6653] costs the same as arr[0])
the append operation is 'for free' while some extra space remains
the insert operation is expensive
Check this awesome table of operation complexities.
Also, please see this picture, where I've tried to show the most important differences between an array, an array of references, and a linked list:
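You can observe the over-allocation indirectly in CPython via sys.getsizeof: the reported size stays flat across several appends and then jumps when the underlying array is reallocated. (This is CPython-specific behaviour, shown only as an illustration; the exact numbers will vary.)

```python
import sys

arr = []
sizes = []
for i in range(20):
    arr.append(i)
    sizes.append(sys.getsizeof(arr))  # bytes reported for the list object

print(sizes)
# flat stretches where capacity is already available, then a jump on reallocation
```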
You don't actually declare things, but this is how you create an array in Python:
from array import array
intarray = array('i')
For more info see the array module: http://docs.python.org/library/array.html
Now it may be that you don't want an array but a list, but others have answered that already. :)
I think you want a list with the first 30 cells already filled. So:
f = []
for i in range(30):
    f.append(0)
An example of where this could be used is the Fibonacci sequence. See problem 2 in Project Euler.
This is how:
my_array = [1, 'rebecca', 'allard', 15]
For calculations, use numpy arrays like this:
import numpy as np
a = np.ones((3,2)) # a 2D array with 3 rows, 2 columns, filled with ones
b = np.array([1,2,3]) # a 1D array initialised using a list [1,2,3]
c = np.linspace(2,3,100) # an array with 100 points between (and including) 2 and 3
print(a*1.5) # all elements of a times 1.5
print(a.T+b) # b added to the transpose of a
These numpy arrays can be saved to and loaded from disk (even compressed), and complex calculations with large numbers of elements run at C-like speed.
They are much used in scientific environments. See here for more.
JohnMachin's comment should be the real answer.
All the other answers are just workarounds in my opinion!
So:
array = [0] * element_count
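One caveat with the * idiom: it copies references, not objects, so using it with a mutable element gives you the same inner object repeated. A classic pitfall worth knowing:

```python
element_count = 3

flat = [0] * element_count       # fine: ints are immutable
grid = [[0]] * element_count     # trap: three references to ONE inner list
grid[0].append(1)
print(grid)                      # [[0, 1], [0, 1], [0, 1]]

safe = [[0] for _ in range(element_count)]  # independent inner lists
safe[0].append(1)
print(safe)                      # [[0, 1], [0], [0]]
```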
A couple of contributions have suggested that arrays in Python are represented by lists. This is incorrect. Python has an independent implementation of arrays in the standard library module array ("array.array"), so it is incorrect to conflate the two. Lists are lists in Python, so be careful with the nomenclature used.
list_01 = [4, 6.2, 7-2j, 'flo', 'cro']
list_01
Out[85]: [4, 6.2, (7-2j), 'flo', 'cro']
There is one very important difference between list and array.array(). While both of these objects are ordered sequences, array.array() is an ordered homogeneous sequence, whereas a list is a non-homogeneous sequence.
You don't declare anything in Python. You just use it. I recommend you start out with something like http://diveintopython.net.
I would normally just do a = [1,2,3], which is actually a list, but for arrays look at this formal definition.
To add to Lennart's answer, an array may be created like this:
from array import array
float_array = array("f",values)
where values can take the form of a tuple, list, or np.array, but not array:
values = [1,2,3]
values = (1,2,3)
values = np.array([1,2,3],'f')
# 'i' will work here too, but if array is 'i' then values have to be int
wrong_values = array('f',[1,2,3])
# TypeError: 'array.array' object is not callable
and the output will still be the same:
print(float_array)
print(float_array[1])
print(isinstance(float_array[1],float))
# array('f', [1.0, 2.0, 3.0])
# 2.0
# True
Most methods for list work with array as well, common
ones being pop(), extend(), and append().
Judging from the answers and comments, it appears that the array
data structure isn't that popular. I like it though, the same
way as one might prefer a tuple over a list.
The array structure has stricter rules than a list or np.array, and this can
reduce errors and make debugging easier, especially when working with numerical
data.
Attempts to insert/append a float to an int array will throw a TypeError:
values = [1,2,3]
int_array = array("i",values)
int_array.append(float(1))
# or int_array.extend([float(1)])
# TypeError: integer argument expected, got float
Keeping values which are meant to be integers (e.g. list of indices) in the array
form may therefore prevent a "TypeError: list indices must be integers, not float", since arrays can be iterated over, similar to np.array and lists:
int_array = array('i',[1,2,3])
data = [11,22,33,44,55]
sample = []
for i in int_array:
    sample.append(data[i])
Annoyingly, appending an int to a float array will cause the int to become a float, without throwing an exception.
np.array retains the same data type for its entries too, but instead of raising an error it will change its data type to fit new entries (usually to double or str):
import numpy as np
numpy_int_array = np.array([1,2,3],'i')
for i in numpy_int_array:
    print(type(i))
# <class 'numpy.int32'>
numpy_int_array_2 = np.append(numpy_int_array,int(1))
# still <class 'numpy.int32'>
numpy_float_array = np.append(numpy_int_array,float(1))
# <class 'numpy.float64'> for all values
numpy_str_array = np.append(numpy_int_array,"1")
# <class 'numpy.str_'> for all values
data = [11,22,33,44,55]
sample = []
for i in numpy_int_array_2:
    sample.append(data[i])
# no problem here, but TypeError for the other two
This is true during assignment as well. If the data type is specified, np.array will, wherever possible, transform the entries to that data type:
int_numpy_array = np.array([1,2,float(3)],'i')
# 3 becomes an int
int_numpy_array_2 = np.array([1,2,3.9],'i')
# 3.9 gets truncated to 3 (same as int(3.9))
invalid_array = np.array([1,2,"string"],'i')
# ValueError: invalid literal for int() with base 10: 'string'
# Same error as int('string')
str_numpy_array = np.array([1,2,3],'str')
print(str_numpy_array)
print([type(i) for i in str_numpy_array])
# ['1' '2' '3']
# <class 'numpy.str_'>
or, in essence:
data = [1.2,3.4,5.6]
list_1 = np.array(data,'i').tolist()
list_2 = [int(i) for i in data]
print(list_1 == list_2)
# True
while array will simply give:
invalid_array = array([1,2,3.9],'i')
# TypeError: integer argument expected, got float
Because of this, it is not a good idea to use np.array for type-specific commands. The array structure is useful here. list preserves the data type of the values.
And for something I find rather pesky: the data type is specified as the first argument in array(), but (usually) the second in np.array(). :|
The relation to C is referred to here:
Python List vs. Array - when to use?
Have fun exploring!
Note: The typed and rather strict nature of array leans more towards C rather than Python, and by design Python does not have many type-specific constraints in its functions. Its unpopularity also creates a positive feedback in collaborative work, and replacing it mostly involves an additional [int(x) for x in file]. It is therefore entirely viable and reasonable to ignore the existence of array. It shouldn't hinder most of us in any way. :D
How about this...
>>> a = range(12)
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> a[7]
7
Following on from Lennart, there's also numpy which implements homogeneous multi-dimensional arrays.
Python calls them lists. You can write a list literal with square brackets and commas:
>>> [6,28,496,8128]
[6, 28, 496, 8128]
I had an array of strings and needed an array of the same length of booleans initialized to True. This is what I did:
strs = ["Hi","Bye"]
bools = [ True for s in strs ]
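Since the value doesn't depend on each string, the same result can also be written without the comprehension:

```python
strs = ["Hi", "Bye"]
bools = [True] * len(strs)  # one True per element of strs
print(bools)  # [True, True]
```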
You can create lists and convert them into arrays, or you can create an array using the numpy module. Below are a few examples to illustrate this. Numpy also makes it easier to work with multi-dimensional arrays.
import numpy as np
a = np.array([1, 2, 3, 4])
#For custom inputs
a = np.array([int(x) for x in input().split()])
You can also reshape this array into a 2x2 matrix using the reshape function, which takes the dimensions of the matrix as input.
mat = a.reshape(2, 2)
# This creates a list of 5000 zeros
a = [0] * 5000
You can read and write to any element in this list with the a[n] notation, in the same way as you would with an array.
It seems to have the same random-access performance as an array. I cannot say how it allocates memory, because it also supports a mix of different types, including strings and objects, if you need it to.