Use size in Python

I created an ndarray in Python:
temp = np.array([1, 2, 3, 4])
To measure the length of this array, I can use
temp.size
or
np.size(temp)
Both return 4, but I'm wondering: what's the difference between the two expressions? Also, to get the lena image, I need to write
>>> import scipy.misc
>>> lena = scipy.misc.lena()
I'm wondering why there's a pair of parentheses after lena. Isn't lena a matrix? Something followed by () looks like a function call. I understand that lena() is a function that takes no inputs and returns an ndarray; it just feels tedious to write it this way.
In MATLAB, it's easy to distinguish between a constant and a function: a function is defined and called with (), but a constant (or pre-stored value) can be referenced directly, e.g., "blobs.png".

np.size(temp) is a little more general than temp.size. At first glance, they appear to do the same thing:
>>> x = np.array([[1,2,3],[4,5,6]])
>>> x.size
6
>>> np.size(x)
6
This is true when you don't supply any additional arguments to np.size. But if you look at the documentation for np.size, you'll see that it accepts an additional axis parameter, which gives the size along the corresponding axis:
>>> np.size(x, 0)
2
>>> np.size(x, 1)
3
As for your second question: scipy.misc.lena is, as you point out, a function, not a matrix. It is a function that returns a matrix. The function (presumably) loads the data on the fly so that it isn't placed in memory every time you import the scipy.misc module. This is a good thing, and actually not all that different from MATLAB.
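The lazy-loading idea can be sketched as follows. This is a hypothetical stand-in, not the actual scipy.misc implementation (scipy.misc.lena was in fact removed in later SciPy releases), but it shows why exposing data as a function defers the work until you actually ask for it:

```python
import numpy as np

def lena():
    # Hypothetical stand-in for the real loader: the actual module would
    # read the image file from disk here, so nothing is loaded at import
    # time and no memory is used until the function is called.
    return np.zeros((512, 512), dtype=np.uint8)

img = lena()  # the data only materializes at the moment of the call
print(img.shape)
```

Importing a module that defines such a function costs nothing; the price is only the extra () at the call site.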

temp.size is a property (numpy.ndarray.size) of ndarray, whereas numpy.size is a free function that reads the size attribute of an ndarray, or of any similar object that has one.
numpy.size is more flexible because it can act on ndarray-like objects, or on objects that can be converted to an ndarray.
numpy.size also accepts an optional axis argument, along which it calculates the size.
Here is the implementation of numpy.size:
def size(a, axis=None):
    if axis is None:
        try:
            return a.size
        except AttributeError:
            return asarray(a).size
    else:
        try:
            return a.shape[axis]
        except AttributeError:
            return asarray(a).shape[axis]
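That fallback through asarray is what makes np.size work on plain Python sequences, which have no .size attribute of their own:

```python
import numpy as np

lst = [[1, 2, 3], [4, 5, 6]]
print(np.size(lst))     # works: the list is converted via asarray internally
print(np.size(lst, 0))  # the axis argument works on the converted array too
# lst.size would raise AttributeError, since lists have no size attribute
```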

Related

Defining a matrix with unknown size in python

I want to use a matrix in my Python code but I don't know the exact size of my matrix to define it.
For other matrices, I have used np.zeros(a), where a is known.
What should I do to define a matrix with unknown size?
In this case, one approach is to use a Python list and append to it until it has the desired size, then convert it to a NumPy array.
pseudocode:
matrix = []
while matrix not full:
    matrix.append(elt)
matrix = np.array(matrix)
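A concrete, runnable version of that sketch, building a 2-D array row by row (the loop bound and row contents here are just an example):

```python
import numpy as np

rows = []
for i in range(4):            # loop until you have collected all your data
    rows.append([i, i ** 2])  # every appended row must have the same length
matrix = np.array(rows)       # one conversion at the end, once the size is known
print(matrix.shape)
```

Appending to a list is amortized O(1), so the total cost is one pass over the data plus a single copy into the final array.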
You could write a function that tries to modify the np.array, and expand if it encounters an IndexError:
import numpy as np

x = np.random.normal(size=(2, 2))
val = 1.0  # the value you want to assign (placeholder)
r, c = (5, 10)
try:
    x[r, c] = val
except IndexError:
    r0, c0 = x.shape
    r_ = r + 1 - r0  # rows missing to reach index r
    c_ = c + 1 - c0  # columns missing to reach index c
    if r_ > 0:
        x = np.concatenate([x, np.zeros((r_, x.shape[1]))], axis=0)
    if c_ > 0:
        x = np.concatenate([x, np.zeros((x.shape[0], c_))], axis=1)
    x[r, c] = val  # retry the assignment now that the array is big enough
There are problems with this implementation, though. First, it makes a copy of the array and returns a concatenation of it, which can become a bottleneck if you use it many times. Second, the code above only works if you're modifying a single element. You could make it handle slices, which would take more effort; or you could go the whole nine yards and create a new class inheriting from np.ndarray that overrides the __getitem__ and __setitem__ methods.
Or you could just use a huge matrix, or better yet, see if you can avoid having to work with matrices of unknown size.
If you have a python generator you can use np.fromiter:
def gen():
    yield 1
    yield 2
    yield 3
In [11]: np.fromiter(gen(), dtype='int64')
Out[11]: array([1, 2, 3])
Beware: if you pass an infinite iterator, you will most likely crash Python, so it's often a good idea to cap the length (with the count argument):
In [21]: from itertools import count # an infinite iterator
In [22]: np.fromiter(count(), dtype='int64', count=3)
Out[22]: array([0, 1, 2])
Best practice is usually to either pre-allocate (if you know the size) or build the array as a list first (using list.append). But lists don't build in 2d very well, which I assume you want since you specified a "matrix."
In that case, I'd suggest pre-allocating an oversize scipy.sparse matrix. These can be defined to have a size much larger than your memory, and lil_matrix or dok_matrix can be built sequentially. Then you can pare it down once you enter all of your data.
import numpy as np
from scipy.sparse import dok_matrix

dummy = dok_matrix((1000000, 1000000))  # as big as you think you might need
for i, j, data in generator():  # generator() is assumed to yield (i, j, value) triples
    dummy[i, j] = data
s = np.array(list(dummy.keys())).max() + 1  # highest row/column index actually used
M = dummy.tocsr()[:s, :s]  # CSR supports slicing; convert further with .tocoo(), .tobsr(), .toarray(), ...
This way you build your array as a Dictionary of Keys (dictionaries support dynamic assignment much better than ndarray does), but you still get a matrix-like output that can be (somewhat) efficiently used for math, even in a partially built state.

Use of numpy fromfunction

I'm trying to use fromfunction to create a 5x5 matrix with Gaussian values of mu=3 and sig=2. This is my attempt:
from random import gauss
import numpy as np
np.fromfunction(lambda i,j: gauss(3,2), (5, 5))
This is the result: 5.365244570434782
As I understand from the docs, this should have worked, but I am getting a scalar instead of a 5x5 matrix. Why? And how do I fix this?
The numpy.fromfunction docs are extremely misleading. Instead of calling your function repeatedly and building an array from the results, fromfunction actually only makes one call to the function you pass it. In that one call, it passes a number of index arrays to your function, instead of individual indices.
Stripping out the docstring, the implementation is as follows:
def fromfunction(function, shape, **kwargs):
    dtype = kwargs.pop('dtype', float)
    args = indices(shape, dtype=dtype)
    return function(*args, **kwargs)
That means unless your function broadcasts, numpy.fromfunction doesn't do anything like what the docs say it does.
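For a function that does broadcast, fromfunction behaves the way the docs suggest, because the index arrays flow through the arithmetic elementwise in that single call:

```python
import numpy as np

# i and j arrive as full 3x3 index arrays, so i + j broadcasts
# into a 3x3 result in one call to the lambda.
a = np.fromfunction(lambda i, j: i + j, (3, 3))
print(a)
# [[0. 1. 2.]
#  [1. 2. 3.]
#  [2. 3. 4.]]
```

A call like gauss(3, 2) ignores its array arguments entirely, which is why it collapses to a single scalar.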
I know this is an old post, but for anyone stumbling upon this: the reason it didn't work is that the expression inside the lambda does not make use of the i, j variables, so gauss(3, 2) is evaluated only once and fromfunction simply returns that scalar.
Note that a workaround like
np.zeros((5, 5)) + gauss(3, 2)
has the same flaw: gauss is still called only once, so every cell of the 5x5 result holds the same value rather than 25 independent draws.
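To actually get 25 independent Gaussian samples in a 5x5 matrix, generate them directly with numpy's own random machinery rather than going through fromfunction:

```python
import numpy as np

rng = np.random.default_rng()
m = rng.normal(loc=3, scale=2, size=(5, 5))  # 25 independent draws, mu=3, sig=2
print(m.shape)
```

numpy's samplers take a size argument precisely so that you never need to call a scalar random function once per element.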

numpy.ndarray sent as argument doesn't need loop for iteration?

In this code, np.linspace() assigns to inputs 200 evenly spaced numbers from -20 to 20.
This code works, but I don't understand how it can work. How can inputs be passed as an argument to output_function() without a loop to iterate over the numpy.ndarray?
import numpy as np
import matplotlib.pyplot as plt

def output_function(x):
    return 100 - x ** 2

inputs = np.linspace(-20, 20, 200)
plt.plot(inputs, output_function(inputs), 'b-')
plt.show()
numpy works by defining operations on vectors the way that you really want to work with them mathematically. So, I can do something like:
a = np.arange(10)
b = np.arange(10)
c = a + b
And it works as you might hope -- each element of a is added to the corresponding element of b and the result is stored in a new array c. If you want to know how numpy accomplishes this, it's all done via the magic methods in the python data model. Specifically in my example case, the __add__ method of numpy's ndarray would be overridden to provide the desired behavior.
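A toy class makes the mechanism visible. This is only a sketch of the data-model hook numpy plugs into, not how ndarray itself is implemented:

```python
class Pair:
    """A 2-element container that supports elementwise addition."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # Python calls this for `self + other`; numpy's ndarray overrides
        # the same hook to perform its elementwise (vectorized) addition.
        return Pair(self.x + other.x, self.y + other.y)

p = Pair(1, 2) + Pair(10, 20)
print(p.x, p.y)
```

Because the `+` dispatches to the object's own method, numpy is free to implement it as a fast loop in C over all elements at once.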
What you want to use is numpy.vectorize which behaves similarly to the python builtin map.
Here is one way you can use numpy.vectorize:
outputs = (np.vectorize(output_function))(inputs)
You asked why it works: it works because numpy arrays can perform operations on all of their elements en masse, for example:
a = np.array([1, 2, 3, 4])  # a numpy array of 4 elements: [1, 2, 3, 4]
b = a - 1                   # subtracts 1 from every element, giving [0, 1, 2, 3]
Because of this property, you can perform certain operations on every element of a numpy array very quickly without writing a loop (as you would with a regular Python list).

How to return an array of at least 4D: efficient method to simulate numpy.atleast_4d

numpy provides three handy routines to turn an array into at least a 1D, 2D, or 3D array, e.g. through numpy.atleast_3d.
I need the equivalent for one more dimension: atleast_4d. I can think of various ways using nested if statements, but I was wondering whether there is a more efficient and faster method of returning the array in question. In your answer, I would be interested to see an estimate (in O(n) terms) of the speed of execution, if you can.
The np.array method has an optional ndmin keyword argument that:
Specifies the minimum number of dimensions that the resulting array
should have. Ones will be pre-pended to the shape as needed to meet
this requirement.
If you also set copy=False you should get close to what you are after.
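For example (note that the new axes lead, i.e. ones are prepended to the shape):

```python
import numpy as np

x = np.ones((2, 3))
y = np.array(x, ndmin=4)  # prepend axes until the array has at least 4 dims
print(y.shape)            # (1, 1, 2, 3)
```

Since only the shape metadata changes, this is O(1) in the number of elements when no copy is made.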
As a do-it-yourself alternative, if you want extra dimensions trailing rather than leading:
arr.shape += (1,) * (4 - arr.ndim)
Why couldn't it just be something as simple as this:
import numpy as np

def atleast_4d(x):
    if x.ndim < 4:
        y = np.expand_dims(np.atleast_3d(x), axis=3)
    else:
        y = x
    return y
I.e., if the number of dimensions is less than four, call atleast_3d and append an extra dimension at the end; otherwise just return the array unchanged.

Why numpy.sum returns a float64 instead of an uint64 when adding elements of a generator?

I just came across this strange behaviour of numpy.sum:
>>> import numpy
>>> ar = numpy.array([1,2,3], dtype=numpy.uint64)
>>> gen = (el for el in ar)
>>> lst = [el for el in ar]
>>> numpy.sum(gen)
6.0
>>> numpy.sum(lst)
6
>>> numpy.sum(iter(lst))
<listiterator object at 0x87d02cc>
According to the documentation, the result should have the same dtype as the input, so why is a numpy.float64 returned in the first case instead of a numpy.uint64?
And how come the last example does not return any kind of sum, and does not raise an error either?
In general, numpy functions don't always do what you might expect when working with generators. To create a numpy array, you need to know its size and type before creating it, and this isn't possible for generators. So many numpy functions either don't work with generators, or do this sort of thing where they fall back on Python builtins.
However, for the same reason, using generators often isn't that useful in Numpy contexts. There's no real advantage to making a generator from a Numpy object, because you already have to have the entire Numpy object in memory anyway. If you need all the types to stay as you specify, you should just not wrap your Numpy objects in generators.
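If you do have a generator and want the dtype preserved, convert it to an array explicitly first, e.g. with np.fromiter:

```python
import numpy as np

ar = np.array([1, 2, 3], dtype=np.uint64)
gen = (el for el in ar)

# Materialize the generator with an explicit dtype, then sum as an array:
total = np.fromiter(gen, dtype=np.uint64).sum()
print(total, total.dtype)  # 6 uint64
```

Summing the real array keeps everything inside numpy's type system, so no Python int 0 ever gets mixed into the accumulation.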
Some more info: Technically, the argument to np.sum is supposed to be an "array-like" object, not an iterable. Array-like is defined in the documentation as:
An array, any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence.
The array interface is documented here. Basically, arrays have to have a fixed shape and a uniform type.
Generators don't fit this protocol and so aren't really supported. Many numpy functions are nice and will accept other sorts of objects that don't technically qualify as array-like, but a strict reading of the docs implies you can't rely on this behavior. The operations may work, but you can't expect all the types to be preserved perfectly.
If the argument is a generator, Python's builtin sum gets used.
You can see this in the source code of numpy.sum (numpy/core/fromnumeric.py):
if isinstance(a, _gentype):
    res = _sum_(a)
    if out is not None:
        out[...] = res
        return out
    return res
_gentype is just an alias of types.GeneratorType, and _sum_ is an alias of the built-in sum.
If you apply the builtin sum to both gen and lst yourself, you will see that the results are the same: 6.0.
The second parameter of sum is start, which defaults to 0; adding that Python int to a numpy.uint64 is part of what makes your result a float64:
In [1]: import numpy as np
In [2]: type(np.uint64(1) + np.uint64(2))
Out[2]: numpy.uint64
In [3]: type(np.uint64(1) + 0)
Out[3]: numpy.float64
EDIT:
BTW, I found a ticket for this issue, which is marked as wontfix: http://projects.scipy.org/numpy/ticket/669
