ndarray .floor() and .ceil() methods missing? - python

I'm processing a numpy.matrix and I'm missing round-up and round-down methods.
For example, I can do:
data = [[1, -20],[-30, 2]]
np.matrix(data).mean(0).round().astype(int).tolist()[0]
Out[58]: [-14, -9]
So I can use .round(), but I cannot use .floor() or .ceil().
They are also not listed in the NumPy 1.14 reference (on docs.scipy.org).
Why are these (quite essential) functions missing?
edit:
I've found that you can do np.floor(np.matrix(data).mean(0)).astype(int).tolist()[0]. But why the difference? Why is .round() a method while .floor() is not?

As with most of these "why" questions, we can only deduce likely reasons from patterns and some knowledge of the history.
https://docs.scipy.org/doc/numpy/reference/ufuncs.html#floating-functions
floor and ceil are classed as floating ufuncs; rint is also a ufunc that behaves like round. ufuncs have a standardized interface, including parameters like out and where.
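For example, being ufuncs, floor and ceil accept the standard ufunc keywords (a quick sketch):
>>> import numpy as np
>>> a = np.array([1.7, -1.7, 2.5])
>>> np.floor(a)
array([ 1., -2.,  2.])
>>> out = np.zeros_like(a)
>>> np.ceil(a, out=out, where=a > 0)   # entries where the condition is False keep the out values
array([ 2.,  0.,  3.])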
np.round is defined in /usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py. numeric is one of the original packages that were merged to form the current numpy. np.round is an alias for np.round_, which ends up calling np.around, also in fromnumeric. Note that the available parameters include out, but also decimals (which rint lacks). And it delegates to the .round method.
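A quick sketch of the decimals handling that round/around has but rint lacks:
>>> np.around([1.2345, -14.567], decimals=2)
array([  1.23, -14.57])
>>> np.rint([1.2345, -14.567])   # rint has no decimals parameter
array([  1., -15.])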
One advantage to having a function is that you don't have to first convert the list into an array:
In [115]: data = [[1, -20],[-30, 2]]
In [119]: np.mean(data,0)
Out[119]: array([-14.5, -9. ])
In [120]: np.mean(data,0).round()
Out[120]: array([-14., -9.])
In [121]: np.rint(np.mean(data,0))
Out[121]: array([-14., -9.])
Using other parameters:
In [138]: np.mean(data, axis=0, keepdims=True, dtype=int)
Out[138]: array([[-14, -9]])


Raise matrix to power N as in maths

In numpy if we want to raise a matrix A to power N (but raise it as defined in mathematics, in linear algebra in particular), then it seems we need to use this function
numpy.linalg.matrix_power
Isn't there a simpler way? Some Python symbol/operator?
E.g. I was expecting A**N to do this, but it doesn't.
It turns out A**N raises each element to the power N, not the whole matrix to the power N in the usual mathematical sense; it is element-wise exponentiation.
By matrix I mean of course a two-dimensional ndarray.
In [4]: x=np.arange(4).reshape(2,2)
For this square array:
In [6]: np.linalg.matrix_power(x,3)
Out[6]:
array([[ 6, 11],
       [22, 39]])
In [7]: x@x@x
Out[7]:
array([[ 6, 11],
       [22, 39]])
matrix_power is written in python so you can easily read it. It essentially does a sequence of dot products, with some refinements to reduce the steps.
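The main refinement is repeated squaring. A minimal sketch of the idea (an illustration, not the actual numpy source):

import numpy as np

def matpow(a, n):
    # Repeated squaring: about log2(n) matrix products instead of n - 1.
    result = np.eye(a.shape[0], dtype=a.dtype)
    base = a
    while n > 0:
        if n % 2:                  # this bit of n is set: fold base into the result
            result = result @ base
        base = base @ base         # square for the next bit
        n //= 2
    return result

matpow(x, 3) reproduces the matrix_power result above.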
For the np.matrix subclass, ** does the same thing:
In [8]: mx=np.matrix(x)
In [9]: mx**3
Out[9]:
matrix([[ 6, 11],
        [22, 39]])
** is translated by the interpreter to a __pow__ call. For this class that just amounts to a matrix_power call:
In [10]: mx.__pow__??
Signature: mx.__pow__(other)
Docstring: Return pow(self, value, mod).
Source:
def __pow__(self, other):
    return matrix_power(self, other)
File: c:\users\paul\anaconda3\lib\site-packages\numpy\matrixlib\defmatrix.py
Type: method
But for ndarray this method is a compiled one:
In [3]: x.__pow__??
Signature: x.__pow__(value, mod=None, /)
Call signature: x.__pow__(*args, **kwargs)
Type: method-wrapper
String form: <method-wrapper '__pow__' of numpy.ndarray object at 0x0000022A8B2B5ED0>
Docstring: Return pow(self, value, mod).
numpy does not alter Python syntax. It has not added any operators. The @ (matrix multiplication) operator was added to Python several years ago (PEP 465, Python 3.5), largely as a convenience for packages like numpy. But it had to be added to the interpreter's syntax first.
Note that matrix_power works for an input described in its docstring as:
a : (..., M, M) array_like
    Matrix to be "powered".
That means it has to have at least 2 dimensions, and the trailing two must be equal size. So even that extends the normal linear algebra definition (which is limited to 2d).
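For instance, assuming a numpy recent enough to support stacked input:
>>> stack = np.arange(8).reshape(2, 2, 2)   # a stack of two 2x2 matrices
>>> np.linalg.matrix_power(stack, 2)[0]     # same as stack[0] @ stack[0]
array([[ 2,  3],
       [ 6, 11]])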
numpy isn't just a linear algebra package. It's meant to be a general purpose array tool. Linear algebra is just a subset of math that can be performed with multidimensional collections of numbers (and other objects).
numpy.linalg.matrix_power is the best way as far as I know. You could use dot or * in a loop, but that would just be more code, and probably less efficient.

Why does Python behave differently for the same function such as 'sum' or 'and'?

**Scenario 1: sum**
I found that when working on 2d arrays in numpy, summing has different options: the Python built-in sum sums along the first axis only, whereas the numpy sum sums over the total 2d array (matrix) by default.
**Scenario 2: and versus &**
I noticed that the logical and (and) and the bitwise and (&) both work on the same data elements but produce different results. In fact, the logical and does not work on a dataframe Series, whereas the bitwise & works just fine.
Why does this happen? Can anybody provide insights based on the language's history, design, purpose, etc., so one can understand better?
numpy operates within Python, and gets all its special behavior from ndarray class methods and module functions. It does not alter Python syntax.
Python's sum treats its input as an iterable; with a 1d array that's just like operating on a list. But iterating over a 2d array yields its rows, so on a 2d array the result is harder to understand:
In [52]: x = np.arange(12).reshape(3,4)
In [53]: sum(x)
Out[53]: array([12, 15, 18, 21]) # what's this doing?
In [54]: x.sum() # or np.sum(x)
Out[54]: 66
In [55]: x.sum(axis=0)
Out[55]: array([12, 15, 18, 21]) # sum down rows, one per column
In [56]: x.sum(axis=1)
Out[56]: array([ 6, 22, 38]) # sum across columns, one per row
Python's and is a short-circuiting operator. Like an if statement, using it with numpy arrays is likely to produce an ambiguity error: comparisons of arrays produce boolean arrays, and a boolean array cannot be used in a Python context that requires a single scalar boolean.
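A quick illustration of the difference:
In [60]: a = np.array([1, 2, 3])
In [61]: (a > 1) & (a < 3)          # elementwise bitwise and: fine
Out[61]: array([False,  True, False])
In [62]: (a > 1) and (a < 3)        # Python `and` wants one scalar bool
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()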
Operators like +, *, and & have class-specific meanings/methods: [1,2,3]*3 is different from np.array([1,2,3])*3, and "a"+"string" is different from np.arange(3)+3.

Sympy-numpy integration exists - where is it documented?

I just accidentally discovered that I can mix sympy expressions up with numpy arrays:
>>> import numpy as np
>>> import sympy as sym
>>> x, y, z = sym.symbols('x y z')
>>> np.ones(5)*x
array([1.0*x, 1.0*x, 1.0*x, 1.0*x, 1.0*x], dtype=object)
# I was expecting this to throw an error!
# sum works and collects terms etc. as I would expect
>>> np.sum(np.array([x+0.1,y,z+y]))
x + 2*y + z + 0.1
# dot works too
>>> np.dot(np.array([x,y,z]),np.array([z,y,x]))
2*x*z + y**2
>>> np.dot(np.array([x,y,z]),np.array([1,2,3]))
x + 2*y + 3*z
This is quite useful for me, because I'm doing both numerical and symbolic calculations in the same program. However, I'm curious about the pitfalls and limitations of this approach; it seems, for example, that neither np.sin nor sym.sin is supported on Numpy arrays containing Sympy objects, since both give an error.
However, this numpy-sympy integration doesn't appear to be documented anywhere. Is it just an accident of how these libraries are implemented, or is it a deliberate feature? If the latter, when is it designed to be used, and when would it be better to use sympy.Matrix or other solutions? Can I expect to keep some of numpy's speed when working with arrays of this kind, or will it just drop back to Python loops as soon as a sympy symbol is involved?
In short I'm pleased to find this feature exists, but I would like to know more about it!
This is just NumPy's support for arrays of objects. It is not specific to SymPy. NumPy examines the operands and finds not all of them are scalars; there are some objects involved. So it calls that object's __mul__ or __rmul__, and puts the result into an array of objects. For example: mpmath objects,
>>> import mpmath as mp
>>> np.ones(5) * mp.mpf('1.23')
array([mpf('1.23'), mpf('1.23'), mpf('1.23'), mpf('1.23'), mpf('1.23')],
      dtype=object)
or lists:
>>> np.array([[2], 3])*5
array([list([2, 2, 2, 2, 2]), 15], dtype=object)
>>> np.array([2, 3])*[[1, 1], [2]]
array([list([1, 1, 1, 1]), list([2, 2, 2])], dtype=object)
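Any object that implements the arithmetic special methods participates in this dispatch. A toy sketch (the Tag class is made up for illustration):
>>> class Tag:
...     def __init__(self, name):
...         self.name = name
...     def __rmul__(self, other):   # called as other * self
...         return '{}*{}'.format(other, self.name)
...
>>> np.ones(2) * Tag('x')
array(['1.0*x', '1.0*x'], dtype=object)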
Can I expect to keep some of numpy's speed when working with arrays of this kind,
No. NumPy object arrays have no performance benefits over Python lists; there is probably more overhead in accessing elements than there would be in a list. See: Storing Python objects in a Python list vs. a fixed-length Numpy array.
There is no reason to use such arrays if a more specific data structure is available.
I just came across a relevant note in the latest numpy release notes (https://docs.scipy.org/doc/numpy-1.15.1/release.html)
Comparison ufuncs accept dtype=object, overriding the default bool
This allows object arrays of symbolic types, which override == and other operators to return expressions, to be compared elementwise with np.equal(a, b, dtype=object).
I think that means this works, but didn't before:
In [9]: np.array([x+.1, 2*y])==np.array([.1+x, y*2])
Out[9]: array([ True, True])

How can I multiply a column of an int numpy array by a float and keep it int?

I have a numpy array:
>>> b
array([[ 2,  2],
       [ 6,  4],
       [10,  6]])
I want to multiply the first column by a float and get an int result, because when I do:
>>> b[:,0] *= 2.1
It says:
TypeError: Cannot cast ufunc multiply output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
I need an array that looks like:
array([[ 4,  2],
       [12,  4],
       [21,  6]])
@Umang Gupta gave a solution to your problem. I was curious myself as to why this worked, so I'm posting what I found as additional context. FWIW this question has already been asked and answered here, but that answer also doesn't really walk through what's happening as much as I would have liked, so here's my attempt:
Using the *= operator calls the __imul__() special method for in-place multiplication of Numpy ndarrays, which in turn calls the universal function (ufunc) multiply().
There are two arguments in multiply() which are relevant here: out and casting.
The out argument specifies the output (along with its type). In the in-place multiplication operator, out is set to self, i.e. the ndarray object which called the multiplication operation. In particular, the exact call for *= looks like this:
ufunc(self, other, out=(self,))
^ where ufunc = multiply, self = b (ndarray, dtype int64), and other = 2.1 (scalar, type float)
The casting argument, however, determines the rules for what kind of data type casting is permitted as a result of an operation. As of Numpy 1.10, the default value for casting is same_kind, which means:
only safe casts or casts within a kind, like float64 to float32, are allowed
Since our ufunc call didn't specify a value for the casting argument, the default (same_kind) is used - but this causes problems because we have specified out as having an int64 dtype, which is not the same kind as the output of the int-by-float multiplication. With same_kind casting, the float result of the operation can't be converted to int. That's why we see this error.
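np.can_cast makes the rule concrete:
>>> np.can_cast(np.float64, np.int64, casting="same_kind")
False
>>> np.can_cast(np.float64, np.int64, casting="unsafe")
True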
We can replicate this error using multiply() explicitly:
np.multiply(b, 2.1, out=b)
TypeError: Cannot cast ufunc multiply output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
It is possible to relax the casting requirement of multiply(), by setting the argument value to "unsafe". Then, when out is also set, the output is coerced to the type of out, regardless of whether it's the same kind or not (if possible):
np.multiply(b, 2.1, out=b, casting="unsafe")
# specifying int output and allowing casting to be "unsafe" allows re-conversion to int
array([[ 4,  4],
       [12,  8],
       [21, 12]])
Using the normal assignment operator to update b[:,0], on the other hand, is OK. That's what @Umang Gupta's solution does.
With:
b[:,0] = b[:,0] * 2.1
* calls the multiply ufunc, just like *= does. But since it isn't calling the in-place version of the operation, there's no out argument specified, and so no fixed type for the output. Standard typecasting then lets the ints upcast to floats:
np.multiply(b, 2.1)
# float output
array([[ 4.2,  4.2],
       [12.6,  8.4],
       [21. , 12.6]])
Then the normal assignment operator = takes the output of the multiplication and stores it in b[:,0]. Per the Numpy docs on assigning values to indexed arrays:
Note that assignments may result in changes if assigning higher types to lower types (like floats to ints)
So the problem lies in the *= operator's automatic setting of the out argument without changing the casting argument from same_kind to unsafe. (Not that this is a bug; it's just why you are getting an error.) The accepted solution gets around that by leveraging the automatic "downcasting" behaviour of assignment in Numpy. Hope that helps! (Also, Numpy pros, please feel free to correct any misunderstandings on my part.)
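For completeness, the casting workaround can also be applied to just the first column, writing through the view (a sketch using the same b as above):
>>> np.multiply(b[:, 0], 2.1, out=b[:, 0], casting="unsafe")
array([ 4, 12, 21])
>>> b
array([[ 4,  2],
       [12,  4],
       [21,  6]])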
While b[:,0] *= 2.1 does not work, b[:,0] = b[:,0] * 2.1 works.

unexpected behaviour of numpy.median on masked arrays

I have a question about the behaviour of numpy.median() on masked arrays created with numpy.ma.masked_array().
As I've understood from debugging my own code, numpy.median() does not work as expected on masked arrays (see Using numpy.median on a masked array for a definition of the problem)
The answer provided was:
Explanation: If I remember correctly, the np.median does not support subclasses, so it fails to work correctly on np.ma.MaskedArray.
The conclusion therefore is that the way to calculate the median of the elements in a masked array is to use numpy.ma.median(), since this is a median function dedicated to masked arrays.
My problem is that I've just spent a considerable amount of time tracking this down, since there is no way of knowing about it.
There is no warning or exception raised when trying to calculate the median of a masked array via numpy.median().
The answer returned by this function is not what is expected, and causes serious problems when people are not aware of this.
Does anyone know if this might be considered a bug?
In my opinion, the expected behaviour should be that using numpy.median on a masked array raises an exception of some sort.
Any thoughts?
The test script below shows the unwanted and unexpected behaviour of using numpy.median on a masked array (note that the correct and expected median value of the valid elements is 2.5!):
In [1]: import numpy as np
In [2]: test = np.array([1, 2, 3, 4, 100, 100, 100, 100])
In [3]: valid_elements = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
In [4]: testm = np.ma.masked_array(test, ~valid_elements)
In [5]: testm
Out[5]:
masked_array(data = [1 2 3 4 -- -- -- --],
             mask = [False False False False  True  True  True  True],
       fill_value = 999999)
In [6]: np.median(test)
Out[6]: 52.0
In [7]: np.median(test[valid_elements])
Out[7]: 2.5
In [8]: np.median(testm)
Out[8]: 4.0
In [9]: np.ma.median(testm)
Out[9]: 2.5
Does anyone know if this might be considered a bug?
Well, it is a bug! I posted it a few months ago on their issue tracker (link to the bug report).
The reason for this behaviour is that np.median uses the partition method of the input array, but np.ma.MaskedArray doesn't override partition. So when arr.partition is called inside np.median, it simply falls back to the basic numpy.ndarray.partition method (which is bogus for a masked array!).
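If you need to guard against this in the meantime, a small wrapper is enough (a minimal sketch; safe_median is a hypothetical helper, not part of numpy):

import numpy as np

def safe_median(a, **kwargs):
    # Route masked arrays to np.ma.median, since np.median
    # silently ignores the mask and returns a wrong answer.
    if isinstance(a, np.ma.MaskedArray):
        return np.ma.median(a, **kwargs)
    return np.median(a, **kwargs)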
