I just accidentally discovered that I can mix SymPy expressions with NumPy arrays:
>>> import numpy as np
>>> import sympy as sym
>>> x, y, z = sym.symbols('x y z')
>>> np.ones(5)*x
array([1.0*x, 1.0*x, 1.0*x, 1.0*x, 1.0*x], dtype=object)
# I was expecting this to throw an error!
# sum works and collects terms etc. as I would expect
>>> np.sum(np.array([x+0.1,y,z+y]))
x + 2*y + z + 0.1
# dot works too
>>> np.dot(np.array([x,y,z]),np.array([z,y,x]))
2*x*z + y**2
>>> np.dot(np.array([x,y,z]),np.array([1,2,3]))
x + 2*y + 3*z
This is quite useful for me, because I'm doing both numerical and symbolic calculations in the same program. However, I'm curious about the pitfalls and limitations of this approach: for example, it seems that neither np.sin nor sym.sin works on NumPy arrays containing SymPy objects, since both raise an error.
However, this numpy-sympy integration doesn't appear to be documented anywhere. Is it just an accident of how these libraries are implemented, or is it a deliberate feature? If the latter, when is it designed to be used, and when would it be better to use sympy.Matrix or other solutions? Can I expect to keep some of numpy's speed when working with arrays of this kind, or will it just drop back to Python loops as soon as a sympy symbol is involved?
In short I'm pleased to find this feature exists, but I would like to know more about it!
This is just NumPy's support for arrays of objects; it is not specific to SymPy. NumPy examines the operands and finds that not all of them are scalars: some objects are involved. So it calls that object's __mul__ or __rmul__ and puts the result into an array of objects. For example, with mpmath objects:
>>> import mpmath as mp
>>> np.ones(5) * mp.mpf('1.23')
array([mpf('1.23'), mpf('1.23'), mpf('1.23'), mpf('1.23'), mpf('1.23')],
dtype=object)
or lists:
>>> np.array([[2], 3])*5
array([list([2, 2, 2, 2, 2]), 15], dtype=object)
>>> np.array([2, 3])*[[1, 1], [2]]
array([list([1, 1, 1, 1]), list([2, 2, 2])], dtype=object)
Can I expect to keep some of numpy's speed when working with arrays of this kind,
No. NumPy object arrays have no performance benefits over Python lists; there is probably more overhead in accessing elements than there would be in a list. See: Storing Python objects in a Python list vs. a fixed-length Numpy array
There is no reason to use such arrays if a more specific data structure is available.
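On the np.sin / sym.sin point from the question: a NumPy ufunc applied to an object array falls back to calling a method of the same name on each element (element.sin() here), which SymPy expressions don't define, hence the error. A minimal sketch of the usual workaround is to map the SymPy function over the elements yourself, at plain-Python-loop speed as noted above:

```python
import numpy as np
import sympy as sym

x, y = sym.symbols('x y')
arr = np.array([x, y, x + y], dtype=object)

# np.sin(arr) raises: on object arrays the ufunc tries element.sin(),
# which SymPy expressions don't define. Mapping sym.sin by hand works:
sym_sin = np.array([sym.sin(e) for e in arr], dtype=object)
```

With sympy.Matrix, the equivalent is M.applyfunc(sym.sin).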
I just came across a relevant note in the latest numpy release notes (https://docs.scipy.org/doc/numpy-1.15.1/release.html)
Comparison ufuncs accept dtype=object, overriding the default bool
This allows object arrays of symbolic types, which override == and other operators to return expressions, to be compared elementwise with np.equal(a, b, dtype=object).
I think that means this works, but didn't before:
In [9]: np.array([x+.1, 2*y])==np.array([.1+x, y*2])
Out[9]: array([ True, True])
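For completeness, a sketch of the np.equal form mentioned in that note; here the expressions are structurally equal, so SymPy's __eq__ returns plain booleans and the object array just holds True values:

```python
import numpy as np
import sympy as sym

x, y = sym.symbols('x y')
a = np.array([x + 0.1, 2*y], dtype=object)
b = np.array([0.1 + x, y*2], dtype=object)

# The comparison ufunc with dtype=object defers to each element's __eq__
eqs = np.equal(a, b, dtype=object)
```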
In numpy if we want to raise a matrix A to power N (but raise it as defined in mathematics, in linear algebra in particular), then it seems we need to use this function
numpy.linalg.matrix_power
Isn't there a simpler way? Some Python symbol/operator?
E.g. I was expecting A**N to do this, but it doesn't: A**N raises each element to the power N, not the whole matrix to the power N in the usual linear-algebra sense.
By matrix I mean of course a two-dimensional ndarray.
For this square array:
In [4]: x = np.arange(4).reshape(2, 2)
In [6]: np.linalg.matrix_power(x,3)
Out[6]:
array([[ 6, 11],
[22, 39]])
In [7]: x@x@x
Out[7]:
array([[ 6, 11],
[22, 39]])
matrix_power is written in Python, so you can easily read it. It essentially does a sequence of dot products, with a repeated-squaring refinement to reduce the number of steps.
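The refinement it uses is essentially exponentiation by repeated squaring. A rough sketch of the idea (not the actual numpy source):

```python
import numpy as np

def matpow(a, n):
    """Raise square matrix a to integer power n >= 1 by repeated
    squaring, so only O(log n) matrix products are needed."""
    result = None
    square = a
    while n:
        if n & 1:                 # this bit of n contributes a factor
            result = square if result is None else result @ square
        square = square @ square  # a, a**2, a**4, ...
        n >>= 1
    return result

x = np.arange(4).reshape(2, 2)
matpow(x, 3)  # same as np.linalg.matrix_power(x, 3)
```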
For np.matrix subclass, ** does the same thing:
In [8]: mx=np.matrix(x)
In [9]: mx**3
Out[9]:
matrix([[ 6, 11],
[22, 39]])
** is translated by the interpreter to a __pow__ call. For this class that just amounts to a matrix_power call:
In [10]: mx.__pow__??
Signature: mx.__pow__(other)
Docstring: Return pow(self, value, mod).
Source:
def __pow__(self, other):
return matrix_power(self, other)
File: c:\users\paul\anaconda3\lib\site-packages\numpy\matrixlib\defmatrix.py
Type: method
But for ndarray this method is a compiled one:
In [3]: x.__pow__??
Signature: x.__pow__(value, mod=None, /)
Call signature: x.__pow__(*args, **kwargs)
Type: method-wrapper
String form: <method-wrapper '__pow__' of numpy.ndarray object at 0x0000022A8B2B5ED0>
Docstring: Return pow(self, value, mod).
numpy does not alter Python syntax; it has not added any operators. The @ operator was added to Python several years ago (PEP 465), largely as a convenience for packages like numpy, but it had to be added to the interpreter's syntax first.
Note that matrix_power works for a
a : (..., M, M) array_like
Matrix to be "powered".
That means it has to have at least 2 dimensions, and the trailing two must be equal size. So even that extends the normal linear algebra definition (which is limited to 2d).
numpy isn't just a linear algebra package. It's meant to be a general purpose array tool. Linear algebra is just a subset of math that can be performed with multidimensional collections of numbers (and other objects).
numpy.linalg.matrix_power is the best way as far as I know. You could use dot or * in a loop, but that would just be more code, and probably less efficient.
I need to create a bit array in Python. So far, I've discovered that one can generate very memory-efficient arrays using the bitarray module.
However, my final intention is to use the @vectorize decorator from Numba. Numba supports only a limited subset of Python and NumPy features, and bitarray is not one of them.
My question is, what's the best memory-efficient way of creating bit arrays using the structures that are supported by Numba?
I would go with numpy arrays, but I've done a quick memory test and it doesn't look good:
>>> import numpy as np
>>> import random
>>> from bitarray import bitarray
>>> from sys import getsizeof
>>> N = 10000
>>> a = bitarray(N)
>>> print(type(a), getsizeof(a))
<class 'bitarray.bitarray'> 96
>>> b = np.random.randint(0, 2, N)
>>> print(type(b), b.nbytes)
<class 'numpy.ndarray'> 40000
>>> c = [random.randint(0, 1) for i in range(N)]
>>> print(type(c), getsizeof(c))
<class 'list'> 87624
(to say nothing of the list)
EDIT: As a side question, does anyone have an idea why getsizeof returns such an unrealistically low number for bitarray? I've only just noticed it.
You can simply specify the data type (note that randint's upper bound is exclusive, so the range must be (0, 2) to actually get ones):
N = 1000
b = np.random.randint(0, 2, N)
print(type(b), getsizeof(b))
<class 'numpy.ndarray'> 4096
c = np.random.randint(0, 2, N, dtype=bool)
print(type(c), getsizeof(c))
<class 'numpy.ndarray'> 1096
As for your side question: getsizeof reports only the memory of the Python object itself, not buffers it merely points to. bitarray (at least in older versions) does not implement __sizeof__, so its bit buffer is not counted, which is why the number looks unrealistically low. A NumPy array, by contrast, does account for its data buffer, as nbytes shows.
EDIT:
An object's memory in Python includes its header, references to its methods and attributes, and its items. A Python list, for instance, stores references to its elements, which are themselves full Python objects, plus over-allocated slack so that appends stay cheap (see the data-structures section of the official docs); that is why the plain list above is so large.
Taking all of that into consideration, best practice is to use an appropriate data structure that works well in your pipeline, and to specify dtypes whenever possible. Since you use Numba, NumPy seems the best fit. Memory is not always the issue.
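If the one-bit-per-entry footprint really matters, one sketch that stays within plain NumPy is to pack the 0/1 values with np.packbits (whether the packed form is workable inside your Numba kernels is a separate question):

```python
import numpy as np

N = 10000
bits = np.random.randint(0, 2, N)            # 0/1 values, one machine word each
packed = np.packbits(bits.astype(np.uint8))  # one bit per value, padded to whole bytes

print(bits.nbytes, packed.nbytes)            # e.g. 80000 vs 1250
restored = np.unpackbits(packed)[:N]         # round-trip back to 0/1 values
```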
Problem
I am working with an arbitrary MxN numpy matrix where each element is a sympy expression, potentially with differing symbols. For the purposes of visualization, let's work with the following matrix test
import sympy as sp
import numpy as np
a, b, c = sp.symbols('a b c')
test = np.array([[a**2, a + b], [a*c + b, b/c]])
When run, test will look like:
In [25]: test
Out[25]:
array([[a**2, a + b],
[a*c + b, b/c]], dtype=object)
I would like to replace one variable in this array with a number and get back a new array with the same dimensions as test, but with the specified variable replaced by its new value. For example, if I wanted to replace b with 2, the new array should look like:
array([[a**2, a + 2],
[a*c + 2, 2/c]], dtype=object)
Attempt at a Solution
I first tried to use the sympy function subs but I received the following error:
test.subs({b:2})
Traceback (most recent call last):
File "<ipython-input-29-a9a04d63af37>", line 1, in <module>
test.subs({b:2})
AttributeError: 'numpy.ndarray' object has no attribute 'subs'
I looked at using lambdify, but I believe it returns a numeric lambda, which isn't what I want: I need a new symbolic expression, just one that no longer depends on b. I found something that looks like what I need in the Wolfram Mathematica documentation on pattern matching (http://www.wolfram.com/language/fast-introduction-for-programmers/en/patterns/), but I can't figure out how to implement it in Python, or whether it's even possible. Any help would be greatly appreciated.
Just use sympy. No need for numpy, at least not for the substitution:
In [117]: import sympy
In [118]: a,b,c=sympy.symbols('a b c')
In [120]: M=sympy.Matrix([[a**2, a+b],[a*c+b, b/c]])
In [121]: M
Out[121]:
Matrix([
[ a**2, a + b],
[a*c + b, b/c]])
In [123]: M.subs({b:2})
Out[123]:
Matrix([
[ a**2, a + 2],
[a*c + 2, 2/c]])
An ndarray is not a SymPy expression, therefore it does not have the subs method.
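If you would rather keep the ndarray container than convert to sympy.Matrix, a sketch of the same substitution is to map subs over the elements; np.vectorize is just a Python loop in disguise, so there is no speed benefit:

```python
import numpy as np
import sympy as sp

a, b, c = sp.symbols('a b c')
test = np.array([[a**2, a + b], [a*c + b, b/c]], dtype=object)

# Apply .subs to each element; the result is again an object array
subbed = np.vectorize(lambda expr: expr.subs({b: 2}))(test)
```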
In Python 3.5, the @ operator was introduced for matrix multiplication, following PEP 465. It is implemented e.g. in numpy as the matmul operator.
However, as proposed by the PEP, the numpy operator throws an exception when called with a scalar operand:
>>> import numpy as np
>>> np.array([[1,2],[3,4]]) @ np.array([[1,2],[3,4]]) # works
array([[ 7, 10],
[15, 22]])
>>> 1 @ 2 # doesn't work
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: unsupported operand type(s) for @: 'int' and 'int'
This is a real turnoff for me, since I'm implementing numerical signal processing algorithms that should work for both scalars and matrices. The equations for both cases are mathematically exactly equivalent, which is no surprise, since "1-D x 1-D matrix multiplication" is equivalent to scalar multiplication. The current state however forces me to write duplicate code in order to handle both cases correctly.
So, given that the current state is not satisfactory, is there any reasonable way I can make the # operator work for scalars? I thought about adding a custom __matmul__(self, other) method to scalar data types, but this seems like a lot of hassle considering the number of involved internal data types. Could I change the implementation of the __matmul__ method for numpy array data types to not throw an exception for 1x1 array operands?
And, on a sidenote, which is the rationale behind this design decision? Off the top of my head, I cannot think of any compelling reasons not to implement that operator for scalars as well.
As ajcr suggested, you can work around this issue by forcing some minimal dimensionality on the objects being multiplied. There are two reasonable options, atleast_1d and atleast_2d, which differ in the type returned by @: a scalar versus a 1-by-1 2D array.
x = 3
y = 5
z = np.atleast_1d(x) @ np.atleast_1d(y)  # returns 15
z = np.atleast_2d(x) @ np.atleast_2d(y)  # returns array([[15]])
However:
Using atleast_2d will lead to an error if x and y are 1D-arrays that would otherwise be multiplied normally
Using atleast_1d will result in the product that is either a scalar or a matrix, and you don't know which.
Both of these are more verbose than np.dot(x, y) which would handle all of those cases.
Also, the atleast_1d version suffers from the same flaw that would also be shared by having scalar @ scalar = scalar: you don't know what can be done with the output. Will z.T or z.shape throw an error? These work for 1-by-1 matrices but not for scalars. In the setting of Python, one simply cannot ignore the distinction between scalars and 1-by-1 arrays without also giving up all the methods and properties that the latter have.
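Since patching __matmul__ onto the builtin scalar types isn't realistically possible, another option is a small wrapper of your own (a sketch, not a standard API) that falls back to ordinary multiplication when either operand is 0-d:

```python
import numpy as np

def mmul(x, y):
    """@ for array operands, plain * when either side is a scalar."""
    if np.ndim(x) == 0 or np.ndim(y) == 0:
        return x * y
    return x @ y

mmul(3, 5)                         # 15, a scalar
mmul(np.eye(2), np.array([1., 2.]))  # ordinary matrix-vector product
```

This keeps one code path for the scalar and matrix cases, at the cost of a function call instead of an infix operator.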
I was wondering why I get different results in the two prints. Shouldn't they be the same?
import numpy as np
x = np.array([[1.5, 2], [2.4, 6]])
k = np.copy(x)
for i in range(len(x)):
    for j in range(len(x[i])):
        k[i][j] = 1 / (1 + np.exp(-x[i][j]))
        print("K[i][j]:" + str(k[i][j]))
        print("Value:" + str(1 / (1 + np.exp(-x[i][j]))))
I've just run your code with Python 3 and Python 2, and the results were exactly the same.
Besides, you don't have to loop at all: using NumPy arrays lets you express many kinds of data-processing tasks as concise array expressions that would otherwise require writing loops. Replacing explicit loops with array expressions is commonly referred to as vectorization. In general, vectorized array operations are often one or two (or more) orders of magnitude faster than their pure-Python equivalents, with the biggest impact in numerical computations.
So, keeping all this in mind you may rewrite your code as follows:
import numpy as np
x = np.array([[1.5, 2], [2.4, 6]], dtype=float)
k = 1 / (1 + np.exp(-x))
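A quick check that the vectorized form agrees with the elementwise loop:

```python
import numpy as np

x = np.array([[1.5, 2], [2.4, 6]], dtype=float)
k = 1 / (1 + np.exp(-x))  # vectorized sigmoid over the whole array

# rebuild the same values with an explicit elementwise loop
loop = np.array([[1 / (1 + np.exp(-v)) for v in row] for row in x])
print(np.allclose(k, loop))  # True
```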
When I ran this script, the two prints showed the same results. This is Python 3.5.2.
K[i][j]:0.817574476194
Value:0.817574476194
K[i][j]:0.880797077978
Value:0.880797077978
K[i][j]:0.916827303506
Value:0.916827303506
K[i][j]:0.997527376843
Value:0.997527376843