What is the difference between single and double bracket Numpy array?


import numpy as np
a=np.random.randn(1, 2)
b=np.zeros((1,2))
print("Data type of A: ",type(a))
print("Data type of B: ",type(b))
Output:
Data type of A: <class 'numpy.ndarray'>
Data type of B: <class 'numpy.ndarray'>
To declare an array with np.zeros(), we pass the shape inside a second pair of parentheses, whereas with np.random.randn() we pass it inside a single pair.
Is there a specific reason for this difference in syntax, given that both functions return the same data type?

In an effort to ease the transition for Matlab users to NumPy, some convenience functions like randn were built which use the same call signature as their Matlab equivalents.
The more NumPy-centric (as opposed to Matlab-centric) NumPy functions (such as np.zeros) expect the size (or shape) to be a tuple. This allows other parameters like dtype and order to be passed to the function as well.
The Matlab-centric functions assume all the arguments are part of the size.
np.random.randn is one of NumPy's Matlab-centric convenience functions, modeled after Matlab's randn. The more NumPy-centric alternative to np.random.randn is np.random.standard_normal.
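A minimal sketch of the two call styles described above. The dtype and order values passed to np.zeros are chosen here only for illustration.

```python
import numpy as np

# Matlab-style signature: each dimension is its own positional argument.
a = np.random.randn(1, 2)

# NumPy-style signature: the shape is a single tuple ...
b = np.random.standard_normal((1, 2))

# ... which leaves room for further parameters such as dtype and order.
c = np.zeros((1, 2), dtype=np.float32, order="F")

print(a.shape, b.shape, c.shape)  # (1, 2) (1, 2) (1, 2)
print(c.dtype)                    # float32
```

All three calls produce an ndarray of shape (1, 2); only the way the shape is passed differs.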

Related

Converting native python types to numpy dtypes

Not to be confused with the inverse task, that is covered plenty.
I am looking for something like np.dtype(7.7) == np.float.
The motivation is to be able to handle any array-like input just like numpy itself. To construct the output or temporary data, I sometimes want to use the input type if possible.
Edit: Maybe that was a bad (too specific) example; I know that np.float happens to be just an alias for the builtin float. I was thinking more along the following lines.
myInput = something
# required to have a homogeneous data type in the documentation of my function;
# maybe constrained to float, int, string, lists of same length and type
# but I would like to handle simply as much of that as numpy can handle
numInput = len(myInput)
numOutput = numInput // 2 # just for example
myOutput = np.empty(shape=(numOutput,), dtype=???)
for i in range(numOutput):
    myOutput[i] = whatever # maybe just a copy, hence the same data type

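numpy.float is just an alias for the regular Python float type (the alias was deprecated in NumPy 1.20 and removed in 1.24, so the check below only runs on older versions). It's not a NumPy dtype, and it's almost certainly not what you need: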
>>> import numpy
>>> numpy.float is float
True
If you want the dtype NumPy would coerce your scalar to, just make an array and get its dtype:
>>> numpy.array(7.7).dtype
dtype('float64')
If you want the type NumPy uses for scalars of this dtype, access the dtype's type attribute:
>>> numpy.array(7.7).dtype.type
<class 'numpy.float64'>
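Applied to the use case in the question, this could look roughly like the sketch below. The function name first_half and the choice to copy the first half of the input are hypothetical, used only to show the dtype being carried over from the input to the output.

```python
import numpy as np

def first_half(my_input):
    """Copy the first half of an array-like into a new array of the same dtype."""
    arr = np.asarray(my_input)       # let NumPy pick the dtype for the input
    num_output = len(arr) // 2
    out = np.empty(shape=(num_output,), dtype=arr.dtype)
    for i in range(num_output):
        out[i] = arr[i]              # plain copy, so the dtype is preserved
    return out

print(first_half([1.5, 2.5, 3.5, 4.5]).dtype)        # float64
print(first_half(["a", "b", "c", "d"]).dtype.kind)   # U (unicode string)
```

Because the dtype comes from np.asarray, the function accepts whatever homogeneous input NumPy itself can convert.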

You could simply use np.float64(original_float), or whichever NumPy type you wish to convert your variable to.
For the record, this code works:
val = 7.7
if isinstance(val, float):
    val = np.float64(val)
if isinstance(val, np.float64):
    print("Success!")
>>> Success!
Hope this helps.
Edit:
I just saw @user2357112 supports Monica's comment on your question, and it's important to note that np.float effectively acts the same way as float. The implementation I provided is aimed at NumPy-specific types like np.float32 or np.float64, the one I used in the test code. But if I perform the same test with just np.float, this is the result:
val = 7.7
if isinstance(val, float):
    if isinstance(val, np.float):
        print("Success!")
>>> Success!
This shows that, from the interpreter's point of view, float and np.float are the same type.

Convert a MATLAB matrix object to a python NumPy array

I want to call some Python code from MATLAB. To do this, I need to convert a matrix object to a NumPy ndarray through the MATLAB function py.numpy.array. However, simply passing the matrix object to the function does not work. For now I have solved the problem by converting the matrix to a cell array whose cells contain the rows of the matrix. For example:
function ndarray = convert(mat)
    % This conversion fails
    ndarray = py.numpy.array(mat)
    % This conversion works
    cstr = cell(1, size(mat, 1));
    for row = 1:size(mat, 1)
        cstr(row) = {mat(row, :)};
    end
    ndarray = py.numpy.array(cstr);
I was wondering whether a more efficient solution exists.

Assuming your array contains double values, the error tells us exactly what we should do:
A = magic(3);
%% Attempt 1:
try
    npA = py.numpy.array(A);
    % Result:
    % Error using py.numpy.array
    % Conversion of MATLAB 'double' to Python is only supported for 1-N vectors.
catch
end
%% Attempt 2:
npA = py.numpy.array(A(:).');
% Result: OK!
Then:
>> whos npA
  Name    Size    Bytes    Class               Attributes
  npA     1x1     8        py.numpy.ndarray
Afterwards you can use numpy.reshape to get the original shape back, either directly in MATLAB or in Python.
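On the Python side, the reshape step might look like the sketch below. The literal vector stands in for what A(:).' yields for magic(3). Note that MATLAB flattens column by column, so order="F" is needed to recover the original layout.

```python
import numpy as np

# Stand-in for the 1-by-9 vector produced by A(:).' with A = magic(3).
# MATLAB flattens in column-major order (column by column).
flat = np.array([8.0, 3.0, 4.0, 1.0, 5.0, 9.0, 6.0, 7.0, 2.0])

# order="F" refills the matrix column by column, matching MATLAB's layout.
A = np.reshape(flat, (3, 3), order="F")
print(A)
```

With NumPy's default order="C", the result would instead be the transpose of the original matrix.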

Actually, using Python 2.7 and MATLAB R2018b, it worked by simply doing:
pyvar = py.numpy.array(var);
MATLAB tells me that to convert the NumPy array back to a MATLAB variable, I can just use double(pyvar).
By the way, it didn't work with Python 3.7, nor with an older version of MATLAB. I don't know what this means, but I thought it might be helpful.

Make the matrix multiplication operator @ work for scalars in numpy

In Python 3.5, the @ operator was introduced for matrix multiplication, following PEP 465. It is implemented, for example, in NumPy as the matmul operator.
However, as proposed by the PEP, the NumPy operator throws an exception when called with a scalar operand:
>>> import numpy as np
>>> np.array([[1,2],[3,4]]) @ np.array([[1,2],[3,4]]) # works
array([[ 7, 10],
       [15, 22]])
>>> 1 @ 2 # doesn't work
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: unsupported operand type(s) for @: 'int' and 'int'
This is a real turnoff for me, since I'm implementing numerical signal processing algorithms that should work for both scalars and matrices. The equations for both cases are mathematically exactly equivalent, which is no surprise, since "1-D x 1-D matrix multiplication" is equivalent to scalar multiplication. The current state, however, forces me to write duplicate code in order to handle both cases correctly.
So, given that the current state is not satisfactory, is there any reasonable way I can make the @ operator work for scalars? I thought about adding a custom __matmul__(self, other) method to scalar data types, but this seems like a lot of hassle considering the number of internal data types involved. Could I change the implementation of the __matmul__ method for NumPy array data types so that it does not throw an exception for 1x1 array operands?
And, on a side note, what is the rationale behind this design decision? Off the top of my head, I cannot think of any compelling reason not to implement that operator for scalars as well.

As ajcr suggested, you can work around this issue by forcing some minimal dimensionality on the objects being multiplied. There are two reasonable options, atleast_1d and atleast_2d, which differ in the type returned by @: a scalar versus a 1-by-1 2D array.
x = 3
y = 5
z = np.atleast_1d(x) @ np.atleast_1d(y) # returns 15
z = np.atleast_2d(x) @ np.atleast_2d(y) # returns array([[15]])
However:
Using atleast_2d will lead to an error if x and y are 1D arrays that would otherwise be multiplied normally.
Using atleast_1d will result in a product that is either a scalar or a matrix, and you don't know which.
Both of these are more verbose than np.dot(x, y), which would handle all of those cases.
Also, the atleast_1d version suffers from the same flaw that scalar @ scalar = scalar would have: you don't know what can be done with the output. Will z.T or z.shape throw an error? These work for 1-by-1 matrices but not for plain Python scalars. In Python, one simply cannot ignore the distinction between scalars and 1-by-1 arrays without also giving up all the methods and properties that the latter have.
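As a rough illustration of the np.dot route mentioned above, the sketch below sends scalars, vectors and matrices through one code path. The wrapper name product is hypothetical.

```python
import numpy as np

def product(x, y):
    """Multiply scalars, vectors, or matrices with a single code path."""
    return np.dot(x, y)

M = np.array([[1, 2], [3, 4]])

print(product(3, 5))                                  # 15
print(product(np.array([1, 2]), np.array([3, 4])))    # 11 (inner product)
print(product(M, M))                                  # same result as M @ M
print(product(2, M))                                  # scalar times matrix
```

The caveat from the answer still applies: the type of the result depends on the inputs, so code that later calls array methods on it may want to wrap it in np.asarray first.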

Why I can't use theano.tensor.argmax and theano.tensor.mean correctly

I am learning Theano, but I keep running into problems. My code is as follows:
import theano
from numpy import *
import theano.tensor as T
a = [1,2,3,4]
b = [7,8,9,10]
print T.argmax(a)
I thought it would print the index of '4', but the result is:
argmax
What's more, when I use T.neq() as follows:
import theano
from numpy import *
import theano.tensor as T
a = [1,2,3,4]
b = [7,8,9,10]
print T.neq(a,b)
the result shows:
Elemwise{neq,no_inplace}.0
I am really new to this and have no idea what is wrong. Did I miss anything? Thank you in advance.

T.argmax() is expecting a Theano TensorVariable type. Some of the types of Variables used in Theano are listed here. Don't let the name "fully typed constructors" scare you. Think about them more in terms of what type of data you want to use as your input. Are you using float matrices? Then the relevant TensorVariable type is probably "fmatrix." Are you dealing with batches of RGB image data? Then the relevant TensorVariable type is probably "tensor4."
In your code, we are trying to input a list type into T.argmax(). So from the above point of view, that isn't going to work. Also, note that type(T.argmax(a)) is a theano.tensor.var.TensorVariable type. So it is expecting a TensorVariable as input, and it outputs a TensorVariable type as well. So this isn't going to return the actual argmax.
Okay, so what does work? How can we do this computation in Theano?
Let's first identify the type of data you want to deal with. This is going to be the starting point of the computational graph that we will be building. In this case, it looks like we want to deal with arrays or vectors. Theano has an ivector type, which is a vector of integers, and an fvector type, which is a vector of float32 values. Let's stick with your data and use ivector, since we have integer values:
x = T.ivector('input')
This line just created a TensorVariable x that represents our intended input type, an array of integers.
Now let's define a TensorVariable for the argmax of the elements of x:
y = T.argmax(x)
So far we have built a computational graph, which is expecting an array of integers as input and will output the argmax of that array. However, in order to actually do this, we have to compile this into a function:
get_argmax = theano.function([x], y)
The theano.function syntax can be found here.
Think of this function as now actually performing the computation that we have defined using x and y.
When I execute:
get_argmax([1,2,3,4,19,1])
It returns:
array(4)
So what did we really do? By defining Theano variables and using theano.tensor functions, we build a computational graph. We then used theano.function to compile a function that actually performs that computation on actual inputs that we specify.
To end: how to do the not equals operation?
a = T.ivector('a')
b = T.ivector('b')
out = T.neq(a,b)
get_out = theano.function([a,b], out)
print get_out([1,2,3,4], [7,8,9,10])
will return:
[1,1,1,1]
One of the key conceptual differences is that I treat the a,b as theano TensorVariables, rather than assigning them explicit variables.
You'll get the hang of it. Just remember that you need to define your computation in terms of Theano TensorVariables, and then, to actually "use it", you have to compile it using theano.function.

Binary operations on Numpy scalars automatically up-casts to float64

I want to do binary operations (like add and multiply) between np.float32 and builtin Python int and float and get a np.float32 as the return type. However, it gets automatically up-casted to a np.float64.
Example code:
>>> a = np.float32(5)
>>> a.dtype
dtype('float32')
>>> b = a + 2
>>> b.dtype
dtype('float64')
If I do this with a np.float128, b also becomes a np.float128. This is good, as it thereby preserves precision. However, no up-casting to np.float64 is necessary to preserve precision in my example, but it still occurs. Had I added 2.0 (a Python float (64 bit)) to a instead of 2, the casting would make sense. But even here, I do not want it.
So my question is: How can I alter the casting done when applying a binary operator to a np.float32 and a builtin Python int/float? Alternatively, making single precision the standard in all calculations rather than double, would also count as a solution, as I do not ever need double precision. Other people have asked for this, and it seems that no solution has been found.
I know about NumPy arrays and their dtypes. There I get the behavior I want, as an array always preserves its dtype. It is only when I do an operation on a single element of an array that I get the unwanted behavior.
I have a vague idea for a solution, involving subclassing np.ndarray (or np.float32) and changing the value of __array_priority__. So far I have not been able to get it working.
Why do I care? I am trying to write an n-body code using Numba. This is why I cannot simply do operations on the array as a whole. Changing all np.float64 to np.float32 gives a speedup of about a factor of 2, which is important. The np.float64 casting behavior ruins this speedup completely, as all operations on my np.float32 array are done in 64-bit precision and then downcast to 32-bit precision.

I'm not sure about the NumPy behavior, or how exactly you're trying to use Numba, but being explicit about the Numba types might help. For example, if you do something like this:
import numpy as np
from numba import jit
@jit
def foo(a):
    return a[0] + 2
a = np.array([3.3], dtype='f4')
foo(a)
The float32 value in a[0] is promoted to a float64 before the add operation (if you don't mind diving into llvm IR, you can see this for yourself by running the code using the numba command and using the --dump-llvm or --dump-optimized flag: numba --dump-optimized numba_test.py). However, by specifying the function signature, including the return type as float32:
@jit('f4(f4[:])')
def foo(a):
    return a[0] + 2
The value in a[0] is not promoted to float64, although the result is cast to a float64 so it can be converted to a Python float object when the function returns to Python land.
If you can allocate an array beforehand to hold the result, you can do something like this:
@jit
def foo():
    a = np.arange(1000000, dtype='f4')
    result = np.zeros(1000000, dtype='f4')
    for i in range(a.size):
        result[i] = a[i] + 2
    return result
Even though you're doing the looping yourself, the performance of the compiled code should be comparable to a NumPy ufunc, and no casts to float64 should occur (Again, this can be verified by looking at the llvm IR that Numba generates).
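Independently of Numba, one way to keep scalar arithmetic in single precision in plain NumPy is to make both operands np.float32 explicitly. This is only a sketch of that idea, not a change to NumPy's casting rules. The result of mixing np.float32 with a bare Python number depends on the NumPy version, so the example avoids relying on it.

```python
import numpy as np

a = np.float32(5)
two = np.float32(2)      # explicit single-precision constant

b = a + two              # float32 + float32 stays float32
print(b.dtype)           # float32

arr = np.arange(4, dtype=np.float32)
c = arr[0] * two         # also holds for a single element taken from an array
print(c.dtype)           # float32
```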
