using cimport in cython for numpy [duplicate] - python

In the tutorial of the Cython documentation, there are cimport and import statements of numpy module:
import numpy as np
cimport numpy as np
I found this convention is quite popular among numpy/cython users.
This looks strange for me because they are both named as np.
In which part of the code, imported/cimported np are used?
Why cython compiler does not confuse them?

cimport my_module gives access to C functions or attributes or even sub-modules under my_module
import my_module gives access to Python functions or attributes or sub-modules under my_module.
In your case:
cimport numpy as np
gives you access to Numpy C API, where you can declare array buffers, variable types and so on...
And:
import numpy as np
gives you access to NumPy-Python functions, such as np.array, np.linspace, etc
Cython internally handles this ambiguity so that the user does not need to use different names.

Related

Calling numpy functions

Is it okay to call any numpy function without using the library name before the function (example: numpy.linspace())? Can we call it simply
linspace()
instead of calling
numpy.linspace()
You can import it like this
from numpy import linspace
and then use it like this
a = linspace(1, 10)
yes, its completely fine when you are importing the function separately from the numpy such as
from numpy import linespace
#you can call the function by just writing its name
result=linespace(3,50)
but the convention is to use the name alias the pakage as np
import numpy as np
#then calling the function with short name
result = np.linespace(3,50)
alias can be helpful when working with large number of libraries.and it also improves the code readability.
If you import the function from the library directly there is nothing wrong with calling said function directly.
i.e.
from numpy import linspace
# Then call linspace by itself
a = linspace(1, 10)
That being said, many find that having numpy (often shortened to np) in front of function names help improve code readability. As almost everyone does this with certain libraries (Tensorflow as tf, Numpy as np, Pandas as pd) some may view it in a poor light if you simply directly import and use the function.
I would recommend importing the library as the shortened name and then using it appropriately.
i.e.
import numpy as np
# Then call np.linspace
a = np.linspace(1, 10)

import numpy as np versus from numpy import

I have a module that heavily makes use of numpy:
from numpy import array, median, nan, percentile, roll, sqrt, sum, transpose, unique, where
Is it better practice to keep the namespace clean by using
import numpy as np
and then when I need to use array just use np.array, for e.g.?
This module also gets called repeatedly, say a few million times and keeping the namespace clean appears to add a bit of an overhead?
setup = '''import numpy as np'''
function = 'x = np.sum(np.array([1,2,3,4,5,6,7,8,9,10]))'
print(min(timeit.Timer(function, setup=setup).repeat(10, 300000)))
1.66832
setup = '''from numpy import arange, array, sum'''
function = 'x = sum(array([1,2,3,4,5,6,7,8,9,10]))'
print(min(timeit.Timer(function, setup=setup).repeat(10, 300000)))
1.65137
Why does this add more time when using np.sum vs sum?
You are right, it is better to keep the namespace clean. So I would use
import numpy as np
It keeps your code more readable, when you see a call like np.sum(array) you are reminded that you should work with an numpy array. The second reason is that many of the numpy functions have identical names as functions in other modules like scipy... If you use both its always clear which one you are using.
As you you can see in the test you made, the performance difference is there and if you really need the performance you could do it the other way.
The difference in performance is that in the case of a specific function import, you are referencing the function in the numpy module at the beginning of the script.
In the case of the general module import you import only the reference to the module and python needs to resolve/find the function that you are using in that module at every call.
You could have the best of both worlds (faster name resolution, and non-shadowing), if you're ok with defining your own aliasing (subject to your team conventions, of course):
import numpy as np
(np_sum, np_min, np_arange) = (np.sum, np.min, np.arange)
x = np_arange(24)
print (np_sum(x))
Alternative syntax to define your aliases:
from numpy import \
arange as np_arange, \
sum as np_sum, \
min as np_min

how to use functions from a sub-module

I am confused about how to use functions from a sub module. For example, numpy has a submodule linalg, which contains a function solve, so I can do:
import numpy as np
np.linalg.solve( # And then specify what I want to solve
Similarly, scipy has a subpackage sparse, which has a subpackage linalg, which has a function spsolve, yet I cannot do this:
import scipy.sparse as sp
sp.linalg.spsolve( # And then whatever I want to solve
Why not? Shouldn't that work?

Cython says buffer types only allowed as function local variables even for ndarray.copy()

I am new to Cython and encountered this code snippet:
import numpy as np
cimport numpy as np
testarray = np.arange(5)
cdef np.ndarray[np.int_t, ndim=1] testarray1 = testarray.copy()
cdef np.ndarray[np.float_t, ndim=1] testarray2 = testarray.astype(np.float)
During compilation, it said Buffer types only allowed as function local variables. However, I am using .copy() or .astype() which is returning not a memoryview, but a copy. Why is this still happening? How can I get around this?
Thanks!
When you define an array in cython using np.ndarray[Type, dim], that is accessing the python buffer interface, and those can't be set as module level variables. This is a separate issue from views vs copies of numpy array data.
Typically if I want to have an array as a module level variable (i.e not local to a method), I define a typed memoryview and then set it within a method using something like (untested):
import numpy as np
cimport numpy as np
cdef np.int_t[:] testarray1
def init_arrays(np.int_t[:] testarray):
global testarray1
testarray1 = testarray.copy()

ctypedef in Cython with numpy: what is right convention?

In Cython when using numpy, what is the point of writing:
cimport numpy as np
import numpy as np
ctypedef np.int_t DTYPE_t
and then using DTYPE_t everywhere instead of just using np.int_t? Does the ctypedef actually do anything differently in the resulting code here?
You can read the notes from the docs for cython, reading the notes they explain the reason for the use of this notation and imports.
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h is typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)

Categories