I wrote a function that passes numpy arrays into C code using CFFI. It uses the buffer protocol and memoryview to pass the data efficiently without copying it. However, this means that you need to pass C-contiguous arrays and ensure that you are using the right types. Numpy provides a function, numpy.ascontiguousarray, which does this. So I iterate over the arguments and apply this function. The implementation below works and may be of general interest. However, it is slow given the number of times it is called. (Any general comments on how to speed it up would be helpful.)
However, the actual question is this: when you replace the first list comprehension with a generator expression, or if you refactor the code so that np.ascontiguousarray is called in the second one, the pointers passed into the C code no longer point to the start of the numpy arrays. I think the conversion is not getting called. I'm iterating over the comprehension and only using the return values, so why would using a list comprehension versus a generator expression change anything?
def cffi_wrap(cffi_func, ndarray_params, pod_params, return_shapes=None):
    """
    Wraps a cffi function to allow it to be called on numpy arrays.

    It uses the numpy buffer protocol and the cffi buffer protocol to pass the
    numpy arrays into the C function without copying any of the parameters.
    You will need to pass dimensions into the C function, which you can do
    using the pod_params.

    Parameters
    ----------
    cffi_func : C function
        A C function declared using cffi. It must take double pointers and
        plain old data types. The arguments must be in the order: numpy
        arrays, plain old data types, and then the returned numpy arrays.
    ndarray_params : iterable of ndarrays
        The numpy arrays to pass into the function.
    pod_params : tuple of plain old data
        The plain old data objects to pass in. This may include, for example,
        dimensions.
    return_shapes : iterable of tuples of positive ints
        The shapes of the returned objects.

    Returns
    -------
    return_vals : ndarrays of doubles
        The objects calculated by cffi_func.
    """
    arr_param_buffers = [np.ascontiguousarray(param, np.float64)
                         if np.issubdtype(param.dtype, np.floating)
                         else np.ascontiguousarray(param, np.intc)
                         for param in ndarray_params]
    arr_param_ptrs = [ffi.cast("double *", ffi.from_buffer(memoryview(param)))
                      if np.issubdtype(param.dtype, np.floating)
                      else ffi.cast("int *", ffi.from_buffer(memoryview(param)))
                      for param in arr_param_buffers]
    if return_shapes is not None:
        return_vals_ptrs = tuple(ffi.new("double[" + str(np.prod(shape)) + "]")
                                 for shape in return_shapes)
        returned_val = cffi_func(*arr_param_ptrs, *pod_params, *return_vals_ptrs)
        return_vals = tuple(np.frombuffer(ffi.buffer(return_val))
                            [:np.prod(shape)].reshape(shape)
                            for shape, return_val
                            in zip(return_shapes, return_vals_ptrs))
    else:
        returned_val = cffi_func(*arr_param_ptrs, *pod_params)
        return_vals = None
    if returned_val is not None and return_vals is not None:
        return_vals = return_vals + (returned_val,)
    elif return_vals is None:
        return_vals = (returned_val,)
    if len(return_vals) == 1:
        return return_vals[0]
    else:
        return return_vals
I'm just guessing, but the error could come from keepalives: with arr_param_buffers a list comprehension, as in your posted code, all the created numpy arrays stay alive for as long as this local variable exists (i.e. for the whole duration of cffi_wrap()). This allows you to do ffi.from_buffer(memoryview(...)) on the next line and be sure that they are all pointers to valid data.
If you replace arr_param_buffers with a generator expression, it will generate the new numpy arrays one by one, call ffi.from_buffer(memoryview(param)) on them, and then throw them away. ffi.from_buffer(x) returns an object that should keep x alive, but maybe x == memoryview(nd) does not itself keep alive the numpy array nd, for all I know.
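The lifetime difference can be seen without cffi at all. This is a minimal sketch (using weakref, and assuming CPython's reference-counting collection) showing that a list comprehension keeps every converted buffer alive, while a generator expression lets each one die as soon as nothing references it:

```python
import gc
import weakref

import numpy as np

params = [np.arange(3), np.arange(4)]

# List comprehension: the list keeps every converted buffer alive,
# so pointers derived from them stay valid while the list exists.
bufs = [np.ascontiguousarray(p, np.float64) for p in params]
refs = [weakref.ref(b) for b in bufs]
assert all(r() is not None for r in refs)

# Generator expression: each buffer can be collected as soon as
# nothing else references it, so a raw pointer taken from it dangles.
gen = (np.ascontiguousarray(p, np.float64) for p in params)
first = next(gen)
dead = weakref.ref(first)
del first          # drop the only reference to the first buffer
next(gen)          # advance; the generator holds no reference either
gc.collect()
assert dead() is None   # the first buffer has been collected
```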
New to Python (MATLAB background).
I have a function (np.unique) that can output either 1 or 2 arrays:
the array of unique values;
the counts for each value (enabled by setting the argument return_counts=True).
When the function is set to return a single array only, assigning the result to the undefined variable uni makes it an ndarray:
uni = np.unique(iris_2d['species'], return_counts=False)
But when the function is set to return 2 arrays, the variable uni is created as a tuple containing 2 ndarrays.
Is there a way to force the output directly into a 2d array (and multidimensional arrays in general), without predefining the variable uni or using a second function like numpy.stack/numpy.asarray?
import numpy as np

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
dtype = np.dtype({'names': names,
                  'formats': np.append(np.repeat('float', 4), '<U16')})
iris_2d = np.genfromtxt(url, delimiter=',', dtype=dtype, usecols=[0, 1, 2, 3, 4])

uni_isTuple = np.unique(iris_2d['species'], return_counts=True)
uni_isNdArray = np.unique(iris_2d['species'], return_counts=False)
I'm unaware of a way to force np.unique() to return an ndarray instead of a tuple. I realize you asked for a solution that doesn't call another function, but if you'll tolerate passing the tuple to np.array() to build an ndarray from it, that might give you what you want.
uni_isTuple = np.array(np.unique(iris_2d['species'],return_counts=True))
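As a minimal sketch of what that conversion produces (with made-up data rather than the iris set), note one caveat: stacking the string labels with the integer counts promotes everything to a string dtype:

```python
import numpy as np

vals = np.array(['a', 'b', 'a', 'c'])
uni = np.array(np.unique(vals, return_counts=True))

assert uni.shape == (2, 3)       # row 0: unique values, row 1: counts
assert uni.dtype.kind == 'U'     # the counts were promoted to strings
assert uni[1].tolist() == ['2', '1', '1']
```

If you need the counts to stay numeric, keeping the tuple (or unpacking it into two variables) is the cleaner option.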
I come across this issue often, and I would be surprised if there weren't some very simple and pythonic one-liner solution to it.
Suppose I have a method or a function that takes a list or some other iterable object as an argument. I want an operation to be performed once for each item in the object.
Sometimes, only a single item (say, a float value) is passed to this function. In this situation, my for-loop doesn't know what to do. So I find myself peppering my code with the following snippet:
from collections.abc import Sequence

def my_function(value):
    if not isinstance(value, Sequence):
        value = [value]
    # rest of my function
This works, but it seems wasteful and not particularly legible. In searching StackOverflow I've also discovered that strings are considered sequences, and so this code could easily break given the wrong argument. It just doesn't feel like the right approach.
I come from a MATLAB background, and this is neatly solved in that language since scalars are treated like 1x1 matrices. I'd expect, at the very least, for there to be a built-in, something like numpy's atleast_1d function, that automatically converts anything into an iterable if it isn't one.
The short answer is nope, there is no simple built-in. And yep, if you want str (or bytes or bytes-like stuff or whatever) to act as a scalar value, it gets uglier. Python expects callers to adhere to the interface contract; if you accept sequences, say so, and it's on the caller to wrap any individual arguments.
If you must do this, there are two obvious ways to do it:
First is to make your function accept varargs instead of a single argument, leaving it to the caller to unpack any sequences, so you can always iterate over the varargs received:
def my_function(*values):
    for val in values:
        # Rest of function
A caller with individual items calls you with my_function(a, b), a caller with a sequence calls you with my_function(*seq). The latter does incur some overhead to unpack the sequence to a new tuple to be received by my_function, but in many cases this is fine.
If that's not acceptable for whatever reason, the other solution is to roll your own "ensure iterable" converter function, following whatever rules you care about:
from collections.abc import ByteString

def ensure_iterable(obj):
    if isinstance(obj, (str, ByteString)):
        return (obj,)  # Treat strings and bytes-like stuff as scalars and wrap
    try:
        # Simplest way to test if something is iterable: try to make an iterator
        iter(obj)
    except TypeError:
        return (obj,)  # Not iterable, wrap
    else:
        return obj  # Already iterable
which my_function can use with:
def my_function(value):
    value = ensure_iterable(value)
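For illustration, here is how the converter behaves on a few inputs. It is redefined below so the snippet runs standalone, and it tests (str, bytes, bytearray) directly instead of collections.abc.ByteString, which is deprecated in recent Python versions:

```python
def ensure_iterable(obj):
    # Treat strings and bytes-like objects as scalars and wrap them
    if isinstance(obj, (str, bytes, bytearray)):
        return (obj,)
    try:
        iter(obj)          # iterability test: try to make an iterator
    except TypeError:
        return (obj,)      # not iterable, wrap
    return obj             # already iterable

assert ensure_iterable(1.5) == (1.5,)        # scalar gets wrapped
assert ensure_iterable([1, 2]) == [1, 2]     # list passes through
assert ensure_iterable('abc') == ('abc',)    # string treated as a scalar
assert ensure_iterable(b'ab') == (b'ab',)    # bytes treated as a scalar
```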
Python is a general-purpose language, with true scalars as well as iterables like lists.
MATLAB does not have true scalars; the base object is a 2d matrix. It did not start as a general-purpose language.
numpy adds MATLAB-like arrays to Python, but it too can have 0d arrays (scalar arrays), which may give wayward MATLAB users headaches.
Many numpy functions have a provision for converting their input to an array. That way they work with a list input as well as an array:
In [10]: x = np.array(3)
In [11]: x
Out[11]: array(3)
In [12]: x.shape
Out[12]: ()
In [13]: for i in x: print(x)
Traceback (most recent call last):
Input In [13] in <cell line: 1>
for i in x: print(x)
TypeError: iteration over a 0-d array
It also has utility functions that ensure the array is at least 1d, or 2d, etc.:
In [14]: x = np.atleast_1d(1)
In [15]: x
Out[15]: array([1])
In [16]: for i in x: print(i)
1
But like old-fashioned MATLAB, we prefer to avoid iteration in numpy. numpy doesn't have the JIT compilation that lets current MATLAB users get away with iteration. Technically numpy functions do use iteration, but it's usually in compiled code.
np.sin applied to various inputs:
In [17]: np.sin(1) # scalar
Out[17]: 0.8414709848078965
In [18]: np.sin([1,2,3]) # list
Out[18]: array([0.84147098, 0.90929743, 0.14112001])
In [19]: np.sin(np.array([1,2,3]).reshape(3,1))
Out[19]:
array([[0.84147098],
[0.90929743],
[0.14112001]])
Technically, the Out[17] result is a numpy scalar, not a base Python float:
In [20]: type(Out[17])
Out[20]: numpy.float64
I would duck type:
def first(item):
    try:
        it = iter(item)
    except TypeError:
        it = iter([item])
    return next(it)
Test it:
tests = [[1, 2, 3], 'abc', 1, 1.23]
for e in tests:
    print(e, first(e))
Prints:
[1, 2, 3] 1
abc a
1 1
1.23 1.23
In Python, I am trying to change the values of a np array inside a function:
def function(array):
    array = array + 1

array = np.zeros((10, 1))
function(array)
Since array is a function parameter, it is supposed to be a reference, and I should be able to modify its content inside the function.
array = array + 1 performs an element-wise operation that adds one to every element in the array, so it changes the values inside.
But the array actually does not change after the function call. I am guessing that the program thinks I am trying to change the reference itself, not the content of the array, because of the syntax of the element-wise operation. Is there any way to make it do the intended behavior? I don't want to loop through individual elements or make the function return the new array.
This line:
array = array + 1
… does perform an element-wise operation, but the operation it performs is creating a new array with each element incremented. Assigning that array back to the local variable array doesn't do anything useful, because that local variable is about to go away, and you haven't done anything to change the global variable of the same name.
On the other hand, this line:
array += 1
… performs the elementwise operation of incrementing all of the elements in-place, which is probably what you want here.
In Python, mutable collections are only allowed, not required, to handle the += statement this way; they could handle it the same way as array = array + 1 (as immutable types like str do). But built-in types like list, and most popular third-party types like np.ndarray, do what you want.
Another solution if you want to change the content of your array is to use this:
array[:] = array + 1
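A small sketch contrasting the three spellings (the helper names are made up for illustration):

```python
import numpy as np

def inc_reassign(arr):
    arr = arr + 1      # new array bound to the local name; caller unaffected

def inc_inplace(arr):
    arr += 1           # in-place: mutates the caller's array

def inc_slice(arr):
    arr[:] = arr + 1   # writes the new values into the existing buffer

a = np.zeros(3)
inc_reassign(a)
assert a.sum() == 0    # unchanged by the rebinding version

inc_inplace(a)
assert a.sum() == 3    # every element incremented in place

inc_slice(a)
assert a.sum() == 6    # incremented again, same underlying buffer
```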
If I want to get the dot product of two arrays, I can get a performance boost by specifying an array to store the output in instead of creating a new array (if I am performing this operation many times)
import numpy as np
a = np.array([[1.0,2.0],[3.0,4.0]])
b = np.array([[2.0,2.0],[2.0,2.0]])
out = np.empty([2,2])
np.dot(a,b, out = out)
Is there any way I can take advantage of this feature if I need to modify an array in place? For instance, if I want:
out = np.array([[3.0,3.0],[3.0,3.0]])
out *= np.dot(a,b)
Yes, you can use the out argument to modify an array (e.g. array = np.ones(10)) in place, e.g. np.multiply(array, 3, out=array).
You can even use in-place operator syntax, e.g. array *= 2.
To confirm that the array was updated in place, you can check the memory address array.ctypes.data before and after the modification.
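As a quick sketch with the arrays from the question (note that np.dot(a, b) itself still allocates a temporary for its result; only the multiply is in place):

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[2.0, 2.0], [2.0, 2.0]])

out = np.array([[3.0, 3.0], [3.0, 3.0]])
addr = out.ctypes.data

out *= np.dot(a, b)    # in-place multiply into out's existing buffer

assert out.ctypes.data == addr   # same buffer before and after
assert np.array_equal(out, [[18.0, 18.0], [42.0, 42.0]])

# To avoid the dot-product temporary as well when looping,
# reuse a preallocated scratch array:
tmp = np.empty((2, 2))
np.dot(a, b, out=tmp)
out *= tmp
```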
I would like to run the contraction algorithm on an array of vertices n^2 times so as to calculate the minimum cut of a graph. After the first for-loop iteration, the array is altered and the remaining iterations use the altered array, which is not what I want. How can I simulate pointers so as to have the original input array during each for-loop iteration?
def n_squared_runs(array):
    min_cut, length = 9999, len(array) ** 2
    for i in range(0, length):
        # perform operation on original input array
        array = contraction(array)
        if len(array) < min_cut:
            min_cut = len(array)
    return min_cut
The contraction() operation should create and return a new array, then, and not modify in place the array it receives as a parameter. You should also use a different variable name for the returned array; clearly, if you use array to name both the parameter and the local variable, the parameter will get overwritten inside the function.
This has nothing to do with pointers, but with the contracts of the functions in use. If the original array must be preserved, then the helper functions need to make sure that this restriction is enforced. Notice that in Python if you do this:
array = [1, 2, 3]
f(array)
The array received by the f function is the same one that was declared "outside" of it - in fact, all that f receives is a reference to the array, not a copy of it - so naturally any modifications you make to the array inside f will be reflected outside. Also, it's worth pointing out that all parameters in Python are passed by value, and there's no such thing as pointers or passing by reference in the language.
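A tiny sketch of that sharing, using a plain list and a made-up mutator f:

```python
def f(lst):
    lst.append(99)      # mutates the object the caller passed in

array = [1, 2, 3]
f(array)
assert array == [1, 2, 3, 99]   # the caller sees the mutation

# To preserve the original, pass a copy instead:
original = [1, 2, 3]
f(list(original))               # f mutates the copy only
assert original == [1, 2, 3]    # untouched
```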
Don't overwrite the original array.
def n_squared_runs(array):
    min_cut, length = 9999, len(array) ** 2
    for i in range(0, length):
        # perform operation on original input array
        new_array = contraction(array)
        if len(new_array) < min_cut:
            min_cut = len(new_array)
    return min_cut