Redefine global JIT function and maintain performance - python

I'm in the process of developing a PDE package (similar to FEniCS) and am trying to come up with the best way for a user to define their functions. The current method I am using works fine; however, as the package grows, the run-time compilation of the functions will take much longer.
The current method takes in a dictionary of function specs and redefines and recompiles the global default. As the package gets larger, it would be ideal if I could have everything compiled ahead of time. The problem is that for numba's pycc AOT compilation, you can't pass in a function, because there is no recognizable signature type (that I could find in the documentation). I have also tried using Cython, but the function stays as a weak reference to a PyObject, which ruins the performance.
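For reference, pycc exports are declared with concrete signature strings, roughly as in the sketch below (the module and function names here are placeholders, not part of my package), which is why there is no way to describe "an argument that is itself a jitted function":
from numba.pycc import CC

# Minimal pycc sketch; 'precompiled_rhs' and 'rhs' are placeholder names.
cc = CC('precompiled_rhs')

@cc.export('rhs', 'f8[:](f8[:], f8)')   # only concrete scalar/array types can appear here
def rhs(z, t):
    return -z

if __name__ == '__main__':
    cc.compile()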
The code in the module where the function exists looks like
import linecache
import numpy as np
from numba import njit

@njit
def _user_defined_function(z, t):
    return -z

def define_vector_func(func_specs, **numba_kwargs):
    # func_specs is a dict, numba_kwargs are for the jit decorator
    # create a string to be compiled into a new function
    exprs = []
    for var in func_specs:
        s = func_specs[var]
        for i, (variable, expression) in enumerate(func_specs.items()):
            s = s.replace(variable, f'z[{i}]')
        exprs.append(s)
    fp = '<ipython-cache-safe>'  # needed to allow for caching
    src = ', '.join(exprs)
    src = f'def local_func(z, t): return np.array([{src}])\n'
    lines = [src]
    linecache.cache[fp] = (len(src), None, lines, fp)
    code = compile(src, fp, 'exec')
    exec(code)
    # redefine the global function and return a reference if the user wants to use it
    local_func_ptr = vars()['local_func']
    local_func_ptr(np.ones(len(func_specs)), 0.0)
    global _user_defined_function
    _user_defined_function = njit(**numba_kwargs)(local_func_ptr)
    _rec()  # recompile other functions to recognize the new function
    return _user_defined_function

# rest of module
# ...
To use it from a separate module, you would do something like
from module_above import define_vector_func, other_funcs_from_module

if __name__ == '__main__':
    func_spec = {'v': 'v - (v * v * v) / 3.0 - w + 0.08',
                 'w': '0.08 * (v - 0.8 * w + 0.7)'}
    f = define_vector_func(func_spec, fastmath=True, cache=True)
    # do what user wants
What would be the best route for compiling the module ahead of time while maintaining performance?


Cython Optimization of Numpy for Loop

I am new to cython and have the following code for a numpy for loop that I am trying to optimize. So far, this Cython code isn't much faster than the numpy for loop.
# cython: infer_types = True
import numpy as np
cimport numpy

DTYPE = np.double

def hdcfTransfomation(scanData):
    cdef Py_ssize_t position
    scanLength = scanData.shape[0]
    hdcfFunction_np = np.zeros(scanLength, dtype=DTYPE)
    cdef double [::1] hdcfFunction = hdcfFunction_np
    for position in range(scanLength - 1):
        topShift = scanData[1 + position:]
        bottomShift = scanData[:-(position + 1)]
        arrayDiff = np.subtract(topShift, bottomShift)
        arraySquared = np.square(arrayDiff)
        arrayMean = np.mean(arraySquared, axis=0)
        hdcfFunction[position] = arrayMean
    return hdcfFunction
I know that using C math library functions would be more ideal than calling back into numpy (subtract, square, mean), but I am not sure where I can find a list of functions that can be called in this manner.
I have been trying to figure out ways to optimize this code by using different types, etc., but nothing is providing the performance that I think is possible with a fully optimized implementation of Cython.
Here is a working example of the numpy for-loop:
def hdcfTransfomation(scanData):
    scanLength = scanData.shape[0]
    hdcfFunction = np.zeros(scanLength)
    for position in range(scanLength - 1):
        topShift = scanData[1 + position:]
        bottomShift = scanData[:-(position + 1)]
        arrayDiff = np.subtract(topShift, bottomShift)
        arraySquared = np.square(arrayDiff)
        arrayMean = np.mean(arraySquared, axis=0)
        hdcfFunction[position] = arrayMean
    return hdcfFunction

scanDataArray = np.random.rand(80000, 1)
transformedScan = hdcfTransfomation(scanDataArray)
Always provide as much information as possible (some example data, Python/Cython version, compiler version/settings and CPU model).
Without that it is quite hard to compare any timings. For example, this problem benefits quite a bit from SIMD vectorization. It will make quite a difference which compiler you use, or whether you want to redistribute a compiled version that should also run on low-end or quite old CPUs (e.g. no AVX).
I am not very familiar with Cython, but I think your main problem is the missing declaration for scanData. Maybe the C compiler needs additional flags like march=native, but the real syntax is compiler dependent. I am also not sure how Cython or the C compiler optimizes this part:
arrayDiff = np.subtract(topShift, bottomShift)
arraySquared = np.square(arrayDiff)
arrayMean = np.mean(arraySquared, axis = 0)
If those loops (all vectorized commands are actually loops) are not fused, and temporary arrays are created as in pure Python, this will slow down the code. It would also be a good idea to create a 1D array first (e.g. scanData = scanData.reshape(-1)).
As said, I am not that familiar with Cython, so I tried what is possible with Numba. At least it shows what should also be possible with a reasonably good Cython implementation.
Maybe easier to optimize for the compiler:
import numba as nb
import numpy as np

@nb.njit(fastmath=True, error_model='numpy', parallel=True)
# scanData is a 1D array here
def hdcfTransfomation(scanData):
    scanLength = scanData.shape[0]
    hdcfFunction = np.zeros(scanLength, dtype=scanData.dtype)
    for position in nb.prange(scanLength - 1):
        topShift = scanData[1 + position:]
        bottomShift = scanData[:scanData.shape[0] - (position + 1)]
        sum = 0.
        jj = 0
        for i in range(scanLength - (position + 1)):
            jj += 1
            sum += (topShift[i] - bottomShift[i]) ** 2
        hdcfFunction[position] = sum / jj
    return hdcfFunction
I also used parallelization here, because the problem is embarrassingly parallel. At least with a size of 80_000 and Numba it doesn't matter if you use a slightly modified version of your code (1D-array), or the code above.
Timings
# Quadcore Core i7 4th gen, Numba 0.4dev, Python 3.6
scanData = np.random.rand(80_000)
# The first call to the function isn't measured (compilation overhead), but the following calls are.
Pure Python: 5900ms
Numba single-threaded: 947ms
Numba parallel: 260ms
Especially for arrays larger than np.random.rand(80_000) there may be better approaches (loop tiling for better cache usage), but for this size it should be more or less OK (at least it fits in the L3 cache).
Naive GPU Implementation
from numba import cuda, float32

@cuda.jit('void(float32[:], float32[:])')
def hdcfTransfomation_gpu(scanData, out_data):
    scanLength = scanData.shape[0]
    position = cuda.grid(1)
    if position < scanLength - 1:
        sum = float32(0.)
        offset = 1 + position
        for i in range(scanLength - offset):
            sum += (scanData[i + offset] - scanData[i]) ** 2
        out_data[position] = sum / (scanLength - offset)

res_3 = np.zeros_like(scanData)  # output buffer for the kernel results
hdcfTransfomation_gpu[scanData.shape[0] // 64, 64](scanData, res_3)
This gives about 400 ms on a GT640 (float32) and 970 ms (float64). For a good implementation, shared arrays should be considered.
Putting Cython aside, does this do the same thing as your current code, but without a for loop? We can tighten it up and correct for inaccuracies, but the first port of call is to try applying operations in numpy to 2D arrays before turning to Cython for for-loops. It's too long to put in a comment.
import numpy as np
# Setup
arr = np.random.choice(np.arange(10), 100).reshape(10, 10)
top_shift = arr[:, :-1]
bottom_shift = arr[:, 1:]
arr_diff = top_shift - bottom_shift
arr_squared = np.square(arr_diff)
arr_mean = arr_squared.mean(axis=1)

Python gsl_vector_set

I'm trying to initialize two vectors in memory using gsl_vector_set(). In the main code they are initialized to zero by default, but I wanted to initialize them to some non-zero value. I made a test code based on a working function that uses the gsl_vector_set() function.
from ctypes import *;
gsl = cdll.LoadLibrary('libgsl-0.dll');

gsl.gsl_vector_get.restype = c_double;
gsl.gsl_matrix_get.restype = c_double;
gsl.gsl_vector_set.restype = c_double;

foo = dict(
    x_ht = [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
            0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],
    x_ht_m = [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
              0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]
);

for f in range(0,18):
    gsl.gsl_vector_set(foo['x_ht_m'],f,c_double(1.0));
    gsl.gsl_vector_set(foo['x_ht'],f,c_double(1.0));
When I run the code I get this error.
ArgumentError: argument 1: <type 'exceptions.TypeError'>: Don't know how to convert parameter 1
I'm new to using ctypes and the GSL functions, so I'm not sure what the issue is or what the error message means. I am also not sure whether there is a better way to save a vector to memory.
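For context on the error: ctypes cannot convert a plain Python list into the gsl_vector* that gsl_vector_set expects as its first argument, which is what the ArgumentError is saying. A minimal sketch of the missing allocation step, mirroring the fix shown in the answer below and continuing from the snippet above:
# Allocate a GSL vector first; gsl_vector_calloc returns a gsl_vector* handle
# that can then be passed to gsl_vector_set/get.
vec = gsl.gsl_vector_calloc(c_size_t(18))
for f in range(0, 18):
    gsl.gsl_vector_set(vec, f, c_double(1.0))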
Thank you @CristiFati for pointing out that I needed gsl_vector_calloc in my test code. I also noticed that in the main code I was working on, the vector I needed to set was
NAV.KF_dictnry['x_hat_m']
instead of
NAV.KF_dictnry['x_ht_m']
So I fixed the test code to mirror the real code a bit better by creating a class holding the dictionary, and added the ability to change each value in the vector to an arbitrary value.
from ctypes import *;
gsl = cdll.LoadLibrary('libgsl-0.dll');

gsl.gsl_vector_get.restype = c_double;
gsl.gsl_matrix_get.restype = c_double;
gsl.gsl_vector_set.restype = c_double;

class foo(object):
    fu = dict(
        x_hat = gsl.gsl_vector_calloc(c_size_t(18)),
        x_hat_m = gsl.gsl_vector_calloc(c_size_t(18)),
    );

x_ht = [1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
        1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]
x_ht_m = [1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
          1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]

for f in range(0,18):
    gsl.gsl_vector_set(foo.fu['x_hat_m'],f,c_double(x_ht_m[f]));
    gsl.gsl_vector_set(foo.fu['x_hat'],f,c_double(x_ht[f]));
After running I checked with:
gsl.gsl_vector_get(foo.fu['x_hat_m'],0)
and got out a 1.0 (worked for the entire vector).
Turned out to just be some stupid mistakes on my end.
Thanks again!

Changing some part of a function in a module

In PSP_soil.py:
def evaporation_flux(psi):
    h_s = exp(mw*psi/(R*T))
    return(E_p*(h_s-h_a)/(1-h_a))
I want to change this function to:
def evaporation_flux(psi):
    h_s = exp(mw*psi/(R*T))
    return(h_s)
but the console in Spyder (Python 2.7) does not run the program (E_p and h_a are constant variables) and just shows: UMD has deleted: PSP_readDataFile, PSP_grid, PSP_ThomasAlgorithm, PSP_soil
Any advice in this case?
You can do this:
from PSP_soil import *

def evaporation_flux(psi):
    h_s = exp(mw*psi/(R*T))
    return(h_s)
This redefines evaporation_flux from PSP_soil, so when you call evaporation_flux(value), your version gets called.
from PSP_soil import * imports all the constants you need for this function, but you can also do from PSP_soil import evaporation_flux, mw, R, T
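Note that if other code inside PSP_soil calls evaporation_flux, redefining it in your own script will not affect those internal calls. One alternative (a sketch, not part of the original answer; it assumes mw, R and T are module-level names in PSP_soil) is to patch the function on the module object itself:
import PSP_soil
from math import exp

def my_evaporation_flux(psi):
    # Same formula, but return the relative humidity h_s directly.
    h_s = exp(PSP_soil.mw * psi / (PSP_soil.R * PSP_soil.T))
    return h_s

# Everything that looks up PSP_soil.evaporation_flux from now on gets the new version.
PSP_soil.evaporation_flux = my_evaporation_flux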

Passing string to Fortran DLL using ctypes and Python

I am trying to load a DLL in Python 2.7 using ctypes. The DLL was written in Fortran and has multiple subroutines in it. I was able to successfully set up a couple of the exported functions that take long and double pointers as arguments.
import ctypes as C
import numpy as np

dll = C.windll.LoadLibrary('C:\\Temp\\program.dll')

_cp_from_t = getattr(dll, "CP_FROM_T")
_cp_from_t.restype = C.c_double
_cp_from_t.argtypes = [C.POINTER(C.c_longdouble),
                       np.ctypeslib.ndpointer(C.c_longdouble)]

# Mixture Rgas function
_mix_r = getattr(dll, "MIX_R")
_mix_r.restype = C.c_double
_mix_r.argtypes = [np.ctypeslib.ndpointer(dtype=C.c_longdouble)]

def cp_from_t(composition, temp):
    """ Calculates Cp in BTU/lb/R given a fuel composition and temperature.

    :param composition: numpy array containing fuel composition
    :param temp: temperature of fuel
    :return: Cp
    :rtype : float
    """
    return _cp_from_t(C.byref(C.c_double(temp)), composition)

def mix_r(composition):
    """Return the gas constant for a given composition.

    :rtype : float
    :param composition: numpy array containing fuel composition
    """
    return _mix_r(composition)

# At this point, I can just pass a numpy array as the composition and I can get the
# calculated values without a problem
comps = np.array([0, 0, 12.0, 23.0, 33.0, 10, 5.0])
temp = 900.0

cp = cp_from_t(comps, temp)
rgas = mix_r(comps)
So far, so good.
The problem arises when I try another subroutine called Function2, which needs some strings as input. The strings are all fixed length (255), and the subroutine also asks for the length of each of the string parameters.
The function is implemented in Fortran as follows:
Subroutine FUNCTION2(localBasePath,localTempPath,InputFileName,Model,DataArray,ErrCode)
!DEC$ ATTRIBUTES STDCALL,REFERENCE, ALIAS:'FUNCTION2',DLLEXPORT :: FUNCTION2
Implicit None
Character *255 localBasePath,localTempPath,InputFileName
Integer *4 Model(20), ErrCode(20)
Real *8 DataArray(900)
The function prototype in Python is set up as follows:
function2 = getattr(dll, 'FUNCTION2')
function2.argtypes = [C.POINTER(C.c_char_p), C.c_long,
                      C.POINTER(C.c_char_p), C.c_long,
                      C.POINTER(C.c_char_p), C.c_long,
                      np.ctypeslib.ndpointer(C.c_long, flags='F_CONTIGUOUS'),
                      np.ctypeslib.ndpointer(C.c_double, flags='F_CONTIGUOUS'),
                      np.ctypeslib.ndpointer(C.c_long, flags='F_CONTIGUOUS')]
And I call it using:
base_path = "D:\\Users\\xxxxxxx\\Documents\\xxxxx\\".ljust(255)
temp_path = "D:\\Users\\xxxxxxx\\Documents\\xxxxx\\temp".ljust(255)
inp_file = "inp.txt".ljust(255)
function2(C.byref(C.c_char_p(base_path)),
C.c_long(len(base_path)),
C.byref(C.c_char_p(temp_dir)),
C.c_long(len(temp_dir))),
C.byref(C.c_char_p(inp_file)),
C.c_long(len(inp_file)),
model_array,
data_array,
error_array)
The strings are essentially paths. The function Function2 does not recognize the paths and produces an error message with some non-readable characters at the end, such as:
forrtl: severe (43): file name specification error, unit 16, D:\Users\xxxxxxx\Documents\xxxxx\ωa.
What I wanted the function to receive was D:\Users\xxxxxxx\Documents\xxxxx\. Obviously, the strings are not passed correctly.
I have read that Python uses NULL terminated strings. Can that be a problem while passing strings to a Fortran dll? If so, how do I get around it?
Any recommendations?
Following a comment from @eryksun, I made the following changes to make it work.
Changed the argtypes to:
function2 = getattr(dll, 'FUNCTION2')
function2.argtypes = [C.c_char_p, C.c_long,
                      C.c_char_p, C.c_long,
                      C.c_char_p, C.c_long,
                      np.ctypeslib.ndpointer(C.c_long, flags='F_CONTIGUOUS'),
                      np.ctypeslib.ndpointer(C.c_double, flags='F_CONTIGUOUS'),
                      np.ctypeslib.ndpointer(C.c_long, flags='F_CONTIGUOUS')]
And instead of passing the string as byref, I changed it to the following.
base_path = "D:\\Users\\xxxxxxx\\Documents\\xxxxx\\".ljust(255)
temp_path = "D:\\Users\\xxxxxxx\\Documents\\xxxxx\\temp".ljust(255)
inp_file = "inp.txt".ljust(255)
function2(base_path, len(base_path), temp_dir, len(temp_dir), inp_file, len(inp_file),
model_array, data_array, error_array)
It was sufficient to pass the values directly.
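One caveat worth noting (an addition beyond the original Python 2.7 setup): under Python 3, C.c_char_p maps to bytes rather than str, so the padded paths would need to be encoded before the call, roughly like this:
# Hypothetical Python 3 variant: c_char_p expects bytes, not str.
base_path = "D:\\Users\\xxxxxxx\\Documents\\xxxxx\\".ljust(255).encode('ascii')
temp_path = "D:\\Users\\xxxxxxx\\Documents\\xxxxx\\temp".ljust(255).encode('ascii')
inp_file = "inp.txt".ljust(255).encode('ascii')

function2(base_path, len(base_path), temp_path, len(temp_path), inp_file, len(inp_file),
          model_array, data_array, error_array)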

Shared-memory objects in multiprocessing

Suppose I have a large in-memory numpy array, and I have a function func that takes this giant array as input (together with some other parameters). func can be run in parallel with different parameters. For example:
from multiprocessing import Pool

def func(arr, param):
    # do stuff to arr, param
    ...

# build array arr
pool = Pool(processes=6)
results = [pool.apply_async(func, [arr, param]) for param in all_params]
output = [res.get() for res in results]
If I use the multiprocessing library, that giant array will be copied multiple times into different processes.
Is there a way to let different processes share the same array? This array object is read-only and will never be modified.
What's more complicated, if arr is not an array, but an arbitrary python object, is there a way to share it?
[EDITED]
I read the answer, but I am still a bit confused. Since fork() is copy-on-write, we should not incur any additional cost when spawning new processes with the multiprocessing library. But the following code suggests there is a huge overhead:
from multiprocessing import Pool, Manager
import numpy as np
import time

def f(arr):
    return len(arr)

t = time.time()
arr = np.arange(10000000)
print "construct array = ", time.time() - t

pool = Pool(processes = 6)

t = time.time()
res = pool.apply_async(f, [arr,])
res.get()
print "multiprocessing overhead = ", time.time() - t
output (and by the way, the cost increases as the size of the array increases, so I suspect there is still overhead related to memory copying):
construct array = 0.0178790092468
multiprocessing overhead = 0.252444982529
Why is there such a huge overhead if we didn't copy the array? And what does the shared memory actually save me?
If you use an operating system that uses copy-on-write fork() semantics (like any common unix), then as long as you never alter your data structure it will be available to all child processes without taking up additional memory. You will not have to do anything special (except make absolutely sure you don't alter the object).
The most efficient thing you can do for your problem would be to pack your array into an efficient array structure (using numpy or array), place that in shared memory, wrap it with multiprocessing.Array, and pass that to your functions. This answer shows how to do that.
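A minimal sketch of that shared-memory route (the worker and initializer names here are mine, not taken from the linked answer): allocate the buffer with multiprocessing.Array, hand it to the pool workers through an initializer, and re-wrap it as a numpy view inside each process.
import ctypes
import multiprocessing as mp
import numpy as np

def init_worker(shared_base_):
    # Make the shared buffer visible to the worker as a module-level name.
    global shared_base
    shared_base = shared_base_

def work(idx):
    # Re-wrap the shared buffer as a numpy view; no data is copied here.
    arr = np.frombuffer(shared_base.get_obj(), dtype=np.float64)
    return float(arr[idx])

if __name__ == '__main__':
    shared_base = mp.Array(ctypes.c_double, 1000)   # synchronized shared memory
    np.frombuffer(shared_base.get_obj(), dtype=np.float64)[:] = np.random.rand(1000)
    pool = mp.Pool(4, initializer=init_worker, initargs=(shared_base,))
    print(pool.map(work, range(10)))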
If you want a writeable shared object, then you will need to wrap it with some kind of synchronization or locking. multiprocessing provides two methods of doing this: one using shared memory (suitable for simple values, arrays, or ctypes) or a Manager proxy, where one process holds the memory and a manager arbitrates access to it from other processes (even over a network).
The Manager approach can be used with arbitrary Python objects, but will be slower than the equivalent using shared memory because the objects need to be serialized/deserialized and sent between processes.
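For the Manager route, a small sketch (again with my own placeholder names) looks like the following; every element access goes through a proxy to the manager process, which is what makes it slower but able to hold arbitrary objects.
from multiprocessing import Manager, Pool

def work(args):
    shared, idx = args
    # Each lookup is a round trip to the manager process via the proxy.
    return shared[idx] * 2

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.list(range(100))   # proxy to a list held by the manager
        with Pool(4) as pool:
            print(pool.map(work, [(shared, i) for i in range(10)]))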
There are a wealth of parallel processing libraries and approaches available in Python. multiprocessing is an excellent and well rounded library, but if you have special needs perhaps one of the other approaches may be better.
I ran into the same problem and wrote a little shared-memory utility class to work around it.
I'm using multiprocessing.RawArray (lock-free), and access to the arrays is not synchronized at all (lock-free), so be careful not to shoot yourself in the foot.
With the solution I get speedups by a factor of approx 3 on a quad-core i7.
Here's the code:
Feel free to use and improve it, and please report back any bugs.
'''
Created on 14.05.2013

@author: martin
'''

import multiprocessing
import ctypes
import numpy as np

class SharedNumpyMemManagerError(Exception):
    pass

'''
Singleton Pattern
'''
class SharedNumpyMemManager:

    _initSize = 1024
    _instance = None

    def __new__(cls, *args, **kwargs):
        if not cls._instance:
            cls._instance = super(SharedNumpyMemManager, cls).__new__(
                cls, *args, **kwargs)
        return cls._instance

    def __init__(self):
        self.lock = multiprocessing.Lock()
        self.cur = 0
        self.cnt = 0
        self.shared_arrays = [None] * SharedNumpyMemManager._initSize

    def __createArray(self, dimensions, ctype=ctypes.c_double):
        self.lock.acquire()
        # double size if necessary
        if (self.cnt >= len(self.shared_arrays)):
            self.shared_arrays = self.shared_arrays + [None] * len(self.shared_arrays)
        # next handle
        self.__getNextFreeHdl()
        # create array in shared memory segment
        shared_array_base = multiprocessing.RawArray(ctype, np.prod(dimensions))
        # convert to numpy array via ctypeslib
        self.shared_arrays[self.cur] = np.ctypeslib.as_array(shared_array_base)
        # do a reshape for correct dimensions
        # Returns an array containing the same data, but with a new shape.
        # The result is a view on the original array
        self.shared_arrays[self.cur] = self.shared_arrays[self.cur].reshape(dimensions)
        # update cnt
        self.cnt += 1
        self.lock.release()
        # return handle to the shared memory numpy array
        return self.cur

    def __getNextFreeHdl(self):
        orgCur = self.cur
        while self.shared_arrays[self.cur] is not None:
            self.cur = (self.cur + 1) % len(self.shared_arrays)
            if orgCur == self.cur:
                raise SharedNumpyMemManagerError('Max Number of Shared Numpy Arrays Exceeded!')

    def __freeArray(self, hdl):
        self.lock.acquire()
        # set reference to None
        if self.shared_arrays[hdl] is not None:  # consider multiple calls to free
            self.shared_arrays[hdl] = None
            self.cnt -= 1
        self.lock.release()

    def __getArray(self, i):
        return self.shared_arrays[i]

    @staticmethod
    def getInstance():
        if not SharedNumpyMemManager._instance:
            SharedNumpyMemManager._instance = SharedNumpyMemManager()
        return SharedNumpyMemManager._instance

    @staticmethod
    def createArray(*args, **kwargs):
        return SharedNumpyMemManager.getInstance().__createArray(*args, **kwargs)

    @staticmethod
    def getArray(*args, **kwargs):
        return SharedNumpyMemManager.getInstance().__getArray(*args, **kwargs)

    @staticmethod
    def freeArray(*args, **kwargs):
        return SharedNumpyMemManager.getInstance().__freeArray(*args, **kwargs)

# Init Singleton on module load
SharedNumpyMemManager.getInstance()

if __name__ == '__main__':

    import timeit

    N_PROC = 8
    INNER_LOOP = 10000
    N = 1000

    def propagate(t):
        i, shm_hdl, evidence = t
        a = SharedNumpyMemManager.getArray(shm_hdl)
        for j in range(INNER_LOOP):
            a[i] = i

    class Parallel_Dummy_PF:

        def __init__(self, N):
            self.N = N
            self.arrayHdl = SharedNumpyMemManager.createArray(self.N, ctype=ctypes.c_double)
            self.pool = multiprocessing.Pool(processes=N_PROC)

        def update_par(self, evidence):
            self.pool.map(propagate, zip(range(self.N), [self.arrayHdl] * self.N, [evidence] * self.N))

        def update_seq(self, evidence):
            for i in range(self.N):
                propagate((i, self.arrayHdl, evidence))

        def getArray(self):
            return SharedNumpyMemManager.getArray(self.arrayHdl)

    def parallelExec():
        pf = Parallel_Dummy_PF(N)
        print(pf.getArray())
        pf.update_par(5)
        print(pf.getArray())

    def sequentialExec():
        pf = Parallel_Dummy_PF(N)
        print(pf.getArray())
        pf.update_seq(5)
        print(pf.getArray())

    t1 = timeit.Timer("sequentialExec()", "from __main__ import sequentialExec")
    t2 = timeit.Timer("parallelExec()", "from __main__ import parallelExec")

    print("Sequential: ", t1.timeit(number=1))
    print("Parallel: ", t2.timeit(number=1))
This is the intended use case for Ray, which is a library for parallel and distributed Python. Under the hood, it serializes objects using the Apache Arrow data layout (which is a zero-copy format) and stores them in a shared-memory object store so they can be accessed by multiple processes without creating copies.
The code would look like the following.
import numpy as np
import ray

ray.init()

@ray.remote
def func(array, param):
    # Do stuff.
    return 1

array = np.ones(10**6)

# Store the array in the shared memory object store once
# so it is not copied multiple times.
array_id = ray.put(array)

result_ids = [func.remote(array_id, i) for i in range(4)]
output = ray.get(result_ids)
If you don't call ray.put then the array will still be stored in shared memory, but that will be done once per invocation of func, which is not what you want.
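As a small illustration of that point (a sketch reusing the names from the snippet above), the two call patterns differ only in what is handed to func.remote:
# Each call ships `array` to the object store again (one copy per task):
slow_ids = [func.remote(array, i) for i in range(4)]

# Put it once, then pass only the object ID around (no further copies):
array_id = ray.put(array)
fast_ids = [func.remote(array_id, i) for i in range(4)]

output = ray.get(fast_ids)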
Note that this will work not only for arrays but also for objects that contain arrays, e.g., dictionaries mapping ints to arrays as below.
You can compare the performance of serialization in Ray versus pickle by running the following in IPython.
import numpy as np
import pickle
import ray
ray.init()
x = {i: np.ones(10**7) for i in range(20)}
# Time Ray.
%time x_id = ray.put(x) # 2.4s
%time new_x = ray.get(x_id) # 0.00073s
# Time pickle.
%time serialized = pickle.dumps(x) # 2.6s
%time deserialized = pickle.loads(serialized) # 1.9s
Serialization with Ray is only slightly faster than pickle, but deserialization is 1000x faster because of the use of shared memory (this number will of course depend on the object).
See the Ray documentation. You can read more about fast serialization using Ray and Arrow. Note I'm one of the Ray developers.
Like Robert Nishihara mentioned, Apache Arrow makes this easy, specifically with the Plasma in-memory object store, which is what Ray is built on.
I made brain-plasma specifically for this reason - fast loading and reloading of big objects in a Flask app. It's a shared-memory object namespace for Apache Arrow-serializable objects, including pickle'd bytestrings generated by pickle.dumps(...).
The key difference from Apache Ray and Plasma is that brain-plasma keeps track of object IDs for you. Any processes, threads, or programs that are running locally can share the variables' values by referencing the name through any Brain object.
$ pip install brain-plasma
$ plasma_store -m 10000000 -s /tmp/plasma
from brain_plasma import Brain
brain = Brain(path='/tmp/plasma/')
brain['a'] = [1]*10000
brain['a']
# >>> [1,1,1,1,...]
