I've converted a function to use threads (as per this answer). It behaves as expected in tests (that is, it returns identical values to the non-threaded version). However, calling it from Python using ctypes causes the calling process to crash.
First, the working function:
#[no_mangle]
pub extern fn convert_vec(lon: Array, lat: Array) -> Array {
// snip
// orig is a Vec<(f32, f32)>
// convert is a conversion function
let result: Vec<(i32, i32)> = orig.iter()
.map(|elem| convert(elem.0, elem.1))
.collect();
// convert back to vector of unsigned integer Tuples
let nvec = result.iter()
.map(|ints| Tuple { a: ints.0 as u32, b: ints.1 as u32 })
.collect();
Array::from_vec(nvec)
}
And now the threaded version, which passes tests (using cargo test) but crashes when called from Python:
#[no_mangle]
pub extern fn convert_vec_threaded(lon: Array, lat: Array) -> Array {
// snip
// orig is a Vec<(f32, f32)>
// convert is a conversion function
let mut guards: Vec<JoinHandle<Vec<(i32, i32)>>> = vec!();
// split into slices
for chunk in orig.chunks(orig.len() / NUMTHREADS as usize) {
let chunk = chunk.to_owned();
let g = thread::spawn(move || chunk
.into_iter()
.map(|elem| convert(elem.0, elem.1))
.collect());
guards.push(g);
}
let mut result: Vec<(i32, i32)> = Vec::with_capacity(orig.len());
for g in guards {
result.extend(g.join().unwrap().into_iter());
}
// convert back to vector of unsigned integer Tuples
let nvec = result.iter()
.map(|ints| Tuple { a: ints.0 as u32, b: ints.1 as u32 })
.collect();
Array::from_vec(nvec)
}
The complete testable example is available here
From the error message, it looks like you used a chunk size of 0 for some inputs: [T]::chunks(size) panics if size is 0, and orig.len() / NUMTHREADS evaluates to 0 whenever orig has fewer than NUMTHREADS elements.
If we want at most NUMTHREADS chunks, we could split it like this:
// Divide into at most NUMTHREADS chunks (ceiling division)
let n = NUMTHREADS as usize;
let mut size = orig.len() / n;
if orig.len() % n > 0 { size += 1; }
// If we want to avoid the case where orig.len() == 0, we need another adjustment:
size = std::cmp::max(1, size);
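The ceiling-division arithmetic is easy to sanity-check outside Rust. A minimal Python sketch, assuming Rust's chunks(size) yields consecutive slices of size elements with a shorter final slice; chunk_size and chunk_bounds are hypothetical helper names:

```python
def chunk_size(n_items: int, n_threads: int) -> int:
    """Ceiling division, clamped to at least 1 (chunks(0) would panic)."""
    size = n_items // n_threads
    if n_items % n_threads > 0:
        size += 1
    return max(1, size)


def chunk_bounds(n_items: int, n_threads: int):
    """(start, end) pairs that mimic Rust's slice.chunks(size)."""
    size = chunk_size(n_items, n_threads)
    return [(start, min(start + size, n_items))
            for start in range(0, n_items, size)]


if __name__ == "__main__":
    for n in (0, 3, 10, 11):
        print(n, chunk_bounds(n, 4))
```

For any input length this gives at most NUMTHREADS non-empty chunks and never a size of 0, which is exactly what the two adjustments in the Rust snippet are for.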
I have written a short function to convert an input decimal number to a binary output. However, at a much higher level of the code, the end user should be able to toggle whether they want a 5-bit or a 10-bit value. Because of some other low-level maths, I have to clip the data here.
So I need some help figuring out how to clip the output to a desired length and pad it with the required number of leading zeros.
The incomplete C code:
long dec2bin(int x_dec,int res)
{
long x_bin = 0;
int x_bin_len;
int x_rem, i = 1;
while (x_dec != 0)
{
x_rem = x_dec % 2;
x_dec /= 2;
x_bin += x_rem * i;
i *= 10;
}
return x_bin;
}
I had completed a working proof of concept using python. The end application however, requires I write this in C.
The working python script:
def dec2bin(x_dec,x_res):
x_bin = bin(x_dec)[2:] #Convert to Binary (Remove 0B Prefix)
x_len = len(x_bin)
if x_len < x_res: #If Smaller than desired resolution
x_bin = '0' * (x_res-x_len) + x_bin #Stuff with leading 0s
if x_len > x_res: #If larger than desired resolution
x_bin = x_bin[x_len-x_res:x_len] #Display desired no. LSBs
return x_bin
I'm sure this has been done before. Indeed, my Python script shows it should be relatively straightforward, but I'm not as experienced with C.
Any help is greatly appreciated.
Mark.
As @yano suggested, I think you have to return an ASCII string to the caller rather than a long. Below is a short function I wrote for my own purposes; it works for any base up to 65 and keeps only the ndigits least significant digits, padding with leading '0's.
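#include <string.h>  /* memset */

/* Write the ndigits least significant base-`base` digits of i into a
   static buffer, left-padded with '0'; higher digits are dropped.
   Assumes i >= 0, 2 <= base <= 65 and ndigits < 999. The buffer is
   static, so each call overwrites the previous result. */
char *itoa ( int i, int base, int ndigits ) {
  static char a[999], digits[99] = /* up to base 65 */
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz#$*";
  int n=ndigits;
  memset(a,'0',ndigits); a[ndigits]='\000';   /* pre-fill with '0's, terminate */
  while ( --n >= 0) {                          /* fill from the rightmost digit */
    a[n] = digits[i%base];
    if ( (i/=base) < 1 ) break; }
  return ( a );
} /* --- end-of-function itoa() --- */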
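Clipping to the lowest res bits is just a bit mask, and padding is just a fixed field width, so the C port can reduce to a mask plus a fixed-width fill like the itoa above. A minimal Python sketch for comparing outputs against the question's script, assuming non-negative inputs; dec2bin_fixed is a hypothetical name:

```python
def dec2bin_fixed(x_dec: int, x_res: int) -> str:
    """Return exactly x_res binary digits of a non-negative x_dec.

    Masking with (1 << x_res) - 1 keeps the x_res least significant bits
    (the same clipping the question's slice performs); the format width
    then pads with leading zeros.
    """
    if x_dec < 0 or x_res < 1:
        raise ValueError("expects x_dec >= 0 and x_res >= 1")
    return format(x_dec & ((1 << x_res) - 1), "0{}b".format(x_res))


if __name__ == "__main__":
    print(dec2bin_fixed(5, 5), dec2bin_fixed(37, 5), dec2bin_fixed(1023, 5))
```

In C the same mask is x & ((1 << res) - 1), after which a fixed-width digit fill such as itoa(masked, 2, res) produces the string.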
I've written a program that will check if a given string has all characters unique or not. I usually write in Python, but I'm learning C++ and I wanted to write the program using it. I get an error when I translate Python into C++: Thread 1: EXC_BAD_ACCESS (code=257, address=0x100000001)
I am using Xcode. When I run this program, I get the above error:
#include <iostream>
using namespace std;
int isUnique(string str) {
int arr[] = {};
for (int i = 0; i < str.length(); ++i) {
arr[i] = 0;
}
for (int j = 0; j < str.length(); ++j) {
arr[j] += 1;
}
for (int k = 0; k < sizeof(arr)/sizeof(arr[0]); ++k) {
if (arr[k] > 1) {
return false;
}
}
return true;
}
int main() {
string str;
cout << "Enter a string: ";
getline(cin, str);
cout << isUnique(str) << endl;
}
Here is the original code I wrote in Python:
def is_unique(string):
chars = []
for i in range(len(string)):
chars.append(0)
chars[string.find(string[i])] += 1 # I am using find and not just i because I want the first occurrence of the substring in the string to update it to 2 if it happens twice, 3 if it is thrice, etc.
for k in chars:
if k > 1: # Note that I'm checking for > 1
return False
return True
# Driver code
if __name__ == "__main__":
print(is_unique("abcd"))
When run, this outputs True, which means that the string has unique characters only. Changing print(is_unique("abcd")) to a word with repeated characters, such as print(is_unique("hello")), gives False.
When I translated this into C++, the Xcode terminal shows '(lldb)', and the Xcode editor opens up a file 0_mh_execute_header and its contents are as follows:
dsa`_mh_execute_header:
0x100000000 <+0>: .long 0xfeedfacf ; unknown opcode
0x100000004 <+4>: .long 0x0100000c ; unknown opcode
0x100000008 <+8>: udf #0x0
0x10000000c <+12>: udf #0x2
0x100000010 <+16>: udf #0x12
0x100000014 <+20>: udf #0x638
0x100000018 <+24>: .long 0x00218085 ; unknown opcode
0x10000001c <+28>: udf #0x0
0x100000020 <+32>: udf #0x19
0x100000024 <+36>: udf #0x48
0x100000028 <+40>: .long 0x41505f5f ; unknown opcode
0x10000002c <+44>: saddwt z7.h, z10.h, z26.b
0x100000030 <+48>: udf #0x4f52
0x100000034 <+52>: udf #0x0
0x100000038 <+56>: udf #0x0
0x10000003c <+60>: udf #0x0
0x100000040 <+64>: udf #0x0
0x100000044 <+68>: udf #0x1
0x100000048 <+72>: udf #0x0
0x10000004c <+76>: udf #0x0
0x100000050 <+80>: udf #0x0
0x100000054 <+84>: udf #0x0
...
NOTE: ... in the above means that it continues on. Stack Overflow allows only 30000 characters in the body, but this will exceed 950000
On line 1, Xcode shows an error: Thread 1: EXC_BAD_ACCESS (code=257, address=0x100000001) on the right side of the file (like it usually does when there are compiler issues).
Do you know how to solve this?
The problem is here:
int arr[] = {};
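The array you're creating has length 0 (strictly, an empty initializer for an array of unknown bound is not valid standard C++; Clang and GCC accept it as a zero-length array extension), which you can verify using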
cout << "sizeof(arr): " << sizeof(arr) << endl;
The error occurs when you try to access values beyond the size of the array here:
arr[i] = 0;
What you need to do is specify a size for the array, for example int arr[128];, which creates an array that can hold 128 ints, enough to cover 7-bit ASCII. Or use a std::vector, whose size you can change.
I will also point out that the logic as written doesn't work: the second loop increments arr[j] for each position j, so every count ends up as 1. What you probably want is to count occurrences of each character:
#include <string>
using namespace std;

bool isUnique(const string& str) {
    // One counter per possible unsigned char value, all initialized to 0
    int arr[256] = {0};
    // First loop no longer needed
    for (size_t i = 0; i < str.length(); ++i) {
        // Increment the count for the cell that corresponds to the character;
        // unsigned char keeps the index in 0..255 even for non-ASCII bytes
        unsigned char c = str[i];
        arr[c] += 1;
    }
    // Note that you can reuse a variable name once the previous one
    // has fallen out of scope
    for (size_t i = 0; i < sizeof(arr)/sizeof(arr[0]); ++i) {
        if (arr[i] > 1) {
            return false;
        }
    }
    return true;
}
I suggest you read more on the C++ memory model.
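The same counting idea can be checked in Python against the question's is_unique: index a fixed table by character code rather than by position. A minimal sketch assuming 7-bit ASCII input (code points below 128, the range that int arr[128] covers); is_unique_counts is a hypothetical name:

```python
def is_unique_counts(text: str) -> bool:
    """Counting-table uniqueness check; assumes 7-bit ASCII input."""
    counts = [0] * 128              # one bucket per ASCII code point
    for ch in text:
        code = ord(ch)
        if code >= 128:
            raise ValueError("non-ASCII character: {!r}".format(ch))
        counts[code] += 1
        if counts[code] > 1:        # a repeat means we can stop early
            return False
    return True


if __name__ == "__main__":
    print(is_unique_counts("abcd"), is_unique_counts("hello"))
```

Returning as soon as a count reaches 2 avoids the final scan over the whole table; the C++ version could do the same inside its first loop.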
The problem lies here:
int arr[] = {};
Arrays in C and C++ are not dynamic. What you have created there is an array with 0 elements, and that's what it forevermore will be. So, when you do:
arr[i] = 0;
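you are writing off the end of the array into memory you don't own. If you want the array to be the same length as the string, you would need:
int arr[str.size()];
(this is a variable-length array, which GCC and Clang accept as an extension but which is not standard C++). Or, better, use a vector (#include <vector>):
std::vector<int> arr(str.size());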
I am migrating to Cython a set of functions that are currently implemented in C++ through scipy.weave (now deprecated).
These functions operate on time series whose points are 2D lists (e.g. [[17100, 19.2], [17101, 20.7], [17102, 20.3], ...]), both in input and in output. A sample function is subtract, which accepts two time series and calculates a new time series by subtracting the two inputs date by date.
The structure and the interfaces have to be maintained for backward compatibility, but my profiling shows that the Cython port is about 30%-40% slower than the original scipy.weave implementation.
I have tried many ways to optimize (internal conversions to NumPy arrays and memoryviews, C pointers, ...), but the time needed for the conversions lengthens the overall execution time. Even defining input and output as C++ vectors and relying on Cython's implicit conversions doesn't match scipy.weave's speed. I have also used the various directives: boundscheck, wraparound, cdivision, ...
The largest slowdowns seem to be in functions that use nested loops, and I've seen that preallocating the list gives a small gain (cdef list target = [[-1, float('nan')]]*size).
I am aware that Cython can't perform that well on Python structures, especially lists, but are there any other tricks or techniques that could give a speedup?
=== EDIT - ADD CODE EXAMPLE ===
A good example of this type of function is the following.
The function takes a 2-D list of dates/prices and a 2-D list of dates/decimal factors, searches for matching dates between the two lists, and calculates the output for each match by multiplying or dividing the price by the factor (chosen by a third input parameter).
My best-performing cython code:
cimport cython

@cython.cdivision(True)
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef apply_conversion(list original_timeserie, list factor_timeserie, int divide_or_multiply=False):
cdef:
Py_ssize_t i, j = 0, size = len(original_timeserie), size2 = len(factor_timeserie)
long original_date, factor_date
double original_price, factor_price, conv_price
list result = []
for i in range(size):
original_date = original_timeserie[i][0]
for j in range(j, size2):
factor_date = factor_timeserie[j][0]
if original_date == factor_date:
original_price = original_timeserie[i][1]
factor_price = factor_timeserie[j][1]
if divide_or_multiply:
if factor_price != 0:
conv_price = original_price / factor_price
else:
conv_price = float('inf')
else:
conv_price = original_price * factor_price
result.append([original_date, conv_price])
break
return result
The original scipy.weave C++ code:
int len = original_timeserie.length();
int len2 = factor_timeserie.length();
PyObject* py_serieconv = PyList_New(len);
PyObject* original_item = NULL;
PyObject* factor_item = NULL;
PyObject* date = NULL;
PyObject* value = NULL;
long original_date = 0;
long factor_date = 0;
double original_price = 0;
double factor_price = 0;
int j = 0;
for(int i=0;i<len;i++) {
original_item = PyList_GetItem(original_timeserie, i);
date = PyList_GetItem(original_item, 0);
original_date = PyInt_AsLong(date);
original_price = PyFloat_AsDouble( PyList_GetItem(original_item, 1) );
factor_item = NULL;
for(;j<len2;) {
factor_item = PyList_GetItem(factor_timeserie, j++);
factor_date = PyInt_AsLong(PyList_GetItem(factor_item, 0));
if (factor_date == original_date) {
factor_price = PyFloat_AsDouble(PyList_GetItem(factor_item, 1));
value = PyFloat_FromDouble(original_price * (divide_or_multiply==0 ? factor_price : 1/factor_price));
PyObject* py_new_item = PyList_New(2);
Py_XINCREF(date);
PyList_SetItem(py_new_item, 0, date);
PyList_SetItem(py_new_item, 1, value);
PyList_SetItem(py_serieconv, i, py_new_item);
break;
}
}
}
return_val = py_serieconv;
Py_XDECREF(py_serieconv);
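Since both versions implement the same date-matching scan, a pure-Python reference is handy for checking the Cython port's output while profiling. A minimal sketch, assuming both lists are sorted ascending by date; apply_conversion_reference is a hypothetical name, and unlike the loops above it stops scanning at the first factor date that is not earlier than the current original date, so a date missing from factor_timeserie does not exhaust the scan:

```python
def apply_conversion_reference(original, factor, divide=False):
    """Merge two date-sorted [date, value] lists on equal dates.

    The factor index j only moves forward, so the scan is
    O(len(original) + len(factor)). Original dates with no matching
    factor date are skipped, as in the question's versions.
    """
    result = []
    j, n2 = 0, len(factor)
    for date, price in original:
        # advance past factor dates that are earlier than this one
        while j < n2 and factor[j][0] < date:
            j += 1
        if j < n2 and factor[j][0] == date:
            f = factor[j][1]
            if divide:
                conv = price / f if f != 0 else float("inf")
            else:
                conv = price * f
            result.append([date, conv])
    return result
```

Comparing its output with the Cython function on the same inputs gives a quick correctness check before optimizing further.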
I'm implementing a simple XOR reducer, but it doesn't return the expected value.
Python Code (Input):
class LazySpecializedFunctionSubclass(LazySpecializedFunction):
subconfig_type = namedtuple('subconfig',['dtype','ndim','shape','size','flags'])
def __init__(self, py_ast = None):
py_ast = py_ast or get_ast(self.kernel)
super(LazySlimmy, self).__init__(py_ast)
# [... other code ...]
def points(self, inpt):
iter = np.nditer(inpt, flags=['c_index'])
while not iter.finished:
yield iter.index
iter.iternext()
class XorReduction(LazySpecializedFunctionSubclass):
def kernel(self, inpt):
'''
Calculates the cumulative XOR of elements in inpt, equivalent to
Reduce with XOR
'''
result = 0
for point in self.points(inpt): # self.points is defined in LazySpecializedFunctionSubclass
result = point ^ result # notice how 'point' here is the actual element in self.points(inpt), not the index
return result
C Code (Output):
// <file: module.c>
void kernel(long* inpt, long* output) {
long result = 0;
for (int point = 0; point < 2; point ++) {
result = point ^ result; // Notice how it's using the index, point, not inpt[point].
};
* output = result;
};
Any ideas how to fix this?
The problem is that you are using point in two different ways. The comment in the XorReduction kernel method says point is an element of the array, but the generated C code iterates over the indices of the array, so your C XOR reduction is done on the indices. (Note that points() as written also yields iter.index, which is an index rather than an element.)
The generated C function should look more like
// <file: module.c>
void kernel(long* inpt, long* output) {
long result = 0;
for (int point = 0; point < 2; point ++) {
result = inpt[point] ^ result; // you did not reference your input in the question
};
* output = result;
};
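The difference is easy to see in plain Python: for a two-element input, XORing the indices 0 and 1 always gives 1 whatever the data, while the intended reduction XORs the elements. A minimal sketch; xor_values and xor_indices are hypothetical names, and a plain list stands in for the NumPy array:

```python
import operator
from functools import reduce


def xor_values(values):
    """XOR of the elements: what the kernel's docstring describes."""
    return reduce(operator.xor, values, 0)


def xor_indices(values):
    """XOR of the indices 0..len-1: what the generated C code computes."""
    return reduce(operator.xor, range(len(values)), 0)


if __name__ == "__main__":
    data = [5, 3]
    print(xor_values(data), xor_indices(data))
```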
So my python program is
from ctypes import *
import ctypes
number = [0,1,2]
testlib = cdll.LoadLibrary("./a.out")
testlib.init.argtypes = [ctypes.c_int]
testlib.init.restype = ctypes.c_double
#create an array of size 3
testlib.init(3)
#Loop to fill the array
#use AccessArray to perform an action on the array
And the C part is
#include <stdio.h>
double init(int size){
double points[size];
return points[0];
}
double fillarray(double value, double location){
// i need to access
}
double AccessArray(double value, double location){
// i need to acess the array that is filled in the previous function
}
So what I need to do is pass an array from the Python part to a C function, and then somehow hand that array over to another C function where I can access it in order to process it.
I'm stuck, though, because I can't figure out a way to move the array between functions in the C part.
Can someone show me how to do this?
You should try something like this (in your C code):
#include <stdio.h>
double points[1000];//change 1000 for the maximum size for you
int sz = 0;
double init(int size){
//verify size <= maximum size for the array
for(int i=0;i<size;i++) {
points[i] = 1;//change 1 for the init value for you
}
sz = size;
return points[0];
}
void fillarray(double value, double location){
//first verify 0 <= location < sz
points[(int)location] = value;
}
double AccessArray(double value, double location){
//first verify 0 <= location < sz
return points[(int)location];
}
This is a very simple solution, but if you need to allocate an array of arbitrary size you should study the use of malloc.
Maybe something like this?
$ cat Makefile
go: a.out
./c-double
a.out: c.c
gcc -fpic -shared c.c -o a.out
$ cat c.c
#include <stdio.h>
#include <stdlib.h>   /* malloc */

/* Allocate an uninitialized array of size doubles; the caller owns it. */
double *init(int size) {
    double *points;
    points = malloc(size * sizeof(double));
    return points;
}

/* Fill the array: points[i] = i. */
void fill_array(double *points, int size) {
    int i;
    for (i=0; i < size; i++) {
        points[i] = (double) i;
    }
}

/* Access the array that was filled by fill_array(). */
void access_array(double *points, int size) {
    int i;
    for (i=0; i < size; i++) {
        printf("%d: %f\n", i, points[i]);
    }
}
$ cat c-double
#!/usr/local/cpython-3.3/bin/python
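import ctypes
testlib = ctypes.cdll.LoadLibrary("./a.out")
testlib.init.argtypes = [ctypes.c_int]
testlib.init.restype = ctypes.c_void_p
# Declare the pointer arguments explicitly; without argtypes, ctypes passes a
# Python int as a C int, which truncates the address on 64-bit platforms.
testlib.fill_array.argtypes = [ctypes.c_void_p, ctypes.c_int]
testlib.fill_array.restype = None
testlib.access_array.argtypes = [ctypes.c_void_p, ctypes.c_int]
testlib.access_array.restype = None
#create an array of size 3
size = 3
double_array = testlib.init(size)
#Loop to fill the array
testlib.fill_array(double_array, size)
#use AccessArray to perform an action on the array
testlib.access_array(double_array, size)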
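An alternative to malloc-ing in C is to allocate the buffer on the Python side with ctypes and pass the same pointer to every C function, so the memory is freed automatically when Python drops it. The sketch below only shows the sharing part in pure ctypes (no library needed); SIZE, buf and ptr are hypothetical names:

```python
import ctypes

SIZE = 3

# A C array of doubles owned by Python; freed when `buf` is garbage-collected.
buf = (ctypes.c_double * SIZE)()

# A POINTER(c_double) view of the same memory, i.e. the kind of value a C
# function declared as `void fill_array(double *points, int size)` receives.
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_double))

# Writes through the pointer land in the shared buffer, just as writes made
# by one C function are visible to the next function given the same pointer.
for i in range(SIZE):
    ptr[i] = float(i)

print(list(buf))
```

With the library loaded, you would declare testlib.fill_array.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_int] and call testlib.fill_array(buf, SIZE); ctypes accepts an array where a pointer to its element type is expected.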