Python implementation faster than C

I apologise if comparisons are not supposed to work this way. I'm new to programming and just curious why this is the case.
I have a large binary file containing word embeddings (4.5 GB). Each line has a word followed by its embedding, which comprises 300 float values. I'm simply counting the total number of lines.
For C, I use mmap:
int fd;
struct stat sb;
off_t offset = 0, pa_offset;
size_t length, i;
char *addr;
int count = 0;

fd = open("processed_data/crawl-300d-2M.vec", O_RDONLY);
if (fd == -1) {
    handle_error("open");
    exit(1);
}
if (fstat(fd, &sb) < 0) {
    handle_error("fstat");
    close(fd);
    exit(1);
}
pa_offset = offset & ~(sysconf(_SC_PAGE_SIZE) - 1);
if (offset >= sb.st_size) {
    fprintf(stderr, "offset is past end of file\n");
    exit(EXIT_FAILURE);
}
length = sb.st_size - offset;
addr = mmap(0, (length + offset - pa_offset), PROT_READ, MAP_SHARED, fd, pa_offset);
if (addr == MAP_FAILED) handle_error("mmap");

// Timing only this loop
clock_t begin = clock();
for (i = 0; i < length; i++) {
    if (*(addr + i) == '\n') count++;
}
printf("%d\n", count);
clock_t end = clock();
double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
printf("%f\n", time_spent);
This takes 11.283060 seconds.
Python:
import timeit

file = open('processed_data/crawl-300d-2M.vec', 'r')
count = 0
start_time = timeit.default_timer()
for line in file:
    count += 1
print(count)
elapsed = timeit.default_timer() - start_time
print(elapsed)
This takes 3.0633065439997154 seconds.
Doesn't the Python code have to read each character to find newlines? If so, why is my C code so inefficient?

Hard to say, because I assume that it will be heavily implementation dependent. But at first glance, the main difference between your Python and C programs is that the C program uses mmap. It is a very powerful tool (that you do not really need here...) and as such can come with some overhead. As the reference Python implementation is written in C, it is likely that the loop
for line in file:
    count += 1
will end up as a loop around a tiny function that calls fgets. I would bet a coin that a naive C program using fgets will be slightly faster than the Python equivalent, because it will save all the Python overhead. But IMHO there is no surprise that using mmap in C is less efficient than fgets in Python.
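For what it's worth, here is a minimal sketch of such an fgets-based counter (the file name is taken from the question; the buffer size is arbitrary, and since a line longer than the buffer arrives over several fgets calls, only reads that contain a newline are counted):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[65536];
    long count = 0;
    FILE *fp = fopen("processed_data/crawl-300d-2M.vec", "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    /* fgets goes through stdio's internal buffer, so the underlying
       read() calls happen in large chunks instead of byte by byte. */
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (strchr(buf, '\n') != NULL)
            count++;
    }
    printf("%ld\n", count);
    fclose(fp);
    return 0;
}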

Related

Clipping a binary number to required length C/C++

I have written a short function to convert an input decimal number to a binary output. However, at a much higher level of the code, the end user should toggle an option as to whether they want a 5-bit or 10-bit value. For the sake of some other low-level maths, I have to clip the data here.
So I need some help figuring out how to clip the output to a desired length and pad it with the required number of leading zeros.
The incomplete C code:
long dec2bin(int x_dec, int res)
{
    long x_bin = 0;
    int x_bin_len;
    int x_rem, i = 1;
    while (x_dec != 0)
    {
        x_rem = x_dec % 2;
        x_dec /= 2;
        x_bin += x_rem * i;
        i *= 10;
    }
    return x_bin;
}
I had completed a working proof of concept using Python. The end application, however, requires that I write this in C.
The working python script:
def dec2bin(x_dec, x_res):
    x_bin = bin(x_dec)[2:]                       # Convert to binary (remove '0b' prefix)
    x_len = len(x_bin)
    if x_len < x_res:                            # If smaller than desired resolution
        x_bin = '0' * (x_res - x_len) + x_bin    # Pad with leading 0s
    if x_len > x_res:                            # If larger than desired resolution
        x_bin = x_bin[x_len - x_res:x_len]       # Keep desired number of LSBs
    return x_bin
I'm sure this has been done before; indeed, my Python script proves it should be relatively straightforward, but I'm not as experienced with C.
Any help is greatly appreciated.
Mark.
As @yano suggested, I think you have to return an ASCII string to the caller, rather than a long. Below is the short function I wrote for my own purposes, for any base...
char *itoa(int i, int base, int ndigits) {
    static char a[999], digits[99] =   /* up to base 65 */
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz#$*";
    int n = ndigits;
    memset(a, '0', ndigits);
    a[ndigits] = '\000';
    while (--n >= 0) {
        a[n] = digits[i % base];
        if ((i /= base) < 1) break;
    }
    return (a);
} /* --- end-of-function itoa() --- */
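To illustrate, here is a hypothetical driver (not part of the original answer) showing that this gives both behaviours the question asks for, assuming <stdio.h> and <string.h> are included alongside the itoa() above:

int main(void)
{
    /* 37 decimal is 100101 in binary: only the 5 LSBs are kept. */
    printf("%s\n", itoa(37, 2, 5));    /* prints 00101 */
    /* 5 decimal is 101 in binary: padded with leading zeros to 10 digits. */
    printf("%s\n", itoa(5, 2, 10));    /* prints 0000000101 */
    return 0;
}

Note that itoa() returns a pointer to a static buffer, so each call overwrites the previous result (and the function is not re-entrant); copy the string if you need to keep it across calls.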

Thread 1: EXC_BAD_ACCESS (code=257, address=0x100000001) in C++

I've written a program that checks whether a given string has all unique characters. I usually write in Python, but I'm learning C++ and I wanted to write the program using it. I get an error when I translate the Python into C++: Thread 1: EXC_BAD_ACCESS (code=257, address=0x100000001)
I am using Xcode. When I run this program, I get the above error:
#include <iostream>
using namespace std;

int isUnique(string str) {
    int arr[] = {};
    for (int i = 0; i < str.length(); ++i) {
        arr[i] = 0;
    }
    for (int j = 0; j < str.length(); ++j) {
        arr[j] += 1;
    }
    for (int k = 0; k < sizeof(arr)/sizeof(arr[0]); ++k) {
        if (arr[k] > 1) {
            return false;
        }
    }
    return true;
}

int main() {
    string str;
    cout << "Enter a string: ";
    getline(cin, str);
    cout << isUnique(str) << endl;
}
Here is the original code I wrote in Python:
def is_unique(string):
    chars = []
    for i in range(len(string)):
        chars.append(0)
        # Using find and not just i because I want the first occurrence of the
        # substring in the string to be updated to 2 if it happens twice,
        # 3 if it is thrice, etc.
        chars[string.find(string[i])] += 1
    for k in chars:
        if k > 1:  # Note that I'm checking for > 1
            return False
    return True

# Driver code
if __name__ == "__main__":
    print(is_unique("abcd"))
When run, this outputs True, which means that the string has only unique characters. Change print(is_unique("abcd")) to a word without only unique characters, such as print(is_unique("hello")), to get False.
When I translated this into C++, the Xcode terminal shows '(lldb)', and the Xcode editor opens up a file 0_mh_execute_header and its contents are as follows:
dsa`_mh_execute_header:
0x100000000 <+0>: .long 0xfeedfacf ; unknown opcode
0x100000004 <+4>: .long 0x0100000c ; unknown opcode
0x100000008 <+8>: udf #0x0
0x10000000c <+12>: udf #0x2
0x100000010 <+16>: udf #0x12
0x100000014 <+20>: udf #0x638
0x100000018 <+24>: .long 0x00218085 ; unknown opcode
0x10000001c <+28>: udf #0x0
0x100000020 <+32>: udf #0x19
0x100000024 <+36>: udf #0x48
0x100000028 <+40>: .long 0x41505f5f ; unknown opcode
0x10000002c <+44>: saddwt z7.h, z10.h, z26.b
0x100000030 <+48>: udf #0x4f52
0x100000034 <+52>: udf #0x0
0x100000038 <+56>: udf #0x0
0x10000003c <+60>: udf #0x0
0x100000040 <+64>: udf #0x0
0x100000044 <+68>: udf #0x1
0x100000048 <+72>: udf #0x0
0x10000004c <+76>: udf #0x0
0x100000050 <+80>: udf #0x0
0x100000054 <+84>: udf #0x0
...
NOTE: ... in the above means that it continues on. Stack Overflow allows only 30000 characters in the body, but this will exceed 950000
On line 1, Xcode shows an error: Thread 1: EXC_BAD_ACCESS (code=257, address=0x100000001) on the right side of the file (like it usually does when there are compiler issues).
Do you know how to solve this?
The problem is here:
int arr[] = {};
The array you're creating has length 0, which you can verify using
cout << "sizeof(arr): " << sizeof(arr) << endl;
The error occurs when you try to access values beyond the size of the array here:
arr[i] = 0;
What you need to do is specify a size for the array, for example int arr[128];, which creates an array that can hold 128 ints and covers the range of 7-bit ASCII. Or use a vector, whose size you can change.
I will also point out that the logic as it is doesn't work; what you might want to do is:
int isUnique(string str) {
    // Create an array that holds 128 ints and initialize it to 0
    int arr[128] = {0};
    // First loop no longer needed
    for (int i = 0; i < str.length(); ++i) {
        // Increment count for the cell that corresponds to the character
        char c = str[i];
        arr[c] += 1;
    }
    // Note that you can reuse a variable name when the previous one
    // has fallen out of scope
    for (int i = 0; i < sizeof(arr)/sizeof(arr[0]); ++i) {
        if (arr[i] > 1) {
            return false;
        }
    }
    return true;
}
I suggest you read more on the C++ memory model.
The problem lies here:
int arr[] = {};
Arrays in C and C++ are not dynamic. What you have created there is an array with 0 elements, and that's what it forevermore will be. So, when you do:
arr[i] = 0;
you are writing off the end of the array into random memory. If you want the array to be the same length as the string, you would need a variable-length array (a compiler extension rather than standard C++):
int arr[str.size()];
Or, use a vector:
std::vector<int> arr(str.size());

Optimize Cython functions operating on Python lists

I am currently migrating to Cython a set of functions that were previously implemented in C++ through scipy.weave (now deprecated).
These functions operate on timeseries points that are 2D lists (e.g. [[17100, 19.2], [17101, 20.7], [17102, 20.3], ...]) both in input and in output. A sample function is subtract, which accepts two timeseries and calculates a new timeseries as the subtraction of the two inputs, going date by date.
The structure and the interfaces have to be maintained for backward compatibility, but my profiling trials show that the Cython port is about 30%-40% slower than the original scipy.weave implementation.
I have tried many ways to optimize (inner conversions to NumPy arrays and memoryviews, C pointers, ...), but the required conversion time lengthens the overall execution time. Even defining input and output as C++ vectors, leveraging Cython's implicit conversions, doesn't seem to be effective in maintaining the scipy.weave speed. I have also used the various hints on boundscheck, wraparound, division, ...
The biggest slow-downs seem to be on functions that use nested loops, and I've seen that a little gain can be had by predefining the list size (cdef list target = [[-1, float('nan')]]*size; note that this repeats references to a single inner list, so each slot still has to be replaced rather than mutated in place).
I am aware that Cython can't perform that well on Python structures, especially lists, but are there any other tricks or techniques with which a speedup can be obtained?
=== EDIT - ADD CODE EXAMPLE ===
A good example of this family of functions is the following.
The function takes a 2D list of dates/prices and a 2D list of dates/decimal factors and searches for matching dates between the two lists, calculating the output from the corresponding price/factor by multiplying or dividing (the choice is a third input parameter).
My best-performing cython code:
cimport cython

@cython.cdivision(True)
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef apply_conversion(list original_timeserie, list factor_timeserie, int divide_or_multiply=False):
    cdef:
        Py_ssize_t i, j = 0, size = len(original_timeserie), size2 = len(factor_timeserie)
        long original_date, factor_date
        double original_price, factor_price, conv_price
        list result = []

    for i in range(size):
        original_date = original_timeserie[i][0]
        for j in range(j, size2):
            factor_date = factor_timeserie[j][0]
            if original_date == factor_date:
                original_price = original_timeserie[i][1]
                factor_price = factor_timeserie[j][1]
                if divide_or_multiply:
                    if factor_price != 0:
                        conv_price = original_price / factor_price
                    else:
                        conv_price = float('inf')
                else:
                    conv_price = original_price * factor_price
                result.append([original_date, conv_price])
                break
    return result
The original scipy function:
int len = original_timeserie.length();
int len2 = factor_timeserie.length();
PyObject* py_serieconv = PyList_New(len);
PyObject* original_item = NULL;
PyObject* factor_item = NULL;
PyObject* date = NULL;
PyObject* value = NULL;
long original_date = 0;
long factor_date = 0;
double original_price = 0;
double factor_price = 0;
int j = 0;

for (int i = 0; i < len; i++) {
    original_item = PyList_GetItem(original_timeserie, i);
    date = PyList_GetItem(original_item, 0);
    original_date = PyInt_AsLong(date);
    original_price = PyFloat_AsDouble(PyList_GetItem(original_item, 1));
    factor_item = NULL;
    for (; j < len2; ) {
        factor_item = PyList_GetItem(factor_timeserie, j++);
        factor_date = PyInt_AsLong(PyList_GetItem(factor_item, 0));
        if (factor_date == original_date) {
            factor_price = PyFloat_AsDouble(PyList_GetItem(factor_item, 1));
            value = PyFloat_FromDouble(original_price * (divide_or_multiply == 0 ? factor_price : 1/factor_price));
            PyObject* py_new_item = PyList_New(2);
            Py_XINCREF(date);
            PyList_SetItem(py_new_item, 0, date);
            PyList_SetItem(py_new_item, 1, value);
            PyList_SetItem(py_serieconv, i, py_new_item);
            break;
        }
    }
}
return_val = py_serieconv;
Py_XDECREF(py_serieconv);

Finding if two strings are almost similar

I want to find out if two strings are almost similar. For example, a string like 'Mohan Mehta' should match 'Mohan Mehte' and vice versa. As another example, a string like 'Umesh Gupta' should match 'Umash Gupte'.
Basically, one string is correct and the other one is a misspelling of it. All my strings are names of people.
Any suggestions on how to achieve this?
The solution does not have to be 100 percent effective.
You can use difflib.SequenceMatcher if you want something from the stdlib:
from difflib import SequenceMatcher
s_1 = 'Mohan Mehta'
s_2 = 'Mohan Mehte'
print(SequenceMatcher(a=s_1,b=s_2).ratio())
0.909090909091
fuzzywuzzy is one of numerous libs that you can install; it uses the difflib module together with python-Levenshtein. You should also check out the Wikipedia page on Approximate_string_matching.
Another approach is to use a "phonetic algorithm":
A phonetic algorithm is an algorithm for indexing of words by their pronunciation.
For example using the soundex algorithm:
>>> import soundex
>>> s = soundex.getInstance()
>>> s.soundex("Umesh Gupta")
'U5213'
>>> s.soundex("Umash Gupte")
'U5213'
>>> s.soundex("Umesh Gupta") == s.soundex("Umash Gupte")
True
What you want is a string distance. There are many flavors, but I would recommend starting with the Levenshtein distance (for example, 'Mohan Mehta' and 'Mohan Mehte' differ by a single substitution, so their distance is 1).
You might want to look at NLTK (the Natural Language Toolkit), specifically the nltk.metrics package, which implements various string distance algorithms, including the Levenshtein distance mentioned already.
You could split the string and check to see if it contains at least one first/last name that is correct.
// Calculate the similarity between two strings
public static double similarity(String s1, String s2) {
    String longer = s1, shorter = s2;
    if (s1.length() < s2.length()) { // longer should always have greater length
        longer = s2;
        shorter = s1;
    }
    int longerLength = longer.length();
    if (longerLength == 0) { return 1.0; /* both strings are zero length */ }
    /* // If you have StringUtils, you can use it to calculate the edit distance:
    return (longerLength - StringUtils.getLevenshteinDistance(longer, shorter)) /
           (double) longerLength; */
    return (longerLength - editDistance(longer, shorter)) / (double) longerLength;
}

// Example implementation of the Levenshtein edit distance
// See http://rosettacode.org/wiki/Levenshtein_distance#Java
public static int editDistance(String s1, String s2) {
    s1 = s1.toLowerCase();
    s2 = s2.toLowerCase();
    int[] costs = new int[s2.length() + 1];
    for (int i = 0; i <= s1.length(); i++) {
        int lastValue = i;
        for (int j = 0; j <= s2.length(); j++) {
            if (i == 0)
                costs[j] = j;
            else {
                if (j > 0) {
                    int newValue = costs[j - 1];
                    if (s1.charAt(i - 1) != s2.charAt(j - 1))
                        newValue = Math.min(Math.min(newValue, lastValue), costs[j]) + 1;
                    costs[j - 1] = lastValue;
                    lastValue = newValue;
                }
            }
        }
        if (i > 0)
            costs[s2.length()] = lastValue;
    }
    return costs[s2.length()];
}

Memory Leak in Ctypes method

I have a project mostly written in Python. This project runs on my Raspberry Pi (Model B). With the use of the Pi Camera I record to a stream. Every second I pause the recording to take the last frame from the stream and compare it with an older frame. The comparison is done in C code (mainly because it is faster than Python).
The C code is called from Python using Ctypes. See the code below.
import ctypes

# Load picturecomparer.so and set argument and return types
cmethod = ctypes.CDLL(Paths.CMODULE_LOCATION)
cmethod.compare_pictures.restype = ctypes.c_double
cmethod.compare_pictures.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
The two images that must be compared are stored on disk. Python passes the paths of both images as arguments to the C code. The C code returns a value (double) which is the percentage difference between the two images.
# Call the C method to compare the images
difflevel = cmethod.compare_pictures(path1, path2)
The C code looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#ifndef STB_IMAGE_IMPLEMENTATION
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#ifndef STBI_ASSERT
#define STBI_ASSERT(x)
#endif
#endif

#define COLOR_R 0
#define COLOR_G 1
#define COLOR_B 2
#define OFFSET 10

double compare_pictures(const char* path1, const char* path2);

double compare_pictures(const char* path1, const char* path2)
{
    double totalDiff = 0.0, value;
    unsigned int x, y;
    int width1, height1, comps1;
    unsigned char * image1 = stbi_load(path1, &width1, &height1, &comps1, 0);
    int width2, height2, comps2;
    unsigned char * image2 = stbi_load(path2, &width2, &height2, &comps2, 0);

    // Perform some checks to be sure images are valid
    if (image1 == NULL || image2 == NULL) { return 0; }
    if (width1 != width2 || height1 != height2) { return 0; }

    for (y = 0; y < height1; y++)
    {
        for (x = 0; x < width1; x++)
        {
            // Calculate difference in RED
            value = (int)image1[(x + y*width1) * comps1 + COLOR_R] - (int)image2[(x + y*width2) * comps2 + COLOR_R];
            if (value < OFFSET && value > (OFFSET * -1)) { value = 0; }
            totalDiff += fabs(value) / 255.0;

            // Calculate difference in GREEN
            value = (int)image1[(x + y*width1) * comps1 + COLOR_G] - (int)image2[(x + y*width2) * comps2 + COLOR_G];
            if (value < OFFSET && value > (OFFSET * -1)) { value = 0; }
            totalDiff += fabs(value) / 255.0;

            // Calculate difference in BLUE
            value = (int)image1[(x + y*width1) * comps1 + COLOR_B] - (int)image2[(x + y*width2) * comps2 + COLOR_B];
            if (value < OFFSET && value > (OFFSET * -1)) { value = 0; }
            totalDiff += fabs(value) / 255.0;
        }
    }

    totalDiff = 100.0 * totalDiff / (double)(width1 * height1 * 3);
    return totalDiff;
}
The C code is executed every ~2 seconds. I just noticed that there is a memory leak. After around 10 to 15 minutes my Raspberry Pi has only about 10 MB of RAM left to use. A few minutes later it crashes and doesn't respond anymore.
I have done some checks to find out what causes this in my project. My entire project uses around 30-40 MB of RAM if I disable the C code. This project is all my Raspberry Pi has to execute.
Model B: 512 MB RAM, shared between CPU and GPU.
GPU: 128 MB (/boot/config.txt).
My Linux distro uses: ~60 MB.
So I have ~300 MB for my project.
Hope someone can point out where it goes wrong, or whether I have to call the GC myself, etc.
Thanks in advance.
p.s. I know the image comparing is not the best way, but it works for me now.
Since the images are being returned as pointers to buffers, stbi_load must be allocating space for them, and you are not releasing this space before returning, so the memory leak is not surprising.
Check the documentation to see if there is a specific stbi free function, or try adding free(image1); free(image2); before the final return.
Having checked, I can categorically say that you should be calling stbi_image_free(image1); stbi_image_free(image2); before returning.
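As a sketch of the fix (the per-pixel loops stay exactly as posted; stbi_image_free() is stb_image's documented counterpart to stbi_load(), and with the default allocator freeing a NULL pointer is harmless):

double compare_pictures(const char *path1, const char *path2)
{
    double totalDiff = 0.0;
    int width1, height1, comps1, width2, height2, comps2;
    unsigned char *image1 = stbi_load(path1, &width1, &height1, &comps1, 0);
    unsigned char *image2 = stbi_load(path2, &width2, &height2, &comps2, 0);

    if (image1 == NULL || image2 == NULL) goto cleanup;
    if (width1 != width2 || height1 != height2) goto cleanup;

    /* ... the per-pixel comparison loops from the question go here,
       accumulating into totalDiff ... */

    totalDiff = 100.0 * totalDiff / (double)(width1 * height1 * 3);

cleanup:
    /* Release the decoded images on every exit path. */
    stbi_image_free(image1);
    stbi_image_free(image2);
    return totalDiff;
}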
