I created Python bindings using pybind11. Everything works, but when I ran a speed test the result was disappointing.
Basically, I have a C++ function that adds two numbers, and I want to call that function from a Python script. I also included a for loop that runs 100 times to better show the difference in processing time.
For the function "imported" from C++, using pybind11, I obtain: 0.002310514450073242 ~ 0.0034799575805664062
For the simple Python script, I obtain: 0.0012788772583007812 ~ 0.0015883445739746094
main.cpp file:
#include <pybind11/pybind11.h>

namespace py = pybind11;

double sum(double a, double b) {
    return a + b;
}

PYBIND11_MODULE(SumFunction, var) {
    var.doc() = "pybind11 example module";
    var.def("sum", &sum, "This function adds two input numbers");
}
main.py file:
from build.SumFunction import *
import time

start = time.time()
for i in range(100):
    print(sum(2.3, 5.2))
end = time.time()
print(end - start)
CMakeLists.txt file:
cmake_minimum_required(VERSION 3.0.0)
project(Projectpybind11 VERSION 0.1.0)
include(CTest)
enable_testing()
add_subdirectory(pybind11)
pybind11_add_module(SumFunction main.cpp)
set(CPACK_PROJECT_NAME ${PROJECT_NAME})
set(CPACK_PROJECT_VERSION ${PROJECT_VERSION})
include(CPack)
Simple Python script:
import time

def summ(a, b):
    return a + b

start = time.time()
for i in range(100):
    print(summ(2.3, 5.2))
end = time.time()
print(end - start)
1. Benchmarking is a complicated business; you could almost call it systems engineering in its own right, because many other processes interfere with a benchmark: NIC interrupt handling, keyboard or mouse input, OS scheduling, and so on. I have seen one of my producer processes blocked by the OS for up to 15 seconds! So, as the other answers have pointed out, print() adds yet more unnecessary interference.
2. Your test computation is too simple. You must be clear about what you are actually comparing. The cost of passing arguments between Python and C++ is obviously higher than the cost of a call that stays inside Python, so I assume you want to compare the computing speed of the two, not the argument-passing speed. If so, your computation is far too simple: the time you measured is mostly the time spent passing arguments, while the computing time is only a minor part of the total. That is why I give my own sample below; I would be glad to see anyone polish it.
3. Your loop count is too low. The fewer the loops, the more randomness. As in point 1, a test that lasts only 0.000x seconds can easily be perturbed by the OS; the test should run for at least a few seconds.
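As an aside, here is a minimal sketch of my own (not from the question) of how to measure without print() inside the timed loop and with enough repetitions to last a while; it assumes the compiled module is importable as build.SumFunction, as in the question's layout:

# Sketch: benchmark with timeit, no I/O inside the measured statement
import timeit

cpp_time = timeit.timeit('sum(2.3, 5.2)',
                         setup='from build.SumFunction import sum',
                         number=10000000)
py_time = timeit.timeit('summ(2.3, 5.2)',
                        setup='def summ(a, b):\n    return a + b',
                        number=10000000)
print('pybind11:', cpp_time, 's / pure Python:', py_time, 's')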
4. C++ is not always faster than Python. Nowadays there are many Python modules/libraries that can use the GPU for heavy computation, or do matrix operations in parallel even on the CPU alone.
I guess you are evaluating whether to use pybind11 in your project. A comparison like this proves little on its own, because the best tool depends on the actual requirement, but it is a good way to learn.
I recently came across a case where Python was faster than C++ in a deep-learning workload. Funny, isn't it?
In the end, I ran my sample on my PC and found the C++ computation up to 100 times faster than the Python one. I hope this is helpful to you.
If anyone would like to revise or correct my points, it would be my pleasure!
ComplexCpp.cpp:
#include <cmath>
#include <pybind11/numpy.h>
#include <pybind11/pybind11.h>

namespace py = pybind11;

double Compute(double x, py::array_t<double> ys) {
    // std::cout << "x:" << std::setprecision(16) << x << std::endl;
    auto r = ys.unchecked<1>();
    for (py::ssize_t i = 0; i < r.shape(0); ++i) {
        double y = r(i);
        // std::cout << "y:" << std::setprecision(16) << y << std::endl;
        x += y;
        x *= y;
        y = std::max(y, 1.001);
        x /= y;
        x *= std::log(y);
    }
    return x;
}

PYBIND11_MODULE(ComplexCpp, m) {
    m.def("Compute", &Compute, "a more complicated computation");
}
tryComplexCpp.py:
import ComplexCpp
import math
import numpy as np
import random
import time

def PyCompute(x: float, ys: np.ndarray) -> float:
    #print(f'x:{x}')
    for y in ys:
        #print(f'y:{y}')
        x += y
        x *= y
        y = max(y, 1.001)
        x /= y
        x *= math.log(y)
    return x

LOOPS: int = 100000000

if __name__ == "__main__":
    # initialize random
    x0 = random.random()
    # We store all args in one array and pass it to both the C++ function and
    # the Python function, to ensure both sides receive the same arguments.
    args = np.ndarray(LOOPS, dtype=np.float64)
    for i in range(LOOPS):
        args[i] = random.random()
    print('Args are ready, now start...')
    # try it with C++
    start_time = time.time()
    x = ComplexCpp.Compute(x0, args)
    print(f'Computing with C++ took {time.time() - start_time} s.\n')
    # use the result so the whole computation cannot be optimized away
    print(f'The result is {x}\n')
    # try it with Python
    start_time = time.time()
    x = PyCompute(x0, args)
    print(f'Computing with Python took {time.time() - start_time} s.\n')
    # use the result so the whole computation cannot be optimized away
    print(f'The result is {x}\n')
I am implementing a protocol on an STM32F412 board. It's almost done; I just need to do a CRC check of the received data.
I tried using the internal CRC module to calculate the CRC, but I could not match the result to any online CRC calculator, so I decided to do a simple software implementation of the Ethernet CRC.
static const uint32_t crc32_tab[] =
{
0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L,
0x706af48fL, 0xe963a535L, 0x9e6495a3L, 0x0edb8832L, 0x79dcb8a4L,
0xe0d5e91eL, 0x97d2d988L, 0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L,
0x90bf1d91L, 0x1db71064L, 0x6ab020f2L, 0xf3b97148L, 0x84be41deL,
0x1adad47dL, 0x6ddde4ebL, 0xf4d4b551L, 0x83d385c7L, 0x136c9856L,
0x646ba8c0L, 0xfd62f97aL, 0x8a65c9ecL, 0x14015c4fL, 0x63066cd9L,
0xfa0f3d63L, 0x8d080df5L, 0x3b6e20c8L, 0x4c69105eL, 0xd56041e4L,
0xa2677172L, 0x3c03e4d1L, 0x4b04d447L, 0xd20d85fdL, 0xa50ab56bL,
0x35b5a8faL, 0x42b2986cL, 0xdbbbc9d6L, 0xacbcf940L, 0x32d86ce3L,
0x45df5c75L, 0xdcd60dcfL, 0xabd13d59L, 0x26d930acL, 0x51de003aL,
0xc8d75180L, 0xbfd06116L, 0x21b4f4b5L, 0x56b3c423L, 0xcfba9599L,
0xb8bda50fL, 0x2802b89eL, 0x5f058808L, 0xc60cd9b2L, 0xb10be924L,
0x2f6f7c87L, 0x58684c11L, 0xc1611dabL, 0xb6662d3dL, 0x76dc4190L,
0x01db7106L, 0x98d220bcL, 0xefd5102aL, 0x71b18589L, 0x06b6b51fL,
0x9fbfe4a5L, 0xe8b8d433L, 0x7807c9a2L, 0x0f00f934L, 0x9609a88eL,
0xe10e9818L, 0x7f6a0dbbL, 0x086d3d2dL, 0x91646c97L, 0xe6635c01L,
0x6b6b51f4L, 0x1c6c6162L, 0x856530d8L, 0xf262004eL, 0x6c0695edL,
0x1b01a57bL, 0x8208f4c1L, 0xf50fc457L, 0x65b0d9c6L, 0x12b7e950L,
0x8bbeb8eaL, 0xfcb9887cL, 0x62dd1ddfL, 0x15da2d49L, 0x8cd37cf3L,
0xfbd44c65L, 0x4db26158L, 0x3ab551ceL, 0xa3bc0074L, 0xd4bb30e2L,
0x4adfa541L, 0x3dd895d7L, 0xa4d1c46dL, 0xd3d6f4fbL, 0x4369e96aL,
0x346ed9fcL, 0xad678846L, 0xda60b8d0L, 0x44042d73L, 0x33031de5L,
0xaa0a4c5fL, 0xdd0d7cc9L, 0x5005713cL, 0x270241aaL, 0xbe0b1010L,
0xc90c2086L, 0x5768b525L, 0x206f85b3L, 0xb966d409L, 0xce61e49fL,
0x5edef90eL, 0x29d9c998L, 0xb0d09822L, 0xc7d7a8b4L, 0x59b33d17L,
0x2eb40d81L, 0xb7bd5c3bL, 0xc0ba6cadL, 0xedb88320L, 0x9abfb3b6L,
0x03b6e20cL, 0x74b1d29aL, 0xead54739L, 0x9dd277afL, 0x04db2615L,
0x73dc1683L, 0xe3630b12L, 0x94643b84L, 0x0d6d6a3eL, 0x7a6a5aa8L,
0xe40ecf0bL, 0x9309ff9dL, 0x0a00ae27L, 0x7d079eb1L, 0xf00f9344L,
0x8708a3d2L, 0x1e01f268L, 0x6906c2feL, 0xf762575dL, 0x806567cbL,
0x196c3671L, 0x6e6b06e7L, 0xfed41b76L, 0x89d32be0L, 0x10da7a5aL,
0x67dd4accL, 0xf9b9df6fL, 0x8ebeeff9L, 0x17b7be43L, 0x60b08ed5L,
0xd6d6a3e8L, 0xa1d1937eL, 0x38d8c2c4L, 0x4fdff252L, 0xd1bb67f1L,
0xa6bc5767L, 0x3fb506ddL, 0x48b2364bL, 0xd80d2bdaL, 0xaf0a1b4cL,
0x36034af6L, 0x41047a60L, 0xdf60efc3L, 0xa867df55L, 0x316e8eefL,
0x4669be79L, 0xcb61b38cL, 0xbc66831aL, 0x256fd2a0L, 0x5268e236L,
0xcc0c7795L, 0xbb0b4703L, 0x220216b9L, 0x5505262fL, 0xc5ba3bbeL,
0xb2bd0b28L, 0x2bb45a92L, 0x5cb36a04L, 0xc2d7ffa7L, 0xb5d0cf31L,
0x2cd99e8bL, 0x5bdeae1dL, 0x9b64c2b0L, 0xec63f226L, 0x756aa39cL,
0x026d930aL, 0x9c0906a9L, 0xeb0e363fL, 0x72076785L, 0x05005713L,
0x95bf4a82L, 0xe2b87a14L, 0x7bb12baeL, 0x0cb61b38L, 0x92d28e9bL,
0xe5d5be0dL, 0x7cdcefb7L, 0x0bdbdf21L, 0x86d3d2d4L, 0xf1d4e242L,
0x68ddb3f8L, 0x1fda836eL, 0x81be16cdL, 0xf6b9265bL, 0x6fb077e1L,
0x18b74777L, 0x88085ae6L, 0xff0f6a70L, 0x66063bcaL, 0x11010b5cL,
0x8f659effL, 0xf862ae69L, 0x616bffd3L, 0x166ccf45L, 0xa00ae278L,
0xd70dd2eeL, 0x4e048354L, 0x3903b3c2L, 0xa7672661L, 0xd06016f7L,
0x4969474dL, 0x3e6e77dbL, 0xaed16a4aL, 0xd9d65adcL, 0x40df0b66L,
0x37d83bf0L, 0xa9bcae53L, 0xdebb9ec5L, 0x47b2cf7fL, 0x30b5ffe9L,
0xbdbdf21cL, 0xcabac28aL, 0x53b39330L, 0x24b4a3a6L, 0xbad03605L,
0xcdd70693L, 0x54de5729L, 0x23d967bfL, 0xb3667a2eL, 0xc4614ab8L,
0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL,
0x2d02ef8dL
};
uint32_t calc_crc_calculate(uint8_t *pData, uint32_t uLen)
{
    uint32_t val = 0xFFFFFFFFU;
    uint32_t i;
    for (i = 0; i < uLen; i++) {
        val = crc32_tab[(val ^ pData[i]) & 0xFF] ^ ((val >> 8) & 0x00FFFFFF);
    }
    return val ^ 0xFFFFFFFFU;
}
I calculated the CRC of 0x6F and compared the result to the online calculators, and it apparently matches.
When I try to test the protocol with my Python code, I'm unable to match the CRCs. In Python I'm using the following code:
d = 0x6f
crc = zlib.crc32(bytes(d))&0xFFFFFFFF
I'm now unable to tell which is right. Apparently my algorithm is OK, because it matches the online calculator, but those online calculators do not always seem reliable, and I doubt that Python's zlib implementation is wrong; at worst I may be using it incorrectly.
Actually, you can compute the Ethernet CRC32 with the built-in CRC unit of the STM32. It took me quite a while to make it match up as well.
This code should match for sizes divisible by 4 (I used Python's zlib on the other end):
#include "stm32l4xx_hal.h"
uint32_t CRC32_Compute(const uint32_t *data, size_t sizeIn32BitWords)
{
CRC_HandleTypeDef hcrc = {
.Instance = CRC,
.Init.DefaultPolynomialUse = DEFAULT_POLYNOMIAL_ENABLE,
.Init.DefaultInitValueUse = DEFAULT_INIT_VALUE_ENABLE,
.Init.InputDataInversionMode = CRC_INPUTDATA_INVERSION_WORD,
.Init.OutputDataInversionMode = CRC_OUTPUTDATA_INVERSION_ENABLE,
.InputDataFormat = CRC_INPUTDATA_FORMAT_WORDS,
};
HAL_StatusTypeDef status = HAL_CRC_Init(&hcrc);
assert (status == HAL_OK)
uint32_t checksum = HAL_CRC_Calculate(&hcrc, data, sizeIn32BitWords);
uint32_t checksumInverted = ~checksum;
return checksumInverted;
}
The challenge with sizes not divisible by 4 is to get the "inversion/reversal" (the changing of the bit order) right. There is an example of how the hardware handles this in "RM0394 Reference manual STM32L43xxx STM32L44xxx STM32L45xxx STM32L46xxx advanced ARM®-based 32-bit MCUs, Rev 3", page 333.
The essence is that the reversal reverses the bit order, and for CRC32 this reversal must happen at the word level, i.e. over 32 bits.
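To make the word-level reversal concrete, here is a small sketch of my own (not from the manual) that reverses the bit order of one 32-bit word, which is what CRC_INPUTDATA_INVERSION_WORD does to each input word:

# Sketch: reverse the bit order of a single 32-bit word
def reverse_bits32(x):
    r = 0
    for _ in range(32):
        r = (r << 1) | (x & 1)  # shift the lowest bit of x into r
        x >>= 1
    return r

assert reverse_bits32(0x00000001) == 0x80000000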
OK, it certainly was a bug on my part, but it was happening in my Python code.
I suddenly realized that I was effectively doing bytes(0x6F), which just creates a zero-filled byte array of length 111 (the decimal value of 0x6F).
What I actually needed to do was:
import struct
import zlib

d = struct.pack('B', 0x6F)
crc = zlib.crc32(d) & 0xFFFFFFFF
This question could have been avoided with a little rubber-duck debugging. Hopefully this will help someone else.
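A quick demonstration of the pitfall, as a sketch of my own, for anyone skimming:

import zlib

# bytes(int) creates a zero-filled buffer of that length -- the original bug
assert bytes(0x6F) == b'\x00' * 111   # 0x6F == 111

# bytes([int]) (or struct.pack('B', ...)) produces the single byte intended
assert bytes([0x6F]) == b'\x6f'
print(hex(zlib.crc32(bytes([0x6F])) & 0xFFFFFFFF))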
I'm trying to validate a checksum on a string which in this case is calculated by performing an XOR on each of the individual characters.
Given my test string:
check_against = "GPGLL,5300.97914,N,00259.98174,E,125926,A"
I figured it would be as simple as:
result = 0
for char in check_against:
    result = result ^ ord(char)
I know the result should be 28, however my code gives 40.
I'm not sure what encoding the text is supposed to be in, although I've tried encoding/decoding in UTF-8 and ASCII, both with the same result.
I implemented this same algorithm in C by simply XOR-ing over the char array, with perfect results, so what am I missing?
Edit
So it was a little while ago that I implemented what I thought was the same thing in C. I knew it was in an Objective-C project, but I assumed I had done it the same way. Totally wrong. First, there was a step where I converted the checksum string value at the end to hex, like so (I'm filling in some details here so that I'm only pasting what is relevant):
unsigned int checksum = 0;
NSScanner *scanner = [NSScanner scannerWithString:@"26"];
[scanner scanHexInt:&checksum];
Then I did the following to compute the checksum:
NSString *sumString = @"GPGLL,5300.97914,N,00259.98174,E,125926,A";
unsigned int sum = 0;
for (int i = 0; i < sumString.length; i++) {
    sum = sum ^ [sumString characterAtIndex:i];
}
Then I would just compare like so:
return sum == checksum;
So, as @metatoaster, @XD573, and some others in the comments helped me figure out, the issue was the difference between the computed result, which is in base 10, and my desired value, which is written in base 16.
The result of the code, 40, is correct in base 10; the value I was trying to match, 28, is given in base 16. Converting that value from base 16 to base 10:
int('28', 16)
gives 40, the computed result.
# python3
s = "GPGLL,5300.97914,N,00259.98174,E,125926,A"
cks = 0
for ch in s:
    cks ^= ord(ch)
print("hex:", hex(cks))
print("dec:", cks)
I created the C version as shown here:
#include <stdio.h>
#include <string.h>

int main()
{
    const char *str1 = "GPGLL,5300.97914,N,00259.98174,E,125926,A";
    int sum = 0;
    size_t i;
    for (i = 0; i < strlen(str1); i++) {
        sum ^= str1[i];
    }
    printf("checksum: %d\n", sum);
    return 0;
}
And when I compiled and ran it:
$ gcc -o mytest mytest.c
$ ./mytest
checksum: 40
Which leads me to believe that the assumptions you have from your equivalent C code are incorrect.
So let's start off by saying I'm a total beginner in MATLAB. I work with Python, and now I've received some data in a MATLAB file that I need to export to a format I can use with Python.
I've googled around and found I can export a matlab variable to a text file using:
dlmwrite('my_text', MyVariable, 'delimiter' , ',');
Now, the variable I need to export is a 16000 x 4000 matrix of doubles of the form 0.006747668446927, and here is where the problem starts: I need to export the full value of each double, but that function exports the numbers as something like 0.0067477. That won't do, since I need a lot more precision for what I'm doing. So how can I export the full values of these variables? Or, if you have a more elegant way of using that huge MATLAB matrix from Python, please feel free to suggest it.
Regards,
Bogdan
To exchange big chunks of numerical data between Python and Matlab I
recommend HDF5
http://en.wikipedia.org/wiki/Hierarchical_Data_Format
The Python binding is called h5py
http://code.google.com/p/h5py
Here are two examples for both directions. First from
Matlab to Python
% matlab
points = [1 2 3 ; 4 5 6 ; 7 8 9 ; 10 11 12 ];
hdf5write('test.h5', '/Points', points);
# python
import h5py
with h5py.File('test.h5', 'r') as f:
    points = f['/Points'][...]
And now from Python to Matlab
# python
import h5py
import numpy
points = numpy.array([[1., 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
with h5py.File('test.h5', 'w') as f:
    f['/Points'] = points
% matlab
points = hdf5read('test.h5', '/Points');
NOTE: A column in Matlab will come out as a row in Python and vice versa. This isn't a bug, but the difference between the way C and Fortran interpret a contiguous piece of memory.
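If you want the MATLAB orientation on the Python side, a transpose after reading restores it (a small sketch of mine):

# Sketch: undo the C/Fortran order difference after reading
import h5py
with h5py.File('test.h5', 'r') as f:
    points = f['/Points'][...].T   # transpose back to the MATLAB layout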
Scipy has tools for reading MATLAB .mat files natively: see e.g. http://www.janeriksolem.net/2009/05/reading-and-writing-mat-files-with.html
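For example, reading the matrix from the question could look like this (a minimal sketch; the file name and the variable name are assumptions):

# Sketch: read a MATLAB .mat file directly with scipy
from scipy.io import loadmat

contents = loadmat('my_data.mat')   # hypothetical file name
M = contents['MyVariable']          # hypothetical variable name; a float64 numpy array
print(M.shape, M.dtype)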
While I like the HDF5-based answer, I still think text files and CSVs are nice for smaller things (you can open them in text editors, spreadsheets, whatever). In that case I would use MATLAB's fopen/fprintf/fclose rather than dlmwrite; I like to make things explicit. Then again, dlmwrite might be better for multi-dimensional arrays.
You can simply write your variable to a file as binary data, then read it in any language you want, be it MATLAB, Python, C, etc. Example:
MATLAB (write)
X = rand([100 1],'single');
fid = fopen('file.bin', 'wb');
count = fwrite(fid, X, 'single');
fclose(fid);
MATLAB (read)
fid = fopen('file.bin', 'rb');
data = fread(fid, Inf, 'single=>single');
fclose(fid);
Python (read)
import struct

data = []
f = open("file.bin", "rb")
try:
    # read 4 bytes at a time (one float)
    chunk = f.read(4)   # returns a bytes object
    while chunk:
        # unpack the 4-byte sequence into a float
        num = struct.unpack('f', chunk)[0]
        # append to list
        data.append(num)
        # read the next 4 bytes
        chunk = f.read(4)
finally:
    f.close()

# print list
print(data)
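If numpy is available, the same single-precision file can be read in one call (a sketch under the same file-layout assumption):

# Sketch: read the whole float32 binary file at once with numpy
import numpy as np

data = np.fromfile('file.bin', dtype=np.float32)
print(data[:10])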
C (read)
#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *fp = fopen("file.bin", "rb");

    // determine the size of the file
    fseek(fp, 0, SEEK_END);
    long int lsize = ftell(fp);
    rewind(fp);

    // allocate memory and read the file
    float *numbers = (float *) malloc(lsize);
    size_t count = fread(numbers, 1, lsize, fp);
    fclose(fp);

    // print data
    int i;
    int numFloats = lsize / sizeof(float);
    for (i = 0; i < numFloats; i++) {
        printf("%g\n", numbers[i]);
    }
    free(numbers);
    return 0;
}
I'm trying to create a Python client for Bacula, but I'm having a problem with the authentication. The algorithm is:
import hmac
import base64
import re
...
# the pattern was garbled in the post; the challenge has the form '<...>'
challenge = re.search("auth cram-md5 (<.+?>)", data).group(1)
passwd = 'b489c90f3ee5b3ca86365e1bae27186e'
hm = hmac.new(passwd, challenge).digest()
rep = base64.b64encode(hm).strip().rstrip('=')
# result with python:        9zKE3VzYQ1oIDTpBuMMowQ
# result with bacula client: 9z+E3V/YQ1oIDTpBu8MowB
Is there a simpler way than porting Bacula's implementation of base64?
int
bin_to_base64(char *buf, int buflen, char *bin, int binlen, int compatible)
{
    uint32_t reg, save, mask;
    int rem, i;
    int j = 0;

    reg = 0;
    rem = 0;
    buflen--;   /* allow for storing EOS */
    /* ... the remainder of the original snippet was lost in the post ... */
To verify your CRAM-MD5 implementation, it is best to use some simple test vectors and check combinations of (challenge, password, username) inputs against the expected output.
Here's one example (from http://blog.susam.in/2009/02/auth-cram-md5.html):
import hmac
username = 'foo@susam.in'
passwd = 'drowssap'
encoded_challenge = 'PDc0NTYuMTIzMzU5ODUzM0BzZGNsaW51eDIucmRzaW5kaWEuY29tPg=='
challenge = encoded_challenge.decode('base64')
digest = hmac.new(passwd, challenge).hexdigest()
response = username + ' ' + digest
encoded_response = response.encode('base64')
print encoded_response
# Zm9vQHN1c2FtLmluIDY2N2U5ZmE0NDcwZGZmM2RhOWQ2MjFmZTQwNjc2NzIy
That said, I've certainly found examples on the net where the response generated by the above code differs from the expected response stated on the relevant site, so I'm still not entirely clear as to what is happening in those cases.
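For completeness, here is the same test vector in Python 3 (my own sketch; Python 3's hmac requires bytes and an explicit digest name):

# Sketch: the same CRAM-MD5 check in Python 3
import base64
import hmac

username = 'foo@susam.in'
passwd = b'drowssap'
encoded_challenge = 'PDc0NTYuMTIzMzU5ODUzM0BzZGNsaW51eDIucmRzaW5kaWEuY29tPg=='
challenge = base64.b64decode(encoded_challenge)

digest = hmac.new(passwd, challenge, 'md5').hexdigest()
response = username + ' ' + digest
print(base64.b64encode(response.encode()).decode())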
I HAVE CRACKED THIS.
I ran into exactly the same problem you did, and just spent about four hours identifying the problem and reimplementing it.
The problem is that Bacula's base64 is broken, and wrong!
There are two problems with it:
The first is that the incoming bytes are treated as signed, not unsigned. The effect is that if a byte has its highest bit set (> 127), it is treated as a negative number, and when it is combined with the "left over" bits from previous bytes, those leftover bits all get set to binary 1.
The second is that after base64 has processed all the full 6-bit output blocks, there may be 0, 2 or 4 bits left over (depending on the input length modulo 3). The standard base64 way to handle this is to shift the remaining bits up so that they are the HIGHEST bits in the last 6-bit block; Bacula leaves them as the LOWEST bits.
Note that some versions of Bacula may accept both the broken Bacula base64 encoding and the standard one for incoming authentication; they seem to use the broken one for their own authentication.
def bacula_broken_base64(binarystring):
    b64_chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    remaining_bit_count = 0
    remaining_bits = 0
    output = ""
    for inputbyte in binarystring:
        inputbyte = ord(inputbyte)
        if inputbyte > 127:
            # REPRODUCING A BUG! set all the "remaining bits" to 1.
            remaining_bits = (1 << remaining_bit_count) - 1
        remaining_bits = (remaining_bits << 8) + inputbyte
        remaining_bit_count += 8
        while remaining_bit_count >= 6:
            # clean up:
            remaining_bit_count -= 6
            new64 = (remaining_bits >> remaining_bit_count) & 63  # 6 highest bits
            output += b64_chars[new64]
            remaining_bits &= (1 << remaining_bit_count) - 1
    if remaining_bit_count > 0:
        output += b64_chars[remaining_bits]
    return output
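Usage then looks like this (my own sketch, in Python 2 to match the function above; the password is the one from the question, and the challenge placeholder stands in for whatever you parsed from the Director, which was elided in the original post):

# Sketch: encode the HMAC digest with the broken encoder
import hmac

passwd = 'b489c90f3ee5b3ca86365e1bae27186e'
challenge = '<1234.5678@director>'   # placeholder; use the real parsed challenge
digest = hmac.new(passwd, challenge).digest()
print bacula_broken_base64(digest)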
I realize it's been 6 years since you asked, but perhaps someone else will find this useful.