I was talking to my friend about these two pieces of code. He said the Python one terminates, while the C++ one doesn't.
Python:
arr = [1, 2, 3]
for i in range(len(arr)):
    arr.append(i)
print("done")
C++:
#include <iostream>
#include <vector>
using namespace std;
int main() {
    vector<int> arr{1, 2, 3};
    for (int i = 0; i < arr.size(); i++) {
        arr.push_back(i);
    }
    cout << "done" << endl;
    return 0;
}
I challenged that and ran it on two computers. The first one ran out of memory (bad_alloc) because it had 4 GB of RAM. My Mac has 12 GB of RAM, and on it the program ran and terminated just fine.
I thought it wouldn't run forever because the type returned by vector's size() is unsigned. Since my Mac is 64-bit, I thought it could store 2^(64-2) = 2^62 ints (which is true), but the unsigned integer used for the size appears to be 32 bits for some reason.
Is this some bug in the C++ compiler, in that it does not make max_size() relative to the system's hardware? Is it the overflow that causes the program to terminate, or is it for some other reason?
There is no bug in your C++ compiler manifesting itself here.
int is overflowing (due to the i++), and the behaviour of signed overflow is undefined. (It's feasible that you'll run out of memory on some platforms before this overflow occurs: one element is appended per iteration, so the vector holds roughly 2^31 four-byte ints, about 8 GiB ignoring reallocation overhead, by the time i approaches std::numeric_limits<int>::max(). That is consistent with your 4 GB machine dying with bad_alloc while the 12 GB machine reached the overflow.) Note that there is no defined behaviour that will make i negative, although wrapping to a negative value is a common occurrence on machines with two's-complement signed integral types once std::numeric_limits<int>::max() is attained; and if i were, say, -1, then i < arr.size() would be false due to the implicit conversion of i to the unsigned type of arr.size().
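To see what that implicit conversion does, here is a small Python sketch of it (ctypes.c_uint64 stands in for a 64-bit size_t, and 2**31 is just a plausible stand-in for arr.size() at that point):

import ctypes

i = -1                                # what i might wrap to after signed overflow (UB in C++)
as_size_t = ctypes.c_uint64(i).value  # the value i converts to in "i < arr.size()"
print(as_size_t)                      # 18446744073709551615
print(as_size_t < 2**31)              # False -> the loop condition fails and the loop exits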
The Python version evaluates range(len(arr)) once, up front; subsequent appends do not change that initial value.
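For instance:

arr = [1, 2, 3]
for i in range(len(arr)):  # range(3): the bound is fixed here, once
    arr.append(i)
print(arr)                 # [1, 2, 3, 0, 1, 2] -- exactly three iterations
print("done")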
Related
I have a question: Java and PHP can call this DLL successfully, but when I call it from Python I get an access violation error.
I want to know whether mapping the C++ char* to ctypes' c_char_p is the error, and how to map a C++ char* with Python ctypes.
C++ DLL definitions:
#define Health_API extern "C" __declspec(dllexport)
Health_API unsigned long Initialization(int m_type, char* m_ip, int m_port);
Health_API int GetRecordCount(unsigned long m_handle);
Python code:
from ctypes import *
lib = cdll.LoadLibrary(r"./Health.dll")
inita = lib.Initialization
inita.argtype = (c_int, c_char_p, c_int)
inita.restype = c_ulong
getcount = lib.GetRecordCount
getcount.argtype = c_ulong
getcount.retype = c_int
# here call
handle = inita(5, '127.0.0.1'.encode(), 4000)
print(handle)
result = getcount(handle)
error info:
2675930080
Traceback (most recent call last):
File "C:/Users/JayTam/PycharmProjects/DecoratorDemo/demo.py", line 14, in <module>
print(getcount(handle))
OSError: exception: access violation reading 0x000000009F7F73E0
After modification
I find that if I do not define restype, it returns a negative number, which is a noteworthy problem; but when I change restype to c_uint64 it still cannot connect.
Curiously, my schoolmate can call the same DLL successfully on her computer (Win7 x64), using simple code as follows:
from ctypes import *
lib = cdll.LoadLibrary("./Health.dll")
handle = lib.Initialization(5, '127.0.0.1'.encode(), 4000)
lib.GetRecordList(handle, '')
I am on Win10 x64 and my Python environment is Anaconda. I set up the same environment as hers and used the same code, but it still returns the same error.
Python vs Java
Although I know it is probably caused by the environment, I really want to know what exactly caused it.
Python:
On each run the handle is an irregular number, while my classmate's PC gets a regular number,
such as: 288584672, 199521248, 1777824736, -607161376 (when I don't set restype).
This is Python calling the DLL; it cannot connect to port 4000.
Java:
Java also gets regular numbers, never negative ones,
such as: 9462886128, 9454193200, 9458325520, 9451683632.
This is Java calling the DLL.
So I think this line of code should be causing the error, but on other PCs there is no problem, which is amazing:
handle = lib.Initialization(5, '127.0.0.1'.encode(), 4000)
c++ dll:
Health_API unsigned long Initialization(int m_type, char* m_ip, int m_port)
{
    CHandleNode* node;
    switch(m_type)
    {
    case BASEINFO:
        {
            CBaseInfo* baseInfo = new CBaseInfo;
            baseInfo->InitModule(m_ip, m_port);
            node = m_info.AddMCUNode((unsigned long)baseInfo);
            node->SetType(m_type);
            return (unsigned long)baseInfo;
        }
        break;
    case BASEINFO_ALLERGENS:
        {
            CBaseInfo_Allergens* baseInfo_Allergens = new CBaseInfo_Allergens;
            baseInfo_Allergens->InitModule(m_ip, m_port);
            node = m_info.AddMCUNode((unsigned long)baseInfo_Allergens);
            node->SetType(m_type);
            return (unsigned long)baseInfo_Allergens;
        }
        break;
    // ... (many more cases omitted)
    }
    return -1;
}
Health_API int GetRecordList(unsigned long m_handle, char* m_index)
{
    CHandleNode* node = m_info.GetMCUNode(m_handle);
    if (node == NULL)
    {
        return -1;
    }
    switch(node->GetType())
    {
    case BASEINFO:
        {
            CBaseInfo* baseInfo = (CBaseInfo*)m_handle;
            return baseInfo->GetRecordList(m_index);
        }
        break;
    case BASEINFO_ALLERGENS:
        {
            CBaseInfo_Allergens* baseInfo_Allergens = (CBaseInfo_Allergens*)m_handle;
            return baseInfo_Allergens->GetRecordList(m_index);
        }
        break;
    // ... (many more cases omitted)
    }
    return -1;
}
It would help to have the C++ source code, but I can take some guesses.
I don't think this is related, but:
getcount.retype is missing an s; it should be restype. Likewise, argtype should be argtypes — ctypes silently ignores misspelled attributes.
And I always define argtypes as a list instead of a tuple; even when there is only one argument I still use a list of one element.
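For example (names as in the question):

from ctypes import *

lib = cdll.LoadLibrary(r"./Health.dll")
getcount = lib.GetRecordCount
getcount.argtypes = [c_ulong]  # "argtypes" (plural), as a list, even for one argument
getcount.restype = c_int       # "restype"; a misspelling like "retype" is silently ignored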
If the char* is a null-terminated string, you do indeed need c_char_p. In that case it is highly recommended to add const to the m_ip argument.
If the char* is a pointer to a raw buffer (filled or not by the called function), which is not the case here, you need POINTER(c_char), not c_char_p.
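To illustrate the distinction (the FillBuffer function below is made up purely for illustration; it is not part of your DLL):

from ctypes import *

lib = cdll.LoadLibrary(r"./Health.dll")

# A char* that is a null-terminated string -> c_char_p (as with m_ip here):
lib.Initialization.argtypes = [c_int, c_char_p, c_int]

# A char* that is a raw buffer the callee fills -> POINTER(c_char):
# buf = create_string_buffer(256)
# lib.FillBuffer.argtypes = [POINTER(c_char), c_int]  # hypothetical export
# lib.FillBuffer(buf, len(buf))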
Based on the error message, your handle is a pointer, so you should use the right type for it in your C++ code: void*. And in your Python code you should use c_void_p.
Is it 64-bit code? What toolchain/compiler was used to build the DLL? What is the size of unsigned long in that toolchain?
Edit: here is the reason for your problem: Is Python's ctypes.c_long 64 bit on 64 bit systems?
handle is a pointer in your DLL, and the DLL was certainly built with a compiler where a long is 64 bits; but for 64-bit Python on Windows a long is 32 bits, which cannot store your pointer. You really should use a void*.
First you could try using c_void_p in your Python for the handle without modifying your C++ source; it may work, depending on the compiler used. But since the size of the long type is not defined by any standard, and may not be able to store a pointer, you really should use void* or uintptr_t in your C++ code.
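A minimal sketch of that Python-side change (same exports as in the question; whether it is enough depends on the DLL actually returning an untruncated pointer):

from ctypes import *

lib = cdll.LoadLibrary(r"./Health.dll")

lib.Initialization.argtypes = [c_int, c_char_p, c_int]
lib.Initialization.restype = c_void_p   # pointer-sized, unlike c_ulong on Win64

lib.GetRecordCount.argtypes = [c_void_p]
lib.GetRecordCount.restype = c_int

handle = lib.Initialization(5, b"127.0.0.1", 4000)
print(lib.GetRecordCount(handle))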
Edit 2: if you are using Visual Studio, and not GCC, the size of a long is only 32 bits (Size of long vs int on x64 platform), so you must fix the code of the DLL.
I am trying to wrap a C++ library for Python using SWIG. The library uses function calls that accept byte buffers as parameters, which I create in Python using SWIG's %array_class. I made a proof-of-concept program to test this out, and I noticed a significant memory leak associated with passing these buffers to C++. Specifically, running the code below steadily raises the memory usage of the Python process (as observed in Task Manager) up to about 250 MB, where the program halts. The printouts from C++ indicate that the program runs as expected; it just eats up more and more memory. The del buff statement runs but does nothing to release the memory. I tried creating and deleting the buffer in each loop iteration, but got the same result.
Running delete x; in C++ crashes my program entirely.
My SWIG interface file:
%module example
%include "carrays.i"
%array_class(uint8_t, buffer);
%{
#include "PythonConnector.h"
%}
%include "PythonConnector.h"
The C++ header file:
class PythonConnector {
public:
    void print_array(uint8_t *x);
};
The minimal C++-defined function:
void PythonConnector::print_array(uint8_t *x)
{
    //int i;
    //for (i = 0; i < 100; i++) {
    //    printf("[%d] = %d\n", i, x[i]);
    //}
    //delete x; // <-- This crashed the program
    return;
}
The tester Python script:
import time
import example

sizeBytes = 10000
buff = example.buffer(sizeBytes)
for j in range(1000):
    # Initialize data buffer
    for i in range(sizeBytes):
        buff[i] = i % 256
    buff[0] = 0
    example.PythonConnector().print_array(buff.cast())
    print(j)
del buff
time.sleep(10)
Am I missing something? I suspect that SWIG creates some proxy object each time the buffer is passed to the C++ side, and that this object is not garbage collected.
Edit:
SWIG version 3.0.7
CPython version 3.5 x64
Windows 10 OS
Thanks for your help.
OK, thanks to @Flexo, I found the answer.
The problem is that example.PythonConnector() is instantiated in each loop iteration. Instantiating it only once, outside the loop, seems to fix the memory problem:
import time
import example

sizeBytes = 10000
buff = example.buffer(sizeBytes)
conn = example.PythonConnector()
for j in range(1000):
    # Initialize data buffer
    for i in range(sizeBytes):
        buff[i] = i % 256
    buff[0] = 0
    conn.print_array(buff.cast())
    print(j)
del buff
time.sleep(10)
There still remains the question of why the many connectors don't get garbage collected in the original code.
I am solving a problem, and although I already solved it (after a long while), I want to find out what was wrong with my implementation.
I programmed my solution in both C++ and Python on Windows. I was working with CodeSkulptor for the Python version and it gave me a TIMELIMITERROR. I then switched to C++ and got some weird errors, so I booted up my virtual machine to find out why the C++ code failed (I used BCC32 from Borland). I could see that a long number generated by the Collatz sequence made my program crash. Under Linux (using the g++ compiler) I got almost the same error, even though otherwise the program ran and handled long numbers very well.
Working under Linux, I could run the same Python program I developed for Windows and it worked straight away. I want to know why the C++ version fails on both Windows and Linux.
In Python:
def Collatz(num):
    temp = []
    temp.append(num)
    while num > 1:
        num = num % 2 == 0 and num / 2 or num * 3 + 1
        temp.append(num)
    return temp
In C++:
vector<unsigned long> collatz(int num)
{
    vector<unsigned long> intList;
    intList.push_back(num);
    while (num > 1)
    {
        if (num % 2 == 0) num /= 2;
        else num = num * 3 + 1;
        intList.push_back(num);
    }
    return intList;
}
These two pieces of code are the functions only.
The strange thing is that both versions work well when calculating the sequence for 13 or 999999, but, for example, the C++ one fails to calculate the sequence for 837799... maybe it has something to do with the vector container size??
Because your num is an int, you get an overflow after the element 991661525 in the Collatz series for 837799: all operations are done on the int, so you overflow when computing 991661525 * 3 + 1 in num = num * 3 + 1 (the result, 2974984576, exceeds the 2147483647 maximum of a 32-bit signed int). Change num to unsigned long in the function definition:
vector<unsigned long> collatz(unsigned long num)
and it will work!
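A quick Python check of the overflow point (Python ints are arbitrary precision, which is why the Python version is unaffected; this sketch uses floor division instead of the and/or idiom):

def collatz(num):
    temp = [num]
    while num > 1:
        num = num // 2 if num % 2 == 0 else num * 3 + 1
        temp.append(num)
    return temp

seq = collatz(837799)
print(len(seq))              # terminates fine in Python
print(max(seq) > 2**31 - 1)  # True: the peak value does not fit in a 32-bit signed int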
I'm looking into replacing some C code with Python code, using PyPy as the interpreter. The code does a lot of list/dictionary operations, so to get a vague idea of PyPy's performance vs C I am writing sorting algorithms. To test all my read functions I wrote a bubble sort, both in Python and C++. CPython of course sucks: 6.468s. PyPy came in at 0.366s and C++ at 0.229s. Then I remembered that I had forgotten -O3 on the C++ code, and the time went to 0.042s. For a 32768-element dataset, C++ with -O3 is only 2.588s and PyPy is 19.65s. Is there anything I can do to speed up my Python code (besides using a better sorting algorithm, of course), or anything in how I use PyPy (some flag or something)?
Python code (read_nums module omitted since its time is trivial: 0.036s on the 32768 dataset):
import read_nums
import sys

nums = read_nums.read_nums(sys.argv[1])
done = False
while not done:
    done = True
    for i in range(len(nums) - 1):
        if nums[i] > nums[i + 1]:
            nums[i], nums[i + 1] = nums[i + 1], nums[i]
            done = False
$ time pypy-c2.0 bubble_sort.py test_32768_1.nums
real 0m20.199s
user 0m20.189s
sys 0m0.009s
C++ code (read_nums function again omitted since it takes little time: 0.017s):
#include <iostream>
#include "read_nums.h"

int main(int argc, char** argv)
{
    std::vector<int> nums;
    int count, i, tmp;
    bool done;

    if (argc < 2)
    {
        std::cout << "Usage: " << argv[0] << " filename" << std::endl;
        return 1;
    }
    count = read_nums(argv[1], nums);
    done = false;
    while (!done)
    {
        done = true;
        for (i = 0; i < count - 1; ++i)
        {
            if (nums[i] > nums[i + 1])
            {
                tmp = nums[i];
                nums[i] = nums[i + 1];
                nums[i + 1] = tmp;
                done = false;
            }
        }
    }
    for (i = 0; i < count; ++i)
    {
        std::cout << nums[i] << ", ";
    }
    return 0;
}
$ time ./bubble_sort test_32768_1.nums > /dev/null
real 0m2.587s
user 0m2.586s
sys 0m0.001s
P.S. Some of the numbers given in the first paragraph are a little different from the numbers from time because they're the numbers I got the first time around.
Further improvements:
Just tried xrange instead of range and the run time went to 16.370s.
Moved the code, from the first done = False through the last done = False, into a function; the speed is now 8.771-8.834s (see the sketch below).
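For reference, a sketch of that second change (a common explanation for the speedup is that local variable lookups are cheaper than globals in CPython, and PyPy's JIT also handles code inside functions well):

import read_nums
import sys

def bubble_sort(nums):
    done = False
    while not done:
        done = True
        for i in range(len(nums) - 1):
            if nums[i] > nums[i + 1]:
                nums[i], nums[i + 1] = nums[i + 1], nums[i]
                done = False

nums = read_nums.read_nums(sys.argv[1])
bubble_sort(nums)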
The most relevant way to answer this question is to note that the speeds of C, CPython and PyPy do not differ by a constant factor: they depend most importantly on what is done and the way it is written. For example, if your C code is doing naive things like walking arrays when the "equivalent" Python code would naturally use dictionaries, then any implementation of Python is faster than the C, provided the arrays are long enough. Of course, this is not the case in most real-life examples, but the same argument still applies to a smaller extent. There is no one-size-fits-all way to predict the relative speed of a program written in C and rewritten in Python, running on CPython or PyPy.
Obviously there are guidelines about these relative speeds: on small algorithmic examples you could expect the speed of PyPy to approach that of "gcc -O0". In your example it is "only" 1.6x slower. We might help you optimize it, or even find optimizations missing in PyPy, to gain 10% or 30% more speed. But this is a tiny example that has nothing to do with your real program. For the reasons above, the speed we get here is only vaguely related to the speed you'll get in the end.
I can only say that rewriting code from C to Python for reasons of clarity, notably when the C has become too tangled up for further development, is clearly a win in the long run --- even if at the end you need to rewrite some parts of it in C again. PyPy's goal here is to reduce the need for that. While it would be nice to say that no one ever needs C any more, it's just not true :-)
I have pretty much the same code in Python and C. Python example:
import numpy

nbr_values = 8192
n_iter = 100000

a = numpy.ones(nbr_values).astype(numpy.float32)
for i in range(n_iter):
    a = numpy.sin(a)
C example:
#include <stdio.h>
#include <math.h>

int main(void)
{
    int i, j;
    int nbr_values = 8192;
    int n_iter = 100000;
    double x;

    for (j = 0; j < nbr_values; j++) {
        x = 1;
        for (i = 0; i < n_iter; i++)
            x = sin(x);
    }
    return 0;
}
Something strange happened when I ran both examples:
$ time python numpy_test.py
real 0m5.967s
user 0m5.932s
sys 0m0.012s
$ g++ sin.c
$ time ./a.out
real 0m13.371s
user 0m13.301s
sys 0m0.008s
It looks like Python/NumPy is twice as fast as C. Is there any mistake in the experiment above? How can you explain it?
P.S. I have Ubuntu 12.04, 8 GB RAM, Core i5, btw.
First, turn on optimization. Second, subtleties matter. Your C code is definitely not "basically the same".
Here is equivalent C code:
sinary2.c:
#include <math.h>
#include <stdlib.h>

float *sin_array(const float *input, size_t elements)
{
    int i = 0;
    float *output = malloc(sizeof(float) * elements);
    for (i = 0; i < elements; ++i) {
        output[i] = sin(input[i]);
    }
    return output;
}
sinary.c:
#include <math.h>
#include <stdlib.h>

extern float *sin_array(const float *input, size_t elements);

int main(void)
{
    int i;
    int nbr_values = 8192;
    int n_iter = 100000;
    float *x = malloc(sizeof(float) * nbr_values);
    for (i = 0; i < nbr_values; ++i) {
        x[i] = 1;
    }
    for (i = 0; i < n_iter; i++) {
        float *newary = sin_array(x, nbr_values);
        free(x);
        x = newary;
    }
    return 0;
}
Results:
$ time python foo.py
real 0m5.986s
user 0m5.783s
sys 0m0.050s
$ gcc -O3 -ffast-math sinary.c sinary2.c -lm
$ time ./a.out
real 0m5.204s
user 0m4.995s
sys 0m0.208s
The reason the program has to be split in two is to fool the optimizer a bit. Otherwise it will realize that the whole loop has no effect at all and optimize it out. Putting things in two files doesn't give the compiler visibility into the possible side-effects of sin_array when it's compiling main and so it has to assume that it actually has some and repeatedly call it.
Your original program is not at all equivalent for several reasons. One is that you have nested loops in the C version and you don't in Python. Another is that you are working with arrays of values in the Python version and not in the C version. Another is that you are creating and discarding arrays in the Python version and not in the C version. And lastly you are using float in the Python version and double in the C version.
Simply calling the sin function the appropriate number of times does not make for an equivalent test.
Also, the optimizer is a really big deal for C. Comparing C code that the optimizer hasn't been run on against anything else, when you're wondering about a speed comparison, is the wrong thing to do. Of course, you also need to be mindful: the C optimizer is very sophisticated, and if you're testing something that really doesn't do anything, it may well notice this fact and simply not do anything at all, resulting in a program that's ridiculously fast.
Because "numpy" is a dedicated math library implemented for speed. C has standard functions for sin/cos, that are generally derived for accuracy.
You are also not comparing apples with apples, as you are using double in C, and float32 (float) in python. If we change the python code to calculate float64 instead, the time increases by about 2.5 seconds on my machine, making it roughly match with the correctly optimized C version.
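For instance, the corresponding one-line change in the Python version (same sizes as above):

import numpy

nbr_values = 8192
n_iter = 100000

# float64 instead of float32, to match the double used in the C version
a = numpy.ones(nbr_values).astype(numpy.float64)
for i in range(n_iter):
    a = numpy.sin(a)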
If the whole test were made to do something more complicated, requiring more control structures (if/else, do/while, etc.), then you would probably see even less difference between C and Python, because the C compiler can't really compute sin any faster, unless you implement a better sin function.
Never mind the fact that your code isn't quite the same on both sides... ;)
You seem to be doing the same operation in C 8192 x 10000 times but only 10000 times in Python (I haven't used numpy before, so I may misunderstand the code). Why are you using an array in the Python case (again, I'm not used to numpy, so perhaps the dereferencing is implicit)? If you wish to use an array, be careful: doubles have a performance hit in terms of caching and optimized vectorization. You're using different types between the two implementations (float vs double), but given the algorithm I don't think it matters.
The main reason for a lot of the anomalous performance benchmark results surrounding C vs Py-this, Py-that... is simply that the C implementation is often poor.
https://www.ibm.com/developerworks/community/blogs/jfp/entry/A_Comparison_Of_C_Julia_Python_Numba_Cython_Scipy_and_BLAS_on_LU_Factorization?lang=en
If you notice, the guy writes C to process an array of doubles (without using the restrict or const keywords where he could have), builds with optimization, and then forces the compiler to use SIMD rather than AVX. In short, the compiler is using an inefficient instruction set for doubles, and the wrong type of registers too, if he wanted performance; you can be sure that numba and numpy use as many bells and whistles as possible, and ship with very efficient C and C++ libraries to begin with. In short, if you want speed with C you have to think about it; you may even have to disassemble the code, and perhaps disable optimization and use compiler intrinsics instead. C gives you the tools to do it, so don't expect the compiler to do all the work for you. If you want to be free of that burden, use Cython, Numba, NumPy, SciPy etc. They're very fast, but you won't be able to eke out every last bit of performance from the machine; to do that, use C, C++ or new versions of Fortran.
Here is a very good article on these very points (I'd use SciPy):
https://www.scipy.org/scipylib/faq.html