I tried to call a C function from Python; here is my code:
string.c
#include <stdio.h>
int print(const char *str)
{
printf("%s", str):
return 0;
}
string.py
from ctypes import *
so_print = "/home/ubuntu/string.so"
my_functions = CDLL(so_print)
print(my_functions.print("hello"))
When I run the Python script it prints only the first character of the string, for example "h".
How can I pass any string so that my C code reads and displays it?
Your function accepts a const char*, which corresponds to a Python bytes object (which coerces to c_char_p), not a str (which coerces to c_wchar_p). You didn't tell Python what the underlying C function's prototype was, so it just converted your str to a c_wchar_p, and a UTF-16 or UTF-32 encoded string containing only ASCII characters looks like either an empty or a single-character C-style char* string (depending on platform endianness).
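A quick way to see this (an illustration added here, not part of the original post) is to look at the bytes the wide-character encodings produce; a C char* reader stops at the first zero byte:
"hello".encode('utf-16-le')   # b'h\x00e\x00l\x00l\x00o\x00'  -> printf("%s") prints just "h"
"hello".encode('utf-32-le')   # b'h\x00\x00\x00e\x00\x00\x00...'  -> same problem, more padding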
Two things to improve:
Define the prototype for print so Python can warn you when you misuse it, adding:
my_functions.print.argtypes = [c_char_p]
before using the function.
Encode str arguments to bytes so they can be converted to valid C-style char* strings:
# For arbitrary string, just encode:
print(my_functions.print(mystr.encode()))
# For a literal, you can pass a bytes literal
print(my_functions.print(b"hello"))
# ^ b makes it a bytes, not str
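Putting both suggestions together, a minimal sketch of the corrected script (the library path is the one from the question):
from ctypes import CDLL, c_char_p, c_int

my_functions = CDLL("/home/ubuntu/string.so")
my_functions.print.argtypes = [c_char_p]   # the C function takes a const char *
my_functions.print.restype = c_int         # and returns an int

my_functions.print(b"hello")               # pass a bytes literal
my_functions.print("any string".encode())  # or encode an arbitrary str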
You have to make the following change: pass a bytes object instead of a str.
This is because, per the ctypes fundamental data types table, C's c_char_p corresponds to Python's bytes object.
from ctypes import *
so_print = "string.so"
my_functions = CDLL(so_print)
print(my_functions.print(b"hello"))
When I am checking data types, I wonder why a char value is returned as string type.
Please see my input and output.
Input:
a = 'c'
type(a)
Output:
<class 'str'>
No: Python does not have a character or char type. All single characters are strings of length one.
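For example, indexing a string simply gives you back another string of length 1:
s = "char"
c = s[0]     # 'c'
type(c)      # <class 'str'>
len(c)       # 1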
There is no built-in type for a character in Python; there are int, str and bytes. If you intend to use a character, you can just go with a str of length 1.
Note that Python is dynamically typed; you do not need to declare the types of your variables.
All strings you create using single quotes ', double quotes " or triple quotes """ are str (Unicode):
type("x")
str
When you invoke the built-in function type, it returns a type object representing the type of your variable:
type(type("x"))
type
Integers and characters do map to each other (an encoding); natively the mapping is the Unicode code point, which matches ASCII for the first 128 characters. See the chr and ord built-in functions.
type(123)
int
type(chr(65))
str
type(ord("x"))
int
If you must handle special characters that are not available in the default charset, you will have to consider encoding:
x = "é".encode()
b'\xc3\xa9'
The encode method converts your string into bytes, and it can be decoded back:
x.decode()
'é'
The encode and decode methods belong to the str and bytes objects, respectively.
There is a bytes type which may be analogous depending on why you are asking.
>>> b'a'
b'a'
https://docs.python.org/3/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit
When defining a variable type that will hold a string in Cython + Python 3, I can use (at least):
cdef char* mystring = "foo"
cdef str mystring = "foo"
cdef bytes mystring = "foo"
The documentation page on strings is unclear on this -- it mostly gives examples using char* and bytes, and frankly I'm having a lot of difficulty understanding it.
In my case the strings will be coming from a Python3 program and are assumed to be unicode. They will be used as dict keys and function arguments, but I will do no further manipulation on them. Needless to say I am trying to maximize speed.
This question suggests that under Python2.7 and without Unicode, typing as str makes string manipulation code run SLOWER than with no typing at all. (But that's not necessarily relevant here since I won't be doing much string manipulation.)
What are the advantages and disadvantages of each of these options?
If there is no further processing done on a particular type, it would be best and fastest to not type them at all, which means they are treated as a general purpose PyObject *.
The str type is a special case: it is the byte string on Python 2 and the Unicode string on Python 3.
So code that types a string as str and handles it as unicode will break on Python 2, where str means bytes.
Strings only need to be typed if they are to be converted to C char* or C++ std::string. There, you would use str to handle py2/py3 compatibility, along with helper functions to convert to/from bytes and unicode in order to be able to convert to either char* or std::string.
Typing of strings is for interoperability with C/C++, not for speed as such. Cython will auto-convert, without copying, a bytes string to a char* for example when it sees something like cdef char* c_string = b_string[:b_len] where b_string is a bytes type.
OTOH, if strings are typed without that type being used, Cython will do a conversion from object to bytes/unicode when it does not need to, which leads to overhead.
This can be seen in the generated C code as __Pyx_PyObject_AsString, __Pyx_PyUnicode_FromString et al.
This is also true in general: the rule of thumb is that if a specific type is not needed for further processing/conversion, it is best not to type it at all. Everything in Python is an object, so typing will convert from the general-purpose PyObject* to something more specific.
Some quick testing revealed that for this particular case, only the str declaration worked -- all other options produced errors. Since the string is generated elsewhere in Python3, evidently the str type declaration is needed.
Whether it is faster not to make any declaration at all remains an open question.
I am facing some problems with strings when switching from Python 2.x to Python 3.
Issue 1:
from ctypes import *
charBuffer=create_string_buffer(1000)
var = charBuffer.value # var contains something like "abc:def:ghi:1234"
a,b,c,d= var.split(':')
It works fine in Python 2.x but not in 3.x, where it throws an error like this:
a,b,c,d= var.split(':')
TypeError: 'str' does not support the buffer interface
I found these links after doing some research on Stack Overflow: link, link2
If I print them, the desired output would be:
a = abc
b = def
c = ghi
d = 1234
Issue 2:
from ctypes import *
cdll = "Windll"
var = 0x1fffffffffffffffffffffff # I want to send this long variable to character pointer which is in cdll
charBuf = create_string_buffer(var.to_bytes(32, 'little'))
cdll.createBuff(charBuf)
The function in the DLL:
#include <stdio.h>
int createBuff(char *charBuff) {
    printf("%s\n", charBuff);
    return 0;
}
I want to send this long variable to a character pointer in the DLL; since it's a character pointer, it throws errors.
I need your valuable inputs on how I could achieve this. Thanks in advance.
In Python 3.x, .value on the object returned by create_string_buffer() gives a byte string.
In your example you are trying to split that byte string using a Unicode string (which is the normal string in Python 3.x). This is what is causing your issue.
You would need to either split with a byte string, for example:
a,b,c,d = var.split(b':')
Or you can decode the byte string to a Unicode string using the .decode() method on it, for example:
var = var.decode('<encoding>')
Split using b":" and you will be fine in both versions of python.
In py2 str is a bytestring, in py3 str is a unicode object. The object returned by the ctypes string buffer is a bytestring (str on py2 and bytes on py3). By writing the string literal as b"... you force it to be a bytestring in both version of python.
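Putting both answers together, a minimal sketch (the buffer contents below are just a stand-in for whatever the C code actually wrote, and 'ascii' is an assumed encoding):
from ctypes import create_string_buffer

charBuffer = create_string_buffer(b"abc:def:ghi:1234")
var = charBuffer.value                       # bytes on Python 3, str on Python 2

a, b, c, d = var.split(b':')                 # split with a bytes separator
a, b, c, d = var.decode('ascii').split(':')  # or decode first, then split with a str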
Sending a string from Python to C++ using Python's ctypes module requires you to pass it as a c_char_p (a char *). I've found I need to use a plain Python byte string and not a Python unicode string. If I use the unicode string, the variables just get overwritten instead of being sent properly. Here's an example:
C++
void test_function(char * a, char * b, char * c) {
printf("Result: %s %s %s", a, b, c);
}
Python
... load c++ using ctypes ...
lib.test_function.argtypes = [c_char_p, c_char_p, c_char_p]
lib.test_function(u'x', u'y', u'z')
lib.test_function('x', 'y', 'z')
Running the above Python code gives the following in stdout:
Result: z z z
Result: x y z
Why is this? Is this a quirk of ctypes? What's an elegant way to avoid this quirk if I am getting unicode strings?
Thanks!
C/C++ has no real support for Unicode, so there really isn't anything you can do about it. You must encode your strings in order to pass them into the C/C++ world: you could use UTF-8, UTF-16, or UTF-32 depending on your use case.
For example, you can encode them as UTF-8 and pass in an array of bytes (bytes in Python and char * in C/C++):
lib.test_function(u'x'.encode('utf8'),
u'y'.encode('utf8'),
u'z'.encode('utf8'))
Exactly which encoding you pick is another story, but it will depend on what your C++ library is willing to accept.
Try c_wchar instead of c_char:
https://docs.python.org/2/library/ctypes.html
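That only helps if the C side is changed to accept wchar_t * (and prints with %ls); under that assumption, ctypes can pass Python unicode strings directly, roughly like this (the library path is hypothetical):
from ctypes import CDLL, c_wchar_p

lib = CDLL("./libtest.so")                   # hypothetical path
lib.test_function.argtypes = [c_wchar_p, c_wchar_p, c_wchar_p]
lib.test_function(u'x', u'y', u'z')          # no encoding step needed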
I need to convert between Python objects and C strings of various encodings. Going from a C string to a unicode object was fairly simple using PyUnicode_Decode; however, I'm not sure how to go the other way.
//char* can be a wchar_t or any other element size, just make sure it is correctly terminated for its encoding
Unicode(const char *str, size_t bytes, const char *encoding="utf-16", const char *errors="strict")
:Object(PyUnicode_Decode(str, bytes, encoding, errors))
{
//check for any python exceptions
ExceptionCheck();
}
I want to create another function that takes the Python Unicode string and puts it in a buffer using a given encoding, e.g.:
//fills buffer with a null terminated string in encoding
void AsCString(char *buffer, size_t bufferBytes,
const char *encoding="utf-16", const char *errors="strict")
{
...
}
I suspect it has something to do with PyUnicode_AsEncodedString; however, that returns a PyObject, so I'm not sure how to put that into my buffer...
Note: both methods above are members of a c++ Unicode class that wraps the python api
I'm using Python 3.0
I suspect it has something to do with PyUnicode_AsEncodedString; however, that returns a PyObject, so I'm not sure how to put that into my buffer...
The PyObject returned is a bytes object (a PyBytesObject; on Python 2 it would be a PyStringObject), so you just need to use PyBytes_Size and PyBytes_AsString (PyString_Size and PyString_AsString on Python 2) to get a pointer to the string's buffer and memcpy it into your own buffer.
If you're looking for a way to go directly from a PyUnicode object into your own char buffer, I don't think that you can do that.