Say I have the following code in C++:
union {
int32_t i;
uint32_t ui;
};
i = SomeFunc();
std::string test(std::to_string(ui));
std::ofstream outFile(test);
And say I had the value of i somehow in Python, how would I be able to get the name of the file?
For those of you who are unfamiliar with C++: I am writing a value to i as a signed 32-bit integer and then reading the same bits back through ui as an unsigned 32-bit integer. The same 32 bits are interpreted in two different ways.
How can I do this in Python? There does not seem to be any explicit type specification in Python, so how can I reinterpret some set of bits in a different way?
EDIT: I am using Python 2.7.12
I would use Python's struct module to interpret bits in different ways.
Something like the following prints -12 as an unsigned integer:
import struct
# "=" selects native byte order with standard sizes, so "i" and "I" are exactly 4 bytes
p = struct.pack("=i", -12)
print("{}".format(struct.unpack("=I", p)[0]))
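As a rough sketch of the same reinterpretation without the pack/unpack round trip, masking to 32 bits gives the unsigned view directly. The helper names to_unsigned32 and to_signed32 are made up for this example, and -12 stands in for whatever SomeFunc() returned:

```python
import ctypes
import struct


def to_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned by masking to 32 bits."""
    return value & 0xFFFFFFFF


def to_signed32(value):
    """Opposite direction: reinterpret an unsigned 32-bit pattern as signed."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


i = -12  # stand-in for the value returned by SomeFunc()
file_name = str(to_unsigned32(i))
print(file_name)

# Cross-check against struct and ctypes, which reinterpret the same bits.
assert to_unsigned32(i) == struct.unpack("=I", struct.pack("=i", i))[0]
assert to_unsigned32(i) == ctypes.c_uint32(i).value
```

The mask works because Python integers behave as if they were two's complement with unlimited width, so keeping the low 32 bits matches what the C++ union exposes through ui.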
I have an 8-byte packed string (bytes) with the following structure:
typedef struct _entry_t {
uint start;
ushort size;
ushort id;
} _entry_t;
I want to know how I can unpack the entire string in the above format and extract the member values in the easiest way possible (one line, maybe).
Take a look at the struct package.
Suppose you get the data as bytes and have it stored in the variable data. You can then decode it with the following code:
import struct
# I = unsigned int (start), H = unsigned short (size, id)
start, size, entry_id = struct.unpack('IHH', data)
Depending on the platform the C code runs on, you might want to think about endianness (add ">" or "<" as a prefix to the format string) and whether the struct needs __attribute__((packed)). I assumed that on your platform an int is 32 bits long and a short is 16 bits long.
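To make the endianness point concrete, here is a small sketch that pins the layout down explicitly. It assumes the producer wrote little-endian data with no padding; the sample values are invented just to have something to unpack:

```python
import struct

# Assumed layout: little-endian, no padding, uint = 4 bytes, ushort = 2 bytes.
ENTRY_FORMAT = "<IHH"

# Hypothetical sample record, built here only to exercise the code.
raw = struct.pack(ENTRY_FORMAT, 4096, 512, 7)

# With an explicit byte-order prefix the size does not depend on the platform.
assert struct.calcsize(ENTRY_FORMAT) == 8

start, size, entry_id = struct.unpack(ENTRY_FORMAT, raw)
print(start, size, entry_id)
```

If the data comes from a big-endian source, swapping "<" for ">" is the only change needed.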
During a Cython meetup, a speaker pointed to other data types such as cython.ssize_t. The type ssize_t is briefly mentioned in this Wikipedia article; however, it is not well explained. Similarly, the Cython documentation only mentions types in terms of how they are automatically converted.
What are all the data types available in Cython and what are their specifications?
You basically have access to most of the C types.
Here are the equivalents of all the Python types (unless I missed some), taken from the O'Reilly Cython book:
Python bool:
bint (boolean stored as a C int, typically 4 bytes, and converted to and from Python bool)
Python int and long
[unsigned] char
[unsigned] short
[unsigned] int
[unsigned] long
[unsigned] long long
Python float
float
double
long double
Python complex
float complex
double complex
Python bytes / str / unicode
char *
std::string
As for size_t and Py_ssize_t, keep in mind that these are typedefs (aliases) for platform-dependent integer types.
Py_ssize_t is defined in Python.h, which Cython includes implicitly. It is a signed type that can hold the size (in bytes) of the largest object the Python interpreter can ever create.
size_t, on the other hand, is a standard C89 unsigned type defined in <stddef.h>.
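The actual widths of these types depend on the platform and compiler. As a rough check from plain Python (this is ctypes, not Cython, and only illustrates the sizes on the machine it runs on), something like the following can be used:

```python
import ctypes
import sys

size_t_bytes = ctypes.sizeof(ctypes.c_size_t)
ssize_t_bytes = ctypes.sizeof(ctypes.c_ssize_t)
print("size_t: %d bytes, ssize_t: %d bytes" % (size_t_bytes, ssize_t_bytes))

# On CPython, sys.maxsize is the largest value a Py_ssize_t can hold.
print(sys.maxsize == 2 ** (8 * ssize_t_bytes - 1) - 1)

# Sizes of some of the other C types listed above, as seen by ctypes.
for name in ("c_short", "c_int", "c_long", "c_longlong", "c_float", "c_double"):
    print("%s: %d bytes" % (name, ctypes.sizeof(getattr(ctypes, name))))
```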
As in C/C++, we can print the memory content of a variable as below:
double d = 234.5;
unsigned char *p = (unsigned char *)&d;
size_t i;
for (i=0; i < sizeof d; ++i)
printf("%02x\n", p[i]);
Yes, I know we can use pickle.dump() to serialize an object, but it seems to generate a lot of redundant data.
How can we achieve this in pure Python?
The internal memory representation of a Python object is not accessible from Python code itself; to get at it you would need to write a C extension.
If you're designing your own serialization protocol, then maybe the struct module is what you're looking for. It converts Python values to binary data and back in the format you specify. For example:
import struct
print(list(struct.pack('d', 3.14)))
displays [31, 133, 235, 81, 184, 30, 9, 64] on a little-endian machine, because those are the byte values of the double-precision representation of 3.14.
NOTE: struct.pack returns a bytes object in Python 3.x but an str object in Python 2.x. To see the numeric code of the bytes in Python 2.x you need to use print map(ord, struct.pack(...)) instead.
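A rough Python counterpart of the C loop from the question could look like the sketch below. Note that it prints the bytes of the C double value, not the memory of the Python float object itself; the function name double_bytes_hex is made up for this example:

```python
import struct


def double_bytes_hex(value, byte_order="@"):
    """Return the bytes of an IEEE 754 double as two-digit hex strings."""
    packed = struct.pack(byte_order + "d", value)
    # bytearray yields integers on both Python 2 and Python 3.
    return ["%02x" % b for b in bytearray(packed)]


# Native byte order by default, like the C code in the question.
for hex_byte in double_bytes_hex(234.5):
    print(hex_byte)
```

Passing "<" or ">" as byte_order forces little- or big-endian output regardless of the machine.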
You cannot do this in pure Python. You could write a Python extension module in C that does exactly what you ask for, but it would probably not be very useful. You can read more about extension modules here
I assume that by Python you mean CPython, and not PyPy, Jython or IronPython.
Is there a way in python to unpack C structures created using #pragma pack(x) or __attribute__((packed)) using structs?
Alternatively, how to determine the manner in which python struct handles padding?
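Use the struct module.
It is flexible in terms of byte order (big- vs. little-endian) and alignment (packing). See Byte Order, Size, and Alignment. By default it uses native byte order, size, and alignment (pretty much meaning however Python was compiled); a format prefix of =, <, > or ! switches to standard sizes with no padding, which is what a packed struct needs.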
Native example
C:
struct foo {
int bar;
char t;
char x;
};
Python:
struct.pack('ibb', bar, t, x)
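To see how the struct module handles padding, struct.calcsize can be compared across format prefixes. The sketch below uses a hypothetical layout (a char followed by an int) because that is where native alignment visibly inserts padding; the exact native sizes depend on the platform:

```python
import struct

# Hypothetical layout used only for illustration: struct { char tag; int value; }
NATIVE = "@bi"  # native alignment: padding is inserted before the int
PACKED = "=bi"  # standard sizes, no padding, like a packed struct

print(struct.calcsize(NATIVE))
print(struct.calcsize(PACKED))

data = struct.pack(PACKED, 1, 1000)
tag, value = struct.unpack(PACKED, data)

# In native mode, struct does not add trailing padding on its own. A zero
# repeat count such as "0i" aligns the end of the format to an int boundary,
# which approximates sizeof() for a struct like foo (int, char, char).
print(struct.calcsize("@ibb"), struct.calcsize("@ibb0i"))
```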
I often have to write code in other languages that interact with C structs. Most typically this involves writing Python code with the struct or ctypes modules.
So I'll have a .h file full of struct definitions, and I have to manually read through them and duplicate those definitions in my Python code. This is time consuming and error-prone, and it's difficult to keep the two definitions in sync when they change frequently.
Is there some tool or library in any language (doesn't have to be C or Python) which can take a .h file and produce a structured list of its structs and their fields? I'd love to be able to write a script to automatically generate my struct definitions in Python, and I don't want to have to process arbitrary C code to do it. Regular expressions would work great about 90% of the time and then cause endless headaches for the remaining 10%.
If you compile your C code with debugging (-g), pahole (git) can give you the exact structure layouts being used.
$ pahole /bin/dd
…
struct option {
const char * name; /* 0 8 */
int has_arg; /* 8 4 */
/* XXX 4 bytes hole, try to pack */
int * flag; /* 16 8 */
int val; /* 24 4 */
/* size: 32, cachelines: 1, members: 4 */
/* sum members: 24, holes: 1, sum holes: 4 */
/* padding: 4 */
/* last cacheline: 32 bytes */
};
…
This should be quite a lot nicer to parse than straight C.
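As a minimal sketch of what such parsing might look like, the regular expression below pulls type, name, offset and size out of member lines shaped like the ones in the listing above. It is an assumption about the output format based only on that listing, and it ignores arrays, bitfields and nested structs:

```python
import re

# Matches lines such as "const char * name; /* 0 8 */".
MEMBER_RE = re.compile(
    r"^\s*(?P<type>.+?)\s*(?P<name>\w+);"
    r"\s*/\*\s*(?P<offset>\d+)\s+(?P<size>\d+)\s*\*/"
)


def parse_members(pahole_text):
    """Return (type, name, offset, size) tuples for each member line."""
    members = []
    for line in pahole_text.splitlines():
        match = MEMBER_RE.match(line)
        if match:
            members.append((
                match.group("type").strip(),
                match.group("name"),
                int(match.group("offset")),
                int(match.group("size")),
            ))
    return members


SAMPLE = """\
struct option {
const char * name; /* 0 8 */
int has_arg; /* 8 4 */
/* XXX 4 bytes hole, try to pack */
int * flag; /* 16 8 */
int val; /* 24 4 */
/* size: 32, cachelines: 1, members: 4 */
};
"""

for member in parse_members(SAMPLE):
    print(member)
```

Because each member line carries its own offset and size, padding holes can be derived from the gaps between members instead of being guessed from the C declaration.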
"Regular expressions would work great about 90% of the time and then cause endless headaches for the remaining 10%."
The headaches happen in the cases where the C code contains syntax that you didn't think of when writing your regular expressions. Then you go back and realise that C can't really be parsed by regular expressions, and life becomes not fun.
Try turning it around: define your own simple format, which allows fewer tricks than C does, and generate both the C header file and the Python interface code from your file:
define socketopts
int16 port
int32 ipv4address
int32 flags
Then you can easily write some Python to convert this to:
typedef struct {
short port;
int ipv4address;
int flags;
} socketopts;
and also to emit a Python class which uses struct to pack/unpack three values (possibly two of them big-endian and the other native-endian, up to you).
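A very small sketch of such a generator is shown below. The TYPE_MAP table and the function names are invented for this example, and the mapping assumes a platform where short is 16 bits and int is 32 bits, as in the C output above:

```python
# Hypothetical mapping from the custom type names to C types and struct codes.
TYPE_MAP = {"int16": ("short", "h"), "int32": ("int", "i")}

SPEC = """\
define socketopts
int16 port
int32 ipv4address
int32 flags
"""


def parse_spec(text):
    """Split the spec into the struct name and a list of (type, field) pairs."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    name = lines[0][1]
    fields = [(type_name, field) for type_name, field in lines[1:]]
    return name, fields


def emit_c(name, fields):
    """Render the C typedef for the parsed spec."""
    body = "".join("    %s %s;\n" % (TYPE_MAP[t][0], f) for t, f in fields)
    return "typedef struct {\n%s} %s;\n" % (body, name)


def emit_struct_format(fields, byte_order="@"):
    """Render the struct-module format string; "@" mirrors native C layout."""
    return byte_order + "".join(TYPE_MAP[t][1] for t, _ in fields)


name, fields = parse_spec(SPEC)
c_header = emit_c(name, fields)
fmt = emit_struct_format(fields)
print(c_header)
print(fmt)
```

Using the "@" prefix keeps the Python side aligned the same way the C compiler aligns the generated struct on the same machine; a per-field byte order would need a slightly richer spec format.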
Have a look at Swig or SIP that would generate interface code for you or use ctypes.
Have you looked at Swig?
I have quite successfully used GCCXML on fairly large projects. You get an XML representation of the C code (including structures) which you can post-process with some simple Python.
ctypes-codegen or ctypeslib (same thing, I think) will generate ctypes Structure definitions (also other things, I believe, but I only tried structs) by parsing header files using GCCXML. It's no longer supported, but will likely work in some cases.
A friend of mine wrote a C parser for this task, which he uses with cog.