Reading a Delphi binary file in Python - python

I have a file that was written with the following Delphi declaration ...
Type
Tfulldata = Record
dpoints, dloops : integer;
dtime, bT, sT, hI, LI : real;
tm : real;
data : array[1..armax] Of Real;
End;
...
Var:
fh: File Of Tfulldata;
I want to analyse the data in the files (many MB in size) using Python if possible - is there an easy way to read in the data and cast the data into Python objects similar in form to the Delphi records? Does anyone know of a library perhaps that does this?
This is compiled on Delphi 7 with the following options which may (or may not) be pertinent,
Record Field Alignment: 8
Pentium Safe FDIV: False
Stack Frames: False
Optimization: True

Here is the full solutions thanks to hints from KillianDS and Ritsaert Hornstra
import struct
fh = open('my_file.dat', 'rb')
s = fh.read(40256)
vals = struct.unpack('iidddddd5025d', s)
dpoints, dloops, dtime, bT, sT, hI, LI, tm = vals[:8]
data = vals[8:]

I do not know how Delphi internally stores data, but if it is as simple byte-wise data (so not serialized and mangled), use struct. This way you can treat a string from a python file as binary data. Also, open files as binary file(open,'rb').

Please note that when you define a record in Delphi (like struct in C) the fields are layed out in order and in binary given the current alignment (eg Bytes are aligned on 1 byte boundaries, Words on 2 byte, Integers on 4 byte etc, but it may vary given the compiler settings.
When serialized to a file, you probably mean that this record is written in binary to the file and the next record is written after the first one starting at position sizeof( structure) etc etc. Delphi does not specify how thing should be serialized to/from file, So the information you give leaves us guessing.
If you want to make sure it is always the same without interference of any compiler setings, use packed record.
Real can have multiple meanings (it is an 48 bit float type for older Delphi versions and later on a 64 bit float (IEEE double)).
If you cannot access the Delphi code or compile it yourself, just ty to check the data with a HEX editor, you should see the boundaries of the records clearly since they start with Integers and only floats follow.

Related

c++ read map in binary file which created in python

I created a Python script which creates the following map (illustration):
map<uint32_t, string> tempMap = {{2,"xx"}, {200, "yy"}};
and saved it as map.out file (a binary file).
When I try to read the binary file from C++, it doesn't copy the map, why?
map<uint32_t, string> tempMap;
ifstream readFile;
std::streamsize length;
readFile.open("somePath\\map.out", ios::binary | ios::in);
if (readFile)
{
readFile.ignore( std::numeric_limits<std::streamsize>::max() );
length = readFile.gcount();
readFile.clear(); // Since ignore will have set eof.
readFile.seekg( 0, std::ios_base::beg );
readFile.read((char *)&tempMap,length);
for(auto &it: tempMap)
{
/* cout<<("%u, %s",it.first, it.second.c_str()); ->Prints map*/
}
}
readFile.close();
readFile.clear();
It's not possible to read in raw bytes and have it construct a map (or most containers, for that matter)[1]; and so you will have to write some code to perform proper serialization instead.
If the data being stored/loaded is simple, as per your example, then you can easily devise a scheme for how this might be serialized, and then write the code to load it. For example, a simple plaintext mapping can be established by writing the file with each member after a newline:
<number>
<string>
...
So for your example of:
std::map<std::uint32_t, std::string> tempMap = {{2,"xx"}, {200, "yy"}};
this could be encoded as:
2
xx
200
yy
In which case the code to deserialize this would simply read each value 1-by-1 and reconstruct the map:
// Note: untested
auto loadMap(const std::filesystem::path& path) -> std::map<std::uint32_t, std::string>
{
auto result = std::map<std::uint32_t, std::string>{};
auto file = std::ifstream{path};
while (true) {
auto key = std::uint32_t{};
auto value = std::string{};
if (!(file >> key)) { break; }
if (!std::getline(file, value)) { break; }
result[key] = std::move(value);
}
return result;
}
Note: For this to work, you need your python program to output the format that will be read from your C++ program.
If the data you are trying to read/write is sufficiently complicated, you may look into different serialization interchange formats. Since you're working between python and C++, you'll need to look into libraries that support both. For a list of recommendations, see the answers to Cross-platform and language (de)serialization
[1]
The reason you can't just read (or write) the whole container as bytes and have it work is because data in containers isn't stored inline. Writing the raw bytes out won't produce something like 2 xx\n200 yy\n automatically for you. Instead, you'll be writing the raw addresses of pointers to indirect data structures such as the map's internal node objects.
For example, a hypothetical map implementation might contain a node like:
template <typename Key, typename Value>
struct map_node
{
Key key;
Value value;
map_node* left;
map_node* right;
};
(The real map implementation is much more complicated than this, but this is a simplified representation)
If map<Key,Value> contains a map_node<Key,Value> member, then writing this out in binary will write the binary representation of key, value, left, and right -- the latter of which are pointers. The same is true with any container that uses indirection of any kind; the addresses will fundamentally differ between the time they are written and read, since they depend on the state of the program at any given time.
You can write a simple map_node to test this, and just print out the bytes to see what it produces; the pointer will be serialized as well. Behaviorally, this is the exact same as what you are trying to do with a map and reading from a binary file. See the below example which includes different addresses.
Live Example
You can use protocol buffers to serialize your map in python and deserialization can be performed in C++.
Protocol buffers supports both Python and C++.

Python : file.seek(10000000000, 2000000000). Python int too large to convert to C long

im developing a program that downloads "big files" from the internet (from 200mb to 5Gb) using threads and file.seek to find a offset and insert the data to a main file, but when i try to set the offset above the 2147483647 byte (exceeds the C long max value) it gives the int too large to convert to C long error. how can i work around this? Bellow is a representation of my script code.
f = open("bigfile.txt")
#create big file
f.seek(5000000000-1)
f.write("\0")
#try to get the offset, this gives the error (Python int too large to convert to C long)
f.seek(3333333333, 4444444444)
I wouldn't be asking (because it has been asked a lot) if i really found a solution to this.
I read about casting it to a int64 and use something like UL but i dont really understand it. I hope you can help or at least try make this clearer in my head. xD
f.seek(3333333333, 4444444444)
That second argument is supposed to be the from_where argument, dictating whether you're seeking from:
the file start, os.SEEK_SET or 0;
the current position, os.SEEK_CUR or 1;
the end of the file, os.SEEK_END or 2.
4444444444 is not one of the allowed values.
The following program works fine:
import os
f = open("bigfile.txt",'w')
f.seek(5000000000-1)
f.write("\0")
f.seek(3333333333, os.SEEK_SET)
print f.tell() # 'print(f.tell())' for Python3
and outputs 3333333333 as expected.

Passing a record over a socket

I have basic socket communication set up between python and Delphi code (text only). Now I would like to send/receive a record of data on both sides. I have a Record "C compatible" and would like to pass records back and forth have it in a usable format in python.
I use conn.send("text") in python to send the text but how do I send/receive a buffer with python and access the record items sent in python?
Record
TPacketData = record
pID : Integer;
dataType : Integer;
size : Integer;
value : Double;
end;
I don't know much about python, but I have done a lot between Delphi, C++, C# and Java even with COBOL.Anyway, to send a record from Delphi to C first you need to pack the record at both ends,
in Deplhi
MyRecord = pack record
in C++
#pragma pack(1)
I don’t know in python but I guess there must be a similar one. Make sure that at both sides the sizeof(MyRecord) is the same length.Also, before sending the records, you should take care about byte ordering (you know, Little-Endian vs Big-Endian), use the Socket.htonl() and Socket.ntohl() in python and the equivalent in Deplhi which are in WinSock unit. Also a "double" in Delphi could not be the same as in python, in Delphi is 8 bytes check this as well, and change it to Single(4 bytes) or Extended (10 bytes) whichever matches.
If all that match then you could send/receive binary records in one shut, otherwise, I'm afraid, you have to send the individual fields one by one.
I know this answer is a bit late to the game, but may at least prove useful to other people finding this question in their search-results. Because you say the Delphi code sends and receives "C compatible data" it seems that for the sake of the answer about Python's handling it is irrelevant whether it is Delphi (or any other language) on the other end...
The python struct and socket modules have all the functionality for the basic usage you describe. To send the example record you would do something like the below. For simplicity and sanity I have presumed signed integers and doubles, and packed the data in "network order" (bigendian). This can easily be a one-liner but I have split it up for verbosity and reusability's sake:
import struct
t_packet_struc = '>iiid'
t_packet_data = struct.pack(t_packet_struc, pid, data_type, size, value)
mysocket.sendall(t_packet_data)
Of course the mentioned "presumptions" don't need to be made, given tweaks to the format string, data preparation, etc. See the struct inline help for a description of the possible format strings - which can even process things like Pascal-strings... By the way, the socket module allows packing and unpacking a couple of network-specific things which struct doesn't, like IP-address strings (to their bigendian int-blob form), and allows explicit functions for converting data bigendian-to-native and vice-versa. For completeness, here is how to unpack the data packed above, on the Python end:
t_packet_size = struct.calcsize(t_packet_struc)
t_packet_data = mysocket.recv(t_packet_size)
(pid, data_type, size, value) = struct.unpack(t_packet_struc,
t_packet_data)
I know this works in Python version 2.x, and suspect it should work without changes in Python version 3.x too. Beware of one big gotcha (because it is easy to not think about, and hard to troubleshoot after the fact): Aside from different endianness, you can also distinguish between packing things using "standard size and alignment" (portably) or using "native size and alignment" (much faster) depending on how you prefix - or don't prefix - your format string. These can often yield wildly different results than you intended, without giving you a clue as to why... (there be dragons).

pefile How do I nullify the first 8 bytes of a file?

How do I nullify the first 8 bytes of a file?
this example does not work:
import pefile
pe = pefile.pe(In)
pe.set_dword_at_rva(0,0)
pe.set_dword_at_rva(0,4)
pe.write(Out)
pe.close()
How i can rename import functions in the file?
this example does not work:
for entry in pe.DIRECTORY_ENTRY_IMPORT:
print entry.dll
for imp in entry.imports:
imp.name = 'NewIMports'
pe.write(Out)
sorry for my english
I'd propose to use the standard way (i. e. not using pefile) of doing this:
with file('filename.bla', 'wr+') as f:
f.write('\0' * 8)
You've got multiple problems with your code, but I'm not sure which ones are causing whatever problems you're experiencing (since you haven't explained the problems).
First, you want to set the first 8 bytes to 0, but you're using set_dword_at_rva rather than set_dword_at_offset. An RVA ("Relative Virtual Address") is the offset in memory to the runtime address of a section (whatever that runtime address ends up being). An offset is the offset on disk from the start of the file. If that's what you want, use that. (While you're at it, the qword functions are the way to set 8 bytes at a time. Using dword instead can't cause any problems except for endianness, which can't possibly matter for just setting to 0… but still, why make it harder on yourself?)
So:
pe = pefile.pe(In)
pe.set_qword_at_offset(0, 0)
Meanwhile, if you're modifying arbitrary strings within the file, while pefile can make room for them, it does not adjust any other headers that need to be adjusted to compensate. See "Notes about the write support" in the docs for details. So, imp.name = 'NewIMports' may work, but generate a broken PE, if the old name was shorter.
On top of that, renaming all of the imports to have the same name will definitely generate a broken PE.

Python: parse VARIANT (?)

I have to read a file in python that uses Microsoft VARIANT (I think - I really don't know much about Microsoft code :S). Basically I want to know if there are python packages that can do this for me.
To explain - the file I'm trying to read is just a whole bunch of { 2-byte integer, <data> } repeated over and over, where the 2-byte integer specifies what the <data> is.
The 2-byte integer corresponds to the Microsoft data types in VARIANT: VT_I2, VT_I4, etc, and based on the type I can write code to read in and coerce <data> to an appropriate Python object.
My current attempt is along the following lines:
while dtype = file.read(2):
value = None
# translate dtype (I've put in VT_XX myself to match up with Microsoft)
if dtype == VT_I2:
value = file.read(2)
elif dtype == VT_I4:
value = file.read(4)
# ... and so on for other types
# append value to the list of values
# return the values we read
return values
The thing is, I'm having trouble working out how to convert some of the bytes to the appropriate Python object (for example VT_BSTR, VT_DECIMAL, VT_DATE). However before I try further, I'd like to know if there are any existing python packages that do this logic for me (i.e. take in a file object/bytes and parse it into a set of python objects, be they float, int, dates, strings, ...).
It just seems like this is a fairly common thing to do.
However, I've been having difficulty looking for packages to do it because not knowing anything about Microsoft code, I don't have the terminology to do the appropriate googling. (If it is relevant, I am running LINUX).
The win32com package in pywin32 will do just that for you. The documentation is quite underwhelming, but there's a lot variant.html included explaining the basic use and a lot of tutorials and references online.

Categories