MATLAB vs. Python Binary File Read - python

I have a MATLAB application that reads a .bin file and parses through the data. I am trying to convert this script from MATLAB to Python but am seeing discrepancies in the values being read.
The read function utilized in the MATLAB script is:
fname = 'file.bin';
f=fopen(fname);
data = fread(f, 100);
fclose(f);
The Python conversion I attempted is: (edited)
fname = 'file.bin'
with open(fname, mode='rb') as f:
    data = list(f.read(100))
I would then print a side-by-side comparison of the read bytes with their index and found discrepancies between the two. I have confirmed that the values read in Python are correct by executing $ hexdump -n 100 -C file.bin and by viewing the file's contents on the application HexEdit.
I would appreciate any insight into the source of discrepancies between the two programs and how I may be able to resolve it.
Note: I am trying to only utilize built-in Python libraries to resolve this issue.
Solution: The discrepancy came from using an incorrect file path/structure between the two programs. Implementing @juanpa.arrivillaga's suggestion cleanly reproduced the MATLAB results.

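An exact translation of the MATLAB code, using NumPy, would be the following (with f being the file opened in binary mode, as in the question). MATLAB's fread with no precision argument reads unsigned 8-bit values and returns them as doubles, hence the conversion to float64:
import numpy as np
data = np.frombuffer(f.read(100), dtype=np.uint8).astype(np.float64)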

Python automatically converts single bytes into unsigned integers when you index or iterate over a bytes object, matching what MATLAB reads, so you just need to do the following:
fname = 'file.bin'
with open(fname, mode='rb') as f:
    bytes_arr = f.read(100)
# Conversion for visual comparison purposes
data = [x for x in bytes_arr]
print(data)
Also, welcome to Python. bytes is a built-in type, so please don't override it with a variable of the same name, or you'll run into unexpected problems.
Edit: as pointed out by @juanpa.arrivillaga, you could use the faster
fname = 'file.bin'
with open(fname, mode='rb') as f:
    bytes_arr = f.read(100)
# Conversion for visual comparison purposes
data = list(bytes_arr)
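For a like-for-like comparison with MATLAB using only the standard library, the bytes can be converted to floats, since fread(f, 100) without a precision argument returns doubles. The sketch below is a minimal illustration; the helper names read_uint8_as_doubles and describe_file are hypothetical. The second helper reports the absolute path and size, which helps rule out the two programs opening different files (the cause reported in the question).

```python
import os


def read_uint8_as_doubles(path, count):
    """Read up to `count` bytes and return them as floats in 0.0..255.0."""
    with open(path, mode='rb') as f:
        raw = f.read(count)
    # Iterating over a bytes object yields ints in range(256)
    return [float(b) for b in raw]


def describe_file(path):
    """Return (absolute path, size in bytes) of the file actually opened."""
    full = os.path.abspath(path)
    return full, os.path.getsize(full)
```

Printing describe_file('file.bin') next to MATLAB's output of which(fname) or pwd is a quick way to confirm both sides are looking at the same file.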

Related

How can I produce a similar output from file reading in python but in php

I have a program I am converting from Python to PHP. It starts by reading a file and then unpacks the data. During the conversion I realised that the unpacking formats differ, and when I tested against the Python output, PHP did not produce the same output.
Is it possible to produce a similar output from the fopen and fread function as the python open function output?
Currently I have the python producing this byte string: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
and I would like the same output in PHP, but all I seem to get is a repeated run of this symbol: � (null bytes).
Any ideas on how I could produce a byte string like the one in Python, but in PHP? Is there a way to use the Python open function in PHP, or to reproduce it so it can be used in PHP?
This is my basic php code:
$file = fopen($filename, "rb");
$contents = fread($file, filesize($filename));
This is my basic python code for reading the file:
f = open(filename, 'rb')
f.read()

The result of zlib has different tail between python and php

The php code is
'''
$input_file = "a.txt";
$source = file_get_contents($input_file);
$source = gzcompress($source);
file_put_contents("php.txt", $source);
'''
The python code is
'''
import zlib
testFile = "a.txt"
content = None
with open(testFile, "rb") as f:
    content = f.read()
outContent = zlib.compress(content)
with open("py.txt", "wb") as f:
    f.write(outContent)
'''
The python3 version is [Python 3.6.9]
The php version is [PHP 7.2.17]
I need both to produce the same output so that the MD5 hashes match.
The problem is not in PHP or Python, but rather in your "need". You cannot expect to get the same result, unless the two environments happen to be using the same version of the same compression code with the same settings. Since you do not have control of the version of code being used, your "need" can never be guaranteed to be met.
You should instead be doing your md5 on the decompressed data, not the compressed data.
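A minimal sketch of that suggestion, assuming the payload is a zlib stream as produced by zlib.compress or gzcompress. The helper name md5_of_original and the sample payload are made up for illustration. The compressed bytes may differ between settings, but the digest of the decompressed data does not depend on them.

```python
import hashlib
import zlib


def md5_of_original(compressed):
    """Decompress a zlib stream and return the MD5 hex digest of the payload."""
    return hashlib.md5(zlib.decompress(compressed)).hexdigest()


content = b"example payload " * 64
fast = zlib.compress(content, 1)   # fastest compression level
small = zlib.compress(content, 9)  # best compression level

# The compressed streams are not guaranteed to be byte-identical
print(fast == small)
# The digests of the decompressed data match regardless of settings
print(md5_of_original(fast) == md5_of_original(small))
```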
I found the solution.
The code is
compress = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, 15, 9)
outContent = compress.compress(content)
outContent += compress.flush()
Python's zlib provides the zlib.compressobj interface, which returns a compression object; its parameters (level, method, wbits, memLevel) determine the output.
You can adjust these parameters until Python's result matches PHP's.

Python generated lzma file with missing uncompressed size

According to https://svn.python.org/projects/external/xz-5.0.3/doc/lzma-file-format.txt
The lzma header should look something like this
1.1. Header
+------------+----+----+----+----+--+--+--+--+--+--+--+--+
| Properties | Dictionary Size | Uncompressed Size |
+------------+----+----+----+----+--+--+--+--+--+--+--+--+
I tried to generate lzma file of a 16kb *.bin file by using:
1.) the lzma.exe given by 7z standard SDK (with -d23 argument, 2^23 dict size) and then
2.) tried to generate in python using following code
import lzma
fileName = "file_split0_test.bin"
testFileName = "file_split0_test.lzma"
lzma_machine = lzma.LZMACompressor(format=lzma.FORMAT_ALONE)
with open(fileName, "rb") as fileRead:
    byteRead = fileRead.read()
# flush() is needed to emit the data still buffered in the compressor
data_out = lzma_machine.compress(byteRead) + lzma_machine.flush()
#print(data_out.hex())
with open(testFileName, 'wb') as fs:
    fs.write(data_out)
However, the two results differ even though I am using the same "Properties" byte (5d) and dictionary size (0x8000). The Python-generated lzma file has all 0xFF in the "Uncompressed Size" field, unlike the one generated using lzma.exe.
Hopefully an expert can point out my mistake here?
lzma.exe generated file
python lzma generated file
I was experiencing the same problem, and I can now say that you are probably not making any mistake. It looks like modern lzma implementations don't write the uncompressed size into the header. They use "unknown size" (the value -1, i.e. all 0xFF bytes), which is sufficient for modern lzma decompressors. However, if you need the uncompressed size in the header, simply replace those bytes:
uncompressed_size = len(byteRead)
data_out = data_out[:5] + uncompressed_size.to_bytes(8, 'little') + data_out[13:]
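To check the header fields before and after patching, the 13-byte header can be unpacked with struct. This is a sketch under the layout quoted in the question (1 byte of properties, 4 bytes of dictionary size, 8 bytes of uncompressed size, all little-endian); the helper names are hypothetical.

```python
import lzma
import struct

# All 0xFF bytes in the size field means "unknown size"
UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF


def compress_alone(payload):
    """Compress to the legacy .lzma (FORMAT_ALONE) container."""
    compressor = lzma.LZMACompressor(format=lzma.FORMAT_ALONE)
    return compressor.compress(payload) + compressor.flush()


def parse_alone_header(data):
    """Return (properties, dict_size, uncompressed_size) from the first 13 bytes."""
    return struct.unpack("<BIQ", data[:13])


def patch_uncompressed_size(data, size):
    """Overwrite header bytes 5..12 with `size` as a little-endian 64-bit integer."""
    return data[:5] + size.to_bytes(8, "little") + data[13:]


payload = b"abc" * 1000
raw = compress_alone(payload)
patched = patch_uncompressed_size(raw, len(payload))
print(parse_alone_header(raw))
print(parse_alone_header(patched))
```

The patched stream still ends with the end-of-stream marker that the compressor wrote. Some decoders may be strict about a known size combined with that marker, so it is worth testing the patched file with the decoder you actually target.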

Reading a python binary file with a C# BinaryReader

I need to export some data such as integers and floats to a binary file with Python. Afterwards, I have to read the file back with C#, but it doesn't work for me.
I tried several ways of writing a binary file with python and it works as long as I read it with python as well:
a = 3
b = 5
with open('test.tcd', 'wb') as file:
    file.write(bytes(a))
    file.write(bytes(b))
or writing it like this:
import pickle as p
with open('test.tcd', 'wb') as file:
    p.dump([a, b], file)
Currently I am reading the file in C# like this:
static void LoadFile(String path)
{
    BinaryReader br = new BinaryReader(new FileStream(path, FileMode.Open));
    int a = br.ReadInt32();
    int b = br.ReadInt32();
    System.Diagnostics.Debug.WriteLine(a);
    System.Diagnostics.Debug.WriteLine(b);
    br.Close();
}
Unfortunately the output isn't 3 and 5; instead, both values are zero. How do I read or write the binary file properly?
In Python, you have to write your integers with 4 bytes each. Read more here: struct.pack
import struct
a = 3
b = 5
with open('test.tcd', 'wb') as f:
    f.write(struct.pack("<i", a))
    f.write(struct.pack("<i", b))
Your C# code should work now.
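The reason the original attempt printed zeros is that bytes(n) with an integer argument creates n zero bytes rather than encoding the value n. The short sketch below contrasts that with struct.pack and round-trips the packed data with struct.unpack; the variable names are illustrative only.

```python
import struct

# bytes(3) is three zero bytes, not the number 3
zero_filled = bytes(3)

# "<ii" packs two 4-byte little-endian signed integers
packed = struct.pack("<ii", 3, 5)

# Unpacking with the same format recovers the original values
a, b = struct.unpack("<ii", packed)
print(zero_filled, packed.hex(), a, b)
```

With bytes(a) and bytes(b) the file held 3 + 5 = 8 zero bytes, which is exactly what two ReadInt32 calls consume, so both came back as 0.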
It's possible Python is not writing the data in the format that C# expects. You may need to swap the byte endianness or do something else. You could read the raw bytes instead and use BitConverter to see if that fixes it.
Another option is to specify the endianness explicitly in Python. Note that C#'s BinaryReader reads integers as little-endian, and ReadInt32 expects 4 bytes per value.
an_int = 5
a_bytes_big = an_int.to_bytes(2, 'big')
print(a_bytes_big)
Output
b'\x00\x05'
a_bytes_little = an_int.to_bytes(2, 'little')
print(a_bytes_little)
Output
b'\x05\x00'

Reading results of gurobi optimisation ("results.sol") in new python script

I am trying to run a rolling horizon optimisation where I have multiple optimisation scripts, each generating their own results. Instead of printing results to screen at every interval, I want to write each of the results using model.write("results.sol") - and then read them back into a results processing script (separate python script).
I have tried using read("results.sol") using Python, but the file format is not recognised. Is there any way that you can read/process the .sol file format that Gurobi outputs? It would seem bizarre if you cannot read the .sol file at some later point and generate plots etc.
Maybe I have missed something blindingly obvious.
Hard to answer without seeing your code as we have to guess what you are doing.
But well...
When you use
model.write("out.sol")
Gurobi will use its own format to write it (what is written is inferred from the file suffix).
This can easily be read by:
model.read("out.sol")
If you used
x = read("out.sol")
you are using Python's basic I/O tools, and of course Python won't interpret the file according to its format. Furthermore, reading like that is text mode (and maybe binary is required; not sure).
General rule: if you wrote the solution using a method of the model class, then read it using a method of the model class too.
The usage above is normally used to reinstate some state of your model (e.g. a MIP start). If you want to plot it, you will have to do further work. In this case, using Python's I/O tools might be a good idea, and you should respect the format described here. This could be read as csv or manually (and contrary to my earlier remark: it is text mode, not binary).
So assuming the example from the link is in file gur.sol:
import csv
with open('gur.sol', newline='\n') as csvfile:
    # Collapse the two-space delimiter to one space, since csv accepts only a single character
    reader = csv.reader((line.replace('  ', ' ') for line in csvfile), delimiter=' ')
    next(reader)  # skip header
    sol = {}
    for var, value in reader:
        sol[var] = float(value)
    print(sol)
Output:
{'z': 0.2, 'x': 1.0, 'y': 0.5}
Remarks:
The code is ugly because Python's csv module has some limitations
The delimiter is two spaces in this format, and we need to hack the code to read it (as only one character is allowed as a delimiter in this function)
The code is tailored to Python 3 (what I'm using; the next() call will probably be different in Python 2)
pandas would be much better for this purpose (a huge tool with a very good read_csv function)
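A stdlib-only alternative that avoids the csv delimiter hack is to split each line on whitespace. The sketch below assumes each data line is a variable name followed by a value, and that comment or header lines start with '#'; the function name parse_sol and the sample text are made up for illustration.

```python
def parse_sol(lines):
    """Parse 'name value' lines into a dict, skipping blanks and '#' comments."""
    sol = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        # Split on the last run of whitespace, however many spaces it contains
        name, value = line.rsplit(None, 1)
        sol[name] = float(value)
    return sol


sample = """# Solution file
x  1.0
y  0.5
z  0.2
"""
print(parse_sol(sample.splitlines()))
```

For a real file, passing the open file object works the same way, e.g. parse_sol(open('gur.sol')) inside a with block.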
