Python file to read in binary and write in xml

Python file to read in binary and write in xml - python

I have a binary file which was created by a VBA file (I don't work with VBA or binary at all) but I need to get Python to read this binary file (which includes a list of inputs for a calculation) and then write these values into an xml file.
If I know the order of the inputs into the file that is created, is it possible to read the binary code line by line, to get input by input and then write into the xml?
I have to use Python rather than VBA since I am not authorised to change the original VBA files.
I apologise for the lack of information, I only know a bit of Python and have never worked with VBA or binary. I really appreciate any help anyone can give me! Thank you =)

If you have the VBA code that made the file you should be able to copy the data structure in python and then deserialize it.

Python is probably not the best tool, I would recommend VBA as it is going to be best suited to reading that file. Then, create your xml file as output.

Related

Reading data from a binary file in Python

I have a python code that I don't really understand since I am super new to python that reads data from a binary file and spits out a CSV file.
The code is in this site I can't post it on here because it is huge.
https://github.com/PX4/Firmware/blob/master/Tools/sdlog2/sdlog2_dump.py
Now I know you must have the key to a binary file to be able to parse it and use its data. What I don't understand is how that code that I linked above does this or where the key in the code is. I have tried reading the code and breaking it down, but due to my limited knowledge I have no idea where to go from here.
I am trying to extract the key so that I can write my own code using the key to parse the data in the binary file.

Writing data to a zip archive in Python

I've been told in the past that there is simply no easy way to write a string a zip file. It's okay to READ from a zip archive, but if you want to write to a zip file, the best option is to extract it, make the changes, and then zip it back up again. However, the library I am using (openpyxl) accomplishes the feat of writing to a zip file without any extraction. This package uses the writestr() function in the python ZipFile library to make changes. Can someone explain to me how exactly this is possible? I know it has something to do with writing bytes but I can't fine a good explanation.
I'm aware of the vagueness of this question, but that's a circumstance of my lack of knowledge on the topic.

openpyxl does not modify the files in place because you can't do this with zipfiles. You must extract, modify and archive. We just hide this process in the library.

Reading from a CSV file while it is being written to

So before I start I know this is not the proper way to go about doing this but this is the only method that I have for accessing the data I need on the fly.
I have a system which is writing telemetry data to a .csv file while it is running. I need to see some of this data while it is being written but it is not being broadcast in a manner which allows me to do this.
Question: How do I read from a CSV file which is being written to safely.
Typically I would open the file and look at the values but I am hoping to be able to write a python script which is able to examine the csv for me and report the most recent values written without compromising the systems ability to write to the file.
I have absolutely NO access to the system or the manner in which it is writing to the CSV I am only able to see that the CSV file is being updated as the system runs.
Again I know this is NOT the right way to do this but any help you could provide would be extremely helpful.
This is mostly being run in a Windows environment

You can do something like:
tailf csv_file | python your_script.py
and read from sys.stdin

create pdf from python

I'm looking to generate PDF's from a Python application.
They start relatively simple but some may become more complex (Essentially letter like documents but will include watermarks for example later)
I've worked in raw postscript before and providing I can generate the correct headers etc and file at the end of it I want to avoid use of complex libs that may not do entirely what I want. Some seem to have got bitrot and no longer supported (pypdf and pypdf2) Especially when I know PDF/Postscript can do exactly what I need. PDF content really isn't that complex.
I can generate EPS (Encapsulated postscript) fine by just writing the appropriate text headers to file and my postscript code. But Inspecting PDF's there is a lil binary header I'm not sure how to generate.
I could generate an EPS and convert it. I'm not overly happy with this as the production environment is a Windows 2008 server (Dev is Ubuntu 12.04) and making something and converting it seems very silly.
Has anyone done this before?
Am I being pedantic by not wanting to use a library?

borrowed from ask.yahoo
A PDF file starts with "%PDF-1.1" if it is a version 1.1 type of PDF file. You can read PDF files ok when they don't have binary data objects stored in them, and you could even make one using Notepad if you didn't need to store a binary object like a Paint bitmap in it.
But after seeing the "%PDF-1.1" you ignore what's after that (Adobe Reader does, too) and go straight to the end of the file to where there is a line that says "%%EOF". That's always the last thing in the file; and if that's there you know that just a few characters before that place in the file there's the word "startxref" followed by a number. This number tells a reader program where to look in the file to find the start of the list of items describing the structure of the file. These items in the list can be page objects, dictionary objects, or stream objects (like the binary data of a bitmap), and each one has "obj" and "endobj" marking out where its description starts and ends.
For fairly simple PDF files, you might be able to type the text in just like you did with Notepad to make a working PDF file that Adobe Reader and other PDF viewer programs could read and display correctly.
Doing something like this is a challenge, even for a simple file, and you'd really have to know what you're doing to get any binary data into the file where it's supposed to go; but for character data, you'd just be able to type it in. And all of the commands used in the PDF are in the form of strings that you could type in. The hardest part is calculating those numbers that give the file offsets for items in the file (such as the number following "startxref").
If the way the file format is laid out intrigues you, go ahead and read the PDF manual, which tells the whole story.
http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
but really you should probably just use a library
Thanks to #LukasGraf for providing this link http://www.gnupdf.org/Introduction_to_PDF that shows how to create a simple hello world pdf from scratch

As long as you're working in Python 2.7, Reportlab seems to be the best solution out there at the moment. It's quite full-featured, and can be a little complex to work with, depending on exactly what you're doing with it, but since you seem to be familiar with PDF internals in general hopefully the learning curve won't be too steep.

I recommend you to use a library. I spent a lot of time creating pdfme and learned a lot of things along the way, but it's not something you would do for a single project. If you want to use my library check the docs here.

How to parse a .shp file?

I am interested in gleaning information from an ESRI .shp file.
Specifically the .shp file of a polyline feature class.
When I open the .dbf of a feature class, I get what I would expect: a table that can open in excel and contains the information from the feature class' table.
However, when I try to open a .shp file in any program (excel, textpad, etc...) all I get is a bunch of gibberish and unusual ASCII characters.
I would like to use Python (2.x) to interpret this file and get information out of it (in this case the vertices of the polyline).
I do not want to use any modules or non built-in tools, as I am genuinely interested in how this process would work and I don't want any dependencies.
Thank you for any hints or points in the right direction you can give!

Your question, basically, is "I have a file full of data stored in an arbitrary binary format. How can I use python to read such a file?"
The answer is, this link contains a description of the format of the file. Write a dissector based on the technical specification.

If you don't want to go to all the trouble of writing a parser, you should take look at pyshp, a pure Python shapefile library. I've been using it for a couple of months now, and have found it quite easy to use.
There's also a python binding to shapelib, if you search the web. But I found the pure Python solution easier to hack around with.

might be a long shot, but you should check out ctypes, and maybe use the .dll file that came with a program (if it even exists lol) that can read that type of file. in my experience, things get weird when u start digging around .dlls

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.