Extract pickled data - python

Is there a way to extract a file generated by pickle, while you don't even have the same classes as the original project that pickled it?
I'm trying to read a pickled file generated in an older project. I don't have the source code of the old project. But I would like to retrieve the data of the file, even as plain dictionaries. Is there another solution or should I just use a binary editor?
Thanks,

Related

Saving multiple pandas dataframes to one file and then reading them back

I am working on a python app and I am trying to logistically plan how saving/loading files will work
In this app multiple data sheets will be loaded and they will all be used in some capacity, the problem I am having is that I want users to be able to save the changes to the files that they have imported.
I was thinking more of a save file that holds the content of all the files they have loaded into the app. But I have no idea how to structure this! I've done some research and heard that parquet/feather are good formats for saving files but I don't know if they support saving multiple data frames to the same file.
The most important part about this is that the files need to be loadable by pandas/another library so that a user can save the changes they've made and then load it back up later if they were so inclined
Any advice is appreciated!

How to store all files universally in memory (variable) Python

Essentially, I would want to be able to go through a folder with text files, jpg files, csv files, png files, any kind of file, and be able to load it into memory as some kind of object. When necessary, I would then like to be able to save it and create an instance on disk. This would need to work for any kind of file type.
I would create a class that would contain the file data itself as well as metadata, but that is not necessary for my question.
Is this possible ,and if so, how can I do this?

Store pkl / binary in MetaData

I am writing a function that is supposed to store a text representation of a custom class object, cl
I have some code that writes to a file and takes the necessary information out of cl.
Now I need to go backwards, read the file and return a new instance of cl. The problem is, the file doesn't keep all of the important parts of cl because for the purpose of this text document parts of it are unnecessary.
A .jpg file allows you to store meta data like shutter speed and location. I would like to store the parts of cl that are not supposed to be in the text portion in the meta data of a .txt or .csv file. Is there a way to explicitly write something to the metadata of a text file in Python?
Additionally, would it be possible to write the byte-code .pkl representation of the entire object in the metadata?
Text files don't have meta data in the same way that a jpg file does. A jpeg file is specifically designed to have ways of including meta data as extra structured information in the image. Text files aren't: every character in the text file is generally displayed to the user.
Similarly, every thing in a CSV file is part of one cell in the table represented by the file.
That said, there are some things similar to text file metadata that have existed or exist over the years that might give you some ideas. I don't think any of these is ideal, but I'll give some examples to give you an idea how complex the area of meta data is and what people have done in similar situations.
Some filesystems have meta data associated with each file that can be extended. As an example, NTFS has streams; HFS and HFSplus have resource forks or other attributes; Linux has extended attributes on most of its filesystems. You could potentially store your pickle information in those filesystem metadata. There are disadvantages. Some filesystems don't have this meta data. Some tools for copying and manipulating files will not recognize (or intentionally strip) meta data.
You could have a .txt file and a .pcl file, where the .txt file contains your text representation and the .pkl file contained the other information.
Back in the day, some DOS programs would stop reading a text file at a DOS EOF (decimal character 26). I don't think anything behaves like that, but it's an example that there are file formats that allowed you to end the file and then still have extra data that programs could use.
With a format like HTML or an actual spreadsheet instead of CSV, there are ways you could include things in meta data easily.

Saving File to Dictionary - Python

Is there any way to save a file to a dictionary under python?
(Indeed I am not asking how to export dictionaries to files here.)
Maybe a file could be pickled or transformed into me python object
and then saved.
Is this generally advisable?
Or should I only save the file's path to the dictionary?
How would I retrieve the file later on?
The background of my question relates to my usage of dictionaries as
databases. I use the handy little module sqlitshelf as a form of permanent dictionary: https://github.com/shish/sqliteshelf
Each dataset includes a unique config file (~500 kB) which is retrieved from an application. Upon opening of the respective data set the config file are copied into and back from the working directory of the application. I might use a folder instead where I save the config files to. Yet, it strikes me as more elegant to save them together with the other data.

XLSX to XML with schema map

I have built a couple basic workflows using XML tools on top of XLSX workbooks that are mapped to an XML schema. You would enter data into the spreadsheet, export the XML and I had some scripts that would then work with the data.
Now I'm trying to eliminate that step and build a more integrated and portable tool that others could use easily by moving from XSLT/XQuery to Python. I would still like to use Excel for the data entry, but have the Python script read the XLSX file directly.
I found a bunch of easy to use libraries to read from Excel but they need to explicitly state what cells the data is in, like range('A1:C2') etc. The useful thing about using the XML maps was that users could resize or even move tables to fit different rows and rename sheets. Is their a library that would let me select tables as units?
Another approach I tried was to just uncompress the XLSX and just parse the XML directly. The problem with that is that our data is quite complex (taking up to 30-50 sheets) and parsing that in the uncompressed XLSX structure is really daunting. I did find my XML schema within the uncompressed XLSX, so is there any way to reformat the data into this schema outside of Excel? (basically what Excel does when I save a workbook as an .xml file)
The Excel format is pretty complicated with dependencies between components – you can't for example be sure of that the order of the worksheets in the folder worksheets has any bearing to what the file looks like in Excel.
I don't really understand exactly what you're trying to do but the existing libraries present an interface for client code that hides the XML layer. If you don't want that you'll have to root around for the parts that you find useful. In openpyxl you want to look at the stuff in openpyxl/reader specifically worksheet.py.
However, you might have better luck using lxml as this (using libxml2 in the background) will allow you load a single XML into Python and manipulate it directly using the .objectify() method. We don't do this in openpyxl because XML trees consume a lot of memory (and many people have very large worksheets) but the library for working with Powerpoint shows just how easy this can be.

Categories