I implemented a small simulation environment and have been saving my evaluation results as pandas DataFrames in pickle files.
Later, to analyze the results, I have a Jupyter notebook where I use pandas' df = pd.read_pickle(path) to load the data frames again and visualize the data.
I also annotated metadata as attributes of the data frames, using df.attrs, and these are loaded correctly afterwards.
This used to work fine. Unfortunately, my simulator has evolved and the corresponding Python package changed names, which leads to problems when trying to read old results.
Now, pd.read_pickle() still works fine for newly generated results.
But for old results it breaks with a ModuleNotFoundError, telling me that it can't find the simulator_old module, i.e., the previous version of the package under its old name.
I'm not sure where the dependency on my package comes from. Maybe I stored some object from the old package as a data frame attribute. I can't pin it down because the load always simply breaks.
I want to be able to read old and new results and have pd.read_pickle() simply skip any entries that it cannot read, but read everything else.
Is there anything I can do to recover my old results, e.g., a way to tell pickle to ignore such errors?
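One direction that might help (a minimal sketch; simulator_new is a hypothetical stand-in for the renamed package): if the old classes still exist under the new name, aliasing the old module before loading, i.e. import simulator_new; sys.modules['simulator_old'] = simulator_new, may already be enough for pd.read_pickle(). Otherwise, a custom Unpickler can substitute a placeholder for anything whose module is gone instead of aborting the whole load:

import pickle

class _Missing:
    """Placeholder for objects whose class can no longer be imported."""
    def __init__(self, *args, **kwargs):
        pass
    def __setstate__(self, state):
        self._state = state  # keep the raw pickled state for inspection

class TolerantUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except (ModuleNotFoundError, AttributeError):
            return _Missing  # swap in the placeholder instead of failing

with open("old_results.pkl", "rb") as f:  # hypothetical path to an old result
    df = TolerantUnpickler(f).load()

Depending on how the old attribute objects were pickled, the placeholder may need more plumbing, but this at least lets the rest of the data frame load.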
I am trying to use dill.dump_module and dill.load_module to save an entire workspace, as a way to replicate MATLAB's workspace-saving functionality. As far as I can tell, these two functions have replaced dump_session and load_session in the latest version of dill, with the same functionality.
This process works for very basic examples, such as:
import dill
a = (100, 100)
dill.dump_module('my_workspace')  # save the current interpreter session to a file
del a
dill.load_module('my_workspace')  # restore the saved session into __main__
print(a)  # prints (100, 100) again
However, when I use this procedure on a much larger workspace (with many object types such as DataFrames, figures, etc.), I sometimes get no variables loaded back in.
For example:
I run my code
I call dump_module('my_workspace') and a 250 MB file is saved
I clear the workspace and all variables
I call load_module('my_workspace'). No errors are thrown, but no variables are loaded into the workspace.
However, sometimes the entire workspace is loaded as expected.
I cannot figure out what is producing this error, and why it is inconsistent.
I am trying this using Spyder 5.2.2.
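One way to narrow this down (a sketch, assuming a dill version recent enough to provide load_module_asdict alongside dump_module/load_module) is to load the session file into a plain dict instead of into __main__, so you can see which names actually made it into the file:

import dill

# Inspect the saved session without touching the current namespace.
saved = dill.load_module_asdict('my_workspace')
print(sorted(k for k in saved if not k.startswith('__')))

If the names all show up in the dict, the problem is on the restore side; if they are missing, the dump itself went wrong.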
My code is an update of an existing script which outputs an xlsx file with a lot of data. The original script is pretty stable and has worked for ages.
What I'm trying to do is that, after the original script has finished and the xlsx is created, I want to load the file into pandas and then run a series of analyses on it using .loc, .iloc, and .index.
But after I read the file into a variable, when I hit '.' after the variable's name in PyCharm, I get all the DataFrame and ndarray methods... except the three I need.
No errors, no explanations. They are just not there.
And if I ignore this and type them in manually, the variable I put the results into doesn't show ANY methods when I hit '.' on it next (instead of showing the methods for, say, a Series).
I've tried clearing the xlsx file of all formatting (it originally had hidden empty lines). I tried running .info() and .head() to make sure they both run fine (they seem to, yes). I even updated my code from Python 2.7 to Python 3.7 using the 2to3 script to see if that might change anything. It didn't.
import pandas as pd
analysis_file = pd.read_excel("F:\\myprogram\\output1.xlsx", "Sheet1")
analysis_file.  # <--- the problem's here
Really not sure how to proceed, and no one I've asked so far has been able to help me.
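Worth noting: .loc and .iloc are indexer attributes that are used with square brackets rather than called, and .index is a plain attribute, so a quick runtime sanity check (mirroring the snippet above) would be:

import pandas as pd

analysis_file = pd.read_excel("F:\\myprogram\\output1.xlsx", sheet_name="Sheet1")
print(type(analysis_file))         # should report a pandas DataFrame
print(analysis_file.index)         # .index is an attribute, not a method
first_row = analysis_file.iloc[0]  # positional indexing uses square brackets
first_col = analysis_file.loc[:, analysis_file.columns[0]]  # label-based indexing

If type() does not report a DataFrame, the missing completions point to a type-inference problem in the IDE rather than missing functionality.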
I am using Glueviz 0.7.2 as part of the Anaconda package, on OSX. Glueviz is a data visualization and exploration tool.
I am regularly regenerating an updated version of the same data set from an external model, then importing that data set into Glueviz.
Currently I cannot find a way to have Glueviz refresh or update an existing imported data set.
I can add a new data set, i.e., a second, more updated version of the data from the model, as a new imported data set, but this does not replace the original, and there is no simple way to make the new data show up in the graphs already set up in Glueviz.
It seems the only way to plot the updated data is to start a new session and spend time setting up all the plots again. Most tedious!
As a Python application, Glueviz must be storing the imported data set somewhere. Hence I'm thinking a workaround would be to replace that existing data with the updated data. After a restart of Glueviz and a reload of the saved session, I imagine it would not know the difference and would simply graph the updated data set within the existing graphs. Problem solved.
I am not sure how Glueviz as a Python package stores the data file, or which Python tool would be best for updating that data file.
As it turns out, the data is not stored in the Glueviz session file, but rather loaded fresh each time the saved session is opened from the original data source file.
Hence the solution is simple: Replace the data source file with a new file (of the same type) in with the updated data.
The updated data file must have the exact same name and be in the exact same location, and, I assume, only the values within the source data file may change, not the amount of data, the column titles, or other structural aspects of the original file.
Having done that, reopen Glueviz, reload that session file, and the graphs in Glueviz should update with the updated data.
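If the model run is scripted anyway, the file swap itself can be automated (a sketch with hypothetical paths; keep a backup of the original file first):

import shutil

# Overwrite the data source file that Glueviz reloads when the saved session is opened.
shutil.copyfile("model_output/run_latest.csv", "glue_data/dataset.csv")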
We have a dataframe we are working with in an IPython notebook. Granted, if one could save a dataframe in such a way that the whole group could access it through their notebooks, that would be ideal, and I'd love to know how to do that. However, could you help with the following specific problem?
When we do df.to_csv("Csv file name"), the file appears to be saved in the exact same place as the files we put in object storage to use in the IPython notebook. However, when one goes to Manage Files, it's nowhere to be found.
When one runs pd.DataFrame.to_csv(df), the text of the CSV file is apparently returned. However, when one copies that into a text editor (e.g., Sublime Text), saves it as a CSV, and attempts to read it into a dataframe, the expected dataframe is not yielded.
How does one export a dataframe to csv format, and then access it?
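For reference, a minimal CSV round trip that avoids one common surprise (the row index being written out as an extra unnamed column) looks like this, with a hypothetical path and a stand-in dataframe:

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # stand-in for the dataframe in question
df.to_csv("my_data.csv", index=False)          # write without the row index
df_back = pd.read_csv("my_data.csv")           # read it back into a dataframe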
I'm not familiar with Bluemix, but it sounds like you're trying to save a pandas dataframe in a way that all of your collaborators can access, and that looks the same for everyone.
Maybe saving to and reading from CSVs is messing up the formatting of your dataframe. Have you tried pickling? Since pickling is native to Python, it should give consistent results.
Try this:
import pandas as pd
pd.to_pickle(df, "/path/to/pickle/My_pickle")
and on the read side:
df_read = pd.read_pickle("/path/to/pickle/My_pickle")
I made a sheet with a graph using Python and openpyxl. Later on in the code I add some extra cells that I would also like to see in the graph. Is there a way to change the range of cells the graph is using, or maybe another library that lets me do this?
Example:
my graph initially uses the range A1:B10; then I want to update it to use A1:D10
Currently I am deleting the sheet, recreating it, writing back the values, and making the graph again. The problem is that this is part of a big process that takes days, and there will be a point where rewriting the sheet takes significant time.
At the moment it is not possible to preserve charts in existing files. With the rewrite in version 2.3 of openpyxl, the groundwork has been laid that will make this possible. When it happens will depend on the resources available to do the work. Pull requests gladly accepted.
In the meantime you might be able to find a workaround by writing macros to create the charts for you, because macros are preserved. A bit clumsy, but it should work.
Make sure that you are using version 2.3 or higher when working with charts, as the API has changed slightly.
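For the part that is possible today, namely creating the chart over the wider range when the sheet is (re)written, a minimal sketch with the 2.3+ API (hypothetical file, sheet, and anchor) looks like:

from openpyxl import load_workbook
from openpyxl.chart import LineChart, Reference

wb = load_workbook("report.xlsx")
ws = wb["Sheet1"]

chart = LineChart()
# Widen the source range from A1:B10 to A1:D10.
data = Reference(ws, min_col=1, min_row=1, max_col=4, max_row=10)
chart.add_data(data, titles_from_data=True)
ws.add_chart(chart, "F1")
wb.save("report.xlsx")

Since charts are not preserved on load, the chart has to be re-added each time the file is rewritten; only the Reference range changes.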