Store plotted data for later use - python

I do much (practically all) of my data analysis in Jupyter/IPython notebooks. For convenience, I also plot my data in those notebooks using matplotlib/pyplot.
Some of those plots I need to recreate externally later on, for example to use them in LaTeX. For this I save the corresponding data as text files on the hard drive. Right now, I manually create a numpy array by stacking all the data needed for the plot, and save this using numpy.savetxt.
What I would like is a (semi)automatic way to write all the data needed for a specific plot to a single file, but I am at a loss when it comes to a smart way of doing so.
Thus I have two questions:
Is it possible (and safe) to create something like a plot-memory object that stores all data plotted per figure and has a method similar to Memoryobject.save_plot_to_file(figname)? This object would need to know which figure I am working on, so I would either need to create a layer above matplotlib or get this information from the matplotlib objects.
Is there a simpler way? The python universe is huge, and I do not know half of it. Maybe something like this already exists?
Edit: Clarification: I do not want to save the figure object. What I want to do is something like this:
fig, ax = plt.subplots()
ax.plot(x1, y1)
ax.plot(x2, y2 + y3)
# and at a later point
arrays = get_data_from_plot(fig)
data = process(arrays)
np.savetxt('textfile', data)

You could pickle the object (using the pickle module, which replaces Python 2's cPickle). See this question here.
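If the goal is the plotted data rather than the figure object, Matplotlib's line artists keep the arrays they were given, so the data can be read back from the figure. A minimal sketch, assuming the plots are ordinary line plots made with ax.plot; get_data_from_plot is the hypothetical helper name from the question, not a Matplotlib function, and the output filename is made up:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def get_data_from_plot(fig):
    """Return one (x, y) pair of arrays per line drawn in the figure."""
    arrays = []
    for ax in fig.get_axes():
        for line in ax.get_lines():
            x = np.asarray(line.get_xdata())
            y = np.asarray(line.get_ydata())
            arrays.append((x, y))
    return arrays


x = np.linspace(0.0, 1.0, 5)
fig, ax = plt.subplots()
ax.plot(x, x**2)
ax.plot(x, 2 * x + 1)

arrays = get_data_from_plot(fig)
# Stack as columns x1, y1, x2, y2; this works because all lines have equal length.
data = np.column_stack([col for pair in arrays for col in pair])
np.savetxt("plot_data.txt", data, header="x1 y1 x2 y2")
```

This only covers line artists. Scatter plots, bars and images store their data in other artist types and would need their own handling.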

Related

Python interactive plotting for large data sets

Suppose I have a dataset with 100k rows (1000 different times, 100 different series, an observation for each, and auxiliary information). I'd like to create something like the following:
(1) first panel of plot has time on x axis, and average of the different series (and standard error) on y axis.
(2) based on the time slice (vertical line) we hover over in panel 1, display a (potentially downsampled) scatter plot of auxiliary information versus the series value at that time slice.
I've looked into a few options for this: (1) matplotlib + ipywidgets doesn't seem to handle it unless you explicitly select points via a slider. This also doesn't translate well to HTML export. This is not ideal, but is potentially workable. (2) altair - this library is pretty sleek, but from my understanding, I need to give it the whole dataset for it to handle the interactions, and it can't handle more than about 5k data points. This would preclude my use case, correct?
Any suggestions as to how to proceed? Is what I'm asking impossible in the current state of things?
You can work with datasets larger than 5k rows in Altair, as specified in this section of the docs.
One of the most convenient solutions in my opinion is to install altair_data_server and then add alt.data_transformers.enable('data_server') at the top of your notebooks and scripts. This server will provide the data to Altair as long as your Python process is running, so there is no need to include all the data in the created chart specification, which means that the 5k error will be avoided. The main drawback is that it won't work if you export to standalone HTML, because you rely on being in an environment where the server's Python process is running.

Matplotlib: Saving a self-contained, editable Figure

Is there a way to save a "Figure" in matplotlib to a file such that if you later wanted to modify the Figure, e.g. change data points, resize the figure, etc., you could load up the file in a new Python script and do that?
Right now I save the majority of my plots as PDFs, but that doesn't allow me to make edits later on. I have to go dig up my old source code and data files. I've lost track of the number of times I've lost the plot-generating code and had to essentially reproduce it all from scratch.
It would be nice if I could just save a plot as a self-contained data file, like Photoshop does with its .psd files, so that I can just load it up directly, type "object.plot()", and not have to worry about external dependencies. Does such a format exist, or if not is there any way I could achieve this?
There is a way of saving the plotted object called pickling. I don't have much experience with it, but it should allow you to save the plot to a file using
import pickle as pl
import matplotlib.pyplot as plt
fig = plt.figure()
pl.dump(fig, open('file_name.pickle', 'wb'))
and using
fig = pl.load(open('file_name.pickle', 'rb'))
fig.show()
to load the saved graph.
Matplotlib warns that "Pickle files are not designed for long term storage, are unsupported when restoring a pickle saved in another matplotlib version". To be safe, I would just save the arrays containing the plotted data to a .csv or .txt file, and keep this file in a folder with the Python script that draws the graph. This way you will always be able to plot your data, no matter which version of matplotlib you are using. You will also have the data and code in the same place, and you can easily read the data from the .csv or .txt file, save it to arrays, and graph it using
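import matplotlib.pyplot as plt
# Each line of the file holds one comma-separated array.
with open("file_name.txt", "r") as f:
    data = f.read().splitlines()
# Convert the text fields to numbers before plotting.
data_array1 = [float(v) for v in data[0].split(",")]
data_array2 = [float(v) for v in data[1].split(",")]
p, = plt.plot(data_array1, data_array2)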
I also suggest uploading your Python files along with your .csv or .txt files to GitHub.
If you would like to read more about pickling in matplotlib I suggest reading the two pages linked below.
(1) Pickle figures from matplotlib
and (2) https://matplotlib.org/3.1.3/users/prev_whats_new/whats_new_1.2.html#figures-are-picklable
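The plain-text approach described above can also be done as a round trip with NumPy, which takes care of the number formatting and parsing. A minimal sketch, assuming two equal-length arrays stored as columns (the answer's own snippet stores them as rows); the filename and the sample data are made up for illustration:

```python
import numpy as np

x = np.linspace(0.0, 2.0, 5)
y = x**2

# One column per array; the header is written as a comment line.
np.savetxt("file_name.csv", np.column_stack([x, y]), delimiter=",", header="x,y")

# Later, possibly in another script: load the columns back as float arrays.
x_loaded, y_loaded = np.loadtxt("file_name.csv", delimiter=",", unpack=True)
```

The loaded arrays can be passed straight to plt.plot(x_loaded, y_loaded), independent of the matplotlib version that produced the original figure.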

Include a summary of text in gridded form in python

I have a collection of values and labels that I'd like to include as a summary table within a matplotlib plot.
My table looks similar to this:
I'm currently using the matplotlib.pyplot text method applied to a previously created axes object (i.e., ax.text()) to specify locations for each of the entries and labels, but it's incredibly tedious and imprecise.
I imagine there's a more efficient way to do this, but I haven't found one despite being somewhat familiar with a few of the data visualization libraries in Python (e.g., seaborn, plotly, etc.).
To be clear, the matplotlib table method and the answers here don't address my question. I'm looking for an option with cleaner-looking output.

Bokeh line graph looping

I’ve been working on bokeh plots and I’m trying to plot a line graph taking values from a database. But the plot traces back to the initial point and I don’t want that. I want a plot which starts at one point and stops at a certain point (and does not circle back). I’ve tried plotting it in other tools like SQLite browser and Excel and the plot seems OK, which means I must be doing something wrong with the bokeh code and that the data points themselves are not in error.
I’ve attached the images for reference and the line of code doing the line plot. Is there something I’ve missed?
>>> image = fig.line("x", "y", color=color, source=something)
(Assume x and y are integer values and I’ve specified x and y ranges as DataRange1d(bounds=(0,None)))
Bokeh does not "auto-close" lines. You can see this is the case by looking at any number of examples in the docs and repository, but here is one in particular:
http://docs.bokeh.org/en/latest/docs/gallery/stocks.html
Bokeh's .line method will only "close up" if that is what is in the data (i.e., if the last point in the data is a repeat of the first point). I suggest you actually inspect the data values in source.data and I believe you will find this to be the case. Then the question is why is that the case and how to prevent it from doing that, but that is not really a Bokeh question.
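The inspection suggested above can be sketched as a small check on the column data. This is only an illustration: closes_on_itself is a hypothetical helper, not part of Bokeh, and a plain dict stands in for source.data, assuming the columns are named "x" and "y" as in the question:

```python
def closes_on_itself(data, x="x", y="y"):
    """Return True if the last (x, y) point repeats the first one."""
    xs, ys = list(data[x]), list(data[y])
    if len(xs) < 2:
        return False
    return xs[0] == xs[-1] and ys[0] == ys[-1]


# A plain dict stands in for source.data here.
looping = {"x": [0, 1, 2, 3, 0], "y": [5, 7, 6, 9, 5]}
open_line = {"x": [0, 1, 2, 3], "y": [5, 7, 6, 9]}

print(closes_on_itself(looping))    # True
print(closes_on_itself(open_line))  # False
```

If the check returns True for the real data, the repeated point is coming from the query or the code that builds the source, not from the line glyph.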

Save figure parameters after interactive tweaking

I often find myself using matplotlib to quickly display data, then later going back and tweaking my plotting code to make pretty figures. In this process, I often use the interactive plot window to adjust things like spacing, zooming, cropping, etc. While I can easily save the resulting figure to an image file, what I really want is to save the sequence of function calls/parameters that produced it.
Note that I don't particularly care to open the same figure again (as in Saving interactive Matplotlib figures). Even something as simple as being able to print the various properties of the figure and axes would be useful.
While I don't have the answer to your specific question, I'd generally suggest using the IPython Notebook for these things (and much more!)
Make sure you have %matplotlib inline (or the older %pylab inline) in one cell.
When you plot, the figure is displayed in the notebook itself. Then within your cell, just keep experimenting until you have it right (use Ctrl-Enter in the cell). Now the cell will have all the statements you need (and no more!)
The difference between the command line interpreter and the notebook is that the former keeps all the statements you typed, which leads to a lot of clutter. With the notebook you can edit the line in place.
A similar question here has an answer I just posted here.
The gist: use Matplotlib's picklable figure object to save the figure object to a file. See the aforementioned answer for a full example.
Here's a shortened example:
import pickle
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# plot some stuff
with open('SaveToFile.pickle', 'wb') as f:
    pickle.dump(fig, f)
This does indeed save all plotting tweaks, even those made by the GUI subplot-adjuster. Unpickling via pickle.load() still allows you to interact via CLI or GUI.
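For the simpler wish in the question, printing the properties of the figure and axes, the current values can be read from the figure itself. A rough sketch, not taken from the answers above: figure_settings is a hypothetical helper, and it only collects the figure size, the subplot spacing and the axis limits, which is an assumption about which tweaks matter:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt


def figure_settings(fig):
    """Collect layout settings that are typically changed interactively."""
    sp = fig.subplotpars
    settings = {
        "figsize": tuple(float(v) for v in fig.get_size_inches()),
        "subplots_adjust": {
            name: getattr(sp, name)
            for name in ("left", "right", "bottom", "top", "wspace", "hspace")
        },
        "axes": [
            {"xlim": ax.get_xlim(), "ylim": ax.get_ylim()}
            for ax in fig.get_axes()
        ],
    }
    return settings


fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_xlim(0, 1.5)
fig.subplots_adjust(left=0.2)

settings = figure_settings(fig)
print(settings)
```

The printed values can be copied back into the plotting script as arguments to fig.set_size_inches, fig.subplots_adjust, ax.set_xlim and ax.set_ylim.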
