Access data in matplotlib histogram axes - python

Where is the data plotted stored in a matplotlib ax object drawing a histogram?
My scenario:
I've written a function which draws a custom histogram using matplotlib. I am writing a unit test and would like to test whether the plotted data
Ideal behaviour:
import matplotlib.pyplot as plt
f, ax = plt.subplots()
ax.hist(some_data)
data_i_want = ax.plotted_data

I'm not sure what exactly you want to achieve, but the plt.hist(...) function returns the data for the histogram:
histinfo = plt.hist(data)
histinfo[0] #This is the information about the # of instances
histinfo[1] #This is the information about the position of the bins
If you want to get the information from the plot itself at all cost (assuming you have a barplot):
container = ax.containers[0]  # https://matplotlib.org/3.1.1/api/container_api.html
for rect in container:  # https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.patches.Rectangle.html#matplotlib.patches.Rectangle
    print(rect.xy)
ax.containers holds the containers, and each container holds the plotted bars (Rectangle objects). The URLs in the comments document everything you can read from them.
P.S.: You will probably have to adapt the code to your specific case, but this is one way to get information out of the plot. (There may be a better way; this is the best I know.)
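For the unit-test scenario in the question, one possible approach is to read the bar heights and left edges back from ax.patches and compare them with numpy.histogram. This is only a sketch: some_data and the bin count are made-up values, and the Agg backend is chosen on the assumption that the test runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless test environment
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical input data, standing in for whatever the function under test plots.
some_data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])

fig, ax = plt.subplots()
ax.hist(some_data, bins=4)

# With the default histtype="bar", every bin is drawn as a Rectangle in ax.patches.
heights = np.array([rect.get_height() for rect in ax.patches])
left_edges = np.array([rect.get_x() for rect in ax.patches])

# Compare what was drawn with an independent computation of the histogram.
expected_counts, expected_edges = np.histogram(some_data, bins=4)
np.testing.assert_allclose(heights, expected_counts)
np.testing.assert_allclose(left_edges, expected_edges[:-1])
```

This relies on the default bar-style histogram; with histtype="step" the data ends up in a Polygon instead, so the check would need adapting.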

Related

Why am I unable to make a plot containing subplots in plotly using a px.scatter plot?

I have been trying to make a figure using plotly that combines multiple figures together. In order to do this, I have been trying to use the make_subplots function, but I have found it very difficult to have the plots added in such a way that they are properly formatted. I can currently make singular plots (as seen directly below):
However, whenever I try to combine these singular plots using make_subplots, I end up with this:
This figure has the subplots set up completely wrong, since I need each of the four subplots to contain data pertaining to the four methods (A, B, C, and D). In other words, I would like to have four subplots that look like my singular plot example above.
I have set up the code in the following way:
for sequence in sequences:
    #process for making sequence profile is done here
    sequence_df = pd.DataFrame(sequence_profile)
    row_number=1
    grand_figure = make_subplots(rows=4, cols=1)
    #there are four groups per sequence, so the grand figure should have four subplots in total
    for group in sequence_df["group"].unique():
        figure_df_group = sequence_df[(sequence_df["group"]==group)]
        figure_df_group.sort_values("sample", ascending=True, inplace=True)
        figure = px.line(figure_df_group, x = figure_df_group["sample"], y = figure_df_group["intensity"], color= figure_df_group["method"])
        figure.update_xaxes(title= "sample")
        figure.update_traces(mode='markers+lines')
        #note: the next line fails, since data must be extracted from the figure, hence why it is commented out
        #grand_figure.append_trace(figure, row = row_number, col=1)
        figure.update_layout(title_text="{} Profile Plot".format(sequence))
        grand_figure.append_trace(figure.data[0], row = row_number, col=1)
        row_number+=1
        figure.write_image(os.path.join(output_directory+"{}_profile_plot_subplots_in_{}.jpg".format(sequence, group)))
    grand_figure.write_image(os.path.join(output_directory+"grand_figure_{}_profile_plot_subplots.jpg".format(sequence)))
I have tried following directions (like for example, here: ValueError: Invalid element(s) received for the 'data' property) but I was unable to get my figures added as is as subplots. At first it seemed like I needed to use the graph object (go) module in plotly (https://plotly.com/python/subplots/), but I would really like to keep the formatting/design of my current singular plot. I just want the plots to be conglomerated in groups of four. However, when I try to add the subplots like I currently do, I need to use the data property of the figure, which causes the design of my scatter plot to be completely messed up. Any help for how I can ameliorate this problem would be great.
Ok, so I found a solution here. Rather than using the make_subplots function, I just instead exported all the figures onto an .html file (Plotly saving multiple plots into a single html) and then converted it into an image (HTML to IMAGE using Python). This isn't exactly the approach I would have preferred to have, but it does work.
UPDATE
I have found that plotly express offers another solution: px.line accepts the facet_row and facet_col parameters, which lay out multiple subplots within one figure. My code is set up like this; unlike the code above, it does not need to iterate over the dataframe's groups in a for loop:
sequence_df = pd.DataFrame(sequence_profile)
figure = px.line(sequence_df, x = sequence_df["sample"], y = sequence_df["intensity"], color= sequence_df["method"], facet_col= sequence_df["group"])
Although it still needs more formatting, my plot now looks like this, which works much better for my purposes:

In a pairplot, how can I not show confidence intervals but display grid lines instead? [duplicate]

I'm plotting two data series with Pandas with seaborn imported. Ideally I would like the horizontal grid lines shared between both the left and the right y-axis, but I'm under the impression that this is hard to do.
As a compromise I would like to remove the grid lines all together. The following code however produces the horizontal gridlines for the secondary y-axis.
import pandas as pd
import numpy as np
import seaborn as sns
data = pd.DataFrame(np.cumsum(np.random.normal(size=(100,2)),axis=0),columns=['A','B'])
data.plot(secondary_y=['B'],grid=False)
You can take the Axes object out after plotting and perform .grid(False) on both axes.
# Gets the axes object out after plotting
ax = data.plot(...)
# Turns off grid on the left Axis.
ax.grid(False)
# Turns off grid on the secondary (right) Axis.
ax.right_ax.grid(False)
sns.set_style("whitegrid", {'axes.grid' : False})
Note that the style can be whichever valid one that you choose.
For a nice article on this, refer to this site.
The problem is with the default pandas formatting (or whatever formatting you chose). I'm not sure how things work behind the scenes, but these parameters override the formatting that you pass to the plot function. You can see a list of them here in the mpl_style dictionary
In order to get around it, you can do this:
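import matplotlib
import numpy as np
import pandas as pd
# Note: the display.mpl_style option was deprecated and later removed from pandas;
# on current versions, skip the next line.
pd.options.display.mpl_style = 'default'
new_style = {'grid': False}
matplotlib.rc('axes', **new_style)
data = pd.DataFrame(np.cumsum(np.random.normal(size=(100,2)),axis=0),columns=['A','B'])
data.plot(secondary_y=['B'])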
This feels like buggy behavior in Pandas, with not all of the keyword arguments getting passed to both Axes. But if you want to have the grid off by default in seaborn, you just need to call sns.set_style("dark"). You can also use sns.axes_style in a with statement if you only want to change the default for one figure.
You can just set:
sns.set_style("ticks")
It goes back to normal.

Using Bokeh for a dropdown menu which would create different charts

I am trying to create a dropdown interface for my work. My dataset looks like this, it is a random dataset
Now I would like 2 dropdowns, say CNN and BBC here. After selecting a channel from the dropdown, I would like to select a Topic, which would produce a bar chart according to its value.
I am trying to access just one value initially, but it gives me a blank graph.
from bokeh.plotting import figure
from bokeh.io import output_notebook,show,output_file
p=figure()
import csv
data = [row for row in csv.reader(open('C:/Users/Aishwarya/Documents/books/books_q4/crowd_computing/Bokeh-Python-Visualization-master/interactive/data/data.csv', 'r',encoding="utf8"))]
p.vbar(x=data[1][2], width=0.5, bottom=0,
top=data[1][1], color="firebrick")
#output_notebook()
output_file('1.html')
show(p)
There are probably two issues going on:
The first is that if you are using categorical coordinates on an axis, e.g. "CNN", which it appears you are expecting to use, then you need to tell Bokeh what the categorical range is:
p = figure(x_range=["CNN", ...]) # list all the factors for x_range
If you need to update the axis later you can update the range directly:
p.x_range.factors = [...]
Additionally, as of Bokeh 0.13.0 there is an open issue that prevents "single" factors from working as coordinates: #6660 Coordinates should accept single categorical values. The upshot is that you will have to put the data in a Bokeh ColumnDataSource explicitly (always an option), or in this case a workaround is also just to pass a single-item list instead:
p.vbar(x=["cnn"], ...)
Here is a complete update of your code, with some fake data put in:
from bokeh.plotting import figure
from bokeh.io import show
p = figure(x_range=["cnn"])
p.vbar(x=["cnn"], width=0.5, bottom=0, top=10, color="firebrick")
show(p)
I would also recommend studying the User's guide section Handling Categorical Data.
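The ColumnDataSource option mentioned above could look roughly like the following. The channel names and values are placeholders rather than data from the question's CSV, and the column names channel and value are made up for the example.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical data: one bar per channel.
channels = ["CNN", "BBC"]
source = ColumnDataSource(data=dict(channel=channels, value=[10, 7]))

# The categorical range must list every factor used on the x axis.
p = figure(x_range=channels)

# Column names are passed as strings and resolved against the source.
p.vbar(x="channel", top="value", bottom=0, width=0.5,
       color="firebrick", source=source)
```

Calling show(p) renders the chart. Because the glyph reads from source, later assigning a new dict to source.data is one way a dropdown callback could swap the bars.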

Show only the n'th ticklabel in a pandas boxplot

I am new to pandas and matplotlib, but not to Python. I have two questions; a primary and a secondary one.
Primary:
I have a pandas boxplot with FICO score on the x-axis and interest rate on the y-axis.
My x-axis is all messed up since the FICO scores are overwriting each other.
I'd like to show only every 4th or 5th ticklabel on the x-axis for a couple of reasons:
in general it's less chart-junky
in this case it will allow the labels to actually be read.
My code snippet is as follows:
plt.figure()
loansmin = pd.read_csv('../datasets/loanf.csv')
p = loansmin.boxplot('Interest.Rate','FICO.Score')
I saved the return value in p as I thought I might need to manipulate the plot further which I do now.
Secondary:
How do I access the plot, subplot, axes objects from pandas boxplot.
p above is a matplotlib.axes.AxesSubplot object.
help(matplotlib.axes.AxesSubplot) gives a message saying:
AttributeError: 'module' object has no attribute 'AxesSubplot'
dir(matplotlib.axes) lists Axes, Subplot and Subplotbase as in that namespace but no AxesSubplot. How do I understand this returned object better?
As I explored further I found that one could explore the returned object p via dir().
Doing this I found a long list of useful methods, amongst which was set_xticklabels.
Doing help(p.set_xticklabels) gave some cryptic, but still useful, help - essentially suggesting passing in a list of strings for ticklabels.
I then tried doing the following - adding set_xticklabels to the end of the last line in the above code effectively chaining the invocations.
plt.figure()
loansmin = pd.read_csv('../datasets/loanf.csv')
p=loansmin.boxplot('Interest.Rate','FICO.Score').set_xticklabels(['650','','','','','700'])
This gave the desired result. I suspect there's a better way as in the way matplotlib does it which allows you to show every n'th label. But for immediate use this works, and also allows setting labels where they are not periodic for whatever reason, if you need that.
As usual, writing out the question explicitly helped me find the answer. And if anyone can help me get to the underlying matplotlib object that is still an open question.
AxesSubplot (I think) is just another way to get at the Axes in matplotlib. set_xticklabels() is part of the matplotlib object oriented interface (on axes). So, if you were using something like pylab, you might use xticks(ticks, labels), but instead here you have to separate it into different calls ax.set_xticks(ticks), ax.set_xticklabels(labels). (where ax is an Axes object).
Let's say you only want to set ticks at 650 and 700. You could do the following:
ticks = labels = [650, 700]
plt.figure()
loansmin = pd.read_csv('../datasets/loanf.csv')
p=loansmin.boxplot('Interest.Rate','FICO.Score')
p.set_xticks(ticks)
p.set_xticklabels(labels)
Similarly, you can use set_xlim and set_ylim to do the equivalent of xlim() and ylim() in plt.

How to extract data from matplotlib plot

I have a wxPython program which reads from different datasets, performs various types of simple on-the-fly analysis on the data and plots various combinations of the datasets to matplotlib canvas. I would like to have the opportunity to dump currently plotted data to file for more sophisticated analysis later on.
The question is: are there any methods in matplotlib that allow access to the data currently plotted in matplotlib.Figure?
Jakub is right about modifying the Python script to write out the data directly from the source from which it was sent into the plot; that's the way I'd prefer to do this. But for reference, if you do need to get data out of a plot, I think this should do it
gca().get_lines()[n].get_xydata()
Alternatively you can get the x and y data sets separately:
line = gca().get_lines()[n]
xd = line.get_xdata()
yd = line.get_ydata()
matplotlib.pyplot.gca() returns the current Axes, from which you can extract the plotted data. Here is a simple example:
import matplotlib.pyplot as plt
plt.plot([1,2,3],[4,5,6])
ax = plt.gca()
line = ax.lines[0]
line.get_xydata()
On running this, you will see 2 outputs - the plot and the data:
array([[1., 4.],
[2., 5.],
[3., 6.]])
You can also get the x data and y data separately.
On running line.get_xdata(), you will get:
array([1, 2, 3])
And on running line.get_ydata(), you will get:
array([4, 5, 6])
Note: gca stands for "get current axes"
To sum up, for future reference:
If plotting with plt.plot(), plt.stem() or plt.step(), you can get a list of Line2D objects with:
ax = plt.gca() # get the current Axes
ax.get_lines()
For plt.pie(), plt.bar() or plt.barh(), you can get a list of wedge or rectangle objects with:
ax = plt.gca() # get the current Axes
ax.patches
Then, depending on the situation, you can get the data by calling get_xdata() and get_ydata() on a line (see Line2D for more info),
or e.g. get_height() on a bar (see Rectangle for more info).
In general, for all basic plotting functions, you can find what you are looking for by running ax.get_children(),
which returns a list of the child Artists (the base class that includes all of the figure's elements).
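Putting the summary together, a small sketch of pulling data out of a line plot and a bar plot and dumping the line to CSV text might look like this. The plotted values and the label "series" are invented for the example, and an in-memory buffer stands in for a real output file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.pyplot as plt
import numpy as np

fig, (ax_line, ax_bar) = plt.subplots(1, 2)
ax_line.plot([1, 2, 3], [4, 5, 6], label="series")
ax_bar.bar(["a", "b", "c"], [3, 1, 2])

# Line plots: one Line2D per plotted series, keyed here by its label.
line_data = {line.get_label(): line.get_xydata() for line in ax_line.get_lines()}

# Bar plots: one Rectangle per bar, in plotting order.
bar_heights = [rect.get_height() for rect in ax_bar.patches]

# Dump the (x, y) pairs as CSV; pass a file path instead of the buffer to save to disk.
buffer = io.StringIO()
np.savetxt(buffer, line_data["series"], delimiter=",", header="x,y")
```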
It's Python, so you can modify the source script directly so that the data is dumped before it is plotted.
I know this is an old question, but I feel there is a solution better than the ones offered here so I decided to write this answer.
You can use unittest.mock.patch to temporarily replace the matplotlib.axes.Axes.plot function:
from unittest.mock import patch
def save_data(self, *args, **kwargs):
    # save the data that was passed into the plot function
    print(args)
with patch('matplotlib.axes.Axes.plot', new=save_data):
    # some code that will eventually plot data
    a_function_that_plots()
Once you exit the with block, Axes.plot will resume normal behavior.
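Note that the replacement above swallows the call, so nothing is drawn while the patch is active. If the plot should still be produced, a possible variant is to record the arguments and then delegate to the original method. In this sketch, recording_plot, captured and the body of a_function_that_plots are illustrative names and data, not part of matplotlib.

```python
from unittest.mock import patch

import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

captured = []
original_plot = Axes.plot  # keep a reference so the real method can still run


def recording_plot(self, *args, **kwargs):
    # Record the positional data, then draw as usual.
    captured.append(args)
    return original_plot(self, *args, **kwargs)


def a_function_that_plots():
    # Hypothetical stand-in for the code under inspection.
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [4, 5, 6])
    return ax


with patch.object(Axes, "plot", new=recording_plot):
    ax = a_function_that_plots()
```

After the with block, captured holds one tuple of positional arguments per plot call, and the line has also been drawn on the Axes.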
