Suppose I have a dataset with 100k rows (1000 different times, 100 different series, an observation for each, and auxiliary information). I'd like to create something like the following:
(1) first panel of plot has time on x axis, and average of the different series (and standard error) on y axis.
(2) based on the time slice (vertical line) we hover over in panel 1, display a (potentially downsampled) scatter plot of auxiliary information versus the series value at that time slice.
I've looked into a few options for this: (1) matplotlib + ipywidgets doesn't seem to handle it unless you explicitly select points via a slider. It also doesn't translate well to HTML export. This is not ideal, but is potentially workable. (2) Altair: this library is pretty sleek, but from my understanding I need to give it the whole dataset for it to handle the interactions, and it can't handle more than about 5k data points. This would preclude my use case, correct?
Any suggestions as to how to proceed? Is what I'm asking impossible in the current state of things?
You can work with datasets larger than 5k rows in Altair, as specified in this section of the docs.
One of the most convenient solutions, in my opinion, is to install altair_data_server and then add alt.data_transformers.enable('data_server') at the top of your notebooks and scripts. This server provides the data to Altair for as long as your Python process is running, so the data does not need to be embedded in the chart specification and the 5k-row error is avoided. The main drawback is that it won't work if you export to standalone HTML, because the chart relies on an environment where the server's Python process is running.
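Whichever data transformer is used, the first panel only needs one row per time step, so it can help to aggregate before plotting. The following is a minimal sketch, not part of the original answer: the column names (time, series, value) and the random data are assumptions chosen to match the 1000 x 100 layout described in the question.

```python
import numpy as np
import pandas as pd


def summarize_by_time(df, time_col="time", value_col="value"):
    """Return one row per time step with the mean and standard error."""
    grouped = df.groupby(time_col)[value_col]
    summary = grouped.agg(mean="mean", sem="sem").reset_index()
    # Band edges for a mean +/- one standard error ribbon.
    summary["lower"] = summary["mean"] - summary["sem"]
    summary["upper"] = summary["mean"] + summary["sem"]
    return summary


# Hypothetical data: 1000 times x 100 series = 100k rows.
rng = np.random.default_rng(0)
n_times, n_series = 1000, 100
df = pd.DataFrame({
    "time": np.repeat(np.arange(n_times), n_series),
    "series": np.tile(np.arange(n_series), n_times),
    "value": rng.normal(size=n_times * n_series),
})

summary = summarize_by_time(df)
print(summary.shape)
```

The summary has 1000 rows instead of 100k, which is well under the default row limit for the first panel. The scatter panel still needs the per-series rows, so the limit (or a data server) remains relevant there.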
I have a collection of values and labels that I'd like to include as a summary table within a matplotlib plot.
My table looks similar to this:
I'm currently using the matplotlib.pyplot text method applied to a previously created axis object (i.e., ax.text()) to specify locations for each of the entries and labels, but it's incredibly tedious and imprecise.
I imagine there's a more efficient way to do this, but I haven't found one despite being somewhat familiar with a few of the data visualization libraries in Python (e.g., seaborn, plotly, etc.).
To be clear, the matplotlib table method, or the answers here, don't address my question. I'm looking for an option with cleaner-looking output.
Altair offers a lovely feature for faceting charts with the facet method. For example, the following dataset visualizes nicely:
print(df[['Year', 'Profile', 'Saison', 'Pos']].to_csv())
,Year,Profile,Saison,Pos
0,2017,6.0,Sommer,VL
1,2017,6.0,Winter,VL
13,2017,6.0,Winter,HL
12,2017,6.0,Sommer,HL
18,2017,6.0,Sommer,HR
6,2017,6.0,Sommer,VR
7,2017,6.0,Winter,VR
19,2017,6.0,Winter,HR
14,2018,5.5,Winter,HL
8,2018,5.5,Winter,VR
15,2018,5.5,Sommer,HL
20,2018,4.3,Winter,HR
21,2018,5.0,Sommer,HR
3,2018,5.5,Sommer,VL
2,2018,6.2,Winter,VL
9,2018,4.5,Sommer,VR
17,2019,4.5,Sommer,HL
11,2019,4.2,Sommer,VR
22,2019,3.5,Winter,HR
10,2019,5.28,Winter,VR
5,2019,4.6,Sommer,VL
4,2019,4.9,Winter,VL
16,2019,4.0,Winter,HL
23,2019,4.5,Sommer,HR
with the following command:
alt.Chart(df).mark_bar().encode(x='Year:O', y='Profile:Q').facet(row='Saison:N', column='Pos:N')
But, as you can see, I still have a lot of horizontal space, and I would like to use it by placing the Winter plot right next to the Sommer plot:
I understand that I already used the column grid to facet over the attribute Pos, but visually, for me, the Winter and Sommer plots are two separate plots (just like here), which I'd like to place side by side.
I tried to create two different charts in the same cell and use HTML to emit them side by side, but in the Jupyter environment there is a limit of one Altair/Vega plot per cell.
Is there any method I can use to arrange these charts horizontally?
In Altair, there is no good way to do this, because faceted charts cannot be nested according to the Vega-Lite schema. However, the Vega-Lite renderer actually does handle this in some cases, despite it technically being disallowed by the schema.
So you can hack it by doing something like this:
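import altair as alt

# df is the DataFrame shown in the question.
# Inner chart: one facet per season.
chart = alt.Chart(df).mark_bar().encode(
    x='Year:O',
    y='Profile:Q'
).facet('Saison:N')

# Outer facet wraps the already-faceted chart; validate=False skips the
# schema check, which would otherwise reject the nested facet.
spec = alt.FacetChart(
    data=df,
    spec=chart,
    facet=alt.Facet('Pos:N')
).to_json(validate=False)
print(spec)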
The resulting spec can be pasted by hand into http://vega.github.io/editor to reveal this (vega editor link):
You'll even notice that the vega editor flags parts of the spec as invalid. This is admittedly not the most satisfying answer, but it sort of works.
Hopefully in the future the Vega-Lite schema will add actual support for nested facets, so they can be used more directly from Altair.
I'm preparing a set of reports using open source ReportLab. The reports contain a number of charts. Everything works well so far.
I've been asked to take a (working) bar chart that shows two series of data and overlay a fitted curve for each series.
I can see how I could overlay a segmented line on the bar graph by creating both a line chart and bar chart in the same ReportLab drawing. I can't find any reference for fitted curves in ReportLab, however.
Does anyone have any insight into plotting a fitted curve to a series of data in ReportLab or, failing that, a suggestion about how to accomplish this task (I'm thinking that chart would need to be produced in matplotlib instead)?
I would recommend using Matplotlib. This is exactly the sort of thing it's designed to handle, and it will be much easier than trying to piece something together in ReportLab alone, especially since you would have to calculate the fitted line yourself and work out how to draw it in exactly the right place. Matplotlib integrates easily with ReportLab; I've used the combination several times with great results.
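As a rough sketch of that workflow (the data, the quadratic fit via numpy.polyfit, and the helper name fitted_curve are all assumptions for illustration, not something from the original report), the bars and fitted curves can be drawn with matplotlib and rendered into an in-memory PNG:

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: two series over the same six categories.
x = np.arange(6)
series_a = np.array([3.0, 4.1, 5.2, 5.8, 6.1, 6.0])
series_b = np.array([2.0, 2.4, 3.1, 4.0, 5.2, 6.5])


def fitted_curve(x, y, degree=2, points=100):
    """Least-squares polynomial fit, evaluated on a dense grid."""
    coeffs = np.polyfit(x, y, degree)
    x_smooth = np.linspace(x.min(), x.max(), points)
    return x_smooth, np.polyval(coeffs, x_smooth)


width = 0.4
fig, ax = plt.subplots(figsize=(6, 4))
for offset, y, color, label in [
    (-width / 2, series_a, "C0", "Series A"),
    (width / 2, series_b, "C1", "Series B"),
]:
    # Grouped bars, with the fitted curve shifted to sit over its own bars.
    ax.bar(x + offset, y, width=width, color=color, alpha=0.6, label=label)
    xs, ys = fitted_curve(x, y)
    ax.plot(xs + offset, ys, color=color)
ax.legend()

# Write the figure to an in-memory PNG rather than a file on disk.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png", dpi=150)
plt.close(fig)
png_buffer.seek(0)
```

The buffer can then be handed to ReportLab as an image; as far as I know, reportlab.lib.utils.ImageReader accepts a file-like object such as png_buffer, but check that against the ReportLab version in use. The polynomial degree is a placeholder, and any other fitting routine could be substituted.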
For a while I've been trying to come up with a good way to graphically represent a data series along with its estimated error.
Recently I saw some graphs where the data was plotted as a line, with a background 'ribbon' filling the area between the lines plotting data +/- sigma.
Is there a name for this type of graph, and is there any Python toolkit that can make such plots?
A simple way to fake it with matplotlib would also be useful - right now I'm just plotting three lines, but I don't know how to fill the area between them.
I would use the fill_between method. Look at the Our Favorite Recipes section of the manual for matplotlib for some good examples. They have one that looks like this:
and another that looks like this:
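A minimal sketch of such a ribbon plot with fill_between, using made-up data (a sine curve with an assumed, slowly growing sigma) rather than anything from the matplotlib recipes:

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data series and its estimated error.
x = np.linspace(0, 10, 200)
y = np.sin(x)
sigma = 0.2 + 0.05 * x

fig, ax = plt.subplots()
ax.plot(x, y, color="C0", label="data")
# Shade the region between y - sigma and y + sigma behind the line.
band = ax.fill_between(x, y - sigma, y + sigma,
                       color="C0", alpha=0.3, label="+/- 1 sigma")
ax.legend()

# Save to an in-memory buffer; use a filename instead to write a file.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

The alpha value keeps the band translucent so the line stays visible on top of it. This replaces plotting three separate lines with one line plus one filled region.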