How can I change the thickness of my errorbars in Altair?
When I changed the color to black, the error bar came out thinner than it should be.
Why is that?
My expected output is
But the actual output is
Here is my code
import altair as alt
alt.Chart(df2).mark_errorbar(color='black').encode(
alt.X(
"quantile95",
axis=alt.Axis(title="lower",tickMinStep=5),
scale=alt.Scale(domain=[0, 55])
)
).properties(width=450, height=20)
Since no data was provided, I used the data from the official reference to create the expected output.
One is a box plot and the other is an overlaid strip plot.
import altair as alt
from vega_datasets import data
source = data.barley()
error_bar_s = alt.Chart(source).mark_boxplot(color='black').encode(
x=alt.X("yield:Q", axis=alt.Axis(title="lower", tickMinStep=5), scale=alt.Scale(domain=[0, source['yield'].max()+10])),
y=alt.Y('variety:N')
).properties(width=450, height=450)
strip = alt.Chart(source).mark_tick(color='red').encode(
x='yield:Q',
y='variety:N'
)
error_bar_s + strip
Using the Iris dataset as an example, this is the code you need:
from vega_datasets import data # pip install vega_datasets
import altair as alt
dataset = data.iris()
ticks = alt.Chart(dataset).mark_tick().encode(
alt.X('sepalWidth', scale=alt.Scale(zero=False)),
)
mean_point = alt.Chart(dataset).mark_circle().encode(
x=alt.X('mean(sepalWidth)'),
color=alt.value("black"),
size=alt.value(50)
)
errorbar = alt.Chart(dataset).mark_errorbar(extent='stdev', ticks=True).encode(
x=alt.X('sepalWidth'),
color=alt.value("black"),
strokeWidth=alt.value(5)
)
errorbar + mean_point + ticks
Output:
Some key remarks:
If ticks are not needed, remove the ticks parameter from mark_errorbar.
The "thickness" of the error bar is set by the strokeWidth channel in the encode call on the mark_errorbar chart.
The extent parameter can be "ci", "stdev", "stderr", or "iqr". See the docs.
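To make the extent option a bit more concrete: as far as I understand it, with extent='stdev' the bar runs from the mean minus one sample standard deviation to the mean plus one. A minimal pandas sketch of that arithmetic, using made-up numbers (the values series is purely illustrative and not taken from the Iris data):

```python
import pandas as pd

# Made-up sample values, purely for illustration.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

center = values.mean()
spread = values.std()  # pandas defaults to the sample standard deviation (ddof=1)

# Assumed span of an errorbar drawn with extent='stdev'.
lower, upper = center - spread, center + spread
print(f"errorbar from {lower:.3f} to {upper:.3f}, centered on {center:.3f}")
```

The other extents ("ci", "stderr", "iqr") change how the spread is computed, not how the bar is styled, so strokeWidth and color keep working the same way.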
Consider the following example:
import altair as alt
from vega_datasets import data
df = data.seattle_weather()
temp_max = alt.Chart(df).mark_line(color='blue').encode(
x='yearmonth(date):T',
y='max(temp_max)',
)
temp_min = alt.Chart(df).mark_line(color='red').encode(
x='yearmonth(date):T',
y='max(temp_min)',
)
temp_max + temp_min
In the resulting chart, I would like to add a legend showing that the blue line is the maximum temperature and the red line is the minimum temperature. What would be the easiest way to achieve this?
I saw (e.g. in the solution to this question: Labelling Layered Charts in Altair (Python)) that Altair only adds a legend if you set color, size, or a similar channel in the encoding, usually with a categorical column. That is not possible here, because I'm plotting a whole column and the label should be the column name (which currently appears in the y-axis label).
I would use a fold transform, which turns the two columns into a single variable/value pair that can be encoded with color:
import altair as alt
from vega_datasets import data
df = data.seattle_weather()
alt.Chart(df).mark_line().transform_fold(
fold=['temp_max', 'temp_min'],
as_=['variable', 'value']
).encode(
x='yearmonth(date):T',
y='max(value):Q',
color='variable:N'
)
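If it helps to see what the fold does to the data, it is roughly the same reshaping as pandas melt (one difference, as far as I know, is that the fold keeps the original columns as well). A small sketch with a made-up two-row frame that only borrows the column names from the weather data:

```python
import pandas as pd

# Tiny made-up frame with the same column names as the weather data.
wide = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-01-02"]),
    "temp_max": [12.8, 10.6],
    "temp_min": [5.0, 2.8],
})

# Long form: one row per (date, variable) pair,
# which is the shape that color='variable:N' needs.
long = wide.melt(
    id_vars="date",
    value_vars=["temp_max", "temp_min"],
    var_name="variable",
    value_name="value",
)
print(long)
```

Once the data is in this shape, a single chart with color='variable:N' produces the legend without any layering.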
If you layer two charts that have the same columns and color both by the same column, the legend will appear. I don't know if this helps, but...
For example, I had:
Range, Amount, Type
0_5, 3, 'Private'
5_10, 5, 'Private'
Range, Amount, Type
0_5, 3, 'Public'
5_10, 5, 'Public'
I charted both with color='Type', combined them with alt.layer(chart1, chart2), and it showed a proper legend.
I am trying to plot two categories side by side in a subplot grid. The first column represents category FF and the second column represents category RF.
The x-axis is always Time and the y-axis is each of the remaining columns in turn. In other words, it is a plot of one column against all the rest.
The two categories always have the same column names; only the values differ.
I tried to generate the plot in a for loop, but Plotly treats each trace as distinct, so lines that share a y-axis name get different colors. As a consequence, each one also gets its own legend entry.
For example, in the first row (Time vs Price2010) I want the FF and RF subplots to use the same color (say blue) and share a single legend entry.
I tried adding legendgroup in go.Scatter, but it doesn't help.
import pandas as pd
from pandas import DataFrame
from plotly import tools
from plotly.offline import init_notebook_mode, plot, iplot
import plotly.graph_objs as go
from plotly.subplots import make_subplots
CarA = {'Time': [10,20,30,40 ],
'Price2010': [22000,26000,27000,35000],
'Price2011': [23000,27000,28000,36000],
'Price2012': [24000,28000,29000,37000],
'Price2013': [25000,29000,30000,38000],
'Price2014': [26000,30000,31000,39000],
'Price2015': [27000,31000,32000,40000],
'Price2016': [28000,32000,33000,41000]
}
ff = DataFrame(CarA)
CarB = {'Time': [8,18,28,38 ],
'Price2010': [19000,20000,21000,22000],
'Price2011': [20000,21000,22000,23000],
'Price2012': [21000,22000,23000,24000],
'Price2013': [22000,23000,24000,25000],
'Price2014': [23000,24000,25000,26000],
'Price2015': [24000,25000,26000,27000],
'Price2016': [25000,26000,27000,28000]
}
rf = DataFrame(CarB)
Type = {
'FF' : ff,
'RF' : rf
}
fig = make_subplots(rows=len(ff.columns), cols=len(Type), subplot_titles=('FF','RF'),vertical_spacing=0.3/len(ff.columns))
labels = ff.columns[1:]
for indexC, (cat, values) in enumerate(Type.items()):
    for indexP, params in enumerate(values.columns[1:]):
        trace = go.Scatter(x=values.iloc[:,0], y=values[params], mode='lines', name=params,legendgroup=params)
        fig.append_trace(trace,indexP+1, indexC+1)
        fig.update_xaxes(title_text=values.columns[0],row=indexP+1, col=indexC+1)
        fig.update_yaxes(title_text=params,row=indexP+1, col=indexC+1)
fig.update_layout(height=2024, width=1024,title_text="Car Analysis")
iplot(fig)
It might not be a good solution, but so far this hack is the only thing I could come up with.
fig = make_subplots(rows=len(ff.columns), cols=len(Type), subplot_titles=('FF','RF'),vertical_spacing=0.2/len(ff.columns))
labels = ff.columns[1:]
colors = [ '#a60000', '#f29979', '#d98d36', '#735c00', '#778c23', '#185900', '#00a66f']
legend = True
for indexC, (cat, values) in enumerate(Type.items()):
    for indexP, params in enumerate(values.columns[1:]):
        # One fixed color per column, so FF and RF use the same color for the same column.
        trace = go.Scatter(x=values.iloc[:,0], y=values[params], mode='lines', name=params,
                           legendgroup=params, showlegend=legend,
                           marker=dict(color=colors[indexP]))
        fig.append_trace(trace,indexP+1, indexC+1)
        fig.update_xaxes(title_text=values.columns[0],row=indexP+1, col=indexC+1)
        fig.update_yaxes(title_text=params,row=indexP+1, col=indexC+1)
    # Only the first category (FF) adds legend entries; RF reuses them via legendgroup.
    legend = False
fig.update_layout(height=1068, width=1024,title_text="Car Analysis")
If you combine your data into a single tidy data frame, you can use a simple Plotly Express call to make the chart: px.line() with color, facet_row and facet_col
For some reason, the y-axis in my Altair plot seems to be inverted (I would expect values to go from lower at the bottom to higher at the top of the plot). I would also like to change the tick frequency. With older versions I could use ticks=n_ticks, but it seems this argument now only accepts a boolean.
import altair as alt
alt.renderers.enable('notebook')
eff_metals = pd.read_excel(filename, sheet_name='summary_eff_metals')
points = alt.Chart(eff_metals, height=250, width=400).mark_circle().encode(
x=alt.X('Temperature:Q',axis=alt.Axis(title='Temperature (°C)'),
scale=alt.Scale(zero=False, padding=50)),
y=alt.Y('Efficiency:N',axis=alt.Axis(title='Efficiency (%)'),
scale=alt.Scale(zero=False, padding=1)),
color=alt.Color('Element:N'),
)
text = points.mark_text(align='right', dx=0, dy=-5).encode(
text='Element:N'
)
chart = alt.layer(points, text, data=eff_metals,
width=600, height=300)
chart
And the figure:
I don't have your data, so it is difficult to write working code.
But here is an example of an inverted scale with additional ticks, based on the scatter plot with tooltips example from the gallery. You can also try it in the Vega editor.
import altair as alt
from vega_datasets import data
iris = data.iris()
alt.Chart(iris).mark_point().encode(
x='petalWidth',
y=alt.Y('petalLength', scale=alt.Scale(domain=[7,0]), axis=alt.Axis(tickCount=100)),
color='species'
).interactive()
This might work with your data:
eff_metals = pd.read_excel(filename, sheet_name='summary_eff_metals')
points = alt.Chart(eff_metals, height=250, width=400).mark_circle().encode(
x=alt.X('Temperature:Q',axis=alt.Axis(title='Temperature (°C)'),
scale=alt.Scale(zero=False, padding=50)),
y=alt.Y('Efficiency:N',axis=alt.Axis(title='Efficiency (%)'),
scale=alt.Scale(zero=False, padding=1, domain=[17,1])),
color=alt.Color('Element:N'),
)
text = points.mark_text(align='right', dx=0, dy=-5).encode(
text='Element:N'
)
chart = alt.layer(points, text, data=eff_metals,
width=600, height=300)
chart
However, I think it's possible that you just have the wrong type on your efficiency variable. You could try replacing 'Efficiency:N' with 'Efficiency:Q', and that might do it.
While it's possible to reverse the domain manually, that requires hardcoding the bounds.
Instead we can just pass Scale(reverse=True) to the axis encoding, e.g.:
import altair as alt
from vega_datasets import data
alt.Chart(data.wheat().head()).mark_bar().encode(
x='wheat:Q',
y=alt.Y('year:O', scale=alt.Scale(reverse=True)),
)
Here it's been passed to alt.Y, so the years are inverted (left) vs the default y='year:O' (right):
I have been using matplotlib for quite some time now and it is great. However, I want to switch to pandas, and my first attempt at it didn't go so well.
My data set looks like this:
sam,123,184,2.6,543
winter,124,284,2.6,541
summer,178,384,2.6,542
summer,165,484,2.6,544
winter,178,584,2.6,545
sam,112,684,2.6,546
zack,145,784,2.6,547
mike,110,984,2.6,548
etc.....
First I want to search the csv for every row with the name mike and put those rows in their own list. With these lists I want to be able to do some math, for example add sam[3] + winter[4] or compute sam[1]/10. The last part would be to plot the columns against each other.
Going through this page
http://pandas.pydata.org/pandas-docs/stable/io.html#io-read-csv-table
The only thing I see is if I have a column header, however, I don't have any headers. I only know the position in a row of the values I want.
So my question is:
How do I create a list for each name in the first column (sam, winter, summer)?
Is this method efficient if my csv has millions of data points?
Can I use matplotlib to plot a pandas DataFrame?
ie :
fig1 = plt.figure(figsize= (10,10))
ax = fig1.add_subplot(211)
ax.plot(mike[1], winter[3], label='Mike vs Winter speed', color = 'red')
You can read a csv without headers:
data=pd.read_csv(filepath, header=None)
Columns will be numbered starting from 0.
Selecting and filtering:
all_summers = data[data[0]=='summer']
If you want to do some operations grouping by the first column, it will look like this:
data.groupby(0).sum()
data.groupby(0).count()
...
Selecting a row after grouping:
sums = data.groupby(0).sum()
sums.loc['sam']
Plotting example:
sums.plot()
import matplotlib.pyplot as plt
plt.show()
For more details about plotting, see: http://pandas.pydata.org/pandas-docs/version/0.18.1/visualization.html
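Putting those steps together on the sample rows from the question (the rows are read from an in-memory string here only so the snippet runs without a file; with a real file you would pass its path to read_csv instead):

```python
import io
import pandas as pd

# Sample rows from the question, inlined so no file is needed.
csv_text = """sam,123,184,2.6,543
winter,124,284,2.6,541
summer,178,384,2.6,542
summer,165,484,2.6,544
winter,178,584,2.6,545
sam,112,684,2.6,546
zack,145,784,2.6,547
mike,110,984,2.6,548
"""

data = pd.read_csv(io.StringIO(csv_text), header=None)  # columns are named 0..4

all_summers = data[data[0] == "summer"]  # boolean filter on the name column
sums = data.groupby(0).sum()             # one row per name, numeric columns summed

print(all_summers)
print(sums.loc["sam"])
```

Arithmetic between groups then works on the grouped frame directly, e.g. sums.loc['sam', 3] + sums.loc['winter', 4].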
df = pd.read_csv(filepath, header=None)
mike = df[df[0]=='mike'].values.tolist()
winter = df[df[0]=='winter'].values.tolist()
Then you can plot those lists as you wanted to above:
fig1 = plt.figure(figsize= (10,10))
ax = fig1.add_subplot(211)
ax.plot(mike, winter, label='Mike vs Winter speed', color = 'red')
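One caveat: .values.tolist() returns a list of whole rows, name string included, so passing those lists straight to ax.plot may not give the plot you expect. A hedged variant that keeps the filtered DataFrame and plots two numeric columns by position instead (which columns to plot is my assumption; the question does not say which ones matter):

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A few sample rows from the question, inlined so no file is needed.
csv_text = """sam,123,184,2.6,543
winter,124,284,2.6,541
summer,178,384,2.6,542
winter,178,584,2.6,545
mike,110,984,2.6,548
"""

df = pd.read_csv(io.StringIO(csv_text), header=None)
winter = df[df[0] == "winter"]  # keep it as a DataFrame, not a nested list

fig1 = plt.figure(figsize=(10, 10))
ax = fig1.add_subplot(211)
# Plot column 1 against column 2 for the winter rows only.
(line,) = ax.plot(winter[1], winter[2], label="winter: col 1 vs col 2", color="red")
ax.legend()
```

Plotting one name's column against another name's column (mike vs winter) only works when both selections have the same number of rows, which is not the case in the sample data.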