I am currently working on a Data Visualization project.
I want to plot multiple lines (about 200k) that represent travels from one Subway Station to all the others. This is, all the subway stations should be connected by a straight line.
The color of the line doesn't really matter (it could well be red, blue, etc.), but opacity is what matters the most. The bigger the number of travels between two random stations, the more opacity of that particular line; and vice versa.
I feel I am close to the desired output, but can't figure a way to do it properly.
The DataFrame I am using (df = pd.read_csv(...)) consists of a series of columns, namely: id_start_station, id_end_station, lat_start_station, long_start_station, lat_end_station, long_end_station, number_of_journeys.
I got to extract the coordinates by coding
lons = []
lons = np.empty(3 * len(df))
lons[::3] = df['long_start_station']
lons[1::3] = df['long_end_station']
lons[2::3] = None
lats = []
lats = np.empty(3 * len(df))
lats[::3] = df['lat_start_station']
lats[1::3] = df['lat_end_station']
lats[2::3] = None
I then started a figure by:
fig = go.Figure()
and then added a trace by:
fig.add_trace(go.Scattermapbox(
name='Journeys',
lat=lats,
lon=lons,
mode='lines',
line=dict(color='red', width=1),
opacity= ¿?, # PROBLEM IS HERE [1]
))
[1] So I tried a few different things to pass a opacity term:
I created a new tuple for the opacity of each trace, by:
opacity = []
opacity = np.empty(3 * len(df))
opacity [::3] = df['number_of_journeys'] / max(df['number_of_journeys'])
opacity [1::3] = df['number_of_journeys'] / max(df['number_of_journeys'])
opacity [2::3] = None
and passed it into [1], but this error came out:
ValueError:
Invalid value of type 'numpy.ndarray' received for the 'opacity' property of scattermapbox
The 'opacity' property is a number and may be specified as:
- An int or float in the interval [0, 1]
I then thought of passing the "opacity" term into the "color" term, by using rgba's property alpha, such as: rgba(255,0,0,0.5).
So I first created a "map" of all alpha parameters:
df['alpha'] = df['number_of_journeys'] / max(df['number_of_journeys'])
and then created a function to retrieve all the alpha parameters inside a specific color:
colors_with_opacity = []
def colors_with_opacity_func(df, empty_list):
for alpha in df['alpha']:
empty_list.extend(["rgba(255,0,0,"+str(alpha)+")"])
empty_list.extend(["rgba(255,0,0,"+str(alpha)+")"])
empty_list.append(None)
colors_with_opacity_func(df, colors_with_opacity)
and passed that into the color atribute of the Scattermapbox, but got the following error:
ValueError:
Invalid value of type 'builtins.list' received for the 'color' property of scattermapbox.line
The 'color' property is a color and may be specified as:
- A hex string (e.g. '#ff0000')
- An rgb/rgba string (e.g. 'rgb(255,0,0)')
- An hsl/hsla string (e.g. 'hsl(0,100%,50%)')
- An hsv/hsva string (e.g. 'hsv(0,100%,100%)')
- A named CSS color:
aliceblue, antiquewhite, aqua, [...] , whitesmoke,
yellow, yellowgreen
Since it is a massive amount of lines, looping / iterating through traces will carry out performance issues.
Any help will be much appreciated. I can't figure a way to properly accomplish that.
Thank you, in advance.
EDIT 1 : NEW QUESTION ADDED
I add this question here below as I believe it can help others that are looking for this particular topic.
Following Rob's helpful answer, I managed to add multiple opacities, as specified previously.
However, some of my colleagues suggested a change that would improve the visualization of the map.
Now, instead of having multiple opacities (one for each trace, according to the value of the dataframe) I would also like to have multiple widths (according to the same value of the dataframe).
This is, following Rob's answer, I would need something like this:
BINS_FOR_OPACITY=10
opacity_a = np.geomspace(0.001,1, BINS_FOR_OPACITY)
BINS_FOR_WIDTH=10
width_a = np.geomspace(1,3, BINS_FOR_WIDTH)
fig = go.Figure()
# Note the double "for" statement that follows
for opacity, d in df.groupby(pd.cut(df["number_of_journeys"], bins=BINS_FOR_OPACITY, labels=opacity_a)):
for width, d in df.groupby(pd.cut(df["number_of_journeys"], bins=BINS_FOR_WIDTH, labels=width_a)):
fig.add_traces(
go.Scattermapbox(
name=f"{d['number_of_journeys'].mean():.2E}",
lat=np.ravel(d.loc[:,[c for c in df.columns if "lat" in c or c=="none"]].values),
lon=np.ravel(d.loc[:,[c for c in df.columns if "long" in c or c=="none"]].values),
line_width=width
line_color="blue",
opacity=opacity,
mode="lines+markers",
)
)
However, the above is clearly not working, as it is making much more traces than it should do (I really can't explain why, but I guess it might be because of the double loop forced by the two for statements).
It ocurred to me that some kind of solution could be hidding in the pd.cut part, as I would need something like a double cut, but couldn't find a way to properly doing it.
I also managed to create a Pandas series by:
widths = pd.cut(df.["size"], bins=BINS_FOR_WIDTH, labels=width_a)
and iterating over that series, but got the same result as before (an excess of traces).
To emphasize and clarify myself, I don't need to have only multiple opacities or multiple widths, but I need to have them both and at the same time, which is what's causing me some troubles.
Again, any help is deeply thanked.
opacity is per trace, for markers it can be done with color using rgba(a,b,c,d) but not for lines. (Same in straight scatter plots)
to demonstrate, I have used London Underground stations (filtered to reduce number of nodes). Plus gone to extra effort of formatting data as a CSV. JSON as source has nothing to do with solution
encoded to bin number_of_journeys for inclusion into a trace with a geometric progression used for calculating and opacity
this sample data set is generating 83k sample lines
import requests
import geopandas as gpd
import plotly.graph_objects as go
import itertools
import numpy as np
import pandas as pd
from pathlib import Path
# get geometry of london underground stations
gdf = gpd.GeoDataFrame.from_features(
requests.get(
"https://raw.githubusercontent.com/oobrien/vis/master/tube/data/tfl_stations.json"
).json()
)
# limit to zone 1 and stations that have larger number of lines going through them
gdf = gdf.loc[gdf["zone"].isin(["1","2","3","4","5","6"]) & gdf["lines"].apply(len).gt(0)].reset_index(
drop=True
).rename(columns={"id":"tfl_id", "name":"id"})
# wanna join all valid combinations of stations...
combis = np.array(list(itertools.combinations(gdf.index, 2)))
# generate dataframe of all combinations of stations
gdf_c = (
gdf.loc[combis[:, 0], ["geometry", "id"]]
.assign(right=combis[:, 1])
.merge(gdf.loc[:, ["geometry", "id"]], left_on="right", right_index=True, suffixes=("_start_station","_end_station"))
)
gdf_c["lat_start_station"] = gdf_c["geometry_start_station"].apply(lambda g: g.y)
gdf_c["long_start_station"] = gdf_c["geometry_start_station"].apply(lambda g: g.x)
gdf_c["lat_end_station"] = gdf_c["geometry_end_station"].apply(lambda g: g.y)
gdf_c["long_end_station"] = gdf_c["geometry_end_station"].apply(lambda g: g.x)
gdf_c = gdf_c.drop(
columns=[
"geometry_start_station",
"right",
"geometry_end_station",
]
).assign(number_of_journeys=np.random.randint(1,10**5,len(gdf_c)))
gdf_c
f = Path.cwd().joinpath("SO.csv")
gdf_c.to_csv(f, index=False)
# there's an requirement to start with a CSV even though no sample data has been provided, now we're starting with a CSV
df = pd.read_csv(f)
# makes use of ravel simpler...
df["none"] = None
# now it's simple to generate scattermapbox... a trace per required opacity
BINS=10
opacity_a = np.geomspace(0.001,1, BINS)
fig = go.Figure()
for opacity, d in df.groupby(pd.cut(df["number_of_journeys"], bins=BINS, labels=opacity_a)):
fig.add_traces(
go.Scattermapbox(
name=f"{d['number_of_journeys'].mean():.2E}",
lat=np.ravel(d.loc[:,[c for c in df.columns if "lat" in c or c=="none"]].values),
lon=np.ravel(d.loc[:,[c for c in df.columns if "long" in c or c=="none"]].values),
line_color="blue",
opacity=opacity,
mode="lines+markers",
)
)
fig.update_layout(
mapbox={
"style": "carto-positron",
"center": {'lat': 51.520214996769255, 'lon': -0.097792388774743},
"zoom": 9,
},
margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
Related
I am just getting started with Holoviews. My questions are on customizing histograms, but also I am sharing a complete example as it may be helpful for other newbies to look at, since the documentation for Holoviews is very thorough but can be overwhelming.
I have a number of time series in text files loaded as Pandas DataFrames where:
each file is for a specific location
at each location about 10 time series were collected, each with about 15,000 points
I am building a small interactive tool where a Selector can be used to choose the location / DataFrame, and then another Selector to pick 3 of 10 of the time series to be plotted together.
My goal is to allow linked zooms (both x and y scales). The questions and code will focus on this aspect of the tool.
I cannot share the actual data I am using, unfortunately, as it is proprietary, but I have created 3 random walks with specific data ranges that are consistent with the actual data.
## preliminaries ##
import pandas as pd
import numpy as np
import holoviews as hv
from holoviews.util.transform import dim
from holoviews.selection import link_selections
from holoviews import opts
from holoviews.operation.datashader import shade, rasterize
import hvplot.pandas
hv.extension('bokeh', width=100)
## create random walks (one location) ##
data_df = pd.DataFrame()
npoints=15000
np.random.seed(71)
x = np.arange(npoints)
y1 = 1300+2.5*np.random.randn(npoints).cumsum()
y2 = 1500+2*np.random.randn(npoints).cumsum()
y3 = 3+np.random.randn(npoints).cumsum()
data_df.loc[:,'x'] = x
data_df.loc[:,'rand1'] = y1
data_df.loc[:,'rand2'] = y2
data_df.loc[:,'rand3'] = y3
This first block is just to plot the data and show how, by design, one of the random walks have different range from the other two:
data_df.hvplot(x='x',y=['rand1','rand2','rand3'],value_label='y',width=800,height=400)
As a result, although hvplot subplots work out of the box (for linking), ranges are different so the scaling is not quite there:
data_df.hvplot(x='x',y=['rand1','rand2','rand3'],
value_label='y',subplots=True,width=800,height=200).cols(1)
So, my first attempt was to adapt the Python-based Points example from Linked brushing in the documentation:
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
hv.Points(data_df, dim).opts(color=c)
for c, dim in zip(colors, [['x', d] for d in dims])
])
link_selections(layout).opts(opts.Points(width=1200, height=300)).cols(1)
That is already an amazing result for a 20 minutes effort!
However, what I would really like is to plot a curve rather than points, and also see a histogram, so I adapted the comprehension syntax to work with Curve (after reading the documentation pages Applying customization, and Composing elements):
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([hv.Curve(data_df,'x',dim).opts(height=300,width=1200,
color=c).hist(dim) for c,
dim in zip(colors,[d for d in dims])])
link_selections(layout).cols(1)
Which is almost exactly what I want. But I still struggle with the different layers of opts syntax.
Question 1: with the comprehension from the last code block, how would I make the histogram share color with the curves?
Now, suppose I want to rasterize the plots (although I do not think is quite yet necessary with 15,000 points like in this case), I tried to adapt the first example with Points:
cmaps = ['Blues', 'Greens', 'Reds']
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
shade(rasterize(hv.Points(data_df, dims),
cmap=c)).opts(width=1200, height = 400).hist(dims[1])
for c, dims in zip(cmaps, [['x', d] for d in dims])
])
link_selections(layout).cols(1)
This is a decent start, but again I struggle with the options/customization.
Question 2: in the above cod block, how would I pass the colormaps (it does not work as it is now), and how do I make the histogram reflect data values as in the previous case (and also have the right colormap)?
Thank you!
Sander answered how to color the histogram, but for the other question about coloring the datashaded plot, Datashader renders your data with a colormap rather than a single color, so the parameter is named cmap rather than color. So you were correct to use cmap in the datashaded case, but (a) cmap is actually a parameter to shade (which does the colormapping of the output of rasterize), and (b) you don't really need shade, as you can let Bokeh do the colormapping in most cases nowadays, in which case cmap is an option rather than an argument. Example:
from bokeh.palettes import Blues, Greens, Reds
cmaps = [Blues[256][200:], Greens[256][200:], Reds[256][200:]]
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
rasterize(hv.Points(data_df, ds)).opts(cmap=c,width=1200, height = 400).hist(dims[1])
for c, ds in zip(cmaps, [['x', d] for d in dims])
])
link_selections(layout).cols(1)
To answer your first question to make the histogram share the color of the curve, I've added .opts(opts.Histogram(color=c)) to your code.
When you have a layout you can specify the options of an element inside the layout like that.
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout(
[hv.Curve(data_df,'x',dim)
.opts(height=300,width=600, color=c)
.hist(dim)
.opts(opts.Histogram(color=c))
for c, dim in zip(colors,[d for d in dims])]
)
link_selections(layout).cols(1)
I want to add a fill colour between the black and blue line on my Plotly chart. I am aware this can be accomplished already with Plotly but I am not sure how to fill the chart with two colours based on conditions.
The chart with the blue background is my Plotly chart. I want to make it look like the chart with the white background. (Ignore the red and green bars on the white chart)
The conditions I want it to pass is:
Fill the area between the two lines GREEN, if the black line is above the blue line.
Fill the area between the two lines RED, if the black line is below the blue line.
How can this be done with Plotly? If this is not possible with Plotly can it be accomplished with other graphing tools that work with Python.
For a number of reasons (that I'm willing to explain further if you're interested) the best approach seems to be to add two traces to a go.Figure() object for each time your averages cross eachother, and then define the fill using fill='tonexty' for the second trace using:
for df in dfs:
fig.add_traces(go.Scatter(x=df.index, y = df.ma1,
line = dict(color='rgba(0,0,0,0)')))
fig.add_traces(go.Scatter(x=df.index, y = df.ma2,
line = dict(color='rgba(0,0,0,0)'),
fill='tonexty',
fillcolor = fillcol(df['label'].iloc[0])))
fillcol is a simple custom function described in the full snippet below. And I've used the approach described in How to split a dataframe each time a string value changes in a column? to produce the necessary splits in the dataframe each time your averages cross eachother.
Plot
Complete code:
import plotly.graph_objects as go
import numpy as np
import pandas as pd
from datetime import datetime
pd.options.plotting.backend = "plotly"
# sample data
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')
df.index = df.Date
df = df[['AAPL.Close', 'mavg']]
df['mavg2'] = df['AAPL.Close'].rolling(window=50).mean()
df.columns = ['y', 'ma1', 'ma2']
df=df.tail(250).dropna()
df1 = df.copy()
# split data into chunks where averages cross each other
df['label'] = np.where(df['ma1']>df['ma2'], 1, 0)
df['group'] = df['label'].ne(df['label'].shift()).cumsum()
df = df.groupby('group')
dfs = []
for name, data in df:
dfs.append(data)
# custom function to set fill color
def fillcol(label):
if label >= 1:
return 'rgba(0,250,0,0.4)'
else:
return 'rgba(250,0,0,0.4)'
fig = go.Figure()
for df in dfs:
fig.add_traces(go.Scatter(x=df.index, y = df.ma1,
line = dict(color='rgba(0,0,0,0)')))
fig.add_traces(go.Scatter(x=df.index, y = df.ma2,
line = dict(color='rgba(0,0,0,0)'),
fill='tonexty',
fillcolor = fillcol(df['label'].iloc[0])))
# include averages
fig.add_traces(go.Scatter(x=df1.index, y = df1.ma1,
line = dict(color = 'blue', width=1)))
fig.add_traces(go.Scatter(x=df1.index, y = df1.ma2,
line = dict(color = 'red', width=1)))
# include main time-series
fig.add_traces(go.Scatter(x=df1.index, y = df1.y,
line = dict(color = 'black', width=2)))
fig.update_layout(showlegend=False)
fig.show()
[This is a javascript solution of the problem]
I went for a completely different approach to create climate charts.
I have used a set of functions that check if two traces intersect each other. For each intersection, the function will take all points of the traces until the intersection point to create seperate polygons and colour them in.
This function is recursively. If there is no intersection, there will only be one polygon, if there is one intersection, there are two polygons, if there are two intersections, there are three polygons, etc. These polygons are then added to the chart.
An introduction to shapes in plotly is given here:
https://plotly.com/javascript/shapes/
I used an existing function for finding intersection points from here: https://stackoverflow.com/a/38977789/3832675
I wrote my own function for creating the polygon strings that create the path strings necessary to the polygons. Depending on which line is above the other (which can be realised using a simple comparison), the variable "colour" is either green or red.
function draw_and_colour_in_polygon(temperature_values, precipitation_values, x_values) {
var path_string = '';
for (var i = 0; i < x_values.length; i++) {
path_string += ' L ' + x_values[i] + ', '+ temperature_values[i];
}
for (var i = precipitation_values.length-1; i >= 0; i--) {
path_string += ' L ' + x_values[i] + ', ' + precipitation_values[i];
}
path_string += ' Z';
path_string = path_string.replace(/^.{2}/g, 'M ');
if (temperature_values[0] >= precipitation_values[0] && temperature_values[1] >= precipitation_values[1]) {
var colour = 'rgba(255, 255, 0, 0.5)';
}
else {
var colour = 'rgba(65, 105, 225, 0.5)';
}
return {
path: path_string,
polygon_colour: colour,
};
};
All of this put together looks like this:
In this case, we have three seperate polygons added to the chart. They are either blue or yellow depending on whether temperature value is higher than the precipiation value or vice versa. Please bear in mind that the polygons are composed of y values of both traces. My two traces don't use the same y-axis and therefore a transformation function has to be applied to one of the traces to harmonise the height values before the polygon string can be composed.
I can't share all the code as I have also added bits where the y-axes are scaled at higher values which would add quite some complexity to the answer, but I hope that you get the idea.
I have a solution based on maplotlib's fill_between:
ax = df[['rate1', 'rate2']].plot()
ax.fill_between(
df.index, df['rate1'], df[['rate1', 'rate2']].min(1),
color='red', alpha=0.1
);
ax.fill_between(
df.index, df['rate1'], df[['rate1', 'rate2']].max(1),
color='green', alpha=0.1
);
Cross-post: https://discourse.bokeh.org/t/add-calculated-horizontal-lines-corresponding-to-categories-on-the-x-axis/5544
I would like to duplicate this plot in Python:
Here is my attempt, using pandas and bokeh:
Imports:
import pandas as pd
from bokeh.io import output_notebook, show, reset_output
from bokeh.palettes import Spectral5, Turbo256
from bokeh.plotting import figure
from bokeh.transform import factor_cmap
from bokeh.models import Band, Span, FactorRange, ColumnDataSource
Create data:
fruits = ['Apples', 'Pears']
years = ['2015', '2016']
data = {'fruit' : fruits,
'2015' : [2, 1],
'2016' : [5, 3]}
fruit_df = pd.DataFrame(data).set_index("fruit")
tidy_df = (pd.DataFrame(data)
.melt(id_vars=["fruit"], var_name="year")
.assign(fruit_year=lambda df: list(zip(df['fruit'], df['year'])))
.set_index('fruit_year'))
Create bokeh plot:
p = figure(x_range=FactorRange(factors=tidy_df.index.unique()),
plot_height=400,
plot_width=400,
tooltips=[('Fruit', '#fruit'), # first string is user-defined; second string must refer to a column
('Year', '#year'),
('Value', '#value')])
cds = ColumnDataSource(tidy_df)
index_cmap = factor_cmap("fruit",
Spectral5[:2],
factors=sorted(tidy_df["fruit"].unique())) # this is a reference back to the dataframe
p.circle(x='fruit_year',
y='value',
size=20,
source=cds,
fill_color=index_cmap,
line_color=None,
)
# how do I add a median just to one categorical section?
median = Span(location=tidy_df.loc[tidy_df["fruit"] == "Apples", "value"].median(), # median value for Apples
#dimension='height',
line_color='red',
line_dash='dashed',
line_width=1.0
)
p.add_layout(median)
# how do I add this standard deviation(ish) band to just the Apples or Pears section?
band = Band(
base='fruit_year',
lower=2,
upper=4,
source=cds,
)
p.add_layout(band)
show(p)
Output:
Am I up against this issue? https://github.com/bokeh/bokeh/issues/8592
Is there any other data visualization library for Python that can accomplish this? Altair, Holoviews, Matplotlib, Plotly... ?
Band is a connected area, but your image of the desired output has two disconnected areas. Meaning, you actually need two bands. Take a look at the example here to better understand bands: https://docs.bokeh.org/en/latest/docs/user_guide/annotations.html#bands
By using Band(base='fruit_year', lower=2, upper=4, source=cds) you ask Bokeh to plot a band where for each value of fruit_year, the lower coordinate will be 2 and the upper coordinate will be 4. Which is exactly what you see on your Bokeh plot.
A bit unrelated but still a mistake - notice how your X axis is different from what you wanted. You have to specify the major category first, so replace list(zip(df['fruit'], df['year'])) with list(zip(df['year'], df['fruit'])).
Now, to the "how to" part. Since you need two separate bands, you cannot provide them with the same data source. The way to do it would be to have two extra data sources - one for each band. It ends up being something like this:
for year, sd in [('2015', 0.3), ('2016', 0.5)]:
b_df = (tidy_df[tidy_df['year'] == year]
.drop(columns=['year', 'fruit'])
.assign(lower=lambda df: df['value'].min() - sd,
upper=lambda df: df['value'].max() + sd)
.drop(columns='value'))
p.add_layout(Band(base='fruit_year', lower='lower', upper='upper',
source=ColumnDataSource(b_df)))
There are two issues left however. The first one is a trivial one - the automatic Y range (an instance of DataRange1d class by default) will not take the bands' heights into account. So the bands can easily go out of bounds and be cropped by the plot. The solution here is to use manual ranging that takes the SD values into account.
The second issue is that the width of band is limited to the X range factors, meaning that the circles will be partially outside of the band. This one is not that easy to fix. Usually a solution would be to use a transform to just shift the coordinates a bit at the edges. But since this is a categorical axis, we cannot do it. One possible solution here is to create a custom Band model that adds an offset:
class MyBand(Band):
# language=TypeScript
__implementation__ = """
import {Band, BandView} from "models/annotations/band"
export class MyBandView extends BandView {
protected _map_data(): void {
super._map_data()
const base_sx = this.model.dimension == 'height' ? this._lower_sx : this._lower_sy
if (base_sx.length > 1) {
const offset = (base_sx[1] - base_sx[0]) / 2
base_sx[0] -= offset
base_sx[base_sx.length - 1] += offset
}
}
}
export class MyBand extends Band {
__view_type__: MyBandView
static init_MyBand(): void {
this.prototype.default_view = MyBandView
}
}
"""
Just replace Band with MyBand in the code above and it should work. One caveat - you will need to have Node.js installed and the startup time will be longer for a second or two because the custom model code needs compilation. Another caveat - the custom model code knows about internals of BokehJS. Meaning, that while it's working with Bokeh 2.0.2 I can't guarantee that it will work with any other Bokeh version.
I'm looking for a way to specify an X-Y offset to plotted data points. I'm just getting into Altair, so please bear with me.
The situation: I have a dataset recording daily measurements for 30 people. Every person can register several different types of measurements every day.
Example dataset & plot, with 2 people and 2 measurement types:
import pandas as pd
df = pd.DataFrame.from_dict({"date": pd.to_datetime(pd.date_range("2019-12-01", periods=5).repeat(4)),
"person": pd.np.tile(["Bob", "Amy"], 10),
"measurement_type": pd.np.tile(["score_a", "score_a", "score_b", "score_b"], 5),
"value": 20.0*np.random.random(size=20)})
import altair as alt
alt.Chart(df, width=600, height=100) \
.mark_circle(size=150) \
.encode(x = "date",
y = "person",
color = alt.Color("value"))
This gives me this graph:
In the example above, the 2 measurement types are plotted on top of each other. I would like to add an offset to the circles depending on the "measurement_type" column, so that they can all be made visible around the date-person location in the graph.
Here's a mockup of what I want to achieve:
I've been searching the docs but haven't figured out how to do this - been experimenting with the "stack" option, with the dx and dy options, ...
I have a feeling this should just be another encoding channel (offset or alike), but that doesn't exist.
Can anyone point me in the right direction?
There is currently no concept of an offset encoding in Altair, so the best approach to this will be to combine a column encoding with a y encoding, similar to the Grouped Bar Chart example in Altair's documentation:
alt.Chart(df,
width=600, height=100
).mark_circle(
size=150
).encode(
x = "date",
row='person',
y = "measurement_type",
color = alt.Color("value")
)
You can then fine-tune the look of the result using standard chart configuration settings:
alt.Chart(df,
width=600, height=alt.Step(25)
).mark_circle(
size=150
).encode(
x = "date",
row='person',
y = alt.Y("measurement_type", title=None),
color = alt.Color("value")
).configure_facet(
spacing=10
).configure_view(
strokeOpacity=0
)
Well I don't know what result you are getting up until know, but maybe write a function whith parameters likedef chart(DotsOnXAxis, FirstDotsOnYAxis, SecondDotsOnYAxis, OffsetAmount)
and then put those variables on the right place.
If you want an offset with the dots maybe put in a system like: SecondDotsOnYAxis = FirstDotsOnYAxis + OffsetAmount
I am using Bokeh in an experiment to plot data in realtime and the library provides a convenient way to do that.
Here a snippet of my code to accomplish this tasks:
# do the imports
import pandas as pd
import numpy as np
import time
from bokeh.plotting import *
from bokeh.models import ColumnDataSource
# here is simulated fake time series data
ts = pd.date_range("8:00", "10:00", freq="5S")
ts.name = 'timestamp'
ms = pd.Series(np.arange(0, len(ts)), index=ts)
ms.name = 'measurement'
data = pd.DataFrame(ms)
data['state'] = np.random.choice(3, len(ts))
data['observation'] = np.random.choice(2, len(ts))
data.reset_index(inplace=True)
data.head()
This is how the data looks like.
Next I have used the following snipped to push the data to the server in real time
output_server("observation")
p = figure(plot_width=800, plot_height=400, x_axis_type="datetime")
x = np.array(data.head(2).timestamp, dtype=np.datetime64)
y = np.array(data.head(2).observation)
p.diamond_cross(x,y, size=30, fill_color=None, line_width=2, name='observation')
show(p)
renderer = p.select(dict(name="observation"))[0]
ds = renderer.data_source
for mes in range(len(data)):
x = np.append(x, np.datetime64(data.loc[mes].timestamp))
y = np.append(y, np.int64(data.loc[mes].observation))
ds.data["x"] = x
ds.data["y"] = y
ds._dirty = True
cursession().store_objects(ds)
time.sleep(.1)
This produces a very nice result, however I need to change the color of each data point conditioned on a value.
In this case, the condition is the state variable which takes three values -- 0, 1, and 2. So my data should be able to reflect that.
I have spent hours trying to figure it out (admittedly I an very new to Bokeh) and any help will be greatly appreciated.
When you push the data, you have to separate the groups by desired color, and then supply the corresponding colors as a palette. There's a longer discussion with several variations at https://github.com/bokeh/bokeh/issues/1967, such as the simple boteh.charts dot example bryevdv posted on 28 Feb:
cat = ['foo', 'bar', 'baz']
xyvalues=dict(x=[1,4,5], y=[2,7,3], z=[3,4,5])
dots = Dot(
xyvalues, cat=cat, title="Data",
ylabel='FP Rate', xlabel='Vendors',
legend=False, palette=["red", "green", "blue"])
show(dots)
Please remember to read and follow the posting guidelines at https://stackoverflow.com/help/how-to-ask; I found this and several other potentially useful hits with my first search attempt, "Bokeh 'change color' plot". If none of these solve your problem, you need to differentiate what you're doing from the answers already out there.