How to set starting zoom level in EsriImagery with datashader and bokeh? - python

I want to project a map with its starting position like this
The current output that I get is like this
import holoviews as hv
from geoviews.tile_sources import EsriImagery
from holoviews.operation.datashader import datashade, dynspread
import datashader as ds
import colorcet as cc
hv.extension('bokeh', 'matplotlib')
c = df.loc[(df['dropoff_latitude'] >= 40.5) &
(df['dropoff_latitude'] <= 41) &
(df['dropoff_longitude'] >= -74.1) &
(df['dropoff_longitude'] <= -73.7)]
map_tiles = EsriImagery().opts(alpha=0.5, width=900, height=480, bgcolor='black')
points = hv.Points(ds.utils.lnglat_to_meters(c['dropoff_longitude'], c['dropoff_latitude']))
taxi_trips = datashade(points, dynamic = True, x_sampling=0.1, y_sampling=0.1, cmap=cc.fire, height=1000, width=1000)
map_tiles * taxi_trips
I tried to set a zoom_level or xrange, yrange in the EsriImagery opts, but there are no such parameters. The method itself also has no documentation, and I couldn't find any documentation regarding this online either. (I could be looking in the wrong place.)

There are two ways to do this:
Option 1 -- direct input
Set the values you want using the x_range and y_range parameters of datashade(...).
taxi_trips = datashade(points, x_range=(-8250000,-8200000))
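The ranges have to be in the same units as the points, which here are Web Mercator meters because of lnglat_to_meters. As a minimal sketch, assuming the standard spherical Web Mercator formula (the helper name lnglat_to_web_mercator is my own and not part of datashader), the bounding box from the question can be converted like this:

```python
import math

# Radius of the sphere used by Web Mercator (EPSG:3857), in meters.
EARTH_RADIUS_M = 6378137.0


def lnglat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator meters."""
    x = math.radians(lon_deg) * EARTH_RADIUS_M
    y = math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2)) * EARTH_RADIUS_M
    return x, y


# Bounding box taken from the DataFrame filter in the question.
x_min, y_min = lnglat_to_web_mercator(-74.1, 40.5)
x_max, y_max = lnglat_to_web_mercator(-73.7, 41.0)

x_range = (x_min, x_max)
y_range = (y_min, y_max)
print(x_range, y_range)
```

The resulting tuples could then be passed on, for example as datashade(points, x_range=x_range, y_range=y_range).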
Option 2 -- indirect input
If you don't know the needed values and you want to play around a bit, you can use this workaround.
The rendered figure has x_range and y_range attributes. Each is a Range1d object with a start and an end, and both can be printed and set by the user.
This code starts with the last line of your example.
from bokeh.plotting import show
fig = hv.render(map_tiles * taxi_trips)
fig.x_range.start = -8250000
fig.x_range.end = -8200000
# fig.x_range.reset_start = -8250000
# fig.x_range.reset_end = -8200000
# the same for the y-axis
show(fig)
Here you have to get the underlying Bokeh figure and set your values on it. The values look a bit odd because the points were converted to Web Mercator meters with lnglat_to_meters, so you may have to experiment with them a bit.
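If you would rather think in terms of a center point and a tile zoom level, the ranges can be estimated instead of guessed. This is only a sketch under the usual XYZ tile convention (256-pixel tiles, world width halving with each zoom level); the function name, the center coordinates and the zoom value are my own assumptions, and the tile level Bokeh actually requests may differ slightly.

```python
import math

EARTH_RADIUS_M = 6378137.0  # Web Mercator sphere radius in meters
TILE_SIZE_PX = 256          # assumed tile size of the standard XYZ scheme


def ranges_for_zoom(lon_deg, lat_deg, zoom, width_px, height_px):
    """Return (x_range, y_range) in Web Mercator meters centered on a point."""
    # Project the center to Web Mercator meters.
    cx = math.radians(lon_deg) * EARTH_RADIUS_M
    cy = math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2)) * EARTH_RADIUS_M
    # Meters covered by one screen pixel at this zoom level.
    meters_per_px = 2 * math.pi * EARTH_RADIUS_M / (TILE_SIZE_PX * 2 ** zoom)
    half_w = meters_per_px * width_px / 2
    half_h = meters_per_px * height_px / 2
    return (cx - half_w, cx + half_w), (cy - half_h, cy + half_h)


# Hypothetical center (roughly Manhattan) and zoom; plot size from the question.
x_range, y_range = ranges_for_zoom(-73.98, 40.75, zoom=11, width_px=900, height_px=480)
print(x_range, y_range)
```

The two tuples can be assigned to fig.x_range.start / fig.x_range.end and fig.y_range.start / fig.y_range.end exactly as in the snippet above.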
Output for both options
Here is the changed output.
I hope this works for you. Good luck.

Related

Accessing (the right) data when using holoviews/bokeh

I am having difficulties accessing (the right) data when using holoviews/bokeh, either for connected plots showing a different aspect of the dataset, or just customising a plot with dynamic access to the data as plotted (say a tooltip).
TLDR: How do I add a projection plot of my dataset (a different set of dimensions, linked to the main plot, like a marginal distribution but, you know, not restricted to a histogram or distribution)? The solution is probably similar to that of a related question I asked here on SO.
Let me exemplify (straight from a ipynb, should be quite reproducible):
import numpy as np
import random, pandas as pd
import bokeh
import datashader as ds
import holoviews as hv
from holoviews import opts
from holoviews.operation.datashader import datashade, shade, dynspread, spread, rasterize
hv.extension('bokeh')
With imports set up, let's create a dataset (N target 10e12 ;) to use with datashader. Beside the key dimensions, I really need some value dimensions (here z and z2).
import numpy as np
import pandas as pd
N = int(10e6)
x_r = (0,100)
y_r = (100,2000)
z_r = (0,10e8)
x = np.random.randint(x_r[0]*1000,x_r[1]*1000,size=(N, 1))
y = np.random.randint(y_r[0]*1000,y_r[1]*1000,size=(N, 1))
z = np.random.randint(z_r[0]*1000,z_r[1]*1000,size=(N, 1))
z2 = np.ones((N,1)).astype(int)
df = pd.DataFrame(np.column_stack([x,y,z,z2]), columns=['x','y','z','z2'])
df[['x','y','z']] = df[['x','y','z']].div(1000, axis=0)
df
Now I plot the data, rasterised, and also activate the tooltip to see the defaults. Sure, x/y is trivial, but as I said, I care about the value dimensions. It shows z2 as x_y z2. I have a question related to tooltips with the same sort of data here on SO for value dimension access for the tooltips.
from matplotlib.cm import get_cmap
palette = get_cmap('viridis')
# palette_inv = palette.reversed()
p=hv.Points(df,['x','y'], ['z','z2'])
P=rasterize(p, aggregator=ds.sum("z2"),x_range=(0,100)).opts(cmap=palette)
P.opts(tools=["hover"]).opts(height=500, width=500,xlim=(0,100),ylim=(100,2000))
Now I can add a histogram or a marginal distribution which is pretty close to what I want, but there are issues with this soon past the trivial defaults. (E.g.: P << hv.Distribution(p, kdims=['y']) or P.hist(dimension='y',weight_dimension='x_y z',num_bins = 2000,normed=True))
Both approaches come close, but neither gives me the other value dimension I'd like to visualise. If I try to access the other value dimension ('x_y z'), this fails. Also, the 'x_y z2' way seems very clumsy; is there a better way?
When I do something like this, my browser/notebook-extension blows up, of course.
transformed = p.transform(x=hv.dim('z'))
P << hv.Curve(transformed)
So how do I access all my data in the right way?

Holoviews: how to customize histogram for linked time series Curve plots

I am just getting started with Holoviews. My questions are on customizing histograms, but also I am sharing a complete example as it may be helpful for other newbies to look at, since the documentation for Holoviews is very thorough but can be overwhelming.
I have a number of time series in text files loaded as Pandas DataFrames where:
each file is for a specific location
at each location about 10 time series were collected, each with about 15,000 points
I am building a small interactive tool where a Selector can be used to choose the location / DataFrame, and then another Selector to pick 3 of 10 of the time series to be plotted together.
My goal is to allow linked zooms (both x and y scales). The questions and code will focus on this aspect of the tool.
I cannot share the actual data I am using, unfortunately, as it is proprietary, but I have created 3 random walks with specific data ranges that are consistent with the actual data.
## preliminaries ##
import pandas as pd
import numpy as np
import holoviews as hv
from holoviews.util.transform import dim
from holoviews.selection import link_selections
from holoviews import opts
from holoviews.operation.datashader import shade, rasterize
import hvplot.pandas
hv.extension('bokeh', width=100)
## create random walks (one location) ##
data_df = pd.DataFrame()
npoints=15000
np.random.seed(71)
x = np.arange(npoints)
y1 = 1300+2.5*np.random.randn(npoints).cumsum()
y2 = 1500+2*np.random.randn(npoints).cumsum()
y3 = 3+np.random.randn(npoints).cumsum()
data_df.loc[:,'x'] = x
data_df.loc[:,'rand1'] = y1
data_df.loc[:,'rand2'] = y2
data_df.loc[:,'rand3'] = y3
This first block just plots the data and shows how, by design, one of the random walks has a different range from the other two:
data_df.hvplot(x='x',y=['rand1','rand2','rand3'],value_label='y',width=800,height=400)
As a result, although hvplot subplots work out of the box (for linking), ranges are different so the scaling is not quite there:
data_df.hvplot(x='x',y=['rand1','rand2','rand3'],
value_label='y',subplots=True,width=800,height=200).cols(1)
So, my first attempt was to adapt the Python-based Points example from Linked brushing in the documentation:
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
hv.Points(data_df, dim).opts(color=c)
for c, dim in zip(colors, [['x', d] for d in dims])
])
link_selections(layout).opts(opts.Points(width=1200, height=300)).cols(1)
That is already an amazing result for 20 minutes of effort!
However, what I would really like is to plot a curve rather than points, and also see a histogram, so I adapted the comprehension syntax to work with Curve (after reading the documentation pages Applying customization, and Composing elements):
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([hv.Curve(data_df,'x',dim).opts(height=300,width=1200,
color=c).hist(dim) for c,
dim in zip(colors,[d for d in dims])])
link_selections(layout).cols(1)
Which is almost exactly what I want. But I still struggle with the different layers of opts syntax.
Question 1: with the comprehension from the last code block, how would I make the histogram share color with the curves?
Now, suppose I want to rasterize the plots (although I do not think it is necessary yet with 15,000 points, as in this case). I tried to adapt the first example with Points:
cmaps = ['Blues', 'Greens', 'Reds']
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
shade(rasterize(hv.Points(data_df, dims),
cmap=c)).opts(width=1200, height = 400).hist(dims[1])
for c, dims in zip(cmaps, [['x', d] for d in dims])
])
link_selections(layout).cols(1)
This is a decent start, but again I struggle with the options/customization.
Question 2: in the above code block, how would I pass the colormaps (it does not work as it is now), and how do I make the histogram reflect data values as in the previous case (and also have the right colormap)?
Thank you!
Sander answered how to color the histogram, but for the other question about coloring the datashaded plot, Datashader renders your data with a colormap rather than a single color, so the parameter is named cmap rather than color. So you were correct to use cmap in the datashaded case, but (a) cmap is actually a parameter to shade (which does the colormapping of the output of rasterize), and (b) you don't really need shade, as you can let Bokeh do the colormapping in most cases nowadays, in which case cmap is an option rather than an argument. Example:
from bokeh.palettes import Blues, Greens, Reds
cmaps = [Blues[256][200:], Greens[256][200:], Reds[256][200:]]
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
rasterize(hv.Points(data_df, kdims)).opts(cmap=c, width=1200, height=400).hist(kdims[1])
for c, kdims in zip(cmaps, [['x', d] for d in dims])
])
link_selections(layout).cols(1)
To answer your first question to make the histogram share the color of the curve, I've added .opts(opts.Histogram(color=c)) to your code.
When you have a layout you can specify the options of an element inside the layout like that.
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout(
[hv.Curve(data_df,'x',dim)
.opts(height=300,width=600, color=c)
.hist(dim)
.opts(opts.Histogram(color=c))
for c, dim in zip(colors,[d for d in dims])]
)
link_selections(layout).cols(1)

Bokeh is behaving in mysterious way

import numpy as np
from bokeh.plotting import *
from bokeh.models import ColumnDataSource
# prepare data
N = 300
x = np.linspace(0,4*np.pi, N)
y0 = np.sin(x)
y1 = np.cos(x)
output_notebook()
#create a column data source for the plots to share
source = ColumnDataSource(data = dict(x = x, y0 = y0, y1 = y1))
Tools = "pan, wheel_zoom, box_zoom, reset, save, box_select, lasso_select"
# create a new plot and add a renderer
left = figure(tools = Tools, plot_width = 350, plot_height = 350, title = 'sinx')
left.circle(x, y0,source = source )
# create another plot and add a renderer
right = figure(tools = Tools, plot_width = 350, plot_height = 350 , title = 'cosx')
right.circle(x, y1, source = source)
# put the subplots in a gridplot and show the plot
p = gridplot([[left, right]])
show(p)
Something is wrong with the sine graph, and I don't know why Bokeh is behaving like this. But if I put the y column names in single or double quotation marks, things work fine:
left.circle(x, 'y0',source = source )
right.circle(x, 'y1', source = source)
# put the subplots in a gridplot and show the plot
p = gridplot([[left, right]])
show(p)
Things I tried to resolve the problem
1) Restarted my notebook. (Easiest way to solve a problem)
2) Generated the output into new window.
3) Generated plot separately instead of grid plot.
Please help me find out what is going on behind the scenes.
Am I doing something wrong?
Is it a bug?
If you want to configure multiple glyphs to share data from a single ColumnDataSource, then you always need to configure the glyph properties with the names of the columns, and not with the actual data literals, as you have done. In other words:
left.circle('x', 'y0',source = source )
right.circle('x', 'y1', source = source)
Note that I have quoted 'x' as well. This is the correct way to do things when sharing a source. When you pass a literal value (i.e., a real list or array), glyph functions like .circle automatically synthesize a column for you as a convenience. But they use predefined names based on the property, so if you share a source between two renderers, the second call to .circle will overwrite the 'y' column that the first call to .circle made. That is exactly what you are seeing.
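To make the overwrite concrete, here is a toy model in plain Python. It is only an illustration of the mechanism described above: ToySource and toy_circle are made-up names, and this is not how Bokeh is actually implemented.

```python
# Toy stand-in for a shared data source; NOT Bokeh's real implementation.
class ToySource:
    def __init__(self, data):
        self.data = dict(data)


def toy_circle(source, x, y):
    """Mimic a glyph call: literals are stored under the property name."""
    fields = {}
    for prop, value in (("x", x), ("y", y)):
        if isinstance(value, str):
            # A string is treated as the name of an existing column.
            fields[prop] = value
        else:
            # A literal is copied into a column named after the property,
            # replacing whatever was stored there before.
            source.data[prop] = list(value)
            fields[prop] = prop
    return fields


xs, sin_like, cos_like = [0, 1, 2], [0.0, 0.8, 0.9], [1.0, 0.5, -0.4]
source = ToySource({"x": xs, "y0": sin_like, "y1": cos_like})

left = toy_circle(source, xs, sin_like)    # writes column "y"
right = toy_circle(source, xs, cos_like)   # overwrites column "y"
print(source.data[left["y"]] == cos_like)  # the left glyph now reads cosine data

left = toy_circle(source, "x", "y0")       # column names: nothing is overwritten
right = toy_circle(source, "x", "y1")
print(source.data[left["y"]], source.data[right["y"]])
```

With literals, both glyphs end up pointing at the same 'y' column; with column names, each glyph keeps its own column.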
As you can imagine, this behavior is confusing. Accordingly, there is an open GitHub issue to specifically and completely disallow passing in data literals whenever the source argument is provided explicitly. I can guarantee this will happen in the near future, so if you are sharing a source, you should always and only pass in column names (i.e. strings).

Possible update in bokeh is causing a strange generator bug

I had the following code snippet working:
import numpy as np
import bokeh.plotting as bp
from bokeh.models import HoverTool
bp.output_file('test.html')
fig = bp.figure(tools="reset,hover")
x = np.linspace(0,2*np.pi)
y1 = np.sin(x)
y2 = np.cos(x)
s1 = fig.scatter(x=x,y=y1,color='#0000ff',size=10,legend='sine')
s1.select(dict(type=HoverTool)).tooltips = {"x":"$x", "y":"$y"}
s2 = fig.scatter(x=x,y=y2,color='#ff0000',size=10,legend='cosine')
fig.select(dict(type=HoverTool)).tooltips = {"x":"$x", "y":"$y"}
bp.show()
Now the line s1.select ... returns a generator and gives me the following error:
AttributeError: 'generator' object has no attribute 'tooltips'
A server update took place for the process that runs this code, so it is possible that Bokeh was updated. What is my fastest workaround for this? Or is there a bug I am missing?
Some time ago the glyph methods were changed to return the glyph renderer, instead of the plot. This makes configuring the visual properties of the glyph renderer much easier. Returning the plot was redundant, since a user typically already has a reference to the plot. But you want to search the plot for a hover tool, not the glyph renderer, so you need to do:
fig.select(HoverTool).tooltips = {"x":"$x", "y":"$y"}
Note that using a dictionary means there is no guarantee about the order of the tooltips. If you care about the order, you should use a list of tuples:
fig.select(HoverTool).tooltips = [("x", "$x"), ("y", "$y")]
Then the tooltip rows will show up in the same order as given, top to bottom.
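The error from the question can be reproduced without Bokeh, which may help to see why it appears. The sketch below uses a made-up FakeHoverTool class as a stand-in for the real tool and assumes only that the search result is something you can iterate over:

```python
class FakeHoverTool:
    """Hypothetical stand-in for a tool object; not a Bokeh class."""
    tooltips = None


tools = [FakeHoverTool(), FakeHoverTool()]

# A generator of matches, like the value the question got back from select.
matches = (t for t in tools if isinstance(t, FakeHoverTool))

try:
    # Generators cannot carry new attributes, so this assignment fails.
    matches.tooltips = [("x", "$x"), ("y", "$y")]
except AttributeError as exc:
    error_message = str(exc)
    print("AttributeError:", error_message)

# Workaround that does not depend on the return type: set each match in turn.
for tool in (t for t in tools if isinstance(t, FakeHoverTool)):
    tool.tooltips = [("x", "$x"), ("y", "$y")]

print([t.tooltips for t in tools])
```

Looping over the matches and setting tooltips on each one works whether the search hands back a generator or a list, so it is a reasonable fallback if you cannot control which Bokeh version is installed.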

Changing point color depending on value in real-time plotting with Bokeh

I am using Bokeh in an experiment to plot data in realtime and the library provides a convenient way to do that.
Here is a snippet of my code to accomplish this task:
# do the imports
import pandas as pd
import numpy as np
import time
from bokeh.plotting import *
from bokeh.models import ColumnDataSource
# here is simulated fake time series data
ts = pd.date_range("8:00", "10:00", freq="5S")
ts.name = 'timestamp'
ms = pd.Series(np.arange(0, len(ts)), index=ts)
ms.name = 'measurement'
data = pd.DataFrame(ms)
data['state'] = np.random.choice(3, len(ts))
data['observation'] = np.random.choice(2, len(ts))
data.reset_index(inplace=True)
data.head()
This is what the data looks like.
Next I used the following snippet to push the data to the server in real time:
output_server("observation")
p = figure(plot_width=800, plot_height=400, x_axis_type="datetime")
x = np.array(data.head(2).timestamp, dtype=np.datetime64)
y = np.array(data.head(2).observation)
p.diamond_cross(x,y, size=30, fill_color=None, line_width=2, name='observation')
show(p)
renderer = p.select(dict(name="observation"))[0]
ds = renderer.data_source
for mes in range(len(data)):
    x = np.append(x, np.datetime64(data.loc[mes].timestamp))
    y = np.append(y, np.int64(data.loc[mes].observation))
    ds.data["x"] = x
    ds.data["y"] = y
    ds._dirty = True
    cursession().store_objects(ds)
    time.sleep(.1)
This produces a very nice result, however I need to change the color of each data point conditioned on a value.
In this case, the condition is the state variable which takes three values -- 0, 1, and 2. So my data should be able to reflect that.
I have spent hours trying to figure it out (admittedly I am very new to Bokeh) and any help will be greatly appreciated.
When you push the data, you have to separate the groups by desired color, and then supply the corresponding colors as a palette. There's a longer discussion with several variations at https://github.com/bokeh/bokeh/issues/1967, such as the simple bokeh.charts Dot example bryevdv posted on 28 Feb:
cat = ['foo', 'bar', 'baz']
xyvalues=dict(x=[1,4,5], y=[2,7,3], z=[3,4,5])
dots = Dot(
xyvalues, cat=cat, title="Data",
ylabel='FP Rate', xlabel='Vendors',
legend=False, palette=["red", "green", "blue"])
show(dots)
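For the data in the question, the grouping step itself can be sketched in plain Python. The color choices and the small sample lists below are made up for illustration; they stand in for data['state'] and data['observation'].

```python
# Hypothetical mapping from the three state values to colors.
STATE_COLORS = {0: "red", 1: "green", 2: "blue"}

# Small made-up sample standing in for data['state'] and data['observation'].
states = [0, 2, 1, 1, 0]
observations = [1, 0, 0, 1, 1]

# Option A: one color per point, aligned with the data.
colors = [STATE_COLORS[s] for s in states]

# Option B: split the points into one group per color.
groups = {color: {"x": [], "y": []} for color in STATE_COLORS.values()}
for i, (state, obs) in enumerate(zip(states, observations)):
    group = groups[STATE_COLORS[state]]
    group["x"].append(i)
    group["y"].append(obs)

print(colors)
print(groups)
```

In the streaming loop, the per-point color list would be extended and pushed together with x and y, or each group would be drawn with its own glyph. How that is wired up depends on the Bokeh version in use and is not shown here.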
Please remember to read and follow the posting guidelines at https://stackoverflow.com/help/how-to-ask; I found this and several other potentially useful hits with my first search attempt, "Bokeh 'change color' plot". If none of these solve your problem, you need to differentiate what you're doing from the answers already out there.
