I am plotting some data with bokeh, using a for loop to iterate over the columns in my dataframe. The box select and lasso tools, which were linked when I created the plots explicitly (i.e. not generated with a for loop), no longer seem to be linked.
Do I need to increment some bokeh function within the for loop?
import numpy as np
import pandas as pd
from bokeh.io import output_notebook
from bokeh.layouts import gridplot
from bokeh.models import BoxSelectTool, ColumnDataSource
from bokeh.plotting import figure, show
#example dataframe: one column of 10 random values per variable
cols = ['var1',
        'var2',
        'var3',
        'var4']
array = {c: np.random.rand(10) for c in cols}
df = pd.DataFrame(array, columns = cols)
w = 500
h = 400
#collect plots in a list (start with an empty)
plots = []
#iterate over the columns in the dataframe
# specify the tools in TOOLS
#add additional lines to show tolerance bands etc
for c in df[cols]:
    source = ColumnDataSource(data = dict(x = df.index, y = df[c]))
    TOOLS = "pan,wheel_zoom,box_zoom,reset,save,box_select,lasso_select"
    f = figure(tools = TOOLS, width = w, plot_height = h, title = c + ' Run Chart',
               x_axis_label = 'Run ID', y_axis_label = c)
    f.line('x', 'y', source = source, name = 'data')
    f.triangle('x', 'y', source = source)
    #data mean line
    f.line(df.index, df[c].mean(), color = 'orange')
    #tolerance lines
    f.line (df.index, df[c + 'USL'][0], color = 'red', line_dash = 'dashed', line_width = 2)
    f.line (df.index, df[c + 'LSL'][0], color = 'red', line_dash = 'dashed', line_width = 2)
    #append the new plot in this loop to the existing list of plots
    plots.append(f)
#link all the x_ranges
for i in plots:
    i.x_range = plots[0].x_range
#plot
p = gridplot(plots, ncols = 2)
output_notebook()
show(p)
I expect to produce plots which are linked and allow me to box or lasso select some points on one chart and for them to be highlighted on the others. However, the plots only let me select on one plot with no linked behaviour.
SOLUTION
This may seem like a beginner's problem, but I am sure someone else will come across it, so here is the answer.
Bokeh glyphs read their data from a data source object (a ColumnDataSource). You can pass the whole dataframe into it and then refer to the x and y columns by name when creating each glyph (e.g. my f.line, f.triangle etc). Selections are stored on the source, so glyphs only share a selection when they share the same source.
So I moved the 'source' outside of the loop so that it is not recreated on each iteration, and passed my whole df to it. Within the loop, I then use the loop variable plus a descriptor string (USL, LSL, _mean) as the y column name and 'index' as the x column name.
I add a box select tool explicitly and restrict it to the glyphs named 'data', so that a box selection only selects the glyphs I want (i.e. not my constant-value mean and spec limit lines).
Also, be aware that if you want to output to an HTML file as well, you will probably need to suppress the in-notebook output, as bokeh does not like having duplicate plots open. I have not included my HTML output solution here.
As for adding a linked lasso tool to loop-generated plots, I could only find an explicit box select tool constructor, so I am not sure whether this is possible.
So here it is:
#keep the source out of the loop to stop it resetting every time
#(df is assumed to also hold the <col>_mean, <col>USL and <col>LSL columns)
Source = ColumnDataSource(df)
#start from an empty list so plots from the earlier attempt are not reused
plots = []
for c in cols:
    TOOLS = "pan,wheel_zoom,box_zoom,reset,save"
    f = figure(tools = TOOLS, width = w, plot_height = h, title = c + ' Run Chart',
               x_axis_label = 'Run ID', y_axis_label = c)
    f.line(x = 'index', y = c , source = Source, name = 'data')
    f.triangle(x = 'index', y = c, source = Source, name = 'data')
    #data mean line
    f.line(x = 'index', y = c + '_mean', source = Source, color = 'orange')
    #tolerance lines
    f.line (x = 'index', y = c + 'USL', color = 'red', line_dash = 'dashed', line_width = 2, source = Source)
    f.line (x = 'index', y = c + 'LSL', color = 'red', line_dash = 'dashed', line_width = 2, source = Source)
    # Add BoxSelect tool - this allows points on one plot to be highlighted on all linked plots. Note only the delta info
    # is linked using name='data'. Again names can be used to ensure only the relevant glyphs are highlighted.
    bxselect1 = BoxSelectTool(renderers=f.select(name='data'))
    f.add_tools(bxselect1)
    plots.append(f)
#tie the x_ranges together so that panning is linked between plots
for i in plots:
    i.x_range = plots[0].x_range
forp = gridplot(plots, ncols = 2)
show(forp)
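For anyone who wants to try the shared-source idea in isolation, here is a minimal sketch. It is not the exact code from my project: the data and the `_mean` columns are made up, `scatter` stands in for `triangle`, and I have left out the figure sizes. As far as I can tell `LassoSelectTool` in `bokeh.models` takes the same `renderers` argument as `BoxSelectTool`, so it is included here as an assumption worth testing.

```python
import numpy as np
import pandas as pd
from bokeh.layouts import gridplot
from bokeh.models import BoxSelectTool, ColumnDataSource, LassoSelectTool
from bokeh.plotting import figure

# Hypothetical data: four measurement columns plus a constant mean column each.
cols = ["var1", "var2", "var3", "var4"]
rng = np.random.default_rng(0)
df = pd.DataFrame({c: rng.random(10) for c in cols})
for c in cols:
    df[c + "_mean"] = df[c].mean()

# One source shared by every plot: the selection is stored on the source,
# so selecting points in one plot highlights the same rows in the others.
source = ColumnDataSource(df)

plots = []
for c in cols:
    f = figure(tools="pan,wheel_zoom,reset", title=c + " Run Chart")
    f.line(x="index", y=c, source=source)
    points = f.scatter(x="index", y=c, source=source, name="data")
    f.line(x="index", y=c + "_mean", source=source, color="orange")
    # Restrict both selection tools to the data points only.
    f.add_tools(BoxSelectTool(renderers=[points]),
                LassoSelectTool(renderers=[points]))
    plots.append(f)

# Share one x_range so that panning and zooming are linked as well.
for f in plots[1:]:
    f.x_range = plots[0].x_range

grid = gridplot(plots, ncols=2)
```

Call `show(grid)` (after `output_notebook()` if you are in a notebook) to display it.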
I am trying to create a chart in Altair with dropdowns. Here's the code
df = pd.DataFrame([["Merc","US",500, "Car_A"], ["BMW","US" ,55, "Car_B"]
, ["BMW","US",40, "Car_C"], ["Merc", "China",650, "Car_D"]
, ["BMW","US",80, "Car_E"], ["Merc", "China",850, "Car_F"]], columns=list("ABCD"))
position_dropdown_Type = alt.binding_select(name = "Type:" , options=[None] + list(df["B"].unique()), labels = ['All'] + list(df["B"].unique()))
position_selection_Type = alt.selection_single(fields=['B'], bind=position_dropdown_Type)
position_dropdown_1_Car = alt.binding_select(name = "Company:", options=[None] + list(df["A"].unique()), labels = ['All'] + list(df["A"].unique()))
position_selection_1_Car = alt.selection_single(fields=['A'], bind=position_dropdown_1_Car, name = "__")
#interval = alt.selection_multi(fields=['GICS_SUB_IND'], bind='legend')
Car_bar = alt.Chart(df).mark_bar().encode(
    color = 'A',
    y = alt.Y('C', scale=alt.Scale(domain=[0, 1.2*df["C"].max()]),
              title = 'Range'),
    x = alt.X('D:O', sort = alt.EncodingSortField(field="C", order='ascending', op='max'), title = 'Car_Name')
).add_selection(position_selection_Type).transform_filter(position_selection_Type).add_selection(position_selection_1_Car).transform_filter(position_selection_1_Car)
(Car_bar).properties(width = 700)
The code works and produces a graph like this when all values are selected.
However, when I make a selection, the remaining bars stretch to fill the entire width, as seen below.
Defining size inside mark_bar(size = 10) is not an option as the code will be accessing different datasets with wide range of sample size. Also, since the dropdown list will be quite long, selecting from the legend is also not ideal.
Is there a way to keep the width same for the bars with the selection from dropdown?
Edited: removing the properties option at the end also does not solve the issue.
It seems the issue is with the properties(width = 700) setting. It forces the bars to be wide enough to fill the requested width. If you remove it, the bar width will adapt accordingly.
Edit: Output plot
Here's a sample of my simple plotly chart:
px.scatter(data_frame=testdf,
x = 'some_x',
y = 'some_y',
size = 'some_size',
color = 'sth_colorful',
title = 'stackoverflow',
range_x = [-10, 1100],
range_y = [-0.08, 1],
hover_name = 'sth_i_want_to_check',
animation_frame = 'year_of_course'
)
It is almost good. One thing that bothers me is that the 'some_size' min value is about 20 and the max is about 35, so it is hard to notice the size difference between the circles.
Is there a way I can control the diameter of these? I'd like to keep the original values in the hover.
When you pass the argument size = 'some_size' to px.scatter, this directly determines the size of the markers as they render on the plot.
However, what you can do to get around this is create a new column called "display_size" with your intended sizes for the markers, and set customdata to df['some_size'] so you can access these values in the hovertemplate.
After that, you can loop through all of the traces in your px.scatter and modify the hovertemplate to display some_size instead of display_size.
For example:
## increase the standard deviation of the size values
## so that the difference between small and large markers is more apparent
testdf["display_size"] = 2*testdf["some_size"]
fig = px.scatter(data_frame=testdf,
x = 'some_x',
y = 'some_y',
size = 'display_size',
color = 'sth_colorful',
title = 'stackoverflow',
range_x = [-10, 1100],
range_y = [-0.08, 1],
hover_name = 'sth_i_want_to_check',
animation_frame = 'year_of_course'
)
fig.update_traces(customdata=testdf["some_size"])
## replace marker.size in the hovertemplate with your customdata (the actual size values)
for trace in fig.data:
    trace['hovertemplate'] = trace['hovertemplate'].replace('marker.size','customdata')
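One caveat: as far as I can tell, Plotly Express sizes markers relative to the largest value in the size column, so multiplying every value by a constant may not change how different the circles look. A linear rescale onto a wider range is one way to build display_size instead. The sketch below uses only pandas; the `rescale` helper, the target range of 1 to 100 and the sample values are my own assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical data in the range described in the question (about 20 to 35).
testdf = pd.DataFrame({"some_size": [20.0, 25.0, 30.0, 35.0]})


def rescale(values, new_min=1.0, new_max=100.0):
    """Linearly map a numeric Series onto [new_min, new_max]."""
    old_min, old_max = values.min(), values.max()
    if old_max == old_min:
        # All values are equal: there is nothing to spread out.
        return pd.Series(new_max, index=values.index, dtype=float)
    scale = (new_max - new_min) / (old_max - old_min)
    return new_min + (values - old_min) * scale


# The original column is left untouched so it can still be shown in the hover.
testdf["display_size"] = rescale(testdf["some_size"])
```

You would then pass size = 'display_size' to px.scatter as above. I believe hover_data={'some_size': True, 'display_size': False} is another way to control what appears in the hover, but check that against your plotly version.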
I'm making a plot to compare band structure calculations from two different methods. This means plotting multiple lines for each set of data. I want to have a set of widgets that controls each set of data separately. The code below works if I only plot one set of data, but I can't get the widgets to work properly for two sets of data.
#!/usr/bin/env python3
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider, TextBox
#cols = ['blue', 'red', 'green', 'purple']
cols = ['#3f54bf','#c14142','#59bf3f','#b83fbf']
finam = ['wan_band.dat','wan_band.pwx.dat']
#finam = ['wan_band.dat'] # this works
lbot = len(finam)*0.09 + 0.06
fig, ax = plt.subplots()
plt.subplots_adjust(bottom=lbot)
ax.margins(x=0) # lines go to the edge of the horizontal axes
def setlines(lines, txbx1, txbx2):
    ''' turn lines on/off based on text box values '''
    try:
        mn = int(txbx1) - 1
        mx = int(txbx2) - 1
        for ib in range(len(lines)):
            if (ib<mn) or (ib>mx):
                lines[ib].set_visible(False)
            else :
                lines[ib].set_visible(True)
        plt.draw()
    except ValueError as err:
        print('Invalid range')
#end def setlines(cnt, lines, txbx1, txbx2):
def alphalines(lines, valin):
    ''' set lines' opacity '''
    maxval = int('ff',16)
    maxval = hex(int(valin*maxval))[2:]
    for ib in range(bcnt):
        lines[ib].set_color(cols[cnt]+maxval)
    plt.draw()
#end def alphalines(lines, valtxt):
lines = [0]*len(finam) # 2d list to hold Line2Ds
txbox1 = [0]*len(finam) # list of Lo Band TextBoxes
txbox2 = [0]*len(finam) # list of Hi Band TextBoxes
alslid = [0]*len(finam) # list of Line Opacity Sliders
for cnt, fnam in enumerate(finam):
    ptcnt = 0 # point count
    fid = open(fnam, 'r')
    fiit = iter(fid)
    for line in fiit:
        if line.strip() == '' :
            break
        ptcnt += 1
    fid.close()
    bandat_raw = np.loadtxt(fnam)
    bcnt = int(np.round((bandat_raw.shape[0] / (ptcnt))))
    print(ptcnt)
    print(bcnt)
    # get views of the raw data that are easier to work with
    kbandat = bandat_raw[:ptcnt,0] # k point length along path
    ebandat = bandat_raw.reshape((bcnt,ptcnt,2))[:,:,1] # band energy # k-points
    lines[cnt] = [0]*bcnt # point this list element to another list
    for ib in range(bcnt):
        #l, = plt.plot(kbandat, ebandat[ib], c=cols[cnt],lw=1.0)
        l, = ax.plot(kbandat, ebandat[ib], c=cols[cnt],lw=1.0)
        lines[cnt][ib] = l
    y0 = 0.03 + 0.07*cnt
    bxht = 0.035
    axbox1 = plt.axes([0.03, y0, 0.08, bxht]) # x0, y0, width, height
    axbox2 = plt.axes([0.13, y0, 0.08, bxht])
    txbox1[cnt] = TextBox(axbox1, '', initial=str(1))
    txbox2[cnt] = TextBox(axbox2, '', initial=str(bcnt))
    txbox1[cnt].on_submit( lambda x: setlines(lines[cnt], x, txbox2[cnt].text) )
    txbox2[cnt].on_submit( lambda x: setlines(lines[cnt], txbox1[cnt].text, x) )
    axalpha = plt.axes([0.25, y0, 0.65, bxht])
    alslid[cnt] = Slider(axalpha, '', 0.1, 1.0, valinit=1.0)
    salpha = alslid[cnt]
    alslid[cnt].on_changed( lambda x: alphalines(lines[cnt], x) )
#end for cnt, fnam in enumerate(finam):
plt.text(0.01, 1.2, 'Lo Band', transform=axbox1.transAxes)
plt.text(0.01, 1.2, 'Hi Band', transform=axbox2.transAxes)
plt.text(0.01, 1.2, 'Line Opacity', transform=axalpha.transAxes)
plt.show()
All the widgets only control the last data set plotted instead of the individual data sets I tried to associate with each widget. Here is a sample output:
Here the bottom slider should be changing the blue lines' opacity, but instead it changes the red lines' opacity. Originally the variables txbox1, txbox2, and alslid were not lists. I changed them to lists though to ensure they weren't garbage collected but it didn't change anything.
Here is the test data set1 and set2 I've been using. They should be saved as files 'wan_band.dat' and 'wan_band.pwx.dat' as per the hard coded list finam in the code.
I figured it out. The lambdas created inside the loop look up cnt when they are called, not when they are defined, so every callback ran with the last value of the loop variable. Switching to functools.partial, which binds the current value when the callback is created, fixed the issue.
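Here is a stripped-down illustration of the difference, without any matplotlib. The `report` function is a made-up stand-in for the real callbacks (setlines, alphalines); only the way cnt gets bound matters.

```python
from functools import partial


def report(index, value):
    """Hypothetical stand-in for a widget callback."""
    return f"dataset {index} got {value}"


# Late binding: each lambda looks up cnt when it is called,
# and by then the loop has finished and cnt holds its last value.
late_bound = []
for cnt in range(2):
    late_bound.append(lambda x: report(cnt, x))

# functools.partial stores the current value of cnt at creation time.
bound_now = []
for cnt in range(2):
    bound_now.append(partial(report, cnt))

# A default argument on the lambda is another common way to get the same effect.
with_default = []
for cnt in range(2):
    with_default.append(lambda x, cnt=cnt: report(cnt, x))

print(late_bound[0](0.5))    # dataset 1 got 0.5
print(bound_now[0](0.5))     # dataset 0 got 0.5
print(with_default[0](0.5))  # dataset 0 got 0.5
```

In the plotting code this would correspond to something like alslid[cnt].on_changed(partial(alphalines, lines[cnt])). Note that alphalines also reads the globals cnt and bcnt, so those would need to be passed in the same way.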
I would like to create a scattermapbox for Indonesia showing various statistics (population, GDP, etc.) on a regional basis.
I am working with a geopandas file from github.
The example on the plotly website creates multiple files for each layer and then uses the github link as source.
#republican counties
source = 'https://raw.githubusercontent.com/plotly/datasets/master/florida-red-data.json'
#democrat counties
source = 'https://raw.githubusercontent.com/plotly/datasets/master/florida-blue-data.json'
My question therefore is: how can I use the pandas dataframe to create a layer dict for every region and use that as a source (and also colour each region by specific values in other dataframes)?
If that is not possible at all and it is necessary to create a separate file for each region, how would I do that? My attempt (lines 16-20) doesn't seem to work.
import pandas as pd
import json
import string
import plotly
from plotly.graph_objs import Scattermapbox, Layout
ID_regions = pd.read_json('https://raw.githubusercontent.com/N1x0/indonesia-geojson/master/indonesia-edit.geojson')
region_names = []
for region in ID_regions['features']:
    region_names.append(region['properties']['name'])
print(region_names)
#This creates json and doesn't work
def create_region_files():
    for i in range(len(ID_regions)):
        region_data = ID_regions.iloc[i,:]
        region_data.to_json(f'C:\\Users\\nicho\\Desktop\\Waste Management\\Map_Maker\\ID_regions\\{region_names[i]}.json')
        i += 1
def create_Chloropleth():
    mapbox_access_token = 'My Access Key'
    data = [
        Scattermapbox(
            lat=['45.5017'],
            lon=['-73.5673'],
            mode='markers',
        )
    ]
    layout = Layout(
        height=900,
        autosize=True,
        showlegend=False,
        hovermode='closest',
        mapbox=dict(
            layers=[
                dict(
                    sourcetype = 'geojson',
                    source = 'https://raw.githubusercontent.com/N1x0/indonesia-geojson/master/indonesia-edit.geojson',
                    type = 'fill',
                    color = 'green'
                ),
                dict(
                    sourcetype = 'geojson',
                    source = 'https://raw.githubusercontent.com/N1x0/indonesia-geojson/master/west-sulawesi.json',
                    type = 'fill',
                    color = 'red',
                )
            ],
            accesstoken=mapbox_access_token,
            bearing=0,
            center=dict(
                lat=0.7893,
                lon=113.9213
            ),
            pitch=0,
            zoom=4.5,
            style='light'
        ),
    )
    fig = dict(data=data, layout=layout)
    plotly.offline.plot(fig, filename='Chloropleth_Province_Population.html')
create_Chloropleth()
Thank you for the help!
OK, it took me a while, but I figured it all out. Big thanks to Emma Grimaldi over at Medium and Vince Pota. Their posts were what helped me through most of it.
So here are the answers to my own question, in order:
1. It is not necessary to create an individual file for each region. I.e. you can use a pandas dataframe to match the names of the regions in the json, and that works just fine.
import copy
import json
import numpy as np
with open('indonesia-en.geojson') as f:
    geojson = json.load(f)
def make_sources(downsample = 0):
    sources = []
    geojson_copy = copy.deepcopy(geojson['features']) # do not overwrite the original file
    for feature in geojson_copy:
        if downsample > 0:
            # keep only every downsample-th point of the outline
            coords = np.array(feature['geometry']['coordinates'][0][0])
            coords = coords[::downsample]
            feature['geometry']['coordinates'] = [[coords]]
        sources.append(dict(type = 'FeatureCollection',
                            features = [feature])
                       )
    return sources
So you just take each feature (with its coordinates) from the geojson, wrap it in its own FeatureCollection dict, and append it to a list of dicts [{}].
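To see what that list looks like without downloading anything, here is a tiny standalone sketch. The two-region GeoJSON and the `split_features` name are made up for illustration; it does the same splitting as make_sources() but skips the downsampling.

```python
import copy

# Hypothetical two-feature GeoJSON standing in for the Indonesia file.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"name": "Region A"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
        {"type": "Feature",
         "properties": {"name": "Region B"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[2, 2], [3, 2], [3, 3], [2, 2]]]}},
    ],
}


def split_features(collection):
    """Return one single-feature FeatureCollection per feature."""
    return [
        {"type": "FeatureCollection", "features": [copy.deepcopy(feature)]}
        for feature in collection["features"]
    ]


sources = split_features(geojson)
names = [s["features"][0]["properties"]["name"] for s in sources]
print(names)  # ['Region A', 'Region B']
```

Each element of sources can then be used as the source of one layer, and names gives you the key for matching rows in a dataframe of statistics.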
2. How to use this list to dynamically create layers:
MAPBOX_APIKEY = "Your API Key"
data = dict(type='scattermapbox',
            lat=lats,
            lon=lons,
            mode='markers',
            text=hover_text,
            marker=dict(size=1,
                        color=scatter_colors,
                        showscale = True,
                        cmin = minpop/1000000,
                        cmax = maxpop/1000000,
                        colorscale = colorscale,
                        colorbar = dict(
                            title='Population in Millions'
                        )
                        ),
            showlegend=False,
            hoverinfo='text'
            )
layers=([dict(sourcetype = 'geojson',
              source =sources[k],
              below="water",
              type = 'line', # the borders
              line = dict(width = 1),
              color = 'black',
              ) for k in range(n_provinces) # where n_provinces = len(geojson['features'])
         ] +
        [dict(sourcetype = 'geojson',
              source =sources[k],
              type = 'fill', # the area inside the borders
              color = scatter_colors[k],
              opacity=0.8
              ) for k in range(n_provinces) # where n_provinces = len(geojson['features'])
         ]
        )
So the solution here is to set source = sources[k], i.e. the k-th entry of the list of single-feature dicts created in make_sources().
3. How to color the layers accordingly: color=scatter_colors[k]
Following the linked example, I used 3 functions:
3.1 scalarmappable
from matplotlib import cm
from matplotlib.colors import Normalize
#sets colors based on min and max values
def scalarmappable(cmap, cmin, cmax):
    colormap = cm.get_cmap(cmap) # removed in Matplotlib 3.9; matplotlib.colormaps[cmap] is the newer spelling
    norm = Normalize(vmin=cmin, vmax=cmax+(cmax*0.10)) #vmax gets increased by 10 percent because otherwise the most populous region doesn't get colored
    return cm.ScalarMappable(norm=norm, cmap=colormap)
3.2 scatter_colors
#uses matplotlib to create colors based on values and sets grey for isnan value
def get_scatter_colors(sm, df):
    grey = 'rgba(128,128,128,1)'
    return ['rgba' + str(sm.to_rgba(m, bytes = True, alpha = 1)) if not np.isnan(m) else grey for m in df]
3.3 colorscale
#defines horizontal range and corresponding values for colorscale
def get_colorscale(sm, df, cmin, cmax):
    xrange = np.linspace(0, 1, len(df))
    values = np.linspace(cmin, cmax, len(df))
    return [[i, 'rgba' + str(sm.to_rgba(v, bytes = True))] for i,v in zip(xrange, values) ]
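One thing to watch with the 'rgba' + str(...) trick: to_rgba(..., bytes=True) returns NumPy integers, and on recent NumPy versions I would expect str() of that tuple to print them as np.uint8(...), which is not a valid colour string. Below is a hedged sketch of a formatter that avoids this by converting to plain ints. The `rgba_string` name and the 0 to 50 range are my own assumptions, and it writes alpha on a 0 to 1 scale rather than 0 to 255.

```python
from matplotlib import cm
from matplotlib.colors import Normalize


def rgba_string(sm, value):
    """Format one mapped value as a CSS-style 'rgba(r,g,b,a)' string."""
    r, g, b, a = sm.to_rgba(value, bytes=True)
    # int() strips any NumPy scalar wrapper; alpha is rescaled from 0-255 to 0-1.
    return f"rgba({int(r)},{int(g)},{int(b)},{int(a) / 255:.2f})"


# Hypothetical population range in millions, mapped with the viridis colormap.
sm = cm.ScalarMappable(norm=Normalize(vmin=0.0, vmax=50.0), cmap="viridis")
colors = [rgba_string(sm, v) for v in (0.0, 25.0, 50.0)]
```

The same helper could replace the 'rgba' + str(...) expressions in get_scatter_colors and get_colorscale if you run into that problem.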
Then variables using the functions are set
#assigning values
colormap = 'nipy_spectral'
minpop = stats['population'].min()
maxpop = stats['population'].max()
sources = make_sources(downsample=0)
lons, lats = get_centers()
sm = scalarmappable(colormap, minpop, maxpop)
scatter_colors = get_scatter_colors(sm, stats['population'])
colorscale = get_colorscale(sm, stats, minpop, maxpop)
hover_text = get_hover_text(stats['population'])
So if anyone has run into the same problems, I hope this answer helps you progress :)
I want to plot the top n features of a RandomForestClassifier() in bokeh without specifying the column name explicitly in the y variable.
So firstly, instead of me typing the column name for y, the plot should take the column name and values directly from the top feature of the random forest classifier.
y = df['new']
x = df.drop('new', axis=1)
rf = RandomForestClassifier()
rf.fit(x,y)
#Extract the top feature from above and plot in bokeh
source = ColumnDataSource(df)
p1 = figure(y_range=(0, 10))
# below I would like it to use the top feature in RandomClassifier
# instead of explicitly writing the column name, horsePower,
# from the top features column
p1.line(
    x = 'x',
    y = 'horsePower',
    source=source,
    legend = 'Car Blue',
    color = 'Blue'
)
Instead of specifying the first feature only, or the second feature only, we can build a for loop that plots the n top features in bokeh. I imagine it to be something close to this
for i in range(5):
    p.line(x = 'x', y = ???? , source=source,) #top feature in randomClassifier
    p.circle(x = 'x', y = ???? , source=source, size = 10)
row = [p]
output_file('TopFeatures')
show(p)
I have already extracted the top 15 features from the RandomForestClassifier of the model and printed the first 15 using
new_rf = pd.Series(rf.feature_importances_,index=x.columns).sort_values(ascending=False)
print(new_rf[:15])
Simply iterate through the index values of the pandas Series new_rf, since its index holds the column names:
# TOP 1 FEATURE
p1.line(
    x = 'x',
    y = new_rf.index[0],
    source = source,
    legend = 'Car Blue',
    color = 'Blue'
)
# TOP 5 FEATURES
for i in new_rf[:5].index:
    output_file("TopFeatures_{}".format(i))
    p = figure(y_range=(0, 10))
    p.line(x = 'x', y = i, source = source)
    p.circle(x = 'x', y = i, source = source, size = 10)
    show(p)
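If you want to check the name lookup on its own, without fitting a model or drawing anything, here is a small pandas-only sketch. The importance values and column names are made up; in the question they would come from rf.feature_importances_ and x.columns, and `top_features` is just an illustrative helper.

```python
import pandas as pd

# Hypothetical importances standing in for rf.feature_importances_.
importances = pd.Series(
    [0.10, 0.50, 0.15, 0.25],
    index=["weight", "horsePower", "cylinders", "mpg"],
)

# Sort so the most important feature comes first, as in the question.
new_rf = importances.sort_values(ascending=False)


def top_features(series, n):
    """Return the names of the n most important features."""
    return list(series.index[:n])


top2 = top_features(new_rf, 2)
print(top2)  # ['horsePower', 'mpg']
```

Each name in that list is a plain string, so it can be passed directly as the y column name of a glyph that reads from ColumnDataSource(df).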