Hover tool not working in Bokeh [duplicate] - python

This question already has answers here:
Data tooltips in Bokeh don't show data, showing '???' instead
(2 answers)
Closed 5 years ago.
I have a table that contains the number of times a student accessed an activity.
df_act5236920.head()
activities studs
0 3.0 student 1
1 4.0 student 10
2 5.0 student 11
3 6.0 student 12
4 2.0 student 13
5 4.0 student 14
6 19.0 student 15
I add the hover tool to the bar chart created from this dataframe with the code below:
from collections import OrderedDict

from bokeh.charts import Bar
from bokeh.io import show
from bokeh.models import HoverTool, Legend

TOOLS = "pan,wheel_zoom,box_zoom,reset,hover,save"
bar = Bar(df_act5236920, values='activities', label='studs',
          title="Activity 5236920 performed by students",
          xlabel="Students", ylabel="Activity", legend=False, tools=TOOLS)
hover = bar.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = OrderedDict([
    ("Student Name", "@studs"),
    ("Access Count", "@activities"),
])
show(bar)
When I hover over the bar chart, it shows the student value but not the activities value. I even tried using "$activities", but the result is the same.
Based on other Stack Overflow questions I read, I tried using a ColumnDataSource instead of the DataFrame, as shown in the code below:
from collections import OrderedDict

from bokeh.models import ColumnDataSource

source = ColumnDataSource(ColumnDataSource.from_df(df_act5236920))
TOOLS = "pan,wheel_zoom,box_zoom,reset,hover,save"
bar = Bar('studs', 'activities', source=source,
          title="Activity 5236920 performed by students", tools=TOOLS)
hover = bar.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = OrderedDict([
    ("Student Name", "@studs"),
    ("Access Count", "@activities"),
])
show(bar)
It gives me the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-76-81505464c390> in <module>()
3 # bar = Bar(df_act5236920,values='activities',label='studs',title = "Activity 5236920 performed by students",
4 # xlabel="Students",ylabel="Activity",legend=False,tools=TOOLS)
----> 5 bar = Bar('studs','activities',source=source, title = "Activity 5236920 performed by students",tools=TOOLS)
6 hover = bar.select_one(HoverTool)
7 hover.point_policy = "follow_mouse"
C:\Anaconda2\lib\site-packages\bokeh\charts\builders\bar_builder.pyc in Bar(data, label, values, color, stack, group, agg, xscale, yscale, xgrid, ygrid, continuous_range, **kw)
319 kw['y_range'] = y_range
320
--> 321 chart = create_and_build(BarBuilder, data, **kw)
322
323 # hide x labels if there is a single value, implying stacking only
C:\Anaconda2\lib\site-packages\bokeh\charts\builder.pyc in create_and_build(builder_class, *data, **kws)
66 # create the new builder
67 builder_kws = {k: v for k, v in kws.items() if k in builder_props}
---> 68 builder = builder_class(*data, **builder_kws)
69
70 # create a chart to return, since there isn't one already
C:\Anaconda2\lib\site-packages\bokeh\charts\builder.pyc in __init__(self, *args, **kws)
292 # make sure that the builder dimensions have access to the chart data source
293 for dim in self.dimensions:
--> 294 getattr(getattr(self, dim), 'set_data')(data)
295
296 # handle input attrs and ensure attrs have access to data
C:\Anaconda2\lib\site-packages\bokeh\charts\properties.pyc in set_data(self, data)
170 data (`ChartDataSource`): the data source associated with the chart
171 """
--> 172 self.selection = data[self.name]
173 self._chart_source = data
174 self._data = data.df
TypeError: 'NoneType' object has no attribute '__getitem__'
I even tried creating the ColumnDataSource from scratch by passing the columns of the dataframe to it as lists of values, but I still got the same error as the one shown above:
source = ColumnDataSource(data=dict(
    studs=students,
    activities=activity_5236920,
))
I'm having the same issue when I try to implement the hover tool on a heatmap as well. Can anyone help me fix this?

So, after going through a lot of the documentation, I've finally figured out a few things.
First, the NoneType error occurred because, for a bar chart, you need to pass the dataframe as well as the ColumnDataSource for the chart to display.
So the code needed to be:
bar = Bar(df_act5236920, values='activities', label='studs',
          title="Activity 5236920 performed by students",
          xlabel="Students", ylabel="Activity", legend=False,
          tools=TOOLS, source=source)
Notice how the dataframe and source=source are both passed to the Bar() call.
For the second issue of the value not being displayed, I used @height, which displays the height of the selected bar; in this case that is the count value.
hover.tooltips = OrderedDict([
    ("Student Name", "@studs"),
    ("Access Count", "@height"),
])
For the student name value, @x and @studs both work. The one thing I still couldn't resolve is that although I passed the ColumnDataSource "source", it didn't really accomplish anything: when I try to use @activities in hover.tooltips, it still gives me "???". So I'm not sure what that is about, and it is an issue I'm still struggling with in another time series visualisation that I'm trying to build.
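For reference, here is a minimal end-to-end sketch of the working version described above. It assumes the legacy bokeh.charts API (Bokeh 0.12.x, since removed) and the df_act5236920 DataFrame from the question:

from collections import OrderedDict

from bokeh.charts import Bar          # legacy charts API, removed in later Bokeh releases
from bokeh.io import show
from bokeh.models import ColumnDataSource, HoverTool

# df_act5236920 has columns 'activities' (counts) and 'studs' (student names)
source = ColumnDataSource(ColumnDataSource.from_df(df_act5236920))
TOOLS = "pan,wheel_zoom,box_zoom,reset,hover,save"

# Pass both the DataFrame and the ColumnDataSource, as described above
bar = Bar(df_act5236920, values='activities', label='studs',
          title="Activity 5236920 performed by students",
          xlabel="Students", ylabel="Activity", legend=False,
          tools=TOOLS, source=source)

hover = bar.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = OrderedDict([
    ("Student Name", "@studs"),   # the label column
    ("Access Count", "@height"),  # height of the hovered bar, i.e. the aggregated count
])

show(bar)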

Related

KeyError on 'InvoiceYearMonth'. The code was working earlier in Jupyter Notebook in PyCharm

import pandas as pd
import plotly.graph_objs as go
import plotly.offline as pyoff

# Revenue = Active Customer Count * Order Count * Average Revenue per Order

# converting the type of the InvoiceDate field from string to datetime
tx_data['InvoiceDate'] = pd.to_datetime(tx_data['InvoiceDate'])

# creating a YearMonth field for ease of reporting and visualization
tx_data['InvoiceYearMonth'] = tx_data['InvoiceDate'].map(lambda date: 100*date.year + date.month)

# calculate Revenue for each row and create a new dataframe with YearMonth - Revenue columns
tx_data['Revenue'] = tx_data['UnitPrice'] * tx_data['Quantity']
tx_revenue = tx_data.groupby(['InvoiceYearMonth'])['Revenue'].sum().reset_index()
tx_revenue

# creating a new dataframe with UK customers only
tx_uk = tx_data.query("Country=='United Kingdom'").reset_index(drop=True)

# creating a monthly active customers dataframe by counting unique Customer IDs
tx_monthly_active = tx_uk.groupby('InvoiceYearMonth')['CustomerID'].nunique().reset_index()

# print the dataframe
tx_monthly_active

# plotting the output
plot_data = [
    go.Bar(
        x=tx_monthly_active['InvoiceYearMonth'],
        y=tx_monthly_active['CustomerID'],
    )
]
plot_layout = go.Layout(
    xaxis={"type": "category"},
    title='Monthly Active Customers'
)
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)
This was working in the code I had written earlier, but it is showing an error here. I am using Jupyter Notebook in PyCharm and cannot figure out what the issue is. I am still new to programming, so I am finding it difficult to work through this. I would really appreciate a solution.
KeyError Traceback (most recent call last)
<ipython-input-26-82f7e61120b9> in <module>
3
4 #creating monthly active customers dataframe by counting unique Customer IDs
----> 5 tx_monthly_active = tx_uk.groupby('InvoiceYearMonth')['CustomerID'].nunique().reset_index()
6
7 #print the dataframe
c:\users\aayus\pycharmprojects\helloworld\venv\lib\site-packages\pandas\core\frame.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed)
5799 axis = self._get_axis_number(axis)
5800
-> 5801 return groupby_generic.DataFrameGroupBy(
5802 obj=self,
5803 keys=by,
c:\users\aayus\pycharmprojects\helloworld\venv\lib\site-packages\pandas\core\groupby\groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated)
400 from pandas.core.groupby.grouper import get_grouper
401
--> 402 grouper, exclusions, obj = get_grouper(
403 obj,
404 keys,
c:\users\aayus\pycharmprojects\helloworld\venv\lib\site-packages\pandas\core\groupby\grouper.py in get_grouper(obj, key, axis, level, sort, observed, mutated, validate)
596 in_axis, name, level, gpr = False, None, gpr, None
597 else:
--> 598 raise KeyError(gpr)
599 elif isinstance(gpr, Grouper) and gpr.key is not None:
600 # Add key to exclusions
KeyError: 'InvoiceYearMonth'
Here is the solution. It was a basic syntax error.
Instead of typing this:
tx_monthly_active = tx_uk.groupby('InvoiceYearMonth')['CustomerID'].nunique().reset_index()
we have to type:
tx_monthly_active = tx_uk.groupby(['InvoiceYearMonth'])['CustomerID'].nunique().reset_index()
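As a side note (not part of the original answer): a KeyError raised by groupby means pandas could not find that column on the DataFrame being grouped, so it can be worth confirming that 'InvoiceYearMonth' actually exists on tx_uk before grouping. A minimal diagnostic, using the variable names from the question:

# Hypothetical diagnostic, not from the original answer: verify the column
# exists on tx_uk (it is created on tx_data, from which tx_uk is derived).
print('InvoiceYearMonth' in tx_uk.columns)
print(tx_uk.columns.tolist())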

Is there a way to fix maximum recursion level in python 3?

I'm trying to build a state map for data across a decade, with a slider to select the year displayed on the map. The sort of display where a user can pick 2014 and the map will show the data for 2014.
I merged the data I want to show with the appropriate shapefile. I end up with 733 rows and 5 columns - as many as 9 rows per county with the same county name and coordinates.
Everything seems to be okay until I try to build the map. This error message is returned:
OverflowError: Maximum recursion level reached
I've tried resetting the recursion limit using sys.setrecursionlimit but can't get past that error.
I haven't been able to find an answer on SO that I understand, so I'm hoping someone can point me in the right direction.
I'm using bokeh and json to build the map. I've tried using sys.setrecursionlimit but I get the same error message no matter how high I go.
I used the same code last week but couldn't get data from different years to display because I was using a subset of the data. Now that I've fixed that, I'm stuck on this error message.
import json

from bokeh.io import curdoc, output_notebook, show
from bokeh.layouts import column, widgetbox
from bokeh.models import (ColorBar, GeoJSONDataSource, HoverTool,
                          LinearColorMapper, Slider)
from bokeh.palettes import brewer
from bokeh.plotting import figure

def json_data(selectedYear):
    yr = selectedYear
    murders = murder[murder['Year'] == yr]
    merged = mergedfinal
    merged.fillna('0', inplace=True)
    merged_json = json.loads(merged.to_json())
    json_data = json.dumps(merged_json)
    return json_data

geosource = GeoJSONDataSource(geojson=json_data(2018))

palette = brewer['YlOrRd'][9]
palette = palette[::-1]
color_mapper = LinearColorMapper(palette=palette, low=0, high=60, nan_color='#d9d9d9')
hover = HoverTool(tooltips=[('County/City', '@NAME'), ('Victims', '@Victims')])
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8, width=500, height=30,
                     border_line_color=None, location=(0, 0),
                     orientation='horizontal')

p = figure(title='Firearm Murders in Virginia', plot_height=600, plot_width=950,
           toolbar_location=None, tools=[hover])
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.xaxis.visible = False
p.yaxis.visible = False
p.patches('xs', 'ys', source=geosource,
          fill_color={'field': 'Victims', 'transform': color_mapper},
          line_color='black', line_width=0.25, fill_alpha=1)
p.add_layout(color_bar, 'below')

def update_plot(attr, old, new):
    year = slider.value
    new_data = json_data(year)
    geosource.geojson = new_data
    p.title.text = 'Firearm Murders in VA'

slider = Slider(title='Year', start=2009, end=2018, step=1, value=2018)
slider.on_change('value', update_plot)

layout = column(p, widgetbox(slider))
curdoc().add_root(layout)

output_notebook()
show(layout)
The same code worked well enough when I was using a more limited dataset. Here is the full context of the error message:
OverflowError Traceback (most recent call last)
<ipython-input-50-efd821491ac3> in <module>()
8 return json_data
9
---> 10 geosource = GeoJSONDataSource(geojson = json_data(2018))
11
12 palette=brewer['YlOrRd'][9]
<ipython-input-50-efd821491ac3> in json_data(selectedYear)
4 merged = mergedfinal
5 merged.fillna('0', inplace = True)
----> 6 merged_json = json.loads(merged.to_json())
7 json_data = json.dumps(merged_json)
8 return json_data
/Users/mcuddy/anaconda/lib/python3.6/site-packages/pandas/core/generic.py in to_json(self, path_or_buf, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines)
1087 force_ascii=force_ascii, date_unit=date_unit,
1088 default_handler=default_handler,
-> 1089 lines=lines)
1090
1091 def to_hdf(self, path_or_buf, key, **kwargs):
/Users/mcuddy/anaconda/lib/python3.6/site-packages/pandas/io/json.py in to_json(path_or_buf, obj, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines)
37 obj, orient=orient, date_format=date_format,
38 double_precision=double_precision, ensure_ascii=force_ascii,
---> 39 date_unit=date_unit, default_handler=default_handler).write()
40 else:
41 raise NotImplementedError("'obj' should be a Series or a DataFrame")
/Users/mcuddy/anaconda/lib/python3.6/site-packages/pandas/io/json.py in write(self)
83 date_unit=self.date_unit,
84 iso_dates=self.date_format == 'iso',
---> 85 default_handler=self.default_handler)
86
87
OverflowError: Maximum recursion level reached
I had a similar problem!
I narrowed my problem down to the .to_json step. For some reason when I merged my geopandas file on the right:
Neighbourhoods_merged = df_2016.merge(gdf_neighbourhoods, how = "left", on = "Neighbourhood#")
I ran into the recursion error. I found success by switching the two:
Neighbourhoods_merged = gdf_neighbourhoods.merge(df_2016, how = "left", on = "Neighbourhood#")
This is what worked for me. Infuriatingly I have no idea why this works, but I hope this might help someone else with the same error!
I solved this problem by changing the merge direction.
So, if you want to merge two dataframes A and B, where A is a geopandas.geodataframe.GeoDataFrame and B is a pandas.core.frame.DataFrame, you should merge them with pd.merge(A, B, on='some column'), not in the opposite direction.
I think the maximum recursion error comes when you execute the .to_json() method on a pandas DataFrame that has a POLYGON-type column in it.
When you change the direction of the merge so that the result stays a GeoDataFrame, .to_json() executes without problems even with a POLYGON column in it.
I spent 2 hours with this, and I hope this can help you.
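A minimal sketch illustrating the point made in both answers, with hypothetical frame names gdf (a GeoDataFrame holding the county geometry) and df (a plain DataFrame holding the yearly values):

import geopandas as gpd
import pandas as pd

# gdf: GeoDataFrame with 'NAME' and 'geometry' columns (hypothetical)
# df:  plain DataFrame with 'NAME', 'Year', 'Victims' columns (hypothetical)

# Merging with the GeoDataFrame on the LEFT keeps the result a GeoDataFrame,
# so .to_json() uses geopandas' GeoJSON serializer and handles the geometry.
merged_geo = gdf.merge(df, how='left', on='NAME')
print(type(merged_geo))         # geopandas.geodataframe.GeoDataFrame
geojson = merged_geo.to_json()  # valid GeoJSON string

# Merging with the plain DataFrame on the LEFT downgrades the result to a
# pandas DataFrame; its .to_json() then tries to serialize the shapely
# geometry objects and can fail with "Maximum recursion level reached".
merged_plain = df.merge(gdf, how='left', on='NAME')
print(type(merged_plain))       # pandas.core.frame.DataFrame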
If you need a higher recursion depth, you can set it using sys:
import sys
sys.setrecursionlimit(1500)
That being said, your error is most likely the result of an infinite recursion, which may be the case if increasing the depth doesn't fix it.

updating a Dash plot with drop-down filters

I am looking to have two dropdowns (one for dates and another for sect_id) to filter and update the plot on my dashboard, and I am quite confused about the callback and the function to create. Here is my data:
sect_id Date Measure1 Measure2 Measure3 Total %
L19801 01-01-17 12 65 0 33
L19801 01-01-17 19 81 7 45
M18803 01-01-17 15 85 7 45
M19803 01-01-17 20 83 2 52
xxxxxx xxxxxxx xx xx x xx
xxxxxx xxxxxxx xx xx x xx
I am looking to scatter Measure1 against Measure3 and have two dropdowns. Here is what I have done:
import dash
import dash_core_components as dcc
import dash_html_components as html

date_options = []
for date in df['Date'].unique():
    date_options.append({'label': str(date), 'value': date})

app = dash.Dash()
app.layout = html.Div([
    dcc.Graph(id='graph'),
    dcc.Dropdown(id='date_picker', options=date_options, value=df['Date'][1])
])
I am currently trying with only one dropdown (the date), and I am very confused about how the layout, the dropdown, and the callback fit together.
The layout is the contents of your dashboard: your graph and your dropdown(s). Callbacks are the interactions between those components. Please refer to the Dash documentation for basic info about layout and about callbacks.
You can create callbacks fairly simply: just define a function and add a callback decorator to it, like this:
import plotly.graph_objs as go
from dash.dependencies import Input, Output

@app.callback(
    # What does the callback change? Right now we want to change the figure of the graph.
    # You can assign only one callback for each property of each component.
    Output(component_id='graph', component_property='figure'),
    # Any components that modify the outcome of the callback
    # (the sect_id picker should go here as well)
    [Input(component_id='date_picker', component_property='value')])
def create_graph_figure(date_picker_value):
    # filter the data to the selected date and return the figure for the graph
    df_filtered = df[df['Date'] == date_picker_value]
    return go.Figure(
        data=[go.Scatter(x=df_filtered['Measure1'], y=df_filtered['Measure3'],
                         mode='markers')])
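Since the question asks for two dropdowns, here is a minimal sketch of how the second (sect_id) dropdown could feed the same callback. It assumes the layout also contains a dcc.Dropdown with id='sect_picker', a hypothetical id that is not in the original post:

import plotly.graph_objs as go
from dash.dependencies import Input, Output

@app.callback(
    Output('graph', 'figure'),
    # Both dropdowns feed the same callback; Dash passes their values
    # to the function in the order the Inputs are declared.
    [Input('date_picker', 'value'),
     Input('sect_picker', 'value')])   # 'sect_picker' is a hypothetical dropdown id
def update_figure(selected_date, selected_sect):
    # filter on both dropdown selections, then scatter Measure1 against Measure3
    filtered = df[(df['Date'] == selected_date) & (df['sect_id'] == selected_sect)]
    return go.Figure(
        data=[go.Scatter(x=filtered['Measure1'], y=filtered['Measure3'], mode='markers')],
        layout=go.Layout(xaxis={'title': 'Measure1'}, yaxis={'title': 'Measure3'}))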

Issues creating table plot of column pandas DataFrame data?

I have the following code, which creates a table image with the column names labelled. The issue I am having is getting the columns (dc[x]) to populate the table vertically as opposed to horizontally.
def drilltable():
    c = readcsv3()
    dc = DataFrame(c)
    topA, ARA, PWA = dc[0], dc[1], dc[2]
    data = [topA, ARA, PWA]
    fig = plt.figure()
    ax = fig.add_subplot(111, axisbg='white')
    ax.axis('off')
    ax.set_aspect(.2)
    cols = ["Top Drillers in Alberta", "Active Rigs", "Prev Week"]
    table = ax.table(cellText=data, colLabels=cols, loc='upper center',
                     cellLoc='center', colWidths=[.075]*18)
    for cidx in table._cells:
        table._cells[cidx].set_facecolor('#EEECE1')
    table.set_fontsize(20)
    table.scale(2.1, 16)
    plt.savefig(filenameTemplate3, format='png', bbox_inches='tight')
The image below is what I am currently getting: it has the correct column labels, but the values are listed horizontally instead of below each column label.
Instead, I would like something that looks more like this (done in Excel, because I can't get it to work in Python):
The issue is that when I call ax.table(), cellText=data fills the cells horizontally, whereas I would like them filled vertically, since that is how the data is organized.
Any suggestions?
Here is an example of data I am trying to read into the table (actual file is 11x12), via CSV file.
Top Dillers AB Active Rigs Prev. Week
Tourmaline Oil Corp 10 9
CNRL 8 8
Seven Generations 7 10
Encana Corp 6 7
Peyto Exploration 5 6
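One way to approach this (a minimal sketch, not an answer from the original thread): transpose the row lists before handing them to ax.table, so that each original column is laid out vertically under its header. The variable names follow the question's drilltable function:

import matplotlib.pyplot as plt

# dc is the DataFrame built inside drilltable(); columns 0, 1 and 2 hold
# the driller names, active rig counts and previous-week counts.
topA, ARA, PWA = dc[0], dc[1], dc[2]

# zip(*columns) transposes the three column sequences into rows, so each
# original column ends up running vertically under its column label.
cell_text = list(zip(topA, ARA, PWA))

fig, ax = plt.subplots()
ax.axis('off')
cols = ["Top Drillers in Alberta", "Active Rigs", "Prev Week"]
table = ax.table(cellText=cell_text, colLabels=cols,
                 loc='upper center', cellLoc='center')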

Why my PanelND factory throwing a KeyError?

I'm using Pandas version 0.12.0 on Ubuntu 13.04. I'm trying to create a 5D panel object to contain some EEG data split by condition.
How I'm chosing to structure my data:
Let me begin by demonstrating my use of pandas.core.panelnd.create_nd_panel_factory.
Subject = panelnd.create_nd_panel_factory(
    klass_name='Subject',
    axis_orders=['setsize', 'location', 'vfield', 'channels', 'samples'],
    axis_slices={'labels': 'location',
                 'items': 'vfield',
                 'major_axis': 'major_axis',
                 'minor_axis': 'minor_axis'},
    slicer=pd.Panel4D,
    axis_aliases={'ss': 'setsize',
                  'loc': 'location',
                  'vf': 'vfield',
                  'major': 'major_axis',
                  'minor': 'minor_axis'}
    # stat_axis=2  # dafuq is this?
)
Essentially, the organization is as follows:
setsize: an experimental condition, can be 1 or 2
location: an experimental condition, can be "same", "diff" or None
vfield: an experimental condition, can be "lvf" or "rvf"
The last two axes correspond to a DataFrame's major_axis and minor_axis. They have been renamed for clarity:
channels: columns, the EEG channels (129 of them)
samples: rows, the individual samples; samples can be thought of as a time axis.
What I'm trying to do:
Each experimental condition (subject x setsize x location x vfield) is stored in its own tab-delimited file, which I am reading in with pandas.read_table, obtaining a DataFrame object. I want to create one 5-dimensional panel (i.e. Subject) for each subject, which will contain all experimental conditions (i.e. DataFrames) for that subject.
To start, I'm building a nested dictionary for each subject/Subject:
# ... do some boring stuff to get the text files, etc...
for _, factors in df.iterrows():
    # `factors` is a 5-tuple containing
    # (subject number, setsize, location, vfield,
    #  and path to the tab-delimited file).
    sn, ss, loc, vf, path = factors
    eeg = pd.read_table(path, sep='\t', names=range(1, 129) + ['ref'], header=None)
    # build nested dict
    subjects.setdefault(sn, {}).setdefault(ss, {}).setdefault(loc, {})[vf] = eeg

# and now attempt to build `Subject`
for sn, d in subjects.iteritems():
    subjects[sn] = Subject(d)
Full stack trace
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-2-831fa603ca8f> in <module>()
----> 1 import_data()
/home/louist/Dropbox/Research/VSTM/scripts/vstmlib.py in import_data()
64
65 import ipdb; ipdb.set_trace()
---> 66 for sn, d in subjects.iteritems():
67 subjects[sn] = Subject(d)
68
/usr/local/lib/python2.7/dist-packages/pandas/core/panelnd.pyc in __init__(self, *args, **kwargs)
65 if 'dtype' not in kwargs:
66 kwargs['dtype'] = None
---> 67 self._init_data(*args, **kwargs)
68 klass.__init__ = __init__
69
/usr/local/lib/python2.7/dist-packages/pandas/core/panel.pyc in _init_data(self, data, copy, dtype, **kwargs)
250 mgr = data
251 elif isinstance(data, dict):
--> 252 mgr = self._init_dict(data, passed_axes, dtype=dtype)
253 copy = False
254 dtype = None
/usr/local/lib/python2.7/dist-packages/pandas/core/panel.pyc in _init_dict(self, data, axes, dtype)
293 raxes = [self._extract_axis(self, data, axis=i)
294 if a is None else a for i, a in enumerate(axes)]
--> 295 raxes_sm = self._extract_axes_for_slice(self, raxes)
296
297 # shallow copy
/usr/local/lib/python2.7/dist-packages/pandas/core/panel.pyc in _extract_axes_for_slice(self, axes)
1477 """ return the slice dictionary for these axes """
1478 return dict([(self._AXIS_SLICEMAP[i], a) for i, a
-> 1479 in zip(self._AXIS_ORDERS[self._AXIS_LEN - len(axes):], axes)])
1480
1481 @staticmethod
KeyError: 'location'
I understand that panelnd is an experimental feature, but I'm fairly certain that I'm doing something wrong. Can somebody please point me in the right direction? If it is a bug, is there something that can be done about it?
As usual, thank you very much in advance!
Working example. You needed to specify the mapping of your axes to the internal axes names via the slices. This fiddles with the internal structure, but the fixed names of pandas still exist (and are somewhat hardcoded via Panel/Panel4D), so you need to provide the mapping.
I would create a Panel4D first, then your Subject as I did below.
Please post on GitHub / here if you find more bugs. This is not a heavily used feature.
Output
<class 'pandas.core.panelnd.Subject'>
Dimensions: 3 (setsize) x 1 (location) x 1 (vfield) x 10 (channels) x 2 (samples)
Setsize axis: level0_0 to level0_2
Location axis: level1_0 to level1_0
Vfield axis: level2_0 to level2_0
Channels axis: level3_0 to level3_9
Samples axis: level4_1 to level4_2
Code
import pandas as pd
import numpy as np
from pandas.core import panelnd

Subject = panelnd.create_nd_panel_factory(
    klass_name='Subject',
    axis_orders=['setsize', 'location', 'vfield', 'channels', 'samples'],
    axis_slices={'location': 'labels',
                 'vfield': 'items',
                 'channels': 'major_axis',
                 'samples': 'minor_axis'},
    slicer=pd.Panel4D,
    axis_aliases={'ss': 'setsize',
                  'loc': 'labels',
                  'vf': 'items',
                  'major': 'major_axis',
                  'minor': 'minor_axis'})

subjects = dict()
for i in range(3):
    eeg = pd.DataFrame(np.random.randn(10, 2),
                       columns=['level4_1', 'level4_2'],
                       index=["level3_%s" % x for x in range(10)])
    loc, vf = ('level1_0', 'level2_0')
    subjects["level0_%s" % i] = pd.Panel4D({loc: {vf: eeg}})

print Subject(subjects)
